Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert’s recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, and the strengths and weaknesses of transformer architectures relative to alternatives for various tasks. We also examine the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines and end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models that incorporate elements of attention and state, the significance of state update mechanisms in model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications. The complete show notes for this episode can be found at https://twimlai.com/go/693.

Episodes (778)

Solving Imperfect-Information Games with Tuomas Sandholm - NIPS ’17 Best Paper - TWiML Talk #99

In this episode I speak with Tuomas Sandholm, Carnegie Mellon University Professor and Founder and CEO of startups Optimized Markets and Strategic Machine. Tuomas, along with his PhD student Noam Brow...

22 Jan 2018, 27 min

Separating Vocals in Recorded Music at Spotify with Eric Humphrey - TWiML Talk #98

In today’s show, I sit down with Eric Humphrey, Research Scientist in the music understanding group at Spotify. Eric was at the Deep Learning Summit to give a talk on Advances in Deep Architectures an...

19 Jan 2018, 27 min

Accelerating Deep Learning with Mixed Precision Arithmetic with Greg Diamos - TWiML Talk #97

In this show I speak with Greg Diamos, senior computer systems researcher at Baidu. Greg joined me before his talk at the Deep Learning Summit, where he spoke on “The Next Generation of AI Chips.” Gre...

17 Jan 2018, 39 min

Composing Graphical Models With Neural Networks with David Duvenaud - TWiML Talk #96

In this episode, we hear from David Duvenaud, assistant professor in the Computer Science and Statistics departments at the University of Toronto. David joined me after his talk at the Deep Learning S...

15 Jan 2018, 35 min

Embedded Deep Learning at Deep Vision with Siddha Ganju - TWiML Talk #95

In this episode we hear from Siddha Ganju, data scientist at computer vision startup Deep Vision. Siddha joined me at the AI Conference a while back to chat about the challenges of developing deep lea...

12 Jan 2018, 34 min

Neuroevolution: Evolving Novel Neural Network Architectures with Kenneth Stanley - TWiML Talk #94

Today, I'm joined by Kenneth Stanley, Professor in the Department of Computer Science at the University of Central Florida and senior research scientist at Uber AI Labs. Kenneth studied under TWiML Ta...

11 Jan 2018, 45 min

A Quantum Computing Primer and Implications for AI with Davide Venturelli - TWiML Talk #93

Today, I'm joined by Davide Venturelli, science operations manager and quantum computing team lead for the Universities Space Research Association’s Institute for Advanced Computer Science at NASA Ame...

8 Jan 2018, 34 min

Learning State Representations with Yael Niv - TWiML Talk #92

This week on the podcast we’re featuring a series of conversations from the NIPS conference in Long Beach, California. I attended a bunch of talks and learned a ton, organized an impromptu roundtable ...

22 Dec 2017, 47 min
