Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert’s recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, and the strengths and weaknesses of transformer architectures relative to alternatives for various tasks. We also examine the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines and end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models that incorporate elements of attention and state, the significance of state update mechanisms in model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications. The complete show notes for this episode can be found at https://twimlai.com/go/693.

Episodes (778)

Solving Imperfect-Information Games with Tuomas Sandholm - NIPS ’17 Best Paper - TWiML Talk #99

In this episode I speak with Tuomas Sandholm, Carnegie Mellon University Professor and Founder and CEO of startups Optimized Markets and Strategic Machine. Tuomas, along with his PhD student Noam Brow...

22 Jan 2018, 27 min

Separating Vocals in Recorded Music at Spotify with Eric Humphrey - TWiML Talk #98

In today’s show, I sit down with Eric Humphrey, Research Scientist in the music understanding group at Spotify. Eric was at the Deep Learning Summit to give a talk on Advances in Deep Architectures an...

19 Jan 2018, 27 min

Accelerating Deep Learning with Mixed Precision Arithmetic with Greg Diamos - TWiML Talk #97

In this show I speak with Greg Diamos, senior computer systems researcher at Baidu. Greg joined me before his talk at the Deep Learning Summit, where he spoke on “The Next Generation of AI Chips.” Gre...

17 Jan 2018, 39 min

Composing Graphical Models With Neural Networks with David Duvenaud - TWiML Talk #96

In this episode, we hear from David Duvenaud, assistant professor in the Computer Science and Statistics departments at the University of Toronto. David joined me after his talk at the Deep Learning S...

15 Jan 2018, 35 min

Embedded Deep Learning at Deep Vision with Siddha Ganju - TWiML Talk #95

In this episode we hear from Siddha Ganju, data scientist at computer vision startup Deep Vision. Siddha joined me at the AI Conference a while back to chat about the challenges of developing deep lea...

12 Jan 2018, 34 min

Neuroevolution: Evolving Novel Neural Network Architectures with Kenneth Stanley - TWiML Talk #94

Today, I'm joined by Kenneth Stanley, Professor in the Department of Computer Science at the University of Central Florida and senior research scientist at Uber AI Labs. Kenneth studied under TWiML Ta...

11 Jan 2018, 45 min

A Quantum Computing Primer and Implications for AI with Davide Venturelli - TWiML Talk #93

Today, I'm joined by Davide Venturelli, science operations manager and quantum computing team lead for the Universities Space Research Association’s Institute for Advanced Computer Science at NASA Ame...

8 Jan 2018, 34 min

Learning State Representations with Yael Niv - TWiML Talk #92

This week on the podcast we’re featuring a series of conversations from the NIPS conference in Long Beach, California. I attended a bunch of talks and learned a ton, organized an impromptu roundtable ...

22 Dec 2017, 47 min
