Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert’s recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, and the strengths and weaknesses of transformer architectures relative to alternatives for various tasks. We also examine the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines and end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models that incorporate elements of attention and state, the significance of state update mechanisms in model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications. The complete show notes for this episode can be found at https://twimlai.com/go/693.

Episodes (778)

Optimal Transport and Machine Learning with Marco Cuturi - TWiML Talk #131

In this episode, I’m joined by Marco Cuturi, professor of statistics at Université Paris-Saclay. Marco and I spent some time discussing his work on Optimal Transport Theory at NIPS last year. In our d...

26 Apr 2018, 32 min

Collecting and Annotating Data for AI with Kiran Vajapey - TWiML Talk #130

In this episode, I’m joined by Kiran Vajapey, a human-computer interaction developer at Figure Eight. In this interview, Kiran shares some of what he’s learned through his work developing applicat...

23 Apr 2018, 40 min

Autonomous Aerial Guidance, Navigation and Control Systems with Christopher Lum - TWiML Talk #129

In this episode, I'm joined by Christopher Lum, Research Assistant Professor in the University of Washington’s Department of Aeronautics and Astronautics. Chris also co-heads the University’s Auto...

19 Apr 2018, 52 min

Infrastructure for Autonomous Vehicles with Missy Cummings - TWiML Talk #128

In this episode, I’m joined by Missy Cummings, head of Duke University’s Humans and Autonomy Lab and professor in the department of mechanical engineering. In addition to being an accomplished researc...

16 Apr 2018, 43 min

Hyper-Personalizing the Customer Experience w/ AI with Rob Walker - TWiML Talk #127

In this episode, we're joined by Rob Walker, Vice President of decision management and analytics at Pegasystems, a leading provider of software for customer engagement and operational excellence. Rob ...

12 Apr 2018, 41 min

Information Extraction from Natural Document Formats with David Rosenberg - TWiML Talk #126

In this episode, I’m joined by David Rosenberg, data scientist in the office of the CTO at financial publisher Bloomberg, to discuss his work on “Extracting Data from Tables and Charts in Natural Docu...

9 Apr 2018, 45 min

Human-in-the-Loop AI for Emergency Response & More w/ Robert Munro - TWiML Talk #125

In this episode, I chat with Rob Munro, CTO of the newly branded Figure Eight, formerly known as CrowdFlower. Figure Eight’s Human-in-the-Loop AI platform supports data science & machine learning team...

5 Apr 2018, 48 min

Systems and Software for Machine Learning at Scale with Jeff Dean - TWiML Talk #124

In this episode, I’m joined by Jeff Dean, Google Senior Fellow and head of the company’s deep learning research team Google Brain, who I had a chance to sit down with last week at the Googleplex in Mou...

2 Apr 2018, 54 min
