Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert’s recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, and the strengths and weaknesses of transformer architectures relative to alternatives for various tasks. We also examine the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines and end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models that incorporate elements of attention and state, the significance of state update mechanisms in model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications. The complete show notes for this episode can be found at https://twimlai.com/go/693.

Episodes (779)

Knowledge Graphs and Expert Augmentation with Marisa Boston - TWiML Talk #204

Today we’re joined by Marisa Boston, Director of Cognitive Technology in KPMG’s Cognitive Automation Lab. We caught up to discuss some of the ways that KPMG is using AI to build tools that help augmen...

29 Nov 2018, 46 min

ML/DL for Non-Stationary Time Series Analysis in Financial Markets and Beyond with Stuart Reid - TWiML Talk #203

Today, we’re joined by Stuart Reid, Chief Scientist at NMRQL Research. NMRQL is an investment management firm that uses ML algorithms to make adaptive, unbiased, scalable, and testable trading decisi...

26 Nov 2018, 58 min

Industrializing Machine Learning at Shell with Daniel Jeavons - TWiML Talk #202

In this episode of our AI Platforms series, we’re joined by Daniel Jeavons, General Manager of Data Science at Shell. In our conversation, we explore the evolution of analytics and data science at Sh...

21 Nov 2018, 45 min

Resurrecting a Recommendations Platform at Comcast with Leemay Nassery - TWiML Talk #201

In this episode of our AI Platforms series, we’re joined by Leemay Nassery, Senior Engineering Manager and head of the recommendations team at Comcast. In our conversation, Leemay and I discuss just h...

19 Nov 2018, 47 min

Productive Machine Learning at LinkedIn with Bee-Chung Chen - TWiML Talk #200

In this episode of our AI Platforms series, we’re joined by Bee-Chung Chen, Principal Staff Engineer and Applied Researcher at LinkedIn. Bee-Chung and I caught up to discuss LinkedIn’s internal AI aut...

15 Nov 2018, 47 min

Scaling Deep Learning on Kubernetes at OpenAI with Christopher Berner - TWiML Talk #199

In this episode of our AI Platforms series we’re joined by OpenAI’s Head of Infrastructure, Christopher Berner. In our conversation, we discuss the evolution of OpenAI’s deep learning platform, the co...

12 Nov 2018, 49 min

Bighead: Airbnb's Machine Learning Platform with Atul Kale - TWiML Talk #198

In this episode of our AI Platforms series, we’re joined by Atul Kale, Engineering Manager on the machine learning infrastructure team at Airbnb. In our conversation, we discuss Airbnb’s internal mac...

8 Nov 2018, 49 min

Facebook's FBLearner Platform with Aditya Kalro - TWiML Talk #197

In the kickoff episode of our AI Platforms series, we’re joined by Aditya Kalro, Engineering Manager at Facebook, to discuss their internal machine learning platform FBLearner Flow. FBLearner Flow is ...

6 Nov 2018, 38 min
