Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert’s recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, and the strengths and weaknesses of transformer architectures relative to alternatives for various tasks. We also examine the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines and end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models that incorporate elements of both attention and state, the significance of state update mechanisms for model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications. The complete show notes for this episode can be found at https://twimlai.com/go/693.

Episodes (778)

Are Vector DBs the Future Data Platform for AI? with Ed Anuff - #664

Today we’re joined by Ed Anuff, chief product officer at DataStax. In our conversation, we discuss Ed’s insights on RAG, vector databases, embedding models, and more. We dig into the underpinnings of ...

28 Dec 2023, 48 min

Quantizing Transformers by Helping Attention Heads Do Nothing with Markus Nagel - #663

Today we’re joined by Markus Nagel, research scientist at Qualcomm AI Research, who helps us kick off our coverage of NeurIPS 2023. In our conversation with Markus, we cover his accepted papers at the...

26 Dec 2023, 46 min

Responsible AI in the Generative Era with Michael Kearns - #662

Today we’re joined by Michael Kearns, professor in the Department of Computer and Information Science at the University of Pennsylvania and an Amazon scholar. In our conversation with Michael, we disc...

22 Dec 2023, 36 min

Edutainment for AI and AWS PartyRock with Mike Miller - #661

Today we’re joined by Mike Miller, director of product at AWS responsible for the company’s “edutainment” products. In our conversation with Mike, we explore AWS PartyRock, a no-code generative AI app...

18 Dec 2023, 29 min

Data, Systems and ML for Visual Understanding with Cody Coleman - #660

Today we’re joined by Cody Coleman, co-founder and CEO of Coactive AI. In our conversation with Cody, we discuss how Coactive has leveraged modern data, systems, and machine learning techniques to del...

14 Dec 2023, 38 min

Patterns and Middleware for LLM Applications with Kyle Roche - #659

Today we’re joined by Kyle Roche, founder and CEO of Griptape, to discuss patterns and middleware for LLM applications. We dive into the emerging patterns for developing LLM applications, such as off p...

11 Dec 2023, 35 min

AI Access and Inclusivity as a Technical Challenge with Prem Natarajan - #658

Today we’re joined by Prem Natarajan, chief scientist and head of enterprise AI at Capital One. In our conversation, we discuss AI access and inclusivity as technical challenges and explore some of Pr...

4 Dec 2023, 41 min

Building LLM-Based Applications with Azure OpenAI with Jay Emery - #657

Today we’re joined by Jay Emery, director of technical sales & architecture at Microsoft Azure. In our conversation with Jay, we discuss the challenges faced by organizations when building LLM-based a...

28 Nov 2023, 43 min
