Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert’s recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, and the strengths and weaknesses of transformer architectures relative to alternatives for various tasks. We also examine the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines and end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models that incorporate elements of attention and state, the significance of state update mechanisms in model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications. The complete show notes for this episode can be found at https://twimlai.com/go/693.

Episodes (781)

How Microsoft Scales Testing and Safety for Generative AI with Sarah Bird - #691

Today, we're joined by Sarah Bird, chief product officer of responsible AI at Microsoft. We discuss the testing and evaluation techniques Microsoft applies to ensure safe deployment and use of generat...

1 July 2024, 57 min

Long Context Language Models and their Biological Applications with Eric Nguyen - #690

Today, we're joined by Eric Nguyen, PhD student at Stanford University. In our conversation, we explore his research on long context foundation models and their application to biology, particularly Hye...

25 June 2024, 45 min

Accelerating Sustainability with AI with Andres Ravinet - #689

Today, we're joined by Andres Ravinet, sustainability global black belt at Microsoft, to discuss the role of AI in sustainability. We explore real-world use cases where AI-driven solutions are leverag...

18 June 2024, 47 min

Gen AI at the Edge: Qualcomm AI Research at CVPR 2024 with Fatih Porikli - #688

Today we’re joined by Fatih Porikli, senior director of technology at Qualcomm AI Research. In our conversation, we covered several of the Qualcomm team’s 16 accepted main track and workshop papers at...

10 June 2024, 1 h 10 min

Energy Star Ratings for AI Models with Sasha Luccioni - #687

Today, we're joined by Sasha Luccioni, AI and Climate lead at Hugging Face, to discuss the environmental impact of AI models. We dig into her recent research into the relative energy consumption of ge...

3 June 2024, 48 min

Language Understanding and LLMs with Christopher Manning - #686

Today, we're joined by Christopher Manning, the Thomas M. Siebel professor in Machine Learning at Stanford University and a recent recipient of the 2024 IEEE John von Neumann medal. In our conversatio...

27 May 2024, 56 min

Chronos: Learning the Language of Time Series with Abdul Fatir Ansari - #685

Today we're joined by Abdul Fatir Ansari, a machine learning scientist at AWS AI Labs in Berlin, to discuss his paper, "Chronos: Learning the Language of Time Series." Fatir explains the challenges of...

20 May 2024, 43 min

Powering AI with the World's Largest Computer Chip with Joel Hestness - #684

Today we're joined by Joel Hestness, principal research scientist and lead of the core machine learning team at Cerebras. We discuss Cerebras’ custom silicon for machine learning, Wafer Scale Engine 3...

13 May 2024, 55 min