Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert’s recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, and the strengths and weaknesses of transformer architectures relative to alternatives for various tasks. We also examine the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines and end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models that incorporate elements of attention and state, the significance of state update mechanisms in model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications. The complete show notes for this episode can be found at https://twimlai.com/go/693.

Episodes (779)

Understanding AI’s Impact on Social Disparities with Vinodkumar Prabhakaran - #617

Today we’re joined by Vinodkumar Prabhakaran, a Senior Research Scientist at Google Research. In our conversation with Vinod, we discuss his two main areas of research, using ML, specifically NLP, to ...

20 Feb 2023, 31min

AI Trends 2023: Causality and the Impact on Large Language Models with Robert Osazuwa Ness - #616

Today we’re joined by Robert Osazuwa Ness, a senior researcher at Microsoft Research, to break down the latest trends in the world of causal modeling. In our conversation with Robert, we explore advan...

14 Feb 2023, 1h 22min

Data-Centric Zero-Shot Learning for Precision Agriculture with Dimitris Zermas - #615

Today we’re joined by Dimitris Zermas, a principal scientist at agriscience company Sentera. Dimitris’ work at Sentera is focused on developing tools for precision agriculture using machine learning, ...

6 Feb 2023, 32min

How LLMs and Generative AI are Revolutionizing AI for Science with Anima Anandkumar - #614

Today we’re joined by Anima Anandkumar, Bren Professor of Computing And Mathematical Sciences at Caltech and Sr Director of AI Research at NVIDIA. In our conversation, we take a broad look at the emer...

30 Jan 2023, 1h 1min

AI Trends 2023: Natural Language Proc - ChatGPT, GPT-4 and Cutting Edge Research with Sameer Singh - #613

Today we continue our AI Trends 2023 series joined by Sameer Singh, an associate professor in the department of computer science at UC Irvine and fellow at the Allen Institute for Artificial Intellige...

23 Jan 2023, 1h 45min

AI Trends 2023: Reinforcement Learning - RLHF, Robotic Pre-Training, and Offline RL with Sergey Levine - #612

Today we’re taking a deep dive into the latest and greatest in the world of Reinforcement Learning with our friend Sergey Levine, an associate professor at UC Berkeley. In our conversation with Serge...

16 Jan 2023, 59min

Supporting Food Security in Africa Using ML with Catherine Nakalembe - #611

Today we conclude our coverage of the 2022 NeurIPS series joined by Catherine Nakalembe, an associate research professor at the University of Maryland, and Africa Program Director under NASA Harvest. ...

9 Jan 2023, 1h 6min

Service Cards and ML Governance with Michael Kearns - #610

Today we conclude our AWS re:Invent 2022 series joined by Michael Kearns, a professor in the department of computer and information science at UPenn, as well as an Amazon Scholar. In our conversation,...

2 Jan 2023, 39min
