Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert’s recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, and the strengths and weaknesses of transformer architectures relative to alternatives for various tasks. We also examine the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines and end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models that incorporate elements of attention and state, the significance of state update mechanisms in model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications. The complete show notes for this episode can be found at https://twimlai.com/go/693.

Episodes (779)

Transformers On Large-Scale Graphs with Bayan Bruss - #641

Today we’re joined by Bayan Bruss, Vice President of Applied ML Research at Capital One. In our conversation with Bayan, we covered a pair of papers his team presented at this year’s ICML conference. ...

7 Aug 2023, 38 min

The Enterprise LLM Landscape with Atul Deo - #640

Today we’re joined by Atul Deo, General Manager of Amazon Bedrock. In our conversation with Atul, we discuss the process of training large language models in the enterprise, including the pain points ...

31 Jul 2023, 37 min

BloombergGPT - an LLM for Finance with David Rosenberg - #639

Today we’re joined by David Rosenberg, head of the machine learning strategy team in the Office of the CTO at Bloomberg. In our conversation with David, we discuss the creation of BloombergGPT, a cust...

24 Jul 2023, 36 min

Are LLMs Good at Causal Reasoning? with Robert Osazuwa Ness - #638

Today we’re joined by Robert Osazuwa Ness, a senior researcher at Microsoft Research, Professor at Northeastern University, and Founder of Altdeep.ai. In our conversation with Robert, we explore wheth...

17 Jul 2023, 48 min

Privacy vs Fairness in Computer Vision with Alice Xiang - #637

Today we’re joined by Alice Xiang, Lead Research Scientist at Sony AI, and Global Head of AI Ethics at Sony Group Corporation. In our conversation with Alice, we discuss the ongoing debate between pri...

10 Jul 2023, 37 min

Unifying Vision and Language Models with Mohit Bansal - #636

Today we're joined by Mohit Bansal, Parker Professor and Director of the MURGe-Lab at UNC Chapel Hill. In our conversation with Mohit, we explore the concept of unification in AI models, highlightin...

3 Jul 2023, 48 min

Data Augmentation and Optimized Architectures for Computer Vision with Fatih Porikli - #635

Today we kick off our coverage of the 2023 CVPR conference joined by Fatih Porikli, a Senior Director of Technology at Qualcomm. In our conversation with Fatih, we covered quite a bit of ground, touch...

26 Jun 2023, 52 min

Mojo: A Supercharged Python for AI with Chris Lattner - #634

Today we’re joined by Chris Lattner, Co-Founder and CEO of Modular. In our conversation with Chris, we discuss Mojo, a new programming language for AI developers. Mojo is unique in this space and simp...

19 Jun 2023, 57 min
