Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multimodal foundation models, with a focus on state-space models in general and Albert’s recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, as well as the strengths and weaknesses of transformer architectures relative to alternatives across various tasks. We also discuss the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines and end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models that incorporate elements of both attention and state, the significance of state update mechanisms for model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications. The complete show notes for this episode can be found at https://twimlai.com/go/693.
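
To make the contrast between attention and a recurrent state a little more concrete, here is a minimal, single-channel sketch of a diagonal state-space recurrence whose step size and B/C projections depend on the current input, loosely in the spirit of the "selective" state-space models discussed in the episode. This is an illustrative assumption, not the Mamba or Mamba-2 reference implementation: the parameter names (`w_delta`, `b_delta`, `w_b`, `w_c`) are hypothetical, the discretization is simplified (exponential for A, a plain Δ·B term for B), and real implementations run a hardware-aware parallel scan over many channels. The property it is meant to show is that the state `h` has a fixed size no matter how long the sequence is, whereas causal attention has to keep every past key and value around.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    """Numerically stable softplus; keeps the step size delta positive."""
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)


def selective_scan(
    x: Sequence[float],
    a: Sequence[float],    # diagonal state matrix entries (negative => decaying state)
    w_delta: float,        # hypothetical scalar projection producing the step size
    b_delta: float,        # hypothetical bias for the step size
    w_b: Sequence[float],  # hypothetical projection making B_t input-dependent
    w_c: Sequence[float],  # hypothetical projection making C_t input-dependent
) -> List[float]:
    """Single-channel diagonal selective SSM recurrence (illustrative sketch only).

    h_t = exp(delta_t * a) * h_{t-1} + delta_t * B_t * x_t
    y_t = sum_i C_t[i] * h_t[i]
    """
    n = len(a)
    h = [0.0] * n  # fixed-size state: its length never depends on len(x)
    ys: List[float] = []
    for x_t in x:
        delta = softplus(w_delta * x_t + b_delta)  # input-dependent step size
        b_t = [wb * x_t for wb in w_b]             # input-dependent B_t
        c_t = [wc * x_t for wc in w_c]             # input-dependent C_t
        for i in range(n):
            # Decay the old state, then write the (gated) new input into it.
            h[i] = math.exp(delta * a[i]) * h[i] + delta * b_t[i] * x_t
        ys.append(sum(c * hi for c, hi in zip(c_t, h)))
    return ys
```

Each step touches only the N state entries, so a length-L sequence costs O(L·N) time with O(N) memory for the state, compared with the O(L²) pairwise scores that full self-attention computes. Because delta, B_t and C_t are functions of the input, the model can choose per token how much to remember or emit, which is the kind of state update mechanism the conversation refers to.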

Episodes (778)

Personalization for Text-to-Image Generative AI with Nataniel Ruiz - #648

Today we’re joined by Nataniel Ruiz, a research scientist at Google. In our conversation with Nataniel, we discuss his recent work around personalization for text-to-image AI models. Specifically, we ...

25 Sep 2023 · 44 min

Ensuring LLM Safety for Production Applications with Shreya Rajpal - #647

Today we’re joined by Shreya Rajpal, founder and CEO of Guardrails AI. In our conversation with Shreya, we discuss ensuring the safety and reliability of language models for production applications. W...

18 Sep 2023 · 40 min

What’s Next in LLM Reasoning? with Roland Memisevic - #646

Today we’re joined by Roland Memisevic, a senior director at Qualcomm AI Research. In our conversation with Roland, we discuss the significance of language in humanlike AI systems and the advantages a...

11 Sep 2023 · 59 min

Is ChatGPT Getting Worse? with James Zou - #645

Today we’re joined by James Zou, an assistant professor at Stanford University. In our conversation with James, we explore the differences in ChatGPT’s behavior over the last few months. We discuss th...

4 Sep 2023 · 42 min

Why Deep Networks and Brains Learn Similar Features with Sophia Sanborn - #644

Today we’re joined by Sophia Sanborn, a postdoctoral scholar at the University of California, Santa Barbara. In our conversation with Sophia, we explore the concept of universality between neural repr...

28 Aug 2023 · 45 min

Inverse Reinforcement Learning Without RL with Gokul Swamy - #643

Today we’re joined by Gokul Swamy, a Ph.D. student at the Robotics Institute at Carnegie Mellon University. In the final conversation of our ICML 2023 series, we sat down with Gokul to discuss his acc...

21 Aug 2023 · 33 min

Explainable AI for Biology and Medicine with Su-In Lee - #642

Today we’re joined by Su-In Lee, a professor at the Paul G. Allen School of Computer Science and Engineering at the University of Washington. In our conversation, Su-In details her talk from the ICML ...

14 Aug 2023 · 38 min

Transformers On Large-Scale Graphs with Bayan Bruss - #641

Today we’re joined by Bayan Bruss, Vice President of Applied ML Research at Capital One. In our conversation with Bayan, we covered a pair of papers his team presented at this year’s ICML conference. ...

7 Aug 2023 · 38 min
