Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert’s recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, and the strengths and weaknesses of transformer architectures relative to alternatives for various tasks. We also examine the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines and end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models that incorporate elements of attention and state, the significance of state update mechanisms in model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications. The complete show notes for this episode can be found at https://twimlai.com/go/693.

Episodes (779)

Are Large Language Models a Path to AGI? with Ben Goertzel - #625

Today we’re joined by Ben Goertzel, CEO of SingularityNET. In our conversation with Ben, we explore all things AGI, including the potential scenarios that could arise with the advent of AGI and his pr...

17 Apr 2023, 59 min

Open Source Generative AI at Hugging Face with Jeff Boudier - #624

Today we’re joined by Jeff Boudier, head of product at Hugging Face 🤗. In our conversation with Jeff, we explore the current landscape of open-source machine learning tools and models, the recent shi...

11 Apr 2023, 33 min

Generative AI at the Edge with Vinesh Sukumar - #623

Today we’re joined by Vinesh Sukumar, a senior director and head of AI/ML product management at Qualcomm Technologies. In our conversation with Vinesh, we explore how mobile and automotive devices hav...

3 Apr 2023, 39 min

Runway Gen-2: Generative AI for Video Creation with Anastasis Germanidis - #622

Today we’re joined by Anastasis Germanidis, Co-Founder and CTO of RunwayML. Amongst all the product and model releases over the past few months, Runway threw its hat into the ring with Gen-1, a model ...

27 Mar 2023, 49 min

Watermarking Large Language Models to Fight Plagiarism with Tom Goldstein - #621

Today we’re joined by Tom Goldstein, an associate professor at the University of Maryland. Tom’s research sits at the intersection of ML and optimization and has previously been featured in the New Yo...

20 Mar 2023, 51 min

Does ChatGPT “Think”? A Cognitive Neuroscience Perspective with Anna Ivanova - #620

Today we’re joined by Anna Ivanova, a postdoctoral researcher at MIT Quest for Intelligence. In our conversation with Anna, we discuss her recent paper Dissociating language and thought in large langu...

13 Mar 2023, 45 min

Robotic Dexterity and Collaboration with Monroe Kennedy III - #619

Today we’re joined by Monroe Kennedy III, an assistant professor at Stanford, director of the Assistive Robotics and Manipulation Lab, and a national director of Black in Robotics. In our conversation...

6 Mar 2023, 52 min

Privacy and Security for Stable Diffusion and LLMs with Nicholas Carlini - #618

Today we’re joined by Nicholas Carlini, a research scientist at Google Brain. Nicholas works at the intersection of machine learning and computer security, and his recent paper “Extracting Training Da...

27 Feb 2023, 43 min
