Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757

Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757

In this episode, Zain Asgar, co-founder and CEO of Gimlet Labs, joins us to discuss the heterogeneous AI inference across diverse hardware. Zain argues that the current industry standard of running all AI workloads on high-end GPUs is unsustainable for agents, which consume significantly more tokens than traditional LLM applications. We explore Gimlet’s approach to heterogeneous inference, which involves disaggregating workloads across a mix of hardware—from H100s to older GPUs and CPUs—to optimize unit economics without sacrificing performance. We dive into their "three-layer cake" architecture: workload disaggregation, a compilation layer that maps models to specific hardware targets, and a novel system that uses LLMs to autonomously rewrite and optimize compute kernels. Finally, we discuss the complexities of networking in heterogeneous environments, the trade-offs between numerical precision and application accuracy, and the future of hardware-aware scheduling. The complete show notes for this episode can be found at https://twimlai.com/go/757.

Jaksot(780)

Building an AI Mathematician with Carina Hong - #754

Building an AI Mathematician with Carina Hong - #754

In this episode, Carina Hong, founder and CEO of Axiom, joins us to discuss her work building an "AI Mathematician." Carina explains why this is a pivotal moment for AI in mathematics, citing a conver...

4 Marras 202555min

High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753

High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753

In this episode, Hung Bui, Technology Vice President at Qualcomm, joins us to explore the latest high-efficiency techniques for running generative AI, particularly diffusion models, on-device. We dive...

28 Loka 202552min

Vibe Coding's Uncanny Valley with Alexandre Pesant - #752

Vibe Coding's Uncanny Valley with Alexandre Pesant - #752

Today, we're joined by Alexandre Pesant, AI lead at Lovable, who joins us to discuss the evolution and practice of vibe coding. Alex shares his take on how AI is enabling a shift in software developme...

22 Loka 20251h 12min

Dataflow Computing for AI Inference with Kunle Olukotun - #751

Dataflow Computing for AI Inference with Kunle Olukotun - #751

In this episode, we're joined by Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and co-founder and chief technologist at Sambanova Systems, to discuss ...

14 Loka 202557min

Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750

Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750

Today, we're joined by Jacob Buckman, co-founder and CEO of Manifest AI to discuss achieving long context in transformers. We discuss the bottlenecks of scaling context length and recent techniques to...

7 Loka 202557min

The Decentralized Future of Private AI with Illia Polosukhin - #749

The Decentralized Future of Private AI with Illia Polosukhin - #749

In this episode, Illia Polosukhin, a co-author of the seminal "Attention Is All You Need" paper and co-founder of Near AI, joins us to discuss his vision for building private, decentralized, and user-...

30 Syys 20251h 5min

Inside Nano Banana 🍌 and the Future of Vision-Language Models with Oliver Wang - #748

Inside Nano Banana 🍌 and the Future of Vision-Language Models with Oliver Wang - #748

Today, we’re joined by Oliver Wang, principal scientist at Google DeepMind and tech lead for Gemini 2.5 Flash Image—better known by its code name, “Nano Banana.” We dive into the development and capab...

23 Syys 20251h 3min

Is It Time to Rethink LLM Pre-Training? with Aditi Raghunathan - #747

Is It Time to Rethink LLM Pre-Training? with Aditi Raghunathan - #747

Today, we're joined by Aditi Raghunathan, assistant professor at Carnegie Mellon University, to discuss the limitations of LLMs and how we can build more adaptable and creative models. We dig into her...

16 Syys 202558min

Suosittua kategoriassa Politiikka ja uutiset

uutiscast
aikalisa
rss-ootsa-kuullut-tasta
politiikan-puskaradio
ootsa-kuullut-tasta-2
tervo-halme
viisupodi
rss-podme-livebox
otetaan-yhdet
et-sa-noin-voi-sanoo-esittaa
rss-asiastudio
the-ulkopolitist
rss-sanna-ukkola-show-verkkouutiset
io-techin-tekniikkapodcast
rikosmyytit
rss-mina-ukkola
rss-kovin-paikka
rss-hyvaa-huomenta-bryssel
rss-terveisia-seelannista
rss-tasta-on-kyse-ivan-puopolo-verkkouutiset