Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757

In this episode, Zain Asgar, co-founder and CEO of Gimlet Labs, joins us to discuss heterogeneous AI inference across diverse hardware. Zain argues that the current industry standard of running all AI workloads on high-end GPUs is unsustainable for agents, which consume significantly more tokens than traditional LLM applications. We explore Gimlet’s approach to heterogeneous inference, which involves disaggregating workloads across a mix of hardware, from H100s to older GPUs and CPUs, to optimize unit economics without sacrificing performance. We dive into their "three-layer cake" architecture: workload disaggregation, a compilation layer that maps models to specific hardware targets, and a novel system that uses LLMs to autonomously rewrite and optimize compute kernels. Finally, we discuss the complexities of networking in heterogeneous environments, the trade-offs between numerical precision and application accuracy, and the future of hardware-aware scheduling. The complete show notes for this episode can be found at https://twimlai.com/go/757.

Episodes (775)

(2/5) Klustera - Location-Based Intelligence for Smarter Marketing - TWiML Talk #18

This week I'm on location at NYU/ffVC AI NexusLab startup accelerator, speaking with founders from the 5 companies in the program's inaugural batch. This interview is with Klustera, a company applying location-based intelligence and machine learning to help brands execute smarter marketing campaigns. The notes for this series can be found at twimlai.com/nexuslab. Thanks to Future Labs at NYU Tandon and ffVenture Capital for sponsoring the series!

7 Apr 2017, 22min

(1/5) HelloVera - AI-Powered Customer Support - TWiML Talk #18

This week I'm on location at NYU/ffVC AI NexusLab startup accelerator, speaking with founders from the 5 companies in the program's inaugural batch. This interview is with HelloVera, a company applying artificial intelligence to the challenge of automating customer support experiences. The notes for this series can be found at https://twimlai.com/nexuslab. Thanks to Future Labs at NYU Tandon and ffVenture Capital for sponsoring the series!

7 Apr 2017, 25min

Interactive Machine Learning Systems with Alekh Agarwal - TWiML Talk #17

This week my guest is Alekh Agarwal, a researcher at Microsoft Research who focuses on Interactive Machine Learning. Alekh and I discuss various aspects of this exciting area of research, such as active learning, reinforcement learning, contextual bandits and more.

31 Mar 2017, 30min

Machine Learning in Cybersecurity with Evan Wright - TWiML Talk #16

This week my guest is Evan Wright, principal data scientist at cybersecurity startup Anomali. In my interview with Evan, he and I discussed a number of topics surrounding the use of machine learning in cybersecurity. If Evan’s name sounds familiar, it’s because Evan was the winner of the O’Reilly Strata+Hadoop World ticket giveaway earlier this month. We met up at the conference last week and took advantage of the opportunity to record this show. Our conversation covers, among other topics, the three big problems in cybersecurity that ML can help out with, the challenges of acquiring ground truth in cybersecurity and some ways to accomplish it, and the use of decision trees, generative adversarial networks, and other algorithms in the field. The show notes can be found at twimlai.com/talk/16.

24 Mar 2017, 1h 4min

Domain Knowledge in Machine Learning Models for Sustainability with Stefano Ermon - TWiML Talk #15

My guest this week is Stefano Ermon, Assistant Professor of Computer Science at Stanford University, and Fellow at Stanford’s Woods Institute for the Environment. Stefano and I met at the Re-Work Deep Learning Summit earlier this year, where he gave a presentation on Machine Learning for Sustainability. Stefano and I spoke about a wide range of topics, including the relationship between fundamental and applied machine learning research, incorporating domain knowledge in machine learning models, dimensionality reduction, and his interest in applying ML & AI to sustainability issues such as poverty, food security and the environment. The show notes can be found at twimlai.com/talk/15.

17 Mar 2017, 54min

Scaling Deep Learning: Systems Challenges & More with Shubho Sengupta — TWiML Talk #14

This week my guest is Shubho Sengupta, Research Scientist at Baidu. I had the pleasure of meeting Shubho at the Rework Deep Learning Summit earlier this year, where he delivered a presentation on Systems Challenges for Deep Learning. We dig into this topic in the interview, and discuss a variety of issues including network architecture, productionalization, operationalization and hardware. The show notes can be found at twimlai.com/talk/14.

10 Mar 2017, 1h 12min

Understanding Deep Neural Nets with Dr. James McCaffrey - TWiML Talk #13

My guest this week is Dr. James McCaffrey, research engineer at Microsoft Research. James and I cover a ton of ground in this conversation, including recurrent neural nets (RNNs), convolutional neural nets (CNNs), long short term memory (LSTM) networks, residual networks (ResNets), generative adversarial networks (GANs), and more. We also discuss neural network architecture and promising alternative approaches such as symbolic computation and particle swarm optimization. The show notes can be found at twimlai.com/talk/13.

3 Mar 2017, 1h 16min

Brendan Frey - Reprogramming the Human Genome with AI - TWiML Talk #12

My guest this week is Brendan Frey, Professor of Engineering and Medicine at the University of Toronto and Co-Founder and CEO of the startup Deep Genomics. Brendan and I met at the Re-Work Deep Learning Summit in San Francisco last month, where he delivered a great presentation called “Reprogramming the Human Genome: Why AI is Needed.” In this podcast we discuss the application of AI to healthcare. In particular, we dig into how Brendan’s research lab and company are applying machine learning and deep learning to treating and preventing human genetic disorders. The show notes can be found at twimlai.com/talk/12.

24 Feb 2017, 1h
