How to Find the Agent Failures Your Evals Miss with Scott Clark - #767

How to Find the Agent Failures Your Evals Miss with Scott Clark - #767

In this episode, Scott Clark, co-founder and CEO of Distributional, joins us to explore how teams can reliably operate and improve complex LLM systems and agents in production. Scott introduces a Maslow’s hierarchy of observability: telemetry for logging, monitoring for known signals, and post-production or online analytics to surface unknown unknowns. We dig into examples of real-world failures Scott’s team has seen in production systems, such as “lazy” tool-use hallucinations that standard evals miss, and how mapping traces into vector fingerprints enables clustering and topic discovery to uncover emergent behaviors. Scott explains how analytics can feed the data flywheel by generating evals, guardrails, and training data, and why online, adaptive approaches are essential for non-stationary models. We also touch on practical how-to’s such as instrumentation with OpenTelemetry, the GenAI semantic conventions, and the role of dedicated analytics tools. The complete show notes for this episode can be found at https://twimlai.com/go/767.

Tämä jakso on lisätty Podme-palveluun avoimen RSS-syötteen kautta eikä se ole Podmen omaa tuotantoa. Siksi jakso saattaa sisältää mainontaa.

Jaksot(786)

Relational Foundation Models for Enterprise Data with Jure Leskovec - #768

Relational Foundation Models for Enterprise Data with Jure Leskovec - #768

In this episode, Jure Leskovec, co-founder and chief scientist at Kumo and professor of computer science at Stanford, joins us to explore two fronts of his work: AI for science and relational deep lea...

21 Touko 1h 6min

How to Engineer AI Inference Systems with Philip Kiely - #766

How to Engineer AI Inference Systems with Philip Kiely - #766

In this episode, Philip Kiely, head of AI education at Baseten, joins us to unpack the fast-evolving discipline of inference engineering. We explore why inference has become the stickiest and most cri...

30 Huhti 54min

How Capital One Delivers Multi-Agent Systems with Rashmi Shetty - #765

How Capital One Delivers Multi-Agent Systems with Rashmi Shetty - #765

In this episode, Rashmi Shetty, senior director of enterprise generative AI platform at Capital One, joins us to explore how the company is designing, deploying, and scaling multi-agent systems in a h...

16 Huhti 54min

The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764

The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764

Today, we're joined by Stefano Ermon, associate professor at Stanford University and CEO of Inception Labs to discuss diffusion language models. We dig into how diffusion approaches—traditionally used...

26 Maalis 1h 3min

Agent Swarms and Knowledge Graphs for Autonomous Software Development with Siddhant Pardeshi - #763

Agent Swarms and Knowledge Graphs for Autonomous Software Development with Siddhant Pardeshi - #763

In this episode, Sid Pardeshi, co-founder and CTO of Blitzy, joins us to discuss building autonomous development systems able to deliver production-ready software at enterprise scale. Sid contrasts AI...

10 Maalis 1h 16min

AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More with Sebastian Raschka - #762

AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More with Sebastian Raschka - #762

In this episode, Sebastian Raschka, independent LLM researcher and author, joins us to break down how the LLM landscape has changed over the past year and what is likely to matter most in 2026. We dis...

26 Helmi 1h 18min

The Evolution of Reasoning in Small Language Models with Yejin Choi - #761

The Evolution of Reasoning in Small Language Models with Yejin Choi - #761

Today, we're joined by Yejin Choi, professor and senior fellow at Stanford University in the Computer Science Department and the Institute for Human-Centered AI (HAI). In this conversation, we explore...

29 Tammi 1h 6min

Suosittua kategoriassa Politiikka ja uutiset

uutiscast
aikalisa
politiikan-puskaradio
ootsa-kuullut-tasta-2
rss-ootsa-kuullut-tasta
rss-podme-livebox
tervo-halme
otetaan-yhdet
et-sa-noin-voi-sanoo-esittaa
rss-vaalirankkurit-podcast
rss-kaikki-uusiksi
rss-asiastudio
rss-ulkopoditiikkaa
rss-pinnalla
the-ulkopolitist
rss-sinivalkoinen-islam
rss-hyvaa-huomenta-bryssel