Proactive Agents for the Web with Devi Parikh - #756

Today, we're joined by Devi Parikh, co-founder and co-CEO of Yutori, to discuss browser use models and a future where we interact with the web through proactive, autonomous agents. We explore the technical challenges of creating reliable web agents, the advantages of visually-grounded models that operate on screenshots rather than the browser’s more brittle document object model, or DOM, and why this counterintuitive choice has proven far more robust and generalizable for handling complex web interfaces. Devi also shares insights into Yutori’s training pipeline, which has evolved from supervised fine-tuning to include rejection sampling and reinforcement learning. Finally, we discuss how Yutori’s “Scouts” agents orchestrate multiple tools and sub-agents to handle complex queries, the importance of background, "ambient" operation for these systems, and what the path looks like from simple monitoring to full task automation on the web. The complete show notes for this episode can be found at https://twimlai.com/go/756.

Episodes (780)

How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730

Today, we're joined by Josh Tobin, member of technical staff at OpenAI, to discuss the company’s approach to building AI agents. We cover OpenAI's three agentic offerings—Deep Research for comprehensi...

6 May 2025 · 1h 7min

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729

Today, we're joined by Nidhi Rastogi, assistant professor at Rochester Institute of Technology, to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench, a benchmark for evalu...

30 April 2025 · 56min

Generative Benchmarking with Kelly Hong - #728

In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly...

23 April 2025 · 54min

Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727

In this episode, Emmanuel Ameisen, a research engineer at Anthropic, returns to discuss two recent papers: "Circuit Tracing: Revealing Language Model Computational Graphs" and "On the Biology of a Lar...

14 April 2025 · 1h 34min

Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726

Today, we're joined by Maohao Shen, PhD student at MIT, to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into...

8 April 2025 · 51min

Waymo's Foundation Model for Autonomous Driving with Drago Anguelov - #725

Today, we're joined by Drago Anguelov, head of AI foundations at Waymo, for a deep dive into the role of foundation models in autonomous driving. Drago shares how Waymo is leveraging large-scale machi...

31 March 2025 · 1h 9min

Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724

Today, we're joined by Julie Kallini, PhD student at Stanford University, to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible L...

24 March 2025 · 50min

Scaling Up Test-Time Compute with Latent Reasoning with Jonas Geiping - #723

Today, we're joined by Jonas Geiping, research group leader at Ellis Institute and the Max Planck Institute for Intelligent Systems, to discuss his recent paper, “Scaling up Test-Time Compute with Late...

17 March 2025 · 58min
