AI Agents: Substance or Snake Oil with Arvind Narayanan - #704

AI Agents: Substance or Snake Oil with Arvind Narayanan - #704

Today, we're joined by Arvind Narayanan, professor of Computer Science at Princeton University to discuss his recent works, AI Agents That Matter and AI Snake Oil. In “AI Agents That Matter”, we explore the range of agentic behaviors, the challenges in benchmarking agents, and the ‘capability and reliability gap’, which creates risks when deploying AI agents in real-world applications. We also discuss the importance of verifiers as a technique for safeguarding agent behavior. We then dig into the AI Snake Oil book, which uncovers examples of problematic and overhyped claims in AI. Arvind shares various use cases of failed applications of AI, outlines a taxonomy of AI risks, and shares his insights on AI’s catastrophic risks. Additionally, we also touched on different approaches to LLM-based reasoning, his views on tech policy and regulation, and his work on CORE-Bench, a benchmark designed to measure AI agents' accuracy in computational reproducibility tasks. The complete show notes for this episode can be found at https://twimlai.com/go/704.

Avsnitt(778)

LLMs for Equities Feature Forecasting at Two Sigma with Ben Wellington - #736

LLMs for Equities Feature Forecasting at Two Sigma with Ben Wellington - #736

Today, we're joined by Ben Wellington, deputy head of feature forecasting at Two Sigma. We dig into the team’s end-to-end approach to leveraging AI in equities feature forecasting, covering how they i...

17 Juni 202559min

Zero-Shot Auto-Labeling: The End of Annotation for Computer Vision with Jason Corso - #735

Zero-Shot Auto-Labeling: The End of Annotation for Computer Vision with Jason Corso - #735

Today, we're joined by Jason Corso, co-founder of Voxel51 and professor at the University of Michigan, to explore automated labeling in computer vision. Jason introduces FiftyOne, an open-source platf...

10 Juni 202556min

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss Weight Watcher, an open-source tool for analyzing and improving Deep Neural Networks (DNNs) based on principles fro...

5 Juni 20251h 25min

Google I/O 2025 Special Edition - #733

Google I/O 2025 Special Edition - #733

Today, I’m excited to share a special crossover edition of the podcast recorded live from Google I/O 2025! In this episode, I join Shawn Wang aka Swyx from the Latent Space Podcast, to interview Logan...

28 Maj 202526min

RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732

RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732

Today, we're joined by Sebastian Gehrmann, head of responsible AI in the Office of the CTO at Bloomberg, to discuss AI safety in retrieval-augmented generation (RAG) systems and generative AI in high-...

21 Maj 202557min

From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731

From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731

Today, we're joined by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mah...

13 Maj 20251h 1min

How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730

How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730

Today, we're joined by Josh Tobin, member of technical staff at OpenAI, to discuss the company’s approach to building AI agents. We cover OpenAI's three agentic offerings—Deep Research for comprehensi...

6 Maj 20251h 7min

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729

Today, we're joined by Nidhi Rastogi, assistant professor at Rochester Institute of Technology to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench—a benchmark for evalu...

30 Apr 202556min

Populärt inom Politik & nyheter

p3-krim
rss-krimstad
svenska-fall
rss-viva-fotboll
flashback-forever
motiv
aftonbladet-daily
rss-vad-fan-hande
rss-sanning-konsekvens
aftonbladet-krim
rss-krimreportrarna
olyckan-inifran
rss-frandfors-horna
fordomspodden
dagens-eko
spar
rss-flodet
blenda-2
politiken
rss-klubbland-en-podd-mest-om-frolunda