AI Agents: Substance or Snake Oil with Arvind Narayanan - #704

AI Agents: Substance or Snake Oil with Arvind Narayanan - #704

Today, we're joined by Arvind Narayanan, professor of Computer Science at Princeton University to discuss his recent works, AI Agents That Matter and AI Snake Oil. In “AI Agents That Matter”, we explore the range of agentic behaviors, the challenges in benchmarking agents, and the ‘capability and reliability gap’, which creates risks when deploying AI agents in real-world applications. We also discuss the importance of verifiers as a technique for safeguarding agent behavior. We then dig into the AI Snake Oil book, which uncovers examples of problematic and overhyped claims in AI. Arvind shares various use cases of failed applications of AI, outlines a taxonomy of AI risks, and shares his insights on AI’s catastrophic risks. Additionally, we also touched on different approaches to LLM-based reasoning, his views on tech policy and regulation, and his work on CORE-Bench, a benchmark designed to measure AI agents' accuracy in computational reproducibility tasks. The complete show notes for this episode can be found at https://twimlai.com/go/704.

Episoder(778)

AI Agents for Data Analysis with Shreya Shankar - #703

AI Agents for Data Analysis with Shreya Shankar - #703

Today, we're joined by Shreya Shankar, a PhD student at UC Berkeley to discuss DocETL, a declarative system for building and optimizing LLM-powered data processing pipelines for large-scale and comple...

30 Sep 202448min

Stealing Part of a Production Language Model with Nicholas Carlini - #702

Stealing Part of a Production Language Model with Nicholas Carlini - #702

Today, we're joined by Nicholas Carlini, research scientist at Google DeepMind to discuss adversarial machine learning and model security, focusing on his 2024 ICML best paper winner, “Stealing part o...

23 Sep 20241h 3min

Supercharging Developer Productivity with ChatGPT and Claude with Simon Willison - #701

Supercharging Developer Productivity with ChatGPT and Claude with Simon Willison - #701

Today, we're joined by Simon Willison, independent researcher and creator of Datasette to discuss the many ways software developers and engineers can take advantage of large language models (LLMs) to ...

16 Sep 20241h 14min

Automated Design of Agentic Systems with Shengran Hu - #700

Automated Design of Agentic Systems with Shengran Hu - #700

Today, we're joined by Shengran Hu, a PhD student at the University of British Columbia, to discuss Automated Design of Agentic Systems (ADAS), an approach focused on automatically creating agentic sy...

2 Sep 202459min

The EU AI Act and Mitigating Bias in Automated Decisioning with Peter van der Putten - #699

The EU AI Act and Mitigating Bias in Automated Decisioning with Peter van der Putten - #699

Today, we're joined by Peter van der Putten, director of the AI Lab at Pega and assistant professor of AI at Leiden University. We discuss the newly adopted European AI Act and the challenges of apply...

27 Aug 202445min

The Building Blocks of Agentic Systems with Harrison Chase - #698

The Building Blocks of Agentic Systems with Harrison Chase - #698

Today, we're joined by Harrison Chase, co-founder and CEO of LangChain to discuss LLM frameworks, agentic systems, RAG, evaluation, and more. We dig into the elements of a modern LLM framework, includ...

19 Aug 202459min

Simplifying On-Device AI for Developers with Siddhika Nevrekar - #697

Simplifying On-Device AI for Developers with Siddhika Nevrekar - #697

Today, we're joined by Siddhika Nevrekar, AI Hub head at Qualcomm Technologies, to discuss on-device AI and how to make it easier for developers to take advantage of device capabilities. We unpack the...

12 Aug 202446min

Populært innen Politikk og nyheter

giver-og-gjengen-vg
aftenpodden
aftenpodden-usa
forklart
stopp-verden
popradet
det-store-bildet
dine-penger-pengeradet
rss-gukild-johaug
bt-dokumentar-2
lydartikler-fra-aftenposten
hanna-de-heldige
fotballpodden-2
nokon-ma-ga
e24-podden
frokostshowet-pa-p5
aftenbla-bla
rss-ness
rss-penger-polser-og-politikk
rss-dannet-uten-piano