Beyond Accuracy: Behavioral Testing of NLP Models with Sameer Singh - #406

Beyond Accuracy: Behavioral Testing of NLP Models with Sameer Singh - #406

Today we’re joined by Sameer Singh, an assistant professor in the department of computer science at UC Irvine. Sameer’s work centers on large-scale and interpretable machine learning applied to information extraction and natural language processing. We caught up with Sameer right after he was awarded the best paper award at ACL 2020 for his work on Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. In our conversation, we explore CheckLists, the task-agnostic methodology for testing NLP models introduced in the paper. We also discuss how well we understand the cause of pitfalls or failure modes in deep learning models, Sameer’s thoughts on embodied AI, and his work on the now famous LIME paper, which he co-authored alongside Carlos Guestrin. The complete show notes for this episode can be found at twimlai.com/go/406.

Avsnitt(781)

Agent Swarms and Knowledge Graphs for Autonomous Software Development with Siddhant Pardeshi - #763

Agent Swarms and Knowledge Graphs for Autonomous Software Development with Siddhant Pardeshi - #763

In this episode, Sid Pardeshi, co-founder and CTO of Blitzy, joins us to discuss building autonomous development systems able to deliver production-ready software at enterprise scale. Sid contrasts AI...

10 Mars 1h 16min

AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More with Sebastian Raschka - #762

AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More with Sebastian Raschka - #762

In this episode, Sebastian Raschka, independent LLM researcher and author, joins us to break down how the LLM landscape has changed over the past year and what is likely to matter most in 2026. We dis...

26 Feb 1h 18min

The Evolution of Reasoning in Small Language Models with Yejin Choi - #761

The Evolution of Reasoning in Small Language Models with Yejin Choi - #761

Today, we're joined by Yejin Choi, professor and senior fellow at Stanford University in the Computer Science Department and the Institute for Human-Centered AI (HAI). In this conversation, we explore...

29 Jan 1h 6min

Intelligent Robots in 2026: Are We There Yet? with Nikita Rudin - #760

Intelligent Robots in 2026: Are We There Yet? with Nikita Rudin - #760

Today, we're joined by Nikita Rudin, co-founder and CEO of Flexion Robotics to discuss the gap between current robotic capabilities and what’s required to deploy fully autonomous robots in the real wo...

8 Jan 1h 6min

Rethinking Pre-Training for Agentic AI with Aakanksha Chowdhery - #759

Rethinking Pre-Training for Agentic AI with Aakanksha Chowdhery - #759

Today, we're joined by Aakanksha Chowdhery, member of technical staff at Reflection, to explore the fundamental shifts required to build true agentic AI. While the industry has largely focused on post...

17 Dec 202552min

Why Vision Language Models Ignore What They See with Munawar Hayat - #758

Why Vision Language Models Ignore What They See with Munawar Hayat - #758

In this episode, we’re joined by Munawar Hayat, researcher at Qualcomm AI Research, to discuss a series of papers presented at NeurIPS 2025 focusing on multimodal and generative AI. We dive into the p...

9 Dec 202557min

Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757

Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757

In this episode, Zain Asgar, co-founder and CEO of Gimlet Labs, joins us to discuss the heterogeneous AI inference across diverse hardware. Zain argues that the current industry standard of running al...

2 Dec 202548min

Proactive Agents for the Web with Devi Parikh - #756

Proactive Agents for the Web with Devi Parikh - #756

Today, we're joined by Devi Parikh, co-founder and co-CEO of Yutori, to discuss browser use models and a future where we interact with the web through proactive, autonomous agents. We explore the tech...

19 Nov 202556min

Populärt inom Politik & nyheter

svenska-fall
p3-krim
aftonbladet-krim
spar
fordomspodden
rss-krimstad
flashback-forever
rss-vad-fan-hande
aftonbladet-daily
motiv
rss-sanning-konsekvens
krimmagasinet
politiken
rss-krimreportrarna
rss-frandfors-horna
kungligt
grans
rss-flodet
dagens-eko
sydsvenskan-dok