Episode 49: Rethinking AI Agent Evaluations

In this episode we explore how companies should evaluate AI agents across multiple dimensions, including correctness, tool selection, multi-turn reasoning, and safety. The conversation covers building reliable evaluation frameworks, balancing automated vs. human-in-the-loop testing, and using observability to debug agent behavior in production.


Links from the Show

AgentCore Evaluation: https://github.com/awslabs/agentcore-samples/tree/main/01-tutorials/07-AgentCore-evaluations

Strands Evaluation: https://strandsagents.com/docs/user-guide/evals-sdk/quickstart/


AWS Hosts: Nolan Chen & Malini Chatterjee

Email Your Feedback: rethinkpodcast@amazon.com

This episode is sourced from an open RSS feed and is not published by Podme. It may contain advertising.

Episodes (51)

Episode 51: Rethinking Cloud Security in the Age of Zero-Days and AI

Modern cloud environments are evolving faster than traditional security models can keep up. In this episode, we sit down with Yarin Pinyan, VP Products at Upwind, to explore how real-time runtime visi...

10 June, 42 min

Episode 50: Rethink Agentic AI: From Shadow Risk to Strategic Value

AI agents move at machine speed. Your security doesn't. Discover how enterprises are finding shadow agents, measuring agent ROI, and transforming agentic AI from risk to strategic value. Learn from Ar...

29 Apr, 51 min

Episode 48: Rethinking Software Engineering through a Spec-Driven Approach with Kiro

In this episode we join AWS Senior GTM Specialists Aidin Khosrowshahi and Rakesh Kumar to unpack how Kiro is transforming software development through a spec-driven approach that goes beyond tradition...

10 March, 38 min

Episode 47: What it Takes to Win in 2026

"It's amazing to see the rate and pace of advancements in technology in 2025." Todd Pond, AWS Director of Strategic Sales, is out in the field with commercial customers every day, helping them leverage ...

30 Jan, 25 min

Episode 46: Rethinking Bio Pharma Compliance in the Cloud with Aizon

Aizon.ai provides an AI software platform for the pharmaceutical and biotech industries to optimize manufacturing processes, ensure GxP compliance, and improve product quality. Their top outcomes incl...

21 Jan, 26 min

Episode 45: re:Invent 2025 Recap: AI’s Continuing Impact on Individuals and Organizations

Nolan and Malini join Shaown Nandi, AWS Director of AGS Technology, and Subhash Vanga, Director of Solutions Architecture. They recap re:Invent 2025 announcements and discuss how AI and new technology ...

18 Dec 2025, 44 min

Episode 44: The Future Of Wellness, Powered by Gen AI at mindbodygreen

mindbodygreen has built a global reputation as a go-to destination for wellness — combining expert content, functional nutrition supplements, and a thriving community to help millions live healthier, ...

20 Nov 2025, 32 min
