From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731

Today, we're joined by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mahesh highlights the crucial role of data curation, evaluation, and error analysis in model performance. He also explains why RL offers a more robust alternative to prompting and how it can improve multi-step tool use. We explore the limitations of supervised fine-tuning (SFT) for tool-augmented reasoning tasks, the reward-shaping strategies Bespoke Labs has used, and the company's open-source libraries such as Curator. Finally, we touch on two of its models: MiniCheck for hallucination detection and MiniChart for chart-based QA. The complete show notes for this episode can be found at https://twimlai.com/go/731.

Episodes (778)

Scaling Deep Learning: Systems Challenges & More with Shubho Sengupta — TWiML Talk #14

This week my guest is Shubho Sengupta, Research Scientist at Baidu. I had the pleasure of meeting Shubho at the Rework Deep Learning Summit earlier this year, where he delivered a presentation on Syst...

10 March 2017, 1h 12min

Understanding Deep Neural Nets with Dr. James McCaffrey - TWiML Talk #13

My guest this week is Dr. James McCaffrey, research engineer at Microsoft Research. James and I cover a ton of ground in this conversation, including recurrent neural nets (RNNs), convolutional neural...

3 March 2017, 1h 16min

Brendan Frey - Reprogramming the Human Genome with AI - TWiML Talk #12

My guest this week is Brendan Frey, Professor of Engineering and Medicine at the University of Toronto and Co-Founder and CEO of the startup Deep Genomics. Brendan and I met at the Re-Work Deep Learni...

24 February 2017, 1h

Hilary Mason - Building AI Products - TWiML Talk #11

My guest this time is Hilary Mason. Hilary was one of the first “famous” data scientists. I remember hearing her speak back in 2011 at the Strange Loop conference in St. Louis. At the time she was Chi...

25 January 2017, 17min

Francisco Webber - Statistics vs Semantics for Natural Language Processing - TWiML Talk #10

My guest this time is Francisco Webber, founder and General Manager of artificial intelligence startup Cortical.io. Francisco presented at the O’Reilly AI conference on an approach to natural language...

3 December 2016, 49min

Pascale Fung - Emotional AI: Teaching Computers Empathy - TWiML Talk #9

My guest this time is Pascale Fung, professor of electrical & computer engineering at Hong Kong University of Science and Technology. Pascale delivered a presentation at the recent O'Reilly AI confere...

8 November 2016, 34min

Diogo Almeida - Deep Learning: Modular in Theory, Inflexible in Practice - TWiML Talk #8

My guest this time is Diogo Almeida, senior data scientist at healthcare startup Enlitic. Diogo and I met at the O'Reilly AI conference, where he delivered a great presentation on in-the-trenches deep...

23 October 2016, 46min

Carlos Guestrin - Explaining the Predictions of Machine Learning Models - TWiML Talk #7

My guest this time is Carlos Guestrin, the Amazon Professor of Machine Learning at the University of Washington. Carlos and I recorded this podcast at a conference, shortly after Apple's acquisition o...

9 October 2016, 31min