Evaluating LLMs with Chatbot Arena and Joseph E. Gonzalez

In this episode of Gradient Dissent, Joseph E. Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, joins host Lukas Biewald to explore innovative approaches to evaluating LLMs.

They discuss the concept of vibes-based evaluation, which examines not just accuracy but also the style and tone of model responses, and how Chatbot Arena has become a community-driven benchmark for open-source and commercial LLMs. Joseph shares insights on democratizing model evaluation, refining AI-human interactions, and leveraging human preferences to improve model performance. This episode provides a deep dive into the evolving landscape of LLM evaluation and its impact on AI development.

🎙 Get our podcasts on these platforms:

Apple Podcasts: http://wandb.me/apple-podcasts

Spotify: http://wandb.me/spotify

Google: http://wandb.me/gd_google

YouTube: http://wandb.me/youtube

Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb

Join the Weights & Biases Discord Server:

https://discord.gg/CkZKRNnaf3

This episode is taken from an open RSS feed and is not published by Podme. It may contain advertising.

Episodes (136)

Sarah Catanzaro — Remembering the Lessons of the Last AI Renaissance

Sarah Catanzaro is a General Partner at Amplify Partners, and one of the leading investors in AI and ML. Her investments include RunwayML, OctoML, and Gantry. Sarah and Lukas discuss lessons learned fr...

2 Feb 2023, 1h 16min

Cristóbal Valenzuela — The Next Generation of Content Creation and AI

Cristóbal Valenzuela is co-founder and CEO of Runway ML, a startup that's building the future of AI-powered content creation tools. Runway's research areas include diffusion systems for image generati...

19 Jan 2023, 40min

Jeremy Howard — The Simple but Profound Insight Behind Diffusion

Jeremy Howard is a co-founder of fast.ai, the non-profit research group behind the popular massive open online course "Practical Deep Learning for Coders", and the open source deep learning library "f...

5 Jan 2023, 1h 12min

Jerome Pesenti — Large Language Models, PyTorch, and Meta

Jerome Pesenti is the former VP of AI at Meta, a tech conglomerate that includes Facebook, WhatsApp, and Instagram, and one of the most exciting places where AI research is happening today. Jerome shar...

22 Dec 2022, 52min

D. Sculley — Technical Debt, Trade-offs, and Kaggle

D. Sculley is CEO of Kaggle, the beloved and well-known data science and machine learning community. D. discusses his influential 2015 paper "Machine Learning: The High Interest Credit Card of Technica...

1 Dec 2022, 1h

Emad Mostaque — Stable Diffusion, Stability AI, and What’s Next

Emad Mostaque is CEO and co-founder of Stability AI, a startup and network of decentralized developer communities building open AI tools. Stability AI is the company behind Stable Diffusion, the well-...

15 Nov 2022, 1h 10min

Jehan Wickramasuriya — AI in High-Stress Scenarios

Jehan Wickramasuriya is the Vice President of AI, Platform & Data Services at Motorola Solutions, a global leader in public safety and enterprise security. In this episode, Jehan discusses how Motorola...

6 Oct 2022, 1h

Will Falcon — Making Lightning the Apple of ML

Will Falcon is the CEO and co-founder of Lightning AI, a platform that enables users to quickly build and publish ML models. In this episode, Will explains how Lightning addresses the challenges of a f...

15 Sep 2022, 45min
