Evaluating LLMs with Chatbot Arena and Joseph E. Gonzalez

In this episode of Gradient Dissent, Joseph E. Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, joins host Lukas Biewald to explore innovative approaches to evaluating LLMs.

They discuss the concept of vibes-based evaluation, which examines not just accuracy but also the style and tone of model responses, and how Chatbot Arena has become a community-driven benchmark for open-source and commercial LLMs. Joseph shares insights on democratizing model evaluation, refining AI-human interactions, and leveraging human preferences to improve model performance. This episode provides a deep dive into the evolving landscape of LLM evaluation and its impact on AI development.

🎙 Get our podcasts on these platforms:

Apple Podcasts: http://wandb.me/apple-podcasts

Spotify: http://wandb.me/spotify

Google: http://wandb.me/gd_google

YouTube: http://wandb.me/youtube


Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb


Join the Weights & Biases Discord Server:

https://discord.gg/CkZKRNnaf3

Episodes (134)

Matthew Davis — Bringing Genetic Insights to Everyone

Matthew explains how combining machine learning and computational biology can provide mainstream medicine with better diagnostics and insights. --- Matthew Davis is Head of AI at Invitae, the larges...

17 June 2021, 43 min

Clément Delangue — The Power of the Open Source Community

Clem explains the virtuous cycles behind the creation and success of Hugging Face, and shares his thoughts on where NLP is heading. --- Clément Delangue is co-founder and CEO of Hugging Face, the AI...

10 June 2021, 46 min

Wojciech Zaremba — What Could Make AI Conscious?

Wojciech joins us to talk about the principles behind OpenAI, the Fermi Paradox, and the future stages of development in AGI. --- Wojciech Zaremba is a co-founder of OpenAI, a research company dedicated ...

3 June 2021, 44 min

Phil Brown — How IPUs are Advancing Machine Intelligence

Phil shares some of the approaches, like sparsity and low precision, behind the breakthrough performance of Graphcore's Intelligence Processing Units (IPUs). --- Phil Brown leads the Applications te...

27 May 2021, 57 min

Alyssa Simpson Rochwerger — Responsible ML in the Real World

From working on COVID-19 vaccine rollout to writing a book on responsible ML, Alyssa shares her thoughts on meaningful projects and the importance of teamwork. --- Alyssa Simpson Rochwerger is as a ...

20 May 2021, 45 min

Sean Taylor — Business Decision Problems

Sean joins us to chat about ML models and tools at Lyft Rideshare Labs, Python vs R, time series forecasting with Prophet, and election forecasting. --- Sean Taylor is a Data Scientist at (and forme...

13 May 2021, 45 min

Polly Fordyce — Microfluidic Platforms and Machine Learning

Polly explains how microfluidics allows bioengineering researchers to generate high-throughput data, and shares her experiences with biology and machine learning. --- Polly Fordyce is an Assistant Pro...

29 Apr 2021, 45 min

Adrien Gaidon — Advancing ML Research in Autonomous Vehicles

Adrien Gaidon shares his approach to building teams and taking state-of-the-art research from conception to production at Toyota Research Institute. --- Adrien Gaidon is the Head of Machine Learning...

22 Apr 2021, 48 min
