Evaluating LLMs with Chatbot Arena and Joseph E. Gonzalez

In this episode of Gradient Dissent, Joseph E. Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, joins host Lukas Biewald to explore innovative approaches to evaluating LLMs.

They discuss the concept of vibes-based evaluation, which examines not just accuracy but also the style and tone of model responses, and how Chatbot Arena has become a community-driven benchmark for open-source and commercial LLMs. Joseph shares insights on democratizing model evaluation, refining AI-human interactions, and leveraging human preferences to improve model performance. This episode provides a deep dive into the evolving landscape of LLM evaluation and its impact on AI development.

🎙 Get our podcasts on these platforms:

Apple Podcasts: http://wandb.me/apple-podcasts

Spotify: http://wandb.me/spotify

Google: http://wandb.me/gd_google

YouTube: http://wandb.me/youtube


Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb


Join the Weights & Biases Discord Server:

https://discord.gg/CkZKRNnaf3

Episodes (134)

Ines & Sofie — Building Industrial-Strength NLP Pipelines

Sofie and Ines walk us through how the new spaCy library helps build end-to-end SOTA natural language processing workflows. Ines Montani is the co-founder of Explosion AI, a digital studio specializi...

29 Oct 2020, 58 min

Daeil Kim — The Unreasonable Effectiveness of Synthetic Data

Supercharging computer vision model performance by generating years of training data in minutes. Daeil Kim is the co-founder and CEO of AI.Reverie (https://aireverie.com/), a startup that specializes ...

16 Oct 2020, 37 min

Joaquin Candela — Definitions of Fairness

Joaquin chats about scaling and democratizing AI at Facebook, while understanding fairness and algorithmic bias. --- Joaquin Quiñonero Candela is Distinguished Tech Lead for Responsible AI at Facebo...

1 Oct 2020, 1 h 19 min

Richard Socher — The Challenges of Making ML Work in the Real World

Richard Socher, ex-Chief Scientist at Salesforce, joins us to talk about The AI Economist, NLP protein generation, and the biggest challenge in making ML work in the real world. Richard Socher was the Chi...

29 Sep 2020, 50 min

Zack Chase Lipton — The Medical Machine Learning Landscape

How Zack went from being a musician to a professor, how medical applications of machine learning are developing, and the challenges of counteracting bias in real-world applications. Zachary Chase Lipto...

17 Sep 2020, 59 min

Anthony Goldbloom — How to Win Kaggle Competitions

Anthony Goldbloom is the founder and CEO of Kaggle. In 2011 & 2012, Forbes Magazine named Anthony as one of the 30 under 30 in technology. In 2011, Fast Company featured him as one of the innovative t...

9 Sep 2020, 44 min

Suzana Ilić — Cultivating Machine Learning Communities

👩‍💻 Today our guest is Suzana Ilić! Suzana is a founder of Machine Learning Tokyo, a nonprofit organization dedicated to democratizing machine learning. They are a team of ML Engineers and ...

2 Sep 2020, 34 min

Jeremy Howard — The Story of fast.ai and Why Python Is Not the Future of ML

Jeremy Howard is a founding researcher at fast.ai, a research institute dedicated to making Deep Learning more accessible. Previously, he was the CEO and Founder at Enlitic, an advanced machine learni...

25 Aug 2020, 51 min
