Evaluating LLMs with Chatbot Arena and Joseph E. Gonzalez

In this episode of Gradient Dissent, Joseph E. Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, joins host Lukas Biewald to explore innovative approaches to evaluating LLMs.

They discuss the concept of vibes-based evaluation, which examines not just accuracy but also the style and tone of model responses, and how Chatbot Arena has become a community-driven benchmark for open-source and commercial LLMs. Joseph shares insights on democratizing model evaluation, refining AI-human interactions, and leveraging human preferences to improve model performance. This episode provides a deep dive into the evolving landscape of LLM evaluation and its impact on AI development.

🎙 Get our podcasts on these platforms:

Apple Podcasts: http://wandb.me/apple-podcasts

Spotify: http://wandb.me/spotify

Google: http://wandb.me/gd_google

YouTube: http://wandb.me/youtube


Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb


Join the Weights & Biases Discord Server:

https://discord.gg/CkZKRNnaf3

Episodes (134)

Piero Molino — The Secret Behind Building Successful Open Source Projects

Piero shares the story of how Ludwig was created, as well as the ins and outs of how Ludwig works and the future of machine learning with no code. Piero is a Staff Research Scientist in the Hazy Rese...

11 Feb 2021 · 36 min

Rosanne Liu — Conducting Fundamental ML Research as a Nonprofit

How Rosanne is working to democratize AI research and improve diversity and fairness in the field by starting a nonprofit after being a founding member of Uber AI Labs, doing lots of amazing res...

5 Feb 2021 · 49 min

Sean Gourley — NLP, National Defense, and Establishing Ground Truth

In this episode of Gradient Dissent, Primer CEO Sean Gourley and Lukas Biewald sit down to talk about NLP, working with vast amounts of information, and its crucial relationship to national defense. T...

28 Jan 2021 · 47 min

Peter Wang — Anaconda, Python, and Scientific Computing

Peter Wang talks about his journey co-founding Anaconda and serving as its CEO, his perspective on the Python programming language, and its use for scientific computing. Peter Wang has been developi...

22 Jan 2021 · 50 min

Chris Anderson — Robocars, Drones, and WIRED Magazine

Chris shares his journey from playing in R.E.M. and becoming interested in physics to leading WIRED Magazine for 11 years. His fascination with robots led to starting a company that manufactures drone...

14 Jan 2021 · 1 h 3 min

Adrien Treuille — Building Blazingly Fast Tools That People Love

Adrien shares his journey from making games that advance science (Eterna, Foldit) to creating Streamlit, an open-source app framework enabling ML/Data practitioners to easily build powerful and inte...

4 Dec 2020 · 45 min

Peter Norvig – Singularity Is in the Eye of the Beholder

We're thrilled to have Peter Norvig join us to talk about the evolution of deep learning, his industry-defining book, his work at Google, and what he thinks the future holds for machine learning resea...

20 Nov 2020 · 47 min

Robert Nishihara — The State of Distributed Computing in ML

The story of Ray and what led Robert to go from reinforcement learning researcher to creating open-source tools for machine learning and beyond. Robert is currently working on Ray, a high-performance...

13 Nov 2020 · 35 min
