Evaluating LLMs with Chatbot Arena and Joseph E. Gonzalez

In this episode of Gradient Dissent, Joseph E. Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, joins host Lukas Biewald to explore innovative approaches to evaluating LLMs.

They discuss the concept of vibes-based evaluation, which examines not just accuracy but also the style and tone of model responses, and how Chatbot Arena has become a community-driven benchmark for open-source and commercial LLMs. Joseph shares insights on democratizing model evaluation, refining AI-human interactions, and leveraging human preferences to improve model performance. This episode provides a deep dive into the evolving landscape of LLM evaluation and its impact on AI development.

🎙 Get our podcasts on these platforms:

Apple Podcasts: http://wandb.me/apple-podcasts

Spotify: http://wandb.me/spotify

Google: http://wandb.me/gd_google

YouTube: http://wandb.me/youtube

Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb

Join the Weights & Biases Discord Server:

https://discord.gg/CkZKRNnaf3

This episode was added to Podme via an open RSS feed and is not produced by Podme. It may therefore contain advertising.

Episodes (136)

Adrien Treuille — Building Blazingly Fast Tools That People Love

Adrien shares his journey from making games that advance science (Eterna, Foldit) to creating Streamlit, an open-source app framework enabling ML/Data practitioners to easily build powerful and inte...

4 Dec 2020, 45 min

Peter Norvig – Singularity Is in the Eye of the Beholder

We're thrilled to have Peter Norvig join us to talk about the evolution of deep learning, his industry-defining book, his work at Google, and what he thinks the future holds for machine learning resea...

20 Nov 2020, 47 min

Robert Nishihara — The State of Distributed Computing in ML

The story of Ray and what led Robert to go from reinforcement learning researcher to creating open-source tools for machine learning and beyond. Robert is currently working on Ray, a high-performance d...

13 Nov 2020, 35 min

Ines & Sofie — Building Industrial-Strength NLP Pipelines

Sofie and Ines walk us through how the new spaCy library helps build end-to-end SOTA natural language processing workflows. Ines Montani is the co-founder of Explosion AI, a digital studio specializing...

29 Oct 2020, 58 min

Daeil Kim — The Unreasonable Effectiveness of Synthetic Data

Supercharging computer vision model performance by generating years of training data in minutes. Daeil Kim is the co-founder and CEO of AI.Reverie (https://aireverie.com/), a startup that specializes in...

15 Oct 2020, 37 min

Joaquin Candela — Definitions of Fairness

Joaquin chats about scaling and democratizing AI at Facebook, while understanding fairness and algorithmic bias. Joaquin Quiñonero Candela is Distinguished Tech Lead for Responsible AI at Facebook, ...

1 Oct 2020, 1 h 19 min

Richard Socher — The Challenges of Making ML Work in the Real World

Richard Socher, ex-Chief Scientist at Salesforce, joins us to talk about The AI Economist, NLP protein generation, and the biggest challenge in making ML work in the real world. Richard Socher was the Chief...

29 Sep 2020, 50 min

Zack Chase Lipton — The Medical Machine Learning Landscape

How Zack went from being a musician to a professor, how medical applications of machine learning are developing, and the challenges of counteracting bias in real-world applications. Zachary Chase Lipton ...

17 Sep 2020, 59 min
