Evaluating LLMs with Chatbot Arena and Joseph E. Gonzalez

In this episode of Gradient Dissent, Joseph E. Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, joins host Lukas Biewald to explore innovative approaches to evaluating LLMs.

They discuss the concept of vibes-based evaluation, which examines not just accuracy but also the style and tone of model responses, and how Chatbot Arena has become a community-driven benchmark for open-source and commercial LLMs. Joseph shares insights on democratizing model evaluation, refining AI-human interactions, and leveraging human preferences to improve model performance. This episode provides a deep dive into the evolving landscape of LLM evaluation and its impact on AI development.

🎙 Get our podcasts on these platforms:

Apple Podcasts: http://wandb.me/apple-podcasts

Spotify: http://wandb.me/spotify

Google: http://wandb.me/gd_google

YouTube: http://wandb.me/youtube


Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb


Join the Weights & Biases Discord Server:

https://discord.gg/CkZKRNnaf3

Episodes (134)

Stephan Fabel — Efficient Supercomputing with NVIDIA's Base Command Platform

Stephan Fabel is Senior Director of Infrastructure Systems & Software at NVIDIA, where he works on Base Command, a software platform to coordinate access to NVIDIA's DGX SuperPOD infrastructure. Lukas ...

6 Jan 2022, 52 min

Chris Padwick — Smart Machines for More Sustainable Farming

Chris Padwick is Director of Computer Vision Machine Learning at Blue River Technology, a subsidiary of John Deere. Their core product, See & Spray, is a weeding robot that identifies crops and weeds ...

23 Dec 2021, 1 h

Kathryn Hume — Financial Models, ML, and 17th-Century Philosophy

Kathryn Hume is Vice President Digital Investments Technology at the Royal Bank of Canada (RBC). At the time of recording, she was Interim Head of Borealis AI, RBC's research institute for machine lea...

16 Dec 2021, 52 min

Sean & Greg — Biology and ML for Drug Discovery

Sean McClain is the founder and CEO, and Gregory Hannum is the VP of AI Research at Absci, a biotech company that's using deep learning to expedite drug discovery and development. Lukas, Sean, and Greg...

2 Dec 2021, 55 min

Chris, Shawn, and Lukas — The Weights & Biases Journey

You might know him as the host of Gradient Dissent, but Lukas is also the CEO of Weights & Biases, a developer-first ML tools platform! In this special episode, the three W&B co-founders — Chris (CVP),...

5 Nov 2021, 49 min

Pete Warden — Practical Applications of TinyML

Pete is the Technical Lead of the TensorFlow Micro team, which works on deep learning for mobile and embedded devices. Lukas and Pete talk about hacking a Raspberry Pi to run AlexNet, the power and siz...

21 Oct 2021, 53 min

Pieter Abbeel — Robotics, Startups, and Robotics Startups

Pieter is the Chief Scientist and Co-founder at Covariant, where his team is building universal AI for robotic manipulation. Pieter also hosts The Robot Brains Podcast, in which he explores how far hu...

7 Oct 2021, 57 min

Chris Albon — ML Models and Infrastructure at Wikimedia

In this episode we're joined by Chris Albon, Director of Machine Learning at the Wikimedia Foundation. Lukas and Chris talk about Wikimedia's approach to content moderation, what it's like to work in a...

23 Sep 2021, 56 min
