
Chris Albon — ML Models and Infrastructure at Wikimedia
In this episode we're joined by Chris Albon, Director of Machine Learning at the Wikimedia Foundation. Lukas and Chris talk about Wikimedia's approach to content moderation, what it's like to work in a place so transparent that even internal chats are public, how Wikimedia uses machine learning (spoiler: they do a lot of models to help editors), and why they're switching to Kubeflow and Docker. Chris also shares how his focus on outcomes has shaped his career and his approach to technical interviews. Show notes: http://wandb.me/gd-chris-albon --- Connect with Chris: - Twitter: https://twitter.com/chrisalbon - Website: https://chrisalbon.com/ --- Timestamps: 0:00 Intro 1:08 How Wikimedia approaches moderation 9:55 Working in the open and embracing humility 16:08 Going down Wikipedia rabbit holes 20:03 How Wikimedia uses machine learning 27:38 Wikimedia's ML infrastructure 42:56 How Chris got into machine learning 46:43 Machine Learning Flashcards and technical interviews 52:10 Low-power models and MLOps 55:58 Outro
23 Sep 2021 56min

Emily M. Bender — Language Models and Linguistics
In this episode, Emily and Lukas dive into the problems with bigger and bigger language models, the difference between form and meaning, the limits of benchmarks, and why it's important to name the languages we study. Show notes (links to papers and transcript): http://wandb.me/gd-emily-m-bender --- Emily M. Bender is a Professor of Linguistics and Faculty Director of the Master's Program in Computational Linguistics at the University of Washington. Her research areas include multilingual grammar engineering, variation (within and across languages), the relationship between linguistics and computational linguistics, and societal issues in NLP. --- Timestamps: 0:00 Sneak peek, intro 1:03 Stochastic Parrots 9:57 The societal impact of big language models 16:49 How language models can be harmful 26:00 The important difference between linguistic form and meaning 34:40 The octopus thought experiment 42:11 Language acquisition and the future of language models 49:47 Why benchmarks are limited 54:38 Ways of complementing benchmarks 1:01:20 The #BenderRule 1:03:50 Language diversity and linguistics 1:12:49 Outro
9 Sep 2021 1h 12min

Jeff Hammerbacher — From data science to biomedicine
Jeff talks about building Facebook's early data team, founding Cloudera, and transitioning into biomedicine with Hammer Lab and Related Sciences. (Read more: http://wandb.me/gd-jeff-hammerbacher) --- Jeff Hammerbacher is a scientist, software developer, entrepreneur, and investor. Jeff's current work focuses on drug discovery at Related Sciences, a biotech venture creation firm that he co-founded in 2020. Prior to his work at Related Sciences, Jeff was the Principal Investigator of Hammer Lab, a founder and the Chief Scientist of Cloudera, an Entrepreneur-in-Residence at Accel, and the manager of the Data team at Facebook. --- Follow Gradient Dissent on Twitter: https://twitter.com/weights_biases --- 0:00 Sneak peek, intro 1:13 The start of Facebook's data science team 6:53 Facebook's early tech stack 14:20 Early growth strategies at Facebook 17:37 The origin story of Cloudera 24:51 Cloudera's success, in retrospect 31:05 Jeff's transition into biomedicine 38:38 Immune checkpoint blockade in cancer therapy 48:55 Data and techniques for biomedicine 53:00 Why Jeff created Related Sciences 56:32 Outro
26 Aug 2021 56min

Josh Bloom — The Link Between Astronomy and ML
Josh explains how astronomy and machine learning have informed each other, their current limitations, and where their intersection goes from here. (Read more: http://wandb.me/gd-josh-bloom) --- Josh is a Professor of Astronomy and Chair of the Astronomy Department at UC Berkeley. His research interests include the intersection of machine learning and physics, time-domain transient events, artificial intelligence, and optical/infrared instrumentation. --- 0:00 Intro, sneak peek 1:15 How astronomy has informed ML 4:20 The big questions in astronomy today 10:15 On dark matter and dark energy 16:37 Finding life on other planets 19:55 Driving advancements in astronomy 27:05 Putting telescopes in space 31:05 Why Josh started using ML in his research 33:54 Crowdsourcing in astronomy 36:20 How ML has (and hasn't) informed astronomy 47:22 The next generation of cross-functional grad students 50:50 How Josh started coding 56:11 Incentives and maintaining research codebases 1:00:01 ML4Science's tech stack 1:02:11 Uncertainty quantification in a sensor-based world 1:04:28 Why it's not good to always get an answer 1:07:47 Outro
20 Aug 2021 1h 8min

Xavier Amatriain — Building AI-powered Primary Care
Xavier shares his experience deploying healthcare models, augmenting primary care with AI, the challenges of "ground truth" in medicine, and robustness in ML. --- Xavier Amatriain is co-founder and CTO of Curai, an ML-based primary care chat system. Previously, he was VP of Engineering at Quora, and Research/Engineering Director at Netflix, where he started and led the Algorithms team responsible for Netflix's recommendation systems. --- ⏳ Timestamps: 0:00 Sneak peek, intro 0:49 What is Curai? 5:48 The role of AI within Curai 8:44 Why Curai keeps humans in the loop 15:00 Measuring diagnostic accuracy 18:53 Patient safety 22:39 Different types of models at Curai 25:42 Using GPT-3 to generate training data 32:13 How Curai monitors and debugs models 35:19 Model explainability 39:27 Robustness in ML 45:52 Connecting metrics to impact 49:32 Outro 🌟 Show notes: - http://wandb.me/gd-xavier-amatriain - Transcription of the episode - Links to papers, projects, and people --- Follow us on Twitter! 📍 https://twitter.com/wandb_gd Get our podcast on these platforms: 👉 Apple Podcasts: http://wandb.me/apple-podcasts 👉 Spotify: http://wandb.me/spotify 👉 Google Podcasts: http://wandb.me/google-podcasts 👉 YouTube: http://wandb.me/youtube 👉 Soundcloud: http://wandb.me/soundcloud
30 Jul 2021 50min

Spence Green — Enterprise-scale Machine Translation
Spence shares his experience creating a product around human-in-the-loop machine translation, and explains how machine translation has evolved over the years. --- Spence Green is co-founder and CEO of Lilt, an AI-powered language translation platform. Lilt combines human translators and machine translation in order to produce high-quality translations more efficiently. --- 🌟 Show notes: - http://wandb.me/gd-spence-green - Transcription of the episode - Links to papers, projects, and people ⏳ Timestamps: 0:00 Sneak peek, intro 0:45 The story behind Lilt 3:08 Statistical MT vs neural MT 6:30 Domain adaptation and personalized models 8:00 The emergence of neural MT and development of Lilt 13:09 What success looks like for Lilt 18:20 Models that self-correct for gender bias 19:39 How Lilt runs its models in production 26:33 How far can MT go? 29:55 Why Lilt cares about human-computer interaction 35:04 Bilingual grammatical error correction 37:18 Human parity in MT 39:41 The unexpected challenges of prototype to production --- Join our community of ML practitioners where we host AMAs, share interesting projects and meet other people working in Deep Learning: http://wandb.me/slack Check out Fully Connected, which features curated machine learning reports by researchers exploring deep learning techniques, Kagglers showcasing winning models, industry leaders sharing best practices, and more: https://wandb.ai/fully-connected
16 Jul 2021 43min

Roger & DJ — The Rise of Big Data and CA's COVID-19 Response
Roger and DJ share some of the history behind data science as we know it today, and reflect on their experiences working on California's COVID-19 response. --- Roger Magoulas is Senior Director of Data Strategy at Astronomer, where he works on data infrastructure, analytics, and community development. Previously, he was VP of Research at O'Reilly and co-chair of O'Reilly's Strata Data and AI Conference. DJ Patil is a board member and former CTO of Devoted Health, a healthcare company for seniors. He was also Chief Data Scientist under the Obama administration and the Head of Data Science at LinkedIn. Roger and DJ recently volunteered for the California COVID-19 response, and worked with data to understand case counts, bed capacities and the impact of intervention. Connect with Roger and DJ: 📍 Roger's Twitter: https://twitter.com/rogerm 📍 DJ's Twitter: https://twitter.com/dpatil --- 🌟 Transcript: http://wandb.me/gd-roger-and-dj 🌟 ⏳ Timestamps: 0:00 Sneak peek, intro 1:03 Coining the terms "big data" and "data scientist" 7:12 The rise of data science teams 15:28 Big Data, Hadoop, and Spark 23:10 The importance of using the right tools 29:20 BLUF: Bottom Line Up Front 34:44 California's COVID response 41:21 The human aspects of responding to COVID 48:33 Reflecting on the impact of COVID interventions 57:06 Advice on doing meaningful data science work 1:04:18 Outro 🍀 Links: 1. "MapReduce: Simplified Data Processing on Large Clusters" (Dean and Ghemawat, 2004): https://research.google/pubs/pub62/ 2. "Big Data: Technologies and Techniques for Large-Scale Data" (Magoulas and Lorica, 2009): https://academics.uccs.edu/~ooluwada/courses/datamining/ExtraReading/BigData 3. The O'RLY book covers: https://www.businessinsider.com/these-hilarious-memes-perfectly-capture-what-its-like-to-work-in-tech-2016-4 4. "The Premonition" (Lewis, 2021): https://www.npr.org/2021/05/03/991570372/michael-lewis-the-premonition-is-a-sweeping-indictment-of-the-cdc 5. Why California's beaches are glowing with bioluminescence: https://www.youtube.com/watch?v=AVYSr19ReOs 7. Sturgis Motorcycle Rally: https://en.wikipedia.org/wiki/Sturgis_Motorcycle_Rally
8 Jul 2021 1h 4min

Amelia & Filip — How Pandora Deploys ML Models into Production
Amelia and Filip give insights into the recommender systems powering Pandora, from developing models to balancing effectiveness and efficiency in production. --- Amelia Nybakke is a Software Engineer at Pandora. Her team is responsible for the production system that serves models to listeners. Filip Korzeniowski is a Senior Scientist at Pandora working on recommender systems. Before that, he was a PhD student working on deep neural networks for acoustic and language modeling applied to musical audio recordings. Connect with Amelia and Filip: 📍 Amelia's LinkedIn: https://www.linkedin.com/in/amelia-nybakke-60bba5107/ 📍 Filip's LinkedIn: https://www.linkedin.com/in/filip-korzeniowski-28b33815a/ --- ⏳ Timestamps: 0:00 Sneak peek, intro 0:42 What type of ML models are at Pandora? 3:39 What makes two songs similar or not similar? 7:33 Improving models and A/B testing 8:52 Chaining, retraining, versioning, and tracking models 13:29 Useful development tools 15:10 Debugging models 18:28 Communicating progress 20:33 Tuning and improving models 23:08 How Pandora puts models into production 29:45 Bias in ML models 36:01 Repetition vs novelty in recommended songs 38:01 The bottlenecks of deployment 🌟 Transcript: http://wandb.me/gd-amelia-and-filip 🌟 Links: 📍 Amelia's "Women's History Month" playlist: https://www.pandora.com/playlist/PL:1407374934299927:100514833
1 Jul 2021 40min