#73 - YASAMAN RAZEGHI & Prof. SAMEER SINGH - NLP benchmarks

Patreon: https://www.patreon.com/mlst

Discord: https://discord.gg/ESrGqhf5CB

YT version: https://youtu.be/RzGaI7vXrkk

This week we speak with Yasaman Razeghi and Prof. Sameer Singh from UC Irvine. Yasaman recently published a paper called Impact of Pretraining Term Frequencies on Few-Shot Reasoning, in which she demonstrated comprehensively that large language models only perform well on reasoning tasks because they memorise the dataset. For the first time, she showed that accuracy was linearly correlated with the occurrence rate of the relevant terms in the training corpus, something which OpenAI should have done in the first place!
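The core measurement can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the paper's code: each test instance is paired with a hypothetical count of how often its key term appears in the pretraining corpus and a flag for whether the model answered correctly. The sketch correlates correctness with log10 frequency and compares accuracy on the most and least frequent 10% of instances. The toy numbers are made up, and the paper's exact metrics may differ.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def frequency_gap(freqs: Sequence[int], correct: Sequence[bool], frac: float = 0.1) -> float:
    """Accuracy on the most frequent `frac` of instances minus accuracy on the least frequent.

    This top-vs-bottom comparison is an illustrative choice, not necessarily the paper's metric.
    """
    # Sort correctness flags by the (hypothetical) pretraining term count.
    ranked = [c for _, c in sorted(zip(freqs, correct), key=lambda p: p[0])]
    k = max(1, int(len(ranked) * frac))
    bottom, top = ranked[:k], ranked[-k:]
    return sum(top) / k - sum(bottom) / k


# Made-up toy data: term counts in the pretraining corpus and per-instance correctness.
freqs = [10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000, 500000]
correct = [False, False, False, True, False, True, True, True, True, True]

# Correlate against log frequency (an assumption: counts span several orders of magnitude).
log_f = [math.log10(f) for f in freqs]
r = pearson(log_f, [float(c) for c in correct])
gap = frequency_gap(freqs, correct)

print(f"correlation with log10 frequency: {r:.3f}")
print(f"top-vs-bottom accuracy gap: {gap:.2f}")
```

On this made-up data the correlation comes out at roughly 0.8 and the top-versus-bottom gap is 1.0. A real analysis would work over thousands of instances, where the gap is a more stable statistic than it is with ten points.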

We also speak with Sameer, who has been a pioneering force in machine learning interpretability for many years. He created LIME with Marco Ribeiro and also had his hands all over the famous CheckList paper, among many others.

We also get into the metric obsession in the NLP world and whether metrics are one of the principal reasons why we are failing to make any progress in NLU.

[00:00:00] Impact of Pretraining Term Frequencies on Few-Shot Reasoning

[00:14:59] Metrics

[00:18:55] Definition of reasoning

[00:25:12] Metrics (again)

[00:28:52] On true believers

[00:33:04] Sameer's work on model explainability / LIME

[00:36:58] Computational irreducibility

[00:41:07] ML DevOps and Checklist

[00:45:58] Future of ML DevOps

[00:49:34] Thinking about the future


Prof. Sameer Singh

https://sameersingh.org/


Yasaman Razeghi

https://yasamanrazeghi.com/


References:


Impact of Pretraining Term Frequencies on Few-Shot Reasoning [Razeghi et al. with Singh]

https://arxiv.org/pdf/2202.07206.pdf


Beyond Accuracy: Behavioral Testing of NLP Models with CheckList [Ribeiro et al. with Singh]

https://arxiv.org/pdf/2005.04118.pdf


“Why Should I Trust You?” Explaining the Predictions of Any Classifier (LIME) [Ribeiro et al. with Singh]

https://arxiv.org/abs/1602.04938


Tim interviewing LIME Creator Marco Ribeiro in 2019

https://www.youtube.com/watch?v=6aUU-Ob4a8I


Tim video on LIME/SHAP on his other channel

https://www.youtube.com/watch?v=jhopjN08lTM


Our interview with Christoph Molnar

https://www.youtube.com/watch?v=0LIACHcxpHU


Interpretable Machine Learning book @ChristophMolnar

https://christophm.github.io/interpretable-ml-book/


Machine Teaching: A New Paradigm for Building Machine Learning Systems [Simard]

https://arxiv.org/abs/1707.06742


Whimsical notes on machine teaching

https://whimsical.com/machine-teaching-Ntke9EHHSR25yHnsypHnth


Gopher paper (DeepMind)

https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval

https://arxiv.org/pdf/2112.11446.pdf


EleutherAI

https://www.eleuther.ai/

https://github.com/kingoflolz/mesh-transformer-jax/

https://pile.eleuther.ai/


A Theory of Universal Artificial Intelligence based on Algorithmic Complexity [Hutter]

https://arxiv.org/pdf/cs/0004001.pdf

This episode is taken from an open RSS feed and is not published by Podme. It may contain advertising.