Machine Learning Street Talk (MLST)12 Helmi 2025

Daniel Franzen & Jan Disselhoff - ARC Prize 2024 winners

Daniel Franzen and Jan Disselhoff, the "ARChitects" are the official winners of the ARC Prize 2024. Filmed at Tufa Labs in Zurich - they revealed how they achieved a remarkable 53.5% accuracy by creatively utilising large language models (LLMs) in new ways. Discover their innovative techniques, including depth-first search for token selection, test-time training, and a novel augmentation-based validation system. Their results were extremely surprising.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

Jan Disselhoff

https://www.linkedin.com/in/jan-disselhoff-1423a2240/

Daniel Franzen

https://github.com/da-fr

ARC Prize: http://arcprize.org/

TRANSCRIPT AND BACKGROUND READING:

https://www.dropbox.com/scl/fi/utkn2i1ma79fn6an4yvjw/ARCHitects.pdf?rlkey=67pe38mtss7oyhjk2ad0d2aza&dl=0

TOC

1. Solution Architecture and Strategy Overview

[00:00:00] 1.1 Initial Solution Overview and Model Architecture

[00:04:25] 1.2 LLM Capabilities and Dataset Approach

[00:10:51] 1.3 Test-Time Training and Data Augmentation Strategies

[00:14:08] 1.4 Sampling Methods and Search Implementation

[00:17:52] 1.5 ARC vs Language Model Context Comparison

2. LLM Search and Model Implementation

[00:21:53] 2.1 LLM-Guided Search Approaches and Solution Validation

[00:27:04] 2.2 Symmetry Augmentation and Model Architecture

[00:30:11] 2.3 Model Intelligence Characteristics and Performance

[00:37:23] 2.4 Tokenization and Numerical Processing Challenges

3. Advanced Training and Optimization

[00:45:15] 3.1 DFS Token Selection and Probability Thresholds

[00:49:41] 3.2 Model Size and Fine-tuning Performance Trade-offs

[00:53:07] 3.3 LoRA Implementation and Catastrophic Forgetting Prevention

[00:56:10] 3.4 Training Infrastructure and Optimization Experiments

[01:02:34] 3.5 Search Tree Analysis and Entropy Distribution Patterns

REFS

[00:01:05] Winning ARC 2024 solution using 12B param model, Franzen, Disselhoff, Hartmann

https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf

[00:03:40] Robustness of analogical reasoning in LLMs, Melanie Mitchell

https://arxiv.org/html/2411.14215

[00:07:50] Re-ARC dataset generator for ARC task variations, Michael Hodel

https://github.com/michaelhodel/re-arc

[00:15:00] Analysis of search methods in LLMs (greedy, beam, DFS), Chen et al.

https://arxiv.org/html/2408.00724v2

[00:16:55] Language model reachability space exploration, University of Toronto

https://www.youtube.com/watch?v=Bpgloy1dDn0

[00:22:30] GPT-4 guided code solutions for ARC tasks, Ryan Greenblatt

https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt

[00:41:20] GPT tokenization approach for numbers, OpenAI

https://platform.openai.com/docs/guides/text-generation/tokenizer-examples

[00:46:25] DFS in AI search strategies, Russell & Norvig

https://www.amazon.com/Artificial-Intelligence-Modern-Approach-4th/dp/0134610997

[00:53:10] Paper on catastrophic forgetting in neural networks, Kirkpatrick et al.

https://www.pnas.org/doi/10.1073/pnas.1611835114

[00:54:00] LoRA for efficient fine-tuning of LLMs, Hu et al.

https://arxiv.org/abs/2106.09685

[00:57:20] NVIDIA H100 Tensor Core GPU specs, NVIDIA

https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/

[01:04:55] Original MCTS in computer Go, Yifan Jin

https://stanford.edu/~rezab/classes/cme323/S15/projects/montecarlo_search_tree_report.pdf

Tämä jakso on lisätty Podme-palveluun avoimen RSS-syötteen kautta eikä se ole Podmen omaa tuotantoa. Siksi jakso saattaa sisältää mainontaa.

Jaksot(252)

When AI Decides You're a Threat — Brad Carson

Brad Carson was the Army's General Counsel, served two terms in Congress and was Acting Under Secretary of Defense for Personnel and Readiness. He now heads Americans for Responsible Innovation, the A...

31 Touko 1h 20min

Intelligence is collective, not artificial — Prof. Michael I. Jordan (UC Berkeley / Inria)

Michael I. Jordan, described by Science magazine as the most influential computer scientist alive, has never thought of himself as an AI researcher. In this conversation he explains why that distincti...

21 Touko 1h 17min

The AI Models Smart Enough to Know They're Cheating — Beth Barnes & David Rein [METR]

Beth Barnes and David Rein on the one graph that ate the AI timelines discourse, and why the two people who built it are the most careful about how you read it.**SPONSOR**Prolific - Quality data. From...

4 Touko 1h 53min

When AI Discovers The Next Transformer - Robert Lange (Sakana)

Robert Lange, founding researcher at Sakana AI, joins Tim to discuss *Shinka Evolve* — a framework that combines LLMs with evolutionary algorithms to do open-ended program search. The core claim: syst...

13 Maalis 1h 18min

"Vibe Coding is a Slot Machine" - Jeremy Howard

Dive into the realities of AI-assisted coding, the origins of modern fine-tuning, and the cognitive science behind machine learning with fast.ai founder Jeremy Howard. In this episode, we unpack why A...

3 Maalis 1h 26min

Evolution "Doesn't Need" Mutation - Blaise Agüera y Arcas

What if life itself is just a really sophisticated computer program that wrote itself into existence?Blaise Agüera y Arcas presenting at ALife 2025 — the most technically detailed public walkthrough o...

16 Helmi 55min

VAEs Are Energy-Based Models? [Dr. Jeff Beck]

What makes something truly *intelligent?* Is a rock an agent? Could a perfect simulation of your brain actually *be* you? In this fascinating conversation, Dr. Jeff Beck takes us on a journey through ...

25 Tammi 46min

Abstraction & Idealization: AI's Plato Problem [Mazviita Chirimuuta]

Professor Mazviita Chirimuuta joins us for a fascinating deep dive into the philosophy of neuroscience and what it really means to understand the mind.*What can neuroscience actually tell us about how...

23 Tammi 53min

Kaikki yhdessä sovelluksessa

Kuuntele kaikki suosikkipodcastisi ja -äänikirjasi yhdessä paikassa.

Sinulle valikoitua sisältöä

Podme-sovelluksessa kokoat suosikkisi helposti omaan kirjastoosi. Saat meiltä myös kuuntelusuosituksia!

Jatka kuuntelua koska tahansa

Voit jatkaa siitä mihin jäit, myös offline-tilassa.

Tarinat ja äänet, joita rakastat kuunnella

Kuuntele kaikki suosikkipodcastisi ja -äänikirjasi

Lue lisää