Ryan Greenblatt - Solving ARC with GPT4o

Ryan Greenblatt from Redwood Research recently published "Getting 50% (SoTA) on ARC-AGI with GPT-4o," in which he used GPT-4o to reach state-of-the-art accuracy on François Chollet's ARC Challenge by generating many Python programs.
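
To give a feel for what "generating many Python programs" can look like, here is a minimal sketch of a sample-and-filter loop. It is not Ryan's actual pipeline: it assumes candidate programs are kept only if they reproduce a task's example input/output pairs, and that the surviving programs vote on the test output. The candidates below (`identity`, `transpose`, `flip_horizontal`) are hand-written stand-ins for programs an LLM would sample, and every name here is hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Program = Callable[[Grid], Grid]


def passes_examples(program: Program, examples: List[Tuple[Grid, Grid]]) -> bool:
    """Return True if the program reproduces every example output exactly."""
    for grid_in, expected in examples:
        try:
            if program(grid_in) != expected:
                return False
        except Exception:
            # A crashing candidate is simply discarded.
            return False
    return True


def solve_by_sampling(
    candidates: List[Program],
    examples: List[Tuple[Grid, Grid]],
    test_input: Grid,
) -> Optional[Grid]:
    """Filter candidates on the examples, then majority-vote on the test output."""
    outputs = []
    for program in candidates:
        if passes_examples(program, examples):
            try:
                outputs.append(program(test_input))
            except Exception:
                continue
    if not outputs:
        return None
    # Grids are lists (unhashable), so count them as tuples of tuples.
    counts = Counter(tuple(map(tuple, grid)) for grid in outputs)
    best, _ = counts.most_common(1)[0]
    return [list(row) for row in best]


# Hypothetical stand-ins for LLM-sampled programs.
def identity(grid: Grid) -> Grid:
    return [row[:] for row in grid]


def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


def flip_horizontal(grid: Grid) -> Grid:
    return [row[::-1] for row in grid]


# One toy task whose hidden rule is "mirror each row".
examples = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
prediction = solve_by_sampling(
    [identity, transpose, flip_horizontal], examples, [[5, 6, 7]]
)
print(prediction)
```

In this toy task only `flip_horizontal` survives the filter, so it alone determines the prediction. With a real model the candidate list would come from many sampled completions rather than three fixed functions.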


Sponsor:

Sign up to Kalshi here https://kalshi.onelink.me/1r91/mlst -- the first 500 traders who deposit $100 will get a free $20 credit! Important disclaimer - In case it's not obvious - this is basically gambling and a *high risk* activity - only trade what you can afford to lose.


We discuss:

- Ryan's unique approach to solving the ARC Challenge and achieving impressive results.

- The strengths and weaknesses of current AI models.

- How AI and humans differ in learning and reasoning.

- Combining various techniques to create smarter AI systems.

- The potential risks and future advancements in AI, including the idea of agentic AI.


https://x.com/RyanPGreenblatt

https://www.redwoodresearch.org/



Refs:

Getting 50% (SoTA) on ARC-AGI with GPT-4o [Ryan Greenblatt]

https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt


On the Measure of Intelligence [Chollet]

https://arxiv.org/abs/1911.01547


Connectionism and Cognitive Architecture: A Critical Analysis [Jerry A. Fodor and Zenon W. Pylyshyn]

https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/proseminars/Proseminar13/ConnectionistArchitecture.pdf


Software 2.0 [Andrej Karpathy]

https://karpathy.medium.com/software-2-0-a64152b37c35


Why Greatness Cannot Be Planned: The Myth of the Objective [Kenneth Stanley]

https://amzn.to/3Wfy2E0


Biographical account of Terence Tao’s mathematical development [M. A. (Ken) Clements]

https://gwern.net/doc/iq/high/smpy/1984-clements.pdf


Model Evaluation and Threat Research (METR)

https://metr.org/


Why Tool AIs Want to Be Agent AIs

https://gwern.net/tool-ai


Simulators [Janus]

https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators


AI Control: Improving Safety Despite Intentional Subversion

https://www.lesswrong.com/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion

https://arxiv.org/abs/2312.06942


What a Compute-Centric Framework Says About Takeoff Speeds

https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/


Global GDP over the long run

https://ourworldindata.org/grapher/global-gdp-over-the-long-run?yScale=log


Safety Cases: How to Justify the Safety of Advanced AI Systems

https://arxiv.org/abs/2403.10462


The Danger of a “Safety Case”

http://sunnyday.mit.edu/The-Danger-of-a-Safety-Case.pdf


The Future Of Work Looks Like A UPS Truck (~02:15:50)

https://www.npr.org/sections/money/2014/05/02/308640135/episode-536-the-future-of-work-looks-like-a-ups-truck


SWE-bench

https://www.swebench.com/


Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

https://arxiv.org/pdf/2201.11990


Algorithmic Progress in Language Models

https://epochai.org/blog/algorithmic-progress-in-language-models

