Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Avsnitt(538)

Why I Believe in SOTA Models Over Custom Ones

Why I Believe in SOTA Models Over Custom Ones

I think the future is cheaper and Open Source SOTA models combined with context, not custom, narrow models.Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy in...

11 Mars 1min

AI Quality Inversion

AI Quality Inversion

A troubling thought about what we will think about high-quality content in the future. Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

6 Mars 1min

The Great Transition

The Great Transition

There are a bunch of different transitions happening right now—all at the same time, all (I think) heading in the same direction. Here is a long-form exploration of the various pieces.Become a Member:...

28 Feb 1h 24min

Starting 2026

Starting 2026

A welcome back and early entry into 2026. Sponsored by: Knocknoc!Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

30 Jan 25min

Judge AI based on Output, Not Mechanism

Judge AI based on Output, Not Mechanism

How we can use an output-based system to judge whether or not different kinds of technology achieve understanding or intelligence. Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com...

22 Nov 20256min

Humans Need Entropy

Humans Need Entropy

How humans and AI models both share the weakness of deterioration without novel inputs. Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

16 Nov 20254min

Why I Think Karpathy is Wrong on the AGI Timeline

Why I Think Karpathy is Wrong on the AGI Timeline

Karpathy is confusing LLM limitations with AI system limitations, and that makes all the difference. Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy informat...

20 Okt 20259min

Novelty Exploration vs. Pattern Exploitation

Novelty Exploration vs. Pattern Exploitation

How going from exploration to exploitation can help you as both a consumer and creator of everything.Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy informat...

15 Okt 20253min

Populärt inom Teknik

uppgang-och-fall
elbilsveckan
market-makers
natets-morka-sida
rss-elektrikerpodden
skogsforum-podcast
rss-laddstationen-med-elbilen-i-sverige
rss-veckans-ai
rss-uppgang-och-fall
rss-technokratin
bilar-med-sladd
bli-saker-podden
developers-mer-an-bara-kod
hej-bruksbil
rss-digitala-influencer-podden
teknikveckan
ai-sweden-podcast
rss-fabriken-2
rss-snacka-om-ai
rss-powerboat-sverige-podcast