Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric pattern that assesses how well a given model performs on a task relative to humans. The system uses your smartest AI model to evaluate the performance of other AIs, scoring them across a range of tasks and comparing the results to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the output of a weaker model (like GPT-3.5 or Claude 3 Haiku) for a given task and input. This gives you a way to benchmark output quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, ranging from “uneducated” and “high school” up to “PhD” and “world-class human.” As expected, stronger models consistently rate higher and weaker ones rate lower.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt instructs the evaluator to emulate a scoring system with more than 16,000 dimensions, applying expert-level heuristics and attention to nuance. It also asks the evaluator to describe what would have been required for a higher score, which creates a meta-feedback loop for improving future performance.
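The evaluation loop described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the actual Fabric pattern. The intermediate levels between “high school” and “PhD”, the prompt wording, the `RATING:` output convention, and the helper names `build_judge_prompt` and `parse_rating` are all hypothetical. Sending the prompt to the judge model (through Fabric or a provider API) is deliberately left out.

```python
from enum import IntEnum


class HumanLevel(IntEnum):
    """Hypothetical ordinal scale. The episode names uneducated, high school,
    PhD, and world-class; BACHELORS and MASTERS are assumed fillers."""
    UNEDUCATED = 1
    HIGH_SCHOOL = 2
    BACHELORS = 3
    MASTERS = 4
    PHD = 5
    WORLD_CLASS = 6


def build_judge_prompt(task: str, task_input: str, candidate_output: str) -> str:
    """Assemble a prompt for the strongest (judge) model. Wording is hypothetical."""
    levels = ", ".join(level.name for level in HumanLevel)
    return (
        "You are an expert evaluator. Rate the RESPONSE to the TASK below "
        f"on this human scale: {levels}.\n"
        "Explain what would have been required to score one level higher.\n"
        "End with a line of the form 'RATING: <LEVEL>'.\n\n"
        f"TASK:\n{task}\n\nINPUT:\n{task_input}\n\nRESPONSE:\n{candidate_output}\n"
    )


def parse_rating(judge_reply: str) -> HumanLevel:
    """Return the level from the last 'RATING: <LEVEL>' line of the judge's reply."""
    for line in reversed(judge_reply.strip().splitlines()):
        if line.strip().upper().startswith("RATING:"):
            # Normalize labels like "PhD", "high school", or "world-class".
            label = line.split(":", 1)[1].strip().upper()
            label = label.replace(" ", "_").replace("-", "_")
            try:
                return HumanLevel[label]
            except KeyError:
                raise ValueError(f"unknown rating label: {label!r}") from None
    raise ValueError("no RATING line found in judge reply")
```

Because the levels form an ordered enum, ratings for different candidate models can be compared directly, for example `parse_rating(reply_a) > parse_rating(reply_b)`.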

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest, but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.