Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. The system uses your smartest AI model to evaluate other AIs by scoring their outputs across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher and weaker ones lower, as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
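
As a rough sketch of how these three pieces could fit together in code (this is not the actual Fabric Pattern text): the function names, the prompt wording, and the two-line reply format below are all hypothetical, and the level list contains only the four levels named above, while the real pattern may define more in between. The judge is passed in as a plain function so that any model client can be plugged in.

```python
from typing import Callable, Dict, Optional

# Levels named in the episode notes, lowest to highest.
# Assumption: the real pattern may define more levels in between.
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


def build_judge_prompt(task: str, task_input: str, candidate_output: str) -> str:
    """Assemble a hypothetical rating prompt for the evaluator (judge) model."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are an expert evaluator.\n\n"
        f"TASK:\n{task}\n\n"
        f"INPUT:\n{task_input}\n\n"
        f"OUTPUT TO RATE:\n{candidate_output}\n\n"
        f"Rate the output as one of: {levels}.\n"
        "Reply with exactly two lines:\n"
        "LEVEL: <one of the levels>\n"
        "TO SCORE HIGHER: <what the output would have needed>\n"
    )


def parse_rating(reply: str) -> Dict[str, Optional[str]]:
    """Pull the level and the improvement note out of the judge's reply."""
    canonical = {level.lower(): level for level in HUMAN_LEVELS}
    result: Dict[str, Optional[str]] = {"level": None, "to_score_higher": None}
    for line in reply.splitlines():
        # Split on the first colon only, so the note itself may contain colons.
        key, _, value = line.partition(":")
        key, value = key.strip().upper(), value.strip()
        if key == "LEVEL":
            # Unknown levels are left as None rather than guessed.
            result["level"] = canonical.get(value.lower())
        elif key == "TO SCORE HIGHER":
            result["to_score_higher"] = value or None
    return result


def rate(
    judge: Callable[[str], str],
    task: str,
    task_input: str,
    candidate_output: str,
) -> Dict[str, Optional[str]]:
    """Ask the judge model to rate another model's output on the human scale."""
    prompt = build_judge_prompt(task, task_input, candidate_output)
    return parse_rating(judge(prompt))


def fake_judge(prompt: str) -> str:
    """Stand-in for a call to your strongest model; replace with a real client."""
    return "LEVEL: high school\nTO SCORE HIGHER: Cite specific evidence."
```

Calling `rate(fake_judge, "Summarize the article", article_text, weaker_model_summary)` returns a dictionary with the level and the "to score higher" note. Swapping `fake_judge` for a function that sends the prompt to your strongest model gives the setup described above.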

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest, but the framework and methodology still work with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episodes (532)

Unsupervised Learning: No. 215

Iran DDoS, Jigsaw Picture Validation, 1000 Chinese Espionage Cases, Twitter Deepfake Labeling, Android Bluetooth Vuln, Cisco Discovery Vuln, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

10 Feb 2020, 14 min

Unsupervised Learning: No. 214

London Facial Recognition, Coalfire Freedom, NYT Reporter Spyware, Avast Sells Customer Data, Google's Bounty Program, Kali 2020, Harvard Chemist Espionage, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

4 Feb 2020, 27 min

Unsupervised Learning: No. 213

Saudi Bezos Hack, MIT Davos AI, Moar Energy Attacks, NIST Privacy, Ohio CISO, Microsoft Data Breach, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

27 Jan 2020, 18 min

Unsupervised Learning: No. 212

Clearview AI Surveillance De-anonymizing Faces, Face Obscuring Tech, Google Cookies, San Diego GE Surveillance, Oregon Selling DMV Data, Windows 7 Done, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

21 Jan 2020, 21 min

Unsupervised Learning: No. 211

California's Privacy Law, SHA1 exploit, Ransomware Storage, Ring Voyeurs, 20 vs. 2020, ATT&CK ICS, Telecom SMS, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

13 Jan 2020, 17 min

Visibility and Understanding Create Both Tools and Weapons

How increased understanding leads to the creation of better and better tools, and why tools are inseparable from weapons.

12 Jan 2020, 5 min

Unsupervised Learning: No. 210

War with Iran, TikTok, New GIAC cert, Mystery Drones, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

8 Jan 2020, 12 min

Unsupervised Learning: No. 209

Ring Sued, Mean Time to Hardening, APT20 2FA, China Base Pictures, China Satellites, Angled Toilets, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

30 Dec 2019, 15 min
