Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Avsnitt(532)

News & Analysis | NO. 351

News & Analysis | NO. 351

Cloudflare vs. CAPTCHA, Exchange 0-Day, NSA Leaker Sponsor: Zerofox: Download the External Cybersecurity GuideBecome a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

3 Okt 202217min

News & Analysis | NO. 350

News & Analysis | NO. 350

Infowar Audit, Zoom Reflections, SF CamerasBecome a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

26 Sep 202215min

News & Analysis: NO. 349

News & Analysis: NO. 349

Uber Hacked, GTA Leak, Goodbyes Listen to JJAgha's comments on Relentless Iterations and What He Expects from a Modern SIEM: https://panther.com/resources/podcasts/compass-ciso-jj-agha-on-relentless-iterations-and-what-he-expects-from-a-modern-siem/ Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

20 Sep 202214min

News & Analysis | NO. 348 | Spearmishing, Patreon Security, and Triple-Threat Ransomware

News & Analysis | NO. 348 | Spearmishing, Patreon Security, and Triple-Threat Ransomware

Spearmishing, Patreon Security, and Triple-Threat Ransomware Sponsored by JupiterOne: https://www.jupiterone.com/unsupervisedlearningBecome a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

12 Sep 202223min

Metagaming: An Interview with Andrew Ringlein

Metagaming: An Interview with Andrew Ringlein

In today’s standalone episode I’m going to talk with Andrew Ringlein about some interesting new gaming ideas I’ve not seen anywhere else. He's releasing them in a new game called Rifters, and we chat through the concepts themselves and how they manifest in his new release.Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

7 Sep 202253min

News & Analysis | NO. 347

News & Analysis | NO. 347

TikTok Hack, Cloudflare Kiwi, Google OSS Bounty Sponsored by: Keeper Security http://keepersecurity.com/unsupervisedlearning Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

6 Sep 202215min

News & Analysis | NO. 346

News & Analysis | NO. 346

🗞️ Unsupervised Learning NO. 346 | Twitter Whistle, LastPass Plex, Satellite PhonesBecome a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

29 Aug 202219min

News & Analysis | NO. 345

News & Analysis | NO. 345

Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

22 Aug 202216min

Populärt inom Teknik

uppgang-och-fall
rss-racevecka
elbilsveckan
market-makers
bilar-med-sladd
bosse-bildoktorn-och-hasse-p
rss-badfluence
rss-laddstationen-med-elbilen-i-sverige
skogsforum-podcast
rss-veckans-ai
rss-technokratin
natets-morka-sida
hej-bruksbil
developers-mer-an-bara-kod
mediepodden
rss-uppgang-och-fall
rss-snacka-om-ai
garagehang
bli-saker-podden
rss-it-sakerhetspodden