Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. The system uses your smartest AI model to evaluate other AIs by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher and weaker ones lower, as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
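
To make the three ideas above concrete, here is a minimal Python sketch of the judging loop. It is not the actual Fabric Pattern: the prompt wording, the `LEVEL:` reply format, and the names `build_judge_prompt`, `parse_level`, `rate`, and `fake_judge` are all assumptions made for illustration, and only the four levels named above are included. The judge is any function you supply that sends a prompt to your strongest model and returns its reply as text.

```python
from typing import Callable, Optional

# Levels named in the episode notes, lowest to highest. The real pattern
# may define more steps in between; those are not reproduced here.
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


def build_judge_prompt(task: str, task_input: str, candidate_output: str) -> str:
    """Assemble the text sent to the stronger (judge) model. Wording is assumed."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are grading another AI's work.\n"
        f"TASK:\n{task}\n\n"
        f"INPUT:\n{task_input}\n\n"
        f"OUTPUT TO GRADE:\n{candidate_output}\n\n"
        f"Reply with 'LEVEL: <one of: {levels}>' on the first line, "
        "then explain what the output would have needed to score higher."
    )


def parse_level(judge_reply: str) -> Optional[str]:
    """Return the level from a reply whose first line is 'LEVEL: ...', else None."""
    lines = judge_reply.strip().splitlines()
    first_line = lines[0] if lines else ""
    if not first_line.upper().startswith("LEVEL:"):
        return None
    claimed = first_line.split(":", 1)[1].strip().lower()
    for level in HUMAN_LEVELS:
        if level.lower() == claimed:
            return level
    return None


def rate(
    judge: Callable[[str], str],
    task: str,
    task_input: str,
    candidate_output: str,
) -> Optional[str]:
    """Ask the judge model to grade one output and return the parsed level."""
    reply = judge(build_judge_prompt(task, task_input, candidate_output))
    return parse_level(reply)


def fake_judge(prompt: str) -> str:
    """Hypothetical stand-in for a call to your strongest model."""
    return "LEVEL: high school\nTo score higher: cite specific evidence."


if __name__ == "__main__":
    print(rate(fake_judge, "Summarize the article.", "(article text)", "(summary)"))
```

In real use you would swap `fake_judge` for a function that calls your most capable model, then run `rate` once per weaker model on the same task and input to compare the levels they receive.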

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest, but the framework and methodology still work with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Episodes (532)

NO. 376 | AI transforms security, existential risk, and how to stay in front…

3 Apr 2023, 20 min

NO. 375 — 6 Post-GPT Phases, Github's Private Key, New Assistant Interfaces

27 Mar 2023, 17 min

NO. 374 — AI Response Shaping, SpaceX Blueprints, GPT-4 Innovation Explosion…

21 Mar 2023, 12 min

NO. 373 — SPQA Architecture, LLaMA on M1 Mac, Loved Ones Voice Scams…

13 Mar 2023, 17 min

Sponsored Interview — Kolide

Today I’m doing a Sponsored Interview with Kolide, a company I’ve heard a lot about recently and have been looking forward to chatting with. I’m talking to Jason Meller, the founder and CEO of Kolide, and we talk about:

1. The problems in the BYOD space
2. Kolide’s approach to solving the problem
3. A user-centric approach to policy compliance
4. His view of what stops other players from being successful
5. And other topics

So with that, here’s Jason Meller…

https://kolide.com/unsupervisedlearning

13 Mar 2023, 37 min

NO. 372 — LastPass Employee Hack, State AI Propaganda, Crowdstrike Report Analysis…

7 Mar 2023, 29 min

NO. 371 | Covid Lab Leak, Military Server Exposed, OAI Foundry…

27 Feb 2023, 22 min

NO. 370 | GoDaddy Hack, EU Chinese APTs, Hacking with ChatGPT

21 Feb 2023, 14 min