Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Jaksot(532)

Unsupervised Learning: No. 129

Unsupervised Learning: No. 129

Reboot your router, China hacked a U.S. Navy contractor and stole around 600GB of top secret data. Newark, NJ is monitoring much of the city with surveillance cameras, and they're making the camera footage available to the public. Facebook also shared data with a number of Chinese companies. Tech, Humans, Ideas, Discovery, Reconmendations, Aphorism… Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

12 Kesä 201811min

Unsupervised Learning: No. 128

Unsupervised Learning: No. 128

Pentagon background checks, China using machine learning in schools, Rusian ethnicity detecting AI, US Military presence in Africa, Atlanta lost dashcam footage, Kidnapping insurance, Technology News, Ideas, Recommendation, Aphorism, and more…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

4 Kesä 201810min

Unsupervised Learning: No. 127

Unsupervised Learning: No. 127

VPNFilter botnet, Echo private convo, Ghostery GDPR fail, PornHub VPN, Technology News, Human News, Ideas, Trends, & Analysis, Discovery, Recommendations, the weekly Aphorism, and more…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

29 Touko 20189min

Unsupervised Learning: No. 126

Unsupervised Learning: No. 126

VPNFilter botnet, LA + Palantir, Amazon Surveillance, Momentum report, Clapper says Russia turned the election, Chinese supply chain attacks, Tech News, Human News, Ideas, Discovery, Recommendation, the Aphorism, and more…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

25 Touko 20189min

Unsupervised Learning: No. 125

Unsupervised Learning: No. 125

Regulators aren't staffed to audit you on GDPR, inaudible Siri and Alexa commands, iOS 4 is bringing lots of privacy updates, California DNA storage, technology news, human news, Ideas, recommendation, the weekly aphorism, and more…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

18 Touko 201812min

If You’re Not Doing Continuous Asset Management You’re Not Doing Security

If You’re Not Doing Continuous Asset Management You’re Not Doing Security

How enterprises are completely ignoring the security activity that could help the most.Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

16 Touko 20187min

Unsupervised Learning: No. 120

Unsupervised Learning: No. 120

It's 2 billion users now, Liinux beep, Digital Shadows finds fail files, cloud misconfiguration, AlterEgo, AI applications, Alexa sending payments, Tech, Ideas, Recommendation, Aphorism, and more…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

9 Huhti 201819min

Unsupervised Learning: No. 119

Unsupervised Learning: No. 119

Atlanta disabled, MyFitnessPal hacked, Cambridge Analytica election tampering, Drupal, Saks, DARPA drones, Cloudflare 1.1.1.1, Slack bosses, Democratic Chinese AIs, Georgia facepalm, tech, humans, ideas, and more…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

2 Huhti 201827min