Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Jaksot(532)

Unsupervised Learning: No. 208

Unsupervised Learning: No. 208

Mobile Tracking, Chinese Drone-Flu Terrorism, Message Spying, Bing Misinformation, 23andMe GlaxoSmithKline, Spam Laws, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

23 Joulu 201915min

Unsupervised Learning: No. 207

Unsupervised Learning: No. 207

Pentagon vendor requirements, Ring camera freakout, Bluetooth Thieves, Palantir Pentagon, Amazon Rekognition, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

17 Joulu 201928min

Unsupervised Learning: No. 206

Unsupervised Learning: No. 206

Vietnamese BMW APT, Defense Contractor Prep, China replacing a culture, HackerOne Cookie Snafu, Chinese Also Worried About Privacy, China Mobile Face, CDC Flu Warning, AWS Sagemaker, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

9 Joulu 201921min

Unsupervised Learning: No. 205

Unsupervised Learning: No. 205

Spam trends, CWE's latest 25, Uber audio recordings, Uber unauthorized drivers, Chinese research theft, Google state-actor notifications, bluetooth burglars, Nixon deepface, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

2 Joulu 201934min

Unsupervised Learning: No. 203

Unsupervised Learning: No. 203

Google health care, Google checking, Github open source, China policy hack, Hactivist bounties, healthcare attacks, facial protests, OSINT CTF, surveillance robots, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

18 Marras 201918min

Unsupervised Learning: No. 202

Unsupervised Learning: No. 202

Capital fired, DHS biodata, Twitter insiders, Baltimore Cyber Insurance, Airbnb Assessment, Google Play Malware, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

11 Marras 201914min

Unsupervised Learning: No. 201

Unsupervised Learning: No. 201

Unify drama, Fancy cheating, NSO lawsuits, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

4 Marras 201919min

Unsupervised Learning: No. 200

Unsupervised Learning: No. 200

200th episode!, White House cyber vacancies, AT&T SIM bribery, South Africa ultimatum, climate change power crash, Bahgdadi dead, RuNET, NYT insanity, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

28 Loka 201917min