Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. The system uses your smartest AI model to evaluate other AIs by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher than weaker ones, as you would expect.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what the output would have needed to score higher, which turns the rating into a meta-feedback loop for improving future performance.
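
The three ideas above can be sketched in a few lines of Python. This is a minimal illustration and not the actual Fabric Pattern: the prompt wording, the two-line reply format, the function names (`build_judge_prompt`, `parse_verdict`, `rate`, `level_rank`), and the `judge` callable are all assumptions made for the example, and the scale lists only the four levels named above (the real pattern may use more).

```python
from typing import Callable, Tuple

# Only the levels named in these notes; the real scale may have more rungs.
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


def build_judge_prompt(task: str, task_input: str, candidate_output: str) -> str:
    """Assemble a rating prompt for the judge model (wording is hypothetical)."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are an expert evaluator of AI output.\n\n"
        f"TASK:\n{task}\n\n"
        f"INPUT:\n{task_input}\n\n"
        f"OUTPUT TO RATE:\n{candidate_output}\n\n"
        f"Rate the output on this human scale (lowest to highest): {levels}.\n"
        "Reply with exactly two lines:\n"
        "LEVEL: <one level from the scale>\n"
        "TO SCORE HIGHER: <what the output would have needed>\n"
    )


def parse_verdict(reply: str) -> Tuple[str, str]:
    """Extract (level, advice) from the judge's two-line reply."""
    level, advice = "", ""
    for line in reply.splitlines():
        key, _, value = line.partition(":")
        key = key.strip().upper()
        if key == "LEVEL":
            level = value.strip()
        elif key == "TO SCORE HIGHER":
            advice = value.strip()
    if level.lower() not in [name.lower() for name in HUMAN_LEVELS]:
        raise ValueError(f"unrecognized level: {level!r}")
    return level, advice


def level_rank(level: str) -> int:
    """Position of a level on the scale, so two ratings can be compared."""
    return [name.lower() for name in HUMAN_LEVELS].index(level.lower())


def rate(task: str, task_input: str, candidate_output: str,
         judge: Callable[[str], str]) -> Tuple[str, str]:
    """Ask the judge model to rate another model's output.

    `judge` is any function that takes a prompt and returns the judge
    model's text reply; how it reaches the model is left to the caller.
    """
    prompt = build_judge_prompt(task, task_input, candidate_output)
    return parse_verdict(judge(prompt))


def fake_judge(prompt: str) -> str:
    """Stand-in for a real model call, used only to exercise the code."""
    return ("LEVEL: high school\n"
            "TO SCORE HIGHER: Cite sources and cover edge cases.")


if __name__ == "__main__":
    level, advice = rate("Summarize the article.", "Some article text.",
                         "A short summary.", fake_judge)
    print(level, "|", advice)
```

To try this against real models, replace `fake_judge` with a function that sends the prompt to your most capable model and returns its reply as a string. Keeping the model call behind a plain callable means the same rating logic works no matter which provider or CLI you use.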

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest, but the framework and methodology still work with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episodes (532)

Unsupervised Learning: No. 99

Information Warfare, AI vs. CAPTCHA, Google Bug Bug, DARPA Drone Swarms, USB Fail, Medical Extortion, tech news, human news, ideas, discovery, recommendations, aphorism, and more…

31 Oct 2017, 29 min

InfoSec Needs to Embrace New Tech Instead of Ridiculing It

The InfoSec community needs to learn how to shepherd the public through new technology instead of joining them in fleeing from it.

26 Oct 2017, 6 min

The Difference Between Violence and Terrorism

The ways that terrorism and violence are different, and why it's important that we don't confuse them.

26 Oct 2017, 4 min

Unsupervised Learning: No. 98

The Reaper botnet, Google Advanced Email Protection, Bitcoin Over $6,000, Duo's $70 million, Dubai going to facial recognition, tech news, human news, ideas, discovery, recommendations, aphorism, and more…

23 Oct 2017, 32 min

Unsupervised Learning: No. 97

Major WPA2 Flaw, Subaru hack, Vulnerable Container Ships, F-35 Data Stolen, Accenture S3 Buckets, tech news, human news, ideas, discovery, recommendations, aphorism, and more…

16 Oct 2017, 36 min

Unsupervised Learning: No. 96

Russians vs. NSA, ArcSight vs. Russia, DISQUS breach, TrendMicro vulnerability, Stamos, tech news, human news, ideas, discovery, recommendations, aphorism, and more…

11 Oct 2017, 34 min

Unsupervised Learning: No. 95

IE leak, Whole Foods, Sonic, Apple Open-sources Kernels, Equifax $15 million retirement, tech news, human news, ideas, discovery, recommendations, aphorism, and more…

2 Oct 2017, 11 min

Unsupervised Learning: No. 94

Deloitte hacked, Equifax fumbles, SEC hacked, iCloud ransom, Adobe PGP facepalm, Verizon S3 buckets, CCleaner, tech news, human news, ideas, discovery, recommendations, aphorism, and more…

25 Sep 2017, 33 min
