Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Avsnitt(532)

NO. 355 | NEWS & ANALYSIS SERIES

NO. 355 | NEWS & ANALYSIS SERIES

Critical TLS, Liz Russia, AI Sweater… Sponsor: Keeper Security | Protect employee passwords in minutes with Keeper — the award-winning password manager that is secure, easy to set up, and easy to use. Keeper works out-of-the-box with identity, MFA, and SIEM solutions including Okta, Azure AD, Ping Identity, G Suite, YubiKey and many others…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

31 Okt 202213min

Why Everyone Needs a Blog | THE IDEA SERIES

Why Everyone Needs a Blog | THE IDEA SERIES

People used to be defined by where they work, and now they’re defined by their knowledge, capabilities, and opinions.Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

27 Okt 20224min

Creativity Comes From Idleness | THE IDEA SERIES

Creativity Comes From Idleness | THE IDEA SERIES

A few years ago I figured out why we’re so creative in the shower…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

26 Okt 20222min

AI Art Will Push the Top 1% to Human Artists | THE IDEA SERIES

AI Art Will Push the Top 1% to Human Artists | THE IDEA SERIES

https://danielmiessler.com/blog/ai-art-push-1-percent-human-artists/Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

25 Okt 20224min

NO. 354 | THE NEWS & ANALYSIS SERIES

NO. 354 | THE NEWS & ANALYSIS SERIES

China Controls, TikTok Tracking, Infra Sabotage…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

24 Okt 202218min

Humiliation is Deadly | THE IDEA SERIES

Humiliation is Deadly | THE IDEA SERIES

Exploring a status game model for understanding negative behavior. https://danielmiessler.com/blog/humiliation-is-deadly/Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

21 Okt 202211min

NO. 353 | THE NEWS & ANALYSIS SERIES

NO. 353 | THE NEWS & ANALYSIS SERIES

🗞️ Caffeine Phishing, Cyber Labeling, Kamikaze Drones… Sponsor: Panther Security https://panther.com/ul22Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

17 Okt 20229min

News & Analysis | NO. 352

News & Analysis | NO. 352

CISA Assets, Contractor Hack, China CVEs… Sponsored by: Jupiter One @ jupiterone.com/unsupervisedlearning Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

12 Okt 202213min

Populärt inom Teknik

uppgang-och-fall
rss-racevecka
elbilsveckan
bilar-med-sladd
market-makers
bosse-bildoktorn-och-hasse-p
rss-laddstationen-med-elbilen-i-sverige
skogsforum-podcast
rss-technokratin
developers-mer-an-bara-kod
natets-morka-sida
hej-bruksbil
mediepodden
rss-veckans-ai
ai-sweden-podcast
rss-uppgang-och-fall
bli-saker-podden
rss-it-sakerhetspodden
rss-snacka-om-ai
rss-badfluence