Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs a task relative to humans. The pattern uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and comparing it to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher than weaker ones, as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
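
The three ideas above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Fabric pattern: the prompt wording, the function names, and the two intermediate levels ("bachelor's" and "master's") are assumptions made for the example. The judge is passed in as a plain callable, so any model client could be plugged in. A stub stands in for a real model here.

```python
from typing import Callable, Tuple

# Human-scale levels, lowest to highest. Only "uneducated", "high school",
# "PhD", and "world-class human" come from the episode notes; the two
# middle levels are assumed for illustration.
HUMAN_LEVELS = [
    "uneducated",
    "high school",
    "bachelor's",
    "master's",
    "PhD",
    "world-class human",
]


def build_rating_prompt(task: str, task_input: str, candidate_output: str) -> str:
    """Assemble a judge prompt (hypothetical wording, not the Fabric text)."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are an expert evaluator. Weigh as many dimensions of quality "
        "as you can before deciding.\n\n"
        f"TASK:\n{task}\n\n"
        f"INPUT:\n{task_input}\n\n"
        f"OUTPUT TO RATE:\n{candidate_output}\n\n"
        "Reply with exactly two lines:\n"
        f"LEVEL: <one of: {levels}>\n"
        "TO SCORE HIGHER: <what the output would have needed>\n"
    )


def parse_rating(reply: str) -> Tuple[str, str]:
    """Extract (level, advice for scoring higher) from the judge's reply."""
    level, advice = "", ""
    for line in reply.splitlines():
        key, _, value = line.partition(":")
        key = key.strip().upper()
        if key == "LEVEL":
            level = value.strip()
        elif key == "TO SCORE HIGHER":
            advice = value.strip()
    # Match case-insensitively and return the canonical spelling.
    matches = [name for name in HUMAN_LEVELS if name.lower() == level.lower()]
    if not matches:
        raise ValueError(f"unrecognized level: {level!r}")
    return matches[0], advice


def rate_output(
    judge: Callable[[str], str],
    task: str,
    task_input: str,
    candidate_output: str,
) -> Tuple[str, str]:
    """Ask the stronger model (judge) to grade a weaker model's output."""
    prompt = build_rating_prompt(task, task_input, candidate_output)
    return parse_rating(judge(prompt))


def stub_judge(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same rating."""
    return "LEVEL: high school\nTO SCORE HIGHER: Cite specifics from the input."


if __name__ == "__main__":
    level, advice = rate_output(
        stub_judge, "Summarize the article", "article text", "a short summary"
    )
    print(level, "|", advice)
```

Keeping the judge behind a callable means the same rating logic can be pointed at whichever model is currently the most capable, which fits the note below about newer models.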

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

See omnystudio.com/listener for privacy information.

Episodes (531)

Unsupervised Learning: No. 63

Peak Prevention at AppSec Cali, Austrian Hotel Ransomware, Russian FSB Drama, WordPress Issues, AV Conflicts, Uber Pays Another Company's Bounty, Data Science, Rules for Rulers…

30 Jan 2017, 46 min

Unsupervised Learning: No. 62

An OWASP Gaming Security Framework, infosec news, OPSEC is obscurity, AMP is a horrible idea, the End of Twitter, the Sound of Silence, changing your Echo wake word, RAWGraphs, Ask Lesley, and more…

22 Jan 2017, 28 min

Unsupervised Learning: No. 61

Nasty new Gmail phishing bug, Microsoft kills security bulletins, ShadowBrokers go dark, Cellebrite hacked, combining sensor data with machine learning, the tradeoff between privacy and IoT functionality, and more…

16 Jan 2017, 39 min

Gratitude is the Epicenter of Happiness

The elusive center of happiness is gratitude, and the reason seems to be evolution.

14 Jan 2017, 4 min

If You Believe Nothing You Can Be Convinced of Anything

An essay about the Russian hacking attribution issue, and how people who cannot differentiate the credibility of information sources are ultimately set to believe anything rather than nothing.

13 Jan 2017, 12 min

Unsupervised Learning: No. 60

How we know Russia did it, the FBI using Best Buy, an IBM study on ransomware, MongoDB hacks, and more…

11 Jan 2017, 31 min

4 Things To Do in the First Week of Every January

A short piece on why I don't like New Year's resolutions, and the four things I prefer to do instead.

27 Dec 2016, 2 min

Unsupervised Learning: No. 58

This week's topics: Yahoo!, Shadowbrokers, Building Your Own Honeytrapping Infrastructure, The Power of Newsletters, Project Aristotle, and more…

19 Dec 2016, 14 min
