Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. It uses your smartest AI model to evaluate other AIs by scoring their outputs across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher and weaker ones lower, as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
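
The three ideas above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Fabric Pattern: the prompt wording, the `LEVEL:` / `TO SCORE HIGHER:` reply format, and the names `build_rating_prompt`, `parse_rating`, `rate_output`, and `fake_judge` are all assumptions made for the example. Only the four level names come from the episode notes, and the real pattern may define more levels in between. The `judge` argument stands in for whatever call reaches your most capable model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Only the four levels named in the episode notes; the real pattern
# may define additional levels between them (assumption).
LEVELS = ("uneducated", "high school", "PhD", "world-class human")


@dataclass
class Rating:
    level: Optional[str]   # one of LEVELS, or None if the reply was unparseable
    to_score_higher: str   # the judge's note on what a better answer needed


def build_rating_prompt(task: str, task_input: str, output: str) -> str:
    """Assemble an illustrative judging prompt (wording is hypothetical)."""
    return (
        "You are an expert evaluator. Rate the OUTPUT against the TASK and INPUT.\n"
        f"Choose exactly one level from: {', '.join(LEVELS)}.\n"
        "Reply in this format:\n"
        "LEVEL: <level>\n"
        "TO SCORE HIGHER: <what would have been required>\n\n"
        f"TASK:\n{task}\n\nINPUT:\n{task_input}\n\nOUTPUT:\n{output}\n"
    )


def parse_rating(reply: str) -> Rating:
    """Extract the level and the improvement note from the judge's reply."""
    level: Optional[str] = None
    improvement = ""
    for line in reply.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().upper(), value.strip()
        if key == "LEVEL":
            # Accept only known levels, ignoring letter case.
            level = next((l for l in LEVELS if l.lower() == value.lower()), None)
        elif key == "TO SCORE HIGHER":
            improvement = value
    return Rating(level, improvement)


def rate_output(judge: Callable[[str], str], task: str,
                task_input: str, output: str) -> Rating:
    """Send the prompt to the judge model and parse what comes back."""
    return parse_rating(judge(build_rating_prompt(task, task_input, output)))


# Stand-in for a real model call, so the sketch runs without any API.
def fake_judge(prompt: str) -> str:
    return "LEVEL: high school\nTO SCORE HIGHER: Address counterarguments."


rating = rate_output(fake_judge, "Summarize the article",
                     "article text", "model summary")
print(rating.level, "|", rating.to_score_higher)
```

Passing the judge in as a plain callable keeps the sketch independent of any particular provider: swapping in a stronger evaluator model only means passing a different function.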

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

