Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episoder(532)

It's Wrong to Fear-monger on IoT Security

It's Wrong to Fear-monger on IoT Security

How it's shortsighted and irresponsible for InfoSec professionals to fear-monger on IoT Security, and what we should be saying instead.Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

3 Jan 20185min

Unsupervised Learning: No. 106

Unsupervised Learning: No. 106

Swatting death, Ethereum kidnap, Chinese dystopia, Alteryx S3 bucket, Starbucks Monero, Forever21, Microphone ads, technology news, human news, discovery, notes, recommendations, and the aphorism of the week…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

3 Jan 201828min

Unsupervised Learning: No. 105

Unsupervised Learning: No. 105

TRITON, 1.4 billion credentials, HP keyloggers, iTunes Bitcoin laundering, removing credit card signatures, technologgy news, human news, discovery, notes, recommendations, and the aphorism of the week…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

18 Des 201723min

Unsupervised Learning: No. 104

Unsupervised Learning: No. 104

NiceHash hacked, Apple bugs, Stealing Cars via Relay, Crypto Collusion, technologgy news, human news, discovery, notes, recommendations, and the aphorism of the week…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

12 Des 201725min

Unsupervised Learning: No. 103

Unsupervised Learning: No. 103

Uber's mess, Google tracking users, AI finding missiles, drone disclosure, net neutrality, tech news, human news, ideas, discovery, recommendations, aphorism, and more…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

27 Nov 201728min

Unsupervised Learning: No. 102

Unsupervised Learning: No. 102

Github security, China IW, Brexit IW, S3 again, Quad9 DNS security, tech news, human news, ideas, discovery, recommendations, aphorism, and more…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

20 Nov 201726min

Unsupervised Learning: No. 101

Unsupervised Learning: No. 101

Verizon’s DBIR Report, sleeping fingerprints, IoT legislation, S3 security tools, AI tricks scammers, SEALs kill Green Beret, tech news, human news, ideas, discovery, recommendations, aphorism, and more…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

13 Nov 201735min

Unsupervised Learning: No. 100

Unsupervised Learning: No. 100

Russian IW memes, POTUS Twitter, Texas Attack, Silence Trojan, NotPetya Damages, tech news, human news, ideas, discovery, recommendations, aphorism, and more…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

6 Nov 201723min

Populært innen Teknologi

romkapsel
rss-avskiltet
teknisk-sett
energi-og-klima
shifter
tomprat-med-gunnar-tjomlid
rss-impressions-2
nasjonal-sikkerhetsmyndighet-nsm
elektropodden
smart-forklart
rss-alt-som-gar-pa-strom
fornybaren
kunstig-intelligens-med-morten-goodwin
rss-snakk-om-sikkerhet
rss-alt-vi-kan
rss-bouvet-bobler
teknologi-og-mennesker
rss-digitaliseringspadden
i-loopen
rss-polypod