Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episoder(532)

Unsupervised Learning: No. 116

Unsupervised Learning: No. 116

Chinese at CanSecWest, Applebees POS, Palantir, Poisoning, TensorFlow DoD, Amazon laughing, Google 72-qbits, Amazon FinTech, Android P, and more…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

13 Mar 201817min

Unsupervised Learning: No. 115

Unsupervised Learning: No. 115

GitHub DDoS, Celebrite Attacks, AI warnings, Palantir in New Orleans, Grub Backspace, 4G attacks, Space Corps, Amazon wins Defense Department deal, tech news, human news, discovery, notes, recommendation, aphorism, and more…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

6 Mar 201812min

Unsupervised Learning: No. 113

Unsupervised Learning: No. 113

Parkland tampering, Avoid Huawei, Bongo S3, Facebook 2FA Spam, Android Cryptojacking, Spyware Hacking, Password Dating, Technology News, Human News, Trends, Ideas & Analysis, Data & Statistics, Discovery, Recommendations, Aphorism, and more…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

20 Feb 201851min

Unsupervised Learning: No. 112

Unsupervised Learning: No. 112

Chinese AR glasses, Cisco ASA flaws, Russian Nuclear Cryptomining, Marine quadcopters, POS Skimmers, Chrome HTTP, technology news, human news, discovery, notes, recommendations, and the aphorism of the week…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

12 Feb 201822min

Unsupervised Learning: No. 111

Unsupervised Learning: No. 111

Olympic security drones, Alexa trickery, Chinese quantum satellite, Audio Adversary Examples, BeeToken Ethereum theft, App Store Security, Cryptomining, technology news, human news, discovery, notes, recommendations, and the aphorism of the week…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

5 Feb 201814min

Unsupervised Learning: No. 109

Unsupervised Learning: No. 109

Social engineering, breach impact, Chinese turncoat, Android spy kit, Hawaiian OPSEC, Russian cables, bypassing CloudFlare, technology news, human news, discovery, notes, recommendations, and the aphorism of the week…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

22 Jan 201814min

Unsupervised Learning: No. 107

Unsupervised Learning: No. 107

Meltdown & Spectre, India's Database, Criminals and Monero, Equifax Non-action, technology news, human news, discovery, notes, recommendations, and the aphorism of the week…Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

8 Jan 201830min

The Biggest Advantage in Machine Learning Will Come From Superior Coverage, Not Superior Analysis

The Biggest Advantage in Machine Learning Will Come From Superior Coverage, Not Superior Analysis

Many people, in many fields, think Machine Learning won't replace their analysts because their humans are better than an algorithm. But it's not just about side-by-side comparisons. The bigger question is, "what percentage of the data can humans actually look at?", and the answer to that question (a tiny fraction) is the reason ML will be so helpful.Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

3 Jan 20188min

Populært innen Teknologi

romkapsel
rss-avskiltet
teknisk-sett
energi-og-klima
shifter
tomprat-med-gunnar-tjomlid
rss-impressions-2
nasjonal-sikkerhetsmyndighet-nsm
elektropodden
smart-forklart
rss-alt-som-gar-pa-strom
fornybaren
kunstig-intelligens-med-morten-goodwin
rss-snakk-om-sikkerhet
rss-alt-vi-kan
rss-bouvet-bobler
teknologi-og-mennesker
rss-digitaliseringspadden
i-loopen
rss-polypod