Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. The pattern uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and comparing it to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher and weaker ones lower, as you would expect.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
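
A minimal sketch of that loop in Python, under stated assumptions. This is not the actual Fabric Pattern text: the prompt wording, the two-line reply format, and the names build_rating_prompt, parse_rating, rate_output, and stub_judge are all hypothetical, and the scale lists only the four levels named above (the real pattern may define more). The judge is any function that sends a prompt to your strongest model and returns its reply as text; a stub stands in for it here so the example runs without an API.

```python
from dataclasses import dataclass
from typing import Callable

# Only the levels named in these notes, lowest to highest (assumed incomplete).
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


@dataclass
class Rating:
    level: str            # one of HUMAN_LEVELS
    rank: int             # position of the level in HUMAN_LEVELS, 0 = lowest
    to_score_higher: str  # the judge's note on what a better answer needed


def build_rating_prompt(task: str, task_input: str, output: str) -> str:
    """Assemble the text sent to the judge model (hypothetical wording)."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are an expert evaluator.\n\n"
        f"TASK:\n{task}\n\nINPUT:\n{task_input}\n\nMODEL OUTPUT:\n{output}\n\n"
        f"Rate the output on this human scale (lowest to highest): {levels}.\n"
        "Reply with exactly two lines:\n"
        "LEVEL: <one level from the scale>\n"
        "TO SCORE HIGHER: <what the output would have needed>\n"
    )


def parse_rating(reply: str) -> Rating:
    """Parse the two-line reply; raise ValueError on an unknown level."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().upper()] = value.strip()
    ranks = {name.lower(): i for i, name in enumerate(HUMAN_LEVELS)}
    level = fields.get("LEVEL", "")
    if level.lower() not in ranks:
        raise ValueError(f"unrecognized level: {level!r}")
    rank = ranks[level.lower()]
    return Rating(HUMAN_LEVELS[rank], rank, fields.get("TO SCORE HIGHER", ""))


def rate_output(judge: Callable[[str], str], task: str,
                task_input: str, output: str) -> Rating:
    """Ask the judge model to grade another model's output."""
    return parse_rating(judge(build_rating_prompt(task, task_input, output)))


def stub_judge(prompt: str) -> str:
    """Stand-in for a call to your most capable model."""
    return ("LEVEL: high school\n"
            "TO SCORE HIGHER: Cite sources and address edge cases.")


if __name__ == "__main__":
    rating = rate_output(stub_judge, "Summarize the article",
                         "<article text>", "<weaker model's summary>")
    print(rating)
```

Keeping the rank as an integer makes it easy to compare models on the same task, and the to_score_higher field is the piece you would feed back into the next attempt.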

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest, but the framework and methodology still work with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episodes (532)

Unsupervised Learning: No. 222

Who's hiring, freezing, and laying off, models predict 100-200K US deaths, April distancing, Adversarial Capital, Booz Russia, Google State Phishes, Worker Monitoring, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

30 March 2020, 34 min

Unsupervised Learning: No. 221

Health-justified Video Surveillance, FDA Emergency Approval of a C19 Test, Israel Mobile Monitoring, Amazon Essentials, Pandemic Drone Monitoring, Retasking Factories, Rich People Ventilators, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

24 March 2020, 26 min

Unsupervised Learning: No. 220

Virus updates, GitHub gets NPM, New Stimulus, Amazon Hiring 100K, Saltwater Nozzles, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

17 March 2020, 20 min

Unsupervised Learning: No. 219

Coronavirus Update, Nation-state Exchange Hacking, FuzzBench, New Artillery, Germ Catapults, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

9 March 2020, 13 min

Unsupervised Learning: No. 218

SARS-CoV-2 update, China's health tracking, Firefox DNS over HTTPS, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

2 March 2020, 14 min

Unsupervised Learning: No. 217

MGM breach, DDoS and Ransomware on the Rise, Twitter v. Bloomberg, Tesla Tape, Russia Pro Trump & Pro Bernie, Tapping Cables, Insider Concern, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

24 Feb 2020, 19 min

Unsupervised Learning: No. 216

AdSense Extortion, OT Ransomware Attack, Ring 2FA, Smart Speaker Jamming Bracelet, DARPA's Flying Gun, Lots of Advisories, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

20 Feb 2020, 13 min

A Conversation With General Earl Matthews on Election Security

In this episode I speak with retired Air Force Major General Earl Matthews on the topic of election security. We talk about digital elections, attacking trust in the US system, social media influence campaigns, and possible motives for foreign interference in US elections.

15 Feb 2020, 39 min
