Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
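The notes describe the judging loop only in prose. As a rough illustration, here is a minimal Python sketch of the plumbing around it: building the judge prompt, parsing the tier the judge returns, and ranking models by that tier. It assumes you send the prompt to your strongest model yourself, so no model API is shown. The template wording and the names `HumanLevel`, `build_judge_prompt`, `parse_level`, and `rank_models` are hypothetical and are not the actual Fabric pattern. Only the four tiers named above are modeled, and any intermediate levels are left out.

```python
import re
from enum import IntEnum


class HumanLevel(IntEnum):
    """Ordered human-comparison tiers (assumption: only the four named in the notes)."""
    UNEDUCATED = 1
    HIGH_SCHOOL = 2
    PHD = 3
    WORLD_CLASS_HUMAN = 4


# Hypothetical judge-prompt template; NOT the real Fabric pattern text.
JUDGE_TEMPLATE = """You are an expert evaluator.
Rate how well the RESPONSE performs the TASK on the INPUT, compared with humans.
Answer with exactly one level on the first line: {levels}.
Then explain what would have been required to score one level higher.

TASK:
{task}

INPUT:
{input_text}

RESPONSE:
{response}
"""


def build_judge_prompt(task: str, input_text: str, response: str) -> str:
    """Assemble the prompt to send to the strongest (judge) model."""
    levels = ", ".join(level.name for level in HumanLevel)
    return JUDGE_TEMPLATE.format(
        levels=levels, task=task, input_text=input_text, response=response
    )


def parse_level(judge_reply: str) -> HumanLevel:
    """Read the tier name from the first line of the judge's reply.

    Non-letter runs become underscores, so "High school", "World-class human",
    or "PhD." all map to enum member names. Raises KeyError on unknown tiers.
    """
    first_line = judge_reply.strip().splitlines()[0]
    key = re.sub(r"[^A-Za-z]+", "_", first_line).strip("_").upper()
    return HumanLevel[key]


def rank_models(ratings: dict[str, HumanLevel]) -> list[str]:
    """Order model names from highest to lowest rated tier."""
    return sorted(ratings, key=lambda name: ratings[name], reverse=True)
```

For example, `parse_level("High school\nIt would need more nuance.")` returns `HumanLevel.HIGH_SCHOOL`. Using an `IntEnum` keeps the tiers ordered, so comparing or sorting models by their rating is a plain integer comparison.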

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episodes (532)

The Dark Web Has Nothing on Data Brokers

How so-called legitimate data brokers are a far worse threat to people's privacy than cybercriminals operating on the dark web.

15 June 2020 · 7 min

Unsupervised Learning: No. 233

SMBleed, Republicans vs. China, Hawkey Surveillance, COVID in August 2019, IBM Facial PR, Palantir NHS, Blockchain Misinformation, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

15 June 2020 · 20 min

The Problem With Extracted Versions of Things

A short essay on how we might get more pleasure from things that take longer to process and attain, and what we can do with that information.

12 June 2020 · 5 min

Unsupervised Learning: No. 232

COVID-19 Trends, New Zoom Trouble, Facebook Blocking, Chrome Incognito Suit, Retail Rents, Nuclear Contractor Hack, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

8 June 2020 · 33 min

Unsupervised Learning: No. 231

US Protests & Unrest, Trump Goes Into the Bunker, NSA Warns on Exim, Octopus Scanner, Stanford's SIO Virality Project, Windows 10 Update, SHA-1 Deprecated in SSH, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

1 June 2020 · 18 min

Unsupervised Learning: No. 230

Twitter Bots, Face Recognition Headsets, Chrome Bug Memories, Virtual Currency, White House OPSEC, Realtime Language Translation, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

25 May 2020 · 24 min

Analysis of the 2020 Verizon Data Breach Report

In this episode, Daniel takes a look at the 2020 Verizon Data Breach Investigations Report. He looks at the key findings and talks about what they might mean to us going forward.

20 May 2020 · 10 min

Unsupervised Learning: No. 229

Feds Release Top Vulns, China Brainwave Tracking, Europe CISSP Masters, Army Electronic Warfare, Microsoft Third-largest Patch Tuesday, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

18 May 2020 · 19 min
