Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. It uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and mapping each score to a human intelligence level.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the output that another model (like GPT-3.5 or Haiku) produced for a given task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher than weaker ones, as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
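
The three points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual Fabric Pattern: the prompt wording, the function names, and the two-line reply format are all hypothetical, the scale lists only the four levels named above, and `judge` stands in for whatever call sends a prompt to your strongest model.

```python
from typing import Callable, Optional

# Only the four levels named in the episode notes, lowest to highest.
# Any intermediate levels in the real pattern are omitted rather than guessed.
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


def build_judge_prompt(task: str, task_input: str, output: str) -> str:
    """Assemble a (hypothetical) rating prompt for the judge model."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are an expert evaluator.\n"
        f"TASK: {task}\n"
        f"INPUT: {task_input}\n"
        f"OUTPUT TO RATE: {output}\n"
        f"Rate the output as one of: {levels}.\n"
        "Reply in exactly two lines:\n"
        "LEVEL: <one level>\n"
        "TO SCORE HIGHER: <what would have been required>\n"
    )


def parse_verdict(reply: str) -> dict:
    """Extract the level, its rank on the scale, and the improvement note."""
    verdict: dict[str, Optional[object]] = {
        "level": None, "rank": None, "to_score_higher": None,
    }
    for line in reply.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().upper(), value.strip()
        if key == "LEVEL":
            for rank, name in enumerate(HUMAN_LEVELS):
                if value.lower() == name.lower():
                    verdict["level"], verdict["rank"] = name, rank
        elif key == "TO SCORE HIGHER":
            verdict["to_score_higher"] = value
    return verdict


def rate(judge: Callable[[str], str], task: str, task_input: str, output: str) -> dict:
    """Ask the judge model to rate another model's output."""
    return parse_verdict(judge(build_judge_prompt(task, task_input, output)))


def fake_judge(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs offline."""
    return "LEVEL: high school\nTO SCORE HIGHER: Cite sources and cover edge cases."


result = rate(fake_judge, "Summarize the article", "article text", "a short summary")
print(result["level"], result["rank"], result["to_score_higher"])
```

Swapping `fake_judge` for a function that calls your most capable model gives the same structure against a real evaluator. Keeping the rank as a number makes it easy to compare ratings across the models you test.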

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episodes (532)

Unsupervised Learning: No. 239

Pentagon Information Warfare, Fancy GRU Attacks, 2 Chinese COVID Hackers, Chief Software Officer, Space Force DEVOPS, FBI Chinese Tax Software, DJI Drone Vulns, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

27 July 2020, 10 min

Unsupervised Learning: No. 238

Twitter's Breach, The US Attacked IRA, Bloomberg FBI Sabre, Iran Keeps Getting Hacked, Russia's Cozy Bear, Cloudflare Outage, UIPath Automation, Verizon Uses Google AI to Automate Customer Service, Gamers Are Spending More, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

21 July 2020, 29 min

Our Lighted Path to Totalitarianism

An essay on how five trends seem to naturally guide civilizations towards Totalitarianism as they progress, and what we can do to avoid that outcome.

16 July 2020, 14 min

Unsupervised Learning: No. 237

Americans in China, TikTok Banning, Chinese Critics, BlueLeaks, Router Security, COVID Accelerating Trends, Twitter Subscriptions?, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

13 July 2020, 14 min

Searching for the Ultimate Obstacle to Creativity

This essay looks at Training as Avoidance, The Toolbox Fallacy, and procrastination, and explores a potential root cause that underpins them all and inhibits creativity.

7 July 2020, 17 min

Unsupervised Learning: No. 236

Encrochat breach, F5 Big Problem, DHS Social Election Query, WastedLocker, India Bans Chinese Apps, Florida DNA Privacy, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

6 July 2020, 27 min

Unsupervised Learning: No. 235

Chinese diplomats stealing secrets, COVID flying risk, RT interviewing US cops, Army Ignite future predictors, China launches its GPS network, Russians paid bounties to kill US troops, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

29 June 2020, 18 min

Unsupervised Learning: No. 234

Ripple20 IoT Vulns, Homeland Security Surveillance, US Cyber Budget, Adobe EOL, AWS DDoS, Bellingcat Poison Investigation, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

21 June 2020, 20 min
