Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. It uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and comparing it to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher and weaker ones lower, as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
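
As a rough illustration of the three points above, here is a minimal Python sketch of the judge flow. It is not the actual Fabric Pattern: the prompt wording, the reply format, and the names `build_rating_prompt`, `parse_rating`, `rate_output`, and `fake_judge` are all assumptions made for this example. The scale lists only the four levels named above; the real pattern may define more. The judge is passed in as a plain callable, so no particular model API is assumed.

```python
from typing import Callable, Dict, Optional

# Only the levels named in the episode notes; the real pattern may have more.
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


def build_rating_prompt(task: str, task_input: str, output: str) -> str:
    """Assemble a (hypothetical) rating prompt for the judge model."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are an expert evaluator.\n\n"
        f"TASK:\n{task}\n\n"
        f"INPUT:\n{task_input}\n\n"
        f"OUTPUT TO RATE:\n{output}\n\n"
        f"Rate the output on this human scale: {levels}.\n"
        "Reply with exactly two lines:\n"
        "LEVEL: <one level from the scale>\n"
        "TO SCORE HIGHER: <what the output would have needed>\n"
    )


def parse_rating(reply: str) -> Dict[str, Optional[str]]:
    """Pull the level and the improvement note out of the judge's reply."""
    canonical = {level.lower(): level for level in HUMAN_LEVELS}
    result: Dict[str, Optional[str]] = {"level": None, "to_score_higher": None}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "KEY: value" line
        key, value = key.strip().upper(), value.strip()
        if key == "LEVEL":
            # Unknown levels are left as None rather than guessed.
            result["level"] = canonical.get(value.lower())
        elif key == "TO SCORE HIGHER":
            result["to_score_higher"] = value
    return result


def rate_output(
    judge: Callable[[str], str], task: str, task_input: str, output: str
) -> Dict[str, Optional[str]]:
    """Ask the judge (any prompt -> reply callable) to rate one output."""
    return parse_rating(judge(build_rating_prompt(task, task_input, output)))


def fake_judge(prompt: str) -> str:
    """Stand-in for a call to your most capable model."""
    return (
        "LEVEL: high school\n"
        "TO SCORE HIGHER: Cite sources and address edge cases."
    )


if __name__ == "__main__":
    print(rate_output(fake_judge, "Summarize the article", "article text", "a summary"))
```

In practice you would replace `fake_judge` with a function that sends the prompt to your strongest model and returns its text reply. The "TO SCORE HIGHER" line is what feeds the improvement loop described in point 3.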

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest, but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

See omnystudio.com/listener for privacy information.
