Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episoder(538)

UL NO. 489: STANDARD EDITION | My personal toolchain updates, Google tracking through DuckDuckGo, Anthropic’s Pentagon Deal, Grok4 NSFW, Substack Crushes WSJ, and more...

UL NO. 489: STANDARD EDITION | My personal toolchain updates, Google tracking through DuckDuckGo, Anthropic’s Pentagon Deal, Grok4 NSFW, Substack Crushes WSJ, and more...

UL NO. 489: STANDARD EDITION | My personal toolchain updates, Google tracking through DuckDuckGo, Anthropic’s Pentagon Deal, Grok4 NSFW, Substack Crushes WSJ, and more... You are currently listening t...

17 Jul 202522min

UL NO. 488: STANDARD EDITION | Google Granting Confusing Access to Gemini, A New Favorite Creator, Russia's new Autonomous Drones, Claude Code Madness and Neovim Config, and more...

UL NO. 488: STANDARD EDITION | Google Granting Confusing Access to Gemini, A New Favorite Creator, Russia's new Autonomous Drones, Claude Code Madness and Neovim Config, and more...

UL NO. 488: STANDARD EDITION | Google Granting Confusing Access to Gemini, A New Favorite Creator, Russia's new Autonomous Drones, Claude Code Madness and Neovim Config, and more... You are currently ...

10 Jul 202530min

UL NO. 487: STANDARD EDITION: Iranian Critical Infra Attacks, Insane Recent Productivity, A Chinese Mosquito Drone, Marcus's Response to Our AI Debate, "Context Engineering" Ain't It, and more...

UL NO. 487: STANDARD EDITION: Iranian Critical Infra Attacks, Insane Recent Productivity, A Chinese Mosquito Drone, Marcus's Response to Our AI Debate, "Context Engineering" Ain't It, and more...

UL NO. 487: STANDARD EDITION: Iranian Critical Infra Attacks, Insane Recent Productivity, A Chinese Mosquito Drone, Marcus's Response to Our AI Debate, "Context Engineering" Ain't It, and more... You ...

2 Jul 202541min

An AI Debate with Marcus Hutchins

An AI Debate with Marcus Hutchins

Marcus and I debate AIs capabilities from nearly polar opposite ends. He thinks it's basically autocomplete, and I think it's the most important tech we've ever built as humans. It was a fantastic, an...

26 Jun 20252h

UL NO. 486 STANDARD EDITION: Fully Automated AI Malware (Binary and Web), My Debate with Marcus Hutchins on AI and more

UL NO. 486 STANDARD EDITION: Fully Automated AI Malware (Binary and Web), My Debate with Marcus Hutchins on AI and more

UL NO. 486: STANDARD EDITION: Fully Automated AI Malware (Binary and Web), My Debate with Marcus Hutchins on AI, The 'Did You Notice?' Psyop, The METR AI Metric for Longterm Tasks, and more... You are...

26 Jun 202555min

UL NO. 485: STANDARD EDITION: Netflix RCE, My Current AI Stack, All-in on Claude Code, and more...

UL NO. 485: STANDARD EDITION: Netflix RCE, My Current AI Stack, All-in on Claude Code, and more...

STANDARD EDITION: Netflix RCE, My Current AI Stack, All-in on Claude Code, and more... You are currently listening to the Standard version of the podcast, consider upgrading and becoming a member to u...

19 Jun 202536min

UL NO. 484: STANDARD EDITION: OpenAI's Malicious AI Report, Disappointed with WWDC, AI's First Actual Science Breakthrough, and more...

UL NO. 484: STANDARD EDITION: OpenAI's Malicious AI Report, Disappointed with WWDC, AI's First Actual Science Breakthrough, and more...

UL NO. 484: STANDARD EDITION: OpenAI's Malicious AI Report, Disappointed with WWDC, AI's First Actual Science Breakthrough, and more... You are currently listening to the Standard version of the podca...

12 Jun 202543min

UL NO. 483 | STANDARD EDITION: A Chrome 0-Day, Meta Automates Security Assessments, New Essays, My New Video on Hacking with AI, Ukraine's Asymmetrical Attack, Thoughts on My AI Skeptical Friends, The Dangers of Winning the Wrong Game, and more...

UL NO. 483 | STANDARD EDITION: A Chrome 0-Day, Meta Automates Security Assessments, New Essays, My New Video on Hacking with AI, Ukraine's Asymmetrical Attack, Thoughts on My AI Skeptical Friends, The Dangers of Winning the Wrong Game, and more...

A Chrome 0-Day, Meta Automates Security Assessments, New Essays, My New Video on Hacking with AI, Ukraine's Asymmetrical Attack, Thoughts on My AI Skeptical Friends, The Dangers of Winning the Wrong G...

5 Jun 202531min

Populært innen Teknologi

lydartikler-fra-aftenposten
romkapsel
tomprat-med-gunnar-tjomlid
smart-forklart
teknisk-sett
energi-og-klima
rss-impressions-2
nasjonal-sikkerhetsmyndighet-nsm
elektropodden
shifter
pedagogisk-intelligens
i-loopen
rss-ki-praten
fornybaren
kunstig-intelligens-med-morten-goodwin
rss-praktisk-proptech
rss-heis
rss-alt-som-gar-pa-strom
rss-nerding-med-netlife
rss-anleggspraten