Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. The system uses your smartest AI model to evaluate the performance of other AIs, scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Claude 3 Haiku) against a given task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently score higher and weaker ones lower, just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
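
The judging loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual Fabric pattern: the prompt wording, the two-line reply format, and the names `build_judge_prompt`, `parse_rating`, `rank_models`, and `fake_judge` are all hypothetical. The scale includes only the four tiers named in this episode, and the real pattern may define more in between. `fake_judge` stands in for a call to your strongest model.

```python
import re
from dataclasses import dataclass
from enum import IntEnum


class HumanLevel(IntEnum):
    """Ordered human-comparison tiers; only the four named in the episode (assumption)."""
    UNEDUCATED = 1
    HIGH_SCHOOL = 2
    PHD = 3
    WORLD_CLASS_HUMAN = 4


# Display names used in the prompt; parsing is case-insensitive.
LEVEL_NAMES = {
    HumanLevel.UNEDUCATED: "uneducated",
    HumanLevel.HIGH_SCHOOL: "high school",
    HumanLevel.PHD: "PhD",
    HumanLevel.WORLD_CLASS_HUMAN: "world-class human",
}
_LOOKUP = {name.lower(): level for level, name in LEVEL_NAMES.items()}


@dataclass
class Rating:
    level: HumanLevel
    to_score_higher: str  # the judge's note on what a higher score would require


def build_judge_prompt(task: str, task_input: str, candidate_output: str) -> str:
    """Hypothetical judge prompt for the strongest model; not the real Fabric pattern text."""
    levels = ", ".join(f'"{name}"' for name in LEVEL_NAMES.values())
    return (
        "You are an expert evaluator. Emulate a 16,000+ dimensional scoring system, "
        "using expert-level heuristics and close attention to nuance.\n\n"
        f"TASK:\n{task}\n\nINPUT:\n{task_input}\n\n"
        f"RESPONSE TO RATE:\n{candidate_output}\n\n"
        f"Rate the response against humans on this scale: {levels}.\n"
        "Reply with exactly two lines:\n"
        "LEVEL: <one level from the scale>\n"
        "TO SCORE HIGHER: <what the response would have needed>"
    )


def parse_rating(reply: str) -> Rating:
    """Extract the level and the improvement note from the judge's two-line reply."""
    level_m = re.search(r"^LEVEL:\s*(.+)$", reply, re.MULTILINE | re.IGNORECASE)
    note_m = re.search(r"^TO SCORE HIGHER:\s*(.+)$", reply, re.MULTILINE | re.IGNORECASE)
    if level_m is None or note_m is None:
        raise ValueError("judge reply is missing LEVEL or TO SCORE HIGHER")
    label = level_m.group(1).strip().strip('"').lower()
    if label not in _LOOKUP:
        raise ValueError(f"unknown level: {label!r}")
    return Rating(_LOOKUP[label], note_m.group(1).strip())


def rank_models(ratings: dict[str, Rating]) -> list[str]:
    """Order model names from highest to lowest human-equivalent level."""
    return sorted(ratings, key=lambda name: ratings[name].level, reverse=True)


def fake_judge(prompt: str) -> str:
    # Stand-in for a call to your most capable model; swap in a real client here.
    return "LEVEL: PhD\nTO SCORE HIGHER: Cite primary sources and cover edge cases."


prompt = build_judge_prompt("Summarize the paper", "Example paper text", "Example summary")
rating = parse_rating(fake_judge(prompt))
print(rating.level.name, rating.level > HumanLevel.HIGH_SCHOOL)  # PHD True
```

Asking the judge for a fixed reply format keeps its verdict machine-readable, so ratings for several candidate models can be collected and sorted with `rank_models`, while the "TO SCORE HIGHER" line preserves the meta-feedback the pattern asks for.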

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episodes (532)

The Relationship Between Hardship, Struggle, and Meaning

My essay on how struggle could be necessary for meaning, and how this could be the underlying cause of much of America's mental health problem.

15 Oct 2020, 13 min

News & Analysis | No. 250

CrowdSec, Nudge, Trickbot Trickery, CISA Ransomware Guide, Twitter and Facebook Anti-Disinformation, QAnon Takedowns, Putin Turning on Trump, Azure Vulnerabilities, PC Shipments Up, Virtual Sales Call AI, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

12 Oct 2020, 26 min

News & Analysis | No. 249

Operation Fortify, Cyber Pearl Harbor, GitHub Code Scanning, E-6B Flights, Blackbaud++, Grindr Password Reset, Cloudflare API Security, QNAP Drama, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

5 Oct 2020, 21 min

News & Analysis | No. 248

Everyday Threat Modeling, Why I Like TikTok So Much, Windows XP Leak, SSH 8.4, Renée DiResta's Latest, Student Visa Changes, Cisco IOS Vulns, QAnon Gamification, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

28 Sep 2020, 16 min

Why Creators Should Move to Direct Support Monetization

My essay about why I think creators, especially in InfoSec, should set up their own domains and move to a direct model for monetization.

24 Sep 2020, 11 min

No, Changing Your SSH Port Isn't Security by Obscurity

My latest essay on the timeless debate over SSH ports and Security by Obscurity. I explain why changing your port usually isn't obscurity, and give what I believe is an airtight method for telling the difference between regular security and Security by Obscurity.

23 Sep 2020, 13 min

News & Analysis | No. 247

SSH Port Obscurity, The TikTok Deal, Ransomware Death, Chinese Espionage CRM, Amazon Bribery, Instant Domain Admin, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

21 Sep 2020, 20 min

Book Summary | Naked Statistics, by Charles Wheelan

In this episode, I review the book Naked Statistics, by Charles Wheelan. I cover:

1. My one-sentence summary of the text
2. The table of contents, which is super helpful for seeing the structure of the argument
3. My capture of the main points
4. My takeaways, questions, and ideas that came from reading it
5. My final summarization
6. My rating of the book, and whether I recommend you read the full text

16 Sep 2020, 22 min
