Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. The system uses your smartest AI model to evaluate other AIs by scoring them across a range of tasks and comparing the results to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher than weaker ones, as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
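
As a rough illustration of the three points above, here is a minimal Python sketch of how a judge prompt could be assembled and how the judge's reply could be parsed. It is not the actual Fabric Pattern text. The function names `build_judge_prompt` and `parse_judgement`, the `LEVEL:` / `TO SCORE HIGHER:` reply format, and the prompt wording are all assumptions made for illustration. The scale lists only the four levels named above; the real pattern may define more levels in between. Sending the prompt to your strongest model is left out, since that depends on which client you use.

```python
import re

# Only the levels named in these notes; the real pattern may have more.
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


def build_judge_prompt(task: str, task_input: str, candidate_output: str) -> str:
    """Assemble a rating prompt for the judge model (hypothetical wording)."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are an expert evaluator of AI output.\n"
        f"Rate the RESPONSE on this human scale: {levels}.\n"
        "Also explain what the RESPONSE would have needed to score higher.\n"
        "Reply in exactly this format:\n"
        "LEVEL: <one level from the scale>\n"
        "TO SCORE HIGHER: <short explanation>\n\n"
        f"TASK:\n{task}\n\n"
        f"INPUT:\n{task_input}\n\n"
        f"RESPONSE:\n{candidate_output}\n"
    )


def parse_judgement(reply: str) -> tuple[str, str]:
    """Extract (level, advice) from a reply in the assumed format."""
    level_match = re.search(r"^LEVEL:\s*(.+)$", reply, re.MULTILINE)
    advice_match = re.search(r"^TO SCORE HIGHER:\s*(.+)$", reply, re.MULTILINE)
    if level_match is None:
        raise ValueError("judge reply has no LEVEL line")

    # Match the level case-insensitively against the known scale.
    by_lower = {name.lower(): name for name in HUMAN_LEVELS}
    level = level_match.group(1).strip().lower()
    if level not in by_lower:
        raise ValueError(f"unknown level: {level!r}")

    advice = advice_match.group(1).strip() if advice_match else ""
    return by_lower[level], advice
```

In use, you would pass the string from `build_judge_prompt` to your most capable model, then hand its reply to `parse_judgement`. Keeping the "to score higher" text alongside the level is what makes the feedback loop possible: it can be fed back into the weaker model's prompt on the next attempt.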

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Episodes (541)

UL NO. 402: Israeli Footage & Analysis, WSFTP + MOVEIT, AI Explainability, Andreessen vs. Perell on Writing, and more…

Israel analysis, a genetic data breach, active exploits against critical vulnerabilities, and a brilliant conversation between two writers about creativity 📢 Sponsored by Kolide: Concerned about data...

11 Oct 2023, 26 min

UL NO. 401: Sony hit again?, Taiwan Disinformation, Corporations Demand Hardcore Workers, and GPTVision Examples…

We also look at Lex's first meaningful conversation in the metaverse, fixing Science, and TikTok's impact on reading 📢 Sponsored by Kolide: Concerned about data breaches and hacks? 🔒 Discover Kolide...

3 Oct 2023, 25 min

UL NO. 400: What Hiring Managers Want, CVE Farming, Hunt Forward Operations, and AI vs. B2B Services

Discover how AI is set to revolutionize the B2B services economy and the implications for GDP. Plus, unravel the paradox of the cyber job market, explore the urgent need for a content source authentic...

28 Sep 2023, 31 min

UL NO. 399: Wisdom Extraction From Any Text, Vegas Gets Cyber Jesus, AI Creativity Performance, Pentagon Cyber Strategy…

This week we talk about how I extract manual-quality wisdom from any text/transcript, what I learn from biographies, 25 lessons in 17 years of infosec, and tons of new tools and projects. 📢Sponsored ...

19 Sep 2023, 38 min

UL NO. 398: Storm Vuln Stacking, CloudRecon, The S-Tier Guide to AI Whispering, Full-body MRIs…

Explore the explosive separation of society into the Thriving 10% vs. the Suffering 90%, how AI is becoming an integral part of our brains, and how to defend your family's privacy 📢Sponsored by Vanta...

12 Sep 2023, 20 min

UL NO. 397: Propaganda in a Box, Glacier-like Security, AGI by 2028?, Ancient Wisdom via AI, and Newsletter Differentiation

🎥 Embracing Short-Form Video Creation 🔬 Piping into Portscanner: A Guide 📚 Long/Slow Content: The UL Book of the Month 🛡️ Defensive Security: A Glacier's Pace 🧠 Predicting AGI Attainment by 2025-2028...

7 Sep 2023, 26 min

No. 396 - Elon's Doxxing FSD, ATHI AI Threat Modeling Framework, Cardboard Drones, and GPT Enterprise…

In this episode: 🤔 Thoughts on the Eliezer vs. Hotz AI Safety Debate 🎥 Musk's FSD and Privacy Demo 🔒 Duolingo Data Breach 💥 MOVEit Mass Hack 🔎 Putin Critics' Fate 🚨 Leaseweb Security Breach 🔬 Lazarus...

29 Aug 2023, 26 min

What I'm Doing and How It's Going

How I went from a $350K FTE to $700K+ doing my own thing. This is the first time I've ever shared anything about what I'm doing and how I make money. It covers: Why I got out of the corporate game Wh...

21 Aug 2023, 22 min