Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. The pattern uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and comparing it to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
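
The three points above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Fabric pattern text or any Fabric API: the `judge` callable (a stand-in for whatever function sends a prompt to your strongest model), the prompt wording, the two-line reply format, and the intermediate levels between "high school" and "PhD" are all assumptions made for the example.

```python
from typing import Callable, Tuple

# Ordered human scale, weakest to strongest. Only "uneducated", "high school",
# "PhD", and "world-class human" come from the episode notes; the two
# intermediate levels are assumed for illustration.
LEVELS = [
    "uneducated",
    "high school",
    "bachelor's",  # assumed
    "master's",    # assumed
    "PhD",
    "world-class human",
]


def build_judge_prompt(task: str, task_input: str, output: str) -> str:
    """Assemble a rating prompt for the judge model (hypothetical wording)."""
    scale = ", ".join(LEVELS)
    return (
        "You are an expert evaluator.\n\n"
        f"TASK:\n{task}\n\n"
        f"INPUT:\n{task_input}\n\n"
        f"OUTPUT TO RATE:\n{output}\n\n"
        f"Rate the output on this human scale: {scale}.\n"
        "Reply with exactly two lines:\n"
        "LEVEL: <one level from the scale>\n"
        "TO SCORE HIGHER: <what the output would have needed>\n"
    )


def parse_rating(reply: str) -> Tuple[str, str]:
    """Extract the level and the improvement advice from the judge's reply."""
    level, advice = None, ""
    for line in reply.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().upper(), value.strip()
        if key == "LEVEL":
            matches = [name for name in LEVELS if name.lower() == value.lower()]
            if not matches:
                raise ValueError(f"unknown level: {value!r}")
            level = matches[0]
        elif key == "TO SCORE HIGHER":
            advice = value
    if level is None:
        raise ValueError("judge reply has no LEVEL line")
    return level, advice


def rate(judge: Callable[[str], str], task: str, task_input: str,
         output: str) -> Tuple[int, str, str]:
    """Ask the judge to rate one output; return (rank, level, advice)."""
    reply = judge(build_judge_prompt(task, task_input, output))
    level, advice = parse_rating(reply)
    return LEVELS.index(level), level, advice


if __name__ == "__main__":
    # Stub judge so the sketch runs without any model access.
    def fake_judge(prompt: str) -> str:
        return "LEVEL: PhD\nTO SCORE HIGHER: Cite primary sources."

    print(rate(fake_judge, "Summarize the article", "article text", "a summary"))
```

Because the levels are kept in an ordered list, the returned rank lets you compare two models on the same task and input, and the advice line carries the "what would it take to score higher" feedback.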

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episodes (532)

NO. 361 | GPT++, Apple Security, CISA Cuba…

12 Dec 2022, 12 min

NO. 360 | NEWS, ANALYSIS & DISCOVERY SERIES

5 Dec 2022, 13 min

Erkang Zheng of JupiterOne | SPONSORED INTERVIEW SERIES

In this standalone episode we’re doing a sponsored interview with Erkang Zheng of JupiterOne. JupiterOne is a special company to me. I just built a vuln management program at Robinhood based around them, and I believe so much in their vision that I’m looking to become an advisor. I mention this because when I fanboy for something, like Apple or whoever, I want you to know that I’m fanboying and/or that I have, or want to have, a relationship with them. The interview talks mostly about concepts, however, and not so much about specific features, but I wanted to mention my orientation to the company before starting. I’m speaking with Erkang Zheng, who is the founder and CEO of the company, and as you can hear we have a similar take on many of the current problems in security. So with that, here’s Erkang Zheng.

3 Dec 2022, 27 min

NO. 359 | THE NEWS, ANALYSIS & DISCOVERY SERIES

28 Nov 2022, 9 min

Scott Kuffer of Nucleus Security | SPONSORED INTERVIEW SERIES

In this standalone episode we’re doing a sponsored interview with Scott Kuffer, co-founder and COO of Nucleus Security. I was already excited by this vendor just based on the research I did to allow them to be a sponsor, but the conversation with them really made me think they’re approaching the vulnerability management problem the right way, namely by tackling a lot of the non-technical problems using technical solutions rather than obsessing over vuln prioritization. If you are in the VM space or are about to be in it, you will love this conversation. And with that, here’s Scott Kuffer with Nucleus Security.

28 Nov 2022, 47 min

NO. 358 | NEWS, ANALYSIS, & DISCOVERY SERIES

22 Nov 2022, 14 min

NO. 357 | NEWS, ANALYSIS, & DISCOVERY SERIES

NSA Languages, GPT-4 Hype, Chinese AirDrop…

14 Nov 2022, 12 min

NO. 356 | NEWS, ANALYSIS & DISCOVERY SERIES

Sponsored by JupiterOne: jupiterone.com/unsupervisedlearning

7 Nov 2022, 11 min
