Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Avsnitt(538)

Mr. Robot Episode 3 Review

Mr. Robot Episode 3 Review

[ NOTE: There are spoilers below, not just for this episode but for the show in general. ] Enough people have asked me to start doing reviews of Mr. Robot episodes that I’m going to have a go at it. ...

19 Juli 201518min

Take 1 Security Podcast: Episode 17

Take 1 Security Podcast: Episode 17

Topics for this episode: Announcements * [ ] New desk, new mic setup News * [ ] SSL vuln spoofing issue, requires mitm * [ ] Sleepy puppy XSS Payload Management Framework * [ ] Troy Hunt on tec...

12 Juli 201525min

Take 1 Security Podcast: Episode 16

Take 1 Security Podcast: Episode 16

Topics for this episode: * [ ] Hacking Team Hacked, show which oppressive governments bought their software * [ ] No exploits for non-jailbroken iPhone * [ ] The FBI spent 775K on Hacking Team softw...

7 Juli 20156min

Take 1 Security Podcast: Episode 15

Take 1 Security Podcast: Episode 15

Topics for this episode: * iOS flaw * The Chinese hacking campaign against the US * Breach at Recorded future * Hacking cars through key fobs * NSA/GCHQ hacking of people through security software *...

29 Juni 201514min

Take 1 Security Podcast: Episode 14

Take 1 Security Podcast: Episode 14

Notes * The intro track is from one of my favorite EDM artists: Zomby. The song is ‘Orion’, and it’s from the ‘With Love’ album. Highly recommended if you like chill EDM. Become a Member: https://da...

15 Juni 201522min

Take 1 Security Podcast: Episode 13

Take 1 Security Podcast: Episode 13

Notes * The intro track is from one of my favorite EDM artists: Zomby. The song is ‘Orion’, and it’s from the ‘With Love’ album. Highly recommended if you like chill EDM. Become a Member: https://da...

12 Juni 201542min

Take 1 Security Podcast: Episode 12

Take 1 Security Podcast: Episode 12

Play Podcast START CONTENT * Singtel buys Trustwave * Snowden does interview with John Oliver * CheckPoint buys Lacoon * Everyone’s trying to do everything, which gives the big people a major adv...

8 Apr 201513min

Take 1 Security Podcast: Episode 11

Take 1 Security Podcast: Episode 11

Play Podcast START CONTENT * Twitch, a game streaming service owned by Amazon, was hacked last week * Passwords, emails, usernames, addresses, phone numbers, dates of birth * Amazon bought them l...

30 Mars 201516min

Populärt inom Teknik

uppgang-och-fall
elbilsveckan
market-makers
natets-morka-sida
rss-elektrikerpodden
skogsforum-podcast
rss-laddstationen-med-elbilen-i-sverige
rss-veckans-ai
rss-uppgang-och-fall
rss-technokratin
bilar-med-sladd
bli-saker-podden
developers-mer-an-bara-kod
hej-bruksbil
rss-digitala-influencer-podden
teknikveckan
ai-sweden-podcast
rss-fabriken-2
rss-snacka-om-ai
rss-powerboat-sverige-podcast