Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Denne episoden er hentet fra en åpen RSS-feed og er ikke publisert av Podme. Den kan derfor inneholde annonser.

Episoder(541)

Unsupervised Learning: Episode 43

Unsupervised Learning: Episode 43

Subscribe to the Podcast via: iTunes | Android | RSS | Newsletter News Internet disinformation service for hire [ Link ] Rob Fuller (@mubix) has found a way to pull credentials from a locked machine u...

7 Sep 201642min

Unsupervised Learning: Episode 42

Unsupervised Learning: Episode 42

[ Subscribe to the Podcast: iTunes | Android | RSS ] InfoSec news and articles Dropbox hacked 68 million accounts Back in 2012 Malware infected all Eddie Bauer stores in U.S. and Canada All 350 stores...

1 Sep 20161h 4min

Unsupervised Learning: Episode 41

Unsupervised Learning: Episode 41

[ Subscribe to the Podcast: iTunes | Android | RSS ] InfoSec news and articles NSA hacking tools supposedly leaked back in 2013 Could have just been a jump box, which rival groups commonly attack from...

18 Aug 201634min

Unsupervised Learning: Episode 40

Unsupervised Learning: Episode 40

- LinkedIn breach from 2013 | 65.5 million emails and salted and hashed passwords - XSS in Wordpress plugin (JetPack) - DerbyCon is going to stream live this year | you can’t stream the networking, so...

31 Mai 201654min

Unsupervised Learning: Episode 39

Unsupervised Learning: Episode 39

[ Subscribe to the Podcast: iTunes | Android | RSS ] InfoSec news and articles BAE systems saying that SWIFT hack is linked to the Sony breach [ Link ] Kaspersky is saying ransomware is the #1 threat ...

14 Mai 201623min

Unsupervised Learning: Episode 38

Unsupervised Learning: Episode 38

[ Subscribe to the Podcast: iTunes | Android | RSS ] InfoSec news and articles Michigan lawmakers want life sentence for hacking cars | will that apply to changing the speed of your turn signal? SWIFT...

2 Mai 201645min

Unsupervised Learning: Episode 37

Unsupervised Learning: Episode 37

[ Subscribe to the Podcast: iTunes | Android | RSS ] InfoSec news Feds paid over 1M to get into San Bernardino iPhone Continued fallout from Panama papers 3.2 million servers vulnerable to JBoss attac...

25 Apr 201635min

Unsupervised Learning: Episode 36

Unsupervised Learning: Episode 36

[ Subscribe to the Podcast: iTunes | Android | RSS ] News [ ] Nothing useful found on Farook’s phone | http://www.theregister.co.uk/2016/04/14/nothing_useful_on_farook_iphone/?utm_source=dlvr.it&utm_m...

18 Apr 201620min

Populært innen Teknologi

lydartikler-fra-aftenposten
romkapsel
teknisk-sett
tomprat-med-gunnar-tjomlid
energi-og-klima
teknologi-og-mennesker
shifter
elektropodden
rss-heis
nasjonal-sikkerhetsmyndighet-nsm
pedagogisk-intelligens
rss-ai-forklart
smart-forklart
fornybaren
rss-for-alarmen-gar
rss-vi-leser-dommer-om-personvern
i-loopen
rss-metadama-data-management-in-the-nordics
rss-ki-praten
rss-alt-som-gar-pa-strom