Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Tämä jakso on lisätty Podme-palveluun avoimen RSS-syötteen kautta eikä se ole Podmen omaa tuotantoa. Siksi jakso saattaa sisältää mainontaa.

Jaksot(541)

T1SP: Episode 27

T1SP: Episode 27

[ Subscribe to the Podcast: iTunes | Android | RSS ] News [ ] Heavy surveillance around the Super Bowl [ ] A new BlackEnergy spear phishing campaign is targeting more Ukrainian companies [ ] Magneto, ...

2 Helmi 201622min

T1SP: Episode 26

T1SP: Episode 26

[ Subscribe to the Podcast: iTunes | Android | RSS ] News [ ] Backdoor found in AMX devices that run corporate and government conference rooms [ ] Autopwn every Android device on your network using Be...

25 Tammi 201649min

T1SP: Episode 25

T1SP: Episode 25

[ Subscribe to the Podcast: iTunes | Android | RSS ] News * [ ] TrendMicro node.js server listening on localhost can execute commands; exposed to the internet * [ ] SSH backdoor found in Fortinet f...

19 Tammi 201626min

T1SP: Episode 24

T1SP: Episode 24

[ Subscribe to the Podcast: iTunes | Android | RSS ] News * [ ] Norse lays of 20 people; not clear what percentage that is; threat intel not going so well? * [ ] OPM declines to release details on ...

11 Tammi 201628min

T1SP: Episode 23

T1SP: Episode 23

[ Subscribe to the Podcast: iTunes | Android | RSS ] News * [ ] Juniper backdoor; could have been found with diff; signs point to NSA * [ ] RCE on FireEye appliances * [ ] Hyatt got hacked; malware...

4 Tammi 201655min

Security and Obscurity

Security and Obscurity

[ Subscribe to the Podcast: iTunes | Android | RSS ] In this episode I explore the topic of Security and Obscurity by reading my popular essay on the topic. Notes * The intro track is from one of ...

13 Joulu 201510min

T1SP: Episode 21

T1SP: Episode 21

[ Subscribe to the Podcast: iTunes | Android | RSS ] Topics for this episode: News * [ ] Stringing Shodan to exploitation * [ ] Why you need to check HaveIBeenPwned * [ ] Another DELL root cert ...

13 Joulu 201518min

Take 1 Security Podcast: Episode 20

Take 1 Security Podcast: Episode 20

Topics for this episode: News and analysis * [ ] Ads using high frequency sound to communicate across devices. The ultrasonic pitches are embedded into TV commercials or are played when a user enco...

7 Joulu 201523min