Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Avsnitt(538)

Take 1 Security Podcast: Episode 10

Take 1 Security Podcast: Episode 10

Play Podcast START CONTENT * There was another SQL Injection bug found in SEO by Yoast * It required admins to click a malicious link * Was patched quickly * It’s the plugins that make WordPress ...

16 Mars 201522min

Take 1 Security Podcast: Episode 9

Take 1 Security Podcast: Episode 9

START CONTENT * Sorry about the audio last week; wireless headsets don’t compare to the Yeti * The CIA is focusing on cyberespionage in its new management * Anthem is refusing an audit by the OIG of...

9 Mars 201512min

Take 1 Security Podcast: Episode 8

Take 1 Security Podcast: Episode 8

START CONTENT * New SSL attack called FREAK * Has to do with falling RSA back to a deprecated and weak level * Requires the client and server are both vulnerable * The solution is to patch * Many ...

3 Mars 201516min

Take 1 Security Podcast: Episode 7

Take 1 Security Podcast: Episode 7

START CONTENT * New stuxnet like piece of malware was discovered * Was found by Kaspersky * Has infected thousands of computers, mostly in Iran * The malware is the most advanced ever found * Can ...

24 Feb 20158min

Take 1 Security Podcast: Episode 6

Take 1 Security Podcast: Episode 6

START CONTENT * Ukrainian banks hacked for up to 1 Billion dollars * Evidently installed malware on bank admin machines using phishing * Not sure they have an FDIC * As if the Ukraine didn’t have ...

17 Feb 201512min

Take 1 Security Podcast: Episode 5

Take 1 Security Podcast: Episode 5

START CONTENT * Anthem, the second largest healthcare company, had a major breach * They lost around 80 million socials, addresses, emails, etc., which is roughly double the Target breach * There’...

8 Feb 20157min

Take 1 Security Podcast: Episode 4

Take 1 Security Podcast: Episode 4

START CONTENT * Ghost bug in PHP could affect millions of servers * Flaw is in glibc, which is extensively by all Linux distributions * Patch and reboot using yum or aptitude * The US Army Releas...

2 Feb 20157min

Take 1 Security Podcast: Episode 3

Take 1 Security Podcast: Episode 3

START CONTENT * There was an issue with the Marriott website that exposed reservations and payment information. It’s now been fixed * Police are now using a new radar to see into peoples’ homes with...

25 Jan 201510min

Populärt inom Teknik

uppgang-och-fall
elbilsveckan
market-makers
natets-morka-sida
rss-elektrikerpodden
skogsforum-podcast
rss-laddstationen-med-elbilen-i-sverige
rss-veckans-ai
rss-uppgang-och-fall
rss-technokratin
bilar-med-sladd
bli-saker-podden
developers-mer-an-bara-kod
hej-bruksbil
rss-digitala-influencer-podden
teknikveckan
ai-sweden-podcast
rss-fabriken-2
rss-snacka-om-ai
rss-powerboat-sverige-podcast