Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. The system uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and comparing it to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher than weaker ones, as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
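
To make the three points above concrete, here is a minimal Python sketch of the judging flow. It is an illustration under stated assumptions, not the actual Fabric pattern: the function names, the prompt wording, and the two-line reply format are all hypothetical, and the scale lists only the four levels named in these notes (the real pattern may define more in between). Sending the prompt to a model is deliberately left out, since that depends on whichever client you use.

```python
from dataclasses import dataclass

# Only the four levels named in the episode notes, lowest to highest.
# Assumption: the real pattern may define additional levels in between.
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


def build_judge_prompt(task: str, task_input: str, candidate_output: str) -> str:
    """Assemble the text sent to the stronger (judge) model.

    Hypothetical wording; not the prompt used by the Fabric pattern.
    """
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are evaluating another AI's work.\n\n"
        f"TASK:\n{task}\n\n"
        f"INPUT:\n{task_input}\n\n"
        f"AI OUTPUT:\n{candidate_output}\n\n"
        f"Rate the output on this human scale: {levels}.\n"
        "Reply with exactly two lines:\n"
        "LEVEL: <one level from the scale>\n"
        "TO SCORE HIGHER: <what the output would have needed>\n"
    )


@dataclass
class Rating:
    level: str            # canonical spelling from HUMAN_LEVELS
    rank: int             # index into HUMAN_LEVELS; higher is better
    to_score_higher: str  # the judge's feedback on what was missing


def parse_rating(reply: str) -> Rating:
    """Parse the judge's two-line reply; raise ValueError if the level is unknown."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().upper()] = value.strip()
    level = fields.get("LEVEL", "")
    lowered = [name.lower() for name in HUMAN_LEVELS]
    if level.lower() not in lowered:
        raise ValueError(f"unrecognized level: {level!r}")
    rank = lowered.index(level.lower())
    return Rating(HUMAN_LEVELS[rank], rank, fields.get("TO SCORE HIGHER", ""))
```

In use, you would pass the string from `build_judge_prompt` to your strongest model, then run `parse_rating` on its reply. Comparing `rank` across candidate models gives the benchmark, and `to_score_higher` carries the feedback on what a better answer would have required.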

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest, but the framework and methodology still apply to current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episodes (532)

News & Analysis | No. 271

Hafnium Fallout and Response, Software Supply Chain Naming Attacks, SITA Airline Attack, REvil, China vs. India in Cyberspace, Russian Cybercrime Forum Hacks, Russians Undermining American Vaccines, US Not Ready For AI Competition, CPU Side-channel Attacks, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

8 March 2021, 29 min

News & Analysis | No. 270

SolarWinds Malware Tool, SolarWinds Blaming the Intern, Amazon Whistleblowers, Google Linux Devs, NYC Black Mirror Dog, Portswigger Top 10, API Security Top 10, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

1 March 2021, 24 min

News & Analysis: No. 269

US charges North Korean hackers, Egregor users arrested, Let’s Encrypt Upgraded, Very Few Vulnerabilities Are Dangerous, North Korea Pursued COVID Vaccine Data, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

22 Feb 2021, 43 min

News & Analysis | No. 268

Florida water hack, ESET Reports 768% More UDP Attacks, 223 Vulns Being Used in Ransomware, Microsoft Will Report State Hack Attempts, Cops Using Copyright Weapons, TikTok Russian Battles, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

15 Feb 2021, 27 min

News & Analysis | No. 267

Supercookies, Mobile App Tracking, 80% PII, Moody's Cyber Rates, Facial Recognition California, Chinese Men Feminine, Google Bounty Payouts, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

8 Feb 2021, 35 min

News & Analysis | No. 266

China has 80% of US Adult PII, Chris DeRusha now US CISO, New Version of NAT Slipstreaming, Exposing.AI Looks For Your Face, Birdwatch Misinformation, Pentagon Vaccination Program, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

1 Feb 2021, 22 min

News & Analysis | No. 265

FireEye Solar Details, Cyberinsurance Supporting Crime, FBI Tracking Cell Pings, RDP DDoS Amplification, Palantir Stock, Fake Job Offers, DDoS Ransomware, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

25 Jan 2021, 27 min

What They Don’t Tell You About Being a Bounty Hunter or Content Creator

How the dopamine hits of bugs and praise can become a trap.

22 Jan 2021, 5 min
