Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. The system uses your smartest AI model to evaluate other AIs, scoring their output across a range of tasks and comparing it to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
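As a rough illustration of the three points above, here is a minimal Python sketch of the judge loop. It is not the actual Fabric Pattern: the prompt wording, the reply format, and every name in it (HUMAN_LEVELS, build_judge_prompt, parse_rating, rate, fake_judge) are assumptions made for this example. Only the four scale labels come from the notes above, and the real scale has more tiers in between. The judge is passed in as a plain function so that any strong model could be plugged in; a canned stand-in is used here so the sketch runs without an API.

```python
import re
from typing import Callable, NamedTuple

# Only the four levels named in the episode notes. The real pattern's scale
# has more tiers between these; they are not listed in the notes.
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


class Rating(NamedTuple):
    level: str            # one entry from HUMAN_LEVELS
    to_score_higher: str  # the judge's note on what a better answer needed


def build_judge_prompt(task: str, task_input: str, candidate_output: str) -> str:
    """Assemble the text sent to the judge model (hypothetical wording)."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are an expert evaluator of AI output.\n\n"
        f"TASK:\n{task}\n\n"
        f"INPUT:\n{task_input}\n\n"
        f"AI OUTPUT:\n{candidate_output}\n\n"
        f"Rate the output on this human scale: {levels}.\n"
        "Reply in exactly this form:\n"
        "LEVEL: <one level from the scale>\n"
        "TO SCORE HIGHER: <what the output would have needed>\n"
    )


def parse_rating(reply: str) -> Rating:
    """Pull the level and the improvement note out of the judge's reply."""
    level_match = re.search(r"^LEVEL:[ \t]*(.+)$", reply, re.MULTILINE)
    higher_match = re.search(r"^TO SCORE HIGHER:[ \t]*(.+)$", reply, re.MULTILINE)
    if level_match is None:
        raise ValueError("judge reply has no LEVEL line")
    level = level_match.group(1).strip()
    known = {name.lower(): name for name in HUMAN_LEVELS}
    if level.lower() not in known:
        raise ValueError(f"unknown level: {level!r}")
    note = higher_match.group(1).strip() if higher_match else ""
    return Rating(known[level.lower()], note)


def rate(judge: Callable[[str], str], task: str, task_input: str,
         candidate_output: str) -> Rating:
    """Ask the judge (any prompt -> reply function) to grade one output."""
    return parse_rating(judge(build_judge_prompt(task, task_input, candidate_output)))


def fake_judge(prompt: str) -> str:
    """Canned stand-in for a call to your strongest model."""
    return "LEVEL: high school\nTO SCORE HIGHER: Cite sources and address edge cases."


if __name__ == "__main__":
    rating = rate(fake_judge, "Summarize the article", "article text", "summary text")
    print(rating.level)
    print(rating.to_score_higher)
```

Swapping fake_judge for a function that calls a real model is the only change needed to try this against actual outputs. The strict reply format is a design choice for the sketch: it keeps parsing simple and fails loudly when the judge drifts from the scale.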

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episodes (532)

Unsupervised Learning: No. 228

Thunderbolt Attack, Celebrity Ransomware, ClearView Government, Blackhat DEFCON Virtual, War Thunder, 5G Bio Attacks, PC Game Cheating, Zoom Keybase, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

12 May 2020, 16 min

Unsupervised Learning: No. 227

VICE vs. Chinese Surveillance, Indian Contact Tracing, NHS + GCHQ, Banjo Racism, Singapore Requires Check-ins, Bruce on Contact Tracing, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

4 May 2020, 20 min

Unsupervised Learning: No. 226

Bay Area Lockdown Til May, The Swedish Approach, California Autopsies, Zoom Security Updates, Palantir Contacts, NSA Web Vulns, GreyNoise Services, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

28 Apr 2020, 20 min

A Conversation with Renée DiResta: Disinformation and Conspiracy Propagation

In this episode, Daniel speaks with Renée DiResta about her work tracking narratives online. They discuss the different strains of false information; her work at the Stanford Internet Observatory; how the same narrative can be used by multiple sides; the origin of the Bill Gates conspiracies; mapping campaigns to actor strategies; what she recommends to others who are interested in her field; and other topics around disinformation, conspiracy, and narrative tracking.

Renée DiResta is the technical research manager at Stanford Internet Observatory, a cross-disciplinary program of research, teaching, and policy engagement for the study of abuse in current information technologies. Renée investigates the spread of malicious narratives across social networks and assists policymakers in devising responses to the problem.

22 Apr 2020, 1 h 6 min

Unsupervised Learning: No. 225

Flu Simulations, Amazon Thermal Cameras, Facebook Bad Info Tracing, 5G Gates Conspiracies, Google Slows Hiring, Amazon Hires More, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

21 Apr 2020, 14 min

Unsupervised Learning: No. 224

Biogen Superspreaders, African Locusts, Game of Life, Meat Troubles, 5G Conspiracies, Japan Getting Out of China, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

14 Apr 2020, 20 min

Unsupervised Learning: No. 223

Coronavirus unemployment rate, 2 million guns, UK 5G attacks, German Antibodies, Zoom Drama, New Cloudflare Servers, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

7 Apr 2020, 15 min

A Conversation With Leif Dreizler About Security Engineering at Segment

So today I’m talking to Leif Dreizler. Leif is a buddy of mine who also works in San Francisco. He’s a developer at a company called Segment, and over the last year or so he’s been telling me about all kinds of cool stuff he’s been working on, how his team is set up, and how they see security teams being built in the future. So we’re going to cover those topics and more in a conversation that ranges from security engineering strategy to solving specific problems through custom tooling.

2 Apr 2020, 54 min
