Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. It uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and mapping each score to a human intelligence level.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher and weaker ones lower, as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
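To make the three points above concrete, here is a minimal sketch of the judge loop in Python. It is not the actual Fabric Pattern: the prompt wording, the two-line reply format, and every name in it (`HUMAN_LEVELS`, `build_judge_prompt`, `parse_rating`, `rate_output`, `fake_judge`) are assumptions made for illustration. Only the four levels named in these notes are used, and the real pattern may define more. The judge is passed in as a plain callable, so a stand-in can replace the call to your strongest model and the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Levels named in the episode notes, weakest to strongest. The real pattern
# may define more rungs; only these four are mentioned (assumption).
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


@dataclass
class Rating:
    level: str            # one of HUMAN_LEVELS
    to_score_higher: str  # the judge's note on what a better answer needed


def build_judge_prompt(task: str, input_text: str, output_text: str) -> str:
    """Assemble the text sent to the judge model (wording is illustrative)."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are grading another AI's work.\n"
        f"TASK: {task}\n"
        f"INPUT: {input_text}\n"
        f"OUTPUT TO GRADE: {output_text}\n"
        "Reply with exactly two lines:\n"
        f"LEVEL: <one of: {levels}>\n"
        "TO SCORE HIGHER: <what the output would have needed>\n"
    )


def parse_rating(reply: str) -> Rating:
    """Pull the two labelled fields out of the judge's reply."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no "key: value" shape
            fields[key.strip().upper()] = value.strip()
    # Match the level case-insensitively, then return its canonical spelling.
    by_lower = {name.lower(): name for name in HUMAN_LEVELS}
    level = by_lower.get(fields.get("LEVEL", "").lower())
    if level is None:
        raise ValueError(f"unrecognized level in reply: {reply!r}")
    return Rating(level=level, to_score_higher=fields.get("TO SCORE HIGHER", ""))


def rate_output(judge: Callable[[str], str], task: str,
                input_text: str, output_text: str) -> Rating:
    """Ask the judge (any callable mapping prompt to reply) to grade one output."""
    return parse_rating(judge(build_judge_prompt(task, input_text, output_text)))


def fake_judge(prompt: str) -> str:
    """Stand-in for a call to your strongest model, so the sketch runs offline."""
    return "LEVEL: high school\nTO SCORE HIGHER: Cite sources and address edge cases."


rating = rate_output(fake_judge, "Summarize the article",
                     "(article text)", "(model summary)")
print(rating.level)
print(rating.to_score_higher)
```

In practice you would swap `fake_judge` for a function that sends the prompt to your most capable model and returns its text reply. Keeping the "to score higher" note alongside the level is what gives you the feedback loop described in point 3.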

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episodes (532)

UL NO. 408: OpenAI Coup Theory, SEC vs. SolarWinds Analysis, Deepfake D&D Summaries

My Theory Of What Happened At OpenAI, A New Ransomware Tactic, Analysis Of What The SEC Case Will Do To Cybersecurity, Live David Attenborough Narration, And More… 📢 Sponsored by: Panoptica.app - Simplify container deployment, monitoring, and security.

27 Nov 2023 · 35 min

UL NO. 407: OpenAI Prompt Injection, Leaky GPTs, AGI by 2028, Huberman Routine AI

Extremist groups using AI for propaganda, NYC restaurant bots, Wegovy and Cannabis studies, my favorite collections of GPTs… 📢 Sponsored by Moonlock, the cybersecurity wing of MacPaw and developers of Moonlock Engine, the antimalware tech in CleanMyMac X. 📢 Sponsored by Automox - AI-powered modern IT automation is here. Learn more at automox.com.

14 Nov 2023 · 40 min

OpenAI's New Releases Are a Watershed Moment for Human Creativity—and Prompt Injection

Making it trivial to create and share AI Agents that connect to real-world APIs will have a drastic impact on Information Security.

13 Nov 2023 · 3 min

Why I'm Not Getting the New Humane AI Pin

Why I should be super excited by the Humane AI pin, but I'm not.

13 Nov 2023 · 3 min

UL NO. 406: OpenAI Launches Custom AIs, Okta's New Breach, EFF's Browser Privacy Checker

DOJ and Pentagon emails hacked by Russians, OpenAI's DevDay announcements, when DeepMind thinks we'll see AGI, and more… 📢 Sponsored by: Panoptica.app - Simplify container deployment, monitoring, and security.

10 Nov 2023 · 28 min

UL NO. 404: ServiceNow Widget Flaws, North Korean Infiltrators, and the New Top-performing Prompt String…

In this edition we dive into North Korean IT infiltration, the top-performing prompt technique, Google's traffic optimization, American sick day increases, ServiceNow's widget problem, US murder rates, and more. Read online here: https://danielmiessler.com/p/ul-no-404-servicenow-widget-flaws-north-korean-infiltrators-new-topperforming-prompt-string

26 Oct 2023 · 26 min

UL NO. 403: Signal Investigates Rumored Zero-Day Bug, AI Predicts New COVID-19 Strains, Dwindling US-China Scientific Collaboration...

In This Edition We Look Into Signal's Investigation Into A Rumored Zero-Day Bug, How Harvard And Oxford Researchers Are Using AI To Predict New COVID-19 Strains, The Dwindling Collaboration Between American And Chinese Scientists, And The European Commission's CSAM Detection Bypass. View this week's podcast online at https://danielmiessler.com/p/403

16 Oct 2023 · 28 min

UL NO. 402: Israeli Footage & Analysis, WSFTP + MOVEIT, AI Explainability, Andreessen vs. Perell on Writing, and more…

Israel analysis, a genetic data breach, active exploits against critical vulnerabilities, and a brilliant conversation between two writers about creativity. 📢 Sponsored by Kolide: Concerned about data breaches and hacks? 🔒 Discover Kolide, the device trust solution that secures your company's devices and credentials, making phishing attempts useless to hackers. See it in action at www.kolide.com/unsupervisedlearning. View today's episode online here: https://danielmiessler.com/p/402

11 Oct 2023 · 26 min
