Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. The system uses your smartest AI model to evaluate the performance of other AIs, scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the output of another model (like GPT-3.5 or Claude 3 Haiku) for a given task and input. This gives you a way to benchmark output quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” As expected, stronger models consistently rate higher and weaker ones lower.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt instructs the evaluator to emulate a 16,000+ dimensional scoring system, applying expert-level heuristics and close attention to nuance. It also asks the evaluator to describe what the response would have needed to score higher, creating a meta-feedback loop for improving future performance.
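
The judging loop described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the actual Fabric pattern: the `judge` callable (whatever client you use to reach your strongest model), the prompt wording, the `LEVEL:` / `TO SCORE HIGHER:` reply format, and every function and class name here are hypothetical. The scale includes only the four levels named above; the real pattern may define more in between.

```python
import re
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class HumanLevel(IntEnum):
    """Ordered human scale. Only the four levels named in the episode are
    included; the real pattern may define additional levels in between."""
    UNEDUCATED = 1
    HIGH_SCHOOL = 2
    PHD = 3
    WORLD_CLASS_HUMAN = 4


@dataclass
class Rating:
    level: HumanLevel
    to_score_higher: str  # the evaluator's note on what a higher score would need


def build_judge_prompt(task: str, task_input: str, candidate_output: str) -> str:
    """Assemble a hypothetical rating prompt for the judge (strongest) model."""
    levels = ", ".join(level.name for level in HumanLevel)
    return (
        "You are an expert evaluator. Emulate a 16,000+ dimensional scoring system, "
        "using expert-level heuristics and close attention to nuance.\n"
        f"Rate how well the RESPONSE performs the TASK on the INPUT, using this human scale: {levels}.\n"
        "Reply with a line 'LEVEL: <name>' and a line "
        "'TO SCORE HIGHER: <what would have been required>'.\n\n"
        f"TASK:\n{task}\n\nINPUT:\n{task_input}\n\nRESPONSE:\n{candidate_output}\n"
    )


def parse_rating(judge_reply: str) -> Rating:
    """Extract the level and the improvement note from the judge's reply."""
    level_match = re.search(r"^LEVEL:[ \t]*([A-Z_]+)", judge_reply, re.MULTILINE)
    note_match = re.search(r"^TO SCORE HIGHER:[ \t]*(.+)$", judge_reply, re.MULTILINE)
    if level_match is None:
        raise ValueError("judge reply has no LEVEL line")
    level = HumanLevel[level_match.group(1)]  # raises KeyError for an unknown label
    note = note_match.group(1).strip() if note_match else ""
    return Rating(level=level, to_score_higher=note)


def rate_response(judge: Callable[[str], str], task: str, task_input: str,
                  candidate_output: str) -> Rating:
    """Send another model's output to the judge and parse its verdict.

    `judge` is any function that takes a prompt string and returns the judge
    model's text reply (hypothetical; wire it to your own client)."""
    prompt = build_judge_prompt(task, task_input, candidate_output)
    return parse_rating(judge(prompt))
```

Because `HumanLevel` is an `IntEnum`, ratings for two different models on the same task and input compare directly (for example, `rating_a.level > rating_b.level`), and the `to_score_higher` note is the hook for the meta-feedback loop.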

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest, but the framework and methodology still apply to current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.
