Unsupervised Learning19 Apr 2025

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Det här avsnittet är hämtat från ett öppet RSS-flöde och publiceras inte av Podme. Det kan innehålla reklam.

Avsnitt(541)

UL NO. 436: Thoughts on the Future of AI & Societal Stability

When SuperIntelligence? Apple's WWDC updates, new Fabric pattern, GPT-4 Hacking Paper, China/Russia Using OpenAI for Misinformation, and more… ➡ Check out Kolide:kolide.com/unsupervisedlearning Subscr...

14 Juni 202453min

A Conversation with Abhishek Agrawal from Material Security

In this conversation, I speak with Abhishek Agrawal, co-founder and CEO of Material Security. We talk about: - Material's Security innovative approach to email security by not just preventing unauthor...

7 Juni 202454min

UL NO. 435: Making New Things is Post-AI Safety

Jason Haddix's AI Course, Microsoft Recall analysis, exercise erasing trauma, AI and the jobs problem… ➡ Check out Vanta and get $1000 off:vanta.com/unsupervised Subscribe to the newsletter at: https:...

7 Juni 202424min

UL NO. 434: Can You Articulate Yourself in 50 Words?

NetworkChuck's Fabric Video, Algorithms Replace Degrees, AI Transparency, New Grad Difficulty, Windows Goes Full AI, and more… ➡ Check out the Autonomous IT Podcast:https://community.automox.com/auton...

1 Juni 202427min

UL NO. 433: China's Flawed Strategy

A new book, A new Fabric pattern, Autonomous fighter jets, Friend trips, and more… ➡ Check out Vanta and get $1000 off:vanta.com/unsupervised Subscribe to the newsletter at: https://danielmiessler.com...

29 Maj 202410min

A Conversation with Mike Privette from Return on Security

In this conversation, I speak with Mike Privette. Mike is the CISO and Cybersecurity Economist at Return on Security. We discuss:- The economic impact of COVID-19, the shift from prioritizing growth t...

24 Maj 202446min

UL NO. 432: Can You Summarize Your Work in a Sentence?

Thoughts on GPT-4o, Dell's API Hack, Russian Campus Campaigns, Google's Pretend Work, and more… ➡ Check out Vanta and get $1000 off:vanta.com/unsupervised Subscribe to the newsletter at: https://danie...

24 Maj 202427min

A Conversation on Maritime Security with BlackBerry Threat Intelligence

In this sponsored conversation, I speak with Corey Ranslem, CEO of Dryad—and the resident expert on Maritime Attacks—and Ismael Valenzuela, VP of Threat Intelligence and Research at Blackberry. We tal...

16 Maj 202440min