Unsupervised Learning
19 April 2025

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. It uses your smartest AI model to evaluate other AIs by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
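As a rough illustration of the three points above, here is a minimal Python sketch of the judge loop. It is not the actual Fabric pattern: the prompt wording, the function names (`build_judge_prompt`, `parse_rating`, `rate`), the two-line reply format, and the intermediate rungs of the scale are all assumptions made for this example. The judge model is passed in as a plain callable so the sketch runs without any API; in practice you would wrap a call to your most capable model there.

```python
from dataclasses import dataclass
from typing import Callable

# Human scale, lowest to highest. Only "uneducated", "high school", "PhD",
# and "world-class human" come from the episode notes; the two middle
# rungs are assumed here for illustration.
HUMAN_LEVELS = [
    "uneducated",
    "high school",
    "bachelor's",   # assumed
    "master's",     # assumed
    "PhD",
    "world-class human",
]


@dataclass
class Rating:
    level: str       # one of HUMAN_LEVELS
    rank: int        # index into HUMAN_LEVELS; higher is better
    to_improve: str  # judge's note on what a higher score would have required


def build_judge_prompt(task: str, task_input: str, candidate_output: str) -> str:
    """Assemble a hypothetical rating prompt for the judge model."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are an expert evaluator. Rate the RESPONSE to the TASK and INPUT "
        f"on this human scale, lowest to highest: {levels}.\n"
        "Reply with exactly two lines:\n"
        "LEVEL: <one level from the scale>\n"
        "TO_IMPROVE: <what a higher-scoring response would have required>\n\n"
        f"TASK:\n{task}\n\nINPUT:\n{task_input}\n\nRESPONSE:\n{candidate_output}\n"
    )


def parse_rating(reply: str) -> Rating:
    """Parse the judge's two-line reply into a Rating."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().upper()] = value.strip()
    lookup = {name.lower(): i for i, name in enumerate(HUMAN_LEVELS)}
    level = fields.get("LEVEL", "")
    if level.lower() not in lookup:
        raise ValueError(f"unrecognized level: {level!r}")
    rank = lookup[level.lower()]
    return Rating(HUMAN_LEVELS[rank], rank, fields.get("TO_IMPROVE", ""))


def rate(judge: Callable[[str], str], task: str, task_input: str,
         candidate_output: str) -> Rating:
    """Ask the judge model to grade another model's output."""
    return parse_rating(judge(build_judge_prompt(task, task_input, candidate_output)))


def stub_judge(prompt: str) -> str:
    """Stand-in for a real model call; always returns a fixed reply."""
    return "LEVEL: PhD\nTO_IMPROVE: Cite primary sources."


result = rate(stub_judge, "Summarize the article", "article text", "A short summary.")
print(result.level, result.rank, result.to_improve)
```

Swapping `stub_judge` for a function that sends the prompt to a stronger model and returns its text reply gives the setup described in the episode: the weaker model produces `candidate_output`, and the stronger one grades it and explains what a higher score would have required.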

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!



