Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Avsnitt(532)

News & Analysis | NO. 321

News & Analysis | NO. 321

The latest in Security News, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism… Web Version: https://danielmiessler.com/podcast/news-analysis-no-321/Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

9 Mars 202214min

Sponsored Conversation: Ev Kontsevoy from Teleport

Sponsored Conversation: Ev Kontsevoy from Teleport

In this sponsored conversation, I talk with Ev Kontsevoy of Teleport. In this series I have organic conversations with entrepreneurs as if having lunch with them and hearing about the product for the first time. They give their pitch, and I dig deeper with questions. Teleport, in my own words, is a way of rethinking how people access and use computing resources. It's a policy-based system that controls who can do what across your entire infrastructure using a central access plane. Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

7 Mars 202240min

Andrew Ringlein's 5 Crypto Accelerators in Gaming and Business

Andrew Ringlein's 5 Crypto Accelerators in Gaming and Business

This standalone episode is a conversation with my friend Andrew Ringlein on the topic of how crypto is best thought of as a set of accelerators for business, with gaming being the initial flagship. We talk about Andrew's 5 principles that accelerate gaming companies adopting crypto first, and then look at how those same concepts will soon be adopted by all types of businesses. We also discuss legitimate doubts around crypto in general, and discuss why we think the concepts are more durable (and inevitable) than the technology.Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

6 Mars 20221h 5min

News & Analysis | NO. 320

News & Analysis | NO. 320

The latest in Security News, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism… Web Version: https://danielmiessler.com/podcast/news-analysis-no-320/Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

28 Feb 202218min

News & Analysis | NO. 319

News & Analysis | NO. 319

The latest in Security News, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism… Web Version: https://danielmiessler.com/podcast/news-analysis-no-319/Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

22 Feb 20228min

News & Analysis | NO. 318

News & Analysis | NO. 318

The latest in Security News, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism… Web Version: https://danielmiessler.com/podcast/news-analysis-no-318/Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

14 Feb 202211min

News & Analysis | NO. 317

News & Analysis | NO. 317

The latest in Security News, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism… Web Version: https://danielmiessler.com/podcast/news-analysis-no-317/Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

7 Feb 202214min

News & Analysis | NO. 316

News & Analysis | NO. 316

The latest in Security News, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism… Web Version: https://danielmiessler.com/podcast/news-analysis-no-316/Become a Member: https://danielmiessler.com/upgradeSee omnystudio.com/listener for privacy information.

31 Jan 202212min

Populärt inom Teknik

uppgang-och-fall
market-makers
elbilsveckan
rss-badfluence
rss-racevecka
rss-laddstationen-med-elbilen-i-sverige
natets-morka-sida
rss-technokratin
skogsforum-podcast
rss-elektrikerpodden
hej-bruksbil
rss-uppgang-och-fall
bilar-med-sladd
garagehang
developers-mer-an-bara-kod
solcellskollens-podcast
rss-digitala-influencer-podden
rss-veckans-ai
har-vi-akt-till-mars-an
rss-snacka-om-ai