Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Tämä jakso on lisätty Podme-palveluun avoimen RSS-syötteen kautta eikä se ole Podmen omaa tuotantoa. Siksi jakso saattaa sisältää mainontaa.

Jaksot(541)

A Political Discussion with Jeremiah Grossman

A Political Discussion with Jeremiah Grossman

Today's standalone episode of Unsupervised Learning is a political conversation with Jeremiah Grossman, who many of you will know as the founder of Whitehat Security, current CEO of BitDiscovery, Juji...

14 Huhti 20191h 45min

Unsupervised Learning: No. 173

Unsupervised Learning: No. 173

Amazon has many thousands of people doing quality control on Alexa, meaning that they're listening to incoming audio captured on Echo devices. This shouldn't be surprising. The question is how they're...

14 Huhti 201924min

Unsupervised Learning: No. 171

Unsupervised Learning: No. 171

Mastercard is looking to create a Digital ID service that can bind your digital presence to your mobile device, which will be able to verify you to various services. Palantir has won an $800 million c...

1 Huhti 201919min

Unsupervised Learning: No. 169

Unsupervised Learning: No. 169

Multiple governments have now blacklisted Huawei, which Huawei seems very confused by. The best explanation I've heard so far about why this move makes sense for western countries came from Rob Joyce ...

18 Maalis 201918min

Unsupervised Learning: No. 167

Unsupervised Learning: No. 167

This is a description of cyberwar that sounds quite realistic to me, and it's based around the thousand-cuts idea. Ring Doorbells have a vulnerability that allows one to capture clear-text videos and ...

3 Maalis 201934min

Unsupervised Learning: No. 165

Unsupervised Learning: No. 165

OpenAI text spoofing, Twitter DMs, Chinese tracking database, Ponemon Cyber Risk Score, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…Become...

21 Helmi 201923min

Unsupervised Learning: No. 163

Unsupervised Learning: No. 163

My takeaways from ENIGMA 2019—one of my two favorite conferences in the world. The US has charged Huawei with stealing trade secrets, money laundering, and fraud. This escalates the already tense situ...

4 Helmi 201916min

An Overview of the OWASP IoT Top 10 for 2018

An Overview of the OWASP IoT Top 10 for 2018

We just released the 2018 version of the OWASP Internet of Things Top 10, and in this episode I talk you through the list and give the philosophy, methodology, and next steps for the project.Become a ...

7 Tammi 201914min