Unsupervised Learning

19 Apr 2025

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. The system uses your smartest AI model to evaluate other AIs, scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
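To make the three points above concrete, here is a minimal Python sketch of the judge loop. It is not the actual Fabric Pattern: the prompt wording, the `LEVELS` list (the notes name only four of the levels, so the middle two are filler assumptions), and the helper names `build_judge_prompt`, `parse_verdict`, and `rate` are all hypothetical. The `judge` argument stands in for whatever function sends a prompt to your strongest model and returns its reply; a stub is used here so the example runs offline.

```python
from typing import Callable, Optional

# Assumed human scale, weakest to strongest. Only "uneducated", "high school",
# "PhD", and "world-class human" come from the show notes; the rest is filler.
LEVELS = ["uneducated", "high school", "bachelor's", "master's", "PhD", "world-class human"]


def build_judge_prompt(task: str, task_input: str, output: str) -> str:
    """Build the prompt handed to the judge model (hypothetical wording)."""
    return (
        "You are evaluating another AI model's work.\n"
        f"TASK: {task}\n"
        f"INPUT: {task_input}\n"
        f"OUTPUT: {output}\n\n"
        f"Rate the output on this human scale: {', '.join(LEVELS)}.\n"
        "Reply in exactly two lines:\n"
        "LEVEL: <one level from the scale>\n"
        "TO SCORE HIGHER: <what the output would have needed>\n"
    )


def parse_verdict(reply: str) -> dict:
    """Extract the level, its rank in LEVELS, and the improvement advice."""
    verdict: dict[str, Optional[object]] = {
        "level": None, "rank": None, "to_score_higher": None,
    }
    for line in reply.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().upper(), value.strip()
        if key == "LEVEL":
            for rank, name in enumerate(LEVELS):
                if name.lower() == value.lower():
                    verdict["level"], verdict["rank"] = name, rank
        elif key == "TO SCORE HIGHER":
            verdict["to_score_higher"] = value
    return verdict


def rate(task: str, task_input: str, output: str, judge: Callable[[str], str]) -> dict:
    """Ask the judge model to grade one output and return the parsed verdict."""
    return parse_verdict(judge(build_judge_prompt(task, task_input, output)))


def fake_judge(prompt: str) -> str:
    """Stub judge for offline runs; replace with a call to your strongest model."""
    return "LEVEL: high school\nTO SCORE HIGHER: Cite specific evidence from the input."


result = rate("Summarize the article", "(article text)", "It is about AI.", fake_judge)
print(result["level"], "|", result["to_score_higher"])
```

Passing the judge in as a plain function keeps the sketch independent of any vendor SDK, and the `TO SCORE HIGHER` line is the piece that turns a grade into feedback you can act on.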

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

This episode is taken from an open RSS feed and is not published by Podme. It may contain advertising.

Episodes (541)

Unsupervised Learning: No. 192

Ring has already partnered with over 400 police departments. As you know, I'm torn on this kind of tech. Neighborhood watch can be a good thing, and it can also be a bad thing. Technology tends to mag...

2 Sep 2019, 35 min

Unsupervised Learning: No. 191

Protestors in Hong Kong are physically attacking and destroying facial recognition cameras. Palo Alto says 7 out of 10 new domain registrations (NDRs) are either malicious or not safe for work, an...

26 Aug 2019, 25 min

The Difference Between Data, Information, and Intelligence

The terms intelligence, information, and data are thrown around pretty loosely in most tech circles, and this inevitably leads to people confusing and/or conflating them. What follows is a simple expl...

19 Aug 2019, 5 min

Unsupervised Learning: No. 190

There are some seriously nasty Windows RDP bugs out there. If you have RDP facing the internet, make sure you're patched. And try to get to VPN as soon as possible. A huge survey of firmware secur...

19 Aug 2019, 22 min

Unsupervised Learning: No. 189

Ring is developing two-way relationships with hundreds of police departments in the US. This allows Ring users to be alerted to crime in their area via 911 data, and police departments to pull video f...

13 Aug 2019, 8 min

Unsupervised Learning: No. 188

Marcus Hutchins got off with time-served, and people have feelings. The range basically goes from 'he did nothing wrong', to, 'he should rot in prison'. In my mind this outcome was close to perfect. R...

29 Jul 2019, 19 min

Humans Are Genebots

Unpacking the evolution-granted bliss of prep schools and elite institutions, and why they resonate so much with us.

26 Jul 2019, 7 min

Machine Learning Doesn’t Introduce Unfairness—It Reveals It

The difference between unfairness and bias in machine learning.

25 Jul 2019, 8 min