Unsupervised Learning · 19 Apr 2025

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
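
The three ideas above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Fabric Pattern: the prompt wording, the two-line reply format, and every function name here are assumptions made for the example. The scale holds only the four levels named in these notes, and the real pattern may define more. The evaluator is passed in as a plain callable (`ask_model`), so no particular vendor API is assumed.

```python
from typing import Callable, NamedTuple

# Only the four rungs named in the episode notes, lowest to highest.
# Assumption: the real pattern may use more levels than these.
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


class Rating(NamedTuple):
    level: str            # one of HUMAN_LEVELS
    rank: int             # index into HUMAN_LEVELS, higher is better
    to_score_higher: str  # evaluator's note on what was missing


def build_judge_prompt(task: str, task_input: str, output: str) -> str:
    """Assemble the text sent to the evaluator model (hypothetical wording)."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are an expert evaluator. Rate the OUTPUT for the TASK and INPUT.\n"
        f"Pick exactly one level from: {levels}.\n"
        "Reply with two lines:\n"
        "LEVEL: <level>\n"
        "TO SCORE HIGHER: <what the output would have needed>\n\n"
        f"TASK:\n{task}\n\nINPUT:\n{task_input}\n\nOUTPUT:\n{output}\n"
    )


def parse_judge_reply(reply: str) -> Rating:
    """Parse the hypothetical 'KEY: value' reply format into a Rating."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().upper()] = value.strip()
    # Match the level case-insensitively against the known scale.
    by_lower = {name.lower(): i for i, name in enumerate(HUMAN_LEVELS)}
    level = fields.get("LEVEL", "").lower()
    if level not in by_lower:
        raise ValueError(f"unrecognised level: {level!r}")
    rank = by_lower[level]
    return Rating(HUMAN_LEVELS[rank], rank, fields.get("TO SCORE HIGHER", ""))


def rate_output(
    ask_model: Callable[[str], str], task: str, task_input: str, output: str
) -> Rating:
    """Ask the strongest model to grade another model's output."""
    return parse_judge_reply(ask_model(build_judge_prompt(task, task_input, output)))
```

To try it, wrap whichever client you use for your most capable model in a function that takes a prompt string and returns the reply text, then pass that function as `ask_model`. Comparing the `rank` values of two models on the same task and input gives the stronger-versus-weaker ordering described above, and `to_score_higher` carries the feedback on what a better answer would have needed.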

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

This episode is taken from an open RSS feed and is not published by Podme. It may contain advertising.

Episodes (541)

Venture Capitalists Favor Risk-Takers: The Rise of Self-Made Billionaires and Tech Innovators

Venture capitalists aren't looking for nice founders; they want risk-takers. Nate Silver highlights that 70% of the billionaires on the 2023 Forbes 400 list are self-made, often coming from modest bac...

28 Sep 2024 · 5 min

AI Comedians by 2026? The Future of Comedy and the Turing Test for Laughter

Comedians are increasingly using AI to help write jokes and brainstorm ideas, with mixed results. I think this is similar to the Turing Test in terms of the importance of AI progress. If AI can write ...

27 Sep 2024 · 4 min

The Alarming Power of Deepfakes

Trump shared a fake image of Harris speaking at a Communist event. This one looks fairly fake, but 1) lots of people will still believe it’s real, and 2) current tech can already make more believable ...

26 Sep 2024 · 6 min

UL NO. 451: Altman Says ASI in "Thousands of Days"

A new Fabric web app called FabricUI!, Many AI Eyes, PagerAttack Analysis, a new Ripgrep, and more... Subscribe to the newsletter at: https://danielmiessler.com/subscribe Join the UL community at:http...

26 Sep 2024 · 31 min

Russia Is Paying Right Wing Influencers?

A whole bunch of right-wing influencers received millions from Russia in return for promoting pro-Russian talking points. Hilarious to me since their whole narrative is to be skeptical and discerning....

25 Sep 2024 · 7 min

This Is The Future Career For Creators - Virtual Realities, Economies, and Meaning

The more I think about it, the more I think a major career for creators going forward will be building entire realities for people to live inside of. So think post-AG/SI and post UBI, and where games ...

24 Sep 2024 · 8 min

My First Thoughts on New OpenAI Strawberry Model (OpenAI o1-preview)

Here are my first thoughts after using OpenAI's new Strawberry model for a couple of hours. Subscribe to the newsletter at: https://danielmiessler.com/subscribe Join the UL community at:https://danielm...

19 Sep 2024 · 22 min

UL NO. 450: Thoughts on o1-preview and the Path to AGI

80% Chinese Cranes, Drones vs. Abrahams, a RAG kickstart, a Canary-based Security Maturity Model, and more... Check out Wiz for a Free Cloud Security Scan: https://www.wiz.io/ul Subscribe to the newsle...

17 Sep 2024 · 24 min