Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. It uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and comparing it to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
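
The three ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Fabric Pattern: the prompt wording, the `LEVEL:` reply format, and the function names (`build_rating_prompt`, `parse_level`, `rate`) are all hypothetical, and the scale contains only the four levels named in these notes. The `judge` argument stands in for whatever call sends a prompt to your strongest model and returns its text reply.

```python
from typing import Callable, Optional

# Only the four levels named in the episode notes, lowest to highest.
# The real pattern may define more steps in between (assumption).
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


def build_rating_prompt(task: str, task_input: str, candidate_output: str) -> str:
    """Assemble a hypothetical rating prompt for the judge model."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are an expert evaluator of AI output.\n"
        f"Task given to the model:\n{task}\n\n"
        f"Input given to the model:\n{task_input}\n\n"
        f"Output to evaluate:\n{candidate_output}\n\n"
        f"Rate the output on this human scale: {levels}.\n"
        "Reply with a line 'LEVEL: <level>' and then describe what would "
        "have been required to score higher."
    )


def parse_level(reply: str) -> Optional[str]:
    """Return the level named on the first 'LEVEL:' line, or None if absent or unknown."""
    for line in reply.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("LEVEL:"):
            value = stripped.split(":", 1)[1].strip().lower()
            for level in HUMAN_LEVELS:
                if level.lower() == value:
                    return level
            return None
    return None


def rate(judge: Callable[[str], str], task: str, task_input: str,
         candidate_output: str) -> dict:
    """Ask the judge model to rate one output and return level, rank, and feedback."""
    reply = judge(build_rating_prompt(task, task_input, candidate_output))
    level = parse_level(reply)
    rank = HUMAN_LEVELS.index(level) if level is not None else None
    return {"level": level, "rank": rank, "feedback": reply}
```

To compare two models, you would call `rate` once per model output with the same task and input, then compare the `rank` values. The `feedback` field keeps the judge's full reply, including its notes on what a higher score would have required.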

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

