Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. The system uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and comparing it to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher and weaker ones lower, as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
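
The three ideas above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Fabric Pattern: the prompt wording, the function names (build_judge_prompt, parse_verdict, rate), and the reply format are all assumptions made for the example, and the scale lists only the four levels named in these notes. The judge argument is a hypothetical callable that sends a prompt to your strongest model and returns its text reply.

```python
from typing import Callable, Tuple

# Only the four levels named in the episode notes; the real pattern
# may define more levels in between (assumption: order is low to high).
LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


def build_judge_prompt(task: str, task_input: str, response: str) -> str:
    """Build an illustrative rating prompt (wording is hypothetical)."""
    return (
        "You are an expert evaluator of AI output.\n"
        f"TASK: {task}\n"
        f"INPUT: {task_input}\n"
        f"RESPONSE TO RATE: {response}\n"
        f"Rate the response on this human scale: {', '.join(LEVELS)}.\n"
        "Reply with exactly two lines:\n"
        "LEVEL: <one level from the scale>\n"
        "TO SCORE HIGHER: <what the response would have needed>\n"
    )


def parse_verdict(reply: str) -> Tuple[str, str]:
    """Extract the level and the improvement advice from the judge's reply."""
    level, advice = "", ""
    for line in reply.splitlines():
        # Split on the first colon only, so advice may contain colons.
        key, _, value = line.partition(":")
        key = key.strip().upper()
        if key == "LEVEL":
            level = value.strip()
        elif key == "TO SCORE HIGHER":
            advice = value.strip()
    matches = [name for name in LEVELS if name.lower() == level.lower()]
    if not matches:
        raise ValueError(f"unrecognized level: {level!r}")
    return matches[0], advice


def rate(judge: Callable[[str], str], task: str, task_input: str,
         response: str) -> Tuple[str, str]:
    """Ask the judge model to rate another model's response."""
    return parse_verdict(judge(build_judge_prompt(task, task_input, response)))
```

To try it without an API key, pass a stand-in judge such as lambda prompt: "LEVEL: PhD\nTO SCORE HIGHER: Cite sources." and swap in a real call to your most capable model later. Keeping the judge as a plain callable means the same code works with any provider.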

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

