Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. The system uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and comparing it to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the output that another model (like GPT-3.5 or Haiku) produced for a given task and input. This gives you a way to benchmark quality without manually reviewing every response.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher and weaker ones lower, just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt instructs the evaluator to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. It also asks the evaluator to describe what the response would have needed in order to score higher, which turns each rating into a meta-feedback loop for improving future performance.
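To make the three points above concrete, here is a minimal Python sketch of the judging loop. It is not the actual Fabric Pattern: the prompt wording, the reply format, and the names `build_judge_prompt`, `parse_judgment`, `rate`, and `level_rank` are all hypothetical, and the scale only contains the four levels named in these notes (the real pattern may define more in between). The judge is passed in as a plain function, so you can plug in whatever client you use to call your strongest model.

```python
from typing import Callable, Optional

# Levels named in the episode notes, lowest to highest.
# Assumption: the real pattern may include additional levels in between.
LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


def build_judge_prompt(task: str, task_input: str, candidate_output: str) -> str:
    """Assemble a rating prompt for the judge model (hypothetical wording)."""
    scale = ", ".join(LEVELS)
    return (
        "You are evaluating another AI's response.\n\n"
        f"TASK:\n{task}\n\n"
        f"INPUT:\n{task_input}\n\n"
        f"RESPONSE:\n{candidate_output}\n\n"
        f"Rate the response on this human scale: {scale}.\n"
        "Reply with exactly two lines:\n"
        "LEVEL: <one level from the scale>\n"
        "TO_SCORE_HIGHER: <what the response would have needed>\n"
    )


def parse_judgment(reply: str) -> dict:
    """Pull the level and the improvement note out of the judge's reply."""
    result: dict = {"level": None, "to_score_higher": None}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "KEY: value" line
        key, value = key.strip().upper(), value.strip()
        if key == "LEVEL":
            # Accept the level only if it matches the known scale.
            for level in LEVELS:
                if level.lower() == value.lower():
                    result["level"] = level
        elif key == "TO_SCORE_HIGHER":
            result["to_score_higher"] = value
    return result


def rate(task: str, task_input: str, candidate_output: str,
         judge: Callable[[str], str]) -> dict:
    """Ask the judge model to rate one candidate response."""
    prompt = build_judge_prompt(task, task_input, candidate_output)
    return parse_judgment(judge(prompt))


def level_rank(level: Optional[str]) -> int:
    """Position of a level on the scale; -1 if the judge gave no valid level."""
    return LEVELS.index(level) if level in LEVELS else -1


if __name__ == "__main__":
    # Stand-in for a real call to your strongest model.
    def fake_judge(prompt: str) -> str:
        return "LEVEL: high school\nTO_SCORE_HIGHER: Support claims with evidence."

    print(rate("Summarize the article", "article text", "a short summary", fake_judge))
```

Keeping the judge as a swappable function means the same loop can rate several weaker models against the same task and input, and `level_rank` lets you compare their results on the scale.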

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest, but the framework and methodology still apply to current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

