Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. It uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and comparing it to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher than weaker ones, as you would expect.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
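
To make the three points above concrete, here is a minimal Python sketch of the judging loop. It is not the actual Fabric Pattern: the prompt wording, the function names (build_judge_prompt, parse_verdict, rate), and the two-line reply format are all assumptions made for illustration, and the scale lists only the four levels named in these notes. The judge is any function that takes a prompt and returns text, so you would swap in a call to your strongest model; fake_judge is a stand-in so the example runs without an API key.

```python
from typing import Callable

# Only the levels named in the show notes; a real scale may have more
# steps between them (assumption: ordered lowest to highest).
LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


def build_judge_prompt(task: str, task_input: str, output: str) -> str:
    """Assemble a rating prompt for the judge model (hypothetical wording)."""
    scale = ", ".join(LEVELS)
    return (
        "You are an expert evaluator.\n\n"
        f"Task given to the model:\n{task}\n\n"
        f"Input:\n{task_input}\n\n"
        f"Model output:\n{output}\n\n"
        f"Rate the output on this human scale (lowest to highest): {scale}.\n"
        "Reply with exactly two lines:\n"
        "LEVEL: <one level from the scale>\n"
        "TO SCORE HIGHER: <what the output would have needed>\n"
    )


def parse_verdict(reply: str) -> dict:
    """Extract the level, its rank on the scale, and the improvement note."""
    verdict = {"level": None, "rank": None, "to_score_higher": ""}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "KEY: value" line
        key, value = key.strip().upper(), value.strip()
        if key == "LEVEL":
            for rank, name in enumerate(LEVELS):
                if value.lower() == name.lower():
                    verdict["level"], verdict["rank"] = name, rank
        elif key == "TO SCORE HIGHER":
            verdict["to_score_higher"] = value
    return verdict


def rate(judge: Callable[[str], str], task: str, task_input: str, output: str) -> dict:
    """Ask the judge model to grade another model's output."""
    return parse_verdict(judge(build_judge_prompt(task, task_input, output)))


def fake_judge(prompt: str) -> str:
    """Stand-in for a call to your most capable model."""
    return "LEVEL: PhD\nTO SCORE HIGHER: Cite primary sources."


if __name__ == "__main__":
    print(rate(fake_judge, "Summarize the article", "article text", "a summary"))
```

Keeping the rank as a number makes it easy to compare models on the same task, and the "to score higher" note is the part that feeds back into improving the next attempt.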

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest, but the framework and methodology still work with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.
