Using the Smartest AI to Rate Other AI

Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model does on a task relative to humans. This system uses your smartest AI model to evaluate the performance of other AIs—by scoring them across a range of tasks and comparing them to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale—from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher, while weaker ones rank lower—just as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Denne episoden er hentet fra en åpen RSS-feed og er ikke publisert av Podme. Den kan derfor inneholde annonser.

Episoder(541)

Unsupervised Learning: No 73

Unsupervised Learning: No 73

Word 0-day, BrickerBot, iOS GIF, Russian arrested, Tizen, OilRig, APT10 MSPs, Dallas sirens, ATM drilling, Watson golf, Uber Italy, AI memory, links, projects, and more…Become a Member: https://daniel...

10 Apr 20171h 16min

Unsupervised Learning: No. 72

Unsupervised Learning: No. 72

Apple fixed tons of bugs, hacking smart TVs over DVB-T, gift card bots, handgun AIs, Uber manipulations, AI vs. jobs, how to read more, cloud secret management, OPSEC and phishing, links, projects, an...

3 Apr 20171h 3min

Unsupervised Learning: No. 71

Unsupervised Learning: No. 71

Half of Android devices haven't been patched in over a year, Tavisclosure, NEST camera flaws, senate vs. privacy, electronics ban, bad Let's Encrypt certs, Moodle SQLi, infosec venture capital drying ...

26 Mar 201742min

Unsupervised Learning: No. 70

Unsupervised Learning: No. 70

Russians at it again, Microsoft and Adobe updates, PoS breaches, US-CERT throws TLS shade, epilepsy tweet stalking, Tesla's billion, lip-reading AI, autonomous BMWs, Fiber Lasers, taxing robots, Green...

20 Mar 201724min

Unsupervised Learning: No. 69

Unsupervised Learning: No. 69

The Vault7 CIA dump, Russian shenanigans, Dahua, Verifone, mandatory genetic testing, Wordpress, atomic storage, Google Kaggles, presenting at HouSecCon, fasting research, data wars, chaos, voice inte...

13 Mar 201727min

Unsupervised Learning: No. 68

Unsupervised Learning: No. 68

Amazon's S3 outage, Uber greyballing, fooling AI, DNS RATs, automating human jobs, suicide and ML, post-work IQ and creativity, greatness vs. imperfection, media choice, tools, projects, and more…Beco...

6 Mar 201737min

Unsupervised Learning: No. 67

Unsupervised Learning: No. 67

CloudBleed, SHA1-1, White House Leaks, Planets, Satellites, Drones vs. Eagles, InfoSec Jobs, ExFil, IQ and Creativity in a Post-work World, Weaponized Narrative, Security Tools, Tons of Great Links, a...

27 Feb 201731min

Unsupervised Learning: No. 66

Unsupervised Learning: No. 66

My recap of RSA 2017, Google's zero-trust implementation, Trump domain hacked, robots doing your taxes, the IoT Security train analogy, the future of authentication, toolswatch best tools of 2016, and...

21 Feb 201729min

Populært innen Teknologi

lydartikler-fra-aftenposten
romkapsel
teknisk-sett
tomprat-med-gunnar-tjomlid
energi-og-klima
teknologi-og-mennesker
shifter
elektropodden
rss-heis
nasjonal-sikkerhetsmyndighet-nsm
pedagogisk-intelligens
rss-ai-forklart
smart-forklart
fornybaren
rss-for-alarmen-gar
rss-vi-leser-dommer-om-personvern
i-loopen
rss-metadama-data-management-in-the-nordics
rss-ki-praten
rss-alt-som-gar-pa-strom