Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. The pattern uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and comparing it to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher and weaker ones lower, as you would expect.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
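
Here is a minimal Python sketch of how the three ideas above could fit together. It is not the actual Fabric pattern: the prompt wording, the function names (`build_judge_prompt`, `parse_judgement`, `rate_response`), and the `fake_judge` stand-in are assumptions for illustration, and the scale lists only the four levels named above.

```python
from typing import Callable, Dict, Optional

# Only the four levels these notes name, weakest to strongest.
# Assumption: the real pattern may define more levels in between.
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


def build_judge_prompt(task: str, task_input: str, response: str) -> str:
    """Assemble the text sent to the stronger 'judge' model."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are evaluating another AI's response.\n"
        f"TASK:\n{task}\n\n"
        f"INPUT:\n{task_input}\n\n"
        f"RESPONSE TO RATE:\n{response}\n\n"
        f"Reply with one line 'LEVEL: <one of: {levels}>' and one line\n"
        "'TO SCORE HIGHER: <what the response would have needed>'."
    )


def parse_judgement(judge_output: str) -> Dict[str, Optional[str]]:
    """Pull the level and the improvement note out of the judge's reply."""
    result: Dict[str, Optional[str]] = {"level": None, "to_score_higher": None}
    for line in judge_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no "KEY: value" structure on this line
        key, value = key.strip().upper(), value.strip()
        if key == "LEVEL":
            # Accept the level only if it matches the known scale.
            for level in HUMAN_LEVELS:
                if value.lower() == level.lower():
                    result["level"] = level
        elif key == "TO SCORE HIGHER":
            result["to_score_higher"] = value
    return result


def rate_response(
    judge: Callable[[str], str], task: str, task_input: str, response: str
) -> Dict[str, Optional[str]]:
    """Ask the judge model to rate a weaker model's response."""
    return parse_judgement(judge(build_judge_prompt(task, task_input, response)))


def fake_judge(prompt: str) -> str:
    """Hypothetical stand-in for a call to your most capable model."""
    return "LEVEL: high school\nTO SCORE HIGHER: Cite sources and cover edge cases."
```

To try it with a real model, replace `fake_judge` with any function that takes the prompt string and returns the judge model's reply as text; the `to_score_higher` field is the feedback you would use to improve the weaker model's prompt.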

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episodes (532)

News & Analysis | No. 277

CISA, FBI, and NSA Release Five APT29 Targeted Vulnerabilities, FBI Benign Hacking, The US Sanctioned Russia and Expelled Diplomats, Google's Cookie Replacement Not Going Well, NERC Says 1/4 Customers Downloaded Solarwinds, Technology News, Human News, Content Curation & Analysis, Discovery, Recommendation, and the Aphorism of the Week…

19 Apr 2021 · 27 min

News & Analysis | No. 276

Social Media Scraping Outbreak, Microsoft AI Security Tool, FBI/CISA FortiOS Warning, Zoom Vuln at Pwn2Own, AWS Bombing, 485% Ransomware Increase, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

12 Apr 2021 · 26 min

News & Analysis | No. 275

University Accellion Breaches, 533 million Facebook Users' Data, Solarwinds Hackers Got Top DHS Emails, Github Secrets Scanning, Ubiquiti's Breach, Seoul's IoT Towers, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

5 Apr 2021 · 25 min

Interview: Amir Majidimehr, Audiophile Industry Disruptor

In this standalone episode I’m speaking with Amir Majidimehr. Amir is an audiophile, but he has a unique approach to the hobby that’s literally disrupting the industry. He’s basically introduced measurement, and what he calls Objectivism, into this very sensitive audiophile world that prides itself on everything being a matter of preference, or up to the listener. Amir calls these types the Subjectivists. So what Amir does is use his decades of experience, and his professional training, to actually test this equipment, much of which costs tens of thousands of dollars, to find out if the outrageous claims made for it have any merit. It’s truly refreshing to see in the hobby, and I’m excited to talk to him. Amir has a degree in electrical engineering, he used to run the digital media group at Microsoft in the 1980s, and he’s the founder of Audio Science Forums. And here’s our conversation…

2 Apr 2021 · 1 h 12 min

News & Analysis | No. 274

Securing the Grid, PHP hacked, Russia/China Wargames, China v. Tesla, Top 10 American Threats, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

29 Mar 2021 · 20 min

The Consumer Authentication Strength Maturity Model (CASMM)

A maturity model for seeing where a user's internet hygiene currently is, and how to improve it.

25 Mar 2021 · 13 min

News & Analysis | No. 273

US Intelligence Says Putin and Russia Tampered in 2020 Election, Finland Says APT31 Hacked Parliament, Google Releases Chrome Data Gathering Report, Ulysses Tracks Cars Worldwide, Twitter Steganography, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

22 Mar 2021 · 21 min

News & Analysis | No. 272

Russian/Chinese Deepfakes, Hafnium Fallout, Chinese AI and Cyber, Microsoft Flack, Patch Tuesday Updates, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

16 Mar 2021 · 22 min
