Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs on a task relative to humans. It uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and comparing it to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher and weaker ones lower, as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
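The three ideas above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Fabric Pattern: the prompt wording, the `LEVEL:` / `TO SCORE HIGHER:` reply format, and the function names are all assumptions made for the example, and the scale lists only the four levels named above (the real pattern may define more). The judge is any callable that takes a prompt and returns text, so no specific model API is assumed.

```python
from typing import Callable

# Only the four levels named in these notes; the real pattern may define more.
HUMAN_LEVELS = ["uneducated", "high school", "PhD", "world-class human"]


def build_rating_prompt(task: str, task_input: str, output: str) -> str:
    """Assemble a judge prompt (hypothetical wording, not the pattern's own text)."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are an expert evaluator.\n"
        f"TASK: {task}\n"
        f"INPUT: {task_input}\n"
        f"OUTPUT TO RATE: {output}\n"
        f"Rate the output on this human scale: {levels}.\n"
        "Reply with 'LEVEL: <level>' on the first line, then "
        "'TO SCORE HIGHER: <what was missing>' on the second."
    )


def parse_rating(reply: str) -> tuple[str, str]:
    """Extract (level, feedback) from the judge's reply; raise on an unknown level."""
    level, feedback = "", ""
    for line in reply.splitlines():
        key, _, value = line.partition(":")
        if key.strip().upper() == "LEVEL":
            level = value.strip()
        elif key.strip().upper() == "TO SCORE HIGHER":
            feedback = value.strip()
    matches = [name for name in HUMAN_LEVELS if name.lower() == level.lower()]
    if not matches:
        raise ValueError(f"unrecognized level: {level!r}")
    return matches[0], feedback


def rate(judge: Callable[[str], str], task: str, task_input: str, output: str) -> tuple[str, str]:
    """Send the prompt to any judge callable (prompt in, reply text out)."""
    return parse_rating(judge(build_rating_prompt(task, task_input, output)))


def fake_judge(prompt: str) -> str:
    """Stand-in for the strongest model, so the sketch runs without an API key."""
    return "LEVEL: high school\nTO SCORE HIGHER: cite sources and cover edge cases"


level, feedback = rate(fake_judge, "Summarize the article", "some text", "a summary")
print(level, "|", feedback)
```

Swapping `fake_judge` for a function that calls your most capable model gives the real loop: the weaker model's output goes in, and a human-scale level plus a note on what would have scored higher comes back.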

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest—but the framework and methodology still work perfectly with current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!

Become a Member: https://danielmiessler.com/upgrade

See omnystudio.com/listener for privacy information.

Episodes (532)

News & Analysis | No. 246

Gullibility vs. Disinformation, Russia, Iran, and China Attacking US Elections, Oracle TikTok, US Revokes Chinese Visas, China vs. US Cyber, Patch Tuesday, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

14 Sep 2020, 26 min

Book Summary | Atomic Habits, by James Clear

In this episode, I review the book Atomic Habits, by James Clear. I cover: my one-sentence summary of the text; the table of contents, which is super helpful to see the structure of the argument; my capture of the main points; my takeaways, questions, and ideas that came from reading it; my final summarization; and my rating of the book and whether I recommend you read the full text.

10 Sep 2020, 16 min

News & Analysis | No. 245

Anxiety and Freedom, Microsoft Deepfake Detection, Facebook Disinformation, Replacing Huawei, India China Apps, JEDI Microsoft, A Text Scam, Cisco Jabber Flaw, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

8 Sep 2020, 25 min

News & Analysis | No. 244

Russian attempted hack of Tesla, New Zealand SE DDoS, Drone Assassinations, China Unified Social Credit System, Cisco Sabotage, Stolen Gaming Accounts, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

31 Aug 2020, 18 min

News & Analysis | No. 243

InfoSec Creator Monetization, Initiating Contact with a Mentor, The Dark Side of Bounty/Creator Life, Facebook Election Threat Scenarios, Uber CISO Arrested, Spy HR Review Goes Bad, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

24 Aug 2020, 25 min

News & Analysis | No. 242

Clearview AI ICE, NSA/FBI Fancy Bear Malware, Indian Health Card, Trump TikTok 90 Days, Startups Dying, Uber/Lyft vs. Courts, Android Earthquakes, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

17 Aug 2020, 24 min

News & Analysis | No. 241

State Department Russian Media, Clean Network Plan, Cap One Fine, NSA Tracking Warning, YouTube Account Ban, Amazon Malls, No More Pixel 4, Audio RPGs, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

10 Aug 2020, 24 min

News & Analysis | No. 240

FBI Twitter Suspects, Recorded Future China Vatican, TikTok Microsoft Sep 15th, Amazon and Shopify Thriving, Forrester Ad Spending, Samsung Out of China, Technology News, Human News, Ideas Trends & Analysis, Discovery, Recommendations, and the Weekly Aphorism…

4 Aug 2020, 16 min
