Using the Smartest AI to Rate Other AI

In this episode, I walk through a Fabric Pattern that assesses how well a given model performs a task relative to humans. The system uses your smartest AI model to evaluate other AIs by scoring their output across a range of tasks and comparing it to human intelligence levels.

I talk about:

1. Using One AI to Evaluate Another
The core idea is simple: use your most capable model (like Claude 3 Opus or GPT-4) to judge the outputs of another model (like GPT-3.5 or Haiku) against a task and input. This gives you a way to benchmark quality without manual review.

2. A Human-Centric Grading System
Models are scored on a human scale, from “uneducated” and “high school” up to “PhD” and “world-class human.” Stronger models consistently rate higher than weaker ones, as expected.

3. Custom Prompts That Push for Deeper Evaluation
The rating prompt includes instructions to emulate a 16,000+ dimensional scoring system, using expert-level heuristics and attention to nuance. The system also asks the evaluator to describe what would have been required to score higher, making this a meta-feedback loop for improving future performance.
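The three ideas above can be sketched in a few lines of Python. This is a minimal illustration, not the Fabric Pattern itself: the prompt wording, the `LEVEL:`/`IMPROVE:` reply format, and every function name here are assumptions made for the example. The scale uses the four levels named in the episode, and the "bachelor's" and "master's" steps are assumed fillers. The judge is passed in as a plain callable so you can plug in whichever strong model you use; a canned `fake_judge` stands in for it here.

```python
from dataclasses import dataclass
from typing import Callable

# Human scale, weakest to strongest. The episode names "uneducated",
# "high school", "PhD", and "world-class human"; the two middle levels
# are assumed fillers for illustration.
HUMAN_LEVELS = [
    "uneducated",
    "high school",
    "bachelor's",
    "master's",
    "PhD",
    "world-class human",
]


@dataclass
class Rating:
    level: str        # canonical entry from HUMAN_LEVELS
    improvement: str  # what the judge says a higher score would have required


def build_judge_prompt(task: str, task_input: str, output: str) -> str:
    """Assemble an illustrative judging prompt (not the Pattern's actual text)."""
    levels = ", ".join(HUMAN_LEVELS)
    return (
        "You are an expert evaluator.\n\n"
        f"TASK:\n{task}\n\n"
        f"INPUT:\n{task_input}\n\n"
        f"OUTPUT TO RATE:\n{output}\n\n"
        f"Rate the output on this human scale: {levels}.\n"
        "Reply with exactly two lines:\n"
        "LEVEL: <one level from the scale>\n"
        "IMPROVE: <what would have been required to score higher>\n"
    )


def parse_rating(reply: str) -> Rating:
    """Pull the level and the improvement note out of the judge's reply."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().upper()] = value.strip()
    canonical = {name.lower(): name for name in HUMAN_LEVELS}
    level = canonical.get(fields.get("LEVEL", "").lower())
    if level is None:
        raise ValueError(f"unrecognized level in reply: {reply!r}")
    return Rating(level=level, improvement=fields.get("IMPROVE", ""))


def rate_output(judge: Callable[[str], str], task: str,
                task_input: str, output: str) -> Rating:
    """Ask the judge model to rate another model's output."""
    return parse_rating(judge(build_judge_prompt(task, task_input, output)))


def level_rank(rating: Rating) -> int:
    """Position on the scale, so two ratings can be compared numerically."""
    return HUMAN_LEVELS.index(rating.level)


def fake_judge(prompt: str) -> str:
    """Stand-in for a call to your strongest model; returns a canned reply."""
    return "LEVEL: PhD\nIMPROVE: Cite primary sources and quantify each claim."


if __name__ == "__main__":
    rating = rate_output(fake_judge, "Summarize the article", "article text", "a summary")
    print(rating.level, level_rank(rating))
    print(rating.improvement)
```

Swapping `fake_judge` for a function that sends the prompt to a real model is the only change needed to try this against actual outputs. The `improvement` field is what carries the feedback loop: it can be fed back into the next attempt at the same task.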

Note: This episode was recorded a few months ago, so the AI models mentioned may not be the latest, but the framework and methodology still apply to current models.

Subscribe to the newsletter at:
https://danielmiessler.com/subscribe

Join the UL community at:
https://danielmiessler.com/upgrade

Follow on X:
https://x.com/danielmiessler

Follow on LinkedIn:
https://www.linkedin.com/in/danielmiessler

See you in the next one!


