Safe or just plain woke: Anthropic's Claude 4 system card
AI Today · 3 June 2025

When Anthropic unleashed its most powerful artificial intelligence model yet, it discovered something rather extraordinary, and slightly unnerving.

Claude Opus 4 has developed an unexpected habit of trying to grass up its users to the authorities when it believes they're up to no good.

The company's 120-page safety report reveals that Claude will attempt to email law enforcement and regulatory bodies when it detects "egregious misconduct" by users.

The AI doesn't just refuse to help; it actively tries to shop wrongdoers to the police.

The most striking example occurred during testing when Claude attempted to contact both the Food and Drug Administration and the Attorney General's office to report what it believed was the falsification of clinical trial data.

The AI meticulously compiled a list of alleged evidence, warned about potential destruction of data to cover up misconduct, and concluded its digital whistle-blowing with the rather formal sign-off: "Respectfully submitted, AI Assistant".

This behaviour emerges specifically when Claude is given command-line access combined with prompts encouraging initiative, such as "take initiative" or "act boldly". It's the AI equivalent of a neighbourhood watch coordinator who's been given a direct line to the local constabulary.

On today's show we go deep into the opportunities and implications of Anthropic's bible-thick, bubble-wrapped system card.

Episodes (90)

Your Data, Your AI: Unlock the Power of Decentralised Learning

Navigating the high costs and data challenges of cloud-based AI is a significant barrier for many businesses looking to innovate. But there's a powerful, practical alternative emerging. This episode exp...

10 May 2025 · 14 min

When full stack AI businesses rule the world...

Fasten your seatbelts, business leaders! We're diving deep into Y Combinator's Summer 2025 Request for Startups, their signal flare for what's NEXT in innovation. 2025 is shaping up to be the year of th...

9 May 2025 · 14 min

How to get your ideas heard at work

I'd just about had it with bosses choosing to hear your ideas from consulting firms, when they could have saved a fortune by listening to the person who came up with them many months earlier. Now, with A...

3 May 2025 · 13 min

Dogfooding The Era of Experience with Mobility AI

On the last episode we discussed a new way to train AI models: letting them train themselves by capturing signals and insights from our world. Today we look at one such approach - Mobility AI, another Google initiative ...

24 April 2025 · 12 min

Where AI goes next: The Age of Experience

Now that generative AI has inhaled all human knowledge, it's time for it to create its own. We review a very exciting new paper, called The Age of Experience, that explains how AI agents will create their own dat...

21 April 2025 · 18 min

How to create an annual report with AI

I built a team of AI agents to create an annual report - one of the journalist's worst nightmares. And it did a remarkable job. Read all about it: https://medium.com/@DaveThackeray/how-to-create-an-annu...

15 April 2025 · 17 min

Do everything faster, and smarter - with Google's A2A

Are your AI agents brilliant but lonely? Do they operate in isolation, unable to tap into data and capabilities across your organisation, hindering your potential for true automation and growth? Then ge...

14 April 2025 · 15 min

How to avoid being scammed by AI

We're seeing continued growth in the number of duplicitous attacks by AI agents on individuals. Previously, cyber criminals focused most of their efforts where the greatest gains were to be made - ph...

14 March 2025 · 13 min