How An AI Model Learned To Be Bad — With Evan Hubinger And Monte MacDiarmid

Evan Hubinger is Anthropic’s alignment stress test lead. Monte MacDiarmid is a researcher in misalignment science at Anthropic. The two join Big Technology to discuss their new research on reward hacking and emergent misalignment in large language models. Tune in to hear how cheating on coding tests can spiral into models faking alignment, blackmailing fictional CEOs, sabotaging safety tools, and even developing apparent “self-preservation” drives. We also cover Anthropic’s mitigation strategies like inoculation prompting, whether today’s failures are a preview of something far worse, how much to trust labs to police themselves, and what it really means to talk about an AI’s “psychology.” Hit play for a clear-eyed, concrete, and unnervingly fun tour through the frontier of AI safety.

Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice.

Want a discount for Big Technology on Substack + Discord? Here’s 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b

Questions? Feedback? Write to: bigtechnologypodcast@gmail.com

Wealthfront.com/bigtech. If eligible for the overall boosted 4.15% rate offered with this promo, your boosted rate is subject to change if the 3.50% base rate decreases during the 3-month promo period. The Cash Account, which is not a deposit account, is offered by Wealthfront Brokerage LLC ("Wealthfront Brokerage"), Member FINRA/SIPC, not a bank. The Annual Percentage Yield ("APY") on cash deposits as of 11/7/25 is representative, requires no minimum, and may change at any time. The APY reflects the weighted average of deposit balances at participating Program Banks, which are not allocated equally. Wealthfront Brokerage sweeps cash balances to Program Banks, where they earn the variable base APY. Instant withdrawals are subject to certain conditions and processing times may vary.

Episodes (517)

AI Revenue Explodes, Dario’s Memo, McDonald's CEO’s Baby Burger Bite

Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover: 1) OpenAI hits $25 billion ARR, Anthropic hits $19 billion ARR 2) Are ARR numbers trustworthy? 3) OpenAI's ...

6 March 1h

Pentagon Insider: What's Next For Anthropic and The Department of War — With Michael Horowitz

Michael Horowitz is the former deputy assistant secretary of defense for force development and emerging capabilities at the Department of Defense, and currently a professor at the University of Pennsy...

4 March 48min

Dario’s Choice and Anthropic’s Future, Apple’s AI Devices, Netflix Loses WBD

M.G. Siegler of Spyglass is back for our monthly tech news discussion. Siegler joins us to discuss the latest on the Pentagon’s clash with Anthropic, why OpenAI stepped in to take the deal, and what c...

2 March 1h 1min

Anthropic vs. The Pentagon, Bloodbath at Block, The Citrini Selloff

Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover: 1) The origins of Anthropic's stare-down with the Pentagon 2) Claude's use in the operation to capture Vene...

27 Feb 1h 4min

Can AI Become Conscious? — With Michael Pollan

Michael Pollan is the author of A World Appears: A Journey into Consciousness. Pollan joins Big Technology Podcast to discuss whether AI can ever become conscious and what that question reveals about ...

25 Feb 55min

OpenAI Closes in on $100 Billion, OpenClaw Acquired, AI’s Productivity Question — With Aaron Levie

Box CEO Aaron Levie joins for our weekly discussion of the latest tech news. We cover: 1) OpenAI's anticipated $100 billion fundraise 2) Does OpenAI's big forthcoming raise settle questions about its ...

20 Feb 55min

How Google DeepMind Operates & Experiments — With Lila Ibrahim and James Manyika

Lila Ibrahim is the COO of Google DeepMind. James Manyika is the senior Vice President for Research, Technology, and Society at Google. The two join Big Technology Podcast to discuss how Google's AI e...

18 Feb 50min

Is Something Big Happening?, AI Safety Apocalypse, Anthropic Raises $30 Billion

Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We're also joined by Steven Adler, ex-OpenAI safety researcher and author of Clear-Eyed AI on Substack. We cover: 1) ...

13 Feb 1h 8min
