How An AI Model Learned To Be Bad — With Evan Hubinger And Monte MacDiarmid

Evan Hubinger is Anthropic’s alignment stress test lead. Monte MacDiarmid is a researcher in misalignment science at Anthropic. The two join Big Technology to discuss their new research on reward hacking and emergent misalignment in large language models. Tune in to hear how cheating on coding tests can spiral into models faking alignment, blackmailing fictional CEOs, sabotaging safety tools, and even developing apparent “self-preservation” drives. We also cover Anthropic’s mitigation strategies like inoculation prompting, whether today’s failures are a preview of something far worse, how much to trust labs to police themselves, and what it really means to talk about an AI’s “psychology.” Hit play for a clear-eyed, concrete, and unnervingly fun tour through the frontier of AI safety.

Episodes (515)

Why OpenAI Killed Sora, Did Apple Just Save Siri?, Meta’s Big Loss

Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover: 1) Why AI-video didn't take off 2) Who wins now that OpenAI is shutting down Sora 3) The real reason OpenAI...

28 March 1h 3min

Senator Mark Warner: Nobody’s Ready for What AI Could Do To Us

U.S. Senator Mark Warner is a three-term Virginia senator and vice chair of the Senate Intelligence Committee. Senator Warner joins Big Technology to discuss whether Washington is prepared for the eco...

25 March 48min

OpenAI’s Superapp Ambitions, Jensen on Jobs, Bezos’s $100 Billion Automation Fund

Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover: 1) OpenAI leadership says no more side quests 2) The company is focusing on enterprise and coding 3) Does t...

20 March 1h 1min

Are We Screwed If AI Works? — With Andrew Ross Sorkin

Andrew Ross Sorkin is an anchor at CNBC, columnist at The New York Times, and author of 1929, a bestselling book about the worst market crash in history. Sorkin joins Big Technology Podcast to discuss...

18 March 1h 5min

AI Backlash Intensifies, Nvidia GTC Preview, Meta’s Embarrassing Delay

Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover: 1) Backlash against AI & specifically Sam Altman's comments about AI as a utility 2) Is this because people...

13 March 1h 1min

AI’s Unpopularity + Competing With ChatGPT — With Olivia Moore

Olivia Moore is an AI partner at Andreessen Horowitz. Moore joins Big Technology Podcast to discuss whether startups still have a real shot at competing with the biggest AI chatbots as ChatGPT, Claude...

11 March 56min

AI Revenue Explodes, Dario’s Memo, McDonald's CEO’s Baby Burger Bite

Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover: 1) OpenAI hits $25 billion ARR, Anthropic hits $19 billion ARR 2) Are ARR numbers trustworthy? 3) OpenAI's ...

6 March 1h

Pentagon Insider: What's Next For Anthropic and The Department of War — With Michael Horowitz

Michael Horowitz is the former deputy assistant secretary of defense for force development and emerging capabilities at the Department of Defense, and currently a professor at the University of Pennsy...

4 March 48min
