#151 – Ajeya Cotra on accidentally teaching AI models to deceive us

02:49:40 · 2023-05-12

Episode description

Imagine you are an orphaned eight-year-old whose parents left you a $1 trillion company, and no trusted adult to serve as your guide to the world. You have to hire a smart adult to run that company, guide your life the way a parent would, and administer your vast wealth. You have to hire that adult based on a work trial or interview you come up with. You don't get to see any resumes or do reference checks. And because you're so rich, tonnes of people apply for the job — for all sorts of reasons.

Today's guest, Ajeya Cotra — senior research analyst at Open Philanthropy — argues that this peculiar setup resembles the situation humanity finds itself in when training very general and very capable AI models using current deep learning methods.

Links to learn more, summary and full transcript.

As she explains, such an eight-year-old faces a challenging problem. In the candidate pool there are likely some truly nice people, who sincerely want to help and make decisions that are in your interest. But there are probably other characters too — like people who will pretend to care about you while you're monitoring them, but who intend to use the job to enrich themselves as soon as they think they can get away with it.

Like a child trying to judge adults, at some point humans will be required to judge the trustworthiness and reliability of machine learning models that are as goal-oriented as people, and that greatly outclass them in knowledge, experience, breadth, and speed. Tricky!

Can't we rely on how well models have performed at tasks during training to guide us? Ajeya worries that it won't work. The trouble is that three different sorts of models will all produce the same output during training, but could behave very differently once deployed in a setting that allows their true colours to come through. She describes three such motivational archetypes:

  • Saints — models that care about doing what we really want
  • Sycophants — models that just want us to say they've done a good job, even if they get that praise by taking actions they know we wouldn't want them to
  • Schemers — models that don't care about us or our interests at all, and who are just pleasing us so long as that serves their own agenda

And according to Ajeya, there are also ways we could end up actively selecting for motivations we don't want.

In today's interview, Ajeya and Rob discuss the above, as well as:

  • How to predict the motivations a neural network will develop through training
  • Whether AIs being trained will functionally understand that they're AIs being trained, the same way we think we understand that we're humans living on planet Earth
  • Stories of AI misalignment that Ajeya doesn't buy into
  • Analogies for AI, from octopuses to aliens to can openers
  • Why it's smarter to have separate planning AIs and doing AIs
  • The benefits of only following through on AI-generated plans that make sense to human beings
  • Which approaches for fixing alignment problems Ajeya is most excited about, and which she thinks are overrated
  • How one might demo actually scary AI failure mechanisms

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

Producer: Keiran Harris
Audio mastering: Ryan Kessler and Ben Cordell
Transcriptions: Katy Moore

Latest episodes

80,000 Hours Podcast

#202 – Venki Ramakrishnan on the cutting edge of anti-ageing science

2024-09-19 · 2h 20min
80,000 Hours Podcast

#201 – Ken Goldberg on why your robot butler isn’t here yet

2024-09-13 · 2h 1min
80,000 Hours Podcast

#200 – Ezra Karger on what superforecasters and experts think about existential risks

2024-09-04 · 2h 49min
80,000 Hours Podcast

#199 – Nathan Calvin on California’s AI bill SB 1047 and its potential to shape US AI policy

2024-08-29 · 1h 12min
80,000 Hours Podcast

#198 – Meghan Barrett on challenging our assumptions about insects

2024-08-26 · 3h 48min
80,000 Hours Podcast

#197 – Nick Joseph on whether Anthropic's AI safety policy is up to the task

2024-08-22 · 2h 29min
80,000 Hours Podcast

#196 – Jonathan Birch on the edge cases of sentience and why they matter

2024-08-15 · 2h 1min
80,000 Hours Podcast

#195 – Sella Nevo on who's trying to steal frontier AI models, and what they could do with them

2024-08-01 · 2h 8min
80,000 Hours Podcast

#194 – Vitalik Buterin on defensive acceleration and how to regulate AI when you fear government

2024-07-26 · 3h 4min
80,000 Hours Podcast

#193 – Sihao Huang on the risk that US–China AI competition leads to war

2024-07-18 · 2h 23min