84: Trust But Canary: Configuration Safety at Scale

Have you ever wondered how Meta makes config rollouts safe at scale? In this episode, Pascal sits down with Ishwari and Joe to discuss Meta's approach to propagating changes across services in seconds, and why that speed increases the need for strong safeguards. Tune in to learn about canarying and progressive rollouts, the health checks and monitoring signals used to catch regressions early, and how incident reviews focus on improving systems rather than blaming people. We also hear how data and early AI/ML are cutting alert noise and speeding up bisecting when something goes wrong.

Got feedback? Send it to us on Threads (https://threads.net/@metatechpod), Instagram (https://instagram.com/metatechpod) and don't forget to follow our host Pascal (https://mastodon.social/@passy, https://threads.net/@passy_). Fancy working with us? Check out https://www.metacareers.com/.

Timestamps

  • Intro 0:06

  • Introduction and Overview of Configuration Changes 2:31

  • Understanding Configurations in Distributed Systems 4:02

  • Meta's Configuration Management Systems 6:43

  • Safeguards and Incident Prevention 9:22

  • Deployment Mechanisms: Canary and Progressive Rollouts 12:06

  • Challenges in Configuration Consumption 14:39

  • Health Checks and Incident Response 17:13

  • Mitigation Strategies for Configuration Issues 19:18

  • Balancing Developer Velocity and Configuration Safety 21:09

  • Data-Driven Improvements in Incident Management 22:12

  • Leveraging AI for Change Detection 26:05

  • Challenges in Deployment and Testing 28:21

  • Reinventing Change Safety Strategies 30:24

  • War Stories: Learning from Past Incidents 32:59

  • Outro 36:10

This episode was added to Podme via an open RSS feed and is not Podme's own production. It may therefore contain advertising.

Episodes (91)

85: Reel Friends: Building Social Discovery that Scales to Billions

You've probably spotted those little circles of your friends' faces popping up on Facebook Reels. They look simple enough, but building them was a proper engineering challenge. In this episode, Pascal...

8 May 38min

83: Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps

At Meta, even seemingly simple engineering tasks—like updating an API—become monumental undertakings when you're dealing with millions of lines of code and thousands of engineers, especially if the ch...

27 Feb 47min

82: CSS at Scale with StyleX

It's not just Not Invented Here Syndrome. Some technologies like CSS simply don't scale if you're building some of the largest websites on the planet with thousands of engineers committing to the same...

8 Jan 44min

81: From Zero to Polish: Building Meta Ray-Ban Display

You've likely heard of Meta Ray-Ban Display by now — but what's it actually like to work on it? In this episode, Pascal talks to Kenan and Emanuel about the exciting features of Meta's First-Gen Displ...

12 Dec 2025 47min

80: Lowering emissions with the Open Compute Project

In this episode, Pascal talks to Dharmesh J. (DJ) and Lisa about the vision for the open, scalable future of networking hardware for AI and to break down Meta's big announcements from the 2025 Open Co...

14 Nov 2025 38min

79: Building Android apps in Meta's monorepository with Buck2

How do you keep Android build times under control when your codebase spans tens of thousands of modules and millions of lines of Kotlin? In this episode, Pascal talks with Iveta, Navid, and Joshua fro...

10 Oct 2025 37min

78: Generating 3D Worlds with AI

Creating 3D assets can be daunting, but does it have to be? Mahima and Rakesh are on a quest to democratize 3D content creation with AssetGen, a foundation model for 3D. They discuss the challenges of...

19 Sep 2025 36min