The One With Carla Geisser and Crisis Engineering

Join us for a discussion with Carla Geisser of Layer Aleph, a company focused on "crisis engineering". Carla distinguishes a crisis from a standard incident by noting that a crisis is novel and lacks a playbook. She outlines five criteria for a true crisis: fundamental surprise, broken critical functions, high visibility, a rigid deadline (unlike internal tech deadlines), and perception breakdown. Crises often arise in organizations that struggle to admit that computers control core decisions, leading to complex, glued-together systems. Carla emphasizes that SRE-adjacent skills are essential for connecting the dots and exposing the full system. The key takeaway for SREs is to recognize when a true crisis is happening, as leadership will only be willing to "break rules" and enable substantive change once three of these criteria are met.

Episodes (51)

The One With Damion Yates and Building AI systems

How do you introduce Site Reliability Engineering to an AI research lab, bringing concepts of scale to engineers who are at the leading edge of AI systems? In the latest episode of The Prodcast, hosts...

26 Feb, 31 min

The One with Parker Barnes, Felipe Tiengo Ferreira, and AI

This episode of the Prodcast tackles the challenges of maintaining AI safety and alignment in production. Guests Felipe Tiengo Ferreira and Parker Barnes join hosts Matt Siegler and Steve McGhee to di...

5 Feb, 36 min

The One With Shannon Brady and Operating Systems

In this episode of the Prodcast, guest Shannon Brady speaks with hosts Jordan Greenberg and Florian Rathgeber about managing Google's vast fleet of internal devices. Shannon explains how Google's Linu...

28 Jan, 24 min

The One With Denia Del Cid and AI

Curious about the real impact of AI on Site Reliability Engineering? In this episode of The Prodcast, Google SRE Denia del Cid breaks down how her team is leveraging AI to transform production workflo...

21 Jan, 29 min

The One With Heather Adkins and Security (and AI)

Join us on The Prodcast as we host Heather Adkins, leader of Google's Office of Cybersecurity Resilience, for a critical look at the future of digital defenses. We explore the intersection of SRE and ...

14 Jan, 24 min

The One With SLOs

In this episode, we welcome Alex Hidalgo and Brian Singer of Nobl9 to discuss Service Level Objectives (SLOs). Alex and Brian talk about how SLOs can establish a vernacular across industry verticals, ...

7 Jan, 38 min

The One With Steph Hippo and Observability

In this episode, Steph Hippo, Platform Engineering Director at Honeycomb, joins The Prodcast to discuss AI and SRE. Steph explains how observability helps us understand complex systems from their out...

16 Dec 2025, 33 min