The One With the Future of SRE and Matt Zelesko
Google SRE Prodcast, 11 June 2025

Matt Zelesko, the head of Site Reliability Engineering at Google, discusses the evolution of SRE, highlighting the shift from traditional operations to a model that balances velocity and reliability to better serve the rapid advancements in AI and ML. He emphasizes that SRE's core mission is to enable partners to move quickly while meeting reliability goals, and that the sheer scale of Google's infrastructure necessitates the SRE model for cross-system problem-solving. Zelesko envisions AI as a crucial assistant for SREs, improving incident detection, mitigation, and postmortem processes, and allowing SREs to focus on more complex engineering challenges and risk management earlier in the development cycle, while still valuing the hands-on experience of operating production infrastructure.

Episodes (51)

The One With Damion Yates and Building AI systems

How do you introduce Site Reliability Engineering to an AI research lab, bringing concepts of scale to engineers who are at the leading edge of AI systems? In the latest episode of The Prodcast, hosts...

26 February, 31 min

The One With Carla Geisser and Crisis Engineering

Join us for a discussion with Carla Geisser of Layer Aleph, a company focused on "crisis engineering". Carla distinguishes a crisis from a standard incident by noting that a crisis is novel and lacks ...

11 February, 25 min

The One with Parker Barnes, Felipe Tiengo Ferreira, and AI

This episode of the Prodcast tackles the challenges of maintaining AI safety and alignment in production. Guests Felipe Tiengo Ferreira and Parker Barnes join hosts Matt Siegler and Steve McGhee to di...

5 February, 36 min

The One With Shannon Brady and Operating Systems

In this episode of the Prodcast, guest Shannon Brady speaks with hosts Jordan Greenberg and Florian Rathgeber about managing Google's vast fleet of internal devices. Shannon explains how Google's Linu...

28 January, 24 min

The One With Denia Del Cid and AI

Curious about the real impact of AI on Site Reliability Engineering? In this episode of The Prodcast, Google SRE Denia del Cid breaks down how her team is leveraging AI to transform production workflo...

21 January, 29 min

The One With Heather Adkins and Security (and AI)

Join us on The Prodcast as we host Heather Adkins, leader of Google's Office of Cybersecurity Resilience, for a critical look at the future of digital defenses. We explore the intersection of SRE and ...

14 January, 24 min

The One With SLOs

In this episode, we welcome Alex Hidalgo and Brian Singer of Nobl9 to discuss Service Level Objectives (SLOs). Alex and Brian talk about how SLOs can establish a vernacular across industry verticals, ...

7 January, 38 min

The One With Steph Hippo and Observability

In this episode, Steph Hippo, Platform Engineering Director at Honeycomb, joins The Prodcast to discuss AI and SRE. Steph explains how observability helps us understand complex systems from their out...

16 December 2025, 33 min