The One with STPA, Jeffrey Snover, and Theo Klein
Google SRE Prodcast, 2 July 2025

This episode discusses System-Theoretic Process Analysis (STPA), a method for analyzing complex systems. Theo Klein, a Google SRE, and Jeffrey Snover, a Distinguished Engineer at Google, explain that STPA focuses on how accidents and losses arise from a loss of control rather than from component failures. STPA helps identify design flaws early, even before code is written. The discussion highlights that STPA is a human-driven process that prompts critical questions about system goals and potential losses, and that Google is adapting the pure STPA approach for commercial software development to make it more practical and efficient.

Episodes (51)

The One with Ben Good and Our Kubernetes Friends

In this special episode, hosts Steve McGhee from the Google SRE Prodcast and Kaslin Fields from the Google Kubernetes Podcast welcome Google Cloud Solutions Architect Ben Good to discuss platform engi...

30 July 2025, 32 min

The One With AI Agents, Ramón Llamas, and Swapnil Haria

Google Staff SRE Ramón Llamas and Google Software Engineer Swapnil Haria join our hosts to explore how AI agents are revolutionizing production management, from summarizing alerts and finding hidden e...

23 July 2025, 42 min

The One with Technical Program Managers and Karanveer Anand

This episode features Google Technical Program Manager (TPM) Karanveer Anand, who joins our hosts to discuss the unique role of TPMs in Site Reliability Engineering (SRE). The conversation highlights ...

16 July 2025, 27 min

The One with Startups and Adam Fletcher

In this episode, hosts Steve McGhee and Matt Siegler are joined by guest Adam Fletcher, CEO and Co-Founder of MarketStreet. They discuss the current state of web development with LLMs, managing techn...

25 June 2025, 41 min

The One with SLOs and Sal Furino

In this episode, Sal Furino, Customer Reliability Engineer at Bloomberg, discusses all things Service Level Objectives (SLOs) with hosts Steve McGhee and Matt Siegler. Together, they dig into what suc...

18 June 2025, 43 min

The One With the Future of SRE and Matt Zelesko

Matt Zelesko, the head of Site Reliability Engineering at Google, discusses the evolution of SRE, highlighting the shift from traditional operations to a model that balances velocity and reliability t...

11 June 2025, 26 min

The One with AI and Todd Underwood

In this Google Prodcast episode, Todd Underwood, a reliability expert from Anthropic with experience at Google and OpenAI, discusses the current state and future of AI in SRE. Todd and the hosts focus...

4 June 2025, 43 min