#129: Handling Failure

Failure in our software systems is inevitable - be it a failing hard drive, broken network cable, power outage, virus, or simply a bug in the code.

"Hope is not a strategy" - thus we need to think about how we handle that failure.

Why you might be interested in this episode:

  • The differences between how failures impact traditional monolithic applications and more modern distributed applications
  • To gain an understanding of terms like Graceful Degradation, Cascading Failure, the Retry software pattern, the Circuit Breaker software pattern, and Deadline Propagation
  • And advice on how to find opportunities to use them

-----

Find this episode's show notes at: https://red-folder.com/podcasts/129

Have an idea for an episode topic, or want to see what is coming up: https://red-folder.com/podcasts/roadmap

Episodes (206)

#133: DevOps Topologies - Anti-Types

In this episode I want to talk about the team structures discussed on https://web.devopstopologies.com/ - with a focus this week on the anti-types. The devopstopologies.com website is based on the wor...

25 May 2022, 11 min

#132: Inverse Conway Maneuver

In the last episode, I introduced "Conway's Law" - an observation of how our organisational structures influence our software structures. In this episode, I want to talk about how we can utilise this ...

11 May 2022, 10 min

#131: Conway's Law

In this episode, I introduce Conway's Law, which describes how our software structures reflect the structures of the organisations that create them. Why you might be interested in this episode:...

4 May 2022, 7 min

#130: To Checklist or not to Checklist

In this episode, I want to take a look at checklists: when to use them and when not to. Much of this episode is inspired by the Site Reliability Engineering practices that come out of Google. Why you might ...

27 Apr 2022, 8 min

#128: Error Budgets

In this episode, I take a look at "Error Budgets". Much of this episode is inspired by the Site Reliability Engineering practices that come out of Google. Why you might be interested in this episode: ...

6 Apr 2022, 9 min

#127: System Availability - Service Level Indicators, Objectives and Agreements

In this episode, I take a look at how to measure the availability of our systems. Much of this episode is inspired by the Site Reliability Engineering practices that come out of Google. Why you might ...

30 Mar 2022, 10 min

#126: State of DevOps 2021 - What it says about Site Reliability Engineering

The State of DevOps report provides excellent insight through rigorous analysis of its wide-reaching survey. The research provides evidence-based guidance to help focus on the capabilities that drive ...

23 Mar 2022, 11 min