Who owns this outage? Building intelligent, automated escalation chains

Who owns this outage? Building intelligent, automated escalation chains

Maxwell, a solution architect at xMatters, took a winding road to get to where he is. After a computer engineering education, he held jobs as field support engineer, product manager, SRE, and finally his current role as a solutions architect, where he serves as something of an SRE for SREs, helping them solve incident management problems with the help of xMatters.

When he moved to the SRE role, Maxwell wanted to get back to doing technical work. It was a lateral move within his company, which was migrating an on-prem solution into the cloud. It’s a journey that plenty of companies are making now: breaking an application into microservices, running processes in containers, and using Kubernetes to orchestrate the whole thing. Non-production environments would go down and waste SRE time, making it harder to address problems in the production pipeline.

At the heart of their issues was the incident response process. They had several bottlenecks that prevented them from delivering value to their customers quickly. Incidents would send emails to the relevant engineers, sometimes 20 on a single email, which made it easy for any one engineer to ignore the problem—someone else has got this. They had a bad silo problem, where escalating to the right person across groups became an issue of its own. And of course, most of this was manual. Their MTTR—mean time to resolve—was lagging.

Maxwell moved over to xMatters because they managed to solve these problems through clever automation. Their product automates the scheduling and notification process so that the right person knows about the incident as soon as possible. At the core of this process was a different MTTR—mean time to respond. Once an engineer started working to resolve a problem, it was all down to runbooks and skill. But the lag between the initial incident and that start was the real slowdown.

It’s not just the response from the first SRE on call. It’s the other escalations down the line—to data engineers, for example—that can eat away time. They’ve worked hard to make escalation configuration easy. It not only handles who's responsible for specific services and metrics, but who’s in the escalation chain from there. When the incident hits, the notifications go out through a series of configured channels; maybe it tries a chat program first, then email, then SMS.

The on-call process is often a source of dread, but automating the escalation process can take some of the sting out of it. Check out the episode to learn more.

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Jaksot(911)

Even your voice is a data problem

Even your voice is a data problem

Recorded last December at AWS re:Invent, Ryan welcomes CEO and co-founder of Deepgram, Scott Stephenson, for a conversation on advancing voice AI technology. They cover how Deepgram is improving speec...

13 Helmi 35min

The logos, ethos, and pathos of your LLMs

The logos, ethos, and pathos of your LLMs

Ryan is joined by Professor Tom Griffiths, the head of Princeton University’s AI Lab, to dive into findings from his new book The Laws of Thought, which explores the history of the philosophy, mathema...

10 Helmi 34min

AI attention span so good it shouldn’t be legal

AI attention span so good it shouldn’t be legal

We have another two-for-one special this week, with two more interviews from the floor of re:Invent. First, Ryan welcomes Pathway CEO Zuzanna Stamirowska and CCO Victor Szczerba to dive into their dev...

6 Helmi 30min

Generating text with diffusion (and ROI with LLMs)

Generating text with diffusion (and ROI with LLMs)

Two guests for the price of one! This episode has two interviews recorded at AWS re:Invent back in December. In part 1, Ryan chats with the co-founder and CEO of Inception, Stefano Ermon, about diffus...

3 Helmi 30min

Wanna see a CSS magic trick?

Wanna see a CSS magic trick?

Ryan is joined by Chris Coyier, founder of CSS Tricks and CodePen, to talk all about what the state of the art of CSS is today, including new features like variables and scroll-driven animations. They...

30 Tammi 38min

Spy vs spy at scale

Spy vs spy at scale

Ryan welcomes Anthony Vinci, former senior intelligence officer and author of The Fourth Intelligence Revolution, to explore AI’s evolving role in intelligence in places like translation and image ana...

27 Tammi 35min

AI can 10x developers...in creating tech debt

AI can 10x developers...in creating tech debt

Ryan sits down with Michael Parker, VP of Engineering at TurinTech to discuss the newest kind of tech debt—AI-generated tech debt. They dive into the uneven productivity results of AI tools, how tech ...

23 Tammi 29min

Don’t let your backend write checks your frontend can’t cache

Don’t let your backend write checks your frontend can’t cache

Ryan welcomes Prakash Chandran, CEO and co-founder of Xano, to the show to discuss the intricate relationship between frontend and backend development, the potential challenges that universal frontend...

20 Tammi 30min

Suosittua kategoriassa Liike-elämä ja talous

sijotuskasti
psykopodiaa-podcast
mimmit-sijoittaa
rss-rahapodi
rss-draivi
rss-lahtijat
oppimisen-psykologia
rss-rahamania
rss-porssipuhetta
taloudellinen-mielenrauha
rss-seuraava-potilas
rahapuhetta
rss-h-asselmoilanen
rss-paatos-podcast-suomen-kovimmat-paatoksentekijat-2
rss-paasipodi
rss-inderes
io-techin-tekniikkapodcast
pomojen-suusta
rss-viisas-raha-podi
rss-40-ajatusta-aanesta