Description

We learn how to embrace risk as we continue learning about Site Reliability Engineering, while Johnny Underwood talks too much, Joe shares a (scary) journey through his mind, and Michael, Reader of Names, ends the show on a dark note.

The full show notes for this episode are available at https://www.codingblocks.net/episode182.

Sponsors

Survey Says

How do we feel about DevOps?

Reviews

Thanks for the help, Richard Hopkins and JR! Want to help out the show? Leave us a review!

News

Chapter 3: Embracing Risk

Managing Risk

The famous "SRE Book" from Google

Measuring Service Risk

Risk Tolerance of Services

Identifying the Risk Tolerance of Consumer Services

Factors in assessing the risk tolerance of a service

Target level of availability
Types of failures
Cost
Other service metrics

Identifying the Risk Tolerance of Infrastructure Services

Target level of availability

Types of failures

Cost

… Google SRE's unofficial motto is "Hope is not a strategy".

Site Reliability Engineering: How Google Runs Production Systems

Motivation for Error Budgets

Anatomy of an Incident: Google's Approach to Incident Management for Production Services

Forming Your Error Budget

Benefits

Resources we Like

Tip of the Week