Description

It's finally time to learn what Site Reliability Engineering is all about, while Jer can't speak or type, Merkle got one (!!!), and Mr. Wunderwood is wrong.

The full show notes for this episode are available at https://www.codingblocks.net/episode181.

Survey Says

So, DevOps is a culture, but SRE is a job title?

Reviews

Thanks for the review "Amazon Customer"! (You, er, we know who you are.)

Site Reliability Engineering

It is about scaling a business process, rather than just the machinery.

Site Reliability Engineering: How Google Runs Production Systems

However, we acknowledge that smaller organizations may be wondering how they can best use the experience represented here: much like security, the earlier you care about reliability, the better.

Site Reliability Engineering: How Google Runs Production Systems

Hope is not a strategy.

Site Reliability Engineering: How Google Runs Production Systems

Chapter 1 – Introduction

The famous "SRE Book" from Google

What exactly is Site Reliability Engineering, as it has come to be defined at Google? My explanation is simple: SRE is what happens when you ask a software engineer to design an operations team.

Site Reliability Engineering: How Google Runs Production Systems

Google's Approach to this Problem?

… we want systems that are automatic, not just automated.

Site Reliability Engineering: How Google Runs Production Systems

Challenges

One could view DevOps as a generalization of several core SRE principles to a wider range of organizations, management structures, and personnel. One could equivalently view SRE as a specific implementation of DevOps with some idiosyncratic extensions.

Site Reliability Engineering: How Google Runs Production Systems

Tenets of SRE

Durable Focus on Engineering

Max Change Velocity

Monitoring

Reliability is a function of mean time to failure (MTTF) and mean time to repair (MTTR).

Site Reliability Engineering: How Google Runs Production Systems
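To make the MTTF/MTTR relationship concrete, here is a minimal sketch. It assumes the common steady-state availability formula, MTTF / (MTTF + MTTR), which the quote above does not spell out. The function name and the sample numbers are hypothetical and chosen only for illustration.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a service is up, assuming the common
    steady-state formula MTTF / (MTTF + MTTR).

    This formula is an assumption for illustration; the show notes
    only say reliability is "a function of" the two values.
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical numbers: a failure roughly every 30 days (720 hours).
baseline = availability(mttf_hours=720.0, mttr_hours=2.0)
faster_repair = availability(mttf_hours=720.0, mttr_hours=0.5)

print(f"2h repair:   {baseline:.4%}")
print(f"0.5h repair: {faster_repair:.4%}")
```

Under this formula, shortening the time to repair raises availability even when failures happen just as often, which is one reason the monitoring and emergency response topics get so much attention.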

Emergency Response

Change Management

Demand Forecasting and Capacity Planning

Provisioning

Efficiency and Performance

Resources we Like

Tip of the Week