In this interview, Tim Craig and fellow Googler Gustavo Franco, a site reliability engineer (SRE), discuss the wide range of events that qualify as “incidents;” the need for a conscious, robust, and well-defined process for understanding them; the role of training; and how to get buy-in from management so you can spread incident response training throughout an organization.