Showing episodes and shows of
Salim Virji
Shows
Google SRE Prodcast
The One with Startups and Adam Fletcher
In this episode, hosts Steve McGhee and Matt Siegler are joined by guest Adam Fletcher, CEO and Co-Founder of MarketStreet. They discuss the current state of web development with LLMs, managing technical debt in startups, the evolution of infrastructure and reliability engineering, the role of community in technology, and the future of software engineering with AI.
2025-06-25
41 min
Google SRE Prodcast
The One with SLOs and Sal Furino
In this episode, Sal Furino, Customer Reliability Engineer at Bloomberg, discusses all things Service Level Objectives (SLOs) with hosts Steve McGhee and Matt Siegler. Together, they dig into what successful SLOs look like, how they relate to users, and how SLOs provide an effective framework for joint decisions about system reliability across product, engineering, and leadership teams.
2025-06-18
43 min
Google SRE Prodcast
The One With the Future of SRE and Matt Zelesko
Matt Zelesko, the head of Site Reliability Engineering at Google, discusses the evolution of SRE, highlighting the shift from traditional operations to a model that balances velocity and reliability to better serve the rapid advancements in AI and ML. He emphasizes that SRE's core mission is to enable partners to move quickly while meeting reliability goals, and that the sheer scale of Google's infrastructure necessitates the SRE model for cross-system problem-solving. Zelesko envisions AI as a crucial assistant for SREs, improving incident detection, mitigation, and postmortem processes, and allowing SREs to focus on more complex engineering challenges and risk...
2025-06-11
26 min
Google SRE Prodcast
The One with AI and Todd Underwood
In this Google Prodcast episode, Todd Underwood, a reliability expert from Anthropic with experience at Google and OpenAI, joins the hosts to discuss the current state and future of AI and ML in production, particularly for SREs. Topics discussed include the challenges of AI-Ops, limitations of current anomaly detection, the potential for AI in config authoring and troubleshooting, trade-offs between product velocity and reliability, the evolving role of SREs in an AI-driven world, and the optimal timing for book publication.
2025-06-04
43 min
Google SRE Prodcast
The One With Data Centers and Peter Pellerzi
This episode features guest Peter Pellerzi (Distinguished Engineer, Google). Peter and the hosts, Matt Siegler and Steve McGhee, focus on the physical infrastructure side of SRE, discussing topics such as the scale of Google's data centers, handling incidents like power outages, testing and preparedness strategies, the use of AI for optimizing cooling plants, and more. Peter also emphasizes the importance of community support, proactive planning, and learning from real-world testing and incidents to ensure high availability and resilience in data center operations.
2025-05-28
36 min
Google SRE Prodcast
The One With Security and Jessica Theodat
Jessica Theodat (Senior SRE & Security Tech Lead, Google) joins hosts Jordan Greenberg and Steve McGhee to discuss the intersection of security and site reliability engineering at Google. Jessica touches on risk management, the unique nature of security incident responses, and the shared goals between security and SRE. The crew also delves into the balance between security and SRE, acknowledging the tension and the need for collaboration between teams to achieve business goals and user trust.
2025-05-21
19 min
Google SRE Prodcast
We’re back with Season 4!
In this "bumpisode", hosts and producers of Prodcast (including our new co-host, Matt Siegler!) reflect on the previous season and introduce the new season's focus on upcoming trends in Site Reliability Engineering (SRE) and AI, and the friends we make along the way. They also introduce new elements we are bringing in with Season 4, such as a video format and a feedback form.
2025-04-16
15 min
Google SRE Prodcast
Special Episode: You Missed a Page from Telebot
This episode features Javi Beltran, a Google engineering lead who created the "Telebot" theme song. With our beloved hosts, Steve McGhee and Jordan Greenberg, Beltran discusses the origins of the song, created in 2012 for Google's paging system. The song was meant to add a touch of levity to what could be a stressful situation for engineers on-call. Beltran also unveils a new, more modern remix of “Telebot” (created in collaboration with our host, Jordan Greenberg!) which will be used as the intro theme for the podcast's next season.
2025-01-29
16 min
Google SRE Prodcast
Imperative vs. Declarative Change Workflows with Dominic Hutton & Niccolo' Cascarano
In this episode of the Prodcast, guests Dominic Hutton (Staff SRE, HashiCorp) and Niccolo' Cascarano (Senior Staff SRE at Google) join hosts Steve McGhee and Jordan Greenberg to dive into configurations. They discuss the differences between imperative and declarative configuration, explore the benefits and challenges of each approach, and highlight the need for careful consideration when choosing between the two. Ultimately, the goal is to achieve reliable and maintainable systems through effective configuration management.
2024-12-11
36 min
Google SRE Prodcast
Human Factors in Complex Systems with Casey Rosenthal and John Allspaw
This episode features Casey Rosenthal (Founder, Cirrusly.ai) and John Allspaw (Founder and Principal, Adaptive Capacity Labs), joining our hosts Steve McGhee and Jordan Greenberg. Together they discuss how resilience appears in Software Engineering and SRE and explore the importance of understanding the human factors involved in adapting to system failures—highlighting the need for a more qualitative and holistic approach to understanding how engineers successfully adapt to system behavior and improving overall reliability.
2024-12-04
41 min
Google SRE Prodcast
Embracing Complexity with Christina Schulman & Dr. Laura Maguire
In this episode of the Prodcast, we are joined by guests Christina Schulman (Staff SRE, Google) and Dr. Laura Maguire (Principal Engineer, Trace Cognitive Engineering). They emphasize the human element of SRE and the importance of fostering a culture of collaboration, learning, and resilience in managing complex systems. They touch upon topics such as the need for diverse perspectives and collaboration in incident response and the necessity of embracing complexity, and they explore concepts such as aerodynamic stability, and more.
2024-11-20
33 min
Google SRE Prodcast
Maglev: load balancing at Google with Cody Smith and Trisha Weir
In this episode, Cody Smith (CTO and Co-founder, Camus Energy) & Trisha Weir (SRE Department Lead, Google) join hosts Steve McGhee and Jordan Greenberg to discuss their experience developing Maglev, a highly available and distributed network load balancer (NLB) that is an integral part of the cloud architecture and manages traffic coming into a datacenter. Starting with Maglev’s humble beginnings as a skunkworks effort, Cody and Trisha recount the challenges they faced, and emphasize the importance of psychological safety, collaboration, and adaptability in SRE innovation.
2024-11-13
32 min
Google SRE Prodcast
Profiling data with Pat Somaru and Narayan Desai
In this episode, guests Narayan Desai (Principal SRE, Google) and Pat Somaru (Senior Production Engineer, Meta) join hosts Steve McGhee and Florian Rathgeber to discuss the challenges of observability and working with profiling data. The discussion covers intriguing topics like noise reduction, workload modeling, and the need for better tools and techniques to handle high-cardinality data.
2024-10-30
42 min
Google SRE Prodcast
Google Public DNS (8.8.8.8) with Wilmer van der Gaast and Andy Sykes
This episode features Google engineers Wilmer van der Gaast (Production on-tall) and Andy Sykes (Senior Staff Systems Engineer, SRE), joining hosts Steve McGhee and Jordan Greenberg to discuss the development and maintenance of Google Public DNS (8.8.8.8). They highlight the initial motivations for creating the service, technical challenges like cache poisoning and load balancing, as well as the collaborative effort between SRE and SWE teams to address these issues. They also reflect on the evolving nature of SRE and offer advice for aspiring SREs.
2024-10-23
32 min
Google SRE Prodcast
SRE in the Retail and Gaming Worlds with Jordan Chernev & Scott Bowers
Guests Jordan Chernev (Senior Technology Executive) and Scott Bowers (SRE, Gearbox Software), who hail from the retail and gaming industries, respectively, join hosts Steve McGhee and Jordan Greenberg to discuss the unique challenges of Site Reliability Engineering in their industries. They discuss the importance of aligning SLOs with user experience, strategies for handling spikes in traffic, communicating with users during outages, and investing in reliability.
2024-10-16
33 min
Google SRE Prodcast
Incident Response with Sarah Butt and Vrai Stacey
Sarah Butt (Principal Engineer, Centralized Incident Response, Salesforce) and Vrai Stacey (Staff Software Engineer, Google) join hosts Steve McGhee and Jordan Greenberg to dive into incident response—particularly tooling and software for reliability incidents. Tune in for an in-depth discussion on topics such as the importance of communication and collaboration during incidents, and the role of tooling in supporting incident response processes. Sarah and Vrai also share personal takeaways from incidents they have experienced.
2024-10-09
43 min
Google SRE Prodcast
Building Reliable Systems with Silvia Botros and Niall Murphy
Silvia Botros (SRE Architect, Twilio | Author of "High Performance MySQL, 4th edition") and Niall Murphy (Co-founder & CEO, Stanza) join hosts Steve McGhee and Jordan Greenberg to discuss cultural shifts in database engineering, rate limiting, load shedding, holistic approaches to reliability, proactive measures to build customer trust, and much more!
2024-10-02
42 min
Google SRE Prodcast
Creating Systems that are Safe with Liz Fong-Jones
Liz Fong-Jones (former Google SRE and current Field CTO at honeycomb.io) joins hosts Steve McGhee and Jordan Greenberg for a lively discussion centered around observability, its evolution from monitoring, and its role in modern software development. Tune in for more on the importance of observability as a spectrum, the evolving role of SREs, and advice to aspiring software engineers.
2024-09-25
28 min
Google SRE Prodcast
Production Problems Are For All! with Ben Treynor Sloss
Ben Treynor Sloss (VP of Engineering, Google) joins hosts Steve McGhee and Dr. Jennifer Petoff (Director of Technical Infrastructure Education, Google) to share the evolution of SRE and its impact on software development, how AI and ML significantly impact SRE practices, and the future of SRE. Ben coined the term "Site Reliability Engineering" for his team of (now) 4,000 software engineers, engaged in what were traditionally operations functions. Under Ben's leadership, Google SRE wrote two best-selling books on SRE. Since then, the rest of the SaaS industry has come to adopt the SRE name, mission, and practices.
2024-09-18
31 min
Google SRE Prodcast
There Remains a Huge Amount of Work to Do, with Healfdene Goguen
In this episode, Healfdene Goguen (Principal Engineer, Google) joins hosts Steve McGhee and Jordan Greenberg to discuss the vast amount of work to be done by SREs, and the fascinating challenges to tackle with clear real-world implications. It's a truly exciting time to be an SRE at Google!
2024-09-11
26 min
Google SRE Prodcast
SRE, a Basis of Influence, with Amy Tobey & Vladyslav Ukis
In this season of Google Prodcast, current and former SREs, both within and outside of Google, chat with hosts Steve McGhee and Jordan Greenberg to discuss software systems designed and built by SREs. For "episode zero", guests Amy Tobey (Live Services SRE, Netflix) and Dr. Vladyslav Ukis (Head of R&D, Siemens Healthineers, Author of "Establishing SRE Foundations") will set the stage for the season with a lively discussion about what Software Engineering means to Site Reliability Engineering.
2024-09-04
41 min
Google SRE Prodcast
Life of An SRE: Life after Google SRE, with Carla Geisser, Cody Smith, and Laura Nolan
Former Google SREs, or "Xooglers", talk with hosts MP and Steve McGhee about site reliability engineering outside of Google. What’s the difference in scale? What skills are generally valuable? And why can’t you build “SRE in a box” that jump-starts pretty much any organization? Join Carla Geisser, Cody Smith, and Laura Nolan in their lively conversation about what SRE skills and knowledge they have found useful in roles outside of Google.
2023-11-07
46 min
Google SRE Prodcast
Life of An SRE with Sabrina Farmer
Sabrina Farmer, VP of Engineering at Google, talks about her career journey through Site Reliability Engineering. What does management mean? What’s involved in being an effective manager? And what’s a feasibility study? Hear some great advice on how to get what you expect out of a role, wherever on the ladder it is.
2023-10-31
51 min
Google SRE Prodcast
Life of An SRE with Dave Reisner
Dave Reisner talks about his path to Staff SRE, from ArchLinux contributor through DevOps to software engineer. This episode emphasizes the value of strong mentoring and manager relationships, and the challenges of work-life balance.
2023-10-17
29 min
Google SRE Prodcast
Life of an SRE with Stephen Benjamin
Explore the role and responsibilities of an SRE manager with Stephen Benjamin.
2023-10-10
32 min
Google SRE Prodcast
Life of An SRE with Jessica Theodat
Explore the role and responsibilities of a Senior SRE with Jessica Theodat, as she discusses life-work balance, the value of mentoring, and being a Black woman in SRE.
2023-10-03
25 min
Google SRE Prodcast
Life of An SRE with Shannon Brady and Theo Klein
Explore the career paths of SREs Shannon Brady and Theo Klein as they discuss their paths to Site Reliability Engineering and finding their areas of expertise.
2023-09-26
44 min
Google SRE Prodcast
Life of An SRE with Mariuxi Vasconez and Julian Alarcon
In this episode, Mariuxi and Julian discuss their paths to SRE: what drew them initially to SRE, and what motivates them to continue developing skills.
2023-09-19
34 min
Google SRE Prodcast
Life of An SRE Episode 1: Tom Cranitch and Megan Yin
How does one become an SRE? And what’s the career like? In this episode, Tom and Megan discuss their path to SRE.
2023-09-12
27 min
Google SRE Prodcast
Creating the SRE Prodcast with John Reese (JTR)
Host MP English and former Google SRE John Reese (JTR) chat about the creation of the Prodcast. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-06-07
10 min
Google SRE Prodcast
Postmortems with Ayelet Sachto
Ayelet Sachto offers advice on creating an actionable, transparent, and blameless postmortem culture. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-05-31
28 min
Google SRE Prodcast
Incident Management with Adrienne Walcer
Adrienne Walcer discusses how to approach and organize incident management efforts throughout the production lifecycle. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-05-24
39 min
Google SRE Prodcast
On-Call Rotations with Andrew Widdowson (APW)
Andrew Widdowson (APW) shares strategies for successful on-call rotations. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-05-17
43 min
Google SRE Prodcast
Automation with Pierre Palatin
Pierre Palatin dives into different automation strategies, how to build confidence in your system, and why designing the UI may be your biggest challenge. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-05-10
60 min
Google SRE Prodcast
Client-Transparent Migrations with Pavan Adharapurapu
Pavan Adharapurapu details how to approach large-scale migrations while optimizing for user experience. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-05-03
40 min
Google SRE Prodcast
Rethinking SLOs with Narayan Desai
Narayan Desai explains why SLOs can be problematic and proposes alternative methods for monitoring complex, large-scale systems. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-04-26
25 min
Google SRE Prodcast
Alerting with Amelia Harrison
Amelia Harrison advises on when and how to alert, ideal coverage, and tuning. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-04-19
26 min
Google SRE Prodcast
Customer-Centric Monitoring with Silvia Esparrachiari
Silvia Esparrachiari talks about the challenges of monitoring and the importance of understanding your users. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-04-12
31 min
Google SRE Prodcast
SRE Philosophy with Jennifer Mace (Macey)
What is SRE, anyway? Jennifer Mace (Macey) gives us her definition of "site reliability engineer," discusses how to manage risk, and shares key questions to ask developers. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-04-05
33 min