Showing episodes and shows of Salim Virji

Shows

Google SRE Prodcast

The One with Startups and Adam Fletcher
In this episode, hosts Steve McGhee and Matt Siegler are joined by guest Adam Fletcher, CEO and Co-Founder of MarketStreet. They discuss the current state of web development with LLMs, managing technical debt in startups, the evolution of infrastructure and reliability engineering, the role of community in technology, and the future of software engineering with AI.
2025-06-25, 41 min

The One with SLOs and Sal Furino
In this episode, Sal Furino, Customer Reliability Engineer at Bloomberg, discusses all things Service Level Objectives (SLOs) with hosts Steve McGhee and Matt Siegler. Together, they dig into what successful SLOs look like, how they relate to users, and how SLOs provide an effective framework for joint decisions about system reliability across product, engineering, and leadership teams.
2025-06-18, 43 min

The One With the Future of SRE and Matt Zelesko
Matt Zelesko, the head of Site Reliability Engineering at Google, discusses the evolution of SRE, highlighting the shift from traditional operations to a model that balances velocity and reliability to better serve the rapid advancements in AI and ML. He emphasizes that SRE's core mission is to enable partners to move quickly while meeting reliability goals, and that the sheer scale of Google's infrastructure necessitates the SRE model for cross-system problem-solving. Zelesko envisions AI as a crucial assistant for SREs, improving incident detection, mitigation, and postmortem processes, and allowing SREs to focus on more complex engineering challenges and risk...
2025-06-11, 26 min

The One with AI and Todd Underwood
In this Google Prodcast episode, Todd Underwood, a reliability expert from Anthropic with experience at Google and OpenAI, discusses the current state and future of AI in SRE. Todd and the hosts focus on the current state and future of AI and ML in production, particularly for SREs. Topics discussed include the challenges of AI-Ops, limitations of current anomaly detection, the potential for AI in config authoring and troubleshooting, trade-offs between product velocity and reliability, the evolving role of SREs in an AI-driven world, and the optimal timing of book publication.
2025-06-04, 43 min

The One With Data Centers and Peter Pellerzi
This episode features guest Peter Pellerzi (Distinguished Engineer, Google). Peter and the hosts, Matt Siegler and Steve McGhee, focus on the physical infrastructure side of SRE, discussing topics such as the scale of Google's data centers, handling incidents like power outages, testing and preparedness strategies, the use of AI for optimizing cooling plants, and more. Peter also emphasizes the importance of community support, proactive planning, and learning from real-world testing and incidents to ensure high availability and resilience in data center operations.
2025-05-28, 36 min

The One With Security and Jessica Theodat
Jessica Theodat (Senior SRE & Security Tech Lead, Google) joins hosts Jordan Greenberg and Steve McGhee to discuss the intersection of security and site reliability engineering at Google. Jessica touches on risk management, the unique nature of security incident responses, and the shared goals between security and SRE. The crew also delves into the balance between security and SRE, acknowledging the tension and the need for collaboration between teams to achieve business goals and user trust.
2025-05-21, 19 min

We’re back with Season 4!
In this "bumpisode", hosts and producers of Prodcast (including our new co-host, Matt Siegler!) reflect on the previous season and introduce the new season's focus on upcoming trends in Site Reliability Engineering (SRE) and AI, and the friends we make along the way. They also introduce new elements we are bringing in with Season 4, such as a video format and a feedback form.
2025-04-16, 15 min

Special Episode: You Missed a Page from Telebot
This episode features Javi Beltran, a Google engineering lead who created the "Telebot" theme song. With our beloved hosts, Steve McGhee and Jordan Greenberg, Beltran discusses the origins of the song, created in 2012 for Google's paging system. The song was meant to add a touch of levity to what could be a stressful situation for engineers on-call. Beltran also unveils a new, more modern remix of “Telebot” (created in collaboration with our host, Jordan Greenberg!) which will be used as the intro theme for the podcast's next season.
2025-01-29, 16 min

Imperative vs. Declarative Change Workflows with Dominic Hutton & Niccolo' Cascarano
In this episode of the Prodcast, guests Dominic Hutton (Staff SRE, HashiCorp) and Niccolo' Cascarano (Senior Staff SRE at Google) join hosts Steve McGhee and Jordan Greenberg to dive into configurations. They discuss the differences between imperative and declarative configuration, explore the benefits and challenges of each approach, and note the need for careful consideration when choosing between the two. Ultimately, the goal is to achieve reliable and maintainable systems through effective configuration management.
2024-12-11, 36 min

Human Factors in Complex Systems with Casey Rosenthal and John Allspaw
This episode features Casey Rosenthal (Founder, Cirrusly.ai) and John Allspaw (Founder and Principal, Adaptive Capacity Labs), joining our hosts Steve McGhee and Jordan Greenberg. Together they discuss how resilience appears in Software Engineering and SRE and explore the importance of understanding the human factors involved in adapting to system failures, highlighting the need for a more qualitative and holistic approach to understanding how engineers successfully adapt to system behavior and improving overall reliability.
2024-12-04, 41 min

Embracing Complexity with Christina Schulman & Dr. Laura Maguire
In this episode of the Prodcast, we are joined by guests Christina Schulman (Staff SRE, Google) and Dr. Laura Maguire (Principal Engineer, Trace Cognitive Engineering). They emphasize the human element of SRE and the importance of fostering a culture of collaboration, learning, and resilience in managing complex systems. They touch upon topics such as the need for diverse perspectives and collaboration in incident response and the necessity of embracing complexity, and explore concepts such as aerodynamic stability, and more.
2024-11-20, 33 min

Maglev: load balancing at Google with Cody Smith and Trisha Weir
In this episode, Cody Smith (CTO and Co-founder, Camus Energy) & Trisha Weir (SRE Department Lead, Google) join hosts Steve McGhee and Jordan Greenberg to discuss their experience developing Maglev, a highly available and distributed network load balancer (NLB) that is an integral part of the cloud architecture that manages traffic that comes into a datacenter. Starting with Maglev’s humble beginnings as a skunkworks effort, Cody and Trisha recount the challenges they faced, and emphasize the importance of psychological safety, collaboration, and adaptability in SRE innovation.
2024-11-13, 32 min

Profiling data with Pat Somaru and Narayan Desai
In this episode, guests Narayan Desai (Principal SRE, Google) and Pat Somaru (Senior Production Engineer, Meta) join hosts Steve McGhee and Florian Rathgeber to discuss the challenges of observability and working with profiling data. The discussion covers intriguing topics like noise reduction, workload modeling, and the need for better tools and techniques to handle high-cardinality data.
2024-10-30, 42 min

Google Public DNS (8.8.8.8) with Wilmer van der Gaast and Andy Sykes
This episode features Google engineers Wilmer van der Gaast (Production on-tall) and Andy Sykes (Senior Staff Systems Engineer, SRE), joining hosts Steve McGhee and Jordan Greenberg to discuss the development and maintenance of Google Public DNS (8.8.8.8). They highlight the initial motivations for creating the service, technical challenges like cache poisoning and load balancing, as well as the collaborative effort between SRE and SWE teams to address these issues. They also reflect on the evolving nature of SRE and offer advice for aspiring SREs.
2024-10-23, 32 min

SRE in the Retail and Gaming Worlds with Jordan Chernev & Scott Bowers
Guests Jordan Chernev (Senior Technology Executive) and Scott Bowers (SRE, Gearbox Software), who hail from the retail and gaming industries, respectively, join hosts Steve McGhee and Jordan Greenberg to discuss the unique challenges of Site Reliability Engineering in their industries. They share the importance of aligning SLOs with user experience, strategies for handling spikes in traffic, communicating with users during outages, and investing in reliability.
2024-10-16, 33 min

Incident Response with Sarah Butt and Vrai Stacey
Sarah Butt (Principal Engineer, Centralized Incident Response, Salesforce) and Vrai Stacey (Staff Software Engineer, Google) join hosts Steve McGhee and Jordan Greenberg to dive into incident response, particularly tooling and software for reliability incidents. Tune in for an in-depth discussion on topics such as the importance of communication and collaboration during incidents, and the role of tooling in supporting incident response processes. Sarah and Vrai also share personal takeaways from incidents they have experienced.
2024-10-09, 43 min

Building Reliable Systems with Silvia Botros and Niall Murphy
Silvia Botros (SRE Architect, Twilio | Author of "High Performance MySQL, 4th edition") and Niall Murphy (Co-founder & CEO, Stanza) join hosts Steve McGhee and Jordan Greenberg to discuss cultural shifts in database engineering, rate limiting, load shedding, holistic approaches to reliability, proactive measures to build customer trust, and much more!
2024-10-02, 42 min

Creating Systems that are Safe with Liz Fong-Jones
Liz Fong-Jones (former Google SRE and current Field CTO at honeycomb.io) joins hosts Steve McGhee and Jordan Greenberg for a lively discussion centered around observability, its evolution from monitoring, and its role in modern software development. Tune in for more on the importance of observability as a spectrum, the evolving role of SREs, and advice to aspiring software engineers.
2024-09-25, 28 min

Production Problems Are For All! with Ben Treynor Sloss
Ben Treynor Sloss (VP of Engineering, Google) joins hosts Steve McGhee and Dr. Jennifer Petoff (Director of Technical Infrastructure Education, Google) to share the evolution of SRE and its impact on software development, how AI and ML significantly impact SRE practices, and the future of SRE. Ben coined the term "Site Reliability Engineering" for his team of (now) 4,000 software engineers, engaged in what were traditionally operations functions. Under Ben's leadership, Google SRE wrote two best-selling books on SRE. Since then, the rest of the SaaS industry has come to adopt the SRE name, mission, and practices.
2024-09-18, 31 min

There Remains a Huge Amount of Work to Do, with Healfdene Goguen
In this episode, Healfdene Goguen (Principal Engineer, Google) joins hosts Steve McGhee and Jordan Greenberg to discuss the vast amount of work to be done by SREs, and the fascinating challenges to tackle with clear real-world implications. It's a truly exciting time to be an SRE at Google!
2024-09-11, 26 min

SRE, a Basis of Influence, with Amy Tobey & Vladyslav Ukis
In this season of Google Prodcast, current and former SREs, both within and outside of Google, chat with hosts Steve McGhee and Jordan Greenberg to discuss software systems designed and built by SREs. For "episode zero", guests Amy Tobey (Live Services SRE, Netflix) and Dr. Vladyslav Ukis (Head of R&D, Siemens Healthineers, Author of "Establishing SRE Foundations") will set the stage for the season with a lively discussion about what Software Engineering means to Site Reliability Engineering.
2024-09-04, 41 min

Life of An SRE: Life after Google SRE, with Carla Geisser, Cody Smith, and Laura Nolan
Former Google SREs, or "Xooglers", talk with hosts MP and Steve McGhee about site reliability engineering outside of Google. What’s the difference in scale? What skills are generally valuable? And why can’t you build “SRE in a box” that jump-starts pretty much any organization? Join Carla Geisser, Cody Smith, and Laura Nolan in their lively conversation about what SRE skills and knowledge they have found useful in roles outside of Google.
2023-11-07, 46 min

Life of An SRE with Sabrina Farmer
Sabrina Farmer, VP of Engineering at Google, talks about her career journey through Site Reliability Engineering. What does management mean? What’s involved in being an effective manager? And what’s a feasibility study? Hear some great advice on how to get what you expect out of a role, wherever on the ladder it is.
2023-10-31, 51 min

Life of An SRE with Dave Reisner
Dave Reisner talks about his path to Staff SRE, from ArchLinux contributor through DevOps to software engineer. This episode emphasizes the value of strong mentoring and manager relationships, and the challenges of work-life balance.
2023-10-17, 29 min

Life of an SRE with Stephen Benjamin
Explore the role and responsibilities of an SRE manager with Stephen Benjamin.
2023-10-10, 32 min

Life of An SRE with Jessica Theodat
Explore the role and responsibilities of a Senior SRE with Jessica Theodat, as she discusses life-work balance, the value of mentoring, and being a Black woman in SRE.
2023-10-03, 25 min

Life of An SRE with Shannon Brady and Theo Klein
Explore the career paths of SREs Shannon Brady and Theo Klein as they discuss their paths to Site Reliability Engineering and finding their areas of expertise.
2023-09-26, 44 min

Life of An SRE with Mariuxi Vasconez and Julian Alarcon
In this episode, Mariuxi and Julian discuss their paths to SRE: what drew them initially to SRE, and what motivates them to continue developing skills.
2023-09-19, 34 min

Life of An SRE Episode 1: Tom Cranitch and Megan Yin
How does one become an SRE? And what’s the career like? In this episode, Tom and Megan discuss their paths to SRE.
2023-09-12, 27 min

Creating the SRE Prodcast with John Reese (JTR)
Host MP English and former Google SRE John Reese (JTR) chat about the creation of the Prodcast. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-06-07, 10 min

Postmortems with Ayelet Sachto
Ayelet Sachto offers advice on creating an actionable, transparent, and blameless postmortem culture. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-05-31, 28 min

Incident Management with Adrienne Walcer
Adrienne Walcer discusses how to approach and organize incident management efforts throughout the production lifecycle. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-05-24, 39 min

On-Call Rotations with Andrew Widdowson (APW)
Andrew Widdowson (APW) shares strategies for successful on-call rotations. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-05-17, 43 min

Automation with Pierre Palatin
Pierre Palatin dives into different automation strategies, how to build confidence in your system, and why designing the UI may be your biggest challenge. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-05-10, 1h 00

Client-Transparent Migrations with Pavan Adharapurapu
Pavan Adharapurapu details how to approach large-scale migrations while optimizing for user experience. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-05-03, 40 min

Rethinking SLOs with Narayan Desai
Narayan Desai explains why SLOs can be problematic and proposes alternative methods for monitoring complex, large-scale systems. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-04-26, 25 min

Alerting with Amelia Harrison
Amelia Harrison advises on when and how to alert, ideal coverage, and tuning. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-04-19, 26 min

Customer-Centric Monitoring with Silvia Esparrachiari
Silvia Esparrachiari talks about the challenges of monitoring and the importance of understanding your users. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-04-12, 31 min

SRE Philosophy with Jennifer Mace (Macey)
What is SRE, anyway? Jennifer Mace (Macey) gives us her definition of "site reliability engineer," discusses how to manage risk, and shares key questions to ask developers. Visit https://sre.google/prodcast for transcripts and links to further reading.
2022-04-05, 33 min