podcast
details
.com
Showing episodes and shows of Stephen Townshend
Shows
Tales from the Humpback Sky
Shame Hole (S02E03)
*New episodes every two weeks!* Owen and Ridge return to the Society of Reformed Pseudomorphs to find a scene of pandemonium. What shameful secrets will they uncover? Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music by Stephen Townshend and Harrison Sommerville. You can... ...find Harrison's music here: https://linktr.ee/ScoopMoosic ...follow Harrison...
2026-02-17
45 min
Tales from the Humpback Sky
God Nangs and Demon Cubes (S02E02)
*New episodes every two weeks!* Ridge and Owen kick up dust on their new hogs and return to Simeon Sours Sundries and Such for some much-needed supplies. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music by Stephen Townshend and Harrison Sommerville. You can... ...find Harrison's music here: https://linktr.ee/ScoopMoosic...
2026-02-03
50 min
Tales from the Humpback Sky
Flexing and Sparring (S02E01)
*New episodes every two weeks!* We kick off season two in the Screaming Velvet Tavern where Ridge and Owen debrief with Thalorg and plan their next adventure. Owen attempts another feat of great strength. Ridge returns to a demon stalker safe house from their previous life. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music...
2026-01-20
57 min
Slight Reliability
Starting a New Role (Episode 114)
This week I kick off the 2026 season with some news and we explore how to prepare for a new role. You can buy Slight Reliability merch here (Note: you cannot order the mugs outside of New Zealand): https://slightreliability.digitees.co.nz/ You can find Stephen on: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Bluesky: https://bsky.app/profile/slightreliability.bsky.social YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the...
2026-01-13
11 min
Tales from the Humpback Sky
Season 1 Intermission
In this special episode Stephen, Paul, and Tom take a moment to reflect on the first season and the adventure so far. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music by Stephen Townshend and Harrison Sommerville. You can... ...find Harrison's music here: https://linktr.ee/ScoopMoosic ...follow Harrison on Instagram: https://www.instagram.com...
2026-01-06
59 min
Tales from the Humpback Sky
Best Friends (S01E20) Season Finale!
*New episodes every two weeks!* In the season one finale Ridge and Owen attempt to escape the past and return to the city of Troika. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music by Stephen Townshend and Harrison Sommerville. You can... ...find Harrison's music here: https://linktr.ee/ScoopMoosic ...follow Harrison on...
2025-12-24
50 min
Slight Reliability
AI Use-cases for SRE with Shmuel Kliger (Episode 113)
From the day we invented computers we've been struggling to keep applications running and delivering services to the business. Is this latest wave of AI helping or hurting us? This week I'm joined by Causely founder Shmuel Kliger to dive into... 🌊 The three waves of AI hype over the decades (the history of AI) ☠️ The dangers of over-promising and under-delivering what AI can do 🧠 What is causal reasoning? 😱 Is AI replacing SREs? 🔮 AI as a way to allow humans to solve higher level problems ...and much more.
2025-12-16
31 min
Tales from the Humpback Sky
Abomination (S01E19)
*New episodes every two weeks!* In the penultimate episode of the season, Ridge and Owen face down the dark inquisitor Solvadis in the town of Macklebie's Ridge. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music by Stephen Townshend and Harrison Sommerville. You can... ...find Harrison's music here: https://linktr.ee/ScoopMoosic ...follow...
2025-12-09
44 min
Slight Reliability
Operational Intelligence with Adam Kinniburgh (Episode 112)
What is operational intelligence and how is it different from observability or BI? This week I'm joined by SquaredUp's VP of Innovation Adam Kinniburgh to answer that question and many more including... ❓ What is operational intelligence? 🙈 Relating observability back to customer, business, or revenue 😎 The value of giving stakeholders confidence 🌉 Who bridges the gap between tech and business or engineers and leadership? 🦋 Correlation VS causation and our innate desire to build connections ...and much more. You can find Adam on: LinkedIn: http...
2025-12-09
31 min
Tales from the Humpback Sky
The Convergence (S01E18)
*New episodes every two weeks!* As we close in on the end of the season, Ridge relives the moment that created them and Owen fights for his life as Kennick's machine begins to tear reality apart. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music by Stephen Townshend and Harrison Sommerville. You can... ...find...
2025-11-25
34 min
Slight Reliability
Leading Platform Teams with Dinesh Sukhija (Episode 111)
How does leading platform teams differ from leading product teams? This week I'm joined by experienced technology leader Dinesh Sukhija to answer that question and many more including... ❓ What is a platform team? ⚽ Coaching engineers to focus on outcomes ☀️ Connecting platform initiatives to business goals ✋ Identifying the limiters in your team 🎤 Spreading knowledge and avoiding single points of failure ...and much more. You can find Dinesh on: LinkedIn: https://www.linkedin.com/in/dinesh-sukhija/ You can find Stephen on: Lin...
2025-11-25
32 min
Tales from the Humpback Sky
Past Reflections (S01E17)
*New episodes every two weeks!* Ridge travels back in time to a formative moment in their past. Owen tries to keep things together back in Eska Howel's laboratory and gets a visit from the future. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music by Stephen Townshend and Harrison Sommerville. You can... ...find Harrison's...
2025-11-18
50 min
Slight Reliability
The Implications of AI on Observability with Aaron "Checo" Pacheco (Episode 109)
How could AI help human beings negotiate the mountains of telemetry we collect to get simple and fast insight? This week I'm joined by Ottermon AI CEO and founder Checo Pacheco to talk about the lifecycle of observability coverage and tooling within organisations and how AI is helping to find signals amongst the noise and reduce cognitive load for SREs. We discuss... 🎂 The need for a layer of logic on top of our telemetry data 🚲 The observability lifecycle of a DevOps team 🎶 How most orgs have many observability tools, and how we mi...
2025-11-04
38 min
Tales from the Humpback Sky
Smashing for Science (S01E16)
*New episodes every two weeks!* Decisions from the past come back to haunt our heroes as they return to Eska Howell's laboratory. Kennick then begins his mirror-time-travel experiments! Some mirrors were harmed in the making of this episode. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music by Stephen Townshend and Harrison Sommerville. You can...
2025-10-28
47 min
Slight Reliability
Chaos Engineering with Kolton Andrus (Episode 108)
What is chaos engineering and how is it being used in 2025? This week I'm joined by Gremlin CEO and founder Kolton Andrus to discuss... 🌪️ What is chaos engineering and what are its origins? 🪴 How has it evolved over the years? 🤖 The role of AI agents in SRE work 💰 Justifying the value of chaos engineering 🏃♀️➡️ How do I get started? ...and much more. You can find Kolton on: LinkedIn: https://www.linkedin.com/in/kolton-andrus-77315a2/ And you can find out more about Gremlin's...
2025-10-25
31 min
Tales from the Humpback Sky
Flotsam and Jetsam (S01E15)
*New episodes every two weeks!* Kennick reveals his master plans for the three artifacts, Ridge and Owen rest and get better, and the party head out to Eska Howel's house to smash a bunch of mirrors... as the Titan screams from far above "FREE ME!" Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music by...
2025-10-14
41 min
Slight Reliability
Team Topologies with Luke McManus (Episode 107)
What are Team Topologies? How can they be used to deliver value more simply and effectively (and in a more humane way)? This week I'm joined by Luke McManus to discuss... ⛰️ What are the four team topologies? 🏆 Can we have too much collaboration? ⌚ Team interaction models 🌏 Cognitive load 🏃♀️➡️ Value dynamics mapping ...and much more. You can find Luke on: LinkedIn: https://www.linkedin.com/in/luke-mcmanus-agile/ Check out the recently released second edition of the Team Topologies book by Matthew Skelton and Manu...
2025-10-07
23 min
Tales from the Humpback Sky
Complicated (S01E14)
*New episodes every two weeks!* Owen and Ridge infiltrate the Temple of the Opulent Oracle (with explosive consequences). Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music by Stephen Townshend and Harrison Sommerville. You can... ...find Harrison's music here: https://linktr.ee/ScoopMoosic ...follow Harrison on Instagram: https://www.instagram.com/scoopmoosic/
2025-09-30
1h 06
Slight Reliability
Contributing to Open Source with Wendy Ha (Episode 106)
How do you begin contributing to an open source project? What's it like? What do you get out of it? This week I'm joined by Wendy Ha who shares her unique story of joining the Kubernetes project and becoming a contributor. We explore... ⛰️ What it's like working on one of the biggest open source projects in the world 🏆 The benefits of contributing to open source ⌚ How much time and effort does it take? 🌏 The unique challenges of contributing from APAC (and the need for more contributors in Australia an...
2025-09-23
43 min
Tales from the Humpback Sky
Paper Mache (S01E13)
*New episodes every two weeks!* Owen and Ridge get creative in their attempts to infiltrate the Temple of the Opulent Oracle. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music by Stephen Townshend and Harrison Sommerville. You can... ...find Harrison's music here: https://linktr.ee/ScoopMoosic ...follow Harrison on Instagram: https://www.instagram...
2025-09-16
49 min
Slight Reliability
Influencing Leadership with Nora Jones (Episode 105)
As an #SRE how do you influence senior leadership to get support and priority for the things you care about? To answer this question I'm joined by Nora Jones, founder of Jeli and now Head of Pricing, Product Strategy and Growth at PagerDuty. Our conversation touches on... 🤝 How understanding needs to flow both ways (between engineers and leaders) 🎨 Reliability is as much an art as a science 📝 Using napkin math to start conversations 🧠 Understand the system (your org) before trying to change it 💬 Using micro-interactions to gradually imple...
2025-09-09
28 min
Tales from the Humpback Sky
Reforming the Reformers (S01E12)
"Are we the good guys?" - Owen and Ridge. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music by Stephen Townshend and Harrison Sommerville. You can... ...find Harrison's music here: https://linktr.ee/ScoopMoosic ...follow Harrison on Instagram: https://www.instagram.com/scoopmoosic/ You can contact us on humpbacksky@gmail.com...
2025-09-02
1h 05
Slight Reliability
Slight Reliability Podcast Retrospective (Episode 104)
This week I do a retrospective on the Slight Reliability podcast. 👂 How many people listen to it? ❤️ How do I feel about the show? 🎉 What's going well? 🪴 What could be better? ❔ What's next for the show? If you want to check out the podcast that came before Slight Reliability, you can find Performance Time archived on YouTube here: https://www.youtube.com/@performance-time You can find Stephen on: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Bluesky: https://bsky.app/profile/slightreliabili...
2025-08-26
27 min
Tales from the Humpback Sky
Innocent Fig (S01E11)
A beautiful pig with dark eyes? What could possibly go wrong? Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music is by Harrison Sommerville. You can... ...find his music here: https://linktr.ee/ScoopMoosic ...follow him on Instagram: https://www.instagram.com/scoopmoosic/ You can contact us on humpbacksky@gmail.com...
2025-08-19
53 min
Slight Reliability
Burnout with Colette Alexander (Episode 103)
Have you burned out at work? What was your experience? How did you work through it? This week I'm joined by the incredible Colette Alexander to discuss what burnout is, what it means, and we both share our personal experiences burning out at work. We cover... 🔥 What is burnout? ❓ Why does it happen? 🫀 What are the symptoms? 🥊 Fight, flight, or freeze 🧑🚒 Advice on how to recover ...and much more. Resources from the show... Why you're so angry at work (and what to do about it...
2025-08-12
38 min
Tales from the Humpback Sky
Sundries and Such (S01E10)
Owen and Ridge visit the enigmatic Simmeon Sours' Sundries and Such to do a little shopping and track down a Temporal Binder... and encounter more than they bargained for. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music is by Harrison Sommerville. You can... ...find his music here: https://linktr.ee/ScoopMoosic ...follow him on...
2025-08-05
43 min
Slight Reliability
Mobile Observability with Hanson Ho (Episode 102)
This week I'm joined by the wonderful Hanson Ho to discuss the unique challenges and opportunities in making our mobile apps observable! We cover... 📱 The mobile/backend observability divide ✍️ The challenge of distributed tracing on mobile apps 🌏 The entire device runtime environment matters for your app 👤 The quest for user-centric mobile observability ✅ Advice on how to get started with mobile observability ...and much more. You can find Hanson on: LinkedIn: https://www.linkedin.com/in/hanson-ho/ Bluesky: https://bsky.app/profile/bidetofevil.wtf
2025-07-29
31 min
Tales from the Humpback Sky
Weird Flex But Ok (S01E09)
Owen joins forces with Chud and Mandy to compete in a body building competition while Ridge looks for answers to his past in the Hall of the Infinite Ledger. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ Music is by Harrison Sommerville. You can... ...find his music here: https://linktr.ee/ScoopMoosic ...follow him on...
2025-07-22
52 min
Tales from the Humpback Sky
Ink and Ash (S01E08)
New episodes released every two weeks! After conquering the Crimson Citadel, our heroes return to the city of Troika. They make new friends, drink exotic cocktails, grow stronger, get some body art, and *finally* take a well-deserved rest. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ All music is by Harrison Sommerville. You can...
2025-07-08
1h 07
Slight Reliability
Learning with John Allspaw (Episode 100)
This week on the 100th episode I'm joined by DevOps and Resilience Engineering legend John Allspaw to talk about learning (especially from incidents). We discuss... 📒 Classroom VS situated learning 🤝 The myth of the perfect handover; ITIL as a coping strategy to try and make sense of the organic, wild, and messy 🥕 How you cannot incentivise to avoid incidents (it doesn't work that way) ❤️🩹 You can't understand how something is broken unless you know how it's supposed to work in the first place ...and much more. Resources from t...
2025-06-24
48 min
Tales from the Humpback Sky
Escape from the Crimson Citadel (S01E07)
This week on Tales from the Humpback Sky our heroes (and Thalorg) scamper to escape the Crimson Citadel as it falls from the sky. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ All music is by Harrison Sommerville. You can... ...find his music here: https://linktr.ee/ScoopMoosic ...follow him on Instagram: https://www.instagram...
2025-06-24
47 min
Tales from the Humpback Sky
Battle of the Bowels (S01E06)
Our heroes delve into the depths of the Crimson Citadel to uncover its disturbing secrets, and face their greatest challenge yet. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ All music is by Harrison Sommerville. You can... ...find his music here: https://linktr.ee/ScoopMoosic ...follow him on Instagram: https://www.instagram.com/scoopmoosic/
2025-06-10
53 min
Slight Reliability
Focusing on What Matters with Trent Hornibrook (Episode 99)
This week I'm joined by SRE leader Trent Hornibrook who shares a story about how he improved on-call early in his career, and then we explore the broader theme of focusing on the things that matter in observability, incident response, on-call, and beyond. We discuss... 🔌 Empowering engineers to implement change in your org 🧑🍼 Focusing on what matters (customer & business > technology) 👀 Not just adding more monitoring as the output of each PIR 😎 How autonomy can lead to accountability 🌳 How to influence change in an organisation ...and much more. You can...
2025-06-03
29 min
Tales from the Humpback Sky
Caught Red Handed (S01E05)
Owen and Ridge get put in a waiting room, and respond accordingly. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ All music is by Harrison Sommerville. You can... ...find his music here: https://linktr.ee/ScoopMoosic ...follow him on Instagram: https://www.instagram.com/scoopmoosic/ You can contact us on humpbacksky@gmail.com
2025-05-27
55 min
Slight Reliability
The Root Cause Fallacy with Andrew Hatch (Episode 98)
This week I'm joined by SRE leader Andrew Hatch from Cisco ThousandEyes to talk about a dirty word in the resilience community... root cause. In this excellent conversation we explore... 🌌 Is the root cause of every incident the big bang? 🦖 How the value of root cause degrades as complexity increases 🫣 That if the culture is not blameless, people will hide things 🌳 Alternative approaches to root cause analysis such as branching timelines 🙋 Getting someone without skin in the game to facilitate your blameless post-mortems ...and much more. You can fi...
2025-05-20
32 min
Tales from the Humpback Sky
The Red Kurgen (S01E04)
Owen and Ridge face down impossible odds in the grand arena, and Owen comes face to face with one of his own kind. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ All music is by Harrison Sommerville. You can... ...find his music here: https://linktr.ee/ScoopMoosic ...follow him on Instagram: https://www.instagram.com...
2025-05-14
1h 10
Slight Reliability
Synthetic Monitoring with David Dick (Episode 97)
This week I'm joined by David Dick from 2 Steps to (finally!) discuss synthetic monitoring. We cover... 🤖 What is synthetic monitoring? 🦾 What are the benefits and drawbacks to using it? ☢️ Non-web based synthetics (the tough stuff) 🍹 Combining RUM and synthetics 🫢 Does synthetics need an OTEL-like framework? ...and much more. You can find David on: LinkedIn: https://www.linkedin.com/in/david-dick/ You can find more about 2 Steps at https://2steps.io/# You can find Stephen on: LinkedIn: https://www.link...
2025-05-06
33 min
Tales from the Humpback Sky
The Crimson Citadel (S01E03)
Owen and Ridge board the imposing Crimson Citadel and experience crippling bureaucracy at its finest. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ All music is by Harrison Sommerville. You can... ...find his music here: https://linktr.ee/ScoopMoosic ...follow him on Instagram: https://www.instagram.com/scoopmoosic/ You can contact us on...
2025-04-30
1h 09
Tales from the Humpback Sky
Man in the Mirror (S01E02)
Owen and Ridge are given a quest by the mysterious Professor Eska Howell, and our two adventurers make their way to rescue his assistant, Kennick. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ All music is by Harrison Sommerville. You can... ...find his music here: https://linktr.ee/ScoopMoosic ...follow him on Instagram: https://www...
2025-04-29
54 min
Tales from the Humpback Sky
A Broken Mirror (S01E01)
Join Tom, Paul, and Stephen as they play the science fiction RPG "Troika" and embark on a multiversal adventure across time, space, and reality. Starring Tom Wardle (GM), Paul Harrop (Owen Sunderland), and Stephen Townshend (Ridge Mackelbie). You can also find episodes on YouTube: https://www.youtube.com/watch?v=DgEnFwTX3qQ Troika was created by the Melsonian Arts Council: https://www.melsonia.com/ You can find out more about Troika here: https://www.troikarpg.com/ All music is by Harrison Sommerville. You can...
2025-04-28
1h 12
Slight Reliability
Tech Leadership with Milan Brown (Episode 96)
This week I'm joined by Cin7 Engineering Director Milan Brown to unpack the challenges of technology management and leadership. We discuss... ✖️ Theory X vs Theory Y management 🗣️ Intention based leadership and communication 🏢 Conditions in an org for people to thrive 😵💫 How do you learn to manage and lead? 🫤 Managing people when you're not an expert in what they do ...and much more. Resources mentioned during the episode: Turn The Ship Around! (book): https://davidmarquet.com/turn-the-ship-around-book/ Agile Conversations (book): https://itrevolution.com/product/agile-conve...
2025-04-23
31 min
Slight Reliability
Finding Tech Work with Leon Adato (Episode 95)
This week Leon Adato and I break down the state of applying for roles in tech. We cover... 📝 What a resume or CV is and is not 🤝 Leveraging your connections rather than relying on applying cold 🪄 How most job descriptions are works of fiction 🦾 White-fonting to game AI resume assessment 🧪 Experimental ways we could recruit ...and our pitch for Kubernetes the Rock Opera (and much more). You can find Leon's job postings weekly on his website: https://www.adatosystems.com/category/joblistings/ You can find...
2025-03-29
36 min
Slight Reliability
Getting a Start in SRE with Priyam Kumar (Episode 94)
This week Priyam Kumar shares his story of moving from a massive organisation to a startup and the challenges and growth that came from that. We discuss... 🪖 War stories and examples of production incidents 🩹 The "hacks" we build to keep things running (and how maybe that's just normal) 😎 Keeping it simple... YAGNI (You Ain't Gonna Need It!) 🧯 The perils of getting stuck in reactive mode 📖 Areas of learning if you want to get into SRE ...and much much more. You can find Priyam on: Linke...
2025-03-22
31 min
Slight Reliability
SRE Leadership with Michelle Casey (Episode 93)
This week Michelle Casey shares her insights as a 'head of' engineering manager in the SRE context. This was one of my favourite conversations on the podcast so far. We cover topics such as... 🤷🏽 Why move into leadership? 👁️ Learning from other leaders 💎 What is unique about SRE leadership? 👑 Women in engineering leadership ...and we go through some feedback I got as a leader recently. Resources that Michelle mentions during the episode: The Five Dysfunctions of a Team (book): https://www.tablegroup.com/topics-and-resources/teamwork-5-dysfun...
2025-03-11
39 min
Slight Reliability
Observability Maturity with Ádám Tóth (Episode 92)
This week Adam and I get philosophical about what constitutes maturity in the field of observability. We tackle questions such as... 💸 Does your org treat observability as a cost centre or a value add? 🔥 Are you using observability reactively to solve problems? Or proactively to build better products and services? 👤 Is your observability connected to your users and business in a meaningful way? 🌐 Is monitoring the social media sentiment of your product part of observability? ...and much more. You can find Adam at: LinkedIn: ht...
2025-02-25
30 min
Slight Reliability
Head in the Clouds (Episode 91)
In this episode I explore the challenges of achieving unified observability when integrating with SaaS products and services. I cover: 🌊 The new wave of mega-complex SaaS ⚗️ Challenges integrating SaaS with our observability pipelines 👩🦯 How the lack of SaaS autonomy limits the effectiveness of OpenTelemetry 💰 Paying twice to ingest, store, and search telemetry 📈 Monitoring and predicting SaaS observability costs ...and much more. Shout out to Mark Chiavaroli (and apologies for mispronouncing your surname multiple times), Damian Sharrock, and Reece Hewitt for bouncing ideas on this topic. The 'Is...
2025-01-21
15 min
Slight Reliability
Non-Prod Reliability Engineering + 2024 Wrap (Episode 90)
This week I check in and give an update on work, life, and my attempts at bringing to life SRE practices in the world of non-production environment management. You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre...
2024-12-10
18 min
Slight Reliability
Slight Reliability Episode 89 - Blameless Post-mortems with Karanveer Anand
This week I'm joined by Karanveer Anand, SRE Technical Program Manager at Google, to discuss blameless post-mortems. We cover: 🦅 The recent Crowdstrike outage and their public post-mortem 🚑 When do we do a blameless post-mortem? 😕 How do we do a blameless post-mortem? ✅ How do we make sure action items are followed through? 📰 The power of learning from post-mortems created by other teams and orgs ...and much more. You can find Karanveer on LinkedIn: https://www.linkedin.com/in/karanveer/ You can find Crowdstrike's preliminary po...
2024-09-03
26 min
Slight Reliability
Slight Reliability Episode 88 - OpenTelemetry Revisited with Zach Michel
This week Zach Michel from https://middleware.io/ and I discuss the state of OpenTelemetry and what it means to adopt it. We cover: 🌩️ Achieving observability in a SaaS world 🥫 Context propagation - the magic sauce of OTEL 🚪 The telemetry gateway concept and leveraging the OTEL collector 🪵 The state of OpenTelemetry logging 🫂 Making use of the OpenTelemetry community ...and much more. You can find Zach on LinkedIn: https://www.linkedin.com/in/zamichel/ You can find the official Slight Reliability podcast website at: https://sligh...
2024-08-27
26 min
Slight Reliability
Slight Reliability Episode 87 - Measuring the value of SRE with Artem Yakimenko
In Episode 80 Niall Murphy talked about the need for SREs to be better at articulating the value of our work. In this episode I'm joined by ex-Googler and Engineering Director (SRE) at Culture Amp Artem Yakimenko to talk about how we might achieve this. We discuss both quantifiable and qualitative approaches including leveraging the untapped data in support tickets, customer sentiment and rankings, the relationship between finance and performance, the link between user design and performance, and so much more. Books mentioned in the episode: 100 Things Every Designer Needs to...
2024-07-24
35 min
Slight Reliability
Slight Reliability Episode 86 - Evolving SLOs with Dom Finn
In the world of SRE we constantly talk about defining SLOs, but what about evolving them over time? This week I chat with SRE Tech Lead Dom Finn about just that. We cover the relationship between reliability and user analytics, latency classes as a way to speak SLOs with business stakeholders, the role of NFRs and how the thresholds differ from SLOs, and much more. Books mentioned in the episode: The Beginning of Infinity: Explanations That Transform the World by David Deutsch https://www.amazon.com.au/Beginning-Infinity-Explanations-Transform-World...
2024-06-08
25 min
Slight Reliability
Slight Reliability Episode 85 - Feeling SaaSsy
This week I talk about the impact of SaaS-first technology strategies on the work of an SRE. I pose questions about observability, ownership, on-call, and how much control we have over reliability. You can find the Bleeding Tech blog on Medium: https://medium.com/@stownshend You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the...
2024-05-02
11 min
Slight Reliability
Slight Reliability Episode 84 - Clinical Troubleshooting with Dan Slimmon
This week I chat with Dan Slimmon about applying the approach doctors use to treat patient symptoms to incident response. You can find Dan's blog at https://blog.danslimmon.com/ or connect with him on LinkedIn here: https://www.linkedin.com/in/danslimmon/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram...
2024-03-30
27 min
Slight Reliability
Slight Reliability Episode 83 - An Unfulfilled Promise with Itiel Shwartz
This week I hear about all things Kubernetes from Komodor CTO and co-founder Itiel Shwartz. We chat about the promise that was made when Kubernetes first entered the industry, the challenge of getting developers engaged and capable of working in Kubernetes, my hate/hate relationship with Helm but its important contribution to the Kubernetes project, Kubernetes observability, and so much more. You can find the Kubernetes for Humans podcast here: https://komodor.com/blog/the-kubernetes-for-humans-podcast/ Or find out more about Komodor here: https://komodor.com/ Or find Itiel...
2024-03-05
30 min
Slight Reliability
Slight Reliability Episode 82 - CI/CD with Amin Astaneh
This week I sit down and have a discussion with Amin Astaneh (from Certo Modo) about CI/CD. We cover the power of the standard change as a way to navigate ITIL while still implementing DevOps practices, what to monitor to make your CI/CD observable, single piece flow, testing in production, and so much more. You can find Amin on his company website https://certomodo.io, LinkedIn: https://www.linkedin.com/in/aminastaneh/ and Twitter: https://twitter.com/aastaneh You can find the official Slight Reliability podcast website at...
2024-02-13
25 min
Slight Reliability
Slight Reliability Episode 81 - Incident Management in Non-Prod Environments
"Environment issues are just incidents that happened to occur in a non-production environment"... so why do we treat them so differently? In this first episode of the 2024 season I reflect on how we handle incidents in non-prod environments. (Note: I had a few issues with noise suppression in OBS Studio cutting off the start of some words; I will sort it for the next episode.) You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https...
2024-02-06
10 min
Slight Reliability
Slight Reliability Episode 80 - What's Been Bugging Niall Murphy
This week I speak with Niall Murphy, co-author of the original SRE book and the SRE workbook, and renowned speaker. We chat about the state of SRE in the current macro-economic climate and how we're not yet doing a very good job at articulating the value of SRE to leaders, the relationship that velocity and reliability have, the value of new features versus reliability improvements, and *much* more. You can find Niall at: LinkedIn: https://www.linkedin.com/in/niallm/ X: https://twitter.com/niallm Website: https...
2023-11-22
36 min
Slight Reliability
Slight Reliability Episode 76 - Sampling Distributed Traces with Paige Cruz
Paige Cruz (from Chronosphere) is back. This week we discuss sampling. What is sampling? Why do it? What kinds of sampling are there? You can check out Chronosphere's cloud native observability platform here: https://chronosphere.io/ You can find Paige on: LinkedIn: https://www.linkedin.com/in/paigerduty/ X: https://twitter.com/paigerduty You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https...
2023-11-21
45 min
Slight Reliability
Slight Reliability Episode 79 - Incident Story Time with Valeska Victoria
This week Valeska Victoria returns to share some of her experiences working as an SRE at eBay. We look at the cascading effect of production issues in complex integrated environments (how there's often no single root cause), developer literacy of how infrastructure works, the importance of ownership and accountability of reliability, and much more. You can find Valeska on: LinkedIn: https://www.linkedin.com/in/valeska-victoria/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at:
2023-11-20
37 min
Slight Reliability
Slight Reliability Episode 78 - Developer Experience with Ankit Jain
This week I chat with Ankit Jain from aviator.co about developer experience. We define developer experience and developer productivity, and how this applies to SRE. We discuss the growing expectation on developers and how this leads to frustration and burnout. We also explore how to measure developer experience and how to start working to make improvements. You can check out Aviator's developer experience platform here: https://www.aviator.co/ You can find Ankit on: LinkedIn: https://www.linkedin.com/in/ankitjaindce/ You can...
2023-11-17
32 min
Slight Reliability
Slight Reliability Episode 77 - SRE to DevRel with Liz Fong-Jones
This week I had the privilege of interviewing Liz Fong-Jones from honeycomb.io about DevRel, Developer Advocacy, and how that applies to SRE. We discuss the difference between Developer Relations (DevRel) and Developer Advocacy, how Liz got into advocacy, how DevRel helps companies and the community, and some tips on how to get traction with SRE practices in your organisation. You can check out Honeycomb's observability platform here: https://www.honeycomb.io/ You can find Liz on: LinkedIn: https://www.linkedin.com/in/efong/ Website...
2023-11-15
31 min
Slight Reliability
Slight Reliability Episode 75 - Enterprise SRE with Steve McGhee
This week I had the honour of chatting with Steve McGhee (former Google SRE, current Google Reliability Advocate, and co-author of Enterprise Roadmap to SRE). We discuss the evolution of SRE from where it began at Google and how it is being adopted by enterprises around the world now (and why this is happening). We talk about getting leadership support and how we get reliability taken seriously, the lies we tell ourselves to justify incidents and issues, leveraging transformation projects to bring SRE to life, how SLOs can act as the fulcrum...
2023-11-14
39 min
Slight Reliability
Slight Reliability Episode 74 - The Hidden Side of Vendor Lock-In
This week on Slight Reliability Stephen discusses observability vendor lock-in. What is it? What does OpenTelemetry do to help? What areas are yet to be solved? You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre
2023-10-31
08 min
Slight Reliability
Slight Reliability Episode 73 - Enterprise SLOs with Brian Singer
This week we sit down and talk about SLOs with CPO and co-founder of Nobl9 Brian Singer. We talk about the importance of reviewing operational effectiveness, getting buy-in from leadership, using SLOs to reduce noise, how to implement SLOs within different cultures and structures, the parallels between security and reliability... and much more. You can check out Nobl9's reliability and SLO platform here: https://www.nobl9.com/ You can find Brian on LinkedIn: https://www.linkedin.com/in/briantsinger/ You can find the official...
2023-10-24
32 min
Slight Reliability
Slight Reliability Episode 72 - Rapid Incident Response with Valeska Victoria
This week Stephen chats with Valeska Victoria about her time working as an SRE at eBay. Valeska shares her data-driven approach to SRE, having a voice as a less experienced engineer, handling incidents under high pressure, leveraging large language models to rapidly find the information you need during an incident, and much more. You can check out PromptOps here: https://www.promptops.com/ You can find Valeska on LinkedIn: https://www.linkedin.com/in/valeska-victoria/ You can find the official Slight Reliability podcast website at: h...
2023-10-17
42 min
Slight Reliability
Slight Reliability Episode 71 - Implementing SRE with Dr. Vlad Ukis
This week Stephen chats with Dr. Vlad Ukis about his journey discovering and then implementing SRE practices at Siemens Healthineers (which led to him writing a book). They discuss how the evolution of infrastructure necessitates a shift in how we operate, the power of selling SRE practices, the SRE infrastructure used to build SLOs and reliability capabilities, how he implemented SLOs, and much more. You can find Vlad's book "Establishing SRE Foundations" here: https://www.amazon.com/Establishing-Foundations-Step-Step-Organizations/dp/0137424604 You can find Vlad on LinkedIn: https://www.linkedin...
2023-10-10
29 min
Slight Reliability
Slight Reliability Episode 70 - Meta SRE with Amin Astaneh
Amin Astaneh (from Certo Modo) is back to discuss his experience working as a production engineer (SRE equivalent) at Meta. Stephen and Amin discuss what it's like interviewing for big tech, "you build it, you own it", different SRE engagement models, SRE at different sizes of organisation, socialising your SRE success as a way to get traction, and so much more. You can find Amin on his company website https://certomodo.io, LinkedIn: https://www.linkedin.com/in/aminastaneh/ and Twitter: https://twitter.com/aastaneh The books Amin...
2023-10-03
42 min
Slight Reliability
Slight Reliability Episode 69 - Developer to SRE with Praveen Kasam
This week Stephen talks to Praveen Kasam from Diconium Digital Solutions about how he led SRE transformations. Praveen shares his experience transitioning from development to SRE and how leveraging automation and bringing application knowledge to the ops team provided quick wins. He also covers how he later applied SRE concepts to uplift the wider organisation. If you are out there looking for advice on how to implement SRE in your organisation, this is the episode for you. You can find Praveen at: LinkedIn: https://www.linkedin.com/in...
2023-09-26
30 min
Slight Reliability
Slight Reliability Episode 68 - Dashboards and Modern Observability with Eric Schabell
This week Stephen asks Eric Schabell (Director of Technical Marketing & Evangelism @ Chronosphere) about how dashboards fit into modern observability. They discuss how untamed observability can lead to unexpectedly high cloud bills, the similarities between dashboards and documentation, the "know > triage > understand" workflow, and much more. You can find Eric at: LinkedIn: https://www.linkedin.com/in/ericschabell/ X: https://twitter.com/ericschabell And you can find Chronosphere at: https://www.linkedin.com/company/chronosphereio/ You can find the official Slight Reliability podcast website at: h...
2023-09-19
32 min
Slight Reliability
Slight Reliability Episode 67 - Single Pane of Glass with Jamie Allen and Adam Kinniburgh
This week Stephen chats with Jamie Allen (Chief Technologist AWS & SRE @ EPAM Systems) and Adam Kinniburgh (VP Innovation @ SquaredUp) about the concept of a single pane of glass (SPOG) for SRE. Is it performance art or something actionable? Can alerting replace the need for dashboards? And are metrics drowning in the wake of distributed tracing? You can find Jamie at: LinkedIn: https://www.linkedin.com/in/jlallen/ And the Single Pain of Glass article he wrote here: https://medium.com/site-reliability-engineering-leadership/the-single-pain-of-glass-6e42930e966 You...
2023-09-12
34 min
Slight Reliability
Slight Reliability Episode 66 - Building Digital Assistants for SRE with Kyle Forster
This week Stephen brings back Kyle Forster from RunWhen to talk about the purple elephant in the room… “AI”. What makes it GenAI, LLM, Advanced Statistics, or ML? Kyle shares his experience building AI-powered search engines for SRE troubleshooting commands and how to incorporate a (paid) open source community of experts rather than trusting AI by itself. They discuss what search looks like under the hood, why GenAI-powered chatbots will or won't take over the SaaS industry, how Digital Assistants can be utilised by SREs to increase productivity (hint: giving...
2023-09-05
29 min
Slight Reliability
Slight Reliability Episode 65 - The Truth About Incidents with Courtney Nash
This week Stephen chats with the internet incident librarian herself, Courtney Nash. They explore what Courtney has learned through meta-analysis of the more than ten thousand incidents in the Verica Open Incident Database (VOID). They cover why MTTR needs to go in the garbage, joint cognitive systems, the value of looking at near misses and *much* more. You can check out the VOID here: https://www.thevoid.community/ The two papers mentioned are: Ironies of Automation by Lisanne Bainbridge: https://queue.acm.org/detail.cfm?id=3380779 Managing the Hidden...
2023-08-29
41 min
Slight Reliability
Slight Reliability Episode 64 - Observability During Development with Martin Thwaites
This week Stephen chats with Martin Thwaites from Honeycomb about how developers can leverage observability to better understand what they're building, solve bugs quicker, and have more time for coding. They also discuss OpenTelemetry (the protocol and semantic conventions), manual versus automatic instrumentation, and how keeping every span of trace data is irresponsible. You can find Martin at: LinkedIn: https://www.linkedin.com/in/martin-thwaites-ab445120/ X: https://twitter.com/MartinDotNet And Honeycomb at https://www.honeycomb.io/ You can find the official Slight Reliability podcast...
2023-08-22
36 min
Slight Reliability
Slight Reliability Episode 63 - The Power of Summary
Observability is a necessary adaptation to make sense of software systems in the Digital Age, but how can we unlock its power for non-engineer stakeholders (such as executives, product owners, etc)? Perhaps we need a layer of abstraction sitting on top of our detailed observability to get the most out of it. You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https...
2023-08-15
09 min
Slight Reliability
Slight Reliability Episode 62 - On-Call with Matt Brown
This week Stephen chats with former Google SRE Matt Brown about being on-call. They cover how to uplift junior engineers so they can be on-call, what a fair on-call schedule looks like, runbooks, and much more. As you heard, Matt believes flexibility is key to a healthy on-call rotation. Matt is exploring ideas for improvements to existing tooling and products in this space and would love to hear from as many listeners as possible with feedback on what they find useful or frustrating with the existing tools they use to support on-call in...
2023-08-01
36 min
Slight Reliability
Slight Reliability Episode 61 - SRE VS DevOps VS Platform Eng... (Yawn)
The internet is full of people who want to tell you about SRE, DevOps, and Platform Engineering and how different and similar they are... and will give you the impression that these things compete with each other. But do they? And is it a helpful question to ask in the first place? You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https...
2023-07-25
06 min
Slight Reliability
Slight Reliability Episode 60 - From Zero to SRE with Amin Astaneh
In this episode Amin Astaneh from Certo Modo discusses his experience undertaking an SRE transformation over several years. Stephen and Amin cover a lot of ground including making ops work visible, measuring toil, the power of calculating the $ value of work, getting developers on-call, the embedded model for SRE, SLOs, culture change, and a whole lot more. You can find Amin on his company website https://certomodo.io, LinkedIn: https://www.linkedin.com/in/aminastaneh/ and Twitter: https://twitter.com/aastaneh The books Amin mentions are......
2023-07-11
42 min
Slight Reliability
Slight Reliability Episode 59 - Bad API Observability with Sonja Chevre
In this episode Stephen Townshend and Sonja Chevre from Tyk discuss making APIs observable, and some anti-patterns to avoid. They cover GraphQL, OpenTelemetry and semantic conventions, correlation IDs, observability pipelines, and much more. You can find Sonja on LinkedIn: https://www.linkedin.com/in/sonjachevre/ and Twitter: https://twitter.com/SonjaChevre You can listen to Sonja's KubeCon talk here: https://youtu.be/IkEUJjRBCbo You can find Tyk's open source gateway here: https://github.com/TykTechnologies/tyk You can find the official Slight Reliability podcast website at...
2023-07-04
40 min
Slight Reliability
Slight Reliability Episode 58 - Tackling Cloud Cost with Harinder Seera
In this episode Stephen Townshend and Harinder Seera explore how to monitor and manage the cost of cloud. They discuss FinOps as a cultural practice, anti-patterns for implementing in the cloud, keeping cost down through resources, pricing, and architecture... and much more. You can find Harinder on LinkedIn: https://www.linkedin.com/in/harinderseera/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the...
2023-06-27
36 min
Slight Reliability
Slight Reliability Episode 57 - A Tale of Three Conferences
In this episode Stephen shares his experiences traveling overseas to the UK and Singapore AWS Summit, SREcon APAC, and the internal SquaredUp conference "SqUpCon". You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Slight Reliability artwork on Instagram: https://www.instagram.com/slight_reliability/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
2023-06-20
16 min
Slight Reliability
June 9th 2023 Update
A quick update on Stephen's whereabouts and when the next episode will be released.
2023-06-09
01 min
Slight Reliability
Slight Reliability Episode 56 - Dashbored
In this episode Stephen discusses the role of dashboards within the context of the Digital Era. What are they *not* appropriate for? What can they help with? What kinds of things are suitable to present? If you want to get involved in the SquaredUp dashboard competition head along to: https://squaredup.com/blog/dashboard-competition/ (everyone who submits an entry gets a t-shirt, you can also win Star Wars Lego, get video interviewed by me, and have the story of your dashboard presented both as a blog and on our Dashboard Gallery).
2023-05-23
14 min
Slight Reliability
Slight Reliability Episode 55 - Reflections on KubeCon with Bruce Cullen
This week Bruce Cullen is back to share his experiences from KubeCon + CloudNativeCon 2023 Europe. We chat about OpenTelemetry, green engineering, securing your CI/CD pipeline and much more. Bruce is the Director of Engineering at SquaredUp. You can find him on LinkedIn: https://www.linkedin.com/in/bruce-cullen/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ If you like Slight Reliability's mspaint style artwork you can find more of it on Instagram: https://www.instagram.com/slight_reliability/ You can find...
2023-05-16
40 min
Slight Reliability
Slight Reliability Episode 54 - Trends in Incident Management with Andy Thurai
In this episode Stephen Townshend chats to Andy Thurai (VP and Principal Analyst at Constellation Research) about Andy's latest report titled "Trends in Incident Management 2023". They chat about "mean time to innocence" and status pages, debate whether AI or ML has real value for incident management, and ponder why anyone would willingly decide to become an incident commander. You can find Andy's report here: https://www.constellationr.com/research/2023-trends-incident-management You can find Andy on LinkedIn here: https://www.linkedin.com/in/andythurai/ You can find the official...
2023-05-09
32 min
Slight Reliability
Slight Reliability Episode 53 - DORA Metrics with Tim Wheeler
In this episode Stephen Townshend chats to Tim Wheeler (Director of Engineering Services at SquaredUp) about his work implementing and continually monitoring DORA metrics. They chat about customising each metric to your own unique context, avoiding the weaponisation of metrics, the "tools will solve this for me" trap, and much more. The books mentioned during this episode were: Accelerate, The DevOps Handbook, The Phoenix Project, The Unicorn Project, Lean Enterprise, and Sooner, Safer, Happier. Tim also mentioned the work of Bryan Finster (https://twitter.com/BryanFinster). You can find Tim on...
2023-05-02
28 min
Slight Reliability
Slight Reliability Episode 52 - Double, Double, Toil and Trouble!
In this episode Stephen explores the SRE concept of "toil". What is it? How can we measure it? How do we reduce it? Also in this episode: whether we can make non-technology systems observable (like we do technology ones), the ineffectiveness of change advisory boards (CAB), and Stephen's upcoming attendance at SREcon, AWS Summit, and SLOconf. Shout outs to Steve McGhee, Dom Finn, and Shea Stewart. You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: L...
2023-04-25
09 min
Slight Reliability
Slight Reliability Episode 51 - The reliability.org Community with Anurag Gupta
In this episode Stephen Townshend and Anurag Gupta discuss the new reliability.org community for SREs or reliability engineers to share experiences, ask questions, and find community. They discuss the value of community and sharing your thoughts, collaboration between organisations, vicious versus virtuous cycles for reliability, and much more. You can join us in the community by visiting https://www.reliability.org/ You can find Anurag on LinkedIn: https://www.linkedin.com/in/awgupta/ You can find out more about Shoreline by visiting https://www.shoreline.io/
2023-04-18
30 min
Slight Reliability
Slight Reliability Episode 50 - The 50th Episode Special with Bruce Cullen
In this episode Bruce Cullen interviews Stephen Townshend about the past, present, and future of the Slight Reliability podcast. They discuss their shared backgrounds in software testing, the different career paths that testing has opened up, and much more! Bruce is the Director of Engineering at SquaredUp. You can find him on LinkedIn: https://www.linkedin.com/in/bruce-cullen/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https...
2023-04-11
39 min
Slight Reliability
Slight Reliability Episode 49 - Implementing Observability in the Real World with Ivan Merrill
In this episode Ivan Merrill from Fiberplane shares his experiences implementing observability within some of the large complex organisations he's worked for in the past. You can find Ivan on LinkedIn: https://www.linkedin.com/in/ivan-merrill-1a05223/ You can find out more about Fiberplane here: https://fiberplane.com/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre
2023-04-04
38 min
Slight Reliability
Slight Reliability Episode 48 - Blind Insight
In this episode I discuss the word "insight" within the context of observability. Is insight something tools can provide? Is it something you can reproduce? You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre
2023-03-21
08 min
Slight Reliability
Slight Reliability Episode 47 - Cloud Dependency Reliability with Jeff Martens and Ryan Duffield
In this episode Stephen Townshend discusses our increased dependency on third party cloud services and what this means for reliability with Jeff Martens and Ryan Duffield from https://metrist.io/. You can find Jeff... On LinkedIn: https://www.linkedin.com/in/jmartens/ On Twitter: https://twitter.com/Jmartens You can find Ryan... On StackOverflow: https://stackoverflow.com/users/2696/ryan-duffield On GitHub: https://github.com/rduffield You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen a...
2023-03-14
32 min
Slight Reliability
Slight Reliability Episode 46 - Raw Telemetry
In this episode I propose the use of scatterplots of raw data to better understand how our systems are behaving and what our customers are experiencing. The ideas from this episode come from my time as a performance engineer, working with two legends in that space: Richard Leeke (https://www.linkedin.com/in/richard-leeke-450448/) and Neil Davies (https://www.linkedin.com/in/neildaviesnz/). For some basic examples of scatterplots and what they show you versus line charts, check out an article I wrote back in 2017 called "Let's Talk About Averages": https://www...
2023-03-07
10 min
Slight Reliability
Slight Reliability Episode 45 - Telemetry Fluency with Paige Cruz
In this episode we discuss uplifting telemetry knowledge within engineering teams to enrich their work (and their lives) with Paige Cruz from Chronosphere. We cover why not to take a chainsaw to your observability in order to cut costs, the dark side of auto-instrumentation, storytelling with live data, and much more. The book that Paige recommends at the end is "Effective Monitoring and Alerting for Web Operations": https://www.oreilly.com/library/view/effective-monitoring-and/9781449333515/ You can check out Chronosphere here: https://chronosphere.io/ You can find Paige...
2023-02-28
48 min
Slight Reliability
Slight Reliability Episode 44 - Cognitive Overload with Paige Cruz
In this episode we discuss cognitive overload in SRE with Paige Cruz from Chronosphere. We cover what cognitive load is, what causes it, and some potential antidotes and preventative measures. You can check out Chronosphere here: https://chronosphere.io/ You can find Paige on LinkedIn: https://www.linkedin.com/in/paigerduty/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter...
2023-02-21
38 min
Slight Reliability
Slight Reliability Episode 43 - Beyond Observability
In this episode I discuss my "bigger picture" perspective of what observability needs to be, and why it's important we include the business and the customer in what we monitor in the Digital Era. The books I highlight in this episode are... Observability Engineering: https://www.oreilly.com/library/view/observability-engineering/9781492076438/ Sooner, Safer, Happier: https://soonersaferhappier.com/book/ The Phoenix Project: https://www.oreilly.com/library/view/the-phoenix-project/9781457191350/ The Unicorn Project: https://www.oreilly.com/library/view/the-unicorn-project/9781098124175/ Accelerate: https://www.oreilly.com/library/view/accelerate/9781457191435/...
2023-02-14
10 min
Slight Reliability
Slight Reliability Episode 42 - Reliability Insights with José Velez
In this episode we speak to José Velez from Rely about reliability at scale, a top down approach to SLOs, the potential and limitations of AI and ML in operations, the question of service ownership, utilising the business criticality of services in how we monitor the underlying infrastructure, and much more. You can check out Rely at https://www.rely.io/ You can find José on LinkedIn: https://www.linkedin.com/in/josevelez-relyio/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You...
2023-02-07
36 min
Slight Reliability
Slight Reliability Episode 41 - Testing with Traces (with Ken Hamric)
In this episode we speak to Ken Hamric about distributed tracing, leveraging tracing for better testing, and observability driven development. The tool that Henrik Rexed integrated with Tracetest was Kuberhealthy (https://www.cncf.io/projects/kuberhealthy/) and you can watch a video of him discussing it in combination with Tracetest here: https://youtu.be/PKQQEeeMYxg?t=2492 Ken also mentioned Charity Majors' writing about observability driven development: https://thenewstack.io/a-next-step-beyond-test-driven-development You can check out Tracetest: - The official website: https://tracetest.io/ - GitHub repo: https://g...
2023-01-31
31 min
Slight Reliability
Slight Reliability Episode 40 - Drowning in an Observability Data Lake
In this episode Stephen explores the pros and cons of centralising observability data. Is it practical to stand up a complex and costly data storage and retrieval solution? Is there another way? You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre
2023-01-24
11 min
Performance Time
Performance Time Episode 28: The Grand Finale!
In this episode I wrap up the Performance Time show with some commentary around my blog "Wrapping up 13 years as a Performance Engineer" (https://www.linkedin.com/pulse/wrapping-up-13-years-performance-engineering-stephen-townshend/)
2023-01-09
23 min
Slight Reliability
Slight Reliability Episode 37 - Observability New Year's Resolutions with Henrik Rexed
This week Henrik Rexed and Stephen Townshend discuss their New Year's resolutions for observability. They cover OpenTelemetry and a unified query language, continuous profiling, raw data analysis, instrumenting code, using distributed tracing as part of testing, and much more. Some of the tools or resources mentioned during the episode include: https://tracetest.io/ (distributed tracing for testing), https://github.com/open-telemetry/opamp-go (OTEL orchestration), https://ebpf.io/ (for continuous profiling). You can find Henrik on LinkedIn: https://www.linkedin.com/in/hrexed/ and Twitter: https://twitter.com...
2022-12-19
45 min