podcastdetails.com
Showing episodes and shows of Vibesre
Shows
Platform Engineering Playbook Podcast
Replace 5 Databases with 1? SurrealDB for AI Agents Explained
Your AI agents are using five different databases right now - and you don't even know it. This database sprawl is silently killing your platform's performance and your team's sanity. In today's Platform Engineering Playbook, we dive deep into SurrealDB's multi-model approach and how it's revolutionizing AI infrastructure. Plus, breaking news on vulnerability management patterns that every platform engineer needs to understand. **What You'll Learn:** • Why database proliferation is the hidden killer of AI agent performance • SurrealDB's architecture deep dive and real-world deployment strategies • When (and when NOT) to consolidate your AI infras...
2026-02-18
19 min
Platform Engineering Playbook Podcast
Agoda’s API Agent Turns Any API into MCP — No Code, No Deployments
**What if API integration nightmares could disappear without writing a single line of code?** Agoda just dropped a game-changing solution that transforms any API into MCP (Model Context Protocol) with zero deployments - and it's about to reshape how platform teams approach AI integrations. In today's Platform Engineering Playbook, we break down this revolutionary no-code approach and explore what it means for enterprise platform strategies. Plus, we dive into Docker's latest sandbox capabilities with NanoClaw, performance testing breakthroughs for Identity Management systems using encrypted DNS in OpenShift, and the emerging patterns for running AI coding...
2026-02-17
18 min
Platform Engineering Playbook Podcast
LocalStack Kills Community Edition: What Breaks in March
**LocalStack just killed their open-source edition - but what does this really mean for your platform engineering stack?** In today's episode of Platform Engineering Playbook, we break down LocalStack's shocking decision to discontinue their Community Edition and what it means for teams relying on AWS local development. Plus, we dive into the ripple effects across the developer ecosystem and provide a practical decision framework for your next moves. **What You'll Learn:** • Why LocalStack's pricing shift from free to $39/month matters for platform teams • Decision frameworks for evaluating local development alternatives • How AI is rev...
2026-02-16
15 min
Platform Engineering Playbook Podcast
OpenTofu vs Terraform: What Enterprise Teams Are Actually Doing (2026)
**Is your infrastructure strategy about to become obsolete?** By 2025, half of all Terraform installations could be running OpenTofu - and the implications for platform engineering teams are massive. In today's deep dive, we break down the OpenTofu vs. Terraform battle that's reshaping infrastructure as code. You'll learn the real mechanics behind migrating between these tools, practical decision frameworks for enterprise teams, and why this choice could define your platform's next five years. **What You'll Learn:** • The technical and business drivers behind the OpenTofu fork • Step-by-step migration strategies and gotchas to avoid • How to eval...
2026-02-13
18 min
Platform Engineering Playbook Podcast
Why Databases Inside Kubernetes Are Becoming Technical Debt
**Is running databases in Kubernetes about to become legacy technical debt overnight?** By 2026, the inference cloud revolution is forcing platform engineers to completely rethink database architecture - and the implications are massive. In today's deep dive, we break down the "container paradox" that's reshaping how we think about stateful workloads in Kubernetes. You'll discover why the rise of AI inference is making traditional database-in-K8s patterns unsustainable and what this means for your platform strategy. **What You'll Learn:** • Why the inference cloud demands decoupled database architectures • A practical framework for assessing your statefulness spec...
2026-02-12
17 min
Platform Engineering Playbook Podcast
47% of CNCF Projects Slowed Down in 2025 — Why That’s Actually Good News
**Why did 47% of CNCF projects slow down their development velocity in 2025 — and why platform engineers should celebrate this trend?** In today's Platform Engineering Playbook, we decode what declining commit velocity across cloud native projects actually reveals about infrastructure maturity and what it means for your platform strategy. **What You'll Learn:** • How to interpret CNCF project velocity metrics as leading indicators for platform decisions • Why slower development cycles might signal stronger, more stable infrastructure foundations • Strategic insights for platform engineers navigating the evolving cloud native landscape • Breaking analysis of agentic AI transforming DevOps aut...
2026-02-11
18 min
Platform Engineering Playbook Podcast
The Claude Skills That Stop AI From Writing Dangerous Infrastructure as Code
**Are 87% of DevOps teams unknowingly creating security vulnerabilities with AI-generated infrastructure code?** Today's Platform Engineering Playbook dives deep into the hidden risks of AI in DevOps workflows and reveals the specialized skills that top-performing teams use to harness AI safely and effectively. **What You'll Learn:** • Why AI-generated infrastructure code is creating blind-spot vulnerabilities • The 8 Claude skills that actually move the needle for DevOps engineers • How to identify and automate your repetitive workflows with AI guardrails • Breaking news: Cloud complexity becomes the #1 security threat • Cloudflare's new vertical microfrontend template for edge routi...
2026-02-10
19 min
Platform Engineering Playbook Podcast
Docker vs Nix: Why Your Builds Aren’t Actually Reproducible
97% of Docker containers can't reproduce the exact same build six months later—what does this mean for platform engineering, and why should you care? In today's episode of the Platform Engineering Playbook, we delve into the critical issue of reproducibility in Docker containers. Discover why this seemingly technical detail could significantly impact your workflows and productivity. We'll explore the limitations of traditional package managers and discuss how they can be a bottleneck in achieving true reproducibility. **Timestamps:** - **[00:00] Cold Open:** Dive into the startling statistic about Docker containers. - **[01:15] Intro:** Welcome and overview of...
2026-02-09
18 min
Platform Engineering Playbook Podcast
The Data Canary Pattern: How Netflix Prevents Bad Metadata Deploys
**What happens when 2 billion daily metadata events could crash Netflix's entire platform with one bad transformation?** Today's Platform Engineering Playbook dives deep into Netflix's Data Canary system - a masterclass in building trust and validation into your data pipelines at scale. Plus, we cover the latest platform engineering news that's reshaping how we deploy and monitor distributed systems. **What You'll Learn:** • How Netflix validates massive data transformations without risking production • Container readiness strategies for Spring Boot in Kubernetes environments • LinkedIn's redesigned SAST pipeline using GitHub Actions and CodeQL • Why GitOps is becoming...
2026-02-07
15 min
Platform Engineering Playbook Podcast
Claude Opus 4.6: The First AI That Feels Like a Teammate
**Claude Opus 4.6 just demolished GPT-4 on every coding benchmark - and it's about to reshape how we think about platform engineering automation.** In today's episode, we break down Anthropic's game-changing AI release and what it means for platform teams worldwide. We dive deep into the autonomous capabilities that could revolutionize how we handle infrastructure operations, but also explore the new risks this creates for production environments. **What You'll Learn:** • How Claude Opus 4.6's coding performance impacts platform tooling decisions • Why autonomous AI operations require new safety frameworks • Practical strategies for identifying AI automa...
2026-02-06
16 min
Platform Engineering Playbook Podcast
Autonomous AI in DevOps Is Here — And Most Teams Are Doing It Wrong
**Will 87% of DevOps teams really be obsolete by 2026?** As AI agents take control of production infrastructure, we're witnessing the biggest transformation in platform engineering history. In today's episode, we dive deep into **autonomous AI agents in DevOps workflows** and explore how they're reshaping everything from monitoring to incident response. You'll discover real-world examples of AI agents managing production systems, plus critical insights on when and how to safely implement these powerful tools in your own infrastructure. **What You'll Learn:** • How AI agents are revolutionizing observability and SRE practices • Practical implementation strategies for autonomous moni...
2026-02-05
19 min
Platform Engineering Playbook Podcast
Kubernetes Is Retiring Ingress NGINX (And 50% of Clusters Aren’t Ready)
90% of Kubernetes clusters are running Ingress NGINX, which will be abandoned in 16 months with zero maintainers left! What does this mean for your production systems? In this episode, we dive deep into the urgent need for migration and the alternatives available as the clock ticks down. With the retirement of Ingress NGINX set for March 2026, it's critical to understand how this affects millions of deployments worldwide. If you're among the half still relying on Ingress NGINX, you can't afford to miss this episode. 🔑 What you'll learn: - The migration timeline and key deadlines you need to know. ...
2026-02-04
19 min
Platform Engineering Playbook Podcast
OpenAI’s New macOS App: Is Agentic Coding Finally Here?
**OpenAI just made 73% of coding assistants obsolete overnight - but what does this mean for platform engineers?** Today's episode breaks down OpenAI's game-changing macOS app for "agentic coding" and its massive implications for platform engineering workflows. We'll analyze why this isn't just another coding assistant, but a fundamental shift in how we approach infrastructure automation and developer tooling. **What You'll Learn:** ✅ Deep dive into OpenAI's new agentic coding capabilities and competitive advantages ✅ Critical risks platform teams need to consider (hallucinations, security, dependency management) ✅ How enterprise desktop computing is shifting toward immutable Linux system...
2026-02-03
13 min
Platform Engineering Playbook Podcast
98% of Container CVEs Are Hiding Where You’re Not Scanning
**Are your container security scans missing 98% of critical vulnerabilities?** New research from Chainguard reveals a shocking blind spot that could be exposing your infrastructure to massive security risks. In today's Platform Engineering Playbook, we unpack this bombshell finding and explore why traditional container scanning is failing at scale. You'll discover where these hidden vulnerabilities are lurking, why your current tools aren't catching them, and most importantly - what you can do about it. **What You'll Learn:** • Why 98% of container CVEs hide outside the top 20 images • The computational costs of comprehensive vulnerability scanning • How to...
2026-02-02
13 min
Platform Engineering Playbook Podcast
Why Forward-Deployed Engineers Are Making $300K+ (And Why Companies Are Desperate for Them)
Why are forward-deployed engineers making 40% more than traditional backend developers, and why can't companies hire enough of them? In today's Platform Engineering Playbook, we dive deep into tech's hottest new role and explore three critical platform engineering developments reshaping the industry. **What You'll Learn:** • The explosive rise of forward-deployed engineers and why they're commanding premium salaries • Real-world case studies from Snowflake and financial services implementations • Three essential skill areas every successful FDE needs to master • How Artera is revolutionizing prostate cancer diagnostics with AWS architecture • Cloudflare's innovative approach to vertical microfront...
2026-01-31
11 min
Platform Engineering Playbook Podcast
AWS DevOps Agent in Production: What Most Teams Get Wrong
**Why do 73% of AWS DevOps Agent deployments crash and burn in their first week?** It's not what you think. In this episode of Platform Engineering Playbook, we uncover the hidden culprits behind these shocking failure rates and reveal the systematic approach that separates successful platform teams from the rest. **What You'll Learn:** • The real reasons AWS DevOps Agent deployments fail (hint: it's not the code) • How to transform your incident response from "crowded stadium chaos" to "conference room clarity" • A practical framework for optimizing on-call rotations and team structure • Production-ready deployment strategi...
2026-01-30
16 min
Platform Engineering Playbook Podcast
AI Agents Are Rewriting the SRE Playbook (For Better or Worse)
What if AI agents could flip the script on SRE work, turning 87% of firefighting into 87% prevention? That's exactly what's happening in the "agentic revolution" transforming platform engineering teams. In today's Platform Engineering Playbook, we dive deep into how AI agents are reshaping SRE workflows and what this means for your platform strategy. We'll cut through the hype to examine the real-world gap between vision and current reality, then identify which SRE tasks are actually ready for agent automation. **What You'll Learn:** • The three characteristics that make SRE tasks perfect candidates for AI automation • Why...
2026-01-29
15 min
Platform Engineering Playbook Podcast
DevOps Is Dead — Platform Engineering Replaced It
**DevOps is dead - and the companies that created it are the ones pulling the trigger.** But what's replacing it might be the most significant shift in software delivery since containerization. In today's Platform Engineering Playbook, we dive deep into how Internal Developer Platforms are fundamentally reshaping the DevOps landscape. We'll explore why platform engineering has shed its experimental status and become the new standard for scaling development teams. **What You'll Learn:** • The five critical red flags that signal your platform needs immediate attention • Why the "black box problem" is derailing developer productivity • How to navigate the ingress-nginx archival and tr...
2026-01-28
19 min
Platform Engineering Playbook Podcast
47 Countries Went Offline — What Platform Engineers Must Learn From It
**What happens when 47 countries lose internet access in just 3 months—and it's not cyberattacks?** Today's Platform Engineering Playbook dives deep into the shocking Q4 2025 internet disruption data that reveals critical infrastructure vulnerabilities every platform engineer needs to understand. We'll analyze how cable cuts, storms, and DNS failures brought down entire regions, and more importantly—which companies survived and why. **What You'll Learn:** • The hidden patterns behind massive internet outages that caught most teams off guard • How resilient platform architectures saved companies millions during widespread disruptions • Specific signals to monitor for early detection of infrast...
2026-01-27
19 min
Platform Engineering Playbook Podcast
Two Missing Characters Nearly Compromised AWS’s Supply Chain
**What if two missing characters could compromise every AWS-managed GitHub repository?** That's exactly what happened in a critical regex vulnerability that exposed massive supply-chain risks. In today's Platform Engineering Playbook, we break down this shocking security flaw and explore how platform engineers can protect their infrastructure from similar attacks. You'll discover the technical details behind the vulnerability, learn essential webhook security practices, and understand why regex validation is more critical than ever. **What You'll Learn:** ✅ How a simple regex pattern flaw created enterprise-wide security risks ✅ Webhook signature verification best practices ✅ AI-powered Linux securi...
2026-01-26
15 min
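The episode summary above doesn't say which two characters were missing, so the sketch below only illustrates the general class of bug it describes (an allow-list regex that is not anchored to the whole string) alongside the standard HMAC check used for webhook signature verification. The actor IDs and function names are hypothetical; the `sha256=<hex>` header format is the one GitHub uses for `X-Hub-Signature-256`.

```python
import hashlib
import hmac
import re

# Hypothetical allow-list of trusted actor IDs (illustrative values only).
TRUSTED = re.compile(r"(?:12345|67890)")

def is_trusted_unanchored(actor_id: str) -> bool:
    # BUG: search() matches the trusted ID anywhere in the input,
    # so an attacker-chosen ID like "9912345" is accepted.
    return TRUSTED.search(actor_id) is not None

def is_trusted_anchored(actor_id: str) -> bool:
    # fullmatch() requires the entire string to match, like ^...\Z.
    # Note: Python's "$" also matches before a trailing "\n",
    # which is why fullmatch() is the safer choice here.
    return TRUSTED.fullmatch(actor_id) is not None

def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Verify a 'sha256=<hex>' HMAC header over the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, avoiding a timing side channel.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

For example, `is_trusted_unanchored("9912345")` returns `True` while `is_trusted_anchored("9912345")` returns `False`. Verifying the signature over the raw body bytes (before any JSON parsing) is the usual design choice, since re-serialized JSON may not be byte-identical.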
Platform Engineering Playbook Podcast
Kubernetes Just Became Essential for AI Growth (CNCF Report)
**Why will 90% of AI workloads fail without Kubernetes in the next 18 months?** Most platform teams are walking into a disaster they can't see coming. In today's Platform Engineering Playbook, we break down the CNCF's shocking new survey results showing 82% of organizations are unprepared for AI infrastructure demands. Plus, we cover the Cloudflare BGP incident that took down major services and what it means for your platform resilience. **What You'll Learn:** ✅ Why Kubernetes is becoming make-or-break for AI workloads ✅ The hidden performance bottlenecks killing AI model deployments ✅ Actionable audit checklist for your current K8s setup ✅ How organizational culture trumps t...
2026-01-25
18 min
Platform Engineering Playbook Podcast
ChatGPT Scales PostgreSQL to Power 800 Million Users
OpenAI is running ChatGPT for ~800 million users on PostgreSQL — and according to their own disclosures, it’s actually working. In this episode of the Platform Engineering Playbook Daily Podcast, we break down how PostgreSQL was pushed to hyperscale, the architectural tradeoffs behind a single-primary model, and the operational playbook that makes this kind of scale possible. This isn’t a generic “Postgres is great” story. It’s a real-world look at what it takes to run open-source databases at extreme scale, and what platform engineers can learn from it. ⏱️ Episod...
2026-01-24
19 min
Platform Engineering Playbook Podcast
3 Skills You Need to Transition to Platform Engineer
**Will 70% of DevOps engineers disappear in the next 5 years?** That's the bold prediction kicking off today's deep dive into the massive career shift happening in tech right now. In this episode of Platform Engineering Playbook, we explore the critical transition from DevOps to Platform Engineering and what it means for your career survival. You'll discover why traditional DevOps roles are evolving, how companies like Spotify are leading this transformation, and the concrete roadmap you need to navigate this shift successfully. **What You'll Learn:** • Why the DevOps-to-Platform Engineering transition is inevitable • Real-world examples from indu...
2026-01-23
16 min
Platform Engineering Playbook Podcast
The Infrastructure Monitoring Tools Teams Regret Choosing
The monitoring tool everyone trusts is actually blind to 40% of your infrastructure failures—and the vendor knows it. Are you using an industry standard that misses almost half of all incidents? In this episode, we unravel the mystery of infrastructure monitoring tools and why your choice could be costing you dearly. As platform engineering teams grapple with an overwhelming array of options—from battle-tested open source tools to shiny SaaS platforms—the stakes have never been higher. The shift in focus from simple server monitoring to comprehensive observability is crucial for modern development. 🔑 What you’ll learn in...
2026-01-22
17 min
Platform Engineering Playbook Podcast
Your CI/CD Pipeline is a Debt Trap
**73% of engineering teams are drowning in technical debt because of their CI/CD pipelines. Not despite them—because of them.** Are your automation tools secretly sabotaging your codebase? Today's Platform Engineering Playbook dives deep into the hidden ways CI/CD pipelines create technical debt and reveals practical strategies to break the cycle. **What You'll Learn:** • Why inheritance beats copying in platform design • Docker's new hardened images for bulletproof container security • How OpenTelemetry's log deduplication processor can slash your log volume • Critical vulnerabilities in Chainlit and Cloudflare you need to patch NOW • Acti...
2026-01-21
11 min
Platform Engineering Playbook Podcast
Kubernetes Just Revolutionized Learning — Get Ahead Now!
**Are major tech companies secretly abandoning Kubernetes certifications?** What we discovered about the future of K8s learning will change how you approach platform engineering in 2026. In today's Platform Engineering Playbook, we uncover why traditional Kubernetes education is becoming obsolete and what platform teams are doing instead. Plus, breaking news that could revolutionize your infrastructure stack. **What You'll Learn:** • Why the volume of Kubernetes resources reveals a hidden shift in the industry • Microsoft's game-changing Azure Functions announcement for Model Context Protocol servers • How Pinterest's Moka is rewriting big data processing rules with Kubern...
2026-01-20
17 min
Platform Engineering Playbook Podcast
How AWS's New Euro Cloud Changes Data Control Forever
92% of European companies don’t trust US cloud providers with their data anymore. So, AWS just locked itself out of its own Euro Cloud! This shocking move raises critical questions about data sovereignty and compliance for businesses operating in Europe. In this episode, we dive deep into AWS's groundbreaking decision to create a completely isolated European cloud infrastructure, one that even Amazon employees can't access. Why would they cut off their own access, and what does this mean for your data strategy? 🔑 Learn about the implications of AWS's European Sovereign Cloud and how it represents a shif...
2026-01-19
16 min
Platform Engineering Playbook Podcast
Why Pulumi's New Move Could Change Terraform Forever
Terraform’s biggest competitor just made a move that could redefine infrastructure-as-code in 2026. Pulumi now runs Terraform and HCL natively—better than HashiCorp does. That’s not a migration tool, not a compatibility shim, but full native execution through the Pulumi engine, plus Terraform state hosted in Pulumi Cloud and financial credits to help teams exit existing HashiCorp contracts. In this episode of the Platform Engineering Playbook Daily Podcast, we break down why this announcement is one of the most important platform engineering stories of the year—and what it actually means for SREs, platform teams, a...
2026-01-18
15 min
Platform Engineering Playbook Podcast
Astro Joins Cloudflare: What It Means for Platform Engineers
Cloudflare acquires the Astro Technology Company, adding a 1M-downloads-per-week web framework to their edge platform. We analyze the strategic implications, what stays open source, and lessons about framework sustainability for platform engineering teams. Key Topics: - Astro framework overview: islands architecture, framework-agnostic components, content-first approach - Why Cloudflare acquired Astro: Developer ecosystem capture, edge compute alignment, workerd integration - Open source sustainability: MIT license preserved, historical patterns (Gatsby, Remix) - What changes for platform teams: Framework evaluation criteria, portability concerns, exit strategies - News: AWS European Sovereign Cloud, Let's Encrypt 6-day certs...
2026-01-17
13 min
Platform Engineering Playbook Podcast
ScyllaDB X Cloud Challenges DynamoDB Cost and Performance
ScyllaDB just launched X Cloud with claims of double the performance at half the cost compared to DynamoDB. This episode breaks down the technical architecture behind their tablet-based approach, how they're achieving 80% data compression on ARM Graviton4 instances, and when this actually makes sense for platform engineering teams running high-throughput workloads. Key Topics: - ScyllaDB X Cloud tablet-based architecture (5GB chunks) vs traditional consistent hashing - Claims of 6x performance improvement with 50% cost reduction vs DynamoDB - 80% compression on ARM Graviton4 instances, 25x faster data streaming - High-throughput workload targets: Discord, Disney, Starbucks...
2026-01-16
11 min
Platform Engineering Playbook Podcast
Invisible Linux Malware: The Undetectable Threat to Your Cloud Infrastructure
Your Linux servers aren't just running containers anymore—they're hosting invisible tenants that security teams can't even detect. In this episode, we deep dive into VoidLink, the new cloud-native malware framework that Check Point Research just uncovered. This isn't your typical malware that got retrofitted for the cloud—this thing was born in the cloud, designed from the ground up to evade every detection tool in your security stack. We explore: • How VoidLink achieves its terrifying persistence in cloud environments • Why every major cloud provider is vulnerable to this new threat class • eBPF-based...
2026-01-15
16 min
Platform Engineering Playbook Podcast
The AI-Cloud Native Symbiosis - How Intelligent Infrastructure is Transforming Platform Engineering
By 2025, 90% of new enterprise applications will be AI-powered and cloud-native. This episode explores the symbiotic relationship between AI and Kubernetes - where AI isn't just another workload, but is fundamentally transforming how we build and operate cloud native platforms. We cover real-world examples like Netflix's predictive scaling achieving 92% accuracy, the emergence of AI-driven observability platforms, and why platform engineers need to evolve from infrastructure operators to AI-infrastructure orchestrators. In this episode: - AI transforming the Kubernetes control plane with predictive scheduling - Netflix's AI-driven traffic management: 92% prediction accuracy, 35% resource reduction - AI-native observability: anomaly...
2026-01-14
14 min
Platform Engineering Playbook Podcast
MIT 10 Breakthrough Technologies 2026 - The Platform Engineering Perspective
MIT just released their 10 Breakthrough Technologies for 2026 - and three of them are infrastructure problems that platform engineers are solving right now. This episode explores hyperscale AI data centers consuming 96 GW globally by 2026, vibe coding with 41% of code now AI-generated, and LLM interpretability research from Anthropic. We break down how platform engineers enable these breakthroughs through power-aware scheduling, AI coding guardrails, and new observability patterns for ML systems. In this episode: - Hyperscale AI data centers: 96 GW capacity, $600B capex, 100+ kW per rack - Vibe coding: 92% developer AI adoption, GitHub Copilot at 20M users ...
2026-01-13
20 min
Platform Engineering Playbook Podcast
AWS Route 53 Global Resolver - Enterprise DNS Security at the Edge
Every DNS query your hybrid environment makes could be exposing sensitive data. AWS Route 53 Global Resolver, announced at re:Invent 2025, combines anycast routing, encrypted DNS protocols (DoH/DoT), and managed threat filtering in a single service. In this episode, we cover: - Anycast DNS architecture routing to the nearest of 11 AWS regions - DoH and DoT encrypted DNS protocol support - AWS RAM authorization for multi-account private hosted zones - DNS filtering with managed threat lists - Implementation patterns for hybrid environments and remote workforces - Query logging for security visibility and...
2026-01-12
20 min
Platform Engineering Playbook Podcast
Kubernetes Upcoming Features Deep Dive - Extended Toleration Operators and Mutable PV Node Affinity
There's a Kubernetes cluster out there right now burning ten thousand dollars a month on GPU nodes that sit idle sixty percent of the time. Why? Because the scheduler can't say "only schedule pods on nodes with MORE than four GPUs." It's 2026, and our scheduler still can't count. But that's about to change. In this episode, we dive deep into two alpha features in Kubernetes 1.35 that represent a fundamental shift in how Kubernetes handles scheduling and storage: **Extended Toleration Operators (KEP-5471)** - Finally, numeric threshold-based scheduling with taints. New Gt (greater than) and Lt (less...
2026-01-11
41 min
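To make the "scheduler can't count" point concrete, here is a simplified, hypothetical model of taint/toleration matching with numeric `Gt`/`Lt` operators. It is not kube-scheduler code: the taint key `example.com/gpu-count` is invented, `Exists`/`Equal` handling is reduced to the basics, and it assumes `Gt` means "the taint's value is greater than the toleration's value", so check KEP-5471 for the actual semantics.

```python
from dataclasses import dataclass

@dataclass
class Taint:
    key: str
    value: str
    effect: str  # e.g. "NoSchedule"

@dataclass
class Toleration:
    key: str
    operator: str    # "Equal", "Exists", or the numeric "Gt" / "Lt"
    value: str = ""
    effect: str = ""  # empty string matches any effect

def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.key != taint.key:
        return False
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return True
    if tol.operator == "Equal":
        return tol.value == taint.value
    if tol.operator in ("Gt", "Lt"):
        # Assumption: both values are integers; non-numeric values never match.
        try:
            taint_n, tol_n = int(taint.value), int(tol.value)
        except ValueError:
            return False
        return taint_n > tol_n if tol.operator == "Gt" else taint_n < tol_n
    return False

def schedulable(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    # A pod fits only if every NoSchedule taint on the node is tolerated.
    return all(
        any(tolerates(t, taint) for t in tolerations)
        for taint in taints
        if taint.effect == "NoSchedule"
    )
```

With a node tainted `example.com/gpu-count=8:NoSchedule` and a pod tolerating `Gt 4`, `schedulable` returns `True`; the same pod is kept off a node tainted with `2`. With only `Equal`, you would need one toleration per exact GPU count to express the same rule.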
Platform Engineering Playbook Podcast
Why Is a 2016 AWS Instance Still the Best Value? (Cloudspecs Research)
New research from TUM reveals uncomfortable truths about cloud hardware stagnation. The paper "Cloudspecs: Cloud Hardware Evolution Through the Looking Glass" shows that the best-performing AWS instance for NVMe I/O per dollar was released in 2016 - and nothing since has come close. In this episode: • CIDR 2026 research from Technical University of Munich • AWS i3 instances from 2016 still beat all newer options for storage price-performance • CPU gains: 10x cores, but only 2-3x cost-adjusted improvement • Memory crisis: DRAM capacity per dollar has "effectively flatlined" • Network is the only bright spot: 10x improvement per dollar...
2026-01-10
20 min
Platform Engineering Playbook Podcast
Iran IPv6 Blackout - When Governments Weaponize Protocol Transitions
The same IPv6 transition your infrastructure team has been procrastinating on is now being weaponized by governments. On January 8, 2026, Iran's IPv6 address space dropped 98.5% while IPv4 remained intact—a surgical strike against mobile users. In this episode, we break down: - Why blocking IPv6 specifically targets mobile users (hint: carrier NAT exhaustion) - The BGP mechanics of protocol-specific blocking - "Engineered degradation" vs total blackout—the new censorship playbook - How Starlink terminals are changing the calculus for authoritarian internet control - What platform engineers need to know: protocol-specific monitoring, Happy Eyeballs test...
2026-01-09
24 min
Platform Engineering Playbook Podcast
Venezuela BGP Anomaly - Deep Technical Analysis
A deep technical dive into the January 2026 Venezuela BGP route leak incident. Was it a cyberattack? The technical evidence says no - and that's actually more concerning. In this special deep-dive episode (no news segment), Jordan and Alex break down: - What actually happened on January 2, 2026 with AS8048 (CANTV, Venezuela's state ISP) - Why 10x AS-path prepending proves this was misconfiguration, not a man-in-the-middle attack - How BGP valley-free routing works and why Type 1 Hairpin leaks happen - The pattern of 11 similar leaks from CANTV since December 2025 - Why your multi-region...
2026-01-08
28 min
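One way to read the "10x prepending means misconfiguration" argument: prepending makes a route look longer, so it attracts less traffic, which is the opposite of what an interception attack needs. The sketch below models only the AS-path-length step of BGP best-path selection; real routers compare local preference and other attributes first. AS8048 comes from the episode description; the other ASNs are from the documentation range (RFC 5398) and the routes are hypothetical.

```python
def prepend(as_path: list[int], asn: int, times: int) -> list[int]:
    """Return a new AS path with `asn` prepended `times` times."""
    return [asn] * times + list(as_path)

def preferred(routes: list[dict]) -> dict:
    """Pick the route with the shortest AS path.

    Simplification: only the AS-path-length tie-breaker is modeled;
    local preference, origin, MED, etc. are ignored.
    """
    return min(routes, key=lambda r: len(r["as_path"]))

# Hypothetical routes to the same prefix originated by AS64500.
legit = {"via": "transit", "as_path": [64496, 64497, 64500]}
leak = {"via": "leak", "as_path": prepend([64501, 64500], 8048, 10)}

best = preferred([legit, leak])
```

Here the leaked path is 12 ASNs long versus 3 for the normal path, so `best["via"]` is `"transit"`: a route prepended this heavily loses on path length and pulls in little traffic, which fits an accidental leak better than a deliberate hijack.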
Platform Engineering Playbook Podcast
HolmesGPT: AI Root Cause Analysis for Kubernetes
Deep dive into HolmesGPT, the CNCF Sandbox AI agent that revolutionizes cloud-native troubleshooting. This episode covers what it is, its 40+ integrations, the project roadmap, and how to set it up today. News Segment: • AirFrance-KLM's secure automation platform with Terraform, Vault, and Ansible • AWS ECS tmpfs mounts on Fargate for secure secrets handling • Qwen 30B running on Raspberry Pi - democratizing edge AI • AWS European Sovereign Cloud with independent EU governance. Main Topic - HolmesGPT: • CNCF Sandbox project (accepted October 2025) with 1,600+ GitHub stars • Agentic architecture: creates investigation task lists, queries systems, synthesizes findings • 40+ built-in toolsets...
2026-01-08
25 min
Platform Engineering Playbook Podcast
Docker Kanvas: Infrastructure as Design
Docker just launched Kanvas, a visual tool that turns your architecture diagrams into deployable infrastructure. Built on Meshery (CNCF's 6th highest-velocity project), it converts Docker Compose files to Kubernetes manifests and challenges the dominance of Helm and Kustomize. In this episode, we explore: - The dev-to-prod gap that Kanvas solves - How Meshery Models add semantic understanding to infrastructure - Designer Mode vs Operator Mode capabilities - When to use Helm vs Kustomize vs Kanvas - Practical adoption strategies for platform teams. Whether you're struggling with YAML hell or looking to lower...
2026-01-07
23 min
Platform Engineering Playbook Podcast
Remote MCP Architecture - Running AI Tool Servers on Kubernetes
The MCP server registry hit 10,000+ integrations, but most teams are running these servers on laptops. This episode breaks down the production architecture that Google, Red Hat, and AWS are converging on: remote MCP servers deployed on Kubernetes. We cover three deployment patterns (local stdio, remote HTTP/SSE, and managed), the critical difference between wrapper-based and native API implementations, and a defense-in-depth security model using dedicated ServiceAccounts, time-bound tokens, RBAC, and audit logging. In this episode: - Remote MCP is production MCP—local stdio mode is for experimentation only; team-scale access requires HTTP/SSE mode - Na...
2026-01-06
23 min
Platform Engineering Playbook Podcast
AWS DevOps Agent - Promises vs Reality
AWS launched DevOps Agent at re:Invent 2025 as an "autonomous on-call engineer." But before you cancel your PagerDuty subscription, we separate marketing from mechanics. NEWS THIS EPISODE: • KubeCon Europe 2026: March 23-26 in Amsterdam, 224 sessions across 5 tracks • Platform Engineering 2026 Predictions: Agentic infrastructure becomes standard In this deep-dive episode, we cover: WHAT IT PROMISES: • Always-on AI that investigates incidents 24/7 • Automatic root cause analysis across logs, metrics, traces, and deployments • Mitigation plan generation with step-by-step remediation • Integration with CloudWatch, Datadog, Dynatrace, New Relic, Splunk WHAT IT ACTUALLY DELIVERS: • Agent...
2026-01-05
26 min
Platform Engineering Playbook Podcast
AWS Graviton5: 192 Cores, 5x Cache - ARM Takes Over the Data Center
AWS doubled the core count on their flagship ARM processors with Graviton5—192 cores in a single socket, 5x L3 cache (180MB), and 3nm fabrication. We go deep on ARM vs x86 architecture, cache hierarchy latencies, NUMA elimination benefits, formal verification security proofs, and a complete migration framework with multi-arch CI/CD patterns. With 98% of top EC2 customers already on Graviton, the ARM tipping point is now. Duration: ~22 minutes This episode covers: - 192-core single socket design eliminating NUMA overhead - 180MB L3 cache enabling database working sets to fit entirely in cache ...
2026-01-04
23 min
Platform Engineering Playbook Podcast
Can OpenTelemetry Save Observability in 2026?
OpenTelemetry has won the instrumentation wars with 95% adoption predicted for 2026. But winning data collection doesn't solve observability's real problems: spiraling costs, declining signal-to-noise ratios, and too much distance between seeing a problem and fixing it. In this episode, we break down: • Netflix's evolution to high-cardinality analytics processing 1M+ spans per episode • The cost-control chokepoint that OTel enables for telemetry optimization • Why 40% of organizations are targeting autonomous remediation by end of 2026 • How SLOs are becoming business conversations, not just engineering metrics Plus news on GitHub Actions 39% pricing reduction and Jaeger v2.14.0 legacy removal...
2026-01-03
17 min
Platform Engineering Playbook Podcast
When Serverless Fails: Unkey's 6x Performance Migration to Containers
Why did an API key management platform abandon edge serverless for stateful containers? Unkey hit 30ms p99 cache latency when they needed sub-10ms—so they rebuilt everything on AWS Fargate. This episode covers the technical decision-making framework for choosing between serverless and containers, plus a deep dive into Kubernetes 1.35's new structured z-pages for debugging. In This Episode: - The serverless constraint: stateless = network request for every cache read - Unkey's complexity tax: Workers, Durable Objects, Queues, custom proxies - The container solution: Fargate + Global Accelerator = 6x performance - Decision framework: latency ta...
2026-01-02
19 min
Platform Engineering Playbook Podcast
From Alert Fatigue to Signal-Driven Ops: The Observability Shift
Why do 73% of organizations experience outages from alerts they ignored? This episode breaks down the technical shift from reactive thresholds to SLO-driven observability. Learn multi-window burn-rate alerting patterns, AIOps implementations that actually work, and an 8-week migration path to cut alert noise by 80%. In This Episode: - The alert fatigue paradox: 2000+ weekly alerts with only 3% actionable - Technical causes: static thresholds, compound rule blind spots, alert storms - SLO-driven observability: error budgets and multi-window burn-rate alerting - AIOps patterns that work: anomaly detection, event correlation, RCA acceleration - Practical 8-week migration path...
2026-01-01
21 min
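For readers new to multi-window burn-rate alerting, here is a minimal Python sketch of the idea. It assumes a 99.9% SLO and the commonly cited 14.4x threshold applied to a long (1-hour) and short (5-minute) window, values popularized by Google's SRE Workbook; the episode may use different numbers, and the function names are hypothetical.

```python
def burn_rate(error_ratio: float, slo: float) -> float:
    """How fast the error budget is being consumed; 1.0 means exactly on budget."""
    budget = 1.0 - slo  # e.g. 0.001 for a 99.9% SLO
    return error_ratio / budget


def should_page(long_window_errors: float, short_window_errors: float,
                slo: float = 0.999, threshold: float = 14.4) -> bool:
    """Page only when BOTH windows burn faster than the threshold.

    The long window (e.g. 1h) filters out brief blips; the short window
    (e.g. 5m) lets the alert clear quickly once the problem is fixed.
    """
    return (burn_rate(long_window_errors, slo) >= threshold
            and burn_rate(short_window_errors, slo) >= threshold)


# 2% errors over the last hour and 3% over the last 5 minutes: ~20x and ~30x burn.
print(should_page(0.02, 0.03))   # True
# Same hour, but the last 5 minutes recovered to 0.1%: ~1x burn, no page.
print(should_page(0.02, 0.001))  # False
```

At a 14.4x burn rate, a 99.9% SLO loses 2% of a 30-day error budget in one hour, which is why that pairing is a common paging threshold.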
Platform Engineering Playbook Podcast
Security Ops Specialty: The Underrated Skill Every Platform Engineer Needs in 2026
Platform engineers who understand security operations—secrets management, vulnerability scanning, and compliance automation—are commanding premium salaries in 2026. This episode breaks down the security ops specialty: what it includes, why organizations are desperate for it, and how to build these skills alongside your existing platform engineering expertise. In this episode: • Security ops specialty encompasses secrets management, vulnerability scanning, policy-as-code, and compliance automation • Organizations are struggling to find platform engineers with security depth—creating a supply-demand gap • The 2025 State of Secrets report shows 70% of organizations experienced a secrets-related incident • Key tools include HashiCorp Vault, Trivy, OPA/Gat...
2025-12-31
19 min
Platform Engineering Playbook Podcast
Agentic AI Foundation - MCP and the Future of AI-Native Platform Engineering
The Linux Foundation announced the Agentic AI Foundation (AAIF) on December 9, 2025, bringing together AWS, Anthropic, Google, Microsoft, OpenAI, Block, Cloudflare, and Bloomberg. This episode breaks down MCP (Model Context Protocol) - the "HTTP for AI" with 97M+ monthly downloads. 📰 NEWS: Docker hardened images now free, MongoBleed CVE patch alert, Cloudflare "Fail Small" resilience plan, DORA metrics with Process Behavior Charts 🎯 Key Topics: • What AAIF and MCP mean for platform teams • MCP architecture: Hosts, Clients, and Servers • The N×M to N+M integration simplification • Security: OAuth flows, permission scopes, audit logging • Practical...
2025-12-30
14 min
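The N×M to N+M simplification is plain arithmetic. Without a shared protocol, every AI host needs a bespoke connector for every tool. With MCP, each host implements client support once and each tool ships one MCP server. A quick Python sketch; the host and tool counts are hypothetical.

```python
def point_to_point_integrations(hosts: int, tools: int) -> int:
    # Every host needs its own custom connector for every tool: N x M.
    return hosts * tools


def protocol_integrations(hosts: int, tools: int) -> int:
    # Each host implements MCP client support once; each tool exposes one
    # MCP server: N + M.
    return hosts + tools


# Hypothetical platform: 5 AI hosts (IDE assistant, chat UI, agents, ...) and 40 tools.
hosts, tools = 5, 40
print(point_to_point_integrations(hosts, tools))  # 200
print(protocol_integrations(hosts, tools))        # 45
```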
Platform Engineering Playbook Podcast
FinOps 2026 for Platform Engineers: The Complete Skills Guide
FinOps is becoming an essential skill for platform engineers in 2026. This episode provides a complete guide to the skills, certifications, and tools you need to add cloud cost management to your platform engineering toolkit. 📰 News Segment: • GPG.fail documents 14 critical GnuPG vulnerabilities - check your signing tools • MongoBleed CVE-2025-14847: Critical MongoDB exploit - patch immediately • The Dangers of SSL Certificates: Catastrophic failure modes in automation • Google Multi-Cluster Orchestrator: Cross-region K8s management (KubeCon 2025) • GPG cleartext signature parsing vulnerabilities found 💡 Key Takeaways: • Platform teams own 70%+ of cloud spending decisions • FinOps + Pla...
2025-12-29
16 min
Platform Engineering Playbook Podcast
Platform Engineering Salary Report 2026: Skills That Pay
Platform engineers are commanding $172K-$207K in 2026, a 13-27% premium over DevOps roles. This episode breaks down salary benchmarks from Dice, Motion Recruitment, and Levels.fyi, revealing which skills are S-tier ($200K+) and which are table stakes. We cover: - Platform Engineer vs DevOps salary gap (13-27% premium) - S-tier skills: LLM/GenAI ($195K-$312K), Platform Engineering, DevSecOps, MLOps - A-tier skills: Kubernetes + CKA, Go/Golang, FinOps, OpenTelemetry - Entry-level hiring crisis (-25% to -50% at major tech) - Geographic salary shifts: Atlanta +13.9%, Silicon Valley -7.3% - Top certification ROI: CKA...
2025-12-28
17 min
Platform Engineering Playbook Podcast
Platform Engineering 2026 Predictions Roundup (Platform Engineering 2026 Look Forward Series - Part 5/5)
The series finale of our five-part Platform Engineering 2026 Look Forward Series. We synthesize everything from agentic AI operations, mainstream adoption, developer experience metrics, and boring Kubernetes into ten concrete predictions for 2026. Learn what to invest in versus ignore, and discover our 2026 platform engineering thesis. In this episode: - High confidence predictions: IDP market consolidates into 3 tiers, AI-assisted operations becomes table stakes, policy-as-code becomes table stakes - Medium confidence predictions: Talent gap peaks H1 2026 then stabilizes, "Platform team of one" becomes technically viable - INVEST IN: Developer experience measurement, self-service capabilities, golden paths, AI-assisted incident...
2025-12-27
16 min
Platform Engineering Playbook Podcast
Kubernetes Enters the Boring Era (Platform Engineering 2026 Look Forward Series - Part 4/5)
The best thing happening to Kubernetes in 2026 is that it's becoming boring. After a decade of explosive innovation, Kubernetes is entering its "mature infrastructure" phase - stable, predictable, and increasingly invisible. Like Linux and PostgreSQL before it, boring Kubernetes enables platform teams to build abstractions without worrying about breaking changes. Part of the Platform Engineering 2026 Look Forward Series. In this episode: - Boring infrastructure is mature infrastructure - Linux and PostgreSQL became boring, then conquered the world - K8s 1.32-1.35 pattern: incremental stability, small refinements, no paradigm shifts - Innovation is moving up...
2025-12-26
14 min
Platform Engineering Playbook Podcast
Developer Experience Metrics Beyond DORA (Platform Engineering 2026 Look Forward Series - Part 3/5)
DORA metrics revolutionized how we measure DevOps performance, but they have a critical blind spot: they tell you how your delivery pipeline is performing, but not how your people are doing. This episode explores the SPACE framework, DX Core 4, cognitive load measurement, and the HEART framework for platform teams. Part of the Platform Engineering 2026 Look Forward Series. In this episode: - DORA tells you the what, but not the how or the at-what-cost - teams can hit every DORA metric while engineers burn out - SPACE framework: Satisfaction, Performance, Activity, Communication, and Efficiency...
2025-12-24
13 min
Platform Engineering Playbook Podcast
Platform Engineering Goes Mainstream in 2026 (Platform Engineering 2026 Look Forward Series - Part 2/5)
Episode 2 of our 5-part "Platform Engineering 2026 Look Forward Series" examines the macro trend: platform engineering crossing the chasm to mainstream adoption. Gartner predicts 80% of software engineering organizations will have platform teams by 2026. The CNPE certification launched at KubeCon 2025. But there's a 56% talent gap and nearly half of initiatives run on under $1M annually. We address the "DevOps rebranding" debate with a 5-question litmus test: 1. Do you have internal customers (developers)? 2. Do you measure developer satisfaction? 3. Do you have a product roadmap? 4. Can developers self-serve without tickets? 5. Do you deprecate platform...
2025-12-23
16 min
Platform Engineering Playbook Podcast
Agentic AI Transforms Platform Operations in 2026 (Platform Engineering 2026 Look Forward Series - Part 1/5)
Episode 1 of our 5-part "Platform Engineering 2026 Look Forward Series" tackles the hottest debate in platform engineering: will AI agents replace us or amplify us? AWS Frontier Agents can reason across 30+ steps. The MLOps market hits $129 billion by 2028. Netflix AI triage cuts MTTR by 40%. But where are the hard limits? We introduce the 60/30/10 Framework: - 60% Delegate: Log analysis, runbook execution, cost optimization - 30% Augment: Incident response, capacity planning (AI suggests, human confirms) - 10% Guard: Architecture decisions, security posture, novel failures The key insight: the 20% AI can't do is 80% of the value.
2025-12-22
21 min
Platform Engineering Playbook Podcast
CNPE (Certified Cloud Native Platform Engineer) Certification Study Guide
The CNPE (Certified Cloud Native Platform Engineer) exam launched November 11, 2025 at KubeCon Atlanta, becoming the first hands-on platform engineering certification in five years. This deep dive covers exam format, all five domains, and a complete study guide. Key Points: • CNPE is hands-on: 17 tasks in 2 hours, 64% pass score • Five domains: GitOps/CD (25%), Platform APIs (25%), Observability (20%), Architecture (15%), Security (15%) • BACK stack: Backstage, Argo CD, Crossplane, Kyverno • Golden Kubestronaut requires CNPE after March 2026 • Career impact: Platform engineer salaries $160K-$220K Resources: • Episode page: https://platformengineering.org/podcasts/00066-cnpe-certification-study-guide • CNPE Exam: https://training.linuxfoundatio...
2025-12-21
18 min
Platform Engineering Playbook Podcast
Kubernetes 1.35 Timbernetes Deep Dive: Breaking Changes, In-Place Resize GA, Gang Scheduling
Kubernetes 1.35 "Timbernetes" dropped on December 17, 2025, fundamentally changing how we operate clusters. This deep dive covers the 60 enhancements, 3 breaking changes that will bite you if unprepared, and in-place pod resize graduating to GA after six years of development. What You'll Learn: • Breaking Changes: cgroup v1 REMOVED (not deprecated), containerd 1.x EOL, IPVS deprecated • In-Place Pod Resize GA: Resize CPU/memory without pod restart - 6 years from KEP to stable • Pod Certificates Beta: Native kubelet-managed mTLS for zero-trust pod-to-pod auth • Gang Scheduling Alpha: Native all-or-nothing scheduling for AI/ML distributed training • Alpha Features: Node Declared F...
2025-12-20
19 min
Platform Engineering Playbook Podcast
Terraform Stacks + Native Monorepo Support: HashiCorp's Answer to IaC Complexity
No more copy-paste configs. No more manual state management. Terraform just went component-based. HashiCorp released native monorepo support and Terraform Stacks to GA on September 25, 2025. This is the biggest architectural shift since Terraform modules. Instead of directory-per-environment with duplicate configurations, you define components once and deploy multiple times with isolated state. We explain components (lifecycle-aware resource groups in .tfstack.hcl files), deployments (isolated instances with separate state), orchestration rules (context-aware automated approvals), linked stacks (declarative cross-stack dependencies), migration paths from Terragrunt, and when platform teams should adopt. NEWS SEGMENT: • Terraform Stacks + Monorepo (GA...
2025-12-20
17 min
Platform Engineering Playbook Podcast
95% Fewer CVEs, $0 Cost: Docker Just Open-Sourced Enterprise Security
Supply chain attacks cost $60 billion in 2025. Docker just made the solution free. On December 17, Docker released 1,000+ hardened container images under Apache 2.0—previously a paid offering. Independent penetration testing by SRLabs confirmed 95% CVE reduction and found NO root escapes or container breakouts. These images use distroless runtime: no shell, no package manager, no attack surface. We break down how distroless actually works (why removing /bin/sh matters), SLSA Level 3 cryptographic provenance, SBOM/VEX for killing alert fatigue, multi-stage build migration patterns, debugging without a shell (kubectl debug), and how Docker compares to Chainguard Wolfi, Google distroless, an...
2025-12-19
18 min
Platform Engineering Playbook Podcast
Kubernetes 1.35 "Timbernetes" - The End of the Pod Restart Era
Kubernetes 1.35 is here, and it changes everything about pod lifecycle management. In this episode, we break down the release that finally lets you scale pods without restarting them. In This Episode: - In-Place Pod Vertical Scaling goes GA - adjust CPU/memory without pod restarts - Breaking changes: cgroup v1 removed, containerd 1.x EOL, IPVS deprecated - Pod Certificates (beta) for native workload identity without cert-manager - 60 enhancements: what matters for platform teams - Practical upgrade checklist and timing guidance News Segment: - Docker makes 1,000+ hardened container images free (95...
2025-12-18
15 min
Platform Engineering Playbook Podcast
40,000x Fewer Deployment Failures: How Netflix Adopted Temporal
Netflix reduced their deployment failures by 40,000x using Temporal. In this episode, we break down how they achieved this remarkable improvement and what it means for your platform engineering practice. In This Episode: - Netflix's deployment reliability problem: 4% failure rate from transient cloud operations - What is durable execution? Write code as if failures don't exist - Temporal vs AWS Step Functions vs Apache Airflow vs Cadence comparison - Netflix's Spinnaker/Clouddriver implementation with 2-hour fix-forward window - When Temporal is (and isn't) the right choice for your organization Key...
2025-12-17
17 min
Platform Engineering Playbook Podcast
Kubernetes: Helm vs Crossplane vs kro (Honest Comparison)
48% of Kubernetes users struggle with tool choice. That's nearly half of us paralyzed by options. So when AWS adopted kro alongside Argo CD, we had to ask: is this the Goldilocks solution we've been waiting for? In this episode, Jordan and Alex tackle the composition tool landscape with an honest decision framework. We dive deep into CEL expressions, resource graph mechanics, and GitOps integration. We also give Viktor Farcic's criticism a fair hearing, and explain exactly when kro makes sense - and when it doesn't. News Segment: • Shai-Hulud npm supply chain attack postmortem - 500+ pa...
2025-12-16
22 min
Platform Engineering Playbook Podcast
Platform Engineering 2025 Year in Review
2025 was the year platform engineering grew up—and got a reality check. AI entered infrastructure in ways we couldn't ignore. Industry consensus finally emerged on what platforms should actually do. And Cloudflare went down six times to remind us that concentration risk isn't just theoretical. In this special year-in-review episode, we look back at the ten stories that defined platform engineering in 2025: ✅ AI-native Kubernetes arrived (DRA GA, AI Conformance v1.0) ✅ Platform engineering reached consensus—but 70% still fail ✅ Infrastructure concentration risk became undeniable (AWS + Cloudflare) ✅ IngressNightmare exposed 43% of cloud environments ✅ Open source sustainability...
2025-12-15
19 min
Platform Engineering Playbook Podcast
Okta's GitOps Journey - Scaling ArgoCD from 12 to 1,000 Clusters
In five years, Okta scaled Auth0's private cloud from 12 to 1,000+ Kubernetes clusters using ArgoCD. At KubeCon 2025, engineers Jérémy Albuixech and Kahou Lei shared their hard-won lessons. This episode breaks down the challenges, solutions, and practical wisdom for scaling GitOps to enterprise levels. Full episode page: https://platformengineeringplaybook.com/podcasts/00058-okta-gitops-argocd-1000-clusters In this episode, we cover: - The 83x scaling journey: from 12 clusters in 2020 to 1,000+ in 2025 - Five major challenges at scale: controller degradation, centralized bottlenecks, application explosion, global latency, observability gaps - Five key solutions: controller sharding, ArgoCD Ag...
2025-12-14
15 min
Platform Engineering Playbook Podcast
Platform Engineering Team Structures That Work
Ninety percent of organizations now have platform teams, but most just renamed their ops team and expected different results. This episode breaks down the team sizes, reporting structures, and interaction patterns backed by DORA 2025 data that separate successful platform teams from glorified ticket handlers. Full episode page: https://platformengineeringplaybook.com/podcasts/00057-platform-engineering-team-structures In this episode, we cover: - DORA 2025 shows 90% of orgs have platforms, 76% have dedicated teams—when done right, 8% individual productivity boost and 10% team productivity boost - Optimal team size is 6-12 people (Spotify squads, Microsoft 5-9)—small enough for ownership, large enou...
2025-12-13
17 min
Platform Engineering Playbook Podcast
CDKTF Deprecated - The End of HashiCorp's Programmatic IaC Experiment
HashiCorp (now IBM) has officially archived the CDK for Terraform project, ending a five-year experiment in programmatic infrastructure-as-code. Full episode page: https://platformengineeringplaybook.com/podcasts/00056-cdktf-deprecated-iac-migration In this episode, we break down: - Why CDKTF failed to find product-market fit (243K downloads vs Pulumi's 1.1M) - The four key factors behind the deprecation: Pulumi's head start, JSII complexity, HCL "good enough", IBM acquisition timing - Community reaction and the "rug pull" sentiment - Migration paths: HCL (cdktf synth --hcl), Pulumi, OpenTofu, or AWS CDK - What platform engineers should learn...
2025-12-12
14 min
Platform Engineering Playbook Podcast
stern v1.33.1 - Listen to the Docs with AudioDocs
🎧 AUDIODOCS: Official documentation of popular open-source projects, adapted and narrated for audio. Learn while commuting, exercising, or doing chores. Stop juggling terminal windows to tail Kubernetes logs. stern lets you tail multiple pods and containers simultaneously with regex queries, auto-detection of new pods, and color-coded output. This episode covers everything from basic usage to advanced templates and filtering. WHAT YOU'LL LEARN: 00:00 - Introduction & The Problem stern Solves 01:30 - Basic Usage: Regex and Resource Queries 03:00 - Multi-Container Tailing & Filtering 04:30 - Namespace, Label, and Node Filtering 06:00 - Output Formatting & Custom Templates 07:30 - T...
2025-12-11
15 min
Platform Engineering Playbook Podcast
CoreDNS v1.13.1 - Listen to the Docs with AudioDocs
🎧 AUDIODOCS: Official documentation of popular open-source projects, adapted and narrated for audio. Learn while commuting, exercising, or doing chores. Master CoreDNS, the default DNS server for Kubernetes clusters. This 72-minute episode covers the complete v1.13.1 documentation - from plugin architecture to production configuration. Every time a pod looks up a Service by name, CoreDNS handles that resolution. If you're debugging DNS issues or optimizing cluster performance, this comprehensive audio guide has you covered. WHAT YOU'LL LEARN: 00:00 - Introduction & Overview 02:30 - Project Context: CNCF Gra...
2025-12-11
1h 13
Platform Engineering Playbook Podcast
kubectx & kubens v0.9.5 - Listen to the Docs with AudioDocs
🎧 AUDIODOCS: Official documentation of popular open-source projects, adapted and narrated for audio. Learn while commuting, exercising, or doing chores. Stop typing long kubectl config commands! kubectx and kubens are essential CLI tools that let you switch Kubernetes contexts and namespaces instantly with tab completion and fuzzy search. This 10-minute episode covers everything you need to know about v0.9.5 - from installation to power-user workflows. If you work with multiple Kubernetes clusters, these tools will save you hours every week. WHAT YOU'LL LEARN: 00:00 - The Problem: Why kubectl Context Switching is Painful 01:30 - k...
2025-12-11
10 min
Platform Engineering Playbook Podcast
AWS re:Invent 2025 Recap 4/4 - Data & AI Wrap-Up
Part 4 of 4 in our AWS re:Invent 2025 series (finale). The data and AI services that tie everything together. S3 Tables with Apache Iceberg hits GA with Intelligent-Tiering and cross-region replication. Aurora DSQL delivers distributed SQL with GPS atomic clocks. S3 Vectors supports 2 billion vectors at 90% lower cost. Clean Rooms ML enables privacy-enhanced synthetic datasets. Plus a comprehensive wrap-up connecting 50+ announcements across all four episodes. News: Envoy CVE-2025-0913, Rust in Linux kernel permanent, Let's Encrypt 10 years. In this episode: - S3 Tables GA with Intelligent-Tiering (80% cost savings) and automatic cross-region replication for Iceberg tables ...
2025-12-11
24 min
Platform Engineering Playbook Podcast
AWS re:Invent 2025 Recap Part 3/4 - EKS & Cloud Operations
Part 3 of our AWS re:Invent 2025 series. AWS transforms Kubernetes into an AI infrastructure platform with massive scale and AI-native operations. In this episode: - EKS Ultra Scale: 100,000 nodes per cluster (vs 15K GKE, 5K AKS)—1.6 million Trainium accelerators or 800K GPUs in a single cluster - AWS replaced etcd's Raft consensus with their internal "journal" system and moved to in-memory storage for 500 pods/sec at 100K scale - Anthropic using EKS Ultra Scale for Claude training, improving latency KPIs from 35% to 90%+ - EKS Capabilities: Fully managed Argo CD, AWS Controllers for Kubernetes (200+ CR...
2025-12-10
17 min
Platform Engineering Playbook Podcast
AWS re:Invent 2025 Part 2/4 - Infrastructure & Developer Experience
AWS re:Invent 2025 Series (Part 2 of 4). AWS announces Graviton5 with 192 cores (2x previous gen) and 40% better price-performance vs x86. Trainium 3 delivers 4.4x performance at 50% lower cost, with NeuronLink eliminating 50% network overhead. Lambda Durable Functions enable year-long workflows. Werner Vogels introduces the "Renaissance Developer" framework for the AI era. Plus: BellSoft's hardened Java images cut CVEs by 95%, GitHub Actions package management security gaps exposed, and Proxmox releases VMware escape hatch. Links & Resources: - Full episode page: https://platformengineering.org/podcasts/00050-aws-reinvent-2025-infrastructure-developer-experience - BellSoft Hardened Images: https://www.infoq.com/news/2025/12/bellsoft-hardened-images/ ...
2025-12-09
14 min
Platform Engineering Playbook Podcast
AWS re:Invent 2025 Part 1/4 - The Agentic AI Revolution
AWS announces autonomous AI agents that can work for days without human intervention. The DevOps Agent is an always-on incident responder. The Security Agent understands your application architecture. And Kiro is already used by hundreds of thousands of developers. This is part 1 of our 4-part AWS re:Invent 2025 coverage series. KEY TOPICS: • Frontier Agents: DevOps Agent, Security Agent, and Kiro • DevOps Agent: 24/7 incident response with human-in-the-loop approval • Security Agent: Context-aware security from design through deployment • Kiro: GA autonomous developer agent used internally at Amazon • Bedrock AgentCore: Policy controls, memory, and 13 evaluation...
2025-12-08
17 min
Platform Engineering Playbook Podcast
Developer Experience Metrics Beyond DORA
DORA metrics revolutionized how we measure DevOps performance, but are we missing the bigger picture? This episode explains DORA from the ground up—the four key metrics, how they're measured, and why elite teams deploy more AND fail less. Then we explore what DORA misses: developer satisfaction, cognitive load, and flow state. From SPACE to DevEx to DX Core 4, discover the frameworks changing how we measure developer productivity. In this episode: - DORA's Four Key Metrics: Deployment Frequency, Lead Time, Change Failure Rate, and MTTR (now Failed Deployment Recovery Time) - Elite vs Low performers: El...
2025-12-07
13 min
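A minimal Python sketch of how the four DORA metrics can be derived from deployment records. The `Deployment` record shape, its field names, and the sample data are assumptions for illustration; real pipelines pull these fields from CI/CD and incident tooling, and DORA's official definitions are more detailed than this simplification. Recovery time here is measured from the failed deployment to restoration, matching the newer "Failed Deployment Recovery Time" framing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; field names are illustrative only.
    commit_time: datetime                     # when the change was committed
    deploy_time: datetime                     # when it reached production
    failed: bool = False                      # did it cause a production failure?
    restored_time: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict:
    if not deploys:
        raise ValueError("no deployments in period")
    failures = [d for d in deploys if d.failed]
    recoveries = [d.restored_time - d.deploy_time for d in failures if d.restored_time]
    return {
        # Deployment Frequency
        "deploys_per_day": len(deploys) / period_days,
        # Lead Time for Changes (commit to production)
        "median_lead_time": median(d.deploy_time - d.commit_time for d in deploys),
        # Change Failure Rate
        "change_failure_rate": len(failures) / len(deploys),
        # Failed Deployment Recovery Time (formerly MTTR)
        "median_recovery_time": median(recoveries) if recoveries else None,
    }


t0 = datetime(2025, 12, 1, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=4), failed=True,
               restored_time=t0 + timedelta(hours=4, minutes=30)),
    Deployment(t0, t0 + timedelta(hours=6)),
    Deployment(t0, t0 + timedelta(hours=8)),
]
m = dora_metrics(deploys, period_days=2)
print(m["deploys_per_day"])       # 2.0
print(m["change_failure_rate"])   # 0.25
print(m["median_lead_time"])      # 5:00:00
print(m["median_recovery_time"])  # 0:30:00
```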
Platform Engineering Playbook Podcast
Cloudflare's Trust Crisis - December 2025 Outage and the Human Cost
Three weeks after their worst outage since 2019, Cloudflare went down again. On December 5, 2025, a Lua code bug took down 28% of HTTP traffic for 25 minutes - the sixth major outage of 2025. Beyond the technical postmortem, this episode examines the pattern of repeated failures, community reactions, and the often-overlooked human cost to on-call engineers. 📰 News Segment Links: • KubeCon Survey: How Platform Teams Are Adopting AI and IDPs https://thenewstack.io/kubecon-survey-how-platform-teams-are-adopting-ai-and-idps/ • GitHub Actions workflow dispatch now supports 25 inputs https://github.blog/changelog/2025-12-04-actions-workflow-dispatch-workflows-now-support-25-inputs • Hybrid Cloud-Native Networking in Enterprise - Louis Ryan (Google)...
2025-12-06
11 min
Platform Engineering Playbook Podcast
Cloud Cost Quick Wins for Year-End
Global cloud spend hits $720 billion in 2025—and organizations waste 20-30% on unused resources. Year-end is the perfect time to show savings before budgets reset. In this episode, Jordan and Alex deliver six actionable quick wins you can implement THIS WEEK: 💰 The Six Wins: 1️⃣ Scheduling non-prod environments → 70% savings 2️⃣ Right-sizing oversized instances → 25-40% per instance 3️⃣ Reserved Instances/Savings Plans → Up to 72% discount 4️⃣ Spot instances for CI/CD → 60-90% savings 5️⃣ Storage tiering → Move cold data to Glacier 6️⃣ Zombie resource hunt → $500-2K/month per account 📋 Monday Morning Checklist: • Run cloud cost analyzer (30 min) • Find top 5...
2025-12-05
12 min
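The scheduling win is mostly arithmetic: an instance that only runs during working hours stops accruing compute charges for the rest of the week's 168 hours. A minimal Python sketch, assuming purely hourly compute billing (storage and anything else billed while an instance is stopped is ignored) and hypothetical schedules.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def scheduling_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of compute-hour cost saved vs. running 24/7 (hourly billing assumed)."""
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


# 10h/day on weekdays = 50 of 168 hours running.
print(f"{scheduling_savings(10, 5):.0%}")  # 70%
# 12h/day on weekdays = 60 of 168 hours running.
print(f"{scheduling_savings(12, 5):.0%}")  # 64%
```

A weekday, office-hours schedule lands close to the 70% figure; longer windows or weekend usage shrink the savings accordingly.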
Platform Engineering Playbook Podcast
Platform Engineering vs DevOps vs SRE - The Identity Crisis
Platform Engineer roles pay 20% more than DevOps Engineer roles, but job descriptions are 90% identical. Is Platform Engineering just DevOps with better marketing? In this episode, we cut through the confusion with origin stories, philosophy comparisons, and practical career advice. Key insights: • Platform Engineer job postings grew 40% YoY while DevOps postings declined 15% • DevOps (2009) was a movement—never meant to be a job title • SRE (2003/2016) introduced Google's 50% engineering time rule • Platform Engineering (2018-2020) brought product thinking to internal tools • The 20% salary premium is for product thinking, not the title Decision framework: S...
2025-12-04
16 min
Platform Engineering Playbook Podcast
Platform Engineering Certification Tier List 2025
Are certifications worth it? The answer is: it depends. And that's precisely the problem. In this episode, Jordan and Alex rank 25+ certifications using a data-driven 60/40 framework (60% skill-building, 40% market signal). 🎯 The Certification Dilemma: • Platform engineers span Kubernetes, cloud, observability, security, and developer experience • No single certification captures that breadth • Most certifications prove you can cram for exams, not solve production problems 📊 Key Statistics: • Platform engineers earn $172K vs DevOps $152K (13% premium) • CKA appears in 45,000+ job postings globally • Average certification investment: $800-1,200/year • CKA pass rate: 66% (hands-on exam, production-relevant...
2025-12-03
25 min
Platform Engineering Playbook Podcast
Kubernetes AI Conformance - The End of AI Infrastructure Chaos
The Wild West of AI infrastructure just ended. CNCF launched the Certified Kubernetes AI Conformance Program at KubeCon Atlanta on November 11, 2025. In this episode, Jordan and Alex break down: 🎯 The Problem AI Teams Faced: • GPU scheduling worked differently on GKE vs EKS vs OpenShift • Training on one platform, deploying on another = rewriting code • GPU utilization stuck at 45-60% without standardization • 82% of organizations building custom AI, 58% using Kubernetes ⚡ The 5 Core Certification Requirements: • Dynamic Resource Allocation (DRA) - request GPUs with specific VRAM, interconnect requirements • Intelligent Autoscaling - cluster and pod scaling b...
2025-12-02
14 min
Platform Engineering Playbook Podcast
Helm 4 - The Definitive Guide to the Biggest Update in 6 Years
Helm 4.0 dropped at KubeCon Atlanta 2025, marking the biggest update in 6 years. Server-Side Apply finally ends the GitOps ownership wars. WASM plugins bring sandboxed security. But what breaks? This is the definitive guide covering SSA deep-dive, migration timeline, and the full breaking changes analysis. In this episode: - Server-Side Apply (SSA) replaces three-way merge - field ownership tracked at API server level via managedFields - SSA delivers 40-60% faster deployments by reducing API calls (1 PATCH vs 2+ GET/PATCH per resource) - WASM plugins via Extism runtime are optional but recommended - existing Go binaries and...
2025-12-01
23 min
Platform Engineering Playbook Podcast
CNPE Certification Guide - The First Platform Engineering Credential
CNCF just launched the first-ever hands-on platform engineering certification at KubeCon Atlanta 2025. But with beta testers reporting 29% scores, is CNPE worth pursuing? In this episode, Jordan and Alex break down everything you need to know: 🎯 What CNPE Tests: • GitOps & Continuous Delivery (25%) • Platform APIs & Self-Service (25%) • Observability & Operations (20%) • Platform Architecture (15%) • Security & Policy Enforcement (15%) 📊 Career Impact: • Platform engineers earn $219K average (US) • 20% higher than DevOps engineers • Second most popular K8s role at 11.47% of job postings 🛤️ Three Certification Paths: • Traditional: CKA → CKS → CNPA → CNPE • Fast-track: CNPA → CNPE • Full co...
2025-11-30
14 min
Platform Engineering Playbook Podcast
10 Platform Engineering Anti-Patterns That Kill Developer Productivity
DORA 2024 found organizations with platform teams saw throughput decrease by 8% and stability decrease by 14%. Wait—isn't platform engineering supposed to help? In this episode, Jordan and Alex unpack the 10 anti-patterns sabotaging platform engineering initiatives: ORGANIZATIONAL ANTI-PATTERNS: 1. Ticket Ops - The bottleneck factory where developers wait a week for tasks that should take minutes 2. Ivory Tower Platform - Teams disconnected from developer reality creating standards no one follows 3. Platform as Bucket - When platform scope grows 3x without corresponding team growth 4. Mandatory Adoption - Forcing usage hides resistance and breeds resentment ...
2025-11-29
12 min
Platform Engineering Playbook Podcast
Black Friday War Stories: Lessons from E-Commerce's Worst Days
Why do major retailers with unlimited budgets still crash on Black Friday? This episode dives into the graveyard of e-commerce outages, from J.Crew's $775,000 five-hour crash to the AWS typo that cost $150 million. In this Black Friday special episode, we examine: 📊 THE HALL OF FAME CRASHES • J.Crew 2018: 323,000 shoppers affected, $775,000 lost in 5 hours • Walmart 2018: $9 million lost before Black Friday even started • Best Buy 2014: Infrastructure optimized for desktop, but 78% of traffic arrived on mobile • Cloudflare 2024: 99.3% of Shopify stores frozen (6M+ domains) 💥 THE FAMOUS NON-BLACK-FRIDAY DISASTERS • AWS S3 2017: One typo took down half the internet for 4+ ho...
2025-11-28
11 min
Platform Engineering Playbook Podcast
Giving Thanks to Your Dependencies: A Platform Engineer's Gratitude Guide
This Thanksgiving, let's talk about the people you've never thanked. 60% of open source maintainers are unpaid. 60% have left or considered leaving. Your infrastructure runs on their free time. In this episode: - Gratitude tools: npx thanks, npm fund, cargo-thanks, thanks-stars - Happiness Packets: Send anonymous thank-you notes to developers - Beyond stars: Why specific use case emails matter more than generic thanks - Company-level: Open Source Pledge ($2K/dev/year), GitHub Sponsors - Your 5-minute Thanksgiving challenge Perfect for platform engineers, developers, and engineering leadership who want to support the...
2025-11-27
08 min
Platform Engineering Playbook Podcast
KubeCon Atlanta 2025 Part 3: Community at 10 Years - The Sustainability Question
CNCF celebrates 10 years with 300,000 contributors and 230+ projects—but the hallway track told a different story. 60% of maintainers unpaid. 60% have left or considered leaving. The XZ Utils backdoor showed what happens when isolated maintainers burn out. Han Kang's passing reminds us of the human cost behind the code. In this episode: - Technical breakout sessions: CiliumCon (TikTok IPv6, 60K node clusters), in-toto graduation, Gateway API convergence, OpenTelemetry eBPF - Open Source Pledge: Antithesis $110K, Convex $100K - real cash to maintainers - Kat Cosgrove survival strategies: "When you're an open source maintainer, you don't get to...
2025-11-26
14 min
Platform Engineering Playbook Podcast
KubeCon Atlanta 2025 Part 2: Platform Engineering Consensus and Community Reality Check
After years of "what even IS platform engineering" debates, KubeCon 2025 delivered consensus: three non-negotiable principles, real-world adoption at Intuit/Bloomberg/ByteDance scale, and the honest truth about maintainer burnout. Kat Cosgrove's "ready to abandon ship" quote reveals the human cost of building the infrastructure everyone depends on. In this episode: - Three platform principles emerged: API-first self-service, business relevance (not just tech metrics), and managed service approach (not templates) - The "puppy for Christmas" anti-pattern explains the 70% platform team failure rate - templates without ongoing operational support - Intuit migrated Mailchimp's 11M users and 700M...
2025-11-25
17 min
Platform Engineering Playbook Podcast
KubeCon Atlanta 2025 Part 1: AI Goes Native and the 30K Core Lesson
Google donates a GPU driver live on stage. OpenAI saves $2.16M/month with one line of code. Kubernetes rollback finally works after 10 years. What changed at KubeCon Atlanta 2025 that proves Kubernetes isn't adapting to AI but is being rebuilt for it? This is Part 1 of our three-part deep dive into KubeCon Atlanta 2025 (November 10-13). Over three episodes, we're covering the CNCF's 10-year anniversary, the announcements reshaping platform engineering, and the honest conversations about ecosystem sustainability. Key Topics Covered: • Dynamic Resource Allocation (DRA) reaches GA in Kubernetes 1.34 - prevents 10-40% GPU performance loss from NUMA misalignment ($200K/da...
2025-11-24
19 min
Platform Engineering Playbook Podcast
The $4,350/Month GPU Waste Problem: How Kubernetes Architecture Creates Massive Cost Inefficiency
Your H100 costs $5,000 per month, but you're only using it at 13% capacity—wasting $4,350 monthly per GPU. Analysis of 4,000+ Kubernetes clusters reveals 60-70% of GPU budgets burn on idle resources because Kubernetes treats GPUs as atomic, non-shareable resources. Discover why this architectural decision creates massive waste, and the five-layer optimization framework (MIG, time-slicing, VPA, Spot, regional arbitrage) that recovers 75-93% of lost capacity in 90 days. 🔗 Full episode page: https://platformengineeringplaybook.com/podcasts/00034-kubernetes-gpu-cost-waste-finops 📝 See a mistake or have insights to add? This podcast is community-driven - open a PR on GitHub! Keywords: kubernetes gpu, gpu cost...
2025-11-23
27 min
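The headline number in the GPU episode is straightforward arithmetic: the idle cost is the unused fraction of the monthly price. A minimal Python sketch using the episode's figures ($5,000/month, 13% utilization); the 8-GPU fleet in the last line is a hypothetical input, not a figure from the episode.

```python
def monthly_gpu_waste(monthly_cost: float, utilization: float) -> float:
    """Dollars per month spent on idle capacity for one GPU."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be a fraction between 0 and 1")
    return monthly_cost * (1.0 - utilization)


waste = monthly_gpu_waste(5_000, 0.13)
print(f"${waste:,.0f} wasted per GPU per month")  # $4,350 wasted per GPU per month

# Hypothetical fleet of 8 GPUs running at the same utilization
print(f"${8 * waste:,.0f} per month across 8 GPUs")  # $34,800 per month across 8 GPUs
```

Treating utilization as a fraction keeps the formula honest: any optimization layer (MIG, time-slicing, and so on) only reduces waste to the extent that it raises that fraction.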
Platform Engineering Playbook Podcast
Service Mesh Showdown: Why User-Space Beat eBPF
Kernel-level eBPF should beat user-space proxies, but Istio Ambient delivers 8% mTLS overhead while Cilium shows 99%. Academic benchmarks reveal why architecture boundaries matter more than execution location, and what that means for your service mesh choice in 2025. In this episode: - Istio Ambient (user-space) achieves 8% mTLS overhead vs Cilium (kernel eBPF) at 99%, a counterintuitive result explained by L7 processing boundaries that require kernel/user-space transitions - 50,000-pod stability test shows Cilium's distributed control plane crashed the API server under churn while Istio's centralized control plane handled it (20% per-core efficiency and 56% total throughput advantage) - Decision framework: Ambient for 1,000+ nodes with mixed...
2025-11-22
20 min
Platform Engineering Playbook Podcast
The Terraform vs OpenTofu Debate - Why "Just Switch" Is Bad Advice
HashiCorp's license change and IBM's $6.4B acquisition created the "you must migrate" narrative—but 70% of teams using Terraform in-house aren't legally affected. Jordan and Alex challenge the binary thinking with Fidelity's 50,000 state file migration case study, a three-factor decision framework, and the truth nobody talks about: migration is 90% organizational change management, not technology. In this episode: - 70% of teams using Terraform in-house are unaffected by BSL license restrictions, yet face strategic vendor lock-in risk with IBM's $6.4B acquisition - Fidelity migrated 50,000 state files managing 4M resources in 2 quarters—technical migration is trivial, organizational change management is t...
2025-11-21
17 min
Platform Engineering Playbook Podcast
Agentic DevOps: GitHub Agent HQ and the Autonomous Pipeline Revolution
GitHub Universe 2025 announced Agent HQ—mission control for orchestrating AI agents from OpenAI, Anthropic, Google, and more. Azure SRE Agent saved Microsoft 20,000+ engineering hours. But 80% of companies report agents executing unintended actions, and only 44% have agent-specific security policies. Jordan and Alex break down what agentic DevOps actually means, the architectural shift from automation to autonomy, and the tiered adoption framework for deploying agents without creating catastrophic risk. In this episode: - GitHub Agent HQ enables multi-agent orchestration (OpenAI, Anthropic, Google, Cognition) with Enterprise Control Plane for governance - Copilot coding agent works asynchronously—spins up GitH...
2025-11-20
18 min
Platform Engineering Playbook Podcast
Cloudflare Outage November 2025: When a Rust Panic Took Down 20% of the Internet
A routine database permissions change triggered Cloudflare's worst outage since 2019, taking down ChatGPT, X, Shopify, Discord, and 20% of the internet for nearly 6 hours. Jordan and Alex dissect the technical chain reaction from ClickHouse metadata exposure to a Rust panic in the FL2 proxy, examining how ~60 features became >200 and exceeded a hardcoded memory limit. The third major cloud outage in 30 days, after AWS and Azure, raises critical questions about infrastructure concentration risk and why internal configuration needs the same defensive programming as external input. Perfect for senior platform engineers, SREs, and DevOps engineers with 5+ years of experience looking to level up the...
2025-11-19
12 min
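The Cloudflare episode's closing point, treating internal configuration like untrusted input, can be illustrated with a small sketch. This is hypothetical Python, not Cloudflare's code (FL2 is written in Rust): a loader validates a generated feature file against a fixed capacity and, on failure, keeps serving the last known-good version instead of crashing. The 200 limit is an assumption drawn from the summary's ">200"; the class and function names are invented for illustration.

```python
import logging

# Fixed capacity; the value 200 is an assumption based on the episode summary.
MAX_FEATURES = 200


class FeatureConfigError(ValueError):
    """Raised when a generated feature file fails validation."""


def validate_features(features: list[str]) -> list[str]:
    """Reject configs that would overflow the fixed capacity."""
    if len(features) > MAX_FEATURES:
        raise FeatureConfigError(
            f"{len(features)} features exceeds the limit of {MAX_FEATURES}"
        )
    return list(features)


class FeatureStore:
    """Holds the active feature list and only swaps in configs that validate."""

    def __init__(self, initial: list[str]):
        self.active = validate_features(initial)

    def reload(self, candidate: list[str]) -> bool:
        try:
            self.active = validate_features(candidate)
            return True
        except FeatureConfigError as err:
            # Degrade gracefully: log the rejection and keep the last known-good config.
            logging.error("rejected feature config, keeping last known good: %s", err)
            return False


store = FeatureStore([f"feature_{i}" for i in range(60)])
accepted = store.reload([f"feature_{i}" for i in range(250)])
print(accepted, len(store.active))  # False 60
```

The contrast with a panic is the point: an oversized file becomes a logged, rejected update rather than a process crash repeated across every proxy that loads it.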
Platform Engineering Playbook Podcast
Ingress NGINX Retirement: The March 2026 Migration Deadline
The de facto standard Kubernetes ingress controller will stop receiving security patches in March 2026, and only 1-2 people have been maintaining it for years. Jordan and Alex unpack why this happened, examine the security implications of unpatched CVEs on internet-facing infrastructure, and provide a four-phase migration framework to Gateway API. Includes a controller comparison (Envoy Gateway, Cilium, Kong, Traefik, NGINX Gateway Fabric) and immediate actions for this week. Perfect for senior platform engineers, SREs, and DevOps engineers with 5+ years of experience looking to level up their platform engineering skills. Episode URL: https://platformengineeringplaybook.io/podcasts/00029-ingress-nginx-retirement
2025-11-19
12 min
Platform Engineering Playbook Podcast
OpenTelemetry eBPF Instrumentation: Zero-Code Observability Under 2% Overhead
What if you could achieve complete observability coverage—every HTTP request, database query, and gRPC call—without touching application code? Jordan and Alex investigate eBPF instrumentation for OpenTelemetry, revealing how kernel-level hooks deliver under 2% CPU overhead versus traditional APM agents' 10-50%. Discover the May 2025 inflection point, the TLS encryption challenge, and a practical framework for combining eBPF with SDK instrumentation. In this episode: - eBPF instrumentation achieves under 2% CPU overhead by observing kernel operations already happening—versus 10-50% for traditional APM agents - Grafana donated Beyla to OpenTelemetry in May 2025, making eBPF instrumentation part of the co...
2025-11-18
14 min
Platform Engineering Playbook Podcast
The Open Source Observability Showdown: When "Free" Costs $12K/Month
Prometheus is free, Grafana is free, Loki is free—yet Datadog posted $2.3B in revenue and Shopify runs a 15-person team just to manage their observability stack. We decode which open source tools (Prometheus, Loki, Tempo, VictoriaMetrics) actually deliver on their promises, which hide massive operational complexity, and when the "free" option costs more than paying a vendor. Learn the decision framework that matches observability architecture to your team's operational maturity. In this episode: - Single-cluster Prometheus costs ~5 hrs/month ($750-1500 equivalent), but multi-cluster federation jumps to 40-80 hrs/month ($6K-12K)—know your tier before comm...
2025-11-17
19 min
Platform Engineering Playbook Podcast
The Kubernetes Complexity Backlash: When Simpler Infrastructure Wins
Kubernetes commands 92% market share, yet 88% report year-over-year cost increases and 25% plan to shrink deployments. We unpack the 3-5x cost underestimation problem, the cargo cult adoption pattern, and when alternatives like Docker Swarm, Nomad, ECS, or PaaS platforms deliver better ROI. From the 200-node rule to 37signals' $10M+ five-year savings leaving AWS, this is your data-driven framework for right-sizing infrastructure decisions in 2025. 🔗 Full episode page: https://platformengineeringplaybook.com/podcasts/00026-kubernetes-complexity-backlash 📝 See a mistake or have insights to add? This podcast is community-driven - open a PR on GitHub! Summary: • 88% of Kubernetes adopters report y...
2025-11-16
15 min
Platform Engineering Playbook Podcast
SRE Reliability Principles: The 26% Problem - Error Budgets, SLOs, Platform Engineering
Only 26% of organizations actively use SLOs after a decade of Google's SRE principles being gospel. We explore why adoption is so low despite 49% saying they're more relevant than ever, which principles remain timeless (error budgets, embracing risk, blameless postmortems), and how to adapt SRE for 2025's complexity of AI/ML systems, Platform Engineering collaboration, and multi-cloud chaos. Includes practical playbooks for starting from zero, fixing ignored SLOs, and ML-specific adaptations. The key insight: it's not that SRE principles are wrong—implementation is harder than anticipated, but the philosophy remains timeless when properly adapted. In this episode: ...
2025-11-16
15 min
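Error budgets, one of the principles the SRE episode calls timeless, reduce to arithmetic: the budget is the share of the window the SLO allows to be bad. A minimal Python sketch, assuming a time-based availability SLO over a 30-day window; request-based SLOs apply the same formula to request counts instead of minutes.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed 'bad' minutes in the window for a time-based availability SLO."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction such as 0.999")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, bad_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - bad_minutes) / budget


for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} -> {error_budget_minutes(target):.1f} min per 30 days")
# 99.00% -> 432.0 min per 30 days
# 99.90% -> 43.2 min per 30 days
# 99.99% -> 4.3 min per 30 days

print(f"{budget_remaining(0.999, bad_minutes=30):.0%} of budget left")  # 31% of budget left
```

Each extra nine cuts the budget tenfold, which is why the target has to be chosen against what users actually need rather than set as high as possible.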
Platform Engineering Playbook Podcast
Internal Developer Portal Showdown 2025: Backstage vs Port vs Cortex vs OpsLevel
Your team spent 6 months implementing Backstage. Adoption? 8%. The CFO asks: "Why didn't we buy a solution?" Here's the 2025 comparison with real pricing, real timelines, and the counterintuitive truth: commercial platforms are 8-16x cheaper than "free" Backstage for most teams. OpsLevel $39/user/month delivers in 30-45 days. Port $78/month offers flexibility without coding. Cortex $65-69/month enforces standards. We break down the decision framework by team size—under 200? OpsLevel. 200-500? Port or OpsLevel. 500+? Backstage viable with dedicated platform team. The key insight: it's not open-source free vs commercial expensive—it's transparent licensing vs hidden $150K/20-developer engineering costs. In...
2025-11-14
24 min
Platform Engineering Playbook Podcast
DNS for Platform Engineering: The Silent Killer
Why does a forty-year-old protocol keep taking down billion-dollar infrastructure? The October 2025 AWS outage lasted fifteen hours because of a DNS race condition. Kubernetes defaults create 5x query amplification. We investigate how DNS really works in modern platforms—CoreDNS plugin chains, the ndots:5 trap, GSLB failover—and deliver the five-layer defensive playbook to prevent your platform from becoming the next postmortem. 🔗 Full episode page: https://platformengineeringplaybook.com/podcasts/00023-dns-platform-engineering 📝 See a mistake or have insights to add? This podcast is community-driven - open a PR on GitHub!
2025-11-13
19 min
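The ndots:5 trap from the DNS episode comes from resolver search-list behavior: a name with fewer dots than `ndots` is tried against every search domain before it is tried as written. The Python sketch below models that ordering (a simplified version of glibc-style behavior). The search list mirrors a typical pod in the `default` namespace; the `ec2.internal` entry is a hypothetical domain inherited from the node.

```python
def candidate_queries(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Order in which a stub resolver tries names (simplified glibc-style behavior)."""
    if name.endswith("."):
        # Fully qualified: no search-list expansion at all.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        # Enough dots: try the name as written first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then try the name as written.
    return expanded + [absolute]


POD_SEARCH = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "ec2.internal",  # hypothetical domain inherited from the node
]

names = candidate_queries("api.example.com", POD_SEARCH)
for n in names:
    print(n)
# Each candidate is typically queried for both A and AAAA records.
print(f"{len(names)} names, {2 * len(names)} queries vs 2 for 'api.example.com.'")
```

Because an external name fails on every cluster search domain before the absolute lookup succeeds, the resolver usually walks the whole list. Writing the name with a trailing dot, or lowering `ndots` through the pod spec's `dnsConfig.options`, avoids the expansion.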