Description

Brought to You By:

Statsig – The unified platform for flags, analytics, experiments, and more.

Sonar – The makers of SonarQube, the industry standard for automated code review

WorkOS – Everything you need to make your app enterprise ready.

Amazon S3 is one of the largest distributed systems ever built, storing and serving data for a significant portion of the internet. Behind its simple interfaces hides an enormous amount of engineering work, careful tradeoffs, and long-term thinking.

In this episode, I sit down with Mai-Lan Tomsen Bukovec, VP of Data and Analytics at AWS, who has been running Amazon S3 for more than a decade. Mai-Lan shares how S3 operates at extreme scale, what it takes to design for durability and availability across millions of servers, and why building for failure is a core principle.

We also go deep into how AWS approaches correctness using formal methods, how storage tiers and limits shape system design, and why simplicity remains one of the hardest and most important goals at S3’s scale.

Timestamps

(00:00) Intro

(01:03) S3’s scale 

(03:58) How S3 started 

(07:25) Parquet, Iceberg, and S3 tables

(09:46) S3 for developers 

(13:37) Why AWS keeps S3 prices low 

(17:10) AWS pricing tiers

(19:38) Availability and durability 

(26:21) The cost of S3's consistency

(31:22) Automated reasoning and proof of correctness 

(35:14) Durability at AWS scale

(39:58) Correlated failure and crash consistency 

(43:22) Failure allowances 

(46:04) Two opposing principles in S3 design

(49:09) S3’s evolution 

(52:21) S3 Vectors 

(1:01:16) The 50 TB limit on AWS

(1:07:54) The simplicity principle

(1:10:10) Types of engineers working on S3

(1:14:15) Closing recommendations 

The Pragmatic Engineer deepdives relevant for this episode:

Inside Amazon’s engineering culture

How AWS deals with a major outage

A Day in the Life of a Senior Manager at Amazon

What is a Principal Engineer at Amazon? – with Steve Huynh

Working at Amazon as a software engineer – with Dave Anderson

Amazon papers recommended by Mai-Lan:

Using lightweight formal methods to validate a key-value storage node in Amazon S3

Formally verified cloud-scale authorization

Analyzing metastable failures

Amazon’s engineering tenets

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@pragmaticengineer.com.

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe