In this episode, we explore two foundational papers that have reshaped the landscape of distributed storage systems: "Bigtable: A Distributed Storage System for Structured Data" by Google engineers and "Cassandra: A Decentralized Structured Storage System" by engineers at Facebook. These papers laid the groundwork for much of today’s cloud infrastructure, influencing systems like Google Cloud Bigtable, Apache HBase, and Apache Cassandra.
1. "Bigtable: A Distributed Storage System for Structured Data"
In this landmark paper, Fay Chang, Jeffrey Dean, Sanjay Ghemawat, and colleagues introduce Bigtable, a highly scalable, distributed storage system designed to handle vast amounts of structured data across many machines. We’ll delve into Bigtable’s architecture, including its tablet-based sharding, its use of the Google File System (GFS) for storage and the Chubby lock service for coordination, and how it supports Google services such as web indexing, Google Earth, and Google Finance.
2. "Cassandra: A Decentralized Structured Storage System"
From the team at Facebook, Avinash Lakshman and Prashant Malik present Cassandra, a decentralized, fault-tolerant storage system designed to manage large-scale data across commodity hardware. Cassandra combines a peer-to-peer architecture and eventual consistency, ideas popularized by Amazon's Dynamo, with a Bigtable-style data model, enabling massive scalability while maintaining high availability, even in the face of network partitions or node failures. We'll explore how Cassandra builds on the lessons of Bigtable while making different trade-offs, particularly with its write-optimized design and tunable consistency model.
In this episode, we’ll compare and contrast Bigtable and Cassandra, focusing on their core design philosophies, the challenges they solve in large-scale data storage, and the impact they’ve had on distributed systems and modern NoSQL databases. We'll also discuss how these systems influenced the design of the distributed databases we rely on today, including HBase, Google Cloud Bigtable, and Apache Cassandra.
If you’re a database engineer, architect, or anyone interested in the evolution of scalable data storage systems, this episode will provide a deep dive into two of the most important systems in the field of distributed computing.
Some or all of this content is AI generated and may contain some errors. Please use with caution.
References:
Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
Cassandra - A Decentralized Structured Storage System
Avinash Lakshman, Prashant Malik