Description

In this episode, we dive into LSM-trees, the write-optimized data structure behind Cassandra, Bigtable, HBase, and RocksDB, and explain how a design meant to make writes fast reshaped modern databases.

We compare LSM-trees to B-trees, unpack compaction and write amplification, explain why Bloom filters exist, and talk about the hidden costs that show up under real-world load. If you’ve ever tuned RocksDB or wondered why latency spikes appear out of nowhere, this episode will help those behaviors finally make sense.

References:

This episode draws primarily from the following papers:

Organization and Maintenance of Large Ordered Indices by R. Bayer and E. McCreight


The Log-Structured Merge-Tree (LSM-Tree) by Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil

These papers reference several other important works in this field. Please refer to the full papers for a comprehensive list.

Disclaimer:

Please note that parts or all of this episode were generated by AI. While the content is intended to be accurate and informative, it is recommended that you consult the original research papers for a comprehensive understanding.