Ben Sigelman began working on distributed tracing while he was at Google, where he co-authored the “Dapper” paper. Dapper was implemented at Google to help the engineers who work on Google infrastructure debug problems in their distributed systems.
Today, a decade after he started thinking about distributed tracing, Ben Sigelman is the CEO of Lightstep, a company that provides distributed tracing and other monitoring technologies.
Lightstep’s distributed tracing model still resembles the techniques described in that paper, so I was eager to learn how open source distributed tracing systems (such as OpenZipkin) differ from enterprise providers such as Lightstep.
The key Lightstep feature that we discussed was garbage collection: deciding which traces to keep and which to throw away.
If you are using a distributed tracing system, you could be collecting a lot of traces, possibly one for every single user request. Not all of these traces are useful, but some of them are very useful. Maybe you only want to keep traces with exceptionally high latency. Maybe you want to keep every trace from the last 5 days and delete older ones as they age out. So the question of how to manage the storage footprint of those traces was as interesting as the discussion of distributed tracing itself.
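The two retention choices above can be combined into a single garbage-collection pass. What follows is a minimal, hypothetical sketch, not how Lightstep actually works: the `Trace` record, the `select_traces_to_keep` function, and the default 1,000 ms "slow" threshold are all assumptions made for illustration. Every trace from the last 5 days is kept, and older traces survive only if they were slow.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Trace:
    # Hypothetical trace record: only the fields this policy needs.
    trace_id: str
    started_at: datetime
    latency_ms: float


def select_traces_to_keep(traces, now,
                          retention=timedelta(days=5),
                          slow_threshold_ms=1000.0):
    """Return the traces that survive one garbage-collection pass.

    Assumed policy: keep every trace younger than `retention`;
    after that, keep only traces at or above `slow_threshold_ms`.
    """
    kept = []
    for trace in traces:
        is_recent = now - trace.started_at <= retention
        is_slow = trace.latency_ms >= slow_threshold_ms
        if is_recent or is_slow:
            kept.append(trace)
    return kept


# Usage: one recent trace, one old fast trace, one old slow trace.
now = datetime(2018, 5, 10, tzinfo=timezone.utc)
traces = [
    Trace("a", now - timedelta(days=1), 40.0),    # recent, so kept
    Trace("b", now - timedelta(days=9), 35.0),    # old and fast, so collected
    Trace("c", now - timedelta(days=9), 2500.0),  # old but slow, so kept
]
kept = select_traces_to_keep(traces, now)
print([t.trace_id for t in kept])  # prints ['a', 'c']
```

Keeping the age window and the latency threshold as separate parameters makes it easy to tune how much storage the policy consumes without changing its shape.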
Beyond the distributed tracing features of his product, Ben has a vision for how his company can provide other observability tools over time. I spoke to Ben at KubeCon, and although this conversation does not cover Kubernetes specifically, the topic should interest people who are building Kubernetes technologies.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.