In 2015, eleven years had passed since MapReduce was first published, and companies were still having data problems. That year, Tomer started working on Dremio, a company that stayed in stealth for another two years. I interviewed Tomer two years ago, when he still could not say much about what Dremio was doing. We talked about Apache Drill, an open-source project related to what Dremio eventually built.
Earlier this year, two of Tomer’s colleagues, Jacques Nadeau and Julien Le Dem, came on the show to discuss columnar data storage and interoperability. What I took away from that conversation was that today, data within an average enterprise is accessible, but the different formats are a problem. Some data is in MySQL, some is in Amazon S3, some is in Elasticsearch, and some is on HDFS stored in Parquet files. Different teams set up different BI tools and charts, each reading from a specific silo of data.
At the lowest level, the different data formats are incompatible: you have to transform MySQL data in order to merge it with S3 data. On top of that, engineers doing data science work are using Spark, Pandas, and other tools that pull lots of data into memory. If the in-memory formats are not compatible, the data teams can’t get the most out of their work. On top of THAT, at the highest level, data analysts are working with different data analysis tools, so there is even more siloing.
Now I understand why Dremio took two years to bring to market.
They are trying to solve data interoperability by making it easy to transform data sets between different formats. They are trying to solve data access speed by creating a sophisticated caching system. And they are trying to improve the effectiveness of the data analysts by providing the right abstractions for someone who is not a software engineer to study the different data sets across an organization.
Dremio is an exciting project because it is rare to see a pure software company put so many years into up-front stealth product development. After talking to Tomer, I’m looking forward to seeing Dremio come to market. It was fascinating to hear him talk about how data engineering has evolved to where it is today.