Listen

Description

When a user interacts with an application to order a ride with a ridesharing app, the data for that user interaction is written to a “transactional” database. A transactional database is a database where specific rows need to be written to and read from quickly and consistently.
Speed and consistency are important for applications like a user ordering a car, and riding around in that car, because the user’s client is frequently communicating with the database to update their session. Other applications of a transactional database would include a database that backs a messaging system, a banking application, or document editing software.
The data from a transactional database is often reused in “analytic” databases. An analytic database can be used for performing large scale analysis, aggregations, averages, and other data science queries.
The requirements for an analytic database are different from a transactional database because the data is not being used for an active user session. To fill the data in an analytic database, the transactional data gets copied from the transactional database in a process called ETL.
The separation of the transaction data store from the analytic data store causes problems for data engineering. To address these problems, some newer databases combine transactional and analytic functionality in the same database. These databases are often called “NewSQL”.
TiDB is an open source database built on RocksDB and Kubernetes. TiDB is widely used in China by high volume applications such as bike sharing and massively multiplayer online games. Kevin Xu works at PingCAP, a company built around TiDB. He joins the show to talk about modern databases, distributed systems, and the architecture for TiDB.