Abstract: Hybrid transactional and analytical processing (HTAP) relies on a single system to process mixed workloads of transactions and analytical queries simultaneously. It not only eliminates the extract-transform-load (ETL) process but also enables real-time data analysis. Nevertheless, to process mixed OLTP and OLAP workloads, such systems must balance the trade-off between workload isolation and data freshness. This is mainly because highly concurrent, short-lived OLTP workloads interfere with bandwidth-intensive, long-running OLAP workloads. Most existing HTAP databases combine the strengths of a row store and a column store to support HTAP. Because different HTAP applications have different requirements, HTAP databases adopt disparate storage strategies and processing techniques. This study comprehensively surveys HTAP databases. It introduces a taxonomy of state-of-the-art HTAP databases according to their storage strategies and architectures, then summarizes and compares their pros and cons. Unlike previous works that focus on single-model and Spark-based loosely coupled HTAP systems, this survey focuses on real-time HTAP databases with a row-column dual store. Moreover, it takes a deep dive into their key techniques for data organization, data synchronization, query optimization, and resource scheduling. Existing HTAP benchmarks are also introduced. Finally, research challenges and open problems for HTAP are discussed.