Abstract: Blockchain technologies have gained increasing attention in recent years. In general, blockchains are distributed ledgers whose users do not fully trust one another. Equipped with consensus protocols and security mechanisms, blockchain systems achieve properties such as immutability and agreement among all users on every data record and the full history of transactions. From a data management perspective, a blockchain is a distributed database in which the nodes agree on the execution order of all transactions. Many surveys have addressed the security and consensus problems of blockchains; this study instead surveys and analyzes data management techniques for blockchain systems. Traditional distributed databases assume that all nodes are trusted, so only crash failures need to be handled. Blockchains, by contrast, must tolerate malicious nodes and therefore require Byzantine fault tolerance, which introduces new problems and challenges. Because blockchains and distributed databases share a similar architecture, many works have adapted techniques from distributed databases to blockchains. Accordingly, this study focuses on four aspects of data management: storage, transaction management, query processing, and blockchain scalability. For each aspect, it compares the techniques, highlights how they differ, and analyzes their benefits for blockchains.