Data Synchronization Tool for Distributed Heterogeneous Database
Authors: Xu Zijian (徐梓荐), Ye Sheng (叶盛), Zhang Xiao (张孝)
Author biographies:

Xu Zijian (1994-), male, M.S.; main research areas: relational databases, data synchronization. Ye Sheng (1993-), male, M.S.; main research areas: data migration, data synchronization. Zhang Xiao (1972-), male, Ph.D., associate professor, CCF senior member; main research areas: databases, big data.

Corresponding author:

Zhang Xiao, E-mail: zhangxiao@ruc.edu.cn

Funding:

National Key Research and Development Program of China (2018YFB1004401); National Natural Science Foundation of China (61732014); Beijing Municipal Science and Technology Project (Z171100005117002)

    Abstract:

    Read-write separation can alleviate part of the mismatch between read and write speeds in today's big data environments, but existing read-write separation techniques mainly target homogeneous databases. Because their storage structures differ, a heterogeneous distributed database system composed of a row-store database and a column-store database faces difficulties during data synchronization that a homogeneous system does not, such as format conversion and mismatched synchronization speeds. This study proposes TD-Reduction, a method that restores SQL from the MySQL binary log (Binlog). It designs and implements a Binlog parser, BinParser, and a Binlog restorer, BinReducer. Working on Mixed-format Binlog, they parse the log according to the type of each event and restore it by the corresponding rules to generate executable SQL statements. Building on these techniques, the study implements Cynomys, a data synchronization tool for distributed databases. In the experimental environment, Cynomys shows good performance. The method is also applicable to data synchronization between other heterogeneous databases that provide a mechanism similar to Binlog.
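The idea of restoring executable SQL from parsed log events can be illustrated with a minimal Python sketch. It is not the paper's BinParser or BinReducer: the `ParsedEvent` structure, the event kind names, and the helper functions are all hypothetical, and the sketch assumes the binary log has already been decoded into statement text (for statement-based events) or into before/after row images (for row-based events), as a Mixed-format log can contain both.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ParsedEvent:
    """Hypothetical output of a log parser (not the paper's actual format)."""
    kind: str                       # "query", "write_rows", "update_rows", "delete_rows"
    table: str = ""
    sql: str = ""                   # statement text, used by "query" events
    before: Dict[str, Any] = field(default_factory=dict)  # row image before the change
    after: Dict[str, Any] = field(default_factory=dict)   # row image after the change


def quote(value: Any) -> str:
    """Render a Python value as a SQL literal (illustrative only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def where_clause(row: Dict[str, Any]) -> str:
    """Build a WHERE clause that matches the given row image."""
    parts = []
    for col, val in row.items():
        if val is None:
            parts.append(f"{col} IS NULL")
        else:
            parts.append(f"{col} = {quote(val)}")
    return " AND ".join(parts)


def restore_sql(event: ParsedEvent) -> str:
    """Apply one restoration rule per event kind and return a SQL statement."""
    if event.kind == "query":
        # Statement-based events already carry the SQL text.
        return event.sql
    if event.kind == "write_rows":
        cols = ", ".join(event.after)
        vals = ", ".join(quote(v) for v in event.after.values())
        return f"INSERT INTO {event.table} ({cols}) VALUES ({vals})"
    if event.kind == "update_rows":
        sets = ", ".join(f"{c} = {quote(v)}" for c, v in event.after.items())
        return f"UPDATE {event.table} SET {sets} WHERE {where_clause(event.before)}"
    if event.kind == "delete_rows":
        return f"DELETE FROM {event.table} WHERE {where_clause(event.before)}"
    raise ValueError(f"unsupported event kind: {event.kind}")


if __name__ == "__main__":
    events = [
        ParsedEvent(kind="query", sql="CREATE TABLE t (id INT, name VARCHAR(20))"),
        ParsedEvent(kind="write_rows", table="t", after={"id": 1, "name": "O'Neil"}),
        ParsedEvent(kind="update_rows", table="t",
                    before={"id": 1, "name": None}, after={"id": 1, "name": "a"}),
        ParsedEvent(kind="delete_rows", table="t", before={"id": 2}),
    ]
    for ev in events:
        print(restore_sql(ev))
```

Dispatching on the event kind keeps each restoration rule separate, so rules for a different target database could be swapped in without touching the parsing side. A real tool would also have to handle type mapping, identifier quoting, and transaction boundaries, which this sketch leaves out.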

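Cite this article

Xu Zijian, Ye Sheng, Zhang Xiao. Data Synchronization Tool for Distributed Heterogeneous Database. Journal of Software (软件学报), 2019, 30(3): 684-699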

History
  • Received: 2018-07-20
  • Last revised: 2018-09-20
  • Published online: 2019-03-06