LazyStore: 基于混合存储架构的写优化键值存储系统
作者:
中图分类号:

TP311

基金项目:

国家重点研发计划(2022YFB2703100); 浙江省“尖兵”计划(2024C01021)


LazyStore: Write-optimized Key-value Storage System Based on Hybrid Storage Architecture
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [39]
  • |
  • 相似文献 [20]
  • | | |
  • 文章评论
    摘要:

    基于日志合并树(LSM-tree)的键值(key-value)存储由于其出色的读写性能而被广泛用于许多应用中. 大多数现有的日志合并树采用多层结构来存储数据. 尽管多层数据结构可以很好地服务于适度的写密集型应用, 但这种结构并不十分适合高写密集型应用. 这是因为以多层方式保存数据会引入写放大问题, 即新的数据插入会引发很大一部分已经存储在多层中的数据被重组的问题. 这种巨大的(有时是频繁的)数据重组是昂贵的, 并且在许多高写密集型的应用中降低了写入性能. 此外, 多层结构不能为热数据持续提供出色的读取性能. 这是因为多级结构不能通过及时合并重叠的范围来优化热数据的读取操作. 为了解决上述两个问题, 提出LazyStore, 一种基于混合存储架构的新型单层日志合并树. LazyStore通过将数据存储在单一逻辑层而不是多个逻辑层来解决写放大的问题. 因此, 昂贵的多级数据重组在很大程度上被消除. 为了进一步提高写入性能, LazyStore根据每个存储设备的容量和读/写性能, 将逻辑层中的数据分布到多个存储设备中, 如内存、非易失性内存和闪存. 此外, LazyStore引入实时合并操作, 以提高热数据范围的读取性能. 实验表明, 与其他多级日志合并树相比, LazyStore最多将写入性能提高3倍, 并将写入放大率降低至1/4. 而对于热门范围的读取, LazyStore的实时数据合并优化可以将范围查询处理的延迟降低一半.

    Abstract:

    Log-structured merge-tree (LSM-tree) based key-value storage is widely used in many applications due to its excellent read and write performance. Most existing LSM-trees utilize a multi-level structure to store data. Although the multi-level data structure can serve moderately write-intensive applications well, this structure is not well suited for highly write-intensive applications. This is because storing data in multi-levels introduces the write amplification problem, where new data insertion triggers the reorganization of a large portion of the data already stored in multiple levels. This huge (and sometimes frequent) data reorganization is expensive and degrades write performance in many highly write-intensive applications. In addition, the multi-level structure does not provide consistently excellent read performance for hot data. This is because the multi-level structure cannot optimize the read operation of hot data by merging overlapping ranges in a timely manner. To address the above two challenges, this study proposes LazyStore, a novel single-level LSM-tree based on a hybrid storage architecture. LazyStore solves the write amplification problem by storing data in a single logical level instead of multiple logical levels. As a result, expensive multi-level data reorganization is largely eliminated. To further improve write performance, LazyStore distributes data at the logical level to multiple storage devices, such as DRAM, NVM, and SSD, based on the capacity and read/write performance of each storage device. Furthermore, LazyStore introduces real-time merge operations to improve the read performance of hot data ranges. Experiments show that LazyStore improves write performance by 3 times and reduces write amplification by nearly 4 times compared to other multi-level LSM-trees. For hot range reads, LazyStore’s real-time data merge optimization can reduce the latency of range query processing by a factor of two.

    参考文献
    [1] Armstrong TG, Ponnekanti V, Borthakur D, Callaghan M. LinkBench: A database benchmark based on the Facebook social graph. In: Proc. of the 2013 ACM SIGMOD Int’l Conf. on Management of Data. New York: ACM, 2013. 1185–1196.
    [2] Li CJ, Chen HZ, Zhang S, Hu YQ, Chen C, Zhang ZJ, Li M, Li XC, Han DQ, Chen XH, Wang XD, Zhu HM, Fu XW, Wu TW, Tan HF, Ding HT, Liu MJ, Wang KC, Ye T, Li L, Li X, Wang Y, Zheng CG, Yang H, Cheng J. ByteGraph: A high-performance distributed graph database in ByteDance. Proc. of the VLDB Endowment, 2022, 15(12): 3306–3318.
    [3] Huang DX, Liu Q, Cui Q, Fang ZH, Ma XY, Xu F, Shen L, Tang L, Zhou YX, Huang ML, Wei W, Liu C, Zhang J, Li JJ, Wu XL, Song LY, Sun RX, Yu SP, Zhao L, Cameron N, Pei LQ, Tang X. TiDB: A Raft-based HTAP database. Proc. of the VLDB Endowment, 2020, 13(12): 3072–3084.
    [4] Corbett JC, Dean J, Epstein M, et al. Spanner: Google’s globally distributed database. ACM Trans. on Computer Systems, 2013, 31(3): 1–22.
    [5] Taft R, Sharif I, Matei A, et al. CockroachDB: The resilient geo-distributed SQL database. In: Proc. of the 2020 ACM SIGMOD Int’l Conf. on Management of Data. Portland: ACM, 2020. 1493–1509. [doi: 10.1145/3318464.3386134]
    [6] Apache Hbase. The Apache software foundation. 2023. https://hbase.apache.org/
    [7] Cao Z, Chen SM, Li FF, Wang M, Wang XS. LogKV: Exploiting key-value stores for event log processing. In: Proc. of the 6th Biennial Conf. on Innovative Data Systems Research. Asilomar: CIDR, 2013.
    [8] 刘睿诚, 张俊晨, 罗永平, 金培权. 面向非易失内存的异构索引. 软件学报, 2022, 33(3): 832–848. http://www.jos.org.cn/1000-9825/6456.htm
    Liu RC, Zhang JC, Luo YP, Jin PQ. Heterogeneous index for non-volatile memory. Ruan Jian Xue Bao/Journal of Software, 2022, 33(3): 832–848 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6456.htm
    [9] 张超, 李国良, 冯建华, 张金涛. HTAP数据库关键技术综述. 软件学报, 2023, 34(2): 761–785. http://www.jos.org.cn/1000-9825/6713.htm
    Zhang C, Li GL, Feng JH, Zhang JT. Survey of key techniques of HTAP databases. Ruan Jian Xue Bao/Journal of Software, 2023, 34(2): 761–785 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6713.htm
    [10] 沙行勉, 陈咸彰, 马殿龙, 诸葛晴凤. 基于非易失性内存的持久化嵌入式内存数据库. 软件学报, 2016, 27: 320–327. https://www.jos.org.cn/jos/article/abstract/16046
    Sha XM, Chen XZ, Ma DL, Zhuge QF. Design of persistent embedded main memory databases on non-volatile memory. Ruan Jian Xue Bao/Journal of Software, 2016, 27: 320–327 (in Chinese with English abstract). https://www.jos.org.cn/jos/article/abstract/16046
    [11] Comer D. Ubiquitous B-tree. ACM Computing Surveys, 1979, 11(2): 121–137.
    [12] O’Neil P, Cheng E, Gawlick D, O’Neil E. The log-structured merge-tree (LSM-tree). Acta Informatica, 1996, 33(4): 351–385.
    [13] Lu LY, Pillai TS, Gopalakrishnan H, Arpaci-Dusseau AC, Arpaci-Dusseau RH. WiscKey: Separating keys from values in SSD-conscious storage. ACM Trans. on Storage, 2017, 13(1): 5.
    [14] Raju P, Kadekodi R, Chidambaram V, Abraham I. PebblesDB: Building key-value stores using fragmented log-structured merge trees. In: Proc. of the 26th Symp. on Operating Systems Principles. Shanghai: ACM, 2017. 497–514. [doi: 10.1145/3132747.3132765]
    [15] Shavit N, Lotan I. SkipList-based concurrent priority queues. In: Proc. of the 14th Int’l Parallel and Distributed Processing Symp. Cancun: IEEE, 2000. 263–268. [doi: 10.1109/IPDPS.2000.845994]
    [16] Balmau O, Dinu F, Zwaenepoel W, Gupta K, Chandhiramoorthi R, Didona D. SILK: Preventing latency spikes in log-structured merge key-value stores. In: Proc. of the 2019 USENIX Annual Technical Conf. Renton: USENIX Association, 2019. 753–766.
    [17] Balmau O, Didona D, Guerraoui R, Zwaenepoel W, Yuan HP, Arora A, Gupta K, Konka P. TRIAD: Creating synergies between memory, disk and log in log structured key-value stores. In: Proc. of the 2017 USENIX Annual Technical Conf. Santa Clara: USENIX Association, 2017. 363–375.
    [18] Wu XB, Xu YH, Shao ZL, Jiang S. LSM-trie: An LSM-tree-based ultra-large key-value store for small data. In: Proc. of the 2015 USENIX Annual Technical Conf. Santa Clara: USENIX Association, 2015. 71–82.
    [19] 舒继武, 陆游游, 张佳程, 郑纬民. 基于非易失性存储器的存储系统技术研究进展. 科技导报, 2016, 34(14): 86–94.
    Shu JW, Lu YY, Zhang JC, Zheng WM. Research progress on non-volatile memory based storage system. Science & Technology Review, 2016, 34(14): 86–94 (in Chinese with English abstract).
    [20] Agrawal N, Prabhakaran V, Wobber T, Davis JD, Manasse M, Panigrahy R. Design tradeoffs for SSD performance. In: Proc. of the 2008 USENIX Annual Technical Conf. Boston: USENIX Association, 2008. 57–70.
    [21] Eisenman A, Gardner D, AbdelRahman I, Axboe J, Dong SY, Hazelwood K, Petersen C, Cidon A, Katti S. Reducing DRAM footprint with NVM in Facebook. In: Proc. of the 13th EuroSys Conf. Porto: ACM, 2018. 42. [doi: 10.1145/3190508.3190524]
    [22] 王海涛, 李战怀, 张晓, 赵晓南. 基于非易失性存储器的存储引擎性能优化. 集成技术, 2022, 11(3): 56–70.
    Wang HT, Li ZH, Zhang X, Zhao XN. Performance optimization of storage engine based on non-volatile memory. Journal of Integration Technology, 2022, 11(3): 56–70 (in Chinese with English abstract).
    [23] Kaiyrakhmet O, Lee S, Nam B, Noh SH, Choi YR. SLM-DB: Single-level key-value store with persistent memory. In: Proc. of the 17th USENIX Conf. on File and Storage Technologies. Boston: USENIX Association, 2019. 191–204.
    [24] Kannan S, Bhat N, Gavrilovska A, Arpaci-Dusseau A, Arpaci-Dusseau R. Redesigning LSMs for nonvolatile memory with NoveLSM. In: Proc. of the 2018 USENIX Annual Technical Conf. Boston: USENIX Association, 2018. 993–1005.
    [25] Yao T, Zhang YW, Wan JG, Cui Q, Tang L, Jiang H, Xie CS, He XB. MatrixKV: Reducing write stalls and write amplification in LSM-tree based KV stores with a matrix container in NVM. In: Proc. of the 2020 USENIX Annual Technical Conf. USENIX Association, 2020. 2.
    [26] 朱阅岸, 简怀兵, 龙永超, 李彬, 王树, 吴喜亮, 钟治初, 张延松. 构建新型高性能与高可用的键值数据库系统. 软件学报, 2021, 32(10): 3203–3218. http://www.jos.org.cn/1000-9825/6023.htm
    Zhu YA, Jian HB, Long YC, Li B, Wang S, Wu XL, Zhong ZC, Zhang YS. Building new key-value store with high performance and high availability. Ruan Jian Xue Bao/Journal of Software, 2021, 32(10): 3203–3218 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6023.htm
    [27] Bayer R, McCreight E. Organization and maintenance of large ordered indices. In: Proc. of the 1970 ACM SIGFIDET Workshop on Data Description, Access and Control. New York: ACM, 1970. 107–141.
    [28] Kim BK, Lee DH. LSB-Tree: A log-structured B-tree index structure for NAND flash SSDs. Design Automation for Embedded Systems, 2015, 19(1–2): 77–100.
    [29] 周信静. SilkStore: 写优化与工作负载自适应的键值存储系统 [硕士学位论文]. 杭州: 浙江大学, 2020.
    Zhou XJ. SilkStore: Write-optimized and workload adaptive key-value store [MS. Thesis]. Hangzhou: Zhejiang University, 2020 (in Chinese with English abstract).
    [30] Liu HK, Chen D, Jin H, Liao XF, He BS, Hu K, Zhang Y. A survey of non-volatile main memory technologies: State-of-the-arts, practices, and future directions. Journal of Computer Science and Technology, 2021, 36(1): 4–32.
    [31] Huang KS, Imai D, Wang TZ, Xie D. SSDs striking back: The storage jungle and its implications on persistent indexes. In: Proc. of the 12th Conf. on Innovative Data Systems Research. Chaminade: CIDR, 2022. 9–12.
    [32] Dong SY, Kryczka A, Jin YQ, Stumm M. RocksDB: Evolution of development priorities in a key-value store serving large-scale applications. ACM Trans. on Storage, 2021, 17(4): 26.
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

杜云箫,陈珂,寿黎但,江大伟,骆歆远,陈刚. LazyStore: 基于混合存储架构的写优化键值存储系统.软件学报,2025,36(2):805-829

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2023-04-12
  • 最后修改日期:2023-06-04
  • 在线发布日期: 2024-08-28
文章二维码
您是第19982941位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号