Volume 36, Issue 3, 2025 Table of Contents

    1  Research on the Progress in Blockchain Sharding
    TANG Hai-Bo, ZHANG Huan, ZHANG Zhao, JIN Che-Qing, ZHOU Ao-Ying
    2025, 36(3):1-25. DOI: 10.13328/j.cnki.jos.007276
    Abstract:
    Cloud-native databases leverage cloud infrastructure to provide highly available, elastically scalable data management, and they have developed rapidly in recent years. Blockchain is a transparent, tamper-resistant, and traceable database system. Sharding is the most direct and promising approach to scaling a blockchain, and by utilizing cloud infrastructure a sharded blockchain can also achieve elastic scalability. This paper first summarizes the three key technical challenges that blockchain sharding must address: secure node partitioning, on-chain data sharding, and cross-shard transaction processing. It reviews the current state of research on these issues, introduces and compares the corresponding solutions, and discusses the new challenges these solutions face in cloud-native environments. It then conducts a more comprehensive analysis and comparison of all solutions from the perspective of the whole blockchain system. Finally, it analyzes the development trends in blockchain sharding technology and presents several research directions that deserve further exploration.
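    The survey names on-chain data sharding and cross-shard transaction processing as core challenges. As a minimal sketch, assuming a simple hash-based scheme that maps each account to one of N shards (an illustrative choice, not a specific protocol from the survey), the Python below shows why cross-shard transactions are so common: with uniform hashing, a transfer between two independent accounts stays inside one shard with probability only 1/N. The functions `shard_of` and `classify` are hypothetical.

```python
import hashlib


def shard_of(account: str, num_shards: int) -> int:
    """Hypothetical placement rule: hash the account address, then take it modulo num_shards."""
    digest = hashlib.sha256(account.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def classify(tx: tuple[str, str], num_shards: int) -> str:
    """A transfer is intra-shard only if sender and receiver land in the same shard."""
    sender, receiver = tx
    if shard_of(sender, num_shards) == shard_of(receiver, num_shards):
        return "intra-shard"
    return "cross-shard"


if __name__ == "__main__":
    n = 4
    txs = [(f"user{i}", f"user{i + 1000}") for i in range(10_000)]
    cross = sum(classify(tx, n) == "cross-shard" for tx in txs) / len(txs)
    print(f"cross-shard fraction with {n} shards: {cross:.3f} (expected about {1 - 1 / n:.2f})")
```

    With 4 shards, roughly three quarters of random transfers cross shard boundaries, which is why the efficiency of cross-shard processing matters so much as the shard count grows.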
    2  Short Time Series Group Compression and Merging Methods in Apache TsFile
    LIU Xing-Yu, SONG Shao-Xu, HUANG Xiang-Dong, WANG Jian-Min
    2025, 36(3):1-21. DOI: 10.13328/j.cnki.jos.007277
    Abstract:
    Time series data are widely used in industrial manufacturing, meteorology, electric power, vehicles, and other fields, which has promoted the development of time series database management systems. As more database systems migrate to the cloud and end-cloud collaborative architectures become more common, the volume of data to be processed keeps growing. In scenarios such as end-cloud collaboration and massive time series, short synchronization cycles, frequent data flushing, and similar factors generate large numbers of short time series, posing new challenges to database systems. Effective data management and compression methods can significantly improve storage performance, enabling database systems to handle the storage of massive time series. Apache TsFile is a columnar storage file format designed for time series scenarios and plays an important role in database management systems such as Apache IoTDB. This paper describes the group compression and merging methods used in Apache TsFile to handle large numbers of short time series, especially in applications with many time series such as the industrial Internet of Things. The group compression method fully considers the characteristics of data in short time series scenarios: through device grouping, it improves metadata utilization, reduces file index size, reduces the number of short time series, and significantly improves compression efficiency. Experiments on real-world datasets show that the grouping method brings significant improvements in compression efficiency, reading, writing, file merging, and other aspects, enabling better management of TsFiles in short time series scenarios.
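    To make the metadata argument concrete, here is a toy cost model in Python. It assumes, purely for illustration, that every chunk written to a file carries one fixed-size index entry, so that writing each device's short series as its own chunk costs one entry per device, while packing a group of devices into one chunk costs one entry per group. The entry size and the functions `group_devices` and `index_bytes` are hypothetical and do not describe TsFile's actual on-disk layout.

```python
import math


def group_devices(devices: list[str], group_size: int) -> list[list[str]]:
    """Hypothetical grouping: sort devices by name and cut them into consecutive groups."""
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    ordered = sorted(devices)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


def index_bytes(num_devices: int, group_size: int, entry_bytes: int) -> int:
    """Toy model: one fixed-size index entry per chunk, one chunk per group.

    group_size == 1 is the ungrouped case, where every device's short series is its own chunk.
    """
    num_groups = math.ceil(num_devices / group_size)
    return num_groups * entry_bytes
```

    For 100,000 devices and a 64-byte entry, `index_bytes(100_000, 1, 64)` gives 6,400,000 bytes, while groups of 100 give `index_bytes(100_000, 100, 64)`, which is 64,000 bytes. The saving comes entirely from amortizing per-chunk overhead across the series in a group.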
    3  FBO: The Cloud Database Knob-tuning Method Based on Federated Learning
    YAN Yu, DAI Zhi-Yu, Lü Ze-Kai, WANG Hong-Zhi
    2025, 36(3):1-22. DOI: 10.13328/j.cnki.jos.007278
    Abstract:
    In recent years, with the development of software and hardware, cloud databases have become an emerging trend that can reduce database operation and maintenance costs for small and medium-sized enterprises and individual users. The growth of cloud databases has also created a huge market demand for database tuning. Researchers have proposed many database self-tuning techniques to support the automatic optimization of database knobs. To improve tuning efficiency, existing techniques have shifted from focusing solely on the tuning problem itself to reusing historical experience to find the optimal knob configuration for the current database instance. However, as cloud databases develop, users increasingly require privacy protection: they want to avoid privacy leaks while retaining efficient data access. Existing methods do not protect the privacy of users' historical tuning experience, which may expose users' workload characteristics and cause economic losses. This article analyzes the characteristics of cloud database tuning tasks in detail, combines the roles of the server and the users, and proposes a cloud database knob-tuning technique based on federated learning. First, to address data heterogeneity in federated learning, this paper proposes an experience screening method based on meta-feature matching, which eliminates in advance historical experiences whose data distributions differ greatly, improving the efficiency of federated learning. Second, to protect user privacy, this paper exploits the characteristics of cloud database services and proposes a federated Bayesian optimization algorithm with each node as the training center. We use random Fourier features to provide this protection without distorting the tuning experience. Evaluation results on extensive benchmarks show that our method achieves tuning performance competitive with existing centralized tuning methods.
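    The abstract mentions random Fourier features but does not say how FBO applies them, so the sketch below shows only the standard construction (Rahimi and Recht's random Fourier features for the RBF kernel) in plain Python. One plausible reason such features suit federated Bayesian optimization, stated here as an assumption, is that a Gaussian-process surrogate can then be represented by a finite weight vector over the features instead of by raw tuning observations. The functions `make_rff`, `rbf`, and `dot` are illustrative names.

```python
import math
import random


def make_rff(dim: int, num_features: int, lengthscale: float, seed: int = 0):
    """Random Fourier feature map z for the RBF kernel (Rahimi and Recht's construction).

    z(x) . z(y) approximates exp(-||x - y||^2 / (2 * lengthscale**2)).
    """
    rng = random.Random(seed)
    # Frequencies w ~ N(0, I / lengthscale**2); phases b ~ Uniform[0, 2*pi).
    ws = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)] for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x: list[float]) -> list[float]:
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b) for w, b in zip(ws, bs)]

    return z


def rbf(x: list[float], y: list[float], lengthscale: float) -> float:
    """Exact RBF kernel value, for comparison with the feature approximation."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    z = make_rff(dim=3, num_features=4000, lengthscale=1.0, seed=42)
    x, y = [0.1, 0.2, 0.3], [0.5, -0.1, 0.0]
    print(f"exact {rbf(x, y, 1.0):.4f}  approx {dot(z(x), z(y)):.4f}")
```

    The approximation is a Monte Carlo estimate, so its error shrinks roughly as 1/sqrt(num_features); more features give a more faithful kernel at a higher cost per evaluation.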
    4  PG-RAC: PostgreSQL-based Database with Shared Cache for Multi-write Transaction
    YIN Yu-Jie, SHI Hao-Yang, FAN Zi-Hao, ZHOU Hua-Hui, LIU Sheng-Chi, HU Hui-Qi, WEI Xing, CHEN He-Dui, TU Yao-Feng, CAI Peng, ZHOU Xuan
    2025, 36(3):1-19. DOI: 10.13328/j.cnki.jos.007279
    Abstract:
    Single-master, multi-slave is the mainstream architecture of cloud-native databases. In such a cluster, slave nodes can take over read-only requests from the master node, while write requests are handled by the master node. To further meet the demand for large-scale transaction scale-out, some cloud databases attempt to support multi-write transactions. One possible approach is to introduce a shared cache among compute nodes to support cross-node data access. In shared-cache database systems, cross-node remote access is significantly more expensive than local access, so the design of the cache coherence protocol is a crucial factor in system performance and scalability. This study proposes two improvements to the coherence protocol and implements PG-RAC, a PostgreSQL-based shared-cache database that supports multi-write transactions. First, PG-RAC adopts a new distributed chained routing strategy that disperses routing information among compute nodes; compared with a routing strategy that manages the directory on a single node, it reduces average transaction latency by approximately 20%. Second, this study enhances the duplicate page invalidation mechanism by separating invalidation operations from the transaction path, reducing latency on the transaction's critical path. Building on this, PG-RAC exploits the characteristics of multi-version concurrency control (MVCC) to further delay the invalidation point of duplicate pages, which effectively improves cache utilization. TPC-C experimental results show that on a cluster with 4 compute nodes, PG-RAC's throughput is nearly 2 times that of PostgreSQL and 1.5 times that of the distributed database Citus.
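    Separating invalidation from the transaction path can be illustrated with a toy model in Python. It is a sketch under strong simplifying assumptions: a single authoritative page store, per-node page copies, and a queue of invalidations that a background step drains after the writer has already committed. The class `ToySharedCache` is hypothetical; it does not implement PG-RAC's chained routing or its MVCC-based handling of the stale-read window, which is exactly what a real system must make safe.

```python
from collections import deque


class ToySharedCache:
    """Toy invalidation-based page caching across compute nodes (hypothetical model)."""

    def __init__(self, num_nodes: int):
        self.store: dict[int, tuple[int, str]] = {}              # page_id -> (version, value)
        self.local: list[dict[int, tuple[int, str]]] = [{} for _ in range(num_nodes)]
        self.pending: deque[tuple[int, int, int]] = deque()     # (page_id, version, writer)

    def read(self, node: int, page_id: int) -> str:
        # Serve from the local copy if present; otherwise fetch it and cache it locally.
        if page_id not in self.local[node]:
            self.local[node][page_id] = self.store[page_id]
        return self.local[node][page_id][1]

    def write(self, node: int, page_id: int, value: str) -> None:
        version = self.store.get(page_id, (0, ""))[0] + 1
        self.store[page_id] = (version, value)
        self.local[node][page_id] = (version, value)
        # The commit returns here: invalidating other nodes' copies is only queued.
        self.pending.append((page_id, version, node))

    def drain_invalidations(self) -> int:
        # Background step, off the commit path: drop copies older than the committed version.
        dropped = 0
        while self.pending:
            page_id, version, writer = self.pending.popleft()
            for n, cache in enumerate(self.local):
                if n != writer and page_id in cache and cache[page_id][0] < version:
                    del cache[page_id]
                    dropped += 1
        return dropped
```

    Between a `write` on node 0 and the next `drain_invalidations`, node 1 can still read the old copy of the page. Per the abstract, PG-RAC relies on MVCC to make delaying the invalidation point acceptable; this toy simply exposes that window.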
    5  Elastic Scaling Method for Multi-tenant Databases Based on the Hybrid Workload Prediction Model
    XU Hai-Yang, LIU Hai-Long, CHEN Xian, WANG Lei, JIN Ke, HOU Shu-Feng, LI Zhan-Huai
    2025, 36(3):1-16. DOI: 10.13328/j.cnki.jos.007280
    Abstract:
    Scalability is one of the important features of multi-tenant databases in cloud environments. However, most elastic scaling techniques struggle to make effective scaling decisions under complex workload variations. If workload changes can be predicted in advance, resources can be adjusted accurately. This paper proposes a memory-load-based elastic scaling method for multi-tenant databases that consists of a cluster-level load prediction model and an elastic scaling strategy. The load prediction model combines the strengths of convolutional neural networks, long short-term memory networks, and gated recurrent units to accurately forecast the memory load of the database cluster. Based on the predicted demand, the elastic scaling strategy adjusts the number of virtual machines so that resource provisioning stays within a reasonable range. Compared with existing methods, the model reduces prediction errors by 8.7% to 21.8% and improves prediction fitting by 4.6%. In addition, this paper improves the Bayesian optimization algorithm used for hyperparameter tuning of the model, addressing the poor performance of Bayesian optimization over mixed discrete and continuous search spaces; this further reduces errors by 7.6% and improves fitting by 1.04%. Experimental results indicate that, compared with the most widely used scaling strategy in Kubernetes, the proposed elastic scaling method avoids the latency and resource waste associated with elastic scaling, reducing response time by 8.12% and latency by 9.56%.
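    The scaling step that follows prediction can be sketched as a simple sizing rule in Python. This is a minimal sketch assuming a fixed memory capacity per VM, a target utilization, and minimum and maximum VM limits; the rule, its parameters, and the names `target_vm_count` and `plan` are hypothetical rather than the paper's strategy, and the forecast is taken as given instead of being produced by the CNN, LSTM, and GRU model.

```python
import math


def target_vm_count(predicted_mem_gb: float, vm_mem_gb: float,
                    target_util: float = 0.75, min_vms: int = 1, max_vms: int = 64) -> int:
    """Hypothetical rule: provision enough VMs that the predicted load sits at target_util."""
    if not 0 < target_util <= 1:
        raise ValueError("target_util must be in (0, 1]")
    needed = math.ceil(predicted_mem_gb / (vm_mem_gb * target_util))
    return max(min_vms, min(max_vms, needed))


def plan(forecast_gb: list[float], current_vms: int, vm_mem_gb: float,
         target_util: float = 0.75) -> int:
    """Return the VM delta: positive means scale out, negative means scale in."""
    # Size for the peak of the forecast horizon so capacity is ready before the load arrives.
    desired = target_vm_count(max(forecast_gb), vm_mem_gb, target_util)
    return desired - current_vms
```

    For example, `plan([40.0, 70.0, 90.0], current_vms=5, vm_mem_gb=16.0)` returns 3: a 90 GB peak at 75% of 16 GB per VM needs ceil(7.5) = 8 VMs. Sizing for the forecast peak rather than the current load is what lets a predictive policy act before a reactive one would.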
    6  Deterministic Concurrency Control based Multi-Write Transaction Processing over Cloud-native Databases
    HONG Yin-Hao, ZHAO Hong-Yao, WANG Yin-Lin, SHI Xin-Yue, LU Wei, YANG Shang, DU Sheng
    2025, 36(3):1-29. DOI: 10.13328/j.cnki.jos.007281
    Abstract:
    Cloud-native databases have become a hot topic in database development in the cloud computing era, thanks to advantages such as out-of-the-box functionality, elastic scalability, and pay-as-you-go pricing. However, mainstream cloud-native databases allow only a single master node to execute write transactions. This limitation hampers the system's ability to handle write-intensive workloads and makes it difficult to meet the needs of businesses with heavy write demands. To address this issue, this paper proposes D3C (deterministic concurrency control cloud database), an architecture that achieves cloud-native multi-writer capability through a transaction processing mechanism based on deterministic concurrency control. D3C splits transactions into sub-transactions and executes them independently on different nodes according to a predefined global order, ensuring serializable isolation for transactions executed on multiple read-write nodes. In addition, this paper introduces mechanisms such as multi-version-based asynchronous batch data persistence to ensure the performance of multi-writer transaction processing, and proposes a consistency-point-based fault recovery mechanism to achieve high availability. Experiments show that D3C achieves 5.1 times the throughput of a traditional single-master architecture in write-intensive scenarios while meeting the key requirements of cloud-native databases.
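    The core property of deterministic concurrency control described above, that nodes executing sub-transactions independently in one predefined global order reach the same state as a serial execution, can be checked on a toy example in Python. The sketch assumes that every operation reads and writes only its own integer key and that keys are partitioned by `key % num_partitions`; real deterministic systems must also handle sub-transactions that read data held on other nodes, which this sketch omits. The op format and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical op format: ("set", key, value) or ("mul_add", key, a, b), meaning key = key * a + b.
# Assumption: each op touches only its own integer key, so sub-transactions on
# different partitions never need each other's values.


def apply_op(state: dict, op: tuple) -> None:
    kind, key = op[0], op[1]
    if kind == "set":
        state[key] = op[2]
    elif kind == "mul_add":
        state[key] = state.get(key, 0) * op[2] + op[3]
    else:
        raise ValueError(f"unknown op {kind!r}")


def serial_execute(txns: list) -> dict:
    """Reference result: run whole transactions one at a time in global txn_id order."""
    state: dict = {}
    for _, ops in sorted(txns, key=lambda t: t[0]):
        for op in ops:
            apply_op(state, op)
    return state


def deterministic_execute(txns: list, num_partitions: int) -> dict:
    """Split transactions by partition, then run each partition independently in txn_id order."""
    work = defaultdict(list)  # partition -> [(txn_id, ops on that partition)]
    for txn_id, ops in txns:
        by_part = defaultdict(list)
        for op in ops:
            by_part[op[1] % num_partitions].append(op)
        for part, part_ops in by_part.items():
            work[part].append((txn_id, part_ops))
    merged: dict = {}
    for subtxns in work.values():
        state: dict = {}  # each partition keeps its own state; no cross-node coordination
        for _, part_ops in sorted(subtxns, key=lambda t: t[0]):
            for op in part_ops:
                apply_op(state, op)
        merged.update(state)  # key sets are disjoint across partitions
    return merged
```

    With `txns = [(2, [("mul_add", 1, 3, 1), ("set", 2, 5)]), (1, [("set", 1, 10), ("mul_add", 2, 2, 0)]), (3, [("mul_add", 2, 2, 1), ("mul_add", 1, 1, -1)])]`, both `serial_execute(txns)` and `deterministic_execute(txns, 2)` return `{1: 30, 2: 11}`, even though the operations are order-sensitive and the list is not given in txn_id order. Agreeing on the order up front is what removes the need for locking or validation across nodes.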
    7  Distributed Database Diagnosis for Compound Anomalies
    XIANG Qing-Feng, SHAO Ying-Xia, XU Quan-Qing, YANG Chuan-Hui
    2025, 36(3):1-18. DOI: 10.13328/j.cnki.jos.007282
    Abstract:
    Databases are foundational components of computer services, and performance anomalies in them can degrade service quality. Diagnosing database performance anomalies has therefore become a hot problem in both industry and academia. Recently, a series of automated anomaly diagnosis methods have been proposed that analyze the runtime status of a database and identify the most likely anomalies. However, as data scale grows, distributed databases are becoming increasingly popular in enterprises. In a distributed database composed of multiple nodes, existing anomaly diagnosis methods struggle to locate the nodes on which anomalies occur and fail to identify compound anomalies spanning multiple nodes, resulting in insufficient diagnostic capability. To address these challenges, we propose DistDiagnosis, an anomaly diagnosis method for compound anomalies in distributed databases. It models the anomalous state of a distributed database as a Compound Anomaly Graph, which not only represents the anomalies on each node but also captures the correlations between nodes. DistDiagnosis introduces a correlation-aware root cause ranking method that locates root cause anomalies based on the relationships between nodes. We construct anomaly test cases for different scenarios on OceanBase, a domestically developed distributed database. The experimental results show that DistDiagnosis outperforms other state-of-the-art (SOTA) baselines, achieving AC@1, AC@3, and AC@5 values of 0.97, 0.98, and 0.98, respectively. Compared with the second-best method, DistDiagnosis improves accuracy by up to 5.20%, 5.45%, and 4.46%, respectively.
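    The reported AC@1, AC@3, and AC@5 values are top-k accuracy metrics for root cause ranking. A minimal Python sketch, assuming one ground-truth root cause per test case (the common single-cause form of AC@k; multi-cause variants use a normalized form), computes the fraction of cases whose true root cause appears among the top k ranked candidates. The function name `ac_at_k` and the candidate labels in the test data are made up for illustration.

```python
def ac_at_k(rankings: list[list[str]], true_causes: list[str], k: int) -> float:
    """Fraction of cases whose (single) true root cause is among the top-k ranked candidates.

    rankings[i] is the ranked candidate list produced for test case i;
    true_causes[i] is that case's ground-truth root cause (assumed unique).
    """
    if not rankings or len(rankings) != len(true_causes):
        raise ValueError("need one non-empty ranking list per test case")
    hits = sum(1 for ranked, truth in zip(rankings, true_causes) if truth in ranked[:k])
    return hits / len(rankings)
```

    Because a hit at rank 1 is also a hit at ranks 3 and 5, AC@k can only stay the same or grow as k increases, which matches the pattern 0.97, 0.98, 0.98 reported above.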
    8  A Secure Multi-Party Database Computing System Based on Serverless Computing
    MA Xü-Yang, ZHOU Xiao-Kai, ZHENG Hao-Yu, CUI Bin, XU Quan-Qing, YANG Chuan-Hui, YAN Xiao, JIANG Jia-Wei
    2025, 36(3):1-22. DOI: 10.13328/j.cnki.jos.007283
    Abstract:
    Secure computation over federated multi-party databases enables federated querying or federated modeling tasks on private data from multiple databases while preserving data privacy. Such a federation is typically a loosely organized group in which participating databases can drop out at will. However, existing multi-party secure computation systems usually employ privacy-preserving computation schemes such as secret sharing, which require the participants to remain online, resulting in poor system availability. Moreover, existing systems cannot predict the number of users or the request rate when providing external services. If these systems are deployed on a private cluster or on virtual machines rented from a cloud computing platform, they experience increased latency during sudden bursts of requests and waste resources when the request workload is low, leading to poor scalability. With the advancement of cloud computing technology, serverless computing has emerged as a new cloud-native deployment paradigm that offers elastic resource scaling. In this work, we design a system architecture and an indirect communication scheme on a serverless computing framework to build a highly scalable and highly available multi-party database secure computation system. The system tolerates database node dropouts and automatically scales its resources in response to dynamic request workloads. We implement a prototype based on Alibaba Cloud and the OceanBase database and conduct a comprehensive experimental evaluation. The results show that our system outperforms existing systems in computational cost, system performance, and scalability for tasks such as low-frequency queries and horizontal modeling, saving up to 78% in computational costs and improving system performance by over 1.6 times. We also analyze the shortcomings of our system for complex queries and vertical modeling tasks.
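    The availability problem with secret sharing follows from how reconstruction works. As a minimal sketch of additive secret sharing over a prime field (one common scheme, not necessarily the one used by the systems the paper compares against), the Python below splits each input into one share per party and computes a secure sum. Each party adds up only its own shares, but the result can be reconstructed only from the partial sums of every party, so a single dropout blocks the computation. The modulus and the names `share`, `reconstruct`, and `secure_sum` are illustrative choices.

```python
import secrets

PRIME = 2**61 - 1  # Mersenne prime used as the field modulus (an arbitrary choice for this sketch)


def share(secret: int, num_parties: int) -> list[int]:
    """Split a secret in [0, PRIME) into num_parties additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(num_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: list[int]) -> int:
    # Needs every share: any n-1 of them are uniformly random and carry no information.
    return sum(shares) % PRIME


def secure_sum(inputs: list[int], num_parties: int) -> int:
    """Sum private inputs; party j only ever sees the j-th share of each input."""
    all_shares = [share(x, num_parties) for x in inputs]
    partials = [sum(s[j] for s in all_shares) % PRIME for j in range(num_parties)]
    # Reconstruction requires the partial sum from every party, so all must stay online.
    return reconstruct(partials)
```

    For example, `secure_sum([3, 5, 11], num_parties=3)` returns 19. If one party goes offline, the remaining partial sums are uniformly random values that reveal nothing about the inputs, which is also exactly why they cannot recover the result.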

    Contact
    • Journal of Software
    • Sponsor: Institute of Software, CAS, China
    • Postal code: 100190
    • Phone: 010-62562563
    • Email: jos@iscas.ac.cn
    • Website: https://www.jos.org.cn
    • Serial numbers: ISSN 1000-9825, CN 11-2560/TP
    • Domestic price: 70 yuan
    Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-4
    Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190
    Phone: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn