WENG Tong-Feng , ZHOU Xu , LI Ken-Li , HU Yi-Kun
2024, 35(12):5341-5362. DOI: 10.13328/j.cnki.jos.007063 CSTR: 32375.14.jos.007063
Abstract:With the development of Internet information technology, large-scale graphs have widely emerged in social networks, computer networks, and biological information networks. Because traditional graph data management technology has storage and performance limitations when dealing with large-scale graphs, distributed management technology has become a hotspot in both industry and academia. Core decomposition computes the core numbers of the vertices in a graph and plays a key role in many applications, including community search, protein structure analysis, and network structure visualization. The existing distributed core decomposition algorithm applies a broadcast message delivery mechanism based on the vertex-centric mode, which may generate a large amount of redundant communication and computation overhead and lead to memory overflow when processing large-scale graphs. To address these issues, this study proposes novel distributed core decomposition algorithms based on the global activation and peeling computation frameworks, respectively. In addition, several strategies are designed to improve algorithm performance: based on the locality of vertex core numbers, the study proposes a new message-pruning strategy and a new worker-centric computing mode, thereby improving the efficiency of the proposed algorithms. To verify these strategies, the study deploys the proposed models and algorithms on the distributed cluster of the National Supercomputing Center in Changsha and evaluates their effectiveness and efficiency through extensive experiments on real and synthetic datasets. The overall time performance of the algorithms is improved by 37% to 98%.
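For readers unfamiliar with core numbers, the sequential peeling computation that distributed core decomposition parallelizes can be sketched as follows. This is a minimal single-machine sketch in the style of the Batagelj–Zaversnik bucket algorithm, not the distributed algorithm proposed in the paper; the adjacency-dict input format and the function name `core_numbers` are hypothetical.

```python
def core_numbers(adj):
    """Return {vertex: core number} for an undirected graph given as {v: set(neighbors)}."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)
    # buckets[d] holds the not-yet-peeled vertices whose current degree is d
    buckets = [set() for _ in range(max_deg + 1)]
    for v, d in degree.items():
        buckets[d].add(v)
    core = {}
    d = 0
    while len(core) < len(adj):
        # Degrees never drop below the current level, so the scan can resume at d
        while not buckets[d]:
            d += 1
        v = buckets[d].pop()
        core[v] = d  # peeling v at level d fixes its core number
        for u in adj[v]:
            if u not in core and degree[u] > d:
                buckets[degree[u]].remove(u)
                degree[u] -= 1
                buckets[degree[u]].add(u)
    return core


# A triangle {0, 1, 2} with a pendant vertex 3 attached to 0
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(sorted(core_numbers(graph).items()))  # [(0, 2), (1, 2), (2, 2), (3, 1)]
```

Each vertex is peeled exactly once, and a neighbor's degree is only decremented while it is still above the current level, which is what keeps the level counter monotone.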
WANG Dong , DOU Wen-Sheng , GAO Yu , WU Chen-Ao , WEI Jun , HUANG Tao
2024, 35(12):5363-5381. DOI: 10.13328/j.cnki.jos.007066 CSTR: 32375.14.jos.007066
Abstract:Raft is one of the most popular distributed consensus protocols. Since it was proposed in 2014, Raft and its variants have been widely used in many kinds of distributed systems. To establish the correctness of the Raft protocol, developers use TLA+ formal specifications to model and verify its design. However, due to the gap between the abstract formal specification and the practical implementation, distributed systems that implement Raft can still violate the protocol design and introduce intricate bugs. This study proposes a novel testing technique based on the TLA+ formal specification to unearth bugs in Raft implementations. Specifically, the study maps the formal specification to the corresponding system implementation and then uses the specification-defined state space to guide testing of the implementations. To evaluate the feasibility and effectiveness of the proposed approach, the study applies it to two different Raft implementations and finds three previously unknown bugs.
ZHANG Xiang-Ping , LIU Jian-Xun , HU Hai-Ze , LIU Yi
2024, 35(12):5382-5396. DOI: 10.13328/j.cnki.jos.007067 CSTR: 32375.14.jos.007067
Abstract:Deep learning-based code search methods perform code search by calculating the similarity between the representations of code and description statements. However, this approach does not consider the true probability distribution of relevance between code and descriptions. To solve this problem, this study proposes a code search method based on a generative adversarial game, which combines the code-description relevance of the classical probabilistic model with the feature extraction of the vector space model. The generative adversarial game is then used to apply the probability distribution between code and descriptions to the alternating training of the generator and the discriminator. Meanwhile, the code encoder and the description encoder are optimized to generate high-quality code and description representations for the code search task. Finally, experiments on a public dataset show that, compared with DeepCS, the proposed method improves Recall@10, MRR@10, and NDCG@10 by 8.4%, 32.5%, and 24.3%, respectively.
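The three ranking metrics reported above can be computed as follows when each query has exactly one relevant code snippet, a common code search evaluation setup that is assumed here; the paper's exact protocol may differ. With a single relevant item the ideal DCG is 1, so NDCG@k reduces to 1/log2(rank + 1). The function name `metrics_at_k` is hypothetical.

```python
import math


def metrics_at_k(ranks, k=10):
    """Compute Recall@k, MRR@k, and NDCG@k.

    ranks: for each query, the 1-based rank of its single relevant code snippet
    in the returned list, or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r for r in ranks if r is not None and r <= k]  # ranks inside the top k
    recall = len(hits) / n
    mrr = sum(1.0 / r for r in hits) / n
    # One relevant item per query: IDCG = 1, so NDCG = DCG = 1 / log2(rank + 1)
    ndcg = sum(1.0 / math.log2(r + 1) for r in hits) / n
    return recall, mrr, ndcg


# Four queries: hits at rank 1 and rank 3, one miss beyond k, one never retrieved
print(metrics_at_k([1, 3, 12, None]))  # roughly (0.5, 0.333, 0.375)
```

MRR and NDCG both reward placing the correct snippet near the top, but NDCG decays logarithmically rather than reciprocally, so the two can move by different amounts for the same ranking change.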
WANG Xiao-Qi , CHEN Xiao-Hong , JIN Zhi , GU Bin , QI Yan-Xia
2024, 35(12):5397-5418. DOI: 10.13328/j.cnki.jos.007081 CSTR: 32375.14.jos.007081
Abstract:Embedded systems are becoming increasingly complex, and the requirements analysis of their software systems has become a bottleneck in embedded system development. Device dependency and interleaved execution logic are typical characteristics of embedded software systems, which calls for effective requirements analysis methods that decouple requirements based on device dependencies. Starting from the idea of environment-based modeling in requirements engineering, this study proposes a projection-based requirements analysis approach, from system requirements to software requirements, for embedded software systems, helping requirements engineers decouple requirements effectively. The study first summarizes the system requirement and software requirement descriptions of embedded software systems, defines requirement decoupling strategies based on the characteristics of the interactive environment, and designs the specification process from system requirements to software requirements. A real case study is carried out on a spacecraft sun search system, and five representative case scenarios are quantitatively evaluated with two metrics, coupling and cohesion, which demonstrates the effectiveness of the proposed approach.
ZHOU Zhang-Bing , ZHAO Deng , ZHANG Wen-Bo , SUN Xiao , XUE Xiao
2024, 35(12):5419-5451. DOI: 10.13328/j.cnki.jos.007082 CSTR: 32375.14.jos.007082
Abstract:In recent years, service-oriented IoT architectures have received considerable attention from academia and industry. Encapsulating IoT resources into intelligent IoT services, and interconnecting these resource-constrained and capacity-evolving services so that they collaborate to support IoT applications, has become a widely adopted and flexible mechanism. On edge devices with fluctuating capacity and varying resources, IoT services may experience QoS degradation or resource mismatches during execution, making it difficult for IoT applications to continue and possibly inducing failures. Therefore, quantitative runtime monitoring of IoT services has become key to guaranteeing the robustness of IoT applications. Various monitoring mechanisms have been proposed in recent literature, but they lack formal interpretation and suffer from strong domain dependence and empirical subjectivity. Based on formal methods such as signal temporal logic (STL), IoT service monitoring can be formulated as a temporal logic task to achieve runtime quantitative monitoring. However, STL and its extensions suffer from non-differentiability, loss of soundness, and inapplicability in dynamic environments. Moreover, existing works are inadequate for monitoring composite services, lacking integrity, linkage, and dynamics. To solve these problems, this study proposes compositional signal temporal logic (CSTL) to achieve quantitative monitoring of different QoS constraints and time constraints upon intra-, inter-, and composite services. Specifically, CSTL introduces an accumulative operator based on positively and negatively biased Riemann sums, which emphasizes the robust satisfaction of all sub-formulae over their entire time domains and evaluates qualitative and quantitative constraint satisfaction for IoT service monitoring.
Besides, CSTL introduces a compositional operator based on constraint types and composite structures, as well as dynamic variables that vary with the environment, to effectively monitor QoS variations and temporal violations of composite services. As a result, temporal and QoS constraints upon intra-, inter-, and composite services can be specified by CSTL formulae and formally interpreted with qualitative and quantitative satisfaction at runtime. Extensive evaluations show that CSTL outperforms baseline techniques in terms of expressiveness, applicability, and robustness.
XIE Rui-Lin , CUI Zhan-Qi , CHEN Xiang , LI Li
2024, 35(12):5452-5469. DOI: 10.13328/j.cnki.jos.007088 CSTR: 32375.14.jos.007088
Abstract:With the rapid development of deep neural networks (DNNs), their accuracy has become comparable to, or even surpassed, that of humans in some specific tasks. However, like traditional software, DNNs are inevitably prone to defects. If defective DNN models are applied in safety-critical fields, they may cause serious accidents. Therefore, effective methods for detecting defective DNN models are urgently needed. Traditional differential testing methods use the outputs of the testing targets on the same test input as the basis for difference analysis. However, even different DNN models trained with the same program and dataset may produce different outputs for the same test input, so traditional differential testing is difficult to apply directly to detecting defective DNN models. To solve this problem, this study proposes interpretation-analysis-based differential testing (IADT) for DNN models. IADT uses interpretation methods to obtain behavior explanations of DNN models and uses statistical methods to analyze significant differences between the models’ behavior interpretations in order to detect defective models. Experiments on real defective models show that the introduction of interpretation methods makes IADT effective in detecting defective DNN models: the F1 score of IADT in detecting defective models is 0.8%–6.4% higher than that of DeepCrime, while IADT takes only 4.0%–5.4% of the time consumed by DeepCrime.
ZHANG Han , WANG Jing-Jing , LUO Jia-Min , ZHOU Guo-Dong
2024, 35(12):5470-5486. DOI: 10.13328/j.cnki.jos.007057 CSTR: 32375.14.jos.007057
Abstract:Currently, sentiment analysis research is generally based on big-data-driven models, which rely heavily on expensive annotation and computation. Therefore, research on sentiment analysis in low-resource scenarios is particularly urgent. However, existing research on low-resource sentiment analysis mainly focuses on a single task, making it difficult for models to acquire knowledge from other tasks. Therefore, this study constructs a successive sentiment analysis setting for low-resource scenarios, aiming to allow models to learn multiple sentiment analysis tasks over time through continual learning. This makes full use of data from different tasks and learns sentiment information across tasks, thus alleviating the problem of insufficient training data for a single task. Successive sentiment analysis in low-resource scenarios poses two core problems: preserving the sentiment information of each individual task and fusing sentiment information across tasks. To solve these two problems, this study proposes continual attention modeling for successive sentiment analysis in low-resource scenarios. First, a sentiment masked Adapter (SMA) is constructed to generate hard attention emotion masks for different tasks, which preserves the sentiment information of each task and mitigates catastrophic forgetting. Second, dynamic sentiment attention (DSA) is proposed, which dynamically fuses the features extracted by different Adapters based on the current time step and task similarity, thereby fusing sentiment information across tasks. Experimental results on multiple datasets show that the proposed approach significantly outperforms state-of-the-art benchmark approaches.
Additionally, experimental analysis indicates that the proposed approach has the best sentiment information retention ability and sentiment information fusion ability compared to other benchmark approaches while maintaining high operational efficiency.
GUO Li-Li , WANG Long-Biao , DANG Jian-Wu , DING Shi-Fei
2024, 35(12):5487-5508. DOI: 10.13328/j.cnki.jos.007232 CSTR: 32375.14.jos.007232
Abstract:Speech emotion recognition is an important part of affective computing and plays a key role in human-computer interaction. Accurately distinguishing emotions helps machines understand users’ intentions and interact better, enhancing user experience. This study reviews the theories and methods of speech emotion recognition, focusing on discrete speech emotions. First, the study reviews the development of emotion recognition and presents an architecture of speech emotion recognition to summarize research progress. Second, emotion representation models and commonly used corpora are introduced to provide basic support for speech emotion recognition. Then, the process of speech emotion recognition is outlined, including feature extraction and recognition models, with a focus on traditional classification models, classical deep models, and other advanced models. Meanwhile, commonly used evaluation metrics are introduced and used to summarize the performance of these models. Finally, the study discusses the challenges in speech emotion recognition and suggests possible directions for future research.
ZHANG Yu-Hui , CHEN Li , JU Sheng-Gen , LI Mei-Wen
2024, 35(12):5509-5525. DOI: 10.13328/j.cnki.jos.007062 CSTR: 32375.14.jos.007062
Abstract:Spoken language understanding is a key task in task-oriented dialogue systems and is mainly composed of two sub-tasks: slot filling and intent detection. Currently, the mainstream approach is to model slot filling and intent detection jointly. Although this approach has achieved good results on both sub-tasks, joint modeling still suffers from error propagation in the interaction between intent detection and slot filling, as well as from incorrect correspondence between multi-intent information and slot information in multi-intent scenarios. To address these problems, this study proposes WISM, a joint model for multi-intent detection and slot filling based on graph attention networks. WISM establishes a word-level one-to-one mapping between fine-grained intents and slots to correct the incorrect correspondence between multi-intent information and slots. By constructing a word-intent-semantic slot interaction graph and using a fine-grained graph attention network to establish bidirectional connections between the two tasks, WISM reduces error propagation during the interaction process. Experimental results on the MixSNIPS and MixATIS datasets show that, compared with the latest existing models, WISM improves semantic accuracy by 2.58% and 3.53%, respectively. The model not only improves accuracy but also verifies the one-to-one correspondence between multiple intents and semantic slots.
ZHANG Qian-Zhen , GUO De-Ke , ZHAO Xiang
2024, 35(12):5526-5543. DOI: 10.13328/j.cnki.jos.007064 CSTR: 32375.14.jos.007064
Abstract:A temporal graph is a graph in which each edge is associated with a timestamp. A seasonal-bursting subgraph is a dense subgraph that exhibits burstiness over multiple time periods, and it can be applied to activity discovery and group relationship analysis in social networks. Unfortunately, most previous studies on subgraph mining in temporal networks ignore the seasonal or bursting features of subgraphs. To this end, this study proposes a maximal (ω,θ)-dense subgraph model to represent seasonal-bursting subgraphs in temporal networks. Specifically, a maximal (ω,θ)-dense subgraph is a subgraph that accumulates its density at the fastest speed during at least ω particular periods, each of length no less than θ, on the temporal graph. To compute all seasonal-bursting subgraphs efficiently, the study first models the mining problem as a mixed integer programming problem, which consists of finding the densest subgraph and the maximum burstiness segment, and then gives a solution for each subproblem. The study further devises two optimization strategies that exploit k-core and dynamic programming algorithms to boost performance. Experimental results show that the proposed model is indeed able to identify many seasonal-bursting subgraphs. The efficiency, scalability, and effectiveness of the proposed algorithms are also verified on five real-life datasets.
2024, 35(12):5544-5557. DOI: 10.13328/j.cnki.jos.007094 CSTR: 32375.14.jos.007094
Abstract:Online class-incremental learning aims to learn new classes effectively in data stream scenarios while ensuring that the model satisfies small-cache and small-batch constraints. However, because data streams are one-pass, the class information in small batches cannot be exploited through multiple passes as in offline learning. To alleviate this problem, current studies combine multiple data augmentations with contrastive learning for model training. Nevertheless, under the small-cache and small-batch constraints, existing methods that select and store data randomly are not conducive to obtaining diverse negative samples, which restricts model discriminability. Previous studies have shown that hard negative samples are key to improving contrastive learning performance, but they have rarely been explored in online learning scenarios. The Universum data proposed in traditional Universum learning provide a simple yet intuitive way to obtain hard negative samples. Specifically, the authors previously proposed mixup-induced Universum (MIU) with certain coefficients, which effectively improves the performance of offline contrastive learning. Inspired by this, the study introduces MIU into online scenarios. Unlike the previously statically generated Universum, this raises additional challenges in data stream scenarios. First, because the number of classes keeps increasing, the conventional approach of statically generating the Universum from globally given classes becomes inapplicable, so the Universum must be redefined and generated dynamically. Therefore, this study proposes to recursively generate MIU with maximum entropy relative to the seen (local) classes (incremental MIU, IMIU) and provides it with an additional small cache so that the overall memory limit is still met.
Second, the generated IMIU and the positive samples in small batches are mixed up again to produce diverse, high-quality hard negative samples. Finally, by combining the above steps, the IMIU-based contrastive learning (IUCL) algorithm is developed. Comparative experiments on the standard datasets CIFAR-10, CIFAR-100, and Mini-ImageNet verify the effectiveness of the proposed algorithm.
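The abstract does not give the IMIU formula. One plausible reading of "maximum entropy relative to the seen classes" is a mixup sample whose soft label is uniform over the t classes seen so far (entropy log t), which can be maintained recursively as new classes arrive. The sketch below illustrates only that assumption, using one hypothetical representative vector per class; it is not the authors' generation procedure, and the function `update_universum` is hypothetical.

```python
def update_universum(u, n_seen, x_new):
    """Fold a representative of a newly seen class into a running Universum.

    Assumption (illustrative only): the Universum is the equal-weight mixup of
    one representative per seen class, i.e. its soft label is uniform.

    u      -- current Universum vector (uniform mix of n_seen class representatives), or None
    n_seen -- number of classes already mixed into u
    x_new  -- representative vector of the new class
    Returns (updated Universum, updated class count).
    """
    if u is None:
        return list(x_new), 1
    t = n_seen + 1
    # Recursive mean: afterwards each of the t classes has mixing weight 1/t
    mixed = [(n_seen / t) * a + (1.0 / t) * b for a, b in zip(u, x_new)]
    return mixed, t


u, t = None, 0
for x in ([3.0, 0.0], [0.0, 3.0], [0.0, 0.0]):
    u, t = update_universum(u, t, x)
print(t, u)  # 3 classes; u is approximately [1.0, 1.0], the equal-weight mix
```

The recursive form means only the current Universum and a class counter need to be cached, which fits the small-cache constraint the abstract emphasizes.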
QIU Zhi-Lin , SHOU Li-Dan , CHEN Ke , JIANG Da-Wei , LUO Xin-Yuan , CHEN Gang
2024, 35(12):5558-5581. DOI: 10.13328/j.cnki.jos.007058 CSTR: 32375.14.jos.007058
Abstract:Due to the continuous advancements in the field of deep learning, there is growing interest in extending relational databases with collaborative query processing (CQP) techniques to handle advanced analytical queries involving structured and unstructured data. State-of-the-art CQP methods employ user-defined functions (UDFs) to implement deep neural network (NN) models for processing unstructured data while utilizing relational operations for structured data. UDF-based approaches simplify query composition, allowing users to submit analytical queries with a single SQL statement. However, they require manual selection of appropriate and efficient models based on desired performance metrics during ad-hoc data analysis, posing significant challenges to users. To address this issue, this research proposes a CQP technique based on declarative inference functions (DIF), which constructs a complete CQP framework by optimizing model selection, execution strategies, and device bindings across multiple query execution paths. Leveraging the cost model and optimization rules designed in this study, the query processor is capable of estimating the cost of different query plans and automatically selecting the optimal physical query plan. Experimental results on four datasets validate the effectiveness and efficiency of the proposed DIF-based CQP approach.
DONG Jian-Kuo , HUANG Yue-Hua , FU Yu-Sheng , XIAO Fu , ZHENG Fang-Yu , LIN Jing-Qiang , DONG Zhen-Jiang
2024, 35(12):5582-5608. DOI: 10.13328/j.cnki.jos.007089 CSTR: 32375.14.jos.007089
Abstract:As the core foundation for ensuring network security, cryptography plays a crucial role in data protection, identity verification, encrypted communication, and other areas. With the rapid popularization of 5G and Internet of Things technologies, network security is facing unprecedented challenges, and the demand for cryptographic performance is growing explosively. GPUs can use thousands of parallel computing cores to accelerate complex computing problems, which makes them well suited to the computationally intensive nature of cryptographic algorithms. Therefore, researchers have extensively explored methods to accelerate various cryptographic algorithms on GPU platforms, and compared with platforms such as CPUs and FPGAs, GPUs offer significant performance advantages. This study discusses the classification of cryptographic algorithms and GPU platform architectures, and provides a detailed analysis of current research on various ciphers on GPU heterogeneous platforms. Additionally, it summarizes the current technical challenges confronting GPU-based high-performance cryptography and discusses prospects for future technological development. Through this in-depth study and summary, the survey aims to provide practitioners in cryptographic engineering research with a comprehensive reference on the latest research progress and application practice of GPU-based high-performance cryptography.
LAI Jian-Chang , HUANG Xin-Yi , HE De-Biao , CHEN Li-Quan , YANG Shao-Jun
2024, 35(12):5609-5620. DOI: 10.13328/j.cnki.jos.007041 CSTR: 32375.14.jos.007041
Abstract:Revocation encryption is the negative analogue of broadcast encryption. Unlike in broadcast encryption, the input to the encryption algorithm is not a set of receivers but a set of revoked users. All users in the system who are not in the revocation set can decrypt the ciphertext successfully, whereas users in the revocation set learn nothing about the encrypted data, even if they collude. Compared with broadcast encryption, revocation encryption is more suitable for scenarios in which most users in the system are intended recipients and the decryption rights of certain users need to be revoked. This study proposes a revocation encryption scheme based on the Chinese identity-based encryption standard SM9. The ciphertext size of the proposed scheme is constant and independent of the size of the revocation set. Based on a complexity assumption in the generic group model, the scheme is proven CPA-secure in the random oracle model. Finally, a performance analysis indicates that the computational costs and storage overheads of the scheme are comparable to those of existing revocation encryption schemes.
2024, 35(12):5621-5635. DOI: 10.13328/j.cnki.jos.007060 CSTR: 32375.14.jos.007060
Abstract:To address the security of users’ private keys, this study proposes a user-oriented and practical private key protection framework that combines secret sharing with the edge computing mode. Based on this framework, it designs a private key protection scheme for the SM2 public-key cryptographic system. In this scheme, a user’s SM2 private key is divided into two shares via a secret sharing scheme, which are kept by the user’s device and the edge server, respectively. Public-key cryptographic tasks requested by Web3 applications are executed cooperatively by the user’s device and the edge server without recovering the original private key. If the user’s device or the edge server is attacked, a key updating protocol is executed between them to update the private key shares and discard the share that may have been leaked. Experimental results show that the computing time of the new scheme is acceptable for common real-world devices such as smartphones and laptops.
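The share-splitting and share-refreshing steps can be illustrated with 2-of-2 additive secret sharing modulo the order n of the SM2 recommended curve. This is an assumption for illustration: the abstract does not state which sharing scheme is used, and practical SM2 cooperative protocols may use other (for example, multiplicative) sharings. The cooperative signing or decryption step itself is not shown, and all function names are hypothetical.

```python
import secrets

# Order n of the SM2 recommended curve (GB/T 32918)
SM2_N = int("FFFFFFFEFFFFFFFFFFFFFFFFFFFFFFFF7203DF6B21C6052B53BBF40939D54123", 16)


def split_key(d, n=SM2_N):
    """Split private key d into two additive shares with d = d1 + d2 (mod n)."""
    d1 = secrets.randbelow(n)  # e.g. kept on the user's device
    d2 = (d - d1) % n          # e.g. kept on the edge server
    return d1, d2


def refresh_shares(d1, d2, n=SM2_N):
    """Re-randomize both shares without changing the key they encode.

    A share leaked before the refresh is useless when combined with a share
    produced after it, because the random offset r differs.
    """
    r = secrets.randbelow(n)
    return (d1 + r) % n, (d2 - r) % n


def reconstruct(d1, d2, n=SM2_N):
    """Recombine shares (for testing only; the scheme never does this in practice)."""
    return (d1 + d2) % n


d = secrets.randbelow(SM2_N - 2) + 1  # SM2 private keys lie in [1, n - 2]
s1, s2 = split_key(d)
t1, t2 = refresh_shares(s1, s2)
print(reconstruct(t1, t2) == d)  # True
```

Neither share alone reveals anything about d, since each is uniformly distributed modulo n on its own.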
LI Xiang-Xian , ZHENG Yu-Ze , MA Hao-Kai , QI Zhuang , YAN Xiao-Shuo , MENG Xiang-Xu , MENG Lei
2024, 35(12):5636-5652. DOI: 10.13328/j.cnki.jos.007052 CSTR: 32375.14.jos.007052
Abstract:The performance of image classification algorithms is limited by the diversity of visual information and the influence of background noise. Existing works usually apply cross-modal constraints or heterogeneous feature alignment algorithms to learn highly discriminative visual representations. However, the difference in feature distributions caused by modal heterogeneity limits the effective learning of visual representations. To address this problem, this study proposes CMIF, an image classification framework based on cross-modal semantic information inference and fusion, which introduces semantic descriptions of images and statistical knowledge as privileged information. The study uses the privileged information learning paradigm to guide the mapping of image features from visual space to semantic space during training, and proposes a class-aware information selection (CIS) algorithm to learn cross-modal enhanced representations of images. To handle heterogeneous feature differences in representation learning, a partial heterogeneous alignment (PHA) algorithm is used to achieve cross-modal alignment between visual features and the semantic features extracted from privileged information. To further suppress the interference of visual noise in semantic space, the graph-fusion-based CIS algorithm is used to reconstruct the key information in the semantic representation, forming an effective supplement to the visual prediction information. Experiments on the cross-modal classification datasets VireoFood-172 and NUS-WIDE show that CMIF learns robust semantic features of images and, as a general framework, achieves stable performance improvements on both the convolution-based ResNet-50 and the Transformer-based ViT image classification models.
WANG Jin-Wei , WANG Wei , WANG Hao , LUO Xiang-Yang , MA Bin
2024, 35(12):5653-5670. DOI: 10.13328/j.cnki.jos.007055 CSTR: 32375.14.jos.007055
Abstract:Detecting aligned double Joint Photographic Experts Group (JPEG) compression is a challenging task in digital image forensics. Previous studies have proposed methods that can effectively detect aligned double JPEG compression, but most of them rely on features extracted during JPEG decompression. If an aligned double-compressed JPEG image is saved in BMP format, these methods may be difficult to apply directly. To address this issue, this study proposes a dual-threshold quantization step estimation method that allows the quantization tables to be estimated and features to be extracted. Furthermore, the study defines a minimum error based on the unique properties of JPEG compression with a quality factor of 100, and removing this minimum error from the features further improves the detection performance of the proposed method. Finally, the study extracts first-order relative error features based on the convergence properties of the de-quantized JPEG coefficients, which further enhances detection performance at lower quality factors. Experimental results demonstrate that the proposed method outperforms current state-of-the-art algorithms at different quality factors.
WEI Kang-Kang , LUO Wei-Qi , LIU Ming-Lin
2024, 35(12):5671-5686. DOI: 10.13328/j.cnki.jos.007068 CSTR: 32375.14.jos.007068
Abstract:Currently, most published image steganalysis methods are designed for grayscale images and cannot effectively detect steganography in the color images widely used on social media. To solve this problem, this study proposes a color image steganalysis method based on central difference convolution and attention enhancement. The proposed method first designs a backbone flow consisting of three stages: preprocessing, feature extraction, and feature classification. In the preprocessing stage, the input color image is separated into its color channels, each channel is filtered with SRM filters, and the resulting residual images are concatenated. In the feature extraction stage, the study constructs three convolutional blocks based on central difference convolution to extract deeper steganalysis feature maps. In the classification stage, the study uses global covariance pooling and two fully connected layers with dropout to classify cover and stego images. Additionally, to further enhance the feature expression ability of the backbone flow at different stages, a residual spatial attention enhancement module and a channel attention enhancement module are introduced at the early and late stages of the backbone flow, respectively. Specifically, the residual spatial attention enhancement module first uses Gabor filter kernels to perform channel-separated convolution on the input image and then obtains the effective information of the residual feature map through a spatial attention mechanism. The channel attention enhancement module enhances the final feature classification ability of the model by capturing the dependencies between channels. Extensive comparative experiments show that the proposed method significantly improves the detection performance for color image steganography and achieves state-of-the-art results.
In addition, ablation experiments are conducted to verify the rationality of the proposed network architecture.
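Central difference convolution (CDC), used in the feature extraction stage above, blends an ordinary convolution with a convolution over differences from the window's center value. The formulation commonly used in the literature is y(p0) = θ·Σ w(pn)·(x(p0+pn) − x(p0)) + (1 − θ)·Σ w(pn)·x(p0+pn), which simplifies to the vanilla output minus θ·x(p0)·Σ w(pn). The single-channel, no-padding sketch below implements that formula; the paper's exact θ, padding, and multi-channel handling are not stated in the abstract, and θ = 0.7 here is only an illustrative default. The function name `cdc2d` is hypothetical.

```python
def cdc2d(image, kernel, theta=0.7):
    """Single-channel 'valid' central difference convolution (cross-correlation form).

    image, kernel -- 2D lists of numbers; kernel dimensions are assumed odd
    theta         -- 0 gives vanilla convolution, 1 gives pure central-difference convolution
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    w_sum = sum(sum(row) for row in kernel)
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Vanilla term: sum over the window of w(pn) * x(p0 + pn)
            vanilla = sum(kernel[a][b] * image[i + a][j + b]
                          for a in range(kh) for b in range(kw))
            center = image[i + kh // 2][j + kw // 2]  # x(p0)
            # theta * sum w*(x - x0) + (1 - theta) * sum w*x == vanilla - theta * x0 * sum w
            row.append(vanilla - theta * center * w_sum)
        out.append(row)
    return out


flat = [[5] * 4 for _ in range(4)]
ones = [[1] * 3 for _ in range(3)]
print(cdc2d(flat, ones, theta=1.0))  # [[0.0, 0.0], [0.0, 0.0]]
print(cdc2d(flat, ones, theta=0.0))  # [[45.0, 45.0], [45.0, 45.0]]
```

With θ = 1 a flat region produces no response, which is the intuition for using CDC to highlight fine local variations rather than absolute intensity.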
LIU Hui , ZHU Ji-Cheng , WANG Xin-Yu , SHENG Yu-Rui , ZHANG Cai-Ming , NIE Li-Qiang
2024, 35(12):5687-5709. DOI: 10.13328/j.cnki.jos.007083 CSTR: 32375.14.jos.007083
Abstract:Multi-modal medical image fusion provides a more comprehensive and accurate medical image description for medical diagnosis, surgical navigation, and other clinical applications by effectively combining the human tissue structure and lesion information reflected in different modalities. This study aims to address partial spectral degradation, loss of edges and details, and insufficient color reproduction in regions invaded by adherent lesions in current fusion methods. It proposes a novel multi-modal medical image fusion method that achieves multi-feature enhancement and color preservation in a multi-scale feature frequency-domain decomposition filter domain. The method decomposes the source image into four parts, namely smoothing, texture, contour, and edge feature layers, applies specific fusion rules to each, and generates the fusion result by image reconstruction. In particular, given the latent feature information contained in the smoothing layer, the study proposes a visual saliency decomposition strategy to explore energy and partial fiber texture features in a multi-scale, multi-dimensional manner, enhancing the utilization of source image information. In the texture layer, the study introduces a texture enhancement operator that extracts details and hierarchical information through spatial structure and information measurement, addressing the difficulty current fusion methods have in distinguishing the invasion status of adherent lesion regions. In addition, because no public abdominal dataset is available, 403 sets of abdominal images are registered in this study and made available for public download. The method is compared with six baseline methods in experiments on the public Atlas dataset and the abdominal dataset. Compared with the most advanced methods, the similarity between the fused image and the source image is improved by 22.92%, and the edge retention, spatial frequency, and contrast ratio of the fused images are improved by 35.79%, 28.79%, and 32.92%, respectively.
In addition, the proposed method outperforms the other methods in both visual quality and computational efficiency.
ZHAO Yu-Long , ZHANG Lu-Fei , XU Guo-Chun , LI Yu-Xuan , SUN Ru-Jun , LIU Xin
2024, 35(12):5710-5724. DOI: 10.13328/j.cnki.jos.007084 CSTR: 32375.14.jos.007084
Abstract:The domestically developed Shenwei AI acceleration card is equipped with a Shenwei many-core processor enhanced with systolic arrays. Although its intelligent computing power is comparable to that of mainstream GPUs, it still lacks basic software support. To lower the barrier to using the Shenwei AI acceleration card and effectively support the development of AI applications, this study designs SDAA, a runtime system for the card whose semantics are consistent with mainstream CUDA. For key paths such as memory management, data transmission, and kernel function launch, a software-hardware co-design method is adopted to realize an on-card multi-level memory allocation algorithm combining segmented and paged memory, a multi-thread, multi-channel pageable memory transmission model, an adaptive data transmission algorithm across multiple heterogeneous components, and a fast kernel function launch method based on on-chip array communication. As a result, the runtime performance of SDAA is better than that of the mainstream GPU. The experimental results indicate that the memory allocation speed of SDAA is 120 times that of the corresponding NVIDIA V100 interface, the memory transmission overhead is half that of the corresponding interface, and the data transmission bandwidth is 1.7 times that of the corresponding interface. Additionally, the kernel function launch time is on par with that of the corresponding interface. Thus, the SDAA runtime system can support the efficient operation of mainstream frameworks and actual model training on the Shenwei AI acceleration card.
ZHANG Rui-Lin , DU Jin-Hua , YIN Hao
2024, 35(12):5725-5740. DOI: 10.13328/j.cnki.jos.007085 CSTR: 32375.14.jos.007085
Abstract:As a new distributed machine learning paradigm, federated learning makes full use of the computing power and local data of many distributed clients to jointly train a machine learning model while meeting user privacy and data confidentiality requirements. In cross-device federated learning scenarios, the clients usually consist of thousands or even tens of thousands of mobile or terminal devices. Owing to communication and computation costs, the aggregation server selects only a few clients to participate in each training round. Meanwhile, several widely used federated optimization algorithms adopt completely random client selection, which has been shown to leave substantial room for optimization. In recent years, how to efficiently and reliably select a suitable subset from massive heterogeneous clients to participate in training, and thereby optimize the resource consumption and model performance of federated learning protocols, has been extensively studied, but there is still no comprehensive survey of this key issue. Therefore, this study conducts a comprehensive survey of client selection algorithms for cross-device federated learning. Specifically, it provides a formal description of the client selection problem, classifies the selection algorithms, and discusses and analyzes them one by one. Finally, it explores some future research directions for client selection algorithms.
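As a point of reference for the surveyed methods, the completely random baseline used by FedAvg-style algorithms samples a fixed fraction C of the K clients uniformly without replacement in each round. The minimal sketch below assumes that rule, with at least one client per round; the function name `select_clients` is hypothetical.

```python
import random


def select_clients(client_ids, fraction, rng):
    """Uniform random client selection: max(floor(C * K), 1) distinct clients per round."""
    m = max(int(fraction * len(client_ids)), 1)
    return rng.sample(client_ids, m)


rng = random.Random(0)  # seeded so that rounds are reproducible
clients = list(range(10_000))
for round_idx in range(3):
    chosen = select_clients(clients, 0.01, rng)
    print(round_idx, len(chosen))  # 100 distinct clients each round
```

Because every client is equally likely to be chosen regardless of its data, compute speed, or connectivity, this baseline is the one that the surveyed selection algorithms try to improve on.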