LIU Xin, JING Li-Ping, YU Jian
2024, 35(4):1587-1600. DOI: 10.13328/j.cnki.jos.007014
Abstract:With the development of technologies such as big data, computing, and the Internet, artificial intelligence techniques represented by machine learning and deep learning have achieved tremendous success. Particularly, the emergence of various large-scale models has greatly accelerated the application of artificial intelligence in various fields. However, the success of these techniques heavily relies on massive training data and abundant computing resources, which significantly limits their application in data or resource-scarce domains. Therefore, how to learn from limited samples, known as few-shot learning, has become a crucial research problem in the new wave of industrial transformation led by artificial intelligence. The most commonly used approach in few-shot learning is based on meta-learning. Such methods learn meta-knowledge for solving similar tasks by training on a series of related training tasks, which enables fast learning on new testing tasks using the acquired meta-knowledge. Although these methods have achieved promising results in few-shot classification tasks, they assume that the training and testing tasks come from the same distribution. This implies that a sufficient number of training tasks are required for the model to generalize the learned meta-knowledge to continuously changing testing tasks. However, in some real-world scenarios with truly limited data, ensuring an adequate number of training tasks is challenging. To address this issue, this study proposes a robust few-shot classification method based on diverse and authentic task generation (DATG). The method generates additional training tasks by applying Mixup to a small number of existing tasks, aiding the model in learning. By constraining the diversity and authenticity of the generated tasks, this method effectively improves the generalization of few-shot classification methods. Specifically, the base classes in the training set are first clustered to obtain different clusters, and then tasks are selected from different clusters for Mixup to increase task diversity. Furthermore, performing inter-cluster task Mixup helps alleviate the learning of pseudo-discriminative features highly correlated with the categories. To ensure that the generated tasks do not deviate too much from the real distribution and mislead the model’s learning, the maximum mean discrepancy (MMD) between the generated tasks and real tasks is minimized, thus ensuring the authenticity of the generated tasks. Finally, it is theoretically analyzed why the inter-cluster task Mixup strategy can improve the model’s generalization performance. Experimental results on multiple datasets further demonstrate the effectiveness of the proposed method.
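To make the core mechanism concrete, the following minimal sketch (not the authors' code; the Beta parameter, kernel bandwidth, and task shapes are illustrative assumptions) shows feature-level Mixup between tasks drawn from different clusters, together with a Gaussian-kernel MMD term that could be added to the meta-training loss to keep generated tasks close to the real task distribution.

```python
# Illustrative sketch: inter-cluster task Mixup plus a Gaussian-kernel MMD penalty.
import numpy as np

def mixup_tasks(task_a, task_b, alpha=2.0, rng=None):
    """Interpolate the feature matrices of two episodic tasks (same shape)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * task_a + (1.0 - lam) * task_b

def gaussian_mmd(x, y, sigma=1.0):
    """Squared maximum mean discrepancy with an RBF kernel."""
    def k(a, b):
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
task_i = rng.normal(0.0, 1.0, size=(25, 64))   # task sampled from cluster i
task_j = rng.normal(0.5, 1.0, size=(25, 64))   # task sampled from cluster j
generated = mixup_tasks(task_i, task_j, rng=rng)
# The MMD term would be minimized during meta-training to keep 'generated' authentic.
print("MMD(generated, real):", gaussian_mmd(generated, task_i))
```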
YANG Hong-Yu, MA Jian-Hui, HOU Min, SHEN Shuang-Hong, CHEN En-Hong
2024, 35(4):1601-1617. DOI: 10.13328/j.cnki.jos.007016
Abstract:Code representation aims to extract the characteristics of source code to obtain its semantic embedding, playing a crucial role in deep learning-based code intelligence. Traditional handcrafted code representation methods mainly rely on domain expert annotations, which are time-consuming and labor-intensive. Moreover, the obtained code representations are task-specific and not easily reusable for other downstream tasks, which contradicts the green and sustainable development concept. To this end, many large-scale pretraining models for source code representation have shown remarkable success in recent years. These methods utilize massive source code for self-supervised learning to obtain universal code representations, which are then easily fine-tuned for various downstream tasks. Based on the abstraction levels of programming languages, code representations have features at four levels: the text level, semantic level, functional level, and structural level. Nevertheless, current models for code representation treat programming languages merely as ordinary text sequences resembling natural language. They overlook functional-level and structural-level features, which leads to inferior performance. To overcome this drawback, this study proposes a representation enhanced contrastive multimodal pretraining (REcomp) framework for code representation pretraining. REcomp develops a novel semantic-level to structural-level feature fusion algorithm, which is employed to serialize abstract syntax trees. Through a multi-modal contrastive learning approach, this composite feature is integrated with both the textual and functional features of programming languages, enabling more precise semantic modeling. Extensive experiments are conducted on three real-world public datasets. Experimental results clearly validate the superiority of REcomp.
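As an illustration of the multi-modal contrastive objective described above, the sketch below (a simplification with assumed shapes and temperature, not REcomp itself) computes a symmetric InfoNCE-style loss between embeddings of a code snippet's token text and its serialized AST, assuming both have already been encoded into fixed-size vectors.

```python
# Minimal sketch of a symmetric contrastive loss between two code modalities.
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, struct_emb, temperature=0.07):
    text_emb = F.normalize(text_emb, dim=-1)
    struct_emb = F.normalize(struct_emb, dim=-1)
    logits = text_emb @ struct_emb.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(text_emb.size(0))             # matching pairs lie on the diagonal
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

# Toy usage with random embeddings standing in for the two encoders' outputs.
loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```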
CHEN Yi-Yu, HUO Jing, DING Tian-Yu, GAO Yang
2024, 35(4):1618-1650. DOI: 10.13328/j.cnki.jos.007011
Abstract:In recent years, deep reinforcement learning (DRL) has achieved remarkable success in many sequential decision-making tasks. However, the current success of deep reinforcement learning heavily relies on massive learning data and computing resources. Poor sample efficiency and limited policy generalization ability are the key factors restricting DRL’s further development. Meta-reinforcement learning (Meta-RL) studies how to adapt to a wider range of tasks with a smaller sample size. Related research is expected to alleviate the above limitations and promote the development of reinforcement learning. Considering the research objects and application scope of current works, this study comprehensively reviews the research progress in the field of meta-reinforcement learning. Firstly, a basic introduction is given to deep reinforcement learning and the background of meta-reinforcement learning. Then, meta-reinforcement learning is formally defined, common scenario settings are summarized, and the current research progress of meta-reinforcement learning is introduced from the perspective of the application scope of the research results. Finally, the research challenges and potential future development directions are discussed.
WANG Fan, HAN Zhong-Yi, SU Wan, YIN Yi-Long
2024, 35(4):1651-1666. DOI: 10.13328/j.cnki.jos.007010
Abstract:Unsupervised domain adaptation (UDA) has achieved success in solving the problem that the training set (source domain) and the test set (target domain) come from different distributions. In the low energy consumption and open dynamic task environment, with the emergence of resource constraints and public classes, existing UDA methods encounter severe challenges. Source-free open-set domain adaptation (SF-ODA) aims to transfer the knowledge from the source model to the unlabeled target domain where public classes appear, thus realizing the identification of common classes and the detection of public classes without the source data. Existing SF-ODA methods focus on designing source models that accurately detect public classes or on modifying the model structures. However, they not only require extra storage space and training overhead but are also difficult to implement in strict privacy scenarios. This study proposes a more practical scenario: active learning source-free open-set domain adaptation (ASF-ODA), which achieves a robust transfer based on a commonly trained source model and a small number of valuable target samples labeled by experts. A local consistent active learning (LCAL) algorithm is proposed to achieve this objective. First, LCAL includes a newly proposed active selection method, local diversity selection, which exploits the local label consistency of target domain features to select more valuable target samples and promote the separation of samples that are ambiguous around the threshold. Then, based on information entropy, LCAL initially selects a possible common class set and a possible public class set, and corrects these two sets with the labeled samples obtained in the first step to obtain two corresponding reliable sets. Finally, LCAL introduces open-set loss and information maximization loss to further promote the separation of common and public classes, and introduces cross-entropy loss to realize the discrimination of common classes. A large number of experiments on three publicly available benchmark datasets, Office-31, Office-Home, and VisDA-C, show that with the help of a small number of valuable target samples, LCAL significantly outperforms the existing active learning methods and SF-ODA methods, with over 20% HOS improvements in some transfer tasks.
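The entropy-based initial selection of common and public class sets can be illustrated with the hedged sketch below; the quantile thresholds and the correction with actively labeled samples are placeholders, not the exact LCAL procedure. Target samples with low prediction entropy are treated as likely common-class samples and those with high entropy as likely public-class samples.

```python
# Hedged illustration of an entropy-based split into candidate common/public sets.
import numpy as np

def prediction_entropy(probs, eps=1e-12):
    return -(probs * np.log(probs + eps)).sum(axis=1)

def split_by_entropy(probs, low_q=0.3, high_q=0.7):
    h = prediction_entropy(probs)
    lo, hi = np.quantile(h, [low_q, high_q])
    common_idx = np.where(h <= lo)[0]    # confident -> candidate common-class samples
    public_idx = np.where(h >= hi)[0]    # uncertain -> candidate public-class samples
    return common_idx, public_idx

probs = np.random.dirichlet(np.ones(10), size=100)   # softmax outputs of the source model
common_idx, public_idx = split_by_entropy(probs)
```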
ZHOU Zhi, ZHANG Ding-Chu, LI Yu-Feng, ZHANG Min-Ling
2024, 35(4):1667-1681. DOI: 10.13328/j.cnki.jos.007009
Abstract:Open-set recognition is an important issue for ensuring the efficient and robust deployment of machine learning models in the open world. It aims to address the challenge of encountering samples from unseen classes that emerge during testing, i.e., to accurately classify the seen classes while identifying and rejecting the unseen ones. Current open-set recognition studies assume that the covariate distribution of the seen classes remains constant during both training and testing. However, in practical scenarios, the covariate distribution is constantly shifting, which can cause previous methods to fail, and their performance may even be worse than the baseline method. Therefore, it is urgent to study novel open-set recognition methods that can adapt to the constantly changing covariate distribution so that they can robustly classify seen categories and identify unseen categories during testing. This novel problem is named adaptation in the open world (AOW), and a test-time adaptation method for open-set recognition, called open-set test-time adaptation (OTA), is proposed. The OTA method only utilizes unlabeled test data to update the model with adaptive entropy loss and open-set entropy loss, maintaining the model’s ability to discriminate seen classes while further enhancing its ability to recognize unseen classes. Comprehensive experiments are conducted on multiple benchmark datasets with different covariate shift levels. The results show that the proposed method is robust to covariate shift and demonstrates superior performance compared to many state-of-the-art methods.
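The test-time adaptation loop can be pictured with the generic sketch below; the exact forms of OTA's adaptive entropy loss and open-set entropy loss are defined in the paper, so only a plain entropy-minimization step over an unlabeled test batch is shown for illustration.

```python
# Generic test-time adaptation sketch: update the model on unlabeled test batches
# with an entropy-based objective (a stand-in for OTA's two losses).
import torch
import torch.nn.functional as F

def entropy(probs, eps=1e-12):
    return -(probs * (probs + eps).log()).sum(dim=1)

def adapt_on_batch(model, x_test, optimizer):
    probs = F.softmax(model(x_test), dim=1)
    loss = entropy(probs).mean()      # encourage confident predictions on seen classes
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = torch.nn.Linear(32, 5)        # stand-in for the deployed classifier
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
adapt_on_batch(model, torch.randn(16, 32), opt)
```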
LI Qing, WANG Qi-Xin, LI Zi-Yu, ZHU Zhi-Yuan, ZHANG Shi-Hao, MOU Hao-Nan, YANG Wen-Ting, WU Xia
2024, 35(4):1682-1702. DOI: 10.13328/j.cnki.jos.007012
Abstract:Neural architecture search (NAS) is an important part of automated machine learning, which has been widely used in multiple fields, including computer vision, speech recognition, etc. NAS can search for optimal deep neural network structures for specific data, scenarios, and tasks. In recent years, NAS has been increasingly applied to brain data analysis, significantly improving the performance in multiple application fields, such as brain image segmentation, feature extraction, auxiliary diagnosis of brain diseases, etc. Such research has demonstrated the advantages of low-energy automated machine learning in the field of brain data analysis. NAS-based brain data analysis is one of the current research hotspots, but it still faces certain challenges. At present, little review literature is available for reference in this field worldwide. This study conducts a detailed survey and analysis of relevant literature from different perspectives, including search frameworks, search spaces, search strategies, research tasks, and experimental data. At the same time, a systematic summary of brain datasets that can be used for NAS training is also provided. In addition, the challenges and future research directions of NAS in brain data analysis are discussed.
TIAN Qing, SUN Can-Yu, CHU Yi
2024, 35(4):1703-1716. DOI: 10.13328/j.cnki.jos.007015
Abstract:As an emerging field of machine learning, multi-source partial domain adaptation (MSPDA) poses challenges to related research due to the complexities of the involved source domains, the diversities between the domains, and the unsupervised nature of the target domain itself, leading to few existing works. In this scenario, the irrelevant class samples in multiple source domains will cause large cumulative errors and negative transfer during domain adaptation. In addition, most of the existing multi-source domain adaptation methods do not consider the different contributions of different source domains to the target domain tasks. Therefore, this study proposes an adaptive weight-induced multi-source partial domain adaptation (AW-MSPDA) method. Firstly, a diverse feature extractor is constructed to effectively utilize the prior knowledge of the source domains. Meanwhile, a multi-level distribution alignment strategy is constructed to eliminate distribution discrepancies at different levels and promote positive transfer. Moreover, pseudo-label weighting and similarity measurement are used to construct adaptive weights to quantify the contributions of different source domains and filter out source domain samples that are irrelevant to the target task. Finally, the generalization and performance superiority of the proposed AW-MSPDA algorithm are verified by extensive experiments.
YAN Tao, GAO Hao-Xuan, ZHANG Jiang-Feng, QIAN Yu-Hua, ZHANG Lin-Yuan
2024, 35(4):1717-1731. DOI: 10.13328/j.cnki.jos.007013
Abstract:Microscopic three-dimensional (3D) shape reconstruction is a crucial step in the field of precision manufacturing. The reconstruction process relies on the acquisition of high-resolution and dense images. Nevertheless, in the face of high efficiency requirements in complex application scenarios, inputting high-resolution dense images will result in geometrically increased computation and complexity, making it difficult to achieve efficient and low-latency real-time microscopic 3D shape reconstruction. In response to this situation, this study proposes GPLWS-Net, a grouping-parallelism lightweight network for real-time microscopic 3D shape reconstruction. GPLWS-Net constructs a lightweight backbone based on a U-shaped network and accelerates the 3D shape reconstruction process with parallel group-querying. In addition, the neural network structure is re-parameterized to avoid accuracy loss in reconstructing microstructures. Furthermore, to address the lack of existing microscopic 3D reconstruction datasets, this study publicly releases a multi-focus microscopic 3D reconstruction dataset called Micro 3D. The label data are obtained by multi-modal data fusion to provide a high-precision 3D structure of the scene. The results show that GPLWS-Net can not only guarantee reconstruction accuracy but also reduce the average time by 39.15% on the three groups of public datasets and by 50.55% on the Micro 3D dataset compared with five other deep learning-based methods, which enables real-time 3D shape reconstruction of complex microscopic scenes.
QIAN Hong, SHU Xiang, SUN Tian-Xiang, QIU Xi-Peng, ZHOU Ai-Min
2024, 35(4):1732-1750. DOI: 10.13328/j.cnki.jos.007017
Abstract:Derivative-free optimization is commonly employed in tasks such as black-box tuning of language-model-as-a-service and hyper-parameter tuning of machine learning models, where the mapping between the solution space of the optimization task and the performance indicator is intricate and complex, making it challenging to explicitly formulate an objective function. Accurate and stable evaluation of solutions is crucial for derivative-free optimization methods. The evaluation of the quality of a solution often requires running the model on the entire dataset, and the optimization process sometimes requires a large number of evaluations of solution quality. The growing complexity of machine learning models and the expanding size of training datasets result in escalating time and computational costs for accurate and stable solution evaluation, contradicting the principle of green and low-carbon machine learning and optimization. In view of this, this study proposes a green derivative-free optimization framework with dynamic batch evaluation (GRACE). Based on the similarity of training subsets, GRACE adaptively and dynamically adjusts the sample size used for evaluating solutions during the optimization process, thereby ensuring optimization performance while reducing optimization costs and computational expenses, achieving the goal of green, low-carbon, and efficient optimization. Experiments are conducted on tasks such as black-box tuning of language-model-as-a-service and hyper-parameter optimization of models. Comparisons with baseline methods and with degraded versions of GRACE verify its effectiveness, efficiency, and green and low-carbon merits. The results also show that GRACE is robust to its hyper-parameters.
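The idea of dynamic batch evaluation can be sketched as follows (the stopping rule, growth factor, and subset-similarity criterion used by GRACE are simplified assumptions): a candidate solution is scored on an increasing subset of the data, and evaluation stops once the running estimate stabilizes.

```python
# Illustrative sketch of dynamic batch evaluation for a black-box objective.
import numpy as np

def dynamic_evaluate(loss_fn, data, start=64, grow=2.0, tol=1e-2, rng=None):
    rng = rng or np.random.default_rng()
    idx = rng.permutation(len(data))
    n, prev = start, None
    while True:
        score = loss_fn(data[idx[:n]])            # evaluate on the current subset only
        if (prev is not None and abs(score - prev) < tol) or n >= len(data):
            return score, n                        # estimate stabilized (or data exhausted)
        prev, n = score, min(int(n * grow), len(data))

data = np.random.normal(1.0, 2.0, size=10_000)
score, samples_used = dynamic_evaluate(lambda batch: batch.mean(), data)
```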
ZHAO Wen-Zhu, YUAN Guan, ZHANG Yan-Mei, QIAO Shao-Jie, WANG Sen-Zhang, ZHANG Lei
2024, 35(4):1751-1773. DOI: 10.13328/j.cnki.jos.007018
Abstract:Traffic flow prediction is an essential component of an environmentally friendly, safe, and efficient intelligent transportation system. Due to their powerful spatial-temporal data representation ability, spatial-temporal graph neural networks are widely used in traffic flow prediction. Nevertheless, existing spatial-temporal graph neural network based traffic flow prediction models have two limitations. (1) The static topology graph constructed from city spatial correlations ignores dynamic traffic patterns and thus cannot reflect the temporal dynamic correlation between nodes in the road network; and (2) considering only the spatial correlation of local traffic areas misses the spatial correlation between the local region and the global road network. To overcome the above limitations, this study proposes a multi-view fused spatial-temporal dynamic graph convolutional network model for traffic flow prediction. Firstly, it constructs a road network spatial structure graph and a dynamic traffic-flow association graph from the perspectives of static spatial topology and dynamic traffic patterns, and uses dynamic graph convolution to learn the node features from both perspectives, comprehensively capturing the diverse spatial correlations in the road network. After that, from the local and global perspectives, it calculates the global representation of the road network and fuses global features with local features to enhance the expressiveness of node features and explore the global structural features of traffic flow. Finally, the model designs a local convolutional multi-head self-attention mechanism to obtain the dynamic temporal correlation of traffic data, achieving accurate traffic flow prediction under multiple time windows. The experimental results on four real traffic datasets demonstrate the effectiveness and universality of the proposed model.
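A minimal sketch of the dynamic traffic-flow association graph idea is given below (layer sizes, the fusion weights, and the interaction with the global representation are illustrative assumptions, not the paper's architecture): an adjacency matrix is inferred from the current node features and combined with the static road-network adjacency inside a graph convolution.

```python
# Sketch of a graph convolution that fuses a static road graph with a learned dynamic graph.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim, embed_dim=16):
        super().__init__()
        self.src = nn.Linear(in_dim, embed_dim)
        self.dst = nn.Linear(in_dim, embed_dim)
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x, static_adj):
        # x: (num_nodes, in_dim); static_adj: (num_nodes, num_nodes)
        dyn_adj = F.softmax(self.src(x) @ self.dst(x).t(), dim=-1)  # learned dynamic graph
        adj = 0.5 * dyn_adj + 0.5 * static_adj                       # fuse the two views
        return F.relu(adj @ self.proj(x))

x = torch.randn(207, 2)            # e.g., speed and flow readings per sensor node
static_adj = torch.eye(207)        # placeholder for the road-network adjacency
out = DynamicGraphConv(2, 32)(x, static_adj)
```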
GUAN Ze-Li, DU Jun-Ping, XUE Zhe, WANG Pei-Wen, PAN Zhen-Hui, WANG Xiao-Yang
2024, 35(4):1774-1789. DOI: 10.13328/j.cnki.jos.007019
Abstract:In recent years, the method of transforming public safety data into graph form and constructing node representations through graph neural networks for training and inference of downstream tasks has fully exploited the entity and association information of public safety data, achieving excellent results. Nevertheless, to enhance the effectiveness of the model, a large amount of high-quality data is needed, which is usually held by governments, companies, and organizations, making it difficult to learn an effective event detection model through data centralization. Moreover, due to different focuses and collection times of the data from various parties, there is a non-IID (not independent and identically distributed) problem among the data. Traditional methods that assume one global model can accommodate all clients can hardly solve such issues. Therefore, this study proposes a personalized public safety event detection (PPSED) method based on a reinforcement federated graph neural network. In this method, each client trains a personalized and more robust model through multi-party collaboration to solve local event detection tasks. A local training and gradient quantization module is designed for the federated public safety emergency event detection model, and GraphSAGE is trained through a mini-batch mechanism based on graph sampling to construct a local model for public safety event detection. This approach reduces the impact of non-IID data and supports gradient quantization to lower the consumption of gradient communication. A client state awareness module is also designed based on random graph embedding, which better retains the valuable information of the client model while protecting privacy. Furthermore, a personalized gradient aggregation and quantization strategy is designed for the federated graph neural network. Deep deterministic policy gradient (DDPG) is used to fit a personalized federated learning gradient aggregation weighting strategy, and whether the gradient can be quantized is determined based on the weight, balancing the model’s performance and communication pressure. This study demonstrates the effectiveness of the method through extensive experiments on a public safety dataset collected from the Weibo platform and three public graph datasets.
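The personalized aggregation step can be pictured with the hedged sketch below, where the DDPG policy is abstracted to a given per-client weight vector and gradient quantization is reduced to simple rounding; none of this is the paper's exact procedure.

```python
# Hedged sketch: weighted aggregation of client gradients with optional quantization.
import numpy as np

def quantize(grad, levels=256):
    scale = np.abs(grad).max() or 1.0
    return np.round(grad / scale * (levels // 2)) * scale / (levels // 2)

def personalized_aggregate(client_grads, weights, quantize_mask):
    # client_grads: list of flat gradient vectors; weights: per-client aggregation weights
    grads = [quantize(g) if q else g for g, q in zip(client_grads, quantize_mask)]
    w = np.asarray(weights) / np.sum(weights)
    return sum(wi * gi for wi, gi in zip(w, grads))

grads = [np.random.randn(1000) for _ in range(4)]
personal_grad = personalized_aggregate(grads, weights=[0.4, 0.3, 0.2, 0.1],
                                        quantize_mask=[True, True, False, True])
```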
WAN Chang-Xuan, ZHANG Yi-Tao, LIU De-Xi, LIU Xi-Ping, LIAO Guo-Qiong, WAN Qi-Zhi
2024, 35(4):1790-1818. DOI: 10.13328/j.cnki.jos.006840
Abstract:The hierarchical topic model is an important tool to organize topic hierarchies. Most of the existing hierarchical topic models provide tree-structured prior distributions for document topics by introducing the nCRP construction method into the topic models, but they cannot acquire a topic hierarchy with clear domain meanings, referred to as a domain topic hierarchy. Meanwhile, there are not only hierarchical relationships among domain topics but also sub-topic aspect sharing relationships under different parent topics, and no appropriate model yields such a domain topic hierarchy in the current research on topic relationships. In order to automatically and effectively mine the hierarchical and correlated relationships of domain topics from domain texts, improvements are put forward as follows. Firstly, this study improves the nCRP construction method through a topic sharing mechanism and proposes the nCRP+ hierarchical construction method to provide a tree-structured prior distribution with hierarchical topic aspect sharing for topics generated from topic models. Then, the reallocated hierarchical Dirichlet process (rHDP) model is developed based on nCRP+ and HDP. By employing the domain taxonomy, word semantics, and domain representation of topic words, the study defines domain knowledge, including the domain membership degree based on the voting mechanism, the semantic relevance between words and domain topics, and the contribution degree of hierarchical topic words. Finally, domain knowledge is used to improve the allocation processes of domain topics and topic words in the rHDP model, and the rHDP with domain knowledge (rHDP_DK) model is proposed to improve the sampling process. The experimental results show that hierarchical topic models based on nCRP+ are superior to those based on nCRP (hLDA and nHDP) and the neural topic model (TSNTM) in terms of evaluation metrics. The topic hierarchy built by the rHDP_DK model is characterized by a clear domain topic hierarchy and explicit domain differences among related sub-topics. Furthermore, the model provides a general automatic mining framework for domain topic hierarchies.
LI He, LIU Yan-Na, YANG Shu-Qi, HUANG Jian-Bin, QIAO Shao-Jie
2024, 35(4):1819-1840. DOI: 10.13328/j.cnki.jos.006842
Abstract:Graph partitioning is a basic task for distributed graph computing. It is used to divide a large-scale graph into different parts and allocate them to different machines in a cluster. The quality of graph partitioning has a great impact on the performance of distributed graph computing, and graph partitioning aims to minimize edge cuts while maintaining load balance. Nowadays, graph data usually grow dynamically, which requires a partitioning method that can process dynamic incremental graphs so as to ensure the quality of graph partitioning. Although some dynamic graph partitioning algorithms have been presented recently, they cannot process real-time dynamic changes and obtain high-quality graph partitioning results simultaneously. In this study, a dynamic incremental graph partitioning algorithm based on vertex group redistribution (ED-IDGP) is proposed to solve the problem of large-scale dynamic incremental graph partitioning. In ED-IDGP, a dynamic processor is designed to process four different unit update types in real time, and the graph partitioning quality is further improved by executing a local optimizer near the dynamic change in the partition after each unit update. In the local optimizer of ED-IDGP, a vertex group search strategy based on an improved label propagation algorithm is used to search for the vertex group, and a vertex group movement gain formula is proposed to identify the most beneficial vertex group and move it to the target partition for optimization. This study evaluates the performance and efficiency of the ED-IDGP algorithm from different perspectives and with different metrics on real datasets.
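A toy version of the vertex group movement gain is shown below (the actual gain formula and the label-propagation-based group search in ED-IDGP are more involved): moving a vertex group is beneficial when more of its outside edges point into the target partition than remain in its current partition.

```python
# Toy movement gain: edges gained in the target partition minus edges cut in the current one.
def movement_gain(group, partition_of, target, adjacency):
    group = set(group)
    to_target = to_current = 0
    for v in group:
        for u in adjacency[v]:
            if u in group:
                continue                          # edges inside the group move together
            if partition_of[u] == target:
                to_target += 1                    # edge becomes internal after the move
            elif partition_of[u] == partition_of[v]:
                to_current += 1                   # edge becomes a cut edge after the move
    return to_target - to_current

adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
partition_of = {0: "A", 1: "A", 2: "B", 3: "B"}
print(movement_gain([0, 1], partition_of, "B", adjacency))   # positive gain: move is beneficial
```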
WANG Shang-Wen, LIU Kui, LIN Bo, LI Li, Jacques KLEIN, Tegawendé François BISSYANDÉ, MAO Xiao-Guang
2024, 35(4):1841-1860. DOI: 10.13328/j.cnki.jos.006924
Abstract:Software defect localization refers to the activity of finding program elements that are related to software failure. The existing defect localization techniques, however, can only produce localization results at the function or statement level. These coarse-grained localization results can affect the efficiency and effectiveness of manual debugging and automatic software defect repair. This study focuses on the fine-grained identification of specific code tokens that lead to software defects. The study establishes abstract syntax tree paths for code tokens and proposes a fine-grained defect localization model based on a pointer neural network to predict specific code tokens of defects and specific operation behaviors of repairing the tokens. A large number of defect patch data sets in open-source projects contain a large amount of trainable data, and the paths constructed based on abstract syntax trees can effectively capture the program’s structural information. Experimental results show that the model trained in this study can accurately predict defect code tokens and is significantly better than the baseline methods based on statistics and machine learning. In addition, in order to verify that fine-grained defect localization results can contribute to automatic defect repair, two kinds of program repair processes are designed based on the fine-grained defect localization results. The processes are implemented by using code completion tools to predict the correct token or by following heuristic rules to find appropriate code repair elements. The results show that both methods can effectively solve the overfitting problem in automatic software defect repair.
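The notion of abstract syntax tree paths for code tokens can be illustrated with the sketch below, which uses Python's own ast module as a stand-in for the parser used in the paper: for every identifier token, the sequence of node types from the root to that token is recorded as the token's structural context.

```python
# Illustrative sketch: root-to-token AST paths for identifier tokens.
import ast

def token_paths(source):
    tree = ast.parse(source)
    paths = []

    def visit(node, path):
        path = path + [type(node).__name__]
        if isinstance(node, ast.Name):                 # an identifier token
            paths.append((node.id, path))
        for child in ast.iter_child_nodes(node):
            visit(child, path)

    visit(tree, [])
    return paths

for token, path in token_paths("def add(a, b):\n    return a + b"):
    print(token, "->", " / ".join(path))
```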
SUN Jia-Ze, WEN Su-Lei, ZHENG Wei, CHEN Xiang
2024, 35(4):1861-1884. DOI: 10.13328/j.cnki.jos.006829
Abstract:Nowadays, deep neural networks (DNNs) have been widely used in various fields. However, research has shown that DNNs are vulnerable to attacks of adversarial examples (AEs), which seriously threaten the development and application of DNNs. Most of the existing adversarial defense methods need to sacrifice part of the original classification accuracy to obtain defense capability and strongly rely on the knowledge provided by the generated AEs, so they cannot balance the effectiveness and efficiency of defense. Therefore, based on manifold learning, this study proposes an origin hypothesis of AEs in attackable space from the feature space perspective and a trap-type ensemble adversarial defense network (Trap-Net). Trap-Net adds trap data to the training data based on the original model and uses the trap-type smoothing loss function to establish the seducing relationship between the target data and trap data, so as to generate trap-type networks. In order to address the problem that most adversarial defense methods sacrifice original classification accuracy, ensemble learning is used to ensemble multiple trap networks, so as to expand attackable target space defined by trap labels in the feature space and reduce the loss of the original classification accuracy. Finally, Trap-Net determines whether the input data are AEs by detecting whether the data hit the attackable target space. Experiments on MNIST, K-MNIST, F-MNIST, CIFAR-10, and CIFAR-100 datasets show that Trap-Net has strong defense generalization of AEs without sacrificing the classification accuracy of clean samples, and the results of experiments validate the adversarial origin hypothesis in attackable space. In the low-perturbation white-box attack scenario, Trap-Net achieves a detection rate of more than 85% for AEs. In the high-perturbation white-box attack and black-box attack scenarios, Trap-Net has a detection rate of almost 100% for AEs. Compared with other detection methods of AEs, Trap-Net is highly effective against white-box and black-box adversarial attacks, and it provides an efficient robustness optimization method for DNNs in adversarial environments.
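The trap-based detection rule can be summarized by the conceptual sketch below (the voting threshold and class counts are illustrative assumptions, and the trap-type smoothing loss itself is omitted): an input whose predictions fall into the trap label space for enough ensemble members is flagged as a likely adversarial example.

```python
# Conceptual sketch of trap-label voting for adversarial example detection.
import numpy as np

def is_adversarial(logits_per_net, num_real_classes, vote_ratio=0.5):
    # logits_per_net: (num_networks, num_real_classes + num_trap_classes)
    preds = np.argmax(logits_per_net, axis=1)
    trap_hits = np.sum(preds >= num_real_classes)      # predictions landing in trap labels
    return trap_hits >= vote_ratio * len(logits_per_net)

logits = np.random.randn(5, 10 + 4)                     # 5 trap networks, 10 real + 4 trap classes
print(is_adversarial(logits, num_real_classes=10))
```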
ZHANG Qi-Chen, WANG Shuai, LI Jing-Mei
2024, 35(4):1885-1898. DOI: 10.13328/j.cnki.jos.006831
Abstract:Spoken language understanding (SLU), as a core component of task-oriented dialogue systems, aims to extract the semantic framework of user queries. In dialogue systems, the SLU component is responsible for identifying user requests and creating a semantic framework that summarizes them. SLU usually includes two subtasks: intent detection (ID) and slot filling (SF). ID is regarded as a semantic utterance classification problem that analyzes the semantics of an utterance at the sentence level, while SF is viewed as a sequence labeling task that analyzes the semantics of an utterance at the word level. Due to the close correlation between intents and slots, mainstream works employ joint models to exploit shared knowledge across the tasks. However, ID and SF are two different tasks with strong correlation, and they represent sentence-level semantic information and word-level information of utterances, respectively, which means that the information of the two tasks is heterogeneous and has different granularities. This study proposes a heterogeneous interactive structure for joint ID and SF, which adequately captures the relationship between sentence-level semantic information and word-level information for the two correlated tasks by adopting self-attention and graph attention networks. Different from ordinary homogeneous structures, the proposed model is a heterogeneous graph architecture containing different types of nodes and links, because a heterogeneous graph involves more comprehensive information and rich semantics and can better interactively represent the information between nodes with different granularities. In addition, this study utilizes a window mechanism to accurately represent word-level embeddings to better accommodate the local continuity of slot labels. Meanwhile, the study uses the pre-trained model BERT to analyze the effect of the proposed model when combined with BERT. The experimental results on two public datasets show that the model achieves accuracies of 97.98% and 99.11% on the ID task and F1 scores of 96.10% and 96.11% on the SF task, which are superior to the current mainstream methods.
SUN Fu-Ming, HU Xi-Hang, WU Jing-Yu, SUN Jing, WANG Fa-Sheng
2024, 35(4):1899-1913. DOI: 10.13328/j.cnki.jos.006833
Abstract:In recent years, RGB-D salient object detection methods have achieved better performance than RGB salient detection models by virtue of the rich geometric structure and spatial position information in depth maps, and thus have attracted extensive attention from the academic community. However, existing RGB-D detection models still face the challenge of continuously improving performance. The emerging Transformer is good at modeling global information, while the convolutional neural network (CNN) is good at extracting local details. Therefore, effectively combining the advantages of CNN and Transformer to mine global and local information will help to improve the accuracy of salient object detection. For this purpose, an RGB-D salient object detection method based on cross-modal interactive fusion and global awareness is proposed in this study. The Transformer network is embedded into U-Net to better extract features by combining the global attention mechanism with local convolution. First, with the help of the U-Net encoder-decoder structure, this study efficiently extracts multi-level complementary features and decodes them step by step to generate a salient feature map. Then, the Transformer module is used to learn the global dependency between high-level features to enhance the feature representation, and a progressive upsampling fusion strategy is used to process the input and reduce the introduction of noise information. Moreover, to reduce the negative impact of low-quality depth maps, the study also designs a cross-modal interactive fusion module to realize cross-modal feature fusion. Finally, experimental results on five benchmark datasets show that the proposed algorithm performs better than other state-of-the-art algorithms.
WU Xin-Dong, ZHU Xiao-Yu, DONG Bing-Bing, JI Sheng-Wei, BU Chen-Yang
2024, 35(4):1914-1933. DOI: 10.13328/j.cnki.jos.006838
Abstract:Attendance may serve private purposes not associated with any organization, such as keeping a personal travel log, or business needs, in which case it is part of organizational attendance and is sometimes associated with multiple organizations. Therefore, the recording, sharing, and analysis of attendance data require elaborate management. The HAO attendance system is a lightweight and mobile attendance platform. It takes the user and the organization as two starting points and is driven by HAO intelligence consisting of human intelligence (HI), artificial intelligence (AI), and organizational intelligence (OI). This study builds the knowledge graph of the HAO attendance system and puts forward its closed-loop authority management structure, supplemented by a privacy authority management method from the coarse-grained to the fine-grained level, to ensure refined attendance management and protect users’ privacy, thereby promoting the intelligent transformation of a new-generation attendance system. For organizational attendance analysis, a four-element scoring method and a four-element attendance reporting method are designed to calculate employee attendance scores, generate accurate and comprehensive attendance reports, provide decision-making support for organizations, and inspire the vitality of both organizations and individuals, so as to build intelligent organizations with organizational intelligence.
Lü Shen-Huan, CHEN Yi-He, JIANG Yuan
2024, 35(4):1934-1944. DOI: 10.13328/j.cnki.jos.006841
Abstract:In multi-label learning, each sample is associated with multiple labels. The key task is how to use the correlation between labels when building the model. The multi-label deep forest (MLDF) algorithm attempts to mine the correlation between labels by using layer-by-layer representation learning under the framework of deep ensemble learning and uses the obtained label probability representation to improve prediction accuracy. However, on the one hand, the label probability representation is highly correlated with the label information, which leads to its low diversity; as the depth of the deep forest increases, the performance declines. On the other hand, the calculation of label probabilities requires storing the forest structures of all layers and applying these structures one by one in the test stage, which causes unbearable computational and storage overhead. To solve these problems, this study proposes interaction-representation-based MLDF (iMLDF). iMLDF mines the structural information in the feature space from the decision paths of the forest model, extracts feature interactions in the decision tree paths by using random interaction trees, and obtains two interaction representations, namely feature confidence scores and label probability distributions. On the one hand, iMLDF makes full use of the feature structural information in the forest model to enrich the relevant information between labels. On the other hand, it calculates all the representations through interaction expressions so that the algorithm does not need to store all the forest structures, which greatly improves computational efficiency. The experimental results show that the iMLDF algorithm achieves better prediction performance, and its computational efficiency is improved by an order of magnitude compared with MLDF for datasets with massive samples.
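The idea of deriving interaction representations from decision paths can be illustrated with the sketch below (assuming scikit-learn; the use of a single tree and a plain indicator vector is a simplification, not iMLDF itself): the features tested along each sample's path through a tree are collected into a fixed-length indicator vector.

```python
# Sketch: turn a sample's decision path through a tree into a feature-interaction indicator.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.tree import DecisionTreeClassifier

X, Y = make_multilabel_classification(n_samples=200, n_features=20, random_state=0)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, Y)

node_indicator = tree.decision_path(X)      # (n_samples, n_nodes) sparse indicator matrix
node_feature = tree.tree_.feature            # feature tested at each internal node (-2 at leaves)

def interaction_representation(sample_id, n_features=20):
    vec = np.zeros(n_features)
    start, end = node_indicator.indptr[sample_id], node_indicator.indptr[sample_id + 1]
    for node_id in node_indicator.indices[start:end]:
        if node_feature[node_id] >= 0:       # skip leaf nodes
            vec[node_feature[node_id]] = 1.0
    return vec

print(interaction_representation(0))
```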
2024, 35(4):1945-1963. DOI: 10.13328/j.cnki.jos.006843
Abstract:As a new granular computing model, partition order product space can describe and solve problems from multiple views and levels. Its problem solving space is a lattice structure composed of multiple problem solving levels, and each problem solving level is composed of multiple one-level views. How to choose the problem solving level in the partition order product space is an NP-hard problem. Therefore, this study proposes a two-stage adaptive genetic algorithm (TSAGA) to find the problem solving level. First, real encoding is used to encode the problem solving level, and then the fitness function is defined according to the classification accuracy and granularity of the problem solving level. The first stage of the algorithm is based on a classical genetic algorithm, and some excellent problem solving levels are pre-selected as part of the initial population of the second stage, so as to optimize the problem solving space. In the second stage of the algorithm, an adaptive selection operator, adaptive crossover operator, and adaptive large-mutation operator are proposed, which can dynamically change with the number of iterations of the current population evolution, so as to further select the problem solving level in the optimized problem solving space. Experimental results demonstrate the effectiveness of the proposed method.
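The adaptive-operator idea in the second stage can be pictured with the minimal sketch below, where the crossover and mutation probabilities change with the generation counter so that exploration dominates early and exploitation later; the fitness function combining classification accuracy and granularity is abstracted as a user-supplied callable, and all rates are illustrative assumptions.

```python
# Minimal sketch of a genetic algorithm whose operator rates adapt to the generation index.
import random

def adaptive_rates(gen, max_gen, p_cross=(0.9, 0.6), p_mut=(0.2, 0.02)):
    t = gen / max_gen
    return (p_cross[0] + (p_cross[1] - p_cross[0]) * t,
            p_mut[0] + (p_mut[1] - p_mut[0]) * t)

def evolve(pop, fitness, max_gen=50):
    for gen in range(max_gen):
        pc, pm = adaptive_rates(gen, max_gen)
        pop.sort(key=fitness, reverse=True)
        parents = pop[: len(pop) // 2]                 # elitist selection of the better half
        children = []
        while len(children) < len(pop) - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(a))
            child = a[:cut] + b[cut:] if random.random() < pc else a[:]
            if random.random() < pm:                   # mutation analogue
                child[random.randrange(len(child))] = random.random()
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

pop = [[random.random() for _ in range(8)] for _ in range(20)]
best = evolve(pop, fitness=lambda ind: -sum((x - 0.5) ** 2 for x in ind))
```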
JIANG Lu-Yu, OUYANG Dan-Tong, DONG Bo-Wen, ZHANG Li-Ming
2024, 35(4):1964-1979. DOI: 10.13328/j.cnki.jos.006845
Abstract:Enumerating minimal unsatisfiable subsets (MUS) is an important subproblem in the Boolean satisfiability problem. For an unsatisfiable problem, the MUS enumeration can reflect the key factors resulting in its unsatisfiability. However, enumerating MUS is extremely time-consuming, and different pruning schemes will directly affect the size of the search space and the total number of iterations, thus affecting the algorithm efficiency. This study proposes a novel enhanced pruning scheme, accelerating by critical MSS (ABC), to accelerate the MUS enumeration. According to the relationship among maximal satisfiable subsets (MSS), minimal correction sets (MCS), and MUS, the concepts of cMSS and subMUS are put forward. Additionally, four properties are summarized, namely that each MUS must be a superset of the subMUS, and the feature that MUSes and MCSes are mutual hitting sets can then be effectively employed to avoid the time cost of solving hitting sets of MCSes. When the subMUS is unsatisfiable, it will be the only MUS, and the algorithm will terminate in advance; otherwise, the node representing the subMUS will be pruned to effectively avoid searching the non-solution space. Meanwhile, the effectiveness of the proposed ABC scheme is proven theoretically, and the scheme has been applied to the state-of-the-art algorithms MARCO and MARCO-MAM. Experimental results on SAT11 MUS benchmarks show the proposed scheme can effectively prune the search space to improve the enumeration efficiency of MUS.
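A conceptual sketch related to the subMUS idea is given below; it does not reproduce the paper's definitions but rests on an easily verified fact: if removing a single constraint already makes the formula satisfiable (a singleton MCS), that constraint belongs to every MUS, so the union of such constraints under-approximates every MUS and, when itself unsatisfiable, is the unique MUS. The is_sat callback stands for an off-the-shelf SAT-solver query.

```python
# Conceptual sketch: early termination via a set of constraints contained in every MUS.
def necessary_constraints(constraints, is_sat):
    # c is necessary whenever dropping it alone restores satisfiability ({c} is a singleton MCS).
    return {c for c in constraints
            if is_sat([d for d in constraints if d != c])}

def early_termination(constraints, is_sat):
    core = necessary_constraints(constraints, is_sat)
    if core and not is_sat(list(core)):
        return core        # the unique MUS; enumeration can stop here
    return None            # keep enumerating, restricting the search to supersets of `core`

# Toy oracle: the constraint set is "unsatisfiable" whenever both c1 and c2 are present.
toy_is_sat = lambda cs: not ({"c1", "c2"} <= set(cs))
print(early_termination(["c1", "c2", "c3"], toy_is_sat))   # {'c1', 'c2'}
```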
HUANG Ming, ZHANG Sha-Sha, HONG Chun-Lei, ZENG Le, XIANG Ze-Jun
2024, 35(4):1980-1992. DOI: 10.13328/j.cnki.jos.006839
Abstract:As an automatic search tool, mixed integer linear programming (MILP) is widely used to search for differential, linear, integral, and other cryptographic properties of block ciphers. In this study, a new technique of constructing MILP models based on a dynamic selection strategy is proposed, which uses different constraint inequalities to describe the propagation of cryptographic properties under different conditions. Specifically, according to the different Hamming weights of the input division property, this study adopts different methods to construct MILP models of the division property propagation through linear layers. Finally, this technique is applied to search for integral distinguishers of the uBlock and Saturnin algorithms. The experimental results show that the proposed technique can obtain an 8-round integral distinguisher with 32 more balanced bits than the previous optimal integral distinguisher for the uBlock128 algorithm. In addition, this study obtains 9-round and 10-round integral distinguishers for the uBlock128 and uBlock256 algorithms, which are one round longer than the previous optimal integral distinguishers. For the Saturnin256 algorithm, the study finds a 9-round integral distinguisher, which is one round longer than the previous optimal integral distinguisher.
2024, 35(4):1993-2021. DOI: 10.13328/j.cnki.jos.006939
Abstract:Internet transport-layer protocols rely on the feedback information provided by the acknowledgment (ACK) mechanism to achieve functions such as congestion control and reliable transmission. Following the evolution of Internet transmission protocols, the ACK mechanisms of transmission control are reviewed, and the unsolved problems among these mechanisms are discussed. Based on the elements of “type-trigger-information”, the demand-based ACK mechanism and its design principles are proposed, and the coupling relationship between the ACK mechanism and other transmission protocol submodules (e.g., congestion control, packet loss recovery, etc.) is emphatically analyzed. Subsequently, according to the design principles, the TACK mechanism, a feasible demand-based ACK mechanism, is elaborated, and related concepts are systematically clarified. Finally, several meaningful research directions are provided according to the challenges encountered by the demand-based ACK mechanism.
TONG Qing-Shan, KANG Wen-Hui, FU Qiang, HUANG Jin, TIAN Feng, DAI Guo-Zhong
2024, 35(4):2022-2038. DOI: 10.13328/j.cnki.jos.006909
Abstract:With the popularity of touch devices, pen + touch inputs have become mainstream input modes for mobile office work. However, existing applications mainly take only one of them as input, which limits users’ interaction space. In addition, existing pen + touch research mainly focuses on serial pen + touch cooperation and parallel processing of specific interactive tasks, and does not systematically consider the parallel cooperation mechanism and the intention correlation between different inputs. This study first proposes an interaction model based on pen + touch inputs and then defines pen + touch interaction primitives according to users’ behavioral habits in pen + touch cooperation, so as to extend the pen + touch interaction space. Furthermore, by using a partially observable Markov decision process (POMDP), the study develops a method of extracting pen + touch input intentions based on time sequence information, so as to incrementally extract the interaction intention of polysemantic interaction primitives. Finally, the study evaluates the advantages of pen + touch inputs through a user experiment.
HE Jian-Hang, SUN Jun-Yao, LIU Qiong
2024, 35(4):2039-2054. DOI: 10.13328/j.cnki.jos.006837
Abstract:Depth ambiguity is an important challenge for multi-person three-dimensional (3D) pose estimation from single-frame images, and extracting contexts from an image has great potential for alleviating depth ambiguity. Current top-down approaches usually model key point relationships based on human detection, which not only easily results in key point shifting or mismatching but also affects the reliability of absolute depth estimation using the human scale factor because of coarse-grained human bounding boxes with large background noise. Bottom-up approaches directly detect human key points from an image and then restore the 3D human poses one by one. However, these approaches are at a disadvantage in relative depth estimation although the scene context can be obtained explicitly. This study proposes a new two-branch network, in which human context based on key point region proposals and scene context based on 3D space are extracted by top-down and bottom-up branches, respectively. A noise-resistant human context extraction method is proposed to describe the human by modeling key point region proposals. A dynamic sparse key point relationship for pose association is modeled to eliminate weak connections and reduce noise propagation. A scene context extraction method from a bird’s-eye view is proposed. The human position layout in 3D space is obtained by modeling the image’s depth features and mapping them to a bird’s-eye-view plane. A network fusing human and scene contexts is designed to predict absolute human depth. The experiments are carried out on the public datasets MuPoTS-3D and Human3.6M, and the results show that, compared with state-of-the-art models, the proposed HSC-Pose improves the relative and absolute position accuracies of 3D key points by at least 2.2% and 0.5%, respectively, and reduces the mean root position error of the key points by at least 4.2 mm.
WU Sheng-Yao, WANG Feng, WU Yan-Jun, LING Xiang, QU Sheng, LUO Tian-Yue, WU Jing-Zheng
2024, 35(4):2055-2075. DOI: 10.13328/j.cnki.jos.006900
Abstract:A log is an important carrier of a computer system that records the states of events, and a log system is responsible for log generation, collection, and output. OpenHarmony is a new open-source, distributed operating system for smart devices in all scenarios of a fully connected world. Prior to the work described in this study, many key subsystems of OpenHarmony, including the log system, had not been built. The open-source nature of OpenHarmony enables third-party developers to contribute core code. To address the lack of a log system in OpenHarmony, this paper mainly does the following work: ① It analyzes the technical architecture, advantages, and disadvantages of today’s popular log systems. ② It clarifies the model specifications of the log system HiLog according to the interconnection feature of heterogeneous devices in OpenHarmony. ③ It designs and implements HiLog, the first log system of OpenHarmony, and contributes it to the OpenHarmony trunk. ④ It conducts comparative experiments on the key indicators of HiLog. The experimental data show that in terms of basic performance, the throughput of HiLog and Log is 1500 KB/s and 700 KB/s, respectively, which indicates that HiLog achieves a 114% improvement over the log system of Android. In terms of log persistence, the packet loss of HiLog is less than 6‰ with a compression rate of 3.5% for persistency, much lower than that of Log. In addition, HiLog also provides novel practical functions such as data protection and flow control.
OUYANG Xiang-Zhen, ZHU Yi-An, SHI Xian-Chen
2024, 35(4):2076-2098. DOI: 10.13328/j.cnki.jos.006830
Abstract:Dynamic memory allocators are fundamental components of modern applications. They manage free memory and handle user memory requests. Modern general-purpose dynamic memory allocators ensure the balance of performance and memory footprint. However, in view of different memory footprints and optimization goals in application scenarios, a general-purpose memory allocator is not the optimal solution. Special-purpose memory allocators for specific application scenarios usually can better satisfy system requirements. However, they are time-consuming and error-prone to implement. Developers often use the memory allocation framework to build special-purpose dynamic memory allocators. However, the existing memory allocator framework has the problems of poor abstraction ability and insufficient composability and customizability. For this reason, this study proposes a composable and customizable dynamic memory allocator framework, namely mortise, based on function composability by reviewing the dynamic memory allocation process from the perspective of functional programming. The framework abstracts system memory allocation as a composition of hierarchical functions of several multiple decoupled memory allocations, and these functions can provide policies to ensure higher customizability and composability. Mortise is implemented by using standard C. To achieve zero performance overhead of hierarchical function composition, mortise uses the metaprogramming features offered by the C preprocessor. Developers can quickly build a memory allocator for targeted application scenarios by composing and customizing the hierarchical function of allocators. In order to prove the effectiveness of mortise, this study presents three different memory allocator instances, namely tlsfcc, hslab, and wfslab, by using mortise. Specifically, tlsfcc is designed for multi-core embedded application scenarios, which improves the parallel throughput by replacing the synchronization strategy; hslab is a core-aware slab-type allocator, which optimizes performance on heterogeneous hardware by customizing thread cache; wfslab is a low-latency and wait-free/lock-free allocator. This study runs benchmarks to compare these allocators with several existing memory allocators. The experiments are carried out on an 8-core x86/64 platform and an 8-core heterogeneous aarch64 embedded platform, and the experimental results show that tlsfcc achieves a mean speedup of 1.76 and 1.59 on the two platforms compared with the original tlsf allocator; hlsab achieves only 69.6% and 85.0% execution time compared with the tcmalloc with a similar architecture; the worst-case memory request latency of wfslab is the smallest among all memory allocators in the experiment, including the state-of-art lock-free memory allocators: mimalloc and snmalloc.