YAO Han-Tao, YU Lu, XU Chang-Sheng
2024, 35(5):2101-2119. DOI: 10.13328/j.cnki.jos.007022
Abstract:In real scenarios, applications often face data scarcity and dynamically changing data. Few-shot incremental learning aims to infer knowledge from a small amount of data while reducing the model’s catastrophic forgetting of old knowledge. Existing few-shot incremental learning algorithms (e.g., CEC and FACT) mainly use visual features to adjust the feature encoder or classifier, so as to transfer the model to new data while resisting forgetting of old data. However, the visual features of a small amount of data can rarely model the complete feature distribution of a class, which weakens the generalization ability of these algorithms. Compared with visual features, the text features of image class descriptions generalize better and resist forgetting better. Therefore, based on the vision-language model (VLM), this study investigates few-shot incremental learning based on textual knowledge embedding, which achieves effective learning of both new and old class data by embedding text features with anti-forgetting ability into visual features. Specifically, in the base learning stage, the study uses the VLM to extract pre-trained visual features and class text descriptions of images. Furthermore, the study uses the text encoder to project the pre-trained visual features into the text space. Next, the study uses the visual encoder to fuse the learned text features with the pre-trained visual features into visual features with high discrimination ability. In the incremental learning stage, the study proposes class-space-guided anti-forgetting learning, which fine-tunes the visual encoder and text encoder with the class space encoding of old data and the features of new data, thereby learning new knowledge while reviewing old knowledge. The study also verifies the effectiveness of the algorithm on four datasets (CIFAR-100, CUB-200, Car-196, and miniImageNet), proving that textual knowledge embedding based on the VLM can further improve the robustness of few-shot incremental learning over purely visual features.
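As a rough illustration of the base-stage pipeline described above (not the authors' code), the sketch below projects pre-trained visual features into the text space and fuses the result back into the visual features; all dimensions and module names are assumptions.

```python
import torch
import torch.nn as nn

class TextKnowledgeEmbedding(nn.Module):
    """Sketch: project pre-trained visual features into the text space,
    then fuse the learned text features back into the visual features."""
    def __init__(self, vis_dim=512, txt_dim=512):
        super().__init__()
        self.text_proj = nn.Linear(vis_dim, txt_dim)   # visual -> text space
        self.fuse = nn.Sequential(                     # visual-encoder fusion head
            nn.Linear(vis_dim + txt_dim, vis_dim), nn.ReLU(),
            nn.Linear(vis_dim, vis_dim))

    def forward(self, vis_feat):
        txt_feat = self.text_proj(vis_feat)            # learned text features
        fused = self.fuse(torch.cat([vis_feat, txt_feat], dim=-1))
        return fused, txt_feat

# toy usage with random stand-ins for CLIP-style pre-trained features
model = TextKnowledgeEmbedding()
fused, txt = model(torch.randn(8, 512))
print(fused.shape, txt.shape)   # torch.Size([8, 512]) torch.Size([8, 512])
```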
YIN Meng-Ran, LIANG Mei-Yu, YU Yang, CAO Xiao-Wen, DU Jun-Ping, XUE Zhe
2024, 35(5):2120-2132. DOI: 10.13328/j.cnki.jos.007021
Abstract:Recently, a new task named cross-modal video corpus moment retrieval (VCMR) has been proposed, which aims to retrieve a small video segment corresponding to a query statement from an unsegmented video corpus. The key to existing cross-modal video-text retrieval work is the alignment and fusion of features from different modalities. However, simply performing cross-modal alignment and fusion cannot ensure that semantically similar data from the same modality remain close in the joint feature space, and the semantics of query statements are not considered. To solve these problems, this study proposes a query-aware cross-modal dual contrastive learning network for multi-modal video moment retrieval (QACLN), which achieves a unified semantic representation of data from different modalities by combining cross-modal and intra-modal contrastive learning. First, the study proposes a query-aware cross-modal semantic fusion strategy that obtains the query-aware multi-modal joint representation of a video by adaptively fusing multi-modal features, such as the visual features and caption features of the video, according to the perceived query semantics. Then, a cross-modal and intra-modal dual contrastive learning mechanism for videos and text queries is proposed to enhance the semantic alignment and fusion of different modalities, which improves the discriminability and semantic consistency of data representations across modalities. Finally, 1D convolution boundary regression and cross-modal semantic similarity calculation are employed to perform moment localization and video retrieval. Extensive experiments demonstrate that the proposed QACLN outperforms the benchmark methods.
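A minimal sketch of the dual contrastive objective the abstract describes, assuming a symmetric InfoNCE loss; the feature dimensions and the augmentation used for the intra-modal view are illustrative choices, not the paper's.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, tau=0.07):
    """Symmetric InfoNCE: matched pairs (a_i, b_i) are the positives."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / tau
    labels = torch.arange(a.size(0))
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

video = torch.randn(16, 256, requires_grad=True)   # moment features
query = torch.randn(16, 256, requires_grad=True)   # text query features
video_aug = video + 0.01 * torch.randn(16, 256)    # intra-modal second view

# cross-modal term pulls video/query pairs together; intra-modal term
# keeps semantically similar same-modality data close in the joint space
loss = info_nce(video, query) + info_nce(video, video_aug)
loss.backward()
print(loss.item())
```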
ZHAO En-Yuan, SONG Ning, NIE Jie, WANG Xin, ZHENG Cheng-Yu, WEI Zhi-Qiang
2024, 35(5):2133-2149. DOI: 10.13328/j.cnki.jos.007025
Abstract:Remote sensing visual question answering (RSVQA) aims to extract scientific knowledge from remote sensing images. In recent years, many methods have emerged to bridge the semantic gap between remote sensing visual information and natural language. However, most of these methods only consider the alignment and fusion of multimodal information, ignoring the deep mining of multi-scale features and the spatial location information of objects in remote sensing images and lacking research on modeling and reasoning over scale features, which results in incomplete and inaccurate answer prediction. To address these issues, this study proposes a multi-scale-guided fusion inference network (MGFIN), which aims to enhance the visual-spatial reasoning ability of RSVQA systems. First, the study designs a multi-scale visual representation module based on the Swin Transformer to encode multi-scale visual features embedded with spatial position information. Second, guided by language clues, a multi-scale relation reasoning module learns cross-scale higher-order intra-group object relations with scale space as the clue and performs spatial hierarchical inference. Finally, the study designs an inference-based fusion module to bridge the multimodal semantic gap: on the basis of cross-attention, training objectives such as self-supervised paradigms, contrastive learning methods, and image-text matching mechanisms are used to adaptively align and fuse multimodal features and to assist in predicting the final answer. Experimental results show that the proposed model has significant advantages on two public RSVQA datasets.
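A hedged sketch of language-guided fusion over multi-scale visual tokens; the use of nn.MultiheadAttention and the three toy "Swin stages" are assumptions standing in for MGFIN's actual modules.

```python
import torch
import torch.nn as nn

class QueryGuidedFusion(nn.Module):
    """Sketch: the question cross-attends to visual tokens pooled from
    several scales, so language clues guide which scale is used."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, question, scales):
        # scales: list of (B, N_s, dim) token maps from different stages
        tokens = torch.cat(scales, dim=1)               # pool all scales
        fused, _ = self.attn(question, tokens, tokens)  # question attends to vision
        return fused

q = torch.randn(2, 12, 256)                              # 12 question tokens
scales = [torch.randn(2, n, 256) for n in (64, 16, 4)]   # toy multi-scale stages
print(QueryGuidedFusion()(q, scales).shape)              # torch.Size([2, 12, 256])
```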
YU Kai, BIN Yi, ZHENG Zi-Qiang, YANG Yang
2024, 35(5):2150-2164. DOI: 10.13328/j.cnki.jos.007024
Abstract:Text-to-image generation achieves excellent visual results but suffers from insufficient detail representation. This study proposes the conditional semantic augmentation generative adversarial network (CSA-GAN). The model first encodes the text and processes it with conditional semantic augmentation. It then extracts the intermediate features of the generator for up-sampling and generates an image mask through a two-layer convolutional neural network (CNN). Finally, the text encoding is fed into two perceptrons and fused with the mask, so as to fully integrate image spatial features and text semantic features and thus improve detail representation. To verify the quality of the images generated by this model, quantitative and qualitative analyses are conducted on different datasets. The study employs the inception score (IS) and Fréchet inception distance (FID) metrics to quantitatively evaluate the clarity, diversity, and natural realism of the images. The qualitative analyses include the visualization of generated images and ablation studies of specific modules. The results show that the proposed model is superior to recent state-of-the-art works, which fully verifies that the proposed method performs better and can optimize the expression of key feature details during image generation.
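A sketch of the mask-and-fusion step described above, under assumptions: intermediate generator features are upsampled, a two-layer CNN predicts a spatial mask, and two perceptrons turn the text encoding into channel-wise scale and shift terms. The module name and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class MaskTextFusion(nn.Module):
    """Sketch: predict a spatial mask from upsampled generator features and
    inject text semantics where the mask fires."""
    def __init__(self, c=64, txt_dim=256):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='nearest')
        self.mask_head = nn.Sequential(                 # two-layer CNN -> mask
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, 1, 3, padding=1), nn.Sigmoid())
        self.gamma = nn.Linear(txt_dim, c)              # perceptron 1: scale
        self.beta = nn.Linear(txt_dim, c)               # perceptron 2: shift

    def forward(self, feat, txt):
        feat = self.up(feat)
        mask = self.mask_head(feat)                     # (B, 1, H, W)
        g = self.gamma(txt)[..., None, None]
        b = self.beta(txt)[..., None, None]
        return feat * (1 + mask * g) + mask * b         # fuse spatial + semantic

f, t = torch.randn(2, 64, 16, 16), torch.randn(2, 256)
print(MaskTextFusion()(f, t).shape)   # torch.Size([2, 64, 32, 32])
```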
WANG Rui-Qi, CHENG Hao-Nan, YE Long
2024, 35(5):2165-2175. DOI: 10.13328/j.cnki.jos.007027
Abstract:Visually guided binaural audio generation is an important multimodal learning task with wide application value. Its goal is to generate binaural audio that is audiovisually consistent with the given visual information and mono audio. Existing visually guided binaural audio generation methods produce unsatisfactory binaural audio because they underuse audiovisual information in the encoding stage and neglect shallow features in the decoding stage. To solve these problems, this study proposes a visually guided binaural audio generation method based on hierarchical feature encoding and decoding, which effectively improves the quality of the generated binaural audio. To narrow the heterogeneous gap that hinders the association and fusion of audiovisual data, an encoder structure based on hierarchical encoding and fusion of audiovisual features is proposed, which improves the utilization of audiovisual data in the encoding stage. To make effective use of shallow structural features during decoding, a decoder structure with skip connections between feature layers of different depths, from deep to shallow, is constructed, which fully exploits both the shallow detail features and the deep features of the audiovisual information. Benefiting from the efficient use of audiovisual information and the hierarchical combination of deep and shallow structural features, the proposed method can effectively handle binaural audio generation in complex visual scenes. Compared with existing methods, it improves generation performance by over 6% in terms of realism.
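A toy sketch of the deep-to-shallow skip-connection decoder; real inputs would carry visual conditioning, which is omitted here, and all channel sizes are illustrative.

```python
import torch
import torch.nn as nn

class SkipDecoder(nn.Module):
    """Toy encoder-decoder: the decoder concatenates each shallow encoder
    feature (skip connection) with the deep feature before upsampling."""
    def __init__(self, chans=(16, 32, 64)):
        super().__init__()
        ins = (2,) + chans[:-1]                           # (2, 16, 32)
        self.enc = nn.ModuleList([nn.Conv1d(i, o, 4, stride=2, padding=1)
                                  for i, o in zip(ins, chans)])
        self.dec = nn.ModuleList([nn.ConvTranspose1d(o * 2, i, 4, stride=2, padding=1)
                                  for i, o in zip(ins, chans)][::-1])

    def forward(self, x):   # x: mono mixture duplicated to 2 channels (visual cue omitted)
        skips = []
        for e in self.enc:
            x = torch.relu(e(x))
            skips.append(x)
        for i, (d, s) in enumerate(zip(self.dec, reversed(skips))):
            x = d(torch.cat([x, s], dim=1))               # fuse deep and shallow
            if i < len(self.dec) - 1:
                x = torch.relu(x)
        return x                                          # (B, 2, T): left/right

print(SkipDecoder()(torch.randn(1, 2, 256)).shape)        # torch.Size([1, 2, 256])
```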
ZHANG Yi, LÜ Jia-Yi, LAN Xing, XUE Jian
2024, 35(5):2176-2191. DOI: 10.13328/j.cnki.jos.007029
Abstract:As a critical task in computer vision and animation, facial reconstruction provides 3D model structures and rich semantic information for multimodal facial applications. However, monocular 2D facial images lack depth information, so the predicted facial model parameters are unreliable, which leads to poor reconstruction results. This study proposes to employ facial action units (AUs) and facial keypoints, which are highly correlated with the model parameters, as a bridge to guide the regression of model-related parameters and thus alleviate the ill-posedness of monocular facial reconstruction. Based on existing facial reconstruction datasets, the study provides a complete semi-automatic labeling scheme for facial AUs and constructs a 300W-LP-AU dataset. Furthermore, a 3D facial reconstruction algorithm based on AU awareness is put forward to realize end-to-end multi-task learning and reduce the overall training difficulty. Experimental results show that the proposed method improves facial reconstruction performance and yields reconstructed facial models of high fidelity.
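A hedged sketch of an end-to-end multi-task objective with AUs and keypoints as the guiding bridge; the loss weights and the 62/17/68 output sizes are illustrative conventions, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

def multitask_loss(pred_params, gt_params, pred_au, gt_au,
                   pred_kpts, gt_kpts, w_au=0.5, w_kpt=0.5):
    """Sketch: 3DMM parameter regression guided by AU activation
    (multi-label) and keypoint terms; weights are illustrative."""
    l_param = F.mse_loss(pred_params, gt_params)               # model parameters
    l_au = F.binary_cross_entropy_with_logits(pred_au, gt_au)  # AUs as a bridge
    l_kpt = F.mse_loss(pred_kpts, gt_kpts)                     # facial keypoints
    return l_param + w_au * l_au + w_kpt * l_kpt

loss = multitask_loss(torch.randn(4, 62), torch.randn(4, 62),
                      torch.randn(4, 17), torch.randint(0, 2, (4, 17)).float(),
                      torch.randn(4, 68, 2), torch.randn(4, 68, 2))
print(loss)
```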
QIANG Wei, DU Yu, LI Xin-Jin, FAN Xiang-Min, SU Wen, CHEN Hai-Bo, SUN Wei, TIAN Feng
2024, 35(5):2192-2207. DOI: 10.13328/j.cnki.jos.007028
Abstract:Parkinson’s disease is a widespread neurodegenerative disease that slowly impairs the motor and certain cognitive functions of patients. It is insidious and incurable and places a significant burden on patients and their families. Clinical diagnosis of Parkinson’s disease typically relies on subjective rating scales, which can be influenced by the examinee’s recall bias and the assessor’s subjectivity. Numerous researchers have investigated the physiological manifestations of Parkinson’s disease across multiple modalities and have provided objective and quantifiable tools for auxiliary diagnosis. However, given the diversity of neurodegenerative diseases and the similarities in their effects, uniquely identifying the disease remains a problem for unimodal methods built upon representations of Parkinson’s disease. To address this issue, this study develops a multimodal auxiliary diagnosis system comprising task paradigms that evoke the aberrant behaviors of Parkinson’s disease. First, parametric tests of the features are performed according to the results of normality tests, and statistically significant feature sets are constructed (p<0.05). Second, multimodal data are collected from 38 cases in a clinical setting using the MDS-UPDRS scale. Finally, the significance of different feature combinations for the assessment of Parkinson’s disease is analyzed based on the gait and eye-movement modalities, and the high-immersion triggered task paradigms and the multimodal Parkinson’s disease auxiliary diagnosis system are validated in virtual reality scenarios. Notably, the combination of the gait and eye-movement modalities requires only 2–4 tasks to obtain an average AUC of 0.97 and an accuracy of 0.92.
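The statistical step can be illustrated as follows on synthetic data (the actual features and tests beyond the stated p<0.05 criterion are the paper's): a normality test selects a parametric or non-parametric comparison, significant features are kept, and AUC scores a toy single-feature classifier.

```python
import numpy as np
from scipy import stats
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
patients = rng.normal(1.0, 1.0, (19, 10))   # toy gait/eye-movement features
controls = rng.normal(0.0, 1.0, (19, 10))

kept = []
for j in range(patients.shape[1]):
    a, b = patients[:, j], controls[:, j]
    # normality decides parametric (t-test) vs. non-parametric comparison
    normal = stats.shapiro(a).pvalue > 0.05 and stats.shapiro(b).pvalue > 0.05
    p = stats.ttest_ind(a, b).pvalue if normal else stats.mannwhitneyu(a, b).pvalue
    if p < 0.05:
        kept.append(j)                       # statistically significant feature

# score one significant feature as a toy classifier and compute AUC
scores = np.concatenate([patients[:, kept[0]], controls[:, kept[0]]])
labels = np.array([1] * 19 + [0] * 19)
print(kept, roc_auc_score(labels, scores))
```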
CHEN Hao-Nan, ZHU Ying-Ying, ZHAO Jun-Qi, TIAN Qi
2024, 35(5):2208-2219. DOI: 10.13328/j.cnki.jos.007026
Abstract:To make full use of the local spatial relations between point cloud and multi-view data and further improve the accuracy of 3D shape recognition, a 3D shape recognition network based on multimodal relations is proposed. First, a multimodal relation module (MRM) is designed, which extracts the relation information between the local features of the point cloud and those of the multi-view images to obtain the corresponding relation features. Then, a cascade pooling consisting of max pooling and generalized mean pooling is applied to the relation tensor to obtain the global relation feature. There are two types of multimodal relation modules, which output point-view relation features and view-point relation features, respectively. The proposed gating module adopts a self-attention mechanism to find the relation information within features, so that the aggregated global features can be weighted to suppress redundant information. Extensive experiments show that the MRM gives the network stronger representational ability and that the gating module makes the final global feature more discriminative and boosts retrieval performance. The proposed network achieves 93.8% and 95.0% classification accuracy, as well as 90.5% and 93.4% average retrieval precision, on two standard 3D shape recognition datasets (ModelNet40 and ModelNet10k), respectively, outperforming existing works.
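A sketch of cascade pooling as described: max pooling followed by generalized mean (GeM) pooling over an assumed (batch, points, views, channels) relation tensor; the layout and the p value are assumptions.

```python
import torch

def gem(x, p=3.0, eps=1e-6):
    """Generalized mean pooling over the token dimension."""
    return x.clamp(min=eps).pow(p).mean(dim=1).pow(1.0 / p)

def cascade_pool(relation, p=3.0):
    """Cascade pooling sketch: max pooling over one local-feature axis,
    then GeM pooling over the remaining axis of the relation tensor."""
    x = relation.max(dim=2).values           # max over the view axis
    return gem(x, p)                         # GeM over the point axis

rel = torch.randn(2, 64, 12, 128)            # toy point-view relation tensor
print(cascade_pool(rel).shape)               # torch.Size([2, 128])
```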
SUN Shang-Quan, REN Wen-Qi, CAO Xiao-Chun
2024, 35(5):2220-2234. DOI: 10.13328/j.cnki.jos.007023
Abstract:In recent years, digital video shooting equipment has been continuously upgraded. Although the improved latitude of image sensors and higher shutter rates have greatly enriched the diversity of scenes that can be photographed, degradation factors such as the rain streaks left by raindrops passing through the field of view at high speed are also more easily recorded. Dense rain streaks in the foreground block effective information about the background scene and thus hinder the effective acquisition of images, which makes video deraining an urgent problem. Previous video deraining methods focus on the information of conventional images themselves. However, owing to the physical limits of conventional image sensors and the constraints of the shutter mechanism, much optical information is lost during video acquisition, which harms subsequent deraining. Therefore, taking advantage of the complementarity between event data and conventional video information, as well as the high dynamic range and high temporal resolution of event data, this study proposes a video deraining network based on event data fusion, spatial attention, and temporal memory. It uses three-dimensional alignment to convert the sparse event stream into a representation that matches the image size and feeds the superimposed input to an event-image fusion module with a spatial attention mechanism, so as to effectively extract the spatial information of the image. In addition, when processing consecutive frames, an inter-frame memory module exploits the features of previous frames, and the network is finally constrained by three-dimensional convolution and two loss functions. The proposed method is effective on a publicly available dataset and meets the requirements of real-time video processing.
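A minimal sketch of the three-dimensional alignment step, assuming a standard event-voxelization scheme: events are binned in time and accumulated on an image-sized grid so they can be stacked with frames.

```python
import numpy as np

def events_to_voxel(events, bins, height, width):
    """Align a sparse event stream (x, y, t, polarity) into a dense
    (bins, H, W) grid matching the image size."""
    x, y, t, p = events.T
    t = (t - t.min()) / max(t.max() - t.min(), 1e-9)   # normalize time to [0, 1]
    b = np.clip((t * bins).astype(int), 0, bins - 1)   # temporal bin index
    voxel = np.zeros((bins, height, width), dtype=np.float32)
    np.add.at(voxel, (b, y.astype(int), x.astype(int)),
              np.where(p > 0, 1.0, -1.0))              # signed accumulation
    return voxel

ev = np.array([[10, 20, 0.00, 1],
               [10, 20, 0.04, -1],
               [55, 31, 0.09, 1]])                     # toy events
print(events_to_voxel(ev, bins=5, height=64, width=64).shape)   # (5, 64, 64)
```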
CUI Zhan-Qi, YANG Hui-Wen, CHEN Xiang, WANG Lin-Zhang
2024, 35(5):2235-2267. DOI: 10.13328/j.cnki.jos.007046
Abstract:Smart contracts are computer programs running in the contract layer of a blockchain. They can be used to manage cryptocurrencies and data on the blockchain, realize diverse business logic, and expand the applications of the blockchain. Large amounts of assets are stored in smart contracts, which attracts attackers to exploit security vulnerabilities to steal assets for economic gain. In recent years, with the frequent occurrence of smart contract security incidents (such as TheDAO and the Parity incidents), security vulnerability detection techniques for smart contracts have become a hot research topic. This study proposes a research framework for detecting security vulnerabilities in smart contracts and analyzes the progress of existing vulnerability detection techniques from three aspects: vulnerability discovery and identification, vulnerability analysis and detection, and datasets and evaluation metrics. First, the basic process of collecting security vulnerability information is sorted out, the vulnerabilities are classified into 13 types according to their basic characteristics, and a classification framework for smart contract security vulnerabilities is proposed. Second, existing techniques are studied in terms of symbolic execution, fuzz testing, machine learning, formal verification, and static analysis, and the advantages and limitations of each technique are analyzed. Third, the commonly used datasets and evaluation metrics are summarized. Finally, potential future research directions for smart contract security vulnerability detection are discussed.
CHEN Ke, LU Hui, FANG Bin-Xing, SUN Yan-Bin, SU Shen, TIAN Zhi-Hong
2024, 35(5):2268-2288. DOI: 10.13328/j.cnki.jos.007038
Abstract:Penetration testing is an important means of discovering the weaknesses of critical network information systems and protecting network security. Traditional penetration testing relies heavily on manual labor and places high technical demands on testers, which limits the depth and breadth of its adoption. By introducing artificial intelligence techniques into the whole penetration testing process, automated penetration testing greatly reduces the dependence on manual labor and lowers the technical threshold. Automated penetration testing can be divided mainly into model-based and rule-based approaches, whose research has different focuses: the former uses model algorithms to simulate hacker attacks, with attention paid to attack scene perception and attack decision-making models, while the latter concentrates on how to efficiently match attack rules to attack scenarios. This study analyzes the implementation principles of automated penetration testing from three aspects: attack scenario modeling, penetration testing modeling, and decision-making reasoning models. Finally, future development directions of automated penetration testing are explored along the dimensions of attack-defense confrontation and combined vulnerability exploitation.
ZHANG Zhuo, LEI Yan, MAO Xiao-Guang, XUE Jian-Xin, CHANG Xi
2024, 35(5):2289-2306. DOI: 10.13328/j.cnki.jos.006961
Abstract:Fault localization collects and analyzes the runtime information of test case sets to evaluate the suspiciousness of each statement, i.e., how likely it is to be faulty. A test case set is constructed from data in the input domain and contains two types of test cases: passing ones and failing ones. Since failing test cases generally account for a very small portion of the input domain and their distribution is usually random, the number of failing test cases is much smaller than that of passing ones. Previous work has shown that this lack of failing test cases makes test case sets class-imbalanced, which severely hampers fault localization effectiveness. To address this problem, this study proposes a model-domain data augmentation approach for fault localization that uses a generative adversarial network. Working in the model domain (i.e., the spectrum information used by fault localization) rather than the traditional input domain (i.e., program inputs), the approach uses the generative adversarial network to synthesize model-domain failing test cases that cover the minimum suspicious set, thereby addressing the class imbalance from the model domain. The experimental results show that the proposed approach significantly improves the effectiveness of 12 representative fault localization approaches.
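For context, a worked example of the spectrum information that fault localization analyzes; Ochiai is shown as one representative suspiciousness formula (the paper augments such spectra with GAN-synthesized failing rows rather than prescribing a formula).

```python
import numpy as np

# rows: test cases, cols: statements; 1 = statement covered by the test
coverage = np.array([[1, 1, 0, 1],
                     [1, 0, 1, 1],
                     [0, 1, 1, 1],
                     [1, 1, 1, 1]])
failed = np.array([0, 0, 0, 1])              # only the last test fails

ef = (coverage * failed[:, None]).sum(0)     # covered by failing tests
ep = (coverage * (1 - failed)[:, None]).sum(0)
nf = failed.sum() - ef                       # failing tests not covering it

# Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep))
susp = ef / np.sqrt((ef + nf) * (ef + ep) + 1e-9)
print(np.argsort(-susp))                     # statements ranked by suspiciousness
```

With only one failing row, every statement it covers looks equally suspicious; synthesizing more model-domain failing rows, as the paper does, is what sharpens this ranking.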
DONG Yan-Song, LIU Yue-Hao, DONG Xu-Qian, ZHAO Liang, TIAN Cong, YU Bin, DUAN Zhen-Hua
2024, 35(5):2307-2324. DOI: 10.13328/j.cnki.jos.006967
Abstract:With the rapid development of neural network technology, neural networks have been widely applied in safety-critical fields such as autonomous driving, intelligent manufacturing, and medical diagnosis, so ensuring their trustworthiness is crucial. However, owing to the vulnerability of neural networks, slight perturbations often lead to wrong results. Therefore, it is vital to use formal verification methods to guarantee the safety and trustworthiness of neural networks. Current verification methods for neural networks are mainly concerned with the accuracy of the analysis while tending to ignore efficiency. When verifying the safety properties of complex networks, the large state space may make verification infeasible or unsolvable. To reduce the state space of neural networks and improve verification efficiency, this study presents a formal verification method for neural networks based on divide and conquer that accounts for over-approximation errors. The method uses reachability analysis to calculate the upper and lower bounds of nonlinear nodes and uses an improved symbolic linear relaxation method to reduce over-approximation errors when computing these bounds. The constraints of nodes are refined by calculating the direct and indirect effects of their over-approximation errors. Thereby, the original verification problem is split into a set of subproblems whose mixed integer linear programming (MILP) formulations have fewer constraints. The method is implemented as a tool named NNVerifier, which is evaluated through experiments on four ReLU-based fully connected benchmark networks trained on three classic datasets. The experimental results show that the verification efficiency of NNVerifier is 37.18% higher than that of existing complete verification methods.
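A toy illustration of reachability analysis by naive interval propagation through affine+ReLU layers; the paper's symbolic linear relaxation tightens exactly the over-approximation such plain intervals introduce. The network here is random.

```python
import numpy as np

def interval_bounds(layers, l, u):
    """Propagate an input box through affine+ReLU layers (toy: ReLU after
    every layer). Returns sound but over-approximate output bounds."""
    for W, b in layers:
        Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
        l, u = Wp @ l + Wn @ u + b, Wp @ u + Wn @ l + b   # affine bounds
        l, u = np.maximum(l, 0), np.maximum(u, 0)         # ReLU is monotone
    return l, u

rng = np.random.default_rng(1)
net = [(rng.normal(size=(8, 4)), np.zeros(8)),
       (rng.normal(size=(2, 8)), np.zeros(2))]
lo, hi = interval_bounds(net, l=np.full(4, -0.1), u=np.full(4, 0.1))
print(lo, hi)   # a property holds if it holds on this whole output box
```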
ZHANG Zhuo, LIU Ye-Peng, XUE Jian-Xin, YAN Meng, CHEN Jia-Chi, MAO Xiao-Guang
2024, 35(5):2325-2339. DOI: 10.13328/j.cnki.jos.006989
Abstract:The smart contract is a decentralized application widely deployed on blockchain platforms such as Ethereum. Owing to their economic attributes, vulnerabilities in smart contracts can cause huge financial losses and destroy the stable ecology of Ethereum, so it is crucial to detect vulnerabilities in smart contracts before they are deployed. Existing smart contract vulnerability detection methods (e.g., Oyente and Securify) are mostly based on heuristic algorithms, whose reusability across different application scenarios is weak; moreover, they are time-consuming and have low accuracy. To improve the effectiveness of vulnerability detection, this study proposes Scruple, a smart contract timestamp vulnerability detection approach based on learning data-flow paths. It first obtains all possible propagation chains of timestamp vulnerabilities, then refines them, uses a graph pre-training model to learn the relationships in the propagation chains, and finally applies the learned model to detect whether a smart contract contains timestamp vulnerabilities. Compared with existing detection methods, Scruple has stronger vulnerability-capturing and generalization abilities. Meanwhile, learning the propagation chains is not only well-directed but also avoids unnecessarily deep program hierarchies for the convergence of vulnerabilities. To verify the effectiveness of Scruple, this study compares it with 13 state-of-the-art smart contract vulnerability detection methods on distinct real-world smart contracts. The experimental results show that Scruple achieves 96% accuracy, 90% recall, and 93% F1-score in detecting timestamp vulnerabilities; on these three metrics, it improves over the 13 methods by 59%, 46%, and 57% on average, respectively, a substantial advance in detecting timestamp vulnerabilities.
LIU Bao-Chuan, ZHANG Li, LIU Zhen-Wei, JIANG Jing
2024, 35(5):2340-2358. DOI: 10.13328/j.cnki.jos.006992
Abstract:GitHub is a well-known open-source software development community, where developers use the issue tracking system of each open-source project to address issues. During the discussion of an issue about a defect, a developer may point out correlated issues from other projects, called cross-project issues, which provide reference information for fixing the defect. However, there are more than 200 million open-source projects and 1.2 billion issues on GitHub, making it time-consuming to identify and acquire cross-project issues manually. This study presents CPIRecom, a cross-project issue recommendation method for open-source software defects. CPIRecom builds a pre-selection set by filtering issues based on the number of historical issue pairs and the time interval between issue reports. It then applies an accurate recommendation model that extracts textual features with the pre-trained BERT model, analyzes project features, calculates the relevance probability between the defect and each issue in the pre-selection set with a random forest classifier, and produces the recommendation list by ranking. This study simulates the application of CPIRecom on the GitHub platform; its mean reciprocal rank reaches 0.603 and its Recall@5 reaches 0.715 on the simulated test set.
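The two reported metrics can be computed as follows (toy data; the issue identifiers are hypothetical):

```python
def mrr(ranked_lists, relevant):
    """Mean reciprocal rank of the first relevant issue per defect."""
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant):
        total += next((1.0 / (i + 1) for i, it in enumerate(ranking) if it in rel), 0.0)
    return total / len(ranked_lists)

def recall_at_k(ranked_lists, relevant, k=5):
    """Fraction of defects with at least one relevant issue in the top k."""
    hits = sum(bool(set(r[:k]) & rel) for r, rel in zip(ranked_lists, relevant))
    return hits / len(ranked_lists)

# toy recommendation lists for two defects
ranked = [["p1#3", "p2#7", "p4#1"], ["p9#2", "p3#5", "p3#6"]]
truth = [{"p2#7"}, {"p8#4"}]
print(mrr(ranked, truth), recall_at_k(ranked, truth))   # 0.25 0.5
```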
SHEN Li, ZHOU Wen-Hao, WANG Fei, XIAO Qian, WU Wen-Hao, ZHANG Lu-Fei, AN Hong, QI Feng-Bin
2024, 35(5):2359-2378. DOI: 10.13328/j.cnki.jos.006896
Abstract:The heterogeneous many-core architecture, with its ultra-high energy efficiency ratio, has become an important trend in supercomputer architecture. However, the complexity of heterogeneous systems raises the bar for application development and optimization, posing many technical challenges such as usability and programmability. The independently developed new-generation Sunway supercomputer is equipped with a homegrown heterogeneous many-core processor, the SW26010Pro. To exploit the full performance of the new-generation many-core processors and support the development and optimization of emerging scientific computing applications, this study designs and implements swLLVM, an optimizing compiler for the SW26010Pro platform. The compiler supports the Athread and SDAA dual-mode heterogeneous programming models, provides a multi-level storage hierarchy description and SIMD extensions for vector-like operations, and, according to the architectural characteristics of the SW26010Pro, implements control-flow vectorization, cost-based node combination, and compiler optimizations for the multi-level storage hierarchy. The experimental results show that the compiler optimizations designed and implemented in this study achieve significant performance improvements: the average speedups of control-flow vectorization and of node combination are 1.23 and 1.11, respectively, while the memory access optimizations yield a performance improvement of up to 2.49 times. Finally, a comprehensive multi-dimensional evaluation of swLLVM on the SPEC CPU2006 benchmark shows that, compared with SWGCC at the same optimization level, swLLVM improves the performance of floating-point programs by 9.04% on average, overall performance by 5.25%, and compilation speed by 79.1%, with an average decline of 0.12% in the performance of integer programs and of 1.15% in code size.
ZHANG Cheng-Long, DING Shi-Fei, GUO Li-Li, ZHANG Jian
2024, 35(5):2379-2399. DOI: 10.13328/j.cnki.jos.006804
Abstract:The stochastic configuration network (SCN), an emerging incremental neural network model, differs from other randomized neural network methods in that it configures the parameters of hidden-layer nodes through a supervisory mechanism, thereby ensuring fast convergence. Thanks to its high learning efficiency, low human intervention, and strong generalization ability, SCN has attracted extensive attention from scholars in China and abroad and has developed rapidly since it was proposed in 2017. This study reviews SCN research in terms of basic theories, typical algorithm variants, application fields, and future research directions. First, the algorithmic principles, universal approximation capability, and advantages of SCN are analyzed theoretically. Second, typical SCN variants are surveyed, including DeepSCN, 2DSCN, Robust SCN, Ensemble SCN, Distributed SCN, Parallel SCN, and Regularized SCN. Then, applications of SCN in different fields, including hardware implementation, computer vision, medical data analysis, fault detection and diagnosis, and system modeling and prediction, are introduced. Finally, the development potential of SCN in convolutional neural network architectures, semi-supervised learning, unsupervised learning, multi-view learning, fuzzy neural networks, and recurrent neural networks is pointed out.
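A simplified sketch of the basic stochastic configuration idea (close to SC-I): random candidate nodes are accepted only if a supervisory inequality holds, which keeps the residual shrinking. The inequality and update below are a common simplified form, not a faithful reproduction of any one variant.

```python
import numpy as np

def scn_fit(X, y, max_nodes=50, candidates=100, r=0.99, tol=1e-3, seed=0):
    """Incrementally add random hidden nodes, keeping only candidates that
    satisfy the supervisory condition xi > 0 (guarantees residual decay)."""
    rng = np.random.default_rng(seed)
    e, params = y.astype(float).copy(), []
    for _ in range(max_nodes):
        best = None
        for _ in range(candidates):
            w, b = rng.uniform(-1, 1, X.shape[1]), rng.uniform(-1, 1)
            h = np.tanh(X @ w + b)
            xi = (e @ h) ** 2 / (h @ h) - (1 - r) * (e @ e)  # supervisory check
            if xi > 0 and (best is None or xi > best[0]):
                best = (xi, h, w, b)
        if best is None:
            break                                  # no acceptable candidate
        _, h, w, b = best
        e -= ((e @ h) / (h @ h)) * h               # greedy output-weight update
        params.append((w, b))
        if np.linalg.norm(e) < tol:
            break
    return params, np.linalg.norm(e)

X = np.random.default_rng(1).uniform(-1, 1, (200, 2))
y = np.sin(3 * X[:, 0]) * X[:, 1]
params, res = scn_fit(X, y)
print(len(params), res)    # nodes added, final residual norm
```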
WU Shang-Xi, YIN Yu-Yang, SONG Si-Qing, CHEN Guan-Hao, SANG Ji-Tao, YU Jian
2024, 35(5):2400-2413. DOI: 10.13328/j.cnki.jos.006949
Abstract:Deep neural networks can be compromised by well-designed backdoor attacks during training. Such attacks control the model’s output at test time by injecting data carrying backdoor triggers into the training set: the attacked model performs normally on a clean test set but misclassifies inputs as the attack target class whenever the backdoor trigger is recognized. Currently available backdoor attack methods are poorly concealed, and their attack success rates still leave room for improvement. To address these limitations, a backdoor attack method based on singular value decomposition is proposed. It can be implemented in two ways: one is to directly set some singular values of the image to zero, and the resulting image, compressed to a certain extent, serves as an effective backdoor trigger; the other is to inject the singular-vector information of the attack target class into the left and right singular vectors of the image, which also achieves an effective backdoor attack. The backdoor images obtained in both ways are visually almost identical to the original images. Experiments show that singular value decomposition can be effectively leveraged in backdoor attack algorithms, attacking neural networks with considerably high success rates on multiple datasets.
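A sketch of the first variant, assuming a per-channel truncated SVD; the number of kept singular values is an illustrative parameter.

```python
import numpy as np

def svd_trigger(img, keep=24):
    """Zero all but the largest `keep` singular values per channel; the
    slightly compressed image acts as the backdoor trigger."""
    out = np.empty_like(img, dtype=float)
    for c in range(img.shape[2]):
        U, s, Vt = np.linalg.svd(img[..., c].astype(float), full_matrices=False)
        s[keep:] = 0.0                            # drop small singular values
        out[..., c] = U @ np.diag(s) @ Vt
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)   # toy CIFAR-size image
poisoned = svd_trigger(img)
print(np.abs(poisoned.astype(int) - img.astype(int)).mean())   # small visual change
```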
CHEN Jia-Yan, REN Dong-Dong, LI Wen-Bin, HUO Jing, GAO Yang
2024, 35(5):2414-2429. DOI: 10.13328/j.cnki.jos.006958
Abstract:Few-shot learning aims to simulate the human ability to quickly learn new things from only a few samples, which is of great significance for deep learning when samples are limited. However, in many practical tasks with limited computing resources, the model scale may still restrict wider application of few-shot learning, which creates a realistic demand for lightweight few-shot learning. As a widely used auxiliary strategy in deep learning, knowledge distillation transfers knowledge between models by using additional supervisory information and has practical applications in both improving model accuracy and reducing model scale. This study first verifies the effectiveness of the knowledge distillation strategy in model lightweighting for few-shot learning and then, according to the characteristics of few-shot learning, designs two new distillation methods: (1) distillation based on local image features and (2) distillation based on auxiliary classifiers. Experiments on the miniImageNet and tieredImageNet datasets demonstrate that the new distillation methods are significantly superior to traditional knowledge distillation in few-shot learning tasks.
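For reference, the classic knowledge distillation loss that the study builds on (the temperature and mixing weight below are illustrative):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Hinton-style distillation: KL between temperature-softened teacher
    and student distributions, mixed with the hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(8, 5, requires_grad=True)    # small student's logits
t = torch.randn(8, 5)                        # large teacher's logits
print(kd_loss(s, t, torch.randint(0, 5, (8,))))
```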
XU Li-Xiang, XU Wei, CHEN En-Hong, LUO Bin, TANG Yuan-Yan
2024, 35(5):2430-2445. DOI: 10.13328/j.cnki.jos.007039
Abstract:The graph neural network (GNN) is a framework for directly characterizing graph-structured data by deep learning and has attracted increasing attention in recent years. However, the traditional GNN based on message-passing aggregation (MP-GNN) ignores the smoothing speeds of different nodes and aggregates neighbor information indiscriminately, so it is prone to over-smoothing. Thus, this study proposes KENN, a graph kernel neural network classification method based on linear structural entropy. KENN first adopts the graph kernel method to encode the subgraph structures of nodes and determine isomorphism among subgraphs, and then utilizes the isomorphism coefficient to define smoothing coefficients among different neighbors. Second, it extracts graph structural information based on low-complexity linear structural entropy to deepen and enrich the structural expressiveness of the graph data. By deeply integrating linear structural entropy, graph kernels, and GNNs, the proposed method addresses both the sparse node features of biomolecular data and the information redundancy that arises when node degrees are used as features in social network data. It also enables the GNN to adaptively adjust its ability to characterize graph structural features and to go beyond the upper bound of MP-GNN expressiveness (the WL test). Finally, experiments on seven public graph classification datasets verify that the proposed model outperforms other benchmark models.
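A hedged sketch of one way to turn subgraph isomorphism into a per-neighbor smoothing coefficient, using WL relabeling and a Jaccard overlap; this illustrates the idea only and is not KENN's actual kernel or entropy computation.

```python
import networkx as nx

def wl_labels(g, node, radius=1, iters=2):
    """Weisfeiler-Lehman relabeling inside a node's ego subgraph."""
    sub = nx.ego_graph(g, node, radius)
    lab = {v: str(sub.degree(v)) for v in sub}
    for _ in range(iters):
        lab = {v: lab[v] + '|' + ''.join(sorted(lab[u] for u in sub[v]))
               for v in sub}
    return set(lab.values())

def smoothing_coefficient(g, u, v):
    """Jaccard overlap of WL label sets of two nodes' subgraphs, usable as
    a per-neighbor aggregation weight instead of uniform averaging."""
    a, b = wl_labels(g, u), wl_labels(g, v)
    return len(a & b) / len(a | b) if a | b else 0.0

g = nx.karate_club_graph()
print(smoothing_coefficient(g, 0, 1), smoothing_coefficient(g, 0, 33))
```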
YAN Ming-Shi, CHENG Zhi-Yong, SUN Jing, WANG Fa-Sheng, SUN Fu-Ming
2024, 35(5):2446-2465. DOI: 10.13328/j.cnki.jos.006897
Abstract:Multi-behavior recommendation aims to utilize interaction data from multiple user behaviors to improve recommendation performance. Existing multi-behavior recommendation methods generally exploit the multi-behavior data directly on shared initialized user representations and handle both the mining of user preferences and the modeling of relationships among behaviors within the same task. However, these methods ignore the data imbalance across behaviors (the amount of interaction data varies greatly among behaviors) and the information loss caused by adapting to both tasks at once. User preferences refer to the interests that users exhibit in different behaviors (e.g., browsing preferences), and the relationships among behaviors indicate potential conversions from one behavior to another (e.g., from browsing to purchasing). In multi-behavior recommendation, the mining of user preferences and the modeling of relationships among behaviors can be regarded as a two-stage task. Based on these considerations, the model of two-stage learning for multi-behavior recommendation (TSL-MBR for short) is proposed, which decouples the two tasks with a two-stage strategy. In particular, the model retains an end-to-end structure and learns the two tasks by alternating training with fixed parameters. The first stage models user preferences under different behaviors: the interaction data of all behaviors (without distinguishing behavior types) are first used to model users’ global preferences, alleviating data sparsity to the greatest extent, and the interaction data of each behavior are then used to refine the behavior-specific (local) preferences, lessening the influence of data imbalance across behaviors. The second stage models the relationships among behaviors; here, preference mining and relationship modeling are decoupled to relieve the information loss caused by adapting to two tasks. This two-stage model significantly improves the system’s ability to predict target behaviors. Extensive experimental results show that TSL-MBR substantially outperforms state-of-the-art baseline models, achieving relative gains of 103.01% and 33.87% on average over the best baseline on the Tmall and Beibei datasets, respectively.
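A toy skeleton of the alternating two-stage training, with hypothetical Pref and Transfer modules: stage 1 updates preference parameters with the transfer module frozen, and stage 2 does the reverse. The model structure and losses are invented for illustration.

```python
import torch
import torch.nn as nn

class Pref(nn.Module):                      # user/item preference embeddings
    def __init__(self, n_u, n_i, d=16):
        super().__init__()
        self.u, self.i = nn.Embedding(n_u, d), nn.Embedding(n_i, d)
    def score(self, u, i):
        return (self.u(u) * self.i(i)).sum(-1)

class Transfer(nn.Module):                  # behavior-to-behavior conversion
    def __init__(self, d=16):
        super().__init__()
        self.m = nn.Linear(d, d)
    def score(self, pref, u, i):
        return (self.m(pref.u(u)) * pref.i(i)).sum(-1)

pref, trans = Pref(100, 50), Transfer()
opt1 = torch.optim.Adam(pref.parameters(), lr=0.01)
opt2 = torch.optim.Adam(trans.parameters(), lr=0.01)
u = torch.randint(0, 100, (32,)); i = torch.randint(0, 50, (32,)); y = torch.rand(32)

# Stage 1: learn preferences on all behaviors, transfer module frozen
trans.requires_grad_(False); pref.requires_grad_(True)
opt1.zero_grad(); nn.functional.mse_loss(pref.score(u, i), y).backward(); opt1.step()

# Stage 2: freeze preferences, learn the cross-behavior relationship
pref.requires_grad_(False); trans.requires_grad_(True)
opt2.zero_grad(); nn.functional.mse_loss(trans.score(pref, u, i), y).backward(); opt2.step()
```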
2024, 35(5):2466-2484. DOI: 10.13328/j.cnki.jos.006899
Abstract:Knowledge space theory uses mathematical language to evaluate learners’ knowledge and guide their learning, and it belongs to the field of mathematical psychology. Skills and problems are the two basic elements of a knowledge space, and in-depth study of the relationships between them is an inherent requirement of knowledge state description and knowledge structure analysis. Existing knowledge space theory has not established an explicit bidirectional mapping between skills and problems, which makes it difficult to put forward a knowledge structure analysis model with intuitive conceptual meaning. Moreover, the partial order relation between knowledge states has not been obtained explicitly, which is not conducive to depicting the differences between knowledge states or planning learners’ learning paths. In addition, existing results mainly focus on the classical knowledge space and do not consider the uncertainty of data in practical problems. To this end, this study introduces formal concept analysis and fuzzy sets into knowledge space theory and builds fuzzy concept lattice models for knowledge structure analysis. Specifically, fuzzy concept lattice models of the knowledge space and of the closure space are presented. First, the fuzzy concept lattice of the knowledge space is constructed, and it is proved that the extents of all concepts form a knowledge space under the upper bounds of any two concepts. The idea of granule description is introduced to define the skill-induced atomic granules of problems, whose combinations determine whether a combination of problems is a state in the knowledge space; on this basis, a method for obtaining the fuzzy concepts of the knowledge space from problem combinations is proposed. Second, the fuzzy concept lattice of the closure space is established, and it is proved that the extents of all concepts form a closure space under the lower bounds of any two concepts. Similarly, the problem-induced atomic granules of skills are defined, and their combinations determine whether a skill combination is the set of skills required by some knowledge state in the closure space; in this way, a method for obtaining the fuzzy concepts of the closure space from skill combinations is presented. Finally, the effects of the number of problems, the number of skills, the filling factor, and the analysis scale on the sizes of the knowledge space and the closure space are analyzed experimentally. The results show that the fuzzy concepts of the knowledge space differ from all existing kinds of concepts and cannot be derived from them, while the fuzzy concepts of the closure space are essentially attribute-oriented one-sided fuzzy concepts. In the formal context of two-valued skills, the states of the knowledge space and those of the closure space correspond one to one, but this relationship does not hold in the formal context of fuzzy skills.
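The crisp (two-valued) special case can be made concrete with a toy skill map: the disjunctive reading yields a union-closed knowledge space, and the conjunctive reading yields an intersection-closed closure space. This classical construction is background for the fuzzy extension, not the paper's model.

```python
from itertools import combinations

skills = {'s1', 's2', 's3'}
# toy skill map: the skills relevant to each problem
skill_map = {'p1': {'s1'}, 'p2': {'s1', 's2'}, 'p3': {'s2', 's3'}, 'p4': {'s3'}}

def disj_state(T):
    """Disjunctive model: one relevant mastered skill suffices to solve p."""
    return frozenset(p for p, s in skill_map.items() if T & s)

def conj_state(T):
    """Conjunctive model: all relevant skills are required to solve p."""
    return frozenset(p for p, s in skill_map.items() if s <= T)

subsets = [set(c) for r in range(len(skills) + 1) for c in combinations(skills, r)]
space = {disj_state(T) for T in subsets}      # knowledge space
closure = {conj_state(T) for T in subsets}    # closure space

# knowledge space: closed under union; closure space: under intersection
assert all(a | b in space for a in space for b in space)
assert all(a & b in closure for a in closure for b in closure)
print(len(space), len(closure))
```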
HOU Kai-Xiang, QIU Tie, XU Tian-Yi, ZHOU Xiao-Bo, CHI Jian-Cheng
2024, 35(5):2485-2502. DOI: 10.13328/j.cnki.jos.006892
Abstract:Committee consensus and hybrid consensus elect a committee to validate blocks in place of all nodes, which can effectively speed up consensus and improve throughput. However, malicious attacks and bribery can easily corrupt the committee, affect consensus results, and even paralyze the system. Although existing work proposes reputation mechanisms to reduce the possibility of committee corruption, they incur high overhead, offer poor reliability, and cannot reduce the impact of corruption on the system. Therefore, this study proposes DBCP, a dynamic blockchain consensus with pre-validation. DBCP realizes reliable reputation evaluation of the committee through pre-validation at little overhead, which can eliminate malicious nodes from the committee in time. If serious corruption has already undermined the consensus result, DBCP transfers the authority of block validation to all nodes through dynamic consensus and ejects the committee nodes that gave wrong suggestions, thereby avoiding system paralysis. When the committee iterates to a high-credibility state, DBCP hands the authority of block validation back to the committee, and the other nodes accept the committee’s consensus result without verifying the blocks, which speeds up consensus. The experimental results show that the throughput of DBCP is two orders of magnitude higher than that of Bitcoin and similar to that of ByzCoin, and that DBCP can handle committee corruption quickly, within one block cycle, demonstrating better security than ByzCoin.
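A toy simulation of the reputation idea (all parameters and update rules are invented for illustration): committee votes are checked against the pre-validation result, and nodes whose reputation falls below a threshold are ejected.

```python
import random

def consensus_round(committee, reputation, tx_valid, threshold=0.3):
    """Compare each committee vote against the pre-validated ground truth,
    adjust reputations, and eject nodes that drop below the threshold."""
    votes = {n: (tx_valid if random.random() > p_mal else not tx_valid)
             for n, p_mal in committee.items()}
    for n, vote in votes.items():
        reputation[n] += 0.1 if vote == tx_valid else -0.3   # pre-validation check
    for n in [n for n in committee if reputation[n] < threshold]:
        committee.pop(n)                                     # eliminate in time

random.seed(0)
committee = {'honest1': 0.0, 'honest2': 0.0, 'mal1': 0.9}    # P(vote wrongly)
reputation = {n: 1.0 for n in committee}
for _ in range(10):
    consensus_round(committee, reputation, tx_valid=True)
print(list(committee), reputation)   # the malicious node is ejected quickly
```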
FENG Xue-Wei, XU Ke, LI Qi, YANG Yu-Xiang, ZHU Min, FU Song-Tao
2024, 35(5):2503-2521. DOI: 10.13328/j.cnki.jos.006941
Abstract:The transport layer is a key component of the network protocol stack, responsible for providing end-to-end services for applications on different hosts. Existing transport layer protocols such as TCP provide users with some basic security mechanisms, e.g., error control and acknowledgments, which to a certain extent ensure the consistency of the datagrams sent and received by applications on different hosts. However, these transport layer security mechanisms have serious flaws. For example, the sequence numbers of TCP datagrams are easy to guess and infer, and the datagram checksum depends on the vulnerable one’s complement sum algorithm. As a result, existing transport layer security mechanisms cannot guarantee the integrity and security of datagrams, which allows a remote attacker to craft fake datagrams and inject them into the target network stream, thus poisoning it. Attacks against the transport layer occur at the lower layers of the network protocol stack and can bypass the security mechanisms enforced at the upper application layer, causing serious damage to the network infrastructure. After investigating various attacks on network protocols and the related security vulnerabilities of recent years, this study proposes LightCTL, a method for enhancing transport layer security based on lightweight chain verification. Based on hash verification, LightCTL enables both ends of a TCP connection to build a mutually verifiable consensus on transport layer datagrams, so as to prevent attackers or middlemen from stealing and forging sensitive information. As a result, LightCTL can successfully foil various attacks against the network protocol stack, including TCP connection reset attacks based on sequence number inference, TCP hijacking attacks, SYN flooding attacks, man-in-the-middle attacks, and datagram replay attacks. Moreover, LightCTL does not require modifying the protocol stacks of intermediate network devices such as routers; only the checksum-related parts of the end systems’ protocol stacks need to be modified. Therefore, LightCTL can be easily deployed and significantly improves the security of network systems.
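A minimal sketch of chained hash verification in the spirit of LightCTL; the exact tag construction, key negotiation, and field layout here are assumptions.

```python
import hashlib

def tag(payload: bytes, prev_digest: bytes, key: bytes) -> bytes:
    """Per-datagram verification tag chained to the previous datagram, so a
    forged or injected segment breaks the chain (keyed hash; illustrative)."""
    return hashlib.sha256(key + prev_digest + payload).digest()

key = b'shared-secret-from-handshake'        # assumed: negotiated per connection
seed = hashlib.sha256(b'connection-id').digest()

# sender side: attach a chained tag to each segment
segments = [b'GET /a', b'GET /b', b'GET /c']
prev, sent = seed, []
for payload in segments:
    t = tag(payload, prev, key)
    sent.append((payload, t))
    prev = t

# receiver side: recompute the chain; any tampering is detected
prev, ok = seed, True
for payload, t in sent:
    expect = tag(payload, prev, key)
    ok = ok and (expect == t)
    prev = expect
print(ok)   # True; flipping any byte of any payload makes it False
```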
WANG Yu-Fu, WANG Xing-Wei, YI Bo, HUANG Min
2024, 35(5):2522-2542. DOI: 10.13328/j.cnki.jos.006988
Abstract:To counter the growing threat of distributed denial-of-service (DDoS) attacks amid the rapid popularization of IPv6, this study proposes a two-stage DDoS defense mechanism, comprising a pre-detection stage that monitors the early signs of DDoS attacks in real time and a deep-detection stage that accurately filters DDoS traffic after an alarm. First, the IPv6 traffic format is analyzed, and the hexadecimal header fields are extracted from PCAP capture files as detection elements. In the pre-detection stage, a lightweight binary convolutional neural network (BCNN) model is introduced, and a two-dimensional traffic matrix is designed as the model input; the matrix sensitively reflects the malicious situation caused by mixed DDoS traffic in the network and serves as evidence of DDoS occurrence. After an alarm, the deep-detection stage intervenes with a one-dimensional convolutional neural network (1DCNN) model, which takes one-dimensional packet vectors as input, specifically distinguishes the mixed DDoS packets, and issues blocking policies. In the experiments, an IPv6-LAN topology is built, and pure IPv6 DDoS traffic is generated by replaying the CIC-DDoS2019 public dataset through NAT 4to6. The results show that the proposed mechanism can effectively improve response speed, detection accuracy, and traffic filtering efficiency in DDoS defense: when DDoS traffic accounts for only 6% and 10% of the total network traffic, BCNN perceives the occurrence of DDoS with 90.9% and 96.4% accuracy, respectively, while the 1DCNN model distinguishes mixed DDoS packets with 99.4% accuracy.
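A sketch of how the two-dimensional traffic matrix input might be formed from raw header bytes; the 20x40 shape and normalization are assumptions, not the paper's exact configuration.

```python
import numpy as np

def traffic_matrix(packets, n_pkts=20, n_bytes=40):
    """Stack the first `n_bytes` of each packet header (hex fields) into a
    (n_pkts, n_bytes) matrix, one packet per row, zero-padded."""
    m = np.zeros((n_pkts, n_bytes), dtype=np.float32)
    for i, pkt in enumerate(packets[:n_pkts]):
        row = np.frombuffer(pkt[:n_bytes], dtype=np.uint8)
        m[i, :len(row)] = row / 255.0        # normalize byte values
    return m

# toy raw IPv6 headers (would come from a PCAP capture in practice)
pkts = [bytes([0x60, 0, 0, 0]) + np.random.default_rng(i).bytes(44)
        for i in range(25)]
print(traffic_matrix(pkts).shape)            # (20, 40)
```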
2024, 35(5):2543-2565. DOI: 10.13328/j.cnki.jos.006893
Abstract:Deep neural networks (DNNs) have made remarkable achievements in many fields, but related studies show that they are vulnerable to adversarial examples. The gradient-based attack is a popular adversarial attack that has attracted wide attention. This study investigates the relationship between gradient-based adversarial attacks and numerical methods for solving ordinary differential equations (ODEs) and proposes a new adversarial attack based on the Runge-Kutta (RK) method, a numerical method for solving ODEs. Following the prediction idea of the RK method, perturbations are first added to the original examples to construct predicted examples, and the gradients of the loss function with respect to the original and predicted examples are then linearly combined to determine the perturbation added to generate adversarial examples. Unlike existing adversarial attacks, the proposed attack employs the prediction idea of the RK method to obtain future gradient information (i.e., the gradient of the loss function with respect to the predicted examples) and uses it to determine the adversarial perturbations. The proposed attack is highly extensible and can easily be applied to all available gradient-based attacks. Extensive experiments demonstrate that, in contrast to state-of-the-art gradient-based attacks, the proposed RK-based attack boasts higher success rates and better transferability.
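The prediction idea can be sketched as a Heun-like two-stage scheme on top of FGSM (a plausible instantiation, not necessarily the paper's exact update):

```python
import torch
import torch.nn.functional as F

def rk2_fgsm(model, x, y, eps=8/255, alpha=8/255):
    """Combine the gradient at x with the gradient at a predicted example
    x_pred, mirroring a second-order Runge-Kutta (Heun) step."""
    x = x.clone().detach().requires_grad_(True)
    g1 = torch.autograd.grad(F.cross_entropy(model(x), y), x)[0]
    x_pred = (x + alpha * g1.sign()).detach().requires_grad_(True)   # predictor
    g2 = torch.autograd.grad(F.cross_entropy(model(x_pred), y), x_pred)[0]
    g = 0.5 * (g1 + g2)                      # linear combination of gradients
    return (x + eps * g.sign()).clamp(0, 1).detach()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x, y = torch.rand(4, 1, 28, 28), torch.randint(0, 10, (4,))
x_adv = rk2_fgsm(model, x, y)
print((x_adv - x).abs().max())               # perturbation bounded by eps
```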
2024, 35(5):2566-2582. DOI: 10.13328/j.cnki.jos.006912
Abstract:The adaptor signature, also known as a scriptless script, is an important cryptographic technique that can be used to mitigate the poor scalability and low transaction throughput of blockchain applications such as cryptocurrencies. An adaptor signature can be seen as an extension of a digital signature over a hard relation: it ties authorization to witness extraction and has many advantages in blockchain applications, such as (1) low on-chain cost, (2) improved fungibility of transactions, and (3) advanced functionality beyond the limitations of the blockchain’s scripting language. The SM2 signature is the Chinese national standard signature algorithm and has been widely used in various important information systems. This work designs an efficient SM2-based adaptor signature with batch proofs and gives security proofs in the random oracle model. Exploiting the structure of the SM2 signature, the scheme avoids generating the zero-knowledge proofs used in the pre-signing phase and is thus more efficient than existing ECDSA/SM2-based adaptor signatures: pre-signature generation is 4 times more efficient, and pre-signature verification is 3 times more efficient. Then, based on the distributed SM2 signature, this work develops a distributed SM2-based adaptor signature, which avoids single points of failure and improves the security of the signing key. Finally, for real-world applications, this work gives a secure and efficient batch atomic swap protocol for one-to-many scenarios based on the SM2-based adaptor signature.
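To make the pre-sign/adapt/extract workflow concrete, here is a toy Schnorr-style adaptor signature over a small prime-order group; it illustrates the primitive's interface only and is NOT the SM2-based construction (the tiny parameters are, of course, insecure).

```python
import hashlib

# toy group: order-q subgroup of Z_p*, with p = 2q + 1 and generator g = 4
p, q, g = 2039, 1019, 4

def H(*parts):
    data = '|'.join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), 'big') % q

x, k, t = 123, 456, 789                      # signing key, nonce, witness
X, R, T = pow(g, x, p), pow(g, k, p), pow(g, t, p)   # T = g^t is the hard statement

# pre-sign: the challenge binds the adapted nonce R*T
c = H(R * T % p, 'msg')
s_pre = (k + c * x) % q
assert pow(g, s_pre, p) == R * pow(X, c, p) % p        # pre-verification

# adapt: whoever knows the witness t completes the signature
s = (s_pre + t) % q
assert pow(g, s, p) == (R * T % p) * pow(X, c, p) % p  # full verification

# extract: seeing s and s_pre reveals the witness (the atomic swap core)
assert (s - s_pre) % q == t
print('ok')
```

This authorize-then-extract loop is what the one-to-many batch atomic swap builds on: publishing the completed signature necessarily leaks the witness to the counterparty.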