ZHANG Qi, CHENG Miao-Miao, LI Rong-Hua, WANG Guo-Ren
2024, 35(3):1051-1073. DOI: 10.13328/j.cnki.jos.007071
Abstract:Real-world networks often exhibit community structures, and community query is a fundamental task in graph data mining. Existing studies have introduced various models to identify communities within networks, such as k-core based models and k-truss based models. Nevertheless, these models typically confine themselves to constraining the number of neighbors of nodes or edges within a community, disregarding the relationships between these neighbors, namely, the neighborhood structure of the nodes. Consequently, the localized density of nodes within communities tends to be low. To address this limitation, this study integrates information about the neighborhood structure of nodes into the k-core dense subgraph model, thereby introducing a community model based on neighborhood k-core and defining the density of a community. Based on the novel model, this study investigates the densest single community query problem, which outputs the community containing the query node set with the highest community density. In real-life networks, the query nodes may be distributed across multiple disjoint communities. To this end, this study further studies the problem of multi-community query based on a density threshold, which returns multiple communities that encompass the query node set, with each community exhibiting a density no lower than the user-specified threshold. For both query problems, this study introduces the concept of edge density, based on which the basic algorithms are proposed. To improve efficiency, the index tree and the enhanced index tree structures are devised to support outputting results in polynomial time. The effectiveness of the community model based on neighborhood k-core and the efficiency of the query algorithms are demonstrated through comparative analyses against the basic algorithms on several different datasets.
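The peeling intuition behind such neighborhood-aware core models can be illustrated with a small sketch. The paper's concrete neighborhood condition and density definition are not reproduced here; as a hypothetical stand-in, a node survives only if it keeps at least k surviving neighbors and those neighbors induce at least t edges among themselves:

```python
# Minimal sketch of iterative peeling for a neighborhood-aware k-core.
# The survival condition (>= k neighbors AND >= t induced edges among
# them) is a hypothetical stand-in for the paper's neighborhood model.
def neighborhood_k_core(adj, k, t):
    """adj: dict mapping node -> set of neighbors (undirected graph)."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            nbrs = adj[v] & alive
            # edges among v's surviving neighbors, each counted once
            induced = sum(1 for u in nbrs for w in adj[u] & nbrs if u < w)
            if len(nbrs) < k or induced < t:
                alive.discard(v)
                changed = True
    return alive

# toy graph: two triangles sharing node 3, plus an isolated node 6
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}, 6: set()}
print(neighborhood_k_core(adj, k=2, t=1))   # {1, 2, 3, 4, 5}
```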
YIN Zhan-Zuo, LI Bo-Han, WANG Meng, HUANG Rui-Long, WU Wen-Long, WANG Hao-Fen
2024, 35(3):1074-1089. DOI: 10.13328/j.cnki.jos.007076
Abstract:Due to the exponential growth of multimodal data, traditional databases are confronted with challenges in terms of storage and retrieval. Multimodal hashing is able to effectively reduce the storage cost of databases and improve retrieval efficiency by fusing multimodal features and mapping them into binary hash codes. Although many works on multimodal hashing perform well, three important problems remain to be solved: (1) Existing methods tend to assume that all samples are modality-complete, while in practical retrieval scenarios, it is common for samples to miss partial modalities; (2) Most methods are based on shallow learning models, which inevitably limits the models’ learning ability and affects the final retrieval performance; (3) Some methods based on deep learning frameworks have been proposed to address the issue of weak learning ability, but they directly use coarse-grained feature fusion methods, such as concatenation, after extracting features from different modalities, which fails to effectively capture deep semantic information, thereby weakening the representation ability of hash codes and affecting the final retrieval performance. In response to the above problems, the PMH-F3 model is proposed. This model implements partial multimodal hashing for the case of samples missing partial modalities. The model is based on a deep network architecture, and the Transformer encoder is used to capture deep semantics in a self-attention manner, achieving fine-grained multimodal feature fusion. Extensive experiments are conducted on the MIR Flickr and MS COCO datasets, where the best retrieval performance is achieved. The experimental results show that the PMH-F3 model can effectively implement partial multimodal hashing and can be applied to large-scale multimodal data retrieval.
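The fusion-then-hash pipeline can be sketched as follows; the layer sizes, the learnable placeholder token for a missing modality, and the tanh relaxation of the sign function are illustrative assumptions, not the PMH-F3 specification:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of Transformer-based fine-grained fusion into hash
# codes, with a learnable placeholder standing in for a missing modality.
class FusionHasher(nn.Module):
    def __init__(self, dim=256, n_bits=64, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.missing_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, n_bits)

    def forward(self, img_feat=None, txt_feat=None):
        # each feature: (batch, seq, dim); substitute placeholder if absent
        b = (img_feat if img_feat is not None else txt_feat).size(0)
        parts = [f if f is not None else self.missing_token.expand(b, 1, -1)
                 for f in (img_feat, txt_feat)]
        fused = self.encoder(torch.cat(parts, dim=1)).mean(dim=1)
        return torch.tanh(self.head(fused))   # relaxed sign() for training

model = FusionHasher()
codes = model(img_feat=torch.randn(8, 49, 256))   # text modality missing
binary = torch.sign(codes)                        # inference-time hash codes
```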
CHEN Zi-Jun, MA De-Long, WANG Yi-Shu, YUAN Ye
2024, 35(3):1090-1106. DOI: 10.13328/j.cnki.jos.007072
Abstract:Personalized PageRank, as a basic algorithm in large graph analysis, has a wide range of applications in search engines, social recommendation, community detection, and other fields, and has been a hot problem of interest to researchers. The existing distributed personalized PageRank algorithms assume that all data are located in the same geographic location and that the network environment is the same among the computing nodes where the data are located. However, in the real world, these data may be distributed in multiple data centers across continents, and these geo-distributed data centers are connected to each other through WANs, which are characterized by heterogeneous network bandwidth, huge hardware differences, and high communication costs. Distributed personalized PageRank algorithms require multiple iterations and random walks on the global graph, so the existing algorithms are not applicable to the cross-geo-distributed environment. To address this problem, the GPPR (cross-geo-distributed personalized PageRank) algorithm is proposed in this study. The algorithm first preprocesses the big graph data in the cross-geo-distributed environment and maps the graph data by using a heuristic algorithm to reduce the impact of network bandwidth heterogeneity on the iteration speed of the algorithm. Secondly, GPPR improves the random walk approach and proposes a probability-based push algorithm, which further reduces the number of iterations required by the algorithm by reducing the bandwidth load of transmitting data between worker nodes. The GPPR algorithm is implemented based on the Spark framework, and a real cross-geo-distributed environment is built in AliCloud to conduct experiments on eight open-source big graph datasets, comparing GPPR with several existing representative distributed personalized PageRank algorithms. The results show that the communication data volume of GPPR is reduced by 30% on average in the cross-geo-distributed environment compared with other algorithms. In terms of running efficiency, GPPR outperforms the other algorithms by an average factor of 2.5.
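Push-style personalized PageRank methods such as GPPR build on the classic forward-push local approximation (Andersen et al.); below is a minimal sketch of that baseline, not of GPPR's probability-based variant:

```python
from collections import defaultdict

# Classic forward-push approximation of single-source personalized
# PageRank. alpha: teleport probability; eps: per-degree residue threshold.
def forward_push(adj, source, alpha=0.15, eps=1e-4):
    reserve = defaultdict(float)                 # estimated PPR mass
    residue = defaultdict(float, {source: 1.0})  # mass still to be pushed
    frontier = [source]
    while frontier:
        v = frontier.pop()
        r = residue[v]
        if r < eps * max(len(adj[v]), 1):
            continue
        if not adj[v]:                           # dangling node keeps its mass
            reserve[v] += r
            residue[v] = 0.0
            continue
        residue[v] = 0.0
        reserve[v] += alpha * r
        share = (1 - alpha) * r / len(adj[v])
        for u in adj[v]:
            old = residue[u]
            residue[u] += share
            if old < eps * max(len(adj[u]), 1) <= residue[u]:
                frontier.append(u)               # u newly crossed threshold
    return dict(reserve)

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(forward_push(adj, source=0))
```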
ZHANG Tian-Ming, ZHANG Shan, LIU Xi, CAO Bin, FAN Jing
2024, 35(3):1107-1124. DOI: 10.13328/j.cnki.jos.007069
Abstract:As a crucial subtask in natural language processing (NLP), named entity recognition (NER) aims to extract important information from text, which can help many downstream tasks such as machine translation, text generation, knowledge graph construction, and multimodal data fusion to deeply understand the complex semantic information of the text and effectively complete these tasks. In practice, due to time and labor costs, NER suffers from annotated data scarcity, known as few-shot NER. Although few-shot NER methods based on text have achieved sound generalization performance, the semantic information that the model can extract is still limited due to the few samples, which leads to the poor prediction effect of the model. To this end, this study proposes a few-shot NER model based on multimodal data fusion, which, for the first time, provides additional semantic information with multimodal data to help the model prediction and can further improve the effect of multimodal data fusion and modeling. This method converts image information into text information as auxiliary modality information, which effectively solves the problem of poor modality alignment caused by the inconsistent granularity of semantic information contained in text and images. In order to effectively model the label dependencies in few-shot NER, this study uses the CRF framework and introduces state-of-the-art meta-learning methods as the emission module and the transition module, respectively. To alleviate the negative impact of noisy samples among the auxiliary modality samples, this study proposes a general denoising network based on the idea of meta-learning. The denoising network can measure the variability of the samples and evaluate the beneficial extent of each sample to the model. Finally, this study conducts extensive experiments on real unimodal and multimodal datasets. The experimental results show the outstanding generalization performance of the proposed method, which outperforms the state-of-the-art methods by 10 F1 points in the 1-shot setting.
TANG Xiu, WU Sai, HOU Jie, CHEN Gang
2024, 35(3):1125-1139. DOI: 10.13328/j.cnki.jos.007073
Abstract:Training multimodal models in deep learning often requires a large amount of high-quality annotated data from diverse modalities such as images, text, and audio. However, acquiring such data in large quantities can be challenging and costly. Active learning has emerged as a powerful paradigm to address this issue by selectively annotating the most informative samples, thereby reducing annotation costs and improving model performance. However, existing active learning methods encounter limitations in terms of inefficient data scanning and costly maintenance when dealing with large-scale updates. To overcome these challenges, this study proposes a novel approach called So-CBI (semi-ordered class boundary index) that efficiently retrieves samples for multimodal model training. So-CBI incorporates inter-class boundary perception and a semi-ordered indexing structure to minimize maintenance costs and enhance retrieval efficiency. Experimental evaluations on various datasets demonstrate the effectiveness of So-CBI in the context of active learning.
ZHAO Hai-Quan, WANG Xu-Wu, LI Jin-Liang, LI Zhi-Xu, XIAO Yang-Hua
2024, 35(3):1140-1153. DOI: 10.13328/j.cnki.jos.007078
Abstract:With the rapid development of the Internet and big data, the scale and variety of data are increasing. Video, as an important form of information, is becoming increasingly prevalent, particularly with the recent growth of short videos. Understanding and analyzing large-scale videos has become a hot topic of research. Entity linking, as a way of enriching background knowledge, can provide a wealth of external information. Entity linking in videos can effectively assist in understanding the content of video, enabling classification, retrieval, and recommendation of video content. However, the granularity of existing video linking datasets and methods is too coarse. Therefore, this study proposes a video-based fine-grained entity linking approach, focusing on live streaming scenarios, and constructs a fine-grained video entity linking dataset. Additionally, based on the challenges of fine-grained video linking tasks, this study proposes the use of large models to extract entities and their attributes from videos, as well as utilizing contrastive learning to obtain better representations of videos and their corresponding entities. The results demonstrate that the proposed method can effectively handle fine-grained entity linking tasks in videos.
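The contrastive objective for aligning video and entity representations can be sketched with a standard symmetric InfoNCE loss; this is a generic formulation, not necessarily the paper's exact loss:

```python
import torch
import torch.nn.functional as F

# Symmetric InfoNCE: each video clip embedding is pulled toward its linked
# entity embedding (row i of both batches is a matching pair) and pushed
# away from all other entities in the batch, and vice versa.
def contrastive_loss(video_emb, entity_emb, temperature=0.07):
    v = F.normalize(video_emb, dim=-1)       # (batch, dim)
    e = F.normalize(entity_emb, dim=-1)      # (batch, dim)
    logits = v @ e.t() / temperature         # (batch, batch) similarities
    labels = torch.arange(v.size(0), device=v.device)
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

loss = contrastive_loss(torch.randn(16, 512), torch.randn(16, 512))
```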
CUI Shuang-Shuang, WU Xian, WANG Hong-Zhi, WU Hao
2024, 35(3):1154-1172. DOI: 10.13328/j.cnki.jos.007070
Abstract:In the cloud-edge-device collaborative architecture, data types are diverse, and there are differences in storage resources and computing resources at all levels, which brings new challenges to data management. Existing data models, or simple superpositions of data models, can hardly meet the requirements of multimodal data management and collaborative management across the cloud, edge, and device. Therefore, research on multimodal data modeling technology for cloud-edge-device collaboration has become an important issue, the core of which is how to efficiently obtain query results that meet the needs of the application from the three-tier cloud-edge-device architecture. Starting from the data types of the three layers, this study proposes a multimodal data modeling technology for cloud-edge-device collaboration, gives the definition of a multimodal data model based on tuples, and designs six base classes to solve the problem of unified representation of multimodal data. The basic data operation architecture of cloud-edge-device collaborative query is also proposed to meet the query requirements of cloud-edge-device business scenarios. The integrity constraints of the multimodal data model are given, which lays a theoretical foundation for query optimization. Finally, a demonstration application of the multimodal data model for cloud-edge-device collaboration is given, and the proposed data model storage method is verified in terms of data storage time, storage space, and query time. The experimental results show that the proposed scheme can effectively represent the multimodal data in the cloud-edge-device collaborative architecture.
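A toy sketch of what a tuple-based unified representation could look like; the abstract does not name the six base classes, so the classes and fields below are hypothetical stand-ins, not the paper's model:

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Illustrative only: hypothetical base and derived classes for uniformly
# representing multimodal objects across cloud, edge, and device tiers.
@dataclass
class ModalObject:
    oid: str                                  # global identifier
    modality: str                             # e.g. "relational", "image"
    location: str                             # "cloud", "edge", or "device"
    payload: Any                              # raw value or storage reference
    meta: Dict[str, Any] = field(default_factory=dict)

@dataclass
class ImageObject(ModalObject):
    width: int = 0
    height: int = 0

img = ImageObject(oid="img-001", modality="image", location="edge",
                  payload="s3://bucket/img-001.jpg", width=1920, height=1080)
```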
HE Wen-Di, XIA Tian-Rui, SONG Shao-Xu, HUANG Xiang-Dong, WANG Jian-Min
2024, 35(3):1173-1193. DOI: 10.13328/j.cnki.jos.007077
Abstract:Time-series data are widely used in industrial manufacturing, meteorology, ships, electric power, vehicles, finance, and other fields, which promotes the booming development of time-series database management systems. Faced with larger data scales and more diverse data modalities, efficiently storing and managing such data is critical, and data encoding and compression are becoming increasingly important and worth studying. Existing data encoding methods and systems fail to thoroughly consider the characteristics of data in different modalities, and some methods of time-series data analysis have not been applied to the scenario of data encoding. This study comprehensively introduces the multimodal data encoding methods and their system implementation in the Apache IoTDB time-series database system, especially for industrial Internet of Things application scenarios. The proposed encoding methods comprehensively consider data in multiple modalities, including timestamp data, numerical data, Boolean data, frequency domain data, and text data, and the characteristics of each modality are fully explored and utilized, especially the near-regularity of timestamp intervals in the timestamp modality, to carry out targeted data encoding design. At the same time, the data quality issues that may occur in practical applications are taken into consideration in the encoding algorithms. Experimental evaluation and analysis at both the encoding algorithm level and the system level over multiple datasets validate the effectiveness of the proposed encoding methods and their system implementation.
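Near-regular sampling makes second-order timestamp differences cluster around zero, which is why interval-aware encoding compresses well. A simplified delta-of-delta sketch of that idea (not Apache IoTDB's exact on-disk format):

```python
# Delta-of-delta encoding: for near-regular timestamps the second-order
# differences are mostly 0, so they compress far better than raw values.
def delta_of_delta_encode(timestamps):
    if not timestamps:
        return []
    out = [timestamps[0]]                     # first value stored verbatim
    prev, prev_delta = timestamps[0], 0
    for t in timestamps[1:]:
        delta = t - prev
        out.append(delta - prev_delta)        # usually 0 for regular series
        prev, prev_delta = t, delta
    return out

def delta_of_delta_decode(encoded):
    if not encoded:
        return []
    ts, delta = [encoded[0]], 0
    for dd in encoded[1:]:
        delta += dd
        ts.append(ts[-1] + delta)
    return ts

raw = [1000, 2000, 3000, 4001, 5001]          # one jittered interval
enc = delta_of_delta_encode(raw)              # [1000, 1000, 0, 1, -1]
assert delta_of_delta_decode(enc) == raw
```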
XIE Yu-Peng, LUO Yu-Yu, FENG Jian-Hua
2024, 35(3):1194-1206. DOI: 10.13328/j.cnki.jos.007074
Abstract:With the advent of the big data era, the significance of data analysis has increasingly come to the forefront, showcasing its ability to uncover valuable insights from vast datasets, thereby enhancing the decision-making process for users. Nonetheless, the data analysis workflow faces three dominant challenges: high coupling in the analysis workflow, a plethora of interactive interfaces, and a time-intensive exploratory analysis process. To address these challenges, this study introduces Navi, a data analysis system powered by natural language interaction. Navi embraces a modular design philosophy that abstracts three core functional modules from mainstream data analysis workflows: data querying, visualization generation, and visualization exploration. This approach effectively reduces the coupling of the system. Meanwhile, Navi leverages natural language as a unified interactive interface to seamlessly integrate various functional modules through a task scheduler, ensuring their effective collaboration. Moreover, in order to address the challenges of exponential search space and ambiguous user intent in visualization exploration, this study proposes an automated approach for visualization exploration based on Monte Carlo tree search. In addition, a pruning algorithm and a composite reward function, both incorporating visualization domain knowledge, are devised to enhance the search efficiency and result quality. Finally, this study validates the effectiveness of Navi through both quantitative experiments and user studies.
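The Monte Carlo tree search loop for automated visualization exploration can be sketched with a generic UCT skeleton; the paper's domain-specific pruning rule and composite reward are abstracted behind the `children_of` and `reward_of` callbacks:

```python
import math, random

# Generic UCT skeleton: selection -> expansion -> scoring -> backpropagation.
class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct_select(node, c=1.4):
    # balance exploitation (mean value) and exploration (visit counts)
    return max(node.children, key=lambda ch: ch.value / (ch.visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))

def search(root, children_of, reward_of, iterations=1000):
    for _ in range(iterations):
        node = root
        while node.children:                      # 1. selection
            node = uct_select(node)
        for s in children_of(node.state):         # 2. expansion (pruned)
            node.children.append(Node(s, parent=node))
        leaf = random.choice(node.children) if node.children else node
        reward = reward_of(leaf.state)            # 3. composite scoring
        while leaf:                               # 4. backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda ch: ch.visits).state
```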
DING Guang-Yao, XU Chen, QIAN Wei-Ning, ZHOU Ao-Ying
2024, 35(3):1207-1230. DOI: 10.13328/j.cnki.jos.007075
Abstract:Computer vision has been widely used in various real-world scenarios due to its powerful learning ability. With the development of databases, there is a growing trend in research to exploit mature data management techniques in databases for vision analytics applications. The integration and processing of multimodal data, including images, videos, and text, promotes diversity and improves accuracy in vision analytics applications. In recent years, due to the popularization of deep learning, there has been growing interest in vision analytics applications that support deep learning. Nevertheless, traditional database management techniques in deep learning scenarios suffer from issues such as the lack of semantics for vision analytics and inefficiency in application execution. Hence, vision database management systems that support deep learning have been widely studied. This study reviews the progress of vision database management systems. First, this study summarizes the challenges faced by vision database management systems in different dimensions, including programming interface, query optimization, execution scheduling, and data storage. Second, this study discusses the technologies in each of these four dimensions. Finally, the study investigates future research directions of vision database management systems.
LIANG Zhen, LIU Wan-Wei, WU Tao-Ran, XUE Bai, WANG Ji, YANG Wen-Jing
2024, 35(3):1231-1256. DOI: 10.13328/j.cnki.jos.007061
Abstract:With the development of the intelligent information era, applications of deep neural networks in various fields of human society, especially deployments in safety-critical systems such as autonomous driving and military defense, have aroused concern from the academic and industrial communities about the erroneous behaviors that deep neural networks may exhibit. Although neural network verification and neural network testing can provide qualitative or quantitative conclusions about erroneous behaviors, such post-hoc analysis cannot prevent their occurrence. How to repair pre-trained neural networks that exhibit erroneous behaviors is still a very challenging problem. To this end, deep neural network repair comes into being, aiming at eliminating the unexpected predictions generated by defective neural networks and making the neural networks meet certain specification properties. So far, there are three typical neural network repair paradigms: retraining, fine-tuning without fault localization, and fine-tuning with fault localization. This study introduces the development of deep neural networks and the necessity of deep neural network repair, clarifies some similar concepts, and identifies the challenges of deep neural network repair. In addition, it investigates the existing neural network repair strategies in detail and compares the internal relationships and differences among these strategies. Moreover, the study explores and sorts out the evaluation metrics and benchmark tests commonly used in neural network repair strategies. Finally, it forecasts feasible research directions that deserve attention in the future development of neural network repair strategies.
ZHENG Wei, LIU Cheng-Yuan, WU Xiao-Xue, CHEN Xiang, CHENG Jing-Yuan, SUN Xiao-Bing, SUN Rui-Yang
2024, 35(3):1257-1279. DOI: 10.13328/j.cnki.jos.006812
Abstract:Security bug reports (SBRs) can describe critical security vulnerabilities in software products. SBR prediction has attracted the increasing attention of researchers to eliminate security attack risks of software products. However, in actual software development scenarios, a new company or new project may need software security bug prediction, without enough marked SBRs for building SBR prediction models in practice. A simple solution is employing the migration model, which means that marked data of other projects can be adopted to build the prediction model. Inspired by two recent studies in this field, this study puts forward a cross-project SBR prediction method integrating knowledge graphs, i.e., knowledge graph of security bug report prediction (KG-SBRP), based on the idea of security keyword filtering. The text information field in SBR is combined with common weakness enumeration (CWE) and common vulnerabilities and exposures (CVE) Details to build a triple rule entity. Then the entity is utilized to build a knowledge graph of security bugs and identify SBRs by combining the entity and relationship recognition. Finally, the data is divided into training sets and test sets for model fitting and performance evaluation. The built model conducts empirical research on seven SBR datasets with different scales. The results show that compared with the current main methods FARSEC and Keyword matrix, the proposed method can increase the performance index F1-score by an average of 11% under cross-project SBR prediction scenarios. In addition, the F1-score value can also grow by an average of 30% in SBR prediction scenarios within a project.
XIONG Jing-Liu, REN Qiu-Rong, Shmuel TYSZBEROWICZ, LIU Zhi-Ming, LIU Bo
2024, 35(3):1280-1306. DOI: 10.13328/j.cnki.jos.006813
Abstract:Migrating from monolithic systems to microservice systems is one of the mainstream options for the industry to reengineer legacy systems, and microservice architecture refactoring based on monolithic legacy systems is the key to realizing such migration. Currently, academia mainly focuses on microservice identification methods, and industry has accumulated many practices of refactoring legacy systems into microservices. However, systematic approaches and efficient, robust tools are still insufficient. Therefore, based on earlier research on microservice identification and model-driven development, this study presents MSA-Lab, an integrated design platform for the microservice refactoring of monolithic legacy systems based on the model-driven development approach. MSA-Lab analyzes the method call sequences in the running logs of the monolithic legacy system, identifies and clusters classes and data tables for constructing abstract microservices, and generates a system architecture design model including the microservice diagram and microservice sequence diagram. The platform has two core components: MSA-Generator for automatic microservice identification and design model generation, and MSA-Modeller for visualization, interactive modeling, and model syntax constraint checking of microservice static structure and dynamic behavior models. This study conducts experiments on effectiveness, robustness, and functional transformation completeness on four open-source projects in the MSA-Lab platform and carries out performance comparison experiments with three tools of the same type. The results show that the platform has excellent effectiveness, robustness, and functional transformation completeness for running logs, as well as superior performance.
2024, 35(3):1307-1320. DOI: 10.13328/j.cnki.jos.006817
Abstract:The ranking function method is the main approach to the termination analysis of loops: the existence of a ranking function shows that a loop program terminates. For single-path loop programs with linear constraints, this study presents a method to analyze the termination of the loops. Based on the calculation of the normal space of the increasing function, this method reduces the computation of ranking functions in the original program space to that in a subspace. Experimental results show that the method can effectively verify the termination of most loop programs in the existing literature.
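A textbook example (not taken from the paper) of what a linear ranking function certifies:

```latex
% For the single-path linear-constraint loop
%   while (x >= 0) { x = x - y; y = y + 1; }
% under the precondition $y \ge 1$, the linear function
\[
  f(x, y) = x
\]
% is a ranking function: it is bounded below on the loop guard
% ($x \ge 0$) and strictly decreases on every iteration, since
% $x' = x - y \le x - 1$; hence the loop terminates.
```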
TIAN Jia-Hao, ZHANG Li, LIAN Xiao-Li, ZHAO Qian-Hui
2024, 35(3):1321-1340. DOI: 10.13328/j.cnki.jos.006820
Abstract:In large-scale and complex software systems, requirement analysis and generation are accomplished through a top-down process, and the construction of tracking relationships between cross-level requirements is very important for project management, development, and evolution. The loosely-coupled contribution approach of open-source systems requires each participant to easily understand the context and state of the requirements, which relies on cross-level requirement tracking. The issue description log is a common way of presenting requirements in open-source systems. It has no fixed template, and its content is diverse (including text, code, and debugging information). Furthermore, terms can be used freely, and the gap in abstraction level between cross-level requirements is large, which brings great challenges to automatic tracking. This study proposes a relevance feedback method based on key feature dimensions. Through static analysis of the project’s code structure, code-related terms and their correlation strength are extracted, and a code vocabulary base is constructed to bridge the gap in abstraction level and the inconsistency of terminology between cross-level requirements. By measuring the importance of terms to the requirement description and screening key feature dimensions on this basis, the query is optimized, which effectively reduces the noise caused by requirement description length, content form, and other factors. Experiments with two scenarios on three open-source systems suggest that the proposed method outperforms baseline approaches in cross-level requirement tracking, improving the F2 value by 29.01%, 7.45%, and 59.21% compared with the vector space model (VSM), standard Rocchio, and trace BERT (bidirectional encoder representations from Transformers), respectively.
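The standard Rocchio update that such relevance feedback methods refine is shown below; the alpha/beta/gamma weights are conventional defaults, not values from the paper:

```python
import numpy as np

# Classic Rocchio query refinement: move the query vector toward the
# centroid of relevant documents and away from non-relevant ones.
def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    q = alpha * query_vec
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q = q - gamma * np.mean(nonrelevant, axis=0)
    return np.maximum(q, 0.0)        # keep term weights non-negative

q = rocchio(np.array([1.0, 0.0, 0.5]),
            relevant=np.array([[0.8, 0.2, 0.0]]),
            nonrelevant=np.array([[0.0, 0.9, 0.1]]))
```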
GUO Na, LIU Cong, LI Cai-Hong, LU Ting, WEN Li-Jie, ZENG Qing-Tian
2024, 35(3):1341-1356. DOI: 10.13328/j.cnki.jos.006824
Abstract:Remaining process time prediction is important for preventing and intervening in abnormal business operations. Existing approaches have achieved high accuracy in remaining time prediction through deep learning techniques. However, most of these techniques involve complex model structures, and the prediction results are difficult to explain, namely, the unexplainability issue. In addition, remaining time prediction usually uses the key attribute, namely activity, or selects several other attributes as the input features of the prediction model according to domain knowledge. However, a general feature selection method is missing, which may affect both prediction accuracy and model explainability. To tackle these two challenges, this study introduces a remaining process time prediction framework based on an explainable feature-based hierarchical (EFH) model. Specifically, a feature self-selection strategy is first proposed, which obtains the attributes that have a positive impact on the prediction task as the input features of the model through priority-based backward feature deletion and importance-based forward feature selection. Then an EFH model is proposed. The prediction results of each layer are obtained by adding different features layer by layer, so as to explain the relationship between input features and prediction results. The study also uses the light gradient boosting machine (LightGBM) and long short-term memory (LSTM) algorithms to implement the proposed approach, while the framework is general and not limited to the algorithms selected in this study. Finally, the proposed approach is compared with other methods on eight real-life event logs. The experimental results show that the proposed approach can select effective features and improve prediction accuracy. In addition, the prediction results are explainable.
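The two-stage feature self-selection idea can be sketched as follows; `fit_and_score` (training a LightGBM or LSTM model and returning a validation score) is left abstract, and the exact priority and importance measures are the paper's, not reproduced here:

```python
# Sketch: backward deletion drops low-priority attributes whose removal
# does not hurt the validation score, then forward selection re-adds
# features in descending importance while the score keeps improving.
def select_features(all_feats, priority, importance, fit_and_score, tol=1e-4):
    kept = list(all_feats)
    for f in sorted(kept, key=lambda x: priority[x]):      # backward pass
        trial = [x for x in kept if x != f]
        if trial and fit_and_score(trial) >= fit_and_score(kept) - tol:
            kept = trial                                    # f is dispensable
    chosen, best = [], float("-inf")
    for f in sorted(kept, key=lambda x: -importance[x]):   # forward pass
        score = fit_and_score(chosen + [f])
        if score > best:
            chosen.append(f)
            best = score
    return chosen
```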
LIN Li, MAO Xin-Ya, CHU Zhen-Xing, XIE Xiao-Yu
2024, 35(3):1357-1376. DOI: 10.13328/j.cnki.jos.006809
Abstract:In a hybrid cloud environment, enterprise business applications and data are often transferred across different cloud services. For complex and diversified cloud service environments, most hybrid cloud applications adopt access control policies made around access subjects only and adjust the policies manually, which cannot meet the fine-grained dynamic access control requirements at different stages of the data life cycle. This study proposes AHCAC, an adaptive access control method oriented to the data life cycle in a hybrid cloud environment. Firstly, the policy description idea based on key attributes is employed to unify the heterogeneous policies of the full life cycle of data under the hybrid cloud. In particular, a “stage” attribute is introduced to explicitly identify the life-cycle state of data, which is the basis for achieving fine-grained access control oriented to the data life cycle. Secondly, in view of the similarity and consistency of access control policies within the same life-cycle stage, the policy distance is defined, and a hierarchical clustering algorithm based on the policy distance is proposed to construct the corresponding data access control policy for each life-cycle stage. Finally, when the life-cycle stage of data changes, the adaptation and loading of the policies of the corresponding data stage in policy evaluation are triggered through key attribute matching, which realizes adaptive access control oriented to the data life cycle. This study also conducts experiments on OpenStack and the open-source policy evaluation engine Balana to verify the effectiveness and feasibility of the proposed method.
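A toy sketch of distance-based policy clustering; the paper defines its own policy distance, so the Jaccard-style distance over key attribute-value pairs below is a hypothetical stand-in:

```python
from itertools import combinations

# Illustrative policy distance: 1 - Jaccard similarity over the policies'
# key attribute-value pairs, followed by simple single-linkage
# agglomeration of policies under a distance threshold.
def policy_distance(p, q):
    a, b = set(p.items()), set(q.items())
    return 1 - len(a & b) / len(a | b)

def cluster_policies(policies, threshold=0.7):
    clusters = [[p] for p in policies]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if any(policy_distance(p, q) <= threshold
                   for p in clusters[i] for q in clusters[j]):
                clusters[i] += clusters.pop(j)   # merge the two clusters
                merged = True
                break
    return clusters

policies = [{"stage": "create", "role": "owner"},
            {"stage": "create", "role": "admin"},
            {"stage": "archive", "role": "auditor"}]
print(cluster_policies(policies))   # the two "create"-stage policies merge
```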
ZHAO Yan-Yan, LU Xin, ZHAO Wei-Xiang, TIAN Yi-Jian, QIN Bing
2024, 35(3):1377-1402. DOI: 10.13328/j.cnki.jos.006807
Abstract:Emotional dialogue technology focuses on the “emotional quotient” of conversational robots, aiming to give the robots the ability to observe, understand, and express emotions as humans do. This technology can be seen as the intersection of affective computing and dialogue technology and considers both the “intelligence quotient” and “emotional quotient” of conversational robots to realize spiritual companionship, emotional comfort, and psychological guidance for users. Combined with the characteristics of emotions in dialogues, this study provides a comprehensive analysis of emotional dialogue technology: 1) Three important technical points, including emotion recognition, emotion management, and emotion expression in dialogue scenarios, are presented, and the technology of emotional dialogues in multimodal scenarios is further discussed. 2) The study presents the latest research progress on the technical points related to emotional dialogues and summarizes the main challenges and possible solutions correspondingly. 3) Data resources for emotional dialogue technology are introduced. 4) The difficulties and prospects of emotional dialogue technology are pointed out.
ZHANG Yi-Chen, HE Gan, DU Kai, HUANG Tie-Jun
2024, 35(3):1403-1417. DOI: 10.13328/j.cnki.jos.006816
Abstract:How brains realize learning and perception is an essential question for both the artificial intelligence and neuroscience communities. Since existing artificial neural networks (ANNs) differ from the real brain in terms of structures and computing mechanisms, they cannot be directly used to explore the mechanisms of learning and dealing with perceptual tasks in the real brain. The dendritic neuron model is a computational model that simulates the information processing of neuronal dendrites in the brain and is closer to biological reality than ANNs. Using dendritic neural network models to deal with and learn perceptual tasks plays an important role in understanding the learning process in the real brain. However, current learning models based on dendritic neural networks mainly focus on simplified dendritic models and are unable to model the entire signal-processing mechanism of dendrites. To solve this problem, this study proposes a learning model of a biophysically detailed neural network of medium spiny neurons (MSNs). The neural network can fulfill corresponding perceptual tasks through learning. Experimental results show that the proposed model can achieve high performance on a classical image classification task. In addition, the neural network shows strong robustness under noise interference. By further analyzing the network features, this study finds that the neurons in the network after learning show stimulus selectivity, which is a classical phenomenon in neuroscience. This indicates that the proposed model is biologically plausible and implies that stimulus selectivity is an essential property of the brain in fulfilling perceptual tasks through learning.
2024, 35(3):1418-1439. DOI: 10.13328/j.cnki.jos.006819
Abstract:Federated learning is an effective method to solve the problem of data silos. When the server aggregates all gradients, it may compute the global gradient incorrectly due to inertia or self-interest, so it is necessary to verify the integrity of the global gradient. Existing schemes based on cryptographic algorithms incur excessive verification overhead. To solve these problems, this study proposes a rational and verifiable federated learning framework. Firstly, according to game theory, a prisoner contract and a betrayal contract are designed to force the server to be honest. Secondly, the scheme uses a replication-based verification scheme to verify the integrity of the global gradient and supports offline clients. Finally, the analysis proves the correctness of the scheme, and the experiments show that, compared with existing verification algorithms, the proposed scheme reduces the computing overhead of the client side to zero, optimizes the number of communication rounds in one iteration from three to two, and keeps the training overhead inversely proportional to the offline rate of the clients.
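The replication-based verification idea, re-executing the server's aggregation over the same submitted gradients and comparing the results, can be sketched as follows; this is a generic illustration, not the paper's exact protocol or contract logic:

```python
import numpy as np

# The server claims an aggregate of the clients' gradients; verifiers
# replicate the aggregation independently and compare within a tolerance.
def aggregate(gradients):
    return np.mean(gradients, axis=0)

def verify(server_result, gradients, n_replicas=2, atol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_replicas):
        idx = rng.permutation(len(gradients))     # independent re-execution
        replica = np.mean(np.asarray(gradients)[idx], axis=0)
        if not np.allclose(server_result, replica, atol=atol):
            return False                          # server cheated or erred
    return True

grads = [np.ones(4), 3 * np.ones(4)]
honest = aggregate(grads)                         # [2, 2, 2, 2]
assert verify(honest, grads)
assert not verify(honest + 0.5, grads)            # tampered result is caught
```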
WU Jun, CAO Jie, WANG Chong-Jun, XIE Jun-Yuan
2024, 35(3):1440-1465. DOI: 10.13328/j.cnki.jos.006825
Abstract:A social law is a set of restrictions on the available actions of agents to establish some target properties in a multiagent system. In the strategic case, where the agents have individual rationality and private information, the social law synthesis problem should be modeled as an algorithmic mechanism design problem instead of a common optimization problem. A minimal side effect is usually a basic requirement for social laws. From the perspective of game theory, the minimal side effect closely relates to the concept of maximum social welfare, and synthesizing a social law with a minimal side effect can be modeled as an efficient mechanism design problem. Therefore, this study not only needs to find the efficient social laws with maximum social welfare for the given target property but also to pay the agents to induce incentive compatibility and individual rationality. The study first designs an efficient mechanism based on the VCG mechanism, namely VCG-SLM, and proves that it satisfies all the required formal properties. However, as the computation of VCG-SLM is an FP^NP-complete problem, the study proposes an ILP-based implementation of this mechanism (VCG-SLM-ILP), which transforms the computation of allocation and payment into ILPs based on the semantics of ATL and strictly proves its correctness, so as to effectively utilize mature industrial-grade integer programming solvers and successfully solve the otherwise intractable mechanism computation problem.
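The Clarke (pivot) payment underlying VCG-style mechanisms such as VCG-SLM, stated generically:

```latex
% With reported valuations $v_j$ and the welfare-maximizing outcome
% $o^* = \arg\max_{o} \sum_j v_j(o)$, agent $i$ pays
\[
  p_i \;=\; \max_{o} \sum_{j \neq i} v_j(o) \;-\; \sum_{j \neq i} v_j(o^*),
\]
% i.e., the externality that agent $i$ imposes on the other agents;
% this is what makes truthful reporting a dominant strategy.
```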
XU Bin, ZHAO Yun-Kai, ZHU Jian-Ming, LIU Yi-Chuan, LI Xuan-Tao, SUN Yan-Fei, JI Yi-Mu
2024, 35(3):1466-1484. DOI: 10.13328/j.cnki.jos.006805
Abstract:The uncertainty of tasks in mobile edge computing scenarios makes task offloading and resource allocation more complex and difficult. Therefore, a continuous offloading and resource allocation method of uncertain tasks in mobile edge computing is proposed. Firstly, a continuous offloading model of uncertain tasks in mobile edge computing is built, and the multi-batch processing technology based on duration slice partition is employed to address task uncertainty. A multi-device computing resource coordination mechanism is designed to improve the carrying capacity of computation-intensive tasks. Secondly, an adaptive strategy selection algorithm based on load balancing is put forward to avoid channel congestion and additional energy consumption caused by the over-allocation of computing resources. Finally, the uncertain task scenario model is simulated based on Poisson distribution, and experimental results show that the reduction of time slice length can reduce the total energy consumption of the system. In addition, the proposed algorithm can achieve task offloading and resource allocation more effectively and can reduce energy consumption by up to 11.8% compared with comparison algorithms.
WANG Yi-Jun, FENG Yong, LIU Ming, LIU Nian-Bo
2024, 35(3):1485-1501. DOI: 10.13328/j.cnki.jos.006814
Abstract:Efficient mobile charging scheduling is a key technology for building wireless rechargeable sensor networks (WRSNs) with long life cycles and sustainable operation. Existing charging methods based on reinforcement learning only consider the spatial dimension of mobile charging scheduling, i.e., the path planning of mobile chargers (MCs), while leaving out the temporal dimension of the problem, i.e., the adjustment of the charging duration, and thus these methods suffer from performance limitations. This study proposes a dynamic spatiotemporal charging scheduling scheme based on deep reinforcement learning (SCSD) and establishes a deep reinforcement learning model for the dynamic adjustment of charging sequence scheduling and charging duration. In view of the discrete charging sequence planning and continuous charging duration adjustment in mobile charging scheduling, the study uses DQN to optimize the charging sequence of the nodes to be charged and calculates and dynamically adjusts their charging duration. By optimizing the two dimensions of space and time respectively, the SCSD scheme proposed in this study can effectively improve charging performance while avoiding the power failure of nodes. Simulation experiments show that SCSD has significant performance advantages over several well-known typical charging schemes.
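A minimal sketch of the two-dimension idea: a DQN scores which node to charge next (discrete), and the charging duration is then computed from that node's state (continuous). The network sizes and the duration rule below are illustrative assumptions, not the SCSD specification:

```python
import torch
import torch.nn as nn

# A small Q-network over the network state, one Q-value per candidate node.
class QNet(nn.Module):
    def __init__(self, state_dim, n_nodes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_nodes))
    def forward(self, state):
        return self.net(state)

def choose_node(qnet, state, epsilon=0.1):
    # epsilon-greedy selection of the next node to charge
    if torch.rand(1).item() < epsilon:
        return torch.randint(qnet.net[-1].out_features, (1,)).item()
    return qnet(state).argmax().item()

def charging_duration(residual, capacity, rate):
    return (capacity - residual) / rate    # fill the node's energy deficit

qnet = QNet(state_dim=8, n_nodes=5)
node = choose_node(qnet, torch.randn(8))
t = charging_duration(residual=20.0, capacity=100.0, rate=4.0)   # 20.0
```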
YUAN Chao, WANG Hong-Xia, HE Pei-Song
2024, 35(3):1502-1514. DOI: 10.13328/j.cnki.jos.006815
Abstract:With the development of deep learning and steganography, deep neural networks are widely used in image steganography, especially in a new research direction, namely embedding an image message in an image. The mainstream steganography of embedding an image message in an image based on deep neural networks requires cover images and secret images to be input into a steganographic model to generate stego-images. However, recent studies have demonstrated that the steganographic model only needs secret images as input, with the output secret perturbation then added to cover images to embed the secret images. This novel embedding method that does not rely on cover images greatly expands the application scenarios of steganography and realizes the universality of steganography. However, this method has so far only verified the feasibility of embedding and recovering secret images, while the more important evaluation criterion for steganography, namely concealment, has not been considered and verified. This study proposes a high-capacity universal steganography generative adversarial network (USGAN) model based on an attention mechanism. By using the attention module, the USGAN encoder can adjust the perturbation intensity distribution over pixel positions in the channel dimension of the secret image, thereby reducing the influence of the secret perturbation on the cover images. In addition, a CNN-based steganalyzer is used as the target model of USGAN, and the encoder learns to generate a secret adversarial perturbation through adversarial training with the target model so that the stego-image can simultaneously serve as an adversarial example attacking the steganalyzer. The experimental results show that the proposed model can not only realize a universal embedding method that does not rely on cover images but also further improve the concealment of steganography.
LIU Zhao-Peng, LI Sheng-Jie, ZHANG Yue, ZENG You-Wei, ZHANG Da-Qing
2024, 35(3):1515-1533. DOI: 10.13328/j.cnki.jos.006826
Abstract:Recently, with the popularity of ubiquitous computing, intelligent sensing technology has become a focus of researchers, and WiFi-based non-contact sensing is increasingly popular in academia and industry because of its excellent generality, low deployment cost, and great user experience. Typical WiFi-based non-contact sensing work includes gesture recognition, breath detection, intrusion detection, behavior recognition, etc. For the real-life deployment of these works, one of the major challenges is to avoid interference from behaviors in irrelevant areas, so it is necessary to judge whether the target is in a specific sensing area or not, which means that the system should be able to determine exactly which side of the boundary line the target is on. However, existing work offers no way to accurately monitor a freely set boundary, which hinders the actual deployment of WiFi-based sensing applications. To solve this problem, based on the physical essence of electromagnetic wave diffraction and the Fresnel diffraction model, this study identifies a signal feature, namely the Rayleigh distribution in the Fresnel diffraction model (RFD), which appears when the target passes through the link (the line between the WiFi transmitter and receiver antennas), and reveals the mathematical relationship between the signal feature and human activity. The study then realizes a boundary monitoring algorithm through line-crossing detection by using the link as the boundary and considering the waveform delay caused by antenna spacing and the features of automatic gain control (AGC) when the link is blocked. On this basis, the study also implements two practical applications, namely an intrusion detection system and a home state detection system. The intrusion detection system achieves a precision of more than 89% and a recall rate of more than 91%, while the home state detection system achieves an accuracy of more than 89%. While verifying the availability and robustness of the boundary monitoring algorithm, the study also shows the great potential of combining the proposed method with other WiFi-based sensing technologies and provides a direction for the actual deployment of WiFi-based sensing technologies.
LIU Fang, LÜ Tian, LIU Xin-Ge, YE Sheng, GUO Rui, ZHANG Lie, MA Cui-Xia, WANG Qing-Wei, LIU Yong-Jin
2024, 35(3):1534-1551. DOI: 10.13328/j.cnki.jos.006801
Abstract:The Olympic heritage is a treasure of the world. The integration of technology, culture, and art is crucial to the diversified presentation and efficient dissemination of the heritage of the Beijing Winter Olympics. Although online exhibition halls, as an important form of digital museums in the information era, have laid a good foundation in research on individual digital museums and interactive technologies, no systematic, intelligent, interactive, and friendly digital museum of the Winter Olympics has been built so far. This study proposes a construction method for an online exhibition hall with interactive feedback for the Beijing 2022 Winter Olympics. By constructing an interactive exhibition hall with an intelligent virtual agent, the study further explores the role of interactive feedback in disseminating intangible cultural heritage in a knowledge-dissemination-oriented digital museum. To explore the influence of audio-visual interactive feedback on spreading the Olympic spirit and culture in the exhibition hall and to improve the user experience, the study conducts a user experiment with 32 participants. The results show that the constructed exhibition hall can greatly promote the dissemination of Olympic culture and spirit, and the introduction of audio-visual interactive feedback in the exhibition hall can improve users’ sense of perceived control, thereby improving the user experience.
LIANG Yun, ZHANG Yu-Qing, ZHENG Jin-Tu, ZHANG Yong
2024, 35(3):1552-1568. DOI: 10.13328/j.cnki.jos.006821
Abstract:As challenges such as serious occlusions and deformations coexist, accurate and robust video segmentation has become one of the hot topics in computer vision. This study proposes a video segmentation method based on absorbing Markov chains and skeleton mapping, which progressively produces accurate object contours through a process of pre-segmentation, optimization, and improvement. In the pre-segmentation phase, based on the twin network and the region proposal network, the study obtains regions of interest for objects, constructs absorbing Markov chains over the superpixels in these regions, and calculates the foreground/background labels of the superpixels. The absorbing Markov chains can flexibly and effectively perceive and propagate object features and preliminarily pre-segment the target object from the complex scene. In the optimization phase, the study designs short-term and long-term spatial-temporal cue models to obtain the short-term variation and the long-term features of the object, so as to optimize superpixel labels and reduce errors caused by similar objects and noise. In the improvement phase, to reduce the artifacts and discontinuities in the optimization results, this study proposes an automatic generation algorithm for the foreground/background skeleton based on superpixel labels and positions and constructs a skeleton mapping network based on encoding and decoding, so as to learn the pixel-level object contour and finally obtain accurate video segmentation results. Extensive experiments on standard datasets show that the proposed method is superior to existing mainstream video segmentation methods and can produce segmentation results with higher region similarity and contour accuracy.
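The absorbing-Markov-chain computation that such models rely on is standard: with the transition matrix in canonical form, absorption probabilities follow from the fundamental matrix (toy numbers below, not the paper's superpixel graph):

```python
import numpy as np

# Canonical form [[Q, R], [0, I]]: Q moves among transient states
# (superpixels), R moves into absorbing states (e.g., foreground and
# background seeds). N = (I - Q)^{-1} is the fundamental matrix, and
# B = N R gives each transient state's absorption probabilities.
Q = np.array([[0.0, 0.5],          # transitions among transient superpixels
              [0.4, 0.0]])
R = np.array([[0.3, 0.2],          # transient -> {foreground, background}
              [0.1, 0.5]])
N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix
B = N @ R                          # absorption probabilities, rows sum to 1
print(B)                           # [[0.4375 0.5625], [0.5 0.5]]-style output
```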
HU Yi, CHEN Dao-Kun, YANG Chao, MA Wen-Jing, LIU Fang-Fang, SONG Chao-Bo, SUN Qiang, SHI Jun-Da
2024, 35(3):1569-1584. DOI: 10.13328/j.cnki.jos.006811
Abstract:Basic linear algebra subprograms (BLAS) is one of the most basic and important math libraries. The matrix-matrix operations covered by the level-3 BLAS functions are particularly significant for a standard BLAS library and are widely employed in many large-scale scientific and engineering computing applications. Level-3 BLAS functions are also compute-intensive and play a vital role in fully exploiting the computing performance of processors. This study investigates multi-core parallel optimization technologies for level-3 BLAS functions on SW26010-Pro, a Chinese domestic processor. According to the memory hierarchy of SW26010-Pro, the study designs a multi-level blocking algorithm to exploit the parallelism of matrix operations. Then, a data-sharing scheme based on the remote memory access (RMA) mechanism is proposed to improve the data transmission efficiency among CPEs. Additionally, the study employs triple buffering and parameter tuning to fully optimize the algorithm and hide the memory access costs of direct memory access (DMA) and the communication overhead of RMA. Besides, the study adopts the two hardware pipelines and several vectorized arithmetic/memory access instructions of SW26010-Pro and improves the floating-point computing efficiency of level-3 BLAS functions by writing assembly code manually for matrix-matrix multiplication, matrix equation solving, and matrix transposition. The experimental results show that the proposed parallel optimization significantly improves the performance of level-3 BLAS functions on SW26010-Pro. The floating-point computing efficiency of single-core level-3 BLAS reaches up to 92% of the peak performance, while that of multi-core level-3 BLAS reaches up to 88% of the peak performance.
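The multi-level blocking idea can be shown didactically: tiles sized to fit a fast local memory are multiplied and accumulated. The block size and NumPy host code below are illustrative only; the real SW26010-Pro kernels add RMA data sharing, triple buffering, and hand-written assembly:

```python
import numpy as np

# Didactic blocked GEMM: C = A @ B computed tile by tile so that each
# (bs x bs) working set can fit in a fast local memory level.
def blocked_gemm(A, B, bs=64):
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    for i in range(0, m, bs):
        for j in range(0, n, bs):
            for p in range(0, k, bs):          # accumulate over the K dim
                C[i:i+bs, j:j+bs] += A[i:i+bs, p:p+bs] @ B[p:p+bs, j:j+bs]
    return C

A, B = np.random.rand(200, 300), np.random.rand(300, 100)
assert np.allclose(blocked_gemm(A, B), A @ B)
```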