CHEN Bo-Lei, KANG Jia-Xu, ZHONG Ping, CUI Yong-Zheng, LU Si-Yi, YANG Hao-Nan, WANG Jian-Xin
Online: November 27, 2024 DOI: 10.13328/j.cnki.jos.007250
Abstract:With the continuous development of computer vision and artificial intelligence (AI) in recent years, embodied AI has received widespread attention from academia and industry at home and abroad. Embodied AI emphasizes that an agent should actively obtain real feedback from the physical world by interacting with the environment in a contextualized way and make itself more intelligent through learning from the feedback. As one of the concrete tasks of embodied AI, object goal navigation requires an agent to search for and navigate to a specified object goal (e.g., find a sink) in a previously unknown, complex, and semantically rich scenario. Object goal navigation has great potential for applications in smart assistants that support daily human activities, serving as a fundamental and antecedent task for other interaction-based embodied AI research. This study systematically classifies current research on object goal navigation. Firstly, the knowledge related to environmental representation and autonomous visual exploration is introduced, and existing object goal navigation methods are classified and analyzed from three different perspectives. Secondly, two categories of higher-level object rearrangement tasks are introduced, with a description of datasets for realistic indoor environment simulation, evaluation metrics, and a generic training paradigm for navigation strategies. Finally, the performance of existing object goal navigation strategies is compared and analyzed on different datasets. The challenges in this field are summarized, and development trends are predicted.
WANG Rui-Jin, WANG Jin-Bo, ZHANG Feng-Li, LI Jing-Wei, LI Zeng-Peng, CHEN Ting
Online: November 20, 2024 DOI: 10.13328/j.cnki.jos.007183
Abstract:Federated learning, a framework for training global machine learning models through distributed iterative collaboration without sharing private data, has gained prevalence. FedProto, a widely used federated learning approach, employs abstract class prototypes, termed feature maps, to enhance model convergence speed and generalization capacity. However, this approach overlooks the verification of the aggregated feature maps’ accuracy, risking model training failures due to incorrect feature maps. This study investigates a feature map poisoning attack on FedProto, revealing that malicious actors can degrade inference accuracy by up to 81.72% through tampering with the training data labels. To counter such attacks, we propose a dual defense mechanism utilizing knowledge distillation and feature map validation. Experimental results on authentic datasets demonstrate that this defense strategy can enhance the compromised model inference accuracy by a factor of 1 to 5, with only a marginal 2% increase in operational time.
YANG Shang-Dong, YU Miao-Ying, CHEN Xing-Guo, CHEN Lei
Online: November 20, 2024 DOI: 10.13328/j.cnki.jos.007184
Abstract:Reinforcement learning has achieved remarkable results in decision-making tasks like intelligent dialogue systems, yet its efficiency diminishes notably in scenarios with intricate structures and scarce rewards. Researchers have integrated the skill discovery framework into reinforcement learning, aiming to maximize skill disparities to establish policies and boost agent performance in such tasks. However, the constraint posed by the limited diversity of sampled trajectory data confines existing skill discovery methods to learning a single skill per reinforcement learning episode. Consequently, this limitation results in subpar performance in complex tasks requiring sequential skill combinations within a single episode. To address this challenge, a group-wise contrastive learning based sequence-aware skill discovery method (GCSSD) is proposed, which integrates contrastive learning into the skill discovery framework. Initially, to augment trajectory data diversity, the complete trajectories interacting with the environment are segmented and grouped, employing contrastive loss to learn skill embedding representations from grouped trajectories. Subsequently, skill policy training is conducted by combining the skill embedding representation with reinforcement learning. Lastly, to enhance performance in tasks featuring diverse sequential skill combinations, the sampled trajectories are segmented into skill representations and embedded into the learned policy network, facilitating the sequential combination of learned skill policies. Experimental results demonstrate the efficacy of the GCSSD method in tasks characterized by sparse rewards and sequential skill combinations, showcasing its capability to swiftly adapt to tasks with varying sequential skill combinations using learned skills.
QIN Zheng, XU Li-Jie, CHEN Wei, WANG Yi, WU Ming-Chao, ZENG Hong-Bin, WANG Wei
Online: November 20, 2024 DOI: 10.13328/j.cnki.jos.007235
Abstract:With the advent of the big data era, massive volumes of user data have empowered numerous data-driven industry applications, such as smart grids, intelligent transportation, and product recommendations. In scenarios where real-time data is crucial, the business value embedded within data rapidly diminishes over time. Consequently, data analysis systems require high throughput and low latency. Stream processing systems in big data, exemplified by Apache Flink, have been widely applied. Flink enhances system throughput by parallelizing computing tasks across cluster nodes. However, current research indicates that Flink has weak single-point performance and poor cluster scalability. To improve the throughput of stream processing systems, researchers have focused on optimizations in designing control planes, implementing system operators, and improving vertical scalability. However, there is still a lack of attention to the data flow in streaming analysis applications. These applications are driven by event streams and employ stateful processing functions, including low voltage detection in smart grids and advertising recommendation. This study analyzes the data flow characteristics of typical streaming analysis applications, identifies three bottlenecks in optimizing scalability, and proposes corresponding optimization strategies: the key-level watermark strategy, the dynamic load distribution strategy, and the key-value based exchange strategy. Based on these optimization strategies, this study implements Trilink based on Flink and applies it to various applications such as low voltage detection, bridge arch crown monitoring, and the Yahoo Streaming Benchmark. Experimental results show that the modified system, Trilink, achieves more than a 5-fold increase in throughput in a single-machine environment and over a 1.6-fold improvement in horizontal scalability acceleration in an 8-node setup, compared to Flink.
WU Hua, LUO Hao, ZHAO Shi-Shun, LIU Song-Tao, CHENG Guang, HU Xiao-Yan
Online: November 18, 2024 DOI: 10.13328/j.cnki.jos.007236
Abstract:The rise of video platforms has led to the rapid dissemination of videos, integrating them into various aspects of social life. Videos transmitted in the network may include harmful content, highlighting an urgent need for cyberspace security supervision to accurately identify harmful videos that are encrypted and transmitted in the network. The existing methods collect traffic data at main network access points to extract the features of encrypted video traffic and identify the harmful videos by matching the traffic features based on harmful video databases. However, with the progress of encryption protocols for video transmission, HTTP/2 using new multiplexing technologies has been widely applied, which makes the traditional traffic analysis method based on HTTP/1.1 features fail to identify encrypted videos using HTTP/2. Moreover, the current research mostly focuses on videos with a fixed resolution during playback. Few studies have considered the impact of resolution switching in video identification. To address the above problems, this study analyzes the factors that cause offsets in the length of the audio/video data during the HTTP/2 transmission process and proposes a method to precisely reconstruct corrected fingerprints for encrypted videos by calculating the size of the combined audio and video segments in the encrypted traffic. The study also proposes an encrypted video identification model based on the hidden Markov model and the Viterbi algorithm by using the corrected fingerprints of encrypted videos and a large plaintext fingerprint database for videos. The model applies dynamic programming to solve the problems caused by adaptive video resolution switching. The proposed model achieves identification accuracy of 98.41% and 97.91% respectively for encrypted videos with fixed and adaptive resolutions in 400 000-scale fingerprint databases, namely Facebook and Instagram. The study validates the generality and generalization ability of the proposed method using three video platforms: Triller, Twitter, and Mango TV. Comparisons with similar work in terms of recognition effectiveness, generalization, and time overhead confirm the higher application value of the proposed method.
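The role of the Viterbi algorithm in the abstract above can be illustrated with a minimal sketch. It is not the authors' model: the hidden state is taken to be the resolution of each segment, the emission score is a Gaussian-like penalty on the relative size error, and the names viterbi_score, identify, stay_prob, and tolerance are all hypothetical choices made for illustration.

```python
import math


def viterbi_score(observed, sizes, stay_prob=0.9, tolerance=0.01):
    """Best log-score and resolution path for one candidate video.

    observed: segment sizes recovered from encrypted traffic.
    sizes[i][r]: plaintext size of segment i at resolution r (assumed layout).
    """
    n_res = len(sizes[0])
    log_stay = math.log(stay_prob)
    # Probability of switching is split evenly over the other resolutions.
    log_switch = (
        math.log((1.0 - stay_prob) / (n_res - 1)) if n_res > 1 else float("-inf")
    )

    def emit(obs, size):
        # Assumed emission model: penalize the relative size error.
        rel = (obs - size) / size
        return -0.5 * (rel / tolerance) ** 2

    def trans(q, r):
        return log_stay if q == r else log_switch

    score = [emit(observed[0], sizes[0][r]) for r in range(n_res)]
    back = []  # back[i - 1][r] = best previous resolution for segment i
    for i in range(1, len(observed)):
        new_score, pointers = [], []
        for r in range(n_res):
            best_prev = max(range(n_res), key=lambda q: score[q] + trans(q, r))
            new_score.append(
                score[best_prev] + trans(best_prev, r) + emit(observed[i], sizes[i][r])
            )
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Backtrack from the best final state to recover the resolution path.
    last = max(range(n_res), key=lambda r: score[r])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


def identify(observed, database):
    """Return the database key whose fingerprint best explains the observation."""
    return max(database, key=lambda name: viterbi_score(observed, database[name])[0])
```

Because the dynamic program keeps only the best score per resolution at each segment, the cost per candidate grows linearly with the number of segments, even when the resolution changes mid-playback.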
LI Zi-Tong, MENG Xiao-Feng, WANG Lei-Xia, HAO Xin-Li
Online: November 18, 2024 DOI: 10.13328/j.cnki.jos.007237
Abstract:Machine learning has become increasingly prevalent in daily life. Various machine learning methods are proposed to utilize historical data for making predictions, making people’s lives more convenient. However, there is a significant challenge associated with machine learning: privacy leakage. Mere deletion of a user’s data from the training set is not sufficient for avoiding privacy leakage, as the trained model may still harbor this information. To tackle this challenge, the conventional approach entails retraining the model on a new training set that excludes the data of the user. However, this method can be costly, prompting the search for a more efficient way to “unlearn” specific data while yielding a model comparable to a retrained one. This study summarizes the current literature on this topic, categorizing existing unlearning methods into three groups: training-based, editing-based, and generation-based methods. Additionally, various metrics are introduced to assess unlearning methods. The study also evaluates current unlearning methods in deep learning and concludes with future research directions in this field.
HE Xian-Hao, HU Yi-Kun, LI Yi-Chen, YAN Yu-Wei, LÜ Yi-Sheng, LIAO Qing, LI Yong, LI Ken-Li
Online: November 18, 2024 DOI: 10.13328/j.cnki.jos.007238
Abstract:As the scale of cities continues to increase, urban transportation systems are facing more and more challenges, such as traffic congestion and traffic safety. Traffic simulation is a method to solve urban traffic problems. It uses virtual and real computing technologies to process real-time traffic data and optimize urban traffic efficiency. It is an important method to achieve the parallel city theory in intelligent transportation. However, traditional computing systems often encounter problems such as insufficient computing resources and long simulation delays when running large-scale urban traffic simulations. To solve the above problems, this study proposes a parallel algorithm for traffic simulation of parallel cities based on the parallel city theory and the heterogeneous architecture of China’s new-generation supercomputer, Tianhe. This algorithm accurately simulates traffic elements such as vehicles, roads, and traffic signals, and applies methods such as road network division, parallel driving of vehicles, and parallel control of signal lights to achieve high-performance traffic simulation. The algorithm runs on Tianhe, a supercomputing platform with 16 nodes and more than 25 000 cores, and simulates real traffic scenarios involving 2.4 million vehicles, 7 797 intersections, and 170 000 lanes within the Fifth Ring Road in Beijing. Compared with traditional single-node simulation, the proposed algorithm reduces the simulation time of each step from 2.21 s to 0.37 s, a nearly 6-fold speedup. An urban traffic simulation at the scale of millions of vehicles has thus been successfully implemented on a domestic heterogeneous supercomputing platform.
LI Yun, GAO Ya, YAO Zhi-Xiu, XIA Shi-Chao, WU Guang-Fu
Online: November 18, 2024 DOI: 10.13328/j.cnki.jos.007239
Abstract:Traffic flow prediction is an important foundation and a hot research direction for traffic management in intelligent transportation systems (ITS). Traditional methods for traffic flow prediction typically rely on a large amount of high-quality historical observation data to achieve accurate predictions, but the prediction accuracy significantly decreases in more common scenarios with data scarcity in traffic networks. To address this problem, a transfer learning model is proposed based on spatial-temporal graph convolutional networks (TL-STGCN), which leverages traffic flow features from a source network with abundant data to assist in predicting future traffic flow in a target network with data scarcity. Firstly, a spatial-temporal graph convolutional network based on time attention is employed to learn the spatial and temporal features of the traffic flow data in both the source and target networks. Secondly, domain-invariant spatial-temporal features are extracted from the representations of the two networks using transfer learning techniques. Lastly, these domain-invariant features are utilized to predict the future traffic flow in the target network. To validate the effectiveness of the proposed model, experiments are conducted on real-world datasets. The results demonstrate that TL-STGCN outperforms existing methods, achieving the best results in terms of mean absolute error, root mean square error, and mean absolute percentage error, which indicates that TL-STGCN provides more accurate traffic flow predictions for scenarios with data scarcity in traffic networks.
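For readers unfamiliar with the three error metrics named in the abstract above, the following sketch shows their standard definitions. The function names and the sample values in the usage note are illustrative and do not come from the paper; the MAPE version assumes that no true value is zero.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: penalizes large errors more strongly than MAE."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero in y_true)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, with true flows [100, 200, 400] and predictions [110, 190, 380], the absolute errors are 10, 10, and 20. Lower values are better for all three metrics.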
ZHU Yi-Fan, LUO Cheng-Yang, MA Rui-Yao, CHEN Lu, MAO Yu-Ren, GAO Yun-Jun
Online: November 06, 2024 DOI: 10.13328/j.cnki.jos.007177
Abstract:The density-based spatial clustering of applications with noise (DBSCAN) algorithm is one of the clustering analysis methods in the field of data mining. It has a strong capability of discovering complex relationships between objects and is insensitive to noise data. However, existing DBSCAN methods only support the clustering of unimodal objects, struggling with applications involving multi-model data. With the rapid development of information technology, data has become increasingly diverse in real-life applications and contains a huge variety of models, such as text, images, geographical coordinates, and data features. Thus, existing clustering methods fail to effectively model complex multi-model data and cannot support efficient multi-model data clustering. To address these issues, this study proposes a density-based clustering algorithm in multi-metric spaces. Firstly, to characterize the complex relationships within multi-model data, this study uses a multi-metric space to quantify the similarity between objects and employs an aggregated multi-metric graph (AMG) to model multi-model data. Next, this study employs differential distances to balance the graph structure and leverages a best-first search strategy combined with pruning techniques to achieve efficient multi-model data clustering. The experimental evaluation on real and synthetic datasets, using various experimental settings, demonstrates that the proposed method achieves at least one order of magnitude improvement in efficiency with high clustering accuracy, and exhibits good scalability.
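As a point of reference for the abstract above, the sketch below runs textbook DBSCAN over a distance that aggregates two metrics. It is a brute-force baseline under stated assumptions, not the AMG-based algorithm of the paper: the object layout (a "tags" set and a "pos" tuple), the weighted-sum aggregation, and every function name are hypothetical.

```python
import math


def jaccard_distance(a, b):
    """Distance between two tag sets (0 = identical, 1 = disjoint)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def combined_distance(x, y, weights=(0.5, 0.5)):
    """Assumed aggregation: weighted sum of one metric per kind of data."""
    return (
        weights[0] * jaccard_distance(x["tags"], y["tags"])
        + weights[1] * math.dist(x["pos"], y["pos"])
    )


def dbscan(objects, eps, min_pts, dist=combined_distance):
    """Textbook DBSCAN with O(n^2) neighbor search; the label -1 marks noise."""
    noise, unseen = -1, None
    labels = [unseen] * len(objects)

    def neighbors(i):
        return [
            j for j in range(len(objects)) if dist(objects[i], objects[j]) <= eps
        ]

    cluster = 0
    for i in range(len(objects)):
        if labels[i] is not unseen:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not unseen:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, so expand through it
                queue.extend(nbrs)
        cluster += 1
    return labels
```

The quadratic neighbor search in this baseline is the kind of cost that an index structure with pruning is meant to avoid.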
ZHOU Man, LI Xiang-Qian, WANG Qian, LI Qi, SHEN Chao, ZHOU Yu-Ting
Online: November 06, 2024 DOI: 10.13328/j.cnki.jos.007181
Abstract:With the popularity of mobile devices and the enhancement of users’ requirements for privacy protection, studies of user authentication on mobile devices have attracted widespread attention. Recently, the audio infrastructures of mobile devices have provided greater flexibility and scalability for the design of novel user authentication schemes with excellent performance. After surveying a large number of related works, this study first classifies acoustic sensing-based user authentication schemes on mobile devices according to the difference in authentication metrics and sensing methods and describes the corresponding attack model. Then, it analyzes and compares single authentication metric-based and acoustic sensing-based user authentication schemes on mobile devices. Finally, combined with the problems of existing works, this study gives two metrics (security and practicability) to measure the performance of the user authentication system and discusses future research directions.