2025, 36(8):1-18.DOI: 10.13328/j.cnki.jos.007354
Abstract:With the rapid advancement of intelligent cyber-physical system (ICPS), intelligent technologies are increasingly utilized in components such as perception, decision-making, and control. Among these, deep reinforcement learning (DRL) has gained wide application in ICPS control components due to its effectiveness in managing complex and dynamic environments. However, the openness of the operating environment and the inherent complexity of ICPS necessitate the exploration of highly dynamic state spaces during the learning process. This often results in inefficiencies and poor generalization in decision-making. A common approach to address these issues is to abstract large-scale, fine-grained Markov decision processes (MDPs) into smaller-scale, coarse-grained MDPs, thus reducing computational complexity and enhancing solution efficiency. Nonetheless, existing methods fail to adequately ensure consistency between the spatiotemporal semantics of the original states, the abstracted system space, and the real system space. To address these challenges, this study proposes a causal spatiotemporal semantic-driven abstraction modeling method for deep reinforcement learning. First, causal spatiotemporal semantics are introduced to capture the distribution of value changes across time and space. Based on these semantics, a two-stage semantic abstraction process is applied to the states, constructing an abstract MDP model for the deep reinforcement learning process. Subsequently, abstraction optimization techniques are employed to fine-tune the abstract model, minimizing semantic discrepancies between the abstract states and their corresponding detailed states. Finally, extensive experiments are conducted on scenarios including lane-keeping, adaptive cruise control, and intersection crossing. The proposed model is evaluated and analyzed using the PRISM verifier. 
The results indicate that the proposed abstraction modeling technique demonstrates superior performance in abstraction expressiveness, accuracy, and semantic equivalence.
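As a rough illustration of the general idea of abstracting a fine-grained MDP into a coarse-grained one, the sketch below groups concrete states with an abstraction function and estimates abstract transition probabilities from sampled concrete transitions. It is not the causal spatiotemporal method of the study: the function `abstract_transitions`, the mapping `phi`, the lane-keeping thresholds, and the sample data are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def abstract_transitions(transitions, phi):
    """Estimate abstract transition probabilities from concrete samples.

    transitions: iterable of (state, action, next_state) tuples.
    phi: function mapping a concrete state to an abstract state label.
    Returns {(abstract_state, action): {abstract_next_state: probability}}.
    """
    counts = defaultdict(Counter)
    for state, action, next_state in transitions:
        counts[(phi(state), action)][phi(next_state)] += 1
    model = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        model[key] = {nxt: n / total for nxt, n in counter.items()}
    return model

# Hypothetical lane-keeping abstraction: the concrete state is the
# lateral offset from the lane centre in metres.
def phi(offset):
    if offset < -0.5:
        return "left"
    if offset > 0.5:
        return "right"
    return "center"

samples = [
    (0.0, "steer_left", -0.6),
    (0.1, "steer_left", -0.2),
    (0.3, "steer_left", 0.1),
    (0.2, "steer_left", -0.7),
]
model = abstract_transitions(samples, phi)
```

A table of this shape (abstract state, action, successor distribution) is the kind of input a probabilistic model checker expects, although the exact encoding used with PRISM in the study is not given in the abstract.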
2024, 35(2):739-757.DOI: 10.13328/j.cnki.jos.006832
Abstract:The mixed cooperative-competitive multi-agent system consists of controlled target agents and uncontrolled external agents. The target agents cooperate with each other and compete with external agents, so as to deal with the dynamic changes in the environment and the external agents and complete tasks. In order to train the target agents and make them learn the optimal policy for completing the tasks, the existing work proposes two kinds of solutions: (1) focusing on the cooperation between target agents, viewing the external agents as a part of the environment, and leveraging multi-agent reinforcement learning to train the target agents; these approaches cannot handle the uncertainty of or dynamic changes in the external agents’ policy; (2) focusing on the competition between target agents and external agents, modeling the competition as two-player games, and using a self-play approach to train the target agents; these approaches are only suitable for cases with one target agent and one external agent, and they are difficult to extend to a system consisting of multiple target agents and external agents. This study combines the two kinds of solutions and proposes a counterfactual regret advantage-based self-play approach. Specifically, first, based on counterfactual regret minimization and the counterfactual multi-agent policy gradient, the study designs a counterfactual regret advantage-based policy gradient approach that lets the target agents update their policy more accurately. Second, in order to deal with the dynamic changes in the external agents’ policy during the self-play process, the study leverages imitation learning, which takes the external agents’ historical decision-making trajectories as training data and imitates the external agents’ policy, so as to explicitly model the external agents’ behaviors. 
Third, based on the counterfactual regret advantage-based policy gradient and the modeling of external agents’ behaviors, this study designs a self-play training approach. This approach can obtain the optimal joint policy for training multiple target agents when the external agents’ policy is uncertain or dynamically changing. The study also conducts a set of experiments on the cooperative electromagnetic countermeasure, including three typical mixed cooperative-competitive tasks. The experimental results demonstrate that compared with other approaches, the proposed approach has an improvement of at least 78% in the self-game effect.
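The two building blocks named above can be sketched in their textbook forms: the counterfactual baseline from the counterfactual multi-agent policy gradient (one agent's action is marginalized out while the other agents' actions stay fixed), and regret matching from counterfactual regret minimization. How the study combines them into its counterfactual regret advantage is not specified in the abstract, so the code below shows only the standard ingredients; the action names and numbers are hypothetical.

```python
def counterfactual_advantage(q_values, policy, taken_action):
    """COMA-style advantage for one agent, other agents' actions held fixed.

    q_values: {action: Q(s, (others' actions, action))} for this agent.
    policy: {action: probability} for this agent in the current state.
    """
    # Baseline: expected Q when only this agent's action is marginalized out.
    baseline = sum(policy[a] * q_values[a] for a in q_values)
    return q_values[taken_action] - baseline

def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = {a: max(r, 0.0) for a, r in regrets.items()}
    total = sum(positive.values())
    if total == 0.0:
        # No positive regret: fall back to the uniform policy.
        return {a: 1.0 / len(regrets) for a in regrets}
    return {a: r / total for a, r in positive.items()}

# Hypothetical numbers for a single target agent.
q = {"jam": 2.0, "evade": 1.0}
pi = {"jam": 0.5, "evade": 0.5}
adv = counterfactual_advantage(q, pi, "jam")
new_pi = regret_matching({"jam": 3.0, "evade": -1.0, "wait": 1.0})
```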
2024, 35(4):1914-1933.DOI: 10.13328/j.cnki.jos.006838
Abstract:Attendance may be for private purposes, which is not associated with an organization, such as keeping a personal travel log, or it may be for business needs, which is part of organizational attendance and sometimes associated with multiple organizations. Therefore, the recording, sharing, and analysis of attendance data require elaborate management. The HAO attendance system is a lightweight and mobile attendance platform. It takes the user and organization as two starting points and is driven by HAO intelligence consisting of human intelligence (HI), artificial intelligence (AI), and organizational intelligence (OI). This study builds the knowledge graph of the HAO attendance system and puts forward the closed-loop authority management structure of the HAO attendance system, supplemented by a privacy authority management method ranging from coarse-grained to fine-grained levels to ensure refined attendance management and protect the users’ privacy, thereby promoting the intelligent transformation of a new-generation attendance system. For organizational attendance analysis, a four-element scoring method and a four-element attendance reporting method are designed to calculate employee attendance scores, generate accurate and comprehensive attendance reports, provide decision-making support for organizations, and inspire the vitality of both organizations and individuals, so as to build intelligent organizations with organizational intelligence.
2023, 34(2):733-760.DOI: 10.13328/j.cnki.jos.006706
Abstract:Deep hierarchical reinforcement learning (DHRL) is an important research field in deep reinforcement learning (DRL). It focuses on sparse reward, sequential decision, and weak transfer ability problems, which are difficult for classic DRL to solve. DHRL decomposes complex problems and constructs a multi-layered structure for DRL strategies based on hierarchical thinking. By using temporal abstraction, DHRL combines lower-level actions to learn semantic higher-level actions. In recent years, DHRL has made breakthroughs in many domains and shown strong performance. It has been applied in the real world to visual navigation, natural language processing, recommendation systems, and video description generation. In this study, the theoretical basis of hierarchical reinforcement learning (HRL) is first introduced. Second, the key technologies of DHRL are described, including hierarchical abstraction techniques and common experimental environments. Third, taking the option-based deep hierarchical reinforcement learning framework (O-DHRL) and the subgoal-based deep hierarchical reinforcement learning framework (G-DHRL) as the main research objects, the research status and development trends of various algorithms are analyzed and compared in detail. In addition, a number of DHRL applications in the real world are discussed. Finally, DHRL is summarized and its prospects are discussed.
2023, 34(6):2804-2832.DOI: 10.13328/j.cnki.jos.006496
Abstract:Vertical data partitioning technology logically stores database table attributes satisfying certain semantic conditions in the same physical block, so as to reduce the cost of data access and improve the efficiency of query processing. A query usually involves only some of a table’s attributes, so a subset of the table’s attributes is enough to produce accurate query results. Reasonable vertical data partitioning allows most queries to be answered without scanning the whole table, which reduces the amount of data accessed and improves the efficiency of query processing. Traditional database vertical partitioning methods are mainly based on heuristic rules set by experts. The granularity of partitioning is coarse, and they cannot provide different partition optimizations according to the characteristics of the workload. Besides, when the scale of the workload or the number of attributes becomes large, the execution time of the existing methods is too long to meet the performance requirements of online real-time database tuning. Therefore, a method called spectral clustering based vertical partitioning (SCVP) is proposed for the online environment. A phased solution is adopted to reduce the time complexity of the algorithm and speed up partitioning. First, SCVP reduces the solution space by adding constraint conditions, that is, by generating initial partitions through spectral clustering. Second, SCVP searches the solution space, that is, the initial partitions are optimized by combining frequent itemset mining and greedy search. In order to further improve the performance of SCVP under high-dimensional attributes, an improved version of SCVP called spectral clustering based vertical partitioning redesign (SCVP-R) is proposed. SCVP-R optimizes the partition combiner component of SCVP by introducing a sympatric-competition mechanism, a double-elimination mechanism, and a loop mechanism. 
The experimental results on different datasets show that SCVP and SCVP-R have faster execution time and better performance than the current state-of-the-art vertical partitioning method.
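To make the benefit of vertical partitioning concrete, the following sketch compares the bytes read per row by a small workload under two layouts. It uses a deliberately simplified cost model that is an assumption of this example (a query reads every partition that contains at least one attribute it references); it is not the SCVP algorithm or its cost function, and the table, attribute widths, and query frequencies are hypothetical.

```python
def workload_cost(partitions, widths, workload):
    """Total bytes read per row across a workload.

    partitions: list of sets of attribute names (a vertical partitioning).
    widths: {attribute: bytes per value}.
    workload: list of (set of referenced attributes, query frequency).
    A query reads every partition holding at least one referenced attribute.
    """
    total = 0
    for attrs, freq in workload:
        for part in partitions:
            if part & attrs:
                total += freq * sum(widths[a] for a in part)
    return total

# Hypothetical table and workload.
widths = {"id": 4, "name": 32, "email": 64, "balance": 8}
workload = [({"id", "balance"}, 10), ({"name", "email"}, 2)]

whole_table = [set(widths)]                        # no partitioning
split = [{"id", "balance"}, {"name", "email"}]     # attributes grouped by co-access

cost_whole = workload_cost(whole_table, widths, workload)
cost_split = workload_cost(split, widths, workload)
```

Under this model the unpartitioned layout reads 1296 bytes per row over the workload, while the co-access layout reads 312, because the frequent query no longer touches the wide `name` and `email` columns.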
2023, 34(8):3821-3835.DOI: 10.13328/j.cnki.jos.006593
Abstract:In recent years, deep reinforcement learning has been widely used in sequential decisions with positive effects, and it has outstanding advantages in application scenarios with high-dimensional input and large state spaces. However, deep reinforcement learning faces some limitations such as a lack of interpretability, inefficient initial training, and a cold start. To address these issues, this study proposes a dynamic decision framework combining explicit knowledge reasoning with deep reinforcement learning. The framework embeds prior knowledge in intelligent agent training via explicit knowledge representation and lets the knowledge reasoning results intervene in the agent’s decisions during reinforcement learning, so as to improve the training efficiency and the model’s interpretability. The explicit knowledge in this study is categorized into two kinds, namely, heuristic acceleration knowledge and evasive safety knowledge. The heuristic acceleration knowledge intervenes in the decision of the agent in the initial training to speed up the training, while the evasive safety knowledge keeps the agent from making catastrophic decisions to keep the training process stable. The experimental results show that the proposed framework significantly improves the training efficiency and the model’s interpretability under different application scenarios and reinforcement learning algorithms.
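One simple way to picture the two kinds of knowledge is as rules that wrap the learned action selection: a safety rule removes catastrophic actions, and a heuristic rule suggests an action during early training. The sketch below is only an assumption about how such an intervention could look; the function `choose_action`, the rule signatures, the `warmup_episodes` cutoff, and the toy states are hypothetical and not taken from the study.

```python
def choose_action(state, q_values, actions, episode, unsafe, heuristic,
                  warmup_episodes=100):
    """Pick an action, letting explicit rules intervene in the learned policy.

    unsafe(state, action) -> bool: evasive safety rule (hypothetical).
    heuristic(state) -> action or None: acceleration rule (hypothetical).
    q_values: {(state, action): estimated value}; missing entries count as 0.
    """
    # Evasive safety knowledge: never pick an action flagged as catastrophic.
    allowed = [a for a in actions if not unsafe(state, a)]
    if not allowed:
        allowed = list(actions)  # no safe action known: fall back to all
    # Heuristic acceleration knowledge: only intervenes in early training.
    if episode < warmup_episodes:
        suggestion = heuristic(state)
        if suggestion in allowed:
            return suggestion
    # Otherwise act greedily on the learned values among the allowed actions.
    return max(allowed, key=lambda a: q_values.get((state, a), 0.0))

actions = ["left", "right", "forward"]
unsafe = lambda s, a: s == "cliff_ahead" and a == "forward"
heuristic = lambda s: "forward"
q = {("cliff_ahead", "forward"): 9.0, ("cliff_ahead", "left"): 1.0}

early_safe = choose_action("road", q, actions, 0, unsafe, heuristic)
early_risky = choose_action("cliff_ahead", q, actions, 0, unsafe, heuristic)
```

In the second call the heuristic suggestion and the highest learned value both point to `forward`, but the safety rule removes it, so the agent picks the best remaining action.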
2023, 34(8):3836-3852.DOI: 10.13328/j.cnki.jos.006594
Abstract:The realization of safe and efficient behavior decision-making has become a challenging issue for autonomous driving. As autonomous driving industries develop vigorously, industrial professionals and academic members have proposed many autonomous driving behavior decision-making approaches. However, due to the influence of environmental uncertainties as well as requirements for effectiveness and high security of the decision, existing approaches fail to take all these factors into account. Therefore, this study proposes an autonomous driving behavior decision-making approach with the RoboSim model based on the Bayesian network. First, based on domain ontology, the study analyzes the semantic relationship between elements in autonomous driving scenarios and predicts the intention of dynamic entities in scenarios by the LSTM model, so as to provide driving scenario information for establishing the Bayesian network. Next, the autonomous driving behavior decision-making in specific scenarios is inferred by the Bayesian network, and the state transition of the RoboSim model is employed to carry the dynamic execution of behavior decision-making and eliminate the redundant operation of the Bayesian network, thus improving the efficiency of decision-making. The RoboSim model is platform-independent. In addition, it can simulate the decision-making cycle and support validation technologies in different forms. To ensure the safety of the behavior decision-making, this study uses a model checking tool UPPAAL to verify and analyze the RoboSim model. Finally, based on lane change and overtaking cases, this study validates the feasibility of the proposed approach and provides a feasible way to achieve safe and efficient autonomous driving behavior decision-making.
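The inference step can be illustrated with the smallest possible Bayesian network, a hidden intention with one observed child, where the posterior follows directly from Bayes' rule. The network structure, the probabilities, the 0.5 decision threshold, and the action names below are all hypothetical; the study's actual network, built from a domain ontology and LSTM intention predictions, is not described in enough detail in the abstract to reproduce.

```python
def posterior(prior, likelihood, evidence):
    """P(hidden | evidence) for a two-node network hidden -> evidence."""
    joint = {h: prior[h] * likelihood[h][evidence] for h in prior}
    norm = sum(joint.values())
    return {h: p / norm for h, p in joint.items()}

# Hypothetical intention of a neighbouring vehicle and an observed turn signal.
prior = {"cut_in": 0.2, "keep_lane": 0.8}
likelihood = {
    "cut_in": {"signal_on": 0.9, "signal_off": 0.1},
    "keep_lane": {"signal_on": 0.1, "signal_off": 0.9},
}

belief = posterior(prior, likelihood, "signal_on")
# Illustrative decision rule: slow down when a cut-in is more likely than not.
decision = "slow_down" if belief["cut_in"] > 0.5 else "keep_speed"
```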
2023, 34(9):4114-4131.DOI: 10.13328/j.cnki.jos.006637
Abstract:Comment generation for software code has been an important research task in the field of software engineering in the past few years. Several research efforts have achieved impressive results on open-source datasets that contain copious code-comment pairs. In the practice of software enterprises, however, the code to be commented usually belongs to a software project library, and it must first be decided on which code lines comment generation can achieve better performance; moreover, the code snippets to be commented have different lengths and granularity. Thus, a code comment generation method is required that integrates commenting decisions with comment generation and is resistant to noise. To this end, CoComment, a software project-oriented code comment generation approach, is proposed in this study. This approach automatically extracts domain-specific basic concepts from software project documents and then uses code parsing and text matching to propagate and expand these concepts. On this basis, automatic code commenting decisions are made by locating code lines or segments related to these concepts, and corresponding natural language comments with high readability are generated by fusing concepts and contexts with templates. Comparative experiments are conducted on three enterprise software projects containing more than 46,000 manually annotated code comments. The experimental results demonstrate that the proposed approach can effectively make code commenting decisions and generate more helpful code comments than existing methods, which provides an integrated solution to code commenting decisions and comment generation for software projects.
2022, 33(3):1128-1140.DOI: 10.13328/j.cnki.jos.006099
Abstract:In recent years, the application of information technology and electronic medical records in medical institutions has become more and more widespread, which has resulted in a large amount of medical data in hospital databases. Decision trees are widely used in medical data analysis because of their high classification precision, fast calculation speed, and simple and easily understood classification rules. However, due to the inherent high-dimensional feature space and high feature redundancy of medical data, the classification precision of traditional decision trees is low. Based on this, this study proposes a hybrid feature selection algorithm (GRRGA) that combines information gain ratio ranking grouping and a group evolution genetic algorithm. First, the information gain ratio based filtering algorithm is used to sort the original feature set; then, the ranked features are grouped according to the density principle of equal division; finally, a group evolution genetic algorithm is used to perform a search on the ranked feature groups. There are two kinds of evolution methods in the group evolution genetic algorithm, in-population and out-population, which use two different fitness functions to control the evolution process. The experimental results show that the average precision index of the GRRGA algorithm on the six UCI datasets is 87.13%, which is significantly better than the traditional feature selection algorithm. In addition, compared with the other two classification algorithms, the feature selection performance of the GRRGA algorithm proposed in this study is optimal. More importantly, the precision index of the bagging method on the arrhythmia and cancer medical datasets is 84.7% and 78.7% respectively, which fully proves the practical application significance of the proposed algorithm.
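The first two steps (ranking features by information gain ratio, then dividing the ranked list into groups) can be sketched as follows. The gain ratio is the standard definition for discrete features; the grouping into equal-sized consecutive chunks is an assumption about what "equal division" means here, and the genetic search stage is omitted. Function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())

def gain_ratio(feature, labels):
    """Information gain ratio of a discrete feature with respect to labels."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == value]
        conditional += len(subset) / total * entropy(subset)
    split_info = entropy(feature)
    if split_info == 0.0:
        return 0.0  # a constant feature carries no information
    return (entropy(labels) - conditional) / split_info

def rank_and_group(features, labels, n_groups):
    """Sort feature names by gain ratio (descending) and cut into groups."""
    ranked = sorted(features,
                    key=lambda name: gain_ratio(features[name], labels),
                    reverse=True)
    size = math.ceil(len(ranked) / n_groups)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

# Toy data: four samples, four discrete features.
labels = [1, 1, 0, 0]
features = {
    "f_perfect": [1, 1, 0, 0],
    "f_noise": [0, 1, 0, 1],
    "f_const": [5, 5, 5, 5],
    "f_partial": [0, 0, 0, 1],
}
groups = rank_and_group(features, labels, n_groups=2)
```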
2022, 33(5):1774-1799.DOI: 10.13328/j.cnki.jos.006561
Keywords:command and control information system; self-adaptation decision-making; search-based software engineering; parallel genetic algorithm; POST-optimization
Abstract:The command and control information system (command and control system) runs in a dynamically changing and complex environment with constantly changing mission requirements. A self-adaptation decision-making method is urgently needed to dynamically generate the optimal strategy for adjusting the system, so as to adapt to changes in the environment or missions and ensure long-term stable operation. At present, as the command and control system itself and its operating environment continue to become more complex, self-adaptation decision-making methods need an online trade-off decision-making ability to deal with multiple unexpected changes, so as to avoid conflicting adjustment consequences or failure to respond to unknown situations in a timely manner. Nevertheless, current command and control systems mostly adopt self-adaptation decision-making methods that are based on prior knowledge and respond to single changes, which cannot fully meet this capability requirement. Therefore, this study proposes a self-adaptation decision-making method for the command and control system based on parallel search optimization. This method uses search-based software engineering ideas to model the self-adaptation decision-making problem as a search optimization problem, and uses the genetic particle swarm algorithm to make online trade-offs among multiple changes that occur at the same time. In addition, in order to guarantee search efficiency and support strategy selection when this method is applied in the command and control system, this study uses a parallel genetic algorithm and POST-optimization theory to parallelize the self-adaptation decision-making method and establish a multi-index strategy sorting method to ensure the practicality of the method.