Abstract: In certain designs and applications of practical lattice-based cryptography, a specialized variant of the LWE problem, in which the public matrix is sampled from a non-uniform distribution, is required to establish the security of the corresponding cryptographic schemes. Recently, a formal definition of LWE problems with semi-uniform seeds was introduced, and the hardness of Euclidean, ideal, and module lattice-based LWE problems with semi-uniform seeds was proved through reduction roadmaps similar to those employed in the hardness proofs of entropic LWE problems. However, the known reduction introduces significant losses in the dimensions and in the Gaussian parameters of the errors. Moreover, additional non-standard assumptions are required to demonstrate the hardness of LWE problems with semi-uniform seeds over rings. In this study, a tighter reduction is proposed for LWE problems with semi-uniform seeds by incorporating modified techniques from the hardness proofs of Hint-LWE problems. The proposed reduction is largely unaffected by the algebraic structure of the underlying problems and applies uniformly to Euclidean, ideal, and module lattice-based LWE problems with semi-uniform seeds. The hardness of these LWE problems can be established from standard LWE assumptions without any additional non-standard assumptions. Furthermore, the dimension of the corresponding LWE problems remains unchanged, and the reduction introduces only minimal losses in the Gaussian parameters of the errors.
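For orientation, the display below recalls a generic LWE sample and the single change made by the semi-uniform-seed variant discussed above; the notation is illustrative and not taken from the cited work.

    \[
      (\mathbf{A},\ \mathbf{b} = \mathbf{A}\mathbf{s} + \mathbf{e} \bmod q),
      \qquad \mathbf{A}\in\mathbb{Z}_q^{m\times n},\ \mathbf{s}\in\mathbb{Z}_q^{n},\ \mathbf{e}\leftarrow D_{\mathbb{Z}^m,\sigma}
    \]
    \[
      \text{standard LWE: } \mathbf{A} \text{ uniform over } \mathbb{Z}_q^{m\times n};
      \qquad
      \text{semi-uniform seed: } \mathbf{A}\leftarrow \chi_{\mathbf{A}},\ \text{a prescribed non-uniform distribution.}
    \]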
Abstract: SPHINCS+ is a stateless digital signature scheme designed using hash functions and has been proven resistant to quantum computing attacks. However, its wide practical application is constrained by the large size of the generated signature values. To address the lengthy signature values generated by the WOTS+ one-time signature scheme within SPHINCS+, a compact one-time signature scheme, SM3-OTS, based on the Chinese cryptographic algorithm SM3, is proposed in this study. The proposed scheme uses the binary and hexadecimal information of the message digest as the indices of node positions in the first 32 hash chains and the last 16 hash chains, respectively. This approach effectively reduces the key length and the signature length compared to traditional hash-based one-time signature schemes. Compared to WOTS+ in SPHINCS+, Balanced WOTS+ in SPHINCS-α, and WOTS+C in SPHINCS+C, the proposed SM3-OTS shortens the signature length by about 29%, 27%, and 26%, respectively, with a significant improvement in signing performance. In addition, by adopting the SM3 algorithm, SM3-OTS exhibits strong resistance to quantum attacks while maintaining well-balanced overall performance.
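The chunk-based index derivation that hash-chain one-time signatures such as SM3-OTS build on can be sketched as follows; the 4-bit chunking and the use of SHA-256 as a stand-in for SM3 are illustrative assumptions and do not reproduce SM3-OTS's exact 32+16-chain layout.

    import hashlib

    def digest_to_chain_indices(message: bytes, chunk_bits: int = 4):
        # Split a 256-bit digest into fixed-width chunks; each chunk value selects how
        # far to walk along the corresponding hash chain. SHA-256 stands in for SM3 here.
        digest = hashlib.sha256(message).digest()              # 32 bytes = 256 bits
        bits = ''.join(f'{b:08b}' for b in digest)             # binary view of the digest
        return [int(bits[i:i + chunk_bits], 2)                 # 4-bit chunks -> 64 indices in [0, 15]
                for i in range(0, len(bits), chunk_bits)]

    def walk_chain(secret: bytes, steps: int) -> bytes:
        # Advance a hash chain `steps` times from a per-chain secret node.
        node = secret
        for _ in range(steps):
            node = hashlib.sha256(node).digest()
        return node

    indices = digest_to_chain_indices(b"example message")
    print(len(indices), indices[:8])                           # 64 chain indices derived from the digest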
Abstract: Kyber, a key encapsulation mechanism based on lattice problems, was the first to be standardized by the National Institute of Standards and Technology (NIST) in 2023. Kyber-AKE, a weakly forward-secure authenticated key exchange (AKE) protocol, was constructed by the designers of Kyber and derives session keys in two rounds using three IND-CCA secure key encapsulation mechanisms. This study introduces Kyber-PFS-AKE, a newly proposed authenticated key exchange protocol. Kyber-PFS-AKE uses only IND-CPA secure public-key encryption and handles decryption errors of the IND-CPA secure encryption with the re-encryption technique from the FO transformation, thus simplifying the design of the post-quantum Kyber-AKE. A rigorous proof demonstrates that certain operations in the Kyber-AKE protocol are redundant; eliminating these redundancies yields a simpler and more efficient design. The session key indistinguishability and perfect forward security of Kyber-PFS-AKE are formally proven in the eCK-PFS-PSK model. The proposed Kyber-PFS-AKE is implemented using Kyber-768.PKE with 165-bit quantum security. Experimental results show that, compared to Kyber-AKE, the computation time of the initiator is reduced by 38% and that of the responder by 30%.
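The re-encryption technique referenced above follows the standard Fujisaki-Okamoto pattern: decrypt, re-derive the encryption randomness from the recovered message, re-encrypt deterministically, and compare with the received ciphertext. The sketch below assumes a generic PKE interface and hash choices; it is not the Kyber API.

    import hashlib
    from typing import Callable

    def fo_decapsulate(sk: bytes, pk: bytes, ct: bytes,
                       decrypt: Callable[[bytes, bytes], bytes],
                       encrypt: Callable[[bytes, bytes, bytes], bytes]) -> bytes:
        # Schematic FO-style decapsulation with the re-encryption check; the PKE
        # interface and hash choices are generic assumptions, not the Kyber API.
        m = decrypt(sk, ct)                                   # candidate message
        coins = hashlib.sha3_256(m + pk).digest()             # re-derive encryption randomness from m (and pk)
        if encrypt(pk, m, coins) != ct:                       # deterministic re-encryption must match
            raise ValueError("reject: ciphertext not honestly generated")
        return hashlib.sha3_256(b"key" + m + ct).digest()     # derive the session key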
Abstract: Recognized as a crucial privacy-protecting technology, group signatures provide robust anonymity assurances for users. However, conventional group signature schemes often rely on group managers capable of revealing the identities of signers, a feature that contradicts the decentralized nature of blockchain and fails to meet stricter privacy demands in certain applications. To address these limitations, this study introduces a group signature scheme with user-controlled linkability and verifier conditional revocation, inspired by double-authentication-preventing signatures and existing linkable and revocable group signatures. The proposed scheme achieves an optimal balance between user privacy and platform oversight, with a concrete instantiation constructed on lattices. Under the random oracle model, the scheme is demonstrated to satisfy the properties of selfless anonymity, traceability, and non-frameability. Performance evaluations indicate that both time and communication costs remain within acceptable limits, ensuring feasibility for practical deployment. In addition, a post-quantum secure medical data sharing system has been designed, integrating the proposed group signature scheme with blockchain technology.
Abstract: Digital signature algorithms play a vital role in network security infrastructure. The majority of current digital signature schemes rely on RSA and ECC. However, with the rapid advancement of quantum computing, traditional public-key cryptographic schemes face increasing security risks. As a result, researching and deploying cryptographic schemes capable of resisting quantum attacks has become a critical research direction. Following multiple rounds of evaluation and analysis, the National Institute of Standards and Technology (NIST) announced the post-quantum digital signature standard ML-DSA in August 2024, with Dilithium as its core algorithm. In light of the high-dimensional polynomial matrix operations characteristic of Dilithium, this study proposes various optimization strategies based on the FPGA platform. These include multifunctional systolic array operation units with configurable parameters, dedicated polynomial parallel sampling modules, reconfigurable storage units designed for multiple parameter sets, and high-parallelism timing state machines tailored for complex multi-module architectures. These optimizations aim to overcome performance bottlenecks and achieve enhanced signature operation efficiency, ultimately realizing a digital signature hardware architecture that supports three security levels simultaneously. The proposed hardware architecture is deployed and evaluated on the Xilinx Artix-7 FPGA platform and compared against existing implementations. The results demonstrate that the proposed design achieves improvements in signature operation efficiency by factors of 7.4, 8.3, and 5.6 across the three security levels, respectively. This advancement provides a robust performance foundation for quantum-resistant digital signature applications and offers valuable insights for the engineering and practical deployment of lattice-based cryptographic schemes.
Abstract: With the widespread application of blockchain technology, authenticated storage, as a core component, plays a crucial role in ensuring data integrity and consistency. In traditional blockchain systems, authenticated storage is maintained through a series of cryptographic algorithms, which verify transactions and preserve the integrity of ledger states. However, the advent of quantum computers has introduced a significant threat to existing blockchain authenticated storage technologies, raising the risk of data breaches and compromised integrity. The most advanced authenticated storage schemes primarily rely on the bilinear Diffie-Hellman assumption, which is susceptible to quantum attacks. To enhance the security and efficiency of authenticated storage, this study introduces a stateless hash-based signature mechanism and proposes EQAS, a quantum-resistant blockchain authenticated storage scheme. The proposed scheme decouples data storage from data authentication, utilizes random forest chains to efficiently generate commitment proofs, and employs a hypertree structure to perform efficient authentication. Security analyses show that EQAS is resistant to quantum algorithm attacks. Comparative experiments with other authenticated storage schemes demonstrate the superior efficiency and performance of EQAS in handling blockchain authenticated storage tasks.
Abstract: With the development of quantum computers, public blockchains relying on traditional elliptic curve digital signatures are expected to face disruptive security risks. A common solution involves the application of post-quantum digital signature algorithms within blockchain systems. For public blockchains utilizing the proof-of-work consensus mechanism, ensuring sufficient computing power is regarded as a critical foundation for security. Energy conservation and the maximization of computing power support have been identified as key research directions. Therefore, a post-quantum blockchain system featuring diversified computing power and autonomous post-quantum signatures is proposed in this study. The Dilithium signature scheme, recommended by the National Institute of Standards and Technology (NIST) as a preferred and general-purpose post-quantum signature standard, relies on the security of the MLWE and MSIS problems in power-of-two cyclotomic rings. However, similar to Bitcoin's early adoption of the ECDSA standard without adherence to the NIST-specified elliptic curves, the rich algebraic structure of power-of-two cyclotomic rings poses greater risks and uncertainties regarding long-term security. To address this, a more conservative and secure approach based on post-quantum lattice-based cryptography with less algebraic structure is adopted. In this study, a Dilithium variant, Dilithium-Prime, based on a large-Galois-group prime-degree prime-ideal field, is proposed as the signature algorithm for the post-quantum blockchain system to ensure high-confidence transaction signing with post-quantum security. To maximize the computing power support for the post-quantum public blockchain and address the current issue of declining mining pool and miner income, a multi-parent chain auxiliary proof-of-work consensus mechanism is introduced. This mechanism enables computing power to be requested from all miners performing SHA-256 and Scrypt hash calculations to assist in consensus without increasing the workload of existing miners and mining pools. As a result, the source of computing power for the post-quantum blockchain is expanded, and the utilization rate of existing mining pools and miners is improved. In addition, a block and transaction structure, along with a difficulty adjustment algorithm tailored for this multi-parent chain auxiliary proof-of-work consensus mechanism, is proposed. This system stabilizes the block production ratio and production time across different levels of computing power and effectively responds to extreme cases, such as sudden surges or reductions in computing power, ensuring the system's robustness.
Abstract: In the era of artificial intelligence, efficiently completing the pre-training of large language models to meet requirements for scalability, performance, and stability presents a critical challenge. These systems leverage accelerators and high-speed network interfaces to execute parallel tensor computations and communications, significantly enhancing training efficiency. However, these advancements bring a series of unresolved system design challenges. Based on an analysis of the pre-training process, this study first outlines the training procedures and workload characteristics of large language models. It then reviews system technologies from the perspectives of scalability, performance, and reliability, covering their classifications, underlying principles, current research progress, and key challenges. Finally, this study provides an in-depth analysis of the broader challenges facing large language model pre-training systems and discusses potential directions for future development.
Abstract: In distributed system environments, ensuring high availability of databases poses multiple challenges, including network latency, node failures, and the maintenance of data consistency. Addressing these challenges requires not only advanced technical solutions but also flexible architectural design and refined management strategies. High availability plays a crucial role in maintaining data integrity and consistency, as well as in improving system performance and enhancing fault tolerance. This study provides a comprehensive review of the current challenges and issues associated with high availability in distributed databases. Important concepts, theoretical foundations, and technical approaches are examined, and the current state of research is analyzed across three levels: system and network, data and computing, and application and service. The study aims to deepen the understanding of the difficulties to be addressed and the existing solutions while offering recommendations for future research and technological advancements in the field.
Abstract: Root cause analysis refers to identifying the underlying factors that lead to abnormal failures in complex systems. Causal-based backward reasoning methods, founded on structural causal models, are among the most effective approaches for implementing root cause analysis. Most current causality-driven root cause analysis methods require the prior discovery of the causal structure from data as a prerequisite, making the effectiveness of the analysis heavily dependent on the success of this causal discovery task. Recently, score function-based intervention identification has gained significant attention. By comparing the variance of score function derivatives before and after interventions, this approach detects the set of intervened variables, showing potential to overcome the constraints of causal discovery in root cause analysis. However, mainstream score function-based intervention identification is often limited by the score function estimation step. The analytical solutions used in existing methods struggle to effectively model the real distribution of high-dimensional complex data. In light of recent advances in data generation, this study proposes a diffusion model-guided root cause analysis strategy. Specifically, the proposed method first estimates the score functions corresponding to the data distributions before and after the anomaly using diffusion models. It then identifies the set of root cause variables by observing the variance of the first-order derivatives of the overall score function after weighted fusion. Furthermore, to address the computational overhead introduced by the pruning operation, an acceleration strategy is proposed that estimates the score function from the initially trained diffusion model, avoiding the cost of re-training the diffusion model after each pruning step. Experimental results on simulated and real-world datasets demonstrate that the proposed method accurately identifies the set of root cause variables. In addition, ablation studies show that the guidance provided by the diffusion model is critical to the improved performance.
Abstract: GitHub is one of the most popular open-source project management platforms. Due to the need for team collaboration, GitHub introduced an issue tracking function to facilitate project users in submitting and tracking problems or new feature requests. When resolving issues, contributors of open-source projects typically need to execute failure-reproducing test cases to reproduce the problems mentioned in the issue and verify whether the issue has been resolved. However, empirical research conducted on the SWE-bench Lite dataset reveals that nearly 90% of issues are submitted without failure-reproducing test cases, forcing contributors to write additional failure-reproducing test cases when resolving the issues and imposing an extra workload. Existing failure-reproducing test case generation methods usually rely on stack trace information, but GitHub issues do not explicitly require such information. Therefore, this study proposes a failure-reproducing test case generation method based on a large language model, aimed at automatically generating failure-reproducing test cases for GitHub issues, assisting issue contributors in reproducing, understanding, and verifying issues, and improving the efficiency of issue resolution. This method first retrieves diverse code context information related to the issue, including error root functions, import statements, and test case examples, and then constructs precise prompts to guide the large language model in generating effective failure-reproducing test cases. This study conducts comparative and ablation experiments to verify the effectiveness of this method in generating failure-reproducing test cases for GitHub issues.
Abstract: In privacy-preserving inference using convolutional neural network (CNN) models, previous research has employed methods such as homomorphic encryption and secure multi-party computation to protect client data privacy. However, these methods typically suffer from excessive prediction time overhead. To address this issue, an efficient privacy-preserving CNN prediction scheme is proposed. This scheme exploits the different computational characteristics of the linear and non-linear layers in CNNs and designs a matrix decomposition computation protocol and a parameterized quadratic polynomial approximation for the ReLU activation function. This enables efficient and secure computation of both the linear and non-linear layers, while mitigating the prediction accuracy loss caused by the approximations. The computations in both the linear and non-linear layers can be performed using lightweight cryptographic primitives, such as secret sharing. Theoretical analysis and experimental results show that, while ensuring security, the proposed scheme improves prediction speed by a factor of 2 to 15, with only about a 2% loss in prediction accuracy.
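The parameterized quadratic approximation of the ReLU activation mentioned above can be pictured as a least-squares fit of a·x² + b·x + c to ReLU over a bounded interval; the interval, fitting method, and all numeric choices below are illustrative assumptions rather than the parameters of the proposed scheme.

    import numpy as np

    def fit_quadratic_relu(lo: float = -6.0, hi: float = 6.0, n: int = 1001):
        # Least-squares fit of a*x^2 + b*x + c to ReLU on [lo, hi];
        # the interval and the fitting method are illustrative assumptions.
        x = np.linspace(lo, hi, n)
        y = np.maximum(x, 0.0)                   # exact ReLU targets
        a, b, c = np.polyfit(x, y, deg=2)        # coefficients, highest degree first
        return a, b, c

    a, b, c = fit_quadratic_relu()
    x = np.linspace(-6.0, 6.0, 5)
    print(np.round(a * x**2 + b * x + c, 3))     # polynomial surrogate values
    print(np.maximum(x, 0.0))                    # exact ReLU values for comparison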
Abstract: The black-box vulnerability scanner is an essential tool for Web application vulnerability detection, capable of identifying potential security threats effectively before a Web application is launched, thus enhancing the overall security of the application. However, most current black-box scanners primarily collect the attack surface through user operation simulation and regular expression matching. The simulation of user operations is vulnerable to interception by input validation mechanisms and struggles with handling complex event operations, while regular expression matching is ineffective in processing dynamic content. As a result, the scanner cannot effectively address hidden attack surfaces within JavaScript code or dynamically generated attack surfaces, leading to suboptimal vulnerability detection in some Web applications. To resolve these issues, this study proposes a JavaScript Exposure Scanner (JSEScan), a vulnerability scanner enhancement framework based on JavaScript code analysis. The framework integrates static and dynamic code analysis techniques, bypassing form validation and event-triggering restrictions. By extracting attack surface features from JavaScript code, JSEScan identifies attack surfaces and synchronizes them across multiple scanners, enhancing their vulnerability detection capabilities. The experimental results demonstrate that JSEScan increases coverage by 81.02% to 242.15% compared to using a single scanner and uncovers an additional 239 security vulnerabilities when compared to multiple scanners working concurrently, showing superior attack surface collection and vulnerability detection capabilities.
Abstract: As the foundation of AI, deep learning frameworks play a vital role in driving the rapid progress of AI technologies. However, due to the lack of unified standards, compatibility across different frameworks remains limited. Faithful model transformation enhances interoperability by converting a source model into an equivalent model in the target framework. However, the large number and diversity of deep learning frameworks, combined with the increasing demand for custom frameworks, lead to high conversion costs. To address this issue, this study proposes an automatic AI source code migration method between frameworks based on a domain knowledge graph. The method integrates domain knowledge graphs and abstract syntax trees to systematically manage migration challenges. First, the source code is transformed into a framework-specific abstract syntax tree, from which general dependency information and operator-specific details are extracted. By applying the operator and parameter mappings stored in the domain knowledge graph, the code is migrated to the target framework, generating equivalent target model code while significantly reducing engineering complexity. Compared with existing code migration tools, the proposed method supports mutual migration among widely used deep learning frameworks, such as PyTorch, PaddlePaddle, and MindSpore. The approach has proven to be both mature and reliable, with part of its implementation open-sourced in Baidu’s official migration tool, PaConvert.
Abstract: With the rapid development of merchant review websites, the volume of content on these websites has increased significantly, making it challenging for users to quickly find valuable reviews. This study introduces a new task, “multimodal customized review generation”. The task aims to generate customized reviews for specific users about products they have not yet reviewed, thus providing valuable insights into these products. To achieve this goal, this study explores a multimodal review generation framework based on a pre-trained language model. Specifically, a multimodal pre-trained language model is employed, which takes product images and user preferences as inputs. The visual and textual features are then fused to generate customized reviews. Experimental results demonstrate that the proposed model is effective in generating high-quality customized reviews.
Abstract: As core programmable components of blockchain, smart contracts are responsible for asset management and the execution of complex business logic, forming the foundation of decentralized finance (DeFi) protocols. However, with the rapid advancement of blockchain technology, security issues related to smart contracts and DeFi protocols have become increasingly prominent, attracting numerous attackers seeking to exploit vulnerabilities for illicit gains. In recent years, several major security incidents involving smart contracts and DeFi protocols have highlighted the importance of vulnerability detection research, making it a critical area for security defense. This study systematically reviews existing literature and proposes a comprehensive framework for research on vulnerability detection in smart contracts and DeFi protocols. Specifically, vulnerabilities and detection techniques are categorized and analyzed for both domains. For smart contracts, the study focuses on the application of large language models (LLMs) as primary detection engines and their integration with traditional methods. For DeFi protocols, it categorizes and details various protocol-level vulnerabilities and their detection methods, analyzing the strengths and limitations of detection strategies applied before and after attacks, thus addressing gaps in existing reviews on DeFi vulnerability detection. Finally, this study summarizes the challenges faced by current detection approaches and outlines future research directions, aiming to provide new insights and theoretical support for the security detection of smart contracts and DeFi protocols.
Abstract: Contrastive learning is a self-supervised learning technique widely used in various fields such as computer vision and natural language processing. Graph contrastive learning (GCL) refers to methods that apply contrastive learning techniques to graph data. A review is presented on the basic concepts, methods, and applications of graph contrastive learning. First, the background and significance of GCL, as well as its basic concepts on graph data, are introduced. Then, the mainstream GCL methods are elaborated in detail, including methods with different graph data augmentation strategies, methods with different graph neural network (GNN) encoder structures, and methods with different contrastive loss objectives. Finally, three research ideas for GCL are proposed. Research findings demonstrate that graph contrastive learning is an effective approach for addressing various downstream tasks, including node classification and graph classification.
Abstract: Code comments serve as natural-language descriptions of the source code functionality, helping developers quickly understand the code’s semantics and functionality, thus improving software development and maintenance efficiency. However, writing and maintaining code comments is time-consuming and labor-intensive, often leading to issues such as absence, inconsistency, and obsolescence. Therefore, the automatic generation of comments for source code has attracted significant attention. Existing methods typically use information retrieval techniques or deep learning techniques for automatic code comment generation, but both have their limitations. Some research has integrated these two techniques, but such approaches often fail to effectively leverage the advantages of both methods. To address these issues, this study proposes a semantic reranking-based code comment generation method, SRBCS. SRBCS employs a semantic reranking model to rank and select comments generated by various approaches, thus integrating multiple methods and maximizing their respective strengths in the comment generation process. We compared SRBCS with 11 code comment generation approaches on two subject datasets. Experimental results demonstrate that SRBCS effectively integrates different approaches and outperforms existing methods in code comment generation.
Abstract: The safety of autonomous driving systems (ADSs) is crucial for the implementation of autonomous vehicles (AVs). Therefore, ADSs must undergo thorough evaluation before being released and deployed publicly. Generating diverse, safety-critical test scenarios is a key task for ADS testing. Existing methods for generating ADS test scenarios include reproducing real-world traffic accidents or using search-based techniques. However, accident-based scenarios often fail to uncover safety violations in ADSs due to the gap between human driving and ADS behavior, while search-based approaches tend to produce highly similar scenarios because of the limitations of the search algorithms. To address these issues, this study proposes LEADE, a road network modeling-based safety-critical scenario generation and adaptive evolution method for ADSs. Specifically, it constructs abstract scenarios from user test requirements and generates concrete scenarios through road network modeling. LEADE then employs an improved adaptive evolutionary search to generate diverse safety-critical scenarios for testing the ADS. LEADE is implemented and evaluated on an industrial-grade full-stack ADS platform, Baidu Apollo. Experimental results demonstrate that LEADE can effectively and efficiently generate safety-critical scenarios and expose 10 diverse safety violations of Apollo. LEADE outperforms two state-of-the-art search-based ADS testing techniques by identifying 4 new types of safety-critical scenarios on the same roads.
Abstract: Key classes are a crucial starting point for understanding complex software, contributing to the optimization of documentation and the compression of reverse-engineered class diagrams. Although many effective key class identification methods have been proposed, three major limitations remain: 1) software networks, which are graphs representing software elements and their dependencies, often include elements that are never or rarely executed at runtime; 2) networks constructed through dynamic analysis are frequently incomplete, potentially omitting truly key classes; and 3) most existing approaches consider only the effect of direct coupling between classes, while ignoring the influence of indirect (non-contact) coupling and the diversity of degree distribution among neighboring nodes. To address these issues, a key class identification approach is proposed that integrates dynamic analysis with a gravitational formula. First, a class coupling network (CCN) is constructed using static analysis to represent classes and their coupling relationships. Second, a gravitational entropy (GEN) metric is introduced to quantify class importance by jointly considering direct and indirect couplings in the CCN and the degree-distribution diversity of neighboring nodes. Third, classes are ranked in descending order based on their GEN values to obtain a preliminary ranking. Finally, dynamic analysis is performed to capture actual runtime interactions between classes, which are used to refine the preliminary results. A threshold is applied to filter out non-key classes, producing a final set of candidate key classes. Experimental results on eight open-source Java projects demonstrate that the proposed method significantly outperforms eleven baseline approaches when considering no more than the top 15% (or top 25) of nodes. The integration of dynamic analysis notably improves the performance of the proposed method. Moreover, the choice of weighting schemes for coupling types has a minimal impact on performance, and the overall computational efficiency is acceptable.
Abstract: Side-channel analysis is a technique that extracts leaked information generated during hardware or software execution to compromise cryptographic keys. Among various approaches, profiling side-channel analysis has been proven to be a powerful method for attacking cryptographic systems. In recent years, the integration of artificial intelligence technology into profiling side-channel analysis has significantly enriched attack strategies and improved efficiency. During the profiling phase, leakage information related to the target device is typically collected by accessing a cloned device. However, practical scenarios often involve discrepancies between the cloned and target devices. Most existing studies rely on a single device for training and validation, resulting in methods that are highly environment-dependent, with limited applicability and poor portability. This study focuses on the portability challenges encountered in complex application scenarios. Challenges arising from variations in parameter settings, algorithm implementations, and hardware differences are analyzed in detail. Solutions and analysis results proposed in recent years are systematically reviewed. Based on this survey, current limitations in portability research on side-channel analysis are summarized, and potential future directions are discussed.
Abstract: In recent years, cryptographic chips have developed rapidly. However, they are also facing a significant threat from non-invasive attacks. Although both international and domestic standards provide testing methods for non-invasive attacks, these standards are formulated for public algorithms and are not applicable to private algorithms, which still present considerable security risks. This study proposes a detection framework for private-algorithm cryptographic chips, which includes three components: timing analysis tests, simple power/electromagnetic analysis tests, and differential power/electromagnetic analysis tests. For the timing analysis test, a method based on average denoising is adopted, which significantly improves the accuracy of execution time measurements. Methods based on visual observation and cross-correlation analysis are presented for simple power/electromagnetic analysis tests. Finally, for differential power analysis, TVLA-1 and TVLA-2 are employed to detect leakages from various sources and evaluate the vulnerabilities of private-algorithm cryptographic chips to differential power attacks. The proposed framework serves as an effective supplement to traditional non-invasive attack detection, significantly expanding its application range. To verify the effectiveness of the framework, black-box experiments are conducted on several cryptographic chips. The results demonstrate that the framework can effectively assess the resilience of private-algorithm cryptographic chips against non-invasive attacks.
Abstract: Bug localization is a critical aspect of software maintenance, and improving the effectiveness and efficiency of automated fault localization has become a central research focus in software engineering. With the surge in open-source software and the increasing demand for software hot updates, automated bug localization focused on change sets has become a key tool for software quality assurance. Traditional bug localization methods based on information retrieval can only represent textual information and fail to fully account for structural and semantic changes within change sets, making them unsuitable for direct application in change set bug localization tasks. Therefore, this study proposes a graph Transformer-based method for change set bug localization, which uses abstract syntax trees to represent change information and capture code structure changes. The method represents both the local and global semantic information of the changed code and bug reports, enabling the matching and localization of bug information within change sets. To validate the effectiveness of the proposed method, it is evaluated on bug reports and the corresponding changes from six datasets of bug-inducing change sets. Compared to state-of-the-art models, the proposed method demonstrates improvements of 11.4% and 12.9% in the MAP and MRR metrics, respectively, validating the efficacy of the proposed approach.
Abstract: Label-specific features serve as an effective strategy for addressing multi-label classification tasks. By tailoring discriminative features to the individual preferences of each label, such features enhance the generalization capability of classification models. Existing methods typically focus on manipulating features to extract those relevant to label discrimination. Rather than following this conventional approach, this study explores a novel perspective based on feature invariance for label-specific feature learning. Specifically, invariance is injected into classifiers with respect to label-irrelevant features by intentionally manipulating these features for each class label. Accordingly, an invariance-based label-specific feature learning method, termed INVA, is proposed. INVA estimates the feature covariance matrix for each label to capture intra-class variation, thus identifying label-irrelevant features. Classifiers are then endowed with invariance to these features by solving a perturbation risk minimization problem. Furthermore, an upper bound of the perturbation risk is derived to enhance computational efficiency. Comprehensive experiments on standard multi-label benchmark datasets demonstrate the effectiveness of the proposed method.
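The per-label covariance estimation at the core of the method described above can be pictured as follows; the estimator here is the plain empirical covariance of each label's positive instances, a simplified illustration rather than the exact INVA procedure.

    import numpy as np

    def label_covariances(X: np.ndarray, Y: np.ndarray):
        # Empirical feature covariance of the positive instances of each label.
        # X: (n_samples, n_features); Y: (n_samples, n_labels) with entries in {0, 1}.
        # A simplified illustration, not the exact INVA estimator.
        covs = []
        for j in range(Y.shape[1]):
            Xj = X[Y[:, j] == 1]                   # instances carrying label j
            covs.append(np.cov(Xj, rowvar=False))  # (n_features, n_features) covariance
        return covs

    X = np.random.randn(200, 5)
    Y = (np.random.rand(200, 3) > 0.5).astype(int)
    print(label_covariances(X, Y)[0].shape)        # (5, 5)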
Abstract: Autonomous driving systems (ADSs) have gained significant attention from both industry and academia due to their substantial economic, safety, and societal benefits, leading to in-depth research and the gradual popularization of applications. However, the introduction of such complex ecosystems can give rise to new safety issues that threaten the lives of pedestrians and impact the existing legal system. Therefore, it is imperative to validate ADSs through various methods such as simulation testing, access reviews, and pilot operations before the implementation and commercialization of ADSs. While the research on module safety has matured, there is still a lack of comprehensive research and organization regarding the safety of complete vehicle systems. Therefore, this study systematically analyzes vehicle system safety testing for ADSs and comprehensively reviews the current mainstream work. First, the architecture of ADSs and the basic procedure of simulation testing are outlined. The literature on vehicle system safety testing over the past six years is reviewed. Based on a universal testing framework, an autonomous driving safety testing framework tailored for vehicle systems is developed. Second, five core research issues are identified based on the aforementioned framework, namely critical scenario generation, test adequacy, adversarial sample generation, test optimization, and test oracle. A detailed analysis and organization of the key technologies, research status, and development context for each issue are provided. The commonly used evaluation metrics and comparative methods in current research are also summarized. Finally, the severe challenges faced by various research directions are summarized, and future research opportunities are anticipated, along with potential solutions.
Abstract: Android application developers need to quickly and accurately reproduce error reports to ensure application quality. However, existing methods often rely solely on crash information provided in stack traces to generate event sequences, making it difficult to accurately locate the crash page and offer effective guidance for dynamic exploration to trigger the crash. To address this issue, this study proposes a component-aware automatic crash reproduction method for Android applications, called CReDroid, which effectively reproduces the crash by leveraging both the title and stack trace of the crash report. First, CReDroid dynamically explores the application under test to construct a component transition graph (CTG) and combines the dynamic exception information from the stack traces with the static component interaction data from the CTG to accurately locate the target crash component. Second, based on the critical operations in the crash report title and the reachable paths in the CTG, CReDroid designs an adaptive strategy that uses the contextual relationship between the current page’s component and the crash component to assign priority scores to GUI widgets. The dynamic exploration process is globally optimized through reinforcement learning to effectively reduce inaccuracies in the prediction process. This study evaluates CReDroid using 74 crash reports and compares its performance with state-of-the-art crash reproduction tools, including CrashTranslator, ReCDroid, and ReproBot, as well as widely used automated testing tools, Monkey and APE. The experimental results show that CReDroid successfully reproduces 57 crash reports, which is 13, 25, 27, 30, and 17 more than CrashTranslator, ReCDroid, ReproBot, Monkey, and APE, respectively. Moreover, for the successfully reproduced crashes, CReDroid reduces the average reproduction time by 26.71%, 94.96%, 71.65%, 84.72%, and 88.56%, compared to CrashTranslator, ReCDroid, ReproBot, Monkey, and APE.
Abstract: The computation of signatures is typically performed on physically insecure devices such as mobile phones or small IoT devices, which may lead to private key exposure and subsequently compromise the entire cryptographic system. Key-insulated signature schemes serve as a method to mitigate the damage caused by private key exposure. In a key-insulated cryptosystem, the public key remains constant throughout the entire time period, and the fixed private key is stored on a physically secure device. At the beginning of each time period, the insecure device interacts with the physically secure device storing the fixed private key to obtain the temporary private key for the current time slice. A secure identity-based key-insulated signature scheme must satisfy both unforgeability and key insulation. Key insulation ensures that even if an adversary obtains temporary private keys for multiple time periods, they cannot forge signatures for other periods. SM9 is a commercial identity-based cryptographic standard independently developed by China. This study applies the key-insulated method to the SM9 identity-based signature scheme to resolve the private key exposure issue present in the original scheme. First, a security model for identity-based key-insulated signatures is presented. Then, an identity-based key-insulated signature scheme based on SM9 is constructed. Finally, detailed security proofs and experimental analysis are provided.
Abstract: Geo-distributed consortium blockchains leverage the characteristics of decentralization, immutability, and traceability to support large-scale applications such as e-commerce, supply chain management, and finance by distributing nodes across multiple data centers. However, traditional consortium blockchains face challenges in performance, scalability, and elasticity in large-scale deployment. Existing blockchains have proposed various approaches in consensus algorithms, concurrency control, and ledger sharding to address the above challenges. First, consensus algorithms are categorized based on network topology, the number of primary nodes, and network models, and different communication optimization strategies during consensus are explored. Second, the advantages and disadvantages of optimistic concurrency control, dependency graph, deterministic concurrency control, and coordination-free consistency in geo-distributed scenarios are discussed. Next, cross-shard commit protocols for blockchain are categorized, and their cross-region coordination overheads are analyzed. Finally, the technical challenges of existing geo-distributed consortium blockchains are highlighted, and future research directions are provided.
Abstract: With the rapid development of technologies such as deep learning and significant breakthroughs in areas including computer hardware and cloud computing, increasingly mature artificial intelligence (AI) technologies are being applied to software systems across various fields. Software systems that incorporate AI models as core components are collectively referred to as intelligent software systems. Based on the application fields of AI technologies, these systems are categorized into image processing, natural language processing, speech processing, and other applications. Unlike traditional software systems, AI models adopt a data-driven programming paradigm in which all decision logic is learned from large-scale datasets. This paradigm shift renders traditional code-based test case generation methods ineffective for evaluating the quality of intelligent software systems. As a result, numerous testing methods tailored for intelligent software systems have been proposed in recent years, including novel approaches for test case generation and evaluation that address the unique characteristics of such systems. This study reviews 80 relevant publications, classifies existing methods according to the types of systems they target, and systematically summarizes test case generation methods for image processing, natural language processing, speech processing, point cloud processing, multimodal data processing, and deep learning models. Potential future directions for test case generation in intelligent software systems are also discussed to provide a reference for researchers in this field.
Abstract: Resource public key infrastructure (RPKI) is a key technology for enhancing border gateway protocol (BGP) security, using cryptographic verification to prevent attacks such as prefix hijacking. Since its formal deployment in 2012, RPKI has grown to cover over half of Internet prefixes. Ongoing research on RPKI deployment helps to provide insights into current trends and identify security issues. This study reviews existing works on RPKI measurement from three perspectives: RPKI data object measurement, ROV measurement, and RPKI infrastructure measurement. It analyzes RPKI data object and ROV coverage metrics, deployment trends, and the effectiveness of different measurement approaches. Moreover, key security vulnerabilities and data quality issues are identified, and recommendations to promote large-scale RPKI deployment are proposed.
Abstract: As an emerging technique in software engineering, automatic source code summarization aims to generate natural language descriptions for given code snippets. State-of-the-art code summarization techniques utilize encoder-decoder neural models; the encoder extracts the semantic representations of the source code, while the decoder translates them into a human-readable code summary. However, many existing approaches treat input code snippets as standalone functions, often overlooking the context dependencies between the target function and its invoked subfunctions. Ignoring these dependencies can result in the omission of crucial semantic information, potentially reducing the quality of the generated summary. To this end, in this paper, we introduce DHCS, a dependency-aware hierarchical code summarization neural model. DHCS is designed to improve code summarization by explicitly modeling the hierarchical dependencies between the target function and its subfunctions. Our approach employs a hierarchical encoder consisting of both a subfunction encoder and a target function encoder, allowing us to capture both local and contextual semantic representations effectively. Meanwhile, we introduce a self-supervised task, namely masked subfunction prediction, to enhance the representation learning of subfunctions. Furthermore, we mine the topic distribution of subfunctions and incorporate it into a summary decoder with a topic-aware copy mechanism, which enables the direct extraction of key information from subfunctions and facilitates more effective summary generation for the target function. Finally, we have conducted extensive experiments on three real-world datasets constructed for the Python, Java, and Go languages, which clearly validate the effectiveness of our approach.
Abstract: The advent of the big data era has introduced massive data applications characterized by four defining attributes—Volume, Variety, Velocity, and Value (4V)—posing revolutionary challenges to conventional data acquisition methods, management strategies, and database processing capabilities. Recent breakthroughs in artificial intelligence (AI), particularly in machine learning and deep learning, have demonstrated remarkable advancements in representation learning, computational efficiency, and model interpretability, thereby offering innovative solutions to these challenges. This convergence of AI and database systems has given rise to a new generation of intelligent database management systems, which integrate AI technologies across three core architectural layers: (1) natural language interfaces for user interaction, (2) automated database administration frameworks (including parameter tuning, index recommendation, diagnostics, and workload management), and (3) machine learning-based high-performance components (such as learned indexes, adaptive partitioning, query optimization, and scheduling). Furthermore, new intelligent component application programming interfaces (APIs) have lowered the integration barrier between AI and database systems. This work systematically investigates intelligent databases through an innovative standardization-centric framework, delineating common processing paradigms across core research themes—interaction paradigms, management architectures, and kernel design. By examining standardized processes, interfaces, and collaboration mechanisms, it uncovers the core logic enabling database self-optimization, synthesizes current research advancements, and critically assesses persistent technical challenges and prospects for future development.
Abstract: The CDCL algorithm for SAT solving is widely used in hardware and software verification, with restart being one of its core components. Currently, mainstream CDCL solvers often employ the "warm restart" technique, which retains key search information such as variable order, assignment preferences, and learned clauses, and restarts very frequently. The warm restart technique makes CDCL solvers more inclined to revisit the search space explored before restarts, which may trap them in an unfavorable local search space for a long time and leaves other regions unexplored. This paper first tests existing CDCL algorithms and confirms that, under different initial search settings, the runtime of mainstream CDCL solvers on a given instance fluctuates significantly. To leverage this observation, the paper proposes the "cold restart" technique, which forgets search information by periodically discarding the variable order, assignment preferences, and learned clauses. Experimental results demonstrate that this technique effectively improves mainstream CDCL algorithms. Additionally, the paper extends the technique to a parallel version in which each thread explores a different search space, enhancing the performance of the parallel algorithm. Moreover, the cold restart technique primarily improves the ability of sequential and parallel solvers to solve satisfiable instances, providing new insights for designing satisfiability-oriented solvers. Specifically, the parallel cold restart technique improves the PAR2 score of PaKis on satisfiable instances by 41.84%. The parallel SAT solver ParKissat-RS, which incorporates the ideas in this paper, won the parallel track of the SAT Competition by a significant margin, being 24% faster.
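The cold restart idea above amounts to periodically discarding the solver's accumulated search state; the following sketch uses a hypothetical solver object and method names (not the actual Kissat/ParKissat-RS API) purely to make the forgotten state explicit.

    def cold_restart(solver, keep_learned: bool = False):
        # Schematic "cold restart": periodically discard accumulated search state so the
        # next run explores a fresh region. `solver` and its methods are hypothetical,
        # not a real CDCL solver API.
        solver.reset_variable_activities()   # forget the VSIDS/EVSIDS variable order
        solver.reset_saved_phases()          # forget assignment preferences (phase saving)
        if not keep_learned:
            solver.clear_learned_clauses()   # drop the learned-clause database
        solver.restart()                     # backtrack to decision level 0 and continue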
Abstract: Blockchain, as a distributed ledger technology, ensures data security, transparency, and immutability through encryption and consensus mechanisms, offering transformative solutions across various industries. In China, blockchain-based software has attracted widespread attention and application, demonstrating considerable potential in fields such as cross-border payments, supply chain finance, and government services. These applications not only enhance the efficiency and transparency of business processes but also reduce trust costs and offer new approaches for the digital transformation of traditional industries. This study investigates the development trends and core technologies of Chinese blockchain software, focusing on key technological breakthroughs, promoting integration and innovation, and providing a foundation for the formulation of technical standards. The aim is to enhance the competitiveness of Chinese blockchain technologies, broaden application scenarios, and support the standardized development of the industry. Three core research questions are addressed: (1) What are the development trends of Chinese blockchain software? (2) What are the core technologies involved? (3) What are the differences in core technologies between Chinese and foreign blockchain software? To address these questions, 1268 blockchain software entries have been collected through three channels. Based on information regarding affiliated companies and chief technology officers (CTOs), 103 Chinese blockchain software entries are identified. A statistical analysis of basic software attributes is conducted, examining development trends from three perspectives: software development history, distribution, and interrelationships. Given the importance of technical and development documentation, 39 high-quality blockchain software entries containing detailed technical information are further selected. Subsequently, a statistical and analytical evaluation of the core technologies of these 39 software systems is conducted across six technical layers of blockchain architecture. Based on this analysis, differences in core technologies between Chinese and foreign blockchain software are compared. In total, 28 phenomena and 13 insights are identified. These findings provide researchers, developers, and practitioners with a comprehensive understanding of the current state of Chinese blockchain development and offer valuable references for future adoption and improvement of Chinese blockchain software.
Abstract: This study investigates meet-in-the-middle attacks on three types of unbalanced generalized Feistel structures and conducts quantum meet-in-the-middle attacks in the Q1 model. First, for the 3-branch Type-III generalized Feistel structure, a 4-round meet-in-the-middle distinguisher is constructed using multiset and differential enumeration techniques. By expanding one round forward and one round backward, a 6-round meet-in-the-middle attack is conducted. With the help of Grover’s algorithm and the quantum claw finding algorithm, a 6-round quantum key recovery attack is performed, requiring O(2^{3b/2}·b) quantum queries, where b is the branch length of the generalized Feistel structure. Then, for the 3-branch Type-I structure, a 9-round distinguisher is similarly extended by one round in both directions to conduct an 11-round meet-in-the-middle attack and a quantum key recovery attack, with time complexities of O(2^{2b}) 11-round encryptions and O(2^{3b/2}·b) quantum queries. Finally, taking the 3-cell generalized Feistel structure as a representative case, this study explores a quantum meet-in-the-middle attack on an n-cell structure. A 2n-round meet-in-the-middle distinguisher is constructed, enabling a 2(n+1)-round meet-in-the-middle attack and quantum key recovery attack, with time complexities of O(2^{2b}) 2(n+1)-round encryptions and O(2^{3b/2}·b) quantum queries. The results demonstrate that the time complexity in the Q1 model is significantly reduced compared with the classical setting.
Abstract: Query optimization is a critical component in database systems, where execution costs are minimized by identifying the most efficient query execution plan. Traditional query optimizers typically rely on fixed rules or simple heuristic algorithms to refine or select candidate plans. However, with the growing complexity of relational schemas and queries in real-world applications, such optimizers struggle to meet the demands of modern applications. Learned query optimization algorithms integrate machine learning techniques into the optimization process. They capture features of query plans and complex schemas to assist traditional optimizers. These algorithms offer innovative and effective solutions in areas such as cost modeling, join optimization, plan generation, and query rewriting. This study reviews recent achievements and developments in four main categories of learned query optimization algorithms. Future research directions are also discussed, aiming to provide a comprehensive understanding of the current state of research and to support further investigation in this field.
Abstract: Smart contracts, as automatically executed computer transaction protocols, are widely applied in blockchain networks to implement various types of business logic. However, the strict immutability of blockchain poses significant challenges for smart contract maintenance, making upgradeability a prominent research topic. This study focuses on upgradeable smart contracts, systematically reviewing their development status both domestically and internationally, and introducing seven mainstream upgradeable contract models. The research is summarized from four key perspectives: upgradeable smart contracts, application requirements, upgrade frameworks, and security oversight. It covers multiple stages, including design, implementation, testing, deployment, and maintenance. The goal is to provide insights and references for the further development of blockchain applications.
Abstract: Accurate workload forecasting is essential for effective cloud resource management. However, existing models typically employ fixed architectures to extract sequential features from different perspectives, which limits the flexibility of combining various model structures to further improve forecasting performance. To address this limitation, a novel ensemble framework SAC-MWF is proposed based on the soft actor-critic (SAC) algorithm for multi-view workload forecasting. A set of feature sequence construction methods is developed to generate multi-view feature sequences at low computational cost from historical windows, enabling the model to focus on workload patterns from different perspectives. Subsequently, a base prediction model and several feature prediction models are trained on historical windows and their corresponding feature sequences, respectively, to capture workload dynamics from different views. Finally, the SAC algorithm is employed to integrate these models to generate the final forecast. Experimental results on three datasets demonstrate that SAC-MWF performs excellently in terms of effectiveness and computational efficiency.
Abstract: In recent years, pre-trained models that take code as input have achieved significant performance gains in various critical code-based tasks. However, these models remain susceptible to adversarial attacks implemented through semantic-preserving code transformations, which can severely compromise model robustness and pose serious security issues. Although adversarial training, leveraging adversarial examples as augmented data, has been employed to enhance robustness, its effectiveness and efficiency often fall short when facing unseen attacks with varying granularities and strategies. To address these limitations, a novel adversarial defense technique based on code normalization, named CoDefense, is proposed. This method integrates a multi-granularity code normalization approach as a preprocessing module, which normalizes both the original training data during training and the input code during inference. By doing so, the proposed method mitigates the impact of potential adversarial examples and effectively defends against attacks of diverse types and granularities. To evaluate the effectiveness and efficiency of CoDefense, a comprehensive experimental study is conducted, encompassing 27 scenarios across three representative adversarial attack methods, three widely used pre-trained code models, and three code-based classification and generation tasks. Experimental results demonstrate that CoDefense significantly outperforms state-of-the-art adversarial training methods in both robustness and efficiency. Specifically, it achieves an average defense success rate of 95.33% against adversarial attacks and improves time efficiency by an average of 85.86%.
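As a rough illustration of the kind of preprocessing such a normalization-based defense performs, the following is a minimal Python sketch (not the CoDefense implementation) of identifier-level code normalization: user-defined identifiers are mapped to canonical placeholder names so that semantic-preserving renaming attacks collapse to the same normalized input. The placeholder scheme, the reserved-name list, and the example programs are assumptions made only for illustration.

```python
import re
import keyword

def normalize_identifiers(source: str) -> str:
    """Map user-defined identifiers to canonical placeholders (VAR_0, VAR_1, ...).

    A minimal sketch of identifier-level code normalization: adversarial
    variable renamings collapse to the same normalized form. Python keywords
    and a few common builtins are left untouched.
    """
    reserved = set(keyword.kwlist) | {"print", "len", "range", "self"}
    mapping = {}

    def replace(match: re.Match) -> str:
        name = match.group(0)
        if name in reserved:
            return name
        if name not in mapping:
            mapping[name] = f"VAR_{len(mapping)}"
        return mapping[name]

    # Matches identifier-like tokens; strings and comments are ignored here
    # for brevity, which a real normalizer would have to handle.
    return re.sub(r"\b[A-Za-z_]\w*\b", replace, source)

if __name__ == "__main__":
    a = "total_sum = compute(total_sum, offset)"
    b = "x1 = compute(x1, tmp)"   # the same program after adversarial renaming
    print(normalize_identifiers(a) == normalize_identifiers(b))  # True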
Abstract: The evolution of RFID-based passive Internet of Things (IoT) systems comprises three stages: traditional UHF RFID (also referred to as standalone or Passive 1.0), local area network-based coverage (networked or Passive 2.0), and wide-area cellular coverage (cellular or Passive 3.0). Wireless sensing in passive IoT is characterized by zero power consumption, low cost, and ease of deployment, enabling object tagging and close-proximity sensing. With the emergence of cellular passive IoT, passive IoT wireless sensing is playing an increasingly important role in enabling ubiquitous sensing within IoT systems. This study first introduces the concept and development path of passive IoT. Based on fundamental sensing principles, recent research advancements are reviewed across four representative objectives: localization and tracking, object status detection, human behavior recognition, and vital sign monitoring. Given that most existing research relies on commercial UHF RFID devices to extract signal features for data processing, the development direction of passive IoT wireless sensing technology is further examined from the perspectives of new architecture, new air interface, and new capabilities. Moreover, this study offers reflections on the integration of communication and sensing in the design of next-generation air interfaces from a sensing-oriented perspective, aiming to provide new insights into the advancements in passive IoT wireless sensing technologies.
Abstract: As concerns over data privacy continue to grow, secure multi-party computation (MPC) has gained considerable research attention due to its ability to protect sensitive information. However, the communication and memory demands of MPC protocols limit their performance in privacy-preserving machine learning (PPML). Reducing interaction rounds and memory overhead in secure computation protocols remains both essential and challenging, particularly in GPU-accelerated environments. This study focuses on the design and implementation of GPU-friendly protocols for linear and nonlinear computations. To eliminate overhead associated with integer operations, 64-bit integer matrix multiplication and convolution are implemented using CUDA extensions in PyTorch. A most significant bit (MSB) extraction protocol with low communication rounds is proposed, based on 0-1 encoding. In addition, a low-communication-complexity hybrid multiplication protocol is introduced to reduce the communication overhead of secure comparison, enabling efficient computation of ReLU activation layers. Finally, Antelope, a GPU-based 3-party framework, is proposed to support efficient privacy-preserving machine learning. This framework significantly reduces the performance gap between secure and plaintext computation and supports end-to-end training of deep neural networks. Experimental results demonstrate that the proposed framework achieves 29×–101× speedup in training and 1.6×–35× in inference compared to the widely used CPU-based FALCON (PoPETs 2020). When compared with GPU-based approaches, training performance reaches 2.5×–3× that of CryptGPU (S&P 2021) and 1.2×–1.6× that of Piranha (USENIX Security 2022), while inference is accelerated by factors of 11× and 2.8×, respectively. Notably, the proposed secure comparison protocol exhibits significant advantages when processing small input sizes.
Abstract: Test case prioritization (TCP) has gained significant attention due to its potential to reduce testing costs. Greedy algorithms based on various prioritization strategies are commonly used in TCP. However, most existing greedy algorithm-based TCP techniques rely on a single prioritization strategy and process all test cases simultaneously during each iteration, without considering the relationships between test cases. This results in excessive computational overhead when handling coverage information and performing prioritization, thus reducing overall efficiency. Among single-strategy approaches, the Additional strategy has been extensively studied but remains highly sensitive to random factors. When a tie occurs, test cases are typically selected at random, compromising prioritization effectiveness. To address these issues, a test case prioritization approach based on two-phase grouping (TPG-TCP) is proposed. In the first phase, coarse-grained grouping is conducted by mining hidden relationships among test cases, thus dividing them into a key group and an ordinary group. This lays the groundwork for applying diversity-based strategies in the next phase to enhance prioritization efficiency. In the second phase, fine-grained prioritization of test cases is performed. Key test cases are further subdivided based on the number of iterations. To mitigate the randomness inherent in the Additional strategy, a TP-Additional strategy based on test case potency is introduced to prioritize a portion of the key test cases. Meanwhile, a simple and efficient Total strategy is applied to prioritize the ordinary test cases and remaining key test cases. The results from the Total strategy are appended to those produced by the TP-Additional strategy. This method improves both the effectiveness and efficiency of test case prioritization. Experimental results on six datasets, compared with eight existing methods, demonstrate that the proposed method achieves average improvements of 1.29% in APFD and 9.54% in TETC.
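For readers unfamiliar with the two classical greedy strategies referenced above, the following Python sketch illustrates the generic Total and Additional strategies over a statement coverage matrix. It is an illustration of the baseline strategies only, not the TPG-TCP implementation; the coverage data and the tie-breaking rule (insertion order) are assumptions.

```python
from typing import Dict, List, Set

def total_strategy(coverage: Dict[str, Set[int]]) -> List[str]:
    """Total strategy: order test cases by their absolute coverage size."""
    return sorted(coverage, key=lambda t: len(coverage[t]), reverse=True)

def additional_strategy(coverage: Dict[str, Set[int]]) -> List[str]:
    """Additional strategy: repeatedly pick the test covering the most
    not-yet-covered statements; reset once everything reachable is covered."""
    remaining = dict(coverage)
    covered: Set[int] = set()
    order: List[str] = []
    while remaining:
        # Ties are broken by insertion order here; TPG-TCP instead uses a
        # potency-based rule (TP-Additional) to avoid this source of randomness.
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        gain = remaining[best] - covered
        if not gain and remaining[best]:
            covered.clear()  # no remaining test adds coverage: reset and continue
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order

coverage = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {5}, "t4": {1, 2}}
print(total_strategy(coverage))       # ['t1', 't2', 't4', 't3']
print(additional_strategy(coverage))  # ['t1', 't2', 't3', 't4']
```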
Abstract: With the rapid development of lattice-based post-quantum cryptography, algorithms for hard problems in lattices have become an essential tool for evaluating the security of post-quantum cryptographic schemes. Algorithms such as enumeration, sieve, and lattice basis reduction have been developed under the classical computing model, while quantum algorithms for solving hard problems in lattices, such as quantum sieve and quantum enumeration, are gradually attracting attention. Although lattice problems possess post-quantum properties, techniques such as quantum search can accelerate a range of lattice algorithms. Given the challenges involved in solving hard problems in lattices, this study first summarizes and analyzes the research status of quantum algorithms for such problems and organizes their design principles. Then, the quantum computing techniques applied in these algorithms are introduced, followed by an analysis and comparison of their computational complexities. Finally, potential future developments and research directions for quantum algorithms addressing lattice-based hard problems are discussed.
Abstract: Segment routing over IPv6 (SRv6), as a key enabling technology for the next-generation network architecture, introduces a flexible segment routing forwarding plane, offering revolutionary opportunities to enhance network intelligence and expand service capabilities. This study aims to provide a comprehensive review of the evolution and research status of SRv6 in recent years. First, the study systematically summarizes the applications of SRv6 in network architecture and performance, network management and operation, and emerging service support, highlighting the unique advantages of SRv6 in fine-grained scheduling, flexible programming, and service convergence. Meanwhile, the study deeply analyzes the key challenges SRv6 faces in performance and efficiency, reliability and security, and deployment and evolution strategies, and focuses on discussing the current mainstream solutions and development trends. Finally, from the perspectives of industrial ecosystem construction, artificial intelligence integration, and industry convergence innovation, the study provides forward-looking thoughts and prospects on the future development directions and challenges of SRv6. The research findings of this study will provide theoretical references and practical guidance for operators in building open, intelligent, and secure next-generation networks.
Abstract: With the development of information technology, the interaction between information networks, human society, and physical space deepens, and the phenomenon of information space risk overflow becomes more severe. Fraudulent incidents have sharply increased, making fraud detection an important research field. Fraudulent behavior has brought numerous negative impacts to society, gradually presenting emerging characteristics such as intelligence, industrialization, and high concealment. Traditional expert rules and deep graph neural network algorithms are becoming increasingly limited in addressing fraudulent activities. Current fraud detection methods often rely on local information from the nodes themselves and neighboring nodes, either focusing on individual users, analyzing the relationship between nodes and graph topology, or utilizing graph embedding technology to learn node representations. Although these approaches offer certain fraud detection capabilities, they overlook the crucial role of long-range association patterns of entities and fail to explore common patterns among massive fraudulent paths, limiting comprehensive fraud detection capabilities. In response to the limitations of existing fraud detection methods, this study proposes a graph fraud detection model called path aggregation graph neural network (PA-GNN), based on path aggregation. The model includes variable-length path sampling, position-related unified path encoding, path interaction and aggregation, and aggregation-related fraud detection. Several paths originating from a node interact globally and compare their similarities, extracting common patterns among fraudulent paths, thus more comprehensively revealing the association patterns between fraudulent behaviors, and achieving fraud detection through path aggregation. Experimental results across multiple datasets in fraud scenarios, including financial transactions, social networks, and review networks, show that the area under the curve (AUC) and average precision (AP) metrics of the proposed method have significantly improved compared to the optimal benchmark models. In addition, the proposed method uncovers potential common fraudulent path patterns for fraud detection tasks, driving nodes to learn these important patterns and obtain more expressive representations, which offers a certain level of interpretability.
Abstract: The (t, N) threshold multi-party private set intersection (TMP-PSI) protocol outputs a data element x held by a given party as part of the intersection if x appears in the private sets of no fewer than t-1 other parties. Such protocols are widely applied in scenarios such as proposal voting, financial transaction threat identification, and security assessment. Existing threshold multi-party private set intersection protocols suffer from low efficiency, high numbers of communication rounds, and the limitation that only a specific participant can obtain the intersection. To address these issues, this study proposes a threshold testing method based on robust secret sharing (RSS) and a TMP-PSI scheme combined with an oblivious key-value store (OKVS), which effectively reduces both computational overhead and the number of communication rounds. To meet the demand for multiple participants to access the intersection information of their private sets, this study also proposes a second, extended threshold multi-party private set intersection (ETMP-PSI) protocol, which modifies the share distribution method. Compared to the first scheme, the secret distributor and secret reconstructor incur no additional communication rounds or computational complexity, allowing multiple participants to obtain the intersection elements of their private sets. The proposed protocols run in 6.4 seconds (TMP-PSI) and 8.7 seconds (ETMP-PSI) in a three-party scenario with a dataset size of n=2^16. Compared to existing threshold multi-party private set intersection protocols, the communication complexity between the reconstructor and distributor is reduced from O(nNt·log n·λ) to O(bNλ).
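To make the threshold mechanism concrete, the following is a minimal Python sketch of Shamir-style (t, N) secret sharing over a prime field, which is the kind of primitive a robust-secret-sharing-based threshold test builds on. It is an illustrative assumption rather than the paper's actual RSS or OKVS construction, and the small prime and toy secret are chosen only for demonstration.

```python
import random

P = 2_147_483_647  # a Mersenne prime; real protocols use a much larger field

def share(secret: int, t: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x: int) -> int:
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

# A party's element passes the threshold test when at least t shares
# (one per other party holding the element) can be collected.
shares = share(secret=123456, t=3, n=5)
print(reconstruct(shares[:3]) == 123456)   # True: 3 shares suffice
print(reconstruct(shares[:2]) == 123456)   # False: below the threshold
```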
Abstract: Intelligent question answering (QA) systems utilize information retrieval and natural language processing techniques to deliver automated responses to user inquiries. Like other artificial intelligence software, intelligent QA systems are prone to bugs. These bugs can degrade the user experience, cause financial losses, or even trigger social panic. Therefore, it is crucial to detect and fix bugs in intelligent QA systems promptly. Automated testing approaches fall into two categories. The first approach synthesizes hypothetical facts based on questions and predicted answers, then generates new questions and expected answers to detect bugs. The second approach generates semantically equivalent test inputs by injecting knowledge from existing datasets, ensuring the answer to the question remains unchanged. However, both methods have limitations in practical use. They rely heavily on the intelligent QA system's output or training set, which results in poor testing effectiveness and generalization, especially for large-language-model-based intelligent QA systems. Moreover, these methods primarily assess semantic understanding while neglecting the logical reasoning capabilities of intelligent QA systems. To address this gap, a logic-guided testing technique named QALT is proposed. It designs three logically related metamorphic relations and uses semantic similarity measurement and dependency parsing to generate high-quality test cases. The experimental results show that QALT detected a total of 9247 bugs in two different intelligent QA systems, which is 3150 and 3897 more bugs than the two current state-of-the-art techniques (i.e., QAQA and QAAskeR), respectively. Based on the statistical analysis of manually labeled results, QALT detects approximately 8073 true bugs, which is 2142 more than QAQA and 4867 more than QAAskeR. Moreover, the test inputs generated by QALT successfully reduce the MR violation rate from 22.33% to 14.37% when used for fine-tuning the intelligent QA system under test.
Abstract: The performance and operational characteristics of the domain name system (DNS) protocol continue to attract significant attention from both the research community and network operators. In this study, data collected from a large-scale DNS recursive service is measured and analyzed to examine user access patterns and resolution behavior from the perspective of a major DNS operator. To handle the massive volume of DNS data, this study proposes a distributed parallel measurement mechanism and a big data-based storage and monitoring solution, enabling efficient processing and analysis. The characteristics of DNS data are systematically examined across several dimensions, including user request response rates, domain name request patterns, user distribution, and resolution outcomes. Several valuable insights are presented, offering meaningful guidance for DNS operation optimization and improved understanding of DNS behavior. Finally, based on the analysis of DNS cache hit rates, this study proposes a general framework for online anomaly detection tailored to large-scale DNS operators. The correctness and feasibility of the proposed framework are preliminarily verified.
Abstract: In the field of time series data analysis, cross-domain data distribution shifts significantly weaken model generalization performance. To address this, an end-to-end time series domain adaptation framework, called TPN, is developed. This framework creatively integrates a temporal pattern activation module (TPAM) with a Transformer encoder. TPAM captures spatial and temporal dependencies of sequence features through dual-layer spatio-temporal convolution operations, combines Sigmoid and Tanh activation functions for the non-linear fusion of extracted features, and restores the original channel dimensions via linear projection, thus enhancing the model’s ability to extract temporal features. TPN also introduces an enhanced adversarial paradigm (EAP), which strengthens generator-discriminator-based collaborative adversarial learning through domain classification loss and operation order prediction loss. This effectively reduces data distribution discrepancies between source and target domains, improving the model’s domain adaptability. Empirical results on three public human activity recognition datasets (Opportunity, WISDM, and HHAR) demonstrate that TPN improves accuracy and F1 by up to 6% compared to existing methods, with fewer parameters and shorter runtime. In-depth ablation and visualization experiments further validate the effectiveness of TPAM and EAP, showing TPN’s strong performance in feature extraction and domain alignment.
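The gating pattern described for TPAM above (convolutional feature extraction, Sigmoid/Tanh fusion, then projection back to the original channel count) can be sketched roughly as follows in PyTorch. The layer sizes, kernel width, and single-branch layout are illustrative assumptions, not the published TPAM architecture.

```python
import torch
import torch.nn as nn

class TemporalPatternActivation(nn.Module):
    """Rough sketch of a TPAM-like block: two stacked temporal convolutions,
    sigmoid/tanh gated fusion, and a linear projection back to the input
    channel count (all sizes are illustrative)."""

    def __init__(self, channels: int, hidden: int = 64, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.conv1 = nn.Conv1d(channels, hidden, kernel_size, padding=pad)
        self.conv2 = nn.Conv1d(hidden, hidden, kernel_size, padding=pad)
        self.proj = nn.Linear(hidden, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels); convolutions expect (batch, channels, time)
        h = self.conv2(torch.relu(self.conv1(x.transpose(1, 2))))
        gated = torch.sigmoid(h) * torch.tanh(h)       # non-linear gated fusion
        return self.proj(gated.transpose(1, 2))        # restore channel dimension

x = torch.randn(8, 128, 9)            # e.g. 8 windows, 128 steps, 9 sensor channels
print(TemporalPatternActivation(9)(x).shape)   # torch.Size([8, 128, 9])
```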
Abstract: Blockchain, also known as a distributed ledger, is a prominent example of next-generation information technology. It has been widely applied in various fields, including finance, healthcare, energy, and government affairs. Privacy protection technologies within the blockchain that can be regulated not only safeguard users’ privacy and enhance trust but also prevent misuse of blockchain for illegal activities, ensuring compliance with regulations. Current privacy protection schemes for regulatable blockchains are typically based on bilinear pairing, which exhibit relatively low computational efficiency and fail to meet the demands of high-concurrency scenarios. To address these issues, this study proposes an efficient regulatable identity privacy protection scheme in blockchain. By designing a zero-knowledge proof to verify the consistency of the receiver’s identity without bilinear pairing, along with a traceable ring signature scheme, this approach effectively protects the identity privacy of both parties in transactions while maintaining the effectiveness of supervision. The experimental results indicate that when the number of ring members is set to 16, as required by Monero, the execution time of all algorithms in the efficient regulatable identity privacy protection scheme in blockchain is within 5 milliseconds. Compared to similar schemes, efficiency has improved by more than 14 times, and the message length has been reduced to 50% of the original scheme, demonstrating enhanced computational efficiency and a shorter message length.
Abstract: Attribute-based searchable encryption (ABSE) enables secure and fine-grained sharing of encrypted data in multi-user environments. However, it typically encounters challenges such as high computational overhead for encryption and decryption, limited query efficiency, and the inability to update indexes dynamically. To address these limitations, this study proposes an efficient searchable scheme based on ABSE that supports dynamic index updates. The reuse of identical access policies minimizes redundant computation during encryption. Most decryption operations are securely outsourced to the cloud, thus reducing the local device’s computational load. An inverted index structure supporting multi-keyword Boolean retrieval is constructed by integrating hash tables with skip lists. BLS short signature technology is employed to verify the permissions for index updates, ensuring data owners can manage the retrieval of encrypted data. Formal security analysis confirms that the proposed scheme effectively defends against collusion attacks, chosen plaintext attacks, forged update tokens, and decryption key forgery. Experimental results demonstrate high efficiency in both retrieval and index update operations, along with a significant reduction in encryption overhead when access policy reuse occurs.
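The index layout described above (a hash table of keywords pointing at ordered posting structures, queried with multi-keyword Boolean retrieval) can be illustrated with a minimal Python sketch. Sorted lists stand in for skip lists here, and the document identifiers and keywords are made up for the example; this is not the scheme's encrypted index.

```python
from collections import defaultdict
from typing import Dict, List, Set

class InvertedIndex:
    """Hash table of keyword -> sorted posting list (a stand-in for skip lists)."""

    def __init__(self) -> None:
        self.postings: Dict[str, List[int]] = defaultdict(list)

    def add(self, doc_id: int, keywords: Set[str]) -> None:
        """Dynamic update: insert a document's keywords, keeping postings sorted."""
        for kw in keywords:
            lst = self.postings[kw]
            if doc_id not in lst:
                lst.append(doc_id)
                lst.sort()

    def boolean_and(self, keywords: List[str]) -> Set[int]:
        """Multi-keyword Boolean (AND) retrieval by intersecting posting lists."""
        if not keywords:
            return set()
        result = set(self.postings.get(keywords[0], []))
        for kw in keywords[1:]:
            result &= set(self.postings.get(kw, []))
        return result

index = InvertedIndex()
index.add(1, {"cloud", "encryption"})
index.add(2, {"cloud", "index", "encryption"})
index.add(3, {"index"})
print(index.boolean_and(["cloud", "encryption"]))   # {1, 2}
```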
Abstract: In recent years, the increasing complexity of space missions has led to an exponential growth in space-generated data. However, limited satellites-to-ground bandwidth and scarce frequency resources pose significant challenges to traditional bent-pipe architecture, which faces severe transmission bottlenecks. In addition, onboard data must wait for satellites to pass over ground stations before transmission. The large-scale construction of ground stations is not only cost-prohibitive but also carries geopolitical and economic risks. Satellite edge computing has emerged as a promising solution to these bottlenecks by integrating mobile edge computing technology into satellite edges. This approach significantly enhances user experience and reduces redundant network traffic. By enabling onboard data processing, satellite edge computing shortens data acquisition times and reduces reliance on extensive ground station infrastructure. Furthermore, the integration of artificial intelligence (AI) and edge computing technologies offers an efficient and forward-looking path to address existing challenges. This study reviews the latest progress in intelligent satellite edge computing. First, the demands and applications of satellite edge computing in various typical scenarios are discussed. Next, key challenges and recent research advancements in this field are analyzed. Finally, several open research topics are highlighted, and new ideas are proposed to guide future studies. This discussion aims to provide valuable insights to promote technological innovation and the practical implementation of satellite edge computing.
Abstract: This article is recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering.
It was published in the Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 315-325.
The original article is available at https://doi.org/10.1145/3106237.3106242.
Readers who wish to cite this article should reference the original publication.
Abstract: This article is recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering.
It was published in the Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, September 2017, pp. 303-314.
The original article is available at https://doi.org/10.1145/3106237.3106239.
Readers who wish to cite this article should reference the original publication.
Abstract: GitHub, a popular social-software-development platform, has fostered a variety of software ecosystems where projects depend on one another and practitioners interact with each other. Projects within an ecosystem often have complex inter-dependencies that impose new challenges in bug reporting and fixing. In this paper, we conduct an empirical study on cross-project correlated bugs, i.e., causally related bugs reported to different projects, focusing on two aspects: 1) how developers track the root causes across projects; and 2) how the downstream developers coordinate to deal with upstream bugs. Through manual inspection of bug reports collected from the scientific Python ecosystem and an online survey with developers, this study reveals the common practices of developers and the various factors in fixing cross-project bugs. These findings provide implications for future software bug analysis in the scope of ecosystem, as well as shed light on the requirements of issue trackers for such bugs.
Abstract: This article is recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering.
It was published in the Proceedings of the 39th International Conference on Software Engineering, pp. 27-37, Buenos Aires, Argentina, May 20-28, 2017, IEEE Press, Piscataway, NJ, USA, ©2017, ISBN: 978-1-5386-3868-2.
The original article is available at http://dl.acm.org/citation.cfm?id=3097373.
Readers who wish to cite this article should reference the original publication.
Abstract: This article is recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering.
It was published in the Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016), ACM, New York, NY, USA, pp. 871-882. DOI: https://doi.org/10.1145/2950290.2950364.
The original article is available at http://dl.acm.org/citation.cfm?id=2950364.
Readers who wish to cite this article should reference the original publication.
Abstract: This article is recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering.
It was published in the Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 133-143, Seattle, WA, USA, November 2016.
The original article is available at http://dl.acm.org/citation.cfm?id=2950327.
Readers who wish to cite this article should reference the original publication.
Abstract: This article is recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering.
It was published in the Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE'16), pp. 810-821, November 13-18, 2016.
The original article is available at https://doi.org/10.1145/2950290.2950310.
Readers who wish to cite this article should reference the original publication.
Abstract: This article is recommended by Professor Bai Ying of the CCF Technical Committee on Software Engineering.
It was published at FSE'16, in the Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering.
The original article is available at http://dl.acm.org/citation.cfm?id=2950340.
Readers who wish to cite this article should reference the original publication.
Abstract: This article is recommended by Professor Bai Xiaoying (Tsinghua University) of the CCF Technical Committee on Software Engineering.
The original article was published in ASE 2016, Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. Full text: http://dx.doi.org/10.1145/2970276.2970307.
Important note: readers citing this article should reference the original publication.
Abstract: Social recommender systems have recently become one of the hottest topics in the domain of recommender systems. The main task of a social recommender system is to alleviate data sparsity and cold-start problems and to improve recommendation performance by utilizing users' social attributes. This paper presents an overview of the field of social recommender systems, including trust inference algorithms, key techniques, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
Abstract: This paper presents several new insights into system software, one of the basic concepts in the computing discipline, from three perspectives: essential features, characteristics of the times, and future development trends. The first insight is that system software stems theoretically and technically from the universal Turing machine and the idea of the stored program, with an essential feature of "manipulating the execution of a computing system"; there are two typical manipulation modes: encoding and then loading, executing and controlling. The second insight is that, in the Internet age, system software is a kind of software that continuously provides substantial online services, which lays the foundation for the newly emerged "software-as-a-service" paradigm. The final insight concerns its development trend: system software will evolve online continuously. Driven by innovations in computing systems, the integration of cyber and physical spaces, and intelligence technologies, system software will become the core of the future software ecology.
Abstract: With the rapid development of cloud computing technology, its security issues have become more and more obvious and have received much attention in both industry and academia. High security risks are widespread in traditional cloud architectures. Hacking into a virtual machine destroys the availability of cloud services or resources. Untrusted cloud storage makes it more difficult to share or search users' private data. The risk of privacy leakage arises from various outsourced computation and application requirements. From the perspective of security and privacy-preserving technologies in cloud computing, this paper first introduces the research progress of cloud virtualization security, cloud data security, and cloud application security. In addition, it analyzes the characteristics and application scopes of typical schemes and compares their effectiveness in security defense and privacy preservation. Finally, the paper discusses current limitations and possible directions for future research.
Abstract: In recent years, transfer learning has attracted a vast amount of attention and research. Transfer learning is a machine learning method that applies knowledge from related but different domains to target domains. It relaxes the two basic assumptions of traditional machine learning: (1) the training data (also referred to as the source domain) and the test data (also referred to as the target domain) follow the independent and identically distributed (i.i.d.) condition; (2) there are enough labeled samples to learn a good classification model. Transfer learning aims to solve the problem that there are few or even no labeled data in target domains. This paper surveys the research progress of transfer learning and introduces the authors' own work, especially on building transfer learning models by applying generative models at the concept level. Finally, the paper introduces applications of transfer learning, such as text classification and collaborative filtering, and suggests future research directions for transfer learning.
Abstract: Network abstraction brought about the birth of software-defined networking (SDN). SDN decouples the data plane and control plane and simplifies network management. This paper starts with a discussion of the background of the birth and development of SDN, outlining its architecture, which includes the data layer, control layer, and application layer. Then, the key technologies are elaborated according to the hierarchical architecture of SDN, and the characteristics of consistency, availability, and tolerance are especially analyzed. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized at the end.
Abstract: The sensor network, formed by the convergence of sensor, micro-electro-mechanical system (MEMS), and network technologies, is a novel technology for acquiring and processing information. In this paper, the architecture of the wireless sensor network is briefly introduced. Next, some valuable applications are explained and forecast. Combined with existing work, hot topics including power-aware routing and medium access control schemes are discussed and presented in detail. Finally, taking account of application requirements, several future research directions are put forward.
Abstract: Automatic generation of poetry has always been considered a hard nut to crack in natural language generation. This paper reports some pioneering research on a possible genetic algorithm and its automatic generation of SONGCI. In light of the characteristics of Chinese ancient poetry, this paper designs a coding method based on level and oblique tones, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette wheel selection, a partially mapped crossover operator, and a heuristic mutation operator. As shown by tests, the system constructed on the basis of the computing model designed in this paper is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic generation of Chinese poetry.
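As a generic illustration of the evolutionary machinery named in this abstract (roulette selection with elitism, partially mapped crossover, and mutation), the following Python sketch evolves permutations under a placeholder fitness function. The encoding and fitness actually used for SONGCI generation are far richer and are not reproduced here; everything below is an assumption for demonstration only.

```python
import random

def roulette_select(pop, fitness):
    """Roulette-wheel selection: probability proportional to fitness."""
    total = sum(fitness(ind) for ind in pop)
    pick = random.uniform(0, total)
    acc = 0.0
    for ind in pop:
        acc += fitness(ind)
        if acc >= pick:
            return ind
    return pop[-1]

def pmx(parent1, parent2):
    """Partially mapped crossover (PMX) for permutation-encoded individuals."""
    size = len(parent1)
    a, b = sorted(random.sample(range(size), 2))
    child = [None] * size
    child[a:b] = parent1[a:b]
    for gene in parent2[a:b]:
        if gene not in child:
            pos = parent2.index(gene)
            while child[pos] is not None:
                pos = parent2.index(parent1[pos])
            child[pos] = gene
    for i in range(size):
        if child[i] is None:
            child[i] = parent2[i]
    return child

def evolve(pop, fitness, generations=100, elite=2, mutation_rate=0.1):
    """Elitism + roulette selection + PMX + swap mutation (placeholder fitness)."""
    for _ in range(generations):
        pop = sorted(pop, key=fitness, reverse=True)
        next_pop = pop[:elite]                      # elitism: keep the best
        while len(next_pop) < len(pop):
            child = pmx(roulette_select(pop, fitness), roulette_select(pop, fitness))
            if random.random() < mutation_rate:     # stand-in for heuristic mutation
                i, j = random.sample(range(len(child)), 2)
                child[i], child[j] = child[j], child[i]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# Placeholder fitness: prefer ascending order (stands in for syntax/semantics scores).
fitness = lambda ind: sum(1 for i in range(len(ind) - 1) if ind[i] < ind[i + 1])
population = [random.sample(range(10), 10) for _ in range(30)]
print(evolve(population, fitness))
```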
Abstract: Cloud computing is the fundamental change happening in the field of information technology and represents a movement towards intensive, large-scale specialization. At the same time, it brings not only convenience and efficiency but also great challenges in data security and privacy protection. Currently, security is regarded as one of the greatest problems in the development of cloud computing. This paper describes the major requirements in cloud computing regarding security key technologies, standards, regulations, etc., and provides a cloud computing security framework. This paper argues that the changes in the above aspects will result in a technical revolution in the field of information security.
Abstract: Mobile recommender systems have recently become one of the hottest topics in the domain of recommender systems. The main task of mobile recommender systems is to improve performance, accuracy, and user satisfaction by utilizing mobile context, mobile social networks, and other information. This paper presents an overview of the field of mobile recommender systems, including key techniques, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
Abstract: Android is a modern and highly popular software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever. Apple, Microsoft, Blackberry, and Firefox trailed a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
Abstract: Learning to rank (L2R) techniques try to solve sorting problems using machine learning methods, and have been well studied and widely used in various fields such as information retrieval, text mining, personalized recommendation, and biomedicine. The main task of L2R-based recommendation algorithms is to integrate L2R techniques into recommendation algorithms, studying how to organize the large numbers of users and item features, build more suitable user models according to user preferences and requirements, and improve the performance and user satisfaction of recommendation algorithms. This paper surveys L2R-based recommendation algorithms of recent years, summarizes the problem definition, compares the key technologies, and analyzes the evaluation metrics and their applications. In addition, the paper discusses future development trends of L2R-based recommendation algorithms.
Abstract: The current research status and recent progress of clustering algorithms are summarized in this paper. First, representative clustering algorithms are analyzed and categorized from several aspects, such as algorithmic ideas, key technologies, and advantages and disadvantages. Second, several typical clustering algorithms and well-known data sets are selected, and simulation experiments are conducted in terms of both accuracy and running efficiency; the behavior of each algorithm on different data sets is analyzed and compared with the clustering of the same data set under different algorithms. Finally, the research hotspots, difficulties, and shortcomings of data clustering, as well as some open problems, are addressed by integrating the information from the two aspects above. This work can serve as a valuable reference for data clustering and data mining.
Abstract: This paper surveys the current technologies adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects. One is about the cloud infrastructure which is the building block for the up layer cloud application. The other is of course the cloud application. This paper focuses on the cloud infrastructure including the systems and current research. Some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large scale clusters which contain a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure that the computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software building on top of redundant hardware instead of mere hardware. All these technologies are for the two important goals for distributed system: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to very large scale even to thousands of nodes. Availability means that the services are available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
Abstract: Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing the EMO algorithms developed before 2003, this paper discusses recent advances in EMO in detail and concludes with the current research directions. On the one hand, new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have come forth. Furthermore, the essential characteristics of multi-objective optimization problems are deeply investigated. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints for the future research of EMO are proposed.
Abstract: The development of mobile internet and the popularity of mobile terminals produce massive trajectory data of moving objects under the era of big data. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can be used to mine the patterns of human activities and behaviors, the moving patterns of vehicles in the city and the changes of atmospheric environment. However, trajectory data also can be exploited to disclose moving objects' privacy information (e.g., behaviors, hobbies and social relationships). Accordingly, attackers can easily access moving objects' privacy information by digging into their trajectory data such as activities and check-in locations. In another front of research, quantum computation presents an important theoretical direction to mine big data due to its scalable and powerful storage and computing capacity. Applying quantum computing approaches to handle trajectory big data could make some complex problem solvable and achieve higher efficiency. This paper reviews the key technologies of processing trajectory data. First the concept and characteristics of trajectory data is introduced, and the pre-processing methods, including noise filtering and data compression, are summarized. Then, the trajectory indexing and querying techniques, and the current achievements of mining trajectory data, such as pattern mining and trajectory classification, are reviewed. Next, an overview of the basic theories and characteristics of privacy preserving with respect to trajectory data is provided. The supporting techniques of trajectory big data mining, such as processing framework and data visualization, are presented in detail. Some possible ways of applying quantum computation into trajectory data processing, as well as the implementation of some core trajectory mining algorithms by quantum computation are also described. Finally, the challenges of trajectory data processing and promising future research directions are discussed.
Abstract: With the growth of social networks, social recommendation has become a hot research topic in recommender systems. Matrix factorization based (MF-based) recommendation models have gradually become a key component of social recommendation due to their high extensibility and flexibility. Thus, this paper focuses on MF-based social recommendation methods. First, it reviews existing social recommendation models according to their model construction strategies. Next, it conducts a series of experiments on real-world datasets to demonstrate the performance of different social recommendation methods from three perspectives: whole users, cold-start users, and long-tail items. Finally, the paper analyzes the problems of MF-based social recommendation models and discusses possible future research directions and development trends in this research area.
Abstract: Recommender systems have been successfully adopted as an effective tool to alleviate information overload and assist users in making decisions. Recently, it has been demonstrated that incorporating social relationships into recommender models can enhance recommendation performance. Despite this remarkable progress, a majority of social recommendation models have overlooked item relations, a key factor that can also significantly influence recommendation performance. In this paper, an approach is first proposed to acquire item relations by measuring correlations among items. Then, a co-regularized recommendation model is put forward to integrate the item relations with social relationships by introducing a co-regularization term into the matrix factorization model. Meanwhile, it is shown that the co-regularization term is a case of the weighted atomic norm. Finally, based on the proposed model, a recommendation algorithm named CRMF is constructed. CRMF is compared with existing state-of-the-art recommendation algorithms on four real-world data sets. The experimental results demonstrate that CRMF not only effectively alleviates the user cold-start problem but also helps obtain more accurate rating predictions for various users.
Abstract: The explosive growth of digital data brings great challenges to relational database management systems in areas such as scalability and fault tolerance. Cloud computing techniques have been widely used in many applications and have become the standard, effective approach to managing large-scale data because of their high scalability, high availability, and fault tolerance. However, existing cloud-based data management systems cannot efficiently support complex queries, such as multi-dimensional queries and join queries, due to the lack of index or view techniques, which limits the application of cloud computing in many respects. This paper conducts an in-depth study of index techniques for cloud data management to highlight their strengths and weaknesses. The paper also introduces the authors' preliminary work on indexing massive IoT data in the cloud environment. Finally, it points out some challenges in index techniques for big data in cloud environments.
Abstract: Graph embedding is a fundamental technique for graph data mining. Real-world graphs not only consist of complex network structures but also contain diverse vertex information. How to integrate the network structure and vertex information into the graph embedding procedure is a big challenge. To deal with this challenge, a graph embedding method based on deep learning, which takes prior knowledge about vertex information into account, is proposed in this paper. The basic idea of the proposed method is to regard the vertex features as prior knowledge and to learn the representation vectors by optimizing an objective function that simultaneously preserves the similarity of the network structure and the vertex features. The time complexity of the proposed method is O(|V|), where |V| is the number of vertices in the graph, which indicates that the proposed method is suitable for large-scale graph analysis. Experiments on several data sets demonstrate that, compared with state-of-the-art baselines, the proposed method achieves favorable and stable results on the task of node classification.
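A toy illustration of the kind of joint objective described above (preserving both structural and feature similarity in one embedding) is sketched below with NumPy gradient descent. The tiny graph, the cosine feature similarity, the blending weight, and the Frobenius-norm loss are all assumptions made only for demonstration; they are not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: adjacency A and vertex feature matrix X (assumed for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 5))

# Feature-based similarity S (cosine); structural similarity is taken from A.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
S = Xn @ Xn.T

def joint_embedding(A, S, dim=2, alpha=0.5, lr=0.01, epochs=1000):
    """Learn embeddings Z whose inner products reconstruct a weighted blend of
    structural (A) and feature (S) similarity: minimize ||T - Z Z^T||_F^2."""
    target = alpha * A + (1 - alpha) * S
    Z = rng.normal(scale=0.1, size=(A.shape[0], dim))
    for _ in range(epochs):
        grad = 4 * (Z @ Z.T - target) @ Z   # gradient of the Frobenius loss
        Z -= lr * grad
    return Z

Z = joint_embedding(A, S)
print(Z.round(2))   # one low-dimensional vector per vertex
```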
Abstract: Group recommender systems have recently become one of the most prevalent topics in recommender systems. As an effective solution to the group recommendation problem, group recommender systems have been applied to news, music, movies, food, and so forth by extending individual recommendation to group recommendation. Existing group recommender systems usually employ either a preference aggregation strategy or a recommendation aggregation strategy, but neither is fully satisfactory, and each has its own advantages and disadvantages: the preference aggregation strategy suffers from a fairness problem among group members, whereas the recommendation aggregation strategy pays less attention to the interaction between group members. This paper proposes an enhanced group recommendation method based on preference aggregation, which incorporates the advantages of both aggregation methods. Furthermore, the paper demonstrates that group preference and personal preference are similar, which is also considered in the proposed method. Experimental results on the MovieLens dataset show that the proposed method outperforms baselines in terms of effectiveness.
Abstract: Event-based social networks (EBSNs) have experienced rapid growth in people's daily life. Hence, event recommendation plays an important role in helping people discover interesting online events and attend offline activities face to face in the real world. However, event recommendation is quite different from traditional recommender systems, and there are several challenges: (1) one user can only attend a scarce number of events, leading to a very sparse user-event matrix; (2) the response data of users is implicit feedback; (3) events have their life cycles, so outdated events should not be recommended to users; (4) a large number of new events created every day need to be recommended to users in time. To cope with these challenges, this article proposes to jointly model heterogeneous social and content information for event recommendation. This approach explores both the online and offline social interactions and fuses the content of events to model their joint effect on users' decision-making for events. Extensive experiments are conducted to evaluate the performance of the proposed model on the Meetup dataset. The experimental results demonstrate that the proposed model outperforms state-of-the-art methods.
Abstract: This paper offers some thoughts on the following four aspects: 1) from the law of the development of things, revealing the development history of software engineering technology; 2) from the perspective of software's natural characteristics, analyzing the construction of each abstraction layer of the virtual machine; 3) from the perspective of software development, proposing the research content of the software engineering discipline and studying the pattern of industrialized software production; 4) based on the emergence of Internet technology, exploring the development trend of software technology.
Abstract: Since the factorization machine (FM) model can effectively solve the sparsity problem of high-dimensional data feature combination with high prediction accuracy and computational efficiency, it has been widely studied and applied in the field of click-through-rate (CTR) prediction and recommender systems. A review of the progress of subsequent research on FM and its related models will help to promote the further improvement and application of the model. By comparing the relationship between the FM model and the polynomial regression model and the factorization model, the flexibility and generality of the FM model are described. In terms of breadth extension, the strategies, methods, and key technologies are summarized from the dimensions of high-order feature interaction, field-aware feature interaction, and hierarchical feature interaction, as well as feature extraction, combination, intelligent selection, and promotion based on feature engineering. The integration approaches and benefits of the FM model with other models, especially the combination with deep learning models, are compared and analyzed, which provides insights into the in-depth expansion of traditional models. The learning and optimization methods of FM models and their implementations based on different parallel and distributed computing frameworks are summarized, compared, and analyzed. Finally, the authors forecast the difficult points, hot spots, and development trends of the FM model that need to be further studied.
Abstract: Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
Abstract: This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail, including sentiment extraction, sentiment classification, sentiment retrieval and summarization. Then, the evaluation and corpus for sentiment analysis are introduced. Finally, the applications of sentiment analysis are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, making detailed comparison and analysis.
Abstract: The development of the Internet has brought convenience to the public, but it also troubles users with making choices among enormous amounts of data. Thus, recommender systems based on user understanding are urgently needed. Different from traditional techniques that usually focus on individual users, social-based recommender systems perform better by integrating social influence modeling to achieve more accurate user profiling. However, current works usually model influence in a simplistic manner, while deep discussion of its intrinsic mechanism has been largely ignored. To solve this problem, this paper studies the social influence among users, which affects both ratings and user attributes, and then proposes a novel trust-driven PMF (TPMF) algorithm to merge these two mechanisms. Furthermore, to handle the fact that different users should have personalized parameters, the study clusters users according to rating correlation and then maps them to corresponding weights, thereby achieving personalized selection of users' model parameters. Comprehensive experiments on open data sets validate that TPMF and its derived algorithm can effectively predict users' ratings compared with several state-of-the-art baselines, which demonstrates the capability of the presented influence mechanism and technical framework.
Abstract: Recommending valuable and interesting content for microblog users is an important way to improve the user experience. In this study, tags are considered as users' interests, and a microblog recommendation method based on hypergraph random walk tag extension and tag probability correlation is proposed via an analysis of the characteristics and existing limitations of microblog recommendation algorithms. First, microblogs are considered as hyperedges, while each term is taken as a hypervertex, and weighting strategies for both hyperedges and hypervertices are established. A random walk is conducted on the hypergraph to obtain a number of keywords for expanding each microblog user's tags; the weight of each tag is then enhanced based on a relevance weighting scheme, so that the user tag matrix can be constructed. The probability correlation between tags is calculated to construct a tag similarity matrix, which is then used to update the user tag matrix; the updated matrix contains both user interest information and the relationships between tags. Experimental results show that the algorithm is effective in microblog recommendation.
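The hyperedge/hypervertex random walk described above can be sketched generically as follows. The alternating edge-then-vertex step, the uniform transition probabilities, and the toy data are assumptions for illustration and do not reproduce the weighting strategies used in the paper.

```python
import random
from collections import Counter, defaultdict

# Toy hypergraph: each microblog (hyperedge) is a set of terms (hypervertices).
hyperedges = {
    "post1": {"python", "pandas", "data"},
    "post2": {"data", "mining", "graph"},
    "post3": {"graph", "embedding", "python"},
}

# Index: term -> hyperedges containing it.
incident = defaultdict(list)
for edge, terms in hyperedges.items():
    for term in terms:
        incident[term].append(edge)

def random_walk(start_term: str, steps: int = 1000, seed: int = 42) -> Counter:
    """Alternate hyperedge/vertex steps: from a term, jump to a random microblog
    containing it, then to a random term of that microblog. Term visit counts
    serve as candidate expansion tags."""
    rng = random.Random(seed)
    visits = Counter()
    term = start_term
    for _ in range(steps):
        edge = rng.choice(incident[term])            # hypervertex -> hyperedge
        term = rng.choice(sorted(hyperedges[edge]))  # hyperedge -> hypervertex
        visits[term] += 1
    return visits

print(random_walk("python").most_common(3))   # top candidate tags for expansion
```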
Abstract: The newly emerging event-based social network (EBSN), which takes events as its core, combines online relationships with offline activities to promote the formation of real and effective social relationships among users. However, excessive activity information makes it difficult for users to distinguish and choose events. Context-aware local event recommendation is an effective solution to this information overload problem, but most existing local event recommendation algorithms learn users' preferences for contextual information only indirectly from statistics of historical event participation and ignore the latent correlations among them, which harms recommendation effectiveness. To take full advantage of the latent correlations between users' event preferences and contextual information, the proposed collective contextual relation learning (CCRL) algorithm models the relations among users' participation records and related contextual information such as event organizer, description text, venue, and starting time. A multi-relational Bayesian personalized ranking (MRBPR) algorithm is then adapted for collective contextual relation learning and local event recommendation. Experimental results on the Meetup dataset demonstrate that the proposed algorithm outperforms state-of-the-art local event recommendation algorithms in terms of many metrics.
Abstract: With the rapid development of e-business, web applications have evolved from localized to global, from B2C (business-to-customer) to B2B (business-to-business), and from centralized to decentralized fashion. The web service is a new application model for decentralized computing, and it is also an effective mechanism for data and service integration on the Web. Thus, web services have become a solution to e-business. It is important and necessary to carry out research on new architectures of web services, on combinations with other good techniques, and on the integration of services. This paper presents a survey of various aspects of web services research, from the basic concepts to the principal research problems and the underlying techniques, including data integration in web services, web service composition, semantic web services, web service discovery, web service security, the solution to web services in the P2P (peer-to-peer) computing environment, and grid services. The paper also presents a summary of the current state of the art of these techniques, a discussion of future research topics, and the challenges facing web services.
Abstract: Network community structure is one of the most fundamental and important topological properties of complex networks, within which the links between nodes are very dense, but between which they are quite sparse. Network clustering algorithms, which aim to discover all natural network communities from given complex networks, are fundamentally important for both theoretical research and practical applications, and can be used to analyze the topological structures, understand the functions, recognize the hidden patterns, and predict the behaviors of complex networks, including social networks, biological networks, the World Wide Web, and so on. This paper reviews the background, the motivation, the state of the art, and the main issues of existing work on discovering network communities, and tries to draw a comprehensive and clear outline for this new and active research area. This work is hopefully beneficial to researchers in complex network analysis, data mining, intelligent Web, and bioinformatics.
Abstract: Wireless Sensor Networks, a novel technology about acquiring and processing information, have been proposed for a multitude of diverse applications. The problem of self-localization, that is, determining where a given node is physically or relatively located in the networks, is a challenging one, and yet extremely crucial for many applications. In this paper, the evaluation criterion of the performance and the taxonomy for wireless sensor networks self-localization systems and algorithms are described, the principles and characteristics of recent representative localization approaches are discussed and presented, and the directions of research in this area are introduced.
Abstract: Information flow analysis is a promising approach for protecting the confidentiality and integrity of information manipulated by computing systems. Taint analysis, in practice, is widely used in the area of software security assurance. This survey summarizes the latest advances in taint analysis, especially the solutions applied to applications on different platforms. Firstly, the basic principle of taint analysis is introduced, along with the general techniques for taint propagation implemented by dynamic and static analyses. Then, the proposals applied in different platform frameworks, including techniques for detecting privacy leakage on Android and finding security vulnerabilities on the Web, are analyzed. Lastly, further research directions and future work are discussed.
Abstract: Considered the next-generation computing model, cloud computing plays an important role in scientific and commercial computing and draws great attention from both academia and industry. In a cloud computing environment, a data center consists of a large number of computers, usually up to millions, and stores petabytes or even exabytes of data, which can easily lead to failures of computers or data. The huge number of computers not only poses great challenges to the scalability of the data center and its storage system, but also results in high hardware infrastructure costs and power costs. Therefore, fault tolerance, scalability, and power consumption of the distributed storage in a data center become key parts of cloud computing technology for ensuring data availability and reliability. This paper surveys the state of the art of the key technologies in cloud computing with respect to the design of data center networks, the organization and arrangement of data, strategies to improve fault tolerance, and methods to save storage space and energy. Firstly, several classical data center network topologies are introduced and compared. Secondly, current fault-tolerant storage techniques are discussed, and data replication and erasure-code strategies are compared in particular. Thirdly, the main current energy-saving technologies are addressed and analyzed. Finally, challenges in distributed storage are reviewed and future research trends are predicted.
Abstract: Cyber-Physical Systems (CPSs) have great potentials in several application domains. Time plays an important role in CPS and should be specified in the very early phase of requirements engineering. This paper proposes a framework to model and verify timing requirements for the CPS. To begin with, a conceptual model is presented for providing basic concepts of timing and functional requirements. Guided by this model, the CPS software timing requirement specification can be obtained from CPS environment properties and constraints. To support formal verification, formal semantics for the conceptual model is provided. Based on the semantics, the consistency properties of the timing requirements specification are defined and expressed as CTL formulas. The timing requirements specification is transformed into a NuSMV model and checked by this well-known model checker.
Abstract: In many areas such as science, simulation, the Internet, and e-commerce, the volume of data to be analyzed grows rapidly. Parallel techniques that can be expanded cost-effectively are needed to deal with such big data. Relational data management technology has gone through a history of nearly 40 years and now encounters the tough obstacle of scalability, as relational techniques cannot handle large data easily. In the meantime, non-relational techniques, with MapReduce as a typical representative, have emerged as a new force and expanded their applications from Web search to territories that used to be occupied by relational database systems. They confront relational techniques with high availability, high scalability, and massive parallel processing capability. The relational technique community, after losing the big deal of Web search, has begun to learn from MapReduce, and MapReduce also borrows valuable ideas from the relational community to improve performance. Relational techniques and MapReduce compete with and learn from each other; new data analysis platforms and a new data analysis ecosystem are emerging. Eventually the two camps of techniques will find their proper places in the new ecosystem of big data analysis.
Abstract: This paper firstly presents a summary of AADL (architecture analysis and design language), including
its progress over the years and its modeling elements. Then, it surveys the research and practice of AADL from a
model-based perspective, such as AADL modeling, AADL formal semantics, model transformation, verification and
code generation. Finally, the potential research directions are discussed.
Abstract: Nowadays it has been widely accepted that the quality of software highly depends on the process that is carried out in an organization. As part of the effort to support software process engineering activities, research on software process modeling and analysis aims to provide an effective means to represent and analyze a process and, by doing so, to enhance the understanding of the modeled process. In addition, an enactable process model can provide direct guidance for the actual development process; thus, enforcement of the process model can directly contribute to the improvement of software quality. In this paper, a systematic review is carried out to survey recent developments in software process modeling. 72 papers from 20 conference proceedings and 7 journals are identified as the evidence. The review aims to promote a better understanding of the literature by answering the following three questions: 1) What kinds of paradigms are existing methods based on? 2) What kinds of purposes does the existing research have? 3) What kinds of new trends are reflected in current research? After providing the systematic review, we present our software process modeling method, based on a multi-dimensional and integration methodology, which is intended to address several core issues facing the community.
Abstract: The appearance of plenty of intelligent devices equipped for short-range wireless communication has boosted the rapid rise of wireless ad hoc network applications. However, in many realistic application environments, nodes form a disconnected network most of the time due to node mobility, low density, lossy links, etc. The conventional communication model of mobile ad hoc networks (MANET) requires that at least one path exist from the source to the destination node, which results in communication failure in these scenarios. Opportunistic networks utilize the communication opportunities arising from node movement to forward messages hop by hop, and implement communication between nodes based on the "store-carry-forward" routing pattern. This networking approach, totally different from the traditional communication model, has captured great interest from researchers. This paper first introduces the concepts and theories of opportunistic networks and some current typical applications. It then elaborates on popular research problems, including opportunistic forwarding mechanisms, mobility models, and opportunistic data dissemination and retrieval. Some other interesting research points, such as communication middleware, cooperation, security, and new applications, are stated briefly. Finally, the paper concludes and looks forward to possible research focuses for opportunistic networks in the future.
Abstract: This paper makes a comprehensive survey of recommender system research to help readers understand this field. First, the research background is introduced, including commercial application demands, academic institutions, conferences, and journals. After formally and informally describing the recommendation problem, a comparative study is conducted of the categorized algorithms. In addition, the commonly adopted benchmark datasets and evaluation methods are presented, and the main difficulties and future directions are summarized.
Abstract: With the explosive growth of network applications and complexity, the threat of Internet worms to network security is becoming increasingly serious. Especially in the Internet environment, the variety of propagation paths and the complexity of application environments result in worms with a much higher frequency of outbreak, much deeper latency, and wider coverage, and Internet worms have become a primary issue faced by malicious code researchers. In this paper, the concept and research situation of Internet worms, their function components, and their execution mechanisms are first presented; then the scanning strategies and propagation models are discussed; and finally the critical techniques of Internet worm prevention are given. Major problems and research trends in this area are also addressed.
Abstract: This paper studies uncertain graph data mining and in particular investigates the problem of mining frequent subgraph patterns from uncertain graph data. A data model is introduced for representing uncertainties in graphs, and expected support is employed to evaluate the significance of subgraph patterns. By using the apriori property of expected support, a depth-first-search-based mining algorithm is proposed, with an efficient method for computing expected support and a technique for pruning the search space, which reduces the number of subgraph isomorphism tests needed to compute expected support from exponential to linear scale. Experimental results show that the proposed algorithm is 3 to 5 orders of magnitude faster than a naïve depth-first search algorithm, and is efficient and scalable.
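As a hedged illustration of the expected-support measure the abstract relies on, the sketch below evaluates it by brute-force possible-world enumeration over tiny uncertain graphs with independent edge probabilities. It treats the pattern as a fixed labeled edge set and skips subgraph isomorphism, so it is a definitional toy rather than the paper's efficient algorithm.

```python
from itertools import product

def expected_support(uncertain_graphs, pattern_edges):
    """Expected support of a pattern over a database of uncertain graphs.
    Each uncertain graph maps edge -> existence probability (edges independent).
    Containment is simplified to 'all pattern edges are present'."""
    total = 0.0
    for g in uncertain_graphs:
        edges = list(g.items())
        p_contain = 0.0
        for world in product([0, 1], repeat=len(edges)):   # enumerate possible worlds
            p_world = 1.0
            present = set()
            for (e, p), bit in zip(edges, world):
                p_world *= p if bit else (1.0 - p)
                if bit:
                    present.add(e)
            if set(pattern_edges) <= present:
                p_contain += p_world
        total += p_contain
    return total

# toy example: two uncertain graphs, pattern = the single edge ('a', 'b')
gdb = [{('a', 'b'): 0.9, ('b', 'c'): 0.5}, {('a', 'b'): 0.4}]
print(expected_support(gdb, [('a', 'b')]))   # 0.9 + 0.4 = 1.3
```

The exponential cost of this enumeration is exactly what the paper's expected-support computation and pruning techniques are designed to avoid.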
Abstract: This paper introduces the concrete details of combining automated reasoning techniques with planning methods, including planning as satisfiability using propositional logic, conformant planning using modal logic and disjunctive reasoning, planning as nonmonotonic logic, and flexible planning as fuzzy description logic. After considering the experimental results of the International Planning Competition and relevant papers, it concludes that planning methods based on automated reasoning techniques are helpful and can be adopted. It also identifies the challenges and possible research hotspots.
Abstract: Sensor networks are an integration of sensor techniques, embedded computation techniques, distributed computation techniques, and wireless communication techniques. They can be used for testing, sensing, collecting, and processing information about monitored objects and transferring the processed information to users. Sensor networks are a new research area of computer science and technology with a wide application future, and both academia and industry are very interested in them. The concepts and characteristics of sensor networks and the data in such networks are introduced, and the issues of sensor networks and their data management are discussed. Advances in research on sensor networks and their data management are also presented.
Abstract: Batch computing and stream computing are two important forms of big data computing. Research and discussion on batch computing in big data environments are comparatively sufficient; but how to efficiently handle stream computing so as to meet requirements such as low latency, high throughput, and continuously reliable operation, and how to build efficient stream big data computing systems, are great challenges in big data computing research. This paper studies the system architecture and key issues of stream computing in big data environments. Firstly, it gives a brief summary of three application scenarios of stream computing in business intelligence, marketing, and public service, and shows the distinctive features of stream computing in big data environments, such as real-time, volatile, bursty, irregular, and unbounded data. A well-designed stream computing system always optimizes its system structure, data transmission, application interfaces, high availability, and so on. Subsequently, the paper offers detailed analyses and comparisons of five typical open-source stream computing systems for big data environments. Finally, it specifically addresses some new challenges for stream big data systems, such as scalability, fault tolerance, consistency, load balancing, and throughput.
Abstract: Intrusion detection has been a prominent topic of network security research in recent years. In this paper, the necessity of intrusion detection is first presented, and its concepts and models are described. Then, many intrusion detection techniques and architectures are summarized. Finally, the existing problems and future directions in this field are discussed.
Abstract: In a multi-hop wireless sensor network (WSN), the sensors closest to the sink tend to deplete their energy faster than other sensors, which is known as the energy hole around the sink. No more data can be delivered to the sink after an energy hole appears, while a considerable amount of energy is wasted and the network lifetime ends prematurely. This paper investigates the energy hole problem and, based on an improved corona model with levels, concludes that assigning different transmission ranges to nodes in different coronas is an effective approach for achieving an energy-efficient network. It proves that finding the optimal transmission ranges for all coronas is a multi-objective optimization problem (MOP), which is NP-hard. The paper proposes an ACO (ant colony optimization)-based distributed algorithm to prolong the network lifetime, which helps nodes in different areas adaptively find approximately optimal transmission ranges based on the node distribution. Furthermore, simulation results indicate that the network lifetime under this solution approximates that obtained using the optimal list. Compared with existing algorithms, the ACO-based algorithm not only extends the network lifetime by more than a factor of two, but also performs well under non-uniform node distributions.
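The abstract does not spell out the ACO algorithm itself, so the following is only a generic sketch of the pheromone-biased selection and reinforcement a node could use to pick a transmission range from a candidate set; the candidate values, evaporation rate, and reward scheme are all hypothetical.

```python
import random

ranges = [10.0, 20.0, 30.0, 40.0]   # candidate transmission ranges (hypothetical)
tau = [1.0] * len(ranges)            # pheromone value per candidate
rho, Q = 0.1, 1.0                    # evaporation rate and deposit constant

def pick_range():
    """Choose a candidate index with probability proportional to its pheromone."""
    s = sum(tau)
    r, acc = random.uniform(0.0, s), 0.0
    for i, t in enumerate(tau):
        acc += t
        if r <= acc:
            return i
    return len(tau) - 1

def update(i, lifetime):
    """Evaporate all pheromone, then reinforce the chosen candidate in
    proportion to the network lifetime it achieved."""
    for j in range(len(tau)):
        tau[j] *= (1.0 - rho)
    tau[i] += Q * lifetime
```

In a distributed realization, each node would run such a loop locally and exchange only the feedback needed to score its choice, which is the part this sketch abstracts away.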
Abstract: With the recent development of cloud computing, the importance of cloud databases has been widely acknowledged. Here, the features, influence and related products of cloud databases are first discussed. Then, research issues of cloud databases are presented in detail, which include data model, architecture, consistency, programming model, data security, performance optimization, benchmark, and so on. Finally, some future trends in this area are discussed.
Abstract: Software architecture (SA) has recently emerged as one of the primary research areas in software engineering and one of the key technologies for developing large-scale software-intensive systems and software product lines. The history and major directions of SA are summarized, and the concept of SA is formulated by analyzing and comparing several classical definitions of SA. Based on a summary of SA activities, two categories of SA research are identified, and advances in SA research are then introduced from seven aspects. Additionally, some shortcomings of SA research are discussed and their causes explained. Finally, the paper concludes with some significantly promising trends in SA research.
Abstract: Many application-specific NoSQL database systems have been developed to satisfy the new requirements of big data management. This paper surveys research on typical NoSQL databases based on the key-value data model. First, the characteristics of big data and the key technical issues of big data management are introduced. Then frontier efforts and research challenges are presented, including system architecture, data model, access mode, indexing, transactions, system elasticity, load balancing, replica strategy, data consistency, flash cache, MapReduce-based data processing, and new-generation data management systems. Finally, research prospects are given.
Abstract: Routing technology at the network layer is pivotal in the architecture of wireless sensor networks. As an active branch of routing technology, cluster-based routing protocols excel in network topology management, energy minimization, data aggregation and so on. In this paper, cluster-based routing mechanisms for wireless sensor networks are analyzed. Cluster head selection, cluster formation and data transmission are three key techniques in cluster-based routing protocols. As viewed from the three techniques, recent representative cluster-based routing protocols are presented, and their characteristics and application areas are compared. Finally, the future research issues in this area are pointed out.
Abstract: Sensor networks, formed by the convergence of sensor, micro-electro-mechanical system, and networking technologies, are a novel technology for acquiring and processing information. In this paper, the architecture of wireless sensor networks is briefly introduced. Next, some valuable applications are explained and forecast. Combined with existing work, hot research topics including power-aware routing and medium access control schemes are discussed in detail. Finally, taking application requirements into account, several future research directions are put forward.
Abstract: The research status and recent progress of clustering algorithms are summarized in this paper. First, representative clustering algorithms are analyzed and summarized from several aspects, such as algorithm ideas, key technologies, and advantages and disadvantages. Second, several typical clustering algorithms and well-known datasets are selected, simulation experiments are conducted on both accuracy and running efficiency, and the clustering behavior of each algorithm on different datasets is analyzed by comparison with the results of other algorithms on the same datasets. Finally, by integrating the information from these two aspects, the research hotspots, difficulties, and shortcomings of data clustering, as well as some open problems, are addressed. This work provides a valuable reference for data clustering and data mining.
Abstract: Cloud computing is a fundamental change happening in the field of information technology. It represents a movement towards intensive, large-scale specialization. On the other hand, it brings not only convenience and efficiency, but also great challenges in data security and privacy protection. Currently, security is regarded as one of the greatest problems in the development of cloud computing. This paper describes the major requirements in cloud computing, security key technologies, standards, regulations, etc., and provides a cloud computing security framework. The paper argues that changes in these aspects will result in a technical revolution in the field of information security.
Abstract: This paper surveys the state of the art of sentiment analysis. First, three important tasks of sentiment analysis are summarized and analyzed in detail, including sentiment extraction, sentiment classification, sentiment retrieval and summarization. Then, the evaluation and corpus for sentiment analysis are introduced. Finally, the applications of sentiment analysis are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, making detailed comparison and analysis.
Abstract: Evolutionary multi-objective optimization (EMO), whose main task is to deal with multi-objective optimization problems by evolutionary computation, has become a hot topic in the evolutionary computation community. After briefly summarizing EMO algorithms before 2003, the recent advances in EMO are discussed in detail and the current research directions are summarized. On the one hand, more new evolutionary paradigms have been introduced into the EMO community, such as particle swarm optimization, artificial immune systems, and estimation of distribution algorithms. On the other hand, in order to deal with many-objective optimization problems, many new dominance schemes different from traditional Pareto dominance have come forth. Furthermore, the essential characteristics of multi-objective optimization problems are deeply investigated. This paper also gives an experimental comparison of several representative algorithms. Finally, several viewpoints on future EMO research are proposed.
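For readers new to EMO, here is a minimal sketch of the Pareto-dominance test (minimization form) on which the traditional dominance scheme mentioned above is built; the example vectors are purely illustrative.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

print(dominates((1, 2), (2, 2)))   # True
print(dominates((1, 3), (2, 2)))   # False: the two vectors are mutually non-dominated
```

The many-objective dominance schemes surveyed in the paper relax or replace this test, because with many objectives most solution pairs become mutually non-dominated under it.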
Abstract: This paper surveys the current technologies adopted in cloud computing as well as the systems in enterprises. Cloud computing can be viewed from two different aspects: one is the cloud infrastructure, which is the building block for the upper-layer cloud applications; the other is, of course, the cloud applications themselves. This paper focuses on the cloud infrastructure, including the deployed systems and current research, and some attractive cloud applications are also discussed. Cloud computing infrastructure has three distinct characteristics. First, the infrastructure is built on top of large-scale clusters containing a large number of cheap PC servers. Second, the applications are co-designed with the fundamental infrastructure so that the computing resources can be maximally utilized. Third, the reliability of the whole system is achieved by software built on top of redundant hardware rather than by hardware alone. All these technologies serve the two important goals of distributed systems: high scalability and high availability. Scalability means that the cloud infrastructure can be expanded to a very large scale, even to thousands of nodes. Availability means that the services remain available even when quite a number of nodes fail. From this paper, readers will capture the current status of cloud computing as well as its future trends.
Abstract: This paper first introduces the key features of big data in different processing modes and their typical application scenarios, as well as corresponding representative processing systems. It then summarizes three development trends of big data processing systems. Next, the paper gives a brief survey on system supported analytic technologies and applications (including deep learning, knowledge computing, social computing, and visualization), and summarizes the key roles of individual technologies in big data analysis and understanding. Finally, the paper lays out three grand challenges of big data processing and analysis, i.e., data complexity, computation complexity, and system complexity. Potential ways for dealing with each complexity are also discussed.
Abstract: Automatic generation of poetry has always been considered a hard nut to crack in natural language generation. This paper reports some pioneering research on a genetic algorithm and its automatic generation of SONGCI. In light of the characteristics of Chinese ancient poetry, this paper designs a level-and-oblique-tone-based coding method, a syntactically and semantically weighted fitness function, a selection operator combining elitism and roulette wheel, a partially mapped crossover operator, and a heuristic mutation operator. As shown by tests, the system built on the computing model designed in this paper is basically capable of generating Chinese SONGCI with some aesthetic merit. This work represents progress in the field of automatic generation of Chinese poetry.
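The abstract names a selection operator combining elitism and roulette wheel; the sketch below shows that generic combination (fitness assumed non-negative), leaving out the SONGCI-specific encoding, fitness function, crossover, and mutation operators.

```python
import random

def select(population, fitness, n_elite=2):
    """Elitism plus roulette-wheel selection.
    Keeps the n_elite fittest individuals unchanged, then fills the rest of
    the new generation by sampling proportionally to (non-negative) fitness."""
    ranked = sorted(population, key=fitness, reverse=True)
    new_pop = ranked[:n_elite]                        # elites survive unchanged
    weights = [fitness(ind) for ind in population]    # roulette-wheel weights
    while len(new_pop) < len(population):
        new_pop.append(random.choices(population, weights=weights, k=1)[0])
    return new_pop

# toy usage: individuals are strings, fitness is just their length (hypothetical)
print(select(["ab", "abcd", "a", "abc"], fitness=len))
```

In the paper's setting the individuals would be encoded SONGCI candidates and the fitness would be the syntactically and semantically weighted function, neither of which is reproduced here.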
Abstract: Few-shot learning refers to learning models that solve problems from small samples. In recent years, under the trend of training models with big data, machine learning and deep learning have achieved success in many fields. However, in many real-world application scenarios, there is not a large amount of data or labeled data for model training, and labeling a large number of unlabeled samples costs a lot of manpower. Therefore, how to learn from a small number of samples has become a problem that currently demands attention. This paper systematically reviews the current approaches to few-shot learning. It introduces the corresponding models in three categories: fine-tuning based, data augmentation based, and transfer learning based. The data augmentation based approaches are further subdivided into unlabeled-data based, data-generation based, and feature-augmentation based approaches, and the transfer learning based approaches into metric-learning based, meta-learning based, and graph neural network based methods. The paper then summarizes the few-shot datasets and the experimental results of the aforementioned models. Next, it summarizes the current situation and challenges of few-shot learning. Finally, future technological developments of few-shot learning are anticipated.
Abstract: The graphics processing unit (GPU) has been developing rapidly in recent years at a speed exceeding Moore's law, and as a result, various applications associated with computer graphics have advanced greatly. At the same time, the high processing power, parallelism, and programmability available on contemporary GPUs provide an ideal platform for general-purpose computation. Starting from an introduction to the development history and architecture of the GPU, the paper describes the technical fundamentals of the GPU. The main part of the paper then introduces the development of various applications of general-purpose computation on the GPU; among those applications, fluid dynamics, algebraic computation, database operations, and spectrum analysis are introduced in detail. The experience of our work on fluid dynamics is also given, and the development of software tools in this area is introduced. Finally, a conclusion is drawn, and future developments and new challenges for both hardware and software in this area are discussed.
Abstract: Probabilistic graphical models are powerful tools for compactly representing complex probability distributions, efficiently computing (approximate) marginal and conditional distributions, and conveniently learning parameters and hyperparameters in probabilistic models. As a result, they have been widely used in applications that require some sort of automated probabilistic reasoning, such as computer vision and natural language processing, as a formal approach to deal with uncertainty. This paper surveys the basic concepts and key results of representation, inference and learning in probabilistic graphical models, and demonstrates their uses in two important probabilistic models. It also reviews some recent advances in speeding up classic approximate inference algorithms, followed by a discussion of promising research directions.
Abstract: Android is the most popular modern software platform for smartphones. According to reports, Android accounted for a huge 81% of all smartphones in 2014 and shipped over 1 billion units worldwide for the first time ever, with Apple, Microsoft, Blackberry, and Firefox trailing a long way behind. At the same time, the increased popularity of Android smartphones has attracted hackers, leading to a massive increase in Android malware applications. This paper summarizes and analyzes the latest advances in Android security from multidimensional perspectives, covering Android architecture, design principles, security mechanisms, major security threats, classification and detection of malware, static and dynamic analyses, machine learning approaches, and security extension proposals.
Abstract: Symbolic propagation methods based on linear abstraction play a significant role in neural network verification. This study proposes the notion of multi-path back-propagation for these methods. Existing methods are viewed as using only a single back-propagation path to calculate the upper and lower bounds of each node in a given neural network, being specific instances of the proposed notion. Leveraging multiple back-propagation paths effectively improves the accuracy of this kind of method. For evaluation, the proposed method is quantitatively compared using multiple back-propagation paths with the state-of-the-art tool DeepPoly on benchmarks ACAS Xu, MNIST, and CIFAR10. The experiment results show that the proposed method achieves significant accuracy improvement while introducing only a low extra time cost. In addition, the multi-path back-propagation method is compared with the Optimized LiRPA based on global optimization, on the dataset MNIST. The results show that the proposed method still has an accuracy advantage.
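The paper's multi-path back-propagation is not reproduced here. As background for the bound-computation task it addresses, the sketch below shows the much coarser interval (box) propagation through an affine layer and a ReLU; linear symbolic methods such as DeepPoly, and the multi-path refinement described above, compute tighter bounds than this baseline. The toy weights are hypothetical.

```python
import numpy as np

def affine_bounds(lo, hi, W, b):
    """Interval propagation through y = W x + b: the positive and negative
    parts of W pick whichever input bound minimizes/maximizes each output."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def relu_bounds(lo, hi):
    """ReLU is monotone, so bounds pass through elementwise."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# toy layer with inputs in [-1, 1]^2 (hypothetical weights)
W, b = np.array([[1.0, -1.0], [0.5, 0.5]]), np.zeros(2)
lo, hi = relu_bounds(*affine_bounds(np.array([-1.0, -1.0]),
                                    np.array([1.0, 1.0]), W, b))
print(lo, hi)   # [0. 0.] [2. 1.]
```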
Abstract: Context-Aware recommender systems, aiming to further improve performance accuracy and user satisfaction by fully utilizing contextual information, have recently become one of the hottest topics in the domain of recommender systems. This paper presents an overview of the field of context-aware recommender systems from a process-oriented perspective, including system frameworks, key techniques, main models, evaluation, and typical applications. The prospects for future development and suggestions for possible extensions are also discussed.
Abstract: Computer-aided detection/diagnosis (CAD) can improve the accuracy of diagnosis, reduce false positives, and provide decision support for doctors. The main purpose of this paper is to analyze the latest developments in computer-aided diagnosis tools. Focusing on the four cancer sites with the highest fatal incidence, major recent publications on CAD applications in different medical imaging areas are reviewed in this survey according to imaging technique and disease. Furthermore, a multidimensional analysis of this research is made in terms of image datasets, algorithms, and evaluation methods. Finally, existing problems, research trends, and development directions in the field of medical image CAD systems are discussed.
Abstract: Ultrasonography is the first choice of imaging examination and preoperative evaluation for thyroid and breast cancer. However, the ultrasonic characteristics of benign and malignant nodules commonly overlap, and diagnosis relies heavily on the operator's experience rather than on quantitative and stable methods. In recent years, medical image analysis based on computer technology has developed rapidly, and a series of landmark breakthroughs have been made, providing effective decision support for medical imaging diagnosis. This work studies the research progress of computer vision and image recognition technologies for thyroid and breast ultrasound images, organized around the key technologies involved in the automatic diagnosis of ultrasound images. The major algorithms of recent years are summarized and analyzed, covering ultrasound image preprocessing, lesion localization and segmentation, and feature extraction and classification. Moreover, a multi-dimensional analysis is made of the algorithms, datasets, and evaluation methods. Finally, existing problems related to the automatic analysis of these two kinds of ultrasound images are discussed, and research trends and development directions in the field of ultrasound image analysis are outlined.
Abstract: Network abstraction has brought about the emergence of software-defined networking (SDN). SDN decouples the data plane and the control plane, and simplifies network management. The paper starts with a discussion of the background to the emergence and development of SDN, outlining its architecture, which includes the data layer, control layer, and application layer. Key technologies are then elaborated according to the hierarchical architecture of SDN, and the properties of consistency, availability, and tolerance are analyzed in particular. Moreover, the latest achievements in typical application scenarios are introduced. Future work is summarized at the end.
Abstract: Task parallel programming model is a widely used parallel programming model on multi-core platforms. With the intention of simplifying parallel programming and improving the utilization of multiple cores, this paper provides an introduction to the essential programming interfaces and the supporting mechanism used in task parallel programming models and discusses issues and the latest achievements from three perspectives: Parallelism expression, data management and task scheduling. In the end, some future trends in this area are discussed.
Abstract: The Internet traffic model is the key issue for network performance management, Quality of Service
management, and admission control. The paper first summarizes the primary characteristics and metrics of Internet traffic, and illustrates the significance and classification of traffic modeling. Next, the paper chronologically categorizes the research activities of traffic modeling into three phases: 1) traditional Poisson modeling; 2) self-similar modeling; and 3) new research debates and new progress. Thorough reviews of the major research achievements of each phase are conducted. Finally, the paper identifies some open research issues and points out possible future research directions in the traffic modeling area.
Abstract: The development of the mobile Internet and the popularity of mobile terminals produce massive trajectory data of moving objects in the era of big data. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can be used to mine the patterns of human activities and behaviors, the moving patterns of vehicles in the city, and changes in the atmospheric environment. However, trajectory data can also be exploited to disclose moving objects' private information (e.g., behaviors, hobbies, and social relationships). Accordingly, attackers can easily access moving objects' private information by digging into their trajectory data, such as activities and check-in locations. On another research front, quantum computation presents an important theoretical direction for mining big data due to its scalable and powerful storage and computing capacity. Applying quantum computing approaches to trajectory big data could make some complex problems solvable and achieve higher efficiency. This paper reviews the key technologies of trajectory data processing. First, the concept and characteristics of trajectory data are introduced, and pre-processing methods, including noise filtering and data compression, are summarized. Then, trajectory indexing and querying techniques and the current achievements of trajectory data mining, such as pattern mining and trajectory classification, are reviewed. Next, an overview of the basic theories and characteristics of privacy preservation for trajectory data is provided. The supporting techniques of trajectory big data mining, such as processing frameworks and data visualization, are presented in detail. Some possible ways of applying quantum computation to trajectory data processing, as well as quantum implementations of some core trajectory mining algorithms, are also described. Finally, the challenges of trajectory data processing and promising future research directions are discussed.
Abstract: In this paper, the existing intrusion tolerance and self-destruction technology are integrated into autonomic computing in order to construct an autonomic dependability model based on SM-PEPA (semi-Markov performance evaluation process algebra) which is capable of formal analysis and verification. It can hierarchically anticipate Threats to dependability (TtD) at different levels in a self-management manner to satisfy the special requirements for dependability of mission-critical systems. Based on this model, a quantification approach is proposed on the view of steady-state probability to evaluate autonomic dependability. Finally, this paper analyzes the impacts of parameters of the model on autonomic dependability in a case study, and the experimental results demonstrate that improving the detection rate of TtD as well as the successful rate of self-healing will greatly increase the autonomic dependability.
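SM-PEPA itself is not detailed in the abstract. As a hedged illustration of the steady-state-probability view used for quantification, the sketch below solves the steady-state distribution of a small continuous-time Markov model, which is a simplification of the semi-Markov setting; the two-state rates are purely hypothetical.

```python
import numpy as np

def steady_state(Q):
    """Steady-state distribution pi of a CTMC with generator matrix Q:
    solve pi @ Q = 0 subject to sum(pi) = 1 (appended as an extra equation)."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# toy 2-state dependability model: healthy <-> degraded (hypothetical rates)
Q = np.array([[-0.2, 0.2],
              [ 0.5, -0.5]])
print(steady_state(Q))   # approximately [0.714, 0.286]
```

In the paper's framework, such steady-state probabilities over states that represent threat detection and self-healing would feed the proposed dependability quantification, which this toy does not attempt to reproduce.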
Abstract: Attribute-based encryption (ABE) takes attributes as the public key and associates the ciphertext and the user's secret key with attributes, so that it can support expressive access control policies. This dramatically reduces the cost of network bandwidth and the sending node's operations in fine-grained access control of data sharing. Therefore, ABE has broad application prospects in the area of fine-grained access control. After analyzing the basic ABE system and its two variants, key-policy ABE (KP-ABE) and ciphertext-policy ABE (CP-ABE), this study elaborates on the research problems relating to ABE systems, including access structure design for CP-ABE, attribute key revocation, key abuse, and multi-authority ABE, with an extensive comparison of their functionality and performance. Finally, this study discusses the problems that remain to be solved and the main research directions in ABE.
Abstract: This paper surveys the state of the art of speech emotion recognition (SER), and presents an outlook on the trend of future SER technology. First, the survey summarizes and analyzes SER in detail from five perspectives, including emotion representation models, representative emotional speech corpora, emotion-related acoustic features extraction, SER methods and applications. Then, based on the survey, the challenges faced by current SER research are concluded. This paper aims to take a deep insight into the mainstream methods and recent progress in this field, and presents detailed comparison and analysis between these methods.
Abstract: In recent years, the rapid development of Internet technology and Web applications has triggered an explosion of various data on the Internet, which generates a large amount of valuable knowledge. How to organize, represent, and analyze this knowledge has attracted much attention. The knowledge graph was thus developed to organize this knowledge in a semantic and visualized manner. Knowledge reasoning over knowledge graphs has then become one of the hot research topics and plays an important role in many applications such as vertical search and intelligent question answering. The goal of knowledge reasoning over knowledge graphs is to infer new facts or identify erroneous facts based on existing ones. Unlike traditional knowledge reasoning, knowledge reasoning over knowledge graphs is more diversified, due to the simplicity, intuitiveness, flexibility, and richness of knowledge representation in knowledge graphs. Starting with the basic concept of knowledge reasoning, this paper presents a survey of recently developed methods for knowledge reasoning over knowledge graphs. Specifically, the research progress is reviewed in detail from two aspects, one-step reasoning and multi-step reasoning, each including rule-based reasoning, distributed-embedding-based reasoning, neural-network-based reasoning, and hybrid reasoning. Finally, future research directions and an outlook on knowledge reasoning over knowledge graphs are discussed.
Abstract: The honeypot is a proactive defense technology, introduced by the defense side to change the asymmetric situation of the network attack-defense game. By deploying honeypots, i.e., security resources without any production purpose, defenders can deceive attackers into illegally using the honeypots, capture and analyze the attack behaviors to understand the attack tools and methods, and learn the attackers' intentions and motivations. Honeypot technology has won the sustained attention of the security community, made considerable progress, and gained wide application, becoming one of the main technical means of Internet security threat monitoring and analysis. In this paper, the origin and evolution of honeypot technology are presented first. Next, the key mechanisms of honeypot technology are comprehensively analyzed, the development of honeypot deployment structures is reviewed, and the latest applications of honeypot technology in Internet security threat monitoring, analysis, and prevention are summarized. Finally, the problems of honeypot technology, its development trends, and further research directions are discussed.
Abstract: Design problems are ubiquitous in scientific research and industrial applications. In recent years, Bayesian optimization, which acts as a very effective global optimization algorithm, has been widely applied to design problems. By structuring the probabilistic surrogate model and the acquisition function appropriately, the Bayesian optimization framework can find the optimal solution with only a small number of function evaluations, and thus it is very suitable for solving extremely complex optimization problems whose objective functions cannot be expressed analytically, or are non-convex, multimodal, and computationally expensive. This paper provides a detailed analysis of Bayesian optimization in terms of methodology and application areas, and discusses its research status and open problems for future research. This work is hopefully beneficial to researchers from the related communities.
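As a hedged sketch of the surrogate-plus-acquisition combination described above, the function below computes the standard expected-improvement acquisition (minimization form) from a Gaussian-process posterior mean and standard deviation; the exploration parameter xi and the candidate-set maximization step are assumptions of this illustration rather than part of the surveyed paper.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Expected improvement at candidate points for minimization, given the
    surrogate posterior mean mu, standard deviation sigma, and the best
    objective value observed so far f_best."""
    sigma = np.maximum(sigma, 1e-9)            # avoid division by zero
    z = (f_best - mu - xi) / sigma
    return (f_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# toy usage: pick the next evaluation point as the argmax of EI over candidates
mu = np.array([0.3, 0.1, 0.5])
sigma = np.array([0.05, 0.20, 0.10])
print(np.argmax(expected_improvement(mu, sigma, f_best=0.2)))
```

The acquisition trades off exploitation (low predicted mean) against exploration (high predictive uncertainty), which is what lets Bayesian optimization be frugal with expensive function evaluations.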
Abstract: Uncertainty exists widely in the subjective and objective world. Among all kinds of uncertainty, randomness and fuzziness are the most important and fundamental. In this paper, the relationship between randomness and fuzziness is discussed. Uncertain states and their changes can be measured by entropy and hyper-entropy, respectively. Taking advantage of entropy and hyper-entropy, the uncertainty of chaos, fractals, and complex networks under various kinds of evolution and differentiation is further studied. A simple and effective way is proposed to simulate uncertainty by means of knowledge representation, which provides a basis for the automation of both logical and image thinking with uncertainty. AI (artificial intelligence) with uncertainty is a new cross-discipline, covering computer science, physics, mathematics, brain science, psychology, cognitive science, biology, and philosophy, and results in the automation of representation, processing, and thinking for uncertain information and knowledge.
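One common moment-based way to estimate entropy (En) and hyper-entropy (He) from samples is the backward generator of the normal cloud model; whether this matches the paper's exact formulation is an assumption, so the sketch below is only illustrative of how the two measures can be computed in practice.

```python
import numpy as np

def backward_cloud(samples):
    """Estimate the digital characteristics of a normal cloud from data:
    Ex (expectation), En (entropy, the uncertain state), and
    He (hyper-entropy, the uncertainty of the entropy itself)."""
    x = np.asarray(samples, dtype=float)
    ex = x.mean()
    en = np.sqrt(np.pi / 2.0) * np.mean(np.abs(x - ex))
    he = np.sqrt(max(x.var(ddof=1) - en ** 2, 0.0))
    return ex, en, he

# toy usage on synthetic samples
rng = np.random.default_rng(0)
print(backward_cloud(rng.normal(5.0, 1.0, size=1000)))
```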
Abstract: The rapid development of the Internet leads to increased system complexity and uncertainty. Traditional network management cannot meet these requirements and must evolve towards fusion-based Cyberspace Situational Awareness (CSA). Based on an analysis of its functional shortcomings and development requirements, this paper introduces CSA along with its origin, concept, objectives, and characteristics. Firstly, a CSA research framework is proposed and the research history is investigated, based on which the main aspects and existing issues of the research are analyzed. Assessment methods are divided into three categories: mathematical models, knowledge reasoning, and pattern recognition. The paper then discusses CSA from three aspects, namely models, knowledge representation, and assessment methods, and goes into detail about the main ideas, assessment processes, merits, and shortcomings of novel methods, comparing many typical methods. The current applied research on CSA in the fields of security, transmission, survivability, system evaluation, and so on is presented. Finally, the paper points out the development directions of CSA and offers conclusions regarding its issue system, technical system, and application system.
Abstract: The popularity of the Internet and the boom of the World Wide Web foster innovative changes in software technology that give birth to a new form of software—networked software, which delivers diversified and personalized on-demand services to the public. With the ever-increasing expansion of applications and users, the scale and complexity of networked software are growing beyond the information processing capability of human beings, which brings software engineers a series of challenges to face. In order to come to a scientific understanding of this kind of ultra-large-scale artificial complex systems, a survey research on the infrastructure, application services, and social interactions of networked software is conducted from a three-dimensional perspective of cyberization, servicesation, and socialization. Interestingly enough, most of them have been found to share the same global characteristics of complex networks such as “Small World” and “Scale Free”. Next, the impact of the empirical study on software engineering research and practice and its implications for further investigations are systematically set forth. The convergence of software engineering and other disciplines will put forth new ideas and thoughts that will breed a new way of thinking and input new methodologies for the study of networked software. This convergence is also expected to achieve the innovations of theories, methods, and key technologies of software engineering to promote the rapid development of software service industry in China.
Abstract: In recent years, applying deep learning (DL) to image semantic segmentation (ISS) has become widespread due to its state-of-the-art performance and high-quality results. This paper systematically reviews the contribution of DL to the field of ISS. Different methods of ISS based on DL (ISSbDL) are summarized. These methods are divided into ISS based on regional classification (ISSbRC) and ISS based on pixel classification (ISSbPC) according to the image segmentation characteristics and segmentation granularity. The methods of ISSbPC are then surveyed from two points of view: ISS based on fully supervised learning (ISSbFSL) and ISS based on weakly supervised learning (ISSbWSL). The representative algorithms of each method are introduced and analyzed, and the basic workflow, framework, advantages, and disadvantages of these methods are analyzed and compared in detail. In addition, the related experiments on ISS are analyzed and summarized, and the common datasets and performance evaluation indexes used in ISS experiments are introduced. Finally, possible research directions and trends are given and analyzed.
Abstract: Blockchain is a distributed public ledger technology that originates from the digital cryptocurrency bitcoin. Its development has attracted wide attention in industry and academia. Blockchain has the advantages of decentralization, trustworthiness, anonymity, and immutability; it breaks through the limitations of traditional center-based technology and has broad development prospects. This paper introduces the research progress of blockchain technology and its applications in the field of information security. Firstly, the basic theory and model of blockchain are introduced from five aspects: basic framework, key technologies, technical features, application modes, and application areas. Secondly, from the perspective of the current research situation of blockchain in the field of information security, this paper summarizes the research progress of blockchain in authentication, access control, and data protection technologies, and compares the characteristics of the various studies. Finally, the application challenges of blockchain technology are analyzed, and the development outlook of blockchain in the field of information security is highlighted. This study is intended to provide a reference for future research work.
Abstract: The paper offers some reflections from the following four perspectives: 1) from the laws of development of things, revealing the development history of software engineering technology; 2) from the natural characteristics of software, analyzing the construction of each abstraction layer of the virtual machine; 3) from the viewpoint of software development, proposing the research content of the software engineering discipline and studying the patterns of industrialized software production; 4) based on the appearance of Internet technology, exploring the development trends of software technology.
Abstract: In recent years, there has been extensive study and rapid progress in automatic text categorization, one of the hotspots and key techniques in information retrieval and data mining. Highlighting the state-of-the-art challenges and research trends in content information processing for the Internet and other complex applications, this paper presents a survey of recent developments in machine learning-based text categorization, covering models, algorithms, and evaluation. It points out that nonlinearity, skewed data distributions, the labeling bottleneck, hierarchical categorization, the scalability of algorithms, and the categorization of Web pages are the key problems in the study of text categorization, and possible solutions to these problems are discussed. Finally, some future research directions are given.
Abstract: Deep learning has achieved great success in the field of computer vision, surpassing many traditional methods. In recent years, however, deep learning technology has been abused to produce fake videos, causing forged videos represented by Deepfakes to flood the Internet. This technique produces pornographic videos, fake news, and political rumors by tampering with or replacing the facial information of original videos and by synthesizing fake speech. To eliminate the negative effects of such forgery technologies, many researchers have studied the identification of fake videos in depth and proposed a series of detection methods to help institutions and communities identify them. Nevertheless, current detection technology still has many limitations, such as dependence on specific data distributions and specific compression ratios, and lags far behind fake video generation technology. In addition, different researchers approach the problem from different angles, and the datasets and evaluation metrics used are not uniform. So far, the academic community still lacks a unified understanding of deep forgery and detection technology, and the research architecture of deep forgery and detection is not clear. This review surveys the development of deep forgery and detection technologies, and systematically summarizes and classifies existing research. Finally, the social risks posed by the spread of Deepfakes technology are discussed, the limitations of detection technology are analyzed, and the challenges and potential research directions of detection technology are considered, aiming to guide follow-up researchers in further promoting the development and deployment of Deepfakes detection technology.
Abstract: Distributed denial of service (DDoS) attacks are a major threat to current networks. Based on the level of the attack packets, this study divides DDoS attacks into network-level and application-level DDoS attacks. It then analyzes the detection and control methods for these two kinds of attacks in detail, as well as the drawbacks of the different control methods deployed at different network positions. Finally, the study analyzes the shortcomings of current detection and control methods, discusses the development trend of DDoS filtering systems, and identifies the corresponding technological challenges.
Abstract: With the proliferation of Chinese social networks (especially the rise of Weibo), productivity and lifestyles in Chinese society are increasingly and profoundly influenced by Internet public events. Owing to the lack of effective technical means, the efficiency of processing such information remains limited. This paper proposes a method for calculating the information entropy of public events. First, a mathematical model of event information content is built. Then, the multidimensional random variable information entropy of public events is calculated based on Shannon information theory. Furthermore, a new technical index for the quantitative analysis of Internet public events is put forward, laying a foundation for further research.
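To make the entropy computation concrete, the following is a minimal sketch of Shannon entropy over a joint (multidimensional) discrete distribution, which is the quantity the abstract refers to; the attribute dimensions and probabilities below are hypothetical illustrations, not data from the paper.

```python
import numpy as np

def joint_entropy(joint_probs):
    """Shannon entropy H = -sum p log2 p of a joint distribution.

    joint_probs: array of probabilities over all combinations of the
    event's discrete attributes (entries must sum to 1).
    """
    p = np.asarray(joint_probs, dtype=float).ravel()
    p = p[p > 0]                      # skip zero-probability cells
    return float(-np.sum(p * np.log2(p)))

# Hypothetical example: a public event described by two attributes,
# e.g. topic category (3 values) x sentiment (2 values).
joint = np.array([[0.30, 0.10],
                  [0.25, 0.15],
                  [0.10, 0.10]])
print(joint_entropy(joint))          # entropy in bits
```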
Abstract: This paper presents a survey of the theory of provable security and its applications to the design and analysis of security protocols. It clarifies what provable security is, explains some basic notions involved in the theory, and illustrates the basic idea of the random oracle model. It also reviews the development of and advances in provably secure public-key encryption and digital signature schemes, in both the random oracle model and the standard model, as well as the applications of provable security to the design and analysis of session-key distribution protocols and their advances.
Abstract: Many application-oriented NoSQL database systems have been developed to satisfy the new requirements of big data management. This paper surveys research on typical NoSQL databases based on the key-value data model. First, the characteristics of big data and the key technical issues in supporting big data management are introduced. Then, frontier efforts and research challenges are presented, including system architecture, data model, access mode, indexing, transactions, system elasticity, load balancing, replica strategy, data consistency, flash caching, MapReduce-based data processing, and new-generation data management systems. Finally, research prospects are given.
Abstract: Source code bug (vulnerability) detection is the process of judging whether unexpected behaviors exist in program code. It is widely used in software engineering tasks such as software testing and software maintenance, and plays a vital role in software functional assurance and application security. Traditional vulnerability detection research is based on program analysis, which usually requires strong domain knowledge and complex calculation rules and faces the problem of state explosion, resulting in limited detection performance and leaving considerable room for improvement in false positive and false negative rates. In recent years, the vigorous development of the open source community has accumulated massive amounts of data centered on open source code. In this context, the feature learning capability of deep learning can automatically learn semantically rich code representations, providing a new approach to vulnerability detection. This study collects the latest high-quality papers in this field and systematically summarizes current methods from two aspects: vulnerability code datasets and deep learning vulnerability detection models. Finally, it summarizes the main challenges facing research in this field and looks ahead to possible future research directions.
Abstract: Under new application modes, traditional hierarchical data centers face several limitations in size, bandwidth, scalability, and cost. To meet the needs of new applications, data center networks should fulfill requirements such as high scalability, low configuration overhead, robustness, and energy efficiency at low cost. This paper first summarizes the shortcomings of traditional data center network architectures and points out the new requirements. It then divides existing proposals into two categories, server-centric and network-centric, and reviews and compares several representative architectures of the two categories in detail. Finally, future directions for data center networks are discussed.
Abstract: Machine learning has become a core technology in areas such as big data, the Internet of Things, and cloud computing. Training machine learning models requires a large amount of data, which is often collected by means of crowdsourcing and contains a large amount of private data, including personally identifiable information (such as phone numbers and ID numbers) and sensitive information (such as financial and health care data). How to protect these data at low cost and with high efficiency is an important issue. This paper first introduces the concept of machine learning, explains the various definitions of privacy in machine learning, and describes the privacy threats encountered in machine learning. It then elaborates on the working principles and salient features of mainstream machine learning privacy protection technologies, summarizing research achievements in the field according to three categories: differential privacy, homomorphic encryption, and secure multi-party computation. On this basis, the paper comparatively analyzes the main advantages and disadvantages of the different privacy-preserving mechanisms for machine learning. Finally, the development trend of privacy preservation for machine learning is projected, and possible research directions in this field are proposed.
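Of the three mechanism families named above, differential privacy is the simplest to illustrate. The following is a minimal sketch of the standard Laplace mechanism for a numeric query; it is a generic textbook construction, not the specific scheme of any work covered by the survey, and the query, sensitivity, and epsilon values are hypothetical.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a noisy statistic satisfying epsilon-differential privacy
    by adding Laplace noise with scale sensitivity/epsilon."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

# Hypothetical example: privately release a counting query over a
# training dataset. A count changes by at most 1 when one record is
# added or removed, so sensitivity = 1.
true_count = 421
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(noisy_count)
```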
Abstract: Recommendation systems are among the most important technologies in e-commerce. With the development of e-commerce, the numbers of users and commodities grow rapidly, resulting in extremely sparse user rating data. Traditional similarity measures perform poorly in this situation, dramatically degrading the quality of recommendation systems. To address this issue, a novel collaborative filtering algorithm based on item rating prediction is proposed. The method first predicts the ratings of items that users have not yet rated, using item similarity, and then applies a new similarity measure to find the target user's neighbors. Experimental results show that this method can effectively alleviate the extreme sparsity of user rating data and provides better recommendations than traditional collaborative filtering algorithms.
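The following is a minimal sketch of the item-based rating-prediction step described above: missing ratings are filled in from item similarities before neighbor search. Cosine similarity and a similarity-weighted average are used here as common default choices; they are assumptions, not necessarily the paper's exact similarity measure, and the rating matrix is hypothetical.

```python
import numpy as np

def item_similarity(R):
    """Cosine similarity between item columns of a user-item rating
    matrix R (zeros denote missing ratings)."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0            # avoid division by zero
    Rn = R / norms
    return Rn.T @ Rn

def predict_missing(R, sim):
    """Fill missing ratings with a similarity-weighted average of the
    user's existing ratings (item-based collaborative filtering)."""
    pred = R.astype(float).copy()
    for u in range(R.shape[0]):
        rated = np.nonzero(R[u])[0]
        for i in np.where(R[u] == 0)[0]:
            w = sim[i, rated]
            if w.sum() > 0:
                pred[u, i] = (w @ R[u, rated]) / w.sum()
    return pred

# Hypothetical 4-user x 3-item rating matrix (0 = not rated).
R = np.array([[5, 0, 3],
              [4, 2, 0],
              [0, 1, 4],
              [3, 5, 0]])
print(predict_missing(R, item_similarity(R)))
```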