HUANG Jiu-Ming , WU Quan-Yuan , LIU Chun-Yang , ZHANG Xu , JIA Yan , ZHOU Bin
2012, 23(4):735-747. DOI: 10.3724/SP.J.1001.2012.04031 CSTR:
Abstract:Short text message streams are produced by Short Message Service, Instant Messager and BBS, which are widely used. Each stream usually contains. Extracting the conversations in the streams is helpful to various applications including business intelligence, investigation of crime and public opinion analysis. Existing research mainly based on text similarity encounter challenges such as the anomaly, dynamics, and the sparse eigenvector of short text message. This paper proposes an innovative conversation extraction method to cover the challenges. Firstly, the study detects the conversation boundary of short text message streams using temporal feature; secondly, contextually correlative degree is introduced to replace similar degree, and an instance-based machine learning method is proposed to compute the correlative degree. Finally, the study designs Single-Pass based conversation extraction algorithm SPFC (single-pass based on frequency and correlation), which combines the temporal and contextually correlative characteristics. Experimental results on a large real Chinese dataset show that this method SPFC improves the performance by 30% when compared with the best existing variation algorithm in terms of F1 measure.
HE Ping , XU Xiao-Hua , CHEN Ling
2012, 23(4):748-764. DOI: 10.3724/SP.J.1001.2012.04039 CSTR:
Abstract:This paper proposes a nonlinear classification algorithm S3C (supervised spectral space classifier), short for supervised spectral space classifier. S3C integrates the discriminative information into the construction of the low-dimensional supervised spectral space. The input training data is mapped into the supervised spectral space, followed by the optimization of the partitioning hyperplane with maximum margin. The test data is also transformed into the same feature space via an intermediate “bridge” between the original feature space and the target feature space. The classification result of S3C is obtained by applying the optimal partitioning hyperplane to the transformed test data, directly. S3C enables researchers to examine the transformed data in the supervised spectral space, which is beneficial to both algorithm evaluation and parameter selection. Moreover, the study presents a supervised spectral space transformation algorithm (S3T) on the basis of S3C. S3T (supervised spectral space transformation) estimates the class indicating matrix by projecting the data from the supervised spectral space to the class indicating space. S3T can directly deal with multi-class classification problems, and it is more robust on the data sets containing noise. Experimental results on both synthetic and real-world data sets demonstrate the superiority of S3C and S3T algorithms compared with other state-of-the-art classification algorithms.
LIU Quan , WANG Xiao-Yan , FU Qi-Ming , ZHANG Yong-Gang , ZHANG Xiao-Fang
2012, 23(4):765-775. DOI: 10.3724/SP.J.1001.2012.04040 CSTR:
Abstract:A new double elite coevolutionary genetic algorithm (DECGA) is proposed to avoid the premature convergence and low speed of convergence, based on the elite strategy and the concept of coevolutionary. In the DECGA, the two different and high fitness individuals (elite individuals) are selected as the core of the evolutionary operation, and the team members are selected by the different evaluation functions to form two teams through these two elite individuals. The two sub-populations can balance the capability of exploration and exploitation by the different evolutionary strategies. Theoretical analysis proves that the algorithm converges to the global optimization solution. Tests on the functions show that the algorithm can find the global optimal solution for most test functions, and it can also maintain the population diversity to a certain range. Compared with the existing algorithms, DECGA has a higher performance in precision of convergence and search efficiency.
2012, 23(4):776-785. DOI: 10.3724/SP.J.1001.2012.04116 CSTR:
Abstract:Almost all existing knowledge-based word sense disambiguation (WSD) methods used exploit context information contain, in certain window size around ambiguous word, are ineffective because all words in the window size have the same impact on determining the sense of ambiguous word. In order to solve the problem, this paper proposes a novel WSD model based on distance between words, which is built on the basics of traditional graph WSD model and can make full use of distance information. Through model reconstruction, optimization, parameter estimation and evaluation of comparison, the study demonstrates the feature of the new model: The words nearby ambiguous word will have more impact to the final sense of ambiguous word while the words far away from it will have less. Experimental results show that the proposed model can improve Chinese WSD performance, compared with the best evaluation results of SemEval-2007: task #5, this model gets MacroAve (macro-average accuracy) increase 3.1%.
HUANG Xiang , WANG Wei , ZHANG Wen-Bo , WEI Jun , HUANG Tao
2012, 23(4):786-801. DOI: 10.3724/SP.J.1001.2012.04027 CSTR:
Abstract:This paper proposes an automatic performance modeling approach for performance profiling of Web applications. In addition, the study proposes an automatic approach to build performance model. Both the user behaviors and their corresponding internal service relations are modeled, and the CPU time consumed by each service is also obtained through Kalman filter, which can “absorb” some level of noise in real-world data. Experimental results show that this approach can adapt to the change in both the inner and outer environments of Web applications and provide valuable information for capacity planning and bottleneck detection.
JIE Jun-Jing , SHI Ting-Xun , JIAO Wen-Pin , MENG Fan-Jing
2012, 23(4):802-815. DOI: 10.3724/SP.J.1001.2012.04029 CSTR:
Abstract:Self-Adaptive software in open and distributed environments (especially the Internet) has been widely researched in academia and industry. However, software entities scattered on the Internet are independently developed and deployed by different organizations, and they autonomously take actions on behalf of their owners. They can no longer be considered passive and manageable. In the construction of self-adaptive software systems in open and distributed environments, constituent elements should be modeled and designed as autonomous computing entities, and the adaptive logic of systems should be encapsulated into constituent elements. Existing researches on autonomous computing entities are still insufficient in self-adaptive policies’ in online customization and dynamic evolution. Therefore, this paper proposes an autonomous component model, which supports self-adaptive policies’ in online customizing, by which components can gain new policies or behavior modes at runtime. Meanwhile, this paper implements a self-adaptive mechanism based on dynamic quality evaluation, by which components can evaluate the policies and select the best policy to improve their qualities of service at runtime. Finally, the paper provides some implementation details of the proposal and an experiment, which demonstrates the process of self-adaptation based on dynamic quality evaluation and the process of online policy customization.
HE Xiao , MA Zhi-Yi , FENG Chao , SHAO Wei-Zhong
2012, 23(4):816-830. DOI: 10.3724/SP.J.1001.2012.04041 CSTR:
Abstract:Model transformation is the key technique of model driven development. To handle difficult problems, it is necessary to combine smaller transformations into a complex one. Due to the heterogeneity among different transformation techniques, it is difficult to combine them together. The paper analyzes four essential conditions for composite transformations. Next, the paper proposes a composite transformation model, which consists of common type representation, common model representation, common transformation description, and composite transformation definition, in order to realize composite transformations. The paper also introduces the design and implementation of a transformation composition framework. A case study is also presented to illustrate the feasibility of this approach.
CHEN Xiang-Ping , HUANG Gang , SONG Hui , SUN Yan-Chun , MEI Hong
2012, 23(4):831-845. DOI: 10.3724/SP.J.1001.2012.04043 CSTR:
Abstract:In software adaptation, multiple analysis methods are used for analyzing system, planning adaptation strategy, and decision-making. Because the analysis results are interpreted and understood with SA (software architecture) model as context, synthesizing analysis results and the SA model become important. However, current researches on integration of analysis methods do not pay enough attention to the integrating analysis result with the SA model. Considering integration challenges existing in meta-model, model, and view levels, this study proposes an MOF (meta-object facility)-based framework for integration of analysis results. The framework provides the following: An ADL (architectural description language) extension mechanism with support for upward compatibility; automatic generation of model transformation for model synthesis, and code generation to extend modeling tool for view of synthesized model. In the case study, this study uses the framework to integrate three analysis methods for reliability evaluations and fault tolerant reconfiguration, and applies these analysis methods to analyze SA model of J2EE system Ecperf.
DAI Fei , LI Tong , XIE Zhongwen , YU Qian , LU Ping , YU Yong , ZHAO Na
Abstract:As a number of software evolution process models increased, as modeled by EPMM (software evolution process meta model), verifying the correctness of these models becomes the important. This paper extends EPMM with ACP (algebra of communicating processes) and proposes EPMM-A (software evolution meta modelalgebra). In order to discuss behavior verification in the unified framework of EPMM-A, a process term is used to define an algebraic semantics of a software evolution process model. Based on equational reasoning of EPMM-A, behavior verification of a software evolution process model emphasizes algebraic reasoning as opposed to modelbased reasoning. This method combines the advantages of both Petri nets and ACP, which can effectively support software evolution process formal verification.
CAI Zhi-Ping , LIU Qiang , Lü Pin , XIAO Nong , WANG Zhi-Ying
2012, 23(4):864-877. DOI: 10.3724/SP.J.1001.2012.04063 CSTR:
Abstract:Network virtualization can serve as the important technology for constructing the next-generation Internet architecture. It has been proposed as a powerful vehicle for running multiple architectures or experiments simultaneously on a shared infrastructure. It allows multiple service providers to offer customized end-to-end services over a common physical infrastructure. Virtual network embedding is a critical step for network virtualization that deals with the efficient mapping of virtual nodes and virtual links onto the substrate network resources. The virtual network-mapping problem is extremely challenging for four main practical reasons: Node and link constraints, admission control, online request, and diverse topologies. Due to different application scenarios, optimization objectives, mapping approaches, and constraints, there are many virtual network-mapping models. Unfortunately, the optimization problems of these models are generally NP-hard. The model of virtual network mapping is presented. The virtual network mapping methods and algorithms are concluded. The technological approaches to solve the optimization problems of the virtual network mapping are summarized. Finally, some open issues are presented to be further studied in the virtual network-mapping field.
ZHAO Zhong-Hua , HUANGFU Wei , SUN Li-Min , ZHOU Xin-Yun
2012, 23(4):878-893. DOI: 10.3724/SP.J.1001.2012.04013 CSTR:
Abstract:Accurate runtime monitoring and testing is critical in the successful field deployment of wireless sensor networks (WSNs). Conventional techniques that demand extra reports from the sensor nodes could easily alter the node and network behaviors. The delay and asynchrony of the extra reports also implies that fine-grained monitoring can hardly be achieved. Sniff-Inside approach is presented in this case, and a non-intrusive backplane-based testing platform for wireless sensor networks is developed. With auxiliary test boards, the testing platform directly captures chip-level signals from wireless sensor nodes and the captured data are transferred through independent network paths to a monitoring server. The remote access client gains access to the test server by subscribing to test data on the server and finishes the analysis and processing of test data. It has been demonstrated that the operations of the monitored WSN can be fully identified through parsing the chip-level data across different nodes. The experimental results show that, through chip-level signal sniffing, the testing platform effectively gathers accurate runtime data with no side effects on the spontaneous behavior of sensor networks. The case study further shows the testing platform facilitates the tests of high-level functionalities, such as signal analysis, protocol verification and accurate evaluation of network performance.
MA Wen-Ming , MENG Xiang-Wu , ZHANG Yu-Jie
2012, 23(4):894-911. DOI: 10.3724/SP.J.1001.2012.04086 CSTR:
Abstract:The improvements of random walk search mainly depend on allocating weight for neighbor peers, which is always incurred on a high overhead and are not very helpful for rare items. This paper proposes a bidirectional random walk search mechanism (short for BRWS) for unstructured P2P network, according to the analysis of basic properties about random walk as well as the special property that random walk tends toward high degree nodes. The mechanism is proved theoretically in this paper, and can improve search success rate, including searching for rare items. It also has a high tolerance for churn. In the static and dynamic environment, comparisons were made among Random Walk, APS (adaptive probabilistic search), PQR (path-traceable query routing), P2PBSN (peer-to- peer based on social network) and BRWS based on three topologies: Random graph, scale free, small world. The experimental results show that BRWS can actually improve the search success rate with lower overhead even when searching rare resources. The method proposed in this paper can apply in P2P file sharing networks.
BAO Yi-Bao , YIN Li-Hua , FANG Bin-Xing , GUO Li
2012, 23(4):912-927. DOI: 10.3724/SP.J.1001.2012.04023 CSTR:
Abstract:This study proposes a logic-based security policy framework. First, the study proposes the security policy syntax and semantic. Next, four algoritms are proposed to transfer first-order logic based security policies into extended logic programs to evaluate queries with simple goals, to transfer complex queries into simple ones, and to verify security policies against complex security properties. Under well-founded semantics, all the algorithms are sound and completed, and their computational complexities are polynomial. In this framework, security policy declaration, evaluation and verification are executed under the same semantics, which is significant for security policy management. Furthmore, the framework can manage the security policies with advanced features, such as non-monotony and recursion, which is not supported in many existent security policy management frameworks.
SUN Yan-Bin , GU Li-Ze , QING Si-Han , ZHENG Shi-Hui , YANG Yi-Xian
2012, 23(4):928-940. DOI: 10.3724/SP.J.1001.2012.04024 CSTR:
Abstract:Utilizing the Cha-Cheon’s identity-based signature scheme, a provably secure identity-based verifiably encrypted signature (VES) scheme is proposed. Utilizing the proposed scheme and identity-based proxy verifiably encrypted signature (PVES) scheme, a novel multiplex contract signing protocol is also proposed. The original signer or proxy signer uses VES or PVES to realize the interaction and certification of the commitment message in the information exchange process. The proposed scheme does not need the zero-knowledge proof and excessive computation. An optimized trusted third party who participates in the protocol extracts the formal signature from the VES or PVES only when problem occurs. The performance analysis results show that the scheme satisfies non-repudiation, timeliness and fairness.
Lü Shao-He , WANG Xiao-Dong , ZHOU Xing-Ming
2012, 23(4):941-951. DOI: 10.3724/SP.J.1001.2012.04035 CSTR:
Abstract:Successive interference cancellation (SIC) is an effective way of multipacket reception (MPR) to combat interference at the physical layer. The fact that the links decoded sequentially by SIC are correlated at the receiver poses key technical challenges. The study characterizes the link dependence and proposes simultaneity graph (SG) to capture the effect of SIC. Then, the interference number is defined to measure the interference of a link and facilitate the design of scheduling scheme. The study shows that link scheduling over SG is NP-hard, and the maximum interference number bounds the performance of maximal greedy schemes. An independent set based greedy scheme is explored to efficiently construct maximal feasible schedule. Moreover, with careful selection of link ordering, the study presents a scheduling scheme that achieves a better bound. Simulations evaluate the performance. The throughput gain is up to 110% over IEEE 802.11, while the complexity of SG is comparable with that of the conflict graph.
TAN Gang-Min , ZENG Guang , HAN Wen-Bao , LIU Xiang-Hui
2012, 23(4):952-961. DOI: 10.3724/SP.J.1001.2012.04006 CSTR:
Abstract:The coordinate sequences of a primitive σ-LFSR sequence over GF(2k) are m-sequences with the same minimal polynomial over GF(2), thus a primitive σ-LFSR sequence over GF(2k) can be constructed by m-sequences over GF(2) if its interval vector is known. This paper studies the calculation of interval vectors of a class of primitive σ-LFSR sequences—Z primitive σ-LFSR sequences and presents an improved method to calculate the interval vectors of Z primitive σ-LFSR sequences in order n over GF(2k), which uses the interval vectors of Z primitive σ-LFSR sequences of order 1 to calculate that of Z primitive σ-LFSR sequences in order n over GF(2k). In addition, it is more effective than other existing methods. More importantly, the new method can also be applied to the calculation of interval vectors of m-sequences over GF(2k). The enumeration formula of Z primitive σ-LFSR sequences of order n over GF(2k) is also presented, which shows that the number of Z primitive σ-LFSR sequences of order n is much larger than the number of m-sequences of order n over GF(2k).
WANG Yi-Jie , SUN Wei-Dong , ZHOU Song , PEI Xiao-Qiang , LI Xiao-Yong
2012, 23(4):962-986. DOI: 10.3724/SP.J.1001.2012.04175 CSTR:
Abstract:Considered as the next generation computing model, cloud computing plays an important role in scientific and commercial computing area and draws great attention from both academia and industry fields. Under cloud computing environment, data center consist of a large amount of computers, usually up to millions, and stores petabyte even exabyte of data, which may easily lead to the failure of the computers or data. The large amount of computers composition not only leads to great challenges to the scalability of the data center and its storage system, but also results in high hardware infrastructure cost and power cost. Therefore, fault-tolerance, scalability, and power consumption of the distributed storage for a data center becomes key part in the technology of cloud computing, in order to ensure the data availability and reliability. In this paper, a survey is made on the state of art of the key technologies in cloud computing in the following aspects: Design of data center network, organization and arrangement of data, strategies to improve fault-tolerance, methods to save storage space, and energy. Firstly, many kinds of classical topologies of data center network are introduced and compared. Secondly, kinds of current fault-tolerant storage techniques are discussed, and data replication and erasure code strategies are especially compared. Thirdly, the main current energy saving technology is addressed and analyzed. Finally, challenges in distributed storage are reviewed as well as future research trends are predicted.
QIU Jian-Ping , ZHANG Guang-Yan , SHU Ji-Wu
2012, 23(4):987-995. DOI: 10.3724/SP.J.1001.2012.04046 CSTR:
Abstract:Realistic system states and representative workloads are required to evaluate a hierarchical storage management (HSM) system. The existing method to evaluate a HSM system reconstructs system states by replaying a trace within period of time. Ignoring files not used recently, the result of this method is not convincing enough. This paper proposes DMStone, a tool for evaluating HSM systems. DMStone uses file-system snapshots to generate a system state. Furthermore, it extracts workload characteristics by comparing two contiguous snapshots and generates representative workloads. DMStone can provide a realistic file-system state, including files used recently and those not used for a long time. Moreover, it makes subsequent workloads conform to realistic system states.
ZHANG Dong-Song , WU Tong , CHEN Fang-Yuan , JIN Shi-Yao
2012, 23(4):996-1009. DOI: 10.3724/SP.J.1001.2012.04054 CSTR:
Abstract:As the energy consumption of multi-core systems becomes increasingly prominent, meeting the time constraints while reducing as much as possible system energy consumption is still an urgent problem in real-time energy-aware scheduling in multi-core system. Most existing works have assumptions on priori information of real-time tasks, but in real applications the tasks’ property be received only when the tasks arrive. Therefore, based on the general task model with no priori to tasks’ properties, this paper proposes a global EDF-based on-line energy-aware scheduling algorithm for hard real-time tasks in multi-core system. The proposed algorithm can reduce the execution speed of task in multi-core system and reach a reasonable compromise between real-time constraints and energy savings, as it introduces a speed scale factor for utilizing the slack time and combines dynamic power management with dynamic voltage/frequency scaling techniques. The algorithm implements dynamic voltage/frequency scaling only in each context switch time and task completion time, with the smaller computational complexity, and easier to be included in real-time operating system. Experimental results show that the algorithm can be well applied to different kinds of dynamic voltage/frequency scaling on chip, and compared with Global EDF algorithm, it gain more energy savings in all cases, which can improve energy savings about 15% to 20% at most and about 5% to 10% at least.
ZHU Ping , YANG Fu-Min , TU Gang , ZHANG Jie , ZHOU Zheng-Yong
2012, 23(4):1010-1021. DOI: 10.3724/SP.J.1001.2012.04004 CSTR:
Abstract:In distributed hard-real-time systems, when a hardware failure occurs, the task instance in current period is usually more urgent than the subsequent ones. According to this, a novel strategy of delay in non-urgent period (referred to as DNUP) is proposed. DNUP strategy can postpone the execution of non-urgent instance as late as possible and reserve the slack time for the instance with low priority. Thus it has a better chance to complete its execution in an urgent period. Extensive simulations reveal that DNUP can improve the schedulability of periodic tasks and achieve a remarkable saving on the number of processors required with respect to several well-known fault-tolerant scheduling algorithms.
WANG Zhi-Yuan , YANG Xue-Jun , ZHOU Yun
2012, 23(4):1022-1035. DOI: 10.3724/SP.J.1001.2012.04011 CSTR:
Abstract:The scale-up of system brings improvement in performance as well as reliability degradation, so there is a need to apply some fault tolerance mechanism to tolerate hardware failure or recover data. Currently, the popular fault tolerance mechanisms, such as Checkpoint/Restart and N-modular redundancy, all need additional overhead, which limits the scalability of parallel computing to some extent. Therefore, it is very important to develop scalable fault tolerance mechanisms for increasingly high performance supercomputing. This paper takes triple modular redundancy (TMR) as an example, describes the implementation of TMR on large-scale MPI parallel computing, and argues that traditional TMR fault-tolerant mechanism limits the scalability of parallel computing. To solve these practical problems, the paper proposes the scalable triple modular redundancy (STMR), and verifies the validity and scalability of it. STMR can not only handle the fail-stop failures that are traditionally handled by Checkpoint/Restart, but can also deal with most of data errors not perceived directly by the hardware. Finally, the study conducts the simulation using the system parameters of BlueGene/L, which shows the scalability change of parallel computing with the TMR and the STMR respectively when the system size increases. The results further validate STMR position as scalable fault-tolerant mechanism.