Abstract: Malware similarity comparison is a fundamental task in malware analysis and detection. At present, most similarity comparison methods represent malware as control flow graphs (CFGs) or behavior sequences. Malware writers use obfuscation, packers, and other techniques to defeat these traditional methods. This paper proposes a new approach to identifying similarities between malware samples that relies on control dependence and data dependence. First, dynamic taint analysis is performed to obtain control dependence and data dependence relations. Next, a control dependence graph and a data dependence graph are constructed. Similarity information is then obtained by comparing these two types of graph. To take full advantage of the inherent behavior of malicious code and to increase comparison accuracy and resistance to interference, the dependence graphs are pre-processed to reduce loops and remove junk, which also lowers the complexity of the similarity comparison algorithm and improves its performance. A prototype system has been applied to malware collections gathered in the wild. The results show that the method has a clear advantage in both accuracy and comparison capability.
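To make the pipeline described above concrete, the following is a minimal sketch only, not the paper's algorithm. It assumes each dependence graph is given as a list of (source, dependent) edges between named operations, that junk nodes are already known, and that similarity is measured with a Jaccard index over edge sets. The names `preprocess`, `jaccard`, and `similarity`, the equal weighting of the two graph types, and the sample edges are all hypothetical.

```python
from typing import AbstractSet, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (source operation, dependent operation)


def preprocess(edges: Iterable[Edge], junk: AbstractSet[str] = frozenset()) -> Set[Edge]:
    """Drop edges that touch junk nodes.

    Building a set also collapses edges repeated by loop iterations,
    a crude stand-in for the loop reduction step.
    """
    return {(u, v) for u, v in edges if u not in junk and v not in junk}


def jaccard(a: AbstractSet[Edge], b: AbstractSet[Edge]) -> float:
    """Jaccard similarity of two edge sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(cdg_a: Iterable[Edge], ddg_a: Iterable[Edge],
               cdg_b: Iterable[Edge], ddg_b: Iterable[Edge],
               junk: AbstractSet[str] = frozenset()) -> float:
    """Average the control and data dependence similarities.

    Equal weights are an assumption made for this sketch.
    """
    control = jaccard(preprocess(cdg_a, junk), preprocess(cdg_b, junk))
    data = jaccard(preprocess(ddg_a, junk), preprocess(ddg_b, junk))
    return (control + data) / 2


if __name__ == "__main__":
    # Hypothetical edges: sample B adds junk nodes and repeats no loop edges.
    cdg_a = [("cmp", "open"), ("cmp", "write"), ("cmp", "write")]
    cdg_b = [("cmp", "open"), ("cmp", "write"), ("nop", "nop2")]
    ddg_a = [("read", "xor"), ("xor", "send")]
    ddg_b = [("read", "xor"), ("xor", "write_file")]
    print(similarity(cdg_a, ddg_a, cdg_b, ddg_b, junk={"nop", "nop2"}))
```

Comparing edge sets is a deliberate simplification. A real system would more likely need approximate graph matching, since obfuscation can rename or reorder nodes while leaving the dependence structure intact.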