Abstract: Dynamic behavior analysis is a common approach to malware detection. Such methods represent a program’s system calls or resource dependencies as graphs, apply graph mining algorithms to find malicious feature subgraphs shared by known malware samples, and use these features to detect unknown programs. However, they typically rely on graph matching, which is inherently slow to compute, and they ignore the relationships between subgraphs, even though taking these relationships into account can improve detection accuracy. To address both problems, a malware detection method based on subgraph similarity, called DMBSS, is proposed. DMBSS uses a data flow graph to represent the system behaviors or events of a running malicious program and extracts malicious behavior feature subgraphs from that graph. An “inverse topology identification” algorithm then represents each feature subgraph as a string that implicitly encodes the subgraph’s structure, so that string comparison replaces graph matching. Next, a neural network computes the similarity between subgraphs and represents each subgraph structure as a high-dimensional vector, so that similar subgraphs lie close together in the vector space. Finally, the subgraph vectors are used to construct a similarity function for malicious programs, and an SVM classifier based on this function detects malicious programs. Experimental results show that, compared with other methods, DMBSS detects malicious programs faster and with higher accuracy.
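To make the idea of replacing graph matching with string comparison concrete, the following is a minimal sketch and not the paper's algorithm: the abstract does not define “inverse topology identification”, so the encoding below is an assumption. It visits the nodes of a labeled directed acyclic subgraph in reverse topological order (sinks first) and builds each node's identifier from its label and the sorted identifiers of its children. The function name `subgraph_to_string`, the node labels, and the output syntax are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def subgraph_to_string(labels, edges):
    """Encode a labeled DAG as a string (illustrative assumption only).

    labels: dict mapping node id -> label, e.g. {1: "proc"}
    edges:  iterable of (source, destination) node-id pairs
    """
    children = {node: set() for node in labels}
    has_parent = set()
    for src, dst in edges:
        children[src].add(dst)
        has_parent.add(dst)

    # TopologicalSorter treats the mapped values as prerequisites, so passing
    # each node's children yields children before parents, i.e. a reverse
    # topological order of the original graph.
    ident = {}
    for node in TopologicalSorter(children).static_order():
        child_ids = sorted(ident[c] for c in children[node])
        ident[node] = labels[node] + "(" + ",".join(child_ids) + ")"

    # The subgraph string is the sorted list of root identifiers.
    roots = sorted(ident[n] for n in labels if n not in has_parent)
    return "|".join(roots)


# Two subgraphs with different node ids but the same labeled structure.
a = subgraph_to_string({1: "proc", 2: "file", 3: "reg"}, [(1, 2), (1, 3)])
b = subgraph_to_string({"x": "proc", "y": "reg", "z": "file"},
                       [("x", "y"), ("x", "z")])
print(a)       # proc(file(),reg())
print(a == b)  # True
```

Under this assumed encoding, two feature subgraphs with the same labeled structure produce the same string regardless of node numbering, so comparing them reduces to a string equality test. Note that the sketch expands shared children into each parent's identifier, so it does not distinguish a shared child from duplicated children; it illustrates the principle rather than a complete isomorphism test.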