Abstract: As both Android frameworks and malware continue to evolve, the performance of existing malware classifiers degrades significantly over time. This study proposes Droid Slow Aging (DroidSA), an Android malware detection method based on API clustering and call graph optimization. First, APIs are clustered before malware detection to produce cluster centers that reflect API functionality. To make the clustering more accurate, this study obtains embeddings that capture the semantic similarity of APIs. It designs API sentences that summarize vital features such as API names and permissions, and uses NLP tools to mine the semantic information in these sentences. Next, call graphs are extracted from apps and optimized by removing unknown methods while preserving the connectivity among API nodes. This optimization lets the detector extract more robust contextual information about APIs, which reflects the apps' behavioral patterns. DroidSA then extracts function call pairs from the optimized call graphs and abstracts each API in these pairs into the cluster center obtained from API clustering, so that the features better adapt to changes in Android frameworks and malware. Finally, one-hot encoding is used to generate feature vectors, and the best-performing classifier among random forests, support vector machines, and k-nearest neighbors is selected for malware detection. Experimental results show that DroidSA achieves an average F1-measure of 96.7% for malware detection. In an experimental setting that eliminates temporal bias, DroidSA trained on apps from 2012 to 2013 achieves an average F1-measure of 82.6% when detecting malware developed from 2014 to 2018. Compared with the state-of-the-art detection methods MaMaDroid and MalScan, DroidSA consistently maintains high detection performance with minimal degradation from temporal changes and effectively detects evolved malware.
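The feature pipeline summarized above (optimize the call graph, abstract call pairs into clusters, encode them as a vector) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions. The call graph is a plain adjacency dict. A hypothetical `known` set marks the nodes that are kept, and every other node is treated as an "unknown method". Connectivity is preserved by linking a kept node to every kept node it reaches through removed nodes only. The `api_to_cluster` mapping, the labels `C3`/`C7`, and the `self-defined` placeholder for non-API methods are all hypothetical. The NLP-based API clustering that would produce the mapping is not shown.

```python
from collections import deque
from itertools import product


def optimize_call_graph(graph, known):
    """Drop nodes outside `known`; add u -> v when v is reachable from u
    through removed (unknown) nodes only. Assumed connectivity rule."""
    optimized = {}
    for src in known:
        targets, seen = set(), set()
        queue = deque(graph.get(src, ()))
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            if node in known:
                targets.add(node)  # kept node: record edge, stop expanding
            else:
                queue.extend(graph.get(node, ()))  # bypass unknown method
        optimized[src] = targets
    return optimized


def abstract_pairs(optimized, api_to_cluster, default="self-defined"):
    """Map each (caller, callee) edge to a pair of cluster labels;
    non-API nodes fall back to a hypothetical placeholder label."""
    pairs = set()
    for caller, callees in optimized.items():
        for callee in callees:
            pairs.add((api_to_cluster.get(caller, default),
                       api_to_cluster.get(callee, default)))
    return pairs


def encode(pairs, vocabulary):
    """Binary presence vector: 1 if a cluster pair occurs in the app."""
    return [1 if pair in pairs else 0 for pair in vocabulary]


# Toy app: "a" and "obf1" play the role of unknown (e.g. obfuscated) methods.
graph = {
    "main": ["a", "SmsManager.sendTextMessage"],
    "a": ["obf1"],
    "obf1": ["URL.openConnection"],
}
known = {"main", "SmsManager.sendTextMessage", "URL.openConnection"}
api_to_cluster = {"SmsManager.sendTextMessage": "C3", "URL.openConnection": "C7"}

optimized = optimize_call_graph(graph, known)
pairs = abstract_pairs(optimized, api_to_cluster)
labels = ["self-defined", "C3", "C7"]
vocabulary = list(product(labels, repeat=2))  # all possible cluster pairs
vector = encode(pairs, vocabulary)
print(vector)  # [0, 1, 1, 0, 0, 0, 0, 0, 0]
```

Because features are built over cluster labels rather than raw API names, a newer API that falls into an existing cluster yields the same feature. This is the intuition behind the claimed resistance to framework and malware evolution. In the described method, vectors like `vector` would then be fed to random forest, SVM, and k-NN classifiers, and the best performer would be kept.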