Automated Detection and Classification of Third-Party Libraries in Large Scale Android Apps
Author: Wang HY, Guo Y, Ma ZA, Chen XQ

Fund Project:

National Natural Science Foundation of China (61421061, 61421091); National High Technology Research and Development Program of China (863) (2015AA017202)

    Abstract:

    Third-party libraries are widely used in mobile applications such as Android apps. Much research on app analysis and access control must first detect or classify third-party libraries in order to produce accurate results. Most previous studies use a whitelist to identify third-party libraries and categorize them manually. However, it is impossible to build a complete whitelist of third-party libraries and classify them because: (1) there are too many of them; and (2) common techniques such as library obfuscation and library masquerading cannot be handled with a whitelist. This paper proposes an automated approach to detect and classify frequently used third-party libraries in Android apps. A multi-level clustering method identifies the third-party libraries, and a machine learning technique classifies them. Experiments on more than 130 000 apps show that 4 916 third-party libraries can be detected without prior knowledge. The classifier achieves 84.28% in 10-fold cross-validation on sampled libraries. With the trained classifier, the proposed approach classifies more than 75% of the 4 916 libraries into six categories with an accuracy of 75%.
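The abstract does not specify which features or clustering levels the authors use, so the following Python sketch is only a toy, single-level illustration of the general idea that code shared across many apps can be found without a whitelist. The input shape, the use of API call sets as a signature, and the `min_apps` threshold are all assumptions made for this example, not the paper's method.

```python
from collections import defaultdict


def detect_library_candidates(apps, min_apps=2):
    """Group packages from many apps by a name-independent signature.

    `apps` maps an app id to {package_name: iterable of API call names}.
    The input shape and the `min_apps` threshold are hypothetical.
    """
    clusters = defaultdict(set)  # signature -> set of (app_id, package_name)
    for app_id, packages in apps.items():
        for package_name, api_calls in packages.items():
            # The signature ignores the package name, so a renamed
            # (obfuscated) copy of a library lands in the same cluster.
            signature = frozenset(api_calls)
            clusters[signature].add((app_id, package_name))
    # Keep only signatures that occur in at least `min_apps` distinct apps.
    return {
        signature: members
        for signature, members in clusters.items()
        if len({app_id for app_id, _ in members}) >= min_apps
    }


# Two made-up apps: the same library appears under its real name in app1
# and under an obfuscated name in app2.
apps = {
    "app1": {
        "com/ads/sdk": ["getDeviceId", "openConnection"],
        "com/app1/ui": ["setContentView"],
    },
    "app2": {
        "a/b": ["openConnection", "getDeviceId"],
        "com/app2/main": ["startActivity"],
    },
}
candidates = detect_library_candidates(apps)
```

Because the grouping key never looks at the package name, the obfuscated package `a/b` is clustered with `com/ads/sdk`, which a name-based whitelist would miss. A realistic system would need richer features and several clustering levels, as the abstract indicates.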

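Citation

Wang HY, Guo Y, Ma ZA, Chen XQ. Automated detection and classification of third-party libraries in large scale Android apps. Journal of Software (软件学报), 2017,28(6):1373-1388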

History
  • Received: May 08, 2016
  • Revised: September 09, 2016
  • Online: February 21, 2017