RTDMiner: Detecting Reference Count Update Bugs Based on Data Mining

doi:10.13328/j.cnki.jos.006676

微信服务号

微信订阅号

2025-5-11- 18

Home > Archive>Volume 34, Issue 10, 2023 >4724-4742. DOI:10.13328/j.cnki.jos.006676

PDF HTML XML Export Cite reminder

RTDMiner: Detecting Reference Count Update Bugs Based on Data Mining
DOI:
                        10.13328/j.cnki.jos.006676
                    
Author:
                        BIAN PanBIAN Pan
School of Information, Renmin University of China, Beijing 100872, China;Huawei Technologies Co. Ltd., Beijing 100085, China
Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
LIANG BinLIANG Bin
School of Information, Renmin University of China, Beijing 100872, China
Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
HUANG Jian-JunHUANG Jian-Jun
School of Information, Renmin University of China, Beijing 100872, China
Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
YOU WeiYOU Wei
School of Information, Renmin University of China, Beijing 100872, China
Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
SHI Wen-ChangSHI Wen-Chang
School of Information, Renmin University of China, Beijing 100872, China
Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
ZHANG JianZHANG Jian
State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China
Find this author on CNKI
Find this author on BaiDu
Search for this author on this site

                    
Affiliation:
Clc Number:TP311
Fund Project:

Article

Figures

Metrics

Reference [38]

Related [20]

Cited by

Materials

Comments

Abstract:

Reference counts are widely employed in large-scale low-level systems including Linux kernel to manage shared resources, and should be consistent with the number of objects referring to resources. Otherwise, bugs of improper update of reference counts may be caused, and resources can never be released or will be released earlier. To detect improper updates of reference counts, available static detection methods have to know the functions which increase reference counts or decrease the counts. However, manually collecting prior knowledge of reference counts is too time-consuming and may be incomplete. Though mining-based methods can reduce the dependency on prior knowledge, it is difficult to effectively detect path-sensitive bugs containing improper updates of reference counts. To this end, this study proposes a method RTDMiner that deeply integrates data mining into static analysis to detect improper updates of reference counts. First, according to the general principles of reference counts, the data mining technique is leveraged to identify functions that raise or reduce reference counts. Then, a path-sensitive static analysis method is employed to detect defective paths that increase reference counts instead of decreasing the counts. To reduce false positives, the study adopts the data mining technique to identify exceptional patterns during detection. The experiment results on the Linux kernel demonstrate that the proposed method can automatically identify functions increasing or decreasing reference counts with the precision of nearly 90%. Moreover, 24 out of the top 50 suspicious bugs detected by RTDMiner have been confirmed to be real bugs by kernel maintainers.

Key words:reference count;bug detection;data mining;static analysis

Reference

[1] Wilson PR. Uniprocessor garbage collection techniques. In: Proc. of the 1992 Int’l Workshop on Memory Management. St. Malo: Springer, 1992. 1-42.

[2] Levanoni Y, Petrank E. An on-the-fly reference counting garbage collector for Java. In: Proc. of the 16th ACM SIGPLAN Conf. on Object-oriented Programming, Systems, Languages, and Applications. Tampa Bay: ACM, 2001. 367-380.

[3] Bevan D. Distributed garbage collection using reference counting. In: Proc. of the 1987 Int’l Conf. on Parallel Architectures and Languages Europe. Eindhoven: Springer, 1987. 176-187.

[4] McKenney PE, Slingwine JD. Read-copy update: Using execution history to solve concurrency problems. In: Proc. of the 1998 Parallel and Distributed Computing and Systems. 1998. 509-518.

[5] McKenney PE. Overview of Linux-kernel reference counting. Technical Report, N2167=07-0027, IBM Beaverton.

[6] Mao JJ, Chen Y, Xiao QX, Shi YC. RID: Finding reference count bugs with inconsistent path pair checking. In: Proc. of the 21st Int’l Conf. on Architectural Support for Programming Languages and Operating Systems. Atlanta: ACM, 2016. 531-544.

[7] Maebe J, Ronsse M, Bosschere KD. Precise detection of memory leaks. In: Proc. of the 2nd Int’l Workshop on Dynamic Analysis. 2004. 25-31.

[8] Manès VJM, Han HS, Han C, Cha SK, Egele M, Schwartz EJ, Woo M. The art, science, and engineering of fuzzing: A survey. IEEE Trans. on Software Engineering, 2021, 47(11): 2312-2331. [doi: 10.1109/tse.2019.2946563

[9] Engler D, Chen DY, Hallem S, Chou A, Chelf B. Bugs as deviant behavior: A general approach to inferring errors in systems code. ACM SIGOPS Operating Systems Review, 2001, 35(5): 57-72. [doi: 10.1145/502059.502041

[10] Li ZM, Zhou YY. PR-Miner: Automatically extracting implicit programming rules and detecting violations in large software code. ACM SIGSOFT Software Engineering Notes, 2005, 30(5): 306-315. [doi: 10.1145/1095430.1081755

[11] Liang B, Bian P, Zhang Y, Shi WC, You W, Cai Y. AntMiner: Mining more bugs by reducing noise interference. In: Proc. of the 38th IEEE/ACM Int’l Conf. on Software Engineering. Austin: IEEE, 2016. 333-344.

[12] Yun I, Min C, Si XJ, Jang Y, Kim T, Naik M. APISAN: Sanitizing API usages through semantic cross-checking. In: Proc. of the 25th USENIX Conf. on Security Symp. Austin: USENIX Association, 2016. 363-378.

[13] Lal A, Ramalingam G. Reference count analysis with shallow aliasing. Information Processing Letters, 2010, 111(2): 57-63. [doi: 10.1016/j.ipl.2010.08.003

[14] Malcom D. CPyChecker. 2021. https://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html

[15] Li SL, Tan G. Finding reference-counting errors in Python/C programs with affine analysis. In: Proc. of the 28th European Conf. on Object-oriented Programming. Uppsala: Springer, 2014. 80-104.

[16] Kremenek T, Twohey P, Back G, Ng A, Engler D. From uncertainty to belief: Inferring the specification within. In: Proc. of the 7th Symp. on Operating Systems Design and Implementation. Seattle: USENIX Association, 2006. 161-176.

[17] Ganapathy V, King D, Jaeger T, Jha S. Mining security-sensitive operations in legacy code using concept analysis. In: Proc. of the 29th Int’l Conf. on Software Engineering (ICSE 2007). Minneapolis: IEEE, 2007. 458-467.

[18] Tan L, Zhang XL, Ma X, Xiong WW, Zhou YY. AutoISES: Automatically inferring security specification and detecting violations. In: Proc. of the 17th USENIX Security Symp. San Jose: USENIX Association, 2008. 379-394.

[19] Rasthofer S, Arzt S, Bodden E. A machine-learning approach for classifying and categorizing Android sources and sinks. NDSS. 2014, 14: 1125. http://www.bodden.de/pubs/rab14classifying.pdf

[20] Arzt S, Rasthofer S, Fritz C, Bodden E, Bartel A, Klein J, Le Traon Y, Octeau D, McDaniel P. Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android Apps. In: Proc. of the 2014 ACM SIGPLAN Conf. on Programming Language Design and Implementation. Edinburgh: ACM, 2014. 259-269.

[21] Bian P, Liang B, Huang JJ, Shi WC, Wang XD, Zhang J. SinkFinder: Harvesting hundreds of unknown interesting function pairs with just one seed. In: Proc. of the 28th ACM Joint Meeting on European Software Engineering Conf. and Symp. on the Foundations of Software Engineering. Virtual: ACM, 2020. 1101-1113.

[22] Bian P, Liang B, Shi WC, Huang JJ, Cai Y. NAR-miner: Discovering negative association rules from code for bug detection. In: Proc. of the 26th ACM Joint Meeting on European Software Engineering Conf. and Symp. on the Foundations of Software Engineering. Lake Buena: ACM, 2018. 411-422.

[23] Yamaguchi F, Wressnegger C, Gascon H, Rieck K. Chucky: Exposing missing checks in source code for vulnerability discovery. In: Proc. of the 2013 ACM SIGSAC Conf. on Computer & Communications Security. Berlin: ACM, 2013. 499-510.

[24] Lu S, Park S, Hu CF, Ma X, Jiang WH, Li ZM, Popa RA, Zhou YY. MUVI: Automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs. ACM SIGOPS Operating Systems Review, 2007, 41(6): 103-116. [doi: 10.1145/1323293.1294272

[25] Yamaguchi F, Maier A, Gascon H, Rieck K. Automatic inference of search patterns for taint-style vulnerabilities. In: Proc. of the 2015 IEEE Symp. on Security and Privacy. San Jose: IEEE, 2015. 797-812.

[26] Kang Y, Ray B, Jana S. Apex: Automated inference of error specifications for C APIs. In: Proc. of the 31st IEEE/ACM Int’l Conf. on Automated Software Engineering. Singapore: ACM, 2016. 472-482.

[27] Acharya M, Xie T, Pei J, Xu J. Mining API patterns as partial orders from source code: From usage scenarios to specifications. In: Proc. of the 6th Joint Meeting of the European Software Engineering Conf. and the ACM SIGSOFT Symp. on the Foundations of Software Engineering. Dubrovnik: ACM, 2007. 25-34.

[28] Pradel M, Jaspan C, Aldrich J, Gross TR. Statically checking API protocol conformance with mined multi-object specifications. In: Proc. of the 34th Int’l Conf. on Software Engineering. Zurich: IEEE, 2012. 925-935.

[29] Sun BY, Shu G, Podgurski A, Robinson B. Extending static analysis by mining project-specific rules. In: Proc. of the 34th Int’l Conf. on Software Engineering (ICSE 2012). Zurich: IEEE, 2012. 1054-1063.

[30] Fast commits for ext4. 2021. https://lwn.net/Articles/842385

[31] Hess B, Bekker H, Berendsen HJC, Fraaije JGEM. LINCS: A linear constraint solver for molecular simulations. Journal of Computational Chemistry, 1997, 18(12): 1463-1472. [doi: 10.1002/(SICI)1096-987X(199709)18:12<1463::AID-JCC4>3.0.CO;2-H

[32] Xie YC, Chou A, Engler D. ARCHER: Using symbolic, path-sensitive analysis to detect memory access errors. In: Proc. of the 9th European Software Engineering Conf. Held Jointly with the 11th ACM SIGSOFT Int’l Symp. on Foundations of Software Engineering. Helsinki: ACM, 2003. 327-336.

[33] Ganesh V, Dill DL. A decision procedure for bit-vectors and arrays. In: Proc. of the 2007 Int’l Conf. on Computer Aided Verification. Berlin, Heidelberg: Springer, 2007. 519-531.

[34] Gens D, Schmitt S, Davi L, Sadeghi AR. K-Miner: Uncovering memory corruption in Linux. In: Proc. of the 2018 Network and Distributed System Security Symp. San Diego: The Internet Society, 2018.

[35] Rubio-González C, Liblit B. Defective error/pointer interactions in the Linux kernel. In: Proc. of the 2011 Int’l Symp. on Software Testing and Analysis. Toronto: ACM, 2011. 111-121.

[36] Jana S, Kang YJ, Roth S, Ray B. Automatically detecting error handling bugs using error specifications. In: Proc. of the 25th USENIX Conf. on Security Symp. Austin: USENIX Association, 2016. 345-362.

[37] Engler DR, Chelf B, Chou A, Hallem S. Checking system rules using system-specific, programmer-written compiler extensions. In: Proc. of the 4th Symp. on Operating System Design & Implementation. San Diego: USENIX Association, 2000. 1.

[38] Sharir M, Pnueli A. Two approaches to interprocedural data flow analysis. New York: Courant Institute of Mathematical Sciences, Computer Science Department, New York University, 1978. https://www.cse.psu.edu/~trj1/cse598-f11/docs/sharir_pnueli1.pdf

Get Citation

边攀,梁彬,黄建军,游伟,石文昌,张健. RTDMiner: 基于数据挖掘的引用计数更新缺陷检测方法.软件学报,2023,34(10):4724-4742

Copy

Article Metrics

Abstract:
PDF:
HTML:
Cited by:

History

Received:June 11,2021
Revised:December 18,2021
Adopted:
Online: April 04,2023
Published: October 06,2023

You are the first2043809Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address：4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code：100190
Phone：010-62562563 Fax：010-62562533 Email：jos@iscas.ac.cn
Technical Support：Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063

微信服务号

微信订阅号

Get Citation

Share

微信扫一扫：分享

Article Metrics

History