Binary2Source Function Similarity Detection Method Under Function Inlining

doi:10.13328/j.cnki.jos.007335

Journal of Software, 2025, Vol. 36, No. 7: 1-19.

Authors: JIA Ang, FAN Ming, XU Xi, JIN Wu-Xia, WANG Hai-Jun, LIU Ting

Affiliation (all authors): Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, Shaanxi, China

Corresponding author: FAN Ming, E-mail: mingfan@mail.xjtu.edu.cn

CLC number: TP311

Funding: National Natural Science Foundation of China (62232014, 62272377, 62372368, 62372367); Young Talent Support Program of the Shaanxi Association for Science and Technology

Abstract:

Binary2Source function similarity detection is one of the fundamental tasks in software composition analysis. Existing binary2source matching methods mainly adopt a 1-to-1 matching strategy, in which a single binary function is compared against a single source function. Because of function inlining, however, the actual mapping can be 1-to-n: one binary function may correspond to multiple source functions. This mismatch causes existing methods to suffer a 30% performance loss under function inlining. To meet the need for binary-to-source function matching under function inlining, this study proposes a binary2source function similarity detection method oriented toward 1-to-n matching. The method generates source function sets to serve as the matching targets for inlined binary functions, making up for their absence from the source function library. A series of experiments evaluates the effectiveness of the method. The results show that it not only improves the detection ability of existing binary2source function similarity detection methods but also identifies the inlined source functions, helping existing tools better cope with the challenges of inlining.

Key words: binary2source function similarity detection; function inlining; source function set
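
To make the 1-to-n mapping concrete, the following minimal Python sketch shows how a source function set could be assembled for one binary function. It is an illustration only, not the method proposed in the paper: the abstract does not say how the sets are generated, so the sketch assumes that a source-level call graph and the set of inlined call edges are already known. The names `source_function_set`, `call_graph`, and `inlined_edges`, as well as the example functions, are hypothetical.

```python
from collections import deque


def source_function_set(root, call_graph, is_inlined):
    """Return root plus every callee reachable through inlined call edges.

    Assumption: is_inlined(caller, callee) tells us whether the compiler
    inlined that call. How to obtain this is outside this sketch.
    """
    members = [root]
    seen = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, []):
            # Only follow edges that were inlined; other callees stay
            # separate binary functions and are matched 1-to-1.
            if callee not in seen and is_inlined(caller, callee):
                seen.add(callee)
                members.append(callee)
                queue.append(callee)
    return members


# Hypothetical source-level call graph: caller -> list of callees.
call_graph = {
    "parse": ["read_token", "report_error"],
    "read_token": ["skip_space"],
}
# Hypothetical inlining decisions, given as (caller, callee) edges.
inlined_edges = {("parse", "read_token"), ("read_token", "skip_space")}

result = source_function_set(
    "parse", call_graph, lambda caller, callee: (caller, callee) in inlined_edges
)
print(result)  # ['parse', 'read_token', 'skip_space']
```

Here the binary function compiled from `parse` would be compared against the set of three source functions rather than against `parse` alone, while `report_error`, which is not inlined in this example, remains a separate 1-to-1 match.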

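Cite this article

JIA Ang, FAN Ming, XU Xi, JIN Wu-Xia, WANG Hai-Jun, LIU Ting. Binary2Source function similarity detection method under function inlining. Journal of Software, 2025, 36(7): 1-19.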

History

Received: 2024-08-22
Last revised: 2024-10-15
Published online: 2024-12-10
