Refcount Field Identification for Linux Kernel Based on Deep Learning

doi:10.13328/j.cnki.jos.006567


Journal of Software, Volume 33, Issue 6, 2022, pp. 2030-2046.

Authors: TAN Xin, YANG Xi-Yu, CAO Jia-Jun, ZHANG Yuan

Affiliation: School of Computer Science, Fudan University, Shanghai 201203, China
CLC Number: TP311


Abstract:

Reference counting (refcount) is a common memory management technique in modern software. Refcount errors can lead to severe memory errors such as memory leaks and use-after-free. Many efforts to harden refcount security take known refcount fields as their input. However, identifying refcount fields in source code is very challenging because software code is complex. Traditional methods identify refcount fields mainly by code pattern matching, and they have significant limitations. First, experts must summarize the patterns by hand, which is laborious. Second, the manually summarized patterns do not cover all cases, which results in low recall. To address these issues, this study proposes to characterize a field by its name and by the code behaviour associated with it, and designs an approach based on multimodal deep learning. A prototype of the approach is implemented for Linux kernel code. In the evaluation, the prototype achieves a precision of 96.98% and a recall of 93.54%. In contrast, the traditional code-pattern-based method does not report any refcount fields on the test set. In addition, the approach identifies 61 refcount fields in the latest Linux kernel that are implemented with insecure data types. So far, 21 of them have been reported to the Linux community, and six of those have been confirmed.
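The abstract makes two claims: a refcount field can be recognized from its name and from the code behaviour around it, and some refcount fields use insecure data types. Both can be illustrated with a small userspace C sketch. This is not code from the paper or from the kernel. `struct obj`, `obj_new`, `obj_get_wrapping`, `obj_get_saturating` and `obj_put` are hypothetical names chosen for illustration.

- The field name `refcnt` is the name cue.
- The get/put pair is the behaviour cue: increment on acquire, then decrement, test for zero and free on release.
- The two get variants contrast a plain unsigned counter that wraps from `UINT_MAX` to 0 with a saturating counter loosely modelled on the kernel's `refcount_t`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object (illustrative userspace code, not kernel code). */
struct obj {
    unsigned int refcnt; /* the refcount field: its name is one identification cue */
    int data;
};

/* Allocate an object; the creator holds the first reference. */
struct obj *obj_new(int data) {
    struct obj *o = malloc(sizeof(*o));
    if (o) {
        o->refcnt = 1;
        o->data = data;
    }
    return o;
}

/* Insecure "get": a plain unsigned increment silently wraps from
 * UINT_MAX to 0, after which the counter no longer matches the
 * number of holders and a later put can free a live object. */
void obj_get_wrapping(struct obj *o) {
    o->refcnt++;
}

/* Hardened "get": stops at UINT_MAX instead of wrapping, loosely
 * modelled on refcount_t. (The real refcount_t saturates at a sentinel
 * value and also pins saturated counters on decrement; this sketch
 * omits both details.) */
void obj_get_saturating(struct obj *o) {
    if (o->refcnt != UINT_MAX)
        o->refcnt++;
}

/* "put": drop one reference and free the object when the count hits 0.
 * This decrement-test-free sequence is the behaviour cue.
 * Returns true if the object was freed. */
bool obj_put(struct obj *o) {
    assert(o->refcnt != 0); /* putting a zero count is itself a bug */
    if (--o->refcnt == 0) {
        free(o);
        return true;
    }
    return false;
}
```

A wrapped counter understates the number of holders, so a single later put can free an object that is still in use. A saturating counter leaks the object instead. In the kernel itself, the hardened counterpart is `refcount_t` with `refcount_inc()` and `refcount_dec_and_test()`, while `atomic_t`-based counters wrap on overflow. The abstract does not say exactly which data types the authors classified as insecure.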

Key words: refcount field identification; static program analysis; multimodal deep learning

Citation:

TAN Xin, YANG Xi-Yu, CAO Jia-Jun, ZHANG Yuan. Refcount field identification for Linux kernel based on deep learning. Journal of Software, 2022, 33(6): 2030-2046.

Article Metrics

Abstract: 1532
PDF: 4386
HTML: 3193
Cited by: 0

History

Received: September 05, 2021
Revised: October 15, 2021
Online: January 28, 2022
Published: June 06, 2022

Copyright: Institute of Software, Chinese Academy of Sciences