Cost-sensitive Decision Tree Induction on Dirty Data

doi:10.13328/j.cnki.jos.005691


Journal of Software, Volume 30, Issue 3, 2019, pp. 604-619.

Author: QI Zhi-Xin, WANG Hong-Zhi, ZHOU Xiong, LI Jian-Zhong, GAO Hong

Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

Fund Project: National Natural Science Foundation of China (U1509216, 61472099); National Sci-Tech Support Plan (2015BAH10F01)


Abstract:

A cost-sensitive decision tree is a decision tree that minimizes the sum of misclassification costs and test costs. With the explosive growth of data volume, dirty data has become increasingly common. During cost-sensitive decision tree induction, dirty data in the training set degrades both the selection of splitting attributes and the division of tree nodes, so data cleaning is necessary before classification. In practice, however, data cleaning is expensive in both time and money, and many users therefore specify an acceptable threshold on cleaning cost. Data-cleaning cost is thus an essential factor in cost-sensitive decision tree induction, alongside misclassification cost and test cost. Existing research has not considered data quality in this problem. To fill this gap, this study focuses on cost-sensitive decision tree induction on dirty data and presents three decision tree induction methods integrated with data cleaning algorithms. Experimental results demonstrate the effectiveness of the proposed approaches.

Key words: cost-sensitive decision tree; dirty data; data cleaning; misclassification cost; test cost
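
To make the cost terms in the abstract concrete, the following is a minimal Python sketch of cost-based split selection. It is not one of the three methods proposed in the paper, which the abstract does not describe. The function names, the toy records, the per-attribute `clean_cost` table, and the treatment of the cleaning threshold as a hard filter on candidate attributes are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def leaf_cost(labels, classes, mis_cost):
    """Lowest misclassification cost of assigning one class to every record in a leaf.

    mis_cost[(true, predicted)] is the cost of predicting `predicted`
    for a record whose real class is `true`.
    """
    counts = Counter(labels)
    return min(
        sum(n * mis_cost[(true, pred)] for true, n in counts.items() if true != pred)
        for pred in classes
    )


def split_cost(rows, labels, attr, classes, mis_cost, test_cost):
    """Misclassification cost after splitting on `attr`, plus the cost of testing it."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    misclassification = sum(leaf_cost(g, classes, mis_cost) for g in groups.values())
    return misclassification + test_cost[attr] * len(rows)


def choose_attribute(rows, labels, attrs, classes, mis_cost, test_cost, clean_cost, budget):
    """Pick the cheapest split among attributes whose (assumed) cleaning cost fits the budget."""
    affordable = [a for a in attrs if clean_cost[a] <= budget]
    if not affordable:
        return None
    return min(
        affordable,
        key=lambda a: split_cost(rows, labels, a, classes, mis_cost, test_cost),
    )


# Hypothetical toy data: every value below is invented for illustration.
rows = [
    {"fever": "yes", "scan": "pos"},
    {"fever": "yes", "scan": "neg"},
    {"fever": "no", "scan": "neg"},
    {"fever": "no", "scan": "pos"},
]
labels = ["sick", "healthy", "healthy", "sick"]
classes = ["sick", "healthy"]
mis_cost = {("sick", "healthy"): 10, ("healthy", "sick"): 2}
test_cost = {"fever": 1, "scan": 1.5}
clean_cost = {"fever": 0, "scan": 3}

with_budget = choose_attribute(
    rows, labels, ["fever", "scan"], classes, mis_cost, test_cost, clean_cost, budget=5
)
tight_budget = choose_attribute(
    rows, labels, ["fever", "scan"], classes, mis_cost, test_cost, clean_cost, budget=2
)
```

In this toy setting, `scan` separates the classes perfectly and has the lower total cost, so it is chosen when the cleaning budget covers it. Under the tighter budget the sketch falls back to `fever`, which illustrates why a cleaning-cost threshold can change which splitting attribute is selected.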

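Citation:

QI Zhi-Xin, WANG Hong-Zhi, ZHOU Xiong, LI Jian-Zhong, GAO Hong. Cost-sensitive Decision Tree Induction on Dirty Data. Journal of Software (软件学报), 2019, 30(3): 604-619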


History

Received: July 19, 2018
Revised: September 20, 2018
Online: March 06, 2019

Copyright: Institute of Software, Chinese Academy of Sciences