具有高可理解性的二分决策树生成算法研究
DOI:
CSTR:
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

基金项目:

Supported by the National Natural Science Foundation of China under Grant No.69825104 (国家自然科学基金)


Constructing Binary Classification Trees with High Intelligibility
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    二分离散化是决策树生成中处理连续属性最常用的方法,对于连续属性较多的问题,生成的决策树庞大,知识表示难以理解.针对两类分类问题,提出一种基于属性变换的多区间离散化方法--RCAT,该方法首先将连续属性转化为某类别的概率属性,此概率属性的二分法结果对应于原连续属性的多区间划分,然后对这些区间的边缘进行优化,获得原连续属性的信息熵增益,最后采用悲观剪枝与无损合并剪枝技术对RCAT决策树进行简化.对多个领域的数据集进行实验,结果表明:对比二分离散化,RCAT算法的执行效率高,生成的决策树在保持分类精度的同时,树的规模小,可理解性强.

    Abstract:

    Binarization is the most popular discretization method in decision tree generation, while for the domain with many continuous attributes, it always gets a big incomprehensible tree which can't be described as knowledge. In order to get a more intelligible decision tree, this paper presents a new discretization algorithm, RCAT, for continuous attributes in the generation of binary classification tree. It uses simple binarization to solve the multisplitting problem through mapping a continuous attribute into another probability attribute based on statistic information. Two pruning methods are introduced to simplify the constructed tree. Empirical results of several domains show that, for the two-class problem with a preponderance of continuous attributes, RCAT algorithm can generate a much smaller decision tree efficiently with higher intelligibility than binarization while retaining predictive accuracy.

    参考文献
    相似文献
    引证文献
引用本文

蒋艳凰,杨学军,赵强利.具有高可理解性的二分决策树生成算法研究.软件学报,2003,14(12):1996-2005

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2002-10-29
  • 最后修改日期:2002-12-31
  • 录用日期:
  • 在线发布日期:
  • 出版日期:
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号