Interaction-representation-based Deep Forest Method in Multi-label Learning

Fund Project: National Natural Science Foundation of China (62176117)

    Abstract:

    In multi-label learning, each sample is associated with multiple labels, and the key task is to exploit the correlations between labels when building the model. The multi-label deep forest (MLDF) algorithm mines these correlations through layer-by-layer representation learning within a deep ensemble learning framework, and uses the resulting label probability representation to improve prediction accuracy. However, this approach has two drawbacks. First, the label probability representation is highly correlated with the label information, so its diversity is low, and performance declines as the deep forest grows deeper. Second, computing the label probabilities requires storing the forest structures of all layers and applying them one by one in the test stage, which incurs prohibitive computational and storage overhead. To address these problems, this study proposes an interaction-representation-based multi-label deep forest algorithm (iMLDF). iMLDF mines structural information in the feature space from the decision paths of the forest model and uses random interaction trees to extract the feature interactions along decision tree paths, obtaining two interaction representations: feature confidence scores and label probability distributions. On the one hand, iMLDF makes full use of the feature structure information in the model to enrich the correlation information between labels. On the other hand, it computes all representations through interaction expressions, so the algorithm does not need to store the forest structures, which greatly improves computational efficiency. Experimental results show that iMLDF, which performs representation learning on top of the interaction representations, achieves better prediction performance, and that on datasets with many samples its computational efficiency is an order of magnitude higher than that of MLDF.
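
    The idea of reading feature interactions off decision paths can be illustrated with a small sketch. This is not the authors' random interaction tree or their interaction expressions, which the abstract does not specify. It assumes a hypothetical toy tree (`TOY_TREE`) for two labels and hypothetical helpers (`extract_rules`, `interacting_features`, `label_probabilities`). Each root-to-leaf path is flattened into a rule, that is, a conjunction of threshold tests plus the leaf's label probabilities, so that predictions can later be computed from the rules alone without keeping the tree.

```python
from typing import List, Tuple

# A condition is (feature_index, threshold, goes_left);
# goes_left means the path takes the branch x[feature_index] <= threshold.
Condition = Tuple[int, float, bool]
# A rule pairs the conditions along one root-to-leaf path
# with the label probability vector stored at that leaf.
Rule = Tuple[List[Condition], List[float]]

# Hypothetical toy tree for two labels (an assumption for illustration only).
# Internal nodes split on one feature; leaves store per-label probabilities.
TOY_TREE = {
    "feature": 0, "threshold": 0.5,
    "left": {"leaf": [0.9, 0.1]},
    "right": {
        "feature": 2, "threshold": 1.0,
        "left": {"leaf": [0.2, 0.7]},
        "right": {"leaf": [0.1, 0.95]},
    },
}


def extract_rules(node: dict, path: Tuple[Condition, ...] = ()) -> List[Rule]:
    """Flatten a tree into one rule per root-to-leaf decision path."""
    if "leaf" in node:
        return [(list(path), node["leaf"])]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], path + ((f, t, True),))
            + extract_rules(node["right"], path + ((f, t, False),)))


def interacting_features(rule: Rule) -> List[int]:
    """Features tested together on one path, i.e. one feature interaction."""
    return sorted({f for f, _, _ in rule[0]})


def label_probabilities(rules: List[Rule], x: List[float]) -> List[float]:
    """Evaluate the flattened rules on a sample; the tree is not needed."""
    for conditions, probs in rules:
        if all((x[f] <= t) == goes_left for f, t, goes_left in conditions):
            return probs
    raise ValueError("no rule matched; rules should partition the feature space")


RULES = extract_rules(TOY_TREE)
```

    Once `RULES` has been extracted, `TOY_TREE` could be discarded and `label_probabilities(RULES, x)` would still return the same leaf distribution for a sample `x`. This mirrors, in a very simplified form, the abstract's point that computing representations from expressions avoids storing the forest structures.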

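Cite this article

Lü Shenhuan (吕沈欢), Chen Yihe (陈一赫), Jiang Yuan (姜远). Interaction-representation-based Deep Forest Method in Multi-label Learning. Journal of Software (软件学报), 2024, 35(4): 1934-1944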

History
  • Received: 2022-03-15
  • Last revised: 2022-10-19
  • Published online: 2023-07-28
  • Publication date: 2024-04-06