Multi-label Text Classification with Enhancing Multi-granularity Information Relations

doi:10.13328/j.cnki.jos.006802

Journal of Software (软件学报), 2023, Vol. 34, No. 12, pp. 5686-5703

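Authors:
LI Fang-Fang (李芳芳), School of Computer Science and Engineering, Central South University, Changsha 410038, Hunan, China
SU Pu-Zhen (苏朴真), School of Computer Science and Engineering, Central South University, Changsha 410038, Hunan, China
DUAN Jun-Wen (段俊文), School of Computer Science and Engineering, Central South University, Changsha 410038, Hunan, China
ZHANG Shi-Chao (张师超), School of Computer Science and Engineering, Central South University, Changsha 410038, Hunan, China
MAO Xing-Liang (毛星亮), Institute of Big Data and Internet Innovation, Hunan University of Technology and Business, Changsha 410205, Hunan, China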

                    
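Author biographies:
LI Fang-Fang (1983-), female, Ph.D., associate professor, doctoral supervisor, CCF professional member. Research interests: machine learning, natural language processing, text mining.
SU Pu-Zhen (1999-), male, master's student, CCF student member. Research interests: machine learning, natural language processing.
DUAN Jun-Wen (1990-), male, Ph.D., lecturer, CCF professional member. Research interests: natural language processing, information extraction.
ZHANG Shi-Chao (1962-), male, Ph.D., professor, doctoral supervisor. Research interests: data mining, knowledge discovery.
MAO Xing-Liang (1979-), male, Ph.D., associate professor, CCF professional member. Research interests: natural language processing, text mining.
Corresponding authors: DUAN Jun-Wen, E-mail: jwduan@csu.edu.cn; ZHANG Shi-Chao, E-mail: zhangsc@csu.edu.cn
CLC number: TP18
Funding: National Natural Science Foundation of China (62172449, 61836016, 71790615, 62006251, 62172441); Natural Science Foundation of Hunan Province (2021JJ30870, 2021JJ40783); Natural Science Foundation of Changsha (kq2014134); National Defense Science and Technology Key Laboratory Fund (6142101190302)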

Multi-label Text Classification with Enhancing Multi-granularity Information Relations

Authors:

LI Fang-Fang, School of Computer Science and Engineering, Central South University, Changsha 410038, China
SU Pu-Zhen, School of Computer Science and Engineering, Central South University, Changsha 410038, China
DUAN Jun-Wen, School of Computer Science and Engineering, Central South University, Changsha 410038, China
ZHANG Shi-Chao, School of Computer Science and Engineering, Central South University, Changsha 410038, China
MAO Xing-Liang, Institute of Big Data and Internet Innovation, Hunan University of Technology and Business, Changsha 410205, China

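Abstract:

Deep-learning-based multi-label text classification methods have two main shortcomings: they do not learn text information at multiple granularities, and they do not exploit the constraint relations among labels. To address these problems, a multi-label text classification method with enhanced multi-granularity information relations is proposed. First, text and labels are embedded into the same space through joint embedding, and the BERT pre-trained model is used to obtain hidden vector feature representations of the text and the labels. Then, three modules for enhancing multi-granularity information relations are constructed: a document-level information shallow label attention classification module, a word-level information deep label attention classification module, and a label constraint relation matching auxiliary module. The first two modules perform multi-granularity learning on the shared feature representation: shallow interactive learning between document-level text information and label information, and deep interactive learning between word-level text information and label information. The auxiliary module improves classification performance by learning the relations among labels. Finally, the proposed method is compared with current mainstream multi-label text classification algorithms on three representative datasets. The results show that it achieves the best performance on the main metrics Micro-F1, Macro-F1, nDCG@k, and P@k.

Keywords: attention mechanism; multi-label text classification; label relation; multi-granularity information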
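
The contrast between document-level and word-level interaction with label embeddings can be illustrated with a generic label-attention sketch. This is not the paper's implementation: the abstract does not give the modules' equations, so the dot-product scoring, the mean pooling, the function names (`document_level_scores`, `word_level_scores`), and the toy vectors are all assumptions made for illustration. In practice the word and label vectors would come from a shared encoder such as BERT.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def document_level_scores(doc_vec, label_vecs):
    """Assumed coarse-grained view: one document vector scored against each label."""
    return [sigmoid(dot(doc_vec, label)) for label in label_vecs]

def word_level_scores(word_vecs, label_vecs):
    """Assumed fine-grained view: each label attends over the words separately."""
    scores = []
    for label in label_vecs:
        # Attention weights of this label over all words.
        weights = softmax([dot(word, label) for word in word_vecs])
        # Label-specific document representation (weighted sum of word vectors).
        context = [
            sum(w * word[d] for w, word in zip(weights, word_vecs))
            for d in range(len(label))
        ]
        scores.append(sigmoid(dot(context, label)))
    return scores

# Toy inputs (hypothetical): 3 words and 2 labels in a 2-dimensional space.
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
label_vecs = [[2.0, 0.0], [0.0, -2.0]]

# Mean pooling stands in for a document-level representation.
doc_vec = [sum(word[d] for word in word_vecs) / len(word_vecs) for d in range(2)]

doc_scores = document_level_scores(doc_vec, label_vecs)
word_scores = word_level_scores(word_vecs, label_vecs)
print(doc_scores, word_scores)
```

Because each score passes through an independent sigmoid rather than a softmax over labels, any number of labels can be predicted for one document, which is what distinguishes the multi-label setting from single-label classification.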

Abstract:

Multi-label text classification methods based on deep learning have two main shortcomings: they lack multi-granularity learning of text information, and they do not sufficiently exploit the constraint relations between labels. To address these problems, this study proposes a multi-label text classification method that enhances multi-granularity information relations. First, the method embeds text and labels in the same space through joint embedding and employs the BERT pre-trained model to obtain hidden vector feature representations of the text and labels. Then, three multi-granularity information relation enhancement modules are constructed: a document-level information shallow label attention (DISLA) classification module, a word-level information deep label attention (WIDLA) classification module, and a label constraint relation matching auxiliary module. The first two modules perform multi-granularity learning on the shared feature representation: shallow interactive learning between document-level text information and label information, and deep interactive learning between word-level text information and label information. The auxiliary module improves classification performance by learning the relations between labels. Finally, a comparison with current mainstream multi-label text classification algorithms on three representative datasets shows that the proposed method achieves the best performance on the main metrics Micro-F1, Macro-F1, nDCG@k, and P@k.

Keywords: attention mechanism; multi-label text classification; label relation; multi-granularity information
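
For readers unfamiliar with the four metrics named in the abstract, the following is a minimal sketch of their commonly used definitions. The function names and toy data are hypothetical, and conventions are assumed where they vary: binary relevance for nDCG@k, and an F1 of 0 for a label with no true or predicted positives. The paper's exact evaluation protocol is not stated in the abstract.

```python
import math

def top_k(scores, k):
    """Indices of the k highest-scoring labels."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def precision_at_k(scores, relevant, k):
    """P@k: fraction of the top-k predicted labels that are true labels."""
    return sum(1 for i in top_k(scores, k) if i in relevant) / k

def ndcg_at_k(scores, relevant, k):
    """nDCG@k with binary relevance and a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, i in enumerate(top_k(scores, k))
        if i in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0

def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred):
    """Micro-F1 pools counts over all labels; Macro-F1 averages per-label F1."""
    counts = []
    for j in range(len(y_true[0])):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        counts.append((tp, fp, fn))
    micro = f1(*(sum(c[i] for c in counts) for i in range(3)))
    macro = sum(f1(*c) for c in counts) / len(counts)
    return micro, macro

# Toy example (hypothetical): one document, five labels, true labels 0 and 3.
scores = [0.9, 0.2, 0.75, 0.4, 0.1]
relevant = {0, 3}
print(precision_at_k(scores, relevant, 3), ndcg_at_k(scores, relevant, 3))

# Toy example (hypothetical): two documents, three labels, binary indicators.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
print(micro_macro_f1(y_true, y_pred))
```

P@k and nDCG@k evaluate the ranking of labels per document, while Micro-F1 and Macro-F1 evaluate thresholded predictions. Macro-F1 weights every label equally and is therefore more sensitive to performance on rare labels.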

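Cite this article

LI Fang-Fang, SU Pu-Zhen, DUAN Jun-Wen, ZHANG Shi-Chao, MAO Xing-Liang. Multi-label Text Classification with Enhancing Multi-granularity Information Relations. Journal of Software (软件学报), 2023, 34(12): 5686-5703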

History

Received: 2022-06-07
Last revised: 2022-08-29
Published online: 2023-03-15
Publication date: 2023-12-06
