Survey on Automatic Term Extraction Research

doi:10.13328/j.cnki.jos.006040

Author:

ZHANG Xue, SUN Hong-Yu, XIN Dong-Xing, LI Cui-Ping, CHEN Hong

Affiliation:

Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education (Renmin University of China), Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China

Author biographies:

ZHANG Xue (1989-), female, Ph.D.; research interests: natural language processing and data mining.
LI Cui-Ping (1971-), female, Ph.D., professor, doctoral supervisor, CCF Distinguished Member; research interests: social network analysis, social recommendation, and big data analysis and mining.
SUN Hong-Yu (1994-), male, master's degree; research interests: data mining and natural language processing.
CHEN Hong (1965-), female, Ph.D., professor, doctoral supervisor, CCF Distinguished Member; research interests: big data management and privacy protection, data management and analysis on new hardware, and data warehousing and data mining.
XIN Dong-Xing (1994-), male, master's degree; research interests: data mining and natural language processing.

Corresponding author: LI Cui-Ping, E-mail: licuiping@ruc.edu.cn

Fund Project:

National Natural Science Foundation of China (61772537, 61772536, 61702522, 61532021); National Key Research and Development Program of China (2018YFB1004401)

Abstract:

Automatic term extraction aims to extract domain-related words or phrases from document collections. It is a fundamental problem and a research hotspot in ontology construction, text summarization, knowledge graphs, and related fields. In particular, with the recent rise of research on unstructured text in big data, automatic term extraction has attracted further attention from researchers and has produced rich research results. Taking term ranking algorithms as the main thread, this paper surveys the theories, techniques, current state, advantages, and disadvantages of automatic term extraction methods. First, the formal definition of the automatic term extraction problem and its solution framework are outlined. Then, recent research in China and abroad is classified according to two levels of features in "shallow parsing", namely basic linguistic information and relational structure information, and the research progress and major challenges of existing automatic term extraction methods are systematically summarized. Finally, the data resources and experimental evaluation used in term extraction are analyzed, and possible future research trends are discussed.

Key words: automatic term extraction; term recognition; text processing; machine learning

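Cite this article

ZHANG Xue, SUN Hong-Yu, XIN Dong-Xing, LI Cui-Ping, CHEN Hong. Survey on Automatic Term Extraction Research. Journal of Software, 2020, 31(7): 2062-2094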

History

Received: 2019-09-17
Last revised: 2020-02-09
Published online: 2020-04-21
Publication date: 2020-07-06
