A Named Entity Linking Method Based on Probabilistic Topic Models (一种基于概率主题模型的命名实体链接方法)

doi:10.13328/j.cnki.jos.004642




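Authors:
怀宝兴 (HUAI Bao-Xing)
School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
宝腾飞 (BAO Teng-Fei)
School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
祝恒书 (ZHU Heng-Shu)
School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
刘淇 (LIU Qi)
School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China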

                    
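Fund project: National Science Fund for Distinguished Young Scholars of China (61325010); National High Technology Research and Development Program of China (863 Program) (2014AA015203); Science and Technology Special Fund of Anhui Province (13Z02008-5); International Science and Technology Cooperation Program of Anhui Province (1303063008); Key Science and Technology Program of Anhui Province (1301022064); Natural Science Foundation of Anhui Province (1408085QF110)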

Topic Modeling Approach to Named Entity Linking

Authors:

HUAI Bao-Xing
School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
BAO Teng-Fei
School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
ZHU Heng-Shu
School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
LIU Qi
School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China



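Abstract (translated from the Chinese original):

Named entity linking (NEL) is the process of linking a named entity given in a document to an unambiguous entity in a knowledge base; it includes merging synonymous entities and disambiguating ambiguous ones. The technique can improve the information-filtering capability of practical applications such as online recommender systems and Web search engines. However, the rapid growth in the number of entities poses a major challenge for entity disambiguation, so current NEL techniques find it increasingly hard to meet the linking accuracy that users expect. Words and entities in documents often carry different semantic topics (for example, “Apple” can denote a fruit or an electronics brand), while the words and entities within a single document should share similar topics. Motivated by this observation, the paper proposes modeling documents and disambiguating entities at the semantic level, and on that basis designs a complete NEL method built on a probabilistic topic model. First, a knowledge base is constructed from Wikipedia. Next, a probabilistic topic model maps words and named entities into the same topic space, and each named entity in a given text is linked to an unambiguous named entity in the knowledge base according to the entities' position vectors in that topic space. Finally, extensive experiments are conducted on a real-world data set and the method is compared with standard methods. The results show that the proposed framework handles entity ambiguity well and achieves higher entity linking accuracy.

Key words: named entity linking; probabilistic topic model; Wikipedia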

Abstract:

Named entity linking (NEL) is the task of linking a given named entity to an unambiguous entity in a knowledge base, and it plays an important role in a wide range of Internet services, such as online recommender systems and Web search engines. However, with the explosive growth of online information and applications, traditional NEL solutions face increasing difficulty in maintaining linking accuracy because of the large number of online entities. Moreover, entities are usually associated with different semantic topics (e.g., the entity “Apple” could be either a fruit or a brand), whereas the latent topic distributions of words and entities in the same document should be similar. To address this issue, this paper proposes a novel topic modeling approach to named entity linking. Unlike existing work, the new approach provides a comprehensive framework for NEL and can uncover the semantic relationship between documents and named entities. Specifically, it first builds a knowledge base of unambiguous entities with the help of Wikipedia. It then proposes a novel bipartite topic model to capture the latent topic distribution between entities and documents. Given a new named entity, the approach can therefore link it to the unambiguous entity in the knowledge base by calculating their semantic similarity with respect to latent topics. Finally, the paper reports extensive experiments on a real-world data set to evaluate the approach. Experimental results show that the proposed approach outperforms other state-of-the-art baselines by a significant margin.

Key words: named entity linking; probabilistic topic models; Wikipedia
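
The linking step described in the abstract (compare a mention's document with each candidate entity in a shared topic space, then pick the closest one) can be illustrated with a minimal sketch. This is not the paper's bipartite topic model: the topic vectors, the three topic labels, the candidate dictionary, and the helper names `cosine` and `link` are all hypothetical, and cosine similarity is an assumed stand-in for the similarity measure, which the abstract does not specify.

```python
import math

# Hypothetical topic distributions over three made-up topics
# (technology, food, music). In a real system these vectors would be
# inferred by a topic model; the numbers here are illustrative only.
KB_TOPICS = {
    "Apple Inc.": [0.90, 0.05, 0.05],
    "Apple (fruit)": [0.05, 0.90, 0.05],
    "Apple Records": [0.10, 0.05, 0.85],
}

# Hypothetical candidate dictionary: lowercased surface form -> KB entities.
CANDIDATES = {"apple": ["Apple Inc.", "Apple (fruit)", "Apple Records"]}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link(mention, doc_topics):
    """Return the candidate entity whose topic vector is most similar to the
    document's topic vector, or None if the mention has no candidates."""
    candidates = CANDIDATES.get(mention.lower(), [])
    if not candidates:
        return None
    return max(candidates, key=lambda entity: cosine(KB_TOPICS[entity], doc_topics))


# A technology-heavy document and a food-heavy document resolve the same
# surface form "Apple" to different knowledge-base entities.
print(link("Apple", [0.80, 0.10, 0.10]))
print(link("Apple", [0.10, 0.80, 0.10]))
```

The sketch separates candidate generation (the dictionary lookup) from ranking (the similarity comparison), so the assumed cosine measure could be swapped for a divergence between topic distributions without touching the rest.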

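Cite this article

HUAI Bao-Xing, BAO Teng-Fei, ZHU Heng-Shu, LIU Qi. Topic modeling approach to named entity linking. Ruan Jian Xue Bao/Journal of Software, 2014, 25(9): 2076-2087 (in Chinese)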


History

Received: 2014-04-05
Last revised: 2014-05-14
Published online: 2014-09-09
