[关键词]
[摘要]
大数据时代,面向知识产权的科技资源呈现数据规模大、时效性高和价值密度较低等趋势,为有效利用知识产权资源带来严峻的挑战.同时,各个国家对知识产权中隐匿信息挖掘的需求日益增加,使得面向知识产权的科技资源画像构建成为当下的研究热点.目标是通过智能化的数据获取、实体识别以及可视化的方式对知识产权进行画像构建.然而,现有的科技资源画像构建方法只适用于结构化数据,忽略了词语的词性对句子语义理解的影响.因此,提出了一种新颖的面向知识产权的科技资源画像构建算法,针对自动获取的知识产权资源,通过引入词性级别的注意力机制提高实体识别准确率,并以可视化的形式构建知识产权科技资源画像.相比于现有方法,所提出的面向知识产权的科技资源画像构建方法具有以下优势:1)该算法利用词语的词性信息学习句子语义层面的含义,并融合注意力机制,以有监督的方式避免语义理解中的歧义;2)该模型能够智能自动地完成科技数据获取、命名实体识别、科技资源画像构建;3)大量实验结果表明,所提方法利用词语的词性进行有监督学习,在命名实体识别任务中综合性能优于对比算法.
[Keywords]
[Abstract]
In the era of big data, intellectual-property-oriented scientific and technological resources are characterized by large data scale, high timeliness, and low value density, which poses severe challenges to the effective use of intellectual property resources. At the same time, countries increasingly demand the mining of hidden information in intellectual property, making the construction of intellectual-property-oriented scientific and technological resource portraits a current research hotspot. The goal is to build portraits of intellectual property through intelligent data acquisition, entity recognition, and visualization. However, existing methods for constructing scientific and technological resource portraits are suitable only for structured data and ignore the impact of a word's part of speech on the semantic understanding of sentences. Therefore, a novel algorithm is proposed for constructing intellectual-property-oriented portraits of scientific and technological resources. For automatically acquired intellectual property resources, a part-of-speech-level attention mechanism is introduced to improve the accuracy of entity recognition, and the resource portraits are constructed in visual form. Compared with existing methods, the proposed method has the following advantages: 1) The algorithm uses the part-of-speech information of words to learn sentence-level semantics and integrates the attention mechanism to avoid ambiguity in semantic understanding in a supervised way. 2) The model can intelligently and automatically complete scientific and technological data acquisition, named entity recognition, and construction of scientific and technological resource portraits. 3) Extensive experiments demonstrate that, by using the part of speech of words for supervised learning, the proposed method achieves better overall performance than the baseline algorithms on named entity recognition.
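[CLC number]
[Funding]
National Key Research and Development Program of China (2018YFB1402600); National Natural Science Foundation of China (61772083, 61802028); Guangxi Science and Technology Major Project (桂科AA18118054)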