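National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830101, 2018YFC0830100); Yunnan Provincial Major Science and Technology Special Program (202002AD080001); Yunnan Provincial Basic Research Program, General Project (202001AT070047, 202001AT070046)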
Identifying opinion targets in microblog comments is the foundation of public opinion analysis for case-related online content. Existing opinion target identification methods based on topic representation require a preset, fixed number of topics, and the final identification of opinion targets relies on manual inference. To address these problems, this study proposes a weakly supervised method for identifying opinion targets in case-related microblogs, which needs only a small number of labelled comments to identify opinion targets automatically. The method works as follows. First, a variational dual topic representation network encodes and reconstructs the comments twice to obtain rich topic features. Second, a small number of labelled comments guide the topic representation network to discriminate opinion target categories automatically. Finally, a joint training strategy jointly optimizes the reconstruction loss of the dual topic representation and the opinion target classification loss, yielding both automatic classification of opinion targets and mining of opinion target terms. Experiments on two datasets of case-related public opinion show that the proposed model outperforms several baseline models in opinion target classification and in the topic coherence and diversity of opinion target terms.
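The joint training objective summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the bag-of-words input, the standard-normal prior in the KL term, the assumption that both passes reconstruct the original comment, and the weight `cls_weight` are all hypothetical choices made for illustration. The sketch only shows how two reconstruction terms and a classification term, applied to labelled comments alone, might be summed into one loss.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruction_loss(bow, word_probs):
    """Negative log-likelihood of a bag-of-words vector under predicted word probabilities."""
    return -sum(c * math.log(p + 1e-12) for c, p in zip(bow, word_probs))


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)); the prior is an assumption of this sketch."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def classification_loss(class_scores, label):
    """Cross-entropy of the predicted opinion target class against the gold label."""
    return -math.log(softmax(class_scores)[label] + 1e-12)


def joint_loss(bow, passes, class_scores, label=None, cls_weight=1.0):
    """Sum the losses of every encode/reconstruct pass, plus a weighted
    classification loss when the comment is labelled.

    passes: list of (word_probs, mu, log_var) tuples, one per pass
            (two for a dual topic representation).
    label:  class index, or None for an unlabelled comment.
    """
    loss = 0.0
    for word_probs, mu, log_var in passes:
        loss += reconstruction_loss(bow, word_probs)
        loss += kl_to_standard_normal(mu, log_var)
    if label is not None:  # weak supervision: only labelled comments contribute
        loss += cls_weight * classification_loss(class_scores, label)
    return loss


# Toy example: a 3-word vocabulary, 2 latent dimensions, 2 opinion target classes.
bow = [2, 0, 1]
uniform = [1 / 3, 1 / 3, 1 / 3]
passes = [(uniform, [0.0, 0.0], [0.0, 0.0]), (uniform, [0.0, 0.0], [0.0, 0.0])]
unlabelled = joint_loss(bow, passes, [0.0, 0.0])
labelled = joint_loss(bow, passes, [0.0, 0.0], label=1)
```

In this toy setup an unlabelled comment contributes only the reconstruction and KL terms, while a labelled comment adds the classification term, which is one plausible reading of how a small labelled set can steer an otherwise unsupervised topic model.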