Denoising Graph Auto-encoder for Unsupervised Social Media Text Summarization

CLC number: TP18

Fund Project: National Natural Science Foundation of China (62376192, 62376188)

    Abstract:

    Social media text summarization aims to produce concise summaries for large collections of topic-specific short social media texts (referred to as posts). Because posts are short and informal, traditional methods face sparse features and insufficient information. Recent studies exploit social relationships among posts to learn better post representations and remove redundant information, but they overlook the unreliable, noisy relationships present in real social media settings, which mislead the model's judgment of post importance and diversity. This study therefore proposes DSNSum, a novel unsupervised model that improves summarization performance by removing noisy relationships from the social network. First, the presence of noisy relationships in real social relationship networks is verified statistically. Second, two noise functions are designed based on sociological theories, and a denoising graph auto-encoder (DGAE) is constructed to reduce the influence of noisy relationships and to learn post representations that integrate credible social relationships. Finally, a sparse reconstruction framework selects posts that preserve coverage, importance, and diversity to form a summary of a given length. Experimental results on 22 topics from two real social media platforms (Twitter and Sina Weibo) demonstrate the effectiveness of the proposed model and offer new ideas for subsequent research in related fields.
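    The abstract names three ingredients (noise functions on the social graph, graph-based post representations, and reconstruction-based post selection) but gives none of their definitions. The sketch below is therefore only a rough illustration of that kind of pipeline and is not DSNSum itself. Everything in it is an assumption: the function names `drop_edges`, `propagate`, and `greedy_select` are hypothetical, random edge dropping stands in for the paper's sociologically motivated noise functions, a single normalized-adjacency smoothing step stands in for a trained DGAE encoder (no auto-encoder is trained here), and greedy least-squares subset selection stands in for the sparse reconstruction framework.

```python
import numpy as np


def drop_edges(adj, drop_prob, rng):
    """Assumed noise function: remove each undirected edge with probability drop_prob.

    adj is a symmetric 0/1 adjacency matrix with a zero diagonal.
    """
    upper = np.triu(adj, k=1)                      # each undirected edge once
    keep = rng.random(upper.shape) >= drop_prob    # True where the edge survives
    upper = upper * keep
    return upper + upper.T                         # restore symmetry


def propagate(adj, feats):
    """One GCN-style smoothing step: D^{-1/2} (A + I) D^{-1/2} X.

    Stands in for an encoder; nothing is learned in this sketch.
    """
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops, so degrees are >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return norm @ feats


def greedy_select(reps, k):
    """Greedily pick k posts whose representations best reconstruct all posts.

    At each step, add the post that most reduces the least-squares error of
    expressing every row of reps as a linear combination of the selected rows.
    """
    selected = []
    for _ in range(k):
        best, best_err = None, np.inf
        for i in range(reps.shape[0]):
            if i in selected:
                continue
            basis = reps[selected + [i]]           # (m, d) candidate summary posts
            coef, *_ = np.linalg.lstsq(basis.T, reps.T, rcond=None)
            err = np.linalg.norm(basis.T @ coef - reps.T)
            if err < best_err:
                best, best_err = i, err
        selected.append(best)
    return selected
```

    A caller would build `adj` from the social relations among posts and `feats` from any text features, then run `greedy_select(propagate(adj, feats), k)` to obtain the indices of `k` candidate summary posts. In a denoising setup, `drop_edges` would corrupt the graph during training so that the model learns to recover the clean structure; here it is shown only as a standalone corruption step.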

Cite this article

贺瑞芳, 赵堂龙, 刘焕宇. Denoising Graph Auto-encoder for Unsupervised Social Media Text Summarization. Journal of Software (软件学报),,():1-22

History
  • Received: 2023-07-05
  • Last revised: 2023-11-22
  • Published online: 2024-06-20