Visualization Guided Document Reading by Citation and Text Summarization
Authors: Zhang Jiawan (张加万), Yang Siqi (杨思琪), Li Zeyu (李泽宇), Yang Weiqiang (杨伟强), Wang Jindong (王锦东), He Ruifang (贺瑞芳), Huang Maolin (黄茂林)
Fund Project: National Social Science Foundation of China (12&ZD213); National Key Technology R&D Program of China (2013BAK01B05, 2014BAK09B04)

    Abstract:

    As the volume of publications has grown in recent years, researchers have to read far more literature, so reading a scientific article quickly and effectively has become an important research problem. Understanding an article's key references also helps readers understand the article itself. However, picking out the few most important and relevant references from a long list, and staying on topic without getting lost in the multi-dimensional space of a large document collection, remain challenging tasks. This paper presents GUDOR (GUided DOcument Reader), a visualization-guided reading system based on text summarization and citation relationships. The system (1) extracts the important sentences from a scientific article with an objective-based summarization technique and displays the extraction results with a multi-resolution visualization; (2) identifies the core topics of the references with an LDA (latent Dirichlet allocation) topic model; and (3) records the user's reading behavior and uses it to indicate the reading context, so that the user stays focused on the reading objective. The paper also describes the functions and operations of the system in a concrete usage scenario and validates its usability through a user study.
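    The abstract does not say how the objective-based summarization scores sentences. The block below is only a minimal sketch of the general idea, not the GUDOR method: each sentence is scored by how many cue words of a chosen reading objective it contains, plus its average term frequency. The `OBJECTIVE_CUES` table, the `cue_weight` parameter, and all function names are hypothetical and were chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical cue-word stems per reading objective (illustrative, not from the paper).
OBJECTIVE_CUES = {
    "method": {"propose", "present", "approach", "technique", "model", "extract"},
    "result": {"result", "evaluate", "validate", "study", "show"},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def summarize(text, objective, k=2, cue_weight=2.0):
    """Return the k highest-scoring sentences, kept in document order.

    score = cue_weight * (number of tokens starting with a cue stem)
            + average document-level frequency of the sentence's tokens
    """
    sentences = split_sentences(text)
    cues = OBJECTIVE_CUES[objective]
    term_freq = Counter(w for s in sentences for w in tokenize(s))

    scored = []
    for idx, sentence in enumerate(sentences):
        words = tokenize(sentence)
        if not words:
            continue
        cue_hits = sum(1 for w in words if any(w.startswith(c) for c in cues))
        avg_tf = sum(term_freq[w] for w in words) / len(words)
        scored.append((cue_weight * cue_hits + avg_tf, idx, sentence))

    # Pick the top k by score (ties broken by position), then restore reading order.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    return [sentence for _, _, sentence in sorted(top, key=lambda t: t[1])]
```

    Calling `summarize(text, "method", k=1)` on a short passage returns the single sentence that best matches the hypothetical "method" cues. Switching the objective to `"result"` changes which sentences rank highest, which is the behavior an objective-driven reader would rely on.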

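Cite this article

Zhang JW, Yang SQ, Li ZY, Yang WQ, Wang JD, He RF, Huang ML. Visualization guided document reading by citation and text summarization. Journal of Software (软件学报), 2016,27(5):1163-1173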

History
  • Received: 2015-07-30
  • Last revised: 2015-11-09
  • Published online: 2016-05-06