Macro Discourse Structure Representation Schema and Corpus Construction

CLC Number: TP18

Fund Project: National Natural Science Foundation of China (61773276, 61673290, 61836007)

    Abstract:

    Discourse structure analysis is an important research topic in natural language processing. It not only helps to understand the structure and semantics of a discourse, but also provides strong support for downstream natural language processing applications such as automatic summarization, information extraction, and question answering. At present, research on discourse structure concentrates mainly on the micro level, that is, on the relations and structures between sentences or sentence groups, while comparatively little work addresses the macro level. This study therefore takes discourse structure as its research object and focuses on building a representation schema and corpus resources at the macro level. It discusses the importance of discourse structure analysis, reviews the state of research from three aspects (theoretical systems, corpus resources, and computational models), and proposes a unified macro-micro discourse structure representation framework that uses the primary-secondary relation as its carrier. It then constructs the logical semantic structure and the functional pragmatic structure at the macro discourse level. On this basis, the study annotates a Chinese macro discourse structure corpus of 720 newswire articles and analyzes the annotation results in terms of annotation consistency and corpus statistics.
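    To make the idea of a tree-shaped macro representation with primary-secondary labels and a consistency check more concrete, here is a minimal Python sketch. It assumes binary trees whose leaves are paragraphs, a hypothetical "left"/"right"/"both" encoding of which child is primary, and span-level F1 as the agreement measure. The names MacroNode, leaf, join, labeled_spans, and agreement, as well as the relation and function labels in the example, are hypothetical placeholders rather than the paper's schema; the paper's actual label sets and consistency metric may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Hypothetical encoding of the primary-secondary label on an internal node:
# "left" = left child is primary, "right" = right child is primary,
# "both" = the two children are of equal importance.
PRIMARY_LABELS = {"left", "right", "both"}


@dataclass
class MacroNode:
    """Node of a binary macro discourse tree whose leaves are paragraphs."""
    start: int                         # first paragraph covered (inclusive)
    end: int                           # last paragraph covered (inclusive)
    relation: Optional[str] = None     # logical-semantic relation (internal nodes only)
    primary: Optional[str] = None      # primary-secondary label (internal nodes only)
    function: Optional[str] = None     # functional-pragmatic role (leaves only)
    children: List["MacroNode"] = field(default_factory=list)


def leaf(index: int, function: Optional[str] = None) -> MacroNode:
    """A single paragraph, optionally tagged with a (hypothetical) functional role."""
    return MacroNode(index, index, function=function)


def join(left: MacroNode, right: MacroNode, relation: str, primary: str) -> MacroNode:
    """Combine two adjacent subtrees under a relation and a primary-secondary label."""
    if left.end + 1 != right.start:
        raise ValueError("subtrees must cover adjacent paragraph ranges")
    if primary not in PRIMARY_LABELS:
        raise ValueError(f"unknown primary-secondary label: {primary}")
    return MacroNode(left.start, right.end, relation, primary, children=[left, right])


Span = Tuple[int, int, Optional[str], Optional[str]]


def labeled_spans(node: MacroNode) -> Set[Span]:
    """Collect (start, end, relation, primary) for every internal node."""
    if not node.children:
        return set()
    spans = {(node.start, node.end, node.relation, node.primary)}
    for child in node.children:
        spans |= labeled_spans(child)
    return spans


def f1(predicted: set, gold: set) -> float:
    """Harmonic mean of precision and recall between two sets of items."""
    if not predicted and not gold:
        return 1.0
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(predicted), hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def agreement(a: MacroNode, b: MacroNode) -> Tuple[float, float]:
    """Span F1 between two annotators: structure only, then structure + primary-secondary label."""
    sa, sb = labeled_spans(a), labeled_spans(b)
    structure = f1({s[:2] for s in sa}, {s[:2] for s in sb})
    primary = f1({(s[0], s[1], s[3]) for s in sa}, {(s[0], s[1], s[3]) for s in sb})
    return structure, primary


if __name__ == "__main__":
    # Two annotators bracket the same four paragraphs differently.
    a = join(join(leaf(0, "lead"), leaf(1), "elaboration", "left"),
             join(leaf(2), leaf(3), "joint", "both"), "elaboration", "left")
    b = join(join(join(leaf(0, "lead"), leaf(1), "elaboration", "left"),
                  leaf(2), "joint", "left"), leaf(3), "background", "right")
    s, p = agreement(a, b)
    print(round(s, 3), round(p, 3))  # prints 0.667 0.333
```

    Span F1 is only one possible way to quantify annotation consistency; chance-corrected statistics such as Cohen's kappa are another common choice. Scoring structure and primary-secondary labels separately shows whether annotators disagree on how paragraphs group together or only on which part is primary.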

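Citation:

Chu Xiaomin, Xi Xuefeng, Jiang Feng, Xu Sheng, Zhu Qiaoming, Zhou Guodong. Macro discourse structure representation schema and corpus construction. Journal of Software, 2020,31(2):321-343.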

Article Metrics
  • Abstract: 2494
  • PDF: 5042
  • HTML: 1905
  • Cited by: 0
History
  • Received: January 09, 2018
  • Revised: April 19, 2019
  • Online: August 12, 2019
  • Published: February 06, 2020
Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-4
Address: No. 4 South Fourth Street, Zhongguancun, Beijing 100190
Phone: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn