
Supported by the National Natural Science Foundation of China under Grant Nos. 60803093 and 60675034, and the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2008AA01Z144.




    This paper surveys state-of-the-art research on paraphrasing in natural language processing, covering its applications, the acquisition of paraphrase resources, paraphrase generation, and paraphrase evaluation, as well as several closely related topics. It summarizes, compares, and analyzes the mainstream methods and the latest progress in the field, with the aim of informing future research.

  • Received: 2008-11-13
  • Last revised: 2009-01-15