Paraphrase Collocation Extraction Based on Binary Classification
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60803093 and 60675034, and by the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2008AA01Z144.



    Abstract:

    This paper studies paraphrase collocation extraction, using collocations in the verb-object ("OBJ") relation as a case study. The proposed method casts paraphrase collocation extraction as a binary classification problem and combines multiple features based on translation, a thesaurus, polarity words, and web mining. Experimental results show that the binary classification approach is effective for extracting paraphrase collocations and that each of the features helps improve extraction performance. With this method, more than 280,000 pairs of paraphrase collocations are extracted, with a precision above 70%. Further experiments show that the extracted paraphrase collocations make it possible to generate paraphrases for about 40% of sentences, which demonstrates the practical value of the method.
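
The framing in the abstract, where each candidate pair of collocations receives a yes/no paraphrase decision from several combined features, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the feature names, the hand-set weights and bias, the linear scoring rule, and the English example pairs are hypothetical and are not taken from the paper, which does not state its classifier or feature definitions in this abstract.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Collocation = Tuple[str, str]  # (verb, object), i.e. one "OBJ" collocation


@dataclass(frozen=True)
class CandidatePair:
    """A candidate pair of collocations plus its feature values in [0, 1]."""
    left: Collocation
    right: Collocation
    features: Dict[str, float]


# Hypothetical weights and bias, hand-set for illustration only.
# A real system would learn these from labeled training pairs.
WEIGHTS = {"translation": 1.5, "thesaurus": 1.0, "polarity": 0.5, "web": 1.0}
BIAS = -2.0


def score(pair: CandidatePair) -> float:
    """Linear score over the four feature families; missing features count as 0."""
    return BIAS + sum(w * pair.features.get(name, 0.0) for name, w in WEIGHTS.items())


def is_paraphrase(pair: CandidatePair) -> bool:
    """Binary decision: positive score means 'paraphrase', otherwise 'not'."""
    return score(pair) > 0.0


def extract(candidates: List[CandidatePair]) -> List[Tuple[Collocation, Collocation]]:
    """Keep only the candidate pairs classified as paraphrases."""
    return [(p.left, p.right) for p in candidates if is_paraphrase(p)]


# Toy candidates with made-up feature values (not data from the paper).
GOOD = CandidatePair(
    ("solve", "problem"), ("resolve", "issue"),
    {"translation": 1.0, "thesaurus": 1.0, "polarity": 1.0, "web": 0.5},
)
BAD = CandidatePair(
    ("solve", "problem"), ("cause", "problem"),
    {"translation": 0.0, "thesaurus": 0.0, "polarity": 0.0, "web": 0.5},
)

if __name__ == "__main__":
    for left, right in extract([GOOD, BAD]):
        print(left, "<=>", right)
```

The point of the sketch is only the shape of the problem: extraction reduces to filtering candidate pairs by a classifier's decision, so any feature that separates true paraphrases from non-paraphrases can be added as one more entry in the feature vector.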

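Cite this article

Zhao Shiqi (赵世奇), Zhao Lin (赵琳), Liu Ting (刘挺), Li Sheng (李生). Paraphrase collocation extraction based on binary classification. Journal of Software (软件学报), 2010, 21(6): 1267-1276.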

History
  • Last revised: 2009-01-15