A Feature Matching Method: Sparse Feature Tree
Authors: Zhou Ya-Qian, Huang Xuan-Jing, Wu Li-De (周雅倩, 黄萱菁, 吴立德)
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60435020

    Abstract (translated from Chinese):

    In large-scale or real-time settings, the computational efficiency of machine learning algorithms is critical. This paper describes an efficient data structure, together with the corresponding construction and lookup algorithms, for the execution system of the maximum entropy model. The data structure, called the sparse feature tree, represents the feature set so as to speed up feature lookup (feature matching) and thereby speed up probability calculation and the execution system. Experiments on base phrase recognition and part-of-speech tagging show that the new data structure indeed greatly accelerates the execution system of the maximum entropy method while leaving the space complexity unchanged.

    Abstract:

    Computational efficiency is an important concern for machine learning algorithms, especially in applications on large test sets or in real-time scenarios. This paper describes a novel data structure, and the corresponding algorithms, for the execution system of the maximum entropy model. The data structure, called the sparse feature tree, represents the feature set so as to speed up feature search (feature matching) and, in turn, probability calculation and the execution system as a whole. Experiments on base phrase chunking and part-of-speech tagging show that the new data structure greatly speeds up feature matching while keeping the space complexity unchanged.
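    The abstract does not spell out how the sparse feature tree is organized internally, but the idea it describes (indexing the feature set as a tree so that, for a given context, only features that can actually fire are visited during matching) can be sketched roughly as follows. This is a minimal illustration under assumed conventions: the class `SparseFeatureTree`, the prefix-style matching over an ordered tuple of context predicates, the helper `maxent_probabilities`, and the toy tagging features are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


class SparseFeatureTree:
    """Illustrative sketch (not the paper's exact layout): each edge is
    labelled with one context-predicate value, and each node stores the
    weights, per outcome, of the features fully matched at that node."""

    def __init__(self):
        self.children = {}                  # predicate value -> child node
        self.weights = defaultdict(float)   # outcome -> feature weight

    def insert(self, predicates, outcome, weight):
        """Register a feature defined on an ordered tuple of context predicates."""
        node = self
        for value in predicates:
            node = node.children.setdefault(value, SparseFeatureTree())
        node.weights[outcome] += weight

    def match(self, context):
        """Walk down the single path selected by the observed context and
        accumulate the weights stored along it; branches that no feature
        uses are simply absent, so non-firing features are never touched."""
        scores = defaultdict(float)
        for outcome, w in self.weights.items():   # features with empty context
            scores[outcome] += w
        node = self
        for value in context:
            node = node.children.get(value)
            if node is None:                      # no feature shares this prefix
                break
            for outcome, w in node.weights.items():
                scores[outcome] += w
        return scores


def maxent_probabilities(tree, context, outcomes):
    """Conditional maximum entropy distribution: p(y|x) is proportional to
    exp(sum of the weights of the features that fire on (x, y))."""
    scores = tree.match(context)
    exps = {y: math.exp(scores.get(y, 0.0)) for y in outcomes}
    z = sum(exps.values())
    return {y: v / z for y, v in exps.items()}


if __name__ == "__main__":
    tree = SparseFeatureTree()
    # Toy part-of-speech features over the context (previous tag, current word);
    # predicates, outcomes and weights are made up for illustration only.
    tree.insert(("DT",), "NN", 0.8)
    tree.insert(("DT",), "VB", -0.3)
    tree.insert(("DT", "dog"), "NN", 1.5)
    print(maxent_probabilities(tree, ("DT", "dog"), ["NN", "VB"]))
```

    In a real execution system the tree would be built once from the trained model's feature set, and a lookup like `match` would replace a linear scan or repeated hashing over all features at every prediction; the actual node layout and complexity guarantees are those given in the paper, not this sketch.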

Cite this article:

Zhou YQ, Huang XJ, Wu LD. A feature matching method: Sparse feature tree. Journal of Software (软件学报), 2006, 17(5): 1026-1033.

History
  • Received: 2005-02-14
  • Last revised: 2005-08-25