Computing Term-Concept Association in Semantic-Based Query Expansion

Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60496325, 60573092
    Abstract:

    In semantic-based query expansion, computing the term-concept association is a key step in finding the concepts that describe the semantics of a query need. This paper proposes K2CM (keyword to concept method) for computing that association. K2CM computes the association from two sources: the term-document-concept attaching degree and the term-concept co-occurrence degree. The attaching degree derives from the way terms are attached to concepts in an annotated corpus, where a term occurs in some documents and those documents are labeled with some concepts. The co-occurrence degree extends plain co-occurrence of a term-concept pair by also considering the text distance between the term and the concept and the distribution of the pair across documents. Semantic retrieval experiments on three different types of corpora show that, compared with classical methods, query expansion based on K2CM improves retrieval effectiveness.
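
    The abstract names the two ingredients of K2CM but gives no formulas, so the following is only a minimal Python sketch of how such a score could be assembled. Everything in it is an assumption made for illustration: the document layout (`tokens`, `concepts`), the function names, the `1 / (1 + distance)` decay, the use of a concept label that appears in the text, and the linear mix controlled by `alpha`. It is not the scoring defined in the paper.

```python
# Illustrative only: a toy term-concept association score built from an
# attaching degree and a distance-aware co-occurrence degree.
# All names and formulas here are assumptions, not the paper's definitions.

def attachment_degree(docs, term, concept):
    """Share of the documents containing `term` that are annotated with `concept`."""
    with_term = [d for d in docs if term in d["tokens"]]
    if not with_term:
        return 0.0
    return sum(concept in d["concepts"] for d in with_term) / len(with_term)


def cooccurrence_degree(docs, term, concept_label):
    """Distance-decayed co-occurrence, scaled by how widely the pair is spread."""
    per_doc = []
    for d in docs:
        t_pos = [i for i, tok in enumerate(d["tokens"]) if tok == term]
        c_pos = [i for i, tok in enumerate(d["tokens"]) if tok == concept_label]
        if t_pos and c_pos:
            nearest = min(abs(i - j) for i in t_pos for j in c_pos)
            per_doc.append(1.0 / (1 + nearest))  # closer pairs score higher
    if not per_doc:
        return 0.0
    spread = len(per_doc) / len(docs)  # fraction of documents containing the pair
    return spread * sum(per_doc) / len(per_doc)


def association(docs, term, concept, concept_label, alpha=0.5):
    """Linear mix of the two degrees; `alpha` is a hypothetical tuning weight."""
    return (alpha * attachment_degree(docs, term, concept)
            + (1 - alpha) * cooccurrence_degree(docs, term, concept_label))


# A tiny hand-made annotated corpus (hypothetical data).
docs = [
    {"tokens": ["jaguar", "engine", "speed"], "concepts": {"Car"}},
    {"tokens": ["jaguar", "habitat", "forest"], "concepts": {"Animal"}},
    {"tokens": ["engine", "car", "repair"], "concepts": {"Car"}},
]

engine_score = association(docs, "engine", "Car", "car")
jaguar_score = association(docs, "jaguar", "Car", "car")
print(engine_score, jaguar_score)
```

    In this toy corpus "engine" ends up more strongly associated with the concept `Car` than "jaguar" does, because every document containing "engine" is annotated with `Car` and the word also appears next to the label "car", while "jaguar" is split between two concepts and never co-occurs with the label.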

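Cite this article

Tian Xuan (田萱), Du Xiaoyong (杜小勇), Li Haihua (李海华). Computing term-concept association in semantic-based query expansion. Journal of Software (软件学报), 2008, 19(8): 2043-2053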

History
  • Received: 2007-02-14
  • Last revised: 2007-08-24