In semantic query expansion, computing term-concept association is a key step in finding the concepts that describe the semantics of a query need. This paper proposes K2CM (keyword to concept method) for computing that association. K2CM measures term-concept association from two aspects: the term-document-concept attaching relationship, that is, the degree to which a term belongs to a concept by way of documents, and the term-concept co-occurrence relationship. The attaching relationship is derived from an annotated corpus, in which a term occurs in some documents and those documents are labeled with some concepts. The co-occurrence relationship extends plain co-occurrence of term-concept pairs by also considering the text distance between the term and the concept and the distribution of the pair across documents. Semantic retrieval experiments on three different types of corpora show that, compared with traditional methods, semantic query expansion based on K2CM improves retrieval effectiveness.
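The two signals described above can be illustrated with a minimal sketch. The abstract does not give K2CM's formulas, so everything below is an assumption made for illustration: the attachment score is taken as the fraction of a term's documents that are labeled with the concept, the co-occurrence score as the inverse of the smallest token gap averaged over all documents, and the two are mixed linearly with a hypothetical weight `alpha`. The function names and the assumption that a concept is mentioned in text by its own name are also hypothetical.

```python
from collections import defaultdict


def attachment_scores(docs):
    """Assumed attachment score: share of the documents containing a term
    that are labeled with the concept. docs is a list of (tokens, concepts)."""
    term_docs = defaultdict(int)
    pair_docs = defaultdict(int)
    for tokens, concepts in docs:
        for term in set(tokens):
            term_docs[term] += 1
            for concept in concepts:
                pair_docs[(term, concept)] += 1
    return {(t, c): n / term_docs[t] for (t, c), n in pair_docs.items()}


def cooccurrence_scores(docs, concept_vocab):
    """Assumed co-occurrence score: 1 / (smallest token gap) per document,
    summed and divided by the corpus size, so that pairs that sit close
    together in many documents score highest."""
    totals = defaultdict(float)
    for tokens, _ in docs:
        positions = defaultdict(list)
        for i, tok in enumerate(tokens):
            positions[tok].append(i)
        for concept in concept_vocab:
            if concept not in positions:
                continue  # concept name is not mentioned in this document
            for term, term_pos in positions.items():
                if term == concept:
                    continue  # skip the concept mention itself
                gap = min(abs(i - j) for i in term_pos for j in positions[concept])
                totals[(term, concept)] += 1.0 / gap
    return {pair: s / len(docs) for pair, s in totals.items()}


def term_concept_association(docs, concept_vocab, alpha=0.5):
    """Hypothetical linear mix of the two signals; alpha is illustrative."""
    attach = attachment_scores(docs)
    cooc = cooccurrence_scores(docs, concept_vocab)
    pairs = set(attach) | set(cooc)
    return {
        p: alpha * attach.get(p, 0.0) + (1 - alpha) * cooc.get(p, 0.0)
        for p in pairs
    }
```

With a toy corpus such as `[(["jaguar", "speed", "car"], {"car"}), (["jaguar", "cat", "jungle"], {"animal"})]`, the term "speed" ends up more strongly associated with the concept "car" than the ambiguous term "jaguar" does, which is the kind of ranking a query expansion step would then use to pick concepts.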