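Indexing the World Wide Web: Intelligent Query Suggestion Based on Term Relation Network
Funding:

National Natural Science Foundation of China (60603094, 60776797); National Basic Research Program of China (973 Program) (2007CB311103); National High Technology Research and Development Program of China (863 Program) (2006AA010105); Beijing Natural Science Foundation (4082030)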
    Abstract:

    Search engine users often submit queries whose intent is vague, which leads to failed searches. This paper proposes an interactive retrieval approach, intelligent query suggestion, that automatically identifies whether a query is semantically clear and, for each vague query, builds a concept hierarchy reflecting its different meanings. The hierarchy helps users quickly locate a query that fits their information need. To realize intelligent query suggestion, the paper proposes TECH (term concept hunting), a query concept identification algorithm based on the small-world property of natural language. TECH combines community detection methods from physics with information retrieval techniques from computer science in an extensible algorithmic framework. Experimental results show that users prefer intelligent query suggestion to the traditional list-style query suggestion, and that TECH effectively identifies the different concepts of vague queries and statistically significantly outperforms three well-known hierarchy-building baseline systems.
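The idea of finding concepts through community detection on a term relation network can be illustrated with a minimal sketch. This is not the TECH algorithm itself, whose details the abstract does not give. It assumes a toy weighted co-occurrence graph for the hypothetical query "apple", uses a naive greedy modularity merge as a stand-in community detection method, and treats "more than one community" as a hypothetical vagueness test. All names and weights below are illustrative assumptions.

```python
from itertools import combinations


def modularity(edges, communities):
    """Modularity Q of a partition of a weighted undirected graph.

    edges maps (term_a, term_b) pairs to a co-occurrence weight.
    """
    total = sum(edges.values())
    q = 0.0
    for comm in communities:
        # Weight of edges with both endpoints inside the community.
        inside = sum(w for (u, v), w in edges.items() if u in comm and v in comm)
        # Sum of weighted degrees of the community's nodes.
        degree = sum(w for (u, v), w in edges.items() for n in (u, v) if n in comm)
        q += inside / total - (degree / (2 * total)) ** 2
    return q


def greedy_communities(edges):
    """Start from singletons; repeatedly apply the merge that raises Q most."""
    nodes = {n for pair in edges for n in pair}
    communities = [frozenset([n]) for n in sorted(nodes)]
    best_q = modularity(edges, communities)
    while len(communities) > 1:
        candidates = []
        for a, b in combinations(communities, 2):
            merged = [c for c in communities if c not in (a, b)] + [a | b]
            candidates.append((modularity(edges, merged), merged))
        q, merged = max(candidates, key=lambda item: item[0])
        if q <= best_q:  # no merge improves Q any further
            break
        best_q, communities = q, merged
    return communities, best_q


def looks_vague(edges):
    """Hypothetical criterion: related terms split into several communities."""
    communities, _ = greedy_communities(edges)
    return len(communities) > 1


# Assumed toy data: terms related to the query "apple" and made-up weights.
apple_terms = {
    ("iphone", "mac"): 5, ("iphone", "ipad"): 6, ("mac", "ipad"): 4,
    ("fruit", "juice"): 5, ("fruit", "pie"): 4, ("juice", "pie"): 3,
    ("mac", "fruit"): 1,
}

groups, q = greedy_communities(apple_terms)
for group in groups:
    print(sorted(group))
```

On this toy graph the merging stops with the product terms and the food terms in separate groups, which is the kind of split a concept hierarchy for a vague query could be built from. The exhaustive pairwise search is only practical for small graphs; a real system would need a scalable community detection method.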

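Cite this article

Li Yanan, Wang Bin, Li Jintao, Li Peng. Indexing the World Wide Web: Intelligent Query Suggestion Based on Term Relation Network. Journal of Software, 2011, 22(8): 1771-1784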

History
  • Received: 2009-06-10
  • Last revised: 2010-03-08