Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100190, China; Graduate University, The Chinese Academy of Sciences, Beijing 100049, China
Search engine queries are often too vague to retrieve relevant results. This paper presents an intelligent query suggestion approach that distinguishes vague queries and organizes the related queries of each vague query into a concept hierarchy. Through this hierarchy, users can quickly find queries that match their information needs. The approach, TECH (Term Concept Hunting), is based on the small-world property of human language. TECH combines community detection algorithms from physics with information retrieval (IR) techniques from computer science in an extensible framework. Experimental results show that users prefer this hierarchical query suggestion to traditional list-based query suggestion. TECH effectively distinguishes vague queries and statistically significantly outperforms three other state-of-the-art hierarchy-building systems.
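The abstract does not give TECH's algorithm. The following is only a minimal Python sketch of the general idea it names: treat the related queries of a vague query as a term co-occurrence network and group the terms into concepts with community detection. It assumes that related queries are whitespace-tokenized strings, that an edge's weight is the number of queries in which the two terms co-occur, and that the community detection step is a simple greedy agglomerative maximization of Newman's weighted modularity. These choices, the function names (`cooccurrence_graph`, `modularity`, `greedy_communities`, `group_queries`) and the toy `apple` queries are all hypothetical and are not the authors' TECH implementation. The sketch also yields only one flat level of concepts rather than a full hierarchy.

```python
# Hypothetical sketch: NOT the TECH algorithm from the paper, only an
# illustration of community detection on a term co-occurrence network.
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(queries, head):
    """Weighted term graph built from the related queries of one vague query.

    Edge weight = number of queries in which both terms occur. The vague
    term `head` is dropped because it appears in most queries and would
    otherwise link every concept (an assumption of this sketch).
    """
    weights = defaultdict(float)
    for q in queries:
        terms = sorted(set(q.lower().split()) - {head})
        for a, b in combinations(terms, 2):
            weights[(a, b)] += 1.0
    return dict(weights)


def modularity(weights, communities):
    """Newman's weighted modularity Q = sum_c [w_in(c)/W - (s_c / 2W)^2]."""
    total = sum(weights.values())  # W: total edge weight
    if total == 0:
        return 0.0
    strength = defaultdict(float)  # s_i: summed weight of edges at node i
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    q = 0.0
    for com in communities:
        inside = sum(w for (a, b), w in weights.items() if a in com and b in com)
        s = sum(strength[n] for n in com)
        q += inside / total - (s / (2 * total)) ** 2
    return q


def greedy_communities(weights):
    """Start from singletons; repeatedly merge the pair that most increases Q."""
    nodes = sorted({n for edge in weights for n in edge})
    communities = [{n} for n in nodes]
    best_q = modularity(weights, communities)
    while len(communities) > 1:
        best_pair = None
        for i, j in combinations(range(len(communities)), 2):
            trial = [c for k, c in enumerate(communities) if k not in (i, j)]
            trial.append(communities[i] | communities[j])
            q = modularity(weights, trial)
            if q > best_q + 1e-12:
                best_q, best_pair = q, (i, j)
        if best_pair is None:  # no merge improves Q: stop
            break
        i, j = best_pair
        merged = communities[i] | communities[j]
        communities = [c for k, c in enumerate(communities) if k not in (i, j)]
        communities.append(merged)
    return communities, best_q


def group_queries(queries, head, communities):
    """Attach each query to the concept sharing most of its terms.

    Ties (including queries with no overlap) go to the first community.
    """
    groups = defaultdict(list)
    for q in queries:
        terms = set(q.lower().split()) - {head}
        best = max(range(len(communities)), key=lambda k: len(terms & communities[k]))
        groups[best].append(q)
    return groups


if __name__ == "__main__":
    related = ["apple iphone price", "apple iphone review", "iphone price",
               "apple pie recipe", "pie recipe easy", "apple pie easy"]
    coms, q = greedy_communities(cooccurrence_graph(related, "apple"))
    for k, qs in group_queries(related, "apple", coms).items():
        print(sorted(coms[k]), qs)
    print("modularity:", round(q, 3))
```

On the toy queries for the vague query `apple`, the sketch separates a phone-related concept from a recipe-related one. The naive loop recomputes Q for every candidate pair, so it only suits small graphs. Clauset, Newman and Moore describe a much more efficient greedy modularity scheme for large networks.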