A Topic Distillation and Exploration Algorithm Based on Similarity Analysis
Authors: 王晓宇, 熊方, 凌波, 周傲英
Funding:

Supported by the National Natural Science Foundation of China under Grant No. 60003016; the National Grand Fundamental Research 973 Program of China under Grant No. G1998030404.

    Abstract:

    This paper revisits the behaviour of the topic distillation algorithm HITS from a different point of view: a similarity-based link analysis model is proposed to observe the distillation procedure. Building on a generalized definition of similarity, the authors present a distillation algorithm that improves the quality of distillation using only hyperlink analysis. A topic exploration function is also integrated into the algorithm framework, which lets end users find less popular topics when a query involves multiple topics. The experimental results show two benefits of the new algorithm: distillation quality improves without using any content information from the pages, and the different topics emerging in the query results can be further explored.
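
    For orientation, the sketch below shows the standard HITS power iteration (Kleinberg's hub and authority updates) that the abstract takes as its starting point, together with a simple link-only similarity between two pages. The similarity used here (cosine overlap of in-link sets, i.e. co-citation) is an assumption chosen for illustration; the abstract does not state the paper's generalized similarity definition, and this is not it. The function names and the toy graph are hypothetical.

```python
import math


def _normalize(scores):
    """Scale a score dict to unit Euclidean length (all zeros stay zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {p: (v / norm if norm else 0.0) for p, v in scores.items()}


def hits(links, iterations=50):
    """Standard HITS by power iteration.

    links maps each page to the set of pages it links to.
    Returns (authority, hub) score dicts.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                new_auth[q] += hub[p]
        # Hub update: sum the fresh authority scores of the pages linked to.
        new_hub = {p: sum(new_auth[q] for q in links.get(p, ())) for p in pages}
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return auth, hub


def cocitation_similarity(links, a, b):
    """Cosine similarity of the in-link sets of pages a and b.

    Illustrative assumption only, not the paper's similarity definition.
    """
    in_a = {p for p, targets in links.items() if a in targets}
    in_b = {p for p, targets in links.items() if b in targets}
    if not in_a or not in_b:
        return 0.0
    return len(in_a & in_b) / math.sqrt(len(in_a) * len(in_b))


# Hypothetical query result with two topics: a popular one (a1, a2)
# and a less popular one (b1).
links = {
    "h1": {"a1", "a2"},
    "h2": {"a1", "a2"},
    "h3": {"a1", "a2"},
    "g1": {"b1"},
    "g2": {"b1"},
}

if __name__ == "__main__":
    authority, _ = hits(links)
    for page in ("a1", "a2", "b1"):
        print(page, round(authority[page], 6))
    print(cocitation_similarity(links, "a1", "a2"))
    print(cocitation_similarity(links, "a1", "b1"))
```

    On this toy graph the authority weight concentrates on the larger community, while the authority score of `b1` shrinks towards zero with each iteration. That is the kind of less popular topic the abstract's exploration function is meant to surface. A link-only similarity, by contrast, still separates the two groups, since `a1` and `b1` share no in-links.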

Cite this article

王晓宇, 熊方, 凌波, 周傲英. A topic distillation and exploration algorithm based on similarity analysis. Journal of Software (软件学报), 2003, 14(9): 1578-1585

History
  • Received: 2002-06-05
  • Last revised: 2002-08-14