    Abstract:

    Key resource pages are among the most important search targets for Web search users. Decision tree learning is one of the most widely used and practical methods for inductive inference in machine learning. Because Web pages are difficult to sample uniformly, there are not enough negative instances to train a key resource decision tree. To solve this problem, the original algorithm is partly modified to learn from global information instead of individual instance information. Under the same evaluation method as TREC (Text REtrieval Conference) 2003, large-scale retrieval experiments based on the improved decision tree algorithm achieve more than a 40% improvement over those based on the original algorithm. The approach not only offers an effective way to select Web key resource pages, but also suggests a possible way to improve the performance of decision tree learning.
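For readers unfamiliar with decision tree learning, the following is a minimal sketch of the standard ID3 split criterion (information gain) that conventional decision tree induction relies on. It is not the modified algorithm described in the abstract, whose details are not given here. The feature names `many_out_links` and `short_url` and the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on `feature` (standard ID3)."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: each row describes one Web page.
pages = [
    {"many_out_links": True, "short_url": True},
    {"many_out_links": True, "short_url": False},
    {"many_out_links": False, "short_url": True},
    {"many_out_links": False, "short_url": False},
]
is_key_resource = [True, True, False, False]

gain_links = information_gain(pages, is_key_resource, "many_out_links")
gain_url = information_gain(pages, is_key_resource, "short_url")
```

Note that this criterion needs both positive and negative labels for every instance, which is exactly what is hard to obtain when negative examples of key resource pages are scarce.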

Citation:

Liu YQ, Zhang M, Ma SP. Web key resource page identification based on an improved decision tree algorithm. Journal of Software, 2005, 16(11):1958-1966.

History
  • Received: July 26, 2004
  • Revised: June 2, 2005
Copyright: Institute of Software, Chinese Academy of Sciences