Web Key Resource Page Judgment Based on Improved Decision Tree Algorithm


Journal of Software, Volume 16, Issue 11, 2005, pp. 1958-1966

Author: LIU Yi-Qun, ZHANG Min, MA Shao-Ping

Abstract:

Key resource pages are among the most important search targets for Web search users. Decision tree learning is one of the most widely used and practical methods for inductive inference in machine learning. Because Web pages are difficult to sample uniformly, there are not enough negative instances to train a decision tree for key resource pages. To solve this problem, the original algorithm is partly modified to learn from global information instead of individual instance information. Under the same evaluation method as TREC (Text REtrieval Conference) 2003, large-scale retrieval experiments based on the improved decision tree algorithm achieve more than 40% improvement over those based on the original algorithm. The approach not only offers an effective way to select Web key resource pages, but also suggests a possible way to improve decision tree learning performance.
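The abstract does not spell out how the algorithm is modified, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes that, when negative instances are scarce, the information gain of a feature can be estimated from three quantities: a prior probability that a page is a key resource, the distribution of the feature over the whole collection (the "global" information), and the distribution of the feature over the sampled key resource pages. The function names, parameters, and the use of Bayes' rule here are all illustrative assumptions.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-class distribution with class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def gain_from_global_stats(prior, global_dist, positive_dist):
    """Estimate information gain without individual negative instances.

    prior         -- assumed P(page is a key resource) in the collection
    global_dist   -- {feature value: P(value)} over the whole collection
    positive_dist -- {feature value: P(value | key resource)} from the
                     sampled key resource pages
    All three inputs are hypothetical; the source does not define them.
    """
    base = binary_entropy(prior)
    conditional = 0.0
    for value, p_value in global_dist.items():
        if p_value <= 0.0:
            continue
        # Bayes' rule: P(key | value) = P(value | key) * P(key) / P(value).
        # Clip to 1.0 in case the sampled estimates are inconsistent.
        p_key_given_value = min(
            1.0, positive_dist.get(value, 0.0) * prior / p_value
        )
        conditional += p_value * binary_entropy(p_key_given_value)
    return base - conditional
```

Under these assumptions, a feature whose distribution among key resource pages matches its distribution in the whole collection yields a gain of about zero, while a feature that perfectly separates key resource pages yields a gain equal to the entropy of the prior.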

Key words: Web information retrieval; key resource page; machine learning; decision tree


Citation:

LIU Yi-Qun, ZHANG Min, MA Shao-Ping. Web Key Resource Page Judgment Based on Improved Decision Tree Algorithm. Journal of Software, 2005, 16(11): 1958-1966.


History

Received: July 26, 2004
Revised: June 2, 2005
