Using Naive Bayes to Coordinate the Classification of Web Pages

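Authors:
Fan Yan (范焱)
Department of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027
Zheng Cheng (郑诚)
Department of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027; Department of Computer Science, Anhui University, Hefei, Anhui 230027
Wang Qingyi (王清毅)
Department of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027
Cai Qingsheng (蔡庆生)
Department of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027
Liu Jie (刘洁)
Department of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027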
Fund project: National Natural Science Foundation of China (69675016)


Abstract:

The World Wide Web holds a vast amount of information, and finding the useful part of it efficiently is an urgent problem; correct classification of Web pages is at its core. Exploiting the structural features of hypertext, this paper proposes using the Naive Bayes method to coordinate two classifiers, one based on the text of a hypertext page and one based on its structural information. Experiments show that, compared with either single classifier used alone, the combined classifier effectively improves classification accuracy.

Key words: hypertext; Web; classification; machine learning; Internet; data mining; information retrieval; WWW
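The abstract does not spell out how the two classifiers are coordinated. As an illustration only, the sketch below shows one common way to fuse two per-class posteriors under the Naive Bayes assumption that the text view and the structure view are conditionally independent given the class, so that P(c | text, struct) is proportional to P(c | text) · P(c | struct) / P(c). The function names, class labels, and probabilities are hypothetical and are not taken from the paper.

```python
import math


def combine_naive_bayes(p_text, p_struct, prior):
    """Fuse two per-class posteriors into one distribution.

    Assumes (hypothetically) that the text and structure views are
    conditionally independent given the class, which gives
    P(c | text, struct) proportional to P(c | text) * P(c | struct) / P(c).
    """
    eps = 1e-12  # guards against log(0) for zero-probability classes
    log_scores = {
        c: math.log(p_text[c] + eps)
        + math.log(p_struct[c] + eps)
        - math.log(prior[c] + eps)
        for c in prior
    }
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def predict(p_text, p_struct, prior):
    """Return the class with the highest fused posterior."""
    fused = combine_naive_bayes(p_text, p_struct, prior)
    return max(fused, key=fused.get)


# Hypothetical outputs of a text classifier and a structure classifier
# for a single page, with a uniform class prior.
prior = {"course": 0.5, "faculty": 0.5}
p_text = {"course": 0.6, "faculty": 0.4}
p_struct = {"course": 0.3, "faculty": 0.7}

fused = combine_naive_bayes(p_text, p_struct, prior)
label = predict(p_text, p_struct, prior)
```

In this made-up case the text classifier mildly prefers one class while the structure classifier more strongly prefers the other, and the fused posterior follows the more confident view. Dividing by the prior keeps the class prior from being counted twice, since each input posterior already includes it.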


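Cite this article

Fan Yan, Zheng Cheng, Wang Qingyi, Cai Qingsheng, Liu Jie. Using Naive Bayes to coordinate the classification of Web pages. Journal of Software (软件学报), 2001, 12(9): 1386-1392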


History

Received: 2000-02-24
Last revised: 2000-05-10
