Coordinating the Classification of Web Pages with the Naive Bayes Method
Authors: Fan Yan, Zheng Cheng, Wang Qingyi, Cai Qingsheng, Liu Jie
Funding:

Supported by the National Natural Science Foundation of China (Grant No. 69675016)

    Abstract (translated from the Chinese):

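    The WWW holds an enormous amount of information, and efficiently discovering useful information within it is a pressing problem; correctly classifying Web pages is at the core of that problem. Exploiting the structural features of hypertext, this paper proposes using the Naive Bayes method to coordinate two classifiers, one that uses the text of a hypertext page and one that uses its structural information. Experiments show that, compared with classifying hypertext with either method alone, the combined classifier effectively improves classification accuracy.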

    Abstract:

    The WWW is a vast source of information. Finding useful information on the Internet is a pressing problem, and the correct classification of Web pages is at its core. Based on the structural characteristics of hypertext, this paper adopts the Naive Bayes method to coordinate two classifiers, one that uses the text of a document and one that uses the hypertext structure. Compared with either classifier alone, the combined classifier improves the accuracy of Web page classification noticeably and consistently.
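    The abstract does not spell out the combination rule. One common way to coordinate two classifiers with Naive Bayes, assumed here and not taken from the paper, is to treat each base classifier's predicted label as a feature and choose the class c that maximizes P(c) · P(text label | c) · P(structure label | c), estimating each conditional probability from that classifier's confusion counts on labelled pages. The minimal sketch below shows that rule; the class name NaiveBayesCombiner, the Laplace constant alpha, and the toy labels (course, faculty, student) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class NaiveBayesCombiner:
    """Hypothetical sketch: combine the labels output by a text-based and a
    structure-based classifier with a Naive Bayes model. Assumes the two
    outputs are conditionally independent given the true class; this is an
    illustrative rule, not the paper's documented method."""
    alpha: float = 1.0  # Laplace smoothing constant (assumed value)
    classes: list = field(default_factory=list)
    prior: Counter = field(default_factory=Counter)
    # confusion[k][c][p]: pages of true class c that base classifier k labelled p
    confusion: list = field(
        default_factory=lambda: [defaultdict(Counter), defaultdict(Counter)])
    n: int = 0

    def fit(self, text_preds, struct_preds, labels):
        """Count class priors and per-classifier confusion counts."""
        self.classes = sorted(set(labels) | set(text_preds) | set(struct_preds))
        for t, s, c in zip(text_preds, struct_preds, labels):
            self.prior[c] += 1
            self.confusion[0][c][t] += 1
            self.confusion[1][c][s] += 1
            self.n += 1
        return self

    def log_posterior(self, c, text_pred, struct_pred):
        """Unnormalized log P(c | text_pred, struct_pred) with Laplace smoothing."""
        k = len(self.classes)
        score = math.log((self.prior[c] + self.alpha) / (self.n + self.alpha * k))
        for table, p in zip(self.confusion, (text_pred, struct_pred)):
            score += math.log((table[c][p] + self.alpha)
                              / (self.prior[c] + self.alpha * k))
        return score

    def predict(self, text_pred, struct_pred):
        """Return the class with the highest combined posterior."""
        return max(self.classes,
                   key=lambda c: self.log_posterior(c, text_pred, struct_pred))


# Toy data (hypothetical): the text classifier confuses faculty and student
# pages, while the structure classifier separates them.
labels = ["course", "course", "faculty", "faculty", "student", "student"]
text_preds = ["course", "course", "student", "faculty", "student", "faculty"]
struct_preds = ["course", "faculty", "faculty", "faculty", "student", "student"]

model = NaiveBayesCombiner().fit(text_preds, struct_preds, labels)
print(model.predict("student", "faculty"))  # faculty
```

    When the two base classifiers disagree, the combiner favors whichever one has been more reliable for the candidate class on the labelled pages; in the toy data the structure classifier never mislabels faculty pages, so it wins the student-versus-faculty disagreement.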

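How to cite this article:

Fan Yan, Zheng Cheng, Wang Qingyi, Cai Qingsheng, Liu Jie. Coordinating the classification of Web pages with the Naive Bayes method. Journal of Software (软件学报), 2001, 12(9): 1386-1392.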

History:
  • Received: 2000-02-24
  • Last revised: 2000-05-10