The World Wide Web holds a vast amount of information, and finding the useful part of it is a pressing problem. Correct classification of Web pages is central to solving it. Exploiting the structural characteristics of hypertext, this paper uses the Naive Bayes method to combine two classifiers, one based on the text of the document and one based on the hypertext structure. Compared with either classifier used alone, the combined classifier improves the accuracy of Web page classification markedly and consistently.
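One common way to combine two classifiers under the Naive Bayes assumption is sketched below. It assumes that the text view and the hypertext-structure view are conditionally independent given the class, so that P(c | text, structure) is proportional to P(c | text) · P(c | structure) / P(c). This is an illustrative sketch only and is not necessarily the exact formulation used in the paper; the function name `combine_naive_bayes`, the class labels, and all probability values are hypothetical.

```python
import math

def combine_naive_bayes(prior, p_text, p_struct, eps=1e-12):
    """Combine two per-class posteriors under a conditional-independence assumption.

    prior, p_text, p_struct: dicts mapping class label -> probability.
    Returns a dict of normalized combined posteriors.
    NOTE: hypothetical helper for illustration, not the paper's implementation.
    """
    # Work in log space: log P(c|text) + log P(c|struct) - log P(c).
    # Probabilities are floored at eps so that log(0) never occurs.
    scores = {
        c: math.log(max(p_text[c], eps))
           + math.log(max(p_struct[c], eps))
           - math.log(max(prior[c], eps))
        for c in prior
    }
    # Normalize with the log-sum-exp trick for numerical stability.
    m = max(scores.values())
    total = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / total for c, s in scores.items()}

# Made-up example with two hypothetical page classes.
prior = {"course": 0.5, "faculty": 0.5}
p_text = {"course": 0.6, "faculty": 0.4}    # output of the text classifier
p_struct = {"course": 0.7, "faculty": 0.3}  # output of the structure classifier

combined = combine_naive_bayes(prior, p_text, p_struct)
predicted = max(combined, key=combined.get)
```

In this made-up example both classifiers lean toward the same class, so the combined posterior for that class is higher than either individual posterior, which is the intuition behind combining the two views.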