An Extended Corner Classification Neural Network Based Document Classification Approach (in English)
Fund Project:

Supported by the National Natural Science Foundation of China under Grant No. 60005004 and the National Grand Fundamental Research 973 Program of China under Grant No. G1998030509.




    Abstract:

    CC4 (the fourth version of corner classification) is a new corner classification training algorithm for three-layer feedforward neural networks. It was originally proposed as the document classification method of the metasearch engine Anvish. When documents are roughly the same size, the CC4 neural network classifies them effectively; when document sizes differ greatly, however, its performance degrades. This paper extends the original CC4 neural network so that it can effectively classify documents that differ greatly in size. To this end, the authors propose an MDS-NN based data indexing method that maps each document to a point in k-dimensional space while preserving as much of the original inter-document distance information as possible. The authors then extend the CC4 neural network to accept these k-dimensional document indexes as input by transforming them into the binary (0/1) sequences that CC4 requires. The experimental results show that ExtendedCC4 performs much better than InitialCC4 when document sizes differ greatly. They also show that the classification precision of ExtendedCC4 depends closely on the effectiveness of the indexing method.
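    To make the two steps in the abstract concrete, the following Python sketch turns k-dimensional document indexes into binary sequences and classifies them with a CC4-style network. It is an illustration under stated assumptions, not the authors' implementation: the MDS-NN indexing itself is not reproduced (the k-dimensional points are simply given), and the binary transformation used here is a hypothetical thermometer (unary) quantization chosen for illustration. The CC4 part follows the corner classification rule as it is commonly described: one hidden neuron per training vector, input weights of +1 for a 1 bit and -1 for a 0 bit, a bias weight of r - s + 1 (s is the number of 1s in the training vector, r the radius of generalization), and output weights of +1 or -1 according to the target. Under this rule a hidden neuron fires exactly when the input lies within Hamming distance r of its training vector. The names thermometer_encode, CC4, levels and radius, and the toy data, are all hypothetical.

```python
from typing import List, Sequence


def thermometer_encode(point: Sequence[float], lo: float, hi: float,
                       levels: int) -> List[int]:
    """Hypothetical encoder (not from the paper): unary-code each coordinate.

    Each coordinate is clamped to [lo, hi], quantized to an integer level
    q in [0, levels], and written as q ones followed by (levels - q) zeros.
    The Hamming distance between two codes then equals the L1 distance
    between their quantized coordinates.
    """
    bits: List[int] = []
    for x in point:
        t = min(max((x - lo) / (hi - lo), 0.0), 1.0)
        q = round(t * levels)
        bits.extend([1] * q + [0] * (levels - q))
    return bits


class CC4:
    """CC4-style corner classification: one hidden neuron per training vector."""

    def __init__(self, radius: int) -> None:
        self.radius = radius                  # radius of generalization r
        self.hidden_w: List[List[int]] = []   # input -> hidden weights (+1/-1)
        self.hidden_b: List[int] = []         # bias weights r - s + 1
        self.out_w: List[List[int]] = []      # hidden -> output weights (+1/-1)

    def fit(self, X: List[List[int]], Y: List[List[int]]) -> "CC4":
        # Prescriptive training: weights are set directly, with no iteration.
        for x, y in zip(X, Y):
            s = sum(x)                        # number of 1s in x
            self.hidden_w.append([1 if b else -1 for b in x])
            self.hidden_b.append(self.radius - s + 1)
            self.out_w.append([1 if t else -1 for t in y])
        return self

    def predict(self, x: List[int]) -> List[int]:
        # Hidden neuron i receives r + 1 - Hamming(x, X[i]), so it fires
        # exactly when x lies within Hamming distance r of X[i].
        h = [1 if sum(w * b for w, b in zip(ws, x)) + bias > 0 else 0
             for ws, bias in zip(self.hidden_w, self.hidden_b)]
        n_out = len(self.out_w[0])
        # Each output neuron sums the +1/-1 votes of the hidden neurons that fired.
        return [1 if sum(h_i * ow[j] for h_i, ow in zip(h, self.out_w)) > 0 else 0
                for j in range(n_out)]


# Made-up 2-D "document indexes" for two classes (one-hot targets).
train = [([0.1, 0.2], [1, 0]), ([0.2, 0.1], [1, 0]),
         ([0.9, 0.8], [0, 1]), ([0.8, 0.9], [0, 1])]
X = [thermometer_encode(p, 0.0, 1.0, levels=10) for p, _ in train]
Y = [y for _, y in train]
net = CC4(radius=3).fit(X, Y)

for query in ([0.13, 0.17], [0.88, 0.82], [0.5, 0.5]):
    print(query, net.predict(thermometer_encode(query, 0.0, 1.0, levels=10)))
```

    With radius 3, the first query falls within the generalization radius of both class-1 training points and is labeled [1, 0]. The second query is labeled [0, 1] for the same reason. The third query is at Hamming distance 7 from every training code, so no hidden neuron fires and the output is [0, 0] (unclassified). A unary code is used here because it makes CC4's Hamming radius correspond to a neighborhood in the index space, which is the property an index-to-binary transformation needs for CC4 to generalize sensibly. The paper's actual transformation may differ.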

Cite this article

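陈恩红, 张振亚, 合源一幸, 王煦法. An Extended Corner Classification Neural Network Based Document Classification Approach (in English). Journal of Software, 2002, 13(5): 871-878.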

History
  • Received: 2001-05-28
  • Last revised: 2001-11-14
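Copyright: Institute of Software, Chinese Academy of Sciences. ICP registration: 京ICP备05046678号-3
Address: No. 4 South Fourth Street, Zhongguancun, Haidian District, Beijing; postal code: 100190
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn
Technical support: Beijing Qinyun Technology Development Co., Ltd.

Beijing public network security registration No. 11040202500063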