Supported by the National Natural Science Foundation of China under Grant No. 60373075 (国家自然科学基金); the Shanghai Education Commission Foundation for Excellent Young High Education Teacher of China under Grant No. YYY-07008 (上海高校选拔培养优秀青年教师科研专项基金); the Open Research Foundation of Shanghai Institute of Technology of China under Grant No. YJ2007-24 (上海应用技术学院引进人才科研启动项目)

An Algorithm for Automatic Clustering Number Determination in Network Intrusion Detection
To address the need to specify the number of clusters in advance when the fuzzy C-means algorithm (FCM) is applied to intrusion detection, this paper proposes F-CMSVM (fuzzy C-means and support vector machine algorithm), an algorithm that determines the number of clusters automatically. F-CMSVM first uses FCM to split the target data set into two clusters. It then uses a support vector machine (SVM) with a fuzzy membership function to evaluate the result and decide whether the data set is separable, and it repeats this process iteratively until the final clustering is obtained. The SVM takes the membership matrix produced by FCM as its fuzzy membership function, so that different input samples receive different penalty values, which yields an optimal separating hyperplane. Because the algorithm requires neither labeled training data nor a preset number of clusters, it is a truly unsupervised algorithm. Simulation experiments on network connection records from the KDD CUP 1999 data set show that F-CMSVM not only finds the optimal number of clusters but also detects intrusions well.
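
The first step of the procedure, the two-cluster FCM split that produces the membership matrix, can be illustrated with a minimal sketch. The update rules are the standard FCM ones. Everything else is an assumption made for illustration and is not taken from the paper: the function names `fcm_two_clusters` and `sample_penalties`, the farthest-point initialization, the fuzzifier m = 2, and the choice to scale a base penalty C by each sample's largest membership (the usual fuzzy SVM convention). The SVM-based separability test is not reproduced here because the abstract does not specify its criterion.

```python
import math


def fcm_two_clusters(points, m=2.0, max_iter=100, tol=1e-9):
    """Standard fuzzy C-means with c = 2; returns (centers, memberships)."""
    dim = len(points[0])
    # Assumed initialization: the first point and the point farthest from it.
    first = points[0]
    far = max(points, key=lambda p: math.dist(p, first))
    centers = [list(first), list(far)]
    u = []
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        u = []
        for p in points:
            d = [max(math.dist(p, v), 1e-12) for v in centers]
            u.append([
                1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(2))
                for i in range(2)
            ])
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for i in range(2):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append([
                sum(wk * p[a] for wk, p in zip(w, points)) / total
                for a in range(dim)
            ])
        moved = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if moved < tol:
            break
    return centers, u


def sample_penalties(u, C=1.0):
    """Assumed weighting: penalty_k = C * (largest membership of sample k)."""
    labels = [0 if row[0] >= row[1] else 1 for row in u]
    penalties = [C * max(row) for row in u]
    return labels, penalties


# Toy data: two well-separated groups of three points each.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
          [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, u = fcm_two_clusters(points)
labels, penalties = sample_penalties(u, C=1.0)
```

Under this weighting, samples that lie close to a cluster center receive a penalty near C, while ambiguous samples near the boundary receive a penalty closer to C/2, so they constrain the separating hyperplane less. The resulting `labels` and `penalties` are the kind of input a weighted SVM would consume in the evaluation step.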

