王继奎 (1978-), male, PhD, associate professor, CCF professional member. Research interests: machine learning and artificial intelligence.
杨正国 (1987-), male, PhD, associate professor, CCF professional member. Research interests: machine learning and artificial intelligence.
刘学文 (1996-), male, master's student, CCF student member. Research interests: machine learning and artificial intelligence.
易纪海 (1974-), male, lecturer. Research interests: machine learning and artificial intelligence.
李冰 (1997-), female, master's student. Research interests: machine learning and artificial intelligence.
聂飞平 (1977-), male, PhD, professor, doctoral supervisor, CCF professional member. Research interests: machine learning and artificial intelligence.
High-dimensional data are ubiquitous in the real world, but they usually contain a large amount of redundant and noisy information, which accounts for the poor performance of many traditional clustering algorithms on high-dimensional data. In practice, the cluster structure of high-dimensional data is often embedded in a lower-dimensional subspace, so dimension reduction has become a key technique for mining the cluster structure of high-dimensional data. Among the many dimension reduction methods, graph-based methods are a research hotspot. However, most graph-based dimension reduction algorithms suffer from two problems: (1) they need to compute or learn an adjacency graph, which incurs high computational complexity; (2) the downstream use of the reduced data is not considered during dimension reduction. To address these problems, this paper proposes MEDR, a fast unsupervised dimension reduction algorithm based on maximum entropy. MEDR combines linear projection with the maximum entropy clustering model and finds the potential optimal cluster structure of high-dimensional data embedded in a low-dimensional subspace through an effective iterative optimization algorithm. MEDR does not require an adjacency graph as input and has time complexity linear in the number of samples. Extensive experiments on real data sets show that, compared with traditional dimension reduction methods, MEDR finds a better projection matrix for mapping high-dimensional data into a low-dimensional subspace, so that the projected data are more amenable to clustering.
With the development of science and technology, more and more high-dimensional data are emerging. Traditional machine learning algorithms face many problems on high-dimensional data, such as high computational complexity and distance computations that become unreliable because the sampling density is too sparse. However, the meaningful cluster structure of high-dimensional data is often embedded in a low-dimensional subspace. To find this embedded structure, researchers have proposed many dimension reduction methods, which fall into three main categories: unsupervised, semi-supervised, and supervised. Principal component analysis (PCA) [
The PCA family and the LPP family of algorithms are all linear dimension reduction algorithms and are not suitable for manifold data. To mine the cluster structure of manifold data embedded in low-dimensional subspaces, researchers have proposed a series of manifold learning algorithms, such as locally linear embedding [
Clustering is also a research hotspot in machine learning. K-means [
Traditional approaches split the clustering of high-dimensional data into two independent stages: dimension reduction followed by clustering of the reduced data. For example, PCA is first used to reduce the dimensionality, and then K-means is applied to the reduced data. Some recent studies integrate the clustering information of the reduced data into the dimension reduction process to guide the reduction [
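For concreteness, the traditional two-stage pipeline looks like the following minimal scikit-learn sketch (illustrative code, not from the paper; the function name `two_stage_clustering` is ours):

```python
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def two_stage_clustering(X, n_components=2, n_clusters=3, seed=0):
    """Traditional pipeline: reduce first, then cluster.
    X: (n_samples, n_features). The two stages are optimized independently,
    so the projection knows nothing about the later clustering objective."""
    Z = PCA(n_components=n_components).fit_transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(Z)
```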
Aiming to improve the clustering performance of the reduced data, we propose MEDR (fast unsupervised linear dimension reduction method based on maximum entropy), which combines maximum entropy theory with linear projection. MEDR embeds the maximum entropy clustering model into the linear dimension reduction process to supervise it, and we develop an effective iterative algorithm to optimize the MEDR model: with the weight matrix fixed, the projection matrix and cluster centers are optimized; with the projection matrix and cluster centers fixed, the weight matrix is optimized. MEDR alternates these two steps until convergence. Because the maximum entropy clustering model supervises the projection, the dimension reduction has an explicit purpose: the data embedded in the low-dimensional subspace are made more suitable for clustering.
LPP is a linear extension of Laplacian Eigenmaps: its goal is to preserve, after projection into the low-dimensional subspace, the neighborhood relationships of the original space. Given samples $x_1,\dots,x_n\in\mathbb{R}^d$ arranged as the columns of $X$ and a projection vector $w$, LPP minimizes the following objective function:
$$\min_{w}\ \sum_{i,j}\bigl(w^{\mathrm T}x_i-w^{\mathrm T}x_j\bigr)^2 s_{ij} \tag{1}$$
In Eq. (1), $s_{ij}$ is the weight of the edge between $x_i$ and $x_j$ in a pre-built adjacency graph, typically defined by the heat kernel:
$$s_{ij}=\exp\bigl(-\|x_i-x_j\|^2/t\bigr) \tag{2}$$
In Eq. (2), $t$ is the kernel width, and $s_{ij}=0$ when $x_i$ and $x_j$ are not neighbors. Writing $D_{ii}=\sum_j s_{ij}$ and $L=D-S$ for the graph Laplacian, Eq. (1) is equivalent to
$$\min_{w}\ w^{\mathrm T}XLX^{\mathrm T}w\quad\text{s.t.}\ w^{\mathrm T}XDX^{\mathrm T}w=1 \tag{3}$$
Eq. (3) can be transformed into the following generalized eigenvector problem:
$$XLX^{\mathrm T}w=\lambda XDX^{\mathrm T}w \tag{4}$$
In Eq. (4), the projection vectors are the generalized eigenvectors associated with the smallest eigenvalues. Note that both the graph construction and the pairwise weights make LPP quadratic in the number of samples.
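As a concrete illustration, here is a minimal NumPy/SciPy sketch of LPP under the formulation above (a hypothetical helper, not the drtoolbox implementation used in the experiments):

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, n_components=2, k=5, t=1.0):
    """Locality preserving projections.
    X: (d, n) data matrix with samples as columns."""
    d, n = X.shape
    # Pairwise squared distances and k-NN heat-kernel graph (Eq. (2)).
    sq = np.sum(X**2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2 * X.T @ X
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist2[i])[1:k + 1]        # skip self
        S[i, nbrs] = np.exp(-dist2[i, nbrs] / t)
    S = np.maximum(S, S.T)                          # symmetrize the graph
    D = np.diag(S.sum(axis=1))
    L = D - S
    # Generalized eigenproblem of Eq. (4); keep the smallest eigenvalues.
    A = X @ L @ X.T
    B = X @ D @ X.T + 1e-9 * np.eye(d)              # regularize for stability
    vals, vecs = eigh(A, B)
    return vecs[:, :n_components]                   # projection matrix W (d, l)
```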
To remove LPP's requirement of a pre-built adjacency graph, researchers proposed graph-optimized locality preserving projections (GoLPP). GoLPP is also a linear projection algorithm; it combines LPP with maximum entropy theory and learns the weight matrix $S$ automatically during projection learning, by solving a problem of the form
$$\min_{w,S}\ \sum_{i,j}\bigl(w^{\mathrm T}x_i-w^{\mathrm T}x_j\bigr)^2 s_{ij}+\gamma\sum_{i,j}s_{ij}\ln s_{ij}\quad\text{s.t.}\ \sum_j s_{ij}=1,\ s_{ij}\ge 0 \tag{5}$$
In Eq. (5), $\gamma>0$ trades the locality preserving term off against the entropy regularizer; the entropy term prevents trivial weight assignments and yields a closed-form softmax update for $S$.
To overcome LPP's high time complexity, which makes it unsuitable for large-scale data, reference [ proposed AgLPP, an anchor-graph-based locality preserving projection that proceeds in three steps (a code sketch of the graph construction follows the list):
(1) Use a clustering algorithm to obtain $m$ anchor points $u_1,\dots,u_m$ (e.g., the centers produced by K-means);
(2) Compute the anchor graph weight matrix $Z\in\mathbb{R}^{n\times m}$ using:
$$z_{ij}=\frac{\exp\bigl(-\|x_i-u_j\|^2/t\bigr)}{\sum_{k\in\mathcal N_i}\exp\bigl(-\|x_i-u_k\|^2/t\bigr)},\qquad j\in\mathcal N_i \tag{6}$$
In Eq. (6), $\mathcal N_i$ indexes the nearest anchors of sample $x_i$, $t$ is the kernel width, and $z_{ij}=0$ for anchors outside $\mathcal N_i$;
(3) As in LPP, solve the following problem for the projection vectors:
$$XLX^{\mathrm T}w=\lambda XDX^{\mathrm T}w,\qquad S=Z\Lambda^{-1}Z^{\mathrm T},\ \ \Lambda=\mathrm{diag}\bigl(Z^{\mathrm T}\mathbf 1\bigr) \tag{7}$$
In Eq. (7), the adjacency matrix $S$ is approximated by the low-rank anchor graph, so constructing the graph costs $O(nm)$ with $m\ll n$ rather than $O(n^2)$.
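A minimal sketch of the anchor-graph construction in step (2) (hypothetical helper code, assuming the Gaussian-kernel assignment of Eq. (6)):

```python
import numpy as np

def anchor_graph(X, U, s=3, t=1.0):
    """Z[i, j]: affinity of sample i to anchor j (Eq. (6)).
    X: (d, n) samples; U: (d, m) anchors; s: anchors kept per sample."""
    dist2 = (np.sum(X**2, 0)[:, None] + np.sum(U**2, 0)[None, :]
             - 2 * X.T @ U)                       # (n, m) squared distances
    Z = np.zeros_like(dist2)
    for i in range(dist2.shape[0]):
        nbrs = np.argsort(dist2[i])[:s]           # s nearest anchors
        w = np.exp(-dist2[i, nbrs] / t)
        Z[i, nbrs] = w / w.sum()                  # row-normalize
    return Z

# The low-rank adjacency S = Z diag(Z^T 1)^{-1} Z^T never needs to be formed
# explicitly; products with S cost O(nm) through Z.
```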
To resolve the two problems LPP faces, namely the need for a pre-built adjacency graph and the high computational complexity, we combine the ideas of GoLPP and AgLPP and propose MEDR, a fast linear dimension reduction algorithm based on maximum entropy. We take the virtual cluster centers of the high-dimensional data embedded in the low-dimensional subspace as anchors and, guided by maximum entropy theory, reshape the distribution of the projected data so that it is better suited to clustering. MEDR thus removes both the adjacency graph input and the high complexity of classical LPP, while giving the reduced data a clearer cluster structure.
Based on the above analysis, we propose the following fast linear dimension reduction model based on maximum entropy.
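As a sketch, assuming the standard maximum-entropy clustering term coupled with an orthonormal linear projection (a form consistent with the updates derived below), the model can be written as:
$$\min_{W,\{m_j\},S}\ \sum_{i=1}^{n}\sum_{j=1}^{c}s_{ij}\bigl\|W^{\mathrm T}x_i-m_j\bigr\|^2+\gamma\sum_{i=1}^{n}\sum_{j=1}^{c}s_{ij}\ln s_{ij}\quad\text{s.t.}\ W^{\mathrm T}W=I,\ \sum_{j=1}^{c}s_{ij}=1,\ s_{ij}\ge 0 \tag{8}$$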
In Eq. (8), $W\in\mathbb{R}^{d\times l}$ is the projection matrix, $m_j\in\mathbb{R}^{l}$ is the $j$-th virtual cluster center in the subspace, $s_{ij}$ is the weight between sample $x_i$ and center $m_j$, $c$ is the number of clusters, and $\gamma>0$ balances the clustering term against the entropy regularizer.
Although MEDR is formally similar to GoLPP and other graph-based dimension reduction algorithms, it differs from them in two key respects: (1) MEDR only computes the distances between the $n$ samples and the $c$ cluster centers, so its time complexity is linear in the number of samples, whereas a full adjacency graph requires $O(n^2)$; (2) MEDR couples projection learning with a clustering model, so the dimension reduction explicitly serves its downstream purpose, namely clustering.
Eq. (8) can be solved iteratively: we fix one group of variables, optimize the other, and alternate until convergence. The optimization proceeds in the following two steps:
Step 1. Fix $S$, and optimize the cluster centers $\{m_j\}$ and the projection matrix $W$; the subproblem is
$$\min_{W,\{m_j\}}\ \sum_{i,j}s_{ij}\bigl\|W^{\mathrm T}x_i-m_j\bigr\|^2\quad\text{s.t.}\ W^{\mathrm T}W=I \tag{9}$$
Setting the derivative of Eq. (9) with respect to $m_j$ to zero gives
$$m_j=\frac{\sum_i s_{ij}W^{\mathrm T}x_i}{\sum_i s_{ij}}=W^{\mathrm T}\mu_j \tag{10}$$
From Eq. (10), the optimal subspace center is the projection of the weighted center in the original space,
$$\mu_j=\frac{\sum_i s_{ij}x_i}{\sum_i s_{ij}} \tag{11}$$
where $\mu_j$ denotes the weighted cluster center in the original space. Substituting Eq. (10) back into Eq. (9), the problem over $W$ becomes
$$\min_{W^{\mathrm T}W=I}\ \mathrm{tr}\Bigl(W^{\mathrm T}\Bigl(\sum_{i,j}s_{ij}(x_i-\mu_j)(x_i-\mu_j)^{\mathrm T}\Bigr)W\Bigr) \tag{12}$$
To obtain $W$, we use the following lemma.
Lemma 1. For a symmetric matrix $A\in\mathbb{R}^{d\times d}$, the minimum of $\mathrm{tr}(W^{\mathrm T}AW)$ subject to $W^{\mathrm T}W=I$, $W\in\mathbb{R}^{d\times l}$, is attained when the columns of $W$ are the eigenvectors of $A$ associated with its $l$ smallest eigenvalues.
The proof of Lemma 1 is given in Appendix 1. Therefore, the optimal $W$ in Eq. (9) consists of the eigenvectors corresponding to the $l$ smallest eigenvalues of the weighted scatter matrix in Eq. (12).
Step 2. Fix $W$ and $\{m_j\}$, and optimize $S$; the subproblem is
$$\min_{S}\ \sum_{i,j}s_{ij}d_{ij}+\gamma\sum_{i,j}s_{ij}\ln s_{ij}\quad\text{s.t.}\ \sum_j s_{ij}=1,\ s_{ij}\ge 0 \tag{13}$$
where
$$d_{ij}=\bigl\|W^{\mathrm T}x_i-m_j\bigr\|^2 \tag{14}$$
To solve this model, we temporarily ignore the constraint $s_{ij}\ge 0$; it will turn out to hold automatically. Let the Lagrangian be
$$\mathcal{L}(S,\lambda)=\sum_{i,j}s_{ij}d_{ij}+\gamma\sum_{i,j}s_{ij}\ln s_{ij}-\sum_i\lambda_i\Bigl(\sum_j s_{ij}-1\Bigr) \tag{15}$$
Setting $\partial\mathcal{L}/\partial s_{ij}=0$ gives
$$d_{ij}+\gamma(\ln s_{ij}+1)-\lambda_i=0 \tag{16}$$
From Eq. (16) we obtain
$$s_{ij}=\exp\Bigl(\frac{\lambda_i-\gamma-d_{ij}}{\gamma}\Bigr) \tag{17}$$
Substituting Eq. (17) into the constraint $\sum_j s_{ij}=1$ yields
$$\exp\Bigl(\frac{\lambda_i-\gamma}{\gamma}\Bigr)\sum_k\exp\Bigl(-\frac{d_{ik}}{\gamma}\Bigr)=1 \tag{18}$$
From Eq. (18) we get
$$\exp\Bigl(\frac{\lambda_i-\gamma}{\gamma}\Bigr)=\frac{1}{\sum_k\exp\left(-d_{ik}/\gamma\right)} \tag{19}$$
Substituting Eq. (19) into Eq. (17) gives
$$s_{ij}=\frac{\exp\left(-d_{ij}/\gamma\right)}{\sum_k\exp\left(-d_{ik}/\gamma\right)} \tag{20}$$
Clearly $s_{ij}>0$, so the constraint we ignored is satisfied automatically. Summarizing the above derivation,
$$s_{ij}=\frac{\exp\bigl(-\|W^{\mathrm T}x_i-m_j\|^2/\gamma\bigr)}{\sum_k\exp\bigl(-\|W^{\mathrm T}x_i-m_k\|^2/\gamma\bigr)} \tag{21}$$
From Eq. (21) we can see that $s_{ij}$ is a softmax over the negative scaled distances: the closer a projected sample lies to a center, the larger its weight, and $\gamma$ controls how sharply the weights concentrate on the nearest center.
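In code, the $S$-update of Eq. (21) is a row-wise softmax; a minimal, numerically stable sketch (hypothetical helper, using the log-sum-exp shift):

```python
import numpy as np

def update_S(Y, M, gamma):
    """Eq. (21): row-wise softmax over negative scaled distances.
    Y: (l, n) projected samples W^T X; M: (l, c) subspace centers."""
    d = (np.sum(Y**2, 0)[:, None] + np.sum(M**2, 0)[None, :]
         - 2 * Y.T @ M)                           # (n, c) squared distances
    logits = -d / gamma
    logits -= logits.max(axis=1, keepdims=True)   # log-sum-exp shift
    S = np.exp(logits)
    return S / S.sum(axis=1, keepdims=True)
```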
Based on the above analysis, the MEDR method is summarized in Algorithm 1.
Algorithm 1. MEDR.
Input: data matrix $X\in\mathbb{R}^{d\times n}$.
Parameters: reduced dimensionality $l$, number of clusters $c$, regularization parameter $\gamma$.
Output: projection matrix $W$, subspace cluster centers $\{m_j\}$, and weight matrix $S$.
1. Randomly initialize $W$ (with orthonormal columns) and $S$;
2. WHILE not converged DO
3.   Compute the cluster centers $\mu_j$ in the original space by Eq. (11);
4.   Compute the subspace centers $m_j=W^{\mathrm T}\mu_j$;
5.   Compute the distances $d_{ij}=\|W^{\mathrm T}x_i-m_j\|^2$;
6.   Update $S$ by Eq. (21);
7.   Update $\mu_j$ according to Eq. (11) with the new $S$;
8.   Compute the weighted scatter matrix $A=\sum_{i,j}s_{ij}(x_i-\mu_j)(x_i-\mu_j)^{\mathrm T}$;
9.   Select the eigenvectors corresponding to the $l$ smallest eigenvalues of $A$ to form $W$;
10. END WHILE
11. Output $W$, $\{m_j\}$, and $S$.
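The following is a compact end-to-end sketch of Algorithm 1 in NumPy (a minimal reading of the updates above, not the authors' Matlab code; the initialization and stopping rule are our assumptions):

```python
import numpy as np

def medr(X, l, c, gamma, n_iter=100, tol=1e-6, seed=0):
    """MEDR sketch: alternate the S-update (Eq. (21)) with the
    center/projection update (Eqs. (10)-(12), Lemma 1).
    X: (d, n) data matrix. Returns W (d, l), centers M (l, c), S (n, c)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W, _ = np.linalg.qr(rng.standard_normal((d, l)))     # random orthonormal W
    S = rng.random((n, c))
    S /= S.sum(1, keepdims=True)
    prev = np.inf
    for _ in range(n_iter):
        mu = (X @ S) / (S.sum(0) + 1e-12)                # Eq. (11): weighted centers
        M = W.T @ mu                                     # Eq. (10): subspace centers
        Y = W.T @ X
        D2 = (np.sum(Y**2, 0)[:, None] + np.sum(M**2, 0)[None, :]
              - 2 * Y.T @ M)                             # (n, c) squared distances
        logits = -D2 / gamma
        logits -= logits.max(1, keepdims=True)           # numerical stability
        S = np.exp(logits)
        S /= S.sum(1, keepdims=True)                     # Eq. (21): softmax weights
        obj = np.sum(S * D2) + gamma * np.sum(S * np.log(S + 1e-12))
        mu = (X @ S) / (S.sum(0) + 1e-12)                # refresh centers with new S
        R = X[:, :, None] - mu[:, None, :]               # (d, n, c) residuals
        A = np.einsum('nc,dnc,enc->de', S, R, R)         # weighted scatter matrix
        _, vecs = np.linalg.eigh(A)
        W = vecs[:, :l]                                  # l smallest eigenvalues (Lemma 1)
        if abs(prev - obj) < tol * max(1.0, abs(prev)):  # stop on objective stall
            break
        prev = obj
    return W, W.T @ mu, S
```

After convergence, K-means needs to be run only once on `W.T @ X`, or each sample can simply be assigned to the center with the largest weight in `S`.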
Let $n$ denote the number of samples, $d$ the original dimensionality, $c$ the number of clusters, and $l$ the reduced dimensionality. In each MEDR iteration, computing the centers and the $n\times c$ distances costs $O(ndc)$, forming the weighted scatter matrix costs $O(ncd^2)$, and the eigendecomposition costs $O(d^3)$; since $c$ and $d$ are far smaller than $n$, the time complexity of MEDR is linear in the number of samples.
In terms of space, PCA needs $O(d^2)$ memory for the covariance matrix, while graph-based algorithms such as LPP and GoLPP must store an $n\times n$ adjacency matrix, i.e., $O(n^2)$ memory. MEDR stores only the data, the $n\times c$ weight matrix, and a $d\times d$ scatter matrix, so its space complexity is also linear in the number of samples.
Let $J(W,M,S)$ denote the objective value of Eq. (8), and let the superscript $t$ index the iterations. In Step 1, with $S^{t}$ fixed, $W$ and $M$ are updated optimally, so
$$J\bigl(W^{t+1},M^{t+1},S^{t}\bigr)\le J\bigl(W^{t},M^{t},S^{t}\bigr) \tag{22}$$
In Step 2, with $W^{t+1}$ and $M^{t+1}$ fixed, $S$ is updated by the closed-form minimizer in Eq. (21), so
$$J\bigl(W^{t+1},M^{t+1},S^{t+1}\bigr)\le J\bigl(W^{t+1},M^{t+1},S^{t}\bigr) \tag{23}$$
Combining Eqs. (22) and (23) yields
$$J\bigl(W^{t+1},M^{t+1},S^{t+1}\bigr)\le J\bigl(W^{t},M^{t},S^{t}\bigr) \tag{24}$$
Because in Eq. (8) the distance term is nonnegative and the entropy term is bounded below ($\sum_j s_{ij}\ln s_{ij}\ge-\ln c$ for each row), the objective is bounded from below; together with the monotone decrease in Eq. (24), the objective sequence converges.
The experimental environment is a Win7 operating system, a 2.7 GHz AMD A12-9800B R7 CPU, and Matlab 2012a. The implementations of PCA, LPP, ISOMAP, LLE, and NPE all come from the Matlab drtoolbox; AutoEncoder uses the implementation from the MathWorks website; GoLPP [
We select 10 benchmark data sets: Australian, Cars, Cleve, Diabetes, German, Glass, UPS, ORL, Yale, and Palm (see Table 1).
Table 1. Details of the benchmark data sets

Data set | #Samples | #Dims | #Clusters | Data set | #Samples | #Dims | #Clusters
Australian | 690 | 14 | 2 | Glass | 214 | 9 | 6
Cars | 392 | 8 | 3 | Yale | 165 | 1024 | 15
Cleve | 303 | 13 | 4 | ORL | 400 | 1024 | 40
Diabetes | 768 | 8 | 2 | UPS | 1854 | 256 | 10
German | 1000 | 20 | 2 | Palm | 2000 | 256 | 100
All 10 data sets are classic benchmarks for dimension reduction and clustering research. Data sets with more than 100 dimensions are first reduced to 100 dimensions with PCA. The compared dimension reduction methods are PCA, LPP, ISOMAP, AutoEncoder, LLE, NPE, GoLPP, and AgLPP. K-means is applied to the reduced data for clustering. We adopt three commonly used evaluation metrics: accuracy, normalized mutual information (NMI), and F-score. All three lie in $[0,1]$ (reported below as percentages), and larger values indicate better clustering.
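For reference, the metrics can be computed as in the following sketch (assuming scikit-learn and SciPy; clustering accuracy uses Hungarian matching between cluster labels and class labels, and the pairwise variant of F-score shown here is one common choice; whether the paper uses exactly this variant is an assumption):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score  # NMI

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one cluster/class matching (Hungarian)."""
    k = int(max(y_true.max(), y_pred.max())) + 1
    w = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1
    rows, cols = linear_sum_assignment(-w)       # maximize matched samples
    return w[rows, cols].sum() / len(y_true)

def pairwise_f_score(y_true, y_pred):
    """Pairwise F-measure: precision/recall over all sample pairs."""
    C = np.zeros((int(y_true.max()) + 1, int(y_pred.max()) + 1))
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    def pairs(v):
        return float(np.sum(v * (v - 1) / 2.0))
    tp = pairs(C)                                # pairs together in both
    prec = tp / pairs(C.sum(axis=0))             # pairs grouped together
    rec = tp / pairs(C.sum(axis=1))              # pairs belonging together
    return 2 * prec * rec / (prec + rec)
```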
In the experiments, the parameters of each algorithm are set as follows: for all algorithms, the reduced dimensionality and any algorithm-specific parameters are selected from a candidate range and the best configuration is reported; the numbers in parentheses in Tables 2 and 3 give the reduced dimensionality at which each result was obtained.
For PCA, LPP, ISOMAP, AutoEncoder, LLE, NPE, GoLPP, and AgLPP, K-means is run 50 times and the best average clustering performance is reported. Since MEDR learns the cluster centers, K-means needs to be run only once. The experimental results are shown in Tables 2 and 3.
Table 2. Clustering performance (%) of each algorithm on Australian, Cars, Cleve, Diabetes, and German

Metric | Algorithm | Australian | Cars | Cleve | Diabetes | German
Accuracy | PCA | 56.20±0.01(9) | 65.05±0.02(1) | 67.26±0.12(1) | 66.09±0.08(1) | 70.00±0.00(1) |
LPP | 69.68±0.30(13) | 68.29±0.11(5) | 68.09±0.15(10) | 72.4±0.52(1) | 70.05±0.00(16) | |
AutoEncoder | 84.49±0.26(2) | 70.23±0.11(6) | 65.1±0.41(1) | 70.00±0.00(1) | ||
ISOMAP | 56.20±0.21(6) | 67.4±0.02(3) | 60.4±0.36(1) | 65.23±0.31(1) | 70.00±000 (1) | |
LLE | 69.23±0.37(1) | 69.16±0.05(2) | 64.39±0.13(4) | 67.04±0.04(6) | 70.27±0.00(7) | |
NPE | 72.20±1.02(12) | 66.91±0.03(6) | 68.42±0.21(9) | 70.00±0.00(1) | ||
GoLPP | 60.17±0.24(2) | 67.91±0.09(7) | 62.74±0.19(3) | 71.74±0.22(7) | 70.00±0.00(1) | |
AgLPP | 68.55±0.32(1) | 69.39±0.16(5) | 65.35±0.15(5) | 72.92±0.14(1) | 70.12±0.00(8) | |
MEDR | 69.93(6) | 73.03(2) | ||||
NMI | PCA | 3.35±0.06(9) | 21.00±0.12(6) | 15.35±0.03(2) | 3.01±0.02(1) | 1.24±0.00(12) |
LPP | 17.31±0.88(13) | 27.46±0.35(5) | 17.38±0.28(13) | 11.30±0.03(1) | 3.10±0.00(20) | |
AutoEncoder | 40.50±0.29(2) | 26.52±0.10(7) | 13.36±0.01(12) | 1.09±0.05(1) | 2.03±0.00(8) | |
ISOMAP | 3.35±0.00(6) | 21.93±0.01(3) | 6.44±0.02(1) | 3.00±0.02(3) | 1.28±0.00(1) | |
LLE | 10.56±0.00(1) | 26.14±0.14(3) | 9.04±0.11(8) | 3.85±0.13(6) | 2.04±0.00(1) | |
NPE | 22.17±0.33(12) | 27.31±0.63(6) | 19.47±0.02(11) | 3.67±0.05(13) | ||
GoLPP | 6.73±0.10(2) | 15.39±0.03(4) | 10.97±0.03(3) | 12.04±0.08(7) | 2.99±0.00(4) | |
AgLPP | 10.88±0.02(5) | 24.36±0.12(8) | 9.74±0.03(5) | 12.20±0.06(6) | 2.21±0.00(5) | |
MEDR | 12.13(4) | |||||
F-score | PCA | 66.94±0.14(6) | 51.14±0.23(6) | 64.02±0.09(2) | 64.36±0.32(1) | 67.05±0.00(9) |
LPP | 70.68±0.27(13) | 63.20±0.63(7) | 61.06±0.48(13) | 72.12±0.10(1) | 67.09±0.00(1) | |
AutoEncoder | 84.53±0.02(2) | 62.22±0.13(8) | 60.42±0.23(6) | 65.96±0.17(17) | ||
ISOMAP | 66.94±0.23(1) | 59.78±0.35(1) | 61.62±0.71(1) | 69.39±0.31(1) | ||
LLE | 68.47±0.00(1) | 63.58±0.46(7) | 62.22±0.16(1) | 69.31±0.13(8) | 71.29±0.00(16) | |
NPE | 72.83±0.72(12) | 62.93±0.25(6) | 61.50±0.14(9) | 67.15±0.00(2) | ||
GoLPP | 66.99±0.35(2) | 55.22±0.31(5) | 56.06±0.11(11) | 71.89±0.52(7) | 68.51±0.00(17) | |
AgLPP | 68.51±0.57(1) | 64.69±0.83(1) | 63.49±0.72(2) | 72.74±0.62(1) | 68.41±0.00(7) | |
MEDR | 67.75(4) | 72.67(2) | 69.88(2) |
Table 3. Clustering performance (%) of each algorithm on Glass, UPS, ORL, Yale, and Palm

Metric | Algorithm | Glass | UPS | ORL | Yale | Palm
Accuracy | PCA | 58.79±0.30(1) | 72.80±0.07(89) | 59.83±0.04(47) | 42.79±0.04(8) | 77.22±0.02(73) |
LPP | 60.42±0.20(3) | 76.83±0.09(8) | 71.90±0.05(24) | 48.30±0.01(15) | 86.41±0.08(29) | |
AutoEncoder | 62.38±0.10(1) | 69.67±0.04(68) | 58.87±0.04(73) | 41.21±0.25(23) | 72.51±0.04(92) | |
ISOMAP | 57.71±0.24(2) | 61.43±0.01(18) | 44.67±0.08(57) | 5.95±0.02(1) | ||
LLE | 55.84±0.11(6) | 75.96±0.16(4) | 63.73±0.04(26) | 50.06±0.15(17) | 55.33±0.02(62) | |
NPE | 59.11±0.26(3) | 74.75±0.08(11) | 71.08±0.02(17) | 49.39±0.04(13) | 85.14±0.04(30) | |
GoLPP | 50.23±0.18(5) | 47.81±0.05(100) | 46.10±0.01(100) | 23.45±0.01(100) | 79.66±0.01(100) | |
AgLPP | 56.50±0.32(6) | 75.12±0.09(56) | 50.83±0.03(28) | 43.33±0.15(68) | 73.03±0.12(57) | |
MEDR | 76.70(18) | |||||
NMI | PCA | 42.01±0.17(9) | 62.01±0.01(23) | 76.27±0.01(47) | 48.89±0.03(33) | 91.31±0.70(90) |
LPP | 40.10±0.15(3) | 68.97±0.03(6) | 83.88±0.01(27) | 53.09±0.02(15) | 96.07±0.62(40) | |
AutoEncoder | 39.33±0.10(6) | 60.50±0.03(68) | 75.67±0.03(73) | 45.89±0.06(23) | 88.58±0.61(92) | |
ISOMAP | 42.68±0.16(2) | 76.82±0.02(20) | 50.37±0.01(17) | 19.65±0.31(1) | ||
LLE | 44.47±0.05(2) | 73.63±0.01(4) | 79.36±0.01(26) | 53.11±0.07(20) | 76.81±0.25(62) | |
NPE | 40.77±0.13(4) | 66.29±0.01(11) | 83.63±0.01(17) | 52.87±0.01(13) | 95.48±0.26(30) | |
GoLPP | 23.67±0.06(8) | 41.58±0.05(100) | 61.46±0.01(100) | 24.40±0.02(1) | 93.07±0.29(100) | |
AgLPP | 35.64±0.02(4) | 66.50±0.21(56) | 69.43±0.22(28) | 47.66±0.15(66) | 86.52±0.31(57) | |
MEDR | 68.57(18) | |||||
F-score | PCA | 57.20±0.01(2) | 69.71±0.10(23) | 59.07±0.09(53) | 46.27±0.06(8) | 75.88±0.02(89) |
LPP | 56.53±0.07(3) | 74.37±0.04(11) | 69.98±0.04(26) | 52.73±0.01(15) | 85.90±0.08(35) | |
AutoEncoder | 57.85±0.12(1) | 67.15±0.13(68) | 57.56±0.06(73) | 44.03±0.13(23) | 71.45±0.05(92) | |
ISOMAP | 58.00±0.04(2) | 60.84±0.03(18) | 48.15±0.06(17) | 2.45±0.03(1) | ||
LLE | 56.69±0.01(2) | 77.40±0.05(4) | 62.49±0.06(26) | 53.67±0.01(62) | ||
NPE | 57.42±0.01(3) | 71.78±0.05(11) | 69.26±0.02(17) | 52.90±0.01(13) | 84.54±0.03(30) | |
GoLPP | 48.55±0.17(7) | 48.25±0.01(100) | 45.28±0.00(100) | 23.28±0.01(100) | 78.94±0.31(100) | |
AgLPP | 54.67±0.23(6) | 72.48±0.32(56) | 50.04±0.46(28) | 45.88±0.33(21) | 71.63±0.18(57) | |
MEDR | 73.75(18) | 52.24 (19) |
Although MEDR does not achieve the best accuracy on the Cleve data set, it achieves the best NMI and F-score there. On the above five data sets, MEDR performs best overall, AutoEncoder second, and PCA worst. On the German data set, the fifth feature (credit amount) has a much larger range and variance than the other features, so this feature dominates the clustering while the others are effectively ignored; as a result, all algorithms perform similarly.
The results of the compared algorithms on the Glass, ORL, Yale, and Palm data sets are given in Table 3.
Data visualization is an important application of dimension reduction. We select the Wine data set from the UCI repository (…) and use each algorithm to project it to two dimensions; the visualization results are shown in the figure below.
The two-dimensional visualization results of each algorithm on the Wine data set
MEDR has time complexity linear in the number of samples. Among the compared methods, PCA, LPP, ISOMAP, AgLPP, LLE, and NPE are non-iterative algorithms, while GoLPP, AutoEncoder, and MEDR are iterative.
Table 4. Comparison of the running time (ms) of each algorithm on the benchmark data sets

Algorithm | Australian | Cars | Cleve | Diabetes | German | Glass | UPS | ORL | Yale | Palm
PCA | 28.13 | - | - | - | 7.81 | - | 15.63 | 10.94 | 6.25 | 17.19 |
LPP | 125.95 | 43.79 | 39.71 | 124.74 | 178.73 | 24.00 | 764.28 | 116.97 | 28.82 | 738.32 |
ISOMAP | 6050.35 | 2017.36 | 1116.32 | 8142.36 | 13652.78 | 564.24 | 80557.29 | 1947.92 | 388.89 | 58923.61 |
AutoEncoder | 757.81 | 295.31 | 307.81 | 214.06 | 454.69 | 262.50 | 406.25 | 345.31 | 265.63 | 534.38 |
LLE | 302.60 | 178.30 | 100.69 | 443.23 | 507.64 | 91.49 | 1964.06 | 128.99 | 56.94 | 1793.92 |
NPE | 137.85 | 102.26 | 62.50 | 240.28 | 208.85 | 59.55 | 1128.65 | 70.31 | 34.20 | 891.32 |
GoLPP | 98.05 | 39.45 | 25.00 | 113.28 | 220.70 | 15.23 | 960.55 | 64.84 | 23.05 | 913.67 |
AgLPP | 44.28 | 12.13 | 11.29 | 23.81 | 33.90 | 9.55 | 137.58 | 31.21 | 23.71 | 133.88 |
MEDR | 30.51 | 3.60 | 2.00 | 9.60 | 16.40 | 1.10 | 34.00 | 22.20 | 16.86 | 95.22 |
In practice, the number of iterations is usually far smaller than the data size, so in the running time experiment GoLPP, AutoEncoder, and MEDR report the average time of a single iteration. The experiment is conducted on the 10 benchmark data sets (Australian, Cars, Cleve, Diabetes, German, Glass, UPS, ORL, Yale, and Palm); all data are reduced to 2 dimensions, and each algorithm runs with one parameter configuration drawn from its parameter range.
Because the Cars, Cleve, Diabetes, and Glass data sets are small and PCA runs almost instantaneously on them, the corresponding entries are marked "-". From Table 4, apart from PCA, MEDR is the fastest algorithm on every data set, and it is markedly faster than the graph-based methods LPP, GoLPP, and AgLPP.
MEDR involves three parameters: the dimensionality of the reduced data, the regularization parameter $\gamma$, and the sparsity constraint on the weight matrix. We examine the influence of each on clustering performance.
(1) The reduced dimensionality. The F-score, NMI, and accuracy obtained by MEDR under different reduced dimensionalities are shown in the figure below.
F-score, NMI, and accuracy as functions of the reduced dimensionality
From the figure, it can be seen that …
(2) The regularization parameter $\gamma$. The F-score, NMI, and accuracy obtained by MEDR under different values of $\gamma$ are shown in the figure below.
F-score, NMI, and accuracy as functions of the regularization parameter $\gamma$
From the figure, it can be seen that …
(3) The sparsity constraint. The F-score, NMI, and accuracy obtained by MEDR under different sparsity levels are shown in the figure below.
F-score, NMI, and accuracy as functions of the sparsity constraint
The MEDR algorithm is run for 100 iterations on each of the 10 benchmark data sets, with the parameters …
Convergence curves of the MEDR algorithm on the 10 benchmark data sets
The curves show that the objective value of MEDR decreases monotonically and converges on all 10 data sets, which is consistent with the convergence analysis above.
Commonly used graph-based dimension reduction algorithms must construct or learn an adjacency graph in advance, which is computationally expensive, and they do not take the downstream use of the reduced data into account. To address these problems, we proposed MEDR, a fast linear dimension reduction algorithm based on maximum entropy. MEDR fuses the linear dimension reduction process with a clustering model so that the two supervise each other; because the maximum entropy clustering model is embedded into the reduction, the reduced data exhibit a clear cluster structure. For the weight matrix …
Hotelling H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 1933, 24(7): 498–520. [doi: 10.1037/h0070888]
Feng DC, Chen F, Xu WL. Learning robust principal components from L1-norm maximization. Journal of Zhejiang University SCIENCE C, 2012, 13(12): 901–908. [doi: 10.1631/jzus.C1200180]
Ji HJ, Huang S. Kernel entropy component analysis with nongreedy L1-norm maximization. Computational Intelligence and Neuroscience, 2018, 2018: 6791683. [doi: 10.1155/2018/6791683]
Kokiopoulou E, Saad Y. Orthogonal neighborhood preserving projections: A projection-based dimensionality reduction technique. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(12): 2143–2156. [doi: 10.1109/TPAMI.2007.1131]
Zhang LM, Qiao LS, Chen SC. Graph-optimized locality preserving projections. Pattern Recognition, 2010, 43(6): 1993–2002. [doi: 10.1016/j.patcog.2009.12.022]
Yi YG, Wang JZ, Zhou W, Fang YM, Kong J, Lu YH. Joint graph optimization and projection learning for dimensionality reduction. Pattern Recognition, 2019, 92: 258–273. [doi: 10.1016/j.patcog.2019.03.024]
Gou JP, Yang YY, Yi Z, Lv JC, Mao QR, Zhan Y. Discriminative globality and locality preserving graph embedding for dimensionality reduction. Expert Systems with Applications, 2019, 144: 113079. [doi: 10.1016/j.eswa.2019.113079]
Jiang R, Fu WJ, Wen L, Hao SJ, Hong RC. Dimensionality reduction on Anchorgraph with an efficient Locality Preserving Projection. Neurocomputing, 2016, 187: 109–118. [doi: 10.1016/j.neucom.2015.07.128]
Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504–507. [doi: 10.1126/science.1127647]
Saul LK, Roweis ST. Fit locally: Unsupervised learning of nonlinear manifolds. Journal of Machine Learning Research, 2003, 4: 119–155. [doi: 10.1162/153244304322972667]
Roweis ST, Saul LK. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000, 290(5500): 2323–2326. [doi: 10.1126/science.290.5500.2323]
Tenenbaum JB, De Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science, 2000, 290(5500): 2319–2323. [doi: 10.1126/science.290.5500.2319]
Zheng ZL, Huang XQ, Jia J, Yang J. Locality preserving projection with sparse penalty. Chinese Journal of Computers, 2014, 37(9): 2038–2046. (in Chinese with English abstract). [doi: 10.3724/SP.J.1016.2014.02038]
Belkin M, Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 2003, 15(6): 1373–1396. [doi: 10.1162/089976603321780317]
Ye JP, Xiong T. Computational and theoretical analysis of null space and orthogonal linear discriminant analysis. Journal of Machine Learning Research, 2006, 7(43): 1183–1204. [doi: 10.5555/1248547.1248590]
Zheng WS, Lai JH, Yuen PC. GA-Fisher: A new LDA-based face recognition algorithm with selection of principal components. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2005, 35(5): 1065–1078. [doi: 10.1109/TSMCB.2005.850175]
Bezdek JC, Coray C, Gunderson R, Watson J. Detection and characterization of cluster substructure I. Linear structure: Fuzzy c-lines. SIAM Journal on Applied Mathematics, 1981, 40(2): 339–357.
Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY. An efficient K-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 881–892. [doi: 10.1109/TPAMI.2002.1017616]
Frahling G, Sohler C. A fast K-means implementation using coresets. International Journal of Computational Geometry & Applications, 2008, 18(6): 605–625. [doi: 10.1142/S0218195908002787]
Xia SY, Peng DW, Meng DY, Zhang CQ, Wang GY, Giem E, Wei W, Chen ZZ. Ball k-means: Fast adaptive clustering with no bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(1): 87–99.
Giffon L, Emiya V, Kadri H, Ralaivola L. QuicK-means: Accelerating inference for K-means by learning fast transforms. Machine Learning, 2021, 110(5): 881–905. [doi: 10.1007/s10994-021-05965-0]
Hicks SC, Liu RX, Ni YW, Purdom E, Risso D. Mbkmeans: Fast clustering for single cell data using mini-batch k-means. PLoS Computational Biology, 2021, 17(1): e1008625.
Hu SZ, Lou ZZ, Wang RB, Yan XQ, Ye YD. Dual-weighted multi-view clustering. Chinese Journal of Computers, 2020, 43(9): 1708–1720 (in Chinese with English abstract). [doi: 10.11897/SP.J.1016.2020.01708]
Huang ZX. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 1998, 2(3): 283–304. [doi: 10.1023/A:1009769707641]
Huang ZX, Ng MK. A fuzzy k-modes algorithm for clustering categorical data. IEEE Transactions on Fuzzy Systems, 1999, 7(4): 446–452. [doi: 10.1109/91.784206]
Huang JZ, Ng MK, Rong HQ, Li ZC. Automated variable weighting in k-means type clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(5): 657–668. [doi: 10.1109/TPAMI.2005.95]
Li MJ, Ng MK, Cheung YM, Huang JZ. Agglomerative fuzzy k-means clustering algorithm with selection of number of clusters. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(11): 1519–1534. [doi: 10.1109/TKDE.2008.88]
Shannon CE. A mathematical theory of communication. The Bell System Technical Journal, 1948, 27(3): 379–423. [doi: 10.1002/j.1538-7305.1948.tb01338.x]
Jaynes ET. Information theory and statistical mechanics. II. Physical Review, 1957, 108: 171–190. [doi: 10.1103/PhysRev.108.171]
Zhang ZH, Zheng NN, Shi G. Maximum-entropy clustering algorithm and its global convergence analysis. Science in China Series E: Technological Sciences, 2001, 44(1): 89–101 (in Chinese with English abstract). [doi: 10.1007/BF02916729]
Ren SJ, Wang YD. A proof of the convergence theorem of maximum-entropy clustering algorithm. Science China Information Sciences, 2010, 53(6): 1151–1158 (in Chinese with English abstract). [doi: 10.1007/s11432-010-3094-x]
Yamamoto M, Hwang H. A general formulation of cluster analysis with dimension reduction and subspace separation. Behaviormetrika, 2014, 41(1): 115–129. [doi: 10.2333/bhmk.41.115]
Yamamoto M, Hwang H. Dimension-reduced clustering of functional data via subspace separation. Journal of Classification, 2017, 34(2): 294–326. [doi: 10.1007/s00357-017-9232-z]
Zhou J, Pedrycz W, Yue XD, Gao C, Lai ZH, Wan J. Projected fuzzy C-means clustering with locality preservation. Pattern Recognition, 2020, 113: 107748. [doi: 10.1016/j.patcog.2020.107748]
van de Velden M, D'Enza AI, Yamamoto M. Special feature: Dimension reduction and cluster analysis. Behaviormetrika, 2019, 46(2): 239–241. [doi: 10.1007/s41237-019-00092-6]
Wang R, Nie FP, Wang Z, Hu HJ, Li XL. Parameter-free weighted multi-view projected clustering with structured graph learning. IEEE Transactions on Knowledge and Data Engineering, 2020, 32(10): 2014–2025. [doi: 10.1109/TKDE.2019.2913377]
Ran RS, Ren YS, Zhang SG, Fang B. A novel discriminant locality preserving projections method. Journal of Mathematical Imaging and Vision, 2021, 63(5): 541–554. [doi: 10.1007/s10851-020-01008-w]
Bezdek JC, Hathaway RJ. Convergence of alternating optimization. Neural, Parallel & Scientific Computations, 2003, 11(4): 351–368. [doi: 10.5555/964885.964886]
Huang XH, Ye YM, Guo HF, Cai Y, Zhang HJ, Li Y. DSKmeans: A new kmeans-type approach to discriminative subspace clustering. Knowledge-Based Systems, 2014, 70: 293–300. [doi: 10.1016/j.knosys.2014.07.009]