[关键词]
[摘要]
在实际应用中,聚类多视图数据是一项重要的数据挖掘任务.样本缺失所导致的多视图不完整给聚类任务带来了巨大的挑战.大部分已有的不完整多视图聚类方法主要基于浅层图结构信息,易受到噪声及缺失数据的影响,且难以准确刻画并兼容所有视图的潜在结构,从而降低了聚类性能.为此,提出了一种更为鲁棒和灵活的基于多阶近邻扩散融合的不完整多视图聚类算法.该算法在利用多阶相似性学习不完整视图潜在结构的基础上,通过跨视图交叉扩散的方式,将不同阶的深层结构信息进行非线性融合,以此挖掘视图间更全面的结构信息,从而降低了缺失样本所导致的视图结构不确定性.进一步证明了所提算法的收敛性.实验结果表明,相比已有方法,所提出的算法在处理不完整多视图聚类问题上是更加有效的.
[Keywords]
[Abstract]
In real-world applications, clustering multi-view data is an important data mining task. Incomplete views caused by missing samples pose a great challenge to multi-view clustering. Most existing incomplete multi-view clustering methods rely mainly on shallow graph structure information, which is easily affected by noise and missing data, and they have difficulty accurately describing and accommodating the underlying structure of all views, which degrades clustering performance. To this end, this study proposes a more robust and flexible incomplete multi-view clustering algorithm based on diffusion and fusion among multi-order neighborhoods. First, the proposed algorithm learns the latent structure of incomplete views by using multi-order similarities. Then, the deep structural information of different orders is nonlinearly fused through cross-view diffusion. In this way, more comprehensive structural information among views is extracted, which reduces the uncertainty of view structure caused by missing samples. In addition, the convergence of the proposed algorithm is proved. Experimental results show that the proposed method is more effective for incomplete multi-view clustering than existing methods.
[CLC number]
[Funding]
National Key Research and Development Program of China (2020AAA0106100); National Natural Science Foundation of China (62022052, 62072293)