[Keywords]
[Abstract]
Graph neural networks (GNNs) are a deep-learning framework for learning representations directly from graph-structured data, and they have attracted increasing attention in recent years. However, traditional GNNs based on message-passing aggregation (MP-GNNs) ignore the fact that different nodes smooth at different rates and aggregate neighbor information indiscriminately, which makes them prone to over-smoothing. To address this, this study proposes KENN, a graph kernel neural network classification method based on linear structural entropy. KENN first uses a graph kernel method to encode the structure of each node's subgraph and to assess isomorphism between subgraphs, and then uses the resulting isomorphism coefficients to define the smoothing coefficients between different neighbors. Second, it extracts structural information from the graph using low-complexity linear structural entropy, which deepens and enriches the structural expressiveness of the graph data. By deeply integrating linear structural entropy, graph kernels, and GNNs, the method addresses both the sparsity of node features in biomolecular data and the information redundancy that arises when node degree is used as the feature in social network data. It also enables the GNN to adaptively adjust its capacity to represent graph structural features, allowing it to exceed the upper bound of MP-GNNs (the Weisfeiler-Lehman (WL) test). Finally, experiments on seven public graph classification datasets show that the proposed model outperforms the other baseline models.
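The idea of weighting neighbors by subgraph similarity can be illustrated with a minimal sketch. The abstract does not say which graph kernel KENN uses or how the isomorphism coefficient is computed, so the code below makes its own assumptions: it compares 1-hop ego subgraphs with a normalized Weisfeiler-Lehman subtree kernel and rescales the similarities so that each node's neighbor weights sum to 1. All function names (`ego_subgraph`, `wl_histogram`, `similarity`, `smoothing_weights`) are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def ego_subgraph(adj, center):
    """1-hop ego subgraph of `center`, as an adjacency dict restricted to it."""
    nodes = {center} | set(adj[center])
    return {u: [v for v in adj[u] if v in nodes] for u in nodes}


def wl_histogram(adj, rounds=2):
    """Histogram of WL colors collected over `rounds` refinement steps."""
    colors = {u: 0 for u in adj}  # assumption: no node features, uniform start
    hist = Counter(colors.values())
    for _ in range(rounds):
        # New color = (own color, sorted multiset of neighbor colors).
        colors = {
            u: (colors[u], tuple(sorted(colors[v] for v in adj[u])))
            for u in adj
        }
        hist.update(colors.values())
    return hist


def similarity(g1, g2, rounds=2):
    """Normalized WL subtree kernel in (0, 1]; 1.0 for WL-identical graphs."""
    h1, h2 = wl_histogram(g1, rounds), wl_histogram(g2, rounds)

    def dot(a, b):
        return sum(a[c] * b[c] for c in a if c in b)

    return dot(h1, h2) / sqrt(dot(h1, h1) * dot(h2, h2))


def smoothing_weights(adj, u, rounds=2):
    """Assumed stand-in for smoothing coefficients: kernel scores, rescaled."""
    ego_u = ego_subgraph(adj, u)
    raw = {v: similarity(ego_u, ego_subgraph(adj, v), rounds) for v in adj[u]}
    total = sum(raw.values())
    return {v: s / total for v, s in raw.items()}


# Toy graph: node 0 has leaves 1 and 2, plus neighbor 3 that also links to 4.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
weights = smoothing_weights(adj, 0)
```

In this toy graph, nodes 1 and 2 have identical ego subgraphs and therefore receive equal weight, while node 3, whose ego subgraph is closer in shape to that of node 0, receives a larger one. Identical WL histograms are only a necessary condition for isomorphism, so this sketch does not reproduce the beyond-WL expressiveness claimed for KENN.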
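[CLC number]
[Funding]
National Natural Science Foundation of China (62176085, 62172458); Regional (Anhui) Joint Fund of the National Natural Science Foundation of China (U20A20229)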