Abstract:Heterogeneous information networks (HINs) are directed graphs including multi-typed objects (vertices) and links (edges), which can express rich semantic information and complex structural information. The problem of cohesive subgraph query in HINs, i.e., given a query vertex q, it could be found that the cohesive subgraphs containing q in HINs has become an important problem, and has been widely used in various areas such as event planning, biological analysis, and product recommendation. Yet existing methods mainly have the two deficiencies:(1) cohesive subgraphs based on relational constraint and motif cliques contain multiple types of vertices, which makes it hard to solve the scenario of focusing on a specific type of vertices; (2) although the method based on meta-path can query the cohesive subgraphs with a specific type of vertices, it ignores the meta-path-based connectivity between the vertices in the subgraphs. Therefore, the connectivity with novel edge-disjoint paths is firstly proposed based on meta-path in HINs, i.e., path-connectivity. Then, the k-path connected component (k-PCC) is raised based on path-connectivity, which requires the path-connectivity of subgraph to be at least k. Next, the Steiner maximum path-connected component (SMPCC) is further proposed, which is the k-PCC containing q with the maximum path-connectivity. Finally, an efficient graph decomposition-based k-PCC discovery algorithm is designed, and based on this, an optimized SMPCC query algorithm is proposed. A large number of experiments on five real and synthetic HINs prove the effectiveness and efficiency of the proposed approaches.