Abstract: Graph anonymization is a proven and extensively studied solution to privacy preservation. Its goal is to prevent privacy disclosure in social networks through graph modifications while preserving the utility of the anonymized graph for social network analysis and graph queries. Reachability is an important measure of graph data utility, because reachability queries are not only common in graph databases but also serve as fundamental operations for many other graph queries. However, anonymization severely distorts the reachability of each vertex when it neglects that reachability is highly sensitive to edge modifications. This work addresses the problem by designing a reachability preserving anonymization (RPA) algorithm. The main idea of RPA is to organize vertices into groups and greedily anonymize each vertex with low impact on reachability. Several techniques are designed to make RPA efficient. First, the reachable interval is proposed to efficiently measure the anonymization cost incurred by an edge addition. Second, an index structure, the CN-index, is adopted to accelerate the anonymization of each vertex. Extensive experiments on real datasets demonstrate that RPA is highly efficient and that the generated anonymized social networks preserve high data utility for reachability queries.
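
To illustrate why reachability is sensitive to edge modifications, the following minimal sketch counts how many new reachable vertex pairs a single edge addition creates. It is a brute-force baseline under stated assumptions, not the reachable interval or CN-index described above: the graph is assumed to be directed, and the names `reachable_from`, `count_reachable_pairs`, and `edge_addition_cost` are hypothetical helpers introduced only for this example.

```python
from collections import deque


def reachable_from(adj, source):
    """Return the set of vertices reachable from `source` (excluding itself) via BFS."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    seen.discard(source)
    return seen


def count_reachable_pairs(adj, vertices):
    """Count ordered pairs (s, t), s != t, such that t is reachable from s."""
    return sum(len(reachable_from(adj, s)) for s in vertices)


def edge_addition_cost(adj, vertices, u, v):
    """Assumed cost model: number of reachable pairs created by adding edge u -> v.

    This recomputes reachability from scratch, which is exactly the expense
    that a dedicated cost measure and index would aim to avoid.
    """
    before = count_reachable_pairs(adj, vertices)
    augmented = {k: set(nbrs) for k, nbrs in adj.items()}  # copy, leave input intact
    augmented.setdefault(u, set()).add(v)
    after = count_reachable_pairs(augmented, vertices)
    return after - before


# Toy directed graph: a -> b and c -> d (two disconnected chains).
vertices = ["a", "b", "c", "d"]
adj = {"a": {"b"}, "c": {"d"}}

cost_bridge = edge_addition_cost(adj, vertices, "b", "c")    # joins the two chains
cost_shortcut = edge_addition_cost(adj, vertices, "a", "d")  # adds one new pair only
```

In this toy graph, both candidate edges are a single modification, yet the bridging edge `b -> c` creates four new reachable pairs while `a -> d` creates one. A greedy anonymizer that prefers low-impact edges would pick the latter when either edge satisfies the anonymity requirement.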