Abstract:Given a distributed set D of categorical data defined on a domain D, this work studies differentially private algorithms for releasing a histogram to approximate the categorical data distribution in D. Existing solutions for this problem mostly use central/local differential privacy models, which are two extreme assumptions of differential privacy. The two models, however, cannot balance the contradiction between the privacy requirement of users and the analysis accuracy of collectors. To remedy the deficiency caused by the current solutions under central/local differential privacy, this study proposes a differentially private method in a shuffling way, called HP-SDP, to release histogram. HP-SDP firstly employs the local hash technology to design the shuffled randomized response mechanism. Based on this mechanism, each user perturbs her/his data in a linear decomposition way of perturbation function, without worrying about the domain size, and reports the perturbed messages to the shuffler. And then, the shuffler in HP-SDP permutes the reported messages by using a uniformly random permutation method, which makes sure the shuffled messages satisfy central differential privacy, and the collector cannot reidentify a target user. Furthermore, HP-SDP adopts the convex programming technology to boost the accuracy of the released histogram. Theoretical analysis and experimental evaluations show that the proposed methods can effectively improve the utility of the histogram, and outperform the existing solutions.