Abstract: With the arrival of the big data era, there is an ever-growing demand for frequency estimation over sensitive multi-dimensional categorical data. Existing work focuses on designing privacy-preserving algorithms under either centralized differential privacy or local differential privacy. However, these models provide either weak privacy protection or low accuracy in the published results, respectively. Therefore, building on the emerging shuffle model of differential privacy, which remedies the shortcomings of both models, this study designs data collection mechanisms that provide frequency distribution estimation with both strong privacy and high utility. Considering the multi-dimensional nature of the data and the heterogeneity across attributes, two mechanisms are first proposed: SRR-MS, which uses multiple shufflers, and ARR-SS, which uses a single shuffler. Then, to combine the advantages of both, PSRR-SS is proposed, a single-shuffler mechanism that eliminates the heterogeneity among attributes by padding the attribute domains with dummy values. This study theoretically analyzes the privacy guarantees and estimation error of all three mechanisms, and evaluates their frequency estimation performance on four real datasets. In addition, the proposed mechanisms are used as the perturbation component of a synthetic data generation technique, and stochastic gradient descent models trained on the resulting synthetic data are evaluated. Experimental results show that the proposed mechanisms outperform existing algorithms.
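The domain-padding idea mentioned for PSRR-SS can be illustrated with a minimal sketch. It assumes generalized (k-ary) randomized response as the local randomizer, a single shuffler that simply permutes the reports, and the standard unbiased k-RR frequency estimator. The helper names (`pad_domain`, `k_rr`, `shuffle_reports`, `estimate_frequencies`), the dummy-value labels, and the example attributes are hypothetical. The local budget `epsilon` is not converted into a shuffle-amplified guarantee, and this is not the paper's exact PSRR-SS construction.

```python
import math
import random
from collections import Counter


def pad_domain(domain, target_size):
    """Append hypothetical dummy values until the domain has target_size entries.

    Assumes the "__dummy_i__" labels never collide with real attribute values.
    """
    padded = list(domain)
    i = 0
    while len(padded) < target_size:
        padded.append(f"__dummy_{i}__")
        i += 1
    return padded


def krr_probs(epsilon, k):
    """Keep probability p and per-other-value probability q for k-ary randomized response."""
    denom = math.exp(epsilon) + k - 1
    return math.exp(epsilon) / denom, 1.0 / denom


def k_rr(value, domain, epsilon, rng):
    """Report the true value with probability p, else a uniformly random other value."""
    assert value in domain
    p, _ = krr_probs(epsilon, len(domain))
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def shuffle_reports(reports, rng):
    """A single shuffler (assumed): uniformly permute reports, unlinking them from users."""
    shuffled = list(reports)
    rng.shuffle(shuffled)
    return shuffled


def estimate_frequencies(reports, domain, epsilon):
    """Unbiased k-RR estimator: f_v = (count_v / n - q) / (p - q)."""
    p, q = krr_probs(epsilon, len(domain))
    n = len(reports)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


if __name__ == "__main__":
    rng = random.Random(0)
    epsilon = 2.0
    # Illustrative attributes with heterogeneous domain sizes (not the paper's data).
    domains = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3", "b4"]}
    d_max = max(len(d) for d in domains.values())
    n = 10000
    for attr, domain in domains.items():
        padded = pad_domain(domain, d_max)  # every attribute now has d_max values
        true_values = [rng.choice(domain) for _ in range(n)]
        noisy = [k_rr(v, padded, epsilon, rng) for v in true_values]
        reports = shuffle_reports(noisy, rng)
        est = estimate_frequencies(reports, padded, epsilon)
        print(attr, {v: round(f, 3) for v, f in est.items()})
```

After padding, every attribute is perturbed with the same k, and hence the same p and q, which is one way to make reports from different attributes uniform before a single shuffler. The dummy values have true frequency 0, so their estimates should stay near 0. The cost is that p = e^ε / (e^ε + k − 1) shrinks as k grows, so padding to the largest domain increases the estimation error for attributes with small domains; how the paper balances this trade-off is not captured by this sketch.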