Abstract: In recent years, many countries and regions have enacted data security policies, such as the General Data Protection Regulation (GDPR) of the EU. These laws and regulations have aggravated the problem of data silos, making it difficult for data owners to share data with one another. Data federation is a possible solution to this problem. It refers to multiple data owners jointly computing a query, with the help of privacy-preserving computation techniques such as secure multi-party computation, without disclosing their original data. This concept has become a research trend in recent years, and a series of representative systems, such as SMCQL and Conclave, have been proposed. However, for the join query, a fundamental operation in relational database systems, existing data federation systems still have the following problems. First, they support only a single type of join query, so it is difficult to meet query requirements under complex join conditions. Second, their performance leaves considerable room for improvement, because they often call a secure computation library directly, which incurs high running time and communication overhead. This study therefore proposes a data federation join algorithm to address these issues. The main contributions are as follows. First, multiparty-oriented federated secure operators are designed and implemented, which support a variety of operations. Second, a federated θ-join algorithm and an optimization strategy are proposed to significantly reduce the cost of secure computation. Finally, the performance of the proposal is verified on the TPC-H benchmark dataset. The experimental results show that the proposed algorithm reduces the runtime and communication overhead by 61.33% and 95.26%, respectively, compared with the existing data federation systems SMCQL and Conclave.