Abstract: To apply link prediction methods to large-scale complex networks, this paper designs and implements a parallel link prediction algorithm based on MapReduce that covers nine similarity indices based on local information. In sparse networks the parallel algorithm has a time complexity of O(N). The paper first verifies the validity of the algorithm on public datasets: as the extraction factor increases, recall rises and precision falls. Experiments on ten large-scale datasets spanning a variety of network types show that the parallel link prediction algorithm is more effective than traditional (serial) ones, and that its running time decreases as the number of compute units grows. The paper also proposes upper and lower bounds on the AUC (area under the receiver operating characteristic curve). Experimental results show that the median of the upper and lower bounds is close to the true AUC, indicating that AUC depends mainly on whether a prediction score is zero rather than on the actual score value. Among the topological features examined, the network's average clustering coefficient has the greatest impact on AUC for most networks, and AUC rises as the average clustering coefficient increases.
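The abstract does not specify the nine indices or how the MapReduce jobs are laid out. As a rough illustration only, the Python sketch below simulates one common map/reduce pattern for local similarity indices: each mapper emits every unordered pair of a node's neighbours, and the reducer sums those contributions per pair. This yields the common-neighbours (CN) and resource-allocation (RA) scores. The work is proportional to the sum of squared degrees, which stays roughly linear in N when degrees are small. The sketch also computes the standard AUC, where ties count as one half. The function names, the choice of CN and RA, the in-memory "shuffle" and the toy graph are all assumptions, not the paper's implementation. The paper's AUC bounds are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(adjacency):
    """Hypothetical mapper: for each node u, emit every pair of u's neighbours.

    The value is u's contribution to two local indices:
    1 for common neighbours (CN) and 1/k_u for resource allocation (RA).
    """
    for u, neighbors in adjacency.items():
        k_u = len(neighbors)
        # Sorting guarantees each key is an ordered pair (v, w) with v < w.
        for v, w in combinations(sorted(neighbors), 2):
            yield (v, w), (1, 1.0 / k_u)


def reduce_phase(mapped):
    """Hypothetical reducer: the dict stands in for the shuffle; sum per pair."""
    totals = defaultdict(lambda: [0, 0.0])
    for pair, (cn, ra) in mapped:
        totals[pair][0] += cn
        totals[pair][1] += ra
    return {pair: (cn, ra) for pair, (cn, ra) in totals.items()}


def auc(score, missing, nonexistent):
    """Standard AUC: compare every missing link with every nonexistent link."""
    wins = ties = 0
    for m in missing:
        for n in nonexistent:
            if score(m) > score(n):
                wins += 1
            elif score(m) == score(n):
                ties += 1
    return (wins + 0.5 * ties) / (len(missing) * len(nonexistent))


# Toy training graph (assumed for illustration); edge (1, 4) is held out as the probe set.
adjacency = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
scores = reduce_phase(map_phase(adjacency))

# Pairs that never appear in the reducer output have score 0.
cn = lambda pair: scores.get(pair, (0, 0.0))[0]
ra = lambda pair: scores.get(pair, (0, 0.0))[1]

missing = [(1, 4)]
nonexistent = [(1, 5), (2, 5), (3, 5)]
print(auc(cn, missing, nonexistent), auc(ra, missing, nonexistent))
```

On this toy graph the held-out link (1, 4) has two common neighbours, while every nonexistent pair has at most one. Both indices therefore reach an AUC of 1.0. Most node pairs in a sparse network never share a neighbour and are absent from the reducer output, so their score is exactly zero. This is consistent with the abstract's observation that AUC hinges largely on whether a score is zero.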