Sampling-based Collection and Updating of Online Big Graph Data

doi:10.13328/j.cnki.jos.005843

Volume 31, Issue 11, 2020, pp. 3540-3558


Fund Project: National Natural Science Foundation of China (U1802271, 62002311); Science Foundation for Distinguished Young Scholars of Yunnan Province (2019FJ011); Young Talent Support Program of Yunnan Province (C6193032); Donglu Scholars Training Program of Yunnan University


Abstract:

The large volume of unstructured data obtained from Web pages, social media, and knowledge bases on the Internet can be represented as an online big graph (OBG). OBG data acquisition, which comprises data collection and updating, is the basis of massive data analysis and knowledge engineering, and it faces many challenges arising from the large scale, wide distribution, heterogeneity, and rapid change of OBGs. In this study, a method for adaptive, parallel data collection and updating based on sampling techniques is proposed. First, the HD-QMC algorithm is proposed for adaptive collection of OBG data by combining the branch-and-bound method with quasi-Monte Carlo sampling. Next, the EPP algorithm, based on entropy and the Poisson process, is proposed for efficient data updating, so that the collected data reflect how OBGs change in real-world environments. The effectiveness of the proposed algorithms is then analyzed theoretically, and the various kinds of collected OBG data are uniformly represented as triples, providing an easy-to-use data foundation for OBG analysis and related studies. Finally, the proposed collection and updating algorithms are implemented with Spark, and experimental results on simulated and real-world datasets show the effectiveness and efficiency of the proposed method.
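The abstract names only the building blocks of EPP. As a minimal sketch of the Poisson-process idea alone (not the authors' EPP algorithm, whose entropy component and exact rules are not described here), assume each data source changes according to a Poisson process with rate λ. The probability that it has changed at least once in the time t since its last visit is then 1 - e^(-λt), and sources can be re-collected in decreasing order of that probability. The `TrackedItem` record, the naive rate estimate (observed changes divided by observation time), the hour units, and the `pick_updates` budget rule are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackedItem:
    """Hypothetical record for one OBG data source (e.g. a Web page)."""
    name: str
    changes_seen: int   # changes detected during the observation window
    window: float       # length of the observation window, in hours
    since_visit: float  # hours elapsed since the source was last collected


def estimated_rate(item: TrackedItem) -> float:
    """Naive estimate of the Poisson rate lambda: observed changes per hour."""
    return item.changes_seen / item.window


def prob_changed(rate: float, elapsed: float) -> float:
    """P(at least one change within `elapsed`) for a Poisson process with `rate`."""
    return 1.0 - math.exp(-rate * elapsed)


def pick_updates(items: list[TrackedItem], budget: int) -> list[str]:
    """Return the names of the `budget` sources most likely to have changed."""
    ranked = sorted(
        items,
        key=lambda it: prob_changed(estimated_rate(it), it.since_visit),
        reverse=True,
    )
    return [it.name for it in ranked[:budget]]


if __name__ == "__main__":
    # Hypothetical sources with estimated rates of 1/h, 1/24 per h, and 0.
    sources = [
        TrackedItem("news_page", changes_seen=24, window=24.0, since_visit=2.0),
        TrackedItem("wiki_article", changes_seen=2, window=48.0, since_visit=10.0),
        TrackedItem("kb_entry", changes_seen=0, window=72.0, since_visit=100.0),
    ]
    print(pick_updates(sources, budget=2))  # ['news_page', 'wiki_article']
```

Here the frequently changing source wins despite its short time since the last visit (probability about 0.86 versus 0.34), and a source never observed to change is never prioritized. A practical scheduler would also have to handle sources with no observation history (a zero-length window would divide by zero here) and would likely use a better rate estimator; the sketch ignores both.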

Citation:

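Yin Zidu (尹子都), Yue Kun (岳昆), Zhang Binbin (张彬彬), Li Jin (李劲). Sampling-based collection and updating of online big graph data (基于采样的在线大图数据收集和更新). Journal of Software (软件学报), 2020, 31(11): 3540-3558.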


History

Received: October 25, 2018
Revised: January 16, 2019
Online: November 7, 2020
Published: November 6, 2020

Copyright: Institute of Software, Chinese Academy of Sciences
