Incremental Data Sampling Method Using Affinity Propagation with Dynamic Weighting
Author:
Affiliation:

Clc Number:

TP311

Fund Project:

National Natural Science Foundation of China (61872166); Six Talent Peaks Project of Jiangsu Province (XYDXX-161); Science and Technology Planning Project of Jiangsu Province (BE2018056)

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    Data sampling is an important manner to efficiently extract useful information from original huge datasets. In order to fit with the requirements of efficiently dealing with more and more large-scale data, a novel incremental data sampling method originated from affinity propagation method is proposed, in which two integrated algorithm strategies including hierarchical incremental processing and the dynamic weighting of data samples are introduced. The proposed method mainly can balance the computational efficiency and sampling quality very well. For hierarchical incremental processing strategy, it firstly samples data items in batches and then composites samples by hierarchical way. For dynamic weighting of data samples strategy, it dynamically re-weights the preference to retain better global consistency of possible samples on data space in the incremental sampling procedure. In the experiments, artificial datasets, UCI datasets, and image datasets are used to analyze the sampling performance. The experimental results with several compared algorithms indicate that, the proposed method can gain similar sampling quality but with notably higher computational efficiency especially for more large-scale datasets. This study further applies the new method to data augmentation task in deep learning, and the corresponding experimental results show that the proposed method performs excellently. Concretely, if basic training dataset are processed by sampling enhancement with the proposed new method, the trained model performance using similar number of training samples can be significantly improved compared to traditional data enhancement strategies.

    Reference
    Related
    Cited by
Get Citation

陈晓琪,谢振平,刘渊,詹千熠.基于动态赋权近邻传播的数据增量采样方法.软件学报,2021,32(12):3884-3900

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:August 01,2019
  • Revised:June 15,2020
  • Adopted:
  • Online: December 02,2021
  • Published: December 06,2021
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063