Order-Sensitive Missing Value Imputation Technology for Multi-Source Sensory Data

Fund Project:

National Natural Science Foundation of China (61472071, 61272179); National Key Basic Research Program of China (973) (2012CB316201); Fundamental Research Funds for the Central Universities (N140404013)

    Abstract:

    In recent years, the widespread deployment of sensing networks has led to explosive growth in sensory data. Owing to inherent hardware limitations, the randomness of deployment environments, and human error during data processing, sensory data usually contain a large number of missing values. Because most existing upper-layer analysis tools cannot handle data sets that contain missing values, imputing the missing data is indispensable. Many missing data imputation algorithms already exist, but their accuracy is hard to guarantee when missing data are dense, and they do not consider how the imputation order affects imputation accuracy. To address these issues, this paper proposes OMSMVI, an order-sensitive missing value imputation framework for multi-source sensory data. The framework makes full use of the multi-dimensional correlations specific to sensory data, namely temporal, spatial, and attribute correlation, to measure the similarity between data sources. Based on this multi-dimensional similarity, it constructs similarity graphs centered on the missing data sources, and it uses already-imputed values as observations in subsequent imputation. Taking the overall distribution of missing data sources into account, the framework performs order-sensitive imputation: it first decides the order in which missing values are to be imputed and then applies a specific MVI (missing value imputation) method. Imputing in order effectively mitigates the loss of accuracy that arises, when missing data are dense, from the low similarity between a missing data source and its complete neighbors. Finally, the paper improves the KNN imputation algorithm and introduces NI (neighborhood-based imputation), a new neighbor-based missing value imputation algorithm, into the OMSMVI framework. NI uses the multi-dimensional similarity of sensory data to find all neighbor nodes of a missing data source, which removes the difficulty of choosing the parameter K in KNN imputation and further improves imputation accuracy. Experiments on two real sensor data sets, with comparisons against baseline MVI methods, verify the accuracy and effectiveness of OMSMVI.
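The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes: choose an imputation order, fill one source at a time, and let filled values serve as observations for later sources. Everything specific here is an assumption for illustration and is not taken from the paper: the function names `similarity` and `impute_in_order`, the purely spatial exponential similarity (the paper also uses temporal and attribute correlation), the `threshold` that defines a neighbor, the "largest total neighbor similarity first" ordering rule, and the similarity-weighted average used as the estimate.

```python
import math


def similarity(a, b, scale=1.0):
    """Hypothetical spatial similarity in (0, 1]; decays with distance."""
    return math.exp(-math.dist(a, b) / scale)


def impute_in_order(positions, values, threshold=0.3):
    """Fill None entries of `values`, best-supported source first.

    positions: list of (x, y) tuples, one per data source
    values:    list of floats, None where the reading is missing
    Returns (filled_values, order); order lists indices as they were filled.
    """
    filled = list(values)
    order = []
    missing = {i for i, v in enumerate(filled) if v is None}
    while missing:
        best, best_support, best_estimate = None, 0.0, None
        for i in sorted(missing):
            # Neighbors are all sources above the threshold that currently
            # hold a value (observed or already imputed); there is no fixed K.
            pairs = [(similarity(positions[i], positions[j]), filled[j])
                     for j in range(len(filled))
                     if j != i and filled[j] is not None]
            pairs = [(w, v) for w, v in pairs if w >= threshold]
            support = sum(w for w, _ in pairs)
            if support > best_support:
                best, best_support = i, support
                # Assumed estimator: similarity-weighted mean of neighbors.
                best_estimate = sum(w * v for w, v in pairs) / support
        if best is None:
            break  # the remaining sources have no usable neighbors
        filled[best] = best_estimate  # now usable as an observation
        order.append(best)
        missing.remove(best)
    return filled, order


if __name__ == "__main__":
    positions = [(0, 0), (1, 0), (2, 0), (2.5, 0)]
    readings = [10.0, None, None, 20.0]
    print(impute_in_order(positions, readings))
```

In this toy input, the third source sits closest to an observed reading, so it is filled first; the second source is then estimated from one observed and one imputed neighbor. A fixed-K nearest-neighbor fill that ignored order would have only a single distant complete neighbor to draw on for each missing source.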

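Cite this article

马茜, 谷峪, 李芳芳, 于戈. Order-Sensitive Missing Value Imputation Technology for Multi-Source Sensory Data. Journal of Software (软件学报), 2016, 27(9): 2332-2347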

History
  • Received: 2015-09-25
  • Last revised: 2016-01-12
  • Published online: 2016-09-02