Order-Sensitive Missing Value Imputation Technology for Multi-Source Sensory Data

doi:10.13328/j.cnki.jos.005045


Journal of Software, Volume 27, Issue 9, 2016, pp. 2332-2347. DOI:10.13328/j.cnki.jos.005045


Authors:
MA Qian, GU Yu, LI Fang-Fang, YU Ge
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China

                    
Fund Project: National Natural Science Foundation of China (61472071, 61272179); National Key Basic Research Program of China (973) (2012CB316201); Fundamental Research Funds for the Central Universities (N140404013)


Abstract:

In recent years, sensing data has grown explosively with the widespread use of sensing networks. Because of inherent hardware limitations, the randomness of deployment environments, and inadvertent errors during data processing, raw sensing data contains a large number of missing values. Imputing these values is essential because most existing analysis tools cannot handle data sets that contain missing values. Many missing data imputation algorithms have been proposed, but their accuracy is hard to guarantee when the missing data is densely clustered. In addition, existing algorithms do not consider the imputation order, which affects imputation accuracy. To address these issues, this paper proposes OMSMVI, an order-sensitive missing value imputation framework for multi-source sensory data. OMSMVI makes full use of the multi-dimensional relevancy of sensing data, namely temporal, spatial, and attribute relevancy, and constructs similarity graphs centered on the missing sources based on this relevancy. During imputation, values that have already been imputed are used as observations for imputing subsequent missing values. Taking the overall distribution of missing sources into account, the framework performs order-sensitive imputation: the imputation order is determined before a specific MVI (missing value imputation) method is applied. Order-sensitive imputation mitigates the loss of accuracy caused by low similarity between a missing source and its neighbors when missing sources are dense. Finally, a new neighborhood-based imputation algorithm, NI, which modifies the KNN imputation algorithm, is introduced into the OMSMVI framework. NI uses multi-dimensional similarity to find the neighbors of each missing source, so that the neighbors reflect similarity across multiple dimensions. NI thereby avoids the difficulty of choosing the parameter K in KNN and further improves imputation accuracy compared with KNN. Two real sensor data sets are used to compare OMSMVI with baseline MVI methods and to verify its accuracy and effectiveness.
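
The idea of fixing an imputation order and reusing imputed values as observations can be illustrated with a small Python sketch. It is not the paper's OMSMVI or NI algorithm, whose details the abstract does not give. The function name `order_sensitive_impute`, the `threshold` parameter, the ordering rule (impute first the missing source with the largest total similarity to sources that already have a value), and the similarity-weighted mean are all assumptions made for illustration. The similarity matrix is taken as given; in the paper it would come from temporal, spatial, and attribute relevancy.

```python
from typing import List, Optional, Tuple


def order_sensitive_impute(
    values: List[Optional[float]],
    similarity: List[List[float]],
    threshold: float = 0.5,
) -> Tuple[List[float], List[int]]:
    """Fill None entries one source at a time and return (filled, order).

    Assumed ordering rule: at each step, impute the missing source whose
    neighbours with a value have the largest total similarity. Neighbours
    are all valued sources with similarity >= threshold, so no fixed K is
    required. Imputed values count as observations in later steps.
    """
    filled: List[Optional[float]] = list(values)
    missing = {i for i, v in enumerate(filled) if v is None}
    order: List[int] = []

    def neighbours(i: int) -> List[int]:
        return [
            j for j, v in enumerate(filled)
            if j != i and v is not None and similarity[i][j] >= threshold
        ]

    def support(i: int) -> float:
        return sum(similarity[i][j] for j in neighbours(i))

    while missing:
        # Ties are broken in favour of the smallest source index.
        target = max(sorted(missing), key=support)
        nbrs = neighbours(target)
        if not nbrs:
            # Fallback (assumption): use the single most similar valued source.
            valued = [j for j, v in enumerate(filled) if v is not None]
            if not valued:
                raise ValueError("at least one source must have a value")
            nbrs = [max(valued, key=lambda j: similarity[target][j])]
        weights = [similarity[target][j] for j in nbrs]
        total = sum(weights)
        if total > 0:
            estimate = sum(w * filled[j] for w, j in zip(weights, nbrs)) / total
        else:
            estimate = sum(filled[j] for j in nbrs) / len(nbrs)
        filled[target] = estimate  # now usable as an observation
        missing.remove(target)
        order.append(target)

    return [v for v in filled if v is not None], order


# Toy data: sources 1 and 2 are missing; source 2 is only similar to source 1.
values = [10.0, None, None, 14.0]
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.6],
    [0.2, 0.8, 1.0, 0.3],
    [0.1, 0.6, 0.3, 1.0],
]
filled, order = order_sensitive_impute(values, similarity)
print(order, filled)
```

In this toy input, source 2 has no valued neighbour above the threshold at the start, so source 1 is imputed first (about 11.6, from sources 0 and 3) and its estimate is then reused to impute source 2. Imputing source 2 first would have forced a fallback to a weakly similar source, which is the kind of accuracy loss that order-sensitive imputation is meant to reduce.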

Key words: missing values; dense missing; sensing network; sequential-sensitive imputation; multi-dimensions relevancy


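Citation

MA Qian, GU Yu, LI Fang-Fang, YU Ge. Order-sensitive missing value imputation technology for multi-source sensory data. Journal of Software (软件学报), 2016, 27(9): 2332-2347.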


History

Received: September 25, 2015
Revised: January 12, 2016
Online: September 2, 2016
