Abstract: In recent years, sensing data has grown explosively with the widespread use of sensing networks. Owing to inherent hardware limitations, the randomness of the deployment environment, and inadvertent errors during data processing, the original sensing data contain a large number of missing values. Imputing these missing values is essential because most existing analysis tools cannot handle data sets that contain missing values. Many missing data imputation algorithms have been proposed, but their accuracy is difficult to guarantee when the missing data are lumped together. Moreover, existing algorithms do not consider the imputation order, which influences imputation accuracy. To address these issues, this paper proposes an order-sensitive missing value imputation framework for multi-source sensory data, called OMSMVI. OMSMVI makes full use of the multi-dimensional relevancy of sensing data, such as temporal, spatial, and attributive relevancy. Similarity graphs centered on the missing sources are constructed from this multi-dimensional relevancy. During imputation, values that have already been imputed are used as observations for imputing subsequent missing values. Taking the overall distribution of missing sources into account, the framework performs order-sensitive missing value imputation (MVI): the imputation order is determined before a specific MVI method is applied. Order-sensitive imputation mitigates the loss of imputation accuracy caused by low similarity between a missing source and its neighbors when missing sources are dense. Finally, a new neighborhood-based missing value imputation algorithm, NI, which modifies the KNN imputation algorithm, is introduced into the OMSMVI framework. 
NI uses multi-dimensional similarity to search for the neighbors of each missing source, so that the selected neighbors reflect similarity along multiple dimensions. NI thereby overcomes the difficulty of determining the parameter K in KNN and further improves imputation accuracy compared with KNN. Experiments on two real sensor data sets compare OMSMVI with baseline MVI methods to verify its accuracy and effectiveness.
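As an illustration of the order-sensitive idea only, the following minimal Python sketch imputes one missing source at a time and treats each imputed value as an observation for later steps. It is not the OMSMVI or NI algorithm itself: the function name `order_sensitive_impute`, the precomputed similarity matrix `sim`, the similarity `threshold` that replaces a fixed K, and the greedy rule "impute the source with the strongest support first" are all assumptions made for this example.

```python
import numpy as np


def order_sensitive_impute(values, sim, threshold=0.5):
    """Greedy order-sensitive imputation (illustrative sketch, not OMSMVI itself).

    values: 1-D sequence with np.nan marking missing sources.
    sim: symmetric source-to-source similarity matrix (assumed to be given).
    threshold: hypothetical similarity cutoff used instead of a fixed K.
    """
    values = np.asarray(values, dtype=float).copy()
    sim = np.asarray(sim, dtype=float)
    known = ~np.isnan(values)
    order = []
    while not known.all():
        missing = np.flatnonzero(~known)
        # Similarities from each missing source to every currently known source;
        # neighbors below the threshold get zero weight.
        block = sim[missing][:, known]
        weights = np.where(block >= threshold, block, 0.0)
        support = weights.sum(axis=1)
        best = int(np.argmax(support))
        if support[best] == 0.0:
            break  # no remaining source has a sufficiently similar known neighbor
        idx = int(missing[best])
        # Similarity-weighted average over the selected neighbors.
        values[idx] = weights[best] @ values[known] / support[best]
        known[idx] = True  # the imputed value now counts as an observation
        order.append(idx)
    return values, order


# Four sources; sources 1 and 2 are missing.
sim = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.1],
    [0.2, 0.8, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
])
filled, order = order_sensitive_impute([10.0, np.nan, np.nan, 20.0], sim)
print(order, filled)
```

In this toy input, source 1 is imputed first because it has the most similar observed neighbor, and its imputed value then contributes to the estimate for source 2, which would otherwise rely on source 3 alone.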