Abstract:As a method of assessing validity of conflicting information provided by various data sources, truth discovery has been widely researched in the conventional database community. However, most of the existing solutions of truth discovery are not suitable for applications involving data streams, mainly because their methods include iterative processes. This paper studies the problem of continuous truth discovery in a special kind of data streams-sensor data streams. Combining with the characteristics of sensor data itself and its application, a strategy is proposed based on changing the frequency of assessing source reliability to reduce the iterative processes, and therefore to improve the efficiency of truth discovery in multiple-source sensor data streams. First, definitions are provided on when the relative errors and accumulative errors are relatively small, and the necessary conditions of the variation on source reliability from adjacent time points. Next, a probabilistic model is given to predict the probability of meeting these necessary conditions. Then, by integrating the above conclusions, maximal assessing period of source reliability is achieved, under the condition that the cumulative error of prediction is smaller than the given threshold in a certain confidence level of probabilities, in order to improve efficiency. Thus the truth discovery problem is transformed into an optimization problem. Furthermore, an algorithm, CTF-Stream (continuous truth finding over sensor data streams) is constructed to assessing source reliability with changeable frequencies. CTF-Stream utilizes the historic data to dynamically determine the time needed to assess the source reliability, and finds the truth with a certain accuracy given by customers while improving the efficiency. Finally, both efficiency and accuracy of the presented methods for truth discovery in sensor data streams are validated by conducting the extensive experiments on real sensor dataset.