2005, 16(7):1252-1261.
Abstract:
Forecasting the future trend of data streams is important in many applications. For example, by issuing predictive queries to a sensor network that monitors the environment, observers can forecast the future average temperature and humidity in the area covered by the network and thereby detect abnormal events. Recent work on query processing over data streams has focused mainly on approximate queries over newly arriving data. To the best of our knowledge, nothing has been published to date on predictive query processing over data streams. This paper first proposes a predictive mathematical model, based on multivariable linear regression, for forecasting aggregate values over data streams. Based on this model, it then proposes a method for processing predictive aggregate queries over data streams. An adaptive strategy adjusts the predictive model when the frequency of forecast failures exceeds a predefined threshold. A mathematical model that characterizes the effects of the sliding-window update cycle and the data stream rate on prediction accuracy is also presented. Analytical and experimental results show that the proposed method is effective, and that the proposed algorithms achieve high performance and give users good predictions of aggregate values over data streams. In the experiments, data streams are constructed from TPC-H data and from ocean air temperature data measured by TAO (Tropical Atmosphere Ocean).
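The abstract does not give the model's details, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes that the regression variables are the most recent window aggregates (a lagged-value design chosen here for illustration), that a forecast "fails" when its relative error exceeds a tolerance, and that the model is refitted once the failure rate over recent forecasts exceeds a threshold. The names `AggregatePredictor`, `lags`, `tolerance`, `max_failure_rate`, and `check_window` are hypothetical.

```python
from collections import deque


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


class AggregatePredictor:
    """Forecasts the next window aggregate from the previous `lags` aggregates.

    Assumption: one aggregate (e.g. a window average) arrives per window update.
    """

    def __init__(self, lags=2, history=50, tolerance=0.05,
                 max_failure_rate=0.3, check_window=10):
        self.lags = lags
        self.aggregates = deque(maxlen=history)   # recent window aggregates
        self.tolerance = tolerance                # relative error counted as a failure
        self.max_failure_rate = max_failure_rate  # refit threshold
        self.outcomes = deque(maxlen=check_window)
        self.weights = None
        self.refits = 0

    def fit(self):
        """(Re)estimate the regression coefficients from the stored history."""
        data = list(self.aggregates)
        if len(data) < 2 * self.lags + 1:
            raise ValueError("not enough aggregates to fit the model")
        rows = [[1.0] + data[i:i + self.lags]
                for i in range(len(data) - self.lags)]
        self.weights = fit_least_squares(rows, data[self.lags:])
        self.outcomes.clear()
        self.refits += 1

    def predict(self):
        """Forecast the aggregate of the next window."""
        recent = [1.0] + list(self.aggregates)[-self.lags:]
        return sum(w * x for w, x in zip(self.weights, recent))

    def observe(self, actual):
        """Record the true aggregate; refit if forecasts fail too often."""
        if self.weights is not None:
            error = abs(self.predict() - actual)
            self.outcomes.append(error > self.tolerance * max(abs(actual), 1e-9))
        self.aggregates.append(actual)
        if (self.weights is not None
                and len(self.outcomes) == self.outcomes.maxlen
                and sum(self.outcomes) / len(self.outcomes) > self.max_failure_rate):
            self.fit()


if __name__ == "__main__":
    predictor = AggregatePredictor(lags=2, check_window=4, max_failure_rate=0.5)
    for n in range(10):
        predictor.observe(float(n * n))  # synthetic aggregates, not real data
    predictor.fit()
    print("forecast for the next window:", predictor.predict())
```

In this synthetic run the aggregates follow n², which satisfies an exact linear relation on the two previous values, so the forecast should be approximately 100 up to floating-point error. Feeding the predictor values that break this pattern makes the recorded failure rate rise until `observe` triggers a refit, which is the adaptive behaviour the abstract describes in outline.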