Technique for Continuous Truth Discovery Over Multiple-Source Sensor Data Streams

Fund Project:

National Key Basic Research Program of China (973) (2012CB316201); National Natural Science Foundation of China (61433008, 61472071, 61272179); Fundamental Research Funds for Central Universities (N140404013)

    Abstract:

    Truth discovery, a method for assessing the validity of conflicting information provided by multiple data sources, has been widely studied in the conventional database community. However, most existing truth discovery solutions are unsuitable for applications involving data streams, mainly because they rely on iterative processes. This paper studies the problem of continuous truth discovery over a special kind of data stream: sensor data streams. Drawing on the characteristics of sensor data and its applications, a strategy is proposed that varies the frequency at which source reliability is assessed, reducing the number of iterative processes and thereby improving the efficiency of truth discovery over multiple-source sensor data streams. First, the necessary conditions on the variation of source reliability between adjacent time points are defined under which the relative error and cumulative error remain small. Next, a probabilistic model is given to predict the probability that these necessary conditions are met. Then, by integrating these results, the maximal assessment period of source reliability is derived, subject to the cumulative prediction error staying below a given threshold at a given confidence level; the truth discovery problem is thus transformed into an optimization problem. Furthermore, an algorithm, CTF-Stream (continuous truth finding over sensor data streams), is constructed to assess source reliability at a variable frequency. CTF-Stream uses historical data to dynamically determine when source reliability needs to be assessed, and finds the truth with a user-specified accuracy while improving efficiency. Finally, the efficiency and accuracy of the proposed methods are validated by extensive experiments on a real sensor dataset.
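
    The idea of assessing source reliability only at selected time points can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the function names, the inverse-squared-error weighting rule, and in particular the drift heuristic that chooses the next assessment period. That heuristic is a simple stand-in for the paper's probabilistic model and is not the CTF-Stream algorithm itself.

```python
from typing import Iterable, List, Sequence, Tuple


def weighted_truth(values: Sequence[float], weights: Sequence[float]) -> float:
    """Estimate the truth as the reliability-weighted mean of the readings."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def assess_weights(values: Sequence[float], truth: float) -> List[float]:
    """Hypothetical reliability rule: weight is inversely related to squared error."""
    inv = [1.0 / ((v - truth) ** 2 + 1.0) for v in values]
    total = sum(inv)
    return [x / total for x in inv]


def iterative_truth(values: Sequence[float], weights: Sequence[float],
                    rounds: int = 10) -> Tuple[float, List[float]]:
    """The expensive step: alternate truth estimation and weight assessment."""
    weights = list(weights)
    for _ in range(rounds):
        truth = weighted_truth(values, weights)
        weights = assess_weights(values, truth)
    return weighted_truth(values, weights), weights


def variable_frequency_truth(stream: Iterable[Sequence[float]], n_sources: int,
                             error_budget: float = 0.05,
                             max_period: int = 16) -> Tuple[List[float], int]:
    """Reassess weights only when due; otherwise reuse the last weights.

    The period rule (error_budget / observed weight drift) is an assumed
    heuristic, not the optimization described in the abstract.
    """
    weights = [1.0 / n_sources] * n_sources
    countdown = 0          # time points left before the next assessment
    assessments = 0
    truths: List[float] = []
    for values in stream:
        if countdown == 0:
            old = weights
            truth, weights = iterative_truth(values, weights)
            assessments += 1
            drift = max(abs(a - b) for a, b in zip(old, weights))
            if drift == 0.0:
                period = max_period
            else:
                period = max(1, min(max_period, int(error_budget / drift)))
            countdown = period - 1
        else:
            # Cheap path: a single weighted mean, no iteration.
            truth = weighted_truth(values, weights)
            countdown -= 1
        truths.append(truth)
    return truths, assessments
```

    In this sketch, stable sources lead to small weight drift and therefore long gaps between the iterative assessments, while a large change in the weights forces an assessment at the very next time point.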

Get Citation

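李天义, 谷峪, 马茜, 李芳芳, 于戈. Technique for continuous truth discovery over multiple-source sensor data streams. Ruan Jian Xue Bao/Journal of Software, 2016,27(7):1655-1670 (in Chinese)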

History
  • Received: September 25, 2015
  • Revised: January 12, 2016
  • Online: March 24, 2016