Data Model for Dirty Databases
Authors: Wang Hongzhi (王宏志), Li Jianzhong (李建中), Gao Hong (高宏)
Affiliation:

Abstract:

Dirty data brings new challenges to data management. Current methods for managing dirty data rely mainly on data cleaning, which has limitations in some applications: in certain systems, dirty data has to be tolerated rather than removed. The management of databases that contain dirty data therefore becomes an important issue. The crucial problem is to obtain, from a database with dirty data, query results whose clean degree satisfies the clean requirement of the application. Approaching the problem from the perspective of dirty data management, this paper presents a data model for dirty databases. It proposes a representation of dirty data, data operators for dirty data, and a method for computing the clean degree of tuples under these data operations. Equivalence transformation rules for query expressions on dirty data and a preliminary implementation of the data model are also discussed.
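The abstract does not spell out how clean degrees are represented or propagated through operators. As a rough illustration only, the Python sketch below assumes that each tuple carries a clean degree in [0, 1], that selection leaves a tuple's clean degree unchanged, and that an equi-join multiplies the clean degrees of its two inputs (an independence assumption chosen here for simplicity, not taken from the paper). The names `DirtyTuple`, `select`, `join`, and `clean_at_least`, and the sample relations, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DirtyTuple:
    # Hypothetical representation: attribute values plus a clean degree in [0, 1],
    # where 1.0 means the tuple is believed to be fully clean.
    values: Dict[str, object]
    clean: float


Relation = List[DirtyTuple]


def select(rel: Relation, pred: Callable[[Dict[str, object]], bool]) -> Relation:
    # Selection filters on attribute values only; survivors keep their clean degree.
    return [t for t in rel if pred(t.values)]


def join(left: Relation, right: Relation, attr: str) -> Relation:
    # Nested-loop equi-join on a shared attribute.
    # ASSUMPTION: inputs are dirty independently, so clean degrees multiply.
    out: Relation = []
    for lt in left:
        for rt in right:
            if lt.values[attr] == rt.values[attr]:
                out.append(DirtyTuple({**lt.values, **rt.values}, lt.clean * rt.clean))
    return out


def clean_at_least(rel: Relation, threshold: float) -> Relation:
    # Keep only results whose clean degree meets the application's requirement.
    return [t for t in rel if t.clean >= threshold]


# Hypothetical sample data.
emp = [
    DirtyTuple({"name": "Li", "dept": 1}, 0.9),
    DirtyTuple({"name": "Wang", "dept": 2}, 0.6),
]
dept = [
    DirtyTuple({"dept": 1, "dname": "DB"}, 0.8),
    DirtyTuple({"dept": 2, "dname": "AI"}, 1.0),
]

# Keeps only Li joined with DB (0.9 * 0.8, about 0.72); Wang/AI scores 0.6.
result = clean_at_least(join(emp, dept, "dept"), 0.7)
```

Because selection in this sketch never changes a clean degree, pushing a selection on `emp` below the join yields the same tuples with the same degrees, which is the flavor of equivalence rule the abstract mentions; the paper's actual operators and rules may differ. Taking the minimum of the input degrees instead of the product would be an equally plausible assumption.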

Citation:

Wang HZ, Li JZ, Gao H. A data model for dirty databases. Journal of Software (软件学报), 2012,23(3):539-549.

History
  • Received: May 21, 2010
  • Revised: April 28, 2011
  • Online: March 5, 2012