Association Relationships Study of Multi-Dimensional Data Quality

Fund Project:

National Program on Key Basic Research Project of China (973) (2012CB316200); National Natural Science Foundation of China (U1509216, 61472099, 61133002); Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province (LC2016026)

    Abstract:

    In recent years, with the rapid growth of data volume, users have come to rely on a variety of indicators to evaluate and improve data quality along different dimensions. In the course of data quality management, it has been found that many of the important factors that influence data availability are not completely isolated from one another. In an evaluation mechanism that guides data cleaning rules, these dimensions may be associated with each other. This paper discusses several data quality dimensions that have been studied in the literature and used in real information systems, and summarizes their definitions and properties. In addition, a multi-dimensional data quality assessment framework is proposed. Based on four important properties of data availability (accuracy, completeness, consistency, and currency), the operation methods for these properties on a data set and the relationships among them are constructed. Finally, a multi-dimensional data quality assessment strategy is created, and the effectiveness of the proposed strategy is verified by experiments.
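    As an illustration only, the four properties named in the abstract can each be scored on a small table. The sketch below is not the framework proposed in the paper: the toy records, the ground-truth mapping, the zip-to-city dependency, the one-year freshness threshold, and all function names are assumptions made for the example.

```python
from datetime import date

# Hypothetical toy records; None marks a missing value.
records = [
    {"name": "Ann", "zip": "10001", "city": "New York", "updated": date(2016, 1, 5)},
    {"name": "Bob", "zip": "10001", "city": "Boston", "updated": date(2012, 3, 1)},
    {"name": "Cy", "zip": None, "city": "Chicago", "updated": date(2015, 11, 20)},
    {"name": "Di", "zip": "60601", "city": "Chicago", "updated": date(2015, 12, 1)},
]

# Assumed ground truth (name -> correct city), used only to score accuracy.
truth = {"Ann": "New York", "Bob": "New York", "Cy": "Chicago", "Di": "Chicago"}


def completeness(rows, fields):
    """Fraction of cells in the given fields that are not missing."""
    filled = sum(1 for r in rows for f in fields if r[f] is not None)
    return filled / (len(rows) * len(fields))


def consistency(rows):
    """Fraction of rows not involved in a violation of the assumed rule zip -> city."""
    cities_by_zip = {}
    for r in rows:
        if r["zip"] is not None:
            cities_by_zip.setdefault(r["zip"], set()).add(r["city"])
    violating = sum(
        1 for r in rows
        if r["zip"] is not None and len(cities_by_zip[r["zip"]]) > 1
    )
    return 1 - violating / len(rows)


def currency(rows, today, max_age_days):
    """Fraction of rows updated within max_age_days of the given date."""
    fresh = sum(1 for r in rows if (today - r["updated"]).days <= max_age_days)
    return fresh / len(rows)


def accuracy(rows, reference):
    """Fraction of rows whose city matches the assumed ground truth."""
    correct = sum(1 for r in rows if reference.get(r["name"]) == r["city"])
    return correct / len(rows)


scores = {
    "completeness": completeness(records, ["name", "zip", "city"]),
    "consistency": consistency(records),
    "currency": currency(records, date(2016, 3, 24), 365),
    "accuracy": accuracy(records, truth),
}
print(scores)
```

    In this toy table the record for Bob is stale, inaccurate, and part of a consistency violation at the same time, which is the kind of overlap between dimensions that motivates studying them jointly rather than in isolation.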

Get Citation

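Ding XO, Wang HZ, Zhang XY, Li JZ, Gao H. Association relationships study of multi-dimensional data quality. Ruan Jian Xue Bao/Journal of Software, 2016, 27(7):1626-1644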

History
  • Received: October 10, 2015
  • Revised: January 12, 2016
  • Online: March 24, 2016