Repairing Inconsistent Relational Data Based on Possible World Model
Author: Xu Yaoli, Li Zhanhuai, Chen Qun, Zhong Ping
Fund Project:

National Basic Research Program of China (973) (2012CB316203); National Natural Science Foundation of China (61332006, 61472321, 61502390); Northwestern Polytechnical University Foundation for Fundamental Research (3102014JSJ0013, 3102014JSJ0005)

    Abstract:

    Various techniques have been proposed to repair inconsistent relational data that violate functional dependencies. When building the final repair plan, however, these techniques analyze only the attributes that appear in the functional dependencies (that is, only part of the information in the dataset) and favor the plan with the smallest repair cost. They ignore the other attributes of the dataset and the correlation between those attributes and the ones in the functional dependencies. As a result, they may fall short when the erroneous data occurs on the left-hand side of a functional dependency or when repair cost is not a reliable optimization indicator. This paper proposes a repair approach based on the possible world model. It first constructs candidate repair plans and then estimates their possible world probabilities, which quantify the credibility of each candidate from two aspects: repair cost and the appropriateness of candidate values with regard to other related attribute values in the relational data. It then selects the optimal repair plan. Experiments on synthetic datasets show that the proposed approach performs considerably better than existing cost-based approaches on repair quality. The effects of the error rate and of different types of probability quantification on the proposed approach are also analyzed.
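The pipeline described in the abstract (construct candidate repairs, score each one by repair cost and by how well its value agrees with other attribute values, then pick the most probable) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy relation, the dependency zip -> city, the character-level cost, the exp(-cost) weighting, and the Laplace-smoothed conditional frequency. The abstract does not give the paper's actual formulas, so this is not the authors' method.

```python
from math import exp

# Toy relation (hypothetical). The functional dependency is zip -> city;
# "state" lies outside the dependency and serves as correlation evidence.
ROWS = [
    {"zip": "73301", "city": "Austin", "state": "TX"},
    {"zip": "73301", "city": "Austin", "state": "TX"},
    {"zip": "02101", "city": "Boston", "state": "MA"},
    {"zip": "02101", "city": "Boston", "state": "MA"},
    {"zip": "73301", "city": "Boston", "state": "MA"},  # violates zip -> city
]
DIRTY = 4  # index of the tuple to repair


def repair_cost(old, new):
    """Assumed cost: share of character positions that change (0 = no change)."""
    changed = sum(a != b for a, b in zip(old, new)) + abs(len(old) - len(new))
    return changed / max(len(old), len(new))


def candidate_repairs(rows, dirty):
    """Candidates fix either the right-hand side (city) or the left-hand side (zip)."""
    t = rows[dirty]
    others = [r for i, r in enumerate(rows) if i != dirty]
    cands = set()
    for r in others:
        if r["zip"] == t["zip"] and r["city"] != t["city"]:
            cands.add(("city", r["city"]))  # adopt the city used for this zip
        if r["city"] == t["city"] and r["zip"] != t["zip"]:
            cands.add(("zip", r["zip"]))    # adopt the zip used for this city
    return sorted(cands)


def correlation(rows, dirty, attr, value, context="state"):
    """Laplace-smoothed P(attr = value | context value of the dirty tuple)."""
    others = [r for i, r in enumerate(rows) if i != dirty]
    same_ctx = [r for r in others if r[context] == rows[dirty][context]]
    hits = sum(r[attr] == value for r in same_ctx)
    domain = len({r[attr] for r in others})
    return (hits + 1) / (len(same_ctx) + domain)


def rank_repairs(rows, dirty):
    """Return {candidate: probability}, normalised over the candidate set."""
    t = rows[dirty]
    weights = {}
    for attr, value in candidate_repairs(rows, dirty):
        cost_term = exp(-repair_cost(t[attr], value))  # cheaper repairs weigh more
        weights[(attr, value)] = cost_term * correlation(rows, dirty, attr, value)
    total = sum(weights.values())
    return {cand: w / total for cand, w in weights.items()}


probs = rank_repairs(ROWS, DIRTY)
best = max(probs, key=probs.get)
cheapest = min(candidate_repairs(ROWS, DIRTY),
               key=lambda c: repair_cost(ROWS[DIRTY][c[0]], c[1]))
print("cost-only choice:", cheapest)
print("combined choice: ", best)
```

In this toy data the cost-only choice rewrites the city to "Austin", because that edit changes fewer characters, while the combined score prefers rewriting the zip to "02101", because the tuple's state "MA" co-occurs with that zip and never with "Austin". This is the left-hand-side error case that the abstract says cost-based repair handles poorly.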

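Cite this article

Xu Yaoli, Li Zhanhuai, Chen Qun, Zhong Ping. Repairing inconsistent relational data based on possible world model. Journal of Software (软件学报), 2016,27(7):1685-1699 (in Chinese with English abstract).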
History
  • Received: 2015-10-14
  • Last revised: 2016-01-12
  • Published online: 2016-03-24