两两比较模型的Why-not问题解释及排序
CSTR:
作者:
作者单位:

作者简介:

祁丹蕊(1997-),女,内蒙古赤峰人,硕士生,主要研究领域为数据清洗;宋韶旭(1981-),男,博士,副教授,博士生导师,CCF专业会员,主要研究领域为数据库;王建民(1968-),男,博士,教授,博士生导师,CCF高级会员,主要研究领域为数据库,工作流.

通讯作者:

宋韶旭,E-mail:sxsong@tsinghua.edu.cn

中图分类号:

基金项目:

国家重点研发计划(2016YFB1001101);国家自然科学基金(61572272,71690231)


Learning Pair-wise Relationship Models for Ranking Why-not Problem Explanations
Author:
Affiliation:

Fund Project:

National Key Research and Development Plan (2016YFB1001101); National Natural Science Foundation of China (61572272, 71690231)

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    由于数据缺失,数据库用户通常无法获得查询结果中的预期答案.它被称为"Why-not问题",即"为什么预期的元组不会出现在结果中".现有的方法通过列举可能的元组值来解释Why-not问题.枚举所给出解释的数量往往太大,无法由用户探索.完整性约束,如函数依赖,被用来排除不合格的解释.然而,许多属性在简化后解释中仅仅表示为变量,用户可能仍然无法理解.由于数据稀疏性,许多不合理的解释也会被推荐给用户.提出通过研究元组间两两比较关系,从而对Why-not问题的解释进行排序的方法.首先,重新定义为什么Why-not问题解释的形式没有变量,以便于用户理解;其次,对元组中的相等/不相等关系进行表示,提出在{0,1}表示的元组对的基础上学习统计模型,从而解决直接在原始数据上学习所带来的稀疏性问题,许多模型可以被用来推断概率,包括统计分布、分类和回归;最后,根据推断的概率对解释进行评价和排序.实验结果证明:利用统计、分类和回归方法计算两两关系概率分布的方法,可以为用户寻找Why-not问题的解释并返回较为高质量的解释.

    Abstract:

    Database users often fails to obtain the expected answer in the query results, since databases are often incomplete with missing data. It is known as the Why-not problem, that is, "why the expected tuples do not appear in the results". Existing methods present the explanations of the Why-not problem by enumerating possible values. The number of explanations presented by enumeration is often too large to explore by users. Integrity constraints, such as function dependencies, are employed to rule out irrational explanations. Unfortunately, many attributes are simply represented as variables in the reduced explanations, which the users may still not understand. There are also many unreasonable explanations, owing to data sparsity. This work proposes to study the pair-wise relationships of tuples as the features for ranking Why-not explanations. First, the format of Why-not problem explanations is re-defined, without variables, for easy understanding by users. Secondly, the equality/inequality relationships in tuple pairs are represented. Instead of learning over the original data with sparsity issue, to learn statistical models over the {0,1} representation of tuple pairs is proposed. A number of models are employed to infer the probability, including statistical distribution, classification, and regression. Finally, the explanations are evaluated and ranked according to the inferred probability. Experiments shows that high-quality explanations for Why-not question can be returned using pair-wise method.

    参考文献
    相似文献
    引证文献
引用本文

祁丹蕊,宋韶旭,王建民.两两比较模型的Why-not问题解释及排序.软件学报,2019,30(3):620-647

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2018-07-21
  • 最后修改日期:2018-09-20
  • 录用日期:
  • 在线发布日期: 2019-03-06
  • 出版日期:
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号