Learning Pair-wise Relationship Models for Ranking Why-not Problem Explanations

doi:10.13328/j.cnki.jos.005700


Journal of Software, 2019, 30(3): 620-647

Authors:
QI Dan-Rui (祁丹蕊)
School of Software, Tsinghua University, Beijing 100084, China
SONG Shao-Xu (宋韶旭)
School of Software, Tsinghua University, Beijing 100084, China; National Engineering Laboratory for Big Data Software, Beijing 100084, China; Beijing National Research Center for Information Science and Technology, Beijing 100084, China
WANG Jian-Min (王建民)
School of Software, Tsinghua University, Beijing 100084, China; National Engineering Laboratory for Big Data Software, Beijing 100084, China; Beijing National Research Center for Information Science and Technology, Beijing 100084, China

About the authors: QI Dan-Rui (1997-), female, born in Chifeng, Inner Mongolia, master's student; main research interest: data cleaning. SONG Shao-Xu (1981-), male, Ph.D., associate professor, doctoral supervisor, CCF professional member; main research interest: databases. WANG Jian-Min (1968-), male, Ph.D., professor, doctoral supervisor, CCF senior member; main research interests: databases and workflow.
Corresponding author: SONG Shao-Xu, E-mail: sxsong@tsinghua.edu.cn
Fund project: National Key Research and Development Plan (2016YFB1001101); National Natural Science Foundation of China (61572272, 71690231)


Abstract:

Database users often fail to obtain an expected answer in their query results because databases are frequently incomplete, with missing data. This is known as the Why-not problem: "why do the expected tuples not appear in the results?" Existing methods explain the Why-not problem by enumerating possible tuple values, but the number of explanations produced this way is often too large for users to explore. Integrity constraints, such as functional dependencies, are employed to rule out unqualified explanations. However, many attributes in the reduced explanations are represented only as variables, which users may still not understand, and data sparsity causes many unreasonable explanations to be recommended as well. This work proposes ranking Why-not explanations by studying the pair-wise comparison relationships between tuples. First, the form of a Why-not explanation is redefined to contain no variables, so that users can understand it easily. Second, the equality/inequality relationships within tuple pairs are represented, and statistical models are learned over this {0,1} representation of tuple pairs rather than over the original data, which avoids the sparsity problem of learning directly on the raw data. A number of models can be used to infer the probabilities, including statistical distribution, classification, and regression. Finally, the explanations are evaluated and ranked according to the inferred probabilities. Experimental results show that computing pair-wise relationship probabilities with statistical, classification, and regression methods can find explanations for Why-not questions and return relatively high-quality ones.

Key words: data quality; data cleaning; conditional functional dependency; missing answer explanation; explanation ranking
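
To make the {0,1} tuple-pair representation described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy relation (name, city, zip code), encodes each pair of existing tuples as an equality vector, estimates an empirical distribution over those vectors, and ranks candidate explanations by the average probability of the patterns they form with existing tuples. The function names, the toy data, and the averaging score are all assumptions made for illustration; the paper itself mentions statistical distribution, classification, and regression models without the abstract specifying how they are combined.

```python
from collections import Counter
from itertools import combinations


def pair_pattern(t1, t2):
    """Encode a tuple pair as a {0,1} vector: 1 where the attribute values are equal."""
    return tuple(int(a == b) for a, b in zip(t1, t2))


def learn_pattern_distribution(tuples):
    """Empirical distribution of equality patterns over all pairs of existing tuples."""
    counts = Counter(pair_pattern(a, b) for a, b in combinations(tuples, 2))
    total = sum(counts.values())
    return {pattern: count / total for pattern, count in counts.items()}


def score(candidate, tuples, dist):
    """Average probability of the patterns the candidate forms with each existing tuple.

    Averaging is an assumption of this sketch, not a formula taken from the paper.
    """
    probs = [dist.get(pair_pattern(candidate, t), 0.0) for t in tuples]
    return sum(probs) / len(probs)


def rank(candidates, tuples, dist):
    """Sort candidate explanations from most to least plausible."""
    return sorted(candidates, key=lambda c: score(c, tuples, dist), reverse=True)


# Hypothetical toy relation: (name, city, zip_code).
existing = [
    ("Alice", "Beijing", "100084"),
    ("Bob", "Beijing", "100084"),
    ("Carol", "Shanghai", "200030"),
    ("Dave", "Shanghai", "200030"),
]

# Two variable-free candidate explanations for a missing tuple about "Eve".
candidates = [
    ("Eve", "Beijing", "200030"),  # same city as some tuples but a different zip code
    ("Eve", "Beijing", "100084"),  # city and zip code agree with existing tuples
]

dist = learn_pattern_distribution(existing)
ranked = rank(candidates, existing, dist)
for c in ranked:
    print(c, round(score(c, existing, dist), 3))
```

In this toy data, no existing pair shares a city while differing in zip code, so the first candidate forms only unseen patterns and is ranked below the second. Because the model is learned over equality patterns rather than raw values, a name such as "Eve" that never occurs in the data does not prevent scoring, which is the sparsity argument made in the abstract.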

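Cite this article

QI Dan-Rui, SONG Shao-Xu, WANG Jian-Min. Learning Pair-wise Relationship Models for Ranking Why-not Problem Explanations. Journal of Software, 2019, 30(3): 620-647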


History

Received: 2018-07-21
Last revised: 2018-09-20
Published online: 2019-03-06
