Abstract: Database users often fail to obtain the expected answers in query results, since databases are often incomplete with missing data. This is known as the Why-not problem, that is, "why do the expected tuples not appear in the results". Existing methods present explanations of the Why-not problem by enumerating possible values, but the number of explanations produced by enumeration is often too large for users to explore. Integrity constraints, such as functional dependencies, are employed to rule out irrational explanations. Unfortunately, many attributes are simply represented as variables in the reduced explanations, which users may still not understand. There are also many unreasonable explanations, owing to data sparsity. This work proposes to study the pair-wise relationships of tuples as features for ranking Why-not explanations. First, the format of Why-not explanations is re-defined without variables, so that they are easy for users to understand. Second, the equality/inequality relationships within tuple pairs are represented as {0,1} vectors. Instead of learning over the original data, which suffers from sparsity, we propose to learn statistical models over the {0,1} representation of tuple pairs. A number of models are employed to infer the probability of an explanation, including statistical distributions, classification, and regression. Finally, the explanations are evaluated and ranked according to the inferred probability. Experiments show that the pair-wise method returns high-quality explanations for Why-not questions.
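To make the pair-wise idea concrete, the following is a minimal sketch of the {0,1} encoding of tuple pairs and a simple frequency-based probability estimate. The function names and the frequency model are illustrative assumptions, not the paper's actual implementation; the paper additionally employs statistical distributions, classification, and regression models.

```python
from itertools import combinations

def pairwise_encoding(t1, t2):
    """Encode a tuple pair as a {0,1} vector: 1 where the attribute
    values agree (equality), 0 where they differ (inequality)."""
    return [1 if a == b else 0 for a, b in zip(t1, t2)]

def pattern_probability(candidate, data):
    """Score a candidate explanation tuple by the empirical frequency
    of its pairwise {0,1} patterns among observed tuple pairs.
    (A toy frequency model standing in for the statistical models
    described in the abstract.)"""
    # Collect the {0,1} patterns of all observed tuple pairs.
    observed = [tuple(pairwise_encoding(a, b)) for a, b in combinations(data, 2)]
    counts = {}
    for p in observed:
        counts[p] = counts.get(p, 0) + 1
    total = len(observed)
    # Patterns formed by the candidate against each observed tuple.
    cand_patterns = [tuple(pairwise_encoding(candidate, t)) for t in data]
    # Average empirical probability of the candidate's patterns.
    return sum(counts.get(p, 0) / total for p in cand_patterns) / len(cand_patterns)
```

Candidate explanations can then be ranked by `pattern_probability`, so that value combinations whose pairwise patterns are frequent in the data are preferred over rare, implausible ones.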