For many KDD (knowledge discovery in databases) applications, such as fraud detection in E-commerce, it is more interesting to find the exceptional instances or the outliers than to find the common knowledge. Most existing work in outlier detection deals with data with numerical attributes. And these methods give no explanation to the outliers after finding them. In this paper, a hypergraph-based outlier definition is presented, which considers the locality of the data and can give good explanation to the outliers,and it also gives an algorithm called HOT(hypergraph-based outlier test) to find outliers by counting three measurements,the support,belongingness and deviation of size,for each vertex in the hypergraph.This algorithm can manage both numerical attributes and categorical attributes.Analysis shows that this approach can find the outliers in high-dimensionsal space effctively.
魏藜,宫学庆,钱卫宁,周傲英.高维空间中的离群点发现.软件学报,2002,13(2):280-290
Copy