Abstract: Most research on distance metrics for kNN classification focuses on how to combine the differences contributed by different attributes, while ignoring the semantic differences between values of the same attribute. In addition, the classification accuracy of traditional approaches is very sensitive to incomplete data described at different levels of abstraction. This paper presents SDkNN (semantic distance based k-nearest neighbor), a novel kNN approach based on semantic distance that addresses both problems. The approach analyzes the semantic differences between values of an attribute, shows how to calculate semantic distance from domain ontologies, and uses that distance to improve traditional kNN methods. Experiments on datasets from the UCI (University of California, Irvine) machine learning repository and on real application datasets show that SDkNN outperforms traditional kNN overall, especially when the data is incomplete. SDkNN is also of practical value in applications.
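To make the idea concrete, the following is a minimal sketch of an ontology-based distance plugged into kNN. It is not the formula used by SDkNN. The toy ontology `PARENT`, the path-length distance normalized by `MAX_PATH`, and the function names `semantic_distance`, `record_distance`, and `sdknn_predict` are all assumptions made for illustration. The sketch shows how a value recorded at a coarser abstraction level (for example `warm` instead of `red`) can still be compared with more specific values.

```python
from collections import Counter

# Hypothetical toy ontology (child -> parent); not taken from the paper.
PARENT = {
    "color": None,
    "warm": "color",
    "cool": "color",
    "red": "warm",
    "orange": "warm",
    "blue": "cool",
}


def ancestors(value):
    """Return the path from value up to the root, starting with value itself."""
    path = []
    while value is not None:
        path.append(value)
        value = PARENT[value]
    return path


# Longest possible path between two nodes: twice the maximum depth.
MAX_PATH = 2 * max(len(ancestors(v)) - 1 for v in PARENT)


def semantic_distance(a, b):
    """Edge count between a and b via their lowest common ancestor, scaled to [0, 1].

    Assumed distance for illustration only.
    """
    path_a = ancestors(a)
    path_b = ancestors(b)
    for steps_a, node in enumerate(path_a):
        if node in path_b:
            return (steps_a + path_b.index(node)) / MAX_PATH
    raise ValueError("values share no common ancestor")


def record_distance(x, y):
    """Sum the per-attribute semantic distances of two records."""
    return sum(semantic_distance(a, b) for a, b in zip(x, y))


def sdknn_predict(train, query, k=3):
    """Majority vote among the k training records closest to query."""
    nearest = sorted(train, key=lambda item: record_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [(("red",), "hot"), (("orange",), "hot"), (("blue",), "cold")]
prediction = sdknn_predict(train, ("warm",), k=1)
```

In this sketch the coarse value `warm` lies closer to `red` and `orange` than to `blue`, so a record described at a higher abstraction level is still classified with its more specific neighbors rather than being treated as an unrelated or missing value.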