Similarity Measures for XML Documents Based on Kernel Matrix Learning

Abstract:

XML, as a new data model, has become a hot research area. Similarity measurement is a basis for the analysis, management, and text mining of XML documents. The Structured Link Vector Model (SLVM) is a document model for measuring the similarity of XML documents based on both their content and their structure. The kernel matrix, which describes the relations between structure units, plays an important role in SLVM. In this paper, two algorithms are derived to learn the kernel matrix so that it captures the relations between structure units: one is based on support vector machine (SVM) regression, and the other is based on matrix iterative analysis. For performance evaluation, the proposed similarity measure is applied to similarity search. The experimental results show that the similarity measure based on kernel matrix learning significantly outperforms traditional measures. Furthermore, compared with the kernel matrix learning algorithm based on SVM regression, the algorithm based on matrix iterative analysis not only achieves higher precision but also requires fewer training documents and lower cost.
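To make the role of the kernel matrix concrete, here is a minimal Python sketch of an SLVM-style similarity and its use in similarity search. It assumes (1) each document is a matrix holding one term-weight vector per structure unit, with every document using the same list of units, (2) the similarity is the bilinear form sim(dx, dy) = Σ_i Σ_j M(i,j)·⟨dx(i), dy(j)⟩ computed after scaling each document to unit norm, and (3) M is a symmetric kernel matrix supplied by the caller. The function names (`slvm_similarity`, `search`, `identity`) and the kernel values are hypothetical placeholders; neither of the paper's learning algorithms (SVM regression or matrix iterative analysis) is implemented here.

```python
from typing import List, Sequence

# Hypothetical layout for this sketch: one row per structure unit
# (e.g. "title", "body"), one column per vocabulary term.
# Every document uses the same order of structure units.
DocMatrix = List[List[float]]


def normalize(doc: DocMatrix) -> DocMatrix:
    """Scale the whole matrix to unit Frobenius norm (assumed normalization)."""
    norm = sum(w * w for row in doc for w in row) ** 0.5
    if norm == 0.0:
        return [row[:] for row in doc]
    return [[w / norm for w in row] for row in doc]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def slvm_similarity(dx: DocMatrix, dy: DocMatrix, kernel: DocMatrix) -> float:
    """sim(dx, dy) = sum_i sum_j kernel[i][j] * <dx(i), dy(j)>.

    kernel[i][j] weights how strongly terms in structure unit i of dx
    are allowed to match the same terms in structure unit j of dy.
    """
    nx, ny = normalize(dx), normalize(dy)
    m = len(kernel)
    return sum(kernel[i][j] * dot(nx[i], ny[j])
               for i in range(m) for j in range(m))


def identity(m: int) -> DocMatrix:
    """Kernel that only compares matching structure units."""
    return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]


def search(query: DocMatrix, corpus: List[DocMatrix],
           kernel: DocMatrix, k: int = 3) -> List[int]:
    """Return indices of the k most similar documents (ties: lower index first)."""
    scored = [(slvm_similarity(query, doc, kernel), idx)
              for idx, doc in enumerate(corpus)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    # Units: 0 = title, 1 = body. Terms: 0 = "xml", 1 = "kernel", 2 = "svm".
    query = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # "xml" in the title
    doc_a = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]   # "xml" in the body
    doc_b = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]   # "kernel" in the title
    # Hand-set placeholder standing in for a learned kernel:
    # title and body are treated as related units.
    learned = [[1.0, 0.8], [0.8, 1.0]]
    print(slvm_similarity(query, doc_a, identity(2)))     # 0.0
    print(slvm_similarity(query, doc_a, learned))         # 0.8
    print(search(query, [doc_b, doc_a, query], learned))  # [2, 1, 0]
```

With the identity kernel, "xml" in the query's title and "xml" in a document's body never meet, so the pair scores 0.0; the off-diagonal entry 0.8 lets the kernel credit that cross-unit match. This is the kind of relation between structure units that a learned kernel matrix is meant to capture, in place of the hand-set values used here.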

Citation:

Yang Jianwu, Chen Xiaoou. Similarity Measures for XML Documents Based on Kernel Matrix Learning. Journal of Software (软件学报), 2006, 17(5): 991-1000.

History
  • Received: June 30, 2005
  • Revised: October 20, 2005