A Structure Matching Method Based on Functional Dependencies
Funding:

Supported by the National Natural Science Foundation of China under Grant No.60873030; the National High-Tech Research and Development Plan of China (863) under Grant No.2007AA01Z309; the National Defense Pre-Research Foundation of

    Abstract:

    Schema matching is a basic problem in many database application domains, such as schema integration, data warehousing, E-business and semantic query processing. It has recently become a research hotspot and has produced a substantial body of results. Most of these results mine element semantics from the information carried by the elements themselves (typically, the attributes of a relational schema), and research in this direction is now quite mature. Structure information is an important part of a schema and can help improve matching accuracy, yet little research has addressed how to exploit it. This paper divides the similarity between two schema elements into linguistic (semantic) similarity, derived from each element's own information, and structural similarity, derived from the relationships between elements. It computes structural similarity with a new statistical method, combines it with linguistic similarity to obtain a matching probability for each element pair, and finally derives the mapping between schema elements (the correspondences between them) from these probabilities. Experimental results show that the algorithm outperforms existing algorithms in precision, recall and overall measure.
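    The abstract names a three-step pipeline but gives no formulas, so the sketch below only illustrates its shape under loudly stated assumptions. Attributes linked by a single-attribute functional dependency are treated as structural neighbours; `structural_similarity` is a simple neighbour-matching score standing in for the paper's statistical method (it is not that method); the matching probability is a linear mix with a hypothetical weight `w`; and the mapping is a greedy one-to-one assignment with a hypothetical `threshold`. All function names and the toy schemas are hypothetical. Evaluation uses precision, recall and the Overall measure commonly used in schema-matching studies, Overall = Recall * (2 - 1/Precision); whether the paper defines its overall measure exactly this way is an assumption.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def fd_neighbours(fds: Iterable[Pair]) -> Dict[str, Set[str]]:
    """Assumption: attributes linked by a functional dependency x -> y are neighbours."""
    nbr: Dict[str, Set[str]] = defaultdict(set)
    for x, y in fds:
        nbr[x].add(y)
        nbr[y].add(x)
    return nbr


def structural_similarity(a: str, b: str, nbr_s: Dict[str, Set[str]],
                          nbr_t: Dict[str, Set[str]], ling: Dict[Pair, float]) -> float:
    """Hypothetical stand-in for the paper's statistical method: average best
    linguistic match between the FD neighbours of a and those of b."""
    na, nb = nbr_s.get(a, set()), nbr_t.get(b, set())
    if not na or not nb:
        return 0.0
    best_a = [max(ling.get((x, y), 0.0) for y in nb) for x in na]
    best_b = [max(ling.get((x, y), 0.0) for x in na) for y in nb]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


def match_schemas(src, tgt, nbr_s, nbr_t, ling, w=0.5, threshold=0.5):
    """Combine both similarities into a matching probability, then map greedily 1:1."""
    prob: Dict[Pair, float] = {}
    for a, b in product(src, tgt):
        struct = structural_similarity(a, b, nbr_s, nbr_t, ling)
        # Assumed linear mix; w is a hypothetical weight, not taken from the paper.
        prob[(a, b)] = w * ling.get((a, b), 0.0) + (1.0 - w) * struct
    used_s: Set[str] = set()
    used_t: Set[str] = set()
    mapping: Set[Pair] = set()
    # Accept pairs in descending probability while both elements are still unmatched.
    for (a, b), p in sorted(prob.items(), key=lambda kv: kv[1], reverse=True):
        if p < threshold:
            break
        if a not in used_s and b not in used_t:
            mapping.add((a, b))
            used_s.add(a)
            used_t.add(b)
    return prob, mapping


def evaluate(found: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall and Overall = recall * (2 - 1 / precision)."""
    correct = len(found & gold)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(gold) if gold else 0.0
    overall = recall * (2 - 1 / precision) if precision > 0 else 0.0
    return precision, recall, overall


# Hypothetical toy schemas: emp(emp_id, name, dept) vs staff(staff_no, full_name, division).
src = ["emp_id", "name", "dept"]
tgt = ["staff_no", "full_name", "division"]
nbr_s = fd_neighbours([("emp_id", "name"), ("emp_id", "dept")])
nbr_t = fd_neighbours([("staff_no", "full_name"), ("staff_no", "division")])
ling = {("emp_id", "staff_no"): 0.4, ("name", "full_name"): 0.8,
        ("dept", "division"): 0.5, ("name", "division"): 0.1}
prob, mapping = match_schemas(src, tgt, nbr_s, nbr_t, ling)
gold = {("emp_id", "staff_no"), ("name", "full_name"), ("dept", "division")}
print(sorted(mapping))
print(evaluate(mapping, gold))
```

    In this toy run, `emp_id`/`staff_no` has a linguistic score of only 0.4, below the threshold, but its FD neighbours match well, so its combined probability rises to 0.525 and the pair is accepted; `dept`/`division` reaches only 0.45 and is missed. The result is precision 1.0, recall 2/3 and Overall 2/3, which shows how structural evidence can recover pairs that linguistic similarity alone would reject.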

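Cite this article:

Li Guohui, Du Xiaokun, Hu Fangxiao, Yang Bing, Tang Xianghong. A structure matching method based on functional dependencies. Journal of Software (软件学报), 2009,20(10):2667-2678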

History
  • Received: 2008-03-06
  • Last revised: 2009-05-07
Copyright: Institute of Software, Chinese Academy of Sciences