主页期刊介绍编委会编辑部服务介绍道德声明在线审稿编委办公编辑办公English
     
在线出版
各期目录
纸质出版
分辑系列
论文检索
论文排行
综述文章
专刊文章
美文分享
各期封面
E-mail Alerts
RSS
旧版入口
中国科学院软件研究所
  
投稿指南 问题解答 下载区 收费标准 在线投稿
孔令波,唐世渭,杨冬青,王腾蛟,高军.XML信息检索中最小子树根节点问题的分层算法.软件学报,2007,18(4):919-932
XML信息检索中最小子树根节点问题的分层算法
Layered Solution for SLCA Problem in XML Information Retrieval
投稿时间:2005-10-20  修订日期:2006-04-27
DOI:
中文关键词:  XML索引  Dewey编码  XML信息检索  关键字查询  SLCA  ILE
英文关键词:XML index  Dewey code  XML information retrieval  keyword search  SLCA (smallest lowest common ancestor)  ILE (indexed lookup eager)
基金项目:Supported by the National Natural Science Foundation of China under Grant No.60503037 (国家自然科学基金); the National High-Tech Research and Development Plan of China under Grant No.2005AA4Z307 (国家高技术研究发展计划(863)); the Beijing Natural Science Foundation of China under Grant No.4062018 (北京市自然科学基金)
作者单位
孔令波 北京大学,计算机科学技术系,北京,100871 
唐世渭 北京大学,计算机科学技术系,北京,100871
北京大学,视觉与听觉信息处理国家重点实验室,北京,100871 
杨冬青 北京大学,计算机科学技术系,北京,100871 
王腾蛟 北京大学,计算机科学技术系,北京,100871 
高军 北京大学,计算机科学技术系,北京,100871 
摘要点击次数: 3928
全文下载次数: 3792
中文摘要:
      最小子树根节点问题(smallest lowest common ancestor,简称SLCA)是实现XML信息检索研究中关键字查询的一个基本问题,其主旨就是求解所有包含给定关键字的紧致子树的根节点.XU等人给出了3种算法-基于索引的搜索算法(indexed lookup eager,简称ILE)、基于堆栈的算法以及基于扫描的算法(scan eager,简称SE),并通过实验证明ILE算法具有最好的表现.与基于B+树索引结构的ILE算法不同,所给出的新算法,称为LISA(layered intersec
英文摘要:
      SLCA (smallest lowest common ancestor) problem is a basic task of keyword search in XML information retrieval. It means to find all the nodes corresponding to the tightest subtrees in XML data, which involves the given. Xu, et al., illustrate three algorithms-Indexed lookup eager (ILE), stack algorithm and scan eager (SE), and manifest that ILE has the best performance. Different from the complicated-B+-tree-based ILE algorithm, this paper proposes a layered solution for SLCA problem, named as LISA (layered intersection scan algorithm). It benefits from the distribution rule of SLCA nodes in XML tree, and calculates the SLCA nodes level by level (the deepest level runs first). That is, based on the retrieved Dewey codes corresponding to given keywords, the Dewey codes of SLCA nodes can be gotten by intersecting the prefix Dewey codes of each level. Compared with the ILE algorithm, LISA solutions need not sophisticated data structures, and have comparatively runtime performance. There are two instances following the LISA idea, called LISA I and LISA II respectively. They are distinguished from each other according to whether keeping Dewey codes in computation or transforming Dewey codes into integer sequences. Extensive experiments evaluate the performance of algorithms and prove the efficiency of LISA II.
HTML  下载PDF全文  查看/发表评论  下载PDF阅读器
 

京公网安备 11040202500064号

主办单位:中国科学院软件研究所 中国计算机学会 京ICP备05046678号-4
编辑部电话:+86-10-62562563 E-mail: jos@iscas.ac.cn
Copyright 中国科学院软件研究所《软件学报》版权所有 All Rights Reserved
本刊全文数据库版权所有,未经许可,不得转载,本刊保留追究法律责任的权利