基于适应性分段估计的数据流相似性搜索
DOI:
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

基金项目:

Supported by the National High-Tech Research and Development Plan of China under Grant Nos.2006AA01Z451, 2007AA01Z474 (国家高技术研究发展计划(863))


Similarity Search in Data Stream with Adaptive Segmental Approximations
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    相似性搜索在股票交易行情、网络安全、传感器网络等众多领域应用广泛.由于这些领域中产生的数据具有无限的、连续的、快速的、实时的特性,所以需要适合数据流上的在线相似性搜索算法.首先,在具有或不具有全局约束条件下,分别提出了没有索引结构的DTW(dynamic time warping)下限函数LB_seg_WFglobalLB_seg_WF,它们是一种分段DTW技术,能够处理数据流上的非等长序列间在线相似性匹配问题.然后,为了进一步提高LB_seg_WFglobalLB_seg_WF的近似程度,提出了一系列的改进方法.最后,针对流上使用LB_seg_WFglobalLB_seg_WF可能会出现连续失效的情况,分别提出了DTW的下限函数LB_WFglobal(具有全局约束条件)和上限函数UB_WF、下限函数LB_WF(不具有全局约束条件).通过增量方式快速估计DTW,极大地减少了估计DTW的冗余计算量.通过理论分析和统计实验,验证了该方法的有效性.

    Abstract:

    Similarity search has attracted many researchers from various communities (real-time stock quotes, network security, sensor networks). Due to the infinite, continuous, fast and real-time properties of the data from these communities, a method is needed for online similarity search in data stream. This paper first proposes the lower bound function LB_seg_WFglobal for DTW (dynamic time warping) in the presence of global warping constraints and LB_seg_WF for DTW without global warping constraints, which are not applied to any index structures. They are segmented DTW techniques, and can be applied to sequences and queries of varying lengths in data stream. Next, several tighter lower bounds are proposed to improve the approximate degree of the LB_seg_WFglobal and LB_seg_WF. Finally, to deal with the possible continuously non-effective problem of LB_seg_WFglobal or LB_seg_WF in data stream, it is believed that lower-bound LB_WFglobal (in the presence of global warping constraints) and lower-bound LB_WF, upper-bound UB_WF (without global warping constraints) can fast estimate DTW and hence reduce a lot of redundant computations by incrementally computing. The theoretical analysis and statistical experiments confirm the validity of the proposed methods.

    参考文献
    相似文献
    引证文献
引用本文

吴 枫,仲 妍,吴泉源,贾 焰,杨树强.基于适应性分段估计的数据流相似性搜索.软件学报,2009,20(10):2867-2884

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:
  • 最后修改日期:2008-12-30
  • 录用日期:
  • 在线发布日期:
  • 出版日期:
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号