Distributed Time Series Similarity Search Method Based on Key-value Data Stores

doi:10.13328/j.cnki.jos.006445


Volume 33, Issue 3, 2022, pp. 950-967. DOI: 10.13328/j.cnki.jos.006445


Author:
YU Zi-Sheng
School of Cyber Engineering, Xidian University, Xi’an 710126, China; JD Intelligent Cities Research, Beijing 100176, China
LI Rui-Yuan
College of Computer Science, Chongqing University, Chongqing 400044, China; JD Intelligent Cities Research, Beijing 100176, China
GUO Yang
School of Computer Science and Engineering, Beihang University, Beijing 100191, China
JIANG Zhong-Yuan
School of Cyber Engineering, Xidian University, Xi’an 710126, China
BAO Jie
JD Intelligent Cities Research, Beijing 100176, China
ZHENG Yu
JD Intelligent Cities Research, Beijing 100176, China

                    

Abstract:

Time series similarity search is one of the most basic operations in temporal data analysis and has a wide range of applications. Existing distributed methods suffer from dimension explosion, overly large scan ranges, and time-consuming similarity calculation. To address these problems, this study proposes KV-Search, a distributed time series similarity search algorithm. First, time series are segmented into blocks and stored in a key-value database, which solves the problem of high and growing dimensionality. Second, a lower bound is calculated based on the Chebyshev distance, and invalid data is filtered out in advance using key-value range scans, which reduces data transmission and computation overhead. Third, a block-based time series representation is used to calculate the lower bound of the distance, which avoids computation over the higher-dimensional raw data. KV-Search is implemented on HBase, and extensive experiments are conducted on both real and synthetic time series data. The results show that KV-Search outperforms the baseline methods in efficiency and scalability.

Key words: time series; similarity search; key-value storage; pruning filtration; distributed query
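
The block-based pruning idea summarized in the abstract can be illustrated with a small sketch. This is not the KV-Search algorithm itself, whose details the abstract does not give. It assumes that each block is summarized by its minimum and maximum value, that a plain Python dict stands in for the key-value store, and that similarity is measured by the Chebyshev distance. All function names are hypothetical. The sketch relies only on the fact that a query point's gap to a block's [min, max] interval can never exceed its true distance to any value in that block, so the largest gap is a valid lower bound on the Chebyshev distance.

```python
def split_into_blocks(series, block_size):
    """Cut a series into consecutive blocks; the last block may be shorter."""
    return [series[i:i + block_size] for i in range(0, len(series), block_size)]


def block_summary(block):
    """Assumed block representation: the (min, max) of the block's values."""
    return (min(block), max(block))


def chebyshev(a, b):
    """Exact Chebyshev distance: the largest pointwise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def chebyshev_lower_bound(query_blocks, summaries):
    """Lower bound on the Chebyshev distance using only block summaries.

    Every stored value in a block lies in [lo, hi], so the gap between a
    query point and that interval cannot exceed the true pointwise distance.
    """
    bound = 0.0
    for q_block, (lo, hi) in zip(query_blocks, summaries):
        for q in q_block:
            gap = max(lo - q, q - hi, 0.0)
            bound = max(bound, gap)
    return bound


def similarity_search(query, store, block_size, threshold):
    """Return ids of stored series within `threshold` of `query`.

    `store` maps a series id to its raw values (a stand-in for a key-value
    table). Summaries are computed here for brevity; a real system would
    precompute and store them alongside the blocks.
    """
    query_blocks = split_into_blocks(query, block_size)
    hits = []
    for series_id, series in store.items():
        summaries = [block_summary(b) for b in split_into_blocks(series, block_size)]
        if chebyshev_lower_bound(query_blocks, summaries) > threshold:
            continue  # pruned without touching the raw values
        if chebyshev(query, series) <= threshold:
            hits.append(series_id)
    return hits
```

Calling `similarity_search([1, 2, 3, 4], store, block_size=2, threshold=1)` on a store holding a few short series prunes any series whose lower bound already exceeds the threshold and computes the exact distance only for the remaining candidates.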

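Citation:

YU Zi-Sheng, LI Rui-Yuan, GUO Yang, JIANG Zhong-Yuan, BAO Jie, ZHENG Yu. Distributed Time Series Similarity Search Method Based on Key-value Data Stores. Journal of Software, 2022, 33(3): 950-967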


History

Received: June 30, 2021
Revised: July 31, 2021
Online: October 21, 2021
Published: March 06, 2022

