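Approximate Continuous k Representative Skyline Query Algorithm over High-Speed Streaming Data Environment (高速流环境下近似连续k代表轮廓查询算法)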

doi:10.13328/j.cnki.jos.006718

Journal of Software (软件学报), 2023, Vol. 34, No. 3, pp. 1425-1450

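Authors:
ZHU Rui (朱睿), School of Computer Science, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
SONG Fu-Yao (宋栿尧), School of Computer Science, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
WANG Bin (王斌), School of Computer Science and Engineering, Northeastern University, Shenyang, Liaoning 110169, China
YANG Xiao-Chun (杨晓春), School of Computer Science and Engineering, Northeastern University, Shenyang, Liaoning 110169, China
ZHANG An-Zhen (张安珍), School of Computer Science, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
XIA Xiu-Feng (夏秀峰), School of Computer Science, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
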
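About the authors: ZHU Rui (1982-), male, Ph.D., associate professor, CCF senior member; research interests: streaming data management, query processing and optimization. YANG Xiao-Chun (1973-), female, Ph.D., professor, doctoral supervisor, CCF distinguished member; research interests: database theory and systems, text and time-series big data management. SONG Fu-Yao (1995-), male, master's degree; research interest: streaming data management. ZHANG An-Zhen (1990-), female, Ph.D., lecturer, CCF professional member; research interests: big data quality management, approximate query processing. WANG Bin (1973-), male, Ph.D., professor, CCF professional member; research interests: data quality management, text data management. XIA Xiu-Feng (1965-), male, Ph.D., professor, CCF senior member; research interests: management information systems, databases.
Corresponding author: WANG Bin, wangbin@neu.edu.cn
CLC number: TP311
Funding: National Natural Science Foundation of China (62102271, 62072088, 61991404); National Key Research and Development Program of China (2020YFB1707901); Shenyang Innovative Talent Project (RC200439)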

Approximate Continuous k Representative Skyline Query Algorithm over High-Speed Streaming Data Environment

Authors:

ZHU Rui
School of Computer Science, Shenyang Aerospace University, Shenyang 110136, China
SONG Fu-Yao
School of Computer Science, Shenyang Aerospace University, Shenyang 110136, China
WANG Bin
School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
YANG Xiao-Chun
School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
ZHANG An-Zhen
School of Computer Science, Shenyang Aerospace University, Shenyang 110136, China
XIA Xiu-Feng
School of Computer Science, Shenyang Aerospace University, Shenyang 110136, China

Abstract:

The k representative skyline query is derived from the traditional skyline query. Given a d-dimensional dataset D, a skyline query returns all objects in D that are not dominated by any other object, which helps users select high-quality objects according to their own preferences. However, the skyline is often large, so users must choose from many objects, and neither the speed nor the quality of the selection can be guaranteed. In contrast, a k representative skyline query returns only the k most "representative" skyline objects, which effectively addresses this problem of the traditional skyline query. Given a sliding window W and a continuous query q, q monitors the objects in the window; whenever the window slides, q returns the k skyline objects with the largest group dominance size in the window. Existing algorithms monitor the skyline objects in the current window and update the k representative skyline whenever the skyline set changes. However, monitoring the skyline set is usually costly, and when the skyline set is large, selecting the k representative skyline objects from it is costly as well. As a result, existing algorithms cannot work efficiently in high-speed stream environments. To address these problems, this study proposes the r-approximate k representative skyline query and, to support it, a query processing framework named PAKRS (predict-based approximate k representative skyline). First, PAKRS exploits the characteristics of high-speed streams to partition the current window into a group of sub-windows. Based on the partition, it constructs predicted result sets for a few future windows and uses them to predict the earliest moment at which newly arrived objects may become skyline objects. Second, an index named r-GRID is proposed, which helps PAKRS select the r-approximate k representative skyline at an incremental maintenance cost of O(k/s+k/m) in 2-dimensional space and O(2^Ld/m+2^Ld/s) in d-dimensional space (d>2), where L is a small positive integer smaller than k. Theoretical analysis shows that the computational complexity of PAKRS is lower than that of previously proposed algorithms. Finally, extensive experiments evaluate the efficiency and effectiveness of the proposed algorithms. The results show that the running time of PAKRS is about 1/4 that of PBA (prefix-based algorithm), 1/6 that of GA (greedy algorithm), and 1/3 that of e-GA (e-constraint greedy algorithm).

Key words: skyline query; k representative skyline query; sliding window; partition; high-speed streaming
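
The query described in the abstract can be illustrated with a small baseline sketch. This is not PAKRS or r-GRID: it is the naive approach the abstract argues against, recomputing the skyline and a greedy k representative selection from scratch each time a count-based window slides. Several points are assumptions made only for illustration: smaller values are treated as better in every dimension, "group dominance size" is taken to be the number of window objects dominated by at least one chosen object, and the function names `dominates`, `skyline`, `greedy_k_representative`, and `continuous_query` are hypothetical.

```python
from collections import deque


def dominates(a, b):
    """Return True if a dominates b (assumption: smaller is better in every dimension)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Naive O(n^2) skyline: keep the objects that no other object dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def greedy_k_representative(points, k):
    """Greedily choose up to k skyline objects that jointly dominate the most objects."""
    candidates = list(dict.fromkeys(skyline(points)))  # drop duplicate coordinates
    covered = set()  # indices of objects dominated by the chosen objects so far
    chosen = []
    for _ in range(min(k, len(candidates))):
        best, best_gain = None, set()
        for s in candidates:
            if s in chosen:
                continue
            # Newly dominated objects if s were added to the chosen set.
            gain = {i for i, p in enumerate(points) if dominates(s, p)} - covered
            if best is None or len(gain) > len(best_gain):
                best, best_gain = s, gain
        chosen.append(best)
        covered |= best_gain
    return chosen


def continuous_query(stream, window_size, k):
    """Count-based sliding window (slide = 1) that recomputes the answer from scratch."""
    window = deque(maxlen=window_size)
    for obj in stream:
        window.append(obj)
        if len(window) == window_size:
            yield greedy_k_representative(list(window), k)


points = [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1),
          (5, 5), (6, 6), (8, 8), (3, 8), (8, 3)]
print(skyline(points))
print(greedy_k_representative(points, 2))
for answer in continuous_query(points, window_size=5, k=2):
    print(answer)
```

Every slide here costs a full skyline computation plus a greedy pass over all candidates, which is the kind of per-update overhead that motivates an approximate, incrementally maintained approach in a high-speed stream.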

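Cite this article

ZHU Rui, SONG Fu-Yao, WANG Bin, YANG Xiao-Chun, ZHANG An-Zhen, XIA Xiu-Feng. Approximate continuous k representative skyline query algorithm over high-speed streaming data environment. Journal of Software (软件学报), 2023, 34(3): 1425-1450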

History

Received: 2021-11-25
Last revised: 2022-04-27
Published online: 2023-03-10
Publication date: 2023-03-06
