Approximate Continuous k Representative Skyline Query Algorithm over High-Speed Streaming Data Environment
Author:
Affiliation:

Clc Number:

TP311

  • Article
  • | |
  • Metrics
  • |
  • Reference [42]
  • |
  • Related [20]
  • | | |
  • Comments
    Abstract:

    k representative skyline query is a type of query derived from traditional skyline query. Given a set of d-dimensional dataset D, a skyline query finds all objects in D that are not dominated by other ones, which helps users to select high-quality objects based on their preference. However, the scale of skyline objects may be large in many cases, users have to choose target objects from a large number of objects, leading that both the selection speed and quality cannot be guaranteed. Compared with traditional skyline query, k representative skyline query chooses the most "representative" k objects from all skyline objects, which effectively solves such problem causes by traditional skyline query. Given the sliding window W and a continuous query q, q monitors objects in the window. When the window slides, q returns k skyline objects with the largest group dominance size in the window. The key behind existing algorithms is to monitor skyline objects in the current window. When the skyline set is updated, the algorithm updates k representative skyline set. However, the cost of monitoring skyline set is usually high. When the skyline set scale is large, the computational cost of choosing k representative skyline objects is also high. Thus, existing algorithms cannot efficiently work under high-speed stream environment. This study proposes a query named r-approximate k representative skyline query. In order to support this type of queries, a novel framework is proposed named PAKRS (predict-based approximate k representative skyline). Firstly, PAKRS partitions the current window into a group of sub-windows. Next, the predicted result sets of a few future windows are constructed according to the partition result. In this way, the earliest moment can be predicted when new arrived objects may become skyline objects. Secondly, an index is proposed named r-GRID, which can help PAKRS to select r-approximate k representative skyline with O(k/s+k/m) computational cost under 2-dimensional space, and O(2Ld/m+2Ld/s) computational cost under d-dimensional space (d>2), where L is a little integer smaller than k. Theoretical analysis shows that the computational complexity of PAKRS is lower than the state-of-the-art efforts. Extensive experiments have been conducted to confirm the efficiency and effectiveness of the proposed algorithms. Experimental results show that the running time of PAKRS is about 1/4 times of PBA (prefix-based algorithm), algorithm 1/6 times of GA (greedy algorithm) and about 1/3 times of e-GA (e-constraint greedy algorithm).

    Reference
    [1] Lin X, Yuan Y, Zhang Q, et al.Selecting stars:The k most representative skyline operator.In:Proc.of the Int'l Conf.on Data Engineering.Istanbul, 2007.86-95.
    [2] Tao Y, Ding L, Lin X, et al.Distance-based representative skyline.In:Proc.of the Int'l Conf.on Data Engineering.Shanghai, 2009.892-903.
    [3] Sarma AD, Lall A, Nanongkai D, et al.Representative skylines using threshold-basedpreference distributions.In:Proc.of the Int'l Conf.on Data Engineering, Hannover, 2011.387-398.
    [4] Zhou X, Li K, Yang Z, et al.Efficient approaches to k representative G-skyline queries.ACM Trans.on Knowledge Discovery from Data, 2020, 14(5):1-27.
    [5] Bai M, Xin JC, Wang GR, et al.Discovering the k representative skyline over a sliding window.IEEE Trans.on Knowledge and Data Engineering, 2016, 28(8):2041-2056.
    [6] Borzsonyi S, Kossmann D, Stocker K.The skyline operator.In:Proc.of the IEEE Int'l Conf.Tucson, 2001.421-430.
    [7] Chomicki J, Godfrey P, Gryz J, et al.Skyline with presorting:Theory and optimization.In:Proc.of the Int'l Conf.on Intelligent Information Systems.Wroclaw, 2005.216-225.
    [8] Buchta C.On the average number of maxima in a set of vectors.Information Processing Letters, 1989, 33(2):63-65.
    [9] Godfrey P, Shipley R, Gryz J.Maximal vector computation in large data sets.In:Proc.of the Very Large Databases.Trondheim, 2005.229-240.
    [10] Lin X, Yuan Y, Wang W, et al.Stabbing the sky:Efficient skyline computation over sliding windows.In:Proc.of the 21st Int'l Conf.on Data Engineering.Tokyo, 2005.502-513.
    [11] Tao Y, Papadias D.Maintaining sliding window skylines on data streams.IEEE Trans.on Knowledge and Data Engineering, 2006, 18(3):377-391.
    [12] Zuo KZ, Hu P, Wang TC, et al.Privacy-preserving Skyline query protocol in two-tiered sensor networks.Ruan Jian Xue Bao/Journal of Software, 2014, 25:43-45(in Chinese with English abstract).http://www.jos.org.cn/1000-9825/14013.htm
    [13] Tang YF, Chen SP.Improved skyline query algorithm of data streams based on k-d tree index.Journal of Chinese Computer Systems, 2018, 39(3):544-550(in Chinese with English abstract).
    [14] Ren W, Lian X, Ghazinour K.Skyline queries over incomplete data streams.The VLDB Journal, 2019, 28(6):961-985.
    [15] Alami K, Maabout S.A framework for multidimensional skyline queries over streaming data.Data & Knowledge Engineering, 2020, 127:Article No.101792.
    [16] Tian L, Li AP, Zou P, et al.The continuous skyline computation on update data streams.Computer Engineering and Science, 2008, 30(5):59-64(in Chinese with English abstract).
    [17] Zhang L, Zou P, Jia Y, et al.Continuous dynamic skyline queries over data stream.Journal of Computer Research and Development, 2011, 48(1):77-85(in Chinese with English abstract).
    [18] Bai M, Xin JC, Wang GR, et al.Research on dynamic skyline query processing over data streams.Chinese Journal of Computers, 2016, 39(10):2007-2030(in Chinese with English abstract).
    [19] Dimitris P, Tao YF, Greg F, et al.An optimal and progressive algorithm for skyline queries.In:Proc.of the ACM SIGMOD Int'l Conf.on Management of Data.San Diego, 2003.467-478.
    [20] Chan CY, Jagadish HV, Tan KL, et al.Finding k-dominant skylines in high dimensional space.In:Proc.of the ACM SIGMOD Int'l Conf.on Management of Data.Chicago, 2006.27-29.
    [21] Li S, Dou YN, Hao XH, et al.The method of the k-dominant space skyline query in road network.Journal of Computer Research & Development, 2020, 57(1):227-239(in Chinese with English abstract).
    [22] Cui NN, Yang XC, Wang B, et al.Research on authentication of moving k-dominant NN queries.Chinese Journal of Computers, 2018, 41(8):90-107(in Chinese with English abstract).
    [23] Zhu HY, Li XY, Liu Q, Xu ZC.Top-k dominating queries on skyline groups.IEEE Trans.on Knowledge and Data Engineering, 2020, 32(7):1431-1444.
    [24] Wang CP, Wang CK, Guo GY, Ye XJ, Yu P.Efficient computation of G-skyline groups.IEEE Trans.on Knowledge and Data Engineering, 2018, 30(4):674-688.
    [25] Bai M, Wang XT, Li GY, et al.Research on optimization algorithms of k-maximum coverage skyline queries.Chinese Journal of Computers, 2020, 43(12):2276-2297(in Chinese with English abstract).
    [26] Malene SS, Sean C, Ira A.Maximum coverage representative skyIine.In:Proc.of the Int'l Conf.on Extending Database Technology.Bordeaux, 2016.702-703.
    [27] Huang X, Zheng J.Deletion-robust k-coverage queries.In:Proc.of the Database Systems for Advanced Applications.Chiang Mai, 2019.215-219.
    [28] Zhou X, Li KL, Yang ZB, Li KQ.Finding optimal skyline product combinations under price promotion.IEEE Trans.on Knowledge and Data Engineering, 2019, 31(1):138-151.
    [29] Zhou X, Li KL, Xiao GQ, Zhou YT, Li KQ.Top k favorite probabilistic products queries.IEEE Trans.on Knowledge and Data Engineering, 2016, 28(10):2808-2821.
    [30] Gan J, Tao Y.DBSCAN revisited:Mis-claim, un-fixability, and approximation.In:Proc.of the ACM Conf.on Management of Data (SIGMOD).2015.519-530.
    [31] Song FY, Zhu R, Zhang H, et al.Keyword skyline query algorithm over streaming data.Journal of Chinese Computer Systems, 2021, 42(9):2004-2010(in Chinese with English abstract).
    [32] Zhu R, Wang B, Yang X, et al.SAP:Improving continuous top-k queries over streaming data.IEEE Trans.on Knowledge and Data Engineering, 2017, 29(6):1310-1328.
    附中文参考文献
    [12] 左开中, 胡鹏, 王涛春, 等.两层传感器网络隐私保护Skyline查询协议.软件学报, 2014, 25:113-121.http://www.jos.org.cn/1000-9825/14013.htm
    [13] 唐颖峰, 陈世平.利用k-d树索引改进数据流skyline查询算法.小型微型计算机系统, 2018, 39(3):544-550.
    [16] 田李,李爱平, 邹鹏, 等.更新数据流上的连续Skyline计算.计算机工程与科学, 2008, 30(5):59-64.
    [17] 张丽,邹鹏, 贾焰, 等.数据流上连续动态skyline查询研究.计算机研究与发展, 2011, 48(1):77-85.
    [18] 白梅,信俊昌, 王国仁, 等.数据流上动态轮廓查询处理技术的研究.计算机学报, 2016, 39(10):2007-2030.
    [21] 李松,窦雅男, 郝晓红, 等.道路网环境下k-支配区域Skyline查询方法.计算机研究与发展, 2020, 57(1):227-239.
    [22] 崔宁宁, 杨晓春, 王斌, 等.移动k-支配最近邻查询验证研究.计算机学报, 2018, 41(8):90-107.
    [25] 白梅,王习特, 李冠宇, 等.基于最大覆盖的代表Skyline问题的优化算法研究.计算机学报, 2020, 43(12):2276-2297.
    [31] 宋栿尧, 朱睿, 张豪, 等.数据流环境下的关键词轮廓查询算法.小型微型计算机系统, 2021, 42(9):2004-2010.
    Cited by
    Comments
    Comments
    分享到微博
    Submit
Get Citation

朱睿,宋栿尧,王斌,杨晓春,张安珍,夏秀峰.高速流环境下近似连续k代表轮廓查询算法.软件学报,2023,34(3):1425-1450

Copy
Share
Article Metrics
  • Abstract:366
  • PDF: 1757
  • HTML: 1343
  • Cited by: 0
History
  • Received:November 25,2021
  • Revised:April 27,2022
  • Online: March 10,2023
  • Published: March 06,2023
You are the first2044637Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063