Method for Top-K Query on Big Data in Cloud
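Authors: Ci Xiang, Ma Youzhong, Meng Xiaofeng
Funding:

National Natural Science Foundation of China (61379050, 91224008); National High Technology Research and Development Program of China (863 Program) (2013AA013204); Specialized Research Fund for the Doctoral Program of Higher Education of China (20130004130001)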


    Abstract:

    Top-K queries are widely used in modern applications such as search engines and e-commerce. A Top-K query returns, from massive data, the K results that best match the user's needs; its main purpose is to eliminate the negative effects of information overload. Top-K queries on big data bring new challenges to data management and analysis. Taking the characteristics of MapReduce into account, this paper presents an in-depth study of Top-K query processing on big data in the cloud from the perspectives of data partitioning and data filtering. Experimental results show that the proposed approaches have good performance and scalability.
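The abstract names data partitioning and data filtering as the two levers for Top-K processing under MapReduce but does not describe the algorithms. The sketch below is therefore a generic illustration, not the authors' method. It assumes records are `(score, item_id)` pairs where a higher score is better. Each partition is treated as a map task that emits only its local top-K, and a reducer merges those lists, which is exact because any record outside its partition's local top-K already has K better records in that partition. For filtering, it substitutes a textbook upper-bound check: partitions are visited in descending order of their maximum score, and processing stops once a partition's maximum falls below the current K-th best score. The function names, the block partitioning in the demo, and the per-partition maximum summary are all hypothetical choices made for this illustration.

```python
import heapq
import random
from typing import List, Sequence, Tuple

# A record is (score, item_id); a higher score means a better match for the query.
Record = Tuple[int, str]


def map_local_top_k(partition: Sequence[Record], k: int) -> List[Record]:
    """Map side (hypothetical): each partition emits only its own K best records."""
    return heapq.nlargest(k, partition)


def reduce_global_top_k(local_lists: Sequence[Sequence[Record]], k: int) -> List[Record]:
    """Reduce side (hypothetical): the global top-K lies in the union of local top-K lists."""
    return heapq.nlargest(k, (rec for lst in local_lists for rec in lst))


def top_k_with_filtering(partitions: Sequence[Sequence[Record]], k: int) -> Tuple[List[Record], int]:
    """Visit partitions by descending max score and stop once no remaining partition
    can hold a record better than the current K-th best. Returns (top-K, partitions visited)."""
    if k <= 0:
        return [], 0
    # Assumed per-partition summary: the maximum score, used as an upper bound.
    summaries = sorted(((max(p)[0], i) for i, p in enumerate(partitions) if p), reverse=True)
    heap: List[Record] = []  # min-heap holding the best K records seen so far
    visited = 0
    for part_max, idx in summaries:
        if len(heap) == k and part_max < heap[0][0]:
            break  # all remaining partitions have an upper bound below the K-th best score
        visited += 1
        for rec in map_local_top_k(partitions[idx], k):
            if len(heap) < k:
                heapq.heappush(heap, rec)
            elif rec > heap[0]:
                heapq.heapreplace(heap, rec)  # evict the current K-th best
    return sorted(heap, reverse=True), visited


if __name__ == "__main__":
    rng = random.Random(7)
    records = [(rng.randint(0, 10_000), f"item-{i}") for i in range(1_000)]
    # Hypothetical block partitioning, standing in for the input splits of a MapReduce job.
    parts = [records[i:i + 100] for i in range(0, len(records), 100)]
    k = 5
    merged = reduce_global_top_k([map_local_top_k(p, k) for p in parts], k)
    filtered, visited = top_k_with_filtering(parts, k)
    print(merged == filtered, f"visited {visited} of {len(parts)} partitions")
```

The filtering step saves the most work when partition maxima differ widely, for example when data is range-partitioned by score. With uniformly shuffled partitions, nearly every partition may still have to be visited, and only the local top-K pruning helps.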

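Cite this article

Ci Xiang, Ma Youzhong, Meng Xiaofeng. Method for Top-K query on big data in cloud. Journal of Software (Ruan Jian Xue Bao), 2014,25(4):813-825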

History
  • Received: 2013-09-10
  • Last revised: 2013-12-18
  • Published online: 2014-03-28