• Article
  • | |
  • Metrics
  • |
  • Reference [26]
  • |
  • Related [20]
  • |
  • Cited by [1]
  • | |
  • Comments
    Abstract:

    The computation of data cubes usually produces huge outputs. There are two popular methods to solve this problem: Iceberg cube and closed cube, which can be combined together. Due to the importance and usability of closed iceberg cube, how to efficiently compute it becomes a key research issue. A cache-conscious computation method is proposed in this paper. The data are aggregated in a bottom-up manner. In the meantime, the closed cells covering the aggregate cells are discovered and output. Two pruning strategies are used to save unnecessary recursive calls. The Apriori pruning is utilized to support iceberg cube computation. To reduce the number of memory-related stalls and produce the aggregate results efficiently, multiple dimensions are pre-sorted and the software prefetching technology is introduced into data scans. A comprehensive and detailed performance study is conducted on both synthetic data and real data sets. The results show that the proposed closed iceberg cube computation method is efficient and effective.

    Reference
    [1] Gray J, Chaudhuri S, Bosworth A, Layman A, Reichart D, Venkatrao M, Pellow F, Pirahesh H. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, 1997,1(1):29-53. [doi: 10.1023/A:1009726021843]
    [2] Zhao Y, Deshpande PM, Naughton JF. An array-based algorithm for simultaneous multidimensional aggregates. In: Peckham J, ed. Proc. of the ACM SIGMOD Int’l Conf. on Management of Data. New York: ACM Press, 1997. 159-170.
    [3] Beyer K, Ramakrishnan R. Bottom-Up computation of sparse and iceberg CUBEs. In: Delis A, Faloutsos C, Ghandeharizadeh S, eds. Proc. of the ACM SIGMOD Int’l Conf. on Management of Data. New York: ACM Press, 1999. 359-370.
    [4] Xin D, Han JW, Li XL, Wah BW. Star-Cubing: Computing iceberg cubes by top-down and bottom-up integration. In: Freytag JC, Lockemann PC, Abiteboul S, Carey MJ, Selinger PG, Heuer A, eds. Proc. of the 29th Int’l Conf. on Very Large Data Bases. San Francisco: Morgan Kaufmann Publishers, 2003. 476-487.
    [5] Shao Z, Han JW, Xin D. MM-Cubing: Computing iceberg cubes by factorizing the lattice space. In: Hatzopoulos M, Manolopoulos Y, eds. Proc. of the 16th Int’l Conf. on Scientific and Statistical Database Management. Washington: IEEE Computer Society, 2004. 213-222.
    [6] Hurtado CA, Mendelzon AO, Vaisman AA. Maintaining data cubes under dimension updates. In: Kitsuregawa M, Maciaszak L, Papazoglou M, Pu C, eds. Proc. of the 15th Int’l Conf. on Data Engineering. Washington: IEEE Computer Society, 1999. 346-355.
    [7] Lee KY, Kim MH. Efficient incremental maintenance of data cubes. In: Dayal U, Whang KY, Lomet DB, Alonso G, Lohman GM, Kersten ML, Cha SK, Kim YK, eds. Proc. of the 32nd Int’l Conf. on Very Large Data Bases. New York: ACM Press, 2006. 823-833.
    [8] Barbara D, Sullivan M. Quasi-Cubes: Exploiting approximations in multidimensional databases. SIGMOD Record, 1997,26(3): 12-17. [doi: 10.1145/262762.262764]
    [9] Shanmugasundaram J, Fayyad U, Bradley PS. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In: Proc. of the ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining. New York: ACM Press, 1999. 223-232. http://www.informatik.uni-trier.de/~ley/db/conf/kdd/kdd99.html
    [10] Wang W, Feng JL, Lu HJ, Yu JX. Condensed cube: An effective approach to reducing data cube size. In: Agrawal R, Dittrich K, Ngu AH, eds. Proc. of the 18th Int’l Conf. on Data Engineering. Washington: IEEE Computer Society, 2002. 155-165.
    [11] Sismanis Y, Deligiannakis A, Roussopoulos N, Kotidis Y. Dwarf: Shrinking the PetaCube. In: Franklin MJ, Moon B, Ailamaki A, eds. Proc. of the ACM SIGMOD Int’l Conf. on Management of Data. New York: ACM Press, 2002. 464-475.
    [12] Lakshmanan LVS, Pei J, Han JW. Quotient cube: How to summarize the semantics of a data cube. In: Bernstein PA, Ioannidis Y, Ramakrishnan R, eds. Proc. of the 28th Int’l Conf. on Very Large Data Bases. San Francisco: Morgan Kaufmann Publishers, 2002. 778-789.
    [13] Lakshmanan LVS, Pei J, Zhao Y. QC-Trees: An efficient summary structure for semantic OLAP. In: Halevy AY, Ives ZG, Doan AH, eds. Proc. of the ACM SIGMOD Int’l Conf. on Management of Data. New York: ACM Press, 2003. 64-75.
    [14] Xin D, Shao Z, Han JW, Liu HY. C-Cubing: Efficient computation of closed cubes by aggregation-based checking. In: Liu L, Reuter A, Whang KY, Zhang JJ, eds. Proc. of the 22nd Int’l Conf. on Data Engineering. Washington: IEEE Computer Society, 2006.
    [15] Balmin A, Papadimitriou T, Papakonstantinou Y. Hypothetical queries in an OLAP environment. In: Abbadi AE, Brodie ML, Chakravarthy S, Dayal U, Kamel N, Schlageter G, Whang KY, eds. Proc. of the 26th Int’l Conf. on Very Large Data Bases. San Francisco: Morgan Kaufmann Publishers, 2000. 220-231.
    [16] Lakshmanan LVS, Russakovshy A, Sashikanth V. What-If OLAP queries with changing dimensions. In: Proc. of the 24th Int’l Conf. on Data Engineering. 2008. 1334-1336. http://www.informatik.uni-trier.de/~ley/db/conf/icde/icde2008.html
    [17] Hennessy JL, Patterson DA. Computer Architecture: A Quantitative Approach. 3rd ed., San Francisco: Morgan Kaufmann Publishers, 2002.
    [18] Boncz PA, Manegold S, Kersten ML. Database architecture optimized for the new bottleneck: memory access. In: Atkinson MP, Orlowska ME, Valduriez P, Zdonik SB, Brodie ML, eds. Proc. of the 25th Int’l Conf. on Very Large Data Bases. San Francisco: Morgan Kaufmann Publishers, 1999. 54-65.
    [19] Rao J, Ross KA. Making B+-Trees cache conscious in main memory. In: Chen WD, Naughton JF, Bernstein PA, eds. Proc. of the ACM SIGMOD Int’l Conf. on Management of Data. New York: ACM Press, 2000. 475-486.
    [20] Chen S, Gibbons PB, Mowry TC. Improving index performance through prefetching. In: Aref WG, ed. Proc. of the ACM SIGMOD Int’l Conf. on Management of Data. New York: ACM Press, 2001. 235-246.
    [21] Chen S, Ailamaki A, Gibbons P, Mowry T. Improving hash join performance through prefetching. In: Proc. of the 20th Int’l Conf. on Data Engineering. Washington: IEEE Computer Society, 2004. 116-127. http://www.informatik.uni-trier.de/~ley/db/conf/icde/ icde2004.html
    [22] Ailamaki A, DeWitt DJ, Hill MD, Wood DA. DBMSs on a modern processor: Where does time go? In: Atkinson MP, Orlowska ME, Valduriez P, Zdonik SB, Brodie ML, eds. Proc. of the 25th Int’l Conf. on Very Large Data Bases. San Francisco: Morgan Kaufmann Publishers, 1999. 266-277.
    [23] IA-32 Intel architecture optimization reference manual. 2005. http://srl.cs.jhu.edu/manuals/24896612.pdf
    [24] Hahn C, Warren S. Extended edited synoptic cloud reports from ships and land stations over the globe. 1952~1996. 1999. http://cdiac.esd.ornl.gov/ftp/ndp026c/ndp026c.txt
    [25] Liu DW, Luan H, Wang S, Qin B. Main memory database TPC-H workload characterization on modern processor. Journal of Software, 2008,19(10):2573-2584 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/19/2573.htm [doi: 10.3724/SP.J.1001.2008.02573]
    附中文参考文献: [25] 刘大为,栾华,王珊,覃飙.内存数据库在TPC-H负载下的处理器性能.软件学报,2008,19(10):2573-2584. http://www.jos.org.cn/ 1000-9825/19/2573.htm [doi: 10.3724/SP.J.1001.2008.02573]
    Comments
    Comments
    分享到微博
    Submit
Get Citation

栾华,杜小勇,王珊.缓存敏感的封闭冰山立方体计算.软件学报,2010,21(4):620-631

Copy
Share
Article Metrics
  • Abstract:5072
  • PDF: 11216
  • HTML: 0
  • Cited by: 0
History
  • Received:May 14,2008
  • Revised:October 09,2008
You are the first2042819Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063