一个基于三元组存储的列式OLAP查询执行引擎

doi:10.13328/j.cnki.jos.004568

微信服务号

微信订阅号

2025年4月17日 1:26 星期四

首页 > 过刊浏览>2014年第25卷第4期 >753-767. DOI:10.13328/j.cnki.jos.004568

PDF HTML阅读 XML下载导出引用引用提醒

一个基于三元组存储的列式OLAP查询执行引擎
DOI:
                        10.13328/j.cnki.jos.004568
                    
CSTR:
                        
                    
作者:
                        朱阅岸朱阅岸
数据工程与知识工程教育部重点实验室(中国人民大学), 北京 100872;中国人民大学 信息学院, 北京 100872
在期刊界中查找
在百度中查找
在本站中查找
张延松张延松
数据工程与知识工程教育部重点实验室(中国人民大学), 北京 100872;中国人民大学 信息学院, 北京 100872;中国人民大学 中国调查与数据中心, 北京 100872
在期刊界中查找
在百度中查找
在本站中查找
周烜周烜
数据工程与知识工程教育部重点实验室(中国人民大学), 北京 100872;中国人民大学 信息学院, 北京 100872
在期刊界中查找
在百度中查找
在本站中查找
王珊王珊
数据工程与知识工程教育部重点实验室(中国人民大学), 北京 100872;中国人民大学 信息学院, 北京 100872
在期刊界中查找
在百度中查找
在本站中查找

                    
作者单位:
作者简介:
通讯作者:
中图分类号:
基金项目:国家科技重大专项（核高基）（2010ZX01042-001-002）；国家自然科学基金（61272138，61232007）；中国人民大学研究生科学研究基金（13XNH216）

Column-Oriented Query Execution Engine for OLAP Based on Triplet

Author:

ZHU Yue-An
ZHU Yue-An
Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education (Renmin University of China, Beijing 100872, China;School of Information, Renmin University of China, Beijing 100872, China
在期刊界中查找
在百度中查找
在本站中查找
ZHANG Yan-Song
ZHANG Yan-Song
Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education (Renmin University of China, Beijing 100872, China;School of Information, Renmin University of China, Beijing 100872, China;National Survey Research Center at Renmin University of China, Beijing 100872, China
在期刊界中查找
在百度中查找
在本站中查找
ZHOU Xuan
ZHOU Xuan
Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education (Renmin University of China, Beijing 100872, China;School of Information, Renmin University of China, Beijing 100872, China
在期刊界中查找
在百度中查找
在本站中查找
WANG Shan
WANG Shan
Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education (Renmin University of China, Beijing 100872, China;School of Information, Renmin University of China, Beijing 100872, China
在期刊界中查找
在百度中查找
在本站中查找

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献 [26]

相似文献 [20]

引证文献

资源附件

文章评论

摘要:

大数据与传统的数据仓库技术相结合产生了大数据实时分析处理需要（volume+velocity），它要求大数据背景下的数据仓库不能过多地依赖物化、索引等高存储代价的优化技术，而要提高实时处理能力来应对大数据分析中数据量大、查询分析复杂等特点.这些查询分析操作一般表现为在事实表和维表之间连接操作的基础上对结果集上进行分组聚集等操作.因此，表连接和分组聚集操作是ROLAP（relational OLAP）性能的两个重要决定因素.研究了新硬件平台下针对大规模数据的OLAP查询的性能，设计新的列存储OLAP查询执行引擎CDDTA-MMDB（columnar direct dimensional tuple access-main memory databasequeryexecutionengine，直接维表元组访问的内存数据库查询执行引擎）.基于三元组的物化策略，使得CDDTA-MMDB能够减少内存列存储模型上表连接操作访问基表和中间数据结构的次数.首先，CDDTA-MMDB将查询分解为作用在维表和事实表上的子查询，如果只涉及过滤操作，子查询将生成<代理键，布尔值>二元组；否则，子查询生成<代理键，关键字，值>三元组.然后，只需一趟扫描事实表，利用事实表的外键映射函数直接定位相应三元组或者二元组，完成相应的过滤、连接或聚集操作.CDDTA-MMDB充分考虑了内存列存储数据库的设计原则，尽量减少随机内存访问.实验结果表明：CDDTA-MMDB是高效的，与具代表性的列存储数据库相比，比MonetDB 5.5快2.5倍，比C-store的invisible join快5倍；并且，CDDTA-MMDB在多核处理器上具有线性加速比.

关键词:大数据分析;联机分析处理;内存列存储数据库;表连接算法;物化策略

Abstract:

Integrating big data and traditional data warehouse (DW) techniques bring demand for real-time big data analysis. The new demand means DW can not depend too much on the optimization such as materialization and indexing which consume large space, but instead needs to enhance ability of real-time analysis to handle big data analysis which usually issues complex queries on huge data volumes. Those queries usually consist in applying group or aggregation operator on the join result between fact table and dimension table(s). The join and group operation often are the bottle-necks for performance improvement. This paper studies the OLAP performance under the new hardware platform and big data environment, and develops a new OLAP query execution engine in columnar storage, called CDDTA-MMDB (columnar direct dimensional tuple access for main memory database query execution engine). The optimized materialization makes CDDTA-MMDB reduce access to base table and intermediate data structure during join procedure. CDDTA- MMDB decomposes the query into sub-queries on the fact table and dimension table respectively. If the sub-query on dimension table only serves as filter, it will produce the binary tuple <surrogate,Boolean_value>; otherwise, it will produce the triplet in the form of <surrogate,key,value>. Thus, by just scanning the fact table one-pass and utilizing the mapping function of foreign keys in fact table to directly access the binary tuples or triplets, the executor can accomplish the join, filter and group operations. Consideration is fully placed on the design principle for the main-memory columnar database. Experimental results show that the system is efficient and can be 2.5 times faster than MonetDB 5.5 and 5 times faster than invisible join used by C-store. Moreover, it scales linearly on multi-core processors.

Key words:big data analysis;OLAP;main-memory columnar database;join algorithm;materialization

参考文献

[1] Douglas L. The Importance of ‘Big Data': A Definition. Gartner, G00235055, 2012.

[2] Boncz P, Zukowski M, Nes N. MonetDB/X100: Hyper-Pipelining query execution. In: Ailamaki A, ed. Proc. of the CIDR 2005. Asilomar, 2005. 225-237.

[3] Boncz P, Grust T, van Keulen M, Manegold S, Rittinger J, Teubner J. MonetDB/XQuery: A fast xquery processor powered by a relational engine. In: Chaudhuri S, Hristidis V, Polyzotis N, eds. Proc. of the ACM SIGMOD Int'l Conf. on Management of Data. Chicago: ACM Press, 2006. 479-490. [doi: 10.1145/1142473.1142527]

[4] Larson PA, Hanson EN, Price SL. Columnar storage in SQL server 2012. In: Kementsietsidis A, Vaz Salles MN, eds. Proc. of the IEEE 28th Int'l Conf. on Data Engineering (ICDE 2012).Washington: IEEE Computer Society, 2012. 15-20.

[5] Boncz PA, Kersten ML. MIL primitives for querying a fragmented world. In: Atkinson MP, Orlowska ME, Valduriez P, Zdonik SB, Brodie ML, eds. Proc. of the 25th Int'l Conf. on Very Large Data Bases (VLDB'99). Edinburgh: Morgan Kaufmann Publishers, 1999. 101-119.

[6] Plattner H. A common database approach for OLTP and OLAP using an in-memory column database. In: Çetintemel U, Zdonik SB, Kossmann D, Tatbul N, eds. Proc. of the ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD 2009). Rhode Island: ACM Press, 2009. 1-2. [doi: 10.1145/1559845.1559846]

[7] Sikka V, Färber F, Lehner W, Cha SK, Peh T, Bornhövd C. Efficient transaction processing in SAP HANA database—The end of a column store myth. In: Candan KS, Chen Y, Snodgrass RT, Gravano L, Fuxman A, eds. Proc. of the ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD 2012). Scottsdale: ACM Press, 2012. 731-741. [doi: 10.1145/2213836.2213946]

[8] Zhang YS, Jiao M, Wang ZW, Wang S, Zhou X. One-Size-Fits-All OLAP technique for big data analysis. Chinese Journal of Computers, 2011,34(10):1936-1946 (in Chinese with English abstract). [doi: 10.3724/SP.J.1016.2011.01936]

[9] Wang HJ, Qin XP, Zhang YS, Wang S, Wang ZW. LinearDB: A relational approach to make data warehouse scale like MapReduce. In: Yu JX, Kim MH, Unland R, eds. Proc. of the 16th Int'l Conf. on Database Systems for Advanced Applications (DASFAA 2011). Lecture Notes in Computer Science, Hong Kong, 2011. 306-320. [doi: 10.1007/978-3-642-20152-3_23]

[10] Wang S, Wang HJ, Qin XP, Zhou X. Architecting big data: Challenges, studies and forecasts. Chinese Journal of Computers, 2011, 34(10):1742-1752 (in Chinese with English abstract). [doi: 10.3724/SP.J.1016.2011.01741]

[11] Copeland GP, Khoshaian S. A decomposition storage mode. In: Navathe SB, ed. Proc. of the 1985 ACM SIGMOD Int'l Conf. on Management of Data. Austin: ACM Press, 1985. 268-279. [doi: 10.1145/318898.318923]

[12] Boncz P, Kersten ML, Manegold S. Breaking the memory wall in MonetDB. Communications of the ACM, 2008,51:77-85. [doi: 10.1145/1409360.1409380]

[13] Stonebraker M, Abadi DJ, Batkin A, Chen XD, Cherniack M, Ferreira M, Lau E, Lin A, Madden S, O'Neil E, O'Neil P, Rasin A, Tran T, Zdonik S. C-Store: A column-oriented DBMS. In: Böhm K, Jensen CS, Haas LM, Kersten ML, Larson PA, Ooi BC, eds. Proc. of the 31st Int'l Conf. on Very Large Data Bases. Trondheim: ACM Press, 2005. 553-564.

[14] Selinger PG, Astrahan MM, Chamberlin DD, Lorie RA, Price TG. Access path selection in a relational database. In: Bernstein PA, ed. Proc. of the 1979 ACM SIGMOD Int'l Conf. on Management of Data. Boston: ACM Press, 1979. 23-34.

[15] Hong W, Stonebraker M. Exploiting InteroperatorParallelism in XPRS. In: Stonebraker M, ed. Proc. of the 1992 ACM SIGMOD Int'l Conf. on Management of Data. San Diego: ACM Press, 1992. 19-28.

[16] Abadi DJ, Madden SR, Hachem N. Column-Stores vs. row-stores: How different are they really. In: Wang JTL, ed. Proc. of the ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD 2008). Vancouver: ACM Press, 2008. 967-980. [doi: 10. 1145/1376616.1376712]

[17] Abadi DJ, Myers DS, DeWitt DJ, Madden SR. Materialization strategies in a column-oriented DBMS. In: Chirkova R, Dogac A, Özsu MT, Sellis TK, eds. Proc. of the 23rd Int'l Conf. on Data Engineering (ICDE 2007). Istanbul: IEEE, 2007. 466-475. [doi: 10. 1109/ICDE.2007.367892]

[18] Halverson A, Beckmann J, Naughton J. A comparisonof C-store and row-store in a common framework. Technical Report, TR1566, UW Madison Department of CS, 2006.

[19] Harizopoulos S, Liang V, Abadi DJ, Madden S. Performance tradeoffs in read-optimized databases. In: Dayal U, Whang KY, Lomet DB, Alonso G, Lohman GM, Kersten ML, Cha SK, Kim YK, eds. Proc. of the 32nd Int'l Conf. on Very Large Data Bases. Seoul: ACM Press, 2006. 487-498.

[20] Harizopoulos S, Abadi D, Madden S, Stonebraker M. OLTP through the looking glass, and what we found there. In: Wang JTL, ed. Proc. of the ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD 2008). ACM Press, 2008. 981-992. [doi: 10.1145/ 1376616.1376713]

[21] DeBrabant J, Pavlo A, Tu S, Stonebraker M, Zdonik S. Anti-Caching: A new approach to database management system architecture. Proc. of the VLDB Endowment, 2013,6(14):1942-1953.

[22] Youssefiand K, Wong E. Query processing in a relational database Management system. In: Furtado AL, Morgan HL, eds. Proc. of the 5th Int'l Conf. on Very Large Data Bases. Rio de Janeiro: IEEE, 1979. 409-417.

[23] Held GD, Stonebraker M, Wong E. INGRES—A relational data base management system. In: Proc. of the 1975 National Computer Conf. on American Federation of Information Processing Societies. Anaheim: AFIPS Press, 1975. 409-417. http://dl.acm.org/ citation.cfm?doid=1499949.1500029

[24] Garcia-Molina H, Ullman JD, Widom J. Database System Implementation. Upper Saddle River: Prentice Hall, 2000.

[25] O'Neil PE, O'Neil EJ, Chen X. The star schema benchmark (SSB). 2009. http://www.cs.umb.edu/-poneil/StarSchemaB.PDF

[26] O'Neil PE, Chen X, O'Neil EJ. Adjoined dimension column index to improve star schema query performance. In: Alonso G, Blakeley JA, Chen ALP, eds. Proc. of the 24th Int'l Conf. on Data Engineering (ICDE 2008). Cancún: IEEE, 2008. 1409-1411.

引用本文

朱阅岸,张延松,周烜,王珊.一个基于三元组存储的列式OLAP查询执行引擎.软件学报,2014,25(4):753-767

复制

文章指标

点击次数:6363
下载次数: 8105
HTML阅读次数: 2557
引用次数: 0

历史

收稿日期:2013-10-13
最后修改日期:2014-01-27
录用日期:
在线发布日期: 2014-03-28
出版日期:

微信服务号

微信订阅号

引用本文

分享

文章指标

历史

文章二维码

微信服务号

微信订阅号

引用本文

分享

微信扫一扫：分享

文章指标

历史

文章二维码