Abstract:Integrating big data and traditional data warehouse (DW) techniques bring demand for real-time big data analysis. The new demand means DW can not depend too much on the optimization such as materialization and indexing which consume large space, but instead needs to enhance ability of real-time analysis to handle big data analysis which usually issues complex queries on huge data volumes. Those queries usually consist in applying group or aggregation operator on the join result between fact table and dimension table(s). The join and group operation often are the bottle-necks for performance improvement. This paper studies the OLAP performance under the new hardware platform and big data environment, and develops a new OLAP query execution engine in columnar storage, called CDDTA-MMDB (columnar direct dimensional tuple access for main memory database query execution engine). The optimized materialization makes CDDTA-MMDB reduce access to base table and intermediate data structure during join procedure. CDDTA- MMDB decomposes the query into sub-queries on the fact table and dimension table respectively. If the sub-query on dimension table only serves as filter, it will produce the binary tuple <surrogate,Boolean_value>; otherwise, it will produce the triplet in the form of <surrogate,key,value>. Thus, by just scanning the fact table one-pass and utilizing the mapping function of foreign keys in fact table to directly access the binary tuples or triplets, the executor can accomplish the join, filter and group operations. Consideration is fully placed on the design principle for the main-memory columnar database. Experimental results show that the system is efficient and can be 2.5 times faster than MonetDB 5.5 and 5 times faster than invisible join used by C-store. Moreover, it scales linearly on multi-core processors.