Research Advance on MapReduce Based Big Data Processing Platforms and Algorithms

CLC Number: TP311

    Abstract:

    This paper surveys research advances in MapReduce based big data processing platforms and algorithms. First, twelve typical MapReduce based data processing platforms are described, their implementation principles and application areas are compared, and their commonalities are summarized. Second, MapReduce based big data processing algorithms are studied, including search algorithms, data cleansing/transformation algorithms, aggregation algorithms, join algorithms, sorting algorithms, optimization algorithms, preference query algorithms, graph algorithms, and data mining algorithms. These algorithms are classified by their MapReduce implementations, and the factors that affect their performance are analyzed. Finally, big data processing algorithms are abstracted as out-of-core algorithms, and the performance characteristics of such algorithms are analyzed. Considerations, ideas, and challenges for general performance optimizations of out-of-core algorithms are proposed as references for researchers. These optimizations include reducing an algorithm's I/O cost, improving its locality, and designing incremental iterative algorithms. Compared with current research topics, such as dynamic platform optimizations based on resource allocation and task scheduling, parallelization of specific algorithms, and performance optimization of iterative algorithms, the proposed static optimizations serve as a complement and highlight new directions for researchers.
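    As a point of orientation for the aggregation algorithms mentioned in the abstract, the following is a minimal single-process sketch of the MapReduce programming model (map, shuffle, reduce) applied to word counting. It is not taken from the paper and does not use any real platform API; the function names `map_phase`, `shuffle`, `reduce_phase`, `run_job`, `word_count_mapper`, and `sum_reducer` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all emitted values by key, as a platform's shuffle stage would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the list of values for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    """Run map, shuffle, and reduce in sequence inside one process."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical aggregation job: count word occurrences.
def word_count_mapper(line):
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, values):
    return sum(values)


if __name__ == "__main__":
    lines = ["big data", "Big platforms"]
    print(run_job(lines, word_count_mapper, sum_reducer))
```

    On a real platform the three stages run on different machines and the shuffle moves data over disk and network, which is where the I/O cost and locality concerns raised in the abstract come in; this sketch keeps everything in memory and ignores those costs.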

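Citation

Song J, Sun ZZ, Mao KM, Bao YB, Yu G. Research advance on MapReduce based big data processing platforms and algorithms. Ruan Jian Xue Bao/Journal of Software, 2017,28(3):514-543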

History
  • Received: August 01, 2016
  • Revised: September 14, 2016
  • Online: June 06, 2018