Abstract: Driven by big data and cloud computing, the scale of IT expenditure keeps growing, and the energy consumption problem has become increasingly urgent. Studies show that low resource utilization and long idle times of network nodes are responsible for this problem in large-scale distributed systems. This paper studies energy consumption optimization for MapReduce systems. Traditional optimization approaches rely on workload concentration, live task migration, or dynamic power on/off. In a MapReduce system, however, a node not only executes tasks but also stores and serves data, so it cannot simply be shut down to save energy once its tasks have been migrated. This paper proposes that good data placement can optimize the energy consumption of a MapReduce system. Based on this idea, the data placement objective that optimizes energy consumption is defined, and the data placement algorithm that achieves this objective is proved efficient in theory. Finally, three MapReduce systems with different data placement algorithms are deployed on a heterogeneous MapReduce system. A comparison of the energy consumption of the three systems under three typical job types (CPU-intensive, I/O-intensive, and interactive) shows that the proposed data placement algorithm optimizes the energy consumption of a MapReduce system. The effectiveness of the proposed approach is thus supported both in theory and by experiment, demonstrating its ability to facilitate applications in energy-efficient computing and big data analysis.