Abstract: Cloud computing has been shown to provide a cost-effective and powerful platform for big data processing. Under this paradigm, a data manager (DM) usually rents geographically distributed datacenters to process its geographically dispersed data sets, for reasons of convenience and economy. Because the data sets are usually generated dynamically and resource prices vary over time, moving the data from different geographic locations to different datacenters while provisioning suitable computation resources for processing becomes a critical cost-effectiveness issue. In this paper, a joint stochastic optimization problem is first formulated, and the problem is then decoupled into two independent subproblems with efficient solutions via the Lyapunov framework. Next, an online algorithm based on these solutions is developed. Theoretical analysis shows that the proposed online algorithm can produce a solution that is arbitrarily close to the offline optimal solution while minimizing data processing delays. Experiments on the WorldCup98 and YouTube data sets validate the proposed algorithms and demonstrate the superiority of the new approach.