Abstract:Scientific workflow is a complicated data intensive application. How to achieve an effective data placement schema in hybrid cloud environment has become a crucial issue nowadays, especially with the new challenges brought by the security issues. Traditional data placement strategies usually adopt load balancing-based partition model to allocate datasets. Although these data placement schemas can have good performance in load balancing, their data transfer time may not be optimal. In contrast to traditional strategies, this paper focuses on the hybrid cloud environment and proposes a data dependency destruction-based partition model to achieve the minimal data dependency destruction partition. In addition, it presents a novel datacenter-oriented data placement strategy. This strategy allocates high dependency datasets to one datacenter according to the new partition model and thus significantly reduces data transfer time between datacenters. Experimental results show that the proposed strategy can effectively reduce data transfer time during workflow's execution.