Supported by the National Natural Science Foundation of China under Grant No.60703014 (国家自然科学基金); the National Basic Research Program of China under Grant No.G2011CB302605 (国家重点基础研究发展计划(973))
Parallel Job Performance Prediction Based on the Case Reconstruction
Accurate prediction of the running time of parallel jobs under different computing resources is the foundation of many job scheduling approaches. A job performance prediction method based on the Performance Skeleton is proposed to avoid the inaccuracy of historical and modeling analysis prediction methods in heterogeneous clusters. To record the running trace, a method is designed to access all communication traces during the runtime. To merge these traces, this paper designs a trace-merge algorithm to structure the communication traces. To compress the circulatory traces, which is the most central and difficult, this paper converts it into a circular sub-string compressing problem, and proposes an algorithm based on the suffix array. Its performance is theoretically and practically better than the existing algorithms. To automatically reconstruct the Performance Skeleton, it solves the scalable problem of calculation and communication time. Experimental results show that these methods can accurately estimate the running time of computing jobs. The error is less than 3% for homogeneous clusters, and 10% for heterogeneous clusters.
[1] Foster I. The grid: A new infrastructure for 21st century Science. Physics Today, 2002,55(2):42-47.
[2] Zhang W, Fang B, Hu M, Liu X, Zhang H, Gao L. Multisite co-Allocation scheduling algorithms for parallel jobs in computing grid environments. Science in China Series F?Information Sciences, 2006,49(6):906-926.
[3] Gao X, Snavely A, Carter L. Path grammar guided trace compression and trace approximation. In: Proc. of the 15th IEEE Int’l Symp. on High Performance Distributed Computing (HPDC-15). 2006. 57-68.
[4] Cardwell N, Savage S, Anderson T. Modeling TCP latency. In: Proc. of the IEEE INFOCOM 2000. 2000. 1742-1751.
[5] Sodhi S, Subhlok J. Skeleton based performance prediction on shared networks. In: Proc. of the 4th IEEE Symp. on Cluster Computing and the Grid (CCGrid 2004). Washington: IEEE Computer Press, 2004. 723-730.
[6] Cole M. Algorithmic Skeletons: Structured Management of Parallel Computations. Pitman/MIT Press, 1989. 1-42.
[7] Dikaiakos M, Rogers A, Steiglitz K. Fast: A functional algorithm simulation testbed. In: Proc. of the Int’l Conf. on Parallel and Distributed Systems. 1993.
[8] Dinda P, O’Hallaron D. An evaluation of linear models for host load prediction. In: Proc. of the 8th IEEE Int’l Symp. on High Performance Distributed Computing. 1999.
[9] Xu Q, Subhlok J. Efficient discovery of loop nests in communication traces of parallel programs. Technical Report, UH-CS-08-08, University of Houston, 2008.
[10] Xu Q, Subhlok J. Construction and evaluation of coordinated performance skeletons. In: Proc. of the 15th High Performance Computing (HiPC). 2008.
[11] Xu Q. Automatic construction of coordinated performance skeletons [Ph.D. Thesis]. University of Houston, 2007.
[12] Lu C, Reed DA. Compact application signatures for parallel and distributed scientific codes. In: Proc. of the Supercomputing 2002. 2002.
[13] Sherwood T, Perelman E, Hamerly G, Calder B. Automatically haracterizing large scale program behavior. In: Proc. of the10th Int’l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X). 2002.
[15] K?rkk?inen J, Sanders P, Burkhardt S. Linear work suffix array construction. Journal of the ACM, 2006,53(6):918-936.
[16] Bender MA, Farach-Colton M. The LCA problem revisited. In: Proc. of the Latin American Theoretical Informatics. 2000. 88-94.
[17] Fischer J, Heun V. A new succinct representation of RMQ-information and improvements in the enhanced suffix array. In: Proc. of the Int’l Symp. on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies. LNCS 4614, Springer-Verlag, 2007. 459-470.