Parallel Job Performance Prediction Based on the Case Reconstruction (基于判例构造的并行作业性能预测)

Authors: ZHANG Wei-Zhe (张伟哲), ZHANG Hong-Li (张宏莉), ZHANG Yuan-Jing (张元竞)
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60703014, and the National Basic Research Program of China (973 Program) under Grant No. G2011CB302605.

Parallel Job Performance Prediction Based on the Case Reconstruction

Authors: ZHANG Wei-Zhe, ZHANG Hong-Li, ZHANG Yuan-Jing

Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China


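Abstract:

To predict the performance of MPI-based parallel jobs, and given the limitations of history-based prediction and modeling-analysis methods in heterogeneous network computing environments, this paper proposes a parallel job performance prediction method based on case construction. Wrapper functions are inserted at the PMPI interface of the MPI library to capture communication logs, and log normalization and merging algorithms are designed. The core problem, loop compression of the logs, is converted into the problem of compressing circular (repeated) substrings of a string, and a suffix-array-based algorithm is proposed that outperforms existing algorithms both in theory and in practice. In the automatic case-program construction stage, the problem of scaling computation time and communication time in equal proportion is solved, and a method for automatically building an executable case program is designed. Experiments on homogeneous and heterogeneous clusters show that the case-based prediction method estimates the running time of computing jobs fairly accurately: the error does not exceed 3% on homogeneous clusters and 10% on heterogeneous clusters, and the method offers better overall performance than comparable algorithms.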

Keywords: network computing; parallel job; performance prediction; case program; circular substring compression

Abstract:

Accurate prediction of the running time of parallel jobs under different computing resources is the foundation of many job scheduling approaches. A job performance prediction method based on the Performance Skeleton is proposed to avoid the inaccuracy of history-based and model-based prediction methods in heterogeneous clusters. To record the running trace, a method is designed to capture all communication traces at runtime. To merge these traces, a trace-merge algorithm is designed that structures the communication traces. Compressing the loops in the traces is the central and most difficult step; this paper converts it into a circular sub-string compression problem and proposes an algorithm based on the suffix array, whose performance is better than that of existing algorithms both in theory and in practice. To automatically reconstruct the Performance Skeleton, the problem of scaling computation and communication time proportionally is solved. Experimental results show that these methods can accurately estimate the running time of computing jobs: the error is less than 3% for homogeneous clusters and less than 10% for heterogeneous clusters.

Key words: network computing; parallel job; performance prediction; case program; circular sub-string compression
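
The loop-compression step described in the abstract can be illustrated with a small sketch. It folds immediately repeated blocks of a communication trace into (count, body) pairs using a brute-force scan. This is not the suffix-array algorithm the paper proposes, only a simple stand-in that shows what the problem looks like. The function names `compress_loops` and `expand`, and the event labels used in the example, are assumptions made for illustration.

```python
from typing import List, Tuple, Union

# A node is either a plain event label or a folded loop: (repeat_count, body).
Node = Union[str, Tuple[int, list]]


def compress_loops(trace: List[str]) -> List[Node]:
    """Greedily fold back-to-back repeated blocks into (count, body) pairs.

    Brute-force illustration only; the paper uses a suffix-array method.
    """
    out: List[Node] = []
    i, n = 0, len(trace)
    while i < n:
        best_len, best_count = 1, 1
        # Try every block length that could repeat at least twice from i.
        for length in range(1, (n - i) // 2 + 1):
            block = trace[i:i + length]
            count = 1
            while trace[i + count * length:i + (count + 1) * length] == block:
                count += 1
            # Prefer the repetition that covers the most events.
            if count > 1 and count * length > best_len * best_count:
                best_len, best_count = length, count
        if best_count > 1:
            # Fold the block, compressing nested repetitions inside its body.
            out.append((best_count, compress_loops(trace[i:i + best_len])))
        else:
            out.append(trace[i])
        i += best_len * best_count
    return out


def expand(nodes: List[Node]) -> List[str]:
    """Inverse of compress_loops: unroll folded loops back into a flat trace."""
    flat: List[str] = []
    for node in nodes:
        if isinstance(node, tuple):
            count, body = node
            flat.extend(expand(body) * count)
        else:
            flat.append(node)
    return flat


if __name__ == "__main__":
    # Hypothetical event labels; a real trace would carry MPI call records.
    trace = ["send", "recv"] * 3 + ["barrier"]
    print(compress_loops(trace))
```

The scan is quadratic or worse in the trace length, which is the kind of cost that motivates a suffix-array approach for long traces. Round-tripping through `expand` is a convenient check that folding loses no events.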


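Cite this article

ZHANG Wei-Zhe, ZHANG Hong-Li, ZHANG Yuan-Jing. Parallel job performance prediction based on the case reconstruction. Journal of Software (软件学报), 2010, 21(zk): 238-250.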

History

Received: 2010-06-15
Last revised: 2010-12-10
