Abstract: A scalable parallel MPI (message passing interface) version of Hmmpfam, a popular protein structure prediction tool and one of the kernel programs in the HMMER package, is presented. In the previous PVM (parallel virtual machine) version, the master process is a communication bottleneck, so the speedup drops rapidly on large-scale parallel systems. A novel three-level communication structure is proposed that enables parallel processing at both the sequence level and the HMM level. Separate load-balancing strategies are provided for distributing work at the sequence level and at the HMM level. Because reading HMMs from disk is expensive, a so-called once-load strategy is introduced to reduce this cost. With these optimizations, a parallel efficiency of 95% is achieved on a parallel computer with more than 700 processors.
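
To put the efficiency figure in context, the following worked arithmetic assumes the conventional definition of parallel efficiency (speedup divided by processor count), which the abstract does not spell out. Here T_1 is the single-processor run time, T_p the run time on p processors, and p = 700 is taken as a lower bound on the processor count; these symbols are introduced for illustration only.

```latex
\[
E = \frac{S}{p} = \frac{T_1}{p\,T_p},
\qquad
S = E \cdot p \approx 0.95 \times 700 = 665 .
\]
```

Under that assumption, the reported result corresponds to a speedup of roughly 665 or more over a single processor.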