Abstract: This paper presents a software-controlled data prefetching scheme to overcome the delay of remote memory accesses in physically distributed, logically shared memory systems. Using the information provided by parallel programs that respect the weak ordering consistency model, prefetch operations can be scheduled at synchronization points so that data are fetched before they are used. Through block operations, this scheme can effectively hide the large latency of the non-uniform memory access (NUMA) architecture.
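
The idea of prefetching at a synchronization point can be illustrated with a minimal, single-threaded C sketch. It is not the implementation described in this paper: the names `remote_mem`, `prefetch_block`, `acquire_and_prefetch`, and `BLOCK_WORDS` are hypothetical, the remote node is simulated by an ordinary array, and the counter `remote_accesses` merely stands in for the number of remote transactions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_WORDS 8   /* hypothetical block size, in words */

/* Stand-in for memory on a remote node; on a real NUMA machine each
   access to it would pay the remote latency. */
static int remote_mem[BLOCK_WORDS] = {1, 2, 3, 4, 5, 6, 7, 8};
static int remote_accesses = 0;   /* counts simulated remote transactions */

/* One block operation: a single simulated remote transaction copies
   the whole block into local memory. */
static void prefetch_block(int *local, const int *remote, size_t words)
{
    assert(words <= BLOCK_WORDS);
    memcpy(local, remote, words * sizeof *local);
    remote_accesses += 1;
}

/* Hypothetical acquire operation. Under weak ordering, shared data only
   need to be up to date at synchronization points, so the prefetch is
   issued here. The lock itself is omitted in this single-threaded sketch. */
static void acquire_and_prefetch(int *local)
{
    prefetch_block(local, remote_mem, BLOCK_WORDS);
}

/* Work after the synchronization point reads only the local copy. */
static int sum_block(void)
{
    int local[BLOCK_WORDS];
    int sum = 0;

    acquire_and_prefetch(local);
    for (size_t i = 0; i < BLOCK_WORDS; i++)
        sum += local[i];
    return sum;
}
```

In this sketch, summing the block costs one simulated remote transaction, whereas reading `remote_mem` word by word would cost one per word. That difference is the effect the block operations are meant to exploit.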