Compute-intensive scientific algorithms can usually be formulated as nested loops, which are their main source of parallelism. When a nested loop is executed in parallel, the total execution time consists of two parts: the computation time and the communication time. On a message-passing multiprocessor system, performance declines rapidly when the communication overhead exceeds the corresponding computation. This paper presents a method for executing nested loops with constant loop-carried dependencies in parallel on message-passing multiprocessor systems while reducing the communication overhead. First, we partition the nested loop into blocks that require little communication, without regard to the machine topology. Given a linear time transformation found by the hyperplane method, the iterations of the nested loop are partitioned into blocks such that communication among the blocks is reduced while the execution order defined by the time transformation is preserved. Then, the blocks generated by the partitioning method are mapped onto multiprocessor systems according to the specific properties of each machine. We propose a heuristic mapping algorithm for hypercube machines.
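As an illustration of the linear time transformation that the partitioning starts from, the following minimal sketch shows the general idea of the hyperplane method on a small two-level loop. It is not the partitioning or mapping algorithm of the paper. The function names (`is_valid_schedule`, `find_schedule`, `wavefronts`), the brute-force search, and the example dependence vectors are all assumptions made for illustration. A schedule vector `pi` is taken to be valid when its dot product with every constant dependence vector is at least 1, so that each iteration runs strictly after the iterations it depends on.

```python
from itertools import product


def is_valid_schedule(pi, deps):
    """Return True if pi . d >= 1 for every constant dependence vector d."""
    return all(sum(p * x for p, x in zip(pi, d)) >= 1 for d in deps)


def find_schedule(deps, max_coeff=3):
    """Brute-force search for a small non-negative integer schedule vector.

    Illustrative only: candidates are tried in order of increasing
    coefficient sum, and the first valid one is returned.
    """
    n = len(deps[0])
    candidates = sorted(
        product(range(max_coeff + 1), repeat=n),
        key=lambda v: (sum(v), v),
    )
    for pi in candidates:
        if is_valid_schedule(pi, deps):
            return pi
    return None


def wavefronts(bounds, pi):
    """Group the iterations of a rectangular loop nest by time step t = pi . i."""
    steps = {}
    for it in product(*(range(b) for b in bounds)):
        t = sum(p * x for p, x in zip(pi, it))
        steps.setdefault(t, []).append(it)
    return steps


# Hypothetical example: a 3 x 3 loop nest in which iteration (i, j)
# depends on (i - 1, j) and (i, j - 1).
deps = [(1, 0), (0, 1)]
pi = find_schedule(deps)
fronts = wavefronts((3, 3), pi)

for t in sorted(fronts):
    print(t, fronts[t])
```

Iterations that share a time step have no dependencies among them and can run in parallel. Grouping such iterations into blocks and assigning the blocks to processors is where communication cost enters, and that is the part the paper addresses.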
Relation:
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS