Parallel execution time analysis for least squares problems

205. L. T. Yang and R. P. Brent, Parallel execution time analysis for least squares problems on distributed memory architectures, International Journal of Computer Research 10, 4 (2001), 517-530.

Revision appeared as "Parallel time complexity for solving large and sparse least squares problems on distributed memory architectures" in Practical Parallel Computing, Nova Science Publishers, 2001, 97-112.

Preprint: pdf (112K).


In this paper we study the parallelization of PCGLS, a basic iterative method whose main idea is to organize the computation of the conjugate gradient method with preconditioner applied to normal equations. Two important questions are discussed: what is the best possible data distribution, and which communication network topology is most suitable for solving least squares problems on massively parallel distributed memory computers ? A theoretical model of the data distribution and communication phases is presented, allowing us to give a detailed execution time complexity analysis. It is shown that the implementation of PCGLS with a row-block decomposition of the coefficient matrix on a ring communication structure is an efficient choice. Performance tests of a parallel PCGLS algorithm have been carried out on the massively parallel distributed memory system Parsytec, and experimental timing results are compared with the theoretical execution time complexity analysis.

Go to next publication

Return to Richard Brent's index page