## Parallel execution time analysis for least squares problems

205.
L. T. Yang and R. P. Brent,
Parallel execution time analysis for least squares problems on
distributed memory architectures,
* International Journal of Computer Research* 10, 4 (2001), 517-530.
Revision appeared as
"Parallel time complexity for solving large and sparse least squares
problems on distributed memory architectures"
in * Practical Parallel Computing*,
Nova Science Publishers, 2001, 97-112.

Preprint: pdf (112K).

## Abstract

In this paper we study the parallelization of PCGLS, a basic iterative
method whose main idea is to organize the computation of the conjugate
gradient method with preconditioner applied to normal equations.
Two important questions are discussed: what is the best possible data
distribution, and which communication network topology is most suitable
for solving least squares problems on massively parallel distributed memory
computers ?
A theoretical model of the data distribution and communication
phases is presented, allowing us to give a detailed execution time
complexity analysis. It is shown that the implementation of PCGLS with a
row-block decomposition of the coefficient matrix on a ring communication
structure is an efficient choice. Performance tests of a parallel PCGLS
algorithm have been carried out on the massively parallel distributed
memory system *Parsytec*,
and experimental timing results are compared with the
theoretical execution time complexity analysis.
Go to next publication

Return to Richard Brent's index page