Parallel execution time analysis for least squares problems
L. T. Yang and R. P. Brent,
Parallel execution time analysis for least squares problems on
distributed memory architectures,
International Journal of Computer Research 10, 4 (2001), 517-530.
Revision appeared as
"Parallel time complexity for solving large and sparse least squares
problems on distributed memory architectures"
in Practical Parallel Computing,
Nova Science Publishers, 2001, 97-112.
Preprint: pdf (112K).
In this paper we study the parallelization of PCGLS, a basic iterative
method whose main idea is to organize the computation of the conjugate
gradient method with preconditioner applied to normal equations.
Two important questions are discussed: what is the best possible data
distribution, and which communication network topology is most suitable
for solving least squares problems on massively parallel distributed memory
A theoretical model of the data distribution and communication
phases is presented, allowing us to give a detailed execution time
complexity analysis. It is shown that the implementation of PCGLS with a
row-block decomposition of the coefficient matrix on a ring communication
structure is an efficient choice. Performance tests of a parallel PCGLS
algorithm have been carried out on the massively parallel distributed
memory system Parsytec,
and experimental timing results are compared with the
theoretical execution time complexity analysis.
Go to next publication
Return to Richard Brent's index page