Linear algebra research on the AP1000

146. R. P. Brent, A. Czezowski, M. Hegland, P. E. Strazdins and B. B. Zhou, Linear algebra research on the AP1000, Proceedings of the Second Parallel Computing Workshop, Fujitsu Laboratories, Kawasaki, Japan, Nov. 1993, P1-L1-13.

Abstract

This paper reports on various results of the Linear Algebra Project on the Fujitsu AP1000 in 1993. These include the general implementation of Distributed BLAS Level 3 subroutines for the scattered storage scheme, and the performance and user-interface issues of that implementation are discussed. Implementations of LU Decomposition, Cholesky Factorization and Star Product algorithms built on the Distributed BLAS are described.
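The abstract does not define the scattered storage scheme. The sketch below assumes it means a cyclic (torus-wrap) distribution of individual matrix elements over a `py` x `px` grid of cells, so that global element (i, j) lives on cell (i mod py, j mod px). The function names are hypothetical and are not taken from the paper's Distributed BLAS interface.

```python
def owner_and_local(i, j, py, px):
    """Return ((cell_row, cell_col), (local_i, local_j)) for global element (i, j).

    Assumes a purely cyclic ("scattered") distribution over a py x px cell grid.
    """
    return (i % py, j % px), (i // py, j // px)


def global_index(cell, local, py, px):
    """Inverse of owner_and_local: map a cell and local index back to (i, j)."""
    (pr, pc), (li, lj) = cell, local
    return li * py + pr, lj * px + pc


def local_shape(m, n, cell, py, px):
    """Number of rows and columns of an m x n matrix held by the given cell."""
    pr, pc = cell
    rows = max(0, (m - pr + py - 1) // py)  # count of i < m with i % py == pr
    cols = max(0, (n - pc + px - 1) // px)  # count of j < n with j % px == pc
    return rows, cols
```

Under this assumption, an 8 x 8 matrix on a 2 x 4 grid gives every cell a 4 x 2 local block. Because consecutive rows and columns go to different cells, the work of a shrinking trailing submatrix (as in LU or Cholesky) stays spread across all cells.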

The port to the AP1000 of the Basic Fourier Functions, originally written for the Fujitsu-ANU Area-4 Project, is discussed. Although parallelizing the main FFT algorithm involves communication in only a single "transposition" step, several optimizations, including fast calculation of the roots of unity, are required for an efficient implementation.
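The abstract does not give the FFT formulation. A standard way to confine communication to one transposition is the "four-step" FFT, sketched below in pure Python. It takes N = n1 * n2, performs length-n1 transforms down columns, multiplies by twiddle factors, transposes, and performs length-n2 transforms along rows. This sketch is an illustration of that general technique, not the Basic Fourier Functions code. A naive `dft` stands in for the sub-transforms, and the twiddle factors are computed directly rather than by any fast roots-of-unity method.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for a local FFT)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def four_step_fft(x, n1, n2):
    """DFT of x (length n1*n2) using one transposition between two DFT passes."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n1 x n2 matrix A[r][c] = x[n2*r + c].
    # Step 1: length-n1 DFT down each column c (local if a cell owns whole columns).
    cols = [dft([x[n2 * r + c] for r in range(n1)]) for c in range(n2)]
    # Step 2: multiply by twiddle factors w_N^(k1*c) (also local).
    for c in range(n2):
        for k1 in range(n1):
            cols[c][k1] *= cmath.exp(-2j * cmath.pi * k1 * c / n)
    # Step 3: transpose; on a distributed machine this is the only data exchange.
    rows = [[cols[c][k1] for c in range(n2)] for k1 in range(n1)]
    # Step 4: length-n2 DFT along each row (local again after the transpose).
    out_rows = [dft(row) for row in rows]
    # Output index k = k1 + n1*k2.
    result = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            result[k1 + n1 * k2] = out_rows[k1][k2]
    return result
```

Steps 1, 2 and 4 touch only data already held by one cell, so the transpose in step 3 is the sole communication phase.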

Several optimizations of the Hestenes Singular Value Decomposition algorithm have been investigated, including a "BLAS Level 3"-like kernel for the main computation and partitioning strategies. The effect of these optimizations on convergence is also studied.
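For reference, the basic Hestenes (one-sided Jacobi) method is sketched below. It applies plane rotations to pairs of columns until all columns are mutually orthogonal, and the singular values are then the column norms. This is a plain cyclic sweep written for clarity. It does not include the "BLAS Level 3"-like kernel or the partitioning strategies the paper investigates, and the tolerance and sweep limit are arbitrary choices.

```python
import math


def hestenes_singular_values(a, tol=1e-12, max_sweeps=30):
    """Singular values (descending) of matrix a (list of rows) by one-sided Jacobi."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]  # work column-wise
    for _ in range(max_sweeps):
        rotated = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                ai, aj = cols[i], cols[j]
                alpha = sum(v * v for v in ai)
                beta = sum(v * v for v in aj)
                gamma = sum(u * v for u, v in zip(ai, aj))
                # Skip pairs that are already orthogonal to within the tolerance.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle chosen so the new pair of columns is orthogonal;
                # t is the smaller root of t^2 + 2*zeta*t - 1 = 0.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                cols[i] = [c * u - s * v for u, v in zip(ai, aj)]
                cols[j] = [s * u + c * v for u, v in zip(ai, aj)]
        if not rotated:  # a full sweep with no rotations means convergence
            break
    return sorted((math.sqrt(sum(v * v for v in col)) for col in cols), reverse=True)
```

Each column pair in a sweep is independent of the other pairs sharing no column, which is what makes the method attractive for parallel machines. How pairs are grouped and ordered is exactly where partitioning choices can change convergence behaviour.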

Finally, work on implementing QR Factorization on the AP1000 is discussed. The Householder QR method was found to be more efficient than the Givens QR method.
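As a point of reference for the comparison, the sketch below shows sequential Householder QR. Each reflector zeroes an entire subcolumn below the diagonal in one step, whereas a Givens rotation zeroes a single entry. The abstract does not state why Householder QR was faster on the AP1000, so this sketch only illustrates the method itself. It returns R only, and no distributed layout is modelled.

```python
import math


def householder_r(a):
    """Upper-triangular factor R of A = QR (A given as a list of rows)."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    for k in range(min(m - 1, n)):
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(v * v for v in x))
        if norm_x == 0.0:
            continue  # subcolumn already zero
        # v = x + sign(x0)*||x||*e1; the sign choice avoids cancellation.
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        vv = sum(t * t for t in v)
        # Apply H = I - 2 v v^T / (v^T v) to the trailing columns k..n-1.
        for j in range(k, n):
            f = 2.0 * sum(v[i - k] * r[i][j] for i in range(k, m)) / vv
            for i in range(k, m):
                r[i][j] -= f * v[i - k]
    return r
```

Because each reflector updates the whole trailing matrix with one vector, the work is organised as matrix-vector (and, when blocked, matrix-matrix) operations, which fits a BLAS-based implementation.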

Comments

For related work see [136, 145].
