Abstract: dvi (3K), pdf (76K), ps (23K).
Paper: dvi (26K), pdf (194K), ps (91K).
This paper discusses the porting of the Basic Fourier Functions, written for the Fujitsu-ANU Area-4 Project, to the AP1000. Although the parallelization of the main FFT algorithm involves communication only in a single "transposition" step, several optimizations, including fast calculation of the roots of unity, are required to implement it efficiently.
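As an illustration of why a single transposition can suffice, the following is a minimal sequential sketch of the textbook "four-step" decomposition of a length n1*n2 transform. It is not the AP1000 code. The names `dft` and `four_step_fft` are hypothetical, a naive O(n^2) DFT stands in for the local FFT, and the twiddle factors are computed directly with `cmath.exp` rather than by any fast roots-of-unity scheme.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT, standing in for a local FFT on one processor."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def four_step_fft(x, n1, n2):
    """DFT of length n1*n2: column DFTs, twiddles, one transpose, row DFTs."""
    n = n1 * n2
    assert len(x) == n
    # Column j2 holds x[j1*n2 + j2] for j1 = 0..n1-1
    # (imagine each column living on one processor).
    cols = [[x[j1 * n2 + j2] for j1 in range(n1)] for j2 in range(n2)]
    # Step 1: local length-n1 DFT of each column.
    cols = [dft(c) for c in cols]
    # Step 2: local twiddle multiplication by w^(j2*k1), w = exp(-2*pi*i/n).
    cols = [[c[k1] * cmath.exp(-2j * cmath.pi * j2 * k1 / n) for k1 in range(n1)]
            for j2, c in enumerate(cols)]
    # Step 3: the transposition, the only step that would need communication.
    rows = [[cols[j2][k1] for j2 in range(n2)] for k1 in range(n1)]
    # Step 4: local length-n2 DFT of each row.
    rows = [dft(r) for r in rows]
    # The result satisfies X[k1 + n1*k2] = rows[k1][k2].
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out
```

In this sketch every step except the transposition touches only one column or one row at a time, which is the property a distributed implementation would exploit.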
Some optimizations of the Hestenes Singular Value Decomposition algorithm have been investigated, including a "BLAS Level 3"-like kernel for the main computation, and partitioning strategies. A study is made of how the optimizations affect convergence.
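For orientation, here is a minimal sketch of the basic Hestenes (one-sided Jacobi) iteration in its plain, unblocked form. It does not show the "BLAS Level 3"-like kernel or any partitioning strategy. The function name `hestenes_singular_values`, the tolerance, and the sweep limit are assumptions chosen for illustration.

```python
import math


def hestenes_singular_values(a, tol=1e-12, max_sweeps=30):
    """Singular values of a (list of rows) by one-sided Jacobi rotations."""
    m, n = len(a), len(a[0])
    # Work on columns, since each rotation combines a pair of columns.
    cols = [[a[r][c] for r in range(m)] for c in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                alpha = sum(x * x for x in cols[i])
                beta = sum(x * x for x in cols[j])
                gamma = sum(x * y for x, y in zip(cols[i], cols[j]))
                # Skip pairs that are already numerically orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Choose the rotation angle that makes the two columns orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                ci, cj = cols[i], cols[j]
                cols[i] = [c * x - s * y for x, y in zip(ci, cj)]
                cols[j] = [s * x + c * y for x, y in zip(ci, cj)]
        # A sweep with no rotations means all column pairs are orthogonal.
        if not rotated:
            break
    # The singular values are the norms of the orthogonalized columns.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)
```

The number of sweeps needed before no rotation is applied is the convergence measure that changes to the rotation ordering or blocking would affect.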
Finally, work on implementing QR Factorization on the AP1000 is discussed. The Householder QR method was found to be more efficient than the Givens QR method.
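As a reminder of what the Householder method does, the sketch below computes the triangular factor R sequentially using one reflection per column. It is an illustrative, unoptimized version under the assumption of a dense real matrix stored as a list of rows. The name `householder_r` is hypothetical, and nothing here reflects how the AP1000 implementation distributes the work.

```python
import math


def householder_r(a):
    """Return the upper-triangular factor R of A = QR via Householder reflections."""
    r = [row[:] for row in a]
    m, n = len(r), len(r[0])
    for k in range(min(m - 1, n)):
        # x is the part of column k on and below the diagonal.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(v * v for v in x))
        if norm_x == 0.0:
            continue
        # v = x + sign(x0)*||x||*e1; this sign choice avoids cancellation.
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        vtv = sum(t * t for t in v)
        # Apply H = I - 2 v v^T / (v^T v) to columns k..n-1 of the trailing block.
        for j in range(k, n):
            dot = sum(v[i] * r[k + i][j] for i in range(m - k))
            f = 2.0 * dot / vtv
            for i in range(m - k):
                r[k + i][j] -= f * v[i]
    return r
```

Each reflection zeroes an entire subcolumn at once, whereas a Givens rotation zeroes one entry at a time.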