Technical Report: pdf (1576K).

Note: the version at http://i.stanford.edu/pub/cstr/reports/cs/tr/70/157/CS-TR-70-157.pdf has scanning errors.

The normal algorithm requires
*n*^{3} + O(*n*^{2})
multiplications and about the same number of additions.
Winograd's algorithm almost halves the number of multiplications
at the expense of more additions.
Strassen's algorithm reduces the total number of operations to
O(*n*^{2.82}) by recursively multiplying
2*n* × 2*n* matrices using seven
*n* × *n*
matrix multiplications.

Floating-point error bounds are obtained for both Strassen's and Winograd's methods. It is shown that Strassen's method satisfies a certain numerical stability property (albeit weaker than that satisfied by the normal method); and that scaling is essential for numerical accuracy using Winograd's method.

In practical cases, for moderate *n*,
Winograd's method appears to be slightly faster than the other two
methods, but the gain is, at most, about 20 percent.
Strassen's method should be faster for sufficiently large *n*,
and this will be important in the future as memory sizes increase.

An attempt to generalize Strassen's method is described.

The error analysis of Strassen's algorithm shows that, although it is not
as stable componentwise as the normal algorithm, it is sufficiently stable
to be usable in implementations of Level 3 Basic Linear Algebra Subroutines
(BLAS). (This was not pointed out in the report since Level 3 BLAS had not
been invented at the time.)
For more recent results along similar lines, see
J. Demmel, I. Dumitriu and O. Holtz, "Fast linear algebra is stable",
*Numer. Math.* 108 (2007), 59-91.

For an error analysis of the application of Winograd's identity to LU factorization, see Brent [4].

The conjecture (on page 42)
that 3x3 matrix multiplication can be performed with 23
multiplications was proved by
Julian D. Laderman [*Bull. Amer. Math. Soc.* 82, 1, Jan. 1976, 126-128].

Recently, a
paper by
Fawzi et al [*Nature* Vol 610, 6 Oct 2022, pg. 47],
discusses some new algorithms, notably one with 47 multiplications for
4x4 matrix multiplication (improving on Strassen's 49). However, it should
be noted that the new algorithms only seem to work over the finite field
GF(2) with two elements, and not over the rational, real, or complex fields.
Thus, their application to scientific computation is not clear.

For other developments see Chapters 14-15 of
Burgisser, Clausen and Shokrollahi,
* Algebraic Complexity Theory*,
Springer, 1997;
also recent papers by
Chris Umans *
et al* on an interesting approach via group algebras.