Also appeared as Report TR RC 8173, IBM Research (January 1980).
Both our algorithms for Toeplitz systems are derived from algorithms for computing entries in the Padé table for a given power series. We prove that entries in the Padé table can be computed by the Extended Euclidean Algorithm. We describe an algorithm EMGCD (Extended Middle Greatest Common Divisor) which is faster than the algorithm HGCD of Aho, Hopcroft and Ullman, although both require time O(n log^2 n). We generalize EMGCD to produce PRSDC (Polynomial Remainder Sequence Divide and Conquer), which computes any iterate in the PRS, not just the middle term, in time O(n log^2 n). Applying PRSDC to the polynomials U_0(x) = x^{2n+1} and U_1(x) = a_0 + a_1 x + ... + a_{2n} x^{2n} gives algorithm AD (Anti-Diagonal), which computes any (m,p) entry along the antidiagonal m + p = 2n of the Padé table for U_1 in time O(n log^2 n).
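The link between the Extended Euclidean Algorithm and the Padé table can be illustrated with a small sketch. This is not EMGCD or PRSDC: it is the classical quadratic-time Euclidean iteration on U_0(x) = x^{m+p+1} and the truncated series U_1, stopped at the first remainder of degree at most m. The function names (`pade_by_euclid` and the polynomial helpers) are hypothetical, polynomials are assumed to be stored as coefficient lists with the constant term first, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (the zero polynomial becomes [])."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def deg(p):
    """Degree of a trimmed polynomial; the zero polynomial has degree -1."""
    return len(p) - 1

def sub(a, b):
    n = max(len(a), len(b))
    a = list(a) + [Fraction(0)] * (n - len(a))
    b = list(b) + [Fraction(0)] * (n - len(b))
    return trim([x - y for x, y in zip(a, b)])

def mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(a, b):
    """Schoolbook division: return (q, r) with a = q*b + r and deg r < deg b."""
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = list(a)
    while len(r) >= len(b):
        shift = len(r) - len(b)
        c = r[-1] / b[-1]
        q[shift] = c
        for i, y in enumerate(b):
            r[shift + i] -= c * y
        r = trim(r)
    return trim(q), r

def pade_by_euclid(series, m, p):
    """(m, p) Pade entry from the remainder sequence of x^(m+p+1) and the series.

    Invariant: r_i = s_i * x^(m+p+1) + t_i * U_1, so r_i / t_i agrees with
    the series through the term x^(m+p).
    """
    N = m + p
    r0 = [Fraction(0)] * (N + 1) + [Fraction(1)]      # U_0 = x^(N+1)
    r1 = trim(Fraction(c) for c in series[:N + 1])    # U_1, truncated
    t0, t1 = [], [Fraction(1)]
    while deg(r1) > m:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        t0, t1 = t1, sub(t0, mul(q, t1))
    if not t1 or t1[0] == 0:
        raise ValueError("denominator vanishes at 0; degenerate table entry")
    c = t1[0]                                         # normalise so that den(0) = 1
    return [x / c for x in r1], [x / c for x in t1]

# Usage: the (2, 2) entry for the first five coefficients of exp(x).
exp_series = [Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1, 6), Fraction(1, 24)]
num, den = pade_by_euclid(exp_series, 2, 2)
```

For this input the sketch returns the numerator 1 + x/2 + x^2/12 and the denominator 1 - x/2 + x^2/12, the familiar (2,2) Padé approximant of e^x. Each division step here costs O(n) operations and there can be O(n) steps, so the sketch runs in O(n^2) time; the divide-and-conquer organisation that brings this down to O(n log^2 n) is the subject of the paper and is not reproduced here.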
Our other algorithm, MD (Main-Diagonal), computes any diagonal entry (n,n) in the Padé table for a normal power series, also in time O(n log^2 n). MD is related to Schönhage's fast continued fraction algorithm. A Toeplitz matrix T is naturally associated with U_1, and the (n,n) Padé approximation to U_1 gives the first column of T^{-1}. Thus, the Padé table algorithms AD and MD give O(n log^2 n) Toeplitz algorithms ADT and MDT. Trench's formula breaks down in certain degenerate cases, but in such cases a companion formula, the discrete analog of the Christoffel-Darboux formula, is valid and may be used to compute z in time O(n log^2 n) via the fast computation (by algorithm AD) of at most four Padé approximants.
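The relation between the (n,n) Padé entry and the first column of T^{-1} can be checked numerically on a small case. The sketch below assumes one particular indexing, T[i][j] = a_{n+i-j} for 0 <= i, j <= n, which is a choice made for this illustration and is not stated in the abstract. Under that assumption, the conditions that the coefficients of x^{n+1}, ..., x^{2n} in (denominator × U_1) vanish are exactly rows 1 to n of the system T z = e_0, so z is a scalar multiple of the denominator coefficients. The helper names are hypothetical, and the solver is plain Gauss-Jordan elimination over the rationals, not ADT or MDT.

```python
from fractions import Fraction

def toeplitz_from_series(a, n):
    """(n+1) x (n+1) Toeplitz matrix with T[i][j] = a[n + i - j] (assumed convention)."""
    return [[Fraction(a[n + i - j]) for j in range(n + 1)] for i in range(n + 1)]

def solve_exact(T, b):
    """Solve T z = b by Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(T)
    M = [list(row) + [Fraction(rhs)] for row, rhs in zip(T, b)]
    for col in range(n):
        # Pick any row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        inv = 1 / M[col][col]
        M[col] = [x * inv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Usage: n = 2 with the first five Taylor coefficients of exp(x).
a = [Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1, 6), Fraction(1, 24)]
T = toeplitz_from_series(a, 2)
first_column = solve_exact(T, [1, 0, 0])            # first column of the inverse of T
scaled = [c / first_column[0] for c in first_column]  # rescale so the leading entry is 1
```

In this example `scaled` equals the coefficients (1, -1/2, 1/12) of the denominator of the (2,2) Padé approximant of e^x, which is consistent with the statement above. The elimination costs O(n^3) operations and serves only as an independent check of the correspondence.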
We also apply our results to obtain new complexity bounds for the solution of banded Toeplitz systems and for BCH decoding via Berlekamp's algorithm.