Richard Brent - Twenty Publications and Abstracts

Richard Brent - Twenty Publications and Abstracts

The numbering corresponds to a complete list of publications. To use the hyperlinks, please access the version available at http://www.comlab.ox.ac.uk/oucl/work/richard.brent/pub/pubs20a.html.

-------------------------------------------

11. R. P. Brent, Algorithms for minimization without derivatives, Prentice-Hall, Englewood Cliffs, New Jersey, 1973, 195 pp. Reprinted by Dover Publications, New York, 2002.

This monograph describes and analyzes some practical methods for finding approximate zeros and minima of functions, using only function (not derivative) evaluations. The contents include: the use of successive interpolation for finding simple zeros of a function and its derivatives; an algorithm with guaranteed convergence for finding a zero of a function; an algorithm with guaranteed convergence for finding a minimum of a function of one variable; an algorithm for global minimization given an upper bound on the second derivative; and a new algorithm for minimizing a function of several variables without calculating derivatives.

Chapter 4 describes an algorithm for finding a zero of a function which changes sign in a given interval. The algorithm is based on a combination of successive linear interpolation and bisection, in much the same way as Dekker's algorithm. However, the new algorithm never converges much more slowly than bisection, whereas Dekker's algorithm may converge extremely slowly in certain cases.

Chapter 5 gives an algorithm for finding a local minimum of a function of one variable. The algorithm combines golden section search and successive parabolic interpolation, in the same way as bisection and successive linear interpolation are combined in the zero-finding algorithm of Chapter 4. Convergence in a reasonable number of function evaluations is guaranteed. For a C² function with positive curvature at the minimum, convergence is superlinear (provided that the minimum is at an interior point of the interval). Other algorithms given in the literature either fail to have these two desirable properties, or their order of convergence in typical cases is less than that for the algorithm of Chapter 5.

Chapter 6 considers the problem of finding an approximation to the global minimum of a function f, defined on a finite interval, if some a priori information about f is given. Usually this information is an upper bound on f". A good (in fact close to optimal) algorithm is described. Particular attention is paid to the problem of giving guaranteed bounds in the presence of rounding errors. Insight into the behaviour of the algorithm is obtained by considering some tractable special cases. Finally, the algorithm is generalized to handle functions of several real variables, given suitable a priori information.

Chapter 7 describes a modification of Powell's (1964) algorithm for finding a local minimum of a function of several variables without calculating derivatives. The modification is designed to ensure quadratic convergence, and to avoid the difficulties with Powell's criterion for accepting new search directions.

The book includes a bibliography of over 300 items, and an Appendix contains Fortran translations of the Algol procedures given in Chapters 4-6.

[For a review by D. Goldfarb, see Mathematics of Computation 28 (1974), 865-866.]

-------------------------------------------

17. R. P. Brent, On the precision attainable with various floating-point number systems, IEEE Transactions on Computers C-22 (1973), 601-607.

For scientific computations on a digital computer the set of real number is usually approximated by a finite set F of "floating-point" numbers. We compare the numerical accuracy possible with difference choices of F having approximately the same range and requiring the same word length. In particular, we compare different choices of base (or radix) in the usual floating-point systems. The emphasis is on the choice of F, not on the details of the number representation or the arithmetic, but both rounded and truncated arithmetic are considered. Theoretical results are given, and some simulations of typical floating-point computations (forming sums, solving systems of linear equations, finding eigenvalues) are described. If the leading fraction bit of a normalized base 2 number is not stored explicitly (saving a bit), and the criterion is to minimize the mean square roundoff error, then base 2 is best. If unnormalized numbers are allowed, so the first bit must be stored explicitly, then base 4 (or sometimes base 8) is the best of the usual systems.

[This paper was written in the days when popular IBM machines used base 16, well before the IEEE floating point standard made base 2 ubiquitous.]

-------------------------------------------

22. R. P. Brent, The parallel evaluation of general arithmetic expressions, Journal of the ACM 21 (1974), 201-206.

It is shown that arithmetic expressions with n variables and constants; operations of addition, multiplication, and division; and any depth of parenthesis nesting can be evaluated in time 4 log₂n + 10(n-1)/p using p processors which can independently perform arithmetic operations in unit time. This bound is within a constant factor of the best possible. A sharper result is given for expressions without the division operation, and the question of numerical stability is discussed.

Lemma 2 is a useful simulation result [now often called "Brent's Lemma"] which allows us to deduce time bounds for parallel computations with a limited number of processors by counting the operations performed on a parallel machine with an unlimited number of processors.

-------------------------------------------

34. R. P. Brent, Fast multiple-precision evaluation of elementary functions, Journal of the ACM 23 (1976), 242-251.

Let f(x) be one of the usual elementary functions (exp, log, artan, sin, cosh, etc.), and let M(n) be the number of single-precision operations required to multiply n-bit integers. It is shown that f(x) can be evaluated, with relative error O(2^-n), in O(M(n)log(n)) operations, for any floating-point number x (with an n-bit fraction) in a suitable finite interval. From the Schönhage-Strassen bound on M(n), it follows that an n-bit approximation to f(x) may be evaluated in O(n(log(n))²loglog(n)) operations. Special cases include the evaluation of constants such as pi, e, and e^pi. The algorithms depend on the theory of elliptic integrals, using the arithmetic-geometric mean iteration and ascending Landen transformations.

[One of the results of this paper, found independently by Salamin, is the first quadratically convergent algorithm for computing pi, now known as the "Brent-Salamin" algorithm.]

-------------------------------------------

45. R. P. Brent and H. T. Kung, Fast algorithms for manipulating formal power series, Journal of the ACM 25 (1978), 581-595.

The classical algorithms require order n³ operations to compute the first n terms in the reversion of a power series or the composition of two series, and order n² log n if the fast Fourier transform (FFT) is used for power series multiplication. We show that the composition and reversion problems are equivalent (up to constant factors), and we give algorithms which require only order (n log n)^3/2 operations.

In many cases of practical importance only order n log n operations are required; these include certain special functions of power series and power series solutions of certain differential equations, including many standard functions of mathematical physics such as Bessel functions.

Applications to root-finding methods which use inverse interpolation and to queueing theory are described, some results on multivariate power series are stated, and several open questions are mentioned.

-------------------------------------------

47. R. P. Brent, On the zeros of the Riemann zeta function in the critical strip, Mathematics of Computation 33 (1979), 1361-1372.

We describe a computation which shows that the Riemann zeta function has exactly 75,000,000 zeros in the region 0 < t < 32,585,736.4 of the critical strip; all these zeros are simple and lie on the critical line. (A similar result for the first 3,500,000 zeros was established by Rosser, Yohe and Schoenfeld.) Counts of the number of Gram blocks of various types and the number of failures of "Rosser's rule" are given.

[ Wedeniwski's ZetaGrid project, using essentially the same algorithm on more than 600 computers, has pushed the bound up to 10¹¹ zeros.]

-------------------------------------------

49. R. P. Brent and E. M. McMillan, Some new algorithms for high-precision computation of Euler's constant, Mathematics of Computation 34 (1980), 305-312.

We describe several new algorithms, more efficient than those previously known, for the high-precision computation of Euler's constant 0.577... Using one of the algorithms, which is based on an identity involving Bessel functions, Euler's constant has been computed to 30,100 decimal places. By computing their regular continued fractions, we show that, if Euler's constant or its exponential is of the form P/Q for integers P and Q, then |Q| > 10¹⁵⁰⁰⁰. The computations were performed using the first author's MP package.

[In a later paper, Brent showed that the Bessel function identity mentioned above is the correct generalisation of an (incorrect) conjecture of Ramanujan.

In 1999, Demichel and Gourdon used a related identity from [49] to compute Euler's constant to 108,000,000 decimal digits.

McMillan is better known for the discovery of plutonium: see Jackson and Panofsky, "Edwin Mattison McMillan 1907-1991", Biographical Memoirs of the National Academy of Sciences (USA), 69 (1996).]

-------------------------------------------

52. R. P. Brent, Unrestricted algorithms for elementary and special functions (invited paper at the IFIP 8th World Computer Congress, Tokyo and Melbourne), Information Processing 80 , North-Holland, Amsterdam, 1980, 613-619.

We describe some "unrestricted" algorithms which are useful for the computation of elementary and special functions when the precision required is not known in advance. Several general classes of algorithms are identified and illustrated by examples. The topics include: power series methods, use of halving identities, asymptotic expansions, continued fractions, recurrence relations, Newton's method, numerical contour integration, and the arithmetic-geometric mean. Most of the algorithms discussed are implemented in the author's MP package.

-------------------------------------------

55. R. P. Brent and H. T. Kung, The area-time complexity of binary multiplication, Journal of the ACM 28 (1981), 521-534. Corrigendum: ibid 29 (1982), 904.

The problem of performing multiplication of n-bit numbers on a chip is considered. Let A denote the chip area and T the time required to perform multiplication. By using a model of computation which is a realistic approximation to current and anticipated LSI or VLSI technology, it is shown that

AT > c₁n^3/2 and AT² > c₂n²,

where c₁ and c₂ are positive constants which depend on the technology but are independent of n. The exponents 3/2 and 2 are best possible.

A consequence of the results is that binary multiplication is "harder" than binary addition, in the sense that the area-time complexity of n-bit binary multiplication is asymptotically greater than that of n-bit binary addition.

[Similar results for the AT² measure were obtained independently by Abelson and Andreae [Communications of the ACM 23 (1980), 20-23], using a more restrictive model.]

-------------------------------------------

59. R. P. Brent, F. G. Gustavson and D. Y. Y. Yun, Fast solution of Toeplitz systems of equations and computation of Padé approximants, Journal of Algorithms 1 (1980), 259-295.

We present two new algorithms, ADT and MDT, for solving order-n Toeplitz systems of linear equations Tz = b in time O(n log²n) and space O(n). The fastest algorithms previously known, such as Trench's algorithm, require time of order n² and require that all principal submatrices of T be nonsingular. Our algorithm ADT requires only that T be nonsingular.

Both our algorithms for Toeplitz systems are derived from algorithms for computing entries in the Padé table for a given power series. We prove that entries in the Padé table can be computed by the Extended Euclidean Algorithm. We describe an algorithm EMGCD (Extended Middle Greatest Common Divisor) which is faster than the algorithm HGCD of Aho, Hopcroft and Ullman, although both require time O(n log²n), and we generalize EMGCD to produce PRSDC (Polynomial Remainder Sequence Divide and Conquer) which produces any iterate in the PRS, not just the middle term, in time O(n log²n). Applying PRSDC to certain polynomials gives algorithm AD (Anti-Diagonal), which computes any (m,p) entry along the antidiagonal m + p = 2n of the Padé table for a given power series in time O(n log²n).

Our other algorithm, MD (Main-Diagonal), computes any diagonal entry (n,n) in the Padé table for a normal power series, also in time O(n log²n). MD is related to Schönhage's fast continued fraction algorithm. Using the connection between Toeplitz matrices and Padé tables, algorithms AD and MD give O(n log²n) Toeplitz algorithms ADT and MDT. Trench's formula breaks down in certain degenerate cases, but in such cases a companion formula, the discrete analog of the Christoffel-Darboux formula, is valid and may be used to compute z in time O(n log²n) via the fast computation (by algorithm AD) of at most four Padé approximants.

We also apply our results to obtain new complexity bounds for the solution of banded Toeplitz systems and for BCH decoding via Berlekamp's algorithm.

[The term "superfast" was introduced by Ammar and Gragg to distinguish algorithms which are faster than order n². This paper gave the first such algorithm for the solution of Toeplitz systems.]

-------------------------------------------

60. R. P. Brent and H. T. Kung, A regular layout for parallel adders, IEEE Transactions on Computers C-31 (1982), 260-264.

With VLSI architectures, the chip area and design regularity represent a better measure of cost than the conventional gate count. We show that addition of n-bit binary numbers can be performed on a chip with a regular layout in time proportional to log n.

[At the time of publication the use of carry-lookahead in VLSI designs was unpopular, but more recently the design technique proposed here has been applied widely in VLSI implementations of adders.]

-------------------------------------------

79. R. P. Brent, H. T. Kung and F. T. Luk, Some linear-time algorithms for systolic arrays (invited paper at the IFIP 9th World Computer Congress, Paris), Information Processing 83, North-Holland, Amsterdam, 1983, 865-876.

We describe some recent results on linear-time algorithms for systolic arrays. In particular, we show how the greatest common divisor (GCD) of two polynomials of degree n over a finite field can be computed in time O(n) on a linear systolic array of O(n) cells; similarly for the GCD of two n-bit binary numbers.

We show how n × n Toeplitz systems of linear equations can be solved in time O(n) on a linear array of O(n) cells, each of which has constant memory size (independent of n).

Finally, we outline how a two-dimensional square array of O(n) × O(n) cells can be used to solve (to working accuracy) the eigenvalue problem for a symmetric real n × n matrix in time O(nS(n)). Here S(n) is a slowly growing function of n; for practical purposes S(n) can be regarded as a constant.

In addition to their theoretical interest, these results have potential applications in the areas of error-correcting codes, symbolic and algebraic computations, signal processing and image processing. For example, systolic GCD arrays for error correction have been implemented with the micro-programmable "PSC" chip.

-------------------------------------------

116. R. P. Brent, G. L. Cohen and H. J. J. te Riele, Improved techniques for lower bounds for odd perfect numbers, Mathematics of Computation 57 (1991), 857-868.

If N is an odd perfect number, and q^k is the highest power of q dividing N, where q is prime and k is even, then it is almost immediate that N > q^2k. We prove here that, subject to certain conditions verifiable in polynomial time, in fact N > q^5k/2. Using this and related results, we are able to extend the computations in an earlier paper (Mathematics of Computation 53 (1989), 431-437) to show that N > 10³⁰⁰. The main part of the proof is a (very large) tree, each of whose 12655 leaves gives either a contradiction or a sufficiently large lower bound on N.

-------------------------------------------

133. R. P. Brent, On the periods of generalized Fibonacci recurrences, Mathematics of Computation 63 (1994), 389-401.

We give a simple condition for a linear recurrence (mod 2^w) of degree r to have the maximal possible period 2^w-1(2^r-1). It follows that the period is maximal in the cases of interest for pseudo-random number generation, i.e. for 3-term linear recurrences defined by trinomials which are primitive (mod 2) and of degree r > 2. We consider the enumeration of certain exceptional polynomials which do not give maximal period, and list all such polynomials of degree less than 15.

The main result has applications to uniform random number generators with good statistical properties and extremely long periods.

-------------------------------------------

135. R. P. Brent, On computing factors of cyclotomic polynomials, Mathematics of Computation 61 (1993), 131-149 (D. H. Lehmer memorial issue).

For odd square-free n > 1 the cyclotomic polynomial Phi_n(x) satisfies the identity of Gauss

4Phi_n(x) = A_n² - (-1)^(n-1)/2nB_n².

A similar identity of Aurifeuille, Le Lasseur and Lucas is

Phi_n((-1)^(n-1)/2x) = C_n² - nxD_n²

or, in the case that n is even and square-free,

+-Phi_n/2(-x²) = C_n² - nxD_n².

Here A_n(x), ... , D_n(x) are polynomials with integer coefficients. We show how these coefficients can be computed by simple algorithms which require O(n²) arithmetic operations and work over the integers. We also give explicit formulae and generating functions for A_n(x), ... , D_n(x), and illustrate the application to integer factorization with some numerical examples.

-------------------------------------------

144. A. W. Bojanczyk, R. P. Brent, F. R. de Hoog and D. R. Sweet, On the stability of the Bareiss and related Toeplitz factorization algorithms, SIAM J. Matrix Analysis and Applications 16 (1995), 40-57.

This paper contains a numerical stability analysis of factorization algorithms for computing the Cholesky decomposition of symmetric positive definite matrices of displacement rank 2. The algorithms in the class can be expressed as sequences of elementary downdating steps. The stability of the factorization algorithms follows directly from the numerical properties of algorithms for realizing elementary downdating operations. It is shown that the Bareiss algorithm for factorizing a symmetric positive definite Toeplitz matrix is in the class and hence the Bareiss algorithm is stable. Some numerical experiments that compare behavior of the Bareiss algorithm and the Levinson algorithm are presented. These experiments indicate that in general (when the reflection coefficients are not all positive) the Levinson algorithm is not stable; certainly it can give much larger residuals than the Bareiss algorithm.

-------------------------------------------

161. R. P. Brent, Factorization of the tenth Fermat number, Mathematics of Computation 68 (1999), 429-451.

We describe the complete factorization of the tenth and eleventh Fermat numbers. The tenth Fermat number is a product of four prime factors with 8, 10, 40 and 252 decimal digits. The eleventh Fermat number is a product of five prime factors with 6, 6, 21, 22 and 564 decimal digits. We also note a new 27-digit factor of the thirteenth Fermat number. This number has four known prime factors and a 2391-decimal digit composite factor. All the new factors reported here were found by the elliptic curve method (ECM). The 40-digit factor of the tenth Fermat number was found after about 140 Mflop-years of computation. We discuss aspects of the practical implementation of ECM, including several variants of phase 2, and the use of special-purpose hardware. We also note several other large factors found recently by ECM.

-------------------------------------------

183. R. P. Brent, Twenty years' analysis of the binary Euclidean algorithm, Millennial Perspectives in Computer Science: Proceedings of the 1999 Oxford-Microsoft Symposium in honour of C. A. R. Hoare, Palgrave, 2000, 41-53.

The binary Euclidean algorithm is a variant of the classical Euclidean algorithm. It avoids divisions and multiplications, except by powers of two, so is potentially faster than the classical algorithm on a binary machine.

The classical Euclidean algorithm has been exhaustively analysed since the time of Gauss. The theory of the binary Euclidean algorithm is much less well developed. The first satisfactory, but not completely rigorous, model was given by the author in 1976 (see Knuth, The Art of Computer Programming, Vol. 2). Recently, Vallée succeeded in giving the model a rigorous justification.

We describe the binary algorithm and consider its average case behaviour. In particular, we correct some small but interesting errors in the literature, discuss the results of Vallée, and describe a numerical computation which confirms a conjecture of Vallée to (at least) 40 decimal places. The numerical computations were performed using the author's MP package.

-------------------------------------------

185. R. P. Brent, Random number generation and simulation on vector and parallel computers (invited paper at fourth Euro-Par Conference), Lecture Notes in Computer Science, Vol. 1470, Springer-Verlag, Berlin, 1998, 1-20.

Pseudo-random numbers are often required for simulations performed on parallel computers. The requirements for parallel random number generators are more stringent than those for sequential random number generators. As well as passing the usual sequential tests on each processor, a parallel random number generator must give different, independent sequences on each processor.

We consider the requirements for a good parallel random number generator, discuss generators for the uniform and normal distributions, and describe a new class of generators for the normal distribution, based on a proposal by Chris Wallace [ ACM Transactions on Mathematical Software 22 (1996), 119-127]. These generators can give very fast vector or parallel implementations. Implementations of various classes of uniform and normal generators on vector and parallel computers are discussed.

-------------------------------------------

199. R. P. Brent, S. Larvala and P. Zimmermann, A fast algorithm for testing reducibility of trinomials mod 2 and some new primitive trinomials of degree 3021377, Mathematics of Computation 72 (2003), 1443-1452.

The standard algorithm for testing irreducibility of a trinomial of prime degree r over GF(2) requires 2r + O(1) bits of memory. We describe an algorithm which requires only 3r/2 + O(1) bits of memory and 44 percent of the number of bit-operations required by the standard algorithm.

If 2^r - 1 is a Mersenne prime, then an irreducible trinomial of degree r is necessarily primitive. We give primitive trinomials for the Mersenne exponents r = 756839, 859433, and 3021377. The results for r = 859433 extend and correct some computations of Kumada et al [Mathematics of Computation 69 (2000), 811-814]. The two results for r = 3021377 are primitive trinomials of the highest known degree.

[More recently, Brent, Larvala and Zimmermann [214] have extended the results to degree 6972593, thus completing the search for all Mersenne exponent degrees less than 10000000.]

-------------------------------------------