
Numerical Stability in Gaussian Elimination

John Urschel

Communicated by Notices Associate Editor Emilie Purvine

In some sense, numerical analysis is a very old field: it involves the study of algorithms for problems in continuous mathematics, and algorithms have been around at least since the Babylonians wrote them on tablets more than four thousand years ago. But with the advent of the computer age in the 1940s, which made numerical calculations on a scale that was previously unimaginable not only possible but easy, understanding the stability of these algorithms became increasingly urgent. What kind of errors could creep in? What were the relative costs of accuracy and speed?

John von Neumann and Herman Goldstine, two of the architects of the modern computer, saw the need for numerical analysis instantly. At the Institute for Advanced Study, they developed the IAS Machine (Figure 1), one of the firstFootnote 1 stored-program digital computers, and sought to understand the effect of rounding in computations on it. In their seminal 1947 paper “Numerical Inverting of Matrices of High Order,” von Neumann and Goldstine studied the stability of one of the oldest and most fundamental known algorithms, Gaussian elimination [vNG47]. Goldstine later wrote [Gol93]:

1

See also the Small-Scale Experimental Machine (SSEM), completed in 1948 at the University of Manchester, the Electronic Delay Storage Automatic Calculator (EDSAC), completed in 1949 at the University of Cambridge, and the Electronic Discrete Variable Automatic Computer (EDVAC), completed in 1949 at the University of Pennsylvania and fully operational by 1952. All of these machines were inspired by the architecture described by von Neumann in the controversial “First Draft of a Report on the EDVAC,” commonly referred to as the von Neumann architecture.

Indeed, von Neumann and I chose this topic for the first modern paper on numerical analysis ever written precisely because we viewed the topic as being absolutely basic to numerical mathematics.

Figure 1.

John von Neumann and the director of the Institute for Advanced Study, Robert Oppenheimer, in front of the IAS Machine.


Gaussian elimination is very simple—so simple that the “elimination method” is taught to grade school students to solve systems of two or three equations. It is a straightforward way of solving $n$ linear algebraic equations in $n$ unknowns. The earliest variant appears in the eighth chapter of the ancient Chinese text The Nine Chapters on the Mathematical Art (Figure 2). And yet, despite being more than two thousand years old and the most popular method for solving a linear system, Gaussian elimination is still not fully understood in some respects. Goldstine and von Neumann took on the question of round-off error analysis for matrix factorization and inversion methods in part because of disagreements about the precision of the technique in certain cases. (Alan Turing also presented an analysis around the same time.)

Figure 2.

The earliest variant of Gaussian elimination appears in the eighth chapter of the ancient Chinese text The Nine Chapters on the Mathematical Art. The above procedure is presented to solve the following [Har11]: Given 3 bundles of superior paddy [unhusked rice], 2 bundles of ordinary paddy, and 1 bundle of inferior paddy, [together they yield] 39 dou of grain; 2 bundles of superior paddy, 3 bundles of ordinary paddy, and 1 bundle of inferior paddy [together yield] 34 dou of grain; 1 bundle of superior paddy, 2 bundles of ordinary paddy, and 3 bundles of inferior paddy [together yield] 26 dou of grain. Problem: 1 bundle of superior, ordinary, and inferior paddy each yield how much grain?

$$\left[\begin{array}{ccc|c} 3 & 2 & 1 & 39 \\ 2 & 3 & 1 & 34 \\ 1 & 2 & 3 & 26 \end{array}\right] \;\to\; \left[\begin{array}{ccc|c} 3 & 2 & 1 & 39 \\ 6 & 9 & 3 & 102 \\ 3 & 6 & 9 & 78 \end{array}\right] \;\to\; \left[\begin{array}{ccc|c} 3 & 2 & 1 & 39 \\ 0 & 5 & 1 & 24 \\ 0 & 4 & 8 & 39 \end{array}\right] \;\to\; \left[\begin{array}{ccc|c} 3 & 2 & 1 & 39 \\ 0 & 5 & 1 & 24 \\ 0 & 20 & 40 & 195 \end{array}\right] \;\to\; \left[\begin{array}{ccc|c} 3 & 2 & 1 & 39 \\ 0 & 5 & 1 & 24 \\ 0 & 0 & 36 & 99 \end{array}\right]$$

Many mathematicians, including von Neumann himself, were quite pessimistic about the stability of the algorithm.Footnote 2 Von Neumann famously changed his opinion only a year later and, together with Goldstine, provided the first framework for understanding it. However, their concrete results were limited, proving stability only in the special case of a symmetric positive definite matrix using complete pivoting.Footnote 3 Despite the urgency of pinpointing and avoiding cascading rounding errors as computers became widely available, questions regarding the stability of Gaussian elimination remained. In many ways, it’s Wilkinson who is the hero of this story, as it was not until his 1961 paper “Error Analysis of Direct Methods of Matrix Inversion” that a more complete analysis of the backward error in Gaussian elimination due to rounding errors was conducted [Wil61]. Most of the open questions that remain involve mathematically quantifying what Wilkinson, even in the 1960s, felt ought to be true.

2

See Hotelling’s 1943 paper [Hot43] and Bargmann, Montgomery, and von Neumann’s 1946 technical report [BMvN46].

3

“We have not so far been able to obtain satisfactory error estimates for the pseudo-operational equivalent of the elimination method in its general form…We did, however, succeed in securing everything that is needed in the special case of a definite matrix” [vNG47].

However, before we delve much further into the topic, allow me to remind the reader what exactly Gaussian elimination is.

What is Gaussian elimination?

Given an invertible matrix $A \in \mathbb{R}^{n \times n}$ and compatible vector $b \in \mathbb{R}^{n}$, Gaussian elimination is an algorithm to solve the linear system $Ax = b$ for $x$ in two stages:

(1)

factor $A$ into the product $A = LU$ of a lower unitriangular matrix $L$ and an upper triangular matrix $U$ (LU factorization),

(2)

solve the linear systems $Ly = b$ for $y$ (forward substitution) and $Ux = y$ for $x$ (backward substitution).

The second stage is straightforward and numerically stable conditional on the stability of the first, and so we can devote our attention to the LU factorization.
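To make the two stages concrete, here is a minimal Julia sketch that solves the Nine Chapters system from Figure 2 using the lu routine from the LinearAlgebra standard library (which, like the solvers discussed later, permutes rows, so the factorization is of a row-permuted matrix); the variable names are mine.

using LinearAlgebra

# Stage 1: factor. Julia's lu uses partial pivoting, so it returns P*A = L*U.
A = [3.0 2 1; 2 3 1; 1 2 3]
b = [39.0, 34, 26]
F = lu(A)

# Stage 2: forward substitution for L*y = P*b, then backward substitution for U*x = y.
y = F.L \ (F.P * b)
x = F.U \ y

x ≈ A \ b   # true: agrees with the built-in solve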

The algorithmic procedure for computing this factorization can be concisely summarized by the block formula

$$A = \begin{bmatrix} a_{11} & a_{12}^{\mathsf T} \\ a_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ a_{21}/a_{11} & I \end{bmatrix} \begin{bmatrix} a_{11} & a_{12}^{\mathsf T} \\ 0 & A_{22} - a_{21}a_{12}^{\mathsf T}/a_{11} \end{bmatrix}, \tag{1}$$

where $a_{11}$ is a scalar (called the pivot), $a_{12}, a_{21} \in \mathbb{R}^{n-1}$ are vectors, and $A_{22}$ is an $(n-1) \times (n-1)$ matrix. If $A_{22} - a_{21}a_{12}^{\mathsf T}/a_{11} = L'U'$ for a lower unitriangular $L'$ and upper triangular $U'$, then the LU factorization of $A$ is given by

$$A = \begin{bmatrix} 1 & 0 \\ a_{21}/a_{11} & L' \end{bmatrix} \begin{bmatrix} a_{11} & a_{12}^{\mathsf T} \\ 0 & U' \end{bmatrix}.$$

This iterative procedure of rank-one updates is essentially what a computer does when performing Gaussian elimination, up to one crucial detail: What to do when the pivot $a_{11} = 0$? In this case, some permutations of the rows and/or columns of $A$ are needed to put a nonzero entry in the pivot’s place. In reality, a pivot equalling zero is not the only difficulty computers contend with. Very small pivots can be just as dangerous, and the prospect of errors compounding rapidly can also lead to wildly incorrect results. Some pivoting strategy is required to avoid these situations (a topic we will leave for later).
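As a concrete illustration of this rank-one update procedure, here is a minimal Julia sketch of LU factorization with no pivoting at all; lu_nopivot is an illustrative name rather than a library routine, and it fails exactly when a zero pivot is encountered.

using LinearAlgebra

# LU factorization by repeated rank-one (Schur complement) updates, no pivoting.
function lu_nopivot(A::AbstractMatrix)
    U = float.(A)
    n = size(U, 1)
    L = Matrix{eltype(U)}(I, n, n)
    for k in 1:n-1
        pivot = U[k, k]                      # the pivot a11 of the current block
        L[k+1:n, k] = U[k+1:n, k] / pivot    # multipliers a21 / a11
        # Schur complement update: A22 - a21 * a12' / a11
        U[k+1:n, k+1:n] -= L[k+1:n, k] * U[k, k+1:n]'
        U[k+1:n, k] .= 0
    end
    return L, U
end

L, U = lu_nopivot([3.0 2 1; 2 3 1; 1 2 3])
L * U ≈ [3.0 2 1; 2 3 1; 1 2 3]   # true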

It’s worth noting that Gaussian elimination has a number of equivalent mathematical representations, each of which can prove useful in analysis. For example, the block submatrices iteratively created during this algorithm can be tied to the original matrix through a Schur complement, i.e., Gaussian elimination can be equivalently represented by a sequence of matrices $A^{(k)}$ for $k$ equals $1$ to $n$ satisfying $A^{(1)} = A$ and

$$A^{(k)} = A_{k:n,\,k:n} - A_{k:n,\,1:k-1}\,\bigl(A_{1:k-1,\,1:k-1}\bigr)^{-1} A_{1:k-1,\,k:n}, \tag{2}$$

where $A_{I,J}$ denotes the submatrix of $A$ with rows indexed by $I$ and columns indexed by $J$.

In this representation, the resulting LU factorization of $A$ is encoded in the first row and column of each of the iterates $A^{(k)}$. This procedure can also be represented entirely through the language of determinants. Because the determinant of a triangular matrix is the product of its diagonal entries, each entry of $L$ and $U$ is equal to a ratio of minors (determinants of submatrices) of $A$. The “right” representation of the algorithm largely depends on the specific property/setting being studied.
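For a quick numerical check of the Schur complement identity in equation 2, the following Julia snippet eliminates the first two columns of a small matrix by hand and compares the trailing block with the corresponding Schur complement; the test matrix here is an arbitrary choice of mine.

using LinearAlgebra

# After eliminating the first k-1 columns (no pivoting), the trailing block equals
# A[k:n,k:n] - A[k:n,1:k-1] * inv(A[1:k-1,1:k-1]) * A[1:k-1,k:n].
A = [4.0 1 2 1; 1 3 0 2; 2 0 5 1; 1 2 1 4]
n, k = size(A, 1), 3

B = copy(A)
for j in 1:k-1, i in j+1:n                   # k-1 = 2 elimination steps
    B[i, :] -= (B[i, j] / B[j, j]) * B[j, :]
end

schur = A[k:n, k:n] - A[k:n, 1:k-1] * (A[1:k-1, 1:k-1] \ A[1:k-1, k:n])
B[k:n, k:n] ≈ schur   # true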

Gaussian elimination is quite an amazing algorithm. It’s not only the oldest in linear algebra, but it is also still the most popular method for numerically solving an unstructured linear system. And it has the benefit of being one of the few linear algebra algorithms that works over an arbitrary field. (The QR factorization, for instance, requires the notion of an inner product.)

Here, in this article, I’d like to focus on this algorithm through the lens of computation, in particular its numerical stability, and journey through both the mathematical discoveries that have occurred and the questions that remain since the early work of von Neumann, Wilkinson, and others over 60 years ago. However, to do so, we first need to take a detour into the fundamentals of numerical computation.

Floating point arithmetic

To the untrained eye, at this point a natural question may arise:

Why discuss error at all when the Gaussian elimination algorithm is exact?

While the LU factorization of a rational matrix is indeed rational, the representations of the matrix elements may grow prohibitively long for fast computation in practice.

As an example, consider solving a linear system with random binary coefficients in the Julia programming language:

julia>   n = 1000;
A = rand(Bool,n,n);
b = rand(Bool,n);

First, we solve by using 64 bits per number:

julia> @time x̃ = Float64.(A) \ Float64.(b);
 0.01 seconds

Next, we solve exactly, without any rounding:

julia> @time x = Rational{BigInt}.(A) \ Rational{BigInt}.(b);
 2389.89 seconds

and output the error from our first computation:

julia> norm(x - x̃) / norm(x)
 4.34e-13

The first computation took only 10 milliseconds and gave an answer accurate to roughly $10^{-13}$, while the second took nearly 40 minutes! This is due to the growing complexity of the numbers encountered. For instance, the numerator and denominator of the first entry of $x$ are both over 3200 bits long!

For this reason, in practice, numbers are only approximately stored, using what is called a floating point number system (essentially an adaptation of scientific notation tailored for computers with precision constraints).

This system $F$ is parameterized by four numbers: a base $\beta$, a precision $t$, and exponent bounds $e_{\min}$ and $e_{\max}$; it consists of all numbers of the form

$$y = \pm\, m \times \beta^{e-t}, \qquad m \in \{0, 1, \dots, \beta^{t}-1\}, \quad e_{\min} \le e \le e_{\max}.$$

See Figure 3 for an example of the bit string representation of $0.15625$ in IEEE single precision and Figure 4 for a visual representation of the elements of $F$ for a toy model with $\beta = 2$, $t = 3$, $e_{\min} = -1$, and $e_{\max} = 3$.

Figure 3.

The bit string for $0.15625$ in single precision (32 bits). The number is represented as $1.01_2 \times 2^{-3}$ (note $0.15625 = 1.25 \times 2^{-3}$). The exponent bias is $127$, so the stored exponent is $124$.

Figure 4.

The nonnegative floating point numbers in a toy model with $\beta = 2$, $t = 3$, and $-1 \le e \le 3$. Image appears in Nick Higham’s What Is series.


When a number $x \in \mathbb{R}$ is input, it is stored as $\mathrm{fl}(x)$, where $\mathrm{fl}$ rounds real numbers to the nearest element in $F$. The effects of rounding when inputting numbers or performing computations can be effectively bounded based on the unit round-off $u$, half the distance between $1$ and the next element of $F$. In particular, ignoring issues of overflow and underflow, for any $x \in \mathbb{R}$,

$$\mathrm{fl}(x) = x(1+\delta) \quad \text{for some } |\delta| \le u.$$

An analogous property also holds for fundamental computer operations: given $x, y \in F$ and a standard operation $\operatorname{op}$ (e.g., $+$, $-$, $\times$, $\div$),

$$\mathrm{fl}(x \operatorname{op} y) = (x \operatorname{op} y)(1+\delta) \quad \text{for some } |\delta| \le u.$$

This property is sometimes referred to as the fundamental axiom of floating point arithmetic.
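Julia exposes these quantities directly: eps(Float64) is the spacing between $1$ and the next double precision number, so the unit round-off is eps(Float64)/2. The snippet below, a minimal check of the fundamental axiom for a single addition, uses 256-bit BigFloat arithmetic to stand in for the exact result.

# The unit round-off: eps(Float64) is the gap between 1.0 and the next float.
u = eps(Float64) / 2

# Check fl(x + y) = (x + y)(1 + δ) with |δ| ≤ u for one example.
x, y = 0.1, 0.3
exact = big(x) + big(y)                      # (essentially) exact sum in 256-bit precision
δ = Float64(abs(big(x + y) - exact) / exact)
δ ≤ u   # true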

Numerical stability and an elegant mathematical problem

Suppose someone wanted to approximate a function $f$ at a point $x$ using an algorithm $\hat{f}$. How should they judge how well they did? One measure would simply be the distance $\|f(x) - \hat{f}(x)\|$ between the exact and computed solutions (called forward error). However, if the derivative of $f$ is very large, then this might not be the most fair measure of a good algorithm, as a small change to the input data could drastically change the output $f(x)$. Instead of asking whether the computed solution is almost the right answer, we could consider whether it is the right answer to almost the right question, i.e., does there exist an $\hat{x}$ such that $f(\hat{x}) = \hat{f}(x)$ and $\|x - \hat{x}\|$ is small? This is called backward error, and though this measure is also imperfect,Footnote 4 in this language we can quickly measure the stability of Gaussian elimination.

4

Consider the outer product $A = xy^{\mathsf T}$ of two vectors $x$ and $y$, i.e., $A_{ij} = x_i y_j$. The approximation $\hat{A}_{ij} = \mathrm{fl}(x_i y_j)$ is a very good one, but it is not backward stable! This is because $\hat{A}$ will likely have rank greater than one.

Thinking of an LU factorization as a function from $A$ to the pair $L$ and $U$, i.e., $A \mapsto (L, U)$, this procedure is backward stable if the computed LU factorization $\hat{L}\hat{U}$ satisfies $\hat{L}\hat{U} = A + \Delta A$ for some $\Delta A$ that’s not too big. Let $|A|$ denote the matrix with the absolute values of the entries, i.e., $|A|_{ij} = |a_{ij}|$, and consider the matrix inequality $\le$ componentwise. Some version of the following theoremFootnote 5 appears in many numerical analysis texts:

5

A similar result also holds for the final solution $\hat{x}$ computed from $\hat{L}$ and $\hat{U}$ using Gaussian elimination.

Theorem 1 ([GVL13]).

Let $A \in \mathbb{R}^{n \times n}$. If no zero pivots are encountered during Gaussian elimination, then the computed lower and upper triangular matrices $\hat{L}$ and $\hat{U}$ satisfy $\hat{L}\hat{U} = A + \Delta A$, where

$$|\Delta A| \;\le\; 3(n-1)\, u \,\bigl(|A| + |\hat{L}|\,|\hat{U}|\bigr) + O(u^2).$$

While I won’t reproduce the proof, I’d like to give some intuition about how these types of results can be proven. One idea is to proceed by induction and consider a single step of Gaussian elimination, as in equation 1. In floating point arithmetic, this procedure involves three steps (computing the multipliers $a_{21}/a_{11}$, the outer product $(a_{21}/a_{11})\,a_{12}^{\mathsf T}$, and the difference $A_{22} - (a_{21}/a_{11})\,a_{12}^{\mathsf T}$), and therefore introduces three errors of relative size at most $u$. Assuming that the theorem statement holds for matrices of dimension $n-1$, i.e., the computed $\hat{L}'$ and $\hat{U}'$ for $A_{22} - (a_{21}/a_{11})\,a_{12}^{\mathsf T}$ also satisfy the statement, the result follows pretty quickly by combining these three rounding errors with the inductive bound.
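For readers who want to see the bound of Theorem 1 in action, here is a quick Julia check on a random matrix; the built-in lu permutes rows, so the comparison is made against the row-permuted matrix, and the matrix size is an arbitrary choice. In practice the bound holds with a large margin.

using LinearAlgebra

n = 500
A = randn(n, n)
F = lu(A)                                     # partial pivoting: A[F.p, :] = L*U
u = eps(Float64) / 2

ΔA = F.L * F.U - A[F.p, :]                    # backward error of the computed factors
bound = 3 * (n - 1) * u * (abs.(A[F.p, :]) + abs.(F.L) * abs.(F.U))
all(abs.(ΔA) .≤ bound)                        # true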

The big takeaway from Theorem 1 is that the stability of Gaussian elimination in floating point arithmetic can be estimated by a simple mathematical quantity: the largest entry in $L$ or $U$ (scaled by the largest entry of $A$). It is such a central quantity that we give it a name, the growth factor:Footnote 6

$$\rho(A) := \frac{\max\bigl\{\|L\|_{\max},\, \|U\|_{\max}\bigr\}}{\|A\|_{\max}},$$

6

The exact definition of the growth factor varies from reference to reference. The quantity $\max_{i,j,k} |a^{(k)}_{ij}| / \max_{i,j} |a_{ij}|$ and the pair $\|L\|_{\max}$ and $\|U\|_{\max}/\|A\|_{\max}$ are two popular alternatives. They are all equivalent in the sense that they can all be used to bound the backward error of the LU factorization in floating point arithmetic.

where $\|\cdot\|_{\max}$ denotes the maximum magnitude entry of a matrix. Theorem 1 converts the practical question of the stability of this algorithm in computation to a purely mathematical one regarding the size of the entries of $L$ and $U$ relative to $A$. The game, so to speak, is to choose a strategy for permuting rows and columns of $A$, say with permutation matrices $P$ and $Q$, so that (1) the strategy can be implemented quickly, and (2) the resulting factorization $PAQ = LU$ has a small growth factor $\rho(PAQ)$.
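In Julia, this growth factor can be computed directly from the factors returned by the built-in lu (which uses partial pivoting); the helper name growth is mine.

using LinearAlgebra

# Growth factor: largest entry of L or U, relative to the largest entry of A.
growth(A) = (F = lu(A); max(maximum(abs, F.L), maximum(abs, F.U)) / maximum(abs, A))

growth(randn(1000, 1000))   # typically a modest multiple of sqrt(n), far below the worst case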

Figure 5.

James Wilkinson at the blackboard.


The modern story

Though von Neumann and Goldstine may have been the first to present a more encouraging perspective regarding Gaussian elimination, it wasn’t until Wilkinson (Figure 5) came along that a more robust understanding of the computational stability of this algorithm emerged. For instance, a variant of Theorem 1 and the resulting observation that the growth factor can be used to estimate stability were first realized by Wilkinson [Wil61]. Wilkinson’s early work on the subject included the analysis of Gaussian elimination primarily under two popular pivoting strategies:

(1)

complete pivoting: permuting rows and columns so that the pivot is always the largest entry in the matrix, i.e., $|a^{(k)}_{11}| \ge |a^{(k)}_{ij}|$ for all $i, j$,

(2)

partial pivoting: permuting rows so that the pivot is always the largest entry in the first column, i.e., $|a^{(k)}_{11}| \ge |a^{(k)}_{i1}|$ for all $i$.

These two pivoting strategies are quite different, a fact reflected by the nature of the questions regarding them. Complete pivoting, famously referred to by von Neumann as “customary” (then called “positioning for size”), is a popular theoretical technique with reasonably good guarantees on stability that, today, is primarily used when high accuracy is desired. Partial pivoting, on the other hand, is a more interesting situation. The need to compare far fewer entries of each iterate means that partial pivoting is much faster than complete pivoting, and adds little to the overall run time of the Gaussian elimination algorithm. However, the issue here is that this procedure is known to be horribly unstable in the worst-case (see below). Still, Gaussian elimination with partial pivoting remains the most popular method for solving an unstructured linear system numerically: the backslash command in Matlab and Julia, and the linalg.solve() command in Python, all employ this method (by calling LAPACK routines) when faced with an arbitrary matrix.

Its use in spite of this worst-case instability can be concisely explained as follows: it is simply faster, in a practical sense, than any other known algorithm, and, historically, numerical analysts have believed that the situations where it is unstable are extremely rare. For instance, the QR factorization

$$A = QR,$$

where $A$ is written as the product of an orthogonal matrix $Q$ and upper triangular matrix $R$, is probably the most popular alternative to Gaussian elimination. This approach solves a linear system by computing $Q^{\mathsf T} b$ and applying backward substitution to $Rx = Q^{\mathsf T} b$. This factorization is mathematically equivalent to applying Gram-Schmidt orthogonalization to the columns of $A$, though, especially when $Q$ does not need to be formed explicitly, in practice it is often computed by applying simple orthogonal transformations to $A$ to convert it to an upper triangular matrix $R$. While the QR factorization, computed by, say, Householder reflections, is incredibly stable, it requires roughly $4n^3/3$ operations to compute (with $Q$ only implicitly computed), while a simple counting argument applied to equation 1 implies the LU factorization only requires roughly $2n^3/3$ operations ($n^3/3$ for the vector outer products, $n^3/3$ for the subtractions), half the cost compared to QR. While an exact comparison of computational efficiency is far more complicated than simple operation-counting, often involving the movement of data in the computer’s memory system, the main takeaway is that Gaussian elimination, paired with partial pivoting, remains the fastest way to compute the solution to an unstructured linear system. In some sense, the use of Gaussian elimination is a trade of guaranteed stability for a factor of two in runtime.
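The factor of two shows up in rough timings as well. The following Julia comparison of the built-in lu and qr is only indicative, since actual runtimes depend on the BLAS library, threading, and memory traffic rather than on operation counts alone; the matrix size is an arbitrary choice.

using LinearAlgebra

A = randn(2000, 2000)
lu(A); qr(A)                    # warm-up: compile and load both routines
t_lu = @elapsed lu(A)
t_qr = @elapsed qr(A)
t_qr / t_lu                     # typically around 2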

In what remains of this article, I’d like to talk a little about complete pivoting and partial pivoting, in each case starting with the early contributions of Wilkinson and discussing the modern developments, while briefly providing some mathematical insight into the types of techniques involved. It’s worth noting that many results do not fall under either complete or partial pivoting, my favorite two being the lower bounds of Higham and Higham for the growth factor of a matrix under any pivoting strategy [HH89] and the estimates of Demmel, Grigori, and Rusciano on the expected log growth of a matrix pre- and post-multiplied by random orthogonal matrices [DGR23]. I encourage the motivated reader to seek these results out, as well as the many other research directions I was unable to mention here due to space constraints.Footnote 7 For more details about complete and partial pivoting, Higham’s Accuracy and Stability of Numerical Algorithms [Hig02] (for complete pivoting) and Trefethen and Bau’s Numerical Linear Algebra [TB22] (for partial pivoting) are good starting points.

7

including rook pivoting, threshold pivoting, the Cholesky factorization, incomplete LU/Cholesky, rank-revealing techniques, communication-avoiding techniques, butterfly matrices for LU, growth for Hadamard matrices, and others…

The theory of complete pivoting

In 1961, Wilkinson produced the following bound for a completely pivoted matrix $A$:

$$\rho(A) \;\le\; \bigl(n \cdot 2 \cdot 3^{1/2} \cdot 4^{1/3} \cdots n^{1/(n-1)}\bigr)^{1/2} \;\sim\; c\, n^{\frac{1}{2} + \frac{1}{4}\ln n}. \tag{3}$$

The proof is incredibly short, requiring one page of mathematics and using only Hadamard’s inequality applied to the matrix iterates $A^{(k)}$. However, the proof is also somewhat mysterious. I’d like to sketch a slightly more transparent proof through the language of linear programming.

Let $p_k = \|A^{(k)}\|_{\max}$ (recall that, under complete pivoting, this is the magnitude of the pivot of $A^{(k)}$). Hadamard’s inequality, that the modulus of the determinant of a matrix is at most the product of the two-norms of its columns, applied to $A^{(k)}$ implies that

$$\prod_{j=k}^{n} p_j \;=\; \bigl|\det\bigl(A^{(k)}\bigr)\bigr| \;\le\; (n-k+1)^{\frac{n-k+1}{2}}\, p_k^{\,n-k+1}, \qquad k = 1, \dots, n-1.$$

The leftmost equality is a result of $L$ having ones and $U$ having the pivots on their respective diagonals. The maximum $k$th pivot, viewed as a function of $k$, is nondecreasing, and so the maximum value of $p_n/p_1$ under these constraints provides an upper bound for the maximum growth factor. The transformation $y_k = \ln(p_k)$ produces the linear program:

$$\max\; y_n - y_1 \quad \text{subject to} \quad \sum_{j=k}^{n} y_j \;\le\; (n-k+1)\, y_k + \tfrac{n-k+1}{2}\ln(n-k+1), \qquad k = 1, \dots, n-1.$$

We can rewrite this in matrix-vector form as

$$\max\; c^{\mathsf T} y \quad \text{subject to} \quad M y \le d,$$

where $c = e_n - e_1$, $d_k = \tfrac{n-k+1}{2}\ln(n-k+1)$, and $M$ encodes the Hadamard constraints above together with one additional normalization constraint,Footnote 8 the inequality $My \le d$ being understood entrywise.

8

The additional constraint plays no role, as the feasible region for $y$ is shift-independent.

The matrix $M$ has an easily computable inverse. The quantity $c^{\mathsf T} M^{-1}$ is entrywise nonnegative, so for every feasible $y$,

$$c^{\mathsf T} y \;=\; \bigl(c^{\mathsf T} M^{-1}\bigr)(M y) \;\le\; \bigl(c^{\mathsf T} M^{-1}\bigr)\, d,$$

and evaluating the right-hand side implies inequality 3.

Wilkinson felt that this bound was extremely pessimistic, a viewpoint shared by most numerical analysts today. In the same paper, he remarked [Wil61]:

In our experience the last pivot has fallen far below the given bound, and no matrix has been encountered in practice for which $p_n/p_1$ was as large as 8.

This sentiment was soon quantified, and a folklore conjecture was formed that the growth factor under complete pivoting of a real $n \times n$ matrix was at most $n$. A great deal of research went into proving this conjecture for small $n$. The exact conjecture proved to be false, illustrated by a $13 \times 13$ matrix with growth factor slightly larger than $13$ found numerically by Nick Gould in 1991 [Gou91] and verified in exact arithmetic by Alan Edelman a year later [Ede92]. But still the larger question of how this quantity behaves remains.

Question 2.

What is the asymptotic behavior of the maximum growth factor under complete pivoting?

One theoretical consequence of this question is that, if the growth factor is bounded by a polynomial in $n$, then only $O(\log n)$ bits of precision are required to perform Gaussian elimination accurately.

This question is one that my collaborators and I have made some progress on recently, producing a lower bound of $1.0045\,n$ for all $n$ [EU24] and an upper bound of roughly $n^{0.2079 \ln n}$ [BEU25]. The former is the first disproof of the 1960s folklore conjecture for all $n$ larger than a constant. The latter is the first improvement to Wilkinson’s 1961 bound, and involves finding further constraints on the pivots of $A$ and solving a much more complicated linear program. However, the gap between the upper and lower bounds for this quantity still remains quite large.

Partial pivoting and a practical problem

In contrast, for partial pivoting, where the rows of $A^{(k)}$ are permuted so that $|a^{(k)}_{11}| \ge |a^{(k)}_{i1}|$ for all $i$ and $k$, the worst-case behavior is incredibly well understood. Because the pivot is the largest entry in the first column, the largest entry of $A^{(k+1)}$ is at most twice that of $A^{(k)}$, leading to an upper bound of $2^{n-1}$ for the growth factor of an $n \times n$ matrix under partial pivoting. In his 1965 text The Algebraic Eigenvalue Problem [Wil65], Wilkinson showed that this bound can be obtained, illustrated by the example matrix

$$A = \begin{bmatrix} 1 & & & & 1 \\ -1 & 1 & & & 1 \\ -1 & -1 & 1 & & 1 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ -1 & -1 & \cdots & -1 & 1 \end{bmatrix}. \tag{4}$$
After one step of Gaussian elimination, the resulting matrix looks nearly identical in structure to the original, except the last column has doubled in size:

$$A^{(2)} = \begin{bmatrix} 1 & & & 2 \\ -1 & 1 & & 2 \\ \vdots & \ddots & \ddots & \vdots \\ -1 & \cdots & -1 & 2 \end{bmatrix},$$

where $A^{(2)}$ is the resulting $(n-1) \times (n-1)$ block. This doubling occurs at every step, resulting in the last entry of $U$ equaling $2^{n-1}$. This exponentially large growth factor can lead to catastrophic errors, even for well-conditioned matrices.

To see this in practice, let’s consider the solution to a linear system $Ax = b$, where $A$ is the example matrix in equation 4, $x$ is a Gaussian vector, and, of course, $b$ equals $A$ times $x$. This setup allows us to explicitly compute the numerical error, as we already know $x$ exactly. Using the Julia programming language, we first define the Wilkinson matrix from equation 4:

julia> wilk(n) = [ i>j ? -1 : (i==j || j==n) ? 1 : 0 for i=1:n, j=1:n];

Next, we initialize $A$, randomly select $x$, and compute $b$:

julia>   A = wilk(100);
x = randn(100);
b = A*x;

Solving for $x$ numerically using the built-in backslash command, which employs Gaussian elimination with partial pivoting, we’re in for quite a surprise:

julia> norm(x-A\b)/norm(x)
1.8623380318378875

The computed solution (typically accurate to nearly machine precision) has relative error greater than one (e.g., zero is a better guess)! To really stress the inaccuracy of the computed solution, we can take a look at the last ten entries of $x$ and our computed $A \backslash b$:

julia> [x A\b][91:100,:]
10×2 Matrix{Float64}:
  0.727419   0.0
 -0.383269   0.0
 -2.3035     0.0
  0.134562   0.0
  2.42309    0.0
  1.46416    0.0
  0.358267   0.0
 -0.375875   0.0
  0.817008   0.0
  0.0752026  0.0752026

This is not caused by the condition number:

julia> cond(A)
44.80225124630292

as condition numbers that grow only linearly in $n$ are quite acceptable for most applied problems. No warning or error message is given. The same phenomenon occurs in the majority of programming languages (e.g., “\” in Matlab, linalg.solve() in Python, etc.).

Many researchers suspect that the situations where partial pivoting fails are rare. In Matrix Computations, Golub and Van Loan state that [GVL13, p. 131]

Although there is still more to understand about [the growth factor], the consensus is that serious element growth in Gaussian elimination with partial pivoting is extremely rare. The method can be used with confidence.

Of course, even rare instances become important to understand as the number and variety of computations increases exponentially. For partial pivoting, the majority of modern research is devoted to either quantifying how rare this worst-case behavior is or illustrating how this worst-case behavior can be avoided altogether. In 2012, Nick Trefethen, the then-president of SIAM, described “two of the biggest unsolved problems in numerical analysis” [Tre12]. The first, unsurprisingly, was fast matrix multiplication; the second was “why does Gaussian elimination work?” Inspired by experiments, he conjectured that the growth factor of a Gaussian matrix under partial pivoting scales like $n^{1/2}$ and offered \$1000 for a proof:

Conjecture 3.

Let $A$ be an $n \times n$ Gaussian matrix, i.e., $A_{ij}$ iid $\mathcal{N}(0,1)$. Then, for any $\epsilon > 0$, the probability that the growth factor of $A$ under partial pivoting is larger than $n^{1/2+\epsilon}$ is at most $\epsilon$ for all $n$ sufficiently large.

While some asymptotic properties require large values of $n$ to emerge, here the behavior of the growth factor is abundantly clear, even for small values of $n$ (Figure 6). However, proving this exactly seems difficult, for reasons we will soon discuss.

Figure 6.

Histograms of 100,000 samples of the growth factor under partial pivoting, divided by $\sqrt{n}$, of Gaussian matrices for several values of $n$. Even the largest growth factor encountered here was only a modest multiple of $\sqrt{n}$.

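A small-scale version of the experiment behind Figure 6, with far fewer samples, takes only a few lines of Julia; growth is the same illustrative helper as before, and the sample counts here are chosen only to keep the run short.

using LinearAlgebra, Statistics

growth(A) = (F = lu(A); max(maximum(abs, F.L), maximum(abs, F.U)) / maximum(abs, A))

n, trials = 100, 1000
samples = [growth(randn(n, n)) / sqrt(n) for _ in 1:trials]
(mean(samples), maximum(samples))   # concentrated, with no catastrophically large outliers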

Another approach for analyzing partial pivoting is through smoothed analysis, a technique popularized by Spielman and Teng that strikes a balance between average-case and worst-case analysis. A smoothed analysis of an algorithm is a study of its performance for a mild random perturbation of an arbitrary input. For example, even though Wilkinson’s matrix (equation 4) is a highly unstable instance, upon mild perturbation (even without pivoting), the growth factor decreases significantly (Figure 7). In some sense, smoothed analysis is a generalization of average-case analysis, with the added benefit of being not only descriptive, but prescriptive. In particular, rather than speculate about the rarity of unstable instances in practice, smoothed analysis offers a concrete solution to avoid these instances altogether, in a true probabilistic sense, through the introduction of a modest amount of noise.

Figure 7.

A log-log plot of the growth factor without pivoting of $A + \sigma G$ with respect to the noise level $\sigma$, for a Wilkinson matrix $A$ (equation 4) and a fixed Gaussian matrix $G$. Setting $\sigma \approx 10^{-16}$ (approximately the unit round-off in double precision) already drops the growth by many orders of magnitude.

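The smoothing effect illustrated in Figure 7 is easy to observe directly. In the Julia sketch below, wilk and growth_nopivot are illustrative helpers (the latter relies on NoPivot(), available in recent versions of the LinearAlgebra standard library), and the matrix size and noise level are chosen only for demonstration.

using LinearAlgebra

wilk(n) = [i > j ? -1.0 : (i == j || j == n) ? 1.0 : 0.0 for i = 1:n, j = 1:n]
function growth_nopivot(A)
    F = lu(A, NoPivot())
    max(maximum(abs, F.L), maximum(abs, F.U)) / maximum(abs, A)
end

n = 60
A, G = wilk(n), randn(n, n)
growth_nopivot(A)              # 2^(n-1) ≈ 5.8e17: catastrophic
growth_nopivot(A + 1e-10 * G)  # many orders of magnitude smaller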

In 2006, Sankar, Spielman, and Teng provided a smoothed analysis of Gaussian elimination without pivoting [SST06]. Here’s one sample result: for an arbitrary matrix $\bar{A}$ with $\|\bar{A}\| \le 1$ and a Gaussian matrix $G$, the LU factorization of $A = \bar{A} + \sigma G$, $A = LU$, satisfies, with high probability, a growth factor bound that is polynomial in $n$ and $1/\sigma$.

I’d like to give some intuition about how a result like this, or, say, one about a Gaussian matrix, is often proven. By equation 2, the entries of $A^{(k)}$ are elements of a Schur complement involving the leading submatrices of $A$, and the only way the elements of $A^{(k)}$ can become much larger than those of $A$ is if a leading submatrix of $A$ has a very small singular value. In practice, growth factor estimates for these types of problems essentially reduce to singular value-type estimates, or, similarly, bounding the application of the inverse of a leading submatrix to a given vector. For example, the bound of Sankar, Spielman, and Teng essentially relies on the following property of a random Gaussian perturbation:

Lemma 4 ([SST06]).

Let $\bar{A} \in \mathbb{R}^{n \times n}$ and let $A = \bar{A} + \sigma G$, where $G$ is a Gaussian matrix. Then, for any fixed unit vector $v$,

$$\Pr\bigl[\,\|A^{-1}v\| \ge x\,\bigr] \;\le\; \sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma x}.$$

While I won’t provide a proof of this, I can give the basic idea in a few lines. Because Gaussian matrices are distributionally invariant under orthogonal transformations, we can assume $v = e_1$ and simply bound the size of the first column of $A^{-1}$. The size of this column is exactly the inverse of the inner product between the first row of $A$ and a unit vector from the orthogonal complement of the remaining rows, reducing the problem to one involving the inner product between a fixed unit vector and a Gaussian one, which is well studied.

Overall, the average-case and smoothed analysis without pivoting is fairly well understood. However, making analogous statements for partial pivoting has proven much more difficult. Partial pivoting can be viewed as a greedy determinant maximization where the $k$th row is chosen to maximize the magnitude of the $k$th leading minor. Intuitively, this reordering should significantly improve the situation, as larger leading minors seem beneficial. This intuition is supported by practice, as partial pivoting is usually, though certainly not always, far superior to no pivoting. However, this greedy procedure makes mathematical analysis significantly more difficult, as it introduces dependence between elements of the random perturbation.

For instance, Sankar, in his PhD thesis [San04], performed a smoothed analysis of partial pivoting, but, due to the mathematical complications introduced by pivoting, was only able to prove a quasipolynomial bound for growth. Recently, Huang and Tikhomirov managed to prove that the growth factor under partial pivoting of a Gaussian matrix is polynomial [HT24]. One key idea was a quantification of how “Gaussian” the remaining rows of a Gaussian matrix are after $k$ rows have already been chosen during partial pivoting, e.g., understanding how much randomness persists in spite of the pivoting procedure. This result is still quite far from the expected behavior, and the arguments presented do not appear to directly translate to the smoothed analysis setting.

There is still quite a gap between what we observe (Figure 6) and what we can prove mathematically, though. The problems of producing a smoothed analysis of partial pivoting with a polynomial bound, or better yet, one at least as strong as the Sankar-Spielman-Teng bound for the no-pivoting case, and proving Trefethen’s \$1000 conjecture both remain completely open.

Acknowledgments

The author thanks Louisa Thomas and two anonymous referees for significantly improving the style of presentation.

All experiments were performed in the Julia programming language, on a Mac Studio with an Apple M2 Max chip and 32GB of unified memory. Figures 6 and 7 were created using the built-in lu command in double and quadruple precision, respectively.

References

[BEU25]
Ankit Bisain, Alan Edelman, and John Urschel, A new upper bound for the growth factor in Gaussian elimination with complete pivoting, Bulletin of the London Mathematical Society (2025).
[BMvN46]
Valentine Bargmann, Deane Montgomery, and John von Neumann, Solution of linear systems of high order, US Department of Commerce, Office of Technical Services, 1946.
[DGR23]
James Demmel, Laura Grigori, and Alexander Rusciano, An improved analysis and unified perspective on deterministic and randomized low-rank matrix approximation, SIAM J. Matrix Anal. Appl. 44 (2023), no. 2, 559–591, DOI 10.1137/21M1391316. MR4586593
[Ede92]
Alan Edelman, The complete pivoting conjecture for Gaussian elimination is false (1992).
[EU24]
Alan Edelman and John Urschel, Some new results on the maximum growth factor in Gaussian elimination, SIAM J. Matrix Anal. Appl. 45 (2024), no. 2, 967–991, DOI 10.1137/23M1571903. MR4737248
[Gol93]
Herman H. Goldstine, The computer from Pascal to von Neumann, Princeton University Press, Princeton, NJ, 1993. Reprint of the 1972 original. MR1271142
[Gou91]
Nick Gould, On growth in Gaussian elimination with complete pivoting, SIAM J. Matrix Anal. Appl. 12 (1991), no. 2, 354–361, DOI 10.1137/0612025. MR1089164
[GVL13]
Gene H. Golub and Charles F. Van Loan, Matrix computations, 4th ed., Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, 2013. MR3024913
[Har11]
Roger Hart, The Chinese roots of linear algebra, Johns Hopkins University Press, Baltimore, MD, 2011. MR2731645
[HH89]
Nicholas J. Higham and Desmond J. Higham, Large growth factors in Gaussian elimination with pivoting, SIAM J. Matrix Anal. Appl. 10 (1989), no. 2, 155–164, DOI 10.1137/0610012. MR988627
[Hig02]
Nicholas J. Higham, Accuracy and stability of numerical algorithms, 2nd ed., Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002, DOI 10.1137/1.9780898718027. MR1927606
[Hot43]
Harold Hotelling, Some new methods in matrix calculation, Ann. Math. Statistics 14 (1943), 1–34, DOI 10.1214/aoms/1177731489. MR7851
[HT24]
Han Huang and Konstantin Tikhomirov, Average-case analysis of the Gaussian elimination with partial pivoting, Probab. Theory Related Fields 189 (2024), no. 1-2, 501–567, DOI 10.1007/s00440-024-01276-2. MR4750563
[San04]
Arvind Sankar, Smoothed analysis of Gaussian elimination, ProQuest LLC, Ann Arbor, MI, 2004. Thesis (Ph.D.)–Massachusetts Institute of Technology. MR2717147
[SST06]
Arvind Sankar, Daniel A. Spielman, and Shang-Hua Teng, Smoothed analysis of the condition numbers and growth factors of matrices, SIAM J. Matrix Anal. Appl. 28 (2006), no. 2, 446–476, DOI 10.1137/S0895479803436202. MR2255338
[TB22]
Lloyd N. Trefethen and David Bau III, Numerical linear algebra, 25th anniversary ed., Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2022. With a foreword by James G. Nagy. MR4713493
[Tre12]
Lloyd N. Trefethen, The smart money’s on numerical analysts, SIAM News 45 (2012), no. 9, 1–5.
[vNG47]
John von Neumann and H. H. Goldstine, Numerical inverting of matrices of high order, Bull. Amer. Math. Soc. 53 (1947), 1021–1099, DOI 10.1090/S0002-9904-1947-08909-6. MR24235
[Wil61]
J. H. Wilkinson, Error analysis of direct methods of matrix inversion, J. Assoc. Comput. Mach. 8 (1961), 281–330, DOI 10.1145/321075.321076. MR176602
[Wil65]
J. H. Wilkinson, The algebraic eigenvalue problem, Clarendon Press, Oxford, 1965. MR184422

Credits

Figure 1 is in the public domain, https://commons.wikimedia.org/w/index.php?curid=66240126.

Figure 2, Figure 6, and Figure 7 are courtesy of John Urschel.

Figure 3 is courtesy of Fresheneesz at the English Wikipedia project, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=3357169.

Figure 4 is courtesy of Nicholas J. Higham.

Figure 5 is courtesy of Jack Dongarra.

Photo of John Urschel is courtesy of Christopher Harting.