The Legend of Abraham Wald


Bill Casselman
University of British Columbia, Vancouver, Canada

 

The myth

It makes a great story.

The year is 1943. American bombers are suffering badly from German air defense. The military decides it needs some advice on how to cut losses, so they consult the wizards in the Statistical Research Group at Columbia University to see what their best options might be. One possibility is to use more armor on planes, but armor weighs a lot, and adding too much would lower performance considerably. So the Air Force brass ask the SRG, how much armor should we use for optimal results, and where should we put it?

The SRG was one of several collaborating groups of scientists formed soon after America joined the war. The story of its beginning, in the summer of 1942, is told well in W. Allen Wallis' autobiographical memoir. The SRG was staffed by a distinguished lot, including many of the most prominent statisticians of the post-war world, the economists Milton Friedman and George Stigler--who were later to receive Nobel Prizes in economics--and the mathematician Abraham Wald. Norbert Wiener was at one time a consultant to the group. Recruitment to the SRG was by an "old-boy" network (to use a phrase also applicable to that other successful war-time operation across the ocean at Bletchley Park), but it prided itself on what we would call diversity.

Wald was born in the former Austro-Hungarian Empire in 1902, in the city now called Cluj. It advertises itself as the unofficial capital of Transylvania, which is now a part of Romania but inhabited in the past largely by Hungarians, and Hungarian was Wald's mother tongue. He started his professional life in Vienna as a pure mathematician, but became interested in the mathematics of statistics in the mid-thirties. As a Jew, he was deprived of his academic position in Austria, and like others in his situation was lucky to be able to move to the United States. At the time the SRG was founded, he was on the faculty of Columbia University, which is where the SRG was located, and he was one of its first members. By all accounts, he was impressively bright--"smartest man in the room," says one recent book (but keep in mind, most of the time there were many smart men in the room).

The problem of armoring planes is assigned to Wald. Along with the assignment, he is given a fair amount of statistical data regarding aircraft damage, for example the location of damage from hits by enemy aircraft. It happens that most of the damage is located on the fuselage and very little in the area around motors, and the military is expecting to add armor to the fuselage, where the density of hits is highest. "Not so fast," said Wald. "What you should really do is add armor around the motors! What you are forgetting is that the aircraft that are most damaged don't return. You don't see them. Hits by German shells are presumably distributed somewhat randomly. The number of damaged motors you are seeing is far less than randomness would produce, and that indicates that it is the motors that are the weak point." The advice is taken, and in fact Wald's techniques for interpreting aircraft damage statistics continue through two later conflicts.

The Internet loves this tale. Try a search for

"Abraham Wald" aircraft

to see what Mr. Google has to show you, and you will find dramatic headlines:

ABRAHAM WALD AND THE MISSING BULLET HOLES
Seeing is Disbelieving
How A Story From World War II Shapes Facebook Today
The hole story: What you don't see will kill you

The reason for this excitement is that the aircraft damage is an example of what is known as "survivorship bias." This is a technical term for what we all know well: the dead don't often get to tell their side of the story, and yet sometimes it would be better if they did. The loss is the source of all kinds of misinformation, as the Internet will tell you emphatically. Including deceptive practices in selling hot stocks, which may explain much of the buzz.

Well, it's gratifying to see a great mathematician become a legend for good reasons, rather than bad. "MATHEMATICAL GENIUS SCORES AGAINST ARMY BRASS!" reads pretty well. After all, publicity about mathematicians typically concentrates on features most of us would rather not think about. But it would be much more gratifying if there were more truth to the story, or at least more reason for believing it. Some of us prefer our history lessons to be taken from the non-fiction shelves.

The story might well be true, and there is certainly, as we shall see, a solid germ of truth in it, but there is very little evidence for the best bits. The capsule biography of Wald is accurate, and although he might not have been the smartest man in the room, he was probably nearly always the most accomplished mathematician in the room, which counted for a lot. But ... most of the rest of the story is--to use a charitable phrase--"plausible reconstruction." There is extremely little source material for what Wald had to say about aircraft damage.

The autobiographical memoir by W. Allen Wallis is the best source--practically the only source--for the operation of the SRG. It is surprisingly entertaining as well as informative, but its coverage of Wald's work at the SRG concentrates on the invention of sequential analysis, for which Wald eventually became deservedly famous. This is a technique for improving quality control in production, say of military ordnance. It was used, apparently with great success, by thousands of wartime production facilities. But it is not exactly great material for Internet headlines: "HEY! ARMY TRUCK TIRE PRODUCTION ROSE 6.37% IN AUGUST 1944!"

To be precise, regarding Wald's work on aircraft damage we have (1) two short and rather vague mentions in Wallis' memoir of work on aircraft vulnerability and (2) the collection of the actual memoranda that Wald wrote on the subject. That's it! Everything not in one of these places must be considered as fiction, not fact. Or at best, as I say, plausible reconstruction. Not to complain too much--the history of mathematics is plagued by the temptation, rarely resisted, to write things as they should have been, rather than as they were. Reality is rarely as logical as one might hope. I should add, though, that it's not only mathematical reality that gets slighted in the Internet versions of this tale--you should be quite amused by the pictures that accompany the Internet headlines. Lots and lots of airplanes with bullet holes scattered all over them. One goes so far as to claim it is showing you Wald's own sketches (and we do not have any idea at all whether he ever made any). Most show diagrams of aircraft that by no means match what must have been involved--my favorite is of a venerable DC-3, a plane referred to by the military as the C-47. These served as cargo carriers in WWII and rarely saw real combat except by straying from route. "As long as it has motors and flies" seems to be the criterion for the artwork. A few of the web sites show chilling clips of American planes being destroyed in action. These certainly show, in case you might have forgotten, what stakes were ultimately involved in the apparently abstract technology being developed in the comfort of upper Manhattan.

The vague references in Wallis' memoir are particularly interesting, since Wald is not mentioned in them. One of them (p. 323) says in its entirety, "The problem of aircraft vulnerability led SRG to devise a technique for determining vulnerability from damage survived by our own planes ... " The other (p. 324) names Wallis himself as the author of a note titled Uses for Aircraft Vulnerability Figures. This, however, is one item in a list of a random selection of SRG reports, and there might well have been other reports on the same topic. (Do these reports still exist in some deep archive?)

So the only really reliable account of Wald's work is what we find in Wald's own writings.


The true story, or at least part of it

The memoranda by Wald are severely technical. Not much drama at all. In particular, Wald says nothing about what the military should do to improve things. If I understand Wallis correctly, it was the general policy of the SRG to answer just the questions asked and never--well, hardly ever--attempt to offer advice on applications of what they discovered. Military decisions were made by the military.

The memoranda are so technical, in fact, that in the account by Jordan Ellenberg, a photograph of one page of the document is flashed at the reader with an apology for suddenly introducing a topic possibly suitable only for adults. There is, however, a very valuable guide to the memoranda by Marc Mangel and Francisco Samaniego, that appeared almost at the same time the memoranda were made available to the public by the Center for Naval Analyses.

There are eight items among the memoranda. Five of them deal with a single problem, estimating probabilities of an airplane's survival, given that it has already been hit. Its outstanding feature is that it offers a way to estimate damage on the planes that never returned. A kind of magic, indeed. One--just one--deals with the problem of vulnerability of different sections of an airplane, and this shares with the previous sections some impressive estimates. That is, as the Internet fiction suggests, both have to deal with the problem that downed planes aren't around to give evidence.

Consider the first problem. We are given data, such as the number of hits, only on returning aircraft. The question Wald asked--or perhaps the one he was asked to look at--was, "Given these data, what can we say about the probability of surviving a given number of hits?" Not a complicated question, but with a complicated answer. All we know about the planes that didn't return is ... that they didn't return. In truth, there might be a number of reasons for this, since--for example--a number of fatalities in the war were from mechanical failure. Of course Wald had to be very careful. It was in principle possible, one might suppose, that all downed airplanes ran out of gasoline. The point is that this was extremely unlikely. In other words, any answer to the question is complicated by the missing data associated to planes that were downed. Wald could only calculate his probabilities by making certain reasonable assumptions, and being very, very careful about how the assumptions played a role in results. In all his works on statistics, in fact, he was renowned for being very, very careful with assumptions.

His first simplifying assumption is that planes are downed because of enemy fire. Rather than mechanical failure, say.

What data did Wald have to work with? This seems to have varied from time to time, but at the least, in so far as this problem was concerned, he was given the number of planes sent out on missions, the number returning, and the number of hits on each plane that came back. In the example treated by Mangel and Samaniego (following Wald):

$$ \begin{array}{lll}
 & \text{Number} & \text{Ratio} \\
\text{Planes in the mission} & N = 400 & 1.00 \\
\text{Planes returning} & 380 & 0.95 \\
\text{Number of planes downed} & 20 & 0.05 \\
\text{Number returning with no hits} & S_{0} = 320 & s_{0} = 0.80 \\
\text{With 1 hit} & S_{1} = 32 & s_{1} = 0.080 \\
\text{With 2 hits} & S_{2} = 20 & s_{2} = 0.050 \\
\text{With 3 hits} & S_{3} = 4 & s_{3} = 0.010 \\
\text{With 4 hits} & S_{4} = 2 & s_{4} = 0.005 \\
\text{With 5 hits} & S_{5} = 2 & s_{5} = 0.005
\end{array} $$

 

The $N$ planes on the mission divide into two major groups, the $S$ survivors and the $L$ planes that are downed. These in turn divide into groups according to how many hits they get: $N_{i}$ is the total number with exactly $i$ hits, similarly $S_{i}$ and $L_{i}$. Of course we know all the $S_{i}$, and know nothing about the $L_{i}$ except for three simple things: (1) $L = \sum L_{i} = N - S$, and (2) $L_{i} + S_{i} = N_{i}$, and (3) $L_{0} = 0$, because we have assumed that all those that are lost are lost because they have been hit. Let $N_{\ge i}$ be the sum $\sum_{j \ge i} N_{j}$, etc. Thus $$ N = N_{\lt i} + N_{\ge i} \, . $$

It seems a little crazy, but what we really want to do is figure out what all the missing numbers $L_{i}$ are, or at least estimate them in a reasonable way. It looks at first sight as though this is a task for a conjuror rather than a mathematician.

If, as Mangel and Samaniego advise you to do, you think on your own about this problem, you will likely be led to come up with something rather complicated. Yet Wald's reasoning is remarkably simple. One of his best ideas is to introduce variables that we do have at least some chance of estimating, and from which all the rest can be computed. Let $p_{i}$ be the conditional probability of going down on the $i$-th hit, having survived $i-1$ hits. Thus $p_{1}$ is just the probability of going down on the first hit, and $p_{i}$ is the proportion of those who receive $\ge i$ hits who are, however, shot down by the $i$-th. In an equation: $$ p_{i} = { L_{i} \over N_{\ge i} } \, . $$

We can write this also as $$ \eqalign { L_{i} &= p_{i} \cdot \Big( \sum_{j \ge i} N_{j} \Big) \cr &= p_{i} \cdot \Big(N - \sum_{j \lt i} N_{j} \Big ) \cr &= p_{i} \cdot \Big( N - \sum_{j\lt i} S_{j} - \sum_{j\lt i} L_{j} \Big) \, . \cr } $$

Here's the basis of the magic to come: We know what the $S_{i}$ are. Therefore the last equation for $L_{i}$ is an equation that can be solved by induction for the $L_{i}$, since we know that $L_{0} = 0$, if only we know the $p_{i}$! Thus $$ \eqalign { L_{0} &= 0 \cr L_{1} &= p_{1} \cdot (N - S_{0}) \cr L_{2} &= p_{2} \cdot (N - S_{0} - S_{1} - L_{1} ) \cr & \dots \cr } $$
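The induction is easy to mechanize. Here is a minimal Python sketch of it, assuming the counts from the example; the function name `downed_by_hits` and the values chosen for the $p_{i}$ are hypothetical, made up purely for illustration, and are not Wald's estimates.

```python
def downed_by_hits(n_total, survivors, p):
    """Solve L_i = p_i * (N - sum_{j<i} S_j - sum_{j<i} L_j) by induction.

    survivors[i] is S_i and p[i] is p_i, for i = 0..n.
    p[0] must be 0, which encodes the assumption L_0 = 0.
    """
    losses = []
    for i, p_i in enumerate(p):
        # N_{>= i}: planes that received at least i hits
        at_least_i = n_total - sum(survivors[:i]) - sum(losses)
        losses.append(p_i * at_least_i)
    return losses


N = 400
S = [320, 32, 20, 4, 2, 2]               # S_0 .. S_5 from the example
p = [0.0, 0.15, 0.15, 0.15, 0.15, 0.15]  # hypothetical p_i, illustration only

losses = downed_by_hits(N, S, p)
for i, l_i in enumerate(losses):
    print(f"L_{i} = {l_i:.3f}")
print(f"total downed implied by these p_i: {sum(losses):.3f}")
```

Nothing in the recursion itself forces the implied total to agree with the $20$ planes actually lost; that agreement is an extra constraint on the $p_{i}$.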

Of course this leads us to the question--how can we figure out what the $p_{i}$ are? The short answer is, we can't, but Wald was able to make various estimates of them, by an argument that appears to me all the more subtle the more I try to understand it.

Let $q_{i} = 1- p_{i}$, which is the conditional probability of surviving the $i$-th hit, having survived the first $i-1$. These are a main ingredient in what I'll call Wald's basic equation, written in terms of the ratios $s_{m} = S_{m}/N$, with $n$ the largest number of hits on any plane: $$ \sum_{m=1}^{n} { s_{m} \over q_{1}q_{2} \ldots q_{m} } = 1 - s_{0} \, . $$

I'll try to explain in a moment where this comes from--as far as I can see it is not an obvious relation, although it is not difficult to derive, and it is even less obvious that it gets us anywhere. It is a single equation with several unknowns, so in general there will be many possible solutions. Wald's approach is to find which solutions in this large world of solutions are most likely.

But first let me give you some idea of how the equation can give us approximate values of the $p_{i}$. Following Wald and Mangel-Samaniego, let's look first at an unrealistically simple case. We expect hits to weaken a plane, or at least not improve its chances, which means that $$ q_{1} \ge q_{2} \ge \ldots \, , $$

but as a first very rough approximation we might guess that all the $q_{i}$ are equal, so that for the denominators $$ q_{1}q_{2} \ldots q_{i} = q^{i} $$

for some fixed $q$. This amounts to assuming that a hit does not weaken a plane, which does not seem to be far off the truth. With this assumption Wald's basic equation becomes $$ { s_{1} \over q } + { s_{2} \over q^{2} } + \cdots + { s_{n} \over q^{n} } = 1 - s_{0} \, , $$ and in our example $$ { 0.080 \over q } + { 0.050 \over q^{2} } + { 0.010 \over q^{3} } + { 0.005 \over q^{4} } + { 0.005 \over q^{5} } = 0.20 \, . $$ This tells us that $q$ is the root of a relatively simple equation. I include below a graph of the function on the left, as well as the level line at $0.20$. We see that $q$ is approximately $0.85$, and a little more calculation (using Newton's method, for example) gives us the slightly more accurate $0.851$ (but the extra decimal digit is spurious, given the coarseness of the data).
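For readers who want to redo the "little more calculation," here is a minimal Python sketch of Newton's method applied to the example equation. The helper names `f`, `f_prime` and `newton`, the tolerance, and the starting guess are my own choices for illustration; the ratios are those of the example.

```python
# Ratios s_1 .. s_5 and s_0 from the example.
s = {1: 0.080, 2: 0.050, 3: 0.010, 4: 0.005, 5: 0.005}
s0 = 0.80


def f(q):
    """Left side minus right side of the equal-q basic equation."""
    return sum(s_m / q**m for m, s_m in s.items()) - (1 - s0)


def f_prime(q):
    """Derivative of f with respect to q."""
    return sum(-m * s_m / q ** (m + 1) for m, s_m in s.items())


def newton(q, tol=1e-12, max_iter=50):
    """Plain Newton iteration; stops when the step is smaller than tol."""
    for _ in range(max_iter):
        step = f(q) / f_prime(q)
        q -= step
        if abs(step) < tol:
            return q
    raise RuntimeError("Newton iteration did not converge")


q = newton(0.85)  # start from the value read off the graph
print(f"q = {q:.3f}")
```

Started from $0.85$, the iteration settles at about $0.851$, in agreement with the value quoted above.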

From this point, Wald's memoranda go on to apply the basic equation in order to find plausible bounds on the possible values of the $q_{i}$ rather than exact guesses, by even more subtle arguments. After that he applies similar techniques to the problem of locating the most fatal hits on the planes. Abraham Wald was not a necromancer, but he was a magician. He might not be able to make the dead speak, but he could pull a few rabbits out of thin air.


Mathematical magic

Rather than discuss these topics, I'll try to explain where the basic equation comes from.

Wald's own argument for his basic equation is followed by Mangel and Samaniego. It is extremely clever. Wald does something only a mathematician could love: he says, in effect, let's consider an imaginary scenario in which only dummy bullets are fired. I have to confess that I find the argument a bit obscure, and I think this is because its plausibility seems to depend on some probabilistic intuition I don't have. So I offer something new, if less adventurous.

I start with something that Wald mentions, and seems to think important, but doesn't use in a crucial way: The number of hits on an airplane is bounded, so that $N_{\gt n} = 0$ for some $n$. In our example, $n =5$. Now the induction formula for $p_{i}$ tells us that $$ \eqalign { p_{i} &= { L_{i} \over N_{\ge i} } \cr q_{i} &= 1 - p_{i} \cr &= { N_{\ge i} - L_{i} \over N_{\ge i} } \cr S_{i} + N_{\ge i+1} &= q_{i}N_{\ge i} \cr } $$

for all $i$; the last line follows because $N_{\ge i} = N_{i} + N_{\ge i+1}$ and $N_{i} - L_{i} = S_{i}$. If we combine these facts, using $N_{\ge n+1} = 0$, we deduce first that $$ S_{n} = q_{n} N_{\ge n} \, . $$ But we also deduce a descending inductive formula: $$ N_{\ge i} = { S_{i} \over q_{i} } + { N_{\ge i+1} \over q_{i} } $$ that leads to the sequence of formulas $$ \eqalign { N_{\ge n} &= S_{n}/q_n \cr N_{\ge n-1} &= { S_{n} \over q_{n-1}q_{n} } + { S_{n-1} \over q_{n-1} } \cr & \dots \cr N_{\ge 0} = N &= { S_{n} \over q_{1}\dots q_{n} } + \cdots + { S_{1} \over q_{1} } + S_{0}\, . \cr } $$ (In the last line $q_{0} = 1$, since $L_{0} = 0$.) Dividing the last equation by $N$ and writing $s_{i} = S_{i}/N$ gives Wald's basic equation! Quod erat demonstrandum!
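The identity can also be checked numerically. The following Python sketch builds a small imaginary fleet forward, hit by hit, and then confirms that the survivors satisfy the last formula exactly. All the numbers in it (the $q_{i}$ and the shares of planes that stop at exactly $i$ hits) are hypothetical, picked only to make the arithmetic clean; exact fractions are used so that no rounding gets in the way.

```python
from fractions import Fraction as F

N = F(400)

# q[i] = conditional probability of surviving the i-th hit.
# q[0] = 1 because L_0 = 0; the other values are hypothetical.
q = [F(1), F(9, 10), F(4, 5), F(3, 4)]

# stop[i] = hypothetical share of planes surviving hit i that take no
# further hits. The last entry is 1, so that N_{> n} = 0.
stop = [F(3, 4), F(1, 2), F(1, 2), F(1)]

S = []
at_least = N  # N_{>= i}, starting with N_{>= 0} = N
for q_i, r_i in zip(q, stop):
    alive = q_i * at_least    # S_i + N_{>= i+1} = q_i * N_{>= i}
    S.append(r_i * alive)     # survivors with exactly i hits
    at_least = alive - S[-1]  # N_{>= i+1}

# Right-hand side: S_0 + S_1/q_1 + S_2/(q_1 q_2) + ... + S_n/(q_1 ... q_n)
total = S[0]
prod = F(1)
for m in range(1, len(S)):
    prod *= q[m]
    total += S[m] / prod

print("survivors S_i:", [str(x) for x in S])
print("reconstructed N:", total)
```

The reconstructed total comes out to exactly $N$, even though only the survivors enter the sum.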

Postscript

My indignation at how the Internet dealt with Wald's work was overblown. Stephen Stigler (son of George, and a statistician at the University of Chicago) called my attention to a note by W. Allen Wallis himself in which he mentions Wald's work explicitly in connection with survivorship bias. Wallis' original article in the Journal of the American Statistical Association was followed by two very brief comments and then by a further 'rejoinder' of a bit more than one page. Towards the end of it he says, "The military was inclined to provide protection for those parts that on returning planes showed the most hits. Wald assumed, on good evidence, that hits in combat were uniformly distributed over the planes. It follows that hits on the more vulnerable parts were less likely to be found on returning planes than hits on the less vulnerable parts, since planes receiving hits on the more vulnerable parts were less likely to return to provide data. From these premises, he devised methods for estimating vulnerability of various parts."

Stephen Stigler recalled to us that both Wallis and his father George Stigler had mentioned this work of Wald in conversation several times. He called attention to Wallis' remarks in a letter published in the May 1989 issue of Nature in which he also pointed out the relevance of survivorship bias to the interpretation of the statistical record of trilobite fossils. This may have been the original seed from which the tree of subsequent comment grew.

 

Acknowledgements

Thanks to Marc Mangel and Phil DePoy for assistance. My thanks to Pawan Gupta for pointing out a minor error in two of the equations in the original posting of the column.

Reading further

  • Wald's memoranda

    The originals were written by Wald around 1943, and these were later published in 1981 by the Document Center of the Center for Naval Analyses (CNA), 2000 North Beauregard St., Alexandria, Virginia 22311.

    One might well wonder how the publication of Wald's memoranda, along with the near simultaneous publication of the article by Mangel and Samaniego, came about. Why the wait for nearly forty years?

    Around 1980, W. Allen Wallis was in the process of leaving the University of Rochester, where he had worked for many years. In the process, he found a number of items left over from his days at the SRG, and offered them to Phil DePoy, also in Rochester at that time and working at the CNA. DePoy writes to us: "The original material was given to me by W. Allen Wallis after he moved to Washington to take a position with the State Department (that of Under Secretary of State for Economic Affairs). He came into my office one day carrying a large box which he found when he was moving out of his house in Rochester. He said that he had collected a lot of loose papers from the SRG when they closed the office down in 1946. He asked me to review everything in the box and determine if anything was worth saving. I read everything and decided that most of it was not worth preserving. The only material that I saved was a package of items by Abraham Wald. With some minimal editing, I published eight 'papers' under Wald's name in July 1980."

    Recorded history hangs by thin threads. The most unfortunate fact in Wald's history is that he died in an airplane accident in the mountains of southern India in 1950, and had no chance to write his autobiography.

  • Abraham Wald's Work on Aircraft Survivability

    A survey of Wald's work on aircraft damage by Marc Mangel and Francisco Samaniego. Originally appeared in volume 79 of the Journal of the American Statistical Association.

    Mangel has told us, "In 1981 or so, in anticipation of the 40th anniversary of Operations Evaluation Group of the Center for Naval Analyses, Phil DePoy asked me to prepare a version of Wald's report that would be publishable."

  • Oskar Morgenstern, 'Abraham Wald, 1902-1950', Econometrica 19, 361-367, 1951.
  • Jacob Wolfowitz, 'Abraham Wald, 1902-1950', Annals of Mathematical Statistics 23, 1-13, 1952.

    Memoirs by two of Wald's closest friends and collaborators.

  • W. Allen Wallis, 'The Statistical Research Group, 1942-1945', Journal of the American Statistical Association 75, 320-330, 1980.

    Unfortunately not publicly accessible. This is well written, with panache. Who would have thought such a dry topic could furnish so much pleasure?

    "Exploring the history of one's professional field is often a mark of maturity. Reminiscing about it is usually a mark of senility."

    "I have obtained a copy of the Final Report of SRG from the National Archives, where it and other SRG documents are kept in the Center for Polar and Scientific Archives."

    "The program that resulted ... eventually made a major contribution to the war effort. Its aftermath, in fact, continues to make major contributions not only to the American economy but also to the Japanese economy."

    "We pinched pennies ... One principal staff member still alleges resentfully that the administrative assistant told him to economize by writing equations on both sides of the paper."

    "In 1950 I was 30 years closer to the events than I am now, and, furthermore, I had the use of a 37 year old's memory--something that now I can scarcely recall ever having had."

    These are among many amusing passages. One asks immediately, is the Final Report extant? One curious fact is that the "computers" at the SRG were (in Wallis' words) "30 young women, mostly mathematics graduates of Hunter or Vassar." Again, reminiscent of Bletchley Park. The only woman on the staff seems to have been Mina Rees, then teaching at Hunter College. Like other members of the staff, she had a distinguished career ahead of her.

  • Jordan Ellenberg, How Not to Be Wrong. Penguin, 2014. A non-technical account of many mathematical topics. The opening is one of the better plausible reconstructions of Wald's work.

 
