# What's Normal?

###### Posted June 2011.

Any method of numerical integration, such as Simpson's rule, works rapidly and accurately in any range of practical significance. But if we take the problem as a somewhat theoretical one, there are many interesting questions that arise... Bill Casselman
University of British Columbia, Vancouver, Canada
Email Bill Casselman Mail to a friend Print this article

One of the most commonly encountered functions in mathematics is $$f(x) = {1 \over \sigma \sqrt{2 \pi} } \int_{0}^{x} e^{-(t/\sigma)^2/2}\, dt$$ which represents area under the normal curve This is the normal distribution with mean $0$ and standard deviation $\sigma$. The point is that if a variable is distributed normally in a population with mean $\mu$ and standard deviation $\sigma$, the probability of getting a value of the variable between $x_{0}$ and $x_{1}$ is the area under the graph of the normal curve between $x_{0}-\mu$ and $x_{1}-\mu$. Changing $\sigma$ scales the graph horizontally and vertically, but preserves area. By a variable change $t/\sigma = s$ we can express $f(x)$ in terms of the standard error function, which corresponds to $\sigma = 1$: $${\rm erf}(x) = {1 \over \sqrt{2 \pi} } \int_{0}^{x} e^{-s^2/2}\, ds .$$

The probability of getting a value of $x$ in a normal population with mean $\mu$ and standard deviation $\sigma$ is thus $${\rm erf}\left({x_{1}-\mu\over \sigma}\right) - {\rm erf}\left({x_{0}-\mu\over \sigma}\right) .$$

The function ${\rm erf}(x)$ is by now very familiar, but "familiar" should not translate to "uninteresting". The problem of computing area under the graph is not exactly a trivial one, especially if one wants very precise answers. This is the problem that I'll look at here. To simplify notation, though, I'll look at the very slightly modified problem of computing area under the curve $$y = e^{-x^{2}} ,$$

which up to a constant represents probability for $\sigma = 1/\sqrt{2}$. More explicitly, if I set $$\epsilon(x) = \int_{0}^{x} e^{-t^{2}} \, dt ,$$

then I'll investigate the problem of computing $\epsilon(x)$ to about $15$ significant figures, for a range of values of $x$.

I should explain at the start that for all practical purposes, as far as I am aware, there is no problem in computing the area. Any method of numerical integration, such as Simpson's rule, works rapidly and accurately in any range of practical significance. But if we take the problem as a somewhat theoretical one, there are many interesting questions that arise. They arise also when trying to evaluate functions where no simpler technique exists. I am looking at $\epsilon(x)$ because it is the most easily defined, and perhaps the most commonly encountered, of these.

### The total area

I should also explain at the start that there is no simple formula for $\epsilon(x)$. That is to say, there is no algebraic formula for a function whose derivative is $\epsilon(x)$. Therefore values must be calculated numerically, with one exception. The exception is $x = \infty$---that is to say, the total area under the curve can be computed exactly. Even this is not quite trivial, but it is a standard exercise in a second course in calculus. The first step is simple, noting that $$2 \epsilon(\infty) = \int_{-\infty}^{\infty} e^{-t^{2}} \, dt$$ It then proceeds indirectly. Suppose we let $I = 2\epsilon(\infty)$. Then the product $I^{2}$, can be expressed as a double integral: $$I^{2} = \left( \int_{-\infty}^{\infty} e^{-x^{2}} \, dx \right) \left( \int_{-\infty}^{\infty} e^{-y^{2}} \, dy \right) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-x^{2}-y^{2}} \, dx \, dy .$$ But $x^{2} +y^{2} = r^{2}$, and calculus gives us a way to integrate in polar coordinates: $$I^{2} = \int_{0}^{2\pi} \, d\theta \, \int_{0}^{\infty} r e^{-r^{2}} \, dr = 2 \pi \int_{0}^{\infty} r e^{-r^{2}} \, dr .$$ Now we are in good shape, since we can set $t = r^{2}$, $r \, dr = dt/2$ to see that this is $$\pi \int_{0}^{\infty} e^{-t} \, dt = \pi \, ,$$ leading to $I = \sqrt{\pi}$. This is what gives us the factor $1/\sqrt{2 \pi}$ in the definition of the error function, since any probability function has to have total area $1$ under its graph.

### Numerical integration

One simple but effective way to compute integrals is by Simpson's rule, which approximates $$\int_{a}^{b} f(x) \, dx$$ by dividing the interval $[a, b]$ into $2N$ smaller intervals $[a_{i},a_{i+1}]$ with $a_{i} = a + ih$ ($h = (b-a)/(2N)$). The integral is then approximately $$S_{N} = \sum_{i=0}^{N-1} \; 2h \cdot { f(a_{2i}) + 4f(a_{2i+1}) + f(a_{2i+2}) \over 6 } \, .$$ The accuracy of this approximation becomes better as $N$ gets larger. For $f(x) = e^{-x^{2}}$, we get this sequence of approximations for $\epsilon(1)$: $$\matrix { & N & S_{N} & \hbox{error }\; e_{N} & \hbox{ratio of errors } \; e_{N}/e_{N-1} \cr & 2 & 0.7468553797909873 & 3.124697856027314e-05& \cr & 4 & 0.7468261205274666 & 1.987715039564186e-06& 0.0636\cr & 8 & 0.7468242574357303 & 1.246233033436184e-07& 0.0627\cr & 16 & 0.7468241406069851 & 7.794558110063576e-09& 0.0625\cr & 32 & 0.7468241332996723 & 4.872453551740819e-10& 0.0625\cr & 64 & 0.7468241328428812 & 3.045419472158528e-11& 0.0625\cr & 128 & 0.7468241328143308 & 1.903810442627218e-12& 0.0625\cr & 256 & 0.7468241328125457 & 1.186828413324292e-13& 0.0623\cr & 512 & 0.7468241328124338 & 6.772360450213455e-15& 0.0571\cr & 1024 & 0.7468241328124284 & 1.443289932012704e-15& 0.2131\cr & 2048 & 0.7468241328124273 & 3.330669073875470e-16& 0.2308\cr & 4096 & 0.7468241328124261 & 8.881784197001252e-16& 2.6667\cr }$$ We gain one roughly decimal of accuracy when we double $N$. More precisely, the last column shows that we cut the error by about $1/16$ in each doubling, at least for a while. This is typical for Simpson's rule.

But at $N=512$ things start to get confused. The reason for this apparent chaos is that the computer has only a limited amount of storage for each real number, and does not normally store more than about $16$ significant digits. Trying to get it to do so without doing something special leads to meaningless numbers.

### Taylor series

The function $e^{-x^{2}}$ possesses a Taylor series that converges for all $x$, and hence so does that of $\epsilon(x)$: $$\epsilon(x) = \int_{0}^{x} \left( 1 - t^{2} + { t^{4} \over 2! } - { t^{6} \over 3! } + \cdots \right) \, dt = x - { x^{3} \over 3 } + { x^{5} \over 5 \cdot 2! } - { x^{7} \over 7 \cdot 3! } + \cdots$$ This series converges without any obvious difficulty---the ratio of the $n$-th term to the previous one is $-(x^2/n)((2n-1)/(2n+1))$, which certainly goes to $0$ very rapidly for fixed $x$ as as $n \rightarrow \infty$, so the series converges absolutely, by comparison with a geometric series.

Things are even a bit better---the terms oscillate in sign. In addition, eventually they start to steadily decrease. When they do decrease, it is easy to see that the number to which the series converges lies within any two successive partial sums (the sandwich rule). For example, if $x=1$ the series is $$1 - { 1 \over 3 } + { 1 \over 5 \cdot 2! } - { 1 \over 7 \cdot 3! } + { 1\over 9 \cdot 4! } \cdots$$ with partial sums $$\matrix { 1 & = & 1 \cr 1 - 1/3 & = & 2/3 & = 0.6667 \cr 1 - 1/3 + 1/10 & = & 23/30 & = 0.7667 \cr 1 - 1/3 + 1/10 - 1/42 & = & 156/210 & = 0.7429 \cr 1 - 1/3 + 1/10 - 1/42 + 1/216 & = & 33906/45360 & = 0.7475 \cr & \dots & \cr }$$ converging to the final value $0.74682\dots$. We now write a simple program to do this by machine. The $n$-th term in the series is $$T_{n} = { x^{2n+1} \over (2n+1)n! } = { t_{n} \over 2n+1 }$$ where $t_{0} = x$, $t_{n+1} = (-x^{2}/n+1)t_{n}$. So the program goes:

def taylors(x):
t = x; T = t; s = x
n = 0
while (abs(T) > 10**(-15)):
t *= (-x*x)/float(n+1)
T = t/float(2*n+3)
s += T
n += 1
return s


Let's see how it goes. We can use some extra information from the program, too. It is natural to wonder, how many terms of the series are necessary in order to get a final value of the accuracy we want? The smaller $x$ is, the faster the series converges and the fewer the number of terms necessary. So in addition to returning the sum of the series, we keep track of the number of terms summed, too. Here are some results: $$\matrix { x & \epsilon(x) & \hbox{ number of terms necessary to get \epsilon(x) to 15 significant digits} \cr 0.01 & 0.00999\;96666\;76666 & 3 & \cr 0.1 & 0.09966\; 76642\; 90336 & 6 & \cr 1 & 0.74682\;41328\;12427 & 17 & \cr 2 & 0.88208\;13907\;62422 & 30 & \cr 3 & 0.88620\;73482\;59494 & 46 & \cr 4 & 0.88622\;69117\;93312 & 67 & \cr }$$ These seem to be approaching $\sqrt{\pi}/2 = 0.88622\;92545\;27579$ very nicely. As we expect, larger $x$ means more terms are required. But now comes a small if unpleasant surprise---if we try for $x=5$ we get $$\matrix { x & \epsilon(x) & \phantom{\hbox{ number of terms necessary to get \epsilon(x) to 15 significant digits}} \cr \; 5\; & 0.88622\;69094\;20538 & 92 & \cr }$$ which has to be wrong, since $\epsilon(x)$ must increase with $x$. Things get worse---much worse---if we continue: $$\matrix { x & \epsilon(x) & \phantom{\hbox{ number of terms necessary to get \epsilon(x) to 15 significant digits}} \cr \; 6\; & 0.892906819737514 & 123 & \cr 7 & 1437.353133813762 & 159 & \cr 8 & 4256024335.577157 & 200 & \cr 9 & -7455367150974675 & 246 & \cr }$$ These don't make any sense at all! The computation does come to a halt, but what's going on?

### Impractical converging series

It is true that the series $$x - { x^{3} \over 3 } + { x^{5} \over 5 \cdot 2! } - { x^{7} \over 7 \cdot 3! } + \cdots$$ converges for all $x$, but it doesn't do so in a uniform manner. The $n$-th terms $$T_{n} = { x^{2n+1} \over (2n+1) n! }$$ satisfy the recursion relation $$T_{n+1} = T_{n} \left( { 2n+1 \over 2n+3 } \right ) \left( { x^{2} \over n+1 }\right )$$ so the terms will increase as long as the product $$\left( { 2n+1 \over 2n+3 } \right ) \left( { x^{2} \over n+1 }\right )$$ is greater than $1$. If $n$ is reasonably large, then the first factor will be close to $1$, so in practice the terms $T_{n}$ will grow until $n$ is about the same size as $x^{2}$. When it does start to decrease, $T_{n}$ can be quite large. Here is what the terms $T_{n}$ look like that occur when the program is run on $x=4$:

     0               -21.3333333333333
1               102.400000000000
2              -390.095238095238
3              1213.62962962963
...       ...
11             94020.5549912671
12           -107145.931613980
13            114007.493737042
14           -113762.316331156
15            106867.630492904
16            -94833.7964710143
...       ...
36                -1.38261483695243
37                 0.567032742427519
...       ...
66                -0.000000000000000385315135287094


The final sum produced is $0.88622 69117 93312$. In other words, although individual terms are large, the final sum is small. But since the maximum term is of the order of $10^{6}$, the last $6$ digits are meaningless. For $x=9$, things go very badly:

     0                   -114.333333333333
1                   1680.70000000000
...       ...
17         78358251444665.7
18       -191718636800511.
19        446797945031436.
20       -994038838945907.
21       211559579157477*.
22      -431534571731950*.
...       ...
33    100635466991158****.
34   -136920931145716****.
...       ...
46   -783290114072196****.
47    783121881659379****.
...       ...
77       269452356210702*.
78      -165026429132149*.
...       ...
158                     -0.000000000000000410343750426103


Here, $*$ denotes a meaningless digit, and that they are there means that none of the digits in the final answer, which is less than $1$, have any meaning.

Therefore, if we want to use Taylor's series to calculate $\epsilon(x)$ for values of $x > 5$ or so, we must do computations with software that gives us more than the usual machine precision. But there turns out to be another trick available besides Taylor's series.

### The asymptotic series

As $x \rightarrow \infty$, $\epsilon(x) \rightarrow \sqrt{\pi}/2$. Therefore, for large values of $x$, those for which the Taylor's series has problems, we already know that $\epsilon(x)$ is approximately $\sqrt{\pi}/2$. In fact, since $e^{-x^{2}}$ decreases rapidly, we expect this approximation to be very accurate. We can in fact get a very good idea of how accurate. First of all, we can write $$\int_{0}^{x} e^{-t^{2}} \, dt = \int_{0}^{\infty} e^{-t^{2}} \, dt - \int_{x}^{\infty} e^{-t^{2}} \, dt = \sqrt{\pi}/2 - \int_{x}^{\infty} e^{-t^{2}} \, dt \, .$$

So we want a good way to estimate the last integral. Integration by parts asserts that $$\int u \, dv = uv - \int v \, du$$ for arbitrary functions $u$, $v$. In our case, we can choose $$e^{-t^{2}} \, dt = { 1 \over -2t } \cdot \big( -2t e^{-t^{2}} \big) \,dt = u \, dv$$ with $v = e^{-t^{2}}$, $u = -1/(2t)$. Then $du = 1/(2t^{2})$, so integration by parts gives us $$\int_{x}^{\infty} e^{-t^{2}} \, dt = \left[ e^{-t^{2}} \over -2t \right]^{\infty}_{x} - \int_{x}^{\infty} { e^{-t^{2}} \over 2t^{2} } \, dt = { e^{-x^{2}} \over 2x } - \int_{x}^{\infty} { e^{-t^{2}} \over 2t^{2} } \, dt$$ which implies that $$\int_{x}^{\infty} e^{-t^{2}} \, dt < { e^{-x^{2}} \over 2x }$$ But we can perform the same trick again, writing $${ e^{-t^{2}} \over 2t^{2} } = { -2t e^{-t^{2}} \over -4t^{3} }$$ and get $$\int_{x}^{\infty} { e^{-t^{2}} \over 2t^{2} } \, dt = {e ^{-x^{2}} \over 4x^{3}} - 3 \int_{x}^{\infty} { e^{-t^{2}} \over 4t^{4} } \, dt$$ so that, taking into account the earlier inequality: $${ e^{-x^{2}} \over 2x } - { e^{-x^{2}} \over 4x^{3} } < \int_{x}^{\infty} e^{-t^{2}} \, dt < { e^{-x^{2}} \over 2x }$$ Continuing in this way, we get a series expansion $$\int_{x}^{\infty} e^{-t^{2}} \, dt \sim { e^{-x^{2}} \over 2x } \left( 1 - { 1 \over 2x^{2} } + { 1\cdot 3 \over (2x^{2})^{2} } - \cdots \; \right )$$ in the sense that the left hand side lies between any two successive sums on the right (sandwich rule again). This looks pretty good. But there is a problem---if we compute the first $20$ terms in the series for $x=2$, we get:

   0       0.00457890972218354
1      -0.000572363715272943
2       0.000214636393227354
3      -0.000134147745767096
4       0.000117379277546209
5      -0.000132051687239485
6       0.000181571069954292
7      -0.000295052988675725
8       0.000553224353766984
9      -0.00117560175175484
10       0.00279205416041775
11      -0.00732914217109658
12       0.0210712837419027
13      -0.0658477616934459
14       0.222236195715380
15      -0.805606209468252
16       3.12172406168948
17     -12.8771117544691
18      56.3373639258023
19    -260.560308156836


and the terms keep growing from there. In other words, the series does not converge. It is called an asymptotic series because $\epsilon(x)$ is better approximated by any finite sum of the series as $x \rightarrow \infty$. We can see easily why it does not converge---the ratio of the $n+1$-st term to the $n$-th is $-(2n+1)/2x^{2}$, and as soon as $n > x^{2} - 1/2$ the terms increase in absolute value. So this series can be used as long as the terms decrease, in order to get a good approximation to $\sqrt{\pi}/2 - \epsilon(x)$, but not indefinitely. Still, it is very useful. Even for the relatively small value $x=4$ the smallest term in the series is about $2\cdot 10^{-15}$, which gives us very good accuracy for $\epsilon(4)$, and with a fewer number of terms than Taylor's series.

This paradoxical situation is rather common in the computation of interesting functions---we have on the one hand a series which converges, but only impractically; and on the other a series which does not converge but in some ways is more practical.

### Euler's trick

There is yet one more trick to learn. If $$a_{0} + a_{1}x + a_{2}x^{2} + a_{3}x^{3} \cdots$$ is any series, its Euler transform is the series $${1 \over 1 - x } \sum [\Delta^{n} a]_{0} \left( { x \over 1 - x } \right)^{n}$$ Here $[\Delta^{n} a]_{k}$ is the sequence of differences defined recursively: \eqalign { [\Delta^{0} a]_{k} &= a_{k} \cr [\Delta^{1} a]_{k} & = a_{k+1} - a_{k} \cr [\Delta^{n+1} a]_{k} &= [\Delta^{n} a]_{k+1} - [\Delta^{n} a]_{k} \cr } In other words, if $a = (a_{n})$ is any sequence then $\Delta a$ is the new sequence $(a_{n+1}-a_{n})$, and $\Delta^{n}a$ is the sequence you get by applying this operator $n$ times.

It is easy to do a formal manipulation that makes this assertion plausible:

If the first series converges, so does the second, and they converge to the same thing.

In fact, one has to be somewhat careful, and I am not sure exactly what conditions on the series $a_{n}$ make this claim true. In practice, if the $a_{n}$ are all non-negative and $x \lt 0$ then the second series can often be used to compute the function represented by the series, even when it is only an asymptotic series. And the new series will be more efficient.

The most frequently applied case of Euler's transformation is with $x=-1$ and $a_{n} \gt 0$, in which case it says (formally) $$a_{0} - a_{1} + a_{2} - a_{3} \cdots = {1 \over 2 } \sum [\Delta^{n} a]_{0} \left( { 1 \over 2 } \right)^{n}$$ It is easy to see that the right hand side ought to be better behaved than the left. I am now going to explore what happens when we apply the Euler transform to the asymptotic series for $\epsilon(x)$. The strategy will be to find the sum of the series $$1 - { 1 \over 2x^{2} } + { 1\cdot 3 \over (2x^{2})^{2} } - { 1\cdot 3 \cdot 5 \over (2x^{2})^{2} } + \cdots$$ up until the terms start to increase, and then apply Euler's transformation to the remainder to see what happens.

Let's try $x=2$. The terms decreasing in absolute value go up to $t_{4}$:


0      1.00000000000000
1     -0.125000000000000
2      0.0468750000000000
3     -0.0292968750000000
4      0.0256347656250000


Summing the first 4 and 5 terms, then multiplying by $e^{-x^{2}}/2x$, this tells us that $$0.88202 \lt \epsilon(2) \lt 0.88214$$ The rest of the series goes on like this:

 5     -0.0288391113281250
6      0.0396537780761719
7     -0.0644373893737793
8      0.120820105075836
9     -0.256742723286152
10      0.609763967804611
11     -1.60063041548710
12      4.60181244452542
13    -14.3806638891419
14     48.5347406258541
15   -175.938434768721


It is not a promising looking set of data. But if we apply Euler's transformation to this sequence we get the sequence

    -0.0144195556640625
0.00270366668701172
-0.00174611806869507
0.00110188499093056
-0.000947207445278764
0.000926676366361789


and if we apply the sandwich rule to it as we did to the asymptotic series, we finally get $$0.8820792 \lt \epsilon(2) \lt 0.8820834$$ which is pretty good when considering that we are working with an `asymptotic' series for the relatively small value $x=2$. This apparently unreasonable procedure is justified in the article by Rosser mentioned below.

### Final remarks

These procedures might seem a bit complicated when applied to $\epsilon(x)$, which we can evaluate in various ways. This is essentially because it decreases very rapidly as $x \rightarrow \infty$. But there are many functions $f(x)$ for which all of these techniques are important in various ranges of $x$. Frank Olver's book is the standard reference for the computational use of asymptotic series, and Boyd's and Rosser's articles are good places to start for the application of Euler's transformation and other related techniques.