## Random Numbers: Nothing Left to ChanceThe generation of random numbers is too important to be left to chance. --Robert CoveyouDavid Austin
It is often said that mathematics is about understanding patterns and that these patterns are what makes mathematics beautiful. This article, however, is about how mathematics may be used to create, or at least simulate, randomness, the lack of patterns. Surprisingly, we will see that it is difficult to create random behavior (in fact, the words "create randomness" may sound a little jarring) and to determine when we have successfully done it. We will focus on constructing random sequences of numbers. There are many reasons why we might want to do this. For centuries, people have used randomness--a coin toss, for instance--to make important decisions. Of course, many states run lotteries that use a sequence of random numbers to determine the recipients of huge amounts of money. Some readers may remember the draft lottery used during the Vietnam war. In 1969, the days of the year were ordered by drawing pieces of paper, each with a day of the year written on it, from a bin. Young men born on those days drawn first were more likely to be drafted. A statistical analysis later found that, because they were placed in the bin later, days toward the end of the year were more likely to be drawn. In other words, the days were not ordered randomly, and a more mathematical solution was proposed for subsequent years' lotteries. Currently, random numbers are used to form encryption keys when information needs to be passed securely, say, across the Internet. Random sequences of numbers are also important for simulating real-world phenomena. For instance, a bank that operates a network of ATM machines may wish to test its software by simulating the actions of customers accessing their accounts at random times through random machines. Random numbers are also used by so-called Monte Carlo methods to obtain numerical approximations to problems that may be too difficult to solve exactly. For instance, imagine that we do not know the area of the unit circle. We could place the unit circle inside a square of known area and randomly add points to the square. The percentage of points that lie inside the circle should approximate the ratio of the area of the circle to the area of the square. The figure below shows 2000 points randomly placed inside a square whose sides have length 2.2, 1308 of which lie inside the circle. The approximate area of the circle is then the area of the square, 4.84, times 1308/2000, which is 3.16536 (the exact area is, of course, ).
## Where do random numbers come from?Most of us already have an intuitive notion of what random numbers are, but, as we'll see, this is a difficult question to answer precisely. Simply said, we should not be able to use the numbers that we have seen to predict which number will come next or to restrict the possibilities. For example, you may wish to think of rolling an ideal die. On each roll, each of the numbers--1, 2, 3, 4, 5, 6--is equally likely to appear. If we roll many times, we may expect to see each of the six possibilities roughly one-sixth of the time. In addition, if we count the number of successive pairs, we would expect to see each of the 36 possible pairs appear with roughly equal frequency. It is possible, however, that we roll the die one thousand times and obtain a "1" each time. Regardless, each number is equally likely to show up on the 1001 At first glance, it may seem like a simple thing to generate random numbers. In fact, you may think that humans could do this easily by, for instance, just writing sequences of 0's and 1's in a stream of consciousness. By following this link, you will be able to write a sequence of one hundred 0's and 1's that will be analyzed using a few of the tests we'll encounter later in this article. If your sequence passes these tests, it shares several properties with truly random sequences. However, you will most likely find it difficult to write a sequence that passes these tests. Humans have a difficult time creating random sequences by trying to "act randomly." Truly random sequences of numbers may be generated by natural phenomena. Indeed, the web site HotBits, makes available random sequences generated by recording the times between the decays of radioactive particles. For instance, the time between two decays is recorded and compared to the time between the next two. If the first time is shorter than the second, a "0" is recorded, but if the first time is longer than the second, a "1" is recorded. In this way, a sequence of random bits, which may be concatenated to form random integers, is produced. Another site, Random.org, works in a similar way by recording the amplitude of atmospheric noise. An interesting question is whether the digits of form a random sequence. Numerical evidence, based on looking at trillions of digits, suggests that they do, but not enough is known at this time to say definitively. ## Random number generatorsWhile some sequences generated by natural phenomena produce truly random numbers, many applications require that we be able to create random numbers efficiently inside a computer. This may sound impossible: computers simply execute a set of instructions whose output is determined by the input. Since we supply the computer with the instructions and the input, the output is determined by our choices. How can such a number be random? The answer is that it's not. Instead, our goal is to use a procedure that hides our footprints so that the numbers create the illusion of randomness. More precisely, we want the numbers to share many properties that we would expect a truly random sequence to enjoy. Such a procedure is often called a John von Neumann proposed using the following method as one of the first random number generators. Suppose we want to create eight-digit numbers. Begin with an eight-digit number
X_{0}^{2} = 1252162343440836 so that our next number is
Since it is difficult, at first glance, to find a pattern in these numbers, we may think that this is an appropriate way to find random numbers. In other words, we have created the illusion of randomness through a deterministic process. Further study shows, however, that this is not a good random number generator. Each term in the sequence depends only on its immediate predecessor and there are only a finite number of possible terms. This means that the sequence will inevitably repeat. The problem is that the sequence can get caught in relatively short cycles. For instance, if the number 31360000 appears in the sequence at some point, we end up with this number again after another 99 iterations and this cycle continues indefinitely. The middle square method may give you the idea to ask the computer to perform a sequence of many, unrelated, random operations. In fact, Knuth constructed an algorithm that used 13 "peculiar" steps to produce the next number in a sequence of 10-digit numbers and found that it was a very poor way to generate random numbers. I created a simple algorithm to generate two-digit numbers using apparently "random" operations.
What may seem surprising is that this algorithm produces the short cycle 91, 96, 46, 1, 64, 73, 72, 91, ...., and most seeds lead us to this cycle. As Knuth writes, "The moral of the story is that ## Linear congruential generatorsSurprisingly, a good source of pseudorandom number generators comes from the
We then form the next terms in the sequence by
X_{n+1} = f (X_{n}) = (aX_{ n} + c) mod m. At first glance, it is hard to believe this will produce the illusion of randomness. Remember that we want each term of the sequence to appear to be independent of its predecessor. Can we really accomplish this using a linear function? It turns out that the art is in choosing appropriate values of Of course, there are only a finite number of possible numbers 0, 1, 2, ...,
For example, the choices , and
X_{n+1} = (106X + 64) mod 8505. _{n}Shown below are the first few points (
Typically, integers are represented in a computer using 32 bits so it is natural to choose a value of Notice that
X_{0}, X_{0} + 1, X_{0} + 2, X_{0} + 3, X_{0} + 4, ..., which is hardly random. Clearly, we need another condition, in addition to the maximal period condition, if we are going to create random numbers using this method. A second condition, which we will now describe, will be quantified using the m.To understand this quantity, begin with
If we expand
If the potency is 1, then we have cn) mod m. This sequence is certainly not random.If the potency is 2, then Continuing this thinking shows that we need the potency to be relatively high to achieve the illusion of randomness. Experimenting shows that a potency of at least 5 is necessary to create useful random numbers. However, a linear congruential sequence with a potency of 5 is not guaranteed to be random. In our example above, we had and , which produces a linear congruential sequence of maximal period with potency 5 (the first few points are shown on the left below). In constrast, if we take and , we obtain a linear congruential sequence with maximal period and potency 2 (the first few points are on the right).
In spite of the fact that linear congruential sequences are simply defined, the numbers produced are good enough for many applications. In fact, random numbers provided by computer programming languages, such as Java, are often created by linear congruential generators. The choice of the seed Other methods, besides the linear congruential method, have been proposed for creating random sequences. Two other ideas come to mind. First, we might choose a quadratic function rather than a linear one. Robert Coveyou suggested the function:
X_{n+1} = X_{n} (X_{n} + 1) mod 2^{e} and showed that it produced sufficiently random numbers. Also, we could consider using more than the previous term in the sequence to generate the next. An example is the
X_{n+1} = (X_{n-24} + X_{n-55}) mod m once the first 55 terms of the sequence are generated using, say, a linear congruential generator. The appearance of 24 and 55 in this expression, chosen after careful study, guarantees that the sequence has a suitably long period. ## How do we recognize random numbers?If we propose a method for creating random numbers, we need some way to measure how good its results are. We will now describe a few tests that are important but still somewhat inadequate. To get started, we will describe one test and how it is implemented. Then we will describe a collection of subsequent tests that may be implemented in a similiar way. Let's consider a simple case in which the numbers we generate are either 0 or 1. There are 32 = 2
What we will now describe is known as the
as a measure of how our sequence compares to a random sequence. Smaller values of For example, suppose we have
r_{0} = 10, r_{1} = 46, r_{2} = 95, r_{3} = 107, r_{4} = 60, r_{5} = 15, for which we compute Knuth suggests that our sequence fails the test when Knuth also suggests running the test three different times on different sequences produced by the generator we are testing. If any test gives a suspicious result, we should not view the generator as producing sufficiently random results. In the same way, we may wish to repeat the test many times and determine how closely the percentages There is a widely accepted collection of such tests, known as the **Frequency test:**We would like our numbers to be uniformly distributed so that if the numbers come from 0, 1, 2, ...,*m*- 1, then each number occurs with roughly equal frequency. In this case, we let*r*_{i}be the number of times*i*occurs in our sequence and notice that*p*_{i}= 1/*m*.
Of course, the nonrandom sequence 0, 1, 2, ... would pass this test. We may therefore ask in addition that every pair (*i*,*j*) occur in successive elements with equal frequency. Here we let*r*_{i,j}count the number of times the pair (*i*,*j*) occurs in the sequence and note that*p*_{i,j}= 1/*m*^{2}. This is called the*2-distribution test*and may be extended to the more general*k*-distribution test.**Permutation test:**Select some small value for*t*such as*t*= 3, break the sequence into disjoint groups of*t*consecutive elements and assume that no two elements in a group are equal. There are*t*! possible ways in which the elements are ordered, each of which is equally likely.**Run test:**In this test, we count the length of "runs," portions of the sequence in which the terms either increase monotonically or decrease monotonically.**Gap test:**We fix a range of values, such as 0, 1, 2, ... ,*m*/ 2, and count the number of terms in the sequence separating two terms that appear in this range.**Poker test:**We form the elements in the sequence {*X*_{n}} into groups of five consecutive elements and count the number of "poker hands" that we have. These are:All different: abcde One pair: aabcd Two pairs: aabbc Three of a kind: aaabc Full house: aaabb Four of a kind: aaaab Five of a kind: aaaaa **Birthday spacings test:**This test was introduced by George Marsaglia in 1984. Think of the terms of the sequence*X*_{n}as birthdays and the numbers 0, 1, 2, ...,*m*- 1 as days of the year. We will study the spacing between the birthdays. To do this, first order the terms in the sequence in nondecreasing order and find the spacings between the birthdays. We let*r*_{i}be the number of times*i*occurs as a spacing.
## The spectral testThe tests we've examined so far may be applied to any random number generator and work by examining sequences output by the generator. The spectral test is specifically designed for linear congruential generators and works by studying the output of the generator over the entire period. We first construct the set of all points (
Notice that all the points lie on a finite family of equidistant parallel lines.
In fact, if you look carefully, you will notice that there are many such families of parallel lines containing all the points. This is to be expected since the function The accuracy is greater for the generator on the left (which also has a higher potency) since the maximum distance between families of parallel lines is smaller than for the example on the right.
To get a feel for why the accuracy should reflect the quality of the random number generator, let's look at the plot of one-twentieth of the points in a period.
Remember that all the points will lie on families of lines. When the accuracy is large, the families of lines are all tightly packed together. When one family of lines is relatively far apart, there is less flexibility for where the points will appear, which makes the sequence appear less random. The spectral test may be applied in other dimensions as well. For instance, in dimension three, we form the collection of points {( There is an efficient algorithm for computing the accuracy in low dimensions. Indeed, the spectral test is the best available for linear congruential generators: all generators that are known to be good pass the spectral test while all generators known to be bad fail it. ## ConclusionsAs we've seen, it can be hard to generate a sequence of random numbers and hard to determine exactly how random the sequence is. It is also surprising that one of the most popular means for generating random numbers is given by such a simple linear function. In 1997, Makoto Matsumoto and Takuji Nishimura found a new random number generator, which they named the Mersenne Twister, with the phenomenally large period 2 ## References- Donald E. Knuth,
*The Art of Computer Programming, Volume 2,*Third Edition, Addison-Wesley, Boston. 1997. Knuth's book is an excellent, comprehensive reference. Start here if you are interested in learning more. - Stephen Park and Keith Miller, Random Number Generators: Good Ones are Hard to Find, Communications of the ACM, October 1988, Volume 31, Number 10, 1192-1201.
- Makoto Matsumoto and Takuji Nishimura, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator, ACM Transactions on Modeling and Computer Simulation, January 1988, Vol. 8, No. 1, 3-30.
- HotBits, Genuine random numbers, generated by radioactive decay.
- Random.org, a site that provides random numbers obtained through variations in atmospheric noise.
- The Diehard tests, maintained by George Marsaglia at Florida State University.
- Statisticians Charge Draft Lottery Was Not Random,
*New York Times,*January 4, 1970, page 66.
David Austin
Those who can access JSTOR can find some of the papers mentioned above there. For those with access, the American Mathematical Society's MathSciNet can be used to get additional bibliographic information and reviews of some these materials. Some of the items above can be accessed via the ACM Portal , which also provides bibliographic services. |
Welcome to the These web essays are designed for those who have already discovered the joys of mathematics as well as for those who may be uncomfortable with mathematics. Search Feature Column Feature Column at a glance |