Skip to Main Content

Why Do We Expect Lots of Twin Primes?

One important question is, how are prime numbers distributed among the integers? The answer is, in a very few words, rather randomly, given certain basic statistics...

Bill Casselman
University of British Columbia, Vancouver, Canada
Email Bill Casselman

Email to a friendMail to a friend Print this articlePrint this article


Many years ago it was conjectured that there are an infinite number of pairs of prime numbers $p$, $q$ with $q = p+2$. There has been some recent (and famous) progress on this problem. I shall say little about recent progress, but discuss instead what we expect--rather than what we know--about such pairs. Nothing I shall say is new, and I have to some extent merely elaborated on what Andrew Granville says in his recent expository note on the recent advances.

How many primes do we expect?

A prime number is a positive integer that has no divisors except itself and $1$. The definition is simple, but as soon as one starts to explore the significance of prime numbers one realizes that they exhibit much subtle behaviour. One important question is, how are prime numbers distributed among the integers? The answer is, in a very few words, rather randomly, given certain basic statistics. What do I mean by this?

Well, here is a square array picturing all the primes up to $400$:


There are some patterns in this array, some of which I'll comment on later. Most of those you perceive are caused by some artefact of the display. For example, every other column is essentially empty, and this is because the only even prime is $2$. Also, every fifth column is essentially empty. More interesting are the twin prime pairs (some of which are not so visible because they split into two rows, such as $59$, $61$), and also some patterns related to the last digits displayed. These are what might be called local patterns. There are no global ones apparent.

There has been much effort put into extremely efficient ways to produce prime numbers, but these days even a very simple program takes only a small amount of time to make large lists of primes. One of the most fruitful programs will make a large list of primes and then use this to calculate the function $\pi(n)$, the number of primes $\le n$.

One thing this can be used for is to get a rough idea of the frequency of primes. There are $135$ primes in the range $[0,1000)$, $127$ in $[1000,2000)$, $120$ in $[2000,3000)$. More frequencies are in this table: $$ \matrix{ 0\hbox{-}999 & 1000\hbox{-}1999 & 2000\hbox{-}2999 & 3000\hbox{-}3999 & 4000\hbox{-}4999 & \cr 135 & 127 & 120 & 119 & 114 \cr } $$ $$ \matrix{ 5000\hbox{-}5999 & 6000\hbox{-}6999 & 7000\hbox{-}7999 & 8000\hbox{-}8999 & 9000\hbox{-}9999 \cr 117 & 107 & 110 & 112 & 106 \cr }$$

and here is the graph of the first $100$ intervals of $1,000$ each:


The frequency of primes in the neighbourhood of $n$ seems to decrease as $n$ gets larger, although it is clear there is no simple rule. In fact, we know that the frequency of primes in the neighbourhood of $n$ is very roughly equal to $1/\log n$. Here $\log n$ is the natural logarithm of $n$, the logarithm with respect to Euler's number $e = 2.7128 \ldots$ This function is also characterized by the equation $$ \int_{1}^{x} {dy \over y } = \log x \, . $$ In any case, the claim about frequency suggests the approximation $$ \pi(x) \sim {\rm Li}(x) = \int_{2}^{x} { dy \over \log y} \, . $$

The integral is a bit unfamiliar, but integration by parts gives $x/\log x$ as a fair approximation. Here is the graph of $\pi(x)$ compared with those of ${\rm Li}(x)$ and $x/\log x$ in the range $[0, 10^{4}]$:


The approximation looks great! So, even if $\pi(x)$ bounces around a bit, apparently without much pattern, we have a fairly good estimate for it, which means we have a good idea of how the primes grow, at least on the average.

It should be kept in mind, however, that the apparent smoothness is deceiving. Close up, here is what the top two curves look like:



Heuristic reasoning about twin primes

Twin primes are a pair $p$, $p+2$ which are both primes. For example, from one of the plots above we see the twin primes $$ [3,5], [5,7], [11,13], [29,31], [41,43], [59,61], [71,73] $$

and others. It has been conjectured that there are an infinite number of them, but although there is much empirical evidence for this it has not yet been proved.

Perhaps the strongest evidence is a formula that approximates the number $\pi_{2}(x)$ of twin primes $\le x$ remarkably well. I am not quite sure about the origin of this formula, but it first appeared in a paper by G. H. Hardy and J. E. Littlewood (themselves a pair of twin primes) dated 1923. The paper contains several similar formulas, all based on the same method, which they had used to attack many problems earlier. This was their circle method. Their derivation of the formula for $\pi_{2}(x)$ is not at all rigourous, but it is very convincing. There is a succinct account of this in an appendix to Andrew Granville's article.

But just after World War II Lord Cherwell (Frederick Lindemann) suggested a more elementary probabilistic (and still not rigourous) derivation. This led to a colaboration with E. M. Wright and eventually to a posthumous joint publication. This is explained in Section 22.20 of the well-known text by Hardy and Wright. A slightly more elementary version of this can also be found in an appendix of Granville's article (Section 2.5). I follow him.

The starting point for the plausible reasoning is our estimate for $\pi(x)$, the number of primes less than or equal to $x$. The basic idea of the derivation is that the prime numbers are, to a good extent, distributed randomly. We know that in the neighbourhood of $x$ the density of primes is about $1/\log x$. If the primes were distributed randomly, then the probability that any two given numbers near $x$ are primes would then be just the product of local probabilities, which is $1/\log^{2} x$. This certainly is specious reasoning, since if $n > 2$ the probability that both $n$ and $n+1$ are primes is $0$, while the fact that $p$ is a prime would seem to increase the chances that $p+2$ is. There are similar considerations to take into account because of possible divisibility by other small primes, too, in the neighbourhood of $x$.

There is one simple fact in this story. If $q$ is any prime, then the probability of choosing an integer at random that is not divisible by $q$ is $1-1/q$. The probability of choosing two that are not divisible by $q$ is hence the product $(1 - 1/q)^{2}$. Suppose $p$ and $p+2$ are both primes, then neither $p$ nor $p+2$ is divisible by $q$. If $q = 2$, this happens $1/2$ the time. If $q \ne 2$, the probability that $p \not\equiv 0$ and $p \not\equiv -2$ modulo $q$ is $(1 - 2/q)$. Because of the Chinese Remainder Theorem, divisibility by two primes is an independent event. So if $x$ is large and $Q$ is relatively small the probability that any pair of numbers $m$ and $n$ near $x$ will not be a multiple of any prime up to $Q$ is $$ { 1\over 2} \cdot \prod_{q \le Q} { (1 - 1/q)^{2} } \, . $$

Let $\Gamma_{Q}$ be the ratio of this to $\prod_{q \le Q}(1 - 2/q)$: $$ \Gamma_{Q} = { 1\over 2} \cdot \prod_{q \le Q} { (1 - 1/q)^{2} \over (1 - 2/q) } $$ We have $$ { 1 - 2/q \over (1-1/q)^{2} } = { (1 - 1/q)^{2} - 1/q^{2} \over (1 - 1/q)^{2} } = 1 - { 1 \over (q -1)^{2} } \, . $$ A standard criterion asserts that this product converges if the sum $\sum 1/(q-1)^{2}$ converges. But it does converge since the integral test tells us that $\sum 1/n^{2}$ converges. Therefore this product actually converges to a number $\Pi_{\infty}$ as $Q \rightarrow \infty$. Let $C_{2}$ be its inverse. Section 2.5 of Granville's article says that it ought to be intuitively reasonable at this point that the number $\pi_{2}(x)$ of twin primes $\le x$ is approximated by $$ \Pi_{2}(x) = C_{2} \int_{2}^{\infty} { dx \over \log^{2} x } \, . $$

The text of Hardy and Wright lay out a more careful analysis that makes the intuitive leap a bit clearer.

The value of the constant here was calculated a long time ago by J. W. Wrench to be about $1.32032363169373914785562422002911155686525 \ldots $. Here is a comparison of some values of $\pi_{2}(x)$ and the formula in the range $[0, 10^{5}]$: $$ \matrix{ n & 10,000 & 20,000 & 30,000 & 40,000 & 50,000 \cr \pi_{2}(n) & 205\phantom{.0} & 342 \phantom{.0} & 467 \phantom{.0} & 591 \phantom{.0} & 705 \phantom{.0} & \cr \Pi_{2}(x) & 214.2 & 357.7 & 486.7 & 607.4 & 722.5 \cr } $$ $$ \matrix{ n & 60,000 & 70,000 & 80,000 & 90,000 & 100,000 \cr \pi_{2}(n) & 811 \phantom{.0} & 905 \phantom{.0} & 1007 \phantom{.0} & 1116 \phantom{.0} & 1224\phantom{.0} \cr \Pi_{2}(x) & 833.3 & 940.9 & 1045.7 & 1148.2 & 1248.7 \cr } $$


Dickson's conjecture

Twin primes are a special case of something far more general. One can ask, are there are infinite number of pairs $p$, $p+4$ or $p$, $p+6$ that are primes? Or $p$, $p+10$? The case of $p$ and $p+10$ is certainly suggested by one of the images above, in which such pairs are easily visible. How about triples of some kind? Suppose that $a_{1}$, $a_{2}$, $\ldots$ , $a_{k}$ make up an array of integers. Can we expect that there are an infinite number of sets $n+a_{1}$, $n+a_{2}$, $\ldots$ , $n+a_{k}$ all of which are prime?

One has to be a bit careful. There are no prime pairs of the form $p$, $p+1$ (past $p = 2$), because one of these two numbers must be divisible by $2$. Similarly, past $p = 3$ there are no prime triples $p$, $p+2$, $p+4$ because one of these must be divisible by $3$.

This last example should be enlightening. The number $p+4$ is divisible by $3$ if and only if $p+1$ is divisible by $3$. Therefore $3$ divides one of the numbers $p$, $p+2$, $p+4$ if and only if it divides $p$, $p+1$, $p+2$. But this is sure to happen, since the last triple clearly covers all numbers modulo $3$.

In general, $p$ is said to be an obstruction for the array $(a_{i})$ if $p$ always divides at least one of every sequence $n+a_{i}$ ($i = 1$, $\ldots$, $k$). This happens if and only if the numbers $a_{i} \; {\rm mod} \, p$ fill up all of $Z/p$.

This can never happen if $p > k$, so in order to check whether some $p$ obstructs $(a_{i})$ we need only check whether the $a_{i}$ cover $Z/p$ for all $p \le k$. We call the array $(a_{i})$ admissible if it has no prime obstructions.

For example, $(1,3,7,9)$ is an admissible array, since, as you can check easily, neither $2$ nor $3$ is an obstruction. Several prime sequences fitting into this pattern can be seen in one of the diagrams above. Other examples are (a) any $0$, $2n$; (b) $0$, $4$, $6$; (c) $0$, $2$, $6$.

There exist arbitrarily large admissible sets. In fact:

Theorem. If $a_{1}$ to $a_{k}$ is any set of distinct primes $> k$, they make up an admissible set.

Why is this? If $p$ is any prime where $p> k$, then we have already seen that it cannot be an obstruction. But if $p \le k$, then it cannot divide any of the $a_{i}$, none of which can be congruent to $0$ modulo $p$. So the $a_{i}$ do not fill up $Z/p$.

A special case of a conjecture made originally by L. E. Dickson is that if there are no prime obstructions to $(a_{i})$ then there exist an infinite number of prime sequences of the form $(n +a_{i})$.

What has happened recently

This not the place to tell the whole story, which has been covered thoroughly in other places. I recommend particularly the Quanta article by Erica Klarreich for a popular account, and Granville's essay for a more technical one. I just want to state a bit more precisely than in Klarreich's account what the most famous of the new results is.

It is due to Yitang Zhang. The simplest statement of what he proved is that there exist an infinite number of intervals $[n, n+70,000,000)$ containing at least two primes. This is weaker than the twin primes conjecture, but conceptually very close to it. In subsequent work (by many mathematicians) the size of the gap has been severely reduced, but there is presumably no way to reduce it to $2$ with current methods.

As explained by Granville, Zhang's first and arguably his main result is that there exists $k$ with the property that if $a_{1}$, $\ldots$ , $a_{k}$ is an admissible set of $k$ elements, then there exist an infinite number of $n$ such that $\{ n +a_{i} \}$ contains at least two primes. (In Klarreich's Quanta article, the array is called a "comb." Keep in mind that according to Dickson's conjecture, we would expect to be able to say all $k$ are primes.) In fact, Zhang proved this claim to be true for a particular $k =3,500,000$ spanning an interval of $70,000,000$. The by now famous consequence is for some $m \lt 70,000,000$ that there exist an infinite number of prime pairs $p$, $p+m$.

Reading further

  • Lord Cherwell and E. M. Wright, "The frequency of prime patterns", The Quarterly Journal of Mathematics 11 (1960).

    Those who know something of the life of Lord Cherwell, otherwise known as Frederick Lindemann, will be very surprised to learn of his contribution to number theory. The Wikipedia biography does not adequately suggest all of the reasons for his poor reputation as principal scientific advisor to Churchill during World War II.

  • L. E. Dickson, "A new extension of Dirichlet's theorem on prime numbers", Messenger of Mathematics 33.

    See also the Wikipedia entry on Dickson's conjecture.

  • Andrew Granville, "Primes in intervals of bounded length". Available from his home page.
  • G. H. Hardy, Ramanujan, Cambridge University Press.

    Section 2.5 contains an ideal derivation of a possible formula for $\pi(x)$.

  • G. H. Hardy and J. E. Littlewood, "Some problems of 'Partitio numerorum' III: on the expression of a number as a sum of primes", Acta Mathematica 44 (1923).
  • G. H. Hardy and E. M. Wright, The theory of numbers, Oxford Press.
  • Erica Klarreich, "Together and alone, closing the prime gap", Quanta November 19, 2013.
  • B. Riemann, "On the number of prime numbers less than a given quantity". An English translation of this classic by David Wilkins is available from a link in Wikipedia.

    Among other things, this sketches a derivation of the approximate formula for $\pi(x)$.

  • J. W. Wrench, "Evaluation of Artin's constant and the twin-prime constant", Mathematics of Computation 76 (1961).

    An early computation of $C_{2}$.

  • Lists of primes up to $1,000,000,000,000$
  • A list of the first 20,000 twin primes.

Bill Casselman
University of British Columbia, Vancouver, Canada
Email Bill Casselman