Long gaps between primes

By Kevin Ford, Ben Green, Sergei Konyagin, James Maynard, and Terence Tao

Abstract

Let $p_n$ denote the $n$-th prime. We prove that
$$p_{n+1} - p_n \gg \frac{\log n \,\log\log n \,\log\log\log\log n}{\log\log\log n}$$
for sufficiently large $n$, improving upon recent bounds of the first, second, third, and fifth authors and of the fourth author. Our main new ingredient is a generalization of a hypergraph covering theorem of Pippenger and Spencer, proven using the Rödl nibble method.

1. Introduction

Let $p_n$ denote the $n$-th prime, and let
$$G(X) := \max_{p_{n+1} \le X} (p_{n+1} - p_n)$$
denote the largest gap between consecutive primes up to $X$. It is clear from the prime number theorem that
$$G(X) \ge (1 + o(1)) \log X,$$
as the average gap between the prime numbers which are $\le X$ is $\sim \log X$. In 1931, Westzynthius [46] proved that infinitely often, the gap between consecutive prime numbers can be an arbitrarily large multiple of the average gap, that is, $G(X)/\log X \to \infty$ as $X \to \infty$, improving upon prior results of Backlund [2] and Brauer–Zeitz [5]. Moreover, he proved the quantitative bound¹
$$G(X) \gg \log X \,\frac{\log_3 X}{\log_4 X}.$$

1. As usual in the subject, $\log_2 x := \log\log x$, $\log_3 x := \log\log\log x$, and so on. The conventions for asymptotic notation such as $\ll$ and $o(1)$ will be defined in Section 2.

In 1935 Erdős [11] sharpened this to
$$G(X) \gg \log X \,\frac{\log_2 X}{(\log_3 X)^2},$$
and in 1938 Rankin [40] made a subsequent improvement
$$G(X) \ge (c + o(1)) \log X \,\frac{\log_2 X \,\log_4 X}{(\log_3 X)^2},$$
with $c = 1/3$. The constant $c$ was increased several times: to $c = \frac{1}{2} e^{\gamma}$ by Schönhage [43], then to $c = e^{\gamma}$ by Rankin [41], to $c = 1.31256\, e^{\gamma}$ by Maier and Pomerance [32], and, most recently, to $c = 2 e^{\gamma}$ by Pintz [36].

Recently, in two independent papers [13, 35], the authors showed that $c$ could be taken to be arbitrarily large, answering in the affirmative a long-standing conjecture of Erdős [12]. The methods of proof in [13] and [35] both introduced estimates on primes in very short arithmetic progressions, but these differed in some key aspects. The arguments in [13] used recent results [21, 22, 23] on the number of solutions to linear equations in primes, whereas the arguments in [35] instead relied on multidimensional prime-detecting sieves introduced in [33]. Our main theorem is the following quantitative improvement.

Theorem 1 (Large prime gaps).

For any sufficiently large $X$, one has
$$G(X) \gg \frac{\log X \,\log_2 X \,\log_4 X}{\log_3 X}.$$

The implied constant is effective.

Our overall approach combines ideas from the two previous papers [13, 35]. There are two key ingredients which allow us to obtain the quantitative improvement. First, we incorporate a uniform version of the multidimensional sieve approach as worked out in [34], which gives a quantitative improvement to the underlying estimates about primes. Second, we prove a generalization of a hypergraph covering theorem of Pippenger and Spencer [37], which allows for an essentially optimal means of translating such estimates into a result on large gaps between primes. It is this covering theorem which is the key new ingredient in our work and may be of independent interest.

All approaches which obtain quantitative improvements beyond the average bound have used a sieving argument which is conjectured to be unable to produce a result stronger than $G(X) \gg \log X \,(\log_2 X)^2$. Moreover, in light of the essentially optimal bounds in our covering theorem for this problem and the current limitations of the multidimensional sieve estimates, Theorem 1 appears to be the strongest result one can hope to prove without improvements toward the Hardy–Littlewood prime $k$-tuples conjecture, or a radically new approach.

In a sequel [15] to this paper, a subset of the authors will extend the above theorem to also cover chains of consecutive large gaps between primes, by combining the methods in this paper with the Maier matrix method. In view of this, we have written some of the key propositions in this paper in slightly more generality than is strictly necessary to prove Theorem 1, as the more general versions of these results will be useful in the sequel [15].

The results and methods of this paper have also subsequently been applied by Maier and Rassias [31] (extending the method of the first author, Heath-Brown, and the third author [14]) to obtain large prime gaps (of the order of that in Theorem 1) that contain a perfect $k$-th power of a prime for a fixed $k$, and by Baker and Freiberg [3] to obtain lower bounds on the density of limit points of prime gaps normalized by any function that grows slightly more slowly than the one in Theorem 1. We refer the interested reader to these papers for further details.

1.1. Historical background

Based on a probabilistic model of primes, Cramér [8] conjectured that
$$\limsup_{X \to \infty} \frac{G(X)}{\log^2 X} = 1.$$
Granville [20] offered a refinement of Cramér’s model and has conjectured that the above $\limsup$ is in fact at least $2e^{-\gamma} = 1.1229\ldots$. These conjectures are well beyond the reach of our methods. Cramér’s model also predicts that the normalized prime gaps $\frac{p_{n+1}-p_n}{\log p_n}$ should have exponential distribution, that is, $p_{n+1} - p_n \ge t \log p_n$ for about $e^{-t} \pi(X)$ primes $p_n \le X$, for any fixed $t \ge 0$. Numerical evidence from prime calculations up to $4 \cdot 10^{18}$ [44] matches this prediction quite closely, with the exception of values of $t$ close to $\log X$, for which there are very little data available. In fact, $\max_{p_{n+1} \le 4\cdot 10^{18}} \frac{p_{n+1}-p_n}{\log^2 p_n} \approx 0.9206$, slightly below the predictions of Cramér and Granville.
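The exponential-gap prediction of Cramér’s model is easy to probe numerically: in the model, each integer $n \ge 3$ is independently declared “prime” with probability $1/\log n$, and the normalized gaps between consecutive model primes should then be approximately exponentially distributed. The following is only an illustrative sketch; the cutoff $10^6$ and the seed are our own arbitrary choices, not taken from the paper.

```python
import math
import random

random.seed(1)

# Cramér's model: n is "prime" with probability 1/log n, independently.
model_primes = [n for n in range(3, 10**6)
                if random.random() < 1 / math.log(n)]

# Normalized gaps (p' - p)/log p should be roughly Exp(1)-distributed.
gaps = [(q - p) / math.log(p) for p, q in zip(model_primes, model_primes[1:])]

# Empirical tail P(gap >= t) versus the predicted e^{-t}.
for t in (0.5, 1.0, 2.0):
    frac = sum(g >= t for g in gaps) / len(gaps)
    print(t, round(frac, 3), round(math.exp(-t), 3))
```

The agreement is close for moderate $t$; as the paper notes, it is precisely the extreme tail (very large $t$) where data are scarce and the model is least tested.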

Unconditional upper bounds for $G(X)$ are far from the conjectured truth, the best being $G(X) \ll X^{0.525}$ and due to Baker, Harman, and Pintz [4]. Even the Riemann hypothesis² furnishes only the bound $G(X) \ll X^{1/2} \log X$ [7].

2. Some slight improvements are available if one also assumes some form of the pair correlation conjecture; see [26].

All works on lower bounds for $G(X)$ have followed a similar overall plan of attack: they show that there are at least $G(X)$ consecutive integers in $[1, X]$, each of which has a “very small” prime factor. To describe the results, we make the following definition.

Definition 1.

Let $x$ be a positive integer. Define $Y(x)$ to be the largest integer $y$ for which one may select residue classes $a_p \pmod p$, one for each prime $p \le x$, which together “sieve out” (cover) the whole interval $[y] = \{1, \dots, y\}$. Equivalently, $Y(x)$ is the largest integer so that there are $Y(x)$ consecutive integers, each with a factor in common with $P(x) := \prod_{p \le x} p$.
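For very small $x$, the second characterization makes $Y(x)$ directly computable: it is the length of the longest run of consecutive integers sharing a factor with $P(x)$, and the divisibility pattern is periodic with period $P(x)$. A brute-force sketch (the function names are ours, and this is only feasible for tiny $x$):

```python
from math import gcd, prod

def primes_upto(x):
    """Primes p <= x by trial division (adequate for tiny x)."""
    return [p for p in range(2, x + 1)
            if all(p % q != 0 for q in range(2, p))]

def Y(x):
    """Longest run of consecutive integers each sharing a factor with P(x).

    The pattern gcd(n, P(x)) > 1 is periodic mod P(x), and no run can cross
    a multiple of P(x) (the neighbours of such a multiple are coprime to it),
    so scanning two periods finds the true maximum."""
    P = prod(primes_upto(x))
    best = run = 0
    for n in range(1, 2 * P + 1):
        run = run + 1 if gcd(n, P) > 1 else 0
        best = max(best, run)
    return best

# Consistent with Jacobsthal's function: Y(x) = g(P(x)) - 1, e.g. g(210) = 10.
print([Y(x) for x in (2, 3, 5, 7)])
```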

The relation between this function and gaps between primes is encoded in the following simple lemma.

Lemma 1.1.

Write $P(x) := \prod_{p \le x} p$ for the product of the primes less than or equal to $x$. Then
$$G\big(P(x) + x + Y(x)\big) \ge Y(x).$$

Proof.

Set $y = Y(x)$, and select residue classes $a_p \pmod p$, one for each prime $p \le x$, which cover $[y]$. By the Chinese remainder theorem there is some $m$, $x < m \le x + P(x)$, with $m \equiv -a_p \pmod p$ for all primes $p \le x$. We claim that all of the numbers $m+1, \dots, m+y$ are composite, which means that there is a gap of length $y$ amongst the primes less than $P(x) + x + y$, thereby concluding the proof of the lemma. To prove the claim, suppose that $1 \le j \le y$. Then there is some $p \le x$ such that $j \equiv a_p \pmod p$, and hence $m + j \equiv 0 \pmod p$; thus $p$ divides $m + j$. Since $m + j > m > x \ge p$, $m + j$ is indeed composite.
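The construction in the proof can be carried out explicitly in a toy case such as $x = 7$: the classes $a_p \equiv -1 \pmod p$ for $p \in \{2,3,5,7\}$ cover $\{1,\dots,9\}$, and the Chinese remainder theorem then produces an $m$ with $m+1, \dots, m+9$ all composite. A sketch, with all variable names ours:

```python
from math import gcd, prod

primes = [2, 3, 5, 7]            # the primes p <= x with x = 7
a = {p: p - 1 for p in primes}   # residue classes a_p = -1 (mod p)
y = 9

# These classes cover the interval {1, ..., y}.
assert all(any(j % p == a[p] for p in primes) for j in range(1, y + 1))

# CRT: find m with x < m <= x + P(x) and m = -a_p (mod p) for all p.
P = prod(primes)
m = next(n for n in range(8, 8 + P)
         if all(n % p == (-a[p]) % p for p in primes))

# Every m + j with 1 <= j <= y shares a factor with P(x), hence is composite.
assert all(gcd(m + j, P) > 1 for j in range(1, y + 1))
print(m, [m + j for j in range(1, y + 1)])
```

Here $m = 211$, and the run $212, \dots, 220$ is indeed a gap of length $9$ between the primes $211$ and $223$.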

By the prime number theorem we have $P(x) = e^{(1+o(1))x}$. Thus the bound of Lemma 1.1 implies that
$$G(X) \ge Y\big((1+o(1)) \log X\big)$$
as $X \to \infty$. In particular, Theorem 1 is a consequence of the bound
$$Y(x) \gg x\, \frac{\log x \,\log_3 x}{\log_2 x}, \tag{1.1}$$
which we will establish later in this paper. This improves on the bound $Y(x) \gg x\, \frac{\log x \,\log_3 x}{(\log_2 x)^2}$ obtained by Rankin [40].
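The estimate $P(x) = e^{(1+o(1))x}$ is the statement that the Chebyshev function $\theta(x) = \sum_{p \le x} \log p = \log P(x)$ is asymptotic to $x$, which is easy to check numerically. A small sketch with our own helper name:

```python
import math

def theta(x):
    """Chebyshev function: sum of log p over primes p <= x, via a sieve."""
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0:2] = b"\x00\x00"
    for n in range(2, int(x**0.5) + 1):
        if is_prime[n]:
            is_prime[n*n::n] = bytearray(len(is_prime[n*n::n]))
    return sum(math.log(n) for n in range(2, x + 1) if is_prime[n])

# The ratios theta(x)/x approach 1, consistent with the prime number theorem.
for x in (10**3, 10**4, 10**5):
    print(x, round(theta(x) / x, 4))
```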

The function $Y$ is intimately related to Jacobsthal’s function $g$. If $n$ is a positive integer, then $g(n)$ is defined to be the maximal gap between integers coprime to $n$. In particular $g(P(x))$ is the maximal gap between numbers free of prime factors $\le x$, or equivalently $1$ plus the longest string of consecutive integers, each divisible by some prime $p \le x$. The Chinese remainder theorem construction given in the proof of Lemma 1.1 in fact proves that
$$Y(x) = g(P(x)) - 1. \tag{1.2}$$
This observation, together with results in the literature, gives upper bounds for $Y(x)$. The best upper bound known is $Y(x) \ll x^2$, which comes from Iwaniec’s work [28] on Jacobsthal’s function. It is conjectured by Maier and Pomerance that in fact $Y(x) \ll x \log^2 x$. This places a serious (albeit conjectural) upper bound on how large gaps between primes we can hope to find via lower bounds for $Y(x)$: a bound in the region of $G(X) \gg \log X \,(\log_2 X)^2$, far from Cramér’s conjecture, appears to be the absolute limit of such an approach.

The lower bound on certain values of Jacobsthal’s function provided by (1.1) and (1.2) can be inserted directly into [39, Theorem 1] to obtain a lower bound for the maximum over $a$ of $P(q; a)$, the least prime in the arithmetic progression $a \pmod q$, in the case when the modulus $q$ has no small prime factors. We have the following corollary.

Corollary 1.

For any natural number $q$, let $P(q)$ denote the maximum value of $P(q; a)$ over all $a$ coprime to $q$. Suppose that $q$ has no prime factors less than or equal to $x$ for some $x$. Then, if $x$ is sufficiently large (in order to make $\log_2 x$, $\log_4 x$ positive), one has the lower bound

Proof.

Apply [39, Theorem 1].

In view of [39, Theorem 3], one may also expect to be able to prove a lower bound of the form (1.3) for a set of natural numbers $q$ of density $1$, but we were unable to find a quick way to establish this from the results in this paper.³

3. Inequality (1.3) has recently been established by Li, Pratt, and Shakan [30] for every positive integer $q$ apart from those with an abnormally large number of prime factors.

1.2. Method of proof

Our methods here are a combination of those in our previous papers [13, 35], which are in turn based in part on arguments in earlier papers, particularly those of Rankin [40] and Maier–Pomerance [32]. In order to make the lower bound in Theorem 1 as efficient as possible, we combine these ideas with a generalization of some arguments of Pippenger and Spencer [37].

As noted above, to prove Theorem 1, it suffices to sieve out an interval $[y]$ by residue classes $a_p \pmod p$ for each prime $p \le x$, where $y$ is of order $x \frac{\log x \,\log_3 x}{\log_2 x}$. Actually, it is permissible to have $o(x/\log x)$ survivors in $[y]$ that are not sieved out by these residue classes, since one can easily eliminate such survivors by increasing $x$ by a constant multiplicative factor. Also, for minor technical reasons, it is convenient to sieve out $[y] \setminus [x]$ rather than $[y]$.

Following [13], we will sieve out $[y] \setminus [x]$ by the residue classes $0 \pmod p$ for both very small primes ($p \le \log^{20} x$, say) and medium primes (between $z$ and $x/2$, for a suitable intermediate threshold $z$). The survivors of this process are essentially the set $\mathcal{Q}$ of primes between $x$ and $y$. After this initial sieving, the next stage will be to randomly sieve out residue classes $\mathbf{b}_p \pmod p$ for small primes (between $\log^{20} x$ and $z$). (This approach differs slightly from the approach taken in [35] and earlier papers, in which a fixed residue class is used for all small (and very small) primes instead.) This cuts down the set of primes $\mathcal{Q}$ to a smaller set $\mathcal{Q}'$, whose cardinality is typically on the order of $\frac{x}{\log x} \log_2 x$. The remaining task is then to select integers $a_q$ for each prime $q$ between $x/2$ and $x$, such that the residue classes $a_q \pmod q$ cut down $\mathcal{Q}'$ to a set of survivors of size $o(x/\log x)$.

Assuming optimistically that one can ensure that the different residue classes $a_q \pmod q$ largely remove disjoint sets from $\mathcal{Q}'$, we are led to the need to select the integers $a_q$ so that each class contains about $\log_2 x$ of the primes in $\mathcal{Q}'$. In [13], the approach taken was to use recent results [21, 22, 23] on the number of solutions to linear equations in primes to locate arithmetic progressions $a, a+q, \dots, a+(k-1)q$ consisting entirely of primes for some suitable $k$, and then to take $a_q := a$. Unfortunately, due to various sources of ineffectivity in the known results on linear equations in primes, this method only works when $k$ is fixed or growing extremely slowly in $x$, whereas here we would need to take $k$ of the order of $\log_2 x$. To get around this difficulty, we use instead the methods from [35], which are based on the multidimensional sieve methods introduced in [33] to obtain bounded intervals with many primes. A routine modification of these methods gives tuples which contain $\gg \log k$ primes, for suitable large $k$; in fact, by using the calculations in [34], one can take $k$ as large as $\log^c x$ for some small absolute constant $c > 0$, so that the residue class $a_q \pmod q$ is guaranteed to capture $\gg \log_2 x$ primes in $\mathcal{Q}'$.

There is, however, a difficulty due to the overlap between the residue classes $a_q \pmod q$. In both of the previous papers [13, 35], the residue classes were selected randomly and independently of each other, but this led to a slight inefficiency in the sieving: with each residue class containing approximately $\log_2 x$ primes, probabilistic heuristics suggest that one would have needed the original survivor set $\mathcal{Q}'$ to be smaller by a factor of roughly $\log_3 x$ if one is to arrive at $o(x/\log x)$ survivors after the final sieving process. This ultimately leads to the bound
$$G(X) \gg \frac{\log X \,\log_2 X}{\log_3 X},$$
as worked out in unpublished work of the fourth author—an additional loss of $\log_4 X$ compared to Theorem 1.

To avoid this difficulty, we use some ideas from the literature on efficient hypergraph covering. Of particular relevance is the work of Pippenger and Spencer [37], in which it is shown that whenever one has a large hypergraph $G = (V, E)$ which is uniform both in the sense of the edges $e \in E$ having constant cardinality and also in the sense of the vertex degrees being close to constant in $V$, one can efficiently cover most of $V$ by almost disjoint edges in $E$. Unfortunately, the results in [37] are not directly applicable for a number of technical reasons, the most serious of which is that the analogous hypergraph in this case (in which the vertices are the sifted set $\mathcal{Q}'$ and the edges are sets of the form $\{ n \in \mathcal{Q}' : n \equiv a_q \pmod q \}$ for various $q$, $a_q$) does not have edges of constant cardinality. However, by modifying the “Rödl nibble” or “semi-random” method used to prove the Pippenger–Spencer theorem, we are able to obtain a generalization of that theorem in which the edges are permitted to have variable cardinality. This generalization is purely combinatorial in nature and may be of independent interest beyond the application here to large prime gaps.
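The covering phenomenon behind the Pippenger–Spencer theorem can be illustrated (though certainly not proved) by a toy experiment: in a random $k$-uniform hypergraph with many edges, even a single greedy pass packs pairwise-disjoint edges covering a large fraction of the vertices. This sketch is purely illustrative, with parameters of our own choosing; the actual proof uses the much more delicate iterative Rödl nibble.

```python
import random

random.seed(0)

n, k, num_edges = 1000, 5, 200_000
vertices = range(n)
edges = [frozenset(random.sample(vertices, k)) for _ in range(num_edges)]

# Greedy pass: keep an edge only if it is disjoint from everything kept so
# far, producing a packing of pairwise-disjoint ("almost disjoint" taken to
# the extreme) edges.
covered = set()
packing = []
for e in edges:
    if covered.isdisjoint(e):
        packing.append(e)
        covered |= e

print(len(packing), len(covered) / n)
```

With an abundance of edges, most of the vertex set is covered with no overlaps at all; the nibble method achieves near-perfect coverage under far weaker hypotheses.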

We will make a series of reductions to prove Theorem 1. To aid the reader, we summarize the chain of implications below:
$$\text{Theorem } 5 \implies \text{Theorem } 4 \implies \text{Theorem } 2 \implies \text{Theorem } 1,$$
with the middle implication relying on Corollary 4, which is in turn a consequence of Theorem 3.

The deduction of Theorem 1 from Theorem 2 is easy and codifies the reduction of the problem to that of finding residue classes $a_p \pmod p$ for primes $p \le x$ which cover all the primes in $(x, y]$ with $o(x/\log x)$ exceptions. Theorem 5, proved using the sieve methods from [34], postulates the existence of a weight function with certain average properties. It implies the existence of residue classes $a_q \pmod q$ for primes $q \in (x/2, x]$, each containing many primes of $\mathcal{Q}'$, and moreover that each prime of $\mathcal{Q}'$ is covered by about the same number of these congruence classes $a_q \pmod q$. These properties are quantified in Theorem 4. Showing that Theorem 4 implies Theorem 2, i.e. that there exist choices of the $a_q$ for which the classes $a_q \pmod q$ efficiently cover most of the primes in $\mathcal{Q}'$, is accomplished with our new hypergraph covering tool. The fundamental result is Theorem 3, which is written in a very general form and is consequently rather long to state. Corollary 4 is a somewhat shorter version specialized for our purposes.

For ease of reading, we have endeavored to separate the combinatorial arguments of Sections 4, 5, and 6 from the number theoretic arguments of Sections 7 and 8. Indeed, a reader only interested in our hypergraph covering result Theorem 3 can read Sections 4 and 5 as a stand-alone paper. A reader only interested in the number theoretic part of Theorem 1 can just read Sections 7 and 8, provided they are willing to assume the reduction of Theorem 2 to Theorem 5. The deduction of Theorem 2 from the purely combinatorial Corollary 4 and the purely number theoretic Theorem 5 is performed in the second half of Section 4 and in Section 6, and does not require reading the more specialized Sections 5, 7, or 8.

2. Notational conventions

In most of the paper, $x$ will denote an asymptotic parameter going to infinity, with many quantities allowed to depend on $x$. The symbol $o(1)$ will stand for a quantity tending to zero as $x \to \infty$. The same convention applies to the asymptotic notation $X \sim Y$, which means $X = (1+o(1)) Y$. We use $X = O(Y)$, $X \ll Y$, and $Y \gg X$ to denote the claim that there is a constant $C > 0$ such that $|X| \le C Y$ throughout the domain of the quantity $X$. We adopt the convention that $C$ is independent of any parameter unless such dependence is indicated, e.g. by subscript such as $\ll_k$. In all of our estimates here, the constant $C$ will be effective (we will not rely on ineffective results such as Siegel’s theorem). If we can take the implied constant $C$ to equal $1$, we write $X = O^{\le}(Y)$ instead. Thus for instance
$$X = \big(1 + O^{\le}(\epsilon)\big) Y$$
is synonymous with
$$(1 - \epsilon) Y \le X \le (1 + \epsilon) Y.$$
Finally, we use $X \asymp Y$ synonymously with $X \ll Y \ll X$.

When summing or taking products over the symbol $p$, it is understood that $p$ is restricted to be prime.

Given a modulus $q$ and an integer $n$, we use $n \bmod q$ to denote the congruence class of $n$ in $\mathbb{Z}/q\mathbb{Z}$.

Given a set $A$, we use $1_A$ to denote its indicator function, and thus $1_A(n)$ is equal to $1$ when $n \in A$ and zero otherwise. Similarly, if $E$ is an event or statement, we use $1_E$ to denote the indicator, equal to $1$ when $E$ is true and $0$ otherwise. Thus for instance $1_A(n)$ is synonymous with $1_{n \in A}$.

We use $\# A$ to denote the cardinality of $A$, and for any positive real $z$, we let $[z] := \{ n \in \mathbb{N} : 1 \le n \le z \}$ denote the set of natural numbers up to $z$.

Our arguments will rely heavily on the probabilistic method. Our random variables will mostly be discrete (in the sense that they take at most countably many values), although we will occasionally use some continuous random variables (e.g. independent real numbers sampled uniformly from the unit interval $[0,1]$). As such, the usual measure-theoretic caveats such as “absolutely integrable,” “measurable,” or “almost surely” can largely be ignored by the reader in the discussion below. We will use boldface symbols such as $\mathbf{X}$ or $\mathbf{a}$ to denote random variables (and non-boldface symbols such as $X$ or $a$ to denote deterministic counterparts of these variables). Vector-valued random variables will be denoted in arrowed boldface, e.g. $\vec{\mathbf{a}} = (\mathbf{a}_s)_{s \in S}$ might denote a random tuple of random variables indexed by some index set $S$.

We write $\mathbf{P}$ for probability, and $\mathbf{E}$ for expectation. If $\mathbf{X}$ takes at most countably many values, we define the essential range of $\mathbf{X}$ to be the set of all $X$ such that $\mathbf{P}(\mathbf{X} = X)$ is non-zero, and thus $\mathbf{X}$ almost surely takes values in its essential range. We also employ the following conditional expectation notation. If $E$ is an event of non-zero probability, we write
$$\mathbf{P}(F \mid E) := \frac{\mathbf{P}(F \wedge E)}{\mathbf{P}(E)}$$
for any event $F$, and
$$\mathbf{E}(\mathbf{X} \mid E) := \frac{\mathbf{E}(\mathbf{X} \, 1_E)}{\mathbf{P}(E)}$$
for any (absolutely integrable) real-valued random variable $\mathbf{X}$. If $\mathbf{Y}$ is another random variable taking at most countably many values, we define the conditional probability $\mathbf{P}(F \mid \mathbf{Y})$ to be the random variable that equals $\mathbf{P}(F \mid \mathbf{Y} = Y)$ on the event $\mathbf{Y} = Y$ for each $Y$ in the essential range of $\mathbf{Y}$, and similarly we define the conditional expectation $\mathbf{E}(\mathbf{X} \mid \mathbf{Y})$ to be the random variable that equals $\mathbf{E}(\mathbf{X} \mid \mathbf{Y} = Y)$ on the event $\mathbf{Y} = Y$. We observe the idempotency property
$$\mathbf{E}\big(\mathbf{E}(\mathbf{X} \mid \mathbf{Y})\big) = \mathbf{E}(\mathbf{X})$$
whenever $\mathbf{X}$ is absolutely integrable and $\mathbf{Y}$ takes at most countably many values.
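For discrete $\mathbf{Y}$, the idempotency property is a short computation, expanding the expectation over the essential range of $\mathbf{Y}$; we record it here for completeness:

```latex
\begin{align*}
\mathbf{E}\big(\mathbf{E}(\mathbf{X}\mid\mathbf{Y})\big)
  &= \sum_{Y} \mathbf{P}(\mathbf{Y}=Y)\,\mathbf{E}(\mathbf{X}\mid\mathbf{Y}=Y)
   = \sum_{Y} \mathbf{P}(\mathbf{Y}=Y)\,
     \frac{\mathbf{E}\big(\mathbf{X}\,1_{\mathbf{Y}=Y}\big)}{\mathbf{P}(\mathbf{Y}=Y)} \\
  &= \mathbf{E}\Big(\mathbf{X}\sum_{Y} 1_{\mathbf{Y}=Y}\Big)
   = \mathbf{E}(\mathbf{X}),
\end{align*}
```

where the sums range over the essential range of $\mathbf{Y}$, on which $\sum_Y 1_{\mathbf{Y}=Y} = 1$ almost surely.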

We will make frequent use of the basic inequalities of Markov,
$$\mathbf{P}(\mathbf{X} \ge \lambda) \le \frac{\mathbf{E} \mathbf{X}}{\lambda} \qquad (\mathbf{X} \ge 0,\ \lambda > 0),$$
and Chebyshev,
$$\mathbf{P}\big(|\mathbf{X} - \mathbf{E}\mathbf{X}| \ge \lambda\big) \le \frac{\mathbf{E} |\mathbf{X} - \mathbf{E}\mathbf{X}|^2}{\lambda^2} \qquad (\lambda > 0).$$
The latter implies, when the variance is small, that a random variable is highly concentrated.
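Chebyshev’s inequality is itself an instance of Markov’s, applied to the non-negative random variable $|\mathbf{X} - \mathbf{E}\mathbf{X}|^2$; we record the standard one-line deduction in the notation above:

```latex
\mathbf{P}\big(|\mathbf{X}-\mathbf{E}\mathbf{X}|\ge\lambda\big)
  = \mathbf{P}\big(|\mathbf{X}-\mathbf{E}\mathbf{X}|^2\ge\lambda^2\big)
  \le \frac{\mathbf{E}\,|\mathbf{X}-\mathbf{E}\mathbf{X}|^2}{\lambda^2}
  \qquad(\lambda>0).
```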

Lemma 2.1.

Suppose that for some $X_0 > 0$ and $0 < \delta < 1$ we have
$$\mathbf{E}\mathbf{X} = \big(1 + O^{\le}(\delta)\big) X_0 \quad \text{and} \quad \mathbf{E}\mathbf{X}^2 = \big(1 + O^{\le}(\delta)\big) X_0^2.$$
Then, for any $\varepsilon \ge 2\delta$ we have
$$\mathbf{P}\big(|\mathbf{X} - X_0| \ge \varepsilon X_0\big) \ll \frac{\delta}{\varepsilon^2}.$$

Proof.

We first derive an upper bound on the variance:
$$\mathbf{Var}(\mathbf{X}) = \mathbf{E}\mathbf{X}^2 - (\mathbf{E}\mathbf{X})^2 \le (1+\delta) X_0^2 - (1-\delta)^2 X_0^2 \le 3\delta X_0^2.$$
Then, using Chebyshev’s inequality, we obtain