The sharp threshold for bootstrap percolation in all dimensions

In r-neighbour bootstrap percolation on a graph G, a (typically random) set A of initially 'infected' vertices spreads by infecting (at each time step) vertices with at least r already-infected neighbours. This process may be viewed as a monotone version of the Glauber dynamics of the Ising model, and has been extensively studied on the d-dimensional grid $[n]^d$. The elements of the set A are usually chosen independently, with some density p, and the main question is to determine $p_c([n]^d,r)$, the density at which percolation (infection of the entire vertex set) becomes likely. In this paper we prove, for every pair $d \ge r \ge 2$, that there is a constant $L(d,r)$ such that $p_c([n]^d,r) = \left( \frac{L(d,r) + o(1)}{\log_{(r-1)}(n)} \right)^{d-r+1}$ as $n \to \infty$, where $\log_{(r)}$ denotes an r-times iterated logarithm. We thus prove the existence of a sharp threshold for percolation in any (fixed) number of dimensions. Moreover, we determine $L(d,r)$ for every pair (d,r).

To be precise, let $A_0 = A$, and define
$$A_{t+1} = A_t \cup \big\{ v \in V(G) : |N(v) \cap A_t| \ge r \big\}$$
for each $t \ge 0$, where $N(v)$ denotes the set of (nearest) neighbours of v in G, and $|S|$ denotes the cardinality of a set S. We think of the set $A_t$ as the vertices which are infected at time t, and write $[A] = \bigcup_t A_t$ for the closure of A under the process. We say that the set A percolates if the entire vertex set is eventually infected, i.e., if $[A] = V(G)$.
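The update rule above can be simulated directly on a small grid. The following is a minimal sketch, not code from the paper: the function names `neighbours`, `closure` and `percolates` are illustrative, and the brute-force scan over all vertices is chosen for clarity rather than speed.

```python
from itertools import product


def neighbours(v, n):
    """Yield the nearest neighbours of v inside the grid [n]^d = {1, ..., n}^d."""
    for i in range(len(v)):
        for step in (-1, 1):
            c = v[i] + step
            if 1 <= c <= n:
                yield v[:i] + (c,) + v[i + 1:]


def closure(initial, n, d, r):
    """Return [A], the closure of `initial` under the r-neighbour process on [n]^d."""
    infected = set(initial)
    while True:
        # One time step: every healthy vertex with >= r infected neighbours is infected.
        new = {
            v
            for v in product(range(1, n + 1), repeat=d)
            if v not in infected
            and sum(w in infected for w in neighbours(v, n)) >= r
        }
        if not new:
            return infected
        infected |= new


def percolates(initial, n, d, r):
    """True if the closure of `initial` is the whole vertex set of [n]^d."""
    return len(closure(initial, n, d, r)) == n ** d
```

For instance, with d = r = 2 the main diagonal of $[4]^2$ percolates, while three diagonal sites only fill a 3 × 3 square.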
The bootstrap process was introduced in 1979 by Chalupa, Leath and Reich [16] in the context of disordered magnetic systems, and has been studied extensively by mathematicians (see, for example, [2,4,8,13,31,39]) and physicists [1,12,30,36], as well as by computer scientists [17,20] and sociologists [24,41], amongst others. Motivated by these physical models, we shall consider bootstrap percolation on the grid $[n]^d = \{1, \dots, n\}^d$, and an initial set $A \subset V(G)$ whose elements are chosen independently at random, each with probability p. We shall write $\mathbb{P}_p$ for this distribution; throughout the paper, A will always denote a random subset of $V(G)$ chosen according to $\mathbb{P}_p$.
It is clear that the probability of percolation is increasing in p, and so we may define the critical probability, $p_c(G, r)$, as follows:
$$p_c(G, r) := \inf\big\{ p : \mathbb{P}_p\big(A \text{ percolates in the } r\text{-neighbour process on } G\big) \ge 1/2 \big\}.$$
Our aim is to give sharp bounds on $p_c([n]^d, r)$, and to bound the size of the 'critical window' in which the probability of percolation shifts from o(1) to 1 − o(1). The first rigorous results on bootstrap percolation were obtained by van Enter [19] and Schonmann [39], on the infinite lattice $\mathbb{Z}^d$, and by Aizenman and Lebowitz [2], on the finite grid. In particular, Schonmann proved that $p_c(\mathbb{Z}^d, r) = 0$ if $r \le d$, and $p_c(\mathbb{Z}^d, r) = 1$ otherwise. The finite-volume behaviour (also known as 'metastability') was studied in [2,13,14], and the threshold function $p_c([n]^d, r)$ was determined up to a constant factor, for all $d \ge r \ge 2$, by Cerf and Manzo [14]. The first sharp threshold was determined by Holroyd [31], in the case d = r = 2, who proved that
$$p_c\big([n]^2, 2\big) = \frac{\pi^2}{18 \log n} + o\left(\frac{1}{\log n}\right)$$
as n → ∞, and a corresponding result in three dimensions was recently proved in [6]. However, a longstanding open question (see, for example, [2,3,14,31]) was to determine whether there is a sharp transition for $p_c([n]^d, r)$ (for fixed d and r, as n → ∞), and if so, whether there is a limiting constant. We resolve this question affirmatively, and determine the constant for every pair (d, r).
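In order to state our main result we first need to recall some functions from [6]. First, for each $k \in \mathbb{N}$, let

The following theorem is the main result of this paper. Let log denote the natural logarithm, and let $\log_{(r)}$ denote an r-times iterated logarithm, so that $\log_{(r+1)}(n) = \log\big(\log_{(r)}(n)\big)$.

Theorem 1. Let $d \ge r \ge 2$. Then
$$p_c\big([n]^d, r\big) = \left( \frac{\lambda(d,r) + o(1)}{\log_{(r-1)}(n)} \right)^{d-r+1}$$
as n → ∞.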
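To make the shape of the threshold concrete, the sketch below evaluates the iterated logarithm and the leading term of the formula stated in the abstract, with the o(1) term dropped. It is only an illustration: the names `iterated_log` and `threshold_leading_term` are hypothetical, and the constant `lam` must be supplied by the caller, since its general definition is not reproduced here. For d = r = 2, Holroyd's result quoted in the Introduction corresponds to `lam` = π²/18.

```python
import math


def iterated_log(n, r):
    """r-times iterated natural logarithm: log_(1) = log, log_(r+1) = log(log_(r))."""
    x = float(n)
    for _ in range(r):
        if x <= 0:
            raise ValueError("iterated logarithm undefined: n is too small")
        x = math.log(x)
    return x


def threshold_leading_term(n, d, r, lam):
    """Leading term (lam / log_(r-1)(n))^(d-r+1) of p_c([n]^d, r), ignoring o(1)."""
    if not d >= r >= 2:
        raise ValueError("requires d >= r >= 2")
    denom = iterated_log(n, r - 1)
    if denom <= 0:
        raise ValueError("n is too small for the formula to make sense")
    return (lam / denom) ** (d - r + 1)
```

Because the formula is asymptotic, the values returned for moderate n should be read as a first-order guide only.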
Remark 1. We shall moreover obtain explicit bounds on the probability that A percolates outside the critical window. To be precise, for any ε > 0 we shall prove that, if $p = (1-\varepsilon)p_c$, then $\mathbb{P}_p(A \text{ percolates}) \le n^{-d(r-2)-\delta}$ for some δ = δ(ε) > 0 (see Corollary 23 and Theorem 27). In [6] it was proved that if $p = (1+\varepsilon)p_c$, then

Some special cases of Theorem 1 were known previously. Indeed, as noted above, the case d = r = 2 was proved by Holroyd [31], and the case d = r = 3, and the upper bound in Theorem 1, were proved by Balogh, Bollobás and Morris [6]. Holroyd [32] also proved a sharp threshold for a 'modified' bootstrap percolation in an arbitrary (constant) number of dimensions. The modified model is much simpler to study, however, and the critical threshold differs from ours by a factor of about d. A weaker notion of sharpness was proven for r = 2 and all $d \ge 2$ by Balogh and Bollobás [3], using a general result of Friedgut and Kalai [23]. Their result implies that the critical window is of order $o(p_c)$, but not that the sequence $p_c([n]^d, 2)(\log n)^{d-1}$ converges.
We shall prove Theorem 1 by induction on r, and in order for the proof to work we shall need to strengthen the induction hypothesis. A bootstrap structure is a graph G, together with a function $r : V(G) \to \mathbb{N}$ which assigns a 'threshold' to each vertex of G. Bootstrap percolation on such a structure is then defined in the obvious way, by setting $A_0 = A$ and
$$A_{t+1} = A_t \cup \big\{ v \in V(G) : |N(v) \cap A_t| \ge r(v) \big\}$$
for each $t \ge 0$, and letting $[A] = \bigcup_t A_t$.
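A bootstrap structure with vertex-dependent thresholds can be simulated on any finite graph. The sketch below is an illustration under stated assumptions: the graph is given as an adjacency dictionary `adj`, thresholds as a dictionary `threshold`, and the function name `closure_with_thresholds` is hypothetical.

```python
def closure_with_thresholds(adj, threshold, initial):
    """Closure [A] of `initial` in the bootstrap structure (adj, threshold).

    adj:       dict mapping each vertex to an iterable of its neighbours
    threshold: dict mapping each vertex v to its threshold r(v)
    """
    infected = set(initial)
    while True:
        # One synchronous time step of the process.
        newly = [
            v
            for v in adj
            if v not in infected
            and sum(w in infected for w in adj[v]) >= threshold[v]
        ]
        if not newly:
            return infected
        infected.update(newly)
```

On the path a, b, c with thresholds 1, 2, 1, the single site a does not spread, whereas the pair {a, c} infects b.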
The following family of bootstrap structures, which we call $C([n]^d \times [k]^\ell, r)$, will be a crucial tool in our proof. We think of [ (see Theorem 27, below, and Theorem 5 of [6]). The main difficulty will lie in proving the result below, which implies the lower bound in the case r = 2. We define the diameter diam(S) of a set $S \subset \mathbb{Z}^{d+\ell}$ to be
$$\operatorname{diam}(S) := \sup\big\{ \|x - y\|_\infty + 1 : x, y \in S \text{ and } (x \leftrightarrow y)_S \big\},$$
where we write $(x \leftrightarrow y)_S$ to indicate that there exists a path from x to y (in the graph $\mathbb{Z}^{d+\ell}$) using only vertices of S.
Recall that [A] denotes the closure of A under the bootstrap process. The following theorem will be the base case of our proof by induction.
Theorem 2. Let $d, \ell \in \mathbb{N}$, with $d \ge 2$, and let ε > 0. Let B > 0 and $k \ge k_0(B) \in \mathbb{N}$ be sufficiently large, and let the elements of $A \subset C([n]^d \times [k]^\ell, 2)$ be chosen independently at random with probability p, where Then

The rest of the paper is organised as follows. In Sections 2 and 4 we review some basic definitions and tools from [6], and in Section 3 we give a brief sketch of the proof. In Section 5 we bound the probability that a rectangle is 'crossed' by A, in Section 6 we present some basic analytic tools, and in Section 7 we deduce Theorem 2 using ideas from Holroyd's proof of the case d = r = 2. In Section 8 we recall the method of Cerf and Cirillo [13], and in Section 9 we deduce Theorem 1. Finally, in Section 10, we state some open problems and conjectures.
A component of a set $S \subset \mathbb{Z}^d$ is a maximal connected set in the graph $\mathbb{Z}^d[S]$ (the subgraph of $\mathbb{Z}^d$ induced by S). Given a subset $S \subset [n]^d \times [k]^\ell$, let R(S) denote the smallest rectangle such that $S \subset R(S)$.
We next define the span $\langle A \rangle$ of a set A in $C([n]^d \times [k]^\ell, 2)$. The definition we give here is slightly different from that in [6], but has many of the same properties (see Section 4). This definition simplifies the proof in Section 9.
Definition. Let $n, k \in \mathbb{N}$ and $A \subset C([n]^d \times [k]^\ell, 2)$. Let $C_1, \dots, C_m$ denote the collection of connected components in [A]. The span of A is defined to be the following collection of rectangles:
$$\langle A \rangle := \big\{ R(C_1), \dots, R(C_m) \big\}.$$
If [A] is connected (i.e., m = 1), then we say that A spans the rectangle $R(C_1)$. If $R \in \langle A \cap R \rangle$, then we say A internally spans R.
If $\langle A \rangle = \{R\}$, i.e., A spans R, then we shall write $\langle A \rangle = R$. If S is a set and $S \subset [A \cap S]$ then we shall say that A internally fills S.
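The definitions of component, smallest enclosing rectangle and span translate into a short computation. The sketch below assumes the closure [A] has already been computed and is passed in as a set of integer tuples; the names `components`, `bounding_rectangle` and `span` are illustrative, and a rectangle is represented by its lower and upper corners.

```python
from collections import deque


def components(S):
    """Connected components of S in the nearest-neighbour graph on Z^d."""
    S = set(S)
    seen, comps = set(), []
    for start in S:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            for i in range(len(v)):
                for step in (-1, 1):
                    w = v[:i] + (v[i] + step,) + v[i + 1:]
                    if w in S and w not in seen:
                        seen.add(w)
                        comp.add(w)
                        queue.append(w)
        comps.append(comp)
    return comps


def bounding_rectangle(S):
    """R(S): the smallest rectangle containing S, as (lower corner, upper corner)."""
    lows = tuple(min(c) for c in zip(*S))
    highs = tuple(max(c) for c in zip(*S))
    return lows, highs


def span(closed_set):
    """The span: bounding rectangles of the components of a closed set [A]."""
    return sorted(bounding_rectangle(C) for C in components(closed_set))
```

With this representation, A spans a rectangle exactly when `span` returns a single element.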
Given a set S, and p ∈ [0, 1], say that A ∼ Bin(S, p) if the elements of A ⊂ S are chosen independently at random with probability p.
Given a rectangle R, we write $P_p(R) := \mathbb{P}_p\big(R \in \langle A \cap R \rangle\big)$, i.e., the probability that A internally spans R.
A set is said to be occupied if it is non-empty (i.e., contains some element of A), and it is said to be full if every site is in A. We shall use throughout the paper the notation $q = -\log(1-p)$, as in [31]. Note that p ∼ q for small p. The advantage of this notation is the fact that

Let $u(x) = 1 - e^{-qx}$ for any $x \in \mathbb{R}$, and note that this is the probability that a set S of size x is occupied (i.e., not empty) under $\mathbb{P}_p$. Given $x \in \mathbb{R}^d$ and $j \in [d]$, we define $u_j(x) := u\big(\prod_{i \ne j} x_i\big)$, and if R is a rectangle, then let $u_j(R) = u_j(\dim(R))$.

We next recall the concept of disjoint occurrence of events, and the van den Berg-Kesten Lemma [10], which utilizes it. An event E defined on subsets of [N] is increasing if $(S \subset T) \wedge E(S)$ implies E(T). In the setting of bootstrap percolation on a graph G, two increasing events E and F occur disjointly if there exist disjoint sets $S, T \subset V(G)$ such that the infected sites in S imply that E occurs, and the infected sites in T imply that F occurs. (We call S and T witness sets for E and F.) We write $E \circ F$ for the event that E and F occur disjointly.
The van den Berg-Kesten Lemma. Let E and F be any two increasing events defined in terms of the infected sites $A \subset V(G)$, and let $p \in (0, 1)$. Then
$$\mathbb{P}_p(E \circ F) \le \mathbb{P}_p(E)\,\mathbb{P}_p(F).$$

We remark here, for ease of reference, that there will be various constants which appear in the proof of Theorem 2, which will depend on each other, but not on p. These will be chosen in the order first B (for 'big'), then δ, k, Z (for 'seed'), and finally T (for 'tiny'), and will satisfy $T \ll Z \ll \delta \ll 1 \ll B \ll k$. Each of these constants also depends on d, ℓ and ε, which are fixed at the start of the proof.
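As a quick numerical check of the notation of this section, the sketch below evaluates q, u and $u_j$. It assumes Holroyd's convention $q = -\log(1-p)$ and reads $u_j(x)$ as u applied to the product of all coordinates of x other than the j-th; the function names are illustrative, and `j` is 0-based in the code.

```python
import math


def q_of(p):
    """q = -log(1 - p); assumed convention, so that exp(-q * x) = (1 - p)^x."""
    return -math.log1p(-p)


def u(x, p):
    """u(x) = 1 - exp(-q x): probability that a set of x sites is occupied."""
    return -math.expm1(-q_of(p) * x)


def u_j(dims, j, p):
    """u applied to the product of all side lengths except the j-th (0-based)."""
    others = math.prod(dims[i] for i in range(len(dims)) if i != j)
    return u(others, p)
```

For example, `u(3, 0.2)` agrees with 1 − 0.8³, the probability that at least one of three sites lies in A.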

Sketch of the proof
To aid the reader's understanding, we shall give a brief outline of the proof of Theorem 1; we begin with the base case of our induction on r, Theorem 2.
Let $G = C([n]^d \times [k]^\ell, 2)$ and let $A \subset G$ be a random set, chosen with density The first step is to apply a lemma introduced by Aizenman and Lebowitz [2], which says that if $\operatorname{diam}([A]) \ge B \log n$, then there exists an internally spanned rectangle R in G with $\frac{B \log n}{2} \le \operatorname{long}(R) \le B \log n$. We shall show that $\mathbb{P}_p\big(R \in \langle A \cap R \rangle\big)$, the probability that R is internally spanned, is at most where the last inequality follows from our choice of p. This implies (by the union bound)

To bound the probability that R is internally spanned, we use the 'hierarchy method', which was introduced by Holroyd [31], and then adapted for our purposes in [6]. To be precise, we show that if A ∩ R spans R, then there is a 'good and satisfied hierarchy' for R (see Section 4, below), and so $\mathbb{P}_p\big(R \in \langle A \cap R \rangle\big)$ is bounded by the expected number of such hierarchies. A hierarchy is essentially a way of breaking up the event $R \in \langle A \cap R \rangle$ into a bounded number of disjoint (and relatively simple) events, and so, by the van den Berg-Kesten inequality, the probability that a good hierarchy is satisfied is bounded by the product of the probabilities of these events (see Lemma 5). Moreover, we shall show that the number of good hierarchies is small (see Lemma 4), and so it suffices to give a uniform bound on the probability that a good hierarchy is satisfied.
To prove such a bound, the key step is to determine precisely the probability of 'crossing a rectangle' R (see Section 5), that is, the probability that there is a path in $[A']$ across R in direction 1, where $A' = (A \cap R) \cup \{x : x_1 \le a_1 - 1\}$. This is the most technical part of the paper, and we give a proof quite different from (and somewhat simpler than) that of the corresponding statement in [6]. One of the key steps is to partition R into pieces $S_j$ of bounded width, and study the probability of crossing $S_j$, under the coupling in which all elements of $R \setminus S_j$ are already infected. In particular, our method allows us to avoid the use of Reimer's Theorem, which was a crucial tool in [6]. The required bound then follows (see the proof of Theorem 17 in Section 7) using some basic analysis, which generalizes results from [31] to higher dimensions (see Section 6).
Having proved Theorem 2, we then deduce Theorem 1 using the method of Cerf and Cirillo [13], once again suitably generalized (see Section 8). Let $G = C([n]^d \times [k]^\ell, r)$, and let $A \subset G$ be chosen randomly with density The first step is to observe that if A internally spans G, then there exists a connected set X with $X \subset [A \cap X]$ and $\log n \le \operatorname{diam}(X) = m \le 2 \log n$ (see Lemma 26). We consider the smallest cuboid R containing X, and partition it into sub-cuboids $L_j$ of bounded width (along its longest edge). Now, we perform the bootstrap process in each $L_j$, under the coupling in which every vertex of $R \setminus L_j$ is already infected; under this coupling, the bootstrap structure on $L_j$ becomes isomorphic to $C([m]^{d-1} \times [k]^{\ell+1}, r-1)$, and so we can apply the induction hypothesis. (In fact the situation is more complicated than this (see Theorem 27), but we leave the details until Section 9.) By counting the expected number of minimal paths across R (see Lemma 24), we deduce that the probability that R is crossed by $[A \cap R]$ is at most $n^{-d-\varepsilon}$, and hence with high probability there is no such connected set X in G, in which case the set A does not percolate, as required.

Hierarchies
In this section we shall recall (from [6] and [31]) the definition and some basic properties of a hierarchy of a rectangle R. All of the results in this section were first proved by Holroyd [31] for $[n]^2$, and generalized to $C([n]^d \times [k]^\ell, r)$ in [6]. We refer the reader to those papers for detailed proofs, and note that although our definition of $\langle A \rangle$ is slightly different from that in [6], the proofs all work in exactly the same way.
We begin by defining a hierarchy of a rectangle R in $C([n]^d \times [k]^\ell, 2)$. A hierarchy H of R is an oriented rooted tree $G_H$, with all edges oriented away from the root ('downwards'), together with a collection of rectangles $R_u \subset C([n]^d \times [k]^\ell, 2)$, one for each vertex u of $G_H$, satisfying the following criteria: (a) The root of $G_H$ corresponds to R. (b) Each vertex has at most ℓ + 2 neighbours below it.
Given two rectangles $S \subset R$, we write D(S, R) for the event (depending on the set $A \subset R$) that $R \in \langle (A \cup S) \cap R \rangle$, i.e., the event that R is internally spanned by A ∪ S. Note that the event D(S, R) depends only on the set $A \cap (R \setminus S)$, and let $P_p(S, R) := \mathbb{P}_p\big(D(S, R)\big)$.
We say a hierarchy occurs (or is satisfied by a set A ⊂ R) if the following events all occur disjointly.
(e) If u is a seed, then R u is internally spanned by A.
A hierarchy is good for $(\hat T, \hat Z) \in \mathbb{R}^2$ if it satisfies the following.
(j) u is a seed if, and only if, $\operatorname{short}(R_u) \le \hat Z$.
In our application we shall take $\hat T = T/p^{1/(d-1)}$ and $\hat Z = Z/p^{1/(d-1)}$ for some (small) constants T, Z > 0. The definition above is useful because of the following lemma, which says that if A internally spans R, then there is a good hierarchy which is satisfied by A. Our definition of the span $\langle A \rangle$ of the set A is motivated by the proof of this lemma (see [6] for more details).
Lemma 3. Let $R \subset C([n]^d \times [k]^\ell, 2)$ be a rectangle. Suppose that A internally spans R. Then there exists a good (for $(\hat T, \hat Z)$) and satisfied hierarchy of R.
Given $\hat T, \hat Z > 0$, let $\mathcal{H}(R, \hat T, \hat Z)$ denote the collection of hierarchies for R which are good for the pair $(\hat T, \hat Z)$. The next lemma makes the straightforward (but crucial) observation that there are only 'few' possible hierarchies.

Finally, we state the following key lemma, which gives us our fundamental bound on the probability that A percolates. The lemma follows easily from Lemma 3 and the van den Berg-Kesten Lemma (see Lemma 20 of [6] or Section 10 of [31]). Recall that $P_p(R)$ denotes the probability that a rectangle R is spanned by a set A ∼ Bin(R, p).

Crossing a rectangle
In this section we shall bound from above the probability that a rectangle R is 'crossed' by a set A ∼ Bin(R, p). Our bound (see Lemma 6, below) is a generalization of Lemma 21 of [6], but the proof will be somewhat simpler than that given in [6]; in particular, we shall avoid using Reimer's Theorem. We refer the reader also to the paper of Duminil-Copin and Holroyd [18], where similar ideas are used.
We begin by fixing integers $d, \ell \in \mathbb{N}$, with $d \ge 2$. In order to save repetition, we shall keep these values fixed throughout the section.
, 2), where n and k will be chosen later.
Then there is a path in $[A']$ across R in direction j.
We write $H^{\to(j)}(R)$ for this event, and define $H^{\leftarrow(j)}(R)$ (the event that R is right-to-left crossed by A) similarly, with $x_j \le a_j - 1$ replaced by $x_j \ge b_j + 1$. As in [6], we shall bound from above the function

Lemma 6. Let $d, \ell \in \mathbb{N}$ and B, δ > 0. If $k \in \mathbb{N}$ is sufficiently large then the following holds. Let p > 0 be sufficiently small, and let R be a rectangle in Then, for any $t \in \mathbb{N}$, where $u_1(R) = u\big(\prod_{i=2}^{d} a_i\big)$.

As mentioned above, the strategy we shall use to prove this lemma differs from that in [6]. Instead of directly looking at the probability of this rectangle being left-to-right crossed, we will rather study the probability that a rectangle S with dimensions $(s, a_2, \dots, a_d)$, and with r(x) decreased by one for each $x \in S$ with $x_1 = s$, is crossed from left to right in direction 1, with s large but constant (so in particular $s \ll a_1$). Having proven an essentially sharp estimate for this rectangle, we shall be able to extend this bound to any length $a_1$, by splitting the large rectangle into rectangles of width s.
This point of view has the following advantage: it allows us to study the structure of the bootstrap process under the assumption that no two sites in A ∩ S are close to one another. In order to do so, we introduce the following slight generalization of the structure $C([n]^d \times [k]^\ell, 2)$. It corresponds to (or, more precisely, may be coupled with) the process inside the rectangle S when everything in $R \setminus S$ is already infected.
Given vectors $m \in \mathbb{N}^{d-1}$ and $k \in \mathbb{N}^{\ell+1}$, we define C([m] × [k], 1) to be the bootstrap structure such that

[Figure 1: , with examples of blockers and edges. Observe that the first d − 1 dimensions are not depicted: each 'unit' square is a set $M_x$.]

We remark that in our applications, we shall take $k_2 = \dots = k_{\ell+1} = k$, and $k_1 = s$, where k is much larger than s.
To study this structure, we slice the set S = [m] × [k] into sets $M_x$, see Figure 1, where we define a boundary edge (or simply an edge) of S to be the union of . We shall need the following generalization of the notion of blockers from [6]. Let $e_i = (0, \dots, 0, 1, 0, \dots, 0)$ denote the vector with a single 1 in position i.
in the definition above is replaced by the event Note that L-blockers are so-named because of their shape; L is not a variable. The following lemma is purely deterministic.
Suppose that there is a path in [A] across S in direction j, for some $d \le j \le d + \ell$. Then one of the following holds:

Proof. Suppose that none of the three events holds; that is, $A \subset S$ does not contain two sites at distance at most two from one another, and for every $b \in C(k)$, the boundary edge $E^{(j)}_b$ is blocked, and the boundary edges {E We define a set $\hat S$ as follows (see Figure 1). For each $b \in C(k)$ and $i \in [\ell + 1]$, let For each $b \in C(k)$, let

Proof of claim. We first claim that the sets $S_b$ are pairwise at distance at least three. Indeed, let $b, b' \in C(k)$, and suppose that there exists some , which is a contradiction, and hence the sets $S_b$ are pairwise at distance at least three, as claimed.
Suppose that $[\hat S \cup A] \setminus (\hat S \cup A)$ is non-empty, and consider the first new site v to be infected. It has at most one neighbour in $\hat S$, by the previous observation, and at most one neighbour in A, since A does not contain two sites at distance at most two. Thus v must have threshold at most two, and hence it belongs to an edge, E By the definition of a blocker, these vertices have no element of $A \setminus \hat S$ as a neighbour, and so it must have threshold one. But if v has threshold one, then $v \in M_b$ for some $b \in C(k)$, and since $v \notin \hat S$ (by assumption), it follows that $S_b$ is empty, and that $M_b$ is a blocker for E

It follows immediately from the claim that $[A] \subset \hat S \cup A$. But there is no path in $\hat S \cup A$ across S in direction j, since the rectangles $S_b$ are pairwise at distance at least three, and the elements of A are pairwise at distance at least three. The lemma follows.
We shall need the following bound on the probability that an edge is blocked.
, and let $j \in [\ell + 1]$ and p > 0. Then

In order to prove Lemma 8, we shall need the following lemma from [6], which is easily proved by induction on m. Given $\ell, m \in \mathbb{N}$, consider some sequence of events

Lemma 9 (Lemma 6 of [6]). Let $\ell, m \in \mathbb{N}$, let $u \in (0, 1)$, and suppose that each event in the set occurs independently with probability u.
Let L(m, u) denote the probability that there is no L-gap in E. Then where $\beta_{\ell+1}(u)$ is the function defined in the Introduction.
Proof of Lemma 8. Assume without loss of generality that $b_j = 1$, and let For each $y \in [k_j]$, consider the following events: Note that the events $F_1(y)$ and $F_2(y)$ are independent.
Suppose that E The lemma now follows from Lemma 9, applied to the events U x(t) and V (i) and similarly for F 2 (y), and hence is not fully blocked then either the event F 1 (k j /3) − 1 or the event F 2 (2k j )/3 occurs, and so as claimed.
The following upper bound on the probability of X S j (A) follows easily from Lemmas 7 and 8.
Lemma 10. Let B > 0 and $d, k, \ell \in \mathbb{N}$, with $d \ge 2$. If p > 0 is sufficiently small then the following holds. Let $j \in [\ell + 1]$, and let $m \in \mathbb{N}^{d-1}$ and $k \in \mathbb{N}^{\ell+1}$, with $m_i \le B/p^{1/(d-1)}$, $2 < k_j \le k/6$ and $k/2 \le k_i \le k$ for each $i \ne j$. Then then either A contains two sites within distance two, or one of the boundary edges in direction j is not blocked, or one of the other boundary edges is not fully blocked. The probability that A contains two sites at distance at most two is at most There are 2 boundary edges in direction j, so, by Lemma 8, the probability that one of them is not blocked is at most Finally, there are at most 2 boundary edges not in direction j, so the probability that one of them is not fully blocked is at most by Lemma 8, and since $k_i \ge k/2$ for every $i \ne j$. Since p was chosen sufficiently small, and $k_j \le k/6$ and $k_j > 2$ , it follows that as required.
We can now deduce Lemma 6 from Lemma 10.
Proof of Lemma 6. The lower bound is straightforward, and follows by Lemma 7 of [6], and the second inequality is immediate from the definition. We shall prove the upper bound. Let R be a rectangle as described in the lemma, so We are required to bound from above the probability that there is a path in Let s = k/10, $m = (a_2, \dots, a_d)$ and $k = (s, k, \dots, k) \in \mathbb{N}^{\ell+1}$, and assume for simplicity that s divides $a_1$. We partition R into $M = a_1/s$ blocks

Proof of claim. Let x be a vertex of $B_j$, so $x = (y_1, x_2, \dots, x_d, y_2, \dots, y_{\ell+1})$, where $x_j \in [m_j]$ and $y_j \in [k]$ for each $j \ge 2$, and $(j-1)s + 1 \le y_1 \le js$. Observe that x has at most one neighbour in $A \setminus B_j$, since such a neighbour must differ from x in direction 1. Moreover, 'internal' vertices of $B_j$ (those with $y_1 \notin \{(j-1)s + 1, js\}$) have no neighbours in A outside $B_j$.
In Hence, by Lemma 10, and recalling that M = a 1 /s and |J | t, In the final inequality we used the fact that a i B/p for each i = 1, so β +1 u 1 (B j ) is bounded away from 1 (as a function of B, d and ). Hence since s = k/10 is sufficiently large. This proves Lemma 6.

Analytic tools
In this section we shall extend the analytic tools used by Holroyd [31] to the d-dimensional setting. We remark that the results of this section, together with the method of [31], are sufficient to prove Theorem 1 in the case r = 2.
The following line integral was introduced in [31] in the case d = 2. Let $\mathbb{R}_+$ denote the (strictly) positive reals. Given any function $f : \mathbb{R}_+ \to \mathbb{R}_+$, and $a, b \in \mathbb{R}^d_+$, define where the infimum is taken over all piecewise linear, increasing paths from a to b in $\mathbb{R}^d_+$ (see Section 6 of [31]). Moreover, for any two rectangles

The aim of this section is to prove the following two propositions, which will allow us to deduce Theorem 2 from Lemmas 5 and 6. The first is a generalization of Lemma 37 of [31].
Proposition 11. Let $n, d, k, \ell \in \mathbb{N}$, let $\hat T, \hat Z, p > 0$, and let The rectangle S(H) is called the pod of the hierarchy H. In order to understand this statement, ignore the final (error) term, and observe that the proposition gives us a lower bound on the sum of a large number of small line integrals (which correspond to events $D(R_v, R_u)$ in the hierarchy).
The next result, which is a generalization of Proposition 14 of [31], shows that, if there are not too many big seeds, then this lower bound is exactly what we want. It will follow from the fact that the line integral W g (a, b) is minimized by following the main diagonal as closely as possible.
Given a vector $x \in \mathbb{R}^d$, we shall write $\Delta(x) = \max_j \{x_j\}$. Given two vectors $a, b \in \mathbb{R}^d$, we shall write $a \le b$ if $a_j \le b_j$ for each $j \in [d]$, and $a < b$ if $a_j < b_j$ for every $j \in [d]$.
Proposition 12. Let $d, \ell \in \mathbb{N}$, and let $a, b \in \mathbb{R}^d_+$, with $a \le b$ and $\min_j \{b_j\} = b_i$. Then

Remark 2. We shall use the following simple properties of the function $g_k(z)$ defined in the introduction: $g_k(u)$ is decreasing, convex and continuous, and

We shall prove Propositions 11 and 12 using a discretization argument. Given a function f and a path γ in $\mathbb{R}^d_+$, we shall write so that $W_f(a, b) = \inf_{\gamma : a \to b} w_f(\gamma)$. We begin with a simple observation.
Observation 13. Let $f : \mathbb{R}_+ \to \mathbb{R}_+$ be continuous, and let $a, b \in \mathbb{R}^d_+$ with $0 < a \le b$. For every ε > 0, there exists a piecewise linear, increasing path $\gamma_\varepsilon$ from a to b, with each linear piece parallel to one of the axes, and all of equal length, such that

The following lemma is a generalization of Lemma 18 of [31].
Lemma 14. Let f : R + → R + be continuous and decreasing, and let a, b, , c), is not true in general, even in two dimensions. To see this, consider for example the triple a = (1, 1), b = (B, 1) and c = (B, B), and let B 1.
Proof. The proof will be by induction on d. When d = 1 it is trivial, since there is a unique path from a to c, which passes through b.
Let $d \ge 2$, and assume that the result holds for d − 1. Let ε > 0, and let $\gamma_\varepsilon$ be the path from a to c given by Observation 13. In other words, $\gamma_\varepsilon$ is piecewise linear and increasing, with each linear piece parallel to one of the axes, and all of equal length, and Now consider the first point v on $\gamma_\varepsilon$ such that $v \ge b$, and observe that $v_j = b_j$ for some $j \in [d]$. Assume that j = 1, let $a' = (b_1, a_2, \dots, a_d)$, and observe that a′, b and v all live in the same (d − 1)-dimensional hyperplane. Hence, by the induction hypothesis, it follows that $W_f(a', v) \ge W_f(b, v)$. Now, let $\gamma_1$ denote the section of $\gamma_\varepsilon$ between a and v, and let $\gamma_2$ denote the section from v to c. Consider the path $\delta_1$ from a′ to v obtained from $\gamma_1$ by projecting onto the hyperplane $x_1 = b_1$. Observe that each linear piece which is parallel to the $x_1$-axis disappears, and each other piece retains its length and direction, and has its $x_1$-coordinate increased. Since f is decreasing, it follows that $w_f(\delta_1) \le w_f(\gamma_1)$.
Finally, let $\delta_\varepsilon$ denote the path from b to c obtained by conjoining the paths δ′ (from b to v) and $\gamma_2$ (from v to c). By the observations above, we have by our choice of $\gamma_\varepsilon$. Since ε > 0 was arbitrary, the lemma follows.
We are now ready to prove Proposition 11. The proof is exactly as in [31], except we need to replace Lemma 18 of [31] with Lemma 14, above. For completeness, we sketch the proof.
Proof of Proposition 11. Let f : R + → R + be continuous and decreasing; the lemma holds for any such function. The key step is a d-dimensional version of Proposition 15 of [31], which states the following: for every a, b, c, d ∈ R d + with a b and c d, and every x, Z ∈ R + and r ∈ R d + with b, d r b + d + (x, . . . , x), x < Z and r (2Z, . . . , 2Z), there exists s ∈ R d + with s a + c such that This statement for d = 2 follows by Propositions 12 and 13 and Lemmas 17 and 18 of [31]. The first three generalize easily to the d-dimensional setting; in fact they are easy consequences of the fact that f is decreasing. The last follows for general d by Lemma 14. Finally, we prove Proposition 12. In this case the proof does not follow by the method of [31], which was via an application of Green's Theorem in the plane. We shall discretize and apply Lemma 15. Given two piecewise linear paths γ and γ in R d + , we say that γ is a permutation of γ if it is obtained by permuting the linear pieces of γ.
The following lemma allows us to permute adjacent linear pieces in order to move the path closer to the main diagonal.
Lemma 15. Let f : R + → R + be convex, let a ∈ R d + and b ∈ R + , and set b = a + be 1 and c = a + be 2 . Suppose that a 1 a 2 . Then Proof. This follows easily from the definition. Since f convex, we have for any x, y, z ∈ R with x y. Thus Proof of Proposition 12. Let f : R + → R + be continuous and convex; the result will hold for any such function. Recall that a, b ∈ R d + with a b, and assume without loss of generality that b 1 . . . b d . We require a lower bound on W f (a, b). Let B = b d = ∆(b) and let b = (B, . . . , B).
It is easy to see that simply choose a path which grows in direction 1 first, then direction 2, and so on), and so the proposition will follow from the statement Let ε > 0, and let γ be a path from a to $b' = (B, \dots, B)$ given by Observation 13. Thus γ is piecewise linear and increasing, with each linear piece parallel to one of the axes, and all of equal length, and $w_f(\gamma) \le W_f(a, b') + \varepsilon$. Let δ > 0 denote the length of each piece of γ, and note that we may choose δ as small as we like.
We claim that there exists a permutation γ′ of γ which passes within $\ell_\infty$-distance δ of every point of the straight line between (A, . . . , A) and (B, . . . , B), such that This follows by Lemma 15. Indeed, let γ′ be chosen to minimize $w_f(\gamma')$ over all permutations of γ. Assume, without loss of generality, that $a_1 \le a_2 \le \dots \le a_d$, and consider the piecewise linear path ζ, given by
$$(a_1, \dots, a_d) \to \dots \to (a_j, \dots, a_j, a_{j+1}, \dots, a_d) \to \dots \to (a_d, \dots, a_d) \to (B, \dots, B),$$
where x → y means that ζ follows a straight line between x and y. By Lemma 15, we can choose γ′ to be the permutation which follows ζ as closely as possible. The second inequality follows because f is continuous, and we chose δ > 0 sufficiently small. Putting the pieces together, we have Since ε > 0 was arbitrary, the result follows.
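The path ζ used in the proof above is determined by its corner points, which can be listed explicitly. The sketch below is illustrative only: the function name `diagonal_waypoints` is hypothetical, the input is sorted so that $a_1 \le \dots \le a_d$ as in the proof, and B is assumed to be at least the largest coordinate of a.

```python
def diagonal_waypoints(a, B):
    """Corner points of the path from a to (B, ..., B) described in the proof.

    The j-th corner has its first j coordinates equal to a_j and the remaining
    coordinates unchanged; the last corner is (B, ..., B).
    """
    a = sorted(a)  # enforce a_1 <= a_2 <= ... <= a_d
    if B < a[-1]:
        raise ValueError("B must be at least max(a)")
    points = []
    for j in range(len(a)):
        points.append(tuple([a[j]] * (j + 1) + a[j + 1:]))
    points.append((B,) * len(a))
    return points
```

For a = (1, 2, 4) and B = 5 this gives (1, 2, 4), (2, 2, 4), (4, 4, 4) and (5, 5, 5): the path first raises the smallest coordinates until the point reaches the main diagonal, and then follows the diagonal.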
To finish the section, we prove the following simple property of λ(d, r), which will be useful in Section 7.

Proof of Theorem 2
In this section we complete the proof of Theorem 2. We shall follow the basic method of Holroyd [31] (see also Sections 4.3 and 4.4 of [6]), but we shall need some new ideas here also. Theorem 2 will follow easily from the following theorem (see Corollary 23).
Theorem 17. For every $d, \ell \in \mathbb{N}$ with $d \ge 2$, and every ε > 0, there exists $B_0 > 0$ and $k_0 : \mathbb{N} \to \mathbb{N}$ such that the following holds for every $B \ge B_0$ and every $k \ge k_0(B)$. Let $G = C([n]^d \times [k]^\ell, 2)$, and let p > 0 be sufficiently small. Let $R \subset V(G)$ be a rectangle with $\operatorname{long}(R) = B/p^{1/(d-1)}$. Then

We begin by bounding the probability that a rectangle grows sideways by $T/p^{1/(d-1)}$. Let $R \subset R'$ be rectangles in $C([n]^d \times [k]^\ell, 2)$, and recall from Section 4 that $P_p(R, R') = \mathbb{P}_p\big(D(R, R')\big)$, where D(R, R′) denotes the event that R′ is internally spanned by $(A \cup R) \cap R'$.
We shall deduce the following lemma from Lemma 6. We refer the reader to [28] (see Lemma 5) where a similar trick is used.
Proof. For each direction $j \in [d]$, let $R^{<}_j$ denote the rectangle $\{x \in R' : x_j < y_j \text{ for all } y \in R\}$, and similarly let $R^{>}_j$ denote the rectangle $\{x \in R' : x_j > y_j \text{ for all } y \in R\}$. Write $R_j = R^{<}_j \cup R^{>}_j$, and let $C = \bigcup_{i<j} R_i \cap R_j$ denote the corner areas of $R' \setminus R$. Finally, let W = A ∩ C, and let t = |W|.
If the event D(R, R ) occurs, then clearly the events H →(j) (R > j ) and H ←(j) (R < j ) must also occur for each j ∈ [d]. Hence, . The idea is that, since T may be chosen small compared with Z (and also B, d, k, ), it is likely that |A ∩ C| will be small compared with s j , and so the events H →(i) (R > i ) and H ←(j) (R < j ) are 'almost independent'. To be precise, by Lemma 6, and the binomial theorem, we have To estimate the error term, we use our bounds on m j and s j . Indeed, since m j Proposition 3 of [6]), and β +1 (u) u when u is small, it follows that Let T 1 = p 1/(d−1) max j {s j } T , and recall that |C| if T > 0 is chosen to be sufficiently small (with respect to d, , B, δ, k and Z). The penultimate inequality follows since we may we choose √ T δg +1 2B d−1 . In the final inequality, we used the facts that g +1 is decreasing, and that q i =j m i 2B d−1 .
We now rewrite the right-hand side of (4) in a more useful form. We shall use the following easy observation from [31].
Observation 19 (Proposition 12 of [31]). If $f$ is decreasing, then

By Observation 19 and the definition of $U^{g_{\ell+1}}(R, R')$, we have

The following corollary of Lemma 18 is now immediate.

Corollary 20. Under the conditions of Lemma 18,
Next we bound the probability that a seed is internally spanned. Recall that φ(R) denotes the semi-perimeter of a rectangle R.
Proof. Let $\dim(R) = (u_1, \ldots, u_d)$, and suppose without loss of generality that $u_1 = \mathrm{long}(R)$ and $u_2 = \mathrm{short}(R)$. Note that if $R \in \langle A \cap R \rangle$, then $R$ has no 'double gap', i.e., no pair of adjacent empty hyperplanes (see Lemma 27 of [6]). Thus,

which holds if $Z > 0$ is sufficiently small, as required.
Finally, we recall the following lemma from [6] (see also [2]).

We are ready to prove Theorem 17.
Proof of Theorem 17. Let $d, \ell \in \mathbb{N}$, with $d \ge 2$, and let $\varepsilon > 0$. We choose positive constants $B$, $\alpha$, $\delta$, $k$, $Z$ and $T$ (chosen in that order), with $B > 0$ sufficiently large, $\delta > 0$ sufficiently small, and $k$, $Z$ and $T$ chosen so that Lemmas 18 and 21 hold. In particular, let $\alpha = d\lambda(d + \ell, \ell + 2)B$, and note that

Finally, we let $p \to 0$, so that p . . . , $b_d)$, let $\mathrm{long}(R) = B/p^{1/(d-1)}$, and assume without loss of generality that $b_1 \le \ldots \le b_d$. By Corollary 20 and Lemmas 5 and 21, we obtain

For each hierarchy H ∈ H(R,T ,Ẑ), let

The theorem will follow easily from (5), Lemma 4 and the following claim.
Proof of claim. We shall consider three cases. First, suppose that H has 'many' seeds.
Case 1: In this case it is sufficient to consider only the second term in $Q(H)$. Indeed,

if $p > 0$ is sufficiently small, since $\alpha = d\lambda(d + \ell, \ell + 2)B$, as required.
Next, suppose that $R$ is unusually 'long and thin'. Let $\mathrm{girth}(R) = p \prod_{j=2}^{d} b_j$, and recall that $b_d = B/p^{1/(d-1)}$, and that $B$ is chosen to be sufficiently large.
In this case we consider only the first term in $Q(H)$. Let $S = S(H)$ be the pod of $H$, given by Proposition 11. Note that $H$ has bounded height (in terms of $B$, $d$ and $T$), and hence that $|G_H|$ is bounded (as $p \to 0$). Hence, by Proposition 11, we have

for some constant $M_1 = M_1(d, \ell, B, Z, T)$. (3), the definition of $U^{g_{\ell+1}}(S, R)$, and recall also that $\phi(S) \le \sum_{\text{seeds}} \phi(R_u)$, and that $g_{\ell+1}$ is decreasing. We obtain

if $B > 0$ is sufficiently large. The first inequality above follows by considering growth only in direction $d$. For the second step, note that $q \prod_{j=1}^{d-1} b_j \le 2 \cdot \mathrm{girth}(R)(b_1/b_d)$, and use the upper bounds on $\sum_{\text{seeds}} \phi(R_u)$, $\mathrm{girth}(R)$ and $b_1/b_d$. The final inequality holds if $B$ is sufficiently large, since $g_{\ell+1}(z) \to \infty$ as $z \to 0$.
Thus, combining this bound with (6), we deduce that if B > 0 is sufficiently large and p > 0 is sufficiently small, as required.
Finally, we arrive at the main case.
To complete the proof of Theorem 17, recall that, by Lemma 4,

for some constant $M_2 = M_2(B, T, d, \ell)$. Hence, by (5) and the claim,

if $p > 0$ is sufficiently small. Since $\varepsilon > 0$ was arbitrary, the theorem follows.
We complete this section by deducing the following corollary of Theorem 17, which is the technical statement we shall need in Section 9.

Then

Proof. Let $\varepsilon' = \varepsilon'(d, \ell, \varepsilon) > 0$ be sufficiently small, let $B_0 = B_0(\varepsilon')$ and $k_0 = k_0(B_0, \varepsilon')$ be chosen according to Theorem 17, and write $\lambda = \lambda(d + \ell, \ell + 2)$. Let $n \in \mathbb{N}$ and, noting that the probability is monotone in $p$, let

We shall show that $P_p\big(\mathrm{long}(R) \ge B_1 \log n \text{ for some } R \in \langle A \rangle\big) \le n^{-\varepsilon}$.

The result will then follow, since $\mathrm{diam}([A]) \le \max\{\mathrm{long}(R) : R \in \langle A \rangle\}$.

Suppose $\mathrm{long}(R) \ge B_1 \log n$ for some $R \in \langle A \rangle$. By Lemma 22, there exists an internally spanned rectangle $R' \subset R$ with $\frac{B_1 \log n}{2} \le \mathrm{long}(R') \le B_1 \log n$.

There are at most $(B \log n)^d n^d \le n^{d+\varepsilon}$ potential such rectangles $R'$. So, writing $Y(B_1)$ for the number of internally spanned rectangles $R' \subset C([n]^d \times [k]^\ell, 2)$ with $(B_1/2) \log n \le \mathrm{long}(R') \le B_1 \log n$, we get

It is easy to see that Corollary 23 implies Theorem 2.

The Cerf-Cirillo Method
In this section we shall recall a fundamental technique in the study of bootstrap percolation on [n] d . This technique was introduced by Cerf and Cirillo [13], and later used and refined by Cerf and Manzo [14], Holroyd [32], and Balogh, Bollobás and Morris [6]. We shall use this 'Cerf-Cirillo method' in order to prove the induction step in our proof of Theorem 1.
In order to state the main lemma of this section, we need to recall some definitions from [6]. We will be interested in two-coloured graphs, i.e., simple graphs with two types of edges, which we shall label 'good' and 'bad'. We call such a two-coloured graph 'admissible' if either it contains at least one bad edge, or every component is a clique (i.e., a complete graph). For any set $S$, let $\Lambda(S) := \{\text{admissible two-coloured graphs with vertex set } S \times [2]\}$. Now, given $m \in \mathbb{N}$, let $\Omega(S, m) := \{P = (G_1, \ldots, G_m) : G_t \in \Lambda(S) \text{ for each } t \in [m]\}$, the set of sequences of two-coloured admissible graphs on $S \times [2]$ of length $m$. We shall sometimes think of $G_t$ as a coloured graph on $S \times [2t - 1, 2t]$, and trust that this will cause no confusion. We shall be interested in probability distributions on $\Omega(S, m)$ in which, with high probability, there are bad edges in only very few of the graphs $G_t$. Now, for each $P \in \Omega(S, m)$, let $G_P$ denote the graph with vertex set $S \times [2m]$, and the following edge set $E(G_P)$ (see, for example, Figure 2).
Edges in $G_P$ of type (a) are labelled good and bad in the obvious way, to match the label of the corresponding edge in $G_y$. Thus $G_P$ has three types of edge: good, bad, and unlabelled.

Such a graph $G_P$, with $S = [3]$ and $m = 4$, is pictured below. Note that, for example,

Given $G \in \Lambda(S)$, let $E_g(G)$ denote the set of good edges, and $E_b(G)$ the set of bad edges, so that $E(G) = E_g(G) \cup E_b(G)$. If $uv$ is a good edge in $G$, then we shall write $u \sim v$. For each vertex $v = (x, y) \in V(G_P)$, let $\Gamma_P(v) := \{u \in V(G_P) : u \sim v \text{ and } u \ne v\}$, and let $d_P(v) = |\Gamma_P(v)|$. We emphasize that $d_P(v)$ is the number of good edges incident with $v$.
Finally, let X(P) denote the event that there is a connected path across G P (i.e., a path from the set S × {1} to the set S × {2m}). Observe that the event X(P) holds for the graph G P depicted in Figure 2.
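The admissibility condition defined in this section is easy to test mechanically. The sketch below is a minimal Python check, assuming that a two-coloured graph is given as a vertex list together with separate collections of good and bad edges and that there are no loops; this encoding and the name `is_admissible` are choices made here for illustration, not notation from [6].

```python
def is_admissible(vertices, good_edges, bad_edges):
    """Return True if the graph has a bad edge, or if every component is a clique."""
    if bad_edges:
        return True
    # No bad edges: the components are those of the graph of good edges.
    adj = {v: set() for v in vertices}
    for u, v in good_edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    for start in vertices:
        if start in seen:
            continue
        # Collect the component of `start` by depth-first search.
        component, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in component:
                    component.add(w)
                    stack.append(w)
        seen |= component
        # A component is a clique iff each vertex is adjacent to all the others.
        if any(adj[u] != component - {u} for u in component):
            return False
    return True

# Hypothetical example on the vertex set S x [2] with S = {1, 2, 3}.
V = [(s, i) for s in (1, 2, 3) for i in (1, 2)]
path = [((1, 1), (2, 1)), ((2, 1), (3, 1))]
print(is_admissible(V, path, []))
print(is_admissible(V, path + [((1, 1), (3, 1))], []))
```

A path on three vertices with only good edges fails the test, while closing it into a triangle, or adding any bad edge, makes the graph admissible.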
The following lemma was first stated in [6], but the proof is due to Cerf and Cirillo [13].
Lemma 24 (Cerf and Cirillo [13], see Lemma 35 of [6]). For each $0 < \alpha < 1/2$ and $\varepsilon > 0$, there exists $\delta > 0$ such that the following holds for all $m \in \mathbb{N}$ and all finite sets $S$ with $\alpha^4 |S|^{\varepsilon} \ge 1$. Let $P = (G_1, \ldots, G_m)$ be a random sequence of admissible two-coloured graphs on $S \times [2]$, chosen according to some probability distribution $f_\Omega$ on $\Omega(S, m)$. Suppose $f_\Omega$ satisfies the following conditions:
(a) Independence: $G_i$ and $G_j$ are independent if $i \ne j$,
(b) BK condition: For each $t \in [m]$, $r \in \mathbb{N}$, and each $x_1, y_1, \ldots, x_r, y_r \in V(G_t)$,

and for each $t \in [m]$ and $v \in V(G_P)$,
(c) Bad edge condition: δ.

Then $P\big(X(P)\big) \le \alpha^m |S|$.

(This definition is important, and is due to Holroyd [32].) The following straightforward lemma, which we shall use to bound the expected number of good edges incident with a vertex, was proved in [6].

Lemma 25 (Lemma 36 of [6]). Let $n, d, k, \ell \in \mathbb{N}$, with $d \ge 2$, and let $B > 0$. There exists a constant $c(B, d, k, \ell)$ such that the following holds. Given $p > 0$ sufficiently small, let $G = C([n]^d \times [k]^\ell, 2)$ and $A \sim \mathrm{Bin}(V(G), p)$. Then

We shall also use the following easy lemma from [13].

Proof. Add newly infected sites one by one, and note that in each step the largest diameter of a component in $[A]$ may jump from at most $L - 1$ to at most $2L - 1$. Thus, at some point in the process the required set $X$ must appear as a component.

Proof of Theorem 1
We can now prove the following generalization of the lower bound in Theorem 1 by induction on r, using the method of Cerf and Cirillo for the induction step, and with Corollary 23 and Lemma 25 as the base case.
Recall from Section 8 the definition (7) of Γ G (A, m, x), the set of vertices which are connected to x by a 'small' component which is internally filled by A. We shall show that, for appropriate values of p and m, the expected size of this set goes to zero as n → ∞.
Theorem 27. Let $d, \ell, r \in \mathbb{N}$ with $d \ge r \ge 2$. If $\varepsilon > 0$ is sufficiently small, then there exist $B > 0$ and $k_0 = k_0(B) > 0$ such that, if $k \ge k_0$ and $n \in \mathbb{N}$ is sufficiently large, then the following holds. Let $G = C([n]^d \times [k]^\ell, r)$, and let

Then

and moreover

Proof. The proof is by induction on $r$; we begin by proving the base case, $r = 2$. Let $B = B^{(2)}(d, \ell, \varepsilon)$ and $k_0(B)$ be given by Corollary 23. The first statement follows from Corollary 23, and the second follows by Lemma 25, so in this case we are done. Let $r \ge 3$, and assume that the theorem holds for $r - 1$, for all $d, \ell \in \mathbb{N}$ and every sufficiently small $\varepsilon > 0$. We shall prove the theorem with $B = B^{(r)}(d, \ell, \varepsilon) = 1$ when $r \ge 3$. Fix $d, \ell \in \mathbb{N}$ and $\varepsilon > 0$, let $p = p(n) > 0$ be as described above, and let $k \ge k_0(d, \ell, r, \varepsilon) \in \mathbb{N}$ be sufficiently large.

Let $G = C([n]^d \times [k]^\ell, r)$, and recall that $P_p(R)$ denotes the probability that a rectangle $R \subset V(G)$ is internally spanned by $A \sim \mathrm{Bin}\big(V(G), p\big)$. The induction step is a straightforward consequence of the following claim.
Proof of claim. If $m \le \log_{(r)} n$ then $P_p(R) \le k^{d+\ell} p \to 0$ as $n \to \infty$, since $R$ must contain an element of $A$. So assume that $m > \log_{(r)} n \ge k$, let $R' \supset R$ be a rectangle in $G$, with $R' \cong [m]^d \times [k]^\ell$, and let $t = m/k$. Assume without loss of generality that $\dim(R)_1 = m$ (i.e., $R$ has length $m$ in direction 1), and assume for simplicity that $m$ is divisible by $k$. We partition the rectangle $R$ into blocks $L_1, \ldots, L_t$, each of size $[m]^{d-1} \times [k]^{\ell+1}$. To be precise, let $L_j = \{x \in R : (j - 1)k + 1 \le x_1 \le jk\}$.
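The partition into blocks is simple to make concrete. The following Python sketch, in which the helper names `block_ranges` and `block_of` are hypothetical, lists the values of the first coordinate covered by each of the blocks $L_1, \ldots, L_t$ and recovers the index $j$ of the block containing a given value of $x_1$, assuming as in the proof that $k$ divides $m$.

```python
def block_ranges(m, k):
    """Values of x_1 covered by L_1, ..., L_t, where t = m / k."""
    if m % k != 0:
        raise ValueError("this sketch assumes that k divides m")
    t = m // k
    # L_j covers (j - 1) * k + 1 <= x_1 <= j * k.
    return [list(range((j - 1) * k + 1, j * k + 1)) for j in range(1, t + 1)]

def block_of(x1, k):
    """Index j of the block L_j containing a vertex with first coordinate x1."""
    return (x1 - 1) // k + 1

print(block_ranges(6, 2))
print(block_of(5, 2))
```

Only the first coordinate matters for membership in a block; the remaining coordinates of a vertex are unconstrained.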
Since R is internally spanned by A, there exists a path in [A ∩ R] from the set {x ∈ R : x 1 = 1} to the set {x ∈ R : x 1 = m}. We shall use Lemma 24 to show that this is rather unlikely. In order to do so we use the following coupling.
Replace the thresholds in each block $L_j$ with those of $C([m]^{d-1} \times [k]^{\ell+1}, r - 1)$, and run the bootstrap process independently in each block. Denote by $\{A\}(j)$ the closure of $A \cap L_j$ under this process, i.e., the closure in the bootstrap structure $C([m]^{d-1} \times [k]^{\ell+1}, r - 1)$.

The following subclaim shows that this is indeed a coupling.

Proof of subclaim. Note that each vertex of $L_j$ has at most one neighbour in $R \setminus L_j$, and 'internal' vertices of $L_j$ (those with $x_1 \notin \{(j - 1)k + 1, jk\}$) have no neighbours outside $L_j$.

A vertex $x \in L_j$ originally had threshold $r$, and now (in the coupled system) has threshold

and let $x \sim y$ ⇔ there exists an internally filled connected component $X \subset \{A\}(j)$ such that $x, y \in X$ and $\mathrm{diam}(X) \le B \log n$, where $x \sim y$ means $xy$ is a 'good' edge, as in Section 8, and $B = B^{(r-1)}(d - 1, \ell + 1, \varepsilon)$ was chosen above. Note that $G_j$ is admissible, since $x \sim y$ and $y \sim z$ in $G_j$ implies that $x$ and $z$ are in the same component of $\{A\}(j)$, and so either $x \sim z$, or $xz$ is a bad edge. Note also that the event $x \sim y$ is increasing.

For each set $A \subset V(G)$, we have defined a sequence $P := (G_1, \ldots, G_t) \in \Omega(S, m)$ of admissible two-coloured graphs. We claim that the (random) sequence $P$ satisfies the conditions of Lemma 24. Indeed, recall that $m \le \log n$, so p λ(d + ℓ, ℓ + r) − ε log (r−2) m , and let $\varepsilon' = \varepsilon/(d + \ell)$. By the induction hypothesis (and our choice of $B$ and $k$), for each $j \in [t]$ we have

since $|S| = m^{d-1} k^{\ell} \le m^{d+\ell}$. Next, choose a function $\alpha = \alpha(n)$ such that $\alpha \to 0$ sufficiently slowly as $n \to \infty$, and let $\delta = \delta(\alpha, \varepsilon') > 0$ be given by Lemma 24. Since $\alpha(n) \to 0$ sufficiently slowly, and $d$, $\ell$ and $\varepsilon$ are constants, we can assume that $\delta = \delta(n)$ approaches zero arbitrarily slowly as $n \to \infty$. Thus, by the induction hypothesis, we have

for any $v \in V(G_j)$, if $n$ (and therefore $m \ge \log_{(r)} n$) is sufficiently large. Moreover, we have $|S| \ge m \ge \log_{(r)} n$, so $\alpha^4 |S|^{\varepsilon'} \to \infty$ as $n \to \infty$ if $\alpha(n) \to 0$ sufficiently slowly. By (8) and (9), it follows that conditions (c) and (d) of Lemma 24 are satisfied (for $\varepsilon'$ and $\delta = \delta(\alpha, \varepsilon')$ as above). Condition (a) is satisfied by construction. Condition (b) follows because if $x \sim y$ and $x' \sim y'$, and there are no bad edges, then either all four points are in the same internally spanned component with diameter at most $B \log n$, or they are in different components of $\{A\}(j)$. Thus, if $x \not\sim x'$, then the events $x \sim y$ and $x' \sim y'$ must occur disjointly, and so we can apply the van den Berg-Kesten Lemma.
Recall that $X(P)$ denotes the event that there is a connected path across $G_P$, and note that if $R$ is internally spanned by $A$, then, by the subclaim, the event $X(P)$ holds. Thus, by Lemma 24, we have $P_p(R) \le P\big(X(P)\big) \le \alpha^{m/k} (m + k)^{d+\ell}$, as required, and the claim follows.
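The domination behind the coupling used in the proof of the claim can be checked by direct simulation on small examples. The Python sketch below makes two simplifying assumptions that are not in the paper: it uses a plain two-dimensional grid with a constant threshold $r$ in place of the bootstrap structure $C(\cdot, r)$, and it cuts the grid into slabs of width $k \ge 2$ in the first direction. Each vertex then has at most one neighbour outside its slab, so the closure under the $r$-neighbour process should be contained in the union of the slab closures under the $(r-1)$-neighbour process. All function names are illustrative.

```python
import random

def neighbours(v):
    """Nearest neighbours of v in the integer lattice Z^2."""
    x, y = v
    return [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]

def closure(initial, region, r):
    """Closure of `initial` under r-neighbour bootstrap percolation inside `region`."""
    region = set(region)
    infected = set(initial) & region
    while True:
        new = {v for v in region - infected
               if sum(w in infected for w in neighbours(v)) >= r}
        if not new:
            return infected
        infected |= new

def coupled_closure(initial, m, h, k, r):
    """Union of the (r-1)-neighbour closures computed separately in each slab.
    Assumes k >= 2 and that k divides m."""
    result = set()
    for j in range(1, m // k + 1):
        slab = {(x, y) for x in range((j - 1) * k + 1, j * k + 1)
                for y in range(1, h + 1)}
        result |= closure(initial, slab, r - 1)
    return result

# Hypothetical parameters: a 6 x 4 grid, slabs of width 2, threshold 3.
m, h, k, r = 6, 4, 2, 3
grid = {(x, y) for x in range(1, m + 1) for y in range(1, h + 1)}
rng = random.Random(0)
A = {v for v in grid if rng.random() < 0.3}
print(closure(A, grid, r) <= coupled_closure(A, m, h, k, r))
```

The containment holds for every initial set, by induction on the time at which a vertex is infected in the original process, so the random choice of `A` only serves to exercise the code.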
We shall now use the claim to prove the theorem for $r$. Indeed, suppose that $\mathrm{diam}([A]) \ge \log n$. By Lemma 26, there exists an internally filled, connected set $X$ with $\frac{\log n - 1}{2} \le \mathrm{diam}(X) \le \log n - 1$.

Let $R$ be the smallest rectangle containing $X$, and observe that $R$ is internally spanned by $A$, and that $\mathrm{diam}(R) = \mathrm{diam}(X) \le \log n$. Since there are at most $(n \log n)^d$ such rectangles, by the claim we have $P_p\big(\mathrm{diam}([A]) \ge \log n\big) \le (n \log n)^d \cdot \alpha^{\log n/3k} (\log n + k)^{d+\ell} \le n^{-dr}$, if $n$ is sufficiently large, since $\alpha(n) \to 0$ as $n \to \infty$.

Finally, let $v \in V(G)$, and suppose that $w \in \Gamma_G(A, \log n, v)$. Then there exists an internally filled connected component $X \subset V(G)$ such that $v, w \in X$ and $\mathrm{diam}(X) \le \log n$, and hence there exists an internally spanned rectangle $R$ (the smallest rectangle containing $X$) such that $v, w \in R$ and $m := \mathrm{diam}(R) \le \log n$.

There are at most $m^{2d}$ rectangles with diameter $m$ containing $v$, and each contains at most $m^d k^{\ell}$ vertices. It follows that

This completes the proof of Theorem 1, since the upper bound was proved in [6], and the lower bound follows immediately from Theorem 27 in the case $\ell = 0$.

Open problems
In this section we shall present three different directions for future research into the bootstrap process on the grid $[n]^d$: extensions to higher dimensions ($d = d(n) \to \infty$), more general update rules, and further sharpening of the thresholds. See [4,5,7,18,29] for some recent work on these questions. We expect that our proof of Theorem 1 can be extended to slowly growing functions $d = d(n)$, and that $d = \Theta(\log n)$ will be the most challenging range. The growth of the critical droplet is very different in the ranges $d = O(1)$ (where it grows in all directions at the same time) and $d \gg \log n$ (where it grows in only one direction at a time), and it will be particularly interesting to see whether these are the only two possible (dominant) behaviours.
Due to some recent progress, we also know a significant amount about the process when $d = r$. Indeed, by Theorem 1 and the results of [5], we have sharp bounds on $p_c([n]^d, d)$ when $d = O(1)$ and when $d \ge (\log \log n)^{2+\varepsilon}$. Looking from slightly further away, we have the following theorem, which is implied by the results of [39] and [5].

There are also much simpler questions to which we have no good answer. For example, the following conjecture was made in [7].

for some $c > 0$, and showed also that the function $p_c([n]^2, 2) \log n$ converges too slowly for the limit to be easily estimated. In [28], they conjectured that their upper bound is close to being tight. This conjecture was proved recently by Gravner, Holroyd and Morris [29].
Similarly tight bounds have also been proved for the hypercube when $r = 2$ and when $r = d/2$ (see [5,7]). By combining the techniques of this paper with those of [29], one might hope that sharper bounds could also be given on $p_c([n]^d, 2)$. However, it is likely to be much harder to prove such results when $r \ge 3$.