Error estimates and convergence rates for the stochastic homogenization of Hamilton-Jacobi equations

We present exponential error estimates and demonstrate an algebraic convergence rate for the homogenization of level-set convex Hamilton-Jacobi equations in i.i.d. random environments, the first quantitative homogenization results for these equations in the stochastic setting. By taking advantage of a connection between the metric approach to homogenization and the theory of first-passage percolation, we obtain estimates on the fluctuations of the solutions to the approximate cell problem in the ballistic regime (away from the flat spot of the effective Hamiltonian). In the sub-ballistic regime (on the flat spot), we show that the fluctuations are governed by an entirely different mechanism and the homogenization may proceed, without further assumptions, at an arbitrarily slow rate. We identify a necessary and sufficient condition on the law of the Hamiltonian for an algebraic rate of convergence to hold in the sub-ballistic regime and show, under this hypothesis, that the two rates may be merged to yield comprehensive error estimates and an algebraic rate of convergence for homogenization. Our methods are novel and quite different from the techniques employed in the periodic setting, although we benefit from previous works in both first-passage percolation and homogenization. The link between the rate of homogenization and the flat spot of the effective Hamiltonian, which is related to the nonexistence of correctors, is a purely random phenomenon observed here for the first time.


Introduction
We consider the Hamilton-Jacobi equation where the Hamiltonian H = H(p, y, ω) is level-set convex and coercive in p and depends on an element ω of an underlying probability space (Ω, F, P). If the law of H is stationary and ergodic with respect to the action of translations on R d , then, as ε → 0, the solutions u ε = u ε (x, t, ω) of (1.1), subject to appropriate initial conditions, converge P-almost surely to the solution u of the deterministic equation with the same initial conditions, where the effective Hamiltonian H is level-set convex, continuous and coercive. This fundamental theorem concerning the qualitative theory of stochastic homogenization of Hamilton-Jacobi equations was proved for convex Hamiltonians by one of the authors [32] (see also Rezakhanlou and Tarver [31]) and, more recently, by two of the authors [4] in the generality discussed here.
In this paper, we present the first quantitative homogenization results for Hamilton-Jacobi equations in the stochastic setting. Throughout the paper we assume that the Hamiltonian H = H(p, y, ω) satisfies a finite range dependence hypothesis (a continuum analogue of "i.i.d.") in its spatial dependence. This essentially means that, for some fixed distance D > 0, the values of H(p, y, ·) for y ∈ E are independent of those for y ∈ F provided that dist(E, F ) > D. (Obviously we lose no generality by taking D = 1.) By a novel integration of probabilistic and pde techniques, we (i) obtain explicit estimates showing that the probability of |u ε (x, t, ω) − u(x, t)| > λ decays exponentially in λ^2 , and (ii) identify a necessary and sufficient condition for the almost sure, local uniform convergence u ε → u to proceed at an algebraic rate O(ε^α ). The main results, including the precise assumptions, are stated in the next section. They essentially give the error estimates for the homogenization of (1.1), where the first inequality depends on a supplemental assumption on the law of H.
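In symbols (using the σ-algebras G(V ) defined in Section 2.1), the finite range of dependence hypothesis, normalized to unit range, may be written as follows:

```latex
% Finite range of dependence (normalized to unit range, D = 1):
% for all Borel sets E, F \subseteq \mathbb{R}^d,
\operatorname{dist}(E, F) \geq 1
\quad \Longrightarrow \quad
\mathcal{G}(E) \ \text{and} \ \mathcal{G}(F) \ \text{are independent},
% where \mathcal{G}(V) denotes the \sigma-algebra generated by the
% random variables \ \omega \mapsto H(p, y, \omega), \quad p \in \mathbb{R}^d, \ y \in V.
```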
The difficulty in obtaining estimates on the fluctuations of u ε (x, t, ·) − u(x, t) is due in part to the fact that the dependence of u ε on H is highly singular. Understanding how the solutions depend on the random environment is very challenging. Difficulties of a similar nature occur, for example, in the theory of first-passage percolation (see Kesten [18] and Alexander [1]) and in the study of the fluctuations of the Lyapunov exponents for Brownian motion in Poissonian potentials (see Sznitman [33] and Wüthrich [36]). As far as we know, the only previous result on the oscillations of solutions of Hamilton-Jacobi equations in random media is found in the work of Rezakhanlou [30], who gave structural conditions on H in dimension d = 1 under which a central limit theorem holds. Such phenomena are not expected to appear in any dimension d ≥ 2 (see Remark 4.3).
Our arguments rely crucially on adaptations of some of the probabilistic techniques of [18,1], which are based on Azuma's inequality and the martingale method of bounded differences. This connection between first-passage percolation and the stochastic homogenization of (1.1), made explicit for the first time in this paper (as far as we know), arises naturally from an analogy between the passage time in percolation and solutions of the metric problem (see Remark 3.2 below). Using arguments inspired by [18,1], we prove exponential error estimates and obtain rates of convergence for the homogenization of the metric problem. Then, by quantifying the new proof of homogenization recently introduced by two of the authors [4], we transform the estimates for the metric problem into error estimates for the approximate cell problem (see (1.4) below).
The rate of convergence of periodic homogenization of Hamilton-Jacobi equations has been understood for some time and goes back to the work of Capuzzo-Dolcetta and Ishii [8], who proved that u ε and u differ by at most O(ε^{1/3}). The periodic setting is much simpler to understand due to the fact that the cell problem has periodic solutions; that is, exact correctors exist. A quantitative version of the classical perturbed test function proof of homogenization due to Evans [12,13] then yields the convergence rate. Our main results stated in Section 2 do not encompass the periodic or almost periodic settings, since obviously an almost periodic function cannot be embedded into the random setting in such a way that it satisfies a finite range of dependence condition. However, as we show, our arguments yield a uniform rate of convergence in the almost periodic setting (see Section 8).
In the stochastic environment, the situation is not only much more complicated but also qualitatively different from the periodic setting. It is, therefore, necessary to devise a new strategy, since the usual proof of periodic homogenization, which is based on exact correctors and can be quantified to yield a rate, does not generalize to random environments. Indeed, as Lions and Souganidis [22] demonstrated with an explicit example, exact correctors do not exist, in general, for stochastic Hamiltonians. The only known proofs of the qualitative homogenization of Hamilton-Jacobi equations in the stationary ergodic setting are based on an application of the subadditive ergodic theorem to certain subadditive quantities (e.g., the m µ 's below) and then showing that these quantities control, in an appropriate way, the solutions of (1.1). In order to obtain a convergence rate for the homogenization, one is therefore left with the twofold task of quantifying both the limits given by the subadditive ergodic theorem as well as the precise way in which the subadditive quantities control the solutions of (1.1). The former must necessarily be handled by probabilistic methods and the latter by pde methods.
These are not mere technical difficulties. It turns out that, in the stochastic setting, the magnitude of the fluctuations of the solutions u ε about their limit u must be separated into two distinct regimes, which we refer to as ballistic and sub-ballistic, respectively (this terminology is borrowed from the probability literature, see, for example, Sznitman [34]). Intuitively, in the ballistic regime, the solutions are able to "feel" the random environment sufficiently quickly as ε → 0. Then the mixing of the medium dominates, which results in an algebraic convergence rate. In the sub-ballistic regime, the dependence of the solutions on H is highly localized in the vicinity of points y in which H(p, y, ω) is close to its essential supremum. It is therefore the law of H(p, 0, ·) near its essential supremum that principally governs the rate at which homogenization occurs, and this rate may be arbitrarily slow without a further assumption on the law.
To give a more detailed overview of the approach, we start from the approximate cell problem δv δ + H(p + Dv δ , y, ω) = 0 in R d , (1.4) which, for each fixed p ∈ R d and δ > 0, admits a unique bounded, uniformly continuous solution v δ = v δ (y, ω ; p). The introduction of (1.4) in the context of the homogenization of Hamilton-Jacobi equations goes back to the original proof of periodic homogenization due to Lions, Papanicolaou and Varadhan [21]. It is well-known by now (see, e.g., [3]) that the homogenization of (1.1) to (1.2) is equivalent to (and the effective Hamiltonian H can be identified by) the limit lim δ→0 −δv δ (0, ω ; p) = H(p) P-a.s. (1.5) Moreover, this equivalence is easy to quantify in the sense that an error estimate or convergence rate for (1.5) can be transformed into one for homogenization. We are therefore left with the task of quantifying the limit in (1.5). The intuitive reason for the difficulty of arguing directly for the limit (1.5) in the random case is the complicated dependence of the v δ 's on H, which is both singular (information propagates only along characteristics and does not spread out) and global (information may travel far away in space, which is compounded by the lack of compactness). This problem is overcome in the stochastic setting by (i) imposing some kind of convexity assumption on H and (ii) using the subadditive structure of the metric problem (or its time-dependent analogue) to obtain an almost sure limit via the subadditive ergodic theorem. A comparison argument (introduced in [3,4]) then yields that, in the ballistic regime (p's satisfying H(p) > min H), the metric problem controls the limiting behavior of the δv δ (0, ω ; p) as δ → 0. 
In the sub-ballistic regime, i.e., for p's belonging to the "flat spot" {H(·) = min H}, the limiting behavior of the δv δ is driven primarily by the law of H(p, 0, ·) near its essential supremum, as mentioned above, and it turns out to be the thickness of the tail of this distribution which governs the rate of homogenization.
We continue by introducing the metric problem: for each fixed µ larger than a certain constant, which turns out to be min H, x ∈ R d and ω ∈ Ω, there exists a unique nonnegative continuous solution m µ = m µ (·, x, ω) : R d → R of H(Dm µ , y, ω) = µ in R d \ {x} and m µ (x, x, ω) = 0.
(1.6) In terms of control theory, the quantity m µ (y, x, ω) corresponds to a "cost" of transporting a particle from x to y in the medium ω. It thus has the properties of a metric and is analogous to the time constant in first-passage percolation (see Remark 3.2). The m µ (·, x, ω)'s are the maximal subsolutions of H(Dw, y, ω) ≤ µ in R d subject to w(x) = 0, and this implies a subadditivity property. An easy application of the subadditive ergodic theorem (see [4]) then yields the existence of m µ ∈ C(R d ) such that lim t→∞ t −1 m µ (ty, 0, ω) = m µ (y) P-a.s. The existence and some basic properties of the m µ (·, x, ω)'s have been known for some time (see, for example, Lions [20]). More recently, a simple comparison argument was introduced in [4] which demonstrated that the m µ (·, x, ω)'s control the δv δ (·, ω ; p)'s from below for every p ∈ R d , and from above for p's in the ballistic regime. It follows from this analysis that the limit (1.7) implies the homogenization of (1.1). As we show here, this argument is constructive in the sense that a quantitative rate for the convergence of (1.7) implies a rate for (1.5).
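The subadditivity property referred to here is the triangle-type inequality that follows from the maximality of the m µ 's:

```latex
% For all x, y, z \in \mathbb{R}^d, \mu \geq 0 and \omega \in \Omega,
m_\mu(z, x, \omega) \;\leq\; m_\mu(z, y, \omega) + m_\mu(y, x, \omega),
% which, combined with stationarity, makes t \mapsto m_\mu(ty, 0, \omega)
% subadditive along rays and brings the subadditive ergodic theorem into play.
```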
The main advantage of the metric problem is that it is localized. Indeed, while changes in the medium may influence the value of δv δ at far away points, the quantity m µ (y, x, ω) depends only on the values of H(p, z, ω) for z's satisfying m µ (z, x, ω) ≤ m µ (y, x, ω), which is a bounded set with diameter proportional to |y − x|. This localization (see Lemma 3.4 and (3.25) below) permits us to use the independence of the medium by way of the martingale method of bounded differences and an application of Azuma's concentration inequality in a similar manner as in first-passage percolation [18].
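For reference, the concentration inequality in question is Azuma's inequality, which produces exactly the exponential-in-λ^2 tails appearing in our estimates: revealing the medium one independent region at a time yields a Doob martingale for m µ (y, x, ·) with bounded increments, to which the following applies.

```latex
% Azuma's inequality: if X_0, X_1, \ldots, X_n is a martingale with
% |X_k - X_{k-1}| \leq c_k almost surely, then for every \lambda > 0,
\mathbb{P}\big[\, |X_n - X_0| \geq \lambda \,\big]
\;\leq\; 2 \exp\!\left( - \frac{\lambda^2}{2 \sum_{k=1}^{n} c_k^2} \right).
```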
In addition to quantifying the limit in (1.7) and hence in (1.5) for p's in the ballistic regime (and obtaining an almost sure, algebraic rate of convergence), we identify a necessary and sufficient condition (see (2.11) below) for an algebraic rate of convergence to hold for p's in the sub-ballistic regime. This follows from a direct analysis of the v δ 's using explicit comparison arguments. Merging the results for the ballistic and sub-ballistic regimes then yields, under assumption (2.11), an algebraic rate for (1.5) for all p's.
We remark that we do not expect our arguments to yield sharp error estimates or convergence rates for homogenization. Indeed, as we explain in Remark 4.3, this is related to outstanding conjectures on the fluctuations of the time constant in first-passage percolation. It is likely that the exponent in the rate of convergence improves in higher dimensions, as is expected in first-passage percolation, although proving a rigorous statement to this effect seems out of reach.
As mentioned above, the periodic homogenization of Hamilton-Jacobi equations was proved in [21]. This was simplified in [12] and subsequently extended to almost periodic media by Ishii [17]. The stochastic homogenization of convex first-order Hamilton-Jacobi equations was first proved in [32,31] and, for viscous convex Hamilton-Jacobi equations, by Lions and Souganidis [23] and Kosygina, Rezakhanlou, and Varadhan [19]. Lions and Souganidis [22] obtained results on the existence and nonexistence of correctors in the random setting and introduced in [24] a more direct proof of homogenization in probability. Later, more direct proofs of almost sure homogenization, based on the metric problem, were given in [3,4].
The metric problem has been used by Davini and Siconolfi [10,11] to study some connections between the stochastic homogenization of Hamilton-Jacobi equations and weak KAM theory and, for periodic H's with special structure, by Oberman, Takei and Vladimirsky [29] and Luo, Yu and Zhao [25] in order to implement efficient numerical schemes for computing H.
Outline of the paper. In the next section we give the precise assumptions and the statement of the main results. In Section 3 we review some preliminary results needed in our arguments. Controlling the fluctuations of the metric problem is the topic of Section 4 and in Section 5 we control its statistical bias. These estimates are combined with comparison arguments in Section 6 to obtain corresponding bounds for the approximate cell problem in the ballistic regime. The sub-ballistic regime is studied in the second part of Section 6, where we produce error estimates under an auxiliary hypothesis on the law of H as well as examples demonstrating that, without such a hypothesis, the rate may be arbitrarily slow. We complete the proof of the error estimates in Section 7 and give a convergence rate for the homogenization of the time-dependent problem (1.1). Finally, in Section 8 we discuss the convergence rates of the homogenization of (1.4) and (1.1) in almost periodic media. In the appendices we summarize the fundamentals of the metric and approximate cell problems.
Notation and conventions. The symbols C and c denote positive constants which may vary from line to line and, unless otherwise indicated, depend only on the assumptions for H and other appropriate parameters (often an upper bound for |p| or µ). For s, t ∈ R, we write s ∧ t := min{s, t} and s ∨ t := max{s, t}. We denote the d-dimensional Euclidean space by R d , Q d is the set of elements of R d with rational coordinates, N is the set of natural numbers and N * := N \ {0}. For each y ∈ R d , |y| denotes the Euclidean length of y. If E ⊆ R d , then |E| is the Lebesgue measure of E, int E the interior of E, E the closure of E and conv E the closure of the convex hull of E. For r > 0, we set B(y, r) := {x ∈ R d : |x − y| < r} and B r := B(0, r). The distance between two subsets U, V ⊆ R d is dist(U, V ) := inf{|u − v| : u ∈ U, v ∈ V }. If K is a finite set, then |K| is the number of elements of K. The set of Lipschitz functions on a set U ⊆ R d is written Lip(U ) = C 0,1 (U ) and we set L := Lip(R d ). The set of bounded and uniformly continuous real-valued functions on a metric space Y is denoted BUC(Y ), and USC(Y ) and LSC(Y ) are respectively the sets of real-valued upper and lower semicontinuous functions on Y . The Borel σ-field on R d is B. If G 1 and G 2 are σ-fields on sets X 1 and X 2 , respectively, then G 1 ⊗ G 2 denotes the σ-field on X 1 × X 2 generated by G 1 × G 2 . For a probability space (Ω, F, P), we say that an event A ∈ F is of full probability if P[A] = 1. We denote the indicator random variable of A ∈ F by 1 A . If X is a random variable and G ⊆ F is a σ-field, then E [X|G] denotes the conditional expectation of X with respect to G.
Throughout the paper, all differential inequalities are taken to hold in the viscosity sense. Readers not familiar with the fundamentals of the theory of viscosity solutions may consult standard references such as [9,6].

The assumptions and the statement of the main results
We introduce our hypotheses and state the main results of the paper.
2.1. The hypotheses. Let (Ω, F, P) be a probability space endowed with a group (τ y ) y∈R d of F-measurable, measure-preserving transformations τ y : Ω → Ω. That is, we assume that, for every x, y ∈ R d and A ∈ F, We write H = H(p, y, ω) and require that H be stationary in its dependence on (y, ω) with respect to the translation group (τ y ) y∈R d , that is, we assume that, for every p, y, z ∈ R d and ω ∈ Ω, H(p, y, τ z ω) = H(p, y + z, ω).
In order to state the finite range of dependence assumption, which means roughly that H is "i.i.d." in its spatial dependence, we define, for each V ∈ B, the following σ-algebra on Ω: We may also suppose without loss of generality that F = G(R d ). The finite range dependence hypothesis is then the requirement that, for every V, W ∈ B, Of course, this implies that the group (τ y ) y∈R d is ergodic, but is much stronger.
We continue with other structural hypotheses on H. We assume, for each R > 0, that the family and We also require that H is uniformly coercive in p, that is, We assume that H is slightly more than level-set convex in p. Precisely, we assume that there exists Λ : R × R → R, which is nondecreasing in each variable, such that, for all µ, ν ∈ R, and that H satisfies, for all p, q, y ∈ R d and ω ∈ Ω, H( 1/2 (p + q), y, ω ) ≤ Λ( H(p, y, ω), H(q, y, ω) ). (2.8) Of course, H is convex if and only if (2.8) holds with Λ(µ, ν) = 1/2 (µ + ν). We also make the following assumptions regarding the shape of the level sets of H: for every p, y ∈ R d and ω ∈ Ω, H(p, y, ω) ≥ H(0, y, ω) and ess sup ω∈Ω From the point of view of optimal control theory, the fact that there is a common p 0 for all ω at which H(·, 0, ω) attains its minimum provides some "controllability", i.e., upper and lower bounds on the length of optimal paths. We lose no generality by assuming p 0 = 0 and ess sup ω∈Ω H(0, 0, ω) = 0. From our point of view, (2.9) controls the growth of the m µ 's (see (3.8) below). With the exception of Section 8, the hypotheses (2.1)-(2.9) described above are in force throughout the paper. For ease of reference, we write (2.10) Some of our results are proved under an extra assumption on the distribution of H(0, 0, ·) near its maximum. Precisely, this extra hypothesis is that there exist θ ≥ 0 and c > 0 such that, for every 0 < λ ≤ c, In light of (2.9), we see that, roughly speaking, (2.11) is a requirement that the event that H(0, 0, ·) is near its maximum is not too unlikely. For example, if H(0, 0, ·) attains its maximum on a set of positive probability, then of course (2.11) holds for θ = 0. Throughout the paper, the quantification "for every ω ∈ Ω" is used exclusively for deterministic statements. For assertions which hold P-almost surely (abbreviated as P-a.s.) we may write, for example, "for every ω ∈ Ω 1 " where Ω 1 ∈ F is a specified event of full probability, i.e., P[Ω 1 ] = 1.
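Schematically (this is a paraphrase of the type of condition involved, not a reproduction of display (2.11)), the extra hypothesis is a polynomial lower bound on the upper tail of H(0, 0, ·); recalling the normalization ess sup ω∈Ω H(0, 0, ω) = 0:

```latex
% Schematic form of a tail condition of the type (2.11):
% there exist \theta \geq 0 and c > 0 such that, for every 0 < \lambda \leq c,
\mathbb{P}\big[\, H(0, 0, \cdot) \geq -\lambda \,\big] \;\geq\; c\, \lambda^{\theta}.
```

For θ = 0 this reduces to the statement that H(0, 0, ·) is within any fixed distance of its maximum with probability bounded below, consistent with the remark above.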

2.2. The main results. Our first main result consists of error estimates for the limit (1.7), which measure the likelihood that the quantity |m µ (y, 0, ω) − m µ (y)| is large relative to |y|. The definition and basic properties of the metric problem (1.6) and its solutions m µ and m µ are reviewed in the next section. The proof of Theorem 1 is completed in Section 5.
Theorem 1 (Error estimates for the metric problem). Assume (2.10) and fix K > 0. Then there exists C > 0, depending only on K and H, such that, for every 0 < µ ≤ K, λ > 0 and |y| > 1, and, if (2.14) The error estimates for the metric problem and a careful quantification of the comparison arguments introduced in [4], together with an analysis of the convergence on the flat spot under the additional assumption (2.11), yield the following error estimates for the limit (1.5). The basic properties of the solutions v δ of the approximate cell problem (1.4) are outlined in the next section.
Theorem 2 (Error estimates for the approximate cell problem). Assume (2.10) and fix K > 0. There exists C > 0, depending only on K and H, such that, for every |p| ≤ K, we have: By a covering argument and an application of the Borel-Cantelli lemma, the error estimates contained in Theorem 2 yield P-almost sure, local uniform rates of convergence for the limit (1.5).
Theorem 3 (A convergence rate for the approximate cell problem). Assume (2.10) and fix K > 0. Then there exists an event Ω 1 ∈ F of full probability and a constant C > 0, depending on K and H, such that, for every |p| ≤ K and ω ∈ Ω 1 , the following hold: The previous two results are proved in Section 6, where we also give a converse to Theorem 3(iii), which states that the extra assumption (2.11) is actually necessary for an algebraic rate of convergence to hold at p = 0. Indeed, keeping in mind that our assumptions imply that H(0) = 0, we prove in Proposition 6.7 roughly that, if (2.11) is false, then for every exponent η > 0, It is therefore necessary to impose, in addition to (2.10), some assumption on the distribution of H(0, 0, ·) near its essential supremum in order to obtain a rate for the limit (1.5) at p = 0. We next present our main quantitative results for the homogenization of (1.1). Here u ε and u denote, respectively, the unique solutions of (1.1) and (1.2) subject to the initial condition u ε (·, 0) = u(·, 0) = u 0 ∈ C 0,1 (R d ), which are bounded and Lipschitz continuous on R d × [0, T ] for each T > 0. We begin with exponential estimates for the probability that |u ε (x, t) − u(x, t)| is large.
Theorem 5 (Convergence rate for homogenization). Assume (2.10) and fix K > 0. Then there exists an event Ω 2 ∈ F of full probability and a constant C > 0, depending on K and H, such that, for every ω ∈ Ω 2 and u 0 ∈ C 0,1 (R d ) with ∥u 0 ∥ C 0,1 (R d ) ≤ K, the following hold: (2.29) (ii) If (2.11) holds, α and β are as in (2.22) and we set , (2.30) then, for every T ≥ 1, Remark 2.1. We discuss later the sharpness of the exponents a and b. Let us point out for the moment that, in the special case that H is positively homogeneous of order one in p, i.e., for every t ≥ 0, p, y ∈ R d and ω ∈ Ω, H(tp, y, ω) = tH(p, y, ω), (2.32) condition (2.11) is clearly satisfied for θ = 0, and thus Theorem 5 gives a rate of O(ε^{1/8} |log ε|^{3/16}) for homogenization. Moreover, this rate can be improved since (2.32) implies that H is also positively homogeneous of order one, or equivalently that µ → m µ (y) is positively homogeneous of order one, that is, for all µ > 0, x, y ∈ R d and ω ∈ Ω, m µ (y, x, ω) = µm 1 (y, x, ω).
Thus the fluctuations of m µ − m µ are proportional to µ, and this prevents (2.14) from degenerating as µ → 0. Indeed, we find that (2.14) holds for every λ > 0 satisfying λ ≥ C_2 µ|y|^{2/3} (log(1 + |y|))^{1/2} instead of the more restrictive (2.13). This improvement may be propagated through the rest of the paper to find that (2.23) holds for α = 1/3 and β = 1/2, and (2.31) for a = 1/5 and b = 3/10. A similar observation holds for Hamiltonians which are positively homogeneous of any positive order and we expect that other such improvements are possible for H's with special structure.
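The homogeneity assertion used in this remark can be checked directly from the definition of the m µ 's: under (2.32), a Lipschitz function w is a subsolution at level µ exactly when w/µ is a subsolution at level 1, so maximality gives

```latex
% If H(tp, y, \omega) = t\,H(p, y, \omega) for all t \geq 0, then for \mu > 0:
% H(Dw, y, \omega) \leq \mu \iff H\big(D(w/\mu), y, \omega\big) \leq 1, hence
m_\mu(y, x, \omega) \;=\; \mu\, m_1(y, x, \omega)
\qquad \text{and} \qquad
\overline{m}_\mu(y) \;=\; \mu\, \overline{m}_1(y).
```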

2.3. Explicit examples.
We illustrate the assumptions with two simple but typical classes of Hamilton-Jacobi equations: H 1 , which involves a random potential V , and H 2 (p, y, ω) = a(y, ω)|p|.
The former arises in problems in the calculus of variations and geometric optics, for example, and the latter in front propagation. To ensure that (2.10) is satisfied, we require a, V : R d × Ω → R to be measurable, stationary with respect to the action of the translation group, satisfy a finite range of dependence hypothesis, be uniformly continuous and bounded in the first variable (uniformly in the second variable) and nonnegative. We also require and that a(·, ω) is Lipschitz uniformly in ω and bounded below by a positive constant. Observe that the more restrictive condition (2.11) is satisfied by H 2 and for H 1 is equivalent to the existence of constants θ > 0 and c > 0 such that, for every 0 < λ ≤ c, It is relatively easy to construct random potentials which do not satisfy (2.34): see Subsection 6.3. We remark that the following Hamiltonian is not covered by our assumptions: Here b is a random vector field satisfying appropriate conditions, and the assumption not satisfied is (2.9). We believe it would be very interesting to develop an error analysis for stochastic homogenization for Hamiltonians like H ′ 1 not satisfying (2.9). The difficulty from the point of view of our approach is that we lose control on the rate of growth of the sublevel sets of m µ .
There are many ways of constructing random functions like a, V and b, above. For example, one may consider a Poissonian point cloud, attach a deterministic bump function to every point and sum. Such a random function satisfies a finite range of dependence if the bump function has compact support, and is called a Poissonian potential. There are other possibilities such as "random checkerboards" and so on, but we do not discuss these here.
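As an illustration (a minimal numerical sketch, not part of the paper's arguments; the bump profile, the radius and the intensity below are arbitrary choices), a Poissonian potential with compactly supported bumps can be simulated as follows. The compact support of the bump is precisely what guarantees the finite range of dependence:

```python
import numpy as np

def bump(r, radius=0.5):
    """Smooth bump profile: positive for r < radius, identically zero outside.
    (The profile and the radius are illustrative choices.)"""
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    inside = r < radius
    out[inside] = np.exp(-1.0 / (1.0 - (r[inside] / radius) ** 2))
    return out

def poissonian_potential(points, x, radius=0.5):
    """V(x) = sum over Poisson points z of bump(|x - z|).
    Each bump is supported in a ball of radius `radius`, so the values of V
    on sets E and F depend on disjoint sets of Poisson points whenever
    dist(E, F) > 2 * radius: a finite range of dependence."""
    x = np.asarray(x, dtype=float)
    dists = np.linalg.norm(np.asarray(points) - x, axis=1)
    return float(bump(dists, radius).sum())

# Poisson point cloud of intensity lam in the box [0, L]^2
rng = np.random.default_rng(0)
lam, L = 2.0, 10.0
n = rng.poisson(lam * L * L)
points = rng.uniform(0.0, L, size=(n, 2))

v = poissonian_potential(points, [5.0, 5.0])  # the potential at one point
```

Note that V is nonnegative and vanishes at any point farther than `radius` from every Poisson point, so it also satisfies the sign normalization used above.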

Preliminaries
In this section we recall the basic properties of the metric problem and the approximate cell problem and their connections to the effective Hamiltonian. We conclude by giving the statement of some results needed in the sequel.
3.1. The metric problem: basic properties. We summarize some elementary facts concerning the functions m µ , which play a central role in the rest of the paper. They are defined for each x, y ∈ R d , µ ≥ 0, and ω ∈ Ω by m µ (y, x, ω) := sup{w(y) − w(x) : w ∈ L and H(Dw, ·, ω) ≤ µ in R d }.
(3.1) Note that, due to (2.9), the zero function belongs to the admissible class, which is therefore nonempty. Lemma A.1 and (2.6) yield that m µ (y, x, ω) is finite and, in fact, nonnegative and bounded from above by C|y − x|, for some C > 0 depending on an upper bound for µ. It is immediate from (3.1) that m µ is measurable with respect to B ⊗ B ⊗ F, since the expression on the right of (3.1) is. Moreover, from (3.1) and (2.2) we see that m µ is jointly stationary in its first two variables, i.e., for every x, y, z ∈ R d and ω ∈ Ω, m µ (y, x, τ z ω) = m µ (y + z, x + z, ω).
It is also immediate from (3.1) (and the fact that a supremum of a family of viscosity subsolutions is a viscosity subsolution, see [9,6]) that the m µ (·, x, ω)'s are global subsolutions of (A.2), i.e., for every µ ≥ 0, x ∈ R d and ω ∈ Ω, Further properties of the m µ 's are recorded in the next proposition. Detailed proofs of most of these facts can be found in [4]. In Appendix A we present sketches of the arguments.
(3.6) (iv) There exist l µ , L µ ≥ 0 satisfying, for some C, c > 0 depending only on an upper bound for µ, and There exists a constant c > 0, depending on an upper bound for µ, such that, for every 0 ≤ µ ≤ µ and x, y ∈ R d , The functions m µ can be expressed by the following representation formula due to Lions [20], which provides the above facts with a control theoretic interpretation: where C(x, y) is the set of Lipschitz curves γ : [0, 1] → R d such that γ(0) = x and γ(1) = y and J µ is the support function of the µ-sublevel set of H, given by The expression (3.14) provides us with an interpretation of m µ (y, x, ω) as measuring the "cost" of moving from the point x to the point y in the medium ω. We make no direct use of (3.14) in this paper, preferring instead to work with the maximality property (Proposition 3.1(ii)) which is equivalent to it. Nevertheless, our intuition is enriched by (3.14) and it suggests an analogy between the metric problem and first-passage percolation.
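To make the analogy concrete (a toy discretization for intuition only; the grid, the edge-cost rule and the cost distribution are our own choices, not constructions from the paper), one can compute a first-passage-type metric in a random medium by Dijkstra's algorithm and observe its metric properties directly:

```python
import heapq
import numpy as np

def grid_metric(cost, source):
    """Dijkstra on an N x N grid: a discrete caricature of the metric problem.
    cost[i, j] > 0 plays the role of the local cost J_mu; the output m[i, j]
    is the least total cost of a nearest-neighbor path from `source` to (i, j),
    a stand-in for m_mu(y, x, omega)."""
    n = cost.shape[0]
    m = np.full(cost.shape, np.inf)
    m[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if d > m[i, j]:
            continue  # stale heap entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n:
                # edge cost: average of the local costs at the two endpoints
                nd = d + 0.5 * (cost[i, j] + cost[a, b])
                if nd < m[a, b]:
                    m[a, b] = nd
                    heapq.heappush(heap, (nd, (a, b)))
    return m

rng = np.random.default_rng(1)
n = 30
cost = rng.uniform(1.0, 2.0, size=(n, n))  # i.i.d. positive local costs
m0 = grid_metric(cost, (0, 0))
```

The output satisfies the triangle inequality and is symmetric in its two base points, mirroring the metric character of the m µ 's; since the local costs are bounded above and below by positive constants, it also grows linearly in the distance, mirroring (3.8).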
We also use the notation We continue by examining some elementary properties of the reachable set. It is useful to note that, in the particular case that x = 0 and U := {y ∈ R d : m µ (y, 0, ω) < t}, Proposition 3.1 (vi) asserts that, for every t > 0 and y ∈ R d such that m µ (y, 0, ω) ≥ t, In view of (3.8) and (3.16), we see that m µ (y, x, ω) < t for every y in the interior of R ω µ,t (x) and ∂R ω µ,t = {y ∈ R d : m µ (y, 0, ω) = t}. In fact, (3.8) and (3.16) give the following estimates for the growth rate of the reachable set: for every 0 < s < t and ω ∈ Ω, We may think of t → R ω µ,t as a "growing front," and in this interpretation (3.17) provides uniform positive lower and upper bounds on the speed of the front. In particular, for all µ, t > 0 and ω ∈ Ω, The maximality property (Proposition 3.1(ii)) can be improved for the domain U = R ω µ,t (x) \ {x} by restricting the maximum over ∂U to {x}, as stated in the following lemma. This is the crucial fact that localizes the metric problem.
It follows from Lemma 3.4 that the representation formula (3.1) may be restricted to the reachable set. Precisely, for every ω ∈ Ω, In the proof of Lemma 4.2, we require a refinement of Lemma 3.4. To state this, we define, for every nonempty closed set K ⊆ R d , x, y ∈ K and ω ∈ Ω, and Proof. The argument is similar to the proof of Lemma 3.4. Consider the function m K µ (·, 0, ω)∧(t−ε) and extend this to R d by giving it the value (t − ε) outside of K. This is a global subsolution by Lemma A.1 and the assumption that m K µ (·, 0, ω) ≥ t on ∂K. By the maximality of m µ we deduce In light of (3.23), this completes the proof of (3.25).
We next define, for every compact set K of R d , y ∈ R d and ω ∈ Ω, and The next proposition provides representation formulas for these functions, which are needed to deduce (3.30) below. The proof (sketch) is given in Appendix A.

and It is immediate from (3.28) and (3.29) that, for every compact K ⊆ R d and y ∈ R d , 3.2. The approximate cell problem: basic properties. We summarize the properties of the approximate cell problem δv δ + H(p + Dv δ , y, ω) = 0 in R d .
(3.31) Here p ∈ R d and δ > 0 are given parameters and v δ = v δ (y, ω ; p). We note that the assertions in this section do not depend in any way on the assumptions (2.8) or (2.9), nor do they depend on the random parameter ω, and therefore they hold (with appropriate changes in the notation) for the Hamiltonians encountered in Section 8.
We begin with a comparison principle for (3.31), which can be reduced to Proposition 3.11, below, by an argument which perturbs the solutions by adding appropriate terms with linear growth (alternatively, a proof can be found in [9,6]).

and with the differential inequality in the definition interpreted either in the viscosity or in the almost everywhere sense, as these are equivalent in our situation by Lemma A.1. It is clear that the constant function belongs to the admissible class, hence v δ (·, ω ; p) ≥ w. Similarly, the function is a bounded supersolution of (3.31), and Proposition 3.7 yields v δ (·, ω ; p) ≤ v. We have shown that, for all y, p ∈ R d , ω ∈ Ω and δ > 0, Immediate from (3.33) and (2.2) is that the v δ 's are stationary functions. That is, for all y, z, p ∈ We summarize some further properties of the v δ 's in the following proposition. The proofs, which are standard in the theory of viscosity solutions, are sketched in Appendix A. For the statements we need to define the constants Observe that K p is bounded above for |p| bounded by (2.6). It follows from this and (2.5) that Π p is also bounded above for bounded |p|.
Proposition 3.8. For every p ∈ R d , ω ∈ Ω and δ > 0, the following hold: To guess what the formulas should be, it is helpful to rescale: for each ε > 0, we define If the statement of qualitative homogenization holds for this problem (here we are not being rigorous and in fact are using a circular argument!), then we have where m µ should be the solution of the metric problem for the effective Hamiltonian H, that is, Now let us reverse the change of variables to write the limit (3.41) in the original scaling (we also write in terms of t = 1/ε): It is immediate from the form of this limit that m µ must be positively homogeneous. The subadditive property of the m µ 's easily translates into a subadditivity property for m µ and, therefore, m µ is convex. It follows that m µ may be written as a maximum of planes x → p · x over p belonging to some closed convex set. We deduce that, if the equation (3.42) is to hold, this convex set must be the µ-sublevel set of H: We may invert this formula to write H in terms of m µ as We next see how H may be identified as a limit of the solutions of the approximate cell problem, by a similar heuristic. As usual, it is helpful to rescale. With v δ (·, ω ; p) defined by (3.33) where v should be the solution of the problem But notice that we have a formula for the latter: v is a constant function, namely, v ≡ −H(p). So rewriting the limit in terms of the original scaling, we expect that, for every p ∈ R d , The strategy of the proof of qualitative homogenization from [4] consists of reversing the heuristic argument above. Here is an outline of the method, each step of which is quantified in this paper: (1) Apply the subadditive ergodic theorem to deduce that the limit (3.43) holds. The function m µ is produced in the process.
In fact, it is necessary to prove a more general fact which allows the vertex of the metric problem to vary: namely that, P-almost surely, for all y, z ∈ R d , lim sup Then define H by the formula (3.44). (2) Using the comparison principle, argue that (3.46) implies the limit (3.45), at least away from the flat spot {p : H(p) = min H}. The basic idea is to compare m µ (·, ω) to v δ (·, ω ; p) where µ = H(p). If δv δ (0, ω ; p) is found to be too large or too small, then this information is translated in terms of the metric problem to yield that m µ (y, z, ω) is relatively small or large compared to m µ (y − z), for some |y|, |z| ≃ δ −1 . See (6.4) below, for example, for a quantitative version of this assertion. Meanwhile, on the flat spot, the proof is completely different and necessarily indirect: the metric problem cannot "see" the flat spot. (Indeed, note that the error estimates we obtain for m µ degenerate as µ ↓ 0, and in fact the convergence rate turns out to depend in a more delicate way on the law of H.) (3) Using that v δ is an approximate corrector, we argue that (3.45) implies the full statement of qualitative homogenization. This has been well-known for some time and also follows from a (more routine) comparison argument. A quantitative version appears in Lemma 7.1.
We next give some details about how we select the points y and z in the comparison argument in Step (2) of the outline above. These observations will be needed in Section 6. From elementary convex geometric considerations (we again refer to [4] for details) we deduce that, for every p ∈ R d \ int{q ∈ R d : H(q) = 0}, we can find a direction e so that the plane with slope p touches m µ from below at e. Precisely, setting µ := H(p), there exists e ∈ R d with |e| = 1 such that (3.47) The points y and z found in the comparison argument are chosen in such a way that y − z ≃ te for some t ≃ δ −1 . Geometrically the idea is clear: m µ is a cone, and if we look far from the origin in the direction of the vector e, then m µ starts to resemble the plane p · x. Since x → p · x + v δ (x, ω; p) should resemble the same plane, it is natural to compare it with m µ (·, y, ω). This is precisely the idea of the proof of Theorem 2 in Section 6.
3.4. Other preliminary results. To control the oscillations of solutions of the metric problem around their means, we use the "martingale method of bounded differences" based on Azuma's inequality [5]. See McDiarmid [27] and Alon and Spencer [2] for an overview of this probabilistic method, as well as a proof of Azuma's inequality, which is stated as follows.
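As a toy illustration of how the inequality is used (this is our own numerical sketch, not part of the paper's argument), the following Python snippet compares the Azuma tail bound 2 exp(−λ 2 /(2N c 2 )) for a martingale with increments bounded by c against the empirical tail of a sum of independent uniform increments; all names and parameter values are hypothetical.

```python
import math
import random

def azuma_bound(lam, n, c):
    # Azuma's inequality: P(|X_n - X_0| >= lam) <= 2 exp(-lam^2 / (2 n c^2))
    return 2.0 * math.exp(-lam ** 2 / (2.0 * n * c ** 2))

def empirical_tail(lam, n, c, trials=20000, seed=0):
    # sums of independent uniform increments form a martingale with |steps| <= c
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = 0.0
        for _ in range(n):
            s += rng.uniform(-c, c)
        if abs(s) >= lam:
            hits += 1
    return hits / trials

n, c, lam = 100, 1.0, 25.0
b = azuma_bound(lam, n, c)
e = empirical_tail(lam, n, c)
print(b, e)  # the empirical tail does not exceed the Azuma bound
```

In the proof of Proposition 4.1 below, the role of the bounded increments is played by the estimate (4.31) for the conditioned means of m µ .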
Then τ := lim s→∞ f (s)/s ∈ (−∞, ∞] exists and, for every t > ξ, Several of our arguments rely on the comparison principle for viscosity solutions of first-order equations, typically in the following form (see [9] or [6] for a proof).

Estimating the fluctuations of the metric problem
There are essentially two steps in the proof of Theorem 1. The first is to obtain exponential error estimates controlling the fluctuations of m µ (y, 0, ·) about its mean M µ (y). This is the focus of this section. In Section 5, we complete the proof of Theorem 1 by estimating the difference between the deterministic quantities M µ (y) and m µ (y), which is more involved.
Throughout this section we assume that H satisfies (2.10), but we do not assume (2.11). We also fix K ≥ 1 and µ such that 0 < µ ≤ K. (4.1)
We denote by C and c positive constants depending only on K, the underlying dimension d and the assumptions for H, and which may vary from line to line. Several of our estimates depend on a lower bound for µ, and, since we must keep track of this dependence, we explicitly display the dependence on µ.
The goal of this section is to prove the following exponential estimate for the fluctuations of m µ (y, 0, ·).
Proposition 4.1. There exists C > 0 such that, for each λ > 0 and |y| > 1, In the proof of Proposition 4.1, below, it is useful to employ a discretization scheme which allows us to essentially condition on the identity of the reachable set R µ,t in order to apply the independence assumption (2.3) in the form of Lemma 4.2, below.
To introduce the discretization, we define, for every r > 0, Recall that K r is a compact metric space under the Hausdorff distance (cf. Munkres [28]), which is defined by Fix a small parameter δ > 0. Then, by the compactness of K r , there exist ℓ = ℓ(δ, d, r) ∈ N and a disjoint partition We have arranged things so that, for every 1 ≤ i ≤ ℓ, and, for each A ∈ K r and 1 ≤ i ≤ ℓ, We remark that ℓ, the partition {Γ i } as well as the K i 's depend on r and δ, but for convenience we do not explicitly display this dependence.
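For readers who wish to experiment with the discretization, the Hausdorff distance between two finite point sets can be computed directly from its definition; the following Python sketch is our own illustration and plays no role in the proof.

```python
def hausdorff(A, B):
    """Hausdorff distance between finite nonempty point sets A, B in R^d."""
    def d(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) ** 0.5
    # sup over A of dist(a, B), and symmetrically; take the larger of the two
    sup_A = max(min(d(a, b) for b in B) for a in A)
    sup_B = max(min(d(a, b) for a in A) for b in B)
    return max(sup_A, sup_B)

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (3.0, 0.0)]
print(hausdorff(A, B))  # -> 2.0: the point (3, 0) lies at distance 2 from A
```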
The following lemma captures the intuitively obvious assertion that the behavior of the medium inside the set R ω µ,t , conditioned on the event that R ω µ,t ∈ Γ i (which implies, in particular, R ω µ,t ⊆ K i ) is independent of the behavior of m µ (y, K i , ω). Recall that the latter is defined in (3.26) and is independent of G(K i ) by (3.30). Roughly speaking, this statement is a pre-processed form of the independence assumption which, as we will see, is particularly well-adapted to our needs in the proof of Proposition 4.1.
Proof. It suffices to show (4.9) for A of the form where p, x ∈ R d and α ∈ R, since such events A generate F µ,t . Recall the definition of m K µ in (3.21) for nonempty and closed K ⊆ R d , the fact that m K µ is G(K)-measurable and the fact from (3.25) that, assuming 0 ∈ K, we have Due to (4.5) and (4.12), and {y : m K i µ (y, ω) ≤ t} ∈ Γ i , in view of (4.5) and (4.12). Thus the above set is empty (and in particular belongs to G(K i )) in the case that x ∈ K i . This confirms (4.9).

4.2. Controlling the fluctuations of m µ (y, 0, ·). We proceed with the demonstration that, for large |y|, the probability that m µ (y, 0, ·) is relatively far from its mean is small. We use an argument inspired by the pioneering work of Kesten [18] in the theory of first-passage percolation, who introduced a martingale method based on Azuma's concentration inequality. We also benefit from some very elegant simplifications of the argument due recently to Zhang [37].
Notice that, unlike in percolation theory (or its continuum analogue), our Hamiltonian is not assumed to be positively homogeneous. In this generality, it is necessary to keep track of the dependence of the estimates on a lower bound for µ. We recall that, in view of Proposition 3.1(iv) and (4.1), there exist l µ , L µ > 0 such that and, for every x, y ∈ R d , l µ |y − x| ≤ m µ (y, x, ω) ≤ L µ |y − x|.
(4.14) In the control theory interpretation (see Remark 3.2), this important estimate, which we use many times below, provides upper and lower bounds on the lengths of optimal paths connecting two points x, y ∈ R d .
Combining the last two lines, we obtain (4.23) We next use the discretization scheme to estimate E [m µ (y, R µ,t , ·) | F µ,t ] by approximating the integral represented by the expectation as a sum of characteristic functions. With K i , K i and Γ i as described there, observe that, by (3.6), (4.5) and (4.14), Taking the conditional expectation of (4.24) with respect to F µ,t and applying (4.10), we get Since {Γ i } is a disjoint partition of K r , we also have, in view of (4.16), for every 1 ≤ t ≤ s ≤ T , (4.26) Multiplying (4.25) by 1 {ω : Rµ,s∈Γ j } and summing over the indices i and j yields, in light of (4.26), In the same way, after interchanging s for t and j for i, we also obtain It follows that If, for some i, j = 1, . . . , ℓ, there exists ω belonging to the event that R ω µ,t ∈ Γ i and R ω µ,s ∈ Γ j , then Using (3.9), we conclude that, for every i, j = 1, . . . , ℓ, Combining this with (4.29) and sending δ → 0 yields Finally, from (4.23), we get, for every 0 < s < t ≤ T , This also holds for 0 < s < t without further restriction by the second assertion of (4.18).
Step 2. We finish the argument by applying Azuma's inequality, using (4.31). Define a discrete martingale sequence X k := X hk with h := l µ L µ /(l µ + L µ ) and observe that, according to (4.31), for all k ∈ N, |X k+1 − X k | ≤ 2L µ . An application of Azuma's inequality (Proposition 3.9) yields, for every λ > 0 and N ∈ N, Let N be the smallest integer larger than T /h, so that X N = X T = m µ (y, 0, ·) − M µ (y). It follows that, since |y| > 1 and T = L µ |y|, From (4.18), (4.32) and (4.33) we deduce the desired estimate for large |y|. In first-passage percolation, it is believed that the oscillations should decrease in higher dimensions. Nevertheless, it is still open in every dimension d ≥ 2 whether, for some α < 1, this quantity is bounded by O(|y| α ) as |y| → ∞. We expect that it will be similarly challenging to prove such a bound for our quantity var(m µ (y, 0, ·)), and still more difficult to find the optimal exponent for the algebraic rate of homogenization of (1.1).
In analogy with the best known variance bound in first-passage percolation, due to Benjamini, Kalai and Schramm [7], we expect that an estimate of the form var (m µ (y, 0, ·)) ≤ C µ |y| log |y| (4.34) can be proved, in dimensions d ≥ 2, by an application of Talagrand's concentration inequality [35].
In fact, as we were completing the writing of this paper, we received a new preprint by Matic and Nolen [26] who have obtained, in a slightly different setting, a bound like (4.34) for a certain class of Hamilton-Jacobi equations in special i.i.d. environments.

Estimating the statistical bias of the metric problem
Having estimated the oscillations of m µ (y, 0, ·) about its mean M µ (y) in Proposition 4.1, in order to prove Theorem 1 it remains to estimate the rate at which the means t −1 M µ (ty) converge, as t → ∞, to their limit m µ (y). On one side our task is trivial. Indeed, by (3.2) and (3.6), we have It follows from Fekete's lemma (Lemma 3.10 in the special case that ∆ ≡ 0) and (3.46) that, for every y ∈ R d , The estimate (2.12) is then immediate. In order to prove (2.14) we are confronted with the more difficult task of finding good upper bounds for M µ (y) − m µ (y), which are stated in the following proposition.
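Fekete's lemma can be checked numerically on a toy subadditive sequence. In the sketch below (our own illustration), the hypothetical sequence a(t) = 2t + √t is subadditive, so a(t)/t decreases to its infimum, here 2; the same one-sided mechanism gives m µ (y) ≤ t −1 M µ (ty) for every t, without any rate.

```python
import math

def running_infimum(a, big_t=2000):
    """Fekete: for subadditive a, a(t)/t converges to inf_t a(t)/t."""
    return min(a(t) / t for t in range(1, big_t + 1))

a = lambda t: 2.0 * t + math.sqrt(t)  # subadditive, since sqrt is subadditive

# subadditivity check on a sample of pairs
assert all(a(s + t) <= a(s) + a(t) + 1e-12
           for s in range(1, 40) for t in range(1, 40))
print(running_infimum(a))  # decreases toward the limit 2 from above
```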
Proposition 5.1. There exists C > 0 such that, for every |y| > 1, The previous proposition provides the desired estimate for the difference between t −1 M µ (ty) and m µ (y) for large t > 0, and now the proof of Theorem 1 follows: Proof of Theorem 1. The first inequality follows from (5.2) and (4.2) and the second from (5.3) and (4.2).
Proving Proposition 5.1 is the focus of the rest of this section. A typical argument for obtaining such an estimate and the strategy we use here involves approximating M µ (y) by another quantity which is superadditive (see also the discussion in Hammersley [16]). Fekete's lemma may then be applied "from the other side" to obtain an estimate on the deviation of this approximate quantity from its asymptotic limit. The desired estimate in terms of the original quantity then follows, depending on the quality of the approximation. This strategy was used by Alexander [1] in the context of first-passage percolation to obtain estimates on the deviation of the expected passage time from the limiting time constant.
In our context, it turns out to be more convenient to first obtain estimates for the difference between the quantities where H t is a given plane at a distance t from the origin. This yields good estimates for the deviation of the latter quantity from m µ (H t ), which we then transform, using a simple geometric argument, into an estimate for M µ (y) − m µ (y) for large |y|.
The outline of the key steps in the proof of Proposition 5.1 is as follows. As in Section 4, we fix K > 0 and µ satisfying (4.1). The symbols C and c denote positive constants which may depend on K and H and may vary in each occurrence. 5.1. Introduction of the approximating quantity. It is difficult to work directly with statistical properties of the quantity m µ (H t , 0, ·). We consider instead an approximating quantity to which the independence assumption is easier to apply. Fix a unit direction e ∈ R d which for notational convenience we take to be e = e d = (0, . . . , 0, 1). For each t > 0, define the plane and its discrete analogue We also denote, for t > 0, the halfspaces Define, for each σ, t > 0, the quantities Below we will see that g µ,σ (t) is a good approximation of E [m µ (H t , 0, ·)] for appropriate choices of the parameter σ > 0. Since g µ,σ is, up to a sign and the factor σ −1 , the logarithm of the expectation of a sum of exponentials, it interacts naturally with the independence assumption (see the proof of Lemma 5.4, below). We begin with a technical lemma, also used many times below, which asserts that a substantial portion of the quantity G µ,σ (t) is contributed by lattice points (n, t) ∈ H t with |n| ≤ O(t). This implies in particular that G µ,σ and g µ,σ are finite.
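Up to the sign and the factor σ −1 , g µ,σ is a "soft minimum": for reals a 1 , . . . , a ℓ one has min i a i − σ −1 log ℓ ≤ −σ −1 log Σ i e −σa i ≤ min i a i , so the approximation improves as σ grows. The Python sketch below (our own toy check, with arbitrary values) verifies this sandwich.

```python
import math

def soft_min(values, sigma):
    """-(1/sigma) * log(sum_i exp(-sigma * a_i)), a smooth proxy for min(a_i)."""
    m = min(values)  # subtract the min for numerical stability
    return m - (1.0 / sigma) * math.log(
        sum(math.exp(-sigma * (v - m)) for v in values))

vals = [3.0, 3.5, 4.0, 10.0]
for sigma in (0.5, 2.0, 10.0):
    s = soft_min(vals, sigma)
    # sandwich: min - log(len)/sigma <= soft_min <= min
    assert min(vals) - math.log(len(vals)) / sigma <= s <= min(vals)
    print(sigma, s)  # increases toward min(vals) = 3.0 as sigma grows
```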
Next we show that g µ,σ (t) gives a good upper bound for E [m µ (H t , 0, ·)] for large t and appropriate choices of σ > 0. Lemma 5.3. There exists C > 0 such that, for every t > 1 and 0 < σ ≤ 1, Proof. The upper bound in (5.12) is easy. Using (3.9), we have After taking the logarithm of both sides of this inequality, an application of Jensen's inequality and a rearrangement yield the second inequality of (5.12) with C = (d − 1) We estimate the integrand above by completing the square, i.e., and thus obtain E [exp (−σm µ (y, 0, ·))] ≤ 1 + σM µ (y) exp (σ 2 C|y|/4µ) exp (−σM µ (y)) . (5.13) Summing (5.13) over y ∈ H t ∩ B R , with R := 2(L µ /l µ )t, and applying (5.8), we get In view of (4.13), we have l 1−d µ ≤ Cµ 1−d and R ≤ Ct/µ. Using these with σ ≤ 1, t > 1, we obtain Taking logarithms, dividing by −σ and rearranging this expression yields: We can estimate the logarithm factor in the last term on the right side as follows: This completes the proof of the lower bound of (5.12) and hence of the lemma.

5.2. The (almost) superadditivity of g µ,σ and estimates for E [m µ (H t , 0, ω)] − m µ (H t ). The next step is to prove that g µ,σ is essentially superadditive, which is summarized in the following lemma. Unlike the approach taken in [1], we do not use an abstract result like the van den Berg-Kesten inequality, which does not seem to apply easily in the continuous setting. We opt instead for a simpler "splitting technique" to apply the independence assumption more directly. A similar technique was employed by Sznitman [33]. The critical property of the m µ 's needed here, which allows us to exploit the independence of the random medium, is the dynamic programming principle. It asserts that, if every path from x to y passes through a surface, then, for some z on the surface, the cost of moving from x to y is equal to the sum of the cost of moving from x to z and the cost of moving from z to y. Precisely, for every open U ⊆ R d with x ∈ U and every y ∈ R d \ U and ω ∈ Ω, m µ (y, x, ω) = min z∈∂U m µ (y, z, ω) + m µ (z, x, ω) . (5.14) See Proposition 3.1 (vi).
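The dynamic programming principle (5.14) has a familiar discrete analogue for shortest paths. The following Python sketch (a toy grid model with randomly chosen weights, our own illustration rather than the continuum metric m µ ) verifies that the distance from x to y equals the minimum, over points z in a separating column, of the sum of the distances from x to z and from z to y.

```python
import heapq
import random

def dijkstra(weights, src, n):
    """Distances on an n x n grid; stepping onto a node costs weights[node]."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, (i, j) = heapq.heappop(pq)
        if d > dist.get((i, j), float("inf")):
            continue  # stale queue entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (i + di, j + dj)
            if 0 <= v[0] < n and 0 <= v[1] < n:
                nd = d + weights[v]
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(pq, (nd, v))
    return dist

rng = random.Random(1)
n = 10
w = {(i, j): rng.uniform(1.0, 3.0) for i in range(n) for j in range(n)}
x, y = (0, 0), (n - 1, n - 1)
d_from_x = dijkstra(w, x, n)
k = n // 2  # every lattice path from column 0 to column n-1 crosses column k
split = min(d_from_x[(i, k)] + dijkstra(w, (i, k), n)[y] for i in range(n))
# dynamic programming principle: the two sides agree (up to rounding)
assert abs(split - d_from_x[y]) < 1e-9
print(split)
```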

(5.15)
Proof. Fix s, t > 1, y ∈ H s+t such that |y| ≤ R := 2(L µ /l µ )(s + t) and ω ∈ Ω. Observe that, in view of (5.14), Thus, using (3.9), (3.26) and (4.14). Applying (3.30), we conclude that In light of (2.3), these random variables are independent and thus . Returning to the discrete setting, we next claim that Indeed, it is clear from (4.14) that any z ∈ H t attaining the (implicit) minimum on the left side of (5.17) must belong to B R , and (5.17) then follows from (3.9). In a similar way, since |y| ≤ R, Combining these inequalities, we obtain Note that, if z ′ ∈ H t , then y − z ′ ∈ H s . So, in view of the definition of G µ,σ and (3.2), we have Summing over all y ∈ H t+s ∩ B R , and using Lemma 5.2 yields, in view of the definition of R, We obtain the lemma after taking the logarithm of both sides of this expression, dividing by −σ, rearranging the resulting expression and then estimating a logarithm term in a similar way as near the end of the proof of Lemma 5.3.
We next use Lemma 3.10 to obtain a rate of convergence for the means t −1 E [m µ (H t , 0, ·)] to their limit m µ (H t ).
Since ∆ µ,σ is increasing on [1, ∞) and ∫ 1 ∞ s −2 ∆ µ,σ (s) ds < ∞, we may apply Lemma 3.10 to deduce that g µ,σ := lim t→∞ g µ,σ (t)/t exists and, for every t > 1, An easy integration by parts yields In view of the second inequality in (5.12), we also have We next claim that (5.23) holds. To see this, note that and, in view of (3.46) and the fact that z → t −1 m µ (tz, 0, ω) is Lipschitz uniformly in t > 0, we deduce that We now obtain (5.23) from these two lines and the dominated convergence theorem (which applies since t −1 m µ (H t ) ≤ L µ ). Combining (5.20), (5.21), (5.22) and (5.23), we obtain Multiplying by t, applying the first inequality in (5.12) and using the homogeneity of m µ yields and choosing σ := µ t −1/2 (log(1 + t/µ)) 1/2 completes the proof.
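The choice of σ at the end of the proof can be motivated by a schematic optimization (constants suppressed; the displayed error functional is our own annotation of the two error sources, not notation from the paper): minimizing a total error of the form E(σ) = σt/µ 2 + σ −1 log(1 + t/µ) balances the two terms.

```latex
% Schematic balancing of the two error terms (constants suppressed).
\[
  \mathcal{E}(\sigma) \;=\; \frac{\sigma t}{\mu^{2}}
    \;+\; \frac{1}{\sigma}\,\log\!\Bigl(1+\frac{t}{\mu}\Bigr),
  \qquad
  \mathcal{E}'(\sigma_{*})=0
  \;\Longleftrightarrow\;
  \sigma_{*} \;=\; \mu\,t^{-1/2}\bigl(\log(1+t/\mu)\bigr)^{1/2},
\]
\[
  \mathcal{E}(\sigma_{*})
  \;=\; \frac{2}{\mu}\,\bigl(t\,\log(1+t/\mu)\bigr)^{1/2}.
\]
```

The critical point σ ∗ agrees with the σ chosen above, and E(σ ∗ ) has the form of the rates appearing in this section, up to constants.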

5.3. Error estimates for M µ (y) − m µ (y) and the proof of (2.14). It is the rate of convergence of t −1 M µ (ty) to m µ (y) that we wish to estimate, not that of t −1 E [m µ (H t , 0, ·)] to m µ (H 1 ). In order to reach our desired goal, we must compare the quantities t −1 M µ (ty) and t −1 E [m µ (H t , 0, ·)]. This is accomplished in two steps. The first is to show that E [m µ (H t , 0, ·)] is very close to M µ (H t ). This yields an estimate for the difference between M µ (H t ) and m µ (H t ). The second step is to use elementary convex geometry to relate M µ (y) to the values of M µ (H) for all the possible planes H passing through y.
Lemma 5.6. There exists C > 0 such that, for every t > 1, For every z ∈ H t we have E [m µ (z, 0, ·)] = M µ (z) ≥ M µ (H t ) and thus, for every λ > 0, Applying (4.2) and using R = (L µ /l µ )t ≤ Ct/µ, we find We wish to use the expression and then apply (5.26) to the right side of (5.27), but due to the factor t d−1 on the right side of (5.26), this bound is not very helpful unless λ is large relative to t. With this in mind we fix A > 1, to be selected below, define λ 1 := A ((t/µ 2 ) log(1 + t/µ)) 1/2 and then estimate the right side of (5.27) by Observe that By selecting A to be a large enough constant, the last expression on the right is at most C. Combining the last two sets of inequalities with (5.27), we obtain which implies (5.24).
Lemmas 5.5 and 5.6 give an estimate on the difference of M µ (H t ) and m µ (H t ).
Corollary 5.7. There exists C > 0 such that, for every t > 1, The relationship between M µ (H t ) and M µ (y) depends on the following geometric lemma.
The previous lemma and (5.29) yield a rate of convergence for M µ (y) to m µ (y).
Proof of Proposition 5.1. The first step is to show that, for every z ∈ R d such that |z| > 1, where k > C, with C as in (5.29). Suppose on the contrary that (5.33) fails for some z ∈ R d with t := |z| > 1. By elementary convex separation, there exists a plane H with z ∈ H such that Since H is at most a distance of |z| = t from the origin, we may assume with no loss of generality that H = H s for some s ≤ t. We deduce that Using m µ (H s ) ≥ 0, M µ (H s ) ≤ L µ s and µ ≤ K, we see that, by making k larger if necessary, we may deduce that s > 1. Now (5.34) contradicts (5.29). We have proved (5.33).

(5.35)
If, on the other hand, 0 < µ 3 < |y| −1 , then we take N to be the smallest integer larger than µ −1 and find that Note that in either case we have (5.3) and we have chosen N so that N ≤ (1 + µ −1 ) ∨ |y| ≤ |y|, as required.
5.4. Some further error estimates. We conclude this section with versions of (2.12) and (2.14) which hold uniformly for y ∈ B R . These estimates, which are needed in the next section, follow from Theorem 1 and a simple covering argument.
Lemma 5.9. There exists C > 0 such that, for every λ ≥ 4L µ and R ≥ 3, Proof. We may select y 1 , . . . , y N ∈ B R \ B 1 with N ≤ CR d such that B R is covered by the balls B(y j , 2). Then by (3.9) we have, for any λ > 0, According to (2.12), for each 1 ≤ j ≤ N , Since N ≤ CR d , we obtain, for every λ ≥ 4L µ , The estimate (5.39) is obtained in a very similar way from Theorem 1 and a covering argument. We omit the proof.
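The covering underlying the proof can be made concrete in dimension two: grid points of spacing 2 provide N ≤ CR d centers whose balls of radius 2 cover B R . The Python sketch below (our own toy check; the constants are illustrative) verifies both the cardinality bound and the covering property by random sampling.

```python
import math
import random

def cover_centers(R, r):
    """Grid of spacing r whose balls of radius r cover the disc B_R in R^2
    (the nearest grid point is within r/sqrt(2) < r of any point)."""
    k = int(math.ceil(R / r))
    return [(i * r, j * r) for i in range(-k, k + 1) for j in range(-k, k + 1)]

R, r = 10.0, 2.0
centers = cover_centers(R, r)
assert len(centers) <= (2 * R / r + 2) ** 2  # N = O(R^d) with d = 2

rng = random.Random(0)
for _ in range(1000):
    while True:  # sample a uniform point of B_R
        x, y = rng.uniform(-R, R), rng.uniform(-R, R)
        if x * x + y * y <= R * R:
            break
    assert min(math.hypot(x - cx, y - cy) for cx, cy in centers) <= r
print(len(centers))  # -> 121
```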

Error estimates for the approximate cell problem
Here we obtain estimates on the difference between −δv δ (y, ω ; p) and H(p), study the rate for the almost sure convergence lim δ→0 sup y∈B R/δ |δv δ (y, ω ; p) + H(p)| = 0 (6.1) and prove Theorems 2 and 3. One difficulty arises from the fact that the rate for the approximate cell problem may be very different depending on whether or not p belongs to the interior of the flat spot {H = 0}. Recall that the flat spot is never empty since, e.g., H(0) = 0 (see Appendix A). Moreover, the flat spot {H = 0} is not in general equal to {0} and may indeed have nonempty interior (see, e.g., [3]). We use the metric problem to control the −δv δ 's from above, and from below for p's away from the flat spot {H = 0}. To obtain the upper bound on the flat spot we study directly the behavior of the δv δ 's. We recall here two important deterministic (i.e., uniform in ω ∈ Ω) estimates from Section 3.2: where K p > 0 depends only on the assumptions for H and an upper bound for |p|. Note that the left and right sides of (6.2) are bounded for bounded |p| by (2.4).
6.1. The ballistic regime. We combine the exponential error estimates for the metric problem obtained in the previous section with a comparison argument to obtain estimates on the difference between −δv δ (y, ω ; p) and H(p). The comparison argument, which was introduced in [4] to prove homogenization, yields an estimate from below for δv δ + H(p) for all p ∈ R d and from above only for p's away from the flat spot.
In the next two proofs, we work with a fixed p ∈ R d and denote by C and c positive constants which may vary in each occurrence and depend only on an upper bound for |p| and the assumptions for H.

Proof of Theorem 2(i).
We actually prove a more general, deterministic statement: namely that, for every 0 < δ ≤ λ ≤ 1, there exists a fixed constant R ≤ C/δ and a finite set K ⊆ R d consisting of at most Cδ −2d elements (which will be identified in the argument) such that where µ := H(p) + λ/4. Admitting (6.4) for the moment, let us see how to derive (2.15) as a consequence of it and Theorem 1 (or more precisely, its corollary, Lemma 5.9). We simply use (2.1), (3.2), a union bound and (5.37) to estimate the probability of the right side of (6.4), keeping in mind that µ ≥ λ/4, R ≤ C/δ and |K| ≤ δ −2d . We have: The proof of (6.4) is by a simple comparison argument. We argue that, if −δv δ (0, ω; p) is too large, then we can find some translation of m µ which is much too small; otherwise v δ (·, ω; p) and y → m µ (y, z, ω) − p · y would touch somewhere, in violation of the comparison principle.
In view of (3.9) and |y 1 | ≤ C/λδ, we deduce that, for some c > 0 small enough, we may "snap to a grid" to find that there exists Note that K has at most Cλ −2d ≤ Cδ −2d elements. This completes the proof of (6.4).
Proof of Theorem 2(ii). The argument is similar to the proof of Theorem 2(i) above, but the two are not completely analogous and the details here are a bit more complicated. In particular, it is here that we need the existence of |e| = 1 satisfying (3.47).
To set up the argument, let 0 < δ ≤ 1 and λ > 0 be such that (2.16) holds. Set µ := H(p). Since µ > 0 by assumption, there exists e ∈ R d with |e| = 1 such that (3.47) holds. The deterministic statement we prove is this: there exists R ≤ C/δ and a finite set K ⊆ R d with at most Cλ −2d elements such that where we define the events E 1 (z), E 2 (z) ∈ F for each z ∈ R d by Postponing the demonstration of (6.13), let us finish the proof of the theorem. Using (2.1), (3.2), a union bound and |K| ≤ Cλ −2d ≤ Cδ −2d , we find that . (6.14) Applying (5.37), we get and, using the assumption (2.16), we apply (2.14) to get Combining the last two sets of inequalities with (6.14) yields (2.17).
Step 1. We prepare v δ and m µ for the comparison. According to (2.5), (6.3) and Lemma A.1, if c > 0 is chosen sufficiently small, then (Note that, in contrast to Step 1 in the proof of Theorem 2(i) above, we have perturbed v δ by a nonsmooth function. Thus, unlike the derivation of (6.5), the inequality (6.15) does not immediately hold in the viscosity sense. This relies on the level-set convexity of H and explains the appeal to Lemma A.1.) Define U := y ∈ R d : w(y) > −λ/4δ and observe that, for every y ∈ U , and, therefore, According to (6.2), there exists y 2 ∈ R d such that |y 2 | ≤ C/λδ and w(y 2 ) = sup Let V := y ∈ R d : w̃(y) > −λ/4δ and note that, in light of the fact that w̃ ≤ w, w̃(y 2 ) = 0 and w̃(y) ≤ −cλ|y − y 2 |, there exists 0 < R ≤ C/δ such that Define m(y) := m µ (y, y 2 − Re, ω) − p · y and observe that, in view of (3.4), Step 2. We apply the comparison principle: comparing w̃ to m in V yields the inequality w̃(y 2 ) − m(y 2 ) ≤ sup y∈V (w̃(y) − m(y)) = max y∈∂V (w̃(y) − m(y)) .
We next split the right side of the above inequality into two pieces, one of which must be at least half of the left side. Recalling that e has been chosen so that (3.47) holds, we deduce that either Using (3.9), (6.17) and that |y 2 | ≤ C/λδ, we may "snap to a grid" to obtain that there exists such that either That is, either ω ∈ E 1 (z) or else ω ∈ E 2 (z) for some z ∈ K. Note that |K| ≤ Cλ −2d ≤ Cδ −2d .
A covering argument now yields explicit error estimates for (6.1) in balls of radius O(δ −1 ).
Lemma 6.1. For each K > 0, there exist C, c > 0, depending on K and H, such that, for each p ∈ B K , R > 0 and 0 < δ ≤ c, (6.22) and, if H(p) > 0, then Proof. Since both of the estimates are obtained from the first two statements of Theorem 2 using a similar argument, we prove only (6.22). To do so, we apply (2.15) with λ := A| log δ| 1/3 δ 1/3 , for A > 0 chosen sufficiently large, and use a simple covering argument. Notice that if 0 < δ < 1/2 is sufficiently small, depending on A, then δ ≤ λ. There exist points y 1 , . . . , y N ∈ B R/δ such that N ≤ CR d λ −d and the balls B(y j , λ/2δK p ) cover B R/δ . According to (3.35), (6.3) and (2.15), We therefore obtain (6.22) if we choose A > 0 so that A 3 ≥ C(4d + 2).
We next apply Lemma 6.1 along a certain subsequence δ n → 0 to prove, with the help of (3.40) and the Borel-Cantelli lemma, the first two statements of Theorem 3.
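The Borel-Cantelli mechanism invoked here can be illustrated numerically: when the probabilities P(A n ) are summable, almost surely only finitely many of the events occur. The Python sketch below (independent toy events with P(A n ) = n −2 , entirely our own choice) records how many events occur per sample.

```python
import random

def borel_cantelli_demo(trials=500, big_n=2000, seed=0):
    """Independent events A_n with P(A_n) = 1/n^2; since the probabilities are
    summable, Borel-Cantelli gives that a.s. only finitely many A_n occur."""
    rng = random.Random(seed)
    return [sum(1 for n in range(1, big_n + 1) if rng.random() < 1.0 / n ** 2)
            for _ in range(trials)]

counts = borel_cantelli_demo()
mean = sum(counts) / len(counts)
print(mean, max(counts))  # mean near sum n^{-2} = pi^2/6; the max stays small
```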

Proof of Theorem 3(i) and Theorem 3(ii).
The arguments for the two statements are almost identical, so we prove only (2.20). Let R > 0, δ n = n −1 and apply (6.22) to deduce that the resulting probabilities are summable over n. By the Borel-Cantelli lemma, we deduce that there exists C > 0 such that, for every R > 0, Intersecting these events for each R = 1, 2, 3, . . ., we find an event Ω 1 of full probability such that, for every R > 0 and ω ∈ Ω 1 , lim sup Notice that δ n+1 /δ n = 1 − δ n+1 and so, according to (3.40), for any δ n+1 ≤ η < δ n and ω ∈ Ω, Hence (2.20) holds for every ω ∈ Ω 1 .
6.2. The sub-ballistic regime. We show that the behavior of δv δ (0, · ; p) for p's on the flat spot {H = 0} is determined by the distribution of H(0, 0, ·) near its maximum and that, with further (quite reasonable) assumptions on this distribution, we obtain exponential error estimates and an algebraic rate of convergence for −δv δ (0, ω ; p) to H(p).
We begin with the simple observation, which is probably well-known and essentially taken from [3], that −δv δ (·, ω ; p) is controlled pointwise from below by H(0, ·, ω). Here the constant K p > 0 is the one defined in (3.36).
We now obtain, under assumption (2.11), exponential error estimates and a rate of convergence, from below, for −δv δ for all p ∈ R d . First, we combine (2.17) and (6.27) to obtain error estimates independent of H(p).
Arguing in a similar way as in the proof of Lemma 6.1, we obtain the following result as an application of (2.19). The details are left to the reader. Lemma 6.4. Assume (2.11) and let α and β be defined as in (2.22). Then there exist C, c > 0 such that, for each R > 0 and 0 < δ ≤ c, (6.30) Using (6.30) we complete the proof of Theorem 3.
Proof of Theorem 3(iii). The statement follows from (6.30) by an argument very similar to the proof of Theorem 3(i) given above.
6.3. The rate may be arbitrarily slow on the flat spot. As explained in Appendix A, the vector p = 0 belongs to the flat spot: that is, H(0) = 0. We can also see this from (3.45) by observing that (2.9) implies that v δ (·, ω ; 0) ≡ 0. We show here that (2.11) is necessary for the existence of an algebraic rate of convergence like (6.30) for the limit (6.1) at p = 0. Furthermore, without some assumption on the distribution of the random variable H(0, 0, ·) near its maximum, there is no constraint on how slowly the limit (6.1) may converge for p = 0.
The previous lemma states that the rate at which −δv δ (0, ω ; 0) converges to 0 is controlled from below, up to a factor of 2, by the maximum of H(0, ·, ω) in the ball B C/δ . Using the independence assumption and an easy covering argument, we relate the latter to the distribution of H(0, 0, ω) near its maximum to recover the following estimate. Lemma 6.6. There exists C > 0 such that, for every δ > 0 and 0 < λ ≤ 1,

(6.33)
where the inequality is vacuous if the argument in the logarithm on the right side is negative.
It is easy to check that for (6.38), it suffices to have and so we need to construct a Hamiltonian with a very thin distribution near its maximum. This is quite simple. We may take, for example, where V is a Poissonian potential (see for example [34]) and φ : [0, ∞) → (0, ∞) is a continuous, decreasing function such that φ(t) decays very slowly to 0 as t → ∞ (the precise rate of decay required can be explicitly calculated in terms of ρ). We leave it to the interested reader to fill in the details.
6.4. Uniform error estimates for the approximate cell problem. The proofs of Theorems 4 and 5, given in the next section, depend on the following extensions of Theorems 2 and 3, which hold uniformly for bounded |p| and for y in balls of radius O(δ −N ), for any N ≥ 1. We omit the arguments, since the error estimates follow easily from an application of Theorem 2 combined with (3.35) and a routine covering argument, and then the convergence rates follow from the latter using nearly the same argument as in the proof of Theorem 3.

Error estimates for homogenization
We now present the proofs of Theorems 4 and 5. The main step is to precisely quantify how the δv δ 's control the u ε 's, so that we may apply the results of the previous section to obtain error estimates and a rate of convergence for the latter.
It turns out (see, e.g., [6]) that there exists a constant L > 0, depending on K and the assumptions on H, such that, for all ε > 0, x, y ∈ R^d, s, t ≥ 0 and ω ∈ Ω, |u^ε(x, t, ω) − u^ε(y, s, ω)| ≤ L(|x − y| + |s − t|). These estimates are derived principally from the coercivity of H. Recall that, due to (6.2) and (3.45), the effective Hamiltonian has the same rate of coercivity as is assumed in (2.6). It also follows easily from this that, for each ε > 0, x ∈ R^d, 0 ≤ t ≤ T and ω ∈ Ω, a corresponding a priori bound on u^ε holds. The important link between the δv^δ's and the u^ε's is summarized in the following lemma; Theorems 4 and 5 then follow relatively easily from it and Propositions 6.9 and 6.10. The basic idea is that the event that |u^ε(x, t, ω) − u(x, t)| is large should only be observed if |δv^δ + H(p)| is also large. The proof, which is rather technical and lengthy, follows along the lines of [8], with the necessary modifications to deal with the lack of uniform estimates on the difference between −δv^δ and H(p). It essentially consists of quantifying the perturbed test function method [13] to argue that, if −δv^δ is close to H(p), then, up to an appropriate error, it properly captures the oscillations of u^ε. Rather than apply the comparison principle, we must use the proof of it, following [8]. The difficulty is that, since u is not in general C^1, we cannot insert p = Du(x, t) into v^δ(x, ω; p). There are other technical difficulties (the presence of three nonsmooth functions, the fact that v^δ is not smooth in p), which we handle by the standard viscosity-theoretic technique of doubling (or rather tripling) the variables.
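The formal perturbed test function computation being quantified reads as follows (a heuristic only, since u and v^δ are not smooth; \overline H denotes the effective Hamiltonian):

```latex
% Formal ansatz: near a point where Du(x,t) \approx p, write
u^\varepsilon(x,t) \;\approx\; u(x,t) + \varepsilon\, v^\delta\!\big(\tfrac{x}{\varepsilon}, \omega; p\big).
% Then Du^\varepsilon \approx p + Dv^\delta(x/\varepsilon), and the approximate cell problem
% \delta v^\delta + H(p + Dv^\delta, y, \omega) = 0 yields
u^\varepsilon_t + H\big(Du^\varepsilon, \tfrac{x}{\varepsilon}, \omega\big)
  \;\approx\; u_t - \delta v^\delta\big(\tfrac{x}{\varepsilon}, \omega; p\big)
  \;\approx\; u_t + \overline H(p),
% so u solves the effective equation up to an error of size |\delta v^\delta + \overline H(p)|:
% the event that |u^\varepsilon - u| is large forces this error to be large.
```

The rigorous version replaces this ansatz by the tripling-of-variables argument described above.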
Throughout this section, we fix K > 0, assume H satisfies (2.10), and let C and c denote positive constants which may vary in each occurrence and depend only on K and H.
Step 2. We prepare for the proof of (7.13) by recording two elementary estimates that necessarily hold at any global maximum point of Φ(·, ω) and for any ω ∈ Ω. For the moment, we fix ω ∈ Ω and a point (x₀, y₀, t₀, s₀) at which (7.14) holds. By (7.14) and (7.11), substituting the definition of Φ and rearranging, and using that 1 + εδ^{−1} ≤ 2T by (7.9), we obtain the first estimate. Recall that a Lipschitz function with Lipschitz constant k cannot be touched from below (or above) by a C^1 function ϕ unless |Dϕ| ≤ k at the touching point. We use this observation to deduce that, if s₀ > 0, then, by (7.2) and the fact that s → u(y₀, s) + λs + (s − t₀)²/2ε has a local minimum at s = s₀, we have |s₀ − t₀| ≤ (L + λ)ε ≤ (L + 1)ε. (7.16) The inequality (7.16) is also satisfied, for a similar reason, if t₀ > 0, and trivially if s₀ = t₀ = 0, so it holds without restriction. We use a similar idea to obtain (7.17). If (7.17) fails, then y → ζ((x₀ − y)/α) is constant in a neighborhood of y₀, and we obtain from (7.14) that y → u(y, s₀) + (1/2α)|x₀ − y|² has a local minimum at y = y₀.
In view of (7.2), we deduce from (7.18) that α^{−1}|x₀ − y₀| ≤ L. So |x₀ − y₀| ≤ Lα holds after all, contradicting the assumption that it did not; thus we obtain (7.17). We now begin the argument for (7.13), following the classical proof of the comparison principle for viscosity solutions and [8]. For the next several steps, we work with a fixed ω ∈ Ω \ E and (x₀, y₀, t₀, s₀) ∈ R^d × R^d × (0, T] × (0, T] such that (7.14) holds.
Step 3. We give the first part of the proof of (7.13). Here we fix (x, t) = (x 0 , t 0 ), allow (y, s) to vary and use the equation for u. The goal is to derive (7.24), below.
Proof of Theorem 4. The theorem is obtained in a straightforward way from the combination of Lemma 7.1 and Proposition 6.9.
We first give the proof of (i). Fix 0 < ε ≤ 1, T ≥ 1 and 0 < λ ≤ 1, and set δ := Aε/(Tλ²), where A > 0 is the constant C > 0 from Lemma 7.1. Observe that λ ≥ Cε^{1/3}, for large enough C > 0, implies that the hypothesis of Proposition 6.9(i) is in force for such λ. We first apply (7.6) and then (6.40) to obtain the desired estimate. We move on to the argument for (ii). Just as above, we fix 0 < ε, λ ≤ 1 and T ≥ 1 and set δ := Aε/(Tλ²), where A > 0 is the constant C > 0 from Lemma 7.1. We need to check that, with this choice of δ, the hypothesis (2.27) with C > 0 sufficiently large implies (6.41). Indeed, if λ ≥ Cε, a rearrangement produces λ ≥ C, which implies (6.41), as desired, if we take C sufficiently large. Now combine (6.42) and (7.7) to conclude. This completes the proof.
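The arithmetic behind the choice of δ can be spelled out, under the assumption (ours, for illustration) that the hypothesis of Proposition 6.9(i) amounts to a lower bound of the form λ ≥ Cδ:

```latex
% With \delta := A\varepsilon/(T\lambda^2), the requirement \lambda \ge C\delta reads
\lambda \;\ge\; \frac{CA\,\varepsilon}{T\lambda^{2}}
\quad\Longleftrightarrow\quad
\lambda^{3} \;\ge\; \frac{CA\,\varepsilon}{T}
\quad\Longleftrightarrow\quad
\lambda \;\ge\; \Big(\frac{CA}{T}\Big)^{1/3}\varepsilon^{1/3}.
% Since T \ge 1, the condition \lambda \ge C\varepsilon^{1/3}, for C large, suffices
% to put Proposition 6.9(i) in force.
```
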
Proof of Theorem 5. The result follows from a combination of Lemma 7.1 and Proposition 6.10. We first prove (i). Define, for each ε > 0, λ(ε) := Aε and δ(ε) := Aε/(Tλ(ε)²), where A ≥ 1 will be selected below. It is straightforward to check the required hypotheses. According to (7.6), if we choose A large enough, then (6.43) yields that the last event is of probability zero, where we have used N = 2 in (6.43) and the fact that ελ(ε) ≤ δ(ε)² for small enough ε. We deduce the desired conclusion, which completes the proof of (i).
To prove (ii), we define λ(ε) differently, with δ(ε) the same as above and A ≥ 1 to be selected. With α and β as defined in (2.22), and recalling that a = α(1 + 2α)^{−1} and b = β(1 + 2α)^{−1}, we see that, for sufficiently large A, a rearrangement produces the required inequality. We proceed similarly as above, using (7.7). According to (6.44), the last set on the right has probability zero. It follows that the claimed rate holds, which completes the proof of (ii).

Convergence rates in almost periodic environments
We conclude by showing that the techniques of the previous section may be used to obtain a convergence rate for the homogenization of (1.1) in almost periodic (and, in particular, periodic) media. Since it is necessary to reproduce the qualitative homogenization theory from scratch, we take the opportunity to reorganize and quantify the argument efficiently.
Departing from the hypotheses in the rest of this paper, here we consider H ∈ C(R^d × R^d) satisfying, for each K > 0, the regularity assumption that H is uniformly continuous on B_K × R^d and {H(·, y) : y ∈ R^d} is bounded in C^{0,1}(B_K). (8.1) The assumption of almost periodicity is that the family of translations of H in the y-variable is precompact in the uniform topology of B_K × R^d; precisely, we assume that, for all K > 0, the set of translates {H(·, · + z) : z ∈ R^d} is precompact in C(B_K × R^d). We remark that, in this section, we make no convexity assumption on H, nor do we assume any analogue of (2.9) or (2.11).
The homogenization of coercive Hamilton-Jacobi equations in almost periodic environments was proved in [17]. The key observation was an elegant proof of the following fact. The rate of convergence in almost periodic environments follows from Lemma 7.1 once a rate for the limit (8.4) is obtained. For the latter, it is necessary to quantify the almost periodicity of H, which leads us to introduce, for each K > 0 and R > 0, the quantity ρ_K(R). It is immediate from (8.1) that, for each K > 0, ρ_K is continuous, and we see from its definition that it is decreasing. The assumption (8.3) is equivalent to the statement that, for each K > 0, lim_{R→∞} ρ_K(R) = 0. We next define, for each K > 0 and 0 < δ < 1, the modulus η_K(δ). The properties of ρ_K yield that η_K is a modulus, i.e., for each K > 0, it is continuous, nondecreasing and vanishes at 0. Observe that, if y → H(p, y) is 1-periodic, then ρ_K(1/2) = 0 for all K > 0 and, hence, η_K(δ) ≤ δ/2. We also define, for each K > 0, the quantity L(K), which has the property (this is not difficult to check using similar arguments as in Appendix A) that, for every p ∈ R^d, |Dv^δ(·; p)| ≤ L(|p|). (8.8) We next prove a quantitative version of Proposition 8.1. The argument is inspired by [21, 17]; here we simply reorganize and quantify it.
Since y ∈ R d was arbitrary, we obtain (8.10).
The combination of Proposition 8.2 and Lemma 7.1 yields the following convergence rate for the homogenization of (1.1) in almost periodic media. In order to apply Lemma 7.1, we note that its proof did not depend in any way on the random environment or on the structural assumptions, such as level-set convexity or (2.9), which are not in force in this section. Let u^ε and u be as above, subject to the initial condition u^ε(x, 0) = u(x, 0) = u₀(x) ∈ C^{0,1}(R^d), and let K > 0 be such that |u^ε(x, t) − u^ε(y, s)| ∨ |u(x, t) − u(y, s)| ≤ K(|x − y| + |t − s|).
Then there exists a constant C₁ > 0 such that, for all T ≥ 1 and ε > 0, where L = L(K) is given by (8.7) and the modulus η_L(·) by (8.5).
Observe that, for a periodic Hamiltonian satisfying (8.1) and (8.2), Proposition 8.3 gives a rate of convergence of O(ε^{1/3}) for homogenization.
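To see where the exponent 1/3 comes from, recall that δ = Aε/(Tλ²) in the scheme of the previous section and that η_L(δ) ≤ δ/2 in the periodic case; assuming (for illustration) that the cell-problem error enters through a requirement of the form η_L(δ) ≤ cλ, the balance reads:

```latex
% Periodic case: \eta_L(\delta) \le \delta/2 with \delta = A\varepsilon/(T\lambda^2), so
\eta_L(\delta) \;\le\; \frac{A\varepsilon}{2T\lambda^{2}} \;\le\; c\lambda
\quad\Longleftrightarrow\quad
\lambda^{3} \;\ge\; \frac{A\varepsilon}{2cT},
% so the smallest admissible error parameter is \lambda \simeq \varepsilon^{1/3},
% recovering the O(\varepsilon^{1/3}) rate of convergence.
```
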
Appendix A. Sketches of the proofs of Propositions 3.1, 3.6 and 3.8. Throughout this section, we assume that H satisfies (2.10). We begin with the following helpful lemma, which is due to the level-set convexity of H and is useful for checking whether u ∈ L is a subsolution of the equation H(Du, y, ω) ≤ µ for µ ∈ R. A simple proof can be found in [4]. Obvious analogues of Lemma A.1 hold for equations with zero-order terms, and so forth. We leave these to the reader.
A commonly used fact in the theory of viscosity solutions is that the supremum (infimum) of a family of subsolutions (supersolutions) is a subsolution (supersolution), see [9]. Observe that, in light of Lemma A.1, the infimum of a family of subsolutions of (A.1) is a subsolution and, in particular, the infimum of a family of solutions of (A.1) is a solution.
We next give details for some elementary facts concerning the functions m_µ defined in (3.1). Most of what follows is well known and can be found, for example, in [20] or [4], but we give sketches of the arguments for completeness and the convenience of the reader. Here µ > H*, where H* is a critical parameter defined as the infimum of all µ for which the equation admits a global subsolution u ∈ C(R^d) in R^d. It turns out (see [4]) that H* = min H, and the assumption (2.9) implies that H* = H(0) = 0. We begin by stating a comparison principle, which makes minimal assumptions on the growth of the subsolution and supersolution at infinity. (ii) Assume that (3.5) fails and, by adding a constant to u, that the right side of (3.5) is negative while the left side is positive at some point y₀ ∈ U. Define v(y) := m_µ(y, x, ω) if y ∈ R^d \ U and v(y) := u(y) ∨ m_µ(y, x, ω) if y ∈ U, and observe that v(x) = 0, m_µ(·, x, ω) ≤ v in R^d and m_µ(y₀, x, ω) < v(y₀). Moreover, it is clear from its definition that v is a subsolution of (A.2) in R^d. This contradicts (3.1).
(iv) The lower bound of (3.8) follows from the observation that, if µ ≤ K, then (2.5) and (2.9) imply the existence of some c > 0, depending on K, such that, for every x ∈ R^d, the function y → cµ|y − x| is a subsolution of (A.2) in R^d. The upper bound is immediate from Proposition A.2 and the fact that, for large enough C > 0 and any x ∈ R^d, the map y → C|y − x| is a supersolution in R^d \ {x}.
(iii) The dependence of δv^δ on p can be controlled using the comparison principle together with (2.5) and (3.38). The argument is routine, so we merely sketch it. One inserts v^δ(·, ω; q) into (3.31), adds or subtracts a constant until the resulting function is a supersolution or a subsolution, and applies Proposition 3.7. The estimate produced by this argument is (3.38). We remark that, due to (2.5) and (2.6), the right side of (3.39) is controlled by C|p − q| for a constant C > 0 depending on an upper bound for |p| ∨ |q|.
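The functions m_µ are continuum analogues of first-passage percolation passage times, which underlie the metric approach invoked throughout the paper. The following toy sketch is entirely our own illustration (a Z² lattice model with i.i.d. Uniform[1, 2] edge weights, not the paper's continuum setting): it computes passage times T(0, ne₁) by Dijkstra's algorithm, and the normalized times T(0, ne₁)/n concentrate near a deterministic time constant, mirroring the subadditive limit behind (3.1).

```python
import heapq
import random

def passage_time(n, weights, rng):
    """First-passage time from (0, 0) to (n, 0) on a box in Z^2, with
    i.i.d. Uniform[1, 2] edge weights sampled lazily and shared across
    calls via the `weights` dict (Dijkstra's algorithm)."""
    lo, hi = -n, 2 * n  # box large enough that it rarely constrains the geodesic
    target = (n, 0)
    dist = {(0, 0): 0.0}
    pq = [(0.0, (0, 0))]
    while pq:
        d, v = heapq.heappop(pq)
        if v == target:
            return d
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        x, y = v
        for w in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (lo <= w[0] <= hi and lo <= w[1] <= hi):
                continue
            e = frozenset((v, w))
            if e not in weights:
                weights[e] = rng.uniform(1.0, 2.0)
            nd = d + weights[e]
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(pq, (nd, w))
    return dist[target]

rng = random.Random(0)
weights = {}
ratios = [passage_time(n, weights, rng) / n for n in (8, 16, 32)]
print([round(r, 3) for r in ratios])
```

Since every edge weight lies in [1, 2] and any lattice path from 0 to ne₁ uses at least n edges, each ratio lies in [1, 2]; the fluctuations of T(0, ne₁) about its mean are the discrete counterpart of the fluctuation estimates for m_µ in the ballistic regime.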