The solution of the Kato problem for degenerate elliptic operators with Gaussian bounds

We prove the Kato conjecture for degenerate elliptic operators in R^n. More precisely, we consider the divergence form operator L_w = -1/w div (wA) grad, where w is a Muckenhoupt A_2 weight and A is a complex valued n x n matrix which is bounded and uniformly elliptic. We show that if the associated semigroup satisfies Gaussian upper bounds, then the Kato square root estimate holds.


Introduction
The purpose of this work is to give a positive answer to the Kato square root problem for a class of degenerate elliptic operators, under the assumption that the associated heat kernel satisfies classic Gaussian upper bounds.
Before stating our results, we briefly sketch the background. Given a uniformly elliptic $n \times n$ complex matrix $A$, define the second-order elliptic operator $L = -\operatorname{div} A\nabla$. Then the square root $L^{1/2}$ can be defined using the functional calculus. The original Kato problem was to show that for all $f$ in the Sobolev space $H^1$, $\|L^{1/2} f\|_{L^2} \approx \|\nabla f\|_{L^2}$. This was first posed by Kato [23] in 1961, but only solved in the past decade in a series of remarkable papers by Auscher et al. [3,4,20]. Initially, they solved the problem under the additional assumption that the heat kernel of the semigroup $e^{-tL}$ satisfies Gaussian bounds. Such estimates were known to hold when $A$ is real and symmetric, but it had been shown that they need not hold for complex matrices in higher dimensions [2]. The final proof removed this hypothesis. For a more complete history of this problem, we refer the reader to the above papers or to the review by Kenig [25].
We have extended this result to the case of degenerate elliptic operators, where the degeneracy is controlled by a weight in the Muckenhoupt class $A_2$. We say that
Date: June 7, 2009. 1991 Mathematics Subject Classification. Primary 35J70, 35C15, 47D06; Secondary 47N20, 35K65.
Given $A \in E_n(w, \lambda, \Lambda)$, define the degenerate elliptic operator in divergence form $L_w = -w^{-1}\operatorname{div} A\nabla$. Such operators were first considered by Fabes, Kenig and Serapioni [15] and have been considered by a number of other authors since. (See, for example, [7,8,9,10,16,26,27].) It is a natural question to extend the Kato problem to these operators: that is, to show that $\|L_w^{1/2} f\|_{L^2(w)} \approx \|\nabla f\|_{L^2(w)}$ for all $f$ in the weighted Sobolev space $H^1_0(w)$. (Exact definitions will be given below.)
We consider this problem in the special case that the heat kernel of the semigroup $e^{-tL_w}$ satisfies Gaussian bounds. More precisely, we assume there exists a heat kernel $W_t(x, y)$ associated to the operator $e^{-tL_w}$ such that for all $f \in C_c^\infty$, Furthermore, for all $t > 0$ and $x, y \in \mathbb{R}^n$, the kernel $W_t$ satisfies the Gaussian bounds and the Hölder continuity estimates where $h \in \mathbb{R}^n$ is such that $2|h| \le t^{1/2} + |x - y|$. The constants $C_1$, $C_2$ and $\mu$ depend only on $n$, $w$, $\lambda$, and $\Lambda$. If these three properties hold we will say that $e^{-tL_w}$ satisfies Condition (G).
Our main result is the following theorem.
To prove Theorem 1.1 it actually suffices to prove the second inequality, (1.6). For suppose this inequality holds. Since $A \in E_n(w, \lambda, \Lambda)$ implies $A^* \in E_n(w, \lambda, \Lambda)$, (1.6) holds for $(L_w^{1/2})^* = (L_w^*)^{1/2}$. (This operator identity follows from the functional calculus, for instance, from (5.2).) Therefore, by the ellipticity conditions
Our proof of inequality (1.6) follows the outline of the proof of the classical Kato problem with Gaussian bounds in [20]. (See also the expository treatment in [19].) There are four main steps: in Section 5 we reduce (1.6) to a square function inequality; in Section 6 we show that this inequality is a consequence of a Carleson measure estimate; in Section 7 we prove a weighted $Tb$ theorem for square roots; finally, in Section 8 we construct the family of test functions needed to use the $Tb$ theorem to prove the Carleson measure estimate. Prior to the proof itself, in Sections 2 and 3 we give some preliminary results about degenerate elliptic operators, Gaussian bounds and weighted norm inequalities, and in Section 4 we prove two weighted square function inequalities needed in our proof. The first of these is central, since it replaces the (much simpler) Fourier transform estimates used in the unweighted case.
Throughout, all notation is standard or will be defined as needed. The letters $C$, $c$ will denote constants whose value may change at each appearance. Given a function $f$ and $t > 0$, define $f_t(x) = t^{-n} f(x/t)$. Given an operator $T$ on a Banach space $X$, let $\|T\|_{B(X)}$ denote the operator norm of $T$.
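As a quick check on the normalization of the dilation $f_t$ just defined, the following computation shows that it preserves the integral and rescales the frequency variable. It is only a sketch: it assumes $f \in L^1(\mathbb{R}^n)$ and the Fourier transform convention $\hat f(\xi) = \int f(x) e^{-2\pi i x \cdot \xi}\,dx$, which is our assumption and is not fixed in the text.

```latex
\begin{align*}
\int_{\mathbb{R}^n} f_t(x)\,dx
  &= \int_{\mathbb{R}^n} t^{-n} f(x/t)\,dx
   = \int_{\mathbb{R}^n} f(y)\,dy
  && (x = ty,\ dx = t^n\,dy), \\
\widehat{f_t}(\xi)
  &= \int_{\mathbb{R}^n} t^{-n} f(x/t)\, e^{-2\pi i x\cdot\xi}\,dx
   = \int_{\mathbb{R}^n} f(y)\, e^{-2\pi i y\cdot(t\xi)}\,dy
   = \hat f(t\xi).
\end{align*}
```

In particular, $\|f_t\|_1 = \|f\|_1$ for every $t > 0$, and $\widehat{f_t}(0) = \hat f(0)$.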

Degenerate Elliptic Operators
The properties of the degenerate elliptic operator L w and the associated semigroup e −tLw were developed in detail in [11] and we refer the reader there for complete details. Here we state the key ideas.
Given a weight $w \in A_2$, the space $H^1_0(w)$ is the weighted Sobolev space that is the completion of $C_c^\infty$ with respect to the norm Given a matrix $A \in E_n(w, \lambda, \Lambda)$, define $a(f, g)$ to be the sesquilinear form Since $w \in A_2$ and $A$ satisfies (1.1), $a$ is a closed, maximally accretive, continuous sesquilinear form. Therefore, there exists a densely defined operator $L_w$ on $L^2(w)$ such that for every $f$ in the domain of $L_w$ and every $g \in L^2(w)$, If $f, g$ are in $C_c^\infty$ and in the domain of $L_w$, then integration by parts yields where $\langle\,\cdot\,,\cdot\,\rangle$ is the standard complex inner product on $L^2$. Thus, at least formally, Further, the properties of the sesquilinear form $a$ guarantee that the semigroup $e^{-tL_w}$ exists. In the special case when $A$ is real and symmetric, the heat kernel of $e^{-tL_w}$ satisfies Condition (G).
Finally, in [11] we proved the following results which will be needed in our proof.
Lemma 2.1. Given a matrix $A \in E_n(w, \lambda, \Lambda)$, suppose that the heat kernel of the associated semigroup $e^{-tL_w}$ satisfies Condition (G). Then for all $t > 0$, $e^{-tL_w} 1 = 1$:
If $e^{-tL_w}$ satisfies Gaussian bounds, then its derivative satisfies similar bounds. More precisely, let $V_t = -2tL_w e^{-t^2 L_w} = \frac{d}{dt} e^{-t^2 L_w}$. Then the following result holds.
Lemma 2.2. The operator $V_t$ has a kernel $V_t(x, y)$ with the following properties: For all $x, y \in \mathbb{R}^n$ and $t > 0$, For almost every There exists $\alpha = \alpha(n, \lambda, \Lambda, w) > 0$ such that for almost every $x, y \in \mathbb{R}^n$, $2|h| < t + |x|$,

Weighted Norm Inequalities
Central to our proof is the theory of weighted norm inequalities for classical operators, particularly singular integrals and square functions. In this section we state the results we need; the standard ones are given without proof and we refer the reader to Duoandikoetxea [13], García-Cuerva and Rubio de Francia [17], and Grafakos [18] for complete information.
We begin with the weighted norm inequalities for the Hardy-Littlewood maximal operator, for convolution operators, and for singular integrals.
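For reference, the standard definition of the Muckenhoupt class $A_2$ and of the constant $[w]_{A_2}$ that appears in the estimates of this section can be written as follows. The power-weight example is a classical fact included only for illustration; it is not taken from the text.

```latex
\[
  [w]_{A_2}
  = \sup_{Q} \left( \frac{1}{|Q|} \int_Q w(x)\,dx \right)
             \left( \frac{1}{|Q|} \int_Q w(x)^{-1}\,dx \right) < \infty ,
\]
% the supremum is taken over all cubes Q in R^n.
% The condition is symmetric in w and w^{-1}:
\[
  w \in A_2 \iff w^{-1} \in A_2 ,
  \qquad [w^{-1}]_{A_2} = [w]_{A_2} .
\]
% Classical example: power weights.
\[
  |x|^{a} \in A_2 \iff -n < a < n .
\]
```

The symmetry between $w$ and $w^{-1}$ is what allows results for $A_2$ weights to be applied to $w^{-1}$ as well as to $w$.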
Lemma 3.1. Let $w \in A_2$. Then $M$ is bounded on $L^2(w)$ and $\|M\|_{B(L^2(w))} \le C(n, [w]_{A_2})$. Furthermore, suppose $\phi$ and $\Phi$ are such that for all $x$, $|\phi(x)| \le \Phi(x)$, and $\Phi$ is radial, decreasing and integrable. Then the operators $\phi_t * f$ are uniformly bounded on $L^2(w)$; in fact, are bounded on $L^2(w)$, and $\|T\|_{B(L^2(w))}, \|T^*\|_{B(L^2(w))} \le C(n, [w]_{A_2}, K)$.
Remark 3.4. An important example of singular integrals are the Riesz transforms: where the constant $c_n$ is chosen so that
Our next two results are square function inequalities. The first is a weighted version of Carleson's theorem due to Journé [21]. Define the weighted Carleson measure norm of a function $\gamma_t$ by
Lemma 3.5. Let $w \in A_2$ and suppose $\gamma_t$ is such that $\|\gamma_t\|_{C,w} < \infty$. Let $p \in C_c^\infty(\mathbb{R}^n)$ be such that $p$ is a non-negative, radial, decreasing function, $\operatorname{supp}(p) \subset B_1(0)$, and $\|p\|_1 = 1$. Then for all $f \in L^2(w)$,
The second result is a Littlewood-Paley type inequality.
Lemma 3.6. Given $w \in A_2$, let $\psi$ be a Schwartz function such that $\hat\psi(0) = 0$. Then for all $f \in L^2(w)$,
Proof. A direct proof of Lemma 3.6 is given by Wilson [30]. Here we sketch a proof that is implicitly based on the idea that $g_\psi$ can be regarded as a vector-valued singular integral.
By a standard argument in the theory of weighted norm inequalities, it will suffice to prove that for all $0 < \delta < 1$ there exists a constant $C_\delta$ such that for $x \in \mathbb{R}^n$, where $M$ is the Hardy-Littlewood maximal operator and $M^\#$ is the sharp maximal operator of Fefferman and Stein. Inequality (3.3) is readily obtained by adapting the argument in Cruz-Uribe and Pérez [12, Lemma 1.6] for the $g^*_\lambda$ operator. The changes are straightforward, so here we only indicate the key steps. (Also see Álvarez and Pérez [1].) Our assumptions on $\psi$ guarantee that $g_\psi$ is bounded on $L^2$ and is weak $(1,1)$. (See [17, p. 505].) Therefore, we only have to prove that if $|x| > |h|/2$, then This is the vector-valued analog of the gradient condition in [12, Lemma 1.6]. To prove (3.4), note that since $\psi$ is a Schwartz function it is bounded. Hence, by the mean value theorem, for each $x$ and $t$ there exists $\theta$, $0 < \theta < 1$, such that Since $|x + \theta h| > |x|/2$ and $|\nabla\psi(y)| \le C|y|^{-2n-2}$, We estimate the second integral in the same way, using that $|\nabla\psi(x)| \le C|x|^{-2n}$. Taking the square root we get (3.4).
The next proposition is a key estimate in our proof of Theorem 1.1. It yields a square function estimate given size and regularity assumptions on the kernel of the operator. In the unweighted case, this result can be found in, for example, Grafakos [18, p. 643] or Hofmann [19]; in a somewhat different form it can be found in Auscher and Tchamitchian [5].
Proposition 3.7. Let $w \in A_2$ and let $\psi$ be a radial Schwartz function such that $\hat\psi(0) = 0$ and
Given a family of sublinear operators $\{R_t\}$, suppose that each $R_t$ is bounded on $L^2(w)$, and for all $t, s > 0$ the composition $R_t Q_s$ is bounded on $L^2(w)$ and for some $\alpha > 0$, Then the family $\{R_t\}$ satisfies the square function estimate
The proof of Proposition 3.7 requires a weighted version of the Calderón reproducing formula given by Wilson [30].
where this equality is understood as follows: for each j > 1, let B j be the ball centered at 0 of radius j, and define the function Then for each j, f j ∈ L 2 (w) and {f j } converges to f in L p (w).
Proof of Proposition 3.7. Fix $f \in L^2(w)$ and let $f_j$ be as in Lemma 3.8. Since $R_t$ is bounded on $L^2(w)$, we have that for each $t > 0$, Since each $R_t$ is sublinear, we have that Therefore, by Fatou's lemma, Minkowski's inequality, and (3.6), and the same is true if we reverse the roles of $s$ and $t$. Therefore, if we apply the Cauchy-Schwarz inequality, Fubini's theorem and Lemma 3.6 we get that
Operator norm bounds such as those in (3.6) can generally be deduced in the unweighted case using the Fourier transform or kernel estimates. We will make use of the following result from Grafakos [18, Theorem 8.6.3].
Lemma 3.9. Let $\{T_t\}$, $t > 0$, be a family of integral operators such that $T_t 1 = 0$ and such that the kernels $K_t$ satisfy Then for some $\alpha > 0$,
In order to get this estimate on $L^2(w)$, $w \in A_2$, we use the following clever application of interpolation due to Duoandikoetxea and Rubio de Francia [14].
Proof. This is a consequence of the structural properties of A 2 weights and the theory of interpolation with change of measure due to Stein and Weiss [28] (see also Bergh and Löfström [6]). Given w ∈ A 2 , there exists s > 1, depending only on is bounded by a constant that depends only on [w] A 2 and n. Choose θ such that , so by interpolation with change of measure,
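The interpolation step can be sketched as follows. This is a hedged illustration, not the statement of the lemma: it assumes a linear operator $T$, uses the Stein-Weiss theorem in the special case where both endpoint spaces are $L^2$ spaces, and takes the endpoint weights to be $1$ and $w^s$, with $s > 1$ the exponent from the proof. The symbols $\delta$ and $M$ are placeholder names for the two endpoint norms.

```latex
% Stein--Weiss, same exponent p = 2 at both endpoints:
\[
  \|T\|_{B(L^2(w_0^{1-\theta} w_1^{\theta}))}
  \le \|T\|_{B(L^2(w_0))}^{\,1-\theta}\,
      \|T\|_{B(L^2(w_1))}^{\,\theta},
  \qquad 0 < \theta < 1 .
\]
% Choose w_0 = 1, w_1 = w^s, theta = 1/s, so that
% w_0^{1-theta} w_1^{theta} = (w^s)^{1/s} = w:
\[
  \|T\|_{B(L^2(w))}
  \le \|T\|_{B(L^2)}^{\,1-1/s}\,
      \|T\|_{B(L^2(w^s))}^{\,1/s}
  \le \delta^{\,1-1/s}\, M^{1/s},
\]
% where delta bounds the unweighted norm and M bounds the norm on L^2(w^s).
```

Under these assumptions, a small unweighted norm $\delta$ and a merely bounded norm on $L^2(w^s)$ still give a small norm on $L^2(w)$, at the cost of replacing the exponent $1$ by $1 - 1/s$.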

Two Square Function Inequalities
In this section we prove two weighted square function inequalities. The first is for the operator $V_t = \frac{d}{dt} e^{-t^2 L_w}$ and is used in Sections 6 and 7 below.
Lemma 4.1. Let $p \in C_c^\infty$ be a non-negative, radial, decreasing function such that Then there exists $C > 0$ depending only on $n$, $w$ and the constants in the Gaussian estimates, such that
The second square function inequality is needed in the last step of the proof in Section 8. In the unweighted case this inequality is due to Journé [22]; our proof is adapted from that of Auscher and Tchamitchian [5] as explicated by Grafakos [18].
Define the averaging operator A t by

It follows immediately from the definition of
p is a nonnegative, radial function such that supp(p) ⊂ B 1 (0) and p 1 = 1. Then again by Remark 4.3. Though Lemma 4.2 is stated in terms of the dyadic grid, it will be clear from the proof that it is true if we replace the dyadic grid by the "dyadic" grid relative to a fixed cube Q: that is, the collection of cubes gotten as in the construction of the standard dyadic grid, but starting with Q instead of [0, 1) n .
Proof of Lemma 4.1. The proof requires several steps. First, we will show that there exists a family of sublinear operators $\{R^k_t\}$, $k \ge 0$, such that (4.1) holds provided that there exists $A > 1$ such that and the $R_j$ are the Riesz transforms. This square function estimate will follow from Proposition 3.7 if we can prove that the operators $R^k_t$ are uniformly bounded on $L^2(w)$ and satisfy two operator norm estimates. Let $\psi$ be a radial Schwartz function such that $\hat\psi(0) = 0$ and such that (3.5) holds, and let $Q_s f = \psi_s * f$. We will first show that for all $s, t > 0$, and then show the stronger estimate
Reduction to (4.3). By Lemma 2.2, In the first integral, make the change of variables $h = (y - x)/t$; in the integrals in the sum, make the change of variables $h = (y - x)/(2^k t)$. Then there exist positive constants $B_1$ and $B_2$ such that Note that for any Therefore, by Hölder's inequality we get the following estimate: if we make the change of variables $t \mapsto 2^{-k} t = t_k$, we get (4.6) Assume for the moment that for each $k$ there exists a family of sublinear operators where $F = R \cdot \nabla f$, and that there exists $A > 1$ such that (4.3) holds. By Lemma 3.3, , and so if we combine (4.6) and (4.3) we get inequality (4.1).
To construct the operators $R^k_t$ and show that (4.7) holds, recall that the Riesz potential $I_1$ is the convolution operator with kernel $i_1(x) = c_n |x|^{1-n}$, where the constant $c_n$ is chosen so that The following identities are well known (see, for instance, Stein [29]): if $f$ is a Schwartz function and $F = R \cdot \nabla f$, then $f = I_1 F$. Further, given any $h \in \mathbb{R}^n$, $h \cdot \nabla f = -R \cdot (hF)$.
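The two identities above can be checked on the Fourier side. The following is a sketch under assumed normalizations, none of which are fixed in the text: the Fourier transform $\hat f(\xi) = \int f(x) e^{-2\pi i x\cdot\xi}\,dx$, the Riesz transform multiplier $-i\xi_j/|\xi|$, and $c_n$ chosen so that $I_1$ has multiplier $(2\pi|\xi|)^{-1}$. Here $R \cdot G = \sum_j R_j G_j$ for a vector field $G$, and $hF = (h_1 F, \dots, h_n F)$.

```latex
\begin{align*}
\widehat{\partial_j f}(\xi) &= 2\pi i\,\xi_j\,\hat f(\xi),
\qquad
\widehat{R_j g}(\xi) = -i\,\frac{\xi_j}{|\xi|}\,\hat g(\xi),
\qquad
\widehat{I_1 g}(\xi) = \frac{1}{2\pi|\xi|}\,\hat g(\xi). \\[4pt]
\widehat{F}(\xi)
  &= \sum_{j=1}^n \Big(-i\,\frac{\xi_j}{|\xi|}\Big)\big(2\pi i\,\xi_j\big)\hat f(\xi)
   = 2\pi|\xi|\,\hat f(\xi)
  \quad\Longrightarrow\quad
  \widehat{I_1 F}(\xi) = \hat f(\xi), \ \text{i.e. } f = I_1 F. \\[4pt]
\widehat{R\cdot(hF)}(\xi)
  &= \sum_{j=1}^n h_j \Big(-i\,\frac{\xi_j}{|\xi|}\Big)\, 2\pi|\xi|\,\hat f(\xi)
   = -2\pi i\,(h\cdot\xi)\,\hat f(\xi)
   = -\,\widehat{h\cdot\nabla f}(\xi).
\end{align*}
```

With a different sign or normalization convention for $R_j$ the intermediate multipliers change, but the two identities are stated in the text independently of these choices.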

Define the convolution kernel
then (4.7) holds. And, since the operators H k t,h are linear, each R k t is sublinear.
Proof of inequality (4.4). Since by Lemma 3.1 the operators Q s are uniformly bounded on L 2 (w), to prove (4.4) it will suffice to prove that for all t and k, . By definition, for all Schwartz functions g we have that To prove (4.8) we will prove that each term on the right-hand side is uniformly bounded on L 2 (w). The boundedness of the second follows immediately from Lemmas 3.1 and 3.3: since p 1 = 1, for all k and t we have that We will now prove that the first term is bounded on L 2 (w): To prove this we first estimate the inner integral on the left-hand side: c n is the constant in the Riesz potential and R * i is the maximal singular integral associated with the Riesz transform R i .
Therefore, to complete the proof of (4.9) we need to show that But by Lemma 3.1 it will suffice to prove that
(4.10) $|L(x)| \le C(n) \min\left( \dfrac{1}{|x|^{n-1}}, \dfrac{1}{|x|^{n+1}} \right)$,
since the right-hand side is a radial, decreasing function in $L^1$. To prove this estimate we treat several cases depending on the size of $x$.
Sub-case 2.3: 2 −k−2 ≤ |x| ≤ 2 −k+1 . In this case we estimate as follows : if we make the change of variables u = h/|x|, we get = c n |x| n |x| n−1 where the last equality holds by rotational symmetry. Note that this last integral is finite and its value depends on n but is independent of k and x.
If we apply these three sub-cases to (4.11), we get that for all |x| ≤ 2, This completes our proof of (4.10), and so of (4.9). Therefore, we have shown that inequality (4.8) holds.
Proof of Inequality (4.5). By Lemma 3.10 and inequality (4.4), to prove (4.5) it will suffice to show the corresponding unweighted norm estimate: Fix $s, t > 0$, $k \ge 0$, and recall that $Q_s f = \psi_s * f$. By Hölder's inequality and Plancherel's theorem, To estimate the last term, we will use the following: since $\psi$ is a Schwartz function such that $\hat\psi(0) = 0$, And, since $\hat p(0) = 1$, Therefore, we have that Combining these estimates we get that Inequality (4.13) follows immediately. This finishes the proof of Lemma 4.1.
Proof of Lemma 4.2. For brevity, let $R_t = P_t - A_t$. Then we have that so it will suffice to prove that (4.2) holds with $R_t$ replaced by $R_t(I - P_t)$ and $R_t P_t$. By our remarks above, it is clear that both of these operators are uniformly bounded on $L^2(w)$. Therefore, by Lemma 3.10 and Proposition 3.7 it will suffice to show that for all $s, t > 0$, there exist constants $C, \alpha > 0$ such that where $Q_s f = \psi_s * f$ with $\psi$ a radial Schwartz function with $\hat\psi(0) = 0$.
We first prove (4.14). Since $R_t$ is uniformly bounded on $L^2$, We can bound the right-hand side using Plancherel's theorem. By our choice of $\psi$ and since $\hat p(0) = 1$, Therefore, and so On the other hand, since convolution operators commute and $I - P_t$ is uniformly bounded on $L^2$, For any $\alpha$, $0 < \alpha < 1/2$, there exists a constant $C$ such that (See Grafakos [18].) Further, since we can again use Plancherel's theorem to see that It follows that for $\alpha < 1/2$ and so we get that (4.14) holds.
To prove (4.15) we will apply Lemma 3.9. It will suffice to show that R t P t (1) = 0, and that the kernel K t of the operator R t P t satisfies (3.8) and (3.9). The identity is immediate: both A t and P t are bounded on L ∞ and A t (1) = P t (1) = 1.
where J t is the kernel of P 2 t and L t is the kernel of A t P t , that is, It is immediate from these expressions that there exists a constant c > 0 such that both J t (x, y) and L t (x, y) are non-zero only if |x − y| < ct. Further, we have that Inequality (3.8) follows at once.
The proof of (3.9) is similar. By the mean value theorem, If we use this to estimate |J t (x, y) − J t (x, y ′ )| and |L t (x, y) − L t (x, y ′ )| and argue as before, we get that both satisfy a similar bound. Inequality (3.9) follows at once. Therefore, we have proved (4.15) and our proof is complete.

Reduction to a Square Function Estimate
In this section we begin our proof of Theorem 1.1 by proving that inequality (1.6) holds if we have the square function estimate where $V_t = -2tL_w e^{-t^2 L_w}$. Fix $f \in D(L_w)$; recall that $D(L_w)$ is dense in $H^1_0(w)$ (see [11]). If we apply integration by parts to a well known formula (see, for instance, Kato [24]) we get that Therefore, for all $g \in L^2(w)$, Hence, by duality we have shown that (1.6) follows from (5.1) provided that we can prove the square function inequality Since the semigroups $e^{-tL_w^*}$ and $e^{-tL_w}$ satisfy the same estimates, this is equivalent to proving that To prove this square function estimate we use Proposition 3.7. Let $G(x) = \exp(-C_2 |x|^2)$. Then by Lemma 2.2, Since $G \in L^1$ and is radial, by Lemma 3.1, $\sup_{t>0} (G_t * |f|)(x) \le C\,Mf(x)$ and the operators $tV_t$ are uniformly bounded on $L^2(w)$. Let $\psi$ be a radial Schwartz function such that $\hat\psi(0) = 0$. For $s > 0$, let $Q_s f = \psi_s * f$. Again by Lemma 3.1 we have that the operators $Q_s$ are uniformly bounded on $L^2(w)$. Therefore, there exists a constant $C$ such that for all $s, t > 0$, Further, by Lemmas 2.2 and 3.9 there exists $\beta > 0$ such that Hence, by Lemma 3.10 we have that for some $\alpha > 0$, Therefore, the operators $\{tV_t\}$ satisfy the hypotheses of Proposition 3.7, so (5.3) holds.
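The integration by parts step can be illustrated in a scalar model, replacing $L_w$ by a positive number $\lambda$. This is only a sketch: the exact form of the formula used in the text is not reproduced here, and the normalizing constant $2/\sqrt{\pi}$ below comes from the Gaussian integral in this model, not from the source.

```latex
\begin{align*}
\int_0^\infty \lambda\, e^{-t^2\lambda}\,dt
  &= \lambda \cdot \frac{1}{2}\sqrt{\frac{\pi}{\lambda}}
   = \frac{\sqrt{\pi}}{2}\,\lambda^{1/2}
  \quad\Longrightarrow\quad
  \lambda^{1/2} = \frac{2}{\sqrt{\pi}} \int_0^\infty \lambda\, e^{-t^2\lambda}\,dt, \\[4pt]
\int_0^\infty \lambda\, e^{-t^2\lambda}\,dt
  &= \Big[\, t\,\lambda\, e^{-t^2\lambda} \Big]_0^\infty
     + \int_0^\infty t \cdot 2t\lambda^2 e^{-t^2\lambda}\,dt
   = 2\int_0^\infty t^2 \lambda^2 e^{-t^2\lambda}\,dt .
\end{align*}
% Consistency check: \int_0^\infty t^2 e^{-t^2\lambda} dt = \sqrt{\pi} / (4 \lambda^{3/2}),
% so the right-hand side equals (\sqrt{\pi}/2) \lambda^{1/2}, as it should.
```

Writing $v_t(\lambda) = -2t\lambda e^{-t^2\lambda}$ for the scalar analogue of $V_t$, the last integrand is $-t\,v_t(\lambda)\,\lambda$, which is how the factor $V_t$ enters after integrating by parts.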

Reduction to a Carleson Measure Estimate
In this section we prove that (5.1) holds provided that we have a Carleson measure estimate. More precisely, we will show that if $w \in A_2$, then for all Schwartz functions $f$, To prove this, we first show the role played by the Carleson measure estimate. Let $p \in C_c^\infty$ be a non-negative, radial, decreasing function such that $\|p\|_1 = 1$ and $\operatorname{supp}(p) \subset B_1(0)$. By Lemma 2.2, $V_t 1 = 0$, so The first term in (6.2) satisfies a square function estimate. This follows from the Carleson measure estimate: if $\|\gamma_t\|_{C,w} < \infty$, then by Lemma 3.5, The proof of (6.1) now follows from Lemma 4.1.

The Weighted T b Theorem for Square Roots
We have reduced the proof of Theorem 1.1 to proving that $\gamma_t(x) = V_t\varphi(x)$ is a Carleson measure with respect to the weight $w(x)$: that is, where $\gamma_t(x) = tL_w e^{-t^2 L_w}\varphi(x)$ and $\varphi(x) = x$.
In order to prove this fact we will establish a T b theorem for square roots, a weighted version of a result due to Auscher and Tchamitchian [5]. For technical reasons we actually need a slightly different theorem that is given in Lemma 7.2 below. However, it seemed clearer to start with this simpler version and then sketch the modifications needed to prove the full result.
Lemma 7.1. Suppose that for every cube Q there exists a mapping F Q : 5Q → C n and a constant C = C (n, λ, Λ, w) < ∞ such that . Then γ t C,w ≤ C (n, λ, Λ, w) < ∞, i.e., γ t is a Carleson measure with respect to w.
Proof. We follow the proof of the unweighted lemma in [19]. Since the kernel of $V_t$ satisfies the Gaussian bounds (2.3), we have that Indeed, since $V_t$ has zero moment, thus, applying the Gaussian estimates, and this bound is independent of $t$. Hence, given (iv), to prove (7.1) we only need to show that Since $p$ is supported in the unit ball and the gradient operator commutes with $P_t$, for $x \in Q$ and $0 \le t \le \ell(Q)$ we have that where $c_0$ is a constant to be fixed below. Let $\tilde F_Q = F_Q - c_0$; then (7.4) is equivalent to
Recall that $\frac{1}{2} V_t = -tL_w e^{-t^2 L_w} = -t e^{-t^2 L_w} L_w = t e^{-t^2 L_w} w^{-1} \operatorname{div} A\nabla$. Define $\theta_t = 2t e^{-t^2 L_w} w^{-1} \operatorname{div} A$; then $\theta_t$ acts on $n \times n$ matrix valued functions. If $\mathbf{1}$ is the $n \times n$ identity matrix, $\gamma_t = \theta_t \nabla\varphi = \theta_t \mathbf{1}$, so for $x \in Q$ and $0 \le t \le \ell(Q)$ we can write where $R_t = \theta_t \mathbf{1}(x) P_t - \theta_t$. We claim that If this is the case, then by assumption (ii) Furthermore, since the operator $e^{-t^2 L_w}$ is a contraction in $L^2(w)$, by (iii), we have Therefore, to complete the proof we only have to show that (7.7) holds. Since $\int V_t(x, y)\,dy = 0$, By Lemma 4.1,
We estimate the right hand side using the product rule and the weighted Poincaré inequality (see [15]); if we fix This proves (7.7) and our proof is complete.
In our proof we actually need to replace (iv) with a more complicated criterion; for ease of reference we record this as a separate lemma.
The proof is essentially identical to the proof of Lemma 7.1. By (7.3), it is enough to show that and this is done exactly as in the proof of (7.4).

Proof of the Weighted Kato Theorem
To complete the proof of Theorem 1.1 we will construct a finite index set {ν} and for each cube Q ⊂ R n a family of functions F Q,ν : 5Q → C that satisfy the hypotheses of Lemma 7.2. To do so we adapt the proof of the non-weighted case in [19].
Recall that ϕ (x) = x. Given a cube Q ⊂ R n define the function where ε > 0 will be chosen below. The set {ν} will be a finite collection of vectors in C n , |ν| = 1, also to be chosen below. Define Since |F Q,ν | ≤ |F Q |, and similarly for the gradient, to prove (I) and (II) in Lemma 7.2 it will suffice to prove that F Q satisfies (ii) and (iii) in Lemma 7.1. To prove (iii): from (7.3) we have for some constant independent of t. Then .
Since w ∈ A 2 it is a doubling measure, and so This proves (iii).
To prove that (III) holds we make two reductions. First, recall that the averaging operator $A_t$ is defined by where $Q_t(x)$ is the unique dyadic cube containing $x$ such that $t \le \ell(Q_t(x)) < 2t$. As before, let $\chi \in C_0^\infty(\mathbb{R}^n)$ be such that $\operatorname{supp}(\chi) \subset 4Q$, $\chi|_{3Q} \equiv 1$ and $|\nabla\chi| \le C\ell(Q)^{-1}$. Then $P_t f(x) = P_t(\chi f)(x)$ and $A_t f(x) = A_t(\chi f)(x)$ for all $x \in Q$ and $t$, $0 \le t \le \ell(Q)$. Since $P_t$ commutes with the gradient, by Lemma 4.2 we have that Therefore, by (I) we have and so to prove (III) it is enough to establish this estimate with $P_t$ replaced by $A_t$.
For our second reduction, we need a lemma that was proved in [19] in the unweighted case; essentially the same argument works in the weighted case.
then $\mu$ is a Carleson measure with respect to $w$, and Therefore, to complete our proof that (III) holds, we need to find our finite index set $\{\nu\}$ and construct sets $E_Q$ as in Lemma 8.1 such that
We have now come to "the heart of the matter," as was said in [19]. For every $\nu \in \mathbb{C}^n$ with $|\nu| = 1$ define the cone Clearly, for each $\varepsilon > 0$ there exists a positive integer $N = N(\varepsilon, n)$ and unit vectors $\nu_1, \dots, \nu_N \in \mathbb{C}^n$ such that $\mathbb{C}^n \subset \bigcup_{j=1}^N \Gamma_{\nu_j}$. Below we will fix our $\varepsilon$ and will then let $\{\nu\} = \{\nu_j\}_{j=1}^N$ be our index set. We will first construct the sets $E_Q$. To do so, we will construct sets $E_{Q,\nu}$ that depend on $\nu$; the desired set $E_Q$ will be the union of these sets over our index set $\{\nu\}$. We need two more lemmas. The first is from [19]; since its proof depends only on the Gaussian bounds for the kernel of $e^{-tL_w}$, it holds in the weighted case without change.
Lemma 8.2. There exists a constant C > 0 depending only on the constants C 1 , C 2 in the Gaussian bounds, such that for any cube Q ⊂ R n , The second lemma gives two properties of A 2 weights. The first is the well-known A ∞ condition, and the second is closely related. For a proof see [17,18]. Lemma 8.3. If w ∈ A 2 , there exists constants α, δ > 0 and constants β, ǫ > 0 such that given any cube Q and measurable set E ⊂ Q, To prove (III) we first construct the sets {E Q,ν } via the stopping time argument used in [19]. We include the details to show how the proof adapts to the weighted case. Let S 1 be the collection of all maximal dyadic cubes Q ′ ⊂ Q such that  ν · ∇F Q,ν (y) dy = Re By Hölder's inequality, (I) and the definition of A 2 weights, (8.10) Re |∇F Q,ν (y)| 2 w(y) dy Similarly, by the definition of B 2 and (I), |∇F Q,ν (y)| 2 w(y) dy By Lemma 8.3 we may assume that ε is so small that (8.11) implies Combining this estimate with (8.10) we obtain Re B\B 1 ν · ∇F Q,ν (y) dy ≤ 1 8 |Q| , and putting this together with inequalities (8.8) and (8.9), for ε small enough we get 1 16 |Q| ≤ Re E Q,ν ν · ∇F Q,ν (y) dy .
But then, since $w \in A_2$, Since $w^{-1}$ is also in $A_2$, by Lemma 8.3 (applied twice) there exists $\eta > 0$ such that
(8.12) $w(E_{Q,\nu}) \ge \eta\, w(Q)$.
Now write $B_{Q,\nu} = Q \setminus E_{Q,\nu} = \bigcup_j Q_{\nu,j}$, where the $Q_{\nu,j}$ are disjoint maximal dyadic cubes. Let $E^*_{Q,\nu} = R_Q \setminus \bigcup_j R_{Q_{\nu,j}}$ be the sawtooth region above $E_{Q,\nu}$. If $(x, t) \in E^*_{Q,\nu}$, and $Q_t(x)$ is the biggest dyadic sub-cube of $Q$ containing $x$ with $t \le \ell(Q_t(x)) < 2t$, then by the maximality of the cubes in $S_1$ and $S_2$ we have that Q t (x) ∈ B; hence |∇F Q,ν (y)| dy ≤ 1 8ε .
By the definition of $A_t$, this implies that By the definition of $\Gamma_\nu$, if $z \in \Gamma_\nu$, then $|z| < (1 + \varepsilon)|z \cdot \bar\nu|$. Hence, for $\varepsilon > 0$ sufficiently small, we have that Now fix $\varepsilon$ small; form our index set $\{\nu\} = \{\nu_j\}_{j=1}^N$ as described above, and let $E_Q = \bigcup_j E_{Q,\nu_j}$. Therefore, if we let $z = \gamma_t(x)$, then $\gamma_t(x) \in \Gamma_{\nu_j}$ for some $1 \le j \le N$, and we have 1 for all $(x, t) \in E^*_Q$. It follows that 1 where, by (8.12), $w(E_Q) \ge \eta\, w(Q)$. This proves (8.5); thus we have shown that (III) in Lemma 7.2 holds, and so have completed the proof of Theorem 1.1.