On general local Tb theorems

In this paper, local Tb theorems are studied both in the doubling and non-doubling situation. We prove a local Tb theorem for the class of upper doubling measures. With such general measures, scale invariant testing conditions are required (L^{\infty} or BMO). In the case of doubling measures, we also modify the general non-homogeneous method of proof to yield a new proof of the local Tb theorem with L^2 type testing conditions.


INTRODUCTION
There are several local $Tb$ theorems with slightly different assumptions. In these theorems, one assumes that to every cube $Q$ there exist functions $b^1_Q$ and $b^2_Q$, supported on $Q$, so that we also know something about $Tb^1_Q$ and $T^*b^2_Q$ (where $T$ is a Calderón-Zygmund operator). One wishes to conclude that $T\colon L^2 \to L^2$ boundedly, with a norm bound that depends only on the constants in the assumptions.
The first local $Tb$ theorem is by Michael Christ [Chr90], and there it was assumed that $\|b^1_Q\|_\infty \le C$ and $\|Tb^1_Q\|_\infty \le C$ (and similarly for $b^2_Q$ and $T^*b^2_Q$). This was proven for doubling measures (even in metric spaces). Nazarov, Treil and Volberg [NTV02] obtained a version of this theorem for measures satisfying the power bound $\mu(B(x,r)) \lesssim r^m$ for a given number $m$. So it is a non-homogeneous version of Christ's theorem in $\mathbb{R}^n$ (it also allows BMO control on the operator side if the kernel of $T$ is antisymmetric).
For doubling measures, one can also consider more general $L^p$ type testing conditions introduced by Auscher, Hofmann, Muscalu, Tao and Thiele [AHM+02], and further studied by Hofmann [Hof07], Auscher and Yang [AY09] and Tan and Yan [TY09]. The most general assumption used in these papers is of the form $\int_Q |b^1_Q|^p \le |Q|$, $\int_Q |b^2_Q|^q \le |Q|$, $\int_Q |Tb^1_Q|^{q'} \le |Q|$ and $\int_Q |T^*b^2_Q|^{p'} \le |Q|$, where $s'$ denotes the dual exponent of $s$ and $1 < p, q \le \infty$.
In [AHM+02] a theorem of this type is proved only for very special operators, the so-called perfect dyadic singular integral operators. This was expected to generalize easily to all Calderón-Zygmund operators, but this turned out not to be the case (not easily, at least). In [Hof07] the theorem is extended for standard [...]

DEFINITIONS AND THE MAIN RESULT
2.A. Upper doubling measures and Calderón-Zygmund operators. Let $\lambda\colon \mathbb{R}^n \times (0,\infty) \to (0,\infty)$ be a function so that $r \mapsto \lambda(x,r)$ is non-decreasing and $\lambda(x,2r) \le C_\lambda \lambda(x,r)$ for all $x \in \mathbb{R}^n$ and $r > 0$. Let $\mu$ be a Borel measure in $\mathbb{R}^n$. We assume that $\mu$ is upper doubling with the dominating function $\lambda$, that is, $\mu(B(x,r)) \le \lambda(x,r)$ for all $x \in \mathbb{R}^n$ and $r > 0$. In the case of doubling measures one can take $\lambda(x,r) = \mu(B(x,r))$, and in the case of power bounded measures ($\mu(B(x,r)) \le Cr^m$), one can take $\lambda(x,r) = Cr^m$. Let $d = \log_2 C_\lambda$; this is a convenient number for us, and can be thought of as a dimension of the measure $\mu$.
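As a quick check of the definition, the following short computation (a sketch added for illustration, not taken from the original text) works out the doubling constant $C_\lambda$ and the number $d$ for the power bounded example, and records what iterating the doubling inequality gives for a general dominating function.

```latex
% Power bounded case: \lambda(x,r) = C r^m with constants C, m > 0.
\[
  \lambda(x,2r) = C(2r)^m = 2^m \, C r^m = 2^m \lambda(x,r),
\]
% so the doubling inequality holds with C_\lambda = 2^m, and therefore
\[
  d = \log_2 C_\lambda = m .
\]
% For a general dominating function, iterating
% \lambda(x,2r) \le C_\lambda \lambda(x,r) a total of k times gives
\[
  \lambda(x, 2^k r) \le C_\lambda^{k}\, \lambda(x,r) = 2^{dk} \lambda(x,r),
  \qquad k = 0, 1, 2, \dots
\]
```

In this sense $d$ plays the role of the exponent $m$ in the power bounded case.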
A Calderón-Zygmund operator with a standard kernel $K$ is a bounded linear operator $T$ taking $L^2(\mu)$ into $L^2(\mu)$ so that there holds $Tf(x) = \int K(x,y) f(y)\,d\mu(y)$ for $x$ not in the support of $f$.
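The estimates that make $K$ a standard kernel are not displayed in this part of the text. For orientation only, one formulation commonly used with upper doubling measures is sketched below; the precise form, the constant $C$, and the range of the Hölder exponent $\alpha$ are assumptions here, not quotations from the paper.

```latex
% Assumed form of the standard kernel estimates (illustrative only);
% C > 0 and \alpha \in (0,1] are constants.
% Size estimate:
\[
  |K(x,y)| \le C \min\Big( \frac{1}{\lambda(x,|x-y|)},\,
                            \frac{1}{\lambda(y,|x-y|)} \Big),
  \qquad x \ne y .
\]
% Hoelder estimate, valid when |x-y| \ge 2|x-x'|:
\[
  |K(x,y) - K(x',y)| + |K(y,x) - K(y,x')|
  \le C\, \frac{|x-x'|^{\alpha}}{|x-y|^{\alpha}\, \lambda(x,|x-y|)} .
\]
```

The exponent $\alpha$ in such estimates is the "number from the kernel estimates" that enters the later definition of bad cubes.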
Note that while we assume the boundedness of T a priori, we are interested in quantitative bounds for T , which only depend on some specified information.
2.B. Systems of accretive functions. When working with a general upper doubling measure $\mu$, we assume that to every cube $Q \subset \mathbb{R}^n$ there exist two functions $b^1_Q$ and $b^2_Q$ so that there holds [...]. We call these accretive $L^\infty$ systems. At least in the case of an antisymmetric kernel, one could make do with BMO control on the operator side (see [NTV02]), but we focus only on the case of $L^\infty$ control.
When working with a doubling measure $\nu$, we may also use the following set of assumptions: to every cube $Q \subset \mathbb{R}^n$ there exist two functions $b^1_Q$ and $b^2_Q$ so that there holds [...]. We call these accretive $L^2$ systems (suppressing from the name the fact that we actually impose the somewhat stronger $L^s$ conditions in (iii)).
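The displayed lists of conditions did not survive in this copy of the text. The following is a hedged sketch of what the two kinds of systems plausibly require, pieced together from the introduction (support on $Q$, $L^\infty$ or $L^2$ size control), from the normalization $\mu(H)^{-1}\int_H b^2_H\,d\mu = 1$ used later, and from the remark that an exponent $s > 2$ is imposed on the operator side. The constant $C$, the grouping, and the numbering of the conditions are assumptions.

```latex
% Sketch of an accretive L^\infty system (assumed form), for i = 1, 2,
% with T_1 = T and T_2 = T^*:
\[
  \operatorname{spt} b^i_Q \subset Q, \qquad
  \int_Q b^i_Q \, d\mu = \mu(Q), \qquad
  \| b^i_Q \|_{L^\infty(\mu)} \le C, \qquad
  \| T_i b^i_Q \|_{L^\infty(\mu)} \le C .
\]
% Sketch of an accretive L^2 system for a doubling measure \nu
% (assumed form), with a fixed exponent s > 2 on the operator side:
\[
  \operatorname{spt} b^i_Q \subset Q, \qquad
  \int_Q b^i_Q \, d\nu = \nu(Q), \qquad
  \int_Q |b^i_Q|^2 \, d\nu \le C\,\nu(Q), \qquad
  \int_Q |T_i b^i_Q|^{s} \, d\nu \le C\,\nu(Q) .
\]
```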
We now formulate our main theorem.
2.1. Theorem. Let $\mu$ be an upper doubling measure with a dominating function $\lambda$ and $T\colon L^2(\mu) \to L^2(\mu)$ a Calderón-Zygmund operator with a standard kernel $K$. Assuming the existence of accretive $L^\infty$ systems $(b^1_Q)$ and $(b^2_Q)$, we have $\|T\| \le C$, where $C$ depends on the dimension $n$ and on the explicit constants in the definitions of $\lambda$, $K$, $(b^1_Q)$ and $(b^2_Q)$. If $\mu = \nu$ for some doubling measure $\nu$, then the same conclusion holds assuming only the existence of accretive $L^2$ systems $(b^1_Q)$ and $(b^2_Q)$.
The rest of this paper contains a direct proof which simultaneously gives the theorem with either set of assumptions. In particular, the proof is neither a reduction to a local $T1$ theorem, nor to a perfect dyadic case.
Notation-wise, we use $\mu$ as long as everything works with either set of assumptions, and sometimes write $\mu = \nu$ when we explicitly estimate differently in the doubling $L^2$ case. We write $X \lesssim Y$ to mean $X \le CY$ with some constant $C$ like in the theorem. Also, $X \sim Y$ means $Y \lesssim X \lesssim Y$. Sometimes we absorb other parameters, but then it is either explicitly said or written in the notation (e.g. $X \lesssim_\delta Y$ would mean $X \le C(\delta)Y$).

2.2. Remark. While the second part of the theorem concerning the $L^2$ test function case is not new, the proof is. Certainly some new ideas are still needed to establish the theorem with general $p$ and $q$. However, the point is not solely in the range of exponents. For example, we point out that the non-homogeneous proof technique completely avoids the use of the so-called Hardy type inequalities used and studied in [AR10].

PRELIMINARIES
We begin by recording the following basic facts. Let a dyadic system $\mathcal{D}$ be given. The side length of a cube $Q \in \mathcal{D}$ is denoted $\ell(Q)$, and $Q^{(j)}$ denotes the unique cube $S \in \mathcal{D}$ for which $Q \subset S$ and $\ell(S) = 2^j \ell(Q)$. We also set [...] for every $Q \in \mathcal{D}$. The condition is called the Carleson (measure) condition.
The following is the famous Carleson embedding theorem.
3.2. Theorem. Given a Carleson sequence $(a_Q)$ there holds for any $f \in L^2(\mu)$ that [...].
The following is called the unweighted square function estimate.

3.3. Theorem. There holds for any $f \in L^2(\mu)$ that [...].
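The displays belonging to the Carleson condition and to the two theorems above are missing from this copy. For the reader's convenience, the standard dyadic forms of these statements are sketched below. The notation $\langle f \rangle_Q = \mu(Q)^{-1}\int_Q f\,d\mu$ for averages and $\mathrm{ch}(Q)$ for the dyadic children of $Q$ is assumed, and the exact normalization in the paper may differ.

```latex
% Carleson (measure) condition for a sequence (a_Q)_{Q \in \mathcal{D}}
% of non-negative numbers (standard form, assumed):
\[
  \sum_{Q \in \mathcal{D},\; Q \subset R} a_Q \le C \mu(R)
  \qquad \text{for every } R \in \mathcal{D}.
\]
% Carleson embedding theorem (standard form):
\[
  \sum_{Q \in \mathcal{D}} a_Q \, |\langle f \rangle_Q|^2
  \lesssim \| f \|_{L^2(\mu)}^2 .
\]
% Unweighted square function estimate (standard form, no b functions):
\[
  \sum_{Q \in \mathcal{D}} \; \sum_{Q' \in \mathrm{ch}(Q)}
  |\langle f \rangle_{Q'} - \langle f \rangle_Q|^2 \, \mu(Q')
  \lesssim \| f \|_{L^2(\mu)}^2 .
\]
```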

3.A. Stopping times and the martingale difference operators $\Delta_Q$. Let $\mathcal{D}$ be a dyadic system of cubes, and let $Q_0 \in \mathcal{D}$ be a fixed large cube.
One easily checks that for some τ < 1.
Next, one fixes a cube Q k 1 and considers all the maximal D-cubes Q ⊂ Q k 1 for which there holds One does this for every Q k 1 ∈ D 1 , and then the resulting collection of cubes is called D 2 = {Q k 2 } k . One proceeds like this to obtain collections D j for every j. Of course, we have the property that for every Q ∈ D j there holds Fixing δ to be small enough, one easily checks that for some τ < 1. This is then continued just like in the L ∞ case.

3.A.3. Martingale difference operators.
For every Q ⊂ Q 0 we let Q a be the smallest cube in the family D j containing Q. Note that if Q ⊂ Q 0 is such that Q a ∈ D t , there holds for every j ≥ 1 that We state a very useful (but immediate) consequence of this as a lemma.
3.4. Lemma. The following is a Carleson sequence: $\alpha_Q = 0$ if $Q$ is not from $\bigcup_j \mathcal{D}_j$, and it equals $\mu(Q)$ otherwise.
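The lemma is a geometric series consequence of the decay of the stopping generations. The sketch below assumes that the missing display preceding the lemma is a packing estimate of the form $\mu\big(\bigcup_{S \in \mathcal{D}_{t+j},\, S \subset Q} S\big) \le C\tau^{j}\mu(Q)$ for $j \ge 1$, with $\tau < 1$; the constant $C$ and the exact power of $\tau$ are assumptions.

```latex
% Let Q \subset Q_0 with Q^a \in \mathcal{D}_t. A stopping cube S \subset Q
% is either Q itself (when Q = Q^a) or belongs to \mathcal{D}_{t+j}
% for some j \ge 1. The cubes of a fixed generation are disjoint, so
\[
  \sum_{\substack{S \subset Q \\ S \in \bigcup_j \mathcal{D}_j}} \mu(S)
  \le \mu(Q) + \sum_{j \ge 1}
      \mu\Big( \bigcup_{\substack{S \in \mathcal{D}_{t+j} \\ S \subset Q}} S \Big)
  \le \mu(Q) \Big( 1 + C \sum_{j \ge 1} \tau^{j} \Big)
  = \Big( 1 + \frac{C\tau}{1-\tau} \Big) \mu(Q),
\]
% which is the Carleson condition for (\alpha_Q).
```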
Given a cube $Q$, let $\mathrm{ch}(Q)$ consist of those cubes $Q' \subset Q$ for which $\ell(Q') = \ell(Q)/2$. Let $f$ be a function supported on $Q_0$. We define [...]. Note that then we have [...]. Also set [...]. [...] holds both pointwise almost everywhere and in $L^2(\mu)$.
where naturally $E_k h = \sum_{Q \in \mathcal{D}_k} \chi_Q \langle h \rangle_Q$. It is immediate to see that the right hand side of (3.6), for a fixed $k$, is precisely $E^{a,1}_k f$. It follows from the stopping time construction that almost every $x \in Q_0$ belongs to only finitely many stopping cubes $P \in \bigcup_{t=0}^{\infty} \mathcal{D}_t$. If $S$ is the smallest of them, then $Q^a = S$ for all $Q \ni x$ with $\ell(Q) = 2^{-k} \le \ell(S)$. Thus $E^{a,1}_k f \to f$ pointwise almost everywhere. In the case of accretive $L^\infty$ systems, the $L^2(\mu)$ convergence is immediate from dominated convergence, since $|E^{a,1}_k f| \lesssim Mf$, where $M$ is the dyadic maximal operator. It remains to prove that $E^{a,1}_k f \to f$ in $L^2(\nu)$ in the case of accretive $L^2$ systems. Note that $\|E^{a,1}_k f\|_2 \lesssim \|f\|_2$. Thus, it suffices to prove the convergence for a given bounded function $f$. As the convergence is in any case fine in the pointwise almost everywhere sense, we just need to find a suitable square integrable majorant. And we have [...] by Lemma 3.4.
We are usually given two dyadic systems $\mathcal{D}$ and $\mathcal{D}'$. Then we use operators $\Delta_Q$ constructed using $(b^1_Q)$ in connection with the family $\mathcal{D}$ and operators $\Delta_R$ constructed using $(b^2_R)$ in connection with the family $\mathcal{D}'$ (in the $L^2$ case the stopping time for the latter also uses $T^*$ instead of $T$, of course). It would perhaps be better to write $\Delta^1_Q$ and $\Delta^2_R$ to indicate the difference (as we have done above for some operators that we need not use so frequently), but we omit this for brevity. It should nevertheless be clear from the various summing conditions like $Q \in \mathcal{D}$ and $R \in \mathcal{D}'$.
3.B. Square function estimates. With accretive $L^\infty$ systems, the estimates are quite clear (see [NTV02, Chapter 3]). So in the rest of this subsection, we work with a doubling measure $\nu$ and $L^2$ type test functions (the second estimate is actually, perhaps surprisingly, generally false in this setting).

3.7. Lemma. The sequence [...] is Carleson.
Proof. Let $Q \in \mathcal{D}$ be such that $Q^a \in \mathcal{D}_t$. We simply write [...], as follows by the unweighted square function estimate (Theorem 3.3) and Lemma 3.4.

3.8. Proposition. There holds [...].
Furthermore, there holds (as $\nu$ is doubling) that [...]. Here we used Lemma 3.4 to bound the first term by $\|f\|_2^2$ (the bound for the second term follows from the unweighted square function estimate, Theorem 3.3).
Next, note that [...]. The latter term is yet again bounded by $\|f\|_2^2$ by the unweighted square function estimate (Theorem 3.3), and the first one is, too, bounded by $\|f\|_2^2$ by the previous lemma.
The following example is a bit disconcerting. After all, we want to work with accretive L 2 systems of functions, and the failure of such a fundamental estimate seems like a real predicament. A weaker, but sufficient for us, substitute result is offered afterwards.
3.9. Example. The estimate [...] is not, in general, true for accretive $L^2$ systems.
Proof. Consider the one-dimensional situation with $Q_0 = [0,1)$ and $N \in \mathbb{Z}_+$ a fixed but arbitrary parameter. We construct a sequence of examples, where the constant in the dual square function estimate grows without limit as a function of $N$. Let [...] and $b_Q := \chi_Q$ for all other dyadic intervals $Q$. They satisfy $|Q|^{-1} \int_Q |b_Q|^2\,dx \le 2$, and the accretivity of these functions is not an issue; however, the normalized $L^2$ norm of $b_{[0,2^{-j})}$ on $[0,2^{-k})$ will increase as $k$ increases. With a suitable choice of the stopping parameters, it follows that the stopping cubes are precisely all the $Q_j := [0,2^{-j})$, $j = 0, 1, \dots, N$. In particular, [...]. We apply this to the function [...], yielding, by a simple computation, [...]. Since $\|\chi_{Q_j}\|_2^2 = 2^{-j}$ and $\|f\|_2 = 1$, it follows that [...], and this proves the impossibility of the dual square function estimate.
The following weaker estimate is, however, true and still useful.
3.10. Proposition. For general accretive $L^2$ systems, there holds [...].
Let $t$ be such that $P \in \mathcal{D}_t$. Note that [...] and (as $\nu$ is doubling) that [...]. Here we used the fact that the cubes $Q' \in \mathcal{D}_{t+1}$ are disjoint. For the rest of the terms, we write [...]. Recalling Lemma 3.7 and the unweighted square function estimate, Theorem 3.3, we have that [...] is dominated by [...], where the last estimate follows since on $P \setminus \bigcup \mathcal{D}_{t+1}$ we have $L^\infty$-control of $b^1_P$ by Lebesgue's differentiation theorem, and [...].
3.11. Remark. The stronger estimate [...] is true if our test functions satisfy $\int_Q |b^1_Q|^q\,d\nu \lesssim \nu(Q)$ for some $q > 2$ (and the stopping time argument is modified to use this condition, of course). The point is that then one can cope with summing over the multiple generations of $\mathcal{D}_j$ because of the better estimate [...] (the Hardy-Littlewood maximal function is then bounded on $L^{2/p}$).

RANDOM DYADIC CUBES AND THE DECOMPOSITION OF THE PAIRING T f, g
Start by fixing once and for all two compactly supported functions $f$ and $g$ so that $\|f\|_2 = \|g\|_2 = 1$ and $\|T\|/2 \le |\langle Tf, g \rangle|$. We choose a big enough integer $N$ so that spt $f$, spt $g \subset B(0, 2^{N-3})$. Consider two independent random squares [...]. The cubes $Q_0$ and $R_0$ are taken to be the starting cubes of the independent grids $\mathcal{D}$ and $\mathcal{D}'$ (only the cubes inside $Q_0$ and $R_0$ matter). Of course, the probability measure in question is the normalized Lebesgue measure on the [...], where $\alpha$ is the number from the kernel estimates and $d = \log_2 C_\lambda$. The number $r$ is fixed to be large enough (this is quantified later).
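The definition of bad cubes, to which the parameters $\alpha$, $d$ and $r$ refer, is missing from this copy. The sketch below records the definition as it is commonly made in arguments of this type; the exact value of $\gamma$ and the form of the condition are assumptions, chosen to be consistent with the summing conditions and with the bound $d(Q, \partial R_1) \gtrsim \ell(Q)^{1/2}\ell(R)^{1/2}$ used for good cubes later in the text.

```latex
% Assumed definition of badness (illustrative; not displayed in the text).
\[
  \gamma = \frac{\alpha}{2(\alpha + d)}, \qquad d = \log_2 C_\lambda .
\]
% A cube Q \in \mathcal{D} is bad if there exists R \in \mathcal{D}' with
\[
  \ell(R) \ge 2^{r} \ell(Q)
  \quad \text{and} \quad
  d(Q, \partial R) \le \ell(Q)^{\gamma} \ell(R)^{1-\gamma} ;
\]
% otherwise Q is good. Since \gamma \le 1/2 and \ell(Q) \le \ell(R),
\[
  \ell(Q)^{\gamma} \ell(R)^{1-\gamma}
  = \ell(R) \Big( \frac{\ell(Q)}{\ell(R)} \Big)^{\gamma}
  \ge \ell(R) \Big( \frac{\ell(Q)}{\ell(R)} \Big)^{1/2}
  = \ell(Q)^{1/2} \ell(R)^{1/2},
\]
% so a good cube Q satisfies d(Q, \partial R) > \ell(Q)^{1/2} \ell(R)^{1/2}
% for every R \in \mathcal{D}' with \ell(R) \ge 2^r \ell(Q).
```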
We shall use the badness morally along the same lines as it is usually used [NTV02, NTV03]; the details are somewhat different, however. There are various reasons for this, and we shall carefully elaborate on those after performing the decomposition, since this seems to us like a genuine source of trouble.
We define $\sum^k_{Q \in \mathcal{D}} = \sum_{Q \in \mathcal{D},\ \ell(Q) > 2^{-k}}$. Using the facts that $E^{a,1}_k f \to f$ in $L^2$ and $\|E^{a,1}_k f\|_2 \lesssim \|f\|_2$ combined with dominated convergence (in the probability space) we see that [...], where $E$ is the expectation over the random grids $\mathcal{D}$ and $\mathcal{D}'$; sometimes we will explicitly write it as [...]. The pairing on the right hand side can be written in the form [...]. Note that spt $E^{a,2}_k g \subset Q_0$ for all sufficiently large $k$. Thus, one can bound [...]. There seems to be no such equally cheap way to further bound the pairing [...]. However, this can be controlled using a much simplified version of the arguments we shall use in Section 7 concerning adjacent cubes of comparable size in the main series $\sum^k_{R \in \mathcal{D}'} \sum^k_{Q \in \mathcal{D}} \langle T(\Delta_Q f), \Delta_R g \rangle$. We give the details at the end of that section. Therefore, one is (remembering the above remark) reduced to estimating [...] with a bound independent of $k$. The summation after the expectation is finite, and thus all the rearrangements one could want to make are legitimate. In the sequel, the index $k = k_0$ is fixed, and we no longer make any reference to it in the notation. (The symbol $k$ will then be free for other uses.) We continue to write the summation [...]. We denote the corresponding parts of the sum by $\Sigma_i$, $i = 1, 2, 3$. Goodness will be separately inserted only in the middle sum $\Sigma_2$. We shall now study these sums one by one in the following sections (using both sets of assumptions). Note that the sum over $\ell(R) < \ell(Q)$ will then also be in check by the symmetry of our assumptions.
4.1. Remark. We now give a few technical comments to compare our strategy with previous works based on the use of random dyadic grids. One can safely ignore these, especially if one is not too familiar with non-homogeneous analysis.
It is natural (if one follows the beautiful strategy pioneered by Nazarov, Treil and Volberg in their deep papers [NTV97], [NTV03], [NTV02] and some others) to define [...] and then write $f = f_{\mathrm{good}} + f_{\mathrm{bad}}$. One does the same for $g$, but using the grid $\mathcal{D}'$ and operators $\Delta_R$. Then one decomposes $\langle Tf, g \rangle = \langle Tf_{\mathrm{good}}, g_{\mathrm{good}} \rangle + \langle Tf_{\mathrm{good}}, g_{\mathrm{bad}} \rangle + \langle Tf_{\mathrm{bad}}, g \rangle$. One usually wants to reduce the considerations to the pairing $\langle Tf_{\mathrm{good}}, g_{\mathrm{good}} \rangle$ by arguing that the bad parts are small. However, getting a hold of this smallness would typically exploit the dual square function estimate, the failure of which we already saw in our general context of accretive $L^2$ systems (see Example 3.9). However, with a moderate amount of work and a certain trick we managed to show (also in the $L^2$ case) that, after all, $E\|f_{\mathrm{bad}}\|_2 \lesssim c(r)\|f\|_2$, where $c(r) \to 0$ when $r \to \infty$. So this reduction could, nevertheless, always be made.
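The decomposition discussed in this remark is usually set up as in the following sketch. The definition of $f_{\mathrm{good}}$ as a sum over good cubes is the customary one and is an assumption here, since the corresponding display is missing; the collections $\mathcal{D}_{\mathrm{good}}$ and $\mathcal{D}_{\mathrm{bad}}$ denote the good and bad cubes of $\mathcal{D}$, and lower order averaging terms in the expansion of $f$ are ignored.

```latex
% Customary good/bad splitting (assumed form):
\[
  f_{\mathrm{good}} = \sum_{Q \in \mathcal{D}_{\mathrm{good}}} \Delta_Q f,
  \qquad
  f_{\mathrm{bad}} = \sum_{Q \in \mathcal{D}_{\mathrm{bad}}} \Delta_Q f,
  \qquad
  f = f_{\mathrm{good}} + f_{\mathrm{bad}} .
\]
% The cross terms are then controlled by the operator norm:
\[
  |\langle T f_{\mathrm{good}}, g_{\mathrm{bad}} \rangle|
  + |\langle T f_{\mathrm{bad}}, g \rangle|
  \le \| T \| \big( \| f_{\mathrm{good}} \|_2 \, \| g_{\mathrm{bad}} \|_2
  + \| f_{\mathrm{bad}} \|_2 \, \| g \|_2 \big),
\]
% so a bound E \| f_{\mathrm{bad}} \|_2 \lesssim c(r) \| f \|_2 with
% c(r) \to 0, and its analogue for g, make these terms a small multiple
% of \| T \| after taking expectations, which can be absorbed.
```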
Here comes another unfortunate snag: in our local situation even the good part, as defined above, seems not so good after all. Let us explain. In the global $Tb$ theorems there holds $\Delta_Q f_{\mathrm{good}} = \Delta_Q f$, if $Q \in \mathcal{D}_{\mathrm{good}}$, and $\Delta_Q f_{\mathrm{good}} = 0$, if $Q \in \mathcal{D}_{\mathrm{bad}}$. However, there is no reason for this to be true in this local situation with the more complicated operators $\Delta_Q$, which in general fail the pairwise orthogonality $\Delta_Q \Delta_R = 0$ for $Q \ne R$. This means that in the pairing one cannot remove any goodness from the summation, which one can in the global situation, if one replaces $\Delta_Q f = \Delta_Q f_{\mathrm{good}}$ (and similarly for $g$), and then notes that adding some bad cubes to the sum just amounts to adding zeroes. One works hard to add the restriction to good cubes only, so why would one need to remove some of it? The answer is that in the paraproduct part of the argument there is a subtle phenomenon, where it is essential that the bigger cube has no restrictions for a certain telescoping sum to collapse. If the bigger cubes are restricted to be good, the sum does not collapse, and the resulting object seems to be way too complicated to handle. This is the reason why we choose to modify this earlier strategy, and insert the goodness in a different way. However, the paraproduct still does not become quite as simple as usual, and it is basically for this reason that in the $L^2$ test function case we need the stronger integrability exponent $s > 2$ on the operator side.
There are subtle tricks which depend on independence to add and remove goodness, see [Hyt09], [Hyt10b] and [Mar10]. These cannot be used here either, and this is basically because ∆ Q depends not only on the cube Q and its children (like in the global T b theorems), but also, through the stopping time argument, on the whole grid D (and this stops one from using certain independence properties).
The fact that the corresponding matrix generates a bounded operator in $\ell^2$ is the content of the next proposition (this is [HM09, Proposition 6.3]).
The long range interaction lemma will still have further use to us when dealing with the sum Σ 2 in the next section.

all the cubes of the size of R or larger
T (∆ Q f ), ∆ R g .

6.A. The disposal of the bad part $\Sigma_{2,\mathrm{bad}}$. Define $\mathcal{D}_{\mathrm{bad},A}$ to be the collection of those cubes $Q \in \mathcal{D}$ which are bad with respect to some $\mathcal{D}'$-cube of side length $A$ or larger. We do not always explicitly write the summing conditions $\ell(Q) \le 2^{-r}\ell(R)$ and $d(Q,R) \le 2n^{1/2}\ell(Q)^{\gamma}\ell(R)^{1-\gamma}$, but these are in force, nevertheless. We then estimate as follows: [...], where the last estimate used Proposition 3.8 and the fact that given $Q$, there are $\lesssim 1$ cubes $R$ so that $\ell(R) = 2^k\ell(Q)$ and $d(Q,R) \le 2n^{1/2}\ell(Q)^{\gamma}\ell(R)^{1-\gamma}$. Thus, we have (here the expectation [...]) [...], where $c(r) \to 0$ when $r \to \infty$ (recall $\|f\|_2 = \|g\|_2 = 1$). We now fix a large $r$ so that $E|\Sigma_{2,\mathrm{bad}}| \le \|T\|/16$. We are done with the bad part.
6.B.1. Case $(R_1)^a = R^a$. We begin by assuming that $(R_1)^a = R^a$. In this case [...]. One may then perform the usual decomposition [...]. The last term, where $\chi_{R \setminus R_1} = \sum_{i=2}^{2^n} \chi_{R_i}$, can be readily estimated using the long range interaction lemma: [...]. The corresponding matrix is a bounded operator in $\ell^2$ by [NTV02, Lemma 6.1] (this is a lemma which uses no special properties of the measure). The first term will be part of the soon to be formed paraproduct.
Let us now bound the term [...]. Let us first bound this in the easier case of the $L^\infty$ test functions. We have that [...], where we used [HM09, Lemma 2.4] and the fact that $d(Q, \partial R_1) \gtrsim \ell(Q)^{1/2}\ell(R)^{1/2}$. Let us now establish the same bound in the case of $L^2$ test functions (we do not even need a doubling measure for this, so this gives another proof of the above estimate too). Here we need to use the fact that $Q$ is good with respect to $R$ and all the bigger cubes. Let $M$ be such that $(R_1)^{(M+1)} = R^a$. We have [...]. There holds (since $\gamma(\alpha$ [...]) [...], and so using the fact that $\int_{(R_1)^{(j+1)}} |b^2_{R^a}|\,d\mu \lesssim \mu((R_1)^{(j+1)})$, we have [...], and this is known to be acceptable (see again [NTV02, Lemma 6.1]).
6.B.2. Case $(R_1)^a = R_1$. We then assume that $(R_1)^a = R_1$. In this case we write $B_{R_1} = \langle g \rangle_{R_1} / \langle b^2_{R_1} \rangle_{R_1}$ and $C_R = \langle g \rangle_R / \langle b^2_{R^a} \rangle_R$, and then decompose as follows: [...]. The last term, being identical to the last term in (6.1), is again handled using the long range interaction lemma. The next to last term is also estimated as above, except that this time we have $|C_R| \lesssim |\langle g \rangle_R|$, so we get [...]. This is again fine by [NTV02, Lemma 6.1], since [...] by Carleson's embedding theorem. The first two terms in (6.2) will be part of the paraproduct.
6.C. The paraproduct and its boundedness. Let $\mathcal{D}_{\mathrm{good},k}$ be the collection of those $Q \in \mathcal{D}$ which are good with respect to all $\mathcal{D}'$-cubes of side length $2^k\ell(Q)$ and larger. If $Q \in \bigcup_{k \ge r} \mathcal{D}_{\mathrm{good},k}$, let $\alpha(Q)$ be the smallest index $k$ so that $Q \in \mathcal{D}_{\mathrm{good},k}$. So collecting the terms that we did not yet estimate in (6.1) and (6.2), we see that we need to bound [...]. Note that there is a unique $R$ of each side length in the inner sum, the one with $R \supset Q$. In the above summation, let $S(Q) \in \mathcal{D}'$ be $R_1$, when $\ell(R) = 2^{\alpha(Q)}\ell(Q)$. Then bringing the $R$ summation inside the pairing, we see that the sum collapses to [...]. We write this in the form [...]. So we were able to collapse the sum because we introduced the goodness in a more restricted way than is usually done (see Remark 4.1). But the result is somewhat different from the usual paraproducts, since $S(Q)$ can be arbitrarily larger than $Q$. At this stage we bring the absolute values inside the summations. We may then consider the following, somewhat more general, situation. Let us be given a collection $\mathcal{F} \subset \mathcal{D}$ so that to every cube $Q \in \mathcal{F}$ there is associated a unique cube $F(Q) \in \mathcal{D}'$ for which there holds $Q \subset F(Q)$. The rest of this section is concerned with proving that [...]. We begin by recalling from [NTV02, p. 271] that [...]. It follows that always [...]. This can be further bounded by $|\langle f \rangle_Q|\,\mu(P)^{1/2}$ (in the $L^2$ case, the doubling property is needed here). We estimate [...]. There holds that [...]. We are reduced to showing that $(a_R)$ and $(b_R)$ form Carleson sequences (both in the $L^\infty$ test function and in the $L^2$ test function case).

6.3. Lemma. The sequence [...] is Carleson.
Proof. Consider an arbitrary $R \in \mathcal{D}'$. We write [...]. We are reduced to showing that for an arbitrary $H \in \mathcal{D}'$ there holds [...]. We estimate as follows: [...], where $M(H)$ consists of the maximal $Q \in \mathcal{D}$ for which $Q \subset H$. Now the claim is very easy in the $L^\infty$ case. Just use the dual square function estimate and the fact that [...]. We are thus reduced to the case $\mu = \nu$ with $L^2$ test functions. For a given $Q \in M(H)$ we estimate using Proposition 3.10 that [...].
6.4. Lemma. The sequence [...] is Carleson.
Proof. As in the proof of the previous lemma, this reduces to showing that for an arbitrary $H \in \mathcal{D}'$ there holds [...]. Letting $M(H)$ consist of the maximal $Q \in \mathcal{D}$ for which $Q \subset H$, we have [...]. The $L^\infty$ case is again clear from this (recalling Lemma 3.4). Otherwise, we have as in the proof of the previous lemma that [...], where $p = s/2 > 1$.
6.5. Remark. The proofs of the previous two lemmata are the only places of the paper where we use, in the case of accretive $L^2$ systems, the stronger integrability exponent $s > 2$ on the operator side. The lemmata are true with $s = 2$, if one always has $\nu(F(Q)) \lesssim \nu(Q)$. Unfortunately, if $F(Q) = S(Q)$, as in the proof of the main theorem, then this does not have to be the case. It does not seem to be easy to arrange the collapse of the paraproduct in such a way that $S(Q)$ would be, say, always precisely $r$ generations larger than $Q$ (and still know how to estimate the bad part to be small).
The above two lemmata end our proof of the boundedness of the paraproduct. Recalling that $\Sigma_2 = \Sigma_{2,\mathrm{good}} + \Sigma_{2,\mathrm{bad}}$, $E|\Sigma_{2,\mathrm{bad}}| \le \|T\|/16$, and that $\Sigma_{2,\mathrm{good}}$ decomposes into the paraproduct and some other terms, all of which we have shown to be bounded, we have established the following proposition.

ADJACENT CUBES OF COMPARABLE SIZE
We shall sum over those $Q \in \mathcal{D}$, $R \in \mathcal{D}'$ for which $2^{-r}\ell(R) < \ell(Q) \le \ell(R)$ and $d(Q,R) \le 2n^{1/2}\ell(Q)^{\gamma}\ell(R)^{1-\gamma}$. For a given $R$, there are only boundedly many such $Q$. Thus, this reduces to considering a finite number of subseries [...], where $Q = Q(R)$. Moreover, one may assume that $R \mapsto Q(R)$ is invertible.
There holds [...]. If $Q \in \mathcal{D}_k$, one can write [...]. Here we interpret [...]. Hence, we can dominate our series with nine summands of the form [...], and the summands are determined by the choices [...] and analogous choices for $g$. Observe that in each case we have [...] by the construction of the stopping time, using doubling in the case of accretive $L^2$ systems.
We fix the parameters $i$, $j$ now.
[...] where $\widetilde{\Delta}_Q$ is the operator $\Delta_Q$ without the multiplying $b$ functions: [...] is shown in the same way as Proposition 3.8, so we are done.
7.A. Surgery. We then begin the delicate surgery part of the argument. This is done a bit differently than in [NTV02] (e.g. the concept of badly intersected cubes is not needed). Also, the $L^2$ test function case needs several modifications.
To handle the various separated terms that we shall encounter in a unified manner, estimates in the spirit of the following lemma are useful (this is a small modification of [HM09, Lemma 9.3]).
7.2. Lemma. Let $S_1$ and $S_2$ be two sets so that we have $d(S_1) \sim d(S_2)$ and $d(S_1, S_2) \gtrsim \delta \min(d(S_1), d(S_2))$. Suppose we are also given functions $\varphi$ and $\psi$ supported on $S_1$ and $S_2$ respectively. Then there holds that [...].
[...], and define the analogous sets also for $R$. (The subscript $s$ refers to separation from $R_j$.) Of course, e.g. $Q_{i,\partial}$ depends also on $j$, but the dependence is suppressed, as $j$ is considered fixed here, in any case. We may then decompose [...].

7.A.1. Arguments involving $\eta$-boundary regions. We always have that $\|\chi_{Q_i}\varphi_{Q,i}\|_2 \lesssim \mu(Q_i)^{1/2}$. Thus, by separation, [...]. The relevant series with these matrix elements are then bounded by [...]. Next, we have [...]. Thus, there holds [...], where $c(\eta) \to 0$ if $\eta \to 0$. A similar estimate holds also with the matrix element [...]. We are left to deal with $\langle T(\chi_{\Delta_{Q_i}}\varphi_{Q,i}), \chi_{\Delta_{R_j}}\psi_{R,j} \rangle$. Choose $j(\eta) \in \mathbb{Z}$ so that $\eta/64 \le 2^{j(\eta)} < \eta/32$. Let $\mathcal{D}^*$ be another independent grid (e.g. choose a large cube $U_0$ at random so that $Q_0 \cup R_0 \subset U_0$ always, and use that as the starting cube of the grid $\mathcal{D}^*$). Let $s = 2^{j(\eta)}\ell(Q_i)$ and $G = G(R) = \mathcal{D}^*_{-\log_2 s}$. We enlarge the sets $\Delta_{Q_i}$ and $\Delta_{R_j}$ to obtain new sets $\Delta^G_{Q_i}$ [...]. The series which has the sum of the last two terms as its matrix element is, after averaging, dominated by $c(\eta)\|T\|$ by the very same argument used above. We fix at this point $\eta$ to be so small that the above four $\eta$-boundary region terms contribute no more than $c(\eta)C\|T\| < \|T\|/32$.
7.A.2. Arguments involving $\varepsilon$-boundary regions. We are reduced to consider the pairing $\langle T(\chi_{\Delta^G_{Q_i}}\varphi_{Q,i}), \chi_{\Delta^G_{R_j}}\psi_{R,j} \rangle$. Let $\varepsilon > 0$, and set $G_\varepsilon = G_\varepsilon(R) = \bigcup_{g \in G} \delta^\varepsilon_g$, where $\delta^\varepsilon_g = (1+\varepsilon)g \setminus (1-\varepsilon)g$. We define $\Delta'_{Q_i} = \Delta^G_{Q_i} \cap G_\varepsilon$ and $\widetilde{\Delta}_{Q_i} = \Delta^G_{Q_i} \setminus G_\varepsilon$ (and similarly for $R$). We then write [...] $\|T\|\mu(Q_i)^{1/2}\|\chi_{G_\varepsilon}\chi_{R_j}\psi_{R,j}\|_2$. Again, we have $E_{\mathcal{D}^*}\chi_{G_\varepsilon}(x) \le c(\varepsilon)$, where $c(\varepsilon) \to 0$ when $\varepsilon \to 0$. Therefore, the series which has the sum of the last two terms as its matrix element is, after averaging, dominated by $c(\varepsilon)\|T\|$.
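The smallness of the expected value of $\chi_{G_\varepsilon}(x)$ can be made explicit in a simplified model. The computation below is a sketch under assumptions that are ours and not the paper's: the cubes $g \in G$ have side length $s$, the dilations $(1 \pm \varepsilon)g$ are concentric with $g$, $0 < \varepsilon < 1$, and in each coordinate the offset $t_i$ of the grid is uniformly distributed over one period of length $s$.

```latex
% If x \in \delta^{\varepsilon}_g = (1+\varepsilon)g \setminus (1-\varepsilon)g
% for some g \in G, then some coordinate x_i lies within \varepsilon s/2
% of a face of g, that is, of the set t_i + s\mathbb{Z} of face positions
% in the i-th coordinate. By the union bound,
\[
  \mathbb{P}\big( x \in G_\varepsilon \big)
  \le \sum_{i=1}^{n}
      \mathbb{P}\big( \operatorname{dist}(x_i,\, t_i + s\mathbb{Z})
                      \le \varepsilon s/2 \big)
  = \sum_{i=1}^{n} \frac{\varepsilon s}{s}
  = n\varepsilon .
\]
% Under these assumptions one may therefore take
% c(\varepsilon) = n\varepsilon, which tends to 0 as \varepsilon \to 0.
```

The randomization actually used in the text (a random starting cube $U_0$) is different in detail, so this only indicates why a bound of the form $c(\varepsilon) \to 0$ is to be expected.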
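We are now left with $\langle T(\chi_{\widetilde{\Delta}_{Q_i}}\varphi_{Q,i}), \chi_{\widetilde{\Delta}_{R_j}}\psi_{R,j} \rangle$. It suffices to consider pairings $\langle T(\chi_{g_1}\chi_{\widetilde{\Delta}_{Q_i}}\varphi_{Q,i}), \chi_{g_2}\chi_{\widetilde{\Delta}_{R_j}}\psi_{R,j} \rangle$ for $g_1, g_2 \in G$, as there are only boundedly many (depending on $\eta$, but this is fixed) cubes in $G$ which matter. Suppose first that $g_1 \ne g_2$. Then, because of separation, $|\langle T(\chi_{g_1}\chi_{\widetilde{\Delta}_{Q_i}}\varphi_{Q,i}), \chi_{g_2}\chi_{\widetilde{\Delta}_{R_j}}\psi_{R,j} \rangle| \lesssim_\varepsilon \mu(Q_i)^{1/2}\mu(R_j)^{1/2}$. This implies, like above, that the relevant series with this matrix element is dominated by $C(\varepsilon)\|f\|_2\|g\|_2 = C(\varepsilon)$.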

Lemma. We have
Proof. There holds [...]. We have (using $\frac{1}{\mu(H)}\int_H b^2_H\,d\mu = 1$) that [...]. Furthermore, we have for $x, y \in H$ that $|\tau(x) - \tau(y)| \le$ [...]. We conclude that [...], where the last estimate follows by noting that in the $L^\infty$ case $|M_\mu\varphi_{Q,i}| \le \|\varphi_{Q,i}\|_\infty \lesssim 1$, and that in the $L^2$ case this also works out by the stopping time and the doubling property of the measure.