A T(b) Theorem on Product Spaces

The main result of this paper is a bi-parameter T(b) theorem for the case where b is a tensor product of two pseudo-accretive functions. In the proof, we also discuss the L 2 boundedness of several types of b-adapted bi-parameter paraproducts.


INTRODUCTION
The study of T (1)/T (b) type theorems in the multi-parameter setting dates back to 1985, when Journé [14] proved the first multi-parameter T (1) theorem by treating the singular integral operator as a vector-valued one-parameter operator. The result itself is very elegant, except that some partial boundedness of the operator needs to be assumed. More recently, Pott and Villarroya [20] proved a new bi-parameter T (1) theorem with much weaker assumptions on the operator, formulating different types of mixed conditions instead of assuming partial boundedness. This is the point of view taken by Martikainen in [16], where he proved a representation theorem for bi-parameter singular integral operators which then implies a T (1) result, and in his joint work with Hytönen [12], where they showed a bi-parameter T (1) theorem in spaces of non-homogeneous type. In this paper, for the first time, we prove a T (b) theorem in product spaces, which is a natural extension of the work mentioned above.
1.1. Definition. A function b ∈ L ∞ (R n × R m ) is called pseudo-accretive if there is a constant C such that for any rectangle R in R n × R m with sides parallel to the axes, (1/|R|) |∫ R b| ≥ C. Throughout this paper we take b = b 1 ⊗ b 2 ; the pseudo-accretivity of b and its tensor structure then imply that there exists a constant C such that for any cubes K ⊂ R n , V ⊂ R m , (1/|K|) |∫ K b 1 | > C and (1/|V |) |∫ V b 2 | > C, i.e. b 1 and b 2 are both pseudo-accretive in the classical sense. Although this may seem too restrictive, it is actually quite natural. Note that b = 1 falls in this class. Moreover, in all of the papers mentioned above, some partial structure on the operator is required in order to treat the mixed problems arising in the bi-parameter setting. In other words, the singular integral operator we are looking at itself behaves like a tensor product in some sense. It is essential in our argument for b to be a tensor product; otherwise, even defining T b would become a problem.
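As a quick sanity check of the tensor-product structure, the following numerical sketch (our own illustration; the factors b 1 , b 2 below are hypothetical examples, not taken from the paper) verifies that the rectangle average of b = b 1 ⊗ b 2 factors into the two interval averages and stays bounded away from 0:

```python
import numpy as np

def interval_avg(f, a, length, n=4000):
    # numerical average (1/|I|) * integral of f over I = [a, a + length)
    x = np.linspace(a, a + length, n, endpoint=False)
    return f(x).mean()

# Example pseudo-accretive factors: real part identically 1, so every
# interval average has real part 1 and hence modulus >= 1 - cross term.
b1 = lambda x: 1 + 0.3j * np.sin(x)
b2 = lambda y: 1 + 0.4j * np.cos(3 * y)

rng = np.random.default_rng(0)
worst = np.inf
for _ in range(200):
    a1, l1 = rng.uniform(-5, 5), rng.uniform(0.1, 4)
    a2, l2 = rng.uniform(-5, 5), rng.uniform(0.1, 4)
    m1 = interval_avg(b1, a1, l1)
    m2 = interval_avg(b2, a2, l2)
    # By Fubini, the average of b1 (x) b2 over the rectangle K x V is m1 * m2.
    worst = min(worst, abs(m1 * m2))

# Here Re(m1 * m2) >= 1 - 0.3 * 0.4 = 0.88, so the modulus never degenerates.
print(round(worst, 3))
```

The lower bound printed above illustrates why each factor of a pseudo-accretive tensor product is itself pseudo-accretive in the classical sense.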
Just as in the situation of the bi-parameter T (1) theorems, we still need to assume that, besides T and T * , the partial adjoints of T also map b to a BMO function, an assumption shown by Journé [14] to be necessary for T to be L 2 bounded. A more detailed discussion can be found in Section 6 of [14].
The main technique of the proof is to decompose L 2 functions into sums of martingale differences adapted to b, analyze each part of the sums, and show that they have good enough decay to be summed up. The advantage of working with martingale differences is that they are supported on dyadic rectangles, constant on each of their children, and mutually orthogonal. Martikainen followed a similar strategy in [16], using Haar functions. However, when we treat b instead of 1, we have to create a bi-parameter b-adapted martingale difference decomposition, which makes the estimate of each part of the sum much less transparent. In the one-parameter setting, the idea of using such b-adapted martingale difference operators is well known and has been discussed by many authors in their proofs of different types of T (b) theorems, such as David, Journé and Semmes [5], Coifman, Jones and Semmes [3], Nazarov, Treil and Volberg [18], and Hytönen and Martikainen [11]. In the bi-parameter case, however, the b-adapted martingale difference decomposition has never been treated before.
The operator T studied in this paper is initially defined as a continuous linear map from bC ∞ 0 (R n × R m ) to its dual. In order to justify the convergence of pairings of martingale differences, we also assume a priori that T is bounded on L 2 (R n × R m ), although we will show that, quantitatively, the operator norm of T is bounded by a constant depending only on the weak assumptions introduced in the following, independently of the assumed L 2 → L 2 norm. Although this a priori assumption can often be removed, it appears as a hypothesis in the proofs of several T (1) theorems: many authors have made this assumption ([16], [12]), even in the one-parameter setting ([18], [9]). It is not a consequence of involving b, but results from the fact that one has an initially continuously defined operator which is then treated dyadically. Thus, we are more interested in showing how those weak assumptions quantitatively control the L 2 → L 2 norm of T . In some specific examples that we will mention later, this a priori assumption can indeed be removed.
The plan of the paper is the following. First, we introduce the assumptions on the operator, as well as the necessary preliminaries on bi-parameter b-adapted martingale differences. Second, before stating and proving the T (b) theorem, we discuss several types of bi-parameter b-adapted paraproducts, which will be used later. Next, we give an averaging formula in the same flavor as in [16], which enables us to use the concept of "goodness" of cubes in our estimates. Then we move on to the main body of the paper, proving the T (b) theorem by a case-by-case estimate of the terms in the averaging formula.

ACKNOWLEDGEMENT
The author would like to thank Jill Pipher for guiding her into this area, suggesting the topic, and numerous fruitful discussions. The author is also grateful to Michael Lacey and Brett Wick for useful discussions during her visit to the Georgia Institute of Technology.

ASSUMPTIONS ON THE OPERATOR
Bi-parameter b-adapted martingale differences. As a preliminary, we begin with a quick introduction of the martingale difference decomposition adapted to our problem.
Let ω n = (ω n i ) i∈Z , where ω n i ∈ {0, 1} n . Let D n 0 be the standard dyadic grid on R n . We define the shifted dyadic grid D n ω n = {I + Σ i: 2 −i <ℓ(I) 2 −i ω n i : I ∈ D n 0 } = {I ∔ ω n : I ∈ D n 0 }, where I ∔ ω n := I + Σ i: 2 −i <ℓ(I) 2 −i ω n i . There is a natural probability structure on ({0, 1} n ) Z , which gives us a random dyadic grid D n ω n in R n . When there is no need to specify ω n , we just write D n for short. Interested readers can find a more detailed discussion of random dyadic grids in [9] or [16].
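In one dimension, the shift I ∔ ω can be sketched as follows (an illustrative toy implementation with a truncation depth of our own choosing, not code from the paper). Only the bits ω i with 2 −i < ℓ(I) move I, and the shifted intervals still form a nested grid:

```python
import random

def shifted_left_endpoint(k, j, omega, depth=40):
    """Left endpoint of the shifted interval for I = [j*2^-k, (j+1)*2^-k).

    Only bits omega[i] with 2^-i < l(I), i.e. i > k, contribute to the
    shift; the infinite sum is truncated at `depth`.
    """
    return j * 2.0 ** (-k) + sum(omega[i] * 2.0 ** (-i) for i in range(k + 1, depth))

random.seed(1)
omega = [random.randint(0, 1) for _ in range(40)]

# The shifted family is still a grid: the left child of the shifted
# generation-k interval at position j is the shifted generation-(k+1)
# interval at position 2j + omega[k+1]; they share a left endpoint.
k, j = 3, 5
parent = shifted_left_endpoint(k, j, omega)
left_child = shifted_left_endpoint(k + 1, 2 * j + omega[k + 1], omega)
assert abs(parent - left_child) < 1e-12
```

Note how the child index is shifted by the bit ω k+1 : the grid property survives the translation even though individual intervals move.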
Given a pseudo-accretive function b = b 1 ⊗ b 2 and two fixed dyadic grids D n , D m in R n , R m , respectively, for each p ∈ Z let D n p be the collection of cubes of side length 2 −p in D n , and define the b 1 -adapted expectation, acting on the first variable, by E b 1 p f := Σ I∈D n p χ I b 1 (∫ I f dy 1 )/(∫ I b 1 ), with E b 1 I := χ I E b 1 p for each I ∈ D n p . Similarly, we have E b 2 q and E b 2 J defined for each q ∈ Z, J ∈ D m . Then their composition E b 1 p E b 2 q is a b-adapted double expectation operator. The b-adapted double martingale difference is defined as ∆ b p,q := (E b 1 p+1 − E b 1 p )(E b 2 q+1 − E b 2 q ), and ∆ b I×J := χ I×J ∆ b p,q for I ∈ D n p , J ∈ D m q . The following properties can be easily checked: (1) ∆ b I×J f is supported on the dyadic rectangle I × J, and is a constant multiple of b on each of its children; (2) ∆ b I×J f has zero mean in each variable; (3) ∆ b p,q ∆ b k,l = 0 unless p = k, q = l, and in this case it equals ∆ b p,q ; (4) Σ I,J ‖∆ b I×J f‖ 2 L 2 ≲ ‖f‖ 2 L 2 . Property (4) can be verified by iterating the one-parameter martingale difference argument in [18].
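A discrete one-parameter sketch may make these operators concrete (this is our own toy model on 2^N sample points, not code from the paper): E b k averages against b on each dyadic block, ∆ b k is the difference of consecutive generations, and the telescoping identity, the annihilation of b, and property (3) can all be checked numerically.

```python
import numpy as np

N = 3                        # finest generation: 2^N sample points on [0, 1)
n = 2 ** N
rng = np.random.default_rng(2)
b = 1 + 0.4j * rng.standard_normal(n)   # Re b = 1: a sampled "pseudo-accretive" b
f = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def E_b(g, k):
    """b-adapted expectation at generation k: on each dyadic block I,
    (E_b g)|_I = b * (avg_I g) / (avg_I b)."""
    out = np.empty(n, dtype=complex)
    size = n // 2 ** k
    for j in range(2 ** k):
        I = slice(j * size, (j + 1) * size)
        out[I] = b[I] * g[I].mean() / b[I].mean()
    return out

def D_b(g, k):
    # b-adapted martingale difference between generations k and k + 1
    return E_b(g, k + 1) - E_b(g, k)

# telescoping: f = coarsest b-adapted average + sum of the differences
recon = E_b(f, 0) + sum(D_b(f, k) for k in range(N))
assert np.allclose(recon, f)

# every difference annihilates b itself: D_b applied to b vanishes
assert all(np.allclose(D_b(b, k), 0) for k in range(N))

# differences at distinct generations annihilate each other (cf. property (3))
assert np.allclose(D_b(D_b(f, 0), 2), 0)
```

The bi-parameter operators of the text are obtained by applying such one-variable operators in each coordinate and composing them.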
Moreover, we observe that f = Σ p,q ∆ b p,q f , with convergence in L 2 . We now introduce the assumptions on T that we will need throughout the argument. Fix two pseudo-accretive functions b = b 1 ⊗ b 2 and b ′ = b ′ 1 ⊗ b ′ 2 . Full Calderón-Zygmund structure. When f = f 1 ⊗ f 2 , g = g 1 ⊗ g 2 with spt f 1 ∩ spt g 1 = ∅ and spt f 2 ∩ spt g 2 = ∅, we assume that the bilinear form has a full kernel representation with respect to a kernel K. The kernel K : (R n+m × R n+m ) \ {(x, y) ∈ R n+m × R n+m : x 1 = y 1 or x 2 = y 2 } → C is assumed to satisfy: (1) Size condition: |K(x, y)| ≤ C |x 1 − y 1 | −n |x 2 − y 2 | −m . (2) Hölder conditions: for instance, |K(x, y) − K(x, (y 1 , y ′ 2 )) − K(x, (y ′ 1 , y 2 )) + K(x, (y ′ 1 , y ′ 2 ))| ≤ C (|y 1 − y ′ 1 | δ /|x 1 − y 1 | n+δ )(|y 2 − y ′ 2 | δ /|x 2 − y 2 | m+δ ) whenever |y 1 − y ′ 1 | ≤ |x 1 − y 1 |/2 and |y 2 − y ′ 2 | ≤ |x 2 − y 2 |/2, together with the symmetric conditions and the mixed Hölder-size conditions, e.g. |K(x, y) − K(x, (y ′ 1 , y 2 ))| ≤ C (|y 1 − y ′ 1 | δ /|x 1 − y 1 | n+δ ) |x 2 − y 2 | −m for |y 1 − y ′ 1 | ≤ |x 1 − y 1 |/2. Partial Calderón-Zygmund structure. We also need some C-Z structure on R n and R m separately, to deal with the case when f, g are separated in only one variable. If f = f 1 ⊗ f 2 , g = g 1 ⊗ g 2 and spt f 1 ∩ spt g 1 = ∅, then we have the partial kernel representation ⟨T f, g⟩ = ∫ ∫ K f 2 ,g 2 (x 1 , y 1 ) f 1 (y 1 ) g 1 (x 1 ) dy 1 dx 1 . The partial kernel K f 2 ,g 2 , defined on (R n × R n ) \ {(x 1 , y 1 ) ∈ R n × R n : x 1 = y 1 }, is assumed to satisfy the following standard estimates with some constant C(f 2 , g 2 ): (1) Size condition: |K f 2 ,g 2 (x 1 , y 1 )| ≤ C(f 2 , g 2 ) |x 1 − y 1 | −n . (2) Hölder conditions: |K f 2 ,g 2 (x 1 , y 1 ) − K f 2 ,g 2 (x 1 , y ′ 1 )| ≤ C(f 2 , g 2 ) |y 1 − y ′ 1 | δ /|x 1 − y 1 | n+δ whenever |y 1 − y ′ 1 | ≤ |x 1 − y 1 |/2, and symmetrically in the first argument. This assumption is in the same flavor as [16], and is important for defining T (b). In fact, we can weaken it by assuming the above only for certain special choices of f 2 , g 2 involving, for any cube V ⊂ R m , a V -adapted function u V with zero mean (i.e. spt u V ⊂ V , |u V | ≤ 1 and ∫ u V = 0).
We also need to assume that the constants C(f 2 , g 2 ) appearing above are controlled by a universal constant C for the relevant choices of f 2 , g 2 . It is easily shown that both the full and partial kernel representations also hold when f, g are finite linear combinations of characteristic functions, or even tensor products of compactly supported L ∞ functions, as long as they are still disjointly supported in the required variable. To see this, note that for such functions, the standard conditions on the kernels ensure that both integrals are still convergent, so we can use them to define the corresponding bilinear forms. Once we finally show that T is bounded on L 2 (here we do not even need the a priori boundedness assumption on T ), the density of C ∞ 0 functions and the Lebesgue dominated convergence theorem show that the bilinear form has to equal the kernel representation, hence is well defined.
The partial C-Z structure assumption is natural. Recall how Journé defined his class of operators in [14]. Rephrased in terms of our definition, Journé assumed that the partial kernel K f 2 ,g 2 (x 1 , y 1 ) is a bilinear form associated with an L(L 2 (R m ), L 2 (R m ))-valued standard C-Z kernel, which then implies the size and Hölder conditions (2.1), (2.2), (2.3). In the bi-parameter setting, the partial C-Z structure assumptions are required both to define T b and to handle the "mixed cases" that arise because of the independent behavior in each variable. (See Sections 6, 7, 9, 12 for discussions of different "mixed cases".) As far as we know, all the previous literature in this area needs some assumption about the partial C-Z structure of the operator. For example, in Pott and Villarroya's most recent version of [20], they included such an assumption on the operator so that they can fully justify the definition of T 1. Although it is formulated a little differently, it is in spirit the same as ours. Martikainen ([16]) also requires a similar assumption. (See Section 2 of [16].)
Note that in the case that f, g are separated in both variables, i.e. when we have the full kernel representation, the partial kernels are given directly in terms of K, and both the size and Hölder conditions follow easily. We also assume the symmetric partial kernel representation and the corresponding conditions on the kernel K f 1 ,g 1 in the case spt f 2 ∩ spt g 2 = ∅.
Weak boundedness property. We assume that there exists a constant C such that, for any cubes K ⊂ R n and V ⊂ R m , |⟨T (χ K b 1 ⊗ χ V b 2 ), χ K b ′ 1 ⊗ χ V b ′ 2 ⟩| ≤ C |K| |V |. BMO conditions. We assume that T b, T * b ′ , and the corresponding images under the partial adjoint T 1 and its adjoint are in BMO(R n × R m ), where T 1 is the partial adjoint of T defined by ⟨T 1 (f 1 ⊗ f 2 ), g 1 ⊗ g 2 ⟩ := ⟨T (g 1 ⊗ f 2 ), f 1 ⊗ g 2 ⟩. Here, by assuming that they are in BMO(R n × R m ), we equivalently mean that they are in BMO d (R n × R m ), the dyadic BMO space, for any dyadic grid. It was proved by Pipher and Ward [19] that in the bi-parameter setting, product BMO is the average of dyadic BMO. This result was then reproved and extended to the multi-parameter setting by Treil [21] through a different method. We now face the problem of defining T b (and similarly the other three functions). In order to do this, we are going to show that T b lies in the dual of some properly selected subspace A of H 1 d (R n × R m ), i.e. the bilinear form ⟨g, T b⟩ is well defined for any g ∈ A.
Although the integrand is not compactly supported, the Hölder condition for the partial kernels implies that the integral is convergent, so it can serve as the definition of the bilinear form on the left-hand side.
Part four: In this part, the functions have good separation in both variables. As above, although we do not have a full kernel representation for the bilinear form directly, due to the fact that the integrand is not compactly supported, we can define it as follows, and prove that the integral does converge. To see this last fact, we change K(x, y) to K(x, y) − K((c I , x 2 ), y) − K((x 1 , c J ), y) + K((c I , c J ), y) by cancellation. Then the Hölder condition for the full kernel implies the convergence of the integral. Note that in parts two, three and four, we do not give an arbitrary definition to those bilinear forms: a simple limiting argument shows that they are well defined. Consider part four for example. Let ϕ be a cut-off function such that ϕ = 1 on I × J and ϕ = 0 outside 3I × 3J.
Since f is a finite linear combination of characteristic functions, by the linearity of bilinear forms and full kernel representations we can expand the pairing accordingly. Changing the kernel and using the Hölder condition for the full kernel as above, together with the boundedness of f and ϕ, we can show that the integrand is uniformly bounded by an integrable majorant. Then the Lebesgue dominated convergence theorem yields the claimed identity, and it is easily seen that the above definition is independent of the choice of ϕ.
Hence, T b lies in the dual of A. By saying that it belongs to BMO d (R n × R m ), we mean that it is bounded on A and can be boundedly extended to a functional defined on the whole of H 1 d (R n × R m ). The same technique gives meaning to the other three objects. Note that we can actually weaken this BMO assumption by only assuming that T (b) is a functional on A, and similarly for the other three (but with differently chosen subspaces of H 1 (R n × R m )). We will see in the following that this is all we need.
Diagonal BMO conditions. There exists a constant C such that, for any cubes K ⊂ R n , V ⊂ R m , and any zero-mean functions a K , b V which are K-, V -adapted, respectively, the following hold:

B-ADAPTED BI-PARAMETER PARAPRODUCTS
In this section, we will discuss the boundedness of three different kinds of bi-parameter b-adapted paraproducts that will be used in the proof of our T (b) theorem.
Partial paraproducts. By partial paraproduct we mean a classical one-parameter b-adapted paraproduct with respect to one variable.
3.1. Definition. Let a ∈ BMO(R m ). Then, for two fixed pseudo-accretive functions b 2 , b ′ 2 ∈ L ∞ (R m ), the operator given below is a partial paraproduct, acting on functions on R m : Similarly, there is a symmetric partial paraproduct with respect to the other variable, for fixed pseudo-accretive functions b 1 , b ′ 1 ∈ L ∞ (R n ), acting on functions on R n .

Proposition. Partial paraproducts are bounded operators on L 2 . Specifically,
and a similar inequality holds for the symmetric one.
Proof. We only prove the first inequality. For any f, g ∈ L 2 (R m ), where the fourth and fifth lines follow from the Hölder inequality. Hence, it suffices to show that To see this, by the boundedness of b ′ 2 , Hence, it suffices to prove Observing the above inequality, we see that by the Carleson embedding theorem, all we need to show is And this is not hard to prove, since the b-adapted martingale differences satisfy the L 2 property by [18]. Indeed, since a ∈ BMO(R m ), where the last equality holds because ∆ b 2 I maps any constant function to 0. And this completes the proof.
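The version of the dyadic Carleson embedding theorem invoked above can be recorded as follows (a standard formulation, stated with our own normalization of the constant; see e.g. [18]):

```latex
\textbf{Dyadic Carleson embedding.} Let $(a_I)_{I\in\mathcal D}$ be nonnegative
numbers satisfying the Carleson condition
\[
  \sum_{I\subset J} a_I \le C\,|J| \qquad \text{for every } J\in\mathcal D .
\]
Then for every $f\in L^2$,
\[
  \sum_{I\in\mathcal D} a_I\,\langle |f|\rangle_I^2 \le 4C\,\|f\|_{L^2}^2,
  \qquad \langle |f|\rangle_I := \frac{1}{|I|}\int_I |f| .
\]
```

In the proof above, the Carleson condition is supplied by the BMO norm of a, expanded in b 2 -adapted martingale differences.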
Full paraproducts. We now introduce a "real" bi-parameter b-adapted paraproduct, which is a natural generalization of the classical one-parameter one.

Proposition. Full paraproducts are bounded operators on L 2 (R n × R m ). Specifically,
To prove this proposition, we first need to consider the space K 1 b . It is well known that H 1 can be characterized using both the martingale maximal function and the square function, with the norms being equivalent ([6]). Similarly, if we define a b-adapted maximal function, then we have the analogous fact. We then have the following theorem.
To prove Theorem 3.6, we use the idea of double martingales due to Bernard, together with a technique involving atomic decomposition. See [1].
First, in our b-adapted case, the well-known L 2 -norm equivalence between the martingale maximal function and the square function is still true. More specifically, we have Proof. Iteration of a well-known one-parameter L 2 result (see [18]) gives Hence, On the other hand, by accretivity
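The one-parameter result being iterated can be stated as follows (a standard NTV-type estimate from [18]; the precise form of the constants is our paraphrase):

```latex
\textbf{One-parameter $L^2$ estimates.} Let $b$ be pseudo-accretive on
$\mathbb{R}^n$. Then for every $f \in L^2(\mathbb{R}^n)$,
\[
  \sum_{I \in \mathcal{D}^n} \|\Delta_I^{b} f\|_{L^2}^2 \simeq \|f\|_{L^2}^2
  \qquad \text{and} \qquad
  \Big\| \sup_{k \in \mathbb{Z}} |E_k^{b} f| \Big\|_{L^2} \lesssim \|f\|_{L^2},
\]
with implicit constants depending only on $\|b\|_{\infty}$ and the
pseudo-accretivity constant of $b$.
```

Applying the estimate in each variable separately and composing gives the bi-parameter version used here.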

and the strong maximal function is bounded on L 2 .
For simplicity, denote f p,q = E b 1 p E b 2 q f , and for each pair (p, q) ∈ Z × Z, let F p,q be the σ-algebra generated by all the dyadic rectangles of size 2 −p × 2 −q .
Note that if we set F = {x : τ (x) = Z × Z}, then from property (2) in the definition, both a * b and S b a are supported on F . Such functions are called atoms because they have the following property.

Proposition. If a is an atom, then
Proof. Using the supports of a * b and S b a, the Hölder inequality implies the desired bound. We now state the atomic decomposition theorem.

3.11. Theorem. Given f ∈ K 1 b ∩ L 2 , there exists a sequence of atoms a n and a sequence of scalars λ n such that (1) f = Σ n λ n a n , a.e.;
(2) Σ n |λ n | ≲ ‖f‖ K 1 b . Before giving the proof of Theorem 3.11, we show that this atomic decomposition result implies Theorem 3.6.
Proof. (of Theorem 3.6) It suffices to show that the result holds for f ∈ L 2 . For any such function, the atomic decomposition implies f t = Σ n λ n a n t , a.e., for all t ∈ Z × Z. Then, We turn to the proof of Theorem 3.11.
Proof. (of Theorem 3.11) For any n ∈ Z, define the set F n and the stopping time τ n as follows, where E t is the classical expectation operator. It is easy to check that τ n is a stopping time, and τ n ⊂ τ n+1 .
Using this, define We claim that such a n and λ n satisfy all the properties required in the theorem.
To check property (2): In the above, the second line follows from the Chebyshev inequality, and the fourth line uses the L 2 boundedness of the classical martingale maximal function.
To check property (1): It suffices to check the two limits. For the first limit, the Chebyshev inequality implies that lim n→∞ |E t (χ F n )| = 0 a.e., uniformly in t. So when n is large enough, τ n = Z × Z a.e., i.e. f τ n = f .
Then the convergence is automatically true.
We claim that all the terms appearing in the sum are 0, hence lim n→−∞ f τn (x) = 0.
Then the only thing left to check is that all the a n defined are indeed atoms. To see this, firstly, a n ∈ L 2 . Indeed, Secondly, just as we argued for the second property above, we see that Thirdly, if t + 1 ∈ τ n , then for any index s ∈ Z × Z not satisfying s ≤ t, a simple computation gives On the other hand, if s ≤ t, then s ∈ τ n , hence, which implies a n t = E b t (a n ) = 0.
which is equivalent to The first term can be dealt with trivially: For the second term, let D t denote all the dyadic rectangles of generation t; then In the above, the second line follows from the fact that ∆ b t−1 f is a constant on each R, and the third line uses t ∈ τ n+1 . Combining I and II completes our proof of the atomic decomposition theorem.
With the result of Theorem 3.6, we return to the full paraproducts, and give a proof of Proposition 3.4.
where the last step above follows from Theorem 3.6. Hence, it suffices to show that To see this, notice that where M S (f ) is the strong maximal function, which is bounded on L 2 . Since S b is also bounded on L 2 , we obtain the desired bound.

Mixed paraproducts. Since we are working in the bi-parameter setting, there appears a new mixed type of b-adapted paraproduct which requires particular attention. Roughly speaking, it involves an average of a and a difference of f with respect to one variable, and conversely with respect to the other.
The operator given below is called a mixed paraproduct: 3.13. Proposition. Mixed paraproducts are bounded operators on L 2 (R n × R m ). Specifically, Since we already have the b-adapted square function characterization of H 1 b , this proposition can be proved in the same way as a similar result in [20].
Proof. For any f, g ∈ L 2 (R n × R m ), We claim that To see this, note that where the last two operators are only formally defined, and are not the compositions of the square functions and maximal functions. Since, pointwise, And this is true because In the above, M 2 denotes the Hardy-Littlewood maximal function with respect to the second variable. In the fourth line, we used the Fefferman-Stein inequality. And in the sixth line, the operator S b 1 is the one-parameter b 1 -adapted square function, defined as S b 1 f = (Σ I |∆ b 1 I f | 2 ) 1/2 . It is straightforward to see that S b 1 is an L 2 isometry up to a constant, which implies the seventh line above, where f y (x) denotes f (x, y). Hence, the L 2 boundedness of the mixed paraproduct is fully justified.
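The Fefferman-Stein vector-valued maximal inequality used in the fourth line reads, in the form needed here (a classical statement, quoted in standard textbook form rather than from the source):

```latex
\textbf{Fefferman--Stein inequality.} For $1<p<\infty$, $1<q\le\infty$ and any
sequence $(f_j)_j$,
\[
  \Big\| \Big( \sum_j (M f_j)^{q} \Big)^{1/q} \Big\|_{L^p}
  \;\lesssim\;
  \Big\| \Big( \sum_j |f_j|^{q} \Big)^{1/q} \Big\|_{L^p},
\]
where $M$ is the Hardy--Littlewood maximal operator; here it is applied with
$p=q=2$ and with $M = M_2$ acting in the second variable.
```

With p = q = 2, the inequality lets the maximal function in the second variable be absorbed into the square function at the cost of a constant.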

MAIN THEOREM AND THE STRATEGY
We return to the main theorem of this paper. We will prove that, under the assumptions stated in Section 2, T is bounded on L 2 (R n × R m ), with operator norm depending only on the constants appearing in the weak assumptions above. By density and the boundedness of b, b ′ , it suffices to show that for any C ∞ 0 functions f, g, there is a universal constant C such that To prove this, recall that Martikainen [16] gave an averaging formula for the bilinear form ⟨T f, g⟩ using a probabilistic concept called "goodness" of cubes.
Here, if we instead decompose f using the newly defined b-adapted martingale differences, there is a natural generalization of the averaging formula, as follows.
To understand the above formula, recall that in [9], a cube I ∈ D n ω n is called bad if there exists Ĩ ∈ D n ω n such that ℓ(Ĩ) ≥ 2 r ℓ(I) and d(I, ∂Ĩ) ≤ 2ℓ(I) γ n ℓ(Ĩ) 1−γ n . Here γ n = δ/(2n + 2δ), where δ > 0 appears in the kernel estimates. Moreover, π n good := P ω n (I ∔ ω n is good) is independent of I ∈ D n 0 . By Lemma 2.3 in [9], the parameter r can be chosen large enough that π n good > 0. Moreover, for a fixed I ∈ D n 0 , the position of I ∔ ω n depends on the ω n i with 2 −i < ℓ(I), while the goodness of I ∔ ω n depends on the ω n i with 2 −i ≥ ℓ(I). Hence, they are independent. The proof of Proposition 4.1 is identical to the proof of Proposition 2.1 in [16], which we omit here.
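The positivity of π good for large r can be explored numerically. In the sketch below (our own illustration: the parameters r, γ, the truncation G and the trial count are all chosen by us, with γ = 0.25 corresponding to n = 1, δ = 1), the relative position of a unit interval I inside its generation-g ancestor is generated bit by bit, exactly as the independence remark above suggests, and the fraction of good samples stays well above 0:

```python
import random

def is_good(rng, r=30, gamma=0.25, G=120):
    """One random draw in dimension one. I = [x, x+1) has sidelength 1; its
    position inside the ancestor of sidelength 2^g is built from g random
    bits. I is bad if some ancestor with sidelength 2^g, g >= r, comes too
    close: d(I, boundary) <= 2 * l(I)^gamma * (2^g)^(1-gamma). Scales beyond
    2^G are ignored (their contribution is negligible)."""
    x = 0
    for g in range(1, G + 1):
        x += rng.getrandbits(1) << (g - 1)   # new top bit of the position
        if g >= r:
            dist = min(x, (1 << g) - x - 1)  # distance from I to the boundary
            if dist <= 2 * (1 << g) ** (1 - gamma):
                return False
    return True

rng = random.Random(0)
trials = 1000
p_good = sum(is_good(rng) for _ in range(trials)) / trials
print(p_good)   # bounded away from 0 once r is chosen large enough
```

Decreasing r in this toy model visibly lowers the estimate, matching the requirement that r be taken large enough for π good > 0.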
Note that, as in [9] and [16], we do need to justify that the sum on the right-hand side converges to the left-hand side; this is the only place throughout the paper where we use the a priori L 2 → L 2 boundedness of T . Indeed, by the convergence of the expectation operators in L 2 , the boundedness of T easily implies the convergence in the formula. However, when dealing with specific operators in practice, we can sometimes prove the convergence of the formula without the a priori boundedness assumption.
For example, suppose T is canonically associated with a standard antisymmetric kernel K(x, y), in the sense that and K satisfies all the size and Hölder conditions.
Then for any such f, g the pairing is well defined. Hence, we automatically have the full and partial kernel representations. Also, by antisymmetry, we obtain the relation corresponding to the weak boundedness property for b = b ′ = 1. With these observations in mind, it is not hard to show that for any f, g ∈ C ∞ 0 and any fixed dyadic grid, the formula converges, so the a priori boundedness of T is no longer necessary. With the averaging formula, it suffices to bound the sum on the right-hand side uniformly over all fixed random grids. To do this, we divide the sum into different parts according to the relative positions of the cubes, and discuss the different cases one by one. By symmetry, except for one mixed case (ℓ(I 1 ) ≤ ℓ(I 2 ), ℓ(J 1 ) > ℓ(J 2 )), all the other cases are symmetric to (ℓ(I 1 ) ≤ ℓ(I 2 ), ℓ(J 1 ) ≤ ℓ(J 2 )), which we start with.
In preparation, we state here two control lemmas which will be used repeatedly when we deal with the different cases in the following. For simplicity of notation, we write K (i 1 ,i 2 ) , where K ∈ D n and i 1 , i 2 ∈ N.

Lemma. (Full control lemma) For fixed
Proof. It follows as a consequence of the Hölder inequality. In the last step above, we used the L 2 property of the b-adapted double martingale differences.

Lemma. (Partial control lemma) For fixed
These two inequalities are symmetric, and they can both be derived using a technique similar to that of the above lemma. The only difference is that here we need to use the L 2 property of the b-adapted martingale difference in only one variable instead.
Before we move on to the main part of the proof of the theorem, i.e. the case-by-case estimate of the summands in the averaging formula, let us look at an example to see how our theory recovers some known boundedness results for bi-parameter singular integral operators.
Consider operators associated with antisymmetric standard kernels. Journé, in [14], proved that if K = LÃ, the bicommutator of Calderón-Coifman type, where L is any standard antisymmetric function and A : R n × R m → C is such that ∂ 2 12 A ∈ L ∞ , then the L 2 → L 2 boundedness of the operator associated to L implies T 1 ∈ BMO, as well as the other BMO conditions. It is also not hard to verify directly that T satisfies the weak boundedness property and the four diagonal BMO assumptions (all of them are actually zero!). Hence, by our main theorem, T is bounded on L 2 with operator norm controlled by the weak assumptions.
SEPARATED/SEPARATED: σ out / out
Hence, we can write The main goal of this section is to show that the following inequality holds.

Proposition.
If this is true, then by the full control lemma stated at the beginning, σ out / out can be bounded by ‖f‖ L 2 ‖g‖ L 2 .

SEPARATED/INSIDE: σ out / in
Since J 1 ⊊ J 2 , J 1 is contained in some child of J 2 , which we denote by J 2,1 .
Here ⟨f⟩ b 2 J denotes the b 2 -adapted average of f over J with respect to the second variable. Write Part σ ′ out / in . In order to bound σ ′ out / in by ‖f‖ L 2 ‖g‖ L 2 , by the full control lemma, it suffices to prove the following. 6.1. Proposition.
Proof. Case 1: ℓ(J 1 ) < 2 −r ℓ(J 2 ). The two functions in the pairing are separated in both variables, which enables us to use the full kernel representation. Since in this case the size of J 1 is "significantly" small compared with that of J 2 , the goodness of J 1 yields dist(J 1 , J c 2,1 ) ≥ 2ℓ(J 1 ) γ m ℓ(J 2,1 ) 1−γ m ≥ ℓ(J 1 ) γ m ℓ(J 2 ) 1−γ m , which gives good separation in both variables. Hence, using the cancellation property in the y variable, we can change the kernel K(x, y) above to K(x, y) − K(x, (y 1 , c J 1 )) − K(x, (c I 1 , y 2 )) + K(x, (c I 1 , c J 1 )).

By the Hölder condition and a computation similar to that in the Separated/Separated case,
where in the third line, J 2,j denotes the children of J 2 other than J 2,1 , and we used the fact that ∆ J 2 g is constant with respect to x 2 on each child of J 2 ; the fourth line follows from the estimate of the averages of ∆ J 2 g. Let us further split I into two parts: In I ′′ , we still have good separation in both variables, so by exactly the same computation as in the Separated/Separated case, together with the fact that the sizes of J 1 , J 2 are now comparable, the desired bound follows. Hence, the only thing left to deal with is I ′ . Since the separation in the second variable is now not good enough, we have to use the mixed Hölder-size condition instead. Again, in the full kernel representation, by the cancellation property we can change the kernel to K(x, y) − K(x, (c I 1 , y 2 )); then In the above, the fifth line holds because ∆ b 1 I 1 ∆ b 2 J 1 f is a constant on each child of I 1 × J 1 , and the last line follows from the fact that the sizes of J 1 , J 2 are comparable. This completes the proof of the proposition.

Part σ ′′ out / in . For the part σ ′′ out / in , we are going to rewrite it into a form containing a partial b-adapted paraproduct. Rewrite the pairing, and note that ∆ b 1 I 1 f and ∆ b ′ 1 I 2 g are constant with respect to x 1 on each child of I 1 , I 2 , respectively. If we decompose the above pairing into parts restricted to children of I 1 , I 2 , then where h I 1,t ,I 2,k (x 2 ) = (∆ b 1 I 1 T * (χ I 2,k b ′ 1 ⊗ b ′ 2 ))| I 1,t , and the following lemma guarantees that the partial paraproduct is well defined. 6.2. Lemma. h I 1,t ,I 2,k is in BMO(R m ), and satisfies ‖h I 1,t ,I 2,k ‖ BMO ≲ 2 −i 1 δ/2 |K| −1 |I 2 |.
We will assume the lemma to be true for the moment and prove it at the end of this section. The above pairing can be further rewritten as: Then, summing over the pairs with dist(I 1 , I 2 ) > ℓ(I 1 ) γ n ℓ(I 2 ) 1−γ n , we claim that for any t, k, (6.3) holds. To see this, first observe that, since b ′ is pseudo-accretive, for any L 2 function h, the relevant pairing can be expressed via ∫ hgb ′ .

And we have
Hence by linearity, the LHS of (6.3) is comparable to Since (|I 2 |/|I 2 | 2 ) 1/2 = |I 2 | −1/2 , and by Lemma 6.2, ‖h I 1,t ,I 2,k ‖ BMO ≲ 2 −i 1 δ/2 |K| −1 |I 2 |, the RHS of the above inequality is under control, where the last step follows from the first partial control lemma stated at the beginning. Then, to complete this section, we give a proof of Lemma 6.2.
For (2), since the two functions in the pairing have good separation in both variables, and a := ∆ b 1 * I 1 (χ I 1,t /|I 1,t |) has zero mean, we use the full kernel representation and change the kernel accordingly. Then, by the Hölder condition, For (1), there is good separation in only one variable, so we need to use the partial kernel representation.
(1) = In the last step above, we used the partial C-Z assumption that C(b −1 2 a, χ 3V ) ≲ |V |.

SEPARATED/EQUAL: σ out /=
In this part, By the full control lemma, it suffices to prove the following proposition.
For (2), the partial kernel representation gives For (1), the full kernel representation and the mixed Hölder-size condition give which completes the proof.

SEPARATED/NEARBY: σ out / near
In this part, we still want to use the full control lemma to bound the pairing. Notice that since J 1 , J 2 are near each other, by a simple lemma proved by Hytönen in [9], the cube V = J 1 ∨ J 2 satisfies ℓ(V ) ≤ 2 r ℓ(J 1 ); hence the sizes of J 1 , J 2 and V are comparable. Since To see this, note that both variables are separated but only the separation in the first is good, so by the full kernel representation and the mixed Hölder-size condition, where the last step follows from the fact that the sizes of J 1 , J 2 and V are comparable.

INSIDE/INSIDE: σ in / in
This part is comparatively difficult to deal with, and is also the first place where the assumed BMO conditions stated at the beginning come into play. We will also see that the boundedness of full paraproducts plays an important role in our estimates. To begin with, we first perform the following decomposition. Let I 1 ⊂ I 2,1 ∈ ch(I 2 ), J 1 ⊂ J 2,1 ∈ ch(J 2 ); then Parts II, III. These two parts are symmetric, so it suffices to estimate one of them, say part III. It can be dealt with similarly to the second part of Section Separated/Inside, where we used partial paraproducts.
Define s I 1,t ,I 2,k = (∆ b 1 I 1 T * (χ I 2,k b ′ 1 ⊗ b ′ 2 ))| I 1,t . Note that although formally s I 1,t ,I 2,k is exactly the h I 1,t ,I 2,k we encountered in Section Separated/Inside, here the relative position of I 1 , I 2 has changed, so they are actually different functions. We will prove later that although s I 1,t ,I 2,k is still in BMO(R m ), the estimate of its norm differs from that of h I 1,t ,I 2,k . More specifically: 9.1. Lemma.
Let us assume this to be true for now. Then Note that part (1) is exactly the same as the pairing that appeared in σ ′′ out / in , except that here the partial paraproduct is defined using a different BMO function. Hence, following exactly the same argument, for any t, k we have where again, in the last step, we used the first partial control lemma.
Similarly, although in part (2) the form of the pairing is a little different, when dealing with ∆ b ′ 1 * I 2 (χ I 2,1 /|I 2,1 |) we only need to bound it by and since the norm of the BMO function has the same bound, all the rest of the argument for part (1) still works here, i.e. this part satisfies the same estimate as part (1).
In conclusion, We are only left to prove Lemma 9.1. Proof. (of Lemma 9.1) We only prove the inequality for s I 1,t ,I 2,k , since the other one follows from exactly the same argument. Let V ⊂ R m be a cube and let a be any function supported on V such that |a| ≤ 1 and ∫ a = 0. It suffices to show that |⟨s I 1,t ,I 2,k , a⟩| ≲ 2 −i 1 δ/2 |V |.
By the partial kernel representation and size condition for the partial kernel, By the partial kernel representation and Hölder condition for the partial kernel, By the full kernel representation and mixed Hölder-size condition, By the full kernel representation and Hölder condition, Hence, the proof is complete.
Part I. In part I, since the functions in the pairing are separated in both variables, an argument similar to that in Section Separated/Inside gives which, combined with the full control lemma, gives the boundedness of part I. (Note that in order to prove the above inequality, we need to discuss four different cases, depending on whether ℓ(I 1 ) < 2 −r ℓ(I 2 ) and whether ℓ(J 1 ) < 2 −r ℓ(J 2 ), and use the size, Hölder, or mixed Hölder-size conditions accordingly in each case.)

INSIDE/EQUAL: σ in /=
To bound σ ′ in /= : in the case ℓ(I 1 ) < 2 −r ℓ(I 2 ), it can be dealt with similarly to the Separated/Equal case. In the case 2 −r ℓ(I 2 ) ≤ ℓ(I 1 ) < ℓ(I 2 ), we claim that then the full control lemma implies the correct bound.
In parts (1) and (2), both variables are separated, so we use the full kernel representation; by the size condition and the mixed Hölder-size condition, respectively, they are bounded. In parts (3) and (4), only the first variable is separated, so we need the partial kernel representation; by the size condition and the Hölder condition for the partial kernel, respectively, they are bounded as well. We omit the details. Now we deal with σ ′′ in /= , which needs the partial paraproduct argument but is much easier than the cases we have seen before. As before, we rewrite the pairing, where r V t ,V k , defined by restriction to V t analogously to the previous sections, is a BMO function whose norm satisfies the following lemma.

10.1. Lemma. ‖r V t ,V k ‖ BMO(R n ) ≤ C.
We postpone the proof and assume this bound for the moment. Then By an argument similar to the previous two partial paraproduct cases, involving the estimate of the BMO norm of r V t ,V k and the L 2 boundedness of the partial paraproduct, it is not hard to show that for any t, k, which completes the estimate of part σ ′′ in /= . Proof. (of Lemma 10.1) For any cube K ⊂ R n and any function a supported on K such that |a| ≤ 1 and ∫ a = 0, we claim that |⟨r V t ,V k , a⟩| ≲ |K|.
If s ≠ k, use the partial kernel representation and the size condition for the partial kernel. If s = k, use the first diagonal BMO condition. For parts (2) and (3), write and similarly for (3). If s ≠ k, since both variables are separated, we can use the full kernel representation, with the mixed Hölder-size condition for (2) and the size condition for (3). If s = k, we use the partial kernel representation, with the Hölder condition for (2) and the size condition for (3). The details can be carried out similarly to (1), and we omit them.
EQUAL/EQUAL, EQUAL/NEARBY AND NEARBY/NEARBY: σ =/=
We discuss these three cases together. When J 1 , J 2 are near each other, the sizes of J 1 , J 2 , J 1 ∨ J 2 are comparable, and similarly for the other variable. So by the full control lemma, in each of these three cases it suffices to show We only prove the above for the Equal/Equal case, which is the most difficult one since there is no separation in either variable. Note that for Equal/Nearby, one can use the partial kernel representation and the size condition to prove it, and for Nearby/Nearby, the full kernel representation and the size condition will do.
Write I 1 = I 2 = K, J 1 = J 2 = V , and decompose the pairing into restrictions on each pair of their children, where ‖T 1 (d ′ )‖ BMO < ∞ is one of our BMO assumptions. This completes the estimate of the mixed cases.