Multiple ergodic averages for three polynomials and applications

We find the smallest characteristic factor and a limit formula for the multiple ergodic averages associated to any family of three polynomials and polynomial families of the form $\{l_1p,l_2p,\ldots,l_kp\}$. We then derive several multiple recurrence results and combinatorial implications, including an answer to a question of Brown, Graham, and Landman, and a generalization of the Polynomial Szemer\'edi Theorem of Bergelson and Leibman for families of three polynomials with not necessarily zero constant term. We also simplify and generalize a recent result of Bergelson, Host, and Kra, showing that for all $\epsilon>0$ and every subset of the integers $\Lambda$ the set $$ \big\{n\in\mathbb{N}\colon d^*\big(\Lambda\cap (\Lambda+p_1(n))\cap (\Lambda+p_2(n))\cap (\Lambda+ p_3(n))\big)>(d^*(\Lambda))^4-\epsilon\big\} $$ has bounded gaps for "most" choices of integer polynomials $p_1,p_2,p_3$.


1. Introduction and main results
1.1. Background. A far-reaching generalization of the theorem of Szemerédi [32] on arithmetic progressions states that every subset of the integers with positive upper Banach density contains infinitely many configurations of the form $\{x,x+p_1(n),\ldots,x+p_k(n)\}$, where $p_1,\ldots,p_k$ is any collection of integer polynomials (meaning that they have integer coefficients) with zero constant term. This was proved by Bergelson and Leibman [7] using a Correspondence Principle of Furstenberg and the following result in ergodic theory:

Theorem 1.1 (Bergelson & Leibman [7]). Let $(X,\mathcal{X},\mu,T)$ be an invertible measure preserving system and let $p_1,\ldots,p_k$ be integer polynomials with $p_i(0)=0$ for $i=1,\ldots,k$. If $A\in\mathcal{X}$ with $\mu(A)>0$, then
$$\liminf_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\mu\big(A\cap T^{-p_1(n)}A\cap\cdots\cap T^{-p_k(n)}A\big)>0. \tag{1}$$

A key step in establishing multiple recurrence properties like the one above is to analyze the limiting behavior of some closely related multiple ergodic averages. For the previous result the relevant ones are
$$\frac{1}{N-M}\sum_{n=M}^{N-1}T^{p_1(n)}f_1\cdot T^{p_2(n)}f_2\cdot\ldots\cdot T^{p_k(n)}f_k. \tag{P}$$
Bergelson and Leibman studied these averages in [7], in a depth that was sufficient for proving (1). Obtaining a better understanding of their limiting behavior in $L^2(\mu)$ as $N-M\to\infty$ has been a driving force of research in ergodic theory during the last two decades. The basic approach for studying them goes back to the original paper of Furstenberg [15]. Using modern terminology, it consists of finding an appropriate factor $\mathcal{C}$ of a given system, called a characteristic factor, such that the $L^2$-limit of the averages in question remains unchanged when each function is replaced by its projection on this factor. Equivalently, the averages (P) converge to $0$ in $L^2(\mu)$ as $N-M\to\infty$ whenever $E(f_i|\mathcal{C})=0$ for some $i\in\{1,\ldots,k\}$, where $E(f|\mathcal{C})$ denotes the conditional expectation of $f$ given $\mathcal{C}$. The next step is to obtain a concrete description of a well-chosen characteristic factor that facilitates the study of the averages.
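To make the averages in (1) concrete, consider the simplest totally ergodic example: an irrational rotation $Tx=x+\alpha$ on $\mathbb{T}=\mathbb{R}/\mathbb{Z}$, the arc $A=[0,1/2)$, and the pair $\{p_1,p_2\}=\{n,n^2\}$. Here $T^{-p(n)}A=A-p(n)\alpha$, and since $(n\alpha,n^2\alpha)$ is equidistributed in $\mathbb{T}^2$ by Weyl's theorem, the averages should approach $\int\!\!\int\mu(A\cap(A-s)\cap(A-t))\,ds\,dt=\mu(A)^3=1/8$. The Python sketch below is our own illustration, not taken from the paper; the choices $\alpha=\sqrt2$, $A$, $N$, and the helper names are assumptions. It computes each measure exactly from breakpoints and averages over $1\le n\le N$.

```python
import math

def in_A(x: float) -> bool:
    """Indicator of the arc A = [0, 1/2) on the circle R/Z."""
    return (x % 1.0) < 0.5

def triple_measure(s: float, t: float) -> float:
    """Exact Lebesgue measure of A ∩ (A - s) ∩ (A - t) = {x in A : x + s in A, x + t in A}.

    The integrand is constant between consecutive breakpoints, so testing midpoints suffices.
    """
    cuts = sorted({0.0, 0.5, 1.0, (-s) % 1.0, (0.5 - s) % 1.0, (-t) % 1.0, (0.5 - t) % 1.0})
    total = 0.0
    for left, right in zip(cuts, cuts[1:]):
        mid = (left + right) / 2
        if in_A(mid) and in_A(mid + s) and in_A(mid + t):
            total += right - left
    return total

def recurrence_average(alpha: float, p1, p2, N: int) -> float:
    """(1/N) * sum_{n=1}^{N} mu(A ∩ T^{-p1(n)}A ∩ T^{-p2(n)}A) for T x = x + alpha."""
    return sum(triple_measure((p1(n) * alpha) % 1.0, (p2(n) * alpha) % 1.0)
               for n in range(1, N + 1)) / N

avg = recurrence_average(math.sqrt(2), lambda n: n, lambda n: n * n, 5000)
print(avg)   # should be near mu(A)^3 = 1/8; the gap shrinks as N grows
```

The breakpoint approach is used instead of sampling so that the only error left is the finite-$N$ equidistribution error.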
Using methods from [19], this was done in [20] for weak convergence, and in [25] for strong convergence of the averages (P).

Theorem 1.2 (Host & Kra [20], Leibman [25]). Let $p_1,\ldots,p_k$ be a family of essentially distinct integer polynomials (meaning that $p_i\neq\text{const}$ and $p_i-p_j\neq\text{const}$ for $i\neq j$). Then there exists $d=d(p_1,\ldots,p_k)\in\mathbb{N}$ with the following property: for every invertible ergodic system, some characteristic factor for the averages (P) is an inverse limit of $d$-step nilsystems (defined in Section 2).
This result opens the way to a better understanding of the limiting behavior of the averages (P); in fact, combined with a recent result of Leibman [22], it immediately implies that they converge in $L^2(\mu)$. But we are still left with some interesting problems, since computing the smallest characteristic factor and the actual limit in the case of a nilsystem is still a difficult task. For example, it is not even clear from the results in [20] and [25] whether the minimal $d(p_1,p_2)$ stays bounded as the polynomials $p_1,p_2$ vary, or what the limit of the averages (P) is for $k=2$. Formulas for the limit are known when all the polynomials are linear (see [34]) or linearly independent (see [13]). Also, very recently some other cases were covered in [27].
In this paper we find the smallest characteristic factor and limit formulas for the averages (P) for any family of three polynomials and for polynomial families of the form $\{l_1p,l_2p,\ldots,l_kp\}$. We then use these results to derive several combinatorial implications.
1.2. Results in ergodic theory. Given a measure preserving system and a family of integer polynomials $\mathcal{P}=\{p_1,\ldots,p_k\}$, we say that a factor $\mathcal{C}$ is the smallest characteristic factor for $\mathcal{P}$ if it is a characteristic factor for the averages (P) and it is a factor of every other such characteristic factor. We completely determine the structure of the smallest characteristic factor for any family of three polynomials and for the family $\{l_1p,l_2p,\ldots,l_kp\}$. The reader who is not familiar with the notions we use from ergodic theory may wish to consult Section 2.1 first.
We first deal with the polynomial family $\{l_1p,l_2p,\ldots,l_kp\}$:

Theorem A. Let $(X,\mathcal{X},\mu,T)$ be an invertible ergodic system, $p$ be a nonconstant integer polynomial, and $l_1,\ldots,l_k$ be nonzero distinct integers. If $k\geq 2$, then the $(k-1)$-step nilfactor $\mathcal{Z}_{k-1}$ is the smallest characteristic factor for the multiple ergodic averages
$$\frac{1}{N-M}\sum_{n=M}^{N-1}T^{l_1p(n)}f_1\cdot T^{l_2p(n)}f_2\cdot\ldots\cdot T^{l_kp(n)}f_k.$$
Moreover, if the system is totally ergodic, then the $L^2$-limit as $N-M\to\infty$ does not depend on the choice of the polynomial $p$ and can be computed explicitly.
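The independence from $p$ in the totally ergodic case can be checked by hand in the simplest example, an irrational rotation $Tx=x+\alpha$ with characters $f_j(x)=e(k_jx)$, where $e(t)=e^{2\pi i t}$. Then $T^{l_jp(n)}f_j(x)=e(k_jx)\,e(k_jl_jp(n)\alpha)$, so the product in the averages of Theorem A equals $e((k_1+\cdots+k_k)x)\,e\big((k_1l_1+\cdots+k_kl_k)p(n)\alpha\big)$. By Weyl's equidistribution theorem its average tends to $e((k_1+\cdots+k_k)x)$ if $k_1l_1+\cdots+k_kl_k=0$ and to $0$ otherwise, for every nonconstant integer polynomial $p$. The Python sketch below is our own illustration (helper names and parameters are hypothetical); it evaluates only the $n$-dependent factor numerically.

```python
import cmath
import math

def weyl_average(c: int, p, alpha: float, N: int) -> complex:
    """(1/N) * sum_{n=1}^{N} e(c * p(n) * alpha), with e(t) = exp(2*pi*i*t)."""
    total = sum(cmath.exp(2j * math.pi * ((c * p(n) * alpha) % 1.0)) for n in range(1, N + 1))
    return total / N

def n_dependent_factor(ks, ls, p, alpha=math.sqrt(2), N=20_000) -> complex:
    """Average over n of prod_j e(k_j * l_j * p(n) * alpha) for f_j(x) = e(k_j x), T x = x + alpha."""
    c = sum(k * l for k, l in zip(ks, ls))
    return weyl_average(c, p, alpha, N)

# The family {p, 2p}: l = (1, 2).
for name, p in [("n", lambda n: n), ("n^2", lambda n: n * n), ("n^2+n", lambda n: n * n + n)]:
    balanced = n_dependent_factor((2, -1), (1, 2), p)     # 2*1 + (-1)*2 = 0
    unbalanced = n_dependent_factor((1, 1), (1, 2), p)    # 1*1 + 1*2 = 3
    print(name, abs(balanced), abs(unbalanced))
```

For each choice of $p$ the balanced combination gives exactly 1 and the unbalanced one is small, matching the $p$-independent limit described above.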
We will use this result to answer a question of Brown, Graham, and Landman [9] (see Theorem D), and to deal with characteristic factors for families of three polynomials (see Theorem B). The proof of Theorem A is based on Proposition 2.7, which enables us to compare the family $\{l_1p,l_2p,\ldots,l_kp\}$ with the family $\{l_1n,l_2n,\ldots,l_kn\}$.
Before we deal with families of three polynomials, we take a moment to define three classes of polynomial families that will help us expedite the discussion:

Definition 1.3. We say that the family $\{p_1,p_2,p_3\}$ of essentially distinct integer polynomials is of type $(e_1)$, $(e_2)$, or $(e_3)$ if some permutation of the polynomials $\{\tilde{p}_1,\tilde{p}_2,\tilde{p}_3\}$, where $\tilde{p}_i=p_i-p_i(0)$ for $i=1,2,3$, has the form $\{lp,mp,rp\}$, $\{lp,mp,kp^2+rp\}$, or $\{kp^2+lp,kp^2+mp,kp^2+rp\}$, respectively, for some integer polynomial $p$ and constants $k,l,m,r\in\mathbb{Z}$ with $k\neq 0$.
Theorem B. Let $(X,\mathcal{B},\mu,T)$ be an invertible ergodic system and $\{p_1,p_2,p_3\}$ be a family of essentially distinct integer polynomials. Consider the multiple ergodic averages
$$\frac{1}{N-M}\sum_{n=M}^{N-1}T^{p_1(n)}f_1\cdot T^{p_2(n)}f_2\cdot T^{p_3(n)}f_3. \tag{2}$$
Then the following three mutually exclusive cases describe the smallest characteristic factor for the averages (2):
(i) It is the rational Kronecker factor $\mathcal{K}_{rat}$ if the polynomials $\tilde{p}_1,\tilde{p}_2,\tilde{p}_3$ are linearly independent.
(ii) It is the 2-step nilfactor $\mathcal{Z}_2$ if the polynomials are of type $(e_1)$, and the 2-step affine factor $\mathcal{A}_2$ if the polynomials are of type $(e_2)$ or $(e_3)$.
(iii) It is the Kronecker factor $\mathcal{K}$ in all other cases.
Furthermore, if the system is totally ergodic, we can give explicit formulas for the $L^2$-limit of the averages (2) as $N-M\to\infty$.
The following examples illustrate the different limiting behaviors the averages (2) may exhibit:
(a) If $\mathcal{P}=\{n,n^2,n^3\}$, then the rational Kronecker factor $\mathcal{K}_{rat}$ is characteristic. In the totally ergodic case the limit is the product of the integrals of the three functions.
(b) If $\mathcal{P}=\{n,n^2,n^2+n\}$, then the Kronecker factor $\mathcal{K}$ is characteristic. In the totally ergodic case the limit is the same as in the case of the double averages (averaging over $m,n$) associated to the family $\{m,n,m+n\}$.
(c) If $\mathcal{P}=\{n,2n,n^3\}$, then the Kronecker factor $\mathcal{K}$ is characteristic. In the totally ergodic case the limit is the product of the limit of the ergodic averages corresponding to the family $\{n,2n\}$ and the integral of the third function.
(d) If $\mathcal{P}=\{n,2n,n^2\}$, then the 2-step affine factor $\mathcal{A}_2$ is characteristic. This is the first example we know of a polynomial family whose smallest characteristic factor (for totally ergodic systems) is not of the form $\mathcal{Z}_m$ for some nonnegative integer $m$. In the totally ergodic case the limit can be computed explicitly and, unlike the case $\{n,2n,n^3\}$, it depends nontrivially on the third function.
(e) If $\mathcal{P}=\{n,2n,3n\}$ or $\mathcal{P}=\{n^2,2n^2,3n^2\}$, then the 2-step nilfactor $\mathcal{Z}_2$ is characteristic. In the totally ergodic case the limit is the same in both cases and can be computed explicitly.
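Case (i) of Theorem B depends only on the linear independence of $\tilde p_1,\tilde p_2,\tilde p_3$ over $\mathbb{Q}$, which is a rank computation on coefficient vectors. The Python sketch below is our own illustration (helper names are hypothetical) and assumes the input family is essentially distinct. Rank 3 is case (i). Rank 1 means all three $\tilde p_i$ are rational multiples of one polynomial, which by Gauss's lemma can be taken to be a primitive integer polynomial $p$ with integer multipliers, i.e. type $(e_1)$. Rank 2 covers types $(e_2)$, $(e_3)$ and case (iii); separating these needs further checks that are not attempted here.

```python
from fractions import Fraction

def tilde(coeffs):
    """Coefficients [c0, c1, ..., cd] of c0 + c1 n + ... + cd n^d, with c0 replaced by 0."""
    return [0] + list(coeffs[1:])

def rank_over_Q(rows):
    """Rank over the rationals of integer coefficient vectors (Gauss-Jordan elimination)."""
    width = max(len(r) for r in rows)
    m = [[Fraction(x) for x in r] + [Fraction(0)] * (width - len(r)) for r in rows]
    rk = 0
    for col in range(width):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def coarse_case(family):
    """Coarse classification of an essentially distinct family {p1, p2, p3} (Theorem B)."""
    r = rank_over_Q([tilde(p) for p in family])
    if r == 3:
        return "case (i): K_rat"
    if r == 1:
        return "type (e1): Z_2"
    return "rank 2: type (e2), (e3), or case (iii)"

# Polynomials are coefficient lists: n -> [0, 1], n^2 + n -> [0, 1, 1], 2n^2 -> [0, 0, 2].
print(coarse_case([[0, 1], [0, 0, 1], [0, 0, 0, 1]]))     # example (a)
print(coarse_case([[0, 0, 1], [0, 0, 2], [0, 0, 3]]))     # example (e)
print(coarse_case([[0, 1], [0, 2], [0, 0, 1]]))           # example (d)
```

Exact `Fraction` arithmetic is used so that the rank test has no floating-point tolerance to tune.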
The proof of Theorem B is rather complicated, so let us briefly explain the main ideas. Our first step is to use Theorem 1.2 in order to show that it suffices to restrict our study to totally ergodic nilsystems (Proposition 4.1). At this point we are left with establishing various uniform distribution properties on nilmanifolds. Our main tool is a "reduction to affine" argument, which consists of the following two steps: (i) Reduce the uniform distribution problem to a simpler one that involves only nilpotent affine transformations on finite dimensional tori. This reduction is done using Theorems 2.5 and 2.6, but varies in difficulty depending on the problem. (ii) Verify the simplified (but often challenging) problem "by hand" for affine transformations.
Here is a more detailed sketch of how this plan is executed for the various parts of Theorem B. Part (i) deals with linearly independent polynomial families, a case that has already been worked out in [13] and [14] using a "reduction to affine" argument. Part (ii) deals with families of type $(e_1)$, $(e_2)$, and $(e_3)$. A typical family of type $(e_1)$ is $\mathcal{P}=\{n^2,2n^2,3n^2\}$. It follows from Theorem A (which is again proved using a "reduction to affine" argument) that $\mathcal{P}\sim\{n,2n,3n\}$. This completes our task, since it is known ([10], [11]) that for this family the factor $\mathcal{Z}_2$ is characteristic. A typical family of type $(e_2)$ is $\mathcal{P}=\{n,2n,n^2\}$. To deal with this case, we first use the van der Corput Lemma and the fact that for families of the form $\{n,n^2,n^2+kn\}$ ($k\neq 0$) the Kronecker factor $\mathcal{K}$ is characteristic (Lemmas 4.2 and 4.3) to show that if $E(f_3|\mathcal{K})=0$, then the averages (2) converge to zero in $L^2$. This fact greatly simplifies the analysis, and we are led to consider averages corresponding to the family $\{n,2n\}$ for a transformation $S=T\times R$, where $R$ is a 2-step affine transformation on $\mathbb{T}^2$. From this we deduce, using a result from [12], that the factor $\mathcal{A}_2$ is characteristic.
To complete the study of families of type $(e_2)$, we also need to show that $\{lp,mp,kp^2+rp\}\sim\{ln,mn,kn^2+rn\}$ when the polynomial $p$ is nonconstant. To do this we use Proposition 2.7 (again proved using a "reduction to affine" argument), which, roughly speaking, tells us that if $p(n)$ is a nonconstant integer polynomial, then the substitution $n\to p(n)$ does not change the distribution of any polynomial sequence that has connected orbit closure. Families of type $(e_3)$ are easily reduced to families of type $(e_2)$, thus completing the study of part (ii). Finally, to deal with part (iii), the crucial step is Proposition 3.7. We show there that the polynomial families that are not covered by parts (i) and (ii) have Weyl complexity 2 (defined in Section 3). This fact, combined with Lemmas 4.2 and 4.3, allows us to conclude that in this case the Kronecker factor $\mathcal{K}$ is characteristic.
The following is an immediate corollary of Theorem B:

Corollary. For any two essentially distinct polynomials and every invertible ergodic system, the Kronecker factor $\mathcal{K}$ is characteristic for the corresponding averages (P), and for any three essentially distinct polynomials the 2-step nilfactor $\mathcal{Z}_2$ is characteristic.
It seems plausible that for $k\geq 2$ the $(k-1)$-step nilfactor $\mathcal{Z}_{k-1}$ is characteristic for any family of $k$ essentially distinct polynomials. Moreover, one would expect that for $k\geq 2$ the smallest $m$ for which the factor $\mathcal{Z}_{m-1}$ is characteristic for a family $\mathcal{P}$ of essentially distinct integer polynomials is $W(\mathcal{P})$ (defined in Section 3). It is an immediate consequence of Theorem B and Proposition 3.7 that both statements hold for $k=2,3$.
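Next we establish a multiple recurrence result that generalizes a result of Bergelson, Host, and Kra [6]:

Theorem C. Let $(X,\mathcal{X},\mu,T)$ be an invertible ergodic system, $A\in\mathcal{X}$ with $\mu(A)>0$, and $\{p_1,p_2,p_3\}$ be integer polynomials with $p_i(0)=0$ for $i=1,2,3$. Then for every $\varepsilon>0$ the set
$$\big\{n\in\mathbb{N}\colon \mu\big(A\cap T^{-p_1(n)}A\cap T^{-p_2(n)}A\big)>\mu(A)^3-\varepsilon\big\}$$
has bounded gaps. Moreover, the set
$$\big\{n\in\mathbb{N}\colon \mu\big(A\cap T^{-p_1(n)}A\cap T^{-p_2(n)}A\cap T^{-p_3(n)}A\big)>\mu(A)^4-\varepsilon\big\}$$
has bounded gaps, unless the polynomials are essentially distinct and of type $(e_1)$ with $l<m<r$ and $r\neq l+m$, or of type $(e_2)$ or $(e_3)$.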
This result was established in [6] for the polynomial families $\{n,2n\}$ and $\{n,2n,3n\}$. Moreover, it was shown there that an analogous result fails for the family $\{n,2n,3n,4n\}$; in fact, no fixed power of $\mu(A)$ works as a lower bound. To prove Theorem C we use Theorem A and parts (i) and (iii) of Theorem B. We remark that even for the two cases covered in [6] our argument is different and much simpler (1 and 2 pages long, respectively). The crucial observation is that although we cannot get good lower bounds for the averages corresponding to the families $\{n,2n\}$ and $\{n,2n,3n\}$ if we average over the full set of positive integers, we can get optimal lower bounds as long as the average is taken over an appropriately chosen subset of the integers (that depends on the given system). This observation greatly simplifies the whole analysis, as we do not have to rely on the rather complicated nilsequence decompositions used in [6].
For the exceptional polynomial families of Theorem C we believe that the analogous result fails and we provide conditional counterexamples in Section 5.5.
1.3. Results in combinatorics. We now use the previous results in ergodic theory to derive several implications in combinatorics. We present them in increasing order of difficulty.
We start with an answer to a question of Brown, Graham, and Landman. In [9] the authors define a set $S\subset\mathbb{Z}$ to be large if every finite coloring of the positive integers contains arbitrarily long monochromatic arithmetic progressions with common difference a nonzero integer in $S$. It follows from Theorem 1.1 that if $p$ is an integer polynomial with $p(0)=0$, then the set $S_p=\{p(n)\colon n\in\mathbb{N}\}$ is large. If we do not assume that $p(0)=0$, an obvious necessary condition for the set $S_p$ to be large is that it contains multiples of every positive integer. The authors of [9] asked whether this condition is also sufficient, and in particular whether the range of the polynomial $p(n)=(n^2-13)(n^2-17)(n^2-221)$ is large; this is an example of a polynomial with no linear integer factors whose range does contain multiples of every positive integer (this can be easily verified using properties of the Legendre symbol). We give a positive answer to these questions; in fact, we verify a stronger "density" statement. We say that $S\subset\mathbb{Z}$ is a set of multiple recurrence if every subset of the integers with positive density contains arbitrarily long arithmetic progressions with common difference a nonzero integer in $S$. We show:

Theorem D. Let $p$ be an integer polynomial. Then $S_p=\{p(n)\colon n\in\mathbb{N}\}$ is a set of multiple recurrence if and only if it contains multiples of every positive integer.
To prove this result we use Theorem A and Furstenberg's Multiple Recurrence Theorem [15]. Polynomials that satisfy the conditions of Theorem D have been studied in [4]. It is shown there that the congruence $p(n)\equiv 0 \pmod m$ is solvable for every $m\in\mathbb{N}$ if and only if it is solvable for a finite set of $m\in\mathbb{N}$ that depends explicitly on $p$.
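The condition in Theorem D can be tested for small moduli, using that $p(n)\bmod m$ depends only on $n\bmod m$, so checking $n=1,\ldots,m$ suffices. A finite search of this kind only gives evidence, not a proof; the explicit finite set of moduli from [4] is not reproduced here. The Python sketch below is our own illustration (function names are hypothetical). Passing several polynomials checks for a common root, which is the congruence condition that appears in Theorem F below. The degree-5 polynomial is the example from [4] quoted later in this section.

```python
def has_common_root_mod(polys, m):
    """True if some n in {1, ..., m} satisfies P(n) ≡ 0 (mod m) for every P in polys."""
    return any(all(P(n) % m == 0 for P in polys) for n in range(1, m + 1))

def first_failure(polys, bound):
    """Smallest m <= bound with no common root modulo m, or None if there is none."""
    for m in range(1, bound + 1):
        if not has_common_root_mod(polys, m):
            return m
    return None

def bgl(n):
    """The polynomial of Brown, Graham, and Landman from [9]."""
    return (n**2 - 13) * (n**2 - 17) * (n**2 - 221)

def deg5(n):
    """Degree-5 example attributed to [4]."""
    return (n**3 - 19) * (n**2 + n + 1)

print(first_failure([bgl], 300))                     # None: a multiple of every m <= 300 occurs
print(first_failure([lambda n: n**2 + 1], 300))      # 3, since n^2 + 1 is never 0 mod 3
```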
Our next application is to construct a set $S$ that has bad recurrence properties, but whose set of squares $S^2$ is a set of multiple recurrence. Note that if $S$ is a set of multiple recurrence, it is not known whether its set of squares $S^2$ is always a set of multiple recurrence (the chromatic version of this question was conjectured to be true in [9]).
Theorem E. There exists a set $S\subset\mathbb{N}$ that is not a set of multiple (in fact, not even single) recurrence, but $p(S)=\{p(s)\colon s\in S\}$ is a set of multiple recurrence for every integer polynomial $p$ with degree greater than 1.
Our example is explicit: we show that the set $S=\{n\in\mathbb{N}\colon \{n\sqrt{2}\}\in[1/4,3/4]\}$ works, where $\{x\}$ denotes the fractional part of $x$. To prove this we rely on Lemma 2.8.
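Why $S$ fails single recurrence can be seen directly: for $n\in S$ the rotation by $n\sqrt2$ moves every point of $\mathbb{T}$ by circle distance at least $1/4$, so the arc $A=[0,1/8)$ satisfies $A\cap(A-n\sqrt2)=\emptyset$ (our own illustration of the claim, not the paper's proof). The Python sketch below decides membership in $S$ exactly with integer arithmetic: with $f=\lfloor n\sqrt2\rfloor$, computed as `math.isqrt(2*n*n)`, the condition $1/4\le\{n\sqrt2\}\le 3/4$ is equivalent to $(4f+1)^2\le 32n^2\le(4f+3)^2$. The helper names are hypothetical.

```python
import math

def in_S(n: int) -> bool:
    """Exact test of {n*sqrt(2)} in [1/4, 3/4] using only integer arithmetic."""
    f = math.isqrt(2 * n * n)          # floor(n * sqrt(2))
    return (4 * f + 1) ** 2 <= 32 * n * n <= (4 * f + 3) ** 2

def circle_distance_to_zero(n: int) -> float:
    """Distance from n*sqrt(2) to the nearest integer (floating point, for display only)."""
    t = (n * math.sqrt(2)) % 1.0
    return min(t, 1.0 - t)

S_small = [n for n in range(1, 11) if in_S(n)]
print(S_small)   # [1, 4, 6, 8, 9]

# Every n in S shifts the circle by at least 1/4, so A = [0, 1/8) and A - n*sqrt(2) are disjoint.
assert all(circle_distance_to_zero(n) >= 0.25 - 1e-9 for n in range(1, 10_000) if in_S(n))
```

The integer test avoids rounding trouble near the endpoints $1/4$ and $3/4$, which a floating-point fractional part could misclassify for large $n$.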
Our next application deals with an extension of Theorem 1.1 to families of polynomials with not necessarily zero constant term. We say that the family of integer polynomials $\{p_1,\ldots,p_k\}$ is universal if every subset of the integers with positive density contains infinitely many configurations of the form $\{x,x+p_1(n),\ldots,x+p_k(n)\}$, where $x,n\in\mathbb{N}$. From Theorem 1.1 we know that every family of integer polynomials with zero constant term is universal. We show:

Theorem F. The family of integer polynomials $\{p_1,p_2,p_3\}$ is universal if and only if the system of congruences $p_1(n)\equiv p_2(n)\equiv p_3(n)\equiv 0 \pmod m$ has a solution for every $m\in\mathbb{N}$.
To prove this result we make essential use of Theorem B, so we are currently unable to extend it to deal with families of k polynomials for k ≥ 4.
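(As shown in [4], the smallest possible degree of a polynomial with no linear integer factors whose range contains multiples of every positive integer is 5; an example is $p(n)=(n^3-19)(n^2+n+1)$.)

Finally, using a modification of the Correspondence Principle of Furstenberg, due to Lesigne (see Section 5), we give the following combinatorial implication of Theorem C:

Theorem C'. Let $\Lambda\subset\mathbb{N}$ and $p_1,p_2,p_3$ be integer polynomials with $p_i(0)=0$ for $i=1,2,3$. Then for every $\varepsilon>0$ the set
$$\big\{n\in\mathbb{N}\colon d^*\big(\Lambda\cap(\Lambda+p_1(n))\cap(\Lambda+p_2(n))\big)>(d^*(\Lambda))^3-\varepsilon\big\}$$
has bounded gaps, and the set
$$\big\{n\in\mathbb{N}\colon d^*\big(\Lambda\cap(\Lambda+p_1(n))\cap(\Lambda+p_2(n))\cap(\Lambda+p_3(n))\big)>(d^*(\Lambda))^4-\varepsilon\big\}$$
has bounded gaps, unless the polynomials are essentially distinct and of type $(e_1)$ with $l<m<r$ and $r\neq l+m$, or of type $(e_2)$ or $(e_3)$.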
Examples of random sets show that the lower bounds given are tight. The same result was established in [6] in the special case of the polynomial families $\{n,2n\}$ and $\{n,2n,3n\}$. In the case of the family $\{n,2n\}$, a related finite version of this result was established by Green [18]. Some other examples of eligible 3-term polynomial families are the following: $\{n,3n,4n\}$, $\{n^k,2n^k,3n^k\}$ for all $k\in\mathbb{N}$, $\{n,n^2,an^2+bn\}$ with $a\neq 0$, and $\{n,2n,n^k\}$ for all $k\geq 3$. It was shown in [6] that similar lower bounds fail for the polynomial family $\{n,2n,3n,4n\}$. In contrast, similar lower bounds hold for any family of $k$ linearly independent polynomials with zero constant term (see [14]).
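The measure-theoretic analogue of this tightness (for Theorem C) is easy to see in a Bernoulli shift: with $A=\{x\colon x_0=1\}$ and $\mu(A)=\delta$, independence gives $\mu(A\cap T^{-n}A\cap T^{-2n}A)=\delta^3$ for every $n\neq0$, so the exponent 3 cannot be lowered. The Python sketch below (our own illustration; seed, $\delta$, and window length are assumptions) estimates this along one random 0-1 sequence, as the ergodic theorem suggests. Note that an infinite random set has upper Banach density 1 almost surely, since it contains arbitrarily long blocks; the construction behind the claim for $d^*$ in Theorem C' is therefore more delicate and is not reproduced here.

```python
import random

def bernoulli_sequence(N: int, delta: float, seed: int = 0):
    """A 0-1 sequence with independent entries, each equal to 1 with probability delta.

    It plays the role of a typical point of the Bernoulli shift, with A = {x : x_0 = 1}.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < delta else 0 for _ in range(N)]

def triple_frequency(seq, n: int) -> float:
    """Proportion of x in [2n, N) with seq[x] = seq[x - n] = seq[x - 2n] = 1.

    By the ergodic theorem this estimates mu(A ∩ T^{-n}A ∩ T^{-2n}A) for the shift.
    """
    N = len(seq)
    hits = sum(seq[x] & seq[x - n] & seq[x - 2 * n] for x in range(2 * n, N))
    return hits / (N - 2 * n)

seq = bernoulli_sequence(200_000, 0.5, seed=1)
for n in (1, 7, 50):
    print(n, triple_frequency(seq, n))   # each should be near 0.5**3 = 0.125
```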
As was the case with the corresponding result in ergodic theory, for the exceptional polynomial families of Theorem C' we believe that the analogous result fails, and we provide conditional counterexamples in Section 5.5.

Notation. The following notation will be used throughout the article: $\text{UD-lim}(a_n)=0$ if for every $\varepsilon>0$ we have $d^*(\{n\colon |a_n|>\varepsilon\})=0$.
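By the standard definition, the upper Banach density used in Theorem C' and in the UD-lim notation is $d^*(\Lambda)=\limsup_{N-M\to\infty}|\Lambda\cap[M,N)|/(N-M)$. The Python sketch below is our own finite proxy (the name `window_density` is hypothetical): it computes the largest density of $\Lambda$ over windows of a fixed length $L$ inside a finite range, which approximates $d^*(\Lambda)$ as $L$ grows. The second example shows why $d^*$ can be much larger than the natural density: a single long block inside a sparse set already forces density 1 on some windows.

```python
def window_density(indicator, L):
    """max over M of |Λ ∩ [M, M + L)| / L, over windows inside a finite 0-1 list.

    As L grows (with the list long enough), this approximates the upper Banach density d*(Λ).
    """
    count = sum(indicator[:L])
    best = count
    for M in range(1, len(indicator) - L + 1):
        count += indicator[M + L - 1] - indicator[M - 1]   # slide the window one step right
        best = max(best, count)
    return best / L

multiples_of_3 = [1 if x % 3 == 0 else 0 for x in range(3000)]
print(window_density(multiples_of_3, 30))    # every window of length 30 holds exactly 10

# A sparse set with one long block: natural density about 0.11, but d* is 1.
with_block = [1] * 1000 + [1 if x % 10 == 0 else 0 for x in range(1000, 100_000)]
print(sum(with_block) / len(with_block), window_density(with_block, 500))
```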
Acknowledgements. The author would like to thank B. Kra for helpful discussions during the preparation of this article, M. Johnson for helpful remarks, and S. Leibman for providing the simple proof of Proposition 2.7.
2. Background in ergodic theory and nilsystems
2.1. Ergodic theory background and notation. Background information we assume in this article can be found in the books [16], [30], [33]. By a measure preserving system (or just system) we mean a quadruple $(X,\mathcal{X},\mu,T)$, where $(X,\mathcal{X},\mu)$ is a probability space and $T\colon X\to X$ is a measurable map such that $\mu(T^{-1}A)=\mu(A)$ for all $A\in\mathcal{X}$. Without loss of generality we can assume that the probability space is Lebesgue. A factor of a system can be defined in any of the following three ways: it is a $T$-invariant sub-$\sigma$-algebra $\mathcal{D}$ of $\mathcal{X}$; it is a $T$-invariant sub-algebra $\mathcal{F}$ of $L^\infty(X)$; or it is a system $(Y,\mathcal{Y},\nu,S)$ together with a measurable map $\pi\colon X'\to Y'$, where $X'$ is a $T$-invariant set and $Y'$ is an $S$-invariant set of full measure, such that $\mu\circ\pi^{-1}=\nu$ and $S\circ\pi(x)=\pi\circ T(x)$ for $x\in X'$. In a slight abuse of terminology, when any of these conditions holds, we say that $Y$ (or the appropriate $\sigma$-algebra of $\mathcal{X}$) is a factor of $X$ and call $\pi\colon X'\to Y'$ the factor map. If the factor map $\pi\colon X'\to Y'$ can be chosen to be injective, then we say that the systems $(X,\mathcal{X},\mu,T)$ and $(Y,\mathcal{Y},\nu,S)$ are isomorphic (bijective maps on Lebesgue spaces have measurable inverses).
If $\mathcal{Y}$ is a $T$-invariant sub-$\sigma$-algebra of $\mathcal{X}$ and $f\in L^2(\mu)$, we define the conditional expectation $E(f|\mathcal{Y})$ of $f$ with respect to $\mathcal{Y}$ to be the orthogonal projection of $f$ onto $L^2(\mathcal{Y})$. We frequently use the identities For each $r\in\mathbb{N}$, we define $\mathcal{K}_r$ to be the factor induced by the algebra $\{f\in L^\infty(\mu)\colon T^rf=f\}$. We define $\mathcal{K}_{rat}$ to be the factor induced by the algebra generated by the functions $\{f\in L^\infty(\mu)\colon T^rf=f \text{ for some } r\in\mathbb{N}\}$. The Kronecker factor $\mathcal{K}$ is induced by the algebra spanned by the bounded eigenfunctions of $T$, i.e. functions that satisfy $Tf=e(a)\cdot f$ for some $a\in\mathbb{R}$. We also define higher order eigenfunctions and their corresponding factors. Let $E_0$ denote the set of eigenvalues of $T$, and for $k\in\mathbb{N}$ define inductively
$$E_k=\{f\in L^\infty(\mu)\colon Tf=g\cdot f \text{ for some } g\in E_{k-1}\}.$$
We call the factor spanned by $E_k$ the $k$-step affine factor of the system, and denote it by $\mathcal{A}_k$. The reason for this terminology is that for totally ergodic systems the factor system induced by $\mathcal{A}_k$ is isomorphic to a $k$-step nilpotent affine transformation on some connected compact abelian group (this is a result of Abramov [1]), and $\mathcal{A}_k$ is the largest factor with this property.
The transformation $T$ is ergodic if $\mathcal{K}_1$ consists only of constant functions, and $T$ is totally ergodic if $\mathcal{K}_{rat}$ consists only of constant functions. Every system $(X,\mathcal{X},\mu,T)$ has an ergodic decomposition, meaning that we can write $\mu=\int\mu_t\,d\lambda(t)$, where $\lambda$ is a probability measure on $[0,1]$ and $\mu_t$ are $T$-invariant probability measures on $(X,\mathcal{X})$ such that the systems $(X,\mathcal{X},\mu_t,T)$ are ergodic for $t\in[0,1]$. We sometimes denote the ergodic components by $(T_t)_{t\in[0,1]}$.
We say that the system $(X,\mathcal{X},\mu,T)$ is an inverse limit of a sequence of factors $(X,\mathcal{X}_j,\mu,T)$ if $\{\mathcal{X}_j\}_{j\in\mathbb{N}}$ is an increasing sequence of $T$-invariant sub-$\sigma$-algebras such that $\bigvee_{j\in\mathbb{N}}\mathcal{X}_j=\mathcal{X}$ up to sets of measure zero. Following [19], for every system $(X,\mathcal{X},\mu,T)$ and function $f\in L^\infty(\mu)$, we define inductively the seminorms $|||f|||_k$ as follows. For $k=1$ we set $|||f|||_1=\|E(f|\mathcal{I})\|_{L^2(\mu)}$, where $\mathcal{I}$ is the $\sigma$-algebra of $T$-invariant sets. (In [19] the authors work with ergodic systems, in which case $|||f|||_1=|\int f\,d\mu|$, but the whole discussion can be carried out for nonergodic systems as well without extra difficulties.) For $k\geq 2$ we set
$$|||f|||_k^{2^k}=\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}|||\bar{f}\cdot T^nf|||_{k-1}^{2^{k-1}}.$$
It was shown in [19] that for every integer $k\geq 1$, $|||\cdot|||_k$ is a seminorm on $L^\infty(\mu)$ and it defines factors $\mathcal{Z}_{k-1}$ in the following manner: the $T$-invariant sub-$\sigma$-algebra $\mathcal{Z}_{k-1}$ is characterized by
$$\text{for } f\in L^\infty(\mu):\quad E(f|\mathcal{Z}_{k-1})=0 \ \text{ if and only if } \ |||f|||_k=0.$$
We remark that if $(T_t)_{t\in[0,1]}$ are the ergodic components of the system, then $E(f|\mathcal{Z}_k(T))=0$ if and only if $E(f|\mathcal{Z}_k(T_t))=0$ for a.e. $t\in[0,1]$. For ergodic systems the factor $\mathcal{Z}_0$ is trivial, $\mathcal{Z}_1=\mathcal{A}_1=\mathcal{K}$, and $\mathcal{A}_k\subset\mathcal{Z}_k$ (the inclusion is in general proper for $k\geq 2$). The factors $\mathcal{Z}_k$ are of particular interest since, for suitable $k$, they are characteristic for $L^2$-convergence of the ergodic averages (P). Moreover, in [19] it was shown that the factor $\mathcal{Z}_k$ is an inverse limit of $k$-step nilsystems, which brings us to our next topic of discussion.
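For an ergodic system the recursive definition gives $|||f|||_2^4=\lim_{N\to\infty}\frac1N\sum_{n=1}^N|\int\bar f\cdot T^nf\,d\mu|^2$. For an irrational rotation $Tx=x+\alpha$ and a trigonometric polynomial $f(x)=\sum_kc_ke(kx)$, orthogonality of characters gives $\int\bar f\cdot T^nf\,d\mu=\sum_k|c_k|^2e(kn\alpha)$, and the limit equals $\sum_k|c_k|^4$. The Python sketch below (our own illustration; the function name and the choices of $f$, $\alpha$, $N$ are assumptions) evaluates the finite-$N$ average and compares it with that value.

```python
import cmath
import math

def e(t: float) -> complex:
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)

def u2_fourth_power(coeffs: dict, alpha: float, N: int) -> float:
    """Finite-N version of |||f|||_2^4 = lim (1/N) sum_n |∫ conj(f) T^n f dμ|^2
    for f(x) = sum_k c_k e(kx) and the rotation T x = x + alpha on the circle.
    Uses ∫ conj(f) T^n f dμ = sum_k |c_k|^2 e(k n alpha)."""
    total = 0.0
    for n in range(1, N + 1):
        corr = sum(abs(c) ** 2 * e(k * n * alpha) for k, c in coeffs.items())
        total += abs(corr) ** 2
    return total / N

coeffs = {1: 1.0, 2: 1.0}                              # f(x) = e(x) + e(2x)
approx = u2_fourth_power(coeffs, math.sqrt(2), 20_000)
exact = sum(abs(c) ** 4 for c in coeffs.values())      # limit for irrational alpha
print(approx, exact)
```

Consistently with $\mathcal{Z}_1=\mathcal{K}$ being the whole system for a rotation, this quantity vanishes only when all $c_k$ vanish.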

2.2. Nilsystems, definition and examples.
Fundamental properties of nilsystems related to our discussion were studied in [2], [29], [28], [23], and [24]. Below we summarize some facts that we shall use; all the proofs can be found in [22]. Given a topological group $G$, we denote the identity element by $e$ and we let $G_0$ denote the connected component of $e$. If $A,B\subset G$, then $[A,B]$ is defined to be the subgroup generated by $\{aba^{-1}b^{-1}\colon a\in A,\ b\in B\}$. We define the commutator subgroups recursively by $G_1=G$ and $G_{k+1}=[G,G_k]$. A group $G$ is said to be $k$-step nilpotent if its $(k+1)$-th commutator subgroup $G_{k+1}$ is trivial. If $G$ is a $k$-step nilpotent Lie group and $\Gamma$ is a discrete cocompact subgroup, then the compact space $X=G/\Gamma$ is said to be a $k$-step nilmanifold. The group $G$ acts on $G/\Gamma$ by left translation, and the translation by a fixed element $a\in G$ is given by $T_a(g\Gamma)=(ag)\Gamma$. Let $m$ denote the unique probability measure on $X$ that is invariant under the action of $G$ by left translations (called the Haar measure), and let $\mathcal{G}/\Gamma$ denote the Borel $\sigma$-algebra of $G/\Gamma$. Fixing an element $a\in G$, we call the system $(G/\Gamma,\mathcal{G}/\Gamma,m,T_a)$ a $k$-step nilsystem and call the map $T_a$ a nilrotation.
If $H$ is a closed subgroup of $G$, then $Y=(H\Gamma)/\Gamma\simeq H/(H\cap\Gamma)$ may not be compact in general (take $X=\mathbb{R}/\mathbb{Z}$ and $H=\{t\sqrt{2}\colon t\in\mathbb{Z}\}$), but if $Hx$ is closed in $X$ for some $x\in X$, then it can be shown that $Y$ is compact and the set $Hx$ can be given the structure of a nilmanifold. In particular, if $x=g\Gamma$ for some $g\in G$, we have $Hx\simeq H/\Delta$, where $\Delta=H\cap g\Gamma g^{-1}$. We call any such set a sub-nilmanifold of $X$.
Examples of nilsystems are rotations on compact abelian Lie groups, and more generally, every nilpotent affine transformation on a compact abelian Lie group is isomorphic to a nilsystem (see Example 1). But these examples do not cover all possible nilsystems (see Example 2).

Example 1. On the space $G=\mathbb{Z}\times\mathbb{R}^2$, define multiplication as follows: if $g_1=(m_1,x_1,x_2)$ and $g_2=(n_1,y_1,y_2)$, let $g_1\cdot g_2=(m_1+n_1,\,x_1+y_1,\,x_2+y_2+m_1y_1)$.
Then $G$ is a 2-step nilpotent group and the discrete subgroup $\Gamma=\mathbb{Z}^3$ is cocompact. If $a=(m_1,a_1,a_2)$, it turns out that $T_a$ is isomorphic to the nilpotent affine transformation $S\colon\mathbb{T}^2\to\mathbb{T}^2$ given by
$$S(x,y)=(x+a_1,\;y+m_1x+a_2).$$

Example 2. On the space $G=\mathbb{R}^3$, define multiplication as follows: if $g_1=(x_1,x_2,x_3)$ and $g_2=(y_1,y_2,y_3)$, let Then $G$ is a 2-step nilpotent group and the discrete subgroup $\Gamma=\mathbb{Z}^3$ is cocompact. Let $a=(a_1,a_2,0)$, where $a_1,a_2\in[0,1)$ are linearly independent. It turns out that $T_a$ is isomorphic to a skew product transformation $S\colon\mathbb{T}^3\to\mathbb{T}^3$ that has the form It can be shown that $S$ (or $T_a$) is not isomorphic to a nilpotent affine transformation on any finite dimensional torus.
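The group law of Example 1 and the formula for $S$ can be checked mechanically. The Python sketch below is our own verification, with hypothetical helper names, using exact rational arithmetic on a small grid of sample elements. It checks associativity, that every commutator has the form $(0,0,z)$ and is central (so $G$ is 2-step nilpotent), and that translating the coset $(0,x,y)\Gamma$ by $a$ agrees with $S$ under the identification $(0,x,y)\Gamma\leftrightarrow(x,y)\in\mathbb{T}^2$. That identification rests on $(0,x,y)\cdot(k,j,l)=(k,x+j,y+l)$ for $(k,j,l)\in\Gamma$, and $S$ is well defined on $\mathbb{T}^2$ because $m_1$ is an integer.

```python
from fractions import Fraction as Q
from itertools import product

def mul(g, h):
    """Group law of Example 1 on Z x R^2."""
    m1, x1, x2 = g
    n1, y1, y2 = h
    return (m1 + n1, x1 + y1, x2 + y2 + m1 * y1)

def inv(g):
    m1, x1, x2 = g
    return (-m1, -x1, -x2 + m1 * x1)

def comm(g, h):
    """Commutator g h g^{-1} h^{-1}; it equals (0, 0, m1*y1 - n1*x1)."""
    return mul(mul(g, h), mul(inv(g), inv(h)))

def translate(a, x, y):
    """Torus coordinates of the coset a * (0, x, y) * Gamma."""
    m, u, v = mul(a, (0, x, y))
    m, u, v = mul((m, u, v), (-m, 0, 0))   # the right factor lies in Gamma
    return (u % 1, v % 1)

def S(a, x, y):
    m1, a1, a2 = a
    return ((x + a1) % 1, (y + m1 * x + a2) % 1)

samples = [(m, Q(p, 7), Q(q, 5)) for m, p, q in product((-2, 0, 3), (-3, 2), (1, 4))]
for g, h, k in product(samples, repeat=3):
    assert mul(mul(g, h), k) == mul(g, mul(h, k))          # associativity
    c = comm(g, h)
    assert c[:2] == (0, 0) and mul(c, k) == mul(k, c)      # commutators are central
for a in samples:
    for x, y in product((Q(0), Q(1, 3), Q(5, 6)), repeat=2):
        assert translate(a, x, y) == S(a, x, y)
        assert S(a, x + 1, y) == S(a, x, y)                # well defined on the torus
print("Example 1 checks passed")
```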
Let $(X=G/\Gamma,\mathcal{G}/\Gamma,m,T_a)$ be an ergodic nilsystem. The subgroup $\langle G_0,a\rangle$ projects to an open subset of $X$ that is invariant under $a$. By ergodicity this projection equals $X$. Hence $X=\langle G_0,a\rangle/\Gamma'$, where $\Gamma'=\Gamma\cap\langle G_0,a\rangle$. Using this representation of $X$ for ergodic nilsystems, we have that
(4) $G$ is generated by the connected component of the identity element and $a$.
From now on, when we work with an ergodic nilsystem we will freely assume that hypothesis (4) is satisfied. We remark that under this hypothesis it was shown in [23] that for every integer $k\geq 2$ the group $G_k$ is connected. We will make frequent use of the following simple facts:

Lemma 2.1. Let $(X=G/\Gamma,\mathcal{G}/\Gamma,m,T_a)$ be an ergodic nilsystem. Then
(i) The system is totally ergodic if and only if $X$ is connected.
(ii) There exists an $r\in\mathbb{N}$ such that the (finitely many) ergodic components of $T_a^r$ are totally ergodic.
Proof. We first prove statement (i). Suppose that the system is totally ergodic. Let $X_0$ be the identity component of $X$. Since $X$ is compact, it is a disjoint union of finitely many translates of $X_0$. Since $a$ permutes these copies, there exists an $r\in\mathbb{N}$ such that $a^r$ preserves $X_0$. By assumption the translation $T_{a^r}=T_a^r$ is ergodic, and so $X_0=X$.
Conversely, suppose that $X$ is connected and let $r\in\mathbb{N}$. Because $T_a$ is ergodic, there exists $x_0\in X$ such that the sequence $\{a^n\pi(x_0)\}_{n\in\mathbb{N}}$ is uniformly distributed in $Z=G/([G,G]\Gamma)$, where $\pi\colon X\to Z$ is the natural projection. Since $Z$ is a connected compact abelian group, it is well known that $\{a^{rn}\pi(x_0)\}_{n\in\mathbb{N}}$ is also uniformly distributed in $Z$. By Theorem 2.5 below we have that $T_a^r=T_{a^r}$ is ergodic. Since $r\in\mathbb{N}$ was arbitrary, $T_a$ is totally ergodic.
We now prove statement (ii). The Kronecker factor of an ergodic nilsystem is isomorphic to a rotation on a monothetic compact abelian Lie group. Every such group has the form $\mathbb{Z}_{d_1}\times\mathbb{T}^{d_2}$ for some positive integer $d_1$ and nonnegative integer $d_2$, where $\mathbb{Z}_d$ denotes the cyclic group with $d$ elements. It follows that $\mathcal{K}_{rat}=\mathcal{K}_{d_1}$, and $T_a^{d_1}$ has finitely many ergodic components, all of which are totally ergodic.

2.3. Factors of nilsystems.
Given an ergodic nilsystem the following result allows us to identify its factors Z k (T a ): It will also be convenient for us to identify the 2-step affine factor A 2 of an ergodic nilsystem. We adapt a technique from [23] to do this. We first need a lemma: Proof. We know that A 2 (T a ) is a factor of Z 2 (T a ), so by Theorem 2.2 the function f factors through G/(G 3 Γ). Hence, after replacing G by G/G 3 we can assume that G is 2-step nilpotent. We know from [1] that |f | = const, so we can assume that |f | = 1 in which case we have thatf

By Theorem 2.2 the function h factors through the compact abelian group G/([G, G]Γ).
Moreover, since h is an eigenfunction of T a it is a character of G.
We first claim that and c belongs to the center of G we find that Hence, f c ·f ∈ E 1 (T a ). We define a map φ : where C is the set of constant functions. We will use a connectedness argument to do this. If we equip E 1 (T a ) with the L 2 (m) topology then the map φ is continuous. Since T a is ergodic the connected component of the function This proves the claim. We now show that for every b ∈ G we have f b ∈ E 2 (T a ). We compute Since h is a character of G we have Using that [a, b] belongs to the center of G and (5) we find Putting together equations (6), (7), (8), we find . This completes the proof.
we can assume that $G$ is 2-step nilpotent and that $G_0$ is abelian. In this case, by Theorem 2.6 below, the system is isomorphic to a 2-step nilpotent affine transformation on some finite dimensional torus. For such systems it is easy to verify that $\mathcal{A}_2(T_a)=L^\infty(m)$, and so $f\in\mathcal{A}_2(T_a)$.
We move now to the converse. It suffices to show that if $f\in E_2(T_a)$, then $f$ factors through $G/(G_3[G_0,G_0]\Gamma)$. We know from [1] that $|f|=\text{const}$, so we can assume that First notice that by Lemma 2.3 the map $\phi$ takes values in $E_2(T_a)$. Next we claim that $\phi(G_0)\subset C$, where $C$ is the set of constant functions of modulus 1. We will use a connectedness argument to show this (similar to the one used in Lemma 2.3). If we equip $E_2(T_a)$ with the $L^2(m)$ topology, then the map $\phi$ is continuous. The connected component of the function $1$ in $E_2(T_a)$ is the set $C$. One can see this by using the fact that if $f\in E_2(T_a)$ is nonconstant then $\int f\,dm=0$ (see [1]), which implies that $\|f-c\|_{L^2(m)}=\sqrt{2}$ for $c\in C$. Since $\phi$ is continuous and $\phi(e)=1$, we have that $\phi(G_0)\subset C$. Now it is easy to check that $\phi$ : This completes the proof.

2.4. Polynomial sequences on nilmanifolds.
If $G$ is a nilpotent Lie group, $a_1,\ldots,a_k\in G$, and $p_1,\ldots,p_k$ are integer polynomials $\mathbb{N}^d\to\mathbb{Z}$, a sequence of the form
$$g(n)=a_1^{p_1(n)}a_2^{p_2(n)}\cdots a_k^{p_k(n)}$$
is called a polynomial sequence in $G$. If the polynomials $p_1(n),\ldots,p_k(n)$ are linear, then $g(n)$ is called a linear sequence. The following result of Leibman ([23], [24]) gives information about the orbit closure of polynomial sequences on nilmanifolds and helps us handle their uniform distribution properties by reducing them to uniform distribution properties on a certain factor:

Theorem 2.5 (Leibman [23], [24]). Let $X=G/\Gamma$ be a nilmanifold and $g(n)$ be a polynomial sequence in $G$.
and let $\pi\colon X\to Z$ and $\pi'\colon X\to Z'$ be the corresponding natural projections. Then for every $x\in X$: (iii) If $X$ is connected and $a_1,\ldots,a_k\in G$ are commuting elements that together with $G_0$ generate $G$, and $g(n)=a$ We remark that the groups $G_0$, $[G_0,G_0]$, and $[G,G]$ are normal subgroups of $G$. Also note that the connected component of the identity element of the group $G/[G_0,G_0]$ is abelian. The next result shows how we can use this property to our advantage. In order to state it we need some notation. If $G$ is a group then a map $T$ :

Theorem 2.6 (Frantzikinakis & Kra [13]). Let $X=G/\Gamma$ be a connected $k$-step nilmanifold such that $G_0$ is abelian. Then every nilrotation $T_a(x)=ax$ defined on $X$ with the Haar measure $m$ is isomorphic to a $k$-step nilpotent affine transformation on some finite dimensional torus. Furthermore, the conjugation can be chosen to be continuous.
Next we give two applications of Theorems 2.5 and 2.6 that will be needed in the sequel. We will use the first one frequently, for example in the proofs of Theorems A, B and F. The simple argument given below was communicated to us by S. Leibman: … Since $Y$ is connected and $H_0$ is abelian, by Theorem 2.6 we can assume that $Y=\mathbb T^m$ and the nilrotations $T_{a_i}$, $i=1,\ldots,k$, are nilpotent affine transformations on $\mathbb T^m$. Then the coordinates of the sequence $\{g(n)x\}_{n\in\mathbb N}$ are polynomials in $n$ with real coefficients, and our problem reduces to the following one: If $u\colon\mathbb N\to\mathbb T^m$ is a sequence with polynomial coordinates such that $\overline{\{u(n)\}}_{n\in\mathbb N}=\mathbb T^m$, then $\overline{\{u(p(n))\}}_{n\in\mathbb N}=\mathbb T^m$ for every nonconstant polynomial $p$. To see this, first notice that $u$ has the form $$u(n)=u_0(n)q+u_1(n)a_1+\cdots+u_k(n)a_k,$$ where $u_i$ are integer-vector-valued polynomials, $q\in\mathbb Q$, and $a_1,\ldots,a_k$ are linearly independent irrational numbers. Then using Corollary 2.4 in [8] we have that $u(n)$ is dense in $\mathbb T^m$ if and only if $$\mathrm{Span}(u_1(n))+\cdots+\mathrm{Span}(u_k(n))\pmod 1=\mathbb T^m,$$ where for $u(n)=(q_1(n),\ldots,q_r(n))$ we define $\mathrm{Span}(u(n))=\mathrm{Span}\{(q_1(x),\ldots,q_r(x))\colon x\in\mathbb R\}$. The last identity remains valid if we replace $n$ with any nonconstant polynomial $p(n)$: the set $p(\mathbb R)$ is infinite, and the span of the values of a vector-valued polynomial on any infinite set equals the span of its coefficient vectors, so $\mathrm{Span}(u_i(p(n)))=\mathrm{Span}(u_i(n))$ for $i=1,\ldots,k$. This completes the proof.
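The key point in the last step is the identity $\mathrm{Span}(u_i(p(n)))=\mathrm{Span}(u_i(n))$. The minimal Python sketch below, in which the vector polynomial $u$ and the substitution $p$ are arbitrary choices made for illustration, computes both spans exactly over $\mathbb Q$ by evaluating at enough integer points and taking a matrix rank. It checks only the span identity, not density on the torus.

```python
from fractions import Fraction
from typing import Callable, List, Sequence


def rank(rows: Sequence[Sequence[int]]) -> int:
    """Rank over Q of an integer matrix, by exact Gaussian elimination."""
    m: List[List[Fraction]] = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def span_dim(u: Callable[[int], Sequence[int]], samples: int = 16) -> int:
    """dim Span{u(x) : x in R} for an integer-vector-valued polynomial u.

    If every coordinate of u has degree < samples, the values at the points
    0, 1, ..., samples-1 span the same space as the coefficient vectors of u,
    because a Vandermonde matrix at distinct points is invertible.
    """
    return rank([u(x) for x in range(samples)])


def u(x: int) -> List[int]:
    # Hypothetical example: u(x) = (x, 2x, x^2) spans a plane in R^3.
    return [x, 2 * x, x * x]


def p(x: int) -> int:
    # Hypothetical nonconstant integer polynomial for the substitution n -> p(n).
    return x * x + 3 * x


print(span_dim(u), span_dim(lambda x: u(p(x))))  # both spans are 2-dimensional
```

Exact rational arithmetic is used so that the rank is not affected by floating point cancellation.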
The next lemma will be used in the proof of Theorem E. Lemma 2.8. Suppose that $X=G/\Gamma$ is a nilmanifold, $g\colon\mathbb N\to G$ is a polynomial sequence, and $p$ is an integer polynomial with $\deg p>1$. Then for every $x\in X$ and irrational $\beta\in\mathbb T$ we have … $\{(n\beta,g(p(n))x)\}_{n\in\mathbb N}$ on $\mathbb T\times Y$ and repeating the argument used in the previous lemma we can reduce our problem to the following one: If $u\colon\mathbb N\to\mathbb T^m$ is a sequence with polynomial coordinates such that $\overline{\{u(p(n))\}}_{n\in\mathbb N}=\mathbb T^m$ and $\deg p>1$, then $\overline{\{(n\beta,u(p(n)))\}}_{n\in\mathbb N}=\mathbb T^{m+1}$. To see this, first notice that $u$ has the form $$u(n)=u_0(n)q+u_1(n)a_1+\cdots+u_k(n)a_k,$$ where $u_i$ are integer-vector-valued polynomials, $q\in\mathbb Q$, and $a_1,\ldots,a_k$ are rationally independent numbers. Since $\overline{\{u(p(n))\}}_{n\in\mathbb N}=\mathbb T^m$, by Corollary 2.4 in [8] we have … We also have $$(n\beta,g(p(n))x)=(n,0)\beta+(0,u_0(p(n)))q+(0,u_1(p(n)))a_1+\cdots+(0,u_k(p(n)))a_k.$$
In the general case we argue as follows: By [23] there exists an $r\in\mathbb N$ such that the closure of $\{g(p(rn+i))x\}_{n\in\mathbb N}$ is connected for $i=0,\ldots,r-1$. Repeating the previous argument for the sequence $\{h(rn+i)\}_{n\in\mathbb N}$ we find that … This implies (9) and completes the proof.

Limit formula for linear sequences.
In the case where all the polynomials are linear, the limit of the corresponding multiple ergodic averages (P) was computed in [34] (for a simpler proof see [6]). To state the result we need some notation. Let $G/\Gamma$ be a nilmanifold. Given $l_1,\ldots,l_k\in\mathbb N$ define the set … It was shown in [22] that $H$ is a closed subgroup of $G^k$. The discrete subgroup $\Delta$ is cocompact, so the nilmanifold $H/\Delta$ carries a Haar measure; call it $m_H$. The next result is a straightforward generalization of a formula given by Ziegler in [34] that can be obtained using some computations of Leibman in [27]: Theorem 2.9 (Ziegler [34]). Let $(X=G/\Gamma,\mathcal G/\Gamma,T_a,m)$ be an ergodic nilsystem and … where $y=(y_1,\ldots,y_k)$, and $H,\Delta$ are as before.
Combining this with Theorem 2.5 we easily deduce the following: Corollary 2.10. Let $(X=G/\Gamma,\mathcal G/\Gamma,T_a,m)$ be an ergodic nilsystem with $X$ connected (or equivalently $T_a$ totally ergodic) and $l_1,\ldots,l_k\in\mathbb Z$. Then for a.e. $x\in X$ the set $H_x=\overline{\{(a^{l_1n}x,a^{l_2n}x,\ldots,a^{l_kn}x)\}}_{n\in\mathbb N}$ is connected.
Proof. By Theorems 2.5 and 2.9, for a.e. $x\in X$ the set $H_x$ is homeomorphic to the nilmanifold $H/\Delta$, where the subgroups $H$ and $\Delta$ are as before. Since $X=G/\Gamma$ is connected and $G_i$ is connected for $i\ge2$, it follows that $H/\Delta$ is connected. Hence, $H_x$ is connected for a.e. $x\in X$.
3. Weyl complexity for families of three polynomials
Following [8], we will define the Weyl complexity of a family $P=\{p_1,\ldots,p_k\}$ of essentially distinct integer polynomials. Roughly speaking, this notion is designed to capture the minimal $m\in\mathbb N$ for which the factor $Z_{m-1}$ is characteristic for the corresponding ergodic averages (P). In Proposition 3.7 we will give an effective way of determining the Weyl complexity of any family of three polynomials.

Definition of Weyl complexity and basic properties.
A connected Weyl system is a system induced by an ergodic nilpotent affine transformation acting on some finite dimensional torus with the Haar measure. A standard Weyl system of level $d$ is a system induced by a transformation $T\colon\mathbb T^d\to\mathbb T^d$ given by
(11) $T(x_1,x_2,\ldots,x_d)=(x_1+\alpha,\,x_2+x_1,\,\ldots,\,x_d+x_{d-1})$
for some irrational number $\alpha\in\mathbb T$. A quasi-standard Weyl system of level $d$ is a system induced by a transformation $T\colon\mathbb T^d\to\mathbb T^d$ given by
(12) $T(x_1,\ldots,x_d)=\big(x_1+\alpha_1,\;x_2+m_{2,1}x_1+\alpha_2,\;\ldots,\;x_d+m_{d,d-1}x_{d-1}+\cdots+m_{d,1}x_1+\alpha_d\big)$, with $m_{i,j}\in\mathbb Z$,
where $\alpha_i\in\mathbb T$, $\alpha_1$ is irrational, and $m_{i,i-1}\neq0$ for all $i=2,\ldots,d$. Note that every quasi-standard Weyl system is ergodic ([16], page 67). Given a system $(X,T)$ we denote the diagonal in $X^{k+1}$ by $\Delta_{X^{k+1}}$, and we define the orbit of a polynomial family $P=\{p_1,\ldots,p_k\}$ with respect to the system $(X,T)$ to be … Definition 3.1. Let $P=\{p_1,\ldots,p_k\}$ be a family of distinct integer polynomials with $p_i(0)=0$ for $i=1,\ldots,k$. The Weyl complexity $W(P)$ is the minimal $r\in\mathbb N$ with the following property: For every $d\in\mathbb N$ with $d\ge r$, for some/every quasi-standard Weyl system $(X,T)$ of level $d$ we have … For a general family $P=\{p_1,\ldots,p_k\}$ of essentially distinct polynomials we define … The next two results give equivalent characterizations of the Weyl complexity that are better suited for our purposes. The first follows easily from the definition.
Proposition 3.2. The Weyl complexity $W(P)$ of a family $P=\{p_1,\ldots,p_k\}$ of essentially distinct integer polynomials is the maximal $s\in\mathbb N$ (or 1 if there is no such $s$) with the following property: For some/every quasi-standard Weyl system $(X,T)$ of level $s-1$ of the form (12), there exist characters $\chi_i$ of $X$, $i=0,\ldots,k$, at least one of which depends nontrivially on the variable $x_{s-1}$, such that …
For a proof of the next result see the remarks after Proposition 5.1 in [8].
Proposition 3.3. The Weyl complexity $W(P)$ of a family of essentially distinct integer polynomials $P=\{p_1,\ldots,p_k\}$ is the minimal $m\in\mathbb N$ with the following property: For every connected Weyl system $(X,T)$ the factor $Z_{m-1}$ is characteristic for $L^2$-convergence or weak convergence of the averages (P).
We remark that for a quasi-standard Weyl system of the form (12) the factor Z m coincides with the sub-σ-algebra of sets that depend only on the first m coordinates.
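For orientation, the iterates of a standard Weyl system have polynomial coordinates, which is why the first $m$ coordinates carry the "degree $\le m$" information. Assuming (11) has the form $T(x_1,\ldots,x_d)=(x_1+\alpha,x_2+x_1,\ldots,x_d+x_{d-1})$ and setting $x_0=\alpha$, an induction using Pascal's rule shows that the $j$-th coordinate of $T^nx$ is $\sum_{i=0}^{j}\binom{n}{j-i}x_i \pmod 1$, a polynomial of degree $j$ in $n$. The Python sketch below compares this closed form with direct iteration; exact rationals stand in for $\alpha$ and for the point of $\mathbb T^d$ (an assumption of the example; the identity is algebraic, so irrationality plays no role here).

```python
from fractions import Fraction
from math import comb
from typing import List, Sequence


def weyl_step(x: Sequence[Fraction], alpha: Fraction) -> List[Fraction]:
    """One step of T(x_1, ..., x_d) = (x_1 + alpha, x_2 + x_1, ..., x_d + x_{d-1}) mod 1."""
    y = [(x[0] + alpha) % 1]
    for j in range(1, len(x)):
        y.append((x[j] + x[j - 1]) % 1)  # uses the old coordinate x_{j-1}
    return y


def weyl_power(x: Sequence[Fraction], alpha: Fraction, n: int) -> List[Fraction]:
    """Closed form of T^n x for n >= 0: coordinate j is sum_{i=0}^{j} C(n, j-i) x_i mod 1,
    with the convention x_0 = alpha."""
    xs = [alpha] + list(x)
    return [
        sum(comb(n, j - i) * xs[i] for i in range(j + 1)) % 1
        for j in range(1, len(xs))
    ]


alpha = Fraction(3, 17)
x = [Fraction(1, 3), Fraction(2, 5), Fraction(5, 11), Fraction(1, 7)]
point = list(x)
for n in range(25):
    assert point == weyl_power(x, alpha, n)
    point = weyl_step(point, alpha)
print("closed form agrees with iteration for n < 25")
```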
We will make frequent use of the following simple identity: Proposition 3.4. If $P=\{p_1,\ldots,p_k\}$ is a family of essentially distinct polynomials then $W(p_1,p_2,\ldots,p_k)=W(-p_1,\,p_2-p_1,\,\ldots,\,p_k-p_1)$.
Proof. This follows immediately from Proposition 3.3 and the identity $$\int f_0\cdot T^{p_1(n)}f_1\cdots T^{p_k(n)}f_k\,d\mu=\int T^{-p_1(n)}f_0\cdot f_1\cdot T^{p_2(n)-p_1(n)}f_2\cdots T^{p_k(n)-p_1(n)}f_k\,d\mu.$$
3.2. Different scenarios for the Weyl complexity of three polynomials. We will give an explicit criterion for determining $W(p_1,p_2,p_3)$. We first show: Proposition 3.5. If $p_1,p_2,p_3$ are essentially distinct integer polynomials then $W(p_1,p_2,p_3)\le3$.
Proof. We argue by contradiction. Suppose that $W(p_1,p_2,p_3)\ge4$. We can assume that $p_i(0)=0$ for $i=1,2,3$. Consider the quasi-standard Weyl system $(\mathbb T^3,T)$, where … By Proposition 3.2 there exist characters $\chi_0,\chi_1,\chi_2,\chi_3$ of $\mathbb T^3$, at least one of which depends nontrivially on the variable $x_3$, such that (14) … for all $x\in\mathbb T^3$. We use that … and substitute in (14). Suppose that … for some integers $k_i,l_i,m_i$, $i=0,1,2,3$. Plugging in (14) we get that the system has a solution in the integers $k_i,l_i,m_i$, $i=1,2,3$, with at least one of $k_1,l_1,m_1$ nonzero. Let $d_i=\deg p_i$, $i=1,2,3$, and let $a_1,b_1,c_1$ be the leading coefficients of the polynomials $p_1,p_2,p_3$. After rearranging the polynomials we can assume that $d_1\ge d_2\ge d_3$. We consider three cases: … If $k_2=0$ then looking at the leading coefficients of the polynomials in (16) we get that $l_1=-m_1$, which implies (using (15)) that $p_2=p_3$, a contradiction. Hence, $k_2\neq0$, and since $l_1=-m_1$ we get from (16) that $d_1=2d$. But then the polynomial on the left hand side of (17) has degree $4d$, a contradiction.
Case 3. If $d_1=d_2=d_3=d$ then looking at the leading coefficients of the polynomials in (15), (16), (17), we get that the system has a nontrivial integer solution in $k_1,l_1,m_1$. Hence the determinant of the corresponding matrix, which is $a_1b_1c_1(a_1-b_1)(b_1-c_1)(c_1-a_1)$, vanishes. Since $a_1,b_1,c_1$ are nonzero, two of them must be equal. Without loss of generality we can assume that $a_1=b_1$. Replacing $p_1$ with $q_1=-p_1$, $p_2$ with $q_2=p_2-p_1$, and $p_3$ with $q_3=p_3-p_1$, and using Proposition 3.4, reduces our problem to either Case 1 or Case 2 (note that $\deg q_2<d=\deg q_1$ since $a_1=b_1$). So we again get a contradiction, showing that $W(p_1,p_2,p_3)\le3$.
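Assuming, as the comparison of leading coefficients suggests, that the matrix in Case 3 has rows $(a_1^j,b_1^j,c_1^j)$ for $j=1,2,3$, its determinant is $a_1b_1c_1$ times a Vandermonde determinant (pull $a_1,b_1,c_1$ out of the columns). The small Python sketch below confirms the factorization exhaustively on a box of integer triples; it is a sanity check of the algebra, not a proof.

```python
from itertools import product


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def leading_matrix(a1: int, b1: int, c1: int):
    """Rows (a1^j, b1^j, c1^j) for j = 1, 2, 3 (assumed shape of the system)."""
    return [[a1 ** j, b1 ** j, c1 ** j] for j in (1, 2, 3)]


def factored(a1: int, b1: int, c1: int) -> int:
    return a1 * b1 * c1 * (a1 - b1) * (b1 - c1) * (c1 - a1)


# Exhaustive check on a small box of integer triples.
for a1, b1, c1 in product(range(-5, 6), repeat=3):
    assert det3(leading_matrix(a1, b1, c1)) == factored(a1, b1, c1)
print("factorization verified on [-5, 5]^3")
```

With $a_1b_1c_1\neq0$, the determinant vanishes exactly when two of the leading coefficients coincide, which is the dichotomy used in the proof.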
We will also need the following simple lemma: Lemma 3.6. Suppose that $a_1,b_1,c_1\in\mathbb Z$ are nonzero and distinct, $a_2,b_2,c_2\in\mathbb Z$, and … for some integers $k_1,l_1,m_1$, not all of them zero. Then there exist $r,s\in\mathbb Q$ such that $(a_1,a_2)=r(b_1,b_2)=s(c_1,c_2)$.
Proof. Without loss of generality we can assume that $l_1\neq0$. Performing some elementary operations we get the system … Using that $b_1c_1\neq0$, the first two equations easily imply that $a_1c_2=c_1a_2$. Since $(a_1-b_1)(a_1-c_1)\neq0$, the first and third equations easily imply that $b_1c_2=b_2c_1$. The result follows.
We can now prove the main result of this section: Proposition 3.7. …
Proof. We can assume that $p_i(0)=0$ for $i=1,2,3$. We first show part (i). Consider the standard Weyl system $(\mathbb T,T)$ of level 1 induced by the transformation $T(x)=x+\alpha$, and let $\chi_i(x)=e(m_ix)$, where $m_i\in\mathbb Z$ for $i=0,1,2,3$, be characters of $\mathbb T$. Since the polynomials $p_1,p_2,p_3$ are linearly independent, the equation … gives that $m_i=0$ for $i=0,1,2,3$. By Proposition 3.2 we have that $W(p_1,p_2,p_3)=1$.
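The proof of part (i) uses only the linear independence of $p_1,p_2,p_3$ (after the constant terms are removed), which is easy to test mechanically. A possible Python sketch follows; the encoding of a polynomial by its coefficient list $[c_1,c_2,\ldots]$ of $n,n^2,\ldots$ is an assumption of this example. Three coefficient vectors are linearly independent over $\mathbb Q$ exactly when some $3\times3$ minor of the stacked matrix is nonzero.

```python
from itertools import combinations
from typing import Sequence


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def independent_triple(p1: Sequence[int], p2: Sequence[int], p3: Sequence[int]) -> bool:
    """Linear independence over Q of three integer polynomials with zero constant term.

    Each polynomial is a coefficient list [c_1, c_2, ...] for n, n^2, ....
    If there are fewer than 3 columns, no 3x3 minor exists and the answer is False.
    """
    width = max(len(p1), len(p2), len(p3))
    rows = [list(q) + [0] * (width - len(q)) for q in (p1, p2, p3)]
    return any(
        det3([[row[c] for c in cols] for row in rows]) != 0
        for cols in combinations(range(width), 3)
    )


print(independent_triple([1], [0, 1], [0, 0, 1]))  # n, n^2, n^3
print(independent_triple([1], [2], [1, 1]))        # n, 2n, n^2 + n
```

For instance $\{n,2n,n^2+n\}$ is linearly dependent, so the argument of part (i) does not apply to it.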
To show part (ii) we first notice that, since by Proposition 3.5 we have $W(p_1,p_2,p_3)\le3$, it remains to show that $W(p_1,p_2,p_3)\ge3$ if and only if the polynomials have the form (a) or (b). To do this consider the quasi-standard Weyl system $(\mathbb T^2,T)$ defined by $T(x_1,x_2)=(x_1+\alpha,\,x_2+2x_1+\alpha)$. By Proposition 3.2 we have $W(p_1,p_2,p_3)\ge3$ if and only if there exist characters $\chi_0,\chi_1,\chi_2,\chi_3$ of $\mathbb T^2$, at least one of which depends nontrivially on the variable $x_2$, such that (18) … for all $x\in\mathbb T^2$. We use that $T^{p(n)}(x_1,x_2)=(x_1+p(n)\alpha,\,x_2+2p(n)x_1+p(n)^2\alpha)$ for every integer polynomial $p$, and substitute in (18). We get that $W(p_1,p_2,p_3)\ge3$ if and only if the system has an integer solution in the $k_i,l_i,m_i$, $i=1,2$, with at least one of $k_1,l_1,m_1$ nonzero.
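For the system $T(x_1,x_2)=(x_1+\alpha,x_2+2x_1+\alpha)$ one has $T^{n}(x_1,x_2)=(x_1+n\alpha,\,x_2+2nx_1+n^2\alpha)$, by induction on $n$, since $x_2+2nx_1+n^2\alpha+2(x_1+n\alpha)+\alpha=x_2+2(n+1)x_1+(n+1)^2\alpha$. So $T^{p(n)}$ produces exactly the terms $p(n)\alpha$ and $p(n)^2\alpha$ that the characters pick up. The Python sketch below checks the closed form against direct iteration; exact rationals stand in for $\alpha$ and the starting point so the comparison is exact (an assumption of the example, harmless because the identity is algebraic), and the polynomial $p$ is hypothetical.

```python
from fractions import Fraction


def T(x, alpha):
    """T(x1, x2) = (x1 + alpha, x2 + 2*x1 + alpha) mod 1."""
    x1, x2 = x
    return ((x1 + alpha) % 1, (x2 + 2 * x1 + alpha) % 1)


def T_power(x, alpha, n):
    """Closed form T^n(x1, x2) = (x1 + n*alpha, x2 + 2*n*x1 + n^2*alpha) mod 1."""
    x1, x2 = x
    return ((x1 + n * alpha) % 1, (x2 + 2 * n * x1 + n * n * alpha) % 1)


def p(n):
    # Hypothetical integer polynomial with p(0) = 0.
    return 3 * n * n + 2 * n


alpha = Fraction(5, 13)
start = (Fraction(1, 4), Fraction(2, 9))
point = start
for n in range(40):
    assert point == T_power(start, alpha, n)
    point = T(point, alpha)
print("closed form agrees with iteration for n < 40")
```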
If the polynomial family has the form (a) then the following are eligible solutions to the previous system: (i) If k = 0 then k 1 = mk, l 1 = −kl, Hence, W (p 1 , p 2 , p 3 ) = 3. By Proposition 3.4 we get that the same is true for any polynomial family of the form (b).
We now focus on the hardest part of the result, which is to show that if $W(p_1,p_2,p_3)=3$ then some permutation of the polynomials has either the form (a) or (b). Let $p_1(n)=a_1n^{d_1}+\cdots+a_{d_1}n$, $p_2(n)=b_1n^{d_2}+\cdots+b_{d_2}n$, $p_3(n)=c_1n^{d_3}+\cdots+c_{d_3}n$ for some $d_i\in\mathbb N$, $i=1,2,3$, and $a_i,b_i,c_i\in\mathbb Z$ with $a_1,b_1,c_1\neq0$. After rearranging the polynomials we can assume that $d_1\ge d_2\ge d_3$. We consider the following three cases: Case 1. If $d_1>d_2\ge d_3$ we will show that some permutation of the polynomials has either the form (a) or (b). From (19) we get $k_1=0$, so $p_2,p_3$ are integer multiples of the same integer polynomial $p$. Using this and (20), we get that $k_2p_1$ is an integer combination of $p$ and $p^2$. This easily implies that the polynomials have the form (a), possibly with some rational multiple of $p$ in place of $p$.
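Case 2. If $d_1\ge d_2>d_3$ we will show that $p_1=p_2$, a contradiction. From (19) we get that $d_1=d_2=d$, and looking at the leading coefficients of the polynomials in (19) and (20) we get the system $k_1a_1+l_1b_1=0$, $k_1a_1^2+l_1b_1^2=0$. Since $a_1,b_1\neq0$ and $k_1,l_1$ are not both zero, we easily get that $a_1=b_1$ and $k_1+l_1=0$. (Indeed, multiplying the first equation by $a_1$ and subtracting the second gives $l_1b_1(a_1-b_1)=0$; if $l_1=0$ then $k_1a_1=0$ forces $k_1=0$, which is excluded, so $a_1=b_1$ and then $k_1=-l_1$.)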
… are not both zero, and they do not exceed $d_0-k$. By possibly permuting the polynomials we can further assume that … Substituting $p_1=q+p_1'$, $p_2=q+p_2'$, and $k_1=-l_1$ in equations (19) and (20) gives the system … By (22) we get $m_1\neq0$ (otherwise $k_1=m_1=0$) and the polynomial $p_3$ has degree at most $d_1'$. By (21) the polynomial $q(p_1'-p_2')$ has degree $d+d_1'$, which is greater than the degree of all other polynomials that appear in (23). This can only happen if $k_1=0$, which gives $m_1=0$, contradicting our assumption that one of the integers $k_1,l_1,m_1$ is nonzero. Hence, $d_0=d$, which gives that $p_1=p_2$.
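The first elimination step of Case 2 can also be confirmed by brute force. The Python sketch below is an illustration only, with an arbitrary search bound: it enumerates small integer solutions $(k_1,l_1)\neq(0,0)$ of $k_1a_1+l_1b_1=0$, $k_1a_1^2+l_1b_1^2=0$ and checks that they exist only when $a_1=b_1$, in which case $k_1+l_1=0$.

```python
from itertools import product


def nontrivial_solutions(a1: int, b1: int, bound: int = 6):
    """All (k1, l1) != (0, 0) with |k1|, |l1| <= bound such that
    k1*a1 + l1*b1 == 0 and k1*a1**2 + l1*b1**2 == 0."""
    return [
        (k1, l1)
        for k1, l1 in product(range(-bound, bound + 1), repeat=2)
        if (k1, l1) != (0, 0)
        and k1 * a1 + l1 * b1 == 0
        and k1 * a1 ** 2 + l1 * b1 ** 2 == 0
    ]


nonzero = [v for v in range(-4, 5) if v != 0]
for a1, b1 in product(nonzero, repeat=2):
    sols = nontrivial_solutions(a1, b1)
    if a1 == b1:
        # Solutions exist and all of them satisfy k1 + l1 == 0.
        assert sols and all(k1 + l1 == 0 for k1, l1 in sols)
    else:
        assert sols == []
print("checked all nonzero a1, b1 in [-4, 4]")
```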
Case 3. If $d_1=d_2=d_3=d$ we will show that the polynomials have the form (b). We consider two subcases. Suppose first that two of the three leading coefficients are the same, say $a_1=b_1$ (the other cases can be treated similarly). Replacing $p_1$ with $q_1=-p_1$, $p_2$ with $q_2=p_2-p_1$, and $p_3$ with $q_3=p_3-p_1$, and using Proposition 3.4, reduces our problem to either Case 1 or Case 2. Since Case 2 is impossible, the polynomials $q_1,q_2,q_3$ have the form (a). It follows that the polynomials $p_1,p_2,p_3$ have the form (b) for some $k\neq0$.
So it remains to deal with the case where all three polynomials have degree $d$ and their leading coefficients $a_1,b_1,c_1$ are distinct. In this case we will show that the polynomials have the form (b) with $k=0$. The case $d=1$ is trivial, so we can assume that $d\ge2$. There exist nonzero $r,s\in\mathbb Q$ such that $a_1=rb_1=sc_1$. We will show by induction on $t$ that for all $1\le t\le d$ we have
(24) $(a_1,a_2,\ldots,a_t)=r(b_1,b_2,\ldots,b_t)=s(c_1,c_2,\ldots,c_t)$.
The case $t=d$ gives that the polynomials $p_1,p_2,p_3$ have the form (b) with $k=0$. For $t=1$ the statement is true by assumption. To better illustrate the idea of the inductive step we first work out the case $t=2$. Looking at the coefficients of $n^d$ and $n^{d-1}$ in (19), and the coefficients of $n^{2d}$ and $n^{2d-1}$ in (20), we get (for $d\ge2$ we have $2d-1>d$) the system … Since the integers $a_1,b_1,c_1$ are nonzero and distinct, we get by Lemma 3.6 that $(a_1,a_2)=r(b_1,b_2)=s(c_1,c_2)$ for some nonzero $r,s\in\mathbb Q$, proving that (24) holds for $t=2$.
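When (24) holds for $t=d$, the coefficient vectors of $p_1,p_2,p_3$ are proportional, and the integer polynomial $p$ in form (b) with $k=0$ can be made explicit by taking primitive parts (Gauss's lemma): $p$ is the common primitive part and the integer multipliers are the signed contents. The Python helper below is a hypothetical illustration of this bookkeeping, not part of the argument; polynomials are encoded by their coefficient lists from the leading coefficient down, as in $p_1(n)=a_1n^d+\cdots+a_dn$.

```python
from functools import reduce
from math import gcd
from typing import List, Optional, Sequence, Tuple


def primitive_part(coeffs: Sequence[int]) -> Tuple[int, List[int]]:
    """Write coeffs = content * prim with prim primitive and prim[0] > 0.

    coeffs[0] is the leading coefficient and must be nonzero.
    """
    content = abs(reduce(gcd, coeffs))
    if coeffs[0] < 0:
        content = -content
    return content, [c // content for c in coeffs]


def common_polynomial(*polys: Sequence[int]) -> Optional[Tuple[List[int], List[int]]]:
    """If the coefficient vectors are pairwise proportional, return (p, mults) with
    polys[i] == mults[i] * p for an integer polynomial p; otherwise return None."""
    parts = [primitive_part(q) for q in polys]
    p = parts[0][1]
    if any(prim != p for _, prim in parts):
        return None
    return p, [content for content, _ in parts]


# p_1 = 6n^2 + 4n, p_2 = -3n^2 - 2n, p_3 = 9n^2 + 6n are rational multiples of each other.
print(common_polynomial([6, 4], [-3, -2], [9, 6]))  # -> ([3, 2], [2, -1, 3])
```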
Inductive step: Suppose that (24) holds for some $t\in\mathbb N$ with $1\le t<d$; we will show that it holds for $t+1$. So we need to establish that
(25) $(a_1,a_{t+1})=r(b_1,b_{t+1})=s(c_1,c_{t+1})$.
Looking at the coefficients of $n^d$ and $n^{d-t}$ in (19), and the coefficients of $n^{2d}$ and $n^{2d-t}$ in (20), we get the system … Since $a_1=rb_1=sc_1$, the first equation in (27) gives that …, where the last equality holds by (28). This shows that in the second equation in (27) all the terms in the sum except the first one are zero, hence … If we replace the second equation in (27) with this simpler one, Lemma 3.6 applies and gives (25). This completes the induction and the proof.

In this section we will prove Theorems A and B.
Proposition 4.1. Let P be an eligible collection of k-term polynomial families. Suppose that there exists an m ∈ N such that for every totally ergodic nilsystem and every {p 1 , . . . , p k } ∈ P the factor Z m is characteristic for weak convergence of the ergodic averages (P ). Then the same is true for L 2 -convergence and for every ergodic system.
Proof. We can assume that $p_i(0)=0$ for $i=1,\ldots,k$. By Theorem 1.2 we know that the averages (P) converge in $L^2(\mu)$, so the corresponding weak and strong limits coincide. Suppose that the factor $Z_m$ satisfies the assumption of the Proposition. It suffices to show that for every ergodic system $(X,\mathcal X,\mu,T)$, if $f_i\in L^\infty(\mu)$ for $i=1,\ldots,k$ and $E(f_i|Z_m)=0$ for some $i\in\{1,\ldots,k\}$, then the averages (P) converge to 0 in $L^2(\mu)$ as $N-M\to\infty$. Without loss of generality we can assume that $i=1$. For ergodic systems, by Theorem 1.2 there exists a characteristic factor that is an inverse limit of nilsystems induced by some $T$-invariant sub-$\sigma$-algebras $\{\mathcal X_j\}_{j\in\mathbb N}$. Since $E(f_1|Z_m(\mathcal X))=0$ implies that $E(f_1|Z_m(\mathcal X_j))=0$ for $j\in\mathbb N$, an approximation argument allows us to assume that our system is an ergodic nilsystem, say $(X=G/\Gamma,\mathcal G/\Gamma,m,T_a)$. By Proposition 2.1 there exists an $r\in\mathbb N$ such that the ergodic components of $T_a^r$ are totally ergodic. Since $p_i(0)=0$, we have $p_i(rn)=rq_i(n)$ for some integer polynomials $q_i$, $i=1,\ldots,k$. Because $P$ is eligible we have $\{q_1,\ldots,q_k\}\in P$. We know from [26] that for every nonzero integer $r$ and $m\in\mathbb N$ we have $Z_m(T_a)=Z_m(T_a^r)$. Since $T_a^r$ has finitely many ergodic components, it follows that if $E(f_1|Z_m)=0$ then the same holds for the ergodic components of $T_a^r$. So, using our assumption for the ergodic components of $T_a^r$ and the polynomial family $\{q_1,\ldots,q_k\}$, we get that the averages (P) converge to 0 in $L^2(\mu)$ as $N-M\to\infty$ if we substitute $p_i(rn)$ for $p_i(n)$, $i=1,\ldots,k$. Finally, since $E(f_1|Z_m)=0$ implies that $E(T_a^jf_1|Z_m)=0$ for $j\in\mathbb N$, a similar argument shows that the limit is also zero if we substitute $p_i(rn+s)$ for $p_i(rn)$ in (P), for $s=0,\ldots,r-1$. It follows that the averages (P) converge to 0 in $L^2(\mu)$ as $N-M\to\infty$, completing the proof.
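The substitution step uses only that $p(0)=0$: if $p(n)=\sum_{j\ge1}c_jn^j$ then $p(rn)=rq(n)$ with $q(n)=\sum_{j\ge1}c_jr^{j-1}n^j$, which again has integer coefficients and zero constant term. A short Python sketch follows (it encodes a polynomial by the coefficient list of $n,n^2,\ldots$, an assumption of the example; the polynomial $p$ is hypothetical). It computes $q$ and checks the identity on a range of integers.

```python
from typing import List, Sequence


def evaluate(coeffs: Sequence[int], n: int) -> int:
    """p(n) = sum_j coeffs[j-1] * n**j for a polynomial with zero constant term."""
    return sum(c * n ** j for j, c in enumerate(coeffs, start=1))


def rescale(coeffs: Sequence[int], r: int) -> List[int]:
    """Coefficients of q with p(r*n) = r * q(n), namely coeffs[j-1] * r**(j-1)."""
    return [c * r ** (j - 1) for j, c in enumerate(coeffs, start=1)]


p = [3, -2, 5]          # hypothetical p(n) = 3n - 2n^2 + 5n^3
r = 4
q = rescale(p, r)       # [3, -8, 80]
assert all(evaluate(p, r * n) == r * evaluate(q, n) for n in range(-10, 11))
print(q)
```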

Next we prove Theorem A.
Proof of Theorem A. Let $p$ be a nonconstant integer polynomial. We first claim that for totally ergodic systems the $L^2$-limits of the ergodic averages associated to the families $\{l_1p(n),l_2p(n),\ldots,l_kp(n)\}$ and $\{l_1n,l_2n,\ldots,l_kn\}$ are the same (a formula for the limit then follows from Theorem 2.9). Using Theorem 1.2 and an approximation argument, it suffices to check this for every totally ergodic nilsystem. So let $(X=G/\Gamma,\mathcal G/\Gamma,m,T_a)$ be such a system. It suffices to show that for a.e. $x\in X$ the sequences $\{(a^{l_1n}x,a^{l_2n}x,\ldots,a^{l_kn}x)\}_{n\in\mathbb N}$ and $\{(a^{l_1p(n)}x,a^{l_2p(n)}x,\ldots,a^{l_kp(n)}x)\}_{n\in\mathbb N}$ are equidistributed on the same set, or equivalently that the sequences $\{g(n)\tilde x\}_{n\in\mathbb N}$ and $\{g(p(n))\tilde x\}_{n\in\mathbb N}$ are, where $g(n)=(a^{l_1n},a^{l_2n},\ldots,a^{l_kn})$ is a linear sequence in $G^k$ and $\tilde x=(x,\ldots,x)\in X^k$. By Theorem 2.5 it is enough to show that for a.e. $x\in X$ the two sequences have the same closure. By Corollary 2.10 the closure of $\{g(n)\tilde x\}_{n\in\mathbb N}$ is connected for a.e. $x\in X$, so Proposition 2.7 applies and gives the required identity.
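In the simplest 1-step case, $X=\mathbb T$ with $T_a(x)=x+\alpha$ and starting point $x=0$, the claim reduces to Weyl equidistribution. For the character $\chi(t_1,t_2)=e(j_1t_1+j_2t_2)$ of $\mathbb T^2$, the averages of $\chi$ along $(l_1n\alpha,l_2n\alpha)$ and along $(l_1p(n)\alpha,l_2p(n)\alpha)$ both tend to 1 if $j_1l_1+j_2l_2=0$ and to 0 otherwise ($\alpha$ irrational, $p$ nonconstant). The Python sketch below only illustrates this numerically for finite $N$. The choices $\alpha=(1+\sqrt5)/2$, $l_1=1$, $l_2=2$, $p(n)=n^2$ are assumptions, and floating point evaluation limits the accuracy.

```python
import cmath
import math
from typing import Callable

ALPHA = (1 + math.sqrt(5)) / 2  # assumed irrational rotation number (golden ratio)


def e(t: float) -> complex:
    return cmath.exp(2j * math.pi * t)


def character_average(j1: int, j2: int, l1: int, l2: int,
                      p: Callable[[int], int], N: int) -> complex:
    """(1/N) sum_{n<N} chi(l1 p(n) alpha, l2 p(n) alpha) for chi(t1, t2) = e(j1 t1 + j2 t2).

    Both coordinates are multiples of p(n) * alpha, so the summand is
    e(c * p(n) * alpha) with c = j1*l1 + j2*l2.
    """
    c = j1 * l1 + j2 * l2
    total = 0j
    for n in range(N):
        total += e((c * p(n) * ALPHA) % 1.0)
    return total / N


def linear(n: int) -> int:
    return n


def square(n: int) -> int:
    return n * n


for j1, j2 in [(2, -1), (1, 0), (1, 1)]:
    a = character_average(j1, j2, 1, 2, linear, 50_000)
    b = character_average(j1, j2, 1, 2, square, 50_000)
    print(j1, j2, round(abs(a), 3), round(abs(b), 3))
```

The pair $(j_1,j_2)=(2,-1)$ annihilates $(l_1,l_2)=(1,2)$, so its averages equal 1 for both sequences; the other two averages are small for both, with the quadratic ones converging more slowly.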
We know from [19], [26] that the factor $Z_{k-1}$ is characteristic for the family $\{l_1n,l_2n,\ldots,l_kn\}$; hence $Z_{k-1}$ is also characteristic for the family $\{l_1p(n),l_2p(n),\ldots,l_kp(n)\}$ for totally ergodic systems. Since the collection of polynomial families of the form $\{l_1p(n),l_2p(n),\ldots,l_kp(n)\}$ with $p\in\mathbb Z[t]$ nonconstant is eligible, by Proposition 4.1 the factor $Z_{k-1}$ is also characteristic for every ergodic system. It was shown in [35] that $Z_{k-1}$ is in fact the smallest characteristic factor for the family $\{l_1n,l_2n,\ldots,l_kn\}$; the same argument shows that this is also the case for any family of the form $\{l_1p(n),l_2p(n),\ldots,l_kp(n)\}$ where $p\in\mathbb Z[t]$ is nonconstant. The next two lemmas will enable us to show that the Kronecker factor is characteristic for the averages (P) when $k=3$ and $W(p_1,p_2,p_3)=2$.
Lemma 4.2. Let $k_1,k_2,l_1,l_2\in\mathbb Z$ be such that the polynomials $k_1m$, $k_2n$, $l_1m+l_2n$ are distinct. Then for ergodic systems the Kronecker factor is characteristic for $L^2$-convergence of the averages (29) …
Proof. Let $(X,\mathcal X,\mu,T)$ be an ergodic system and suppose that $f_1,f_2,f_3\in L^\infty(\mu)$ with $\|f_i\|_\infty\le1$ for $i=1,2,3$. It suffices to show that if $E(f_i|K)=0$ for some $i\in\{1,2,3\}$ then the $L^2$-limit of the averages in (29) is zero. Suppose that $E(f_3|K)=0$; the argument is identical if $E(f_2|K)=0$, and if $E(f_1|K)=0$ we only have to interchange the roles of $m$ and $n$. By Theorem 2.5, combined with a result in [26] that reduces the study of the limiting behavior of linear multiple ergodic averages along any Følner sequence to nilsystems, the $L^2$-limit (30) … does not depend on the choice of the Følner sequence $F_N$. We claim that the $L^2$-limit (30) is zero for the Følner sequence …, where $a(N)$ is an increasing sequence of integers that will be chosen later. We start by using the well known fact that for the family $\{an,bn\}$, where $a,b$ are distinct integers, the Kronecker factor is characteristic (this is implicit in [15]). Since $E(f_3|K)=0$, we get that for every $N\in\mathbb N$ there exists an $a(N)\in\mathbb N$ such that … for all $m\in\{0,1,\ldots,N-1\}$. Furthermore, we can make sure that the sequence $a(N)$ is increasing in $N$. We have … Combining (32) and (33), we get that for the choice of Følner sequence made in (31) the $L^2$-limit in (30) is zero, and so the same is true for the $L^2$-limit of the averages (29).
Lemma 4.3. Let $(X,\mathcal X,\mu,T)$ be a totally ergodic system, $p_1,p_2$ be linearly independent integer polynomials, and $k_1,k_2,l_1,l_2\in\mathbb Z$ be such that the family $P=\{k_1p_1,k_2p_2,l_1p_1+l_2p_2\}$ has Weyl complexity 2. If $f_0,f_1,f_2,f_3\in L^\infty(\mu)$ then the averages …
Proof.
By Theorem 1.2 and Lemma 4.2 there exists a factor of the system that is characteristic for both averages and is an inverse limit of finite step nilsystems. So, using an approximation argument, it suffices to verify the lemma when the system is a totally ergodic nilsystem, say $(X=G/\Gamma,\mathcal G/\Gamma,m,T_a)$. By Proposition 2.1 the set $X$ is connected, so using Theorem 2.5 it suffices to show that for every $x\in X$ the sequences (36) … and (37) … have the same closure. Using part (iii) of Theorem 2.5 we can show that the closure of the sequence in (36) is the connected nilmanifold $H/\Delta$, where $\Delta=H\cap\Gamma^4$ (alternatively, we can directly quote a more general result proved in Section 4 of [27]). It remains to show that the closure of the sequence in (37) is equal to $H/\Delta$ as well. To do this we are going to apply Theorem 2.5. First notice that if $a_0=(a,a,a,a)$, $a_1=(e,a^{k_1},e,a^{l_1})$, $a_2=(e,e,a^{k_2},a^{l_2})$, and $\tilde x=(x,x,x,x)$, then the sequences in (36) … We first obtain some information about the quotient $H/([H_0,H_0]\Delta)$. We claim that $[H_0,H_0]=[G_0,G_0]^4$. The inclusion $\subset$ is obvious. To establish the other inclusion, first notice that for $g\in G_0$ the elements $(g,g,g,g)$, $(e,g^{k_1},e,g^{l_1})$, and $(e,e,g^{k_2},g^{l_2})$ belong to $H_0$. Taking commutators of these elements and using the fact that the group $G_0$ is divisible, we easily get that … where $S$ is an ergodic nilpotent affine transformation of $\mathbb T^d$. We have thus reduced our problem to showing that for every ergodic nilpotent affine transformation $S$ acting on $X=\mathbb T^d$, linearly independent integer polynomials $p_1,p_2$, and every $x\in\mathbb T^d$, the sequences … have the same closure. Since $S$ is uniquely ergodic, the sequence $\{S^mx\}_{m\in\mathbb N}$ is dense in $X$ for every $x\in X$. So it suffices to show that the sets $O(P,\Delta_{X^3},S)$ and $O(Q,\Delta_{X^3},S)$ have the same closure, where $Q$ is the family of 2-variable polynomials $\{k_1n,k_2r,l_1n+l_2r\}$.
This in turn will follow if we show that the averages (34) and (35) have the same limit as N − M → ∞ in the special case where the transformation T is equal to S. Since W (P ) = 2, by Proposition 3.3 the characteristic factor for the averages (34) when T = S is the Kronecker factor. By Lemma 4.2 the Kronecker factor is also characteristic for the averages (35), so it suffices to check the identity for group rotations. This can be easily verified for characters and then for general bounded functions by approximating them in L 2 by finite linear combinations of characters, thus completing the proof.

Characteristic factors and limit formulas.
We are now ready to prove Theorem B. The argument is rather lengthy, so we refer the reader to the Introduction for a brief sketch. Notice that by Proposition 3.7 the cases (i), (ii), (iii) of Theorem B correspond to the cases where the polynomial family has Weyl complexity 1, 3, 2, respectively. We deal with each one separately.
4.2.1.
Weyl complexity 1. Characteristic factor: We can assume that $p_i(0)=0$ for $i=1,2,3$. If the polynomials are linearly independent, it was shown in [14] that the rational Kronecker factor $K_{\mathrm{rat}}$ is characteristic for $L^2$-convergence of the averages (P).

Limit formula:
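In the case where the system is totally ergodic the factor $K_{\mathrm{rat}}$ is trivial; hence for every $f_1,f_2,f_3\in L^\infty(\mu)$ we have $$\lim_{N-M\to\infty}\frac1{N-M}\sum_{n=M}^{N-1}T^{p_1(n)}f_1\cdot T^{p_2(n)}f_2\cdot T^{p_3(n)}f_3=\int f_1\,d\mu\cdot\int f_2\,d\mu\cdot\int f_3\,d\mu,$$ where the limit is taken in $L^2(\mu)$.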

4.2.2.
Weyl complexity 2. Characteristic factor: The collection of 3-term polynomial families of Weyl complexity 2 is easily shown to be eligible, so by Proposition 4.1 we can assume that the system is totally ergodic. It follows from Proposition 3.7 that the polynomials $p_1,p_2,p_3$ are linearly dependent. Hence, … for some linearly independent integer polynomials $q_1,q_2$ with zero constant term and $k_1,k_2,l_1,l_2,c_1,c_2,c_3\in\mathbb Z$. Combining Lemmas 4.3 and 4.2, we get that for totally ergodic systems the Kronecker factor $K$ is characteristic for $L^2$-convergence of the averages (P).
It can be easily seen that for polynomial families of Weyl complexity 2 every characteristic factor (thought of as a subalgebra of functions) for the averages (P ) contains all the eigenfunctions of the system, and as a result it contains the Kronecker factor. Hence, for ergodic systems the Kronecker factor is the smallest characteristic factor.
Limit formula: We now compute the limit of the corresponding ergodic averages (P) for totally ergodic systems. We can assume that $c_i=0$ for $i=1,2,3$. After replacing all three functions with their projections to the Kronecker factor $K$ we can assume that $X=K$. Every Kronecker system is an inverse limit of 1-step nilsystems, so we can assume that our system is a totally ergodic rotation on a compact abelian Lie group $G$ with the Haar measure $m$. Moreover, by Proposition 2.1 the group $G$ has to be connected, so $G=\mathbb T^d$ for some nonnegative integer $d$. In this case it is easy to check that for every … for a.e. $t\in\mathbb T^d$.

4.2.3.
Weyl complexity 3. By Proposition 3.7 the polynomial triple $(p_1,p_2,p_3)$ either has the form (a) $(lp+c_1,\,mp+c_2,\,kp^2+rp+c_3)$, or (b) $(kp^2+lp+c_1,\,kp^2+mp+c_2,\,kp^2+rp+c_3)$, for some integer polynomial $p$ and $k,l,m,r,c_1,c_2,c_3\in\mathbb Z$. We consider the following three cases: Case 1: The family of essentially distinct polynomials has the form (a) with $k=0$. This case is covered by Theorem A. The smallest characteristic factor is $Z_2$. To find a limit formula in the totally ergodic case, we first use standard reductions to assume that the system is a totally ergodic 2-step nilsystem. In this case Theorem 2.9 gives a formula for the limit.
Case 2: The family of essentially distinct polynomials has the form (a) with $k\neq0$. We first deal with the case $p(n)=n$ and then reduce the case of a general polynomial $p(n)$ to this one.
Characteristic factor for $p(n)=n$: Since the collection of polynomial families of the form (a) with $k\neq0$ is eligible, by Proposition 4.1 we can assume that the system is totally ergodic. Furthermore, we can assume that $c_i=0$ for $i=1,2,3$. We first claim that if $f_3\in K^\perp$ then the averages (41) … converge to zero in $L^2(\mu)$ as $N-M\to\infty$. We apply the Hilbert space van der Corput Lemma ([5]; see footnote 11) for the sequence of functions …; this reduces the claim to showing that for every nonzero $h$ the average (42) of $T^{(m-l)n}(T^{mh}f_2\cdot f_2)\cdot T^{kn^2+(2kh+r)n}(T^{kh^2+h}f_3)\cdot T^{kn^2+rn}f_3$ converges to zero in $L^2(\mu)$ as $N-M\to\infty$. Using Proposition 3.7 it is easy to check that for all $h,k,l,m,r\in\mathbb Z$ with $h,k,l,m\neq0$ and $l\neq m$ we have $W\big((m-l)n,\,kn^2+(2kh+r)n,\,kn^2+rn\big)=2$.
Hence, as shown in the Weyl complexity 2 case, the characteristic factor for the ergodic averages (42) is the Kronecker factor. This proves the claim.
Next we claim that if $f_1$ or $f_2\in A_2^\perp$ then the averages (41) converge to zero in $L^2(\mu)$ as $N-M\to\infty$. We prove this for $f_1$; the argument is similar for $f_2$. As we have shown, we can replace $f_3$ with $E(f_3|K)$ without changing the limit of the averages (41). Moreover, after approximating $E(f_3|K)$ by a linear combination of eigenfunctions and using linearity, we can assume that $f_3$ is either constant or a $\lambda$-eigenfunction, where $\lambda=e(\alpha)$ for some $\alpha\in(0,1)$. Since the system is totally ergodic, $\alpha$ is irrational. If $f_3$ is constant, the claim follows from a classical result of Furstenberg [15]. If not, the average (41) is equal to $f_3$ times the average (43) … A simple computation shows that there exist characters $\chi_1,\chi_2\colon\mathbb T^2\to\mathbb C$ such that $\chi_1(R^{ln}(t_1,t_2))\cdot\chi_2(R^{mn}(t_1,t_2))=e((kn^2+rn)\alpha)$ holds for every $n\in\mathbb N$, where $R\colon\mathbb T^2\to\mathbb T^2$ is defined by … and $\beta$ is some appropriately chosen rational multiple of $\alpha$. Consider the product system $(X\times\mathbb T^2,\mu\times m,S=T\times R)$, where $m$ is the Haar measure on $\mathbb T^2$, and let $h_1=f_1\cdot\chi_1$, $h_2=f_2\cdot\chi_2$. Then the average (43) takes the form (44) … Let $S_t$, $t\in[0,1]$, be the ergodic components of $S$. We will show that if $f_1\in A_2(T)^\perp$ then $h_1\in K(S_t)^\perp$ for a.e. $t$. As is well known, this would follow if we show that for a.e. $x\in X\times\mathbb T^2$ we have …; this reduces to showing that $\lim_{N\to\infty}\frac1N\sum_{n=1}^{N}f_1(T^nx)\cdot e(ns+n^2\gamma)=0$ for a.e. $x\in X$, where $\gamma$ is some integer multiple of $\beta$. Since $f_1\in A_2(T)^\perp$ and $T$ is totally ergodic, this follows from [12]. Hence, $h_1\in K(S_t)^\perp$ for a.e. $t$. From [15] we know that for distinct nonzero integers $l,m$ the Kronecker factor is characteristic for the ergodic averages associated to the family $\{ln,mn\}$. So an ergodic decomposition argument gives that the average in (44) converges to zero in $L^2(\mu)$ as $N-M\to\infty$, proving the claim. This shows that the factor $A_2$ is characteristic for the averages (41).
Footnote 11. Let $\{x_n\}_{n\in\mathbb N}$ be a bounded sequence in a Hilbert space. If for every $m\in\mathbb N$ one has $\lim_{N-M\to\infty}\frac1{N-M}\sum_{n=M}^{N-1}\langle x_{n+m},x_n\rangle=0$, then $\lim_{N-M\to\infty}\big\|\frac1{N-M}\sum_{n=M}^{N-1}x_n\big\|=0$.
Limit formula for $p(n)=n$: We now compute the limit of the averages (41) for totally ergodic systems. We can assume that $c_i=0$ for $i=1,2,3$. Since $A_2$ is a factor of $Z_2$, and $Z_2$ is an inverse limit of 2-step nilsystems, using an approximation argument we can assume that our system is a totally ergodic 2-step nilsystem that coincides with its 2-step affine factor $A_2$. In this case the system is isomorphic to a 2-step nilpotent affine transformation on a connected compact abelian group $G$ (see [1]). Furthermore, our system is a nilsystem, so the group $G$ has to be a Lie group. Hence, we can assume that $G$ is a finite dimensional torus. In this case the evaluation of the limit is a straightforward computation, which is done (for general $k$-step affine systems) in [8] (or [27]). Instead of reproducing this rather complicated formula, let us illustrate how the limit is computed in a simple case. Suppose that $T\colon\mathbb T^2\to\mathbb T^2$ is given by …, where $\alpha$ is irrational and $b$ is a nonzero integer. We find by direct computation that for a.e. $(t_1,t_2)\in\mathbb T^2$ we have … $\int f_1(t_1+lx_1,\,t_2+ly_1+l^2y_2)\cdot f_2(t_1+mx_1,\,t_2+my_1+m^2y_2)\cdot f_3(t_1+ky_2+rx_1,\,y_3)\,dx_1\,dy_1\,dy_2\,dy_3$.
It immediately follows from this formula (and more generally from the formulas in [8] or [27]) that for almost every x ∈ G the set H x of (46) is connected.
Connectedness for $p(n)=n$: In order to deal with the case of a general polynomial $p(n)$ we will apply Proposition 2.7, which allows us to make the substitution $n\to p(n)$ when computing the orbit closure of a polynomial sequence with connected closure. We now verify that the connectedness assumption is satisfied, i.e. that for every totally ergodic nilsystem $(X=G/\Gamma,\mathcal G/\Gamma,m,T_a)$ the set
(46) $H_x=\overline{\{(a^{ln}x,a^{mn}x,a^{kn^2+rn}x)\}}_{n\in\mathbb N}$
is connected for a.e. $x\in X$. By Theorem 2.5 we have (47) … for a.e. $x\in X$. Since the factor $A_2$ is characteristic for convergence of the averages in (47), we can replace every function by its projection to $A_2$, which by Proposition 2.4 is … This shows that the set $H_x$ factors through $Z^3$. Furthermore, we know that $T_a$ acting on $Z$ is topologically conjugate to a 2-step nilpotent affine transformation on some finite dimensional torus $\mathbb T^d$. As we mentioned before, we can compute the limit explicitly in this case and derive that for a.e. $x\in X$ the projection $\pi(H_x)$ of $H_x$ onto $Z^3$ is connected. It follows that the set $H_x$ is a product of the connected set $\pi(H_x)$ and the connected nilmanifold … Hence, for a.e. $x\in X$ the set $H_x$ is connected. General case: To deal with the general case notice that all the previous results carry through once we show that for totally ergodic systems the $L^2$-limit of the averages in (43) remains the same if we replace $n$ with any nonconstant polynomial $p(n)$. Using Theorem 1.2 and an approximation argument, it suffices to verify that this is the case for totally ergodic nilsystems. By Theorem 2.5 we can further reduce this to showing that if $(X=G/\Gamma,\mathcal G/\Gamma,m,T_a)$ is a nilsystem with $X$ connected, then for almost every $x\in X$ the sequences $\{(a^{ln}x,a^{mn}x,a^{kn^2+rn}x)\}_{n\in\mathbb N}$ and $\{(a^{lp(n)}x,a^{mp(n)}x,a^{kp(n)^2+rp(n)}x)\}_{n\in\mathbb N}$ have the same closure. We previously showed that for a.e. $x\in X$ the closure of the first sequence is connected. Hence, Proposition 2.7 applies and proves the claim.
It is easy to see that for polynomial families of the form (a) with $k\neq0$ every characteristic factor (thought of as a subalgebra of functions) for the averages (P) contains all the functions in $E_2$ (defined in Section 2.1), and as a result it contains the factor $\mathcal A_2$. Hence, for ergodic systems the factor $\mathcal A_2$ is the smallest characteristic factor.
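Case 3: The family of essentially distinct polynomials has the form (b) with $k\neq0$. Characteristic factor: It suffices to show that if $f_i\in\mathcal A_2^{\perp}$ for some $i=1,2,3$, then the averages
$$\frac{1}{N-M}\sum_{n=M}^{N-1}T^{kp(n)^2+lp(n)}f_1\cdot T^{kp(n)^2+mp(n)}f_2\cdot T^{kp(n)^2+rp(n)}f_3 \qquad (48)$$
converge to $0$ in $L^2(\mu)$ as $N-M\to\infty$. We show this for $i=1$; the argument is similar for $i=2,3$. This time, applying van der Corput's lemma does not help. Instead, we notice that since the limit in $L^2(\mu)$ of the averages (48) exists as $N-M\to\infty$, it suffices to show that if $f_1\in\mathcal A_2^{\perp}$, then for every $f_0\in L^\infty(\mu)$ we have
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\int f_0\cdot T^{kp(n)^2+lp(n)}f_1\cdot T^{kp(n)^2+mp(n)}f_2\cdot T^{kp(n)^2+rp(n)}f_3\,d\mu=0.$$
Composing with $T^{-(kp(n)^2+lp(n))}$, we equivalently need to show that
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\int f_1\cdot T^{-(kp(n)^2+lp(n))}f_0\cdot T^{(m-l)p(n)}f_2\cdot T^{(r-l)p(n)}f_3\,d\mu=0$$
for every $f_0\in L^\infty(\mu)$, which is true by Case 2.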
An argument analogous to the one explained in Case 2 shows that for ergodic systems the factor $\mathcal A_2$ is the smallest characteristic factor for the averages (48). A limit formula can also be obtained along the lines of Case 2.

Applications in combinatorics
In this section we derive several combinatorial implications of our results in ergodic theory. Our starting point will always be Furstenberg's Correspondence Principle, which enables us to translate statements in combinatorics into statements in ergodic theory. We use a slight modification of this principle due to Lesigne (see [6]) that allows us to work with ergodic systems (this is crucial for Theorem C'):

Furstenberg's Correspondence Principle. For every $\Lambda\subset\mathbb N$ there exist an invertible ergodic system $(X,\mathcal X,\mu,T)$ and a set $A\in\mathcal X$ with $\mu(A)=d^*(\Lambda)$ such that
$$d^*\big(\Lambda\cap(\Lambda+n_1)\cap\cdots\cap(\Lambda+n_k)\big)\geq\mu\big(A\cap T^{n_1}A\cap\cdots\cap T^{n_k}A\big)$$
for all $k\in\mathbb N$ and integers $n_1,\ldots,n_k$.
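The upper Banach density $d^*(\Lambda)$ appearing in the Correspondence Principle is defined through a limit over longer and longer intervals, so on a computer it can only be approximated on a finite range. The Python sketch below is our own illustration, not part of the argument: it returns the largest proportion of elements of $\Lambda$ in a sliding window of fixed length. The window length, the range, and the example set `sparse_blocks` (blocks $[2^k,2^k+k^2]$ of consecutive integers) are hypothetical choices. The example shows why $d^*$ rather than natural density is the relevant quantity: such a set has $d^*=1$, while its natural density tends to $0$.

```python
from typing import Callable


def banach_density_estimate(in_set: Callable[[int], bool], start: int, stop: int, window: int) -> float:
    """Largest proportion of set elements in a window [a, a + window), start <= a <= stop - window.

    Finite-range proxy for d*(Lambda) = lim_{L -> oo} sup_a |Lambda ∩ [a, a + L)| / L.
    """
    if window <= 0 or stop - start < window:
        raise ValueError("need 0 < window <= stop - start")
    flags = [1 if in_set(n) else 0 for n in range(start, stop)]
    count = sum(flags[:window])                  # members of the first window
    best = count
    for i in range(window, len(flags)):
        count += flags[i] - flags[i - window]    # slide the window one step to the right
        best = max(best, count)
    return best / window


def sparse_blocks(n: int) -> bool:
    """Hypothetical example set: n lies in [2**k, 2**k + k*k] with k = floor(log2(n))."""
    if n < 1:
        return False
    k = n.bit_length() - 1
    return n - (1 << k) <= k * k


N = 1 << 19
d_mult3 = banach_density_estimate(lambda n: n % 3 == 0, 0, 3000, 300)   # multiples of 3
d_blocks = banach_density_estimate(sparse_blocks, 0, N, 300)            # long blocks of consecutive integers
natural = sum(sparse_blocks(n) for n in range(N)) / N                   # proportion of [0, N) in the set
print(d_mult3, d_blocks, natural)
```

For the multiples of 3, every window whose length is divisible by 3 gives exactly $1/3$; for `sparse_blocks` the windowed estimate is $1$, although less than one percent of $[0,2^{19})$ belongs to the set.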

5.1. Sets of multiple recurrence. We will prove Theorem D.
Proof of Theorem D. Suppose that $p(n)$ is an integer polynomial that satisfies the assumptions of the theorem. Using Furstenberg's Correspondence Principle, it suffices to show that if $f\in L^\infty(\mu)$ is nonnegative and not a.e. zero, then
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\int f\cdot T^{p(n)}f\cdot T^{2p(n)}f\cdots T^{kp(n)}f\,d\mu>0. \qquad (49)$$
Using an ergodic decomposition argument we can assume that the system is ergodic, and by Theorem 1.2 we can reduce the problem to showing (49) in the case where the system is an inverse limit of nilsystems. Moreover, an argument completely analogous to that of Lemma 3.2 in [17] shows that the positivity property (49) is preserved under inverse limits. Hence, we can further assume that the system is an ergodic nilsystem. In this case, by Proposition 4.1 there exists an $r\in\mathbb N$ such that the ergodic components of $T^r$ are totally ergodic. By our assumption there exists an $n_0\in\mathbb N$ such that $p(n_0)\equiv0\pmod r$.
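Then $p(rn+n_0)=rq(n)$ for some integer polynomial $q$, and since $f\geq0$, the limit in (49) is greater than or equal to $1/r$ times
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\int f\cdot(T^r)^{q(n)}f\cdot(T^r)^{2q(n)}f\cdots(T^r)^{kq(n)}f\,d\mu.$$
Using Theorem A for the ergodic components of $T^r$, we get that this last limit equals
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\int f\cdot(T^r)^{n}f\cdot(T^r)^{2n}f\cdots(T^r)^{kn}f\,d\mu,$$
which is positive by [15].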
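The step above uses the hypothesis on $p$ only in the form: for the modulus $r$ provided by Proposition 4.1, some $n_0$ satisfies $p(n_0)\equiv0\pmod r$; expanding $(rn+n_0)^j$ with the binomial theorem then gives $p(rn+n_0)=rq(n)$ with $q$ an integer polynomial. The Python sketch below makes both steps concrete for polynomials given by integer coefficient lists (constant term first). The test polynomial $(n^2-13)(n^2-17)(n^2-221)$ is a standard example, chosen here only for illustration, that has a root modulo every integer but no integer root; brute force over finitely many moduli illustrates the hypothesis rather than proving it.

```python
from math import comb
from typing import List, Optional


def poly_mul(a: List[int], b: List[int]) -> List[int]:
    """Product of integer polynomials given by coefficient lists (constant term first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def eval_poly(coeffs: List[int], n: int) -> int:
    """Value of sum_j coeffs[j] * n**j."""
    return sum(c * n**j for j, c in enumerate(coeffs))


def root_mod(coeffs: List[int], r: int) -> Optional[int]:
    """Some n0 in [0, r) with p(n0) ≡ 0 (mod r), or None if there is none."""
    for n0 in range(r):
        if eval_poly(coeffs, n0) % r == 0:
            return n0
    return None


def rescale(coeffs: List[int], r: int, n0: int) -> List[int]:
    """Coefficients of q with p(r*n + n0) = r * q(n); requires p(n0) ≡ 0 (mod r)."""
    shifted = [0] * len(coeffs)
    for j, c in enumerate(coeffs):
        # (r*n + n0)**j = sum_i C(j, i) * r**i * n0**(j - i) * n**i
        for i in range(j + 1):
            shifted[i] += c * comb(j, i) * r**i * n0**(j - i)
    # shifted[0] = p(n0); every coefficient with i >= 1 carries a factor r**i.
    if any(a % r for a in shifted):
        raise ValueError("p(n0) is not divisible by r")
    return [a // r for a in shifted]


# Illustrative polynomial (our choice): (n^2 - 13)(n^2 - 17)(n^2 - 221).
P = poly_mul(poly_mul([-13, 0, 1], [-17, 0, 1]), [-221, 0, 1])
n0 = root_mod(P, 12)
q = rescale(P, 12, n0)
print(P, n0, q)
```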

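5.2. A bad set for recurrence with good powers. We will prove Theorem E. It will be a consequence of the Polynomial Szemerédi Theorem and the following multiple ergodic theorem:

Proposition 5.1. Let $(X,\mathcal X,\mu,T)$ be an invertible system, $h\colon\mathbb T\to\mathbb C$ be Riemann integrable, $f_0,\ldots,f_k\in L^\infty(\mu)$, and $\beta$ be an irrational number. Then for every integer polynomial $p$ with $\deg p>1$ we have
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}h(n\beta)\int f_0\cdot T^{p(n)}f_1\cdots T^{kp(n)}f_k\,d\mu=\int_{\mathbb T}h\,d\lambda\cdot\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\int f_0\cdot T^{p(n)}f_1\cdots T^{kp(n)}f_k\,d\mu. \qquad (50)$$

Proof. Using an ergodic decomposition argument, it suffices to check (50) when the system is ergodic. In [19] it is shown that for every […], where $t\in[0,1]$ indexes the ergodic components of $T\times T$. Keeping this in mind, and applying Theorem 1.2 to the ergodic components of the product system $(X\times X,\mathcal X\times\mathcal X,\mu\times\mu,T\times T)$, we get that there exists an $m\in\mathbb N$ such that if $E(f_i|Z_m)=0$ for some $i=0,\ldots,k$, then the limits on both sides of (50) vanish. Hence (50) holds trivially in this case. We can therefore assume that $f_i$ is $Z_m$-measurable for $i=0,\ldots,k$. Since the factor $Z_m(T)$ is an inverse limit of nilsystems, a standard approximation argument shows that it suffices to check (50) when the system is an ergodic nilsystem, say $(X=G/\Gamma,\mathcal G/\Gamma,m,T_a)$. In this case, equation (50) follows if we show that for $f_1,\ldots,f_k\in L^\infty(\mu)$ and a.e. $x\in X$ we have
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}h(n\beta)\,f_1(a^{p(n)}x)\cdots f_k(a^{kp(n)}x)=\int_{\mathbb T}h\,d\lambda\cdot\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}f_1(a^{p(n)}x)\cdots f_k(a^{kp(n)}x).$$
By Theorem 2.5, it suffices to show that for a.e. $x\in X$ the closure of the sequence $\{(n\beta,a^{p(n)}x,\ldots,a^{kp(n)}x)\}_{n\in\mathbb N}$ is the product of $\mathbb T$ with the closure of $\{(a^{p(n)}x,\ldots,a^{kp(n)}x)\}_{n\in\mathbb N}$. Since $\deg p>1$, this follows from Lemma 2.8.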
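Proof of Theorem E. We will show that the set $S=\{n\in\mathbb N\colon\{n\sqrt2\}\in[1/4,3/4]\}$ has the advertised property. Clearly $S$ is not good for single recurrence, since it is not good for recurrence for the rotation by $\sqrt2$ on $\mathbb T$. We will show that $p(S)$ is a set of multiple recurrence whenever $p$ is an integer polynomial with $\deg p>1$. So let $(X,\mathcal X,\mu,T)$ be an invertible system and $A\in\mathcal X$ with $\mu(A)>0$. We apply Proposition 5.1 with $f_i=1_A$ for $i=0,1,\ldots,k$, $h=1_{[1/4,3/4]}$, and $\beta=\sqrt2$, and get
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}1_S(n)\,\mu\big(A\cap T^{-p(n)}A\cap\cdots\cap T^{-kp(n)}A\big)=\frac12\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\mu\big(A\cap T^{-p(n)}A\cap\cdots\cap T^{-kp(n)}A\big).$$
The last limit is positive by Theorem 1.1, showing that $p(S)$ is a set of multiple recurrence.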
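The claim that $S$ is not good for single recurrence can be seen concretely: for the rotation $x\mapsto x+\sqrt2$ on $\mathbb T$ and the arc $A=[0,1/4)$, the arc $A-n\sqrt2$ is disjoint from $A$ whenever $\{n\sqrt2\}\in[1/4,3/4]$. The Python sketch below checks this for the first few thousand $n\in S$, and also that $S$ has density close to $1/2$, matching $\int_{\mathbb T}h\,d\lambda=1/2$ for $h=1_{[1/4,3/4]}$. The choice $A=[0,1/4)$ and the use of floating-point fractional parts are our own assumptions for this illustration.

```python
import math

SQRT2 = math.sqrt(2.0)


def in_S(n: int) -> bool:
    """Membership in S = {n : {n*sqrt(2)} in [1/4, 3/4]}, using a floating-point fractional part."""
    frac = (n * SQRT2) % 1.0
    return 0.25 <= frac <= 0.75


def arc_self_overlap(length: float, shift: float) -> float:
    """Lebesgue measure of A ∩ (A - shift) on T = R/Z for the arc A = [0, length), length <= 1/2."""
    d = shift % 1.0
    d = min(d, 1.0 - d)           # circular distance of the shift from 0
    return max(0.0, length - d)   # with length <= 1/2 only one overlap region can be nonempty


N = 100_000
density = sum(in_S(n) for n in range(1, N + 1)) / N
# mu(A ∩ R^{-n} A) for A = [0, 1/4) and the rotation R by sqrt(2), maximized over n in S.
worst = max(arc_self_overlap(0.25, n * SQRT2) for n in range(1, 5_000) if in_S(n))
print(f"density of S on [1, N]: {density:.4f}; largest overlap over n in S: {worst}")
```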

5.3. Universal families of three polynomials. We prove Theorem F.
Proof of Theorem F. We can assume that the polynomials $p_1,p_2,p_3$ are essentially distinct. We claim that under the assumptions of the theorem, if $f\in L^\infty(\mu)$ is nonnegative and not a.e. zero, then
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\int f\cdot T^{p_1(n)}f\cdot T^{p_2(n)}f\cdot T^{p_3(n)}f\,d\mu>0. \qquad (54)$$
An argument analogous to the one used at the beginning of the proof of Theorem D allows us to reduce the problem to showing (54) in the case where the system is an ergodic nilsystem, say $(X=G/\Gamma,\mathcal G/\Gamma,m,T_a)$.

Weyl complexity 1. If the polynomials $p_1-p_1(0)$, $p_2-p_2(0)$, $p_3-p_3(0)$ are linearly independent, then by Theorem B the factor $K_{\mathrm{rat}}$ is characteristic for the averages in (54), hence we can assume that $X=K_{\mathrm{rat}}$. Since our system is a nilsystem, we have $K_{\mathrm{rat}}=K_r$ for some $r\in\mathbb N$. By our assumption there exists $n_0\in\mathbb N$ such that $p_i(n_0)\equiv0\pmod r$ for $i=1,2,3$. Then $p_i(rn+n_0)=rp_i'(n)$ for some integer polynomials $p_i'$, $i=1,2,3$. Hence, whenever $n\equiv n_0\pmod r$ we have $T^{p_i(n)}=\mathrm{id}$ for $i=1,2,3$, and so for these $n$ the integral in (54) is equal to $\int f^4\,d\mu>0$. Since $f\geq0$ and these $n$ form a set of density $1/r$, the limit in (54) is at least $\frac1r\int f^4\,d\mu>0$. The result follows.
Weyl complexity 2. We start with some reductions on the polynomial family. We have $(p_1,p_2,p_3)=(k_1q_1+c_1,\ k_2q_2+c_2,\ l_1q_1+l_2q_2+c_3)$ for some linearly independent integer polynomials $q_1,q_2$ and $k_1,k_2,l_1,l_2,c_1,c_2,c_3\in\mathbb Z$. Since $p_i(n)\equiv c_i\pmod{k_i}$ for every $n$, and the congruence $p_i(n)\equiv0\pmod{k_i}$ has a solution for $i=1,2$, we get that $k_i\mid c_i$ for $i=1,2$. Replacing $q_i$ by $q_i+c_i/k_i$ for $i=1,2$ (and changing $c_3$ accordingly), we are reduced to the case where the polynomial family has the form $(k_1q_1,\ k_2q_2,\ l_1q_1+l_2q_2+c_3)$. If $c_3\neq0$, we can choose an integer $r>1$ that is relatively prime to $k_1$, $k_2$, $c_3$. Then the system of congruences $p_i(n)\equiv0\pmod r$, $i=1,2,3$, has no solution (the first two force $q_1(n)\equiv q_2(n)\equiv0$, and then $p_3(n)\equiv c_3\not\equiv0\pmod r$), contrary to our assumption. Hence $c_3=0$. So we can assume that $(p_1,p_2,p_3)=(k_1q_1,\ k_2q_2,\ l_1q_1+l_2q_2)$ for some linearly independent integer polynomials $q_1,q_2$ and $k_1,k_2,l_1,l_2\in\mathbb Z$.
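Both the Weyl complexity 1 step and the reduction to $c_3=0$ come down to the existence or non-existence of a common root of $p_1,p_2,p_3$ modulo $r$, together with the periodicity $p(n+rt)\equiv p(n)\pmod r$, which is why $T^{p_i(n)}$ acts trivially on $K_r$ along $n\equiv n_0\pmod r$. The Python sketch below searches for a common root by brute force. The family with $q_1(n)=n$, $q_2(n)=n^2$, $k_1=2$, $k_2=3$, $l_1=l_2=1$ is a hypothetical example: with $c_3=0$ the root $n_0=0$ works for every $r$, while with $c_3=1$ there is no common root modulo primes $r$ not dividing $k_1k_2c_3$. The last check illustrates the divisibility step: $4n+2$ has no root modulo $4$ because $4\nmid 2$.

```python
from typing import Callable, List, Optional, Sequence

Poly = Callable[[int], int]


def common_root_mod(polys: Sequence[Poly], r: int) -> Optional[int]:
    """Smallest n0 in [0, r) with p(n0) ≡ 0 (mod r) for every p in polys, or None."""
    for n0 in range(r):
        if all(p(n0) % r == 0 for p in polys):
            return n0
    return None


def family(c3: int) -> List[Poly]:
    """Hypothetical family (k1*q1, k2*q2, l1*q1 + l2*q2 + c3) with q1 = n, q2 = n**2, k1 = 2, k2 = 3, l1 = l2 = 1."""
    return [lambda n: 2 * n, lambda n: 3 * n**2, lambda n: n + n**2 + c3]


print(common_root_mod(family(0), 12))        # a common root exists (n0 = 0)
print(common_root_mod(family(1), 7))         # None: p1, p2 force n ≡ 0, then p3 ≡ 1 (mod 7)
print(common_root_mod([lambda n: 4 * n + 2], 4))
```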
By Proposition 2.1 there exists an $r\in\mathbb N$ such that the ergodic components of $T_a^r$ are totally ergodic. By our assumption there exists $n_0\in\mathbb N$ such that $q_i(n_0)\equiv0\pmod r$ for $i=1,2$. Then $q_i(rn+n_0)=rq_i'(n)$ for some linearly independent integer polynomials $q_1',q_2'$, and the average in (54) is greater than or equal to $1/r$ times
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\int f\cdot T_a^{rk_1q_1'(n)}f\cdot T_a^{rk_2q_2'(n)}f\cdot T_a^{r(l_1q_1'(n)+l_2q_2'(n))}f\,d\mu. \qquad (55)$$
Working with the (totally ergodic) ergodic components of $T_a^r$ and the polynomial family $(k_1q_1',\ k_2q_2',\ l_1q_1'+l_2q_2')$, which also has Weyl complexity 2, we get from Lemma 4.3 an explicit expression for the limit in (55), which is easily shown to be positive.

Weyl complexity 3. The argument is similar to the one used in the previous case, so we just sketch the main steps. By Proposition 3.7 some permutation of the polynomials $p_1,p_2,p_3$ has either the form (a) $(lp+c_1,\ mp+c_2,\ kp^2+rp+c_3)$, or the form (b) $(kp^2+lp+c_1,\ kp^2+mp+c_2,\ kp^2+rp+c_3)$, for some integer polynomial $p$ and some $k,l,m,r,c_1,c_2,c_3\in\mathbb Z$. Arguing as in the Weyl complexity 2 case, we can assume that $c_i=0$ for $i=1,2,3$ and that the system is totally ergodic. We consider the following three cases. In case (a) with $k=0$, we get from Theorem A that the limit in (54) is equal to
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\int f\cdot T^{ln}f\cdot T^{mn}f\cdot T^{rn}f\,d\mu,$$
which is positive by [15].
In case (a) with $k\neq0$, we showed in the proof of part (ii) of Theorem B that the limit in (54) is equal to the corresponding limit with $n$ in place of $p(n)$, which is positive by Theorem 1.1.
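To deal with case (b) with $k\neq0$, we use the identity (with $p=p(n)$)
$$\int f\cdot T^{kp^2+lp}f\cdot T^{kp^2+mp}f\cdot T^{kp^2+rp}f\,d\mu=\int T^{-(kp^2+lp)}f\cdot f\cdot T^{(m-l)p}f\cdot T^{(r-l)p}f\,d\mu,$$
which allows us to reduce case (b) with $k\neq0$ to case (a) with $k\neq0$, handled above. This completes the proof.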
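The reduction of case (b) to case (a) rests only on the invariance of $\mu$ under $T$: for all integers $a_1,a_2,a_3$,
$\int f_0\cdot T^{a_1}f_1\cdot T^{a_2}f_2\cdot T^{a_3}f_3\,d\mu=\int T^{-a_1}f_0\cdot f_1\cdot T^{a_2-a_1}f_2\cdot T^{a_3-a_1}f_3\,d\mu$, and for the exponents of family (b) the differences are $(m-l)p$ and $(r-l)p$. As a toy sanity check (not part of the proof), the Python snippet below verifies this for the rotation $x\mapsto x+1$ on $\mathbb Z/N\mathbb Z$ with the uniform measure, random functions $f_0,\dots,f_3$, and hypothetical values of $k,l,m,r$ and of the integer $p=p(n)$.

```python
import random
from typing import List


def koopman(f: List[float], a: int) -> List[float]:
    """(T^a f)(x) = f(x + a) for the rotation x -> x + 1 on Z/NZ."""
    N = len(f)
    return [f[(x + a) % N] for x in range(N)]


def integral(*fs: List[float]) -> float:
    """Integral of the pointwise product against the uniform probability measure on Z/NZ."""
    N = len(fs[0])
    total = 0.0
    for x in range(N):
        prod = 1.0
        for f in fs:
            prod *= f[x]
        total += prod
    return total / N


random.seed(0)
N = 101
f0, f1, f2, f3 = ([random.random() for _ in range(N)] for _ in range(4))
k, l, m, r, p = 2, 1, 3, 5, 7                    # hypothetical integers; p stands for the value p(n)
a1, a2, a3 = k * p * p + l * p, k * p * p + m * p, k * p * p + r * p

lhs = integral(f0, koopman(f1, a1), koopman(f2, a2), koopman(f3, a3))
rhs = integral(koopman(f0, -a1), f1, koopman(f2, (m - l) * p), koopman(f3, (r - l) * p))
print(lhs, rhs)
```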

5.4. Positive results for lower bounds. Theorem C' is an immediate consequence of Theorem C and Furstenberg's Correspondence Principle, so it remains to prove Theorem C.
Proof of Theorem C. The proof for the case of two polynomials goes along the lines of the case of three polynomials with Weyl complexity at most 2, so we omit it. So let $\{p_1,p_2,p_3\}$ be a family of essentially distinct integer polynomials that is not equal to any of the exceptional forms mentioned in Theorem C. Then by Proposition 3.7 the polynomial family either has Weyl complexity at most 2, or some permutation of the polynomials has the form $\{kp,\ lp,\ (k+l)p\}$ for some integer polynomial $p$ with $p(0)=0$ and $k,l\in\mathbb Z$. So we have to deal with the following two cases.

Case 1. Suppose that $W(p_1,p_2,p_3)\le2$. If $W(p_1,p_2,p_3)=1$, the polynomials are linearly independent and the result follows from [14]. If $W(p_1,p_2,p_3)=2$, we can assume that $(p_1,p_2,p_3)=(k_1q_1,\ k_2q_2,\ l_1q_1+l_2q_2)$, where $q_1,q_2$ are linearly independent integer polynomials and $k_1,k_2,l_1,l_2\in\mathbb Z$.
Suppose first that the system is totally ergodic. We can assume that its Kronecker factor has the form $(G,\mathcal G,m,R_b)$, where $G$ is a connected compact abelian group, $\mathcal G$ is the Borel σ-algebra, $m$ is the Haar measure, and $R_b(g)=g+b$ for some $b\in G$. Let $V\subset G\times G$ be an open set with Riemann integrable indicator function, and let $S=\{n\in\mathbb N\colon(q_1(n)b,q_2(n)b)\in V\}$. We claim that if $f_1,f_2,f_3\in L^\infty(\mu)$ are such that $E(f_i|K)=0$ for some $i=1,2,3$, then
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}1_S(n)\cdot T^{k_1q_1(n)}f_1\cdot T^{k_2q_2(n)}f_2\cdot T^{l_1q_1(n)+l_2q_2(n)}f_3=0, \qquad (56)$$
where the limit is taken in $L^2(\mu)$. We verify this as follows. First notice that since $1_S(n)=1_V(q_1(n)b,q_2(n)b)$ and the function $1_V$ is Riemann integrable, using an approximation argument it suffices to show that (56) holds with $\chi_1(q_1(n)b)\cdot\chi_2(q_2(n)b)$ in place of $1_S(n)$, where $\chi_1,\chi_2$ are any two characters of $G$. To see this, consider the transformation $T'=T\times R_{b/k_1}\times R_{b/k_2}$ acting on $X\times G\times G$ with the measure $\mu\times m\times m$, where by $b/k$ we denote a solution of the equation $kx=b$ (since $G$ is connected, such a solution always exists). Let $T'_t$, $t\in[0,1]$, be the ergodic components of $T'$. It is well known ([15]) that if $E(f_1|K(T))=0$, then $E(f_1(x)\cdot\chi_1(y)\,|\,K(T'_t))=0$ for a.e. $t$. Applying part (iii) of Theorem B to the ergodic components of $T'$ in place of $T$, and to the functions $f_1(x)\cdot\chi_1(y)$, $f_2(x)\cdot\chi_2(z)$, $f_3(x)$ in place of $f_1,f_2,f_3$, we get the advertised identity.
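The passage from $1_S(n)=1_V(q_1(n)b,q_2(n)b)$ to characters is driven by equidistribution. In the simplest toy case $G=\mathbb T$, $b=\sqrt2$, $q_1(n)=n$, $q_2(n)=n^2$ (all hypothetical choices), Weyl's equidistribution theorem says that $(nb,n^2b)$ is equidistributed in $\mathbb T^2$, so the frequency of $S$ equals the area of $V$. The Python snippet below checks this numerically for a rectangle $V$; it only illustrates the mechanism and is not a substitute for the argument.

```python
import math
from typing import Tuple

Box = Tuple[Tuple[float, float], Tuple[float, float]]


def box_frequency(b: float, N: int, box: Box) -> float:
    """Fraction of n in [1, N] with ({n b}, {n^2 b}) in the rectangle [x0, x1) x [y0, y1)."""
    (x0, x1), (y0, y1) = box
    hits = 0
    for n in range(1, N + 1):
        u = (n * b) % 1.0
        v = (n * n * b) % 1.0
        if x0 <= u < x1 and y0 <= v < y1:
            hits += 1
    return hits / N


V = ((0.0, 0.2), (0.0, 0.3))                     # hypothetical rectangle in T^2
freq = box_frequency(math.sqrt(2.0), 100_000, V)
area = (V[0][1] - V[0][0]) * (V[1][1] - V[1][0])
print(f"frequency {freq:.4f} vs area {area:.4f}")
```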
So we are left with estimating (59), for some appropriately chosen δ. First notice that if F : G × G → C is continuous then So if δ is small enough, and f i = f = 1 A , for i = 0, 1, 2, 3, the quantity in (59) is greater than Summarizing, we have shown that if W (p 1 , p 2 , p 3 ) ≤ 2 and the system (X, B, µ, T ) is totally ergodic, then for every ε > 0, if δ is small enough we have This completes the proof of Case 1 for totally ergodic systems.
In the general case, since the Kronecker factor $K$ is an inverse limit of systems with finite rational Kronecker factor $K_{\mathrm{rat}}$, we can choose $r\in\mathbb N$ and a factor $K'$ of $K$ such that $K'\cap K_{\mathrm{rat}}=K_r$ and such that, up to an error term $\varepsilon$, equation (58) remains valid after replacing $\tilde f_i$ with $E(f_i|K')$ for $i=0,1,2,3$. The system $(K',T)$ is isomorphic to an ergodic rotation on $H\times\mathbb Z_r$, where $H$ is a connected compact abelian group. We write $q_i(rn)=rq_i'(n)$, $i=1,2$, for some integer polynomials $q_1',q_2'$, and work with $T^r$ in place of $T$ and $q_i'(n)$ in place of $q_i(n)$, $i=1,2$. Arguing as in the totally ergodic case, we get the desired lower bound, completing the proof of Case 1.
Case 2. Suppose that some permutation of the essentially distinct polynomials has the form {lp, mp, (l + m)p} for some integer polynomial p with p(0) = 0 and l, m ∈ Z. Our tactic will be similar to the one used in the previous case but extra complications arise because the relevant characteristic factor in this case is not "abelian".
Suppose first that the system is totally ergodic. Using an approximation argument, we can assume that the factor $Z_2$ is an ergodic 2-step nilsystem, say $(X=G/\Gamma,\mathcal G/\Gamma,m,T_a)$. By Proposition 4.1, $X$ is connected. Since $G$ is 2-step nilpotent, its commutator subgroup $G_2$ is contained in the center of $G$, so the subgroup $\Gamma_2=G_2\cap\Gamma$ is normal in $G$. So $G/\Gamma_2$ is a group and $X=(G/\Gamma_2)/(\Gamma/\Gamma_2)$. Using this representation of $X$, we can assume that $\Gamma_2=\{e\}$, and so $G_2$ is a compact abelian Lie group. Since $G_2$ is connected, we can further assume that it is a finite dimensional torus with the Haar measure $\lambda$. Likewise, $Z=X/[G,G]$ is a connected compact abelian group, and so we can assume that it is a finite dimensional torus with the Haar measure $\lambda'$.
If $\pi\colon X\to Z$ is the natural projection and $V$ is an open subset of $Z$, let $S=\{n\in\mathbb N\colon p(n)a_0\in V\}$, where $a_0=\pi(a\Gamma)$ (we use additive notation on $Z$). We first claim that if $f_1,f_2,f_3\in L^\infty(\mu)$ are such that $E(f_i|Z_2)=0$ for some $i=1,2,3$, and $l,m,r$ are distinct nonzero integers, then
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}1_S(n)\cdot T^{lp(n)}f_1\cdot T^{mp(n)}f_2\cdot T^{rp(n)}f_3=0, \qquad (61)$$
where the limit is taken in $L^2(\mu)$. We verify this as follows. We can assume that the integers $l,m,r$ are relatively prime (if not, we write $l=l'd$, $m=m'd$, $r=r'd$, where $d=\gcd(l,m,r)$, and work with the polynomial family $\{l'p',\ m'p',\ r'p'\}$, where $p'=dp$). Hence, there exist $l_1,m_1,r_1\in\mathbb Z$ such that $ll_1+mm_1+rr_1=1$. Since $1_S(n)=1_V(p(n)a_0)$ and the function $1_V$ is Riemann integrable, using an approximation argument it suffices to verify (61) with $\chi(p(n)a_0)$ in place of $1_S(n)$, where $\chi$ is any character of $Z$ (in our notation, $\chi(a^{p(n)}\Gamma)=\chi(p(n)a_0)$). This last statement follows immediately by applying Theorem A to the functions $f_1\cdot\chi(l_1g)$, $f_2\cdot\chi(m_1g)$, $f_3\cdot\chi(r_1g)$ in place of $f_1,f_2,f_3$.
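The existence of $l_1,m_1,r_1$ with $ll_1+mm_1+rr_1=1$ is Bézout's identity for three integers. A standard way to compute such coefficients (our sketch, not the paper's method) is to run the extended Euclidean algorithm twice: first on $(l,m)$, then on $(\gcd(l,m),r)$. The function also returns $d=\gcd(l,m,r)$, the factor removed when passing to relatively prime $l,m,r$.

```python
from typing import Tuple


def ext_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) >= 0 and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:                                  # normalize the sign of the gcd
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def bezout3(l: int, m: int, r: int) -> Tuple[int, int, int, int]:
    """Return (d, l1, m1, r1) with d = gcd(l, m, r) and l*l1 + m*m1 + r*r1 == d."""
    g, x, y = ext_gcd(l, m)                        # l*x + m*y == g
    d, u, v = ext_gcd(g, r)                        # g*u + r*v == d
    return d, x * u, y * u, v                      # l*(x*u) + m*(y*u) + r*v == g*u + r*v == d


print(bezout3(2, 3, 4))                            # gcd 1, so l1, m1, r1 solve l*l1 + m*m1 + r*r1 = 1
```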
We will now apply (61) with the set $S_\delta=\{n\in\mathbb N\colon p(n)a_0\in B(0,\delta)\}$ in place of $S$, where $\delta>0$. First notice that since the sequence $(p(n)a_0)_{n\in\mathbb N}$ is uniformly distributed in $Z$, we have (62) for $i=0,1,2,3$. We claim that the second limit in (63) is equal to (64) (see footnote 12). This can be seen as follows. Since $X$ is connected, we can use the formula of Theorem 2.9 with $p(n)$ in place of $n$ (by Theorem A), with $\chi,\tilde f_1,\tilde f_2,\tilde f_3$ in place of $f_1,f_k,f_l,f_m$, and with $1$ in place of all other $f_i$, where $\chi$ is any character of $Z$. We get that identity (65) holds for a.e. $x=g\Gamma\in X$. Using an approximation argument, we can verify that (65) holds with $1_{B(G_2,\delta)}$ in place of $\chi$. If we multiply this last identity by $f_0(g\Gamma)$, then integrate with respect to $m(x)$ and use (62), we get that the limit in (63) is equal to (64), proving our claim.
So we are left with estimating (64) for $r=l+m$ and some well-chosen $\delta>0$. It suffices to show that when all functions are equal to $f=1_A$, the limit of (64) as $\delta\to0$ is greater than or equal to $\mu(A)^4$. Since $\pi^{-1}(0)=(G_2\Gamma)/\Gamma\simeq G_2$, it is not hard to see that this limit is equal to an integral of the form $\iiint\cdots\,d\lambda(g_2)\,d\lambda(g_1)\,dm(g\Gamma)$.

Footnote 12: It may not be immediately obvious, but the integral (64) is well defined. For more details see [34].
Since the elements of $G_2$ commute with all the elements of $G$, we can write the last integral as an integral (66) with respect to $d\lambda(g_1)\,d\lambda(g_2)\,dm(x)$.
An easy algebraic manipulation shows that the set $\{(g,\,gg_1^{l}\ldots)\}$ … . So the integral (67) can be rewritten accordingly. Using the Cauchy-Schwarz inequality and a change of variables, we see that the resulting integral is greater than or equal to a quantity that is itself at least $\mu(A)^4$. This completes the proof for totally ergodic systems.

In the general case, since every nilsystem is an inverse limit of nilsystems with finite rational Kronecker factor $K_{\mathrm{rat}}$, there exist $r_0\in\mathbb N$ and a factor $Y$ of our system such that $Y$ is a nilsystem, $Y\cap K_{\mathrm{rat}}=K_{r_0}$, and, up to an error term $\varepsilon$, equation (63) remains valid after replacing $\tilde f_i$ with $E(f_i|Y)$ for $i=0,1,2,3$. Moreover, the ergodic components of the system $(Y,T^{r_0})$ are totally ergodic. We write $p(r_0n)=r_0q(n)$ for some integer polynomial $q$ and work with $T^{r_0}$ in place of $T$ and $q(n)$ in place of $p(n)$. Arguing as in the totally ergodic case, we get the desired lower bound.

5.5. Conditional counterexamples for the exceptional cases. We explain why we expect the lower bounds of Theorems C and C' to fail for the exceptional polynomial families (e_1) with $l<m<r$ and $r\neq l+m$, (e_2), and (e_3). To avoid unnecessary complications, we work out the details for two typical cases; the general case can be treated in a similar fashion.
By our assumption, for every $\delta<\Gamma$ there exist sets $\Lambda_N\subset\{1,\ldots,N\}$ such that $|\Lambda_N|\gg N^\delta$ and $\Lambda_N$ contains no solution of (71) with distinct entries. Let $B$ be the union of the intervals $\big[\frac{j}{9N},\frac{j}{9N}+\frac{1}{81N}\big]$, $j\in\Lambda_N$. Because of the condition on $\Lambda_N$, it is easy to verify that if $x,y,z,w\in B$ satisfy (71), then at least two of $x,y,z,w$ belong to the same subinterval $I_N=\big[\frac{j_0}{9N},\frac{j_0}{9N}+\frac{1}{81N}\big]$ for some $j_0\in\Lambda_N$. Say, for example, that $x,y$ are these two elements. We get that
$$4nt\in\Big[-4n^2\alpha-\frac{1}{81N},\ -4n^2\alpha+\frac{1}{81N}\Big]\quad\text{(in }\mathbb T\text{)},$$
and so $t$ belongs to a set of measure at most $2/(81N)$. The other five cases give a similar condition, so $t$ belongs to a set $I_{n,N}$ of measure at most $12/(81N)<1/N$. This gives an upper bound for the integral in (70). Since $|\Lambda_N|\gg N^\delta$, for $c=\frac{2-\delta}{1-\delta}$ an easy computation shows that this bound is at most a constant multiple of $\mu(A)^c$. By choosing $N$ large enough we get the advertised estimate.

(ii) Let $(X,\mathcal X,\mu,T)$ be the system used in (i) and let $A=B\times B$, where the set $B$ will be chosen later. We find
$$\mu(A\cap T^nA\cap T^{2n}A\cap T^{n^2}A)\le\int 1_B(t)\cdot1_B(s)\cdot1_B(s+2nt+n^2\alpha)\cdot1_B(s+4nt+4n^2\alpha)\cdot1_B(t+n^2\alpha)\,d\lambda(s)\,d\lambda(t). \qquad (72)$$
Since the transformation $(t,s)\mapsto(t,s+2nt+n^2\alpha)$ acting on $\mathbb T^2$ is measure preserving, this leads to an upper bound for the integral in (72), and we find the same bound for the other 8 integrals. Combining all 9 integrals, we get that $\mu(A\cap T^nA\cap T^{2n}A\cap T^{n^2}A)\le|\Lambda_N|/N^2$. Since $|\Lambda_N|\gg N^\delta$, for $d=\frac12\cdot\frac{2-\delta}{1-\delta}$ an easy computation shows that $|\Lambda_N|/N^2$ is at most a constant multiple of $\mu(A)^d$. By choosing $N$ large enough we get the advertised estimate.
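The exponents $c$ and $d$ can be motivated by a heuristic count. Assume that $B$ is a union of $|\Lambda_N|$ intervals of length $1/(81N)$, so that $\lambda(B)\asymp|\Lambda_N|/N\asymp N^{\delta-1}$ when $|\Lambda_N|\asymp N^\delta$, and that both upper bounds are of order $|\Lambda_N|/N^2\asymp N^{\delta-2}$. In (i), with $\mu(A)\asymp\lambda(B)$, the bound is comparable to $\mu(A)^c$ exactly when $(\delta-1)c=\delta-2$; in (ii), with $\mu(A)=\lambda(B)^2$, one needs $2(\delta-1)d=\delta-2$. Since $c=1+\frac{1}{1-\delta}$, we get $c>4$ exactly when $\delta>2/3$ and $d>4$ exactly when $\delta>6/7$. The Python sketch below checks these identities with exact rational arithmetic; the heuristic assumptions are ours.

```python
from fractions import Fraction


def c_exponent(delta: Fraction) -> Fraction:
    """c = (2 - delta) / (1 - delta), the exponent in part (i); requires delta < 1."""
    return (2 - delta) / (1 - delta)


def d_exponent(delta: Fraction) -> Fraction:
    """d = c / 2, the exponent in part (ii)."""
    return c_exponent(delta) / 2


grid = [Fraction(i, 100) for i in range(100)]     # delta in [0, 1)
# Heuristic consistency: lambda(B)^c ~ N^{(delta-1)c} should match N^{delta-2}.
consistent = all((dl - 1) * c_exponent(dl) == dl - 2 for dl in grid)
# Thresholds at which the exponents exceed 4.
c_threshold = all((c_exponent(dl) > 4) == (dl > Fraction(2, 3)) for dl in grid)
d_threshold = all((d_exponent(dl) > 4) == (dl > Fraction(6, 7)) for dl in grid)
print(consistent, c_threshold, d_threshold, c_exponent(Fraction(2, 3)), d_exponent(Fraction(6, 7)))
```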
From the previous results we conclude that if the type of the equation $x+8z=6y+3w$ is greater than $2/3$, or the type of the equation $2x+y+w=2z+2v$ is greater than $6/7$, then the lower bounds of Theorems C and C' fail for the families $\{2n,3n,4n\}$ and $\{n,2n,n^2\}$, respectively. If the type of both equations is $1$, then they fail for any fixed power of $\mu(A)$ or $d^*(\Lambda)$.
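The hypothesis on $\Lambda_N$ concerns subsets of $\{1,\dots,N\}$ containing no solution in distinct entries of a linear equation such as $x+8z=6y+3w$. To make that condition concrete, here is a hypothetical greedy construction in Python: scan $1,\dots,N$ and keep each integer unless it completes a solution with distinct entries among the integers already kept. It produces one example for a small fixed $N$ and says nothing about the type of the equation, which concerns the growth rate of the largest such sets as $N\to\infty$.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# x + 8z = 6y + 3w, written as 1*x - 6*y + 8*z - 3*w = 0 for the tuple (x, y, z, w)
COEFFS: Tuple[int, int, int, int] = (1, -6, 8, -3)


def solves(t: Sequence[int]) -> bool:
    """True if the tuple (x, y, z, w) satisfies x + 8z = 6y + 3w."""
    return sum(c * v for c, v in zip(COEFFS, t)) == 0


def has_distinct_solution(elems: Sequence[int]) -> bool:
    """True if four distinct elements of elems, in some order, satisfy the equation."""
    return any(solves(t) for t in permutations(elems, 4))


def greedy_solution_free(N: int) -> List[int]:
    """Greedy subset of {1, ..., N} with no solution in distinct entries (hypothetical construction)."""
    chosen: List[int] = []
    for a in range(1, N + 1):
        # a is new, so only 4-tuples containing a can become new solutions.
        creates = any(
            solves(t[:i] + (a,) + t[i:])
            for t in permutations(chosen, 3)
            for i in range(4)
        )
        if not creates:
            chosen.append(a)
    return chosen


S = greedy_solution_free(30)
print(len(S), S)
```

For instance, $5$ is rejected because $(x,y,z,w)=(5,3,2,1)$ is a solution ($5+16=21=18+3$) and $1,2,3$ are kept first.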
All the other exceptional families of Theorems C and C' can be treated similarly. Polynomial families of the form (e_1) with $l<m<r$ and $r\neq l+m$ lead to equations of the form (69), and polynomial families of the form (e_2), (e_3) lead to equations in five variables. Unfortunately, none of these equations can be treated using the results in [21].