Weak hypergraph regularity and applications to geometric Ramsey theory

Let $\Delta=\Delta_1\times\cdots\times \Delta_d\subseteq\mathbb{R}^n$, where $\mathbb{R}^n=\mathbb{R}^{n_1}\times\cdots\times\mathbb{R}^{n_d}$ with each $\Delta_i\subseteq\mathbb{R}^{n_i}$ a non-degenerate simplex of $n_i$ points. We prove that any set $S\subseteq \mathbb{R}^n$ of positive upper Banach density, with $n=n_1+\cdots +n_d$, necessarily contains an isometric copy of all sufficiently large dilates of the configuration $\Delta$. In particular, any such set $S\subseteq \mathbb{R}^{2d}$ contains a $d$-dimensional cube of side length $\lambda$ for all $\lambda\geq \lambda_0(S)$. We also prove analogous results with the underlying space being the integer lattice. The proof is based on a weak hypergraph regularity lemma and an associated counting lemma developed in the context of Euclidean spaces and the integer lattice.

A result of Furstenberg, Katznelson, and Weiss [6] states that if $S\subseteq\mathbb{R}^2$ has positive upper Banach density, then its distance set $\{|x-x'| : x,x'\in S\}$ contains all sufficiently large numbers. Note that the distance set of any set of positive Lebesgue measure in $\mathbb{R}^n$ automatically contains all sufficiently small numbers (by the Lebesgue density theorem), and that it is easy to construct a set of positive upper density which does not contain a fixed distance by placing small balls centered on an appropriate square grid.
This result was later reproved using Fourier analytic techniques by Bourgain in [1], where he established the following more general result for all configurations of $n$ points in $\mathbb{R}^n$ whose affine span is $(n-1)$-dimensional, namely for all non-degenerate simplices.
Recall that a finite point configuration $\Delta'$ is said to be an isometric copy of $\lambda\Delta$ if there exists a bijection $\phi:\Delta\to\Delta'$ such that $|\phi(v)-\phi(w)|=\lambda|v-w|$ for all $v,w\in\Delta$, i.e. if $\Delta'$ is obtained from $\lambda\Delta$ (the dilation of $\Delta$ by a factor $\lambda$) via a rotation and translation.
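The definition can be checked directly on small point sets. The following Python sketch (the function name `is_isometric_copy` and the tolerance `tol` are our own illustrative choices) searches over all bijections $\phi$ and compares pairwise distances; it is exponential in the number of points and is only meant to make the definition concrete.

```python
import itertools
import math

def is_isometric_copy(delta, delta_prime, lam, tol=1e-9):
    """Return True if some bijection phi: delta -> delta_prime satisfies
    |phi(v) - phi(w)| = lam * |v - w| for all v, w in delta (up to tol)."""
    if len(delta) != len(delta_prime):
        return False
    pairs = list(itertools.combinations(range(len(delta)), 2))
    # Try every bijection phi(delta[i]) = image[i]; fine for a handful of points.
    for image in itertools.permutations(delta_prime):
        if all(abs(math.dist(image[i], image[j]) - lam * math.dist(delta[i], delta[j])) <= tol
               for i, j in pairs):
            return True
    return False

# A right triangle and its image under rotation by 90 degrees, dilation by 2
# and translation by (5, 5), listed in a shuffled order.
triangle = [(0, 0), (1, 0), (0, 1)]
image = [(3, 5), (5, 5), (5, 7)]
print(is_isometric_copy(triangle, image, 2.0))  # True
print(is_isometric_copy(triangle, image, 3.0))  # False
```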
Bourgain deduced Theorem B as an immediate consequence of the following stronger quantitative result for measurable subsets of the unit cube of positive measure. In the proposition below, and throughout this article, we shall refer to a decreasing sequence $\{\lambda_j\}_{j=1}^J$ as lacunary if $\lambda_{j+1}\le\lambda_j/2$ for all $1\le j<J$.
In [12] the authors provided a short direct proof of Theorem B without using Proposition B. It is based on the observation that uniformly distributed sets $S\subseteq\mathbb{R}^d$ contain the expected "number" of isometric copies of dilates $\lambda\Delta$ and that all sets of positive upper density become uniformly distributed at sufficiently large scales. However, for the purposes of this paper it will be important to recall Bourgain's indirect approach.
To see that Proposition B implies Theorem B, notice that if Theorem B were not to hold for some set $S\subseteq\mathbb{R}^n$ of upper Banach density $\delta^*(S)>\delta>0$, then there would be arbitrarily large $\lambda$ for which $S$ contains no isometric copy of $\lambda\Delta$, and hence there would exist a lacunary sequence $\lambda_1\ge\cdots\ge\lambda_J\ge1$, with $J$ the constant in Proposition B, such that $S$ does not contain an isometric copy of $\lambda_j\Delta$ for any $1\le j\le J$. Taking a sufficiently large cube $Q$ with side length $N\ge\lambda_1$ and $|S\cap Q|\ge\delta|Q|$ and rescaling $Q$ to $[0,1]^n$ contradicts Proposition B.
We further note that by taking $\lambda_j=2^{-j}$ in Proposition B we obtain the following "Falconer-type" result for subsets of $[0,1]^n$ of positive Lebesgue measure.
Corollary B. If $\Delta\subseteq\mathbb{R}^n$ is a non-degenerate simplex of $n$ points, then any $S\subseteq[0,1]^n$ with $|S|>0$ will necessarily contain an isometric copy of $\lambda\Delta$ for all $\lambda$ in some interval of length at least $\exp(-C_\Delta|S|^{-3n})$.
Bourgain further demonstrated in [1] that no result along the lines of Theorem B can hold for configurations that contain three points in arithmetic progression along a line, specifically showing that for any $n\ge1$ there are sets of positive upper Banach density in $\mathbb{R}^n$ which do not contain an isometric copy of configurations of the form $\{0,y,2y\}$ with $|y|=\lambda$ for all sufficiently large $\lambda$. This should be contrasted with the following remarkable result of Tamar Ziegler.
Theorem C (Ziegler [25]). Let $F$ be any configuration of $k$ points in $\mathbb{R}^n$ with $n\ge2$.
If $S\subseteq\mathbb{R}^n$ has positive upper density, then there exists a threshold $\lambda_0=\lambda_0(S,F)$ such that $S_\varepsilon$ contains an isometric copy of $\lambda F$ for all $\lambda\ge\lambda_0$ and any $\varepsilon>0$, where $S_\varepsilon$ denotes the $\varepsilon$-neighborhood of $S$.
Bourgain's example was later generalized by Graham [9], who established that the condition $\varepsilon>0$ in Theorem C is necessary and cannot be strengthened to $\varepsilon=0$ for any given non-spherical configuration $F$ in $\mathbb{R}^n$, for any $n\ge1$, that is, for any finite configuration of points that cannot be inscribed in a sphere. We note that the sets constructed by Bourgain and Graham have the property that for any $\varepsilon>0$ their $\varepsilon$-neighborhoods contain arbitrarily large cubes and hence trivially satisfy Theorem C with $\lambda_0=0$.
It is natural to ask whether any spherical configuration $F$, beyond the known example of simplices, has the property that every subset of $\mathbb{R}^n$ of positive upper Banach density, for some sufficiently large $n$, contains an isometric copy of $\lambda F$ for all sufficiently large $\lambda$, and even to conjecture that this ought to hold for all spherical configurations. The first breakthrough in this direction came in [12], when the authors established this for configurations of four points forming a 2-dimensional rectangle in $\mathbb{R}^4$ and, more generally, for any configuration that is the direct product of two non-degenerate simplices in $\mathbb{R}^n$ for suitably large $n$.
The purpose of this article is to present a strengthening of the results in [12] and to extend them to cover configurations with a higher dimensional product structure in both the Euclidean and discrete settings.

(i) If $S\subseteq\mathbb{R}^{2d}$ has positive upper Banach density, then there exists a threshold $\lambda_0=\lambda_0(S,R)$ such that $S$ contains an isometric copy of $\lambda R$ for all $\lambda\ge\lambda_0$.
(ii) For any $0<\delta\le1$ there exists a constant $c=c(\delta,R)>0$ such that any $S\subseteq[0,1]^{2d}$ with $|S|\ge\delta$ is guaranteed to contain an isometric copy of $\lambda R$ for all $\lambda$ in some interval of length at least $c$.
The multi-dimensional extension of Szemerédi's theorem on arithmetic progressions in sets of positive density due to Furstenberg and Katznelson [5] implies (and is in fact equivalent to) the statement that $S$ contains isometric copies of $\lambda R$ with sides parallel to the coordinate axes for arbitrarily large $\lambda$. Theorem 1.1 states that there is an isometric copy of $\lambda R$ in $S$ for every sufficiently large $\lambda$, but only with sides parallel to given 2-dimensional coordinate subspaces, which provides an extra degree of freedom for each side vector of the rectangle $R$.
A weaker version of Theorem 1.1, with $\mathbb{R}^{2d}$ replaced by $\mathbb{R}^{5d}$, was later established by Durcik and Kovač in [4] using an adaptation of arguments of the second author with Cook and Pramanik in [3]. This approach also makes direct use of the full strength of the multi-dimensional Szemerédi theorem and as such leads to quantitatively weaker results.
Our arguments work for more general patterns where d-dimensional rectangles are replaced with direct products of non-degenerate simplices.
(i) If $S\subseteq\mathbb{R}^n$ has positive upper Banach density, then there exists a threshold $\lambda_0=\lambda_0(S,\Delta)$ such that $S$ contains an isometric copy of $\lambda\Delta$ for all $\lambda\ge\lambda_0$.
(ii) For any $0<\delta\le1$ there exists a constant $c=c(\delta,\Delta)>0$ such that any $S\subseteq[0,1]^n$ with $|S|\ge\delta$ is guaranteed to contain an isometric copy of $\lambda\Delta$ for all $\lambda$ in some interval of length at least $c$.
Moreover, the isometric copies of $\lambda\Delta$ in both (i) and (ii) above can all be realized in the special form $\Delta'_1\times\cdots\times\Delta'_d$ with each $\Delta'_j\subseteq\mathbb{R}^{n_j}$ an isometric copy of $\lambda\Delta_j$.
Quantitative Remark. A careful analysis of our proof reveals that the constant $c(\delta,\Delta)$ can be taken greater than $\big(W_d(C'_\Delta\,\delta^{-3n_1\cdots n_d})\big)^{-1}$, where $W_k(m)$ is a tower of exponentials defined by $W_1(m)=\exp(m)$ and $W_{k+1}(m)=\exp(W_k(m))$ for $k\ge1$.
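To get a feel for the size of this bound, the tower $W_k(m)$ can be evaluated directly. The Python sketch below (the name `tower` is ours) follows the recursive definition; already $W_3(1)\approx3.8\times10^6$, while $W_4(1)$ overflows a double-precision float.

```python
import math

def tower(k: int, m: float) -> float:
    """Tower of exponentials: W_1(m) = exp(m), W_{k+1}(m) = exp(W_k(m))."""
    if k < 1:
        raise ValueError("k must be at least 1")
    value = math.exp(m)
    for _ in range(k - 1):
        value = math.exp(value)  # raises OverflowError once the value exceeds the float range
    return value

print(tower(1, 0.0))  # 1.0
print(tower(2, 0.0))  # e = 2.71828...
print(tower(3, 1.0))  # about 3.8e6
```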
1.3. Existing Results II: Distances and Simplices in Subsets of $\mathbb{Z}^n$.
The problem of counting isometric copies of a given non-degenerate simplex in $\mathbb{Z}^n$ (with one vertex fixed) has been extensively studied via its equivalent formulation as the number of ways a quadratic form can be represented as a sum of squares of linear forms, see [11] and [19]. This was exploited by the second author in [16] and [17] to establish analogous results to those described in Section 1.1 above for subsets of the integer lattice $\mathbb{Z}^n$ of positive upper density.
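Recall that the upper Banach density of a set $S\subseteq\mathbb{Z}^n$ is analogously defined by
$$\delta^*(S)=\lim_{N\to\infty}\sup_{t\in\mathbb{Z}^n}\frac{|S\cap(t+Q(N))|}{|Q(N)|},$$
where $|\cdot|$ now denotes counting measure on $\mathbb{Z}^n$ and $Q(N)$ the discrete cube $[-N/2,N/2]^n\cap\mathbb{Z}^n$.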
In light of the fact that for any pair of distinct points $\{x_1,x_2\}$ in $\mathbb{Z}^n$ the square of the distance between them, $|x_2-x_1|^2$, is always a positive integer, we introduce the convenient notation $\sqrt{\mathbb{N}}:=\{\lambda:\lambda>0\text{ and }\lambda^2\in\mathbb{Z}\}$.
Note that the fact that $S\subseteq\mathbb{Z}^n$ could fall entirely into a single congruence class modulo some integer $1\le q\le\delta^{-1/n}$ ensures that the $q_0$ that appears in Theorems A′ and B′ above must be divisible by the least common multiple of all integers $1\le q\le\delta^{-1/n}$. Indeed, if $S=(q\mathbb{Z})^n$ with $1\le q\le\delta^{-1/n}$, then $S$ has upper Banach density $q^{-n}\ge\delta$; however, the distance between any two distinct points $x,y\in S$ is of the form $|x-y|=q\lambda$ for some $\lambda\in\sqrt{\mathbb{N}}$.
However, in both Theorem A′ and Part (i) of Theorem B′, one can take $q_0=1$ if the sets $S$ are assumed to be suitably uniformly distributed on congruence classes of small modulus. Via an easy density increment strategy this leads to short new proofs; see [14] for Theorem A′ and Section 8 for Part (i) of Theorem B′.
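The obstruction is easy to see numerically. In the following Python sketch (all helper names are ours) we check that $(q\mathbb{Z})^n$ has density $q^{-n}$ in a box whose side length is a multiple of $q$, and that the squared distance between two of its points is divisible by $q^2$, so that $|x-y|=q\lambda$ with $\lambda^2\in\mathbb{Z}$.

```python
import itertools
import random

def squared_distance(x, y):
    """|x - y|^2 for integer points; always a non-negative integer."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def density_in_box(q, n, side):
    """Proportion of the points of {0, ..., side-1}^n that lie in (qZ)^n."""
    box = itertools.product(range(side), repeat=n)
    hits = sum(1 for p in box if all(c % q == 0 for c in p))
    return hits / side ** n

def reduced_square(x, y, q):
    """For x, y in (qZ)^n return lambda^2, where |x - y| = q * lambda."""
    if any(c % q for c in (*x, *y)):
        raise ValueError("x and y must both lie in (qZ)^n")
    d2 = squared_distance(x, y)
    assert d2 % (q * q) == 0  # every coordinate difference is a multiple of q
    return d2 // (q * q)

q, n = 3, 2
print(density_in_box(q, n, 12))  # 1/q^n = 0.111...
rng = random.Random(0)
for _ in range(5):
    x = [q * rng.randint(-10, 10) for _ in range(n)]
    y = [q * rng.randint(-10, 10) for _ in range(n)]
    print(x, y, reduced_square(x, y, q))  # lambda^2 is always an integer
```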
The original argument in [17] deduced Theorem B ′ from the following discrete analogue of Proposition B.
To see that Proposition B′ implies Theorem B′, notice that if Part (i) of Theorem B′ were not to hold for some set $S\subseteq\mathbb{Z}^{2n+3}$ of upper Banach density $\delta^*(S)>\delta>0$, with $q_0$ from Proposition B′, then there would exist a lacunary sequence $\lambda_1\ge\cdots\ge\lambda_J\ge1$ in $q_0\sqrt{\mathbb{N}}$, with $J$ the constant from Proposition B′, such that $S$ does not contain an isometric copy of $\lambda_j\Delta$ for any $1\le j\le J$. Since we can find a sufficiently large cube $Q$ with integer side length $N$ that is divisible by $q_0$ and greater than $\lambda_1$ such that $|S\cap Q|\ge\delta|Q|$, this contradicts Proposition B′. Part (ii) of Theorem B′ follows from Proposition B′ by taking $\lambda_j=2^{J-j}q_0$.
1.4. New Results II: Rectangles and Products of Simplices in Subsets of $\mathbb{Z}^n$.
We will also establish the following discrete analogues of Theorem

(i) If $S\subseteq\mathbb{Z}^{5d}$ has upper Banach density at least $\delta$, then there exist integers $q_0=q_0(\delta,R)$ and $\lambda_0=\lambda_0(S,R)$ such that $S$ contains an isometric copy of $q_0\lambda R$ for all $\lambda\in\sqrt{\mathbb{N}}$ with $\lambda\ge\lambda_0$.
Moreover, each of the isometric copies in (i) and (ii) above can be realized in the special form an isometric copy of $q_0\lambda\Delta_j$ and $\lambda\Delta_j$, respectively.
Quantitative Remark. A careful analysis of our proof reveals that the constant $q_0(\delta,\Delta)$ (and consequently also $N(\delta,\Delta)$) can be taken less than is a tower of exponentials defined by $W_1(m)=\exp(m)$ and $W_{k+1}(m)=\exp(W_k(m))$ for $k\ge1$.
1.5. Notations and Outline.
We will consider the parameters $d,n_1,\dots,n_d$ fixed and will not indicate the dependence on them. Thus we will write $f=O(g)$ if $|f|\le C(n_1,\dots,n_d)\,g$. If the implicit constants in our estimates depend on additional parameters $\varepsilon,\delta,K,\dots$, then we will write $f=O_{\varepsilon,\delta,K,\dots}(g)$. We will use the notation $f\ll g$ to indicate that $|f|\le c\,g$ for some constant $c>0$ sufficiently small for our purposes.
Given an $\varepsilon>0$ and a (finite or infinite) sequence $L_0\ge L_1\ge\cdots>0$, we will say that the sequence is $\varepsilon$-admissible if $L_j/L_{j+1}\in\mathbb{N}$ and $L_{j+1}\ll\varepsilon^2L_j$ for all $j\ge1$. Moreover, if $q\in\mathbb{N}$ is given and $L_j\in\mathbb{N}$ for all $1\le j\le J$, then we will call the sequence $L_0\ge L_1\ge\cdots\ge L_J$ $(\varepsilon,q)$-admissible if in addition $L_J/q\in\mathbb{N}$. Such sequences of scales will often appear in our statements, in both the continuous and the discrete case.
Our proofs are based on a weak hypergraph regularity lemma and an associated counting lemma developed in the context of Euclidean spaces and the integer lattice. In Section 2 we introduce our approach in the model case of finite fields and prove an analogue of Theorem 1.1 in this setting. In Section 3 we review Theorem 1.2 for a single simplex and ultimately establish the base case of our general inductive approach to Theorem 1.2. In Section 4 we address Theorem 1.2 for the direct product of two simplices; this provides a new proof (and strengthening) of the main result of [12] and serves as a gentle preparation for the more complicated general case, which we present in Section 5. The proof of Theorem 1.4 is outlined in Sections 6 and 7, while a short direct proof of Part (i) of Theorem B′ is presented in Section 8.
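Admissibility is easy to test mechanically once the implicit constant in $\ll$ is fixed. In the Python sketch below the constant `c` is a hypothetical stand-in for "sufficiently small", scales are passed as integers or `fractions.Fraction` values so that the divisibility conditions are exact, and (as an assumption on our part) the ratio condition is imposed on every consecutive pair of scales. The helper `make_admissible` builds one admissible sequence by repeatedly dividing by a fixed integer ratio.

```python
import math
from fractions import Fraction

def min_ratio(eps, c):
    """'L_{j+1} << eps^2 L_j' made explicit as L_j / L_{j+1} >= 1 / (c * eps^2)."""
    return 1.0 / (c * eps ** 2)

def is_admissible(scales, eps, c=0.01, q=None):
    """Check eps-admissibility of L_0 >= L_1 >= ... ((eps, q)-admissibility if q is given)."""
    if any(L <= 0 for L in scales):
        return False
    threshold = min_ratio(eps, c)
    for big, small in zip(scales, scales[1:]):
        ratio = Fraction(big) / Fraction(small)
        if ratio.denominator != 1 or ratio < threshold:  # L_j / L_{j+1} must be a large integer
            return False
    if q is not None:
        if any(Fraction(L).denominator != 1 for L in scales[1:]):  # L_1, ..., L_J are integers
            return False
        if Fraction(scales[-1]) % q != 0:  # q divides L_J
            return False
    return True

def make_admissible(J, eps, c=0.01, q=1):
    """Integer scales L_j = q * r^(J - j), j = 0..J, with r the smallest admissible integer ratio."""
    r = math.ceil(min_ratio(eps, c))
    return [q * r ** (J - j) for j in range(J + 1)]

scales = make_admissible(J=3, eps=0.5, c=0.01, q=7)
print(scales[-1], is_admissible(scales, 0.5, 0.01, q=7))  # 7 True
print(is_admissible([100, 10, 1], 0.5))  # False: a ratio of 10 is too small
```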

2. Model case: vector spaces over finite fields.
In this section we will illustrate our general method by giving a complete proof of Theorem 1.1 in the model setting of $\mathbb{F}_q^n$, where $\mathbb{F}_q$ denotes the finite field of $q$ elements. We do this as the notation and arguments are more transparent in this setting, while many of the main ideas are still present.
We say that two vectors $u,v\in\mathbb{F}_q^n$ are orthogonal if $u\cdot v=0$, where "$\cdot$" stands for the usual dot product. A rectangle in $\mathbb{F}_q^n$ is then a set $R=\{x_1,y_1\}\times\cdots\times\{x_n,y_n\}$ with side vectors $y_i-x_i$ being pairwise orthogonal. The finite field analogue of Theorem 1.1 is the following.

Proposition 2.1. For any $0<\delta\le1$ there exists an integer $q_0=q_0(\delta)$ with the following property: If $q\ge q_0$ and $t_1,\dots,t_d\in\mathbb{F}_q^*$, then any $S\subseteq\mathbb{F}_q^{2d}$ with $|S|\ge\delta q^{2d}$ will contain points where we used the shorthand notation $x_j:=(x_{j1},x_{j2})$ for each $1\le j\le d$ and the averaging notation $\mathbb{E}_{x\in A}f(x):=\frac{1}{|A|}\sum_{x\in A}f(x)$ for a finite set $A\neq\emptyset$. We have also used the notation for each $t\in\mathbb{F}_q^*$. Note that the function $\sigma_t$ may be viewed as the discrete analogue of the normalized surface area measure on the sphere of radius $\sqrt{t}$. It is well-known, see [10], that Note that if $N_t(1_S)>0$, then this implies that $S$ contains a rectangle of the form $\{x_{11},$

Our approach to Proposition 2.1 in fact establishes the following quantitatively stronger result.

Proposition 2.2. For any $0<\varepsilon\le1$ there exists an integer $q_0=q_0(\varepsilon)$ with the following property: If $q\ge q_0$, then for any $S\subseteq\mathbb{F}_q^{2d}$ and $t_1,\dots,t_d\in\mathbb{F}_q^*$ one has

A crucial observation in the proof of Proposition 2.2 is that the averages $N_t(1_S)$ can be compared to ones which can be easily estimated from below. We define, for any $S\subseteq\mathbb{F}_q^{2d}$, the (unrestricted) count It is easy to see, by carefully applying Cauchy-Schwarz $d$ times to $\mathbb{E}_{x_{11}\in V_1,\dots,x_{d1}\in V_d}1_S(x_{11},\dots,x_{d1})$, that Our approach to Proposition 2.2 therefore reduces to establishing that for any $\varepsilon>0$ one has The validity of (2.2) will follow immediately from the $d=k$ case of Proposition 2.3 below. However, before we can state this counting lemma we need to introduce some further notation from the theory of hypergraphs, notation that we shall ultimately make use of throughout the paper.
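For a prime $q=p$ the size of the "circle" $\{x\in\mathbb{F}_p^2: x_1^2+x_2^2=t\}$ is classical: for every $t\neq0$ it equals $p-1$ if $p\equiv1\pmod 4$ and $p+1$ if $p\equiv3\pmod 4$. The Python sketch below checks this by brute force. The normalization $\sigma_t(x)=p\cdot1_{\{x_1^2+x_2^2=t\}}$ used in `mean_sigma` is our assumption (the defining display for $\sigma_t$ is not reproduced above); with it one gets $\mathbb{E}_x\sigma_t(x)=1\pm1/p$, consistent with viewing $\sigma_t$ as a normalized surface measure.

```python
def circle_size(p: int, t: int) -> int:
    """Number of x = (x1, x2) in F_p^2 with x1^2 + x2^2 = t, for an odd prime p."""
    return sum(1 for x1 in range(p) for x2 in range(p)
               if (x1 * x1 + x2 * x2 - t) % p == 0)

def mean_sigma(p: int, t: int) -> float:
    """E_x sigma_t(x) under the assumed normalization sigma_t(x) = p * 1{x1^2 + x2^2 = t}."""
    return p * circle_size(p, t) / p ** 2

for p in (3, 5, 7, 11, 13):
    expected = p - 1 if p % 4 == 1 else p + 1
    sizes = {circle_size(p, t) for t in range(1, p)}
    print(p, sizes, expected)  # every nonzero t gives the same circle size
```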
Note that the map $x\mapsto x_e$ defines a projection $\pi_e:V^2\to V$. With this notation, we can clearly now write Now for any $1\le k\le d$ and any edge $e'\in H_{d,k}$, i.e. $e'\subseteq\{1,\dots,d\}$ with $|e'|=k$, we let $V_{e'}:=\prod_{j\in e'}V_j$. For every $x\in V^2$ and $e\in H^2_{d,k}$, we define $x_e:=\pi_e(x)$, where $\pi_e:V^2\to V_{\pi(e)}$ is the natural projection map.
Our key counting lemma, Proposition 2.3 below, which we will establish by induction on $1\le k\le d$, is then the statement that given a family of functions $f_e:V_{\pi(e)}\to[-1,1]$, $e\in H^2_{d,k}$, the averages (generalizing those discussed above) which are defined by If we apply this proposition with $d=k$ and $f_e=1_S$ for all $e\in H^2_{d,d}$, then Proposition 2.1 clearly follows given the lower bound (2.1).
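To make these averages concrete, here is a brute-force Python sketch for $d=2$ and a small prime $q=p$, with $V=\mathbb{F}_p^2$ and $S\subseteq V\times V$. Since the defining displays are not reproduced above, the formulas are our reading of the text: $N_t(1_S)$ averages $\prod_{l_1,l_2}1_S(x_{1l_1},x_{2l_2})\,\sigma_{t_1}(x_{12}-x_{11})\,\sigma_{t_2}(x_{22}-x_{21})$ over $x_{11},x_{12},x_{21},x_{22}\in V$, $M(1_S)$ is the same average without the $\sigma$ factors, and $\sigma_t(x)=p\cdot1_{\{x\cdot x=t\}}$ is an assumed normalization. The last check is the Cauchy-Schwarz lower bound $M(1_S)\ge\delta^4$ with $\delta=|S|/|V|^2$, which we take to be the $d=2$ case of the lower bound referred to above.

```python
import itertools
import random

def vectors(p):
    """All of V = F_p^2."""
    return list(itertools.product(range(p), repeat=2))

def sigma(p, t, x):
    """Assumed normalization: sigma_t(x) = p * 1{x1^2 + x2^2 = t}."""
    return p if (x[0] ** 2 + x[1] ** 2 - t) % p == 0 else 0

def diff(p, x, y):
    return ((x[0] - y[0]) % p, (x[1] - y[1]) % p)

def rectangle_averages(p, S, t1, t2):
    """Return (N_t(1_S), M(1_S)) for d = 2 by summing over all x11, x12, x21, x22 in V."""
    V = vectors(p)
    f = {(a, c): int((a, c) in S) for a in V for c in V}
    n_sum = m_sum = 0
    for a, b in itertools.product(V, repeat=2):              # x11, x12
        s1 = sigma(p, t1, diff(p, b, a))
        for c, e in itertools.product(V, repeat=2):          # x21, x22
            if f[a, c] and f[a, e] and f[b, c] and f[b, e]:  # all four corners lie in S
                m_sum += 1
                n_sum += s1 * sigma(p, t2, diff(p, e, c))
    total = len(V) ** 4
    return n_sum / total, m_sum / total

p = 3
V = vectors(p)
full = {(a, c) for a in V for c in V}
print(rectangle_averages(p, full, 1, 2))  # N = 16/9 and M = 1 when S = V x V

rng = random.Random(0)
S = {pt for pt in full if rng.random() < 0.5}
N, M = rectangle_averages(p, S, 1, 1)
delta = len(S) / len(full)
print(N, M, delta ** 4)  # M(1_S) >= delta^4
```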

2.3. Proof of Proposition 2.3.
We will establish Proposition 2.3 by inducting on $1\le k\le d$.
For $k=1$ the result follows from the basic observation that if ) by the properties of the function $\sigma$ given above.
To see how this implies Proposition 2.3 for $k=1$ we note that since H

The induction step has two main ingredients. The first is an estimate of the type often referred to as a generalized von Neumann inequality, namely where for any $e\in H^2_{d,k}$ and $f:V_{\pi(e)}\to[-1,1]$ we define The corresponding inequality for the multilinear expression $M(f_e;e\in H^2_{d,k})$, namely the fact that is well-known and is referred to as the Gowers-Cauchy-Schwarz inequality [8].
The second and main ingredient is an approximate decomposition of a graph into simpler ones, and is essentially the so-called weak (hypergraph) regularity lemma of Frieze and Kannan [7]. We choose to state this from a somewhat more abstract/probabilistic point of view, a perspective that will be particularly helpful when we consider our general results in the continuous and discrete settings.
We will first introduce this in the case $d=2$. A bipartite graph with (finite) vertex sets $V_1,V_2$ is a set $S\subseteq V_1\times V_2$, and a function $f:V_1\times V_2\to\mathbb{R}$ may be viewed as a weighted bipartite graph with weights $f(x_1,x_2)$ on the edges $(x_1,x_2)$. If $\mathcal{P}_1$ and $\mathcal{P}_2$ are partitions of $V_1$ and $V_2$ respectively, then $\mathcal{P}=\mathcal{P}_1\times\mathcal{P}_2$ is a partition of $V_1\times V_2$, and we let $\mathbb{E}(f|\mathcal{P})$ denote the function that is constant and equal to $\mathbb{E}_{x\in A}f(x)$ on each atom $A=A_1\times A_2$ of $\mathcal{P}$. The weak regularity lemma states that for any $\varepsilon>0$ and for any weighted graph $f:$ Informally this means that the graph $f$ can be approximated with precision $\varepsilon$ by the "low complexity" graph $\mathbb{E}(f|\mathcal{P})$. If we consider the σ-algebras $\mathcal{B}_i$ generated by the partitions $\mathcal{P}_i$ and the σ-algebra $\mathcal{B}=\mathcal{B}_1\vee\mathcal{B}_2$ generated by $\mathcal{P}_1\times\mathcal{P}_2$, then $\mathbb{E}(f|\mathcal{P})=\mathbb{E}(f|\mathcal{B})$, the so-called conditional expectation function of $f$. Moreover it is easy to see, using Cauchy-Schwarz, that estimate (2.9) follows from With this more probabilistic point of view the weak regularity lemma says that the function $f$ can be approximated with precision $\varepsilon$ by a low complexity function $\mathbb{E}(f|\mathcal{B}_1\vee\mathcal{B}_2)$, corresponding to σ-algebras $\mathcal{B}_i$ on $V_i$ generated by $O(\varepsilon^{-2})$ sets. This formulation is also referred to as a Koopman-von Neumann type decomposition, see Corollary 6.3 in [23]. We will need a natural extension to $k$-regular hypergraphs; see [22,8], and also [2] for an extension to sparse hypergraphs. Given an edge $e'\in H_{d,k}$ of $k$ elements we define its boundary $\partial e'$: We say that the complexity of a σ-algebra $\mathcal{B}_{f'}$ is at most $m$, and write $\mathrm{complex}(\mathcal{B}_{f'})\le m$, if it is generated by $m$ sets.
The proofs of Lemmas 2.1 and 2.2 are presented in Section 2.4 below. We close this subsection by demonstrating how these lemmas can be combined to establish Proposition 2.3.
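For partitions of finite vertex sets, $\mathbb{E}(f|\mathcal{B}_1\vee\mathcal{B}_2)$ is simply the average of $f$ over each atom $A_1\times A_2$. The Python sketch below (all names are ours) computes it for a weighted bipartite graph stored as a matrix and checks two facts used repeatedly in this section: the Pythagorean identity $\|f\|_2^2=\|\mathbb{E}(f|\mathcal{B})\|_2^2+\|f-\mathbb{E}(f|\mathcal{B})\|_2^2$ (conditional expectation is an orthogonal projection), and the fact that $\|\mathbb{E}(f|\mathcal{B})\|_2^2$ does not decrease when the partitions are refined. Norms are normalized, as elsewhere in the paper.

```python
import random
from collections import defaultdict

def cond_exp(f, lab1, lab2):
    """E(f | P1 x P2): replace f on each atom A1 x A2 by its average there.
    lab1[x1] (resp. lab2[x2]) names the atom of P1 (resp. P2) containing x1 (resp. x2)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x1, row in enumerate(f):
        for x2, value in enumerate(row):
            key = (lab1[x1], lab2[x2])
            sums[key] += value
            counts[key] += 1
    return [[sums[lab1[x1], lab2[x2]] / counts[lab1[x1], lab2[x2]]
             for x2 in range(len(f[0]))] for x1 in range(len(f))]

def energy(g):
    """Normalized squared L^2 norm E_{x1,x2} g(x1, x2)^2."""
    return sum(v * v for row in g for v in row) / (len(g) * len(g[0]))

def minus(f, g):
    return [[a - b for a, b in zip(r, s)] for r, s in zip(f, g)]

rng = random.Random(2)
f = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(6)]
coarse = [0, 0, 0, 1, 1, 1]  # atoms {0,1,2}, {3,4,5}
fine = [0, 0, 1, 2, 2, 3]    # refines coarse: {0,1}, {2}, {3,4}, {5}
E_coarse = cond_exp(f, coarse, coarse)
E_fine = cond_exp(f, fine, fine)
print(energy(f), energy(E_coarse) + energy(minus(f, E_coarse)))  # equal up to rounding
print(energy(E_coarse) <= energy(E_fine))                        # True
```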

Proof of Proposition 2.3.
Let $\varepsilon>0$ and $2\le k\le d$, and assume that the proposition holds for $k-1$. It follows from Lemma 2.2 that there exist σ-algebras The conditional expectation functions $\tilde f_e$ are linear combinations of the indicator functions $1_{A_e}$ of the atoms $A_e$ of the σ-algebras $\mathcal{B}_e:=\bigvee_{f'\in\partial\pi(e)}\mathcal{B}_{f'}$. Since the number of terms in this linear combination is at most , with coefficients at most 1 in modulus, plugging these into the multi-linear expressions $N_t(f_e;e\in H^2_{d,k})$ The key observation is that these expressions are at level $k-1$ instead of $k$. Let $p_{f'}(e):=(j_1l_1,\dots,j_kl_k)\in H^2_{d,k-1}$, obtained from $e$ by removing the $jl$-entry. Then we have It therefore follows that and similarly that. It then follows from the induction hypothesis that. This, together with (2.13) and (2.14), establishes that (2.5) holds for $d=k$ as required.

Proof of Lemma 2.1. We start by observing the following consequence of (2.6), namely that and $t\in\mathbb{F}_q^*$. Now fix an edge, say $e_0=(11,21,\dots,k1)$. Partition the edges $e\in H^2_{d,k}$ into three groups: the first consisting of the edges $e$ for which $1\notin\pi(e)$, the second of the edges with $11\in e$, written as $e=(11,e')$ with $e'\in H^2_{d-1,k-1}$, and the third of the edges with $12\in e$, using the notation $H^2_{d-1,k-1}:=\{(j_2l_2,\dots,j_kl_k)\}$. Accordingly we can write and then we can write By (2.15) we can estimate the inner sum in (2.17) by the square root of Thus by Cauchy-Schwarz, and the fact that $f_e:V_{\pi(e)}\to[-1,1]$ for all $e\in H^2_{d,k}$, we can conclude that The expression on the right hand side of the inequality above is similar to that in (2.16) except for the following changes. The functions $f_e$ for $1\notin\pi(e)$ are eliminated, i.e. replaced by 1, as is the factor $\sigma_{t_1}$. The functions $f_{(12,e')}$ are replaced by $f_{(11,e')}$ for all $e'\in H^2_{d-1,k-1}$. Repeating the same procedure for $j=2,\dots,k$ one eliminates all the factors $\sigma_{t_j}$ for $1\le j\le k$, as well as all the functions $f_e$ for edges $e$ such that $j\notin\pi(e)$ for some $1\le j\le k$. This leaves only the edges $e$ with $\pi(e)=(1,2,\dots,k)$, and for such edges the functions $f_e$ are eventually replaced by $f_{e_0}=f_{11,21,\dots,k1}$. The factors $\sigma_{t_j}(x_{j2}-x_{j1})$ are not changed for $j>k$; however, as the function $f_{e_0}$ does not depend on the variables $x_{jl}$ for $j>k$, averaging over these variables gives rise to a factor of $1+O(q^{-1/2})$. Thus one obtains the following final estimate This proves the lemma, as it is clear that the above procedure can be applied to any edge in place of $e_0=(11,21,\dots,k1)$.

Proof of Lemma 2.2.
For a function $f_e:V_{\pi(e)}\to[-1,1]$ and a σ-algebra $\mathcal{B}_{\pi(e)}$ on $V_{\pi(e)}$ define the energy of $f_e$ with respect to $\mathcal{B}_{\pi(e)}$ as and for a family of functions $f_e$ and σ-algebras $\mathcal{B}_{\pi(e)}$, $e\in H^2_{d,k}$, its total energy as We will show that if (2.12) does not hold for a family of σ-algebras $\mathcal{B}_{\pi(e)}=\bigvee_{f'\in\partial\pi(e)}\mathcal{B}_{f'}$, then the σ-algebras $\mathcal{B}_{f'}$ can be refined so that the total energy of the system increases by a quantity depending only on $\varepsilon$.
Since the functions $f_e$ are bounded, the total energy of the system is $O(1)$, so the energy increment process must stop in $O_\varepsilon(1)$ steps, and (2.12) must then hold. The idea of this procedure appears already in the proof of Szemerédi's regularity lemma [20], and has since been used in various places [7,22,8].
Initially set $\mathcal{B}_{f'}:=\{\emptyset,V_{f'}\}$, and hence $\mathcal{B}_{\pi(e)}=\{\emptyset,V_{\pi(e)}\}$, to be the trivial σ-algebras. Assume that in general (2.12) does not hold for a family of σ-algebras $\mathcal{B}_{f'}$ with $f'\in H_{d,k-1}$. Then there exists an edge $e\in H^2_{d,k}$ so that $g_e(V_{\pi(e)})\ge\varepsilon$, with $g_e:=f_e-\mathbb{E}(f_e|\mathcal{B}_{\pi(e)})$. Let $e=(11,\dots,k1)$ for simplicity of notation, hence $\pi(e)=(1,\dots,k)$. Then, with the notation $x'=(x_{12},\dots,x_{k2})$, one has for some functions $h_{j,x'}$ that are bounded by 1 in magnitude. Indeed, if $e'\neq e=(11,\dots,k1)$ is an edge, then $x_{e'}$ does not depend on at least one of the variables $x_{j1}$. Thus there must be an $x'$ for which the inner sum in the above expression is at least $\varepsilon^{2^k}$. Fix such an $x'$. Decomposing the functions $h_{j,x'}$ into their positive and negative parts and then writing them as averages of indicator functions, one obtains that there exist sets $B_j\subseteq V_{\pi(e)\setminus\{j\}}$ such that which can be written more succinctly, using the inner product notation, as Since the functions $1_{B_j}$ are measurable with respect to the σ-algebra $\mathcal{B}'_{\pi(e)}$ for all $1\le j\le k$, we have that and hence, by Cauchy-Schwarz, that Note that the first equality above follows from the fact that the conditional expectation function $\mathbb{E}(f|\mathcal{B})$ is the orthogonal projection of $f$ onto the subspace of $\mathcal{B}$-measurable functions in $L^2$. This also implies that the energy of a function never decreases when the underlying σ-algebra is refined, and (2.22) tells us that the energy of $f_e$ is increased by at least $c_k\varepsilon^{2^{k+1}}$.
Then the total energy of the family $f_e$ with respect to the system of σ-algebras $\mathcal{B}_{\pi(e)}$, $e\in H^2_{d,k}$, is also increased by at least $c_k\varepsilon^{2^{k+1}}$. It is clear that the complexity of each σ-algebra $\mathcal{B}_{f'}$ increases by at most 1; hence, as explained above, the lemma follows by applying this energy increment process at most $O(\varepsilon^{-2^{k+1}})$ times.
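The energy increment scheme can be run verbatim in the simplest bipartite case $d=k=2$ of the weak regularity lemma discussed earlier in this section. The Python sketch below is our own illustration of that scheme (a Frieze-Kannan style cut-norm version, not the hypergraph argument of the proof above): starting from trivial partitions, it finds sets $B_1\subseteq V_1$, $B_2\subseteq V_2$ maximizing $|\mathbb{E}_{x_1,x_2}g(x_1,x_2)1_{B_1}(x_1)1_{B_2}(x_2)|$ for $g=f-\mathbb{E}(f|\mathcal{B}_1\vee\mathcal{B}_2)$ and refines both partitions by these sets while this quantity is at least $\varepsilon$. Each refinement raises the energy $\|\mathbb{E}(f|\mathcal{B}_1\vee\mathcal{B}_2)\|_2^2\le1$ by at least $\varepsilon^2$, so there are at most $\varepsilon^{-2}$ steps. The search over $B_1$ is exhaustive (exponential in $|V_1|$), so this is only meant for tiny examples; for a fixed $B_1$ the best $B_2$ collects the columns of one sign.

```python
import random
from collections import defaultdict

def cond_exp(f, lab1, lab2):
    """Average of f over each atom of the product partition given by the labels."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x1, row in enumerate(f):
        for x2, value in enumerate(row):
            sums[lab1[x1], lab2[x2]] += value
            counts[lab1[x1], lab2[x2]] += 1
    return [[sums[lab1[x1], lab2[x2]] / counts[lab1[x1], lab2[x2]]
             for x2 in range(len(f[0]))] for x1 in range(len(f))]

def energy(g):
    return sum(v * v for row in g for v in row) / (len(g) * len(g[0]))

def best_cut(g):
    """Maximize |E g(x1, x2) 1_B1(x1) 1_B2(x2)| over B1, B2 (brute force over B1)."""
    n1, n2 = len(g), len(g[0])
    best = (0.0, set(), set())
    for mask in range(1, 1 << n1):
        B1 = {x for x in range(n1) if (mask >> x) & 1}
        col = [sum(g[x][y] for x in B1) for y in range(n2)]
        for B2 in ({y for y in range(n2) if col[y] > 0}, {y for y in range(n2) if col[y] < 0}):
            value = abs(sum(col[y] for y in B2)) / (n1 * n2)
            if value > best[0]:
                best = (value, B1, B2)
    return best

def weak_regularize(f, eps):
    """Refine partitions of V1 and V2 until the best cut of f - E(f|B1 v B2) is below eps."""
    lab1, lab2 = [()] * len(f), [()] * len(f[0])  # trivial partitions
    energies = []
    while True:
        E = cond_exp(f, lab1, lab2)
        energies.append(energy(E))
        g = [[a - b for a, b in zip(r, s)] for r, s in zip(f, E)]
        disc, B1, B2 = best_cut(g)
        if disc < eps:
            return lab1, lab2, energies, disc
        # Add B1 to the sigma-algebra on V1 and B2 to the one on V2 (splitting atoms).
        lab1 = [lab + (x in B1,) for x, lab in enumerate(lab1)]
        lab2 = [lab + (y in B2,) for y, lab in enumerate(lab2)]

rng = random.Random(1)
f = [[float(rng.random() < 0.3 + 0.4 * (x < 3) * (y < 3)) for y in range(6)] for x in range(6)]
lab1, lab2, energies, disc = weak_regularize(f, eps=0.1)
print(len(energies) - 1, "refinement steps; final cut discrepancy", disc)
```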

3. The base case of an inductive strategy to establish Theorem 1.2
In this section we will ultimately establish the base case of our more general inductive argument. We start, however, by giving a quick review of the proof of Theorem 1.2 when $d=1$ (which contains both Theorem B and Corollary B as stated in Section 1.1), namely the case of a single simplex. This was originally addressed in [1] and revisited in [12] and [13].

3.1. A Single Simplex in $\mathbb{R}^n$.
Let $Q\subseteq\mathbb{R}^n$ be a fixed cube and let $l(Q)$ denote its side length.
Let $\Delta^0=\{v_1=0,v_2,\dots,v_n\}\subseteq\mathbb{R}^n$ be a fixed non-degenerate simplex and define $t_{kl}:=v_k\cdot v_l$ for $2\le k,l\le n$, where "$\cdot$" is the dot product on $\mathbb{R}^n$. Given $\lambda>0$, a simplex $\Delta=\{x_1=0,x_2,\dots,x_n\}\subseteq\mathbb{R}^n$ is isometric to $\lambda\Delta^0$ (with $x_k$ corresponding to $v_k$) if and only if $x_k\cdot x_l=\lambda^2t_{kl}$ for all $2\le k,l\le n$. Thus the configuration space $S_{\lambda\Delta^0}$ of isometric copies of $\lambda\Delta^0$ is a non-singular real variety given by the above equations. Let $\sigma_{\lambda\Delta^0}$ be the natural normalized surface area measure on $S_{\lambda\Delta^0}$, described in [1], [12], and [13]. It is clear that the variable $x_1$ can be replaced by any of the variables $x_i$ by redefining the constants $t_{kl}$.
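The Gram-matrix description of $S_{\lambda\Delta^0}$ is easy to check numerically. In the Python sketch below (names are ours), a labelled simplex is first translated so that its first vertex is the origin, and then $x_k\cdot x_l$ is compared with $\lambda^2t_{kl}$; the example is a rotated, dilated and translated copy of a simplex of $3$ points in $\mathbb{R}^3$.

```python
import math

def gram(points):
    """Translate so that points[0] is the origin; return the matrix x_k . x_l for k, l >= 2."""
    origin = points[0]
    shifted = [[a - b for a, b in zip(p, origin)] for p in points[1:]]
    return [[sum(a * b for a, b in zip(u, v)) for v in shifted] for u in shifted]

def is_copy_of_dilate(points, reference, lam, tol=1e-9):
    """True iff x_k . x_l = lam^2 t_kl for all k, l (vertex x_k matched with v_k)."""
    G, T = gram(points), gram(reference)
    return all(abs(G[i][j] - lam ** 2 * T[i][j]) <= tol
               for i in range(len(T)) for j in range(len(T)))

def rotate_z(p, theta):
    """Rotation of R^3 about the x3-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])

delta0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # v_1 = 0, v_2, v_3
shift = (1.0, 2.0, 3.0)
copy = [tuple(2.0 * a + b for a, b in zip(rotate_z(v, 0.7), shift)) for v in delta0]
print(is_copy_of_dilate(copy, delta0, 2.0))  # True
print(is_copy_of_dilate(copy, delta0, 1.5))  # False
```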
For any family of functions $f_1,\dots,f_n:Q\to[-1,1]$ and $0<\lambda\ll l(Q)$ we define the multi-linear expression We note that all of our functions are 1-bounded and both integrals, in fact all integrals in this paper, are normalized. Recall that we are using the normalized integral notation $\fint_A f:=\frac{1}{|A|}\int_A f$. Since the normalized measure $\sigma_{\lambda\Delta^0}$ is supported on $S_{\lambda\Delta^0}$ we will not indicate the support of the variables $(x_2,\dots,x_n)$ explicitly.
Note also that if $S\subseteq Q$ is a measurable set and $N^1_{\lambda\Delta^0,Q}(1_S,\dots,1_S)>0$, then $S$ must contain an isometric copy of $\lambda\Delta^0$. The following proposition (with $Q=[0,1]^n$) is a quantitatively stronger version of Proposition B that appeared in Section 1.1 and hence immediately establishes Theorem 1.2 for $d=1$.
Proposition 3.1. For any $0<\varepsilon\le1$ there exists an integer $J=O(\varepsilon^{-2}\log\varepsilon^{-1})$ with the following property: Given any lacunary sequence $l(Q)\ge\lambda_1\ge\cdots\ge\lambda_J$ and $S\subseteq Q$, there is some $1\le j<J$ such that Our approach to establishing Proposition 3.1 is to compare the above expressions to simpler ones for which it is easy to obtain lower bounds. Given a scale $0<\lambda\ll l(Q)$ we define the multi-linear expression is the shift of the cube $Q(\lambda)$ by the vector $t$. Note that if $S\subseteq Q$ is a set of measure $|S|\ge\delta|Q|$ for some $\delta>0$, then for a given $\varepsilon>0$, Hölder implies for all scales $0<\lambda\ll\varepsilon\,l(Q)$.
In light of this observation, and the one above regarding a lower bound for $M^1_{\lambda,Q}(1_S,\dots,1_S)$, our proof of Proposition 3.1 reduces to establishing the following "counting lemma".
There are two main ingredients in the proof of Proposition 3.2, and this will be typical of all of our arguments. The first ingredient is a result which establishes that our multi-linear forms $N^1_{\lambda\Delta^0,Q}(f_1,\dots,f_n)$ are controlled by an appropriate norm which measures the uniformity of distribution of functions $f:Q\to[-1,1]$ with respect to particular scales $L$. This is analogous to estimates in additive combinatorics [8] which are often referred to as generalized von Neumann inequalities.
For any collections of functions $f_1,\dots,f_n:$ The corresponding inequality for the multilinear expression $M^1_{\lambda,Q}(f_1,\dots,f_n)$, namely the fact that follows easily from Cauchy-Schwarz together with the simple observation that The second key ingredient, proved in [13] and generalized in Lemma 3.3 below, is a Koopman-von Neumann type decomposition of functions where the underlying σ-algebras are generated by cubes of a fixed length. To recall it, let $Q\subseteq\mathbb{R}^n$ be a cube, $L>0$ a scale that divides $l(Q)$, $Q(L)=[-\frac{L}{2},\frac{L}{2}]^n$, let $\mathcal{G}_{L,Q}$ denote the collection of cubes $t+Q(L)$ partitioning the cube $Q$, and let $\Gamma_{L,Q}$ denote the grid corresponding to the centers of the cubes. By a slight abuse of notation we also write $\mathcal{G}_{L,Q}$ for the σ-algebra generated by the grid. Recall that the conditional expectation function $\mathbb{E}(f|\mathcal{G}_{L,Q})$ is constant and equal to $\fint_A f$ on each cube $A\in\mathcal{G}_{L,Q}$.

Lemma 3.2 (A Koopman-von Neumann type decomposition [13]). Let $0<\varepsilon\le1$ and $Q\subseteq\mathbb{R}^n$ be a cube.
There exists an integer $\tilde J_1=O(\varepsilon^{-2})$ such that for any $\varepsilon$-admissible sequence $l(Q)\ge L_1\ge\cdots\ge L_{\tilde J_1}$ and function $f:$

Let $\mathcal{G}_{L_j,Q}$ be the grid obtained from Lemma 3.2 for the function $f=1_S$ for some fixed $\varepsilon>0$. Let $\tilde f:=\mathbb{E}(f|\mathcal{G}_{L_j,Q})$; then by (3.6) and multi-linearity, we have Thus in showing (6.4) one can replace the functions $f$ with $\tilde f$. If we make the additional assumption that $\lambda\ll\varepsilon L_j$, then it is easy to see, using the fact that the function $\tilde f$ is constant on the cubes $Q_t(L_j)\in\mathcal{G}_{L_j,Q}$, that Since the condition $\varepsilon^{-6}L_{j+1}\ll\lambda\ll\varepsilon L_j$ can be replaced with $L_{j+1}\ll\lambda\ll L_j$ if one passes to a subsequence of scales, for example $L'_j=L_{5j}$, this completes the proof of Proposition 3.2.
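The mechanism behind decompositions of this kind is a pigeonhole argument on energies: for nested grids the energies $\|\mathbb{E}(f|\mathcal{G}_{L_j,Q})\|_2^2$ increase with $j$ and are bounded by $1$, and their consecutive differences equal $\|\mathbb{E}(f|\mathcal{G}_{L_{j+1},Q})-\mathbb{E}(f|\mathcal{G}_{L_j,Q})\|_2^2$, so among more than $\varepsilon^{-2}$ of them at least one is at most $\varepsilon^2$. The Python sketch below illustrates only this step on a discretized square (a function on an $n\times n$ pixel grid, with block sizes dividing one another); the names are ours, and the precise statement of Lemma 3.2, including the scale separation built into $\varepsilon$-admissibility, is not modelled.

```python
import random

def block_average(f, b):
    """E(f | G_L) on an n x n pixel grid: average f over each b x b block (b divides n)."""
    n = len(f)
    out = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, b):
        for j0 in range(0, n, b):
            mean = sum(f[i][j] for i in range(i0, i0 + b) for j in range(j0, j0 + b)) / (b * b)
            for i in range(i0, i0 + b):
                for j in range(j0, j0 + b):
                    out[i][j] = mean
    return out

def energy(g):
    return sum(v * v for row in g for v in row) / (len(g) * len(g[0]))

def increments(f, blocks):
    """||E(f|G_{j+1}) - E(f|G_j)||_2^2 for consecutive block sizes (coarse to fine)."""
    avgs = [block_average(f, b) for b in blocks]
    return [energy([[a - c for a, c in zip(r, s)] for r, s in zip(fine, coarse)])
            for coarse, fine in zip(avgs, avgs[1:])]

def stable_index(f, blocks, eps):
    """First j with ||E(f|G_{j+1}) - E(f|G_j)||_2 <= eps, or None."""
    for j, inc in enumerate(increments(f, blocks)):
        if inc <= eps ** 2:
            return j
    return None

rng = random.Random(3)
n = 64
f = [[float(rng.random() < 0.5) for _ in range(n)] for _ in range(n)]
blocks = [64, 32, 16, 8, 4, 2, 1]         # six consecutive pairs of nested grids
print(sum(increments(f, blocks)))         # = energy(f) - energy(block_average(f, 64)) <= 1
print(stable_index(f, blocks, eps=0.45))  # exists since 6 > 0.45**-2
```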
3.2. The base case of a general inductive strategy.
In this section, as preparation to handle the case of products of simplices, we prove a parametric version of Proposition 3.2, namely Proposition 3.3 below, which will serve as the base case for later inductive arguments.
Let $Q=Q_1\times\cdots\times Q_d$ with $Q_i\subseteq\mathbb{R}^{n_i}$ be cubes of equal side length $l(Q)$. Let $L$ be a scale dividing $l(Q)$ and for each $t$ Let $0<\varepsilon\le1$ and $R\ge1$. There exists an integer such that for any $\varepsilon$-admissible sequence of scales $L_0\ge L_1\ge\cdots\ge L_{J_1}$ with the property that $L_0$ divides $l(Q)$ and collection of functions The proof of Proposition 3.3 will follow from Lemma 3.1 and the following generalization of Lemma 3.2, in which we simultaneously consider a family of functions supported on the subcubes in a partition of an original cube $Q$.
Finally, if we also have $\lambda\ll\varepsilon L_j$, then it is easy to see that as the functions $\tilde f^{i,r}_{k,t}$ are constant on the cubes $Q_{t_i}(L_j)$ of $\mathcal{G}_{L_j,Q_i}$, which are of size $L_j\ll\varepsilon L_0$. Passing first to a subsequence of scales, for example $L'_j=L_{5j}$, the condition $\varepsilon^{-6}L_{j+1}\ll\lambda\ll\varepsilon L_j$ can be replaced with $L_{j+1}\ll\lambda\ll L_j$, so this completes the proof of the proposition.
We conclude this section with a sketch of the proof of Lemma 3.3. These arguments are standard, see for example the proof of Lemma 3.2 given in [12].

Proof of Lemma 3.3. First we make an observation about the
for any function $g:Q\to[-1,1]$. Moreover, since the cube $Q_s(L)$ is partitioned into the smaller cubes $Q_t(L')$, we have by Cauchy-Schwarz From these observations it is easy to see that and we note that the right side of the above expression is $\|\mathbb{E}(g|\mathcal{G}_{L',Q})\|^2_{L^2(Q)}$, since the conditional expectation function $\mathbb{E}(g|\mathcal{G}_{L',Q})$ is constant and equal to $\fint_{x\in Q_t(L')}g(x)\,dx$ on the cubes $Q_t(L')$. Suppose that (3.10) does not hold for some $1\le i\le m$ for every $t$ in some set $T_\varepsilon\subseteq\Gamma_{L_0,Q}$ of size $|T_\varepsilon|>\varepsilon|\Gamma_{L_0,Q}|$. If we apply the above observation to $g:=f_{i,t}-\mathbb{E}(f_{i,t}|\mathcal{G}_{L_j,Q_t(L_0)})$ for every $t\in T_\varepsilon$, we obtain by orthogonality that It is clear that the sums in the above expressions are bounded by $m$ for all $j\ge1$; thus (3.11) cannot hold for every $1\le j\le\tilde J_1$ when $\tilde J_1:=C\,m\,\varepsilon^{-3}$. This implies that (3.10) must hold for some $1\le j\le\tilde J_1$, for all

4. Product of two simplices in $\mathbb{R}^n$
Although not strictly necessary, in this section we discuss the special case $d=2$ of Theorem 1.2. This already gives an improvement of the main results of [12], but more importantly it serves as a gentle preparation for the more complicated general case, presented in Section 5, which involves both a plethora of different scales and the hypergraph bundle notation introduced in Section 2.2.
$\dots,v_{2n_2}\}\subseteq\mathbb{R}^{n_2}$ two non-degenerate simplices. In order to "count" configurations of the form $\Delta=\Delta_1\times\Delta_2\subseteq\mathbb{R}^{n_1+n_2}$, with $\Delta_1$ and $\Delta_2$ isometric copies of $\lambda\Delta^0_1$ and $\lambda\Delta^0_2$ respectively for some $0<\lambda\ll l(Q)$, in a set $S\subseteq Q$ we introduce the multi-linear expression Indeed, if $f_{kl}=1_S$ for all $1\le k\le n_1$ and $1\le l\le n_2$, then the above expression is 0 unless there exists a configuration $\Delta\subseteq S$ of the form $\Delta_1\times\Delta_2$ with $\Delta_1$ and $\Delta_2$ isometric copies of $\lambda\Delta^0_1$ and $\lambda\Delta^0_2$ respectively. The short argument presented in Section 1.1 demonstrating how both Theorem B and Corollary B follow from Proposition B, and hence from Proposition 3.1, applies equally well to each of our main theorems. This reduces our main theorems to analogous quantitative results involving an arbitrary lacunary sequence of scales. In the case $d=2$ of Theorem 1.2 this stronger quantitative result takes the following form: Given any lacunary sequence $l(Q)\ge\lambda_1\ge\cdots\ge\lambda_J$ and $S\subseteq Q$, there is some $1\le j<J$ such that Our approach to establishing Proposition 4.1 is again to compare the above expressions to simpler ones for which it is easy to obtain lower bounds. For any $0<\lambda\ll l(Q)$ and family of functions $f_{kl}:$ Note that if $S\subseteq Q$ is a set of measure $|S|\ge\delta|Q|$ for some $\delta>0$, then careful applications of Hölder's inequality give for all scales $0<\lambda\ll\varepsilon\,l(Q)$.
In light of the observation above, and the discussion preceding Proposition 3.2, we see that Proposition 4.1, and hence Theorem 1.2 when $d = 2$, will follow as a consequence of the following proposition.

Proposition 4.2. Let $0 < \varepsilon \ll 1$. There exists an integer $J_2 = O(\exp(C\varepsilon^{-12}))$ such that for any $\varepsilon$-admissible sequence of scales $l(Q) \ge L_1 \ge \cdots \ge L_{J_2}$ and $S \subseteq Q$ there is some $1 \le j < J_2$ such that

There are again two main ingredients in the proof of Proposition 4.2. The first establishes that our multi-linear forms $N^2_{\lambda\Delta^0,Q}(\{f_{kl}\})$ are controlled by an appropriate box-type norm attached to a scale $L$. Let $Q = Q_1 \times Q_2$ be a cube. For any scale $0 < L \ll l(Q)$ and function $f : Q \to \mathbb{R}$ we define its local box norm at scale $L$ to be

Lemma 4.1 (A generalized von Neumann inequality [12]). Let $\varepsilon > 0$, $0 < \lambda \ll l(Q)$, and $0 < L \ll \varepsilon^{24}\lambda$. For any collection of functions $f_{kl}:$

The result above was essentially proved in [12] for the multi-linear forms $N^2_{\lambda\Delta^0,Q}$ when $Q = [0, 1]^{n_1+n_2}$; however, a simple scaling argument transfers the result to an arbitrary cube $Q$. For completeness we include its short proof in Section 4.2 below.
The second and main ingredient is an analogue of a weak form of Szemerédi's regularity lemma due to Frieze and Kannan [7]. The more probabilistic formulation that we will use below can be found, for example, in [21], [22], and [23], and is also sometimes referred to as a Koopman-von Neumann type decomposition.
For any cube $Q \subseteq \mathbb{R}^n$ and scale $L > 0$ that divides $l(Q)$, we let $Q(L) = [-\frac{L}{2}, \frac{L}{2}]^n$, let $\mathcal{G}_{L,Q}$ denote the collection of cubes $Q_t(L) = t + Q(L)$ partitioning the cube $Q$, and let $\Gamma_{L,Q}$ denote the grid corresponding to the centers of these cubes. We will say that a finite σ-algebra $\mathcal{B}$ on $Q$ is of scale $L$ if it contains $\mathcal{G}_{L,Q}$, and for simplicity of notation we will write $\mathcal{B}_t$ for $\mathcal{B}|_{Q_t(L)}$.
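To make the grid $\Gamma_{L,Q}$ and the conditional expectation $\mathbb{E}(g\mid\mathcal{G}_{L,Q})$ concrete, here is a minimal Python sketch. It assumes, purely for illustration, that $Q = [0, l(Q))^2$ is sampled on the unit lattice, so that each cube $Q_t(L)$ is represented by an $L \times L$ block of samples; the function names are hypothetical. When $L' \mid L$ the grid $\mathcal{G}_{L',Q}$ refines $\mathcal{G}_{L,Q}$, so the normalized $L^2$ energy of the block averages can only grow, which is the monotonicity behind the energy-increment arguments used in this section.

```python
from itertools import product

def grid_centers(side, L, n):
    """Centers t of the cubes Q_t(L) = t + [-L/2, L/2]^n that partition Q = [0, side)^n."""
    if side % L:
        raise ValueError("the scale L must divide the side length l(Q)")
    k = side // L
    return [tuple(L * i + L / 2 for i in idx) for idx in product(range(k), repeat=n)]

def conditional_expectation(g, L):
    """E(g | G_{L,Q}) for samples g[x][y] on the unit lattice of Q = [0, side)^2.

    Every L x L block of samples stands in for one cube Q_t(L); the result is
    constant on each block and equal to the block average of g.
    """
    side = len(g)
    if side % L:
        raise ValueError("the scale L must divide the side length l(Q)")
    out = [[0.0] * side for _ in range(side)]
    for bx in range(0, side, L):
        for by in range(0, side, L):
            block = [(x, y) for x in range(bx, bx + L) for y in range(by, by + L)]
            avg = sum(g[x][y] for x, y in block) / len(block)
            for x, y in block:
                out[x][y] = avg
    return out

def energy(g):
    """Normalized squared L^2 norm: the average of g^2 over the samples."""
    values = [v for row in g for v in row]
    return sum(v * v for v in values) / len(values)

# Usage: energies of E(h | G_{L,Q}) for the nested scales L = 8, 4, 2, 1.
h = [[((7 * x + 3 * y) % 5 - 2) / 2 for y in range(8)] for x in range(8)]
print([round(energy(conditional_expectation(h, L)), 4) for L in (8, 4, 2, 1)])
```

The printed list is non-decreasing, and at $L = 1$ it reaches the energy of $h$ itself.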
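Recall that if we have two σ-algebras $\mathcal{B}_1$ on a cube $Q_1$ and $\mathcal{B}_2$ on $Q_2$, then by $\mathcal{B}_1 \vee \mathcal{B}_2$ we mean the σ-algebra on $Q = Q_1 \times Q_2$ generated by the sets $B_1 \times B_2$ with $B_1 \in \mathcal{B}_1$ and $B_2 \in \mathcal{B}_2$. Recall also that we say the complexity of a σ-algebra $\mathcal{B}$ is at most $m$, and write $\operatorname{complex}(\mathcal{B}) \le m$, if it is generated by $m$ sets.

Let $0 < \varepsilon \ll 1$ and $Q = Q_1 \times Q_2$, with $Q_1 \subseteq \mathbb{R}^{n_1}$ and $Q_2 \subseteq \mathbb{R}^{n_2}$, be cubes of equal side length $l(Q)$.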
There exists an integer $\tilde{J}_2 = O(\varepsilon^{-12})$ such that for any $\varepsilon^4$-admissible sequence $l(Q) \ge L_1 \ge \cdots \ge L_{\tilde{J}_2}$ and function $f : Q \to [-1, 1]$ there is some $1 \le j \le \tilde{J}_2$ and a σ-algebra $\mathcal{B}$ of scale $L_j$ on $Q$ such that

which has the additional local structure that for each $t = (t_1, t_2) \in \Gamma_{L_j,Q}$ there exist σ-algebras $\mathcal{B}_{1,t}$ on $Q_{t_1}(L_j)$ and $\mathcal{B}_{2,t}$ on $Q_{t_2}(L_j)$ with $\operatorname{complex}(\mathcal{B}_{i,t}) = O(j)$ for $i = 1, 2$ such that $\mathcal{B}_t = \mathcal{B}_{1,t} \vee \mathcal{B}_{2,t}$.
Comparing the above statement to Lemma 2.2 for $d = 2$, i.e. to the weak regularity lemma, note that the σ-algebra $\mathcal{B}$ of scale $L_j$ has a direct product structure only locally, inside each cube $Q_t(L_j)$. Moreover, this product structure varies with $t \in \Gamma_{L_j,Q}$, but the "local complexity" remains uniformly bounded.
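Why bounded local complexity controls the number of atoms can be seen on finite sets: two points lie in the same atom of the σ-algebra generated by $m$ sets exactly when they belong to the same generators, so there are at most $2^m$ atoms, and the atoms of $\mathcal{B}_1 \vee \mathcal{B}_2$ are the products $A_1 \times A_2$ of atoms. The sketch below is a hypothetical illustration on finite ground sets rather than on cubes; the function names are not taken from the paper.

```python
from itertools import product

def atoms(ground, generators):
    """Atoms of the sigma-algebra on a finite set generated by the given subsets.

    Points with the same membership pattern in the generators form one atom,
    so m generators produce at most 2**m atoms.
    """
    classes = {}
    for x in ground:
        signature = tuple(x in B for B in generators)
        classes.setdefault(signature, set()).add(x)
    return list(classes.values())

def join_atoms(atoms1, atoms2):
    """Atoms of B1 v B2 on Q1 x Q2: all products A1 x A2 of atoms."""
    return [{(a, b) for a in A1 for b in A2} for A1, A2 in product(atoms1, atoms2)]

# Usage: a local product structure B_t = B_{1,t} v B_{2,t} with 2 + 1 generators.
Q1, Q2 = range(10), range(6)
a1 = atoms(Q1, [{0, 1, 2, 3}, {2, 3, 4, 5, 6}])
a2 = atoms(Q2, [{0, 1, 2}])
print(len(a1), len(a2), len(join_atoms(a1, a2)))
```

Bounds of this kind are what allow the numbers of atoms of $\mathcal{B}_{1,t}$ and $\mathcal{B}_{2,t}$ to be padded to a common value in the proof of Proposition 4.2.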
Assuming for now the validity of Lemmas 4.1 and 4.2 we prove Proposition 4.2. We will make crucial use of Proposition 3.3, namely our parametric counting lemma on R n for simplices.
Proof of Proposition 4.2. Let $0 < \varepsilon \ll 1$, $\varepsilon_1 := \exp(-C_1\varepsilon^{-12})$ for some $C_1 \gg 1$, and let $\{L_j\}_{j\ge1}$ be an $\varepsilon_1$-admissible sequence of scales. Set $R = \varepsilon\,\varepsilon_1^{-1}$ and let $J_1(\varepsilon_1, R)$ be the parameter appearing in Proposition 3.3,

Applying Lemma 4.2, with $f_{kl} = f := 1_S$ for all $1 \le k \le n_1$ and $1 \le l \le n_2$, guarantees the existence of a σ-algebra $\mathcal{B}$ of scale $L'_j$ on $Q$ such that

(4.8)

Moreover, we know that $\mathcal{B}$ has the additional local structure that for each

Thus, if we let $R_{1,t}$ and $R_{2,t}$ denote the number of atoms in $\mathcal{B}_{1,t}$ and $\mathcal{B}_{2,t}$ respectively, then we can assume, by formally adding the empty set to these collections of atoms if necessary, that $R_{1,t} = R_{2,t} = R' := \exp(C\varepsilon^{-12})$ for all $t \in \Gamma_{L'_j,Q}$. If we let $\tilde{f} := \mathbb{E}(f\mid\mathcal{B}_1 \vee \mathcal{B}_2)$, then by Lemma 4.1 and multi-linearity we have

For a given $t \in \Gamma_{L'_j,Q}$ write $\tilde{f}_t$ for the restriction of $\tilde{f}$ to the cube $Q_t(L'_j)$. By localization, one then has

(4.10)

where $\{A^{r_1}_{1,t}\}_{1\le r_1\le R'}$ and $\{A^{r_2}_{2,t}\}_{1\le r_2\le R'}$ are the collections of the atoms of the σ-algebras $\mathcal{B}_{1,t}$ and $\mathcal{B}_{2,t}$ defined on the cubes $Q_{t_1}(L'_j)$ and $Q_{t_2}(L'_j)$. Thus for each $t \in \Gamma_{L'_j,Q}$ one has

where $r = (r_1, r_2)$. Plugging these linear expansions into the multi-linear expressions above, one obtains

using the notation $r_{kl} = (r_{1,kl}, r_{2,kl})$ and $\alpha_{r,t} = \prod_{kl}\alpha_{r_{kl},t}$. Notice that the product

is nonzero only if all the atoms $A^{r_{1,kl}}_{1,t}$, $1 \le l \le n_2$, coincide with a single atom $A^{r_{1,k}}_{1,t}$, that is, only if $r_{1,kl} = r_{1,k}$ for all $1 \le l \le n_2$, as the atoms $A^{r}_{1,t}$ are all disjoint. Similarly, one has that $r_{2,kl} = r_{2,l}$ for all $1 \le k \le n_1$. Thus, in fact

and similarly

Note that the indices $r$ run through the index set $[1,$ The key observation is that

(4.14)

Proof of Theorem 1.2: The general case.
After these preparations we now consider the general case of Theorem 1.2. Let $Q = Q_1 \times \cdots \times Q_d \subseteq \mathbb{R}^n$ with $Q_i \subseteq \mathbb{R}^{n_i}$ cubes of equal side length $l(Q)$, and let $\Delta^0 = \Delta^0_1 \times \cdots \times \Delta^0_d$ with each $\Delta^0_i \subseteq \mathbb{R}^{n_i}$ a non-degenerate simplex of $n_i$ points for $1 \le i \le d$.
We will use a generalized version of the hypergraph terminology introduced in Section 2. In particular, for a vertex set $I = \{1, 2, \ldots, d\}$ and the set $K = \{il : 1 \le i \le d,\ 1 \le l \le n_i\}$ we let $\pi : K \to I$ denote the projection defined by $\pi(il) := i$. As before, we let $\mathcal{H}_{d,k} := \{e \subseteq I : |e| = k\}$ denote the complete $k$-regular hypergraph with vertex set $I$, and for the multi-index $\mathbf{n} = (n_1, \ldots, n_d)$ we define the hypergraph bundle $\mathcal{H}^{\mathbf{n}}_{d,k} := \{e \subseteq K : |e| = |\pi(e)| = k\}$, noting that $|\pi^{-1}(i)| = n_i$ for all $i \in I$.
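The definitions of $\mathcal{H}_{d,k}$ and of the bundle $\mathcal{H}^{\mathbf{n}}_{d,k}$ can be checked by direct enumeration. The minimal Python sketch below encodes the label $il \in K$ as the pair `(i, l)`; the helper names are hypothetical. Choosing the projected edge $\pi(e) \in \mathcal{H}_{d,k}$ first and then one label over each of its vertices gives $|\mathcal{H}^{\mathbf{n}}_{d,k}| = \sum_{e'\in\mathcal{H}_{d,k}} \prod_{i\in e'} n_i$.

```python
from itertools import combinations, product

def complete_hypergraph(d, k):
    """H_{d,k}: all k-element subsets of the vertex set I = {1, ..., d}."""
    return [frozenset(e) for e in combinations(range(1, d + 1), k)]

def hypergraph_bundle(n, k):
    """H^n_{d,k}: subsets e of K with |e| = |pi(e)| = k, where pi(i, l) = i.

    n = (n_1, ..., n_d); the label "il" of K is encoded as the pair (i, l).
    """
    edges = []
    for base in combinations(range(1, len(n) + 1), k):   # the projected edge pi(e)
        for labels in product(*(range(1, n[i - 1] + 1) for i in base)):
            edges.append(frozenset(zip(base, labels)))
    return edges

# Usage: for n = (2, 3, 4), |H^n_{3,2}| = 2*3 + 2*4 + 3*4 and |H^n_{3,3}| = 2*3*4.
n = (2, 3, 4)
print(len(complete_hypergraph(3, 2)), len(hypergraph_bundle(n, 2)), len(hypergraph_bundle(n, 3)))
```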
In order to parameterize the vertices of direct products of simplices, i.e. sets of the form $\Delta = \Delta_1 \times \cdots \times \Delta_d$ with $\Delta_i \subseteq Q_i$, we consider points $x = (x_1, \ldots, x_d)$ with $x_i = (x_{i1}, \ldots, x_{in_i}) \in Q_i^{n_i}$ for each $i \in I$. Now for any $1 \le k \le d$ and any edge $e' \in \mathcal{H}_{d,k}$ we write $Q_{e'} := \prod_{i\in e'} Q_i$, and for every $x \in Q_1^{n_1} \times \cdots \times Q_d^{n_d}$ and $e \in \mathcal{H}^{\mathbf{n}}_{d,k}$ we define $x_e := \pi_e(x)$, where $\pi_e : Q_1^{n_1} \times \cdots \times Q_d^{n_d} \to Q_{\pi(e)}$ is the natural projection map. Writing $\Delta_i = \{x_{i1}, \ldots, x_{in_i}\}$, we have that $\Delta_1 \times \cdots \times \Delta_d = \{x_e : e \in \mathcal{H}^{\mathbf{n}}_{d,d}\}$, since every such point $x_e$ is of the form $(x_{1l_1}, \ldots, x_{dl_d})$. We can therefore identify points $x$ with configurations of the form $\Delta_1 \times \cdots \times \Delta_d$.
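As a sanity check of the identification $\Delta_1 \times \cdots \times \Delta_d = \{x_e : e \in \mathcal{H}^{\mathbf{n}}_{d,d}\}$, the following sketch builds the product of $d = 3$ two-point simplices (segments of length $\lambda$) in $\mathbb{R}^2$, each rotated differently, and confirms that it is a $3$-dimensional cube of side $\lambda$ in $\mathbb{R}^6$, the situation described in the abstract. The segment angles and the helper names are arbitrary, hypothetical choices. Because the factors sit in orthogonal coordinate blocks, two vertices that differ in $m$ factors are at distance $\lambda\sqrt{m}$.

```python
import math
from itertools import combinations, product

def product_configuration(simplices):
    """Vertices of Delta_1 x ... x Delta_d in R^{n_1 + ... + n_d}.

    Picking one vertex x_{i l_i} from each simplex is the same as picking a top
    edge e = (1 l_1, ..., d l_d) of H^n_{d,d}; the resulting vertex is x_e.
    """
    return [tuple(c for v in choice for c in v) for choice in product(*simplices)]

def squared_distances(points):
    """Squared Euclidean distances between all pairs of points."""
    return [math.dist(p, q) ** 2 for p, q in combinations(points, 2)]

# Usage: three rotated segments of length lam in R^2 give a cube of side lam in R^6.
lam = 2.5
segments = [[(0.0, 0.0), (lam * math.cos(a), lam * math.sin(a))] for a in (0.3, 1.1, 2.0)]
cube = product_configuration(segments)
print(len(cube), sorted({round(s / lam ** 2, 9) for s in squared_distances(cube)}))
```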
For any $0 < \lambda \ll l(Q)$ the measures $d\sigma_{\lambda\Delta^0_i}$, introduced in Section 3.1, are supported on points $(y_2, \ldots, y_{n_i})$ for which the simplex $\Delta_i = \{0, y_2, \ldots, y_{n_i}\}$ is isometric to $\lambda\Delta^0_i$. For simplicity of notation we will write

Note that the support of the measure $d\sigma^\lambda_i$ is the set of points $x_i$ such that the simplex $\Delta_i := \{x_{i1}, \ldots, x_{in_i}\}$ is isometric to $\lambda\Delta^0_i$ and $x_{i1} \in Q_i$; moreover, the measure is normalized. Thus, if $S \subseteq Q$, then the density of configurations $\Delta$ in $S$ of the form $\Delta = \Delta_1 \times \cdots \times \Delta_d$, with each $\Delta_i \subseteq Q_i$ an isometric copy of $\lambda\Delta^0_i$, is given by the expression

The proof of Theorem 1.2 reduces to establishing the following stronger quantitative result.
Proposition 5.1. For any $0 < \varepsilon \ll 1$ there exists an integer $J_d = J_d(\varepsilon)$ with the following property: Given any lacunary sequence $l(Q) \ge \lambda_1 \ge \cdots \ge \lambda_{J_d}$ and $S \subseteq Q$, there is some $1 \le j < J_d$ such that

Quantitative Remark. A careful analysis of our proof reveals that there is a choice of $J_d(\varepsilon)$ which is less than $W_d(\log(C_\Delta\varepsilon^{-3}))$, where $W_k(m)$ is again the tower-exponential function defined by $W_1(m) = \exp(m)$ and $W_{k+1}(m) = \exp(W_k(m))$ for $k \ge 1$.
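The recursion defining $W_k$ can be evaluated directly. The short sketch below (the name `tower` is hypothetical) shows how fast the bound in the remark grows: $W_1(\log x) = x$, so for $d = 1$ the bound is of the form $C_\Delta\varepsilon^{-3}$, while each additional factor adds one more exponential, and already $W_3(\log 10)$ overflows a double-precision float.

```python
import math

def tower(k, m):
    """Tower-exponential W_k(m): W_1(m) = exp(m) and W_{k+1}(m) = exp(W_k(m)).

    Returns math.inf as soon as the value no longer fits in a float.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    value = float(m)
    for _ in range(k):
        try:
            value = math.exp(value)
        except OverflowError:
            return math.inf
    return value

# Usage: W_1(log 10) is about 10, W_2(log 10) = e^10, and W_3(log 10) overflows.
for k in (1, 2, 3):
    print(k, tower(k, math.log(10.0)))
```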
For any $0 < \lambda \ll l(Q)$ and set $S \subseteq Q$ we define the expression

for all scales $0 < \lambda \ll_\varepsilon l(Q)$.
In light of the discussion above, and that preceding Proposition 3.2, we see that Proposition 5.1, and hence Theorem 1.2 in general, will follow as a consequence of the following proposition.

Proposition 5.2. Let $0 < \varepsilon \ll 1$. There exists an integer $J_d = J_d(\varepsilon)$ such that for any $\varepsilon$-admissible sequence of scales $l(Q) \ge L_1 \ge \cdots \ge L_{J_d}$ and $S \subseteq Q$ there is some $1 \le j < J_d$ such that

The validity of Proposition 5.2 will follow immediately from the case $k = d$ of Proposition 5.3 below.

5.1. Reduction of Proposition 5.2 to a more general "local" counting lemma.
For any given $1 \le k \le d$ and collection of functions $f_e : Q_{\pi(e)} \to [-1, 1]$ with $e \in \mathcal{H}^{\mathbf{n}}_{d,k}$ we define the following multi-linear expressions

and

Our strategy for proving Proposition 5.2 is the same as the one illustrated in the finite field setting, that is, we would like to compare the averages $N_{\lambda\Delta^0,Q}(f_e; e \in \mathcal{H}^{\mathbf{n}}_{d,k})$ to those of $M^d_{\lambda,Q}(f_e; e \in \mathcal{H}^{\mathbf{n}}_{d,k})$ at certain scales $\lambda \in [L_{j+1}, L_j]$, inductively for $1 \le k \le d$. However, in the Euclidean case an extra complication emerges, due to the fact that the (hypergraph) regularity lemma, the analogue of Lemma 2.2, does not produce σ-algebras $\mathcal{B}_f$, for $f \in \mathcal{H}^{\mathbf{n}}_{d,k-1}$, on the cubes $Q_f$. In a similar manner to the case $d = 2$ discussed in the previous section, we will only obtain σ-algebras that are "local" to cubes $Q_{t_f}(L_0)$ at some scale $L_0 > 0$. As a result, the functions $f_e$ will be replaced by a family of functions $f_{e,t}$, where $t$ runs through a grid $\Gamma_{L_0,Q}$.
To be more precise, let $L > 0$ be a scale dividing the side length $l(Q)$. For $t \in \Gamma_{L,Q}$ and $e' \in \mathcal{H}_{d,k}$ we use $t_{e'}$ to denote the projection of $t$ onto $Q_{e'}$, and $Q_{t_{e'}}(L) := t_{e'} + Q_{e'}(L)$ to denote the projection onto $Q_{e'}$ of the cube $Q_t(L)$ centered at $t$. It is then easy to see that for any $\varepsilon > 0$ we have

At this point the proof of Proposition 5.2 reduces to showing that the expressions in (7.8) and (7.9) differ only by $O(\varepsilon)$ at some scales $\lambda \in [L_{j+1}, L_j]$, given an $\varepsilon$-admissible sequence $L_0 \ge L_1 \ge \cdots \ge L_J$, for any collection of bounded functions $f_{e,t}$, $e \in \mathcal{H}^{\mathbf{n}}_{d,k}$, $t \in \Gamma_{L_0,Q}$. Indeed, our crucial result will be the following.

Proposition 5.3 (Local Counting Lemma). Let $0 < \varepsilon \ll 1$ and $M \ge 1$. There exists an integer $J_k = J_k(\varepsilon, M)$ such that for any $\varepsilon$-admissible sequence of scales $L_0 \ge L_1 \ge \cdots \ge L_{J_k}$ with the property that $L_0$ divides $l(Q)$, and collection of functions

Proof of Proposition 5.3.
We will prove Proposition 5.3 by induction on 1 ≤ k ≤ d. For k = 1 this is basically Proposition 3.3.
By Proposition 3.3 there exist $1 \le j < J_1 = O(M\varepsilon^{-4})$ and an exceptional set $T_\varepsilon \subseteq \Gamma_{L_0,Q}$ of size $|T_\varepsilon| \le \varepsilon|\Gamma_{L_0,Q}|$ such that, uniformly for $t \notin T_\varepsilon$ and for $1 \le i \le d$, one has

For the induction step we again need two main ingredients. The first establishes that our multi-linear forms $N^d_{\lambda\Delta^0,Q}(f_e; e \in \mathcal{H}^{\mathbf{n}}_{d,k})$ are controlled by an appropriate box-type norm attached to a scale $L$. Let $Q = Q_1 \times \cdots \times Q_d$ and $1 \le k \le d$. For any scale $0 < L \ll l(Q)$ and function $f : Q_{e'} \to [-1, 1]$ with $e' \in \mathcal{H}_{d,k}$ we define its local box norm at scale $L$ by

for any cube $Q$ of the form $Q = Q_1 \times \cdots \times Q_k$. The crucial ingredient is the following analogue of the weak hypergraph regularity lemma.
There exists $\tilde{J}_k = O(M\varepsilon^{-2^{k+3}})$ such that for any $\varepsilon^{2^k}$-admissible sequence $L_0 \ge L_1 \ge \cdots \ge L_{\tilde{J}_k}$ with the property that $L_0$ divides $l(Q)$, and collection of functions

there is some $1 \le j < \tilde{J}_k$ and σ-algebras $\mathcal{B}_{e',t}$ of scale $L_j$ on $Q_{t_{e'}}(L_0)$ for each $t \in \Gamma_{L_0,Q}$ and $e' \in \mathcal{H}_{d,k}$ such that

Moreover, the σ-algebras $\mathcal{B}_{e',t}$ have the additional local structure that there exist σ-algebras $\mathcal{B}_{e',f',s}$ on $Q_{s_{f'}}(L_j)$ with $\operatorname{complex}(\mathcal{B}_{e',f',s}) = O(j)$ for each $s \in \Gamma_{L_j,Q}$, $e' \in \mathcal{H}_{d,k}$, and $f' \in \partial e'$, such that if $s \in Q_t(L_0)$, then

Lemma 5.2 is the parametric and simultaneous version of the extension of Lemma 3.7 to the product of $d$ simplices. The difference is that in the general case one has to deal with a parametric family of functions $f^m_{e,t}$ as $t$ runs through a grid $\Gamma_{L_0,Q}$. The essential new content of Lemma 5.2 is that one can develop σ-algebras $\mathcal{B}_{e',t}$ on the cubes $Q_t(L_0)$ with respect to the family of functions $f^m_{e,t}$ such that the local structure described above and (5.16) hold simultaneously for almost all $t \in \Gamma_{L_0,Q}$.
Proof of Proposition 5.3. Assume the Proposition holds for k − 1.
Let $\varepsilon > 0$, $\varepsilon_1 := \exp(-C_1\varepsilon^{-2^{k+3}})$ for some large constant $C_1 = C_1(\mathbf{n}, k, d) \gg 1$, and let $\{L_j\}_{j\ge1}$ be an

Lemma 5.2 then guarantees the existence of σ-algebras $\mathcal{B}_{e',t}$ of scale $L'_j$ on $Q_{t_{e'}}(L_0)$ for each $t \in \Gamma_{L_0,Q}$ and $e' \in \mathcal{H}_{d,k}$, with the local structure described above, such that

where $\{A^{r_e}_{\pi(e),s}\}_{1\le r_e\le R_{e,s}}$ is the family of atoms of the σ-algebra $\mathcal{B}_{\pi(e),t}$ restricted to the cube $Q_s(L'_j)$. Note that $|\alpha_{s,r_e}| \le 1$ and $R_{e,s} = O(\exp(C\varepsilon^{-2^{k+3}}))$. By adding the empty set to the collection of atoms, one may assume that $R_{e,s} = R := \exp(C\varepsilon^{-2^{k+3}})$ for all $e \in \mathcal{H}^{\mathbf{n}}_{d,k}$ and $s \in \Gamma_{L'_j,Q}$. Then, by multi-linearity, using the notation $r = (r_e)_{e\in\mathcal{H}^{\mathbf{n}}_{d,k}}$ and $\alpha_{r,s} = \prod_e \alpha_{s,r_e}$, one has both

The key observation is that the expressions in the sum above are all at level $k - 1$ instead of $k$. To see this, let $e = (i_1l_1, \ldots, i_ml_m, \ldots, i_kl_k)$, so that $e' = \pi(e) = (i_1, \ldots, i_m, \ldots, i_k)$. If $f' = e' \setminus \{i_m\}$, then recall that the edge $p_{f'}(e) = (i_1l_1, \ldots, \widehat{i_ml_m}, \ldots, i_kl_k) \in \mathcal{H}^{\mathbf{n}}_{d,k-1}$ is obtained from $e$ by removing the $i_ml_m$-entry. Thus, for any atom $A_{e',s}$ of $\mathcal{B}_{s,e'}(L'_j)$ we have, by (5.17), that

where $A_{e',f',s}$ is an atom of the σ-algebra $\mathcal{B}_{e',f',s}$. Thus

It follows that

Since the cubes $Q_t(L_0)$ form a partition of $Q$ as $t$ runs through the grid $\Gamma_{L_0,Q}$, the relative density of the set $S_{\varepsilon_1}$ can increase substantially only on a few cubes $Q_t(L_0)$. Indeed, it is easy to see that $|T''_{\varepsilon_1}| \le \varepsilon$

We claim that (5.11) holds for $\lambda \in [L_{j+1}, L_j]$ uniformly in $t \notin T_\varepsilon := T'_\varepsilon \cup T''_{\varepsilon_1}$, $e \in \mathcal{H}^{\mathbf{n}}_{d,k}$, and $1 \le m \le M$. Indeed, from (7.17), (7.18), and (5.31), and the fact that $|\alpha_{s,r}| \le 1$, it follows that

Proof of Lemma 5.1. The argument is similar to that of Lemma 2.1. Fix an edge, say $e_0 = (11, 21, \ldots, k1)$, and partition the edges $e \in \mathcal{H}^{\mathbf{n}}_{d,k}$ as follows. Let $\mathcal{H}_0$ be the set of those edges $e$ for which $1 \notin \pi(e)$, and for $l = 1, \ldots, n_1$ let $\mathcal{H}_l$ denote the collection of edges of the form $e = (1l, j_2l_2, \ldots, j_kl_k)$; in other words, $e \in \mathcal{H}_l$ if $e = (1l, e')$ for some edge $e' = (j_2l_2, \ldots, j_kl_k) \in \mathcal{H}^{\mathbf{n}}_{d-1,k-1}$. Accordingly write

Then one may write

For the inner integrals we have, using (3.6), the estimate

$g_1(y_{11})\,g_1(y_{12})\,\psi^1_L(y_{12} - y_{11})\,dy_{11}\,dy_{12} + O(\varepsilon^{2^k})$,

provided $0 < L \ll (\varepsilon^{2^k})^6\lambda$, where, as in the proof of Lemma 4.1, we use the notation
The expression we have obtained above is similar to the one in (5.6) except for the following changes. The variable $x_1 \in Q_1^{n_1}$ is replaced by $y_1 \in Q_1^2$ and the measure $d\sigma^\lambda_1$ by $d\omega^1_L$. The functions $f_{1l,e'}$ are replaced by $f_{11,e'}$ for $1 \le l \le n_1$, while the functions $f_e$ for all $e \in \mathcal{H}^{\mathbf{n}}_{d,k}$ such that $1 \notin \pi(e)$ are eliminated, that is, replaced by $1$. Repeating the same procedure for $i = 2, \ldots, k$ replaces all variables $x_i$ with variables $y_i$, as well as the measures $d\sigma^\lambda_i$ with $d\omega^i_L$. The procedure eliminates all functions $f_e$ when $e$ is an edge such that $i \notin \pi(e)$ for some $1 \le i \le k$; for the remaining edges, when $\pi(e) = (1, \ldots, k)$, it replaces the functions $f_e$ with $f_{e_0} = f_{11,21,\ldots,k1}$. For $i > k$ the variables $x_i$ and the measures $d\sigma^\lambda_i$ are not changed; however, integrating in these variables has no effect, as the measures are normalized. Thus one obtains the following final estimate

noting that these integrals are not normalized. Thus, one may write the expression in (5.34), using the change of variables $y_{i1} := y_{i1} - t_i$, $y_{i2} := y_{i2} - t_i$, as

where the last equality follows from the fact that the function $f_{e_0}$ is supported on the cube $Q_{\pi(e_0)}$, and hence the integration in $t$ is restricted to the cube $Q + Q(L)$, giving rise to an error of $O(L/l(Q))$. Estimate (5.14) follows from (5.34) and (5.35), noting that the above procedure can be applied to any $e \in \mathcal{H}$

$\emptyset\}$ for $e' \in \mathcal{H}_{d,k}$, $f' \in \partial e'$, and $t, s \in \Gamma_{L_0,Q}$. We will develop σ-algebras $\mathcal{B}_{e',t}(L_j)$ of scale $L_j$ such that (5.17) holds with $\operatorname{complex}(\mathcal{B}_{e',f',s}(L_j)) \le j$.
We define the total energy of a family of functions $f^m_{e,t}$ with respect to a family of σ-algebras $\mathcal{B}_{e',t}(L_j)$ as

$\|\mathbb{E}(f^m_{e,t}\mid\mathcal{B}_{\pi(e),t}(L_j))\|^2_{L^2(Q_{t_{\pi(e)}}(L_0))}$.
Since $|f^m_{e,t}| \le 1$ for all $e$, $m$, and $t$, it follows that the total energy is bounded by $M \cdot |\mathcal{H}^{\mathbf{n}}_{d,k}| = O(M)$. Our strategy will be to show that if (5.16) does not hold, then there exists a family of σ-algebras $\mathcal{B}_{e',t}(L_{j+2})$ such that the total energy of the family of functions $f^m_{e,t}$ increases by at least $c_k\varepsilon^{2^{k+3}}$ with respect to this new family of σ-algebras, while at the same time ensuring that (5.17) remains valid with $\operatorname{complex}(\mathcal{B}_{e',f',s}(L_{j+2})) \le j + 2$. This iterative process must stop at some $j = O(M\varepsilon^{-2^{k+3}})$, proving the lemma.
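The stopping argument in the last sentence is a pigeonhole count, which can be written out as the following short derivation. It is a sketch under the assumptions that $E_j$ denotes the total energy with respect to the family $\{\mathcal{B}_{e',t}(L_j)\}$ and that each new family refines the previous one, as in the construction below.

```latex
% E_j: total energy of {f^m_{e,t}} with respect to {B_{e',t}(L_j)}
\begin{aligned}
&0 \le E_j \le E_{j+2} \le M\,|\mathcal{H}^{\mathbf{n}}_{d,k}|
  &&\text{(the new family refines the old one)}\\
&E_{j+2} \ge E_j + c_k\,\varepsilon^{2^{k+3}}
  &&\text{whenever (5.16) fails at step } j\\
&\Longrightarrow\ \#\{\text{steps at which (5.16) fails}\}
  \le \frac{M\,|\mathcal{H}^{\mathbf{n}}_{d,k}|}{c_k\,\varepsilon^{2^{k+3}}}
  = O\big(M\,\varepsilon^{-2^{k+3}}\big).
\end{aligned}
```

Since each failed step advances $j$ by $2$, the iteration therefore terminates with $j = O(M\varepsilon^{-2^{k+3}})$.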
Fix $t \in T_\varepsilon$ and let $e \in \mathcal{H}^{\mathbf{n}}_{d,k}$ and $1 \le m \le M$ be such that $\|f^m_{e,t} - \mathbb{E}(f^m_{e,t}\mid\mathcal{B}_{\pi(e),t}(L_j))\|_{\square_{L_{j+1}}(Q_{t_{\pi(e)}}(L_0))} \ge \varepsilon$, and write $e' := \pi(e)$. Consider the partition of the cube $Q_{t_{e'}}(L_0)$ into small cubes $Q_{s_{e'}}(L_{j+2})$, where $s_{e'} \in \Gamma_{L_{j+2},Q_{e'}} \cap Q_{t_{e'}}(L_0)$. By the localization properties of the $\square_{L_{j+1}}(Q)$-norm, and the fact that $L_{j+2} \ll \varepsilon^{2^k}L_{j+1}$, we have that

For a given cube $Q$ and functions $f, g : Q \to \mathbb{R}$, define the normalized inner product of $f$ and $g$ as

Then by the well-known property of the $\square$-norm, see for example [23] or the proof of Lemma 2.2, it follows from (5.37) that there exist sets

If $s \in \Gamma_{L_{j+2},Q}$, then there is a unique $t = t(s) \in \Gamma_{L_0,Q}$ such that $s \in Q_t(L_0)$. If $t \in T_\varepsilon$ and $s_{e'} \in S_{\varepsilon,e,t}$, then we define the σ-algebras $\mathcal{B}_{f',e',s}(L_{j+2})$ on $Q_{s_{f'}}(L_{j+2})$ as follows. Write $B_{f',e',s} = B_{f',s_{e'},t}$, where $t = t(s)$, and let $\mathcal{B}_{f',e',s}(L_{j+2})$ be the σ-algebra generated by the set $B_{f',e',s}$ and the σ-algebra $\mathcal{B}_{f',e',s'}(L_j)$ restricted to $Q_{s_{f'}}(L_{j+2})$, where $s' \in \Gamma_{L_j,Q}$ is the unique element such that $s \in Q_{s'}(L_j)$. Note that the complexity of the σ-algebra $\mathcal{B}_{f',e',s}(L_{j+2})$ is at most one larger than the complexity of the σ-algebra $\mathcal{B}_{f',e',s'}(L_j)$, as restricting a σ-algebra to a set does not increase its complexity. If $t = t(s) \notin T_\varepsilon$ or $s_{e'} \notin S_{\varepsilon,e,t}$, then let $\mathcal{B}_{f',e',s}(L_{j+2})$ simply be the restriction of $\mathcal{B}_{f',e',s'}(L_j)$ to the cube $Q_{s_{f'}}(L_{j+2})$, or equivalently define the sets $B_{f',e',s} := Q_{s_{f'}}(L_{j+2})$. Finally, let

be the corresponding σ-algebra on the cube $Q_{s_{e'}}(L_{j+2})$.
Since the cubes $Q_{s_{e'}}(L_{j+2})$ partition the cube $Q_{t_{e'}}(L_0)$ as $s_{e'}$ runs through the grid $\Gamma_{L_{j+2},Q_{e'}} \cap Q_{t_{e'}}(L_0)$, these σ-algebras define a σ-algebra $\mathcal{B}_{e',t}(L_{j+2})$ on $Q_{t_{e'}}(L_0)$ whose restriction to each cube $Q_{s_{e'}}(L_{j+2})$ is equal to the σ-algebra $\mathcal{B}_{e',s}(L_{j+2})$.
Since the function $\prod_{f'\in\partial e'} 1_{B_{f',e',s}}$ is measurable with respect to the σ-algebra $\mathcal{B}_{e',t}(L_{j+2})$ restricted to the cube $Q_{s_{e'}}(L_{j+2})$, one clearly has

(5.40) $f^m_{e,t} - \mathbb{E}(f^m_{e,t}\mid\mathcal{B}_{e',t}(L_{j+2}))$,

and hence, by (5.38), that

It then follows from Cauchy-Schwarz and orthogonality, using the fact that the σ-algebra $\mathcal{B}_{e',t}(L_{j+2})$ is a refinement of $\mathcal{B}_{e',t}(L_j)$, that

At this point we have shown that if $t \in T_\varepsilon$, then there exist an edge $e \in \mathcal{H}^{\mathbf{n}}_{d,k}$, an index $1 \le m \le M$, and σ-algebras $\mathcal{B}_{e',t}(L_{j+2})$ of scale $L_{j+2}$ on $Q_{t_{e'}}(L_0)$, with $e' = \pi(e)$, such that (5.43) holds.
As the total energy $\mathbb{E}(f^m_{e,t}\mid\mathcal{B}_{e',t}(L_j))$ is bounded by $O(M)$, the process must stop at a step $j = O(M\varepsilon^{-2^{k+3}})$ at which (5.16) holds for a σ-algebra of "local complexity" at most $j$, completing the proof of Lemma 5.2.
6. The base case of an inductive strategy to establish Theorem 1.4

In this section we will ultimately establish the base case of our more general inductive argument. We will, however, start by giving a (new) proof of Theorem B′, namely the case $d = 1$ of Theorem 1.4.

6.1. A single simplex in $\mathbb{Z}^n$.

Let $\Delta^0 = \{v_1 = 0, v_2, \ldots, v_{n_1}\}$ be a fixed non-degenerate simplex of $n_1$ points in $\mathbb{Z}^n$ with $n = 2n_1 + 3$, and define $t_{kl} := v_k \cdot v_l$ for $2 \le k, l \le n_1$. Recall, see [17], that a simplex $\Delta = \{m_1 = 0, \ldots, m_{n_1}\} \subseteq \mathbb{Z}^n$ is isometric to $\lambda\Delta^0$ if and only if $m_k \cdot m_l = \lambda^2 t_{kl}$ for all $2 \le k, l \le n_1$.
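The criterion just recalled reduces isometry to comparing Gram matrices: the identity $|m_k - m_l|^2 = |m_k|^2 + |m_l|^2 - 2\,m_k \cdot m_l$ shows that the dot products determine all pairwise distances. For $\lambda \in \sqrt{\mathbb{N}}$ the number $\lambda^2$ is an integer, so the test is exact in integer arithmetic. The following Python sketch uses hypothetical helper names and matches the vertices in the given order.

```python
def gram(simplex):
    """Gram matrix (m_k . m_l), 2 <= k, l <= n_1, of a simplex {m_1 = 0, m_2, ...}."""
    if any(simplex[0]):
        raise ValueError("the first vertex must be the origin")
    rest = simplex[1:]
    return [[sum(a * b for a, b in zip(u, v)) for v in rest] for u in rest]

def is_isometric_dilate(simplex, simplex0, lam_sq):
    """Check m_k . m_l == lambda^2 t_kl for all k, l, with vertices matched in order.

    lam_sq = lambda^2 is an integer for lambda in sqrt(N), so the check is exact.
    """
    g, g0 = gram(simplex), gram(simplex0)
    return len(g) == len(g0) and all(
        g[i][j] == lam_sq * g0[i][j] for i in range(len(g)) for j in range(len(g)))

def vec(*coords, n=9):
    """Point of Z^n (here n = 2 n_1 + 3 = 9 for n_1 = 3), padded with zeros."""
    return tuple(coords) + (0,) * (n - len(coords))

# Usage: {0, (1,2,0,...), (2,-1,0,...)} is an isometric copy of sqrt(5) * {0, e_1, e_2}.
delta0 = [vec(), vec(1), vec(0, 1)]
print(is_isometric_dilate([vec(), vec(1, 2), vec(2, -1)], delta0, 5))
```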
Let $Q \subseteq \mathbb{Z}^n$ be a fixed cube and let $l(Q)$ denote its side length. For any family of functions

and $0 < \lambda \ll l(Q)$ we define the following two multi-linear expressions

Note that if $S \subseteq Q$ and $N^1_{\lambda\Delta^0,q,Q}(1_S, \ldots, 1_S) > 0$, then $S$ must contain an isometric copy of $\lambda\Delta^0$, while if $|S| \ge \delta|Q|$ for some $\delta > 0$, then as before Hölder implies that

Recall that for any $0 < \varepsilon \ll 1$ and positive integer $q$ we call a sequence $L_1 \ge \cdots \ge L_J$ $(\varepsilon, q)$-admissible if $L_j/L_{j+1} \in \mathbb{N}$ and $L_{j+1} \ll \varepsilon^2 L_j$ for all $1 \le j < J$, and $L_J/q \in \mathbb{N}$. Note that if $\lambda_1 \ge \cdots \ge \lambda_{J'} \ge 1$ is any lacunary sequence in $q\sqrt{\mathbb{N}}$ with $J' \gg (\log\varepsilon^{-1})J + \log q$, one can always find an $(\varepsilon, q)$-admissible sequence of scales $L_1 \ge \cdots \ge L_J$ with the property that for each $1 \le j < J$ the interval $[L_{j+1}, L_j]$ contains at least two consecutive elements from the original lacunary sequence.
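The two notions of sequences used here can be checked mechanically. In the sketch below the relation $L_{j+1} \ll \varepsilon^2 L_j$ is modelled as $L_{j+1} \le c\,\varepsilon^2 L_j$ with a hypothetical explicit constant `c`, since the paper leaves the implied constant unspecified; the function names are likewise illustrative.

```python
def is_lacunary(lams):
    """Decreasing sequence with lambda_{j+1} <= lambda_j / 2 for all j."""
    return all(b <= a / 2 for a, b in zip(lams, lams[1:]))

def is_admissible(scales, eps, q, c=1.0):
    """(eps, q)-admissibility of integer scales L_1 >= ... >= L_J.

    The relation L_{j+1} << eps^2 L_j is modelled as L_{j+1} <= c * eps^2 * L_j,
    where c is a hypothetical explicit constant (the paper leaves it implicit).
    """
    pairs = list(zip(scales, scales[1:]))
    return (all(a % b == 0 for a, b in pairs)              # L_j / L_{j+1} in N
            and all(b <= c * eps ** 2 * a for a, b in pairs)
            and scales[-1] % q == 0)                        # L_J / q in N

# Usage: the second sequence violates L_2 <= eps^2 L_1.
print(is_admissible([2 * 10**6, 10**4, 40], eps=0.1, q=4),
      is_admissible([2 * 10**6, 10**5, 40], eps=0.1, q=4))
```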
In light of these observations we see that the following "counting lemma" ultimately establishes a quantitatively stronger version of Proposition B ′ that appeared in Section 1.3 and hence immediately establishes Theorem 1.4 for d = 1.
There exists $J_1 = O(\varepsilon^{-2})$ such that for any $(\varepsilon, q_{J_1})$-admissible sequence of scales $l(Q) \ge L_1 \ge \cdots \ge L_{J_1}$ and $S \subseteq Q$ there is some $1 \le j < J_1$ such that

As in the continuous setting, the proof of Proposition 6.1 has two main ingredients, namely Lemmas 6.1 and 6.2 below. In these lemmas, and for the remainder of Sections 6 and 7, we will continue to use the notation $q_1(\varepsilon) := \operatorname{lcm}\{1 \le q \le C\varepsilon^{-10}\}$ for any given $\varepsilon > 0$.
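The modulus $q_1(\varepsilon)$ is straightforward to compute, and the sketch below does so; the parameter `C` is a placeholder for the paper's unspecified absolute constant. By construction every $q \le C\varepsilon^{-10}$ divides $q_1(\varepsilon)$, hence also every $q_j = q_1(\varepsilon)^j$. Since $\log \operatorname{lcm}\{1, \ldots, N\}$ is Chebyshev's function $\psi(N) \sim N$, the modulus grows like $\exp((1 + o(1))C\varepsilon^{-10})$.

```python
import math
from functools import reduce

def q1(eps, C=1.0):
    """q_1(eps) = lcm{1 <= q <= C * eps^(-10)}.

    C stands in for the unspecified absolute constant of the paper.
    """
    bound = math.floor(C * eps ** -10)
    return reduce(math.lcm, range(1, bound + 1), 1)

# Usage: every modulus q <= C eps^-10 divides q_1(eps).
Q1 = q1(1.0, C=10)
print(Q1, all(Q1 % q == 0 for q in range(1, 11)))
```

`math.lcm` requires Python 3.9 or later.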
Let $0 < \varepsilon \ll 1$ and $q_j := q_1(\varepsilon)^j$ for all $j \ge 1$. There exists an integer $\tilde{J}_1 = O(\varepsilon^{-2})$ such that for any $(\varepsilon, q_{\tilde{J}_1})$-admissible sequence of scales $l(Q) \ge L_1 \ge \cdots \ge L_{\tilde{J}_1}$ and function $f:$

The reduction of Proposition 6.1 to these two lemmas is essentially identical to the analogous argument in the continuous setting presented at the end of Section 3.1, so we omit the details.
Proof of Lemma 6.1. We will rely on some prior exponential sum estimates, specifically Propositions 4.2 and 4.4 in [17]. First we deal with the case $n_1 \ge 3$. By the change of variables $m_1 := m_1$, $m_i := m_i - m_1$ for $2 \le i \le n_1$, one may write

We now write

$\ldots, v_{n_1-1}\}$, and for each $m_2, \ldots, m_{n_1-1} \in (q\mathbb{Z})^n$ we use $\sigma^{m_2,\ldots,m_{n_1-1}}_{\lambda,q}(m)$ to denote the (essentially) normalized indicator function of the subset of $(q\mathbb{Z})^n$ that contains $m$ if and only if $m \cdot m_k = \lambda^2 t_{kn_1}$ for all $2 \le k \le n_1$.
Using the fact that $|f_i| \le 1$, together with Cauchy-Schwarz and Plancherel, one can then easily see that

It then follows by Propositions 4.2 and 4.4 in [17], with $\delta = \varepsilon^4$ and after rescaling by $q$, that, in addition to being non-negative and uniformly bounded in $\xi$, we in fact have

for all $l \in \mathbb{Z}^n$.
We note that the expression $H_{\lambda,q}(\xi)$ may be interpreted as the Fourier transform of the indicator function of the set of integer points on a certain variety, and estimate (6.9) indicates that it concentrates near rational points of small denominator. It is this crucial fact from number theory which makes results like Theorem B′ possible. It is easy to see that $\hat{\chi}_{q,L}(l/q) = 1$ for all $l \in \mathbb{Z}^n$ and that there exists some absolute constant $C > 0$ such that

(6.10) $0 \le 1 - |\hat{\chi}_{q,L}(\xi)|^2 \le CL|\xi - l/q|$

for all $\xi \in \mathbb{T}^n$ and $l \in \mathbb{Z}^n$. It then follows, using our assumption that $qq_1(\varepsilon) \mid q'$, that

for some constant $C > 0$, uniformly in $\xi \in \mathbb{T}^n$, provided $L \ll \varepsilon^5\lambda$. Substituting inequality (6.7) into (6.8), we obtain

This proves Lemma 6.1 for $n_1 \ge 3$, as it is clear that, by re-indexing, the above estimate holds for any of the functions $f_i$ in place of $f_{n_1}$. For $n_1 = 2$ an easy modification of the arguments in [14], specifically the proof of Lemma 3 therein, establishes that

Proof of Lemma 6.2. Let $q, L \in \mathbb{N}$ be such that $L \mid N$ and $q \mid L$. The "modulo $q$" grids $Q_t(q, L) = t + Q(q, L)$ partition the cube $Q$, with $t$ running through the set $\Gamma_{q,L,Q} = \{1, \ldots, q\}^n + \Gamma_{L,Q}$, where $\Gamma_{L,Q}$ denotes the centers of the "integer" grids $t + Q(L)$ in an initial partition of $Q$. Let $q', L'$ be positive integers such that $q \mid q'$, $L' \mid L$, and $L' \ll \varepsilon^2 L$. If $s \in \Gamma_{q',L',Q}$ and $t \in Q_s(q', L')$, then $|t - s| = O(L')$ and hence $\mathbb{E}_{x\in Q_t(q,L)} g(x) = \mathbb{E}_{x\in Q_s(q,L)} g(x) + O(L'/L)$ for any function $g : Q \to [-1, 1]$. Moreover, since the cube $Q_s(q, L)$ is partitioned into the smaller cubes $Q_t(q', L')$, we have by Cauchy-Schwarz

From this it is easy to see that

and we note that the right side of the above expression is $\|\mathbb{E}(g\mid\mathcal{G}_{q',L',Q})\|^2_{L^2(Q)}$, since the conditional expectation function $\mathbb{E}(g\mid\mathcal{G}_{q',L',Q})$ is constant and equal to $\mathbb{E}_{x\in Q_t(q',L')} g(x)$ on the cubes $Q_t(q', L')$.
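The "modulo $q$" grids differ from the plain grids in that their atoms only contain points from a single residue class modulo $q$. The one-dimensional Python sketch below assumes, for illustration only, that $Q = \{0, \ldots, N-1\}$ is cut into blocks $[a, a + L)$ starting at multiples of $L$ rather than centered at $\Gamma_{L,Q}$; the function names are hypothetical. It shows that a $2$-periodic function is invisible to $\mathcal{G}_{1,N}$ but fully captured by $\mathcal{G}_{2,L}$, and that the energy is non-decreasing along refinements with $q \mid q'$ and $L' \mid L$, as used in the energy-increment step (6.12).

```python
def cond_exp_mod_grid(f, q, L):
    """E(f | G_{q,L,Q}) for f on Q = {0, ..., N-1}, assuming L | N and q | L.

    The atoms are the sets {x in [a, a + L) : x = a + r mod q}, 0 <= r < q.
    """
    N = len(f)
    if N % L or L % q:
        raise ValueError("need L | N and q | L")
    out = [0.0] * N
    for a in range(0, N, L):
        for r in range(q):
            atom = range(a + r, a + L, q)
            avg = sum(f[x] for x in atom) / len(atom)
            for x in atom:
                out[x] = avg
    return out

def energy(g):
    """Normalized squared L^2 norm."""
    return sum(v * v for v in g) / len(g)

# Usage: a 2-periodic f has energy 0 with respect to G_{1,240} and energy 1
# with respect to the modulo-2 grid G_{2,60}.
f = [1.0 if x % 2 == 0 else -1.0 for x in range(240)]
print(energy(cond_exp_mod_grid(f, 1, 240)), energy(cond_exp_mod_grid(f, 2, 60)))
```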
Now suppose (6.7) does not hold for some $j \ge 1$, that is,

Since $L_{j+2} \ll \varepsilon^2 L_{j+1}$, $L_{j+2} \mid L_j$, and $q_{j+1} \mid q_{j+2}$, we can apply the above observations to $g := f - \mathbb{E}(f\mid\mathcal{G}_{q_j,L_j,Q})$ and obtain, by orthogonality, that

(6.12) $\|\mathbb{E}(f\mid\mathcal{G}_{q_{j+2},L_{j+2},Q})\|^2_{L^2(Q)} \ge \|\mathbb{E}(f\mid\mathcal{G}_{q_j,L_j,Q})\|^2_{L^2(Q)} + c\varepsilon^2$

for some constant $c > 0$. Since the above expressions are clearly bounded by $1$, the above procedure must stop after $O(\varepsilon^{-2})$ steps, at which point (6.7) must hold for some $1 \le j \le \tilde{J}_1(\varepsilon)$ with $\tilde{J}_1(\varepsilon) = O(\varepsilon^{-2})$.

6.2. The base case of our general inductive strategy.
be cubes of equal side length $l(Q)$ and $\Delta^0_i \subseteq \mathbb{Z}^{2n_i+3}$ be a non-degenerate simplex of $n_i$ points for $1 \le i \le d$.
As in the continuous setting, we will ultimately need a parametric version of Proposition 6.1, namely Proposition 6.2 below.

Proposition 6.2 (Parametric Counting Lemma on $\mathbb{Z}^n$ for Simplices). Let $0 < \varepsilon \le 1$ and $R \ge 1$.
We note that it is easy to show, as in the continuous setting, that if $S \subseteq Q$ with $|S| \ge \delta|Q|$ for some $\delta > 0$, then

for all scales $\lambda \in q\sqrt{\mathbb{N}}$ with $0 < \lambda \ll_\varepsilon l(Q)$. In light of this observation and the discussion preceding Proposition 6.1, the proof of Theorem 1.4 reduces, as it did in the continuous setting, to the following.

Proposition 7.1. Let $0 < \varepsilon \ll 1$. There exist positive integers $J_d = J_d(\varepsilon)$ and $q_d(\varepsilon)$ such that for any $(\varepsilon, q_d(\varepsilon)^{J_d})$-admissible sequence of scales $l(Q) \ge L_1 \ge \cdots \ge L_{J_d}$ and $S \subseteq Q$ there is some $1 \le j < J_d$ such that

Quantitative Remark. A careful analysis of our proof reveals that there exist choices of $J_d(\varepsilon)$ and $q_d(\varepsilon)$ which are less than $W_d(\log(C_\Delta\varepsilon^{-3}))$ and $W_d(C_\Delta\varepsilon^{-13})$ respectively, where $W_k(m)$ is again the tower-exponential function defined by $W_1(m) = \exp(m)$ and $W_{k+1}(m) = \exp(W_k(m))$ for $k \ge 1$.
The proof of Proposition 7.1 follows along the same lines as the analogous result in the continuous setting. As before, we will compare the averages $N^d_{\lambda\Delta^0,q,Q}(f_e; e \in \mathcal{H}^{\mathbf{n}}_{d,k})$ to those of $M^d_{\lambda,q,Q}(f_e; e \in \mathcal{H}^{\mathbf{n}}_{d,k})$ at certain scales $q$ and $\lambda \in q\sqrt{\mathbb{N}}$ with $L_{j+1} \le \lambda \le L_j$, inductively for $1 \le k \le d$. As the arguments closely follow those given in Section 5, we will be brief and emphasize mainly the additional features.

7.1. Reduction of Proposition 7.1 to a more general "local" counting lemma.
For any given $1 \le k \le d$ and family of functions $f_e : Q_{\pi(e)} \to [-1, 1]$ with $e \in \mathcal{H}^{\mathbf{n}}_{d,k}$, it is easy to see that for any $\varepsilon > 0$, scale $L_0 > 0$ dividing the side length $l(Q)$, and $q_0 \mid q$, we have

provided $0 < \lambda \ll \varepsilon L_0$, where $f_{e,t}$ denotes the restriction of a function $f_e$ to the cube $Q_t(q_0, L_0)$.
Thus the proof of Proposition 7.1 reduces to showing that the expressions in (7.8) and (7.9) differ only by $O(\varepsilon)$ for all scales $\lambda \in q\sqrt{\mathbb{N}}$ with $L_{j+1} \le \lambda \le L_j$, given an $(\varepsilon, q)$-admissible sequence $L_0 \ge L_1 \ge \cdots \ge L_J$, for any collection of bounded functions $f_{e,t}$, $e \in \mathcal{H}^{\mathbf{n}}_{d,k}$, $t \in \Gamma_{q_0,L_0,Q}$. Indeed, our crucial result will be the following.

Proposition 7.2 (Local Counting Lemma in $\mathbb{Z}^n$). Let $0 < \varepsilon \ll 1$ and $q_0, M \in \mathbb{N}$.
There exist positive integers $J_k = J_k(\varepsilon, M)$ and $q_k(\varepsilon)$ such that for any $(\varepsilon, q_{J_k})$-admissible sequence of scales $L_0 \ge L_1 \ge \cdots \ge L_{J_k}$ with $L_0$ dividing $l(Q)$ and $q_j := q_0\,q_k(\varepsilon)^j$ for $j \ge 1$, and collection of functions $f^m_{e,t} : Q_{t_{\pi(e)}}(q_0, L_0)$

for all $\lambda \in q_j\sqrt{\mathbb{N}}$ with $L_{j+1} \le \lambda \le L_j$ and $t \notin T_\varepsilon$, uniformly in $e \in \mathcal{H}$

7.2. Proof of Proposition 7.2.

We will prove Proposition 7.2 by induction on $1 \le k \le d$.
For k = 1 this is basically Proposition 6.2, exactly as it was in the base case of the proof of Proposition 5.3.
For the induction step we will again need two main ingredients. The first establishes that our multi-linear forms $N^d_{\lambda\Delta^0,q,Q}(f_e; e \in \mathcal{H}^{\mathbf{n}}_{d,k})$ are controlled by a box-type norm attached to scales $q'$ and $L$. Let $Q = Q_1 \times \cdots \times Q_d$ with $Q_i \subseteq \mathbb{Z}^{2n_i+3}$ cubes of equal side length $l(Q)$, and let $1 \le k \le d$. For any scale $0 < L \ll l(Q)$ and function $f : Q_{e'} \to [-1, 1]$ with $e' \in \mathcal{H}_{d,k}$, we define its local box norm at scales $q'$ and $L$ by

for any cube $Q$ of the form $Q = Q_1 \times \cdots \times Q_k$. We note that (7.4) and (7.5) are special cases of (7.11) and (7.12) with $k = d$, $\mathbf{n} = (2, \ldots, 2)$, and $f_e = f$ for all $e \in \mathcal{H}^{\mathbf{n}}_{d,d}$.

Lemma 7.1 (A generalized von Neumann inequality on $\mathbb{Z}^n$). Let $1 \le k \le d$.
The crucial ingredient is again a parametric weak hypergraph regularity lemma, i.e. Lemma 5.2 adapted to the discrete setting. The proof is essentially the same as in the continuous case, with the exception that the $\square_{L_j}$-norms are replaced by $\square_{q_j,L_j}$-norms, where $q_j = q_0q^j$ is a given sequence of positive integers and $L_0 \ge L_1 \ge \cdots \ge L_J$ is an $(\varepsilon, q_J)$-admissible sequence of scales. To state it, we say that a σ-algebra $\mathcal{B}$ on a cube $Q$ is of scale $(q, L)$ if it is a refinement of the grid $\mathcal{G}_{q,L,Q}$, i.e. if its atoms partition each cube $Q_t(q, L)$ of the grid. We will always assume that $q \mid L$ and $L \mid l(Q)$. Recall also that we say the complexity of a σ-algebra $\mathcal{B}$ is at most $m$, and write $\operatorname{complex}(\mathcal{B}) \le m$, if it is generated by $m$ sets.

Lemma 7.2 (Parametric weak hypergraph regularity lemma for $\mathbb{Z}^n$).
The proof of Lemma 7.2 follows exactly as the corresponding proof of Lemma 5.2 in the continuous setting, so we will omit the details. We will however provide some details of how one deduces Proposition 7.2, from Lemmas 7.1 and 7.2. The arguments are again very similar to those in the continuous setting, however one needs to make a careful choice of the integers q k (ε), appearing in the statement of the Proposition.
Proof of Proposition 7.2. Let $2 \le k \le d$ and assume that the proposition holds for $k - 1$.

Appendix: A short direct proof of Part (i) of Theorem B′
We conclude by providing a short direct proof of Part (i) of Theorem B′, namely the following.

Theorem 8.1 (Magyar [17]). Let $0 < \delta \le 1$ and $\Delta \subseteq \mathbb{Z}^{2k+3}$ be a non-degenerate simplex of $k$ points.
For any $\varepsilon > 0$ we define $q_\varepsilon := \operatorname{lcm}\{1 \le q \le C\varepsilon^{-10}\}$ with $C > 0$ a (sufficiently) large absolute constant. Following [14], we further define $S \subseteq \mathbb{Z}^n$ to be $\varepsilon$-uniformly distributed (modulo $q_\varepsilon$) if its relative upper Banach density on any residue class modulo $q_\varepsilon$ never exceeds $(1 + \varepsilon^2)$ times its density on $\mathbb{Z}^n$, namely if

for all $s \in \{1, \ldots, q_\varepsilon\}^n$. It turns out that this notion is closely related to the $U^1_{q,L}(Q)$-norm introduced in Section 6. Recall that for any cube $Q \subseteq \mathbb{Z}^n$ and function $f : Q \to [-1, 1]$ we define

(8.1) $\|f\|_{U^1_{q,L}(Q)} := \Big(\frac{1}{|Q|}\sum_{t\in Q}|f * \chi_{q,L}(t)|^2\Big)^{1/2}$,

with $\chi_{q,L}$ denoting the normalized characteristic function of the cubes $Q(q, L) := [-\frac{L}{2}, \frac{L}{2}]^n \cap (q\mathbb{Z})^n$. Note that the $U^1_{q,L}(Q)$-norm measures the mean square oscillation of a function with respect to cubic grids of size $L$ and gap $q$.
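A one-dimensional version of (8.1) illustrates how the gap $q$ detects oscillation along residue classes. The sketch below assumes, for illustration, that $f$ is extended by zero outside $Q = \{0, \ldots, N-1\}$; the function name is hypothetical. A $2$-periodic function has $U^1_{2,L}$-norm close to $1$, because it is constant along each residue class modulo $2$, but its $U^1_{1,L}$-norm is small, since plain averages over intervals of length $L$ cancel.

```python
import math

def u1_norm(f, q, L):
    """1D version of (8.1): ||f||_{U^1_{q,L}(Q)} for f on Q = {0, ..., N-1}.

    f is extended by zero outside Q, and chi_{q,L} is the normalized indicator
    of Q(q, L) = [-L/2, L/2] intersected with qZ.
    """
    N = len(f)
    support = [y for y in range(-(L // 2), L // 2 + 1) if y % q == 0]
    total = 0.0
    for t in range(N):
        conv = sum(f[t - y] for y in support if 0 <= t - y < N) / len(support)
        total += conv * conv
    return math.sqrt(total / N)

# Usage: the same 2-periodic function seen with gap q = 2 and with gap q = 1.
f = [1.0 if x % 2 == 0 else -1.0 for x in range(200)]
print(u1_norm(f, 2, 10), u1_norm(f, 1, 10))
```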
The following observation from [14] (specifically Lemmas 1 and 2) is key to our short proof of Theorem 8.1. Let $\Delta^0 = \{v_1 = 0, v_2, \ldots, v_k\}$ be a fixed non-degenerate simplex of $k$ points in $\mathbb{Z}^n$ with $n = 2k + 3$, and define $t_{ij} := v_i \cdot v_j$ for $2 \le i, j \le k$. We now define a function which counts isometric copies of $\lambda\Delta^0$.
If $S$ is not $\varepsilon$-uniformly distributed, then its upper Banach density increases to at least $\delta_1 := (1 + \varepsilon^2)\delta$ when restricted to a residue class $s + (q_\varepsilon\mathbb{Z})^n$. Identify $s + (q_\varepsilon\mathbb{Z})^n$ with $\mathbb{Z}^n$, and simultaneously the set $S|_{s+(q_\varepsilon\mathbb{Z})^n}$ with a set $S_1 \subseteq \mathbb{Z}^n$, via the map $y \mapsto q_\varepsilon^{-1}(y - s)$. Note that if $S_1$ is $\varepsilon$-uniformly distributed, then it contains an isometric copy of $\lambda\Delta^0$ for all sufficiently large $\lambda \in \sqrt{\mathbb{N}}$, and hence $S$ contains an isometric copy of $q_\varepsilon\lambda\Delta^0$.
Repeating the above procedure, one arrives at a set $S_j = q_\varepsilon^{-j}(S - s_j) \subseteq \mathbb{Z}^n$, for some $s_j \in \mathbb{Z}^n$, in $j = O(\log\varepsilon^{-1})$ steps, which contains an isometric copy of $\lambda\Delta^0$ for all sufficiently large $\lambda \in \sqrt{\mathbb{N}}$.