New Bounds on cap sets

We provide an improvement over Meshulam's bound on cap sets in $F_3^N$. We show that there exist universal $\epsilon>0$ and $C>0$ so that any cap set in $F_3^N$ has size at most $C {3^N \over N^{1+\epsilon}}$. We do this by obtaining quite strong information about the additive combinatorial properties of the large spectrum.


Introduction
A set $A \subset F_3^N$ is called a cap set if it contains no lines. In this paper, we will be concerned with proving the following theorem:

Theorem 1.1. There exist an $\epsilon > 0$ and a $C < \infty$ such that if $A \subseteq F_3^N$ is a cap set, then
$$\frac{|A|}{3^N} \le \frac{C}{N^{1+\epsilon}}.$$

The problem of the maximal size of cap sets is a characteristic 3 model for the problem of finding arithmetic progressions of length 3 in rather dense sets of integers. Meshulam [M95], through a direct use of ideas of Roth, was able to show that there is a constant $C$ so that any cap set $A$ has density at most $\frac{C}{N}$. Our result may be viewed as a very modest improvement over Meshulam's result.
Sanders [S11] recently showed that any subset of the integers whose density in $\{1, \dots, M\}$ is at least $C \frac{(\log \log M)^5}{\log M}$ must contain arithmetic progressions of length 3. This may be thought of as bringing the results known for arithmetic progressions almost to the level of Meshulam's result. This has spurred further interest in improving Meshulam's result in hopes that it might suggest a way of improving the results on arithmetic progressions.
A rather concrete, though perhaps still out of reach, goal in this direction is a conjecture of Erdős and Turán:

Conjecture 1.2. Suppose $A \subseteq \mathbb{Z}$ is such that
$$\sum_{a \in A} \frac{1}{a} = \infty.$$
Then $A$ contains an arithmetic progression of every length.
It is clear that the present paper is directly relevant only for finding 3-term progressions.
However, it is also easy to see, based purely on density considerations, that proving an estimate of the type in Theorem 1.1 in the integer setting would yield the 3-term case of the conjecture stated above. In fact, Polymath 6 [PM6] has recently been started with the goal of adapting the ideas of this paper to the integer setting. See [PM] for more information about so-called "polymath" projects in general.
While the research in this paper was well underway, Gowers [G] wrote a post on his blog suggesting that one could attack the problem of bounding cap sets by studying the additive structure of their large spectrum. This had been our approach as well and we wrote a reply [K1] sketching our rather strong results regarding that structure. In the course of a few days, we realized that we actually could convert our structural theory into an estimate on the size of cap sets. We recorded this [K2] in a second reply to Gowers's blog. The current paper should be viewed as an elaboration of these two posts.
We describe our plan for proving Theorem 1.1. To prove this theorem we will prove a theorem about sets without unusually dense subspaces, a notion we make precise below.

Definition 1.3. We say a set $A \subseteq F_3^N$ of density $\rho = |A|/3^N$ has no strong increments if for every subspace $V \subseteq F_3^N$ with $d = \operatorname{codim} V \le \frac{N}{2}$, we have
$$\frac{|A \cap V|}{|V|} < \rho + \frac{20 \rho d}{N}.$$

Theorem 1.4. There exist an $\epsilon > 0$ and a $C < \infty$ such that if $A \subseteq F_3^N$ is a cap set with no strong increments, then
$$\frac{|A|}{3^N} \le \frac{C}{N^{1+\epsilon}}.$$

Major ingredients needed to prove this theorem are Proposition 3.3, Lemma 5.3, and Theorem 7.1. We combine them with a Fourier analytic argument in Section 8.
Proof. We deduce Theorem 1.1 from Theorem 1.4 using induction. Suppose that for every $n \le N - 1$ we have shown that if $B \subseteq F_3^n$ is a cap set, then
$$\frac{|B|}{3^n} \le \frac{C}{n^{1+\epsilon}}.$$
We aim for a contradiction: assume there exists a cap set $A \subseteq F_3^N$ such that $\frac{|A|}{3^N} > \frac{C}{N^{1+\epsilon}}$. By Theorem 1.4, this implies $A$ has a strong increment. Since $A$ has a strong increment, there exists an affine subspace $V \subseteq F_3^N$ with codimension $d \le \frac{N}{2}$ such that the density of $A \cap V$ in $V$ exceeds $\frac{C}{(N-d)^{1+\epsilon}}$, since the derivative of $\frac{C}{x^{1+\epsilon}}$ is uniformly bounded by $16 C N^{-2-\epsilon} = 16 \rho N^{-1}$ for $\frac{N}{2} \le x \le N$ whenever $0 < \epsilon < 1$. But we know that $A \cap V$ (in fact any subset of $A$) is a capset. This contradicts the induction hypothesis, yielding Theorem 1.1.

Proof Sketch
We sketch our plan for proving Theorem 1.4.
We will study the large spectrum $\Delta$ of a cap set $A$ with no strong increments. The reader should think of $\Delta$, for a cap set of the size $\frac{3^N}{N}$ given by Meshulam's estimate, as consisting of the positions at which the absolute value of the Fourier transform of $A$ is around $\frac{1}{N^2}$. The set $\Delta$ should have cardinality approximately $N^3$ (established in Section 3) and have about $N^7$ additive quadruples (established in Section 4). Recall that an additive quadruple is a quadruple $(x_1, x_2, x_3, x_4)$ of elements of $\Delta$ with the property that
$$x_1 + x_2 = x_3 + x_4.$$
Similarly an additive octuple is an octuple $(x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8)$ of elements of $\Delta$ with
$$x_1 + x_2 + x_3 + x_4 = x_5 + x_6 + x_7 + x_8.$$
It is easy to see a priori that a set with many additive quadruples will have many additive octuples. In our case, a set with size $N^3$ having $N^7$ quadruples must have at least $N^{15}$ octuples. (But it may have more octuples.) The number of octuples it has should be taken as an indication of its structure. If there are many octuples, it means that the sumset $\Delta + \Delta$ looks like it has more additive structure than the set $\Delta$. We then say the set $\Delta$ is additively smoothing. (It becomes smoother under addition.) We show, however, that this cannot be the case for $\Delta$, the large spectrum. We use the probabilistic method to do this, finding too much of the spectrum contained in a small subspace in the additively smoothing case. We establish this in Section 5. (This is somewhat reminiscent of the paper of Croot and Sisask [CS11] where random selections are used to uncover structure.)

Thus our set $\Delta$ is entirely additively non-smoothing. This means it is already as smooth as it will become under a small number of additions. This makes its additive structure particularly easy to uncover as it is already present without adding the set to itself. This kind of idea was first exploited in a paper of the second author with Koester [KK10], and we use techniques quite similar to those found in that paper. We end up showing that the set $\Delta$ should be thought of as looking like the sum of a very structured set $K$ of size $N$ (that is to say that $K$ is almost additively closed) with a very random set $\Lambda$ of size $N^2$. Section 6 contains the proof of a structural theorem for sets with substantial additive energy (i.e., many additive quadruples) but no additive smoothing.
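The passage from quadruples to octuples can be checked numerically on toy examples. The sketch below counts additive $2m$-tuples in a small subset of $F_3^n$ and tests the inequality $E_4^3 \le |\Delta|^2 E_8$, which follows from Hölder's inequality and reproduces the numerology $N^{21}/N^{6} = N^{15}$ quoted above. The function names `additive_energy` and `holder_check` are illustrative and do not come from the paper.

```python
from collections import Counter


def additive_energy(points, m):
    """Count 2m-tuples from `points` (vectors over F_3) with
    x_1 + ... + x_m = x_{m+1} + ... + x_{2m}."""
    pts = sorted(set(map(tuple, points)))
    dim = len(pts[0])
    # reps[s] = number of ways to write s as an ordered sum of the
    # elements chosen so far (starting from the empty sum).
    reps = Counter({(0,) * dim: 1})
    for _ in range(m):
        nxt = Counter()
        for s, count in reps.items():
            for p in pts:
                t = tuple((a + b) % 3 for a, b in zip(s, p))
                nxt[t] += count
        reps = nxt
    # Each pair of representations of the same sum is one 2m-tuple.
    return sum(c * c for c in reps.values())


def holder_check(points):
    """Return (E_2, E_4, E_8) and whether E_4^3 <= E_2^2 * E_8 holds."""
    e2 = additive_energy(points, 1)
    e4 = additive_energy(points, 2)
    e8 = additive_energy(points, 4)
    return e2, e4, e8, e4 ** 3 <= e2 ** 2 * e8
```

For a full subspace the inequality is an equality, which matches the heuristic that a subspace plus a random set is not additively smoothing; for a set with little structure, $E_8$ is typically well above the Hölder lower bound.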
We find that this structure of ∆ is inconsistent with A's being a cap set with no strong increments.The reason is that we can use Freiman's theorem to place K inside a subspace H with relatively low dimension.We can essentially mod out by H, examining the "fibers", the intersections of A with translates of H ⊥ .We find that the structure of ∆ makes the behavior of the fibers unrealistic.This argument is suggested by a paper of Sanders [S10], and is carried out in Section 8.
One final remark about the value of $\epsilon$ obtained in this paper: it is necessarily rather small (at least with our current argument). We discuss the reasons for this in a brief final section and give some conjectures that, if true, would greatly improve the efficiency of our argument. We have not attempted to optimize $\epsilon$ (or even keep track of exact dependence on $\epsilon$) throughout the paper.
Proof. Note that we can express $|P|$ by

Applying the Cauchy-Schwarz inequality we see

Here we introduce another variant of Cauchy-Schwarz:

Lemma 2.2. Let $(X, m)$ be a measure space with total measure $M$. Let $A_1, \dots, A_k$ be measurable subsets of $X$ and $0 < \rho < 1$ be a number (the density), so that $m(A_j) \gtrsim \rho M$ for each $j$. Suppose $k \gg \frac{1}{\rho}$. Then

since $k\rho \gg 1$. Thus we may estimate the full sum

Define $c(x)$ to be the measurable function giving, for each $x$, the number of sets $A_j$ which contain $x$; i.e.,

Thus we would like to estimate

In what follows $\mu$ shall be a small exponent. We will frequently use expressions like $N^{O(\mu)}$. The exponent will be bounded by $C\mu$ for $C$ a universal constant which varies from line to line in the paper. We illustrate this by the following version of the large families principle (this is the principle which says that most children belong to large families) which will be used extremely often in this paper.

Lemma 2.3. Let $M_1, \dots, M_K > 0$ be real numbers and let $R > 0$ be a real number. Suppose that $M_j \le R N^{\mu}$ for each $j$ and suppose that

Then there exists a subset $J$ of $\{1, \dots, K\}$ with $|J| \gtrsim N^{-O(\mu)} K$ so that for each $j \in J$, we have

Suppose that $|J| \lesssim N^{-10\mu} K$. Then by the upper bound on $M_j$, we have that

Combining these two estimates gives us a contradiction.
We take a moment to state the asymmetric Balog-Szemerédi-Gowers theorem which we will have occasion to use. A set $B \subset F_3^N$ will be said to be $\mu$-additively closed if

, and assume $L \lesssim N^{10}$. Then there exists $\mu$ depending only on $\eta$, with $\mu \to 0$ as $\eta \to 0$, and there exist a $\mu$-additively closed set and an element $x \in F_3^N$ so that

In particular, the last inequality implies

An only slightly stronger form of this lemma appears in the book of Tao and Vu [TV06] as Lemma 2.35. Another way of stating this result, which we will use frequently, is to define a function $f$ with $\lim_{t \to 0} f(t) = 0$ and to let $\mu = f(\eta)$. We will use this kind of notation frequently in the paper with the choice of $f$ varying from line to line.
Finally, we record the form of Freiman's theorem which we shall use.

Theorem 2.5. A $\mu$-additively closed set is contained in a subspace of dimension $N^{O(\mu)}$.
Various improvements of the finite characteristic Freiman's theorem have occurred, such as the result by Sanders [S08], but these only affect the constant in our formulation. Even Ruzsa's original version [R99] suffices.
3 Review of Meshulam's argument, Fourier analysis in $F_3^N$, and sparsity of the spectrum

For the remainder of this paper $A$ will be a subset of $F_3^N$ with $|A| = \rho 3^N \gg \frac{3^N}{N^{1+\epsilon}}$, with $\epsilon > 0$ to be determined later. Moreover $A$ shall be a cap set, meaning that it contains no lines. A line in $F_3^N$ is characterized by being a set with exactly three distinct elements $a, b, c$ satisfying $a + b + c = 0$.
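The characterization of lines just given turns directly into a brute-force test for small examples. The following is a minimal sketch, assuming points of $F_3^N$ are represented as tuples with entries in $\{0, 1, 2\}$; the name `is_cap_set` is illustrative only.

```python
from itertools import combinations


def is_cap_set(points):
    """Return True if no three distinct points a, b, c of `points`
    (vectors over F_3) satisfy a + b + c = 0."""
    pts = set(map(tuple, points))
    for a, b in combinations(pts, 2):
        # Given distinct a and b, the only c with a + b + c = 0 is -(a + b).
        c = tuple((-(x + y)) % 3 for x, y in zip(a, b))
        # In characteristic 3 this c is automatically different from a and b.
        if c in pts:
            return False
    return True
```

For instance, the four points with coordinates in $\{0, 1\}$ form a cap set in $F_3^2$, while any full coordinate line does not.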
In this section we will establish some basic facts needed for our proof, which are enough to obtain Meshulam's bound of $\sim \frac{1}{N}$ on the density of capsets. Further, we shall prove a statement of the form "The spectrum $\Delta$ does not have too much intersection with any small subspace." We will assume that there are no strong increments for $A$ in the sense of Definition 1.3. Precisely, we assume there is no hyperplane $H$ so that $A \cap H$ has density $\ge \rho + \frac{20\rho}{N}$ in $H$ and no subspace $H$ of codimension $d \le \frac{N}{2}$ so that $A \cap H$ has density $\ge \rho + \frac{20 \rho d}{N}$. We recall that a contradiction of this assumption will mean that every large cap set $A$ has strong increments. This will contradict the existence of large cap sets.
We define the character $e : F_3 \to \mathbb{C}$ by $e(t) = e^{2\pi i t/3}$. We will study the Fourier transform of the set $A$, namely

As a consequence of the assumption that $A$ is a large cap set without strong increments, we shall see that there is a significant set $\Delta$ of $x$ for which $|\hat{A}(x)|$ is fairly large (the set $\Delta$ will be called the spectrum of $A$) and we shall see that $\Delta$ does not concentrate too much in any fairly low dimensional subspace of $F_3^N$. Our first nontrivial fact about $\hat{A}$ is that $A - \rho$ has large $L^3$ norm, and moreover that this large $L^3$ norm is accounted for by the set of $x$ where $|\hat{A}(x)|$ is large. Precisely:

We shall refer to the set $\Delta$ as the spectrum of $A$ and it shall be our central object of study for the remainder of the paper.
Note that with this definition, $\Delta$ is symmetric, that is $\Delta = -\Delta$. It is worth noting that the following proposition is the only place in the paper where we use the assumption that $A$ is a capset specifically; the other parts of the paper use only the assumption that $A$ has no strong increments.

Proposition 3.2. If $A$ is a capset, then

and

Note that this proposition is already enough to obtain Meshulam's estimate: in particular, it guarantees that the set $\Delta$ defined in the statement of the proposition is nonempty. This means there is at least one $x$ such that $|\hat{A}(x)| \gtrsim \rho^2$. This guarantees the existence of a hyperplane $P$ such that the density of $A \cap P$ inside $P$ is at least $\rho + c\rho^2$. Taking $\rho$ large enough compared to $\frac{1}{N}$ already contradicts the no-strong-increment hypothesis, yielding Meshulam's estimate.
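The proof of the proposition rests on a counting identity: the sum of $\hat{A}(x)^3$ over all $x$ equals $3^{-2N}$ times the number of solutions of $a + b + c = 0$ in $A$. The sketch below checks this on toy sets. It assumes the normalization $\hat{A}(x) = 3^{-N} \sum_{a \in A} e(a \cdot x)$ with $e(t) = e^{2\pi i t/3}$, chosen to agree with $\hat{A}(0) = \rho$ and the factor $3^{-2N}$ in the text; the sign convention and all function names here are assumptions made for illustration.

```python
import cmath
from itertools import product


def fourier_transform(A, n):
    """Return {x: A_hat(x)} with A_hat(x) = 3^{-n} * sum_{a in A} e(a . x)."""
    omega = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity
    pts = set(map(tuple, A))
    hat = {}
    for x in product(range(3), repeat=n):
        total = sum(
            omega ** (sum(ai * xi for ai, xi in zip(a, x)) % 3) for a in pts
        )
        hat[x] = total / 3 ** n
    return hat


def zero_sum_triples(A):
    """Count ordered triples (a, b, c) in A^3 with a + b + c = 0."""
    pts = set(map(tuple, A))
    count = 0
    for a in pts:
        for b in pts:
            c = tuple((-(ai + bi)) % 3 for ai, bi in zip(a, b))
            if c in pts:
                count += 1
    return count


def cubic_sum(A, n):
    """Sum of A_hat(x)^3 over all x in F_3^n."""
    return sum(v ** 3 for v in fourier_transform(A, n).values())
```

For a cap set the only zero-sum triples are the diagonal ones $a = b = c$, so `cubic_sum` returns $3^{-2N}|A| = 3^{-N}\rho$, which is the quantity the argument compares with $\hat{A}(0)^3 = \rho^3$.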
In what follows, we prove this proposition. We consider

Summing first in $x$, we see that this expression yields $3^{-2N}$ multiplied by the number of solutions of the equation $a + b + c = 0$ with $a, b, c$ taken from $A$. Since we have assumed that $A$ is a cap set, the only solutions occur when $a = b = c$. Thus

However, we observe that $\hat{A}(0) = \rho$. Thus, given the size of $A$, we see that $\rho^3$ dominates $3^{-N}\rho$ and we conclude

However, following the proof of Plancherel's inequality, we see that

Summing first in $x$, we conclude

By the assumption that $A$ has no strong codimension 1 increment, we conclude

By selecting the implicit constant in the definition of $\Delta$ correctly, we see combining inequalities 3.1 and 3.2 that

Combining inequalities 3.3 and 3.4, we see that

Combining the definition of $\Delta$ with 3.2 we get

Now we prove a statement of the form "The spectrum $\Delta$ does not have too much intersection with any small subspace." Precisely:

Proposition 3.3. Let $A$ be a set without strong increments. Let $\Delta$ be the spectrum, as in Definition 3.1. Then for any subspace $W$ of $F_3^N$ having dimension $d \le \frac{N}{2}$, we have

Moreover for such a subspace $W$, we have the estimate

Here we see what the assumption of no higher codimension strong increments implies about the spectrum $\Delta$. Let $H$ be a subspace with codimension $d < \frac{N}{2}$; we let $V$ be a dimension $d$ subspace which is transverse to $H$ (i.e., $V + H = F_3^N$) and we let $W$ be the annihilator space of $H$. Then for any $w \neq 0 \in W$, we see that

Then we have

By getting an upper bound on the right hand side of equation 3.5, we can obtain an upper bound on $|\Delta \cap W|$, which is our goal.
To estimate the right hand side, we subdivide $V = V_+ \cup V_-$, where

We observe that since

We observe that the hypothesis that $A$ has no strong increments implies that for $v \in V_+$, we have the estimate

Thus simply using that $|V| = 3^d$, we get the estimates

Now for $v \in V_-$, we have the trivial estimate

In light of equation 3.6 and estimate 3.7 this yields

Combining inequalities 3.8 and 3.9 gives the estimate

Thus equation 3.5 gives

However, we recall that if $w \in \Delta$, we have that

Thus we get the desired estimate:

4 Additive structure in the spectrum of large cap sets

In this section we establish that the spectrum has some nontrivial additive structure. Specifically, we prove it has $N^{7-O(\epsilon)}$ additive quadruples.

Proposition 4.1. Let $A$ be a large cap set. Let $\Delta$ be the spectrum of $A$ and let $\Delta'$ be any symmetric subset of $\Delta$ with

Let $E_4(\Delta')$ be the number of additive quadruplets

The argument for a major subset $\Delta'$ of $\Delta$ is no different, so for convenience of notation we assume in fact $\Delta' = \Delta$.
We retain the notation of the previous section, considering $\Delta$ the spectrum of a large cap set $A$. In particular, we have $|A| \gg \frac{3^N}{N^{1+\epsilon}}$, we have

and we have that $\Delta$ is symmetric, namely $\Delta = -\Delta$.
From the lower bound on | A(x)|, we have for each x, an affine hyperplane H x , annihilated by x so that Summing over ∆, we obtain We wish to rewrite this as a double sum by introducing 1 Hx , the indicator function of H x .
x∈∆ y∈A We interchange the order of the sums: Now we apply Hölder's inequality: Taking everything to the fourth power and simplifying, we get Crudely expanding the sum, we get the apparently weaker inequality We can rewrite this as We claim the estimate 4.1 says that the spectrum ∆ has substantial additive structure.This will be demonstrated by the following proposition.Proposition 4.2.Let x 1 , x 2 , x 3 , and x 4 be non-zero elements of F N 3 .Then the expression Moreover it vanishes unless an equality of the form Proof.We introduce the Fourier transforms of the balanced function of the hyperplanes Then we use the standard Fourier identity We observe that f j (z j ) vanishes unless z j = ±x j .
The upper bound on the sum just follows from the triangle inequality.
To finish the proof of Proposition 4.1, we apply the inequality 4.1, the fact that $|A| \gg \frac{3^N}{N^{1+\epsilon}}$, Proposition 4.2, and the fact that the spectrum $\Delta$ is symmetric.

Random selection argument for additively smoothing spectrum
In this section we study the additive properties of random subsets of the spectrum.We will show that they typically have very poor additive structure.This will allow us to conclude that, although the spectrum has many 4-tuples, it cannot have too many 8-tuples.The significance of this will only be made clear in Section 6.
We defined $E_4(\Delta)$ to be the number of additive quadruplets in $\Delta$. We define $E_{2m}(\Delta)$ to be the number of additive $2m$-tuples
$$x_1 + x_2 + \dots + x_m = x_{m+1} + x_{m+2} + \dots + x_{2m}$$
such that $x_1, x_2, \dots, x_{2m} \in \Delta$. We let $\hat{\Delta}(x)$ be the Fourier transform:

which can be seen by expanding the sum on the right, as in the proof of Plancherel's theorem. We always have $E_2(\Delta) = |\Delta|$. When we have nontrivial amounts of additive structure in the sense that say $E_{2k}(\Delta) \gg |\Delta|^k$, we can lift this up to counts of higher-tuplets using Hölder's inequality. (We use the inequality to bound the $2k$-norm by the $2$-norm and the $2m$-norm.) We can view this process as a poor man's Plünnecke theorem. We record this result for high $E_4$ and high

Discussion of additive smoothing
We are now ready to introduce the notion of additive smoothing.We keep in mind two examples of kinds of sets having additive structure.One kind of set consists of a subspace plus a random set.The other consists of a random subset of a subspace.We think of the first kind of set as not being additively smoothing because as you add it to itself, its expansion rate stays essentially constant.This is the kind of example for which Lemma 5.1 is close to sharp.But the second kind of set, when added to itself, will quickly fill out the subspace and its rate of additive expansion will shrink dramatically.The lack of additive structure smooths out under addition.This is the kind of example for which Lemma 5.1 is far from sharp.
We will momentarily define $\Delta$ to be additively smoothing if $E_8(\Delta)$ is substantially larger than expected from Lemma 5.1 and our lower bound for $E_4(\Delta)$ obtained in Section 4. (Nonetheless, for the purposes of this paper, the gain in the exponent need only be $O(\epsilon)$.) We will define additive smoothing so that if $\Delta$ is additively smoothing then for some not very large $m$ (depending only on $\epsilon$), we may expect to find additive $m$-tuplets of $\Delta$ in a randomly chosen set $S$ of $d$ elements.
Before we formally define the property of additive smoothing, we illustrate how the calculation works in the case $\epsilon = 0$. In that case $d \sim N$, so that an element of $\Delta$ (which has size $N^3$) is chosen with probability $N^{-2}$. We have a lower bound of $N^7$ on $E_4(\Delta)$. Suppose that we can improve on the lower bound of $N^{15}$ for $E_8(\Delta)$ which we get from the first part of Lemma 5.1 and in fact

for some $\delta > 0$. Then from the second part of Lemma 5.1, we obtain the estimate

Thus there is some $m$ which depends only on $\delta$ so that

Thus the expected number of $2m$-tuplets in $S$ is $\gg 1$. We will formally define additive smoothing to achieve the same effect when $\epsilon$ is different from 0.

Nonsmoothing of the spectrum
In this subsection we make rigorous the arguments of the last subsection.

Definition 5.2. We define $\Delta$ to be additively smoothing if there is some $\sigma > 30\epsilon$ so that $E_8(\Delta) \gg N^{15+\sigma}$.
We are now in a position to state the main result of this section.Lemma 5.3.If ∆ is the spectrum of a large cap set without strong increments then ∆ is not additively smoothing.
We begin with a few comments about our proof strategy in this section. If $S$ is a "random" subset of $\Delta$, then we expect

Thus we can show that $E_{2m}(\Delta)$ is small by showing that $E_S$ is small for a typical (somewhat large) subset $S$ of $\Delta$.
Now we fix a particular number $d$ and consider random subsets of $\Delta$ of size $d$. We will take

with the explicit constant to be determined later. Our first goal is to prove that we expect this subset to span a space of dimension $d$. More precisely:

Definition 5.4. Let $S$ be a set of $d$ vectors $x_1, \dots, x_d \in F_3^N$. We say that the set $S$ has nullity $k$ if the dimension of the span of $S$ is $d - k$.
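The nullity of a concrete list of vectors can be computed by Gaussian elimination over $F_3$. The following is a small sketch of that computation, with vectors given as tuples of integers; the names `rank_mod3` and `nullity` are illustrative and not taken from the paper.

```python
def rank_mod3(vectors):
    """Rank over F_3 of a list of equal-length integer vectors."""
    rows = [[x % 3 for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        # Look for a pivot in this column among the rows not yet used.
        pivot = next(
            (r for r in range(rank, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # In F_3 both nonzero elements are their own inverses (1*1 = 2*2 = 1).
        inv = rows[rank][col]
        rows[rank] = [(inv * x) % 3 for x in rows[rank]]
        # Clear the pivot column in every other row.
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [(x - f * y) % 3 for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def nullity(vectors):
    """Nullity in the sense of Definition 5.4: d minus the dimension of the span."""
    return len(vectors) - rank_mod3(vectors)
```

A set of nullity 0 is linearly independent, which is the typical outcome that the random selection argument aims for.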
We will consider uniform random selections of sets of $d$ elements from $\Delta$. We can view these selections as $d$-fold repetitions of uniform selection without replacement. We will prove

Lemma 5.5. A random selection $S$ of size $d$ from the spectrum $\Delta$ has nullity at least $k$ with probability $\lesssim 2^{-k}$.
Proof. Once we have completed our first $m$ choices, our selections $x_1, \dots, x_m$ span a vector space $W_m$ with dimension no more than $m$. Thus $|\Delta \cap W_m| \lesssim m N^{1+2\epsilon}$ by Proposition 3.3. We choose the constant in 5.1 so that the probability that the $(m+1)$st element of $S$ lies in $W_m$ is bounded by $\frac{1}{d}$ for all $m \le d - 1$. Note that since $m \le d$, this probability is bounded by

Thus the probability that $S$ has nullity at least $k$ is bounded by the probability that, for $d$ independent events with probability $\frac{1}{d}$, at least $k$ occur. The probability that exactly $k$ events from $d$ independent events with probability $\frac{1}{d}$ occur is exactly
$$g(k, d) = \binom{d}{k} \left(\frac{1}{d}\right)^{k} \left(1 - \frac{1}{d}\right)^{d-k}.$$
The numbers $g(k, d)$ decrease by a factor of more than 2 as $k$ is increased by 1, as long as $k > 2$. This completes the proof of the lemma.
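The decay of the binomial probabilities used in this proof is easy to confirm numerically. The sketch below evaluates the binomial probability of exactly $k$ successes among $d$ independent trials of probability $\frac{1}{d}$, together with the tail probability of at least $k$ successes; the helper names are illustrative, and the constant 2 in the tail comparison is only what this toy check uses, not a constant from the paper.

```python
from math import comb


def g(k, d):
    """Probability that exactly k of d independent events,
    each of probability 1/d, occur."""
    p = 1.0 / d
    return comb(d, k) * p ** k * (1.0 - p) ** (d - k)


def tail(k, d):
    """Probability that at least k of the d events occur."""
    return sum(g(j, d) for j in range(k, d + 1))


def halves_each_step(d):
    """Check that g(k + 1, d) < g(k, d) / 2 for 2 <= k < d."""
    return all(g(k + 1, d) < 0.5 * g(k, d) for k in range(2, d))
```

The ratio $g(k+1, d)/g(k, d)$ equals $\frac{d-k}{(k+1)(d-1)}$, which is at most $\frac{1}{k+1}$, so the geometric decay behind the $2^{-k}$ bound is visible directly.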
Now that we know our random subset is likely to have full rank, we estimate the number of $2m$-tuples it contains in the case it does not have full rank. Given a set $S$ with nullity $k$ we will bound the number of possible additive $2m$-tuplets between elements of $S$. Specifically:

Lemma 5.6. A set $S$ of size $d$ and nullity $k$ has $E_{2m}(S) \le C_m k^{2m}$.
Proof. We write a list $E$ of all equations among elements of $S$ which involve $2m$ or fewer elements of $S$. Because the nullity is $k$, the span of these equations has dimension at most $k$. We pick a basis $B$ for $E$, and the equations in $B$ involve at most $2mk$ elements of $S$. Thus all of the equations of $E$ involve at most $2mk$ elements of $S$. Thus there are at most

additive $2m$-tuplets from $S$. We refer to $h(m, k)$ as the number of possible $m$-tuplets in $S$.
Note that h(m, k) is a polynomial of degree 2m in k.
Proof of Lemma 5.3. Now let $S$ be a random selection of $d$ elements from $\Delta$. Then by Lemma 5.5, the probability that $S$ has nullity $k$ is $\lesssim 2^{-k}$. Thus the expected value of the number of possible $2m$-tuples, obtained by weighting $h(m, k)$ by these probabilities and summing over $k \ge 0$, is $\lesssim_m \sum_{k \ge 0} 2^{-k} k^{2m} \lesssim 1$. Now we will show that we have defined additive smoothing so that the expected number of $2m$-tuples is $\gg 1$. This will give us a contradiction.
We know that $d \gtrsim N^{1-\epsilon}$. Thus our selection $S$ will be expected to have $\gg 1$ non-trivial $2m$-tuples whenever $E_{2m}(\Delta) \gg N^{4m+2m\epsilon}$. (We simply calculate the probability that an individual $2m$-tuple involves only elements of $S$.) Thus we may assume that

Using the fact that $|\Delta| \lesssim N^{3+3\epsilon}$ and the second part of Lemma 5.1, we get that

Choosing $m$ sufficiently large gives

The choice of $m$ and hence the constants depend on $\epsilon$ but not on $N$.
6 Structure of robust additively non-smoothing sets

In this section, the only properties of the spectrum $\Delta$ which we shall use are its size, its additive structure, and its non-additive smoothing. Consequently the results can be stated in somewhat more generality. We leave intact, however, the numerology coming from the case of spectrum of cap sets.
We will say that a symmetric set ∆ ⊂ F N 3 is a robust additively non-smoothing set of strength δ provided that we know its size: that we know how many additive quadruples can be made from any large subset of it, namely that if ∆ ′ ⊂ ∆ with |∆ ′ | ≥ 3 5 |∆| and ∆ ′ symmetric, we have and that we have additive non-smoothing, namely and moreover that for each element a ∈ ∆, there are at most N 4+δ quadruples of the form ±a ± b = ±c ± d with b, c, d ∈ ∆.
Let us pause to consider the case of $\Delta$, the spectrum of a cap set with no strong increments. We know that the number of $a \in \Delta$ participating in more than $N^{4+O(\epsilon)}$ quadruplets is smaller than $\frac{1}{10}|\Delta|$, since otherwise $\Delta$ would have more quadruples and hence more octuples than allowed by Lemma 5.3. Let $\Delta'$ be the remaining elements of $\Delta$. Note that by its definition $\Delta'$ is still symmetric. Note that any symmetric subset of $\Delta'$ containing at least three fifths of its elements must contain at least half the elements of $\Delta$. Thus from Proposition 3.2, Proposition 4.1, and Lemma 5.3 we know that:

Proposition 6.1. Let $\Delta$ be the spectrum of a large capset with no strong increments. There is a subset $\Delta'$ of $\Delta$ so that $\Delta'$ is a robust additively non-smoothing set of strength $O(\epsilon)$.
Returning to the setting of robust additively non-smoothing sets, we let, for the remainder of the section, the set ∆ be a robust additively non-smoothing set of strength δ.

Given a value
Given a robust additively non-smoothing set $\Delta$ of strength $\delta$, for each $\alpha$, we may define

By the dyadic pigeonhole principle, there is an $\alpha$ so that

Moreover, we know that no $a$ in $\Delta$ participates in more than $N^{4+\delta}$ quadruples. Thus no element $a$ in $\Delta$ participates in more than $N^{3-\alpha+\delta}$ pairs in $G_\alpha$. Thus there are at least

pairs in $G_\alpha$, by the large families principle (Lemma 2.3).
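Dyadic pigeonholing, used here and repeatedly below, is the observation that if positive quantities are grouped by dyadic scale, then some single scale carries at least a $1/(\text{number of scales})$ share of the total. The following generic sketch illustrates the principle only; it is not the specific selection of $\alpha$ made in the text, and the function name is hypothetical.

```python
from collections import defaultdict
from math import floor, log2


def dyadic_pigeonhole(weights):
    """Group positive weights into dyadic scales [2^j, 2^(j+1)) and
    return (j, members, number_of_scales) for the heaviest scale."""
    buckets = defaultdict(list)
    for w in weights:
        buckets[floor(log2(w))].append(w)
    # The heaviest bucket carries at least total / (number of buckets).
    j = max(buckets, key=lambda s: sum(buckets[s]))
    return j, buckets[j], len(buckets)
```

In the paper's setting the quantities are representation counts lying between 1 and a power of $N$, so there are only $O(\log N)$ scales and the loss from restricting to one scale is absorbed into factors of the form $N^{O(\mu)}$.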
We now forget about optimizing our exponents and consolidate this information in a single definition.Definition 6.2.We say that (∆, G, D) is an additive structure at height α with ambiguity η if the following hold.We have |∆| ≤ N 3+η . We Now examining the equation (6.4) and dyadically pigeonholing, we observe that we can find β so that there are at least N 9−2α−β−4η pairs (x, y) so that for each such pair, we have Definition 6.4.We say that the additive structure (∆, G, D) at height α and with ambiguity η has comity µ if we can find the abovementioned We remark that this lemma contains the key use of the nonsmoothing hypothesis, which is hidden in the definition of "additive structure".
Proof. We dyadically pigeonhole the equation (6.4) to find $\beta$ so that there is a set of at least $N^{9-2\alpha-\beta-4\eta}$ pairs $(x, y)$ so that for each such pair, we have

If it happens that $\beta - 1 > \alpha - \mu$, then we are done. Otherwise, we will construct an additive structure at height $\beta - 1$. We have two distinct upper bounds on the number of such differences. First, there are at most $N^{6-\beta+2\eta}$, since each difference is represented by $\sim N^{\beta}$ pairs in $\Delta \times \Delta$ and there are only $N^{6+2\eta}$ such pairs. The second estimate is that there are at most $N^{7-2\beta+O(\eta)}$ many such differences, because otherwise $E_4(\Delta)$ would be much larger than $N^7$, which would make $E_8(\Delta)$ larger than $N^{15+\eta}$. The first upper bound is most effective (ignoring ambiguity) when $\beta < 1$ while the second is most effective when $\beta > 1$. Our plan (modulo ambiguity) is that we shall rule out the case $\beta < 1$ and that we shall show that the second upper bound is tight up to a factor of $N^{O(\eta)}$. Both estimates will follow from the upper bound on $E_8(\Delta)$ and the Cauchy-Schwarz inequality, namely Lemma 2.1.
Since we have $N^{9-2\alpha-\beta-O(\eta)}$ pairs $(x, y)$ with at most $N^{6-\beta+O(\eta)}$ differences, by the Cauchy-Schwarz inequality, there must be at least $N^{12-4\alpha-\beta-O(\eta)}$ additive quadruples in $D$, namely $x - y = x' - y'$. (Here we let $S$ be the set of pairs $(x, y)$, we let $T$ be the set of differences with $\sim N^{\beta}$ representations as a difference of elements of $\Delta$, and we let $\rho$ be the difference map, $\rho(x, y) = x - y$. Then we can apply Lemma 2.1.) However, since each of the differences $x$, $y$ can be represented in $N^{1+\alpha}$ ways as a difference in $\Delta$, we can represent each quadruple in $D$ as an octuple in $\Delta$ in $N^{4+4\alpha}$ ways. Thus there are at least $N^{16-\beta-O(\eta)}$ many such octuples, which implies $\beta \ge 1 - O(\eta)$.
Thus we are in the regime where the estimate that there are at most $N^{7-2\beta+O(\eta)}$ many differences is most effective. Suppose that there were only $N^{7-2\beta-\gamma}$ many such differences with $\gamma \gg \eta$. Then applying Cauchy-Schwarz again, we would see that there are at least $N^{11-4\alpha-O(\eta)+\gamma}$ many quadruples in $D$, which implies $N^{15-O(\eta)+\gamma}$ octuples in $\Delta$, a contradiction with the nonsmoothing hypothesis in the definition of "additive structure".
Thus taking $D'$ to be the differences $x - y$ obtained from $(x, y)$ so that

and taking $G'$ to consist of representatives of these differences coming from the intersections (as in the second paragraph of this proof), we obtain an additive structure $(\Delta, G', D')$ with height $\beta - 1 < \alpha - \mu$ and ambiguity $O(\eta)$.

Corollary 6.6. Given an additive structure $(\Delta, G, D)$ at height $\alpha$ and with ambiguity $\eta$, there is an additive structure $(\Delta, G, D)$ at height $\alpha' \le \alpha$ with ambiguity $\mu$ and comity $\mu$, with $\mu \lesssim \frac{1}{\log \frac{1}{\eta}}$.
Proof. We iteratively apply Lemma 6.5 with comity $\mu$ fixed by $\mu = \frac{K}{\log \frac{1}{\eta}}$ with $K$ a large constant, and with the ambiguity increasing by a constant factor $C$ in each iteration. Since $\alpha$ decreases by $\mu$ each time we don't find comity, we need only $\sim \frac{1}{\mu}$ iterations to achieve comity. At this point, we have ambiguity given by $C^{\log(\frac{1}{\eta})/K} \eta \ll \mu$, as long as $K$ was chosen sufficiently large.

Now we begin to investigate what we can say about the shape of the set $H$ of all pairs $(b, c)$ in $\Delta \times \Delta$ having the property that $b - c$ has at least $N^{1+\alpha-O(\mu)}$ representations in $\Delta \times \Delta$, for $(\Delta, G, D)$ an additive structure with height $\alpha$ and ambiguity and comity $\mu$. We will find that the set $H$ is rather thick in a product set whose projection has size $N^{3-\alpha-O(\mu)}$.

Lemma 6.7. Let $(\Delta, G, D)$ be an additive structure with height $\alpha$ and ambiguity and comity $\mu$. Then there is a subset

so that for any $(b, c) \in H$, the difference $b - c$ has $\gtrsim N^{1+\alpha-O(\mu)}$ representations in $\Delta \times \Delta$.
Proof.From the hypotheses, we have that and that there are at least N 8−3α−O(µ) pairs (x, y) for which (µ) .
Using the pigeonhole principle, we fix one value of x for which there are Again using the pigeonhole principle, we find an a ∈ ∆ and a set Y ⊂ D so that a ∈ ∆ G [y] for every for every y ∈ Y , so that |Y | = N 3−α−O(µ) and so that for each y ∈ Y , we have We notice that by definition a − Y ⊂ ∆.We choose we have by Cauchy-Schwarz (Lemma 2.2) that (µ) .
This implies that B satisfies the conclusion of the lemma.The reason is that by Lemma 2.3, we have a set Ỹ of pairs y, y ′ so that (µ) .This implies that a − y ′ − (a − y) = y − y ′ has at least N 1+α−O(µ) representatives as a difference of two elements of ∆.
Now we are going to use Lemma 6.7 repeatedly to show that for any robust additively non-smoothing set of strength $\delta$ we can find an additive structure of ambiguity $\eta$ with $\eta \lesssim \frac{1}{\log \frac{1}{\delta}}$ which breaks into dense blocks.

Lemma 6.8. Let $\Delta$ be a robust additively non-smoothing set of strength $\delta$. Choose $\mu \sim \frac{1}{\log \frac{1}{\delta}}$. Then for some $0 \le \alpha \le 1$, there is an additive structure $(\Delta, G, D)$ of height $\alpha$ and ambiguity $\mu$ and disjoint subsets $B_1, \dots$
Note that since we are requiring that (∆, G, D) be an additive structure, this requires |G| Proof.Using Proposition 6.3 , Corollary 6.6, and Lemma 6.7.We can find a subset B 1 of ∆ so that for some choice of α, it has size at most N 3−α+O(µ) and nevertheless B 1 × B 1 contains at least N 6−2α−O(µ) pairs whose differences have N 1+α−O(µ) representations as differences in ∆.
Having done this, we use the robustness property of $\Delta$ to apply the same argument to $\Delta \setminus (B_1 \cup -B_1)$. We continue removing sets from $\Delta$ until we have exhausted half of $\Delta$. Now one difficulty is that the disjoint sets $B$ which we chose do not all have the same $\alpha$. We use dyadic pigeonholing to resolve this for only a small cost in the number of sets. We call these sets $B_1, \dots, B_K$. Now Lemma 6.7 guarantees us in each $B_j \times B_j$ a subset $H_j$ of cardinality at least $N^{6-2\alpha-O(\mu)}$ so that each difference in $H_j$ is represented in $\Delta \times \Delta$ at least $N^{1+\alpha-O(\mu)}$ times. We denote by $D_\alpha$ the set of differences represented in $\Delta \times \Delta$ at least $N^{1+\alpha-O(\mu)}$ times, and note that

lest there be enough quadruples in $\Delta$ to violate the additive non-smoothing condition. Using the large families principle and pigeonholing, we find some $\alpha' \gtrsim \alpha - O(\mu)$ so that at least $N^{5-2\alpha-O(\mu)}$ differences are represented at least $N^{1+\alpha'}$ many times in $\cup_j H_j$. We denote this set of differences as $D'_\alpha$. We let $D$ be $D'_\alpha$ and let $G$ be a subset of $\cup_j H_j$ consisting of $N^{1+\alpha'}$ representatives of each difference in $D$.
Our goal now will be to use Lemma 6.8 to find almost additively closed sets $E$ of size at least $N^{1-f(\delta)}$ inside robust additively non-smoothing sets of strength $\delta$.
Here $f : [0, 1] \to [0, \infty)$ is some function with $\lim_{t \to 0} f(t) = 0$. We will be employing such functions from now on in the paper. They, like constants, will change from line to line.
The project of finding additively closed sets will be easiest when we have additive structures of height essentially zero having ambiguity and comity $\mu$. For this reason, we are about to define a stylized structure which generalizes this situation. We will eventually use the generalized version, replacing $\Delta$ with the blocks $B_j$.
We will now define a $\mu$-full stylized $\rho$-structure which is $\tau$-energetic and has ambiguity and comity $\mu$. (The error exponents $\mu$ are all the same.) This will be a set $(\Delta', G, D)$ where (this was the $\mu$-fullness), where $D$ is the set of differences in pairs in $G$ and each difference is represented between $N^{\tau}$ and $N^{\tau+O(\mu)}$ times, hence $\tau$-energetic. Finally we assume that there are at least $N^{3\rho-\tau-O(\mu)}$ pairs $(x, y) \in D \times D$ so that which is of course the $\mu$-comity.
We shall say that a set $K$ is $\mu$-additively closed provided that as in Section 2.

Lemma 6.9. There is a function so that the following holds. Let $(\Delta', G, D)$ be a $\mu$-full stylized $\rho$-structure which is $\tau$-energetic and has ambiguity and comity $\mu$. Then there is an $f(\mu)$ and a set $X$ so that

Proof. We proceed essentially as in the proof of Lemma 6.7. We find $x \in D$ so that there is a set for every $y \in Y$. As before, we use the pigeonhole principle to find $a \in \Delta'$ so that there is a subset $Y_a$ of $Y$ so that for each $y \in Y_a$, we have and so that However $Y_a \subset a - \Delta'$. Thus we think of $Y_a$ as a dense part of a translate of $-\Delta'$. Now we know that This precisely means that there are

We are now prepared to state the main result of this section.

Theorem 6.10. Let $\Delta$ be a robust additively non-smoothing set of strength $\delta$. As before choose so that for some $\gamma \ge 0$, there is an $f(\mu)$-additively closed set $K$ with In the event that we must have $\gamma = O(f(\mu))$, for some $0 \le \alpha \le 1$, we may find pairwise disjoint subsets $B_1, \ldots, B_M \subset \Delta$ with $M \gtrsim N^{\alpha-O(\mu)}$ so that for each integer $1 \le j \le M$, we have and moreover we find for each $j$ a $\mu$-additively closed set $K_j$ with together with a set $X_j$ with Further, there is a set $D$ with $|D| \gtrsim N^{5-2\alpha-f(\mu)}$ so that each element of $D$ has at least $N^{1+\alpha-f(\mu)}$ representations as a difference of elements of $\Delta$ and so that for each $j$, the set of 4-tuples Moreover we may choose $K_j$ to be contained in the set of differences having at least $N^{5-2\alpha-f(\mu)}$ representations as a difference between elements of $D$.
Proof. We apply Lemma 6.8 and restrict our attention to where the $B_j$ are the blocks obtained there. Now to $G$, we apply the argument used in the proof of Lemma 6.5.
That is, for any element $x \in -(G)$, we study and we observe as before that there is some $1 + \alpha \ge \beta \ge 1 - O(\mu)$ so that there are at least We note that when this happens, for each $a$ in the intersection $\Delta$ Here if $a \in B_j$, we must have $b_1 \in B_j$ and $b_2 \in B_j$, since $(a, b_1), (a, b_2) \in G$. We argue as in the proof of Lemma 6.5 that there are at least $N^{7-2\beta-O(\mu)}$ differences having $N^{\beta}$ representations in Thus we have a set $H$ of pairs $x, y$ for which µ) and so that $|H| \gtrsim N^{8-3\alpha-O(\mu)}$. Now we use the pairwise disjointness of the blocks $B_j$ to write the identity

Now we begin to use the comity of $G$. We first eliminate from the second sum in equation 6.5 all terms for which the relative density of for too large a constant $C$, again using the large families principle. By choosing $C$ sufficiently large we do not reduce the sum by a factor of more than 2. We dyadically pigeonhole to obtain the largest possible sum from those terms where We denote this set of $(x, y)$ as $H_{\gamma,j}$. Thus we have reduced the sum by at most a factor of $\log N$. We keep only those $j$ for which which we can do without sacrificing much by using again the large families principle.
We observe that for each $x$, there are at most $N^{3-\alpha+O(\mu)}$ choices of $y$ so that $(x, y) \in H$. The reason is that any $a \in \Delta$ belongs to at most However if there were more than $N^{3-\alpha+O(\mu)}$ choices of $y$ so that $(x, y) \in H$ then there would be elements $a \in \Delta_G[x]$ which are contained in $\Delta_G[y]$ for more than $N^{3-\alpha+O(\mu)}$ choices of $y$. Thus, since we have at least while at the same time (by the relative density of $\Delta_G[y]$ in $\Delta_G[x] \cap B_j$) it must be that for most values of $j$, we have at least $N^{8-3\alpha-\gamma}$ pairs $(x, y) \in H_{\gamma,j}$. This means, fixing one such value of $j$ (since at most $N^{3-\alpha+O(\mu)}$ values of $y$ are paired with a given $x$), there are at least $N^{5-2\alpha-\gamma-O(\mu)}$ differences $x$ with $N^{1+\gamma}$ representations in $G \cap (B_j \times B_j)$. We call this set $D_{j,\gamma}$.
Thus $G \cap (B_j \times B_j)$ is $\mu$-full and $(1+\gamma)$-energetic. Another way of describing the $\mu$-fullness is that $N^{5-2\alpha-\gamma-O(\mu)}$ (up to $N^{O(\mu)}$ factors) is the largest number of such $x$ possible, purely based on the size of $B_j \times B_j$. Thus it must be that for a set of $N^{5-2\alpha-\gamma-O(\mu)}$ many such $x$, there are $N^{3-\alpha-O(\mu)}$ such $y$ with $(x, y) \in H_{\gamma,j}$ (yet again, by the large families principle).
) is a $(3-\alpha)$-structure with ambiguity $\mu$. Thus we are in a position to apply Lemma 6.9. This proves the first part of the theorem. Indeed, since all our estimates were optimal up to $N^{O(\mu)}$ factors, there is a set $J$ of choices of $j$ for which we could apply Lemma 6.9 with $|J| \gtrsim N^{\alpha-O(\mu)}$.
To prove the second part, we consider in detail the case $\gamma = 0$. We will apply the argument proving Lemma 6.9 to all $j \in J$. This will give us $\mu$-additively closed sets $K_j$ and sets $X_j$ with appropriate upper and lower bounds, since we can assume $\Delta$ contains no $\mu$-additively closed sets with more than $N^{1+f(\mu)}$ elements.
We will allow f to vary from line to line and we will express even quantities that are clearly O(µ) as f (µ).
We let $D$ be the set of all differences $x$ for which $|\Delta[x] \cap B_j| \gtrsim N^{1-f(\mu)}$ for at least $N^{\alpha-f(\mu)}$ values of $j \in J$. For each value of $j \in J$, there are at least $N^{8-3\alpha-f(\mu)}$ pairs $(x, y)$ µ). Note that we may also restrict $D$ to differences which cannot be represented in more than $N^{1+f(\mu)}$ ways as differences of elements of $B_j$ for more than $N^{\alpha-f(\mu)}$ values of $j$, and so that in each $B_j$ our count of good pairs $(x, y)$ consists only of pairs of differences which cannot be represented as differences in $B_j$ in more than $N^{1+f(\mu)}$ ways. Otherwise, we could choose $\gamma > f(\mu)$.

Now we recall the structure of the argument in Lemma 6.9. We chose an $a \in B_j$ and a set $B_a$ of size $N^{3-\alpha-f(\mu)}$ of the differences in which $a$ participates, and a set $K_a$ which is actually of the form $\Delta_G[x] \cap B_j$ and has size $N^{1-f(\mu)}$. We find $N^{5-\alpha-f(\mu)}$ additive quadruples made up of two elements of $B_a$ and two elements of $K_a$. We may strip down $K_a$ further to those elements which participate in at least $N^{4-\alpha-f(\mu)}$ of these quadruples and not harm our estimate on the number of quadruples between $K_a$ and $B_a$.

Now, we note that since $B_a$ is a large subset of a translate of $B_j$, it must be that there are $N^{2-f(\mu)}$ pairs $(q_1, q_2) \in K_a^2$ with the property that $q_1 - q_2$ is represented $N^{3-\alpha-f(\mu)}$ times as a difference of elements of $B_j$. We let $K_{1,j}$ be the set of differences of $B_j$ that can be represented in $N^{3-\alpha-f(\mu)}$ ways as differences in $B_j$. Because $B_j$ contains no $\mu$-additively closed set of size more than $N^{1+f(\mu)}$, we have that $|K_{1,j}| \le N^{1+f(\mu)}$. Otherwise we could apply the asymmetric Balog-Szemerédi-Gowers theorem to obtain a $\mu$-additively closed set, contained in $B_j$, as in the $\gamma \gg O(f(\mu))$ case.
We can replace $N^{5-\alpha-f(\mu)}$ quadruples $q_1 - q_2 = x_1 - x_2$ with $q_1, q_2 \in K_{1,j}$ and $x_1, x_2 \in B_a$ by an equation of the form $q = x_1 - x_2$ with multiplicity at most $N^{1+f(\mu)}$. Thus we obtain at least $N^{4-\alpha-f(\mu)}$ such equations. We rewrite the equation as $x_1 - q = x_2$ and use Cauchy-Schwarz to obtain $N^{5-\alpha-f(\mu)}$ quadruples $x_1 - q = x_1' - q'$. We see then that without losing more than $N^{f(\mu)}$ factors in our estimates, we can replace $K_a$ by $K_{1,j}$. This is good since we have made it independent of the choice of $a$. Now we need only show that we can find many quadruples not only between $K_{1,j}$ and $B_a$ but between $K_{1,j}$ and $D$. This will give us the desired result.
To do this, we observe that we may delete from $B_a$ a set with relative density $N^{-f(\mu)}$ without harming our estimates on the number of quadruples between $B_a$ and $K_{1,j}$. Our goal will be to cover a subset $D'$ by a disjoint union of subsets of the form $B_a'$ where $B_a'$ is a subset of $B_a$ with relative density $1 - N^{-f(\mu)}$. To do this we observe that for any fixed We can do this because the sum counts triples $(x, a_1, a_2)$ with $x$ a difference and $a_1, a_2$ parts of representations of it. We have assumed that we are only dealing with differences with fewer than $N^{1+O(\mu)}$ representations. Here we are using that we are in the $\gamma = 0$ case.

Now we produce $D'$ as follows. We choose $a_1$ and keep the set $B_{a_1}$. We add all elements of $B_{a_1}$ to $D'$. We choose $B_{a_2}$ to have the minimal possible sized intersection with $B_{a_1}$ and let $B_{a_2}'$ be those elements in $B_{a_2}$ that are not already in $D'$. We choose $B_{a_3}$ to have minimal possible intersection with $B_{a_1} \cup B_{a_2}$. We continue in this way until we reach a $k$ so that $B_{a_k}'$ no longer has relative density $1 - N^{-f(\mu)}$ in $B_{a_k}$. Because the average intersection $|B_a \cap B_{a'}|$ is bounded by $N^{1+f(\mu)}$, we get that $k$ is at least $N^{2-\alpha-f(\mu)}$. Thus our set $D'$ has relative density at least $N^{-f(\mu)}$ in $D$. Thus we have that $K_{1,j}$ has $N^{7-2\alpha-f(\mu)}$ quadruples with $D'$ and a fortiori with $D$.

Now we slightly refine $K_{1,j}$ to $K_{2,j}$, consisting only of differences of elements of $K_{1,j}$ which participate in at least $N^{5-\alpha-f(\mu)}$ of the quadruples with $D'$. (We perform this refinement in order to prove the very last claim of the theorem.) Since $D'$ is a disjoint union of sets with relative density $N^{-f(\mu)}$ inside translates of $B_j$, it must be that $K_{2,j}$ still has $N^{5-\alpha-f(\mu)}$ quadruples with $B_j$. Thus we can apply the asymmetric Balog-Szemerédi-Gowers theorem to find a subset $K_j$ satisfying the conclusions of the theorem.
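The greedy construction of $D'$ described above can be sketched as follows. This is an illustrative toy version only: the function name `greedy_disjoint_cover`, the parameter `min_new_fraction` (standing in for the relative density threshold $1 - N^{-f(\mu)}$), and the sample sets are assumptions made for the example, not objects from the paper.

```python
def greedy_disjoint_cover(sets, min_new_fraction):
    """Greedily pick sets having minimal overlap with what is already covered.

    sets: dict label -> set of elements (stand-ins for the sets B_a).
    min_new_fraction: stop once the best remaining set would contribute
        fewer than this fraction of its own elements as new elements.
    Returns (pieces, covered): pieces maps each chosen label to the new
    part B'_a it contributes (these parts are pairwise disjoint), and
    covered is their union (a stand-in for D').
    """
    covered = set()
    pieces = {}
    remaining = dict(sets)
    while remaining:
        # Choose the set with the smallest intersection with the union so far.
        label = min(remaining, key=lambda a: len(remaining[a] & covered))
        candidate = remaining.pop(label)
        new_part = candidate - covered
        if not candidate or len(new_part) < min_new_fraction * len(candidate):
            break  # the new part is no longer dense enough in its set
        pieces[label] = new_part
        covered |= new_part
    return pieces, covered


# Hypothetical toy family of sets.
family = {
    "a1": {1, 2, 3, 4},
    "a2": {3, 4, 5, 6},
    "a3": {1, 2, 3, 5},
    "a4": {7, 8, 9, 10},
}
pieces, d_prime = greedy_disjoint_cover(family, 0.5)
```

In the proof, the bound on the average intersection $|B_a \cap B_{a'}|$ is what guarantees that this loop runs for many steps before the density threshold fails; the sketch does not attempt to model that estimate.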

Structure of spectrum of large capsets with no strong increments
In this section, we transfer the results obtained in Theorem 6.10 over to the setting of the spectrum of large capsets with no strong increments. This turns out to be rather simple.
The main ideas which we have not yet taken advantage of are Freiman's theorem and the use of the estimate in Proposition 3.3, which bounds the number of elements of the spectrum in a subspace of dimension $d$ by $dN^{1+2\epsilon}$.
We state the main result of the section.

Theorem 7.1. Let $\Delta$ be the spectrum of a large capset without strong increments. There is a function $f : [0, 1] \to [0, \infty]$ with $\lim_{t \to 0} f(t) = 0$ so that the following holds. There is a subspace $H$ of $F_3^N$ of dimension $N^{f(\epsilon)}$ and a set $\Lambda \subset F_3^N$ of size $N^{2-f(\epsilon)}$ so that for each element $\lambda \in \Lambda$, there is a subset $H_\lambda \subset H$ with the properties that and the sets $\lambda + H_\lambda$ are pairwise disjoint subsets of $\Delta$. For any subspace $W \subset F_3^N$ of dimension $d$, we have that $W$ contains at most $dN^{f(\epsilon)}$ elements of $\Lambda$.
Proof. As before we allow our function $f$ to vary from line to line until we achieve the desired result.
In light of Theorem 2.5 and Proposition 3.3, any $f(\epsilon)$-additively closed set in which the spectrum has $N^{-f(\epsilon)}$ relative density must be bounded in size by $N^{1+f(\epsilon)}$. Therefore, we are in the $\gamma = 0$ case of Theorem 6.10. We know that there is a set $\Delta'$ of density $N^{-f(\epsilon)}$ in the spectrum $\Delta$ which is contained in , with each $K_j$ an $f(\epsilon)$-additively closed set (of size at least $N^{1-f(\epsilon)}$ and at most $N^{1+f(\epsilon)}$) and with each set $X_j$ of size $N^{2-\alpha \pm f(\epsilon)}$. Moreover, each set $K_j$ lies in the set $K$ of differences having at least $N^{5-2\alpha-f(\epsilon)}$ representations as differences of elements of $D$, the differences among elements of the spectrum which have at least $N^{1+\alpha-f(\epsilon)}$ representations. In light of the additive non-smoothing property of $\Delta$, we have that $|K| \lesssim N^{1+f(\epsilon)}$ since there can be at most $N^{11-4\alpha+f(\epsilon)}$ quadruples among elements of $D$. We may eliminate all elements $q$ of each $K_j$ for which $\Delta \cap (q + X_j)$ does not have size at least $N^{2-\alpha-f(\epsilon)}$.

Now we let $K'$ be the set of elements of $K$ which appear in at least $N^{\alpha-f(\epsilon)}$ many $K_j$. We can find some $K_j$ which has intersection of size $N^{1-f(\epsilon)}$ with $K'$. Then $K' \cap K_j = K''$ is an $f(\epsilon)$-additively closed set with cardinality at least $N^{1-f(\epsilon)}$. Moreover each element of $K''$ is contained in $N^{\alpha-f(\epsilon)}$ many $K_j$. Thus by pigeonholing there are at least $N^{\alpha-f(\epsilon)}$ many $K_j$ so that $|K_j \cap K''| \gtrsim N^{1-f(\epsilon)}$. We only keep these $j$ and replace $K_j$ by $K_j \cap K''$. But by Theorem 2.5, we have that $K''$ is contained in a subspace of dimension $N^{f(\epsilon)}$ which we call $H$. This basically proves the first part of the theorem.

We have that a subset of the spectrum of density $N^{-f(\epsilon)}$ is contained in $K'' + X$, where $X$ is the union of the $X_j$'s. We will pick $\Lambda$ and the sets $H_\lambda$ as follows. Find $x_1$ in $X$ so that at least $N^{1-f(\epsilon)}$ elements of $\Delta$ are contained in $x_1 + K''$. Let $\Delta_1$ be the elements of $\Delta$ contained in $X + K''$ but not in $x_1 + K''$. Let $\Delta^1$ be those elements of $\Delta$ contained in $x_1 + K''$ and let $H_{x_1} = \Delta^1 - x_1$. Note that $H_{x_1}$ is contained in $K''$ and therefore in $H$.

Now we proceed iteratively. Find $x_j \in X$ so that there are at least $N^{1-f(\epsilon)}$ elements of $\Delta_{j-1}$ in $x_j + K''$. Then we let $\Delta_j$ be the elements of $\Delta_{j-1}$ not in $x_j + K''$ and we let $\Delta^j$ be the ones that are. We let $H_{x_j} = \Delta^j - x_j$. When it is no longer possible to find such an $x_j$, we terminate the process and let $\Lambda = \{x_j\}$. Note that if for all remaining $x_j$ in $X$, then

To prove the second part of the theorem, let $S$ be any subset of $\Lambda$ with some cardinality $M$. Then $S + H$ contains at least $MN^{1-f(\epsilon)}$ elements of $\Delta$. This contradicts Proposition 3.3 unless the span of $S$ has dimension at least $MN^{-f(\epsilon)}$, which is the bound claimed for subspaces $W$ in the statement.
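The iterative choice of $\Lambda$ and the sets $H_\lambda$ can be sketched in a toy setting. The code below is only an illustration under stated assumptions: vectors of $F_3^n$ are modeled as tuples of integers mod 3, and the names `select_translates`, `delta`, `X`, `K`, and `threshold` (standing in for $\Delta$, $X$, $K''$, and $N^{1-f(\epsilon)}$) are hypothetical.

```python
def sub(u, v):
    """Coordinatewise difference in F_3^n, with vectors given as tuples."""
    return tuple((a - b) % 3 for a, b in zip(u, v))


def add(u, v):
    """Coordinatewise sum in F_3^n."""
    return tuple((a + b) % 3 for a, b in zip(u, v))


def select_translates(delta, X, K, threshold):
    """Iteratively pick x in X whose translate x + K captures many
    not-yet-captured elements of delta.

    Returns a dict x -> H_x, where x + H_x is the set of elements of
    delta captured at the step when x was chosen. By construction
    H_x is a subset of K and the translates x + H_x are pairwise disjoint.
    """
    remaining = set(delta)
    chosen = {}
    while True:
        pick = None
        for x in X:
            if x in chosen:
                continue
            captured = {d for d in remaining if sub(d, x) in K}
            if len(captured) >= threshold:
                pick = (x, captured)
                break
        if pick is None:
            return chosen  # no remaining x captures enough elements
        x, captured = pick
        chosen[x] = {sub(d, x) for d in captured}
        remaining -= captured


# Hypothetical toy data in F_3^2.
K = {(0, 0), (1, 0), (2, 0)}
X = [(0, 0), (0, 1), (0, 2)]
delta = {(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (0, 2)}
structure = select_translates(delta, X, K, threshold=2)
```

Removing the captured elements before the next step is what makes the translates $\lambda + H_\lambda$ pairwise disjoint, mirroring the passage from $\Delta_{j-1}$ to $\Delta_j$ in the proof.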

Contradiction
The goal of this section is to obtain a contradiction from the existence of large capsets without strong increments by using the result of Theorem 7.1.We begin by recording some easy consequences of Plancherel's identity for the interaction between the Fourier transform of the characteristic function of a set and the Fourier transforms of its fibers over a subspace.
For any set $A \subset F_3^N$, we define its Fourier transform We state Plancherel's identity: Proposition 8.1.
We let $H$ be a subspace of $F_3^N$ and we let $H^\perp$ be its annihilator. We let $V$ be a subspace of the same dimension as $H$ which is transverse to If we have $h \in H$, then Thus we arrive at another form of Plancherel: Proposition 8.2.
Next, we consider the situation where we have a subspace $H \subset F_3^N$ and a larger subspace $K$ with $H \subset K \subset F_3^N$. We let $V$ be a subspace transverse to $H^\perp$ as before and we would like to consider the Fourier transforms of the fibers of $A$, namely the sets $A_{H,v}$. We can think of each fiber $A_{H,v}$ as being identified with a subset of $H^\perp$ (by translation by $v$) and of course $H^\perp$ can be identified with $F^N$ The function $\hat{A}_{H,v}(w)$ is well defined on $F_3^N \backslash H$ since $a - v$ is in $H^\perp$. Next, we write down a version of Proposition 8.2 which relates the $L^2$ norms of the Fourier transforms of the fibers on $K \backslash H$ to the $L^2$ norm of the Fourier transform on $K$. We let $W$ be a subspace transverse to $K^\perp$ with $V \subset W$.

Proposition 8.3. With $H$, $K$, $V$, and $W$ as above,

Proof. Since clearly we have $K^\perp \subset H^\perp$, there is a unique subspace $W' \subset W$ with $W' + K^\perp = H^\perp$. We have $V + W' = W$.
We consider the following function on $W$, Clearly, in light of the second part of Proposition 8.2, the left hand side of the identity we are trying to prove is the normalized square of the $L^2$ norm of the function $g(w)$.

We now break up $g$ as the sum of a function $g_0$ which is constant on translates of $W'$ and functions $g_v$ with $v$ running over $V$ having mean zero and supported on the translate $v + W'$ of $W'$. Clearly the functions $g_0$ and $\{g_v\}$ are pairwise orthogonal. The first term on the right hand side of the identity is the normalized square $L^2$ norm of the function $g_0$. The second term on the right hand side represents the sum over $v$ of the normalized square $L^2$ norm of the functions $g_v$. The identity is then an application of the Pythagorean theorem.
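As a numerical sanity check of the Plancherel identity underlying the propositions of this section, the following minimal sketch computes the Fourier transform of a small set in $F_3^n$ by brute force. The normalization used here, $\hat{A}(\xi) = \sum_{x \in A} \omega^{x \cdot \xi}$ with $\omega = e^{2\pi i/3}$ and no averaging factor, is an assumption made for the example and may differ from the paper's convention by a power of 3; the function name and the sample set are hypothetical.

```python
import cmath
from itertools import product


def fourier_transform(A, n):
    """Unnormalized Fourier transform of the indicator of A in F_3^n.

    A: iterable of length-n tuples with entries in {0, 1, 2}.
    Returns a dict xi -> sum over x in A of omega ** (x . xi), where
    omega is a primitive cube root of unity (assumed normalization).
    """
    omega = cmath.exp(2j * cmath.pi / 3)
    powers = [1, omega, omega * omega]
    points = list(A)
    hat = {}
    for xi in product(range(3), repeat=n):
        hat[xi] = sum(
            powers[sum(a * b for a, b in zip(x, xi)) % 3] for x in points
        )
    return hat


# Hypothetical toy set in F_3^3.
A = {(0, 0, 1), (1, 2, 0), (2, 2, 2), (0, 1, 1)}
hat = fourier_transform(A, 3)
energy = sum(abs(value) ** 2 for value in hat.values())
# With this normalization Plancherel reads: energy == 3**n * |A|.
```

With this convention, the sum of $|\hat{A}(\xi)|^2$ over all frequencies equals $3^n |A|$, and the value at $\xi = 0$ is $|A|$.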
(We remark that Proposition 8.3 can simply be thought of as Plancherel for a "local Fourier transform" of $A$. Here, we localize to the translates of $H^\perp$.)

Now we are prepared to apply Proposition 8.3 to the setting in which $A$ is a large capset without strong increments and $H$ is the subspace given to us by Theorem 7.1. We let $A$ be a large capset with no strong increments. As usual, $f$ will be a function taking $[0, 1]$ to $[0, \infty]$ with $\lim_{t \to 0} f(t) = 0$. We will vary $f$ from line to line.
Then there is a subspace $H$ with dimension $N^{f(\epsilon)}$ and a set $\Lambda$ of size $N^{2-f(\epsilon)}$ so that for each $\lambda \in \Lambda$ there is a subset $H_\lambda$ of $H$ with $|H_\lambda| > N^{1-f(\epsilon)}$, so that for each $h \in H_\lambda$, we have that We also have that the sets $\lambda + H_\lambda$ are pairwise disjoint.
Note therefore that Thus the structured elements of the spectrum of A account for a large proportion of the squared L 2 norm of the Fourier transform of A.
Now we would like to consider the fibers A H,v , where H is the subspace we've been discussing.Because the capset A has no strong increments, we know that for each value v, we have )(1 + N f (ǫ)−1 ).
We will now momentarily fix the function $f$. However, we don't have a good lower bound on $|A_{H,v}|$ in general. All we know is that the sum of all the positive increments is equal to the sum of all the negative increments. (See the proof of Proposition 3.3.) We let $V_{bad}$ be the set of all $v \in V$ for which (That is, $V_{bad}$ is the set of those $v \in V$ for which the fiber has a bad negative increment.) We know that We define and we let $A' = A \setminus A_{bad}$.
We know that Thus removing A bad does not perturb the large spectrum of A too much.
(In making this precise, we now resume changing the function $f$ from line to line, observing that we may take the next $f$ to be larger than the previous $\sqrt{f}$.) We may find a set $\Lambda'$ which is a subset of $\Lambda$ with $|\Lambda'| \gtrsim N^{2-f(\epsilon)}$ and so that for each $\lambda \in \Lambda'$ there is a subset $H'_\lambda$ of $H$ so that for each $\lambda \in \Lambda'$ and each $h \in H'_\lambda$ we have Thus from the point of view of the structure of the spectrum, we have that $A'$ is essentially as good as $A$. However, the set $A'$ has a big advantage over $A$ in that we have good bounds on the Fourier transform of its fibers. This is because the fibers are either empty or close to size $\frac{|A|}{|H|}$. Empty fibers achieve no increments. On the other hand, fibers which are close to average cannot have an increment too large, or else the set $A$ will have a strong increment on a translate of a codimension 1 subspace of $H^\perp$. Precisely, the estimate we get is ε). (8.1)

have $G \subset \Delta \times \Delta$ with the property that for each $(a, b) \in G$ we have that $a - b \in D$, and so that each $d \in D$ has $\sim N^{1+\alpha}$ representations as a difference of a pair in $G$. We have $|G| \sim N^{6-\alpha-\eta}$. Moreover there are at least $N^{3-\eta}$ elements of $\Delta$ participating in at least $N^{3-\alpha-\eta}$ sums each. Finally there are no more than $N^{15+\eta}$ additive octuples among elements of $\Delta$. We summarize what we have shown so far in a proposition.

Proposition 6.3. Given a robust additively non-smoothing set $\Delta$ of strength $\delta$ we may find $G \subset \Delta \times \Delta$, and $D \subset \Delta - \Delta$ and $\alpha \ge 0$ so that $(\Delta, G, D)$ is an additive structure at height $\alpha$ and ambiguity $O(\delta)$.

We now describe a slightly deeper property of additive structures at height $\alpha$ and ambiguity $\eta$. Given a structure $(\Delta, G, D)$, for each $x \in D$, we define the set $\Delta_G[x]$ to be the set of $a \in \Delta$ so that there exists $b \in \Delta$ with $(a, b) \in G$ and $a - b = x$. In light of our definitions, we have for each $x \in D$ that $|\Delta_G[x]| \sim N^{1+\alpha}$. We consider the quantity

$K(\Delta, G, D) = \sum_{x \in D} \sum_{y \in D} |\Delta_G[x] \cap \Delta_G[y]|.$ (6.4)

Clearly $K(\Delta, G, D)$ counts the number of triples $(a, x, y)$ with $a \in \Delta_G[x]$ and $a \in \Delta_G[y]$. Each element $a$ is contained in exactly as many sets $\Delta_G[x]$ as it participates (in the first position) in pairs in $G$. Thus we conclude that for $(\Delta, G, D)$ an additive structure with height $\alpha$ and ambiguity $\eta$ that

Lemma 6.5. Given an additive structure $(\Delta, G, D)$ at height $\alpha$ and with ambiguity $\eta$, either it has comity $\mu$ or there is an additive structure $(\Delta, G', D')$ with height $\beta - 1 < \alpha - \mu$ and ambiguity $O(\eta)$.

Now any time that $a \in \Delta_G[x] \cap \Delta_G[y]$, this means that we can write $x = a - b$ and $y = a - c$. Thus we have $x - y = c - b$. We have between $N^{\beta}$ and $2N^{\beta}$ representations of the difference $x - y$. It remains to determine how many such differences there are.
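The double-counting behind the quantity $K(\Delta, G, D)$ can be checked on a toy example. The sketch below assumes vectors of $F_3^n$ are modeled as tuples mod 3 and takes $D$ to be exactly the set of differences of pairs in $G$; the function name `comity_count` and the sample pairs are hypothetical.

```python
from collections import defaultdict


def sub(u, v):
    """Coordinatewise difference in F_3^n, with vectors given as tuples."""
    return tuple((a - b) % 3 for a, b in zip(u, v))


def comity_count(G):
    """Compute K(Delta, G, D) in two ways for a set G of pairs (a, b).

    Here D is taken to be the set of differences a - b over pairs in G,
    and fibers[x] plays the role of Delta_G[x].
    Returns (k_direct, k_by_degree).
    """
    fibers = defaultdict(set)
    for a, b in G:
        fibers[sub(a, b)].add(a)
    # Direct evaluation of the double sum over x and y in D.
    k_direct = sum(len(fibers[x] & fibers[y]) for x in fibers for y in fibers)
    # Each a lies in as many fibers as it has pairs (a, b) in G, so the
    # triples (a, x, y) are counted by summing squares of these degrees.
    degree = defaultdict(int)
    for members in fibers.values():
        for a in members:
            degree[a] += 1
    k_by_degree = sum(d * d for d in degree.values())
    return k_direct, k_by_degree


# Hypothetical toy pairs in F_3^2.
G = {
    ((0, 0), (1, 0)),
    ((0, 0), (0, 1)),
    ((1, 1), (0, 1)),
    ((1, 1), (1, 0)),
    ((2, 0), (0, 0)),
}
k_direct, k_by_degree = comity_count(G)
```

The two values agree because, for a fixed $a$, the map $b \mapsto a - b$ is injective, so the number of sets $\Delta_G[x]$ containing $a$ equals the number of pairs of $G$ with $a$ in the first position.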