A new proof of Sárközy's theorem

It is a striking and elegant fact (proved independently by Furstenberg and Sárközy) that any subset of the natural numbers of positive upper density necessarily contains two distinct elements whose difference is a perfect square. In this article we present a new and simple proof of this result by adapting an argument originally developed by Croot and Sisask to give a new proof of Roth's theorem.


Introduction
Let D(N) denote the maximum size of a subset of {1, ..., N} that contains no non-zero perfect square differences. In other words, D(N) is the threshold such that if A ⊆ {1, ..., N} with |A| > D(N), then A necessarily contains two distinct elements whose difference is a perfect square.
In this note we shall be concerned with the behavior of this quantity for large values of N. At the outset we encourage the reader to convince herself of the essentially trivial upper and lower bounds for D(N) of approximate quality N/4 and √N respectively, and furthermore that any improvement on these bounds would be less than trivial to achieve. In Appendix B we give full justification for the following specific bounds:
(1) √N − 1 ≤ D(N) ≤ (N + 543443)/4.
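For very small N the quantity D(N) can be found by exhaustive search, which is a quick way to get a feel for the lower bound in (1). The following is a minimal sketch in Python; the function names `has_square_difference` and `max_square_free_size` are illustrative and not part of the original note, and the backtracking search is only practical for small N.

```python
from math import isqrt


def has_square_difference(candidate, chosen):
    """Return True if candidate - x is a non-zero perfect square for some x in chosen."""
    for x in chosen:
        d = candidate - x
        if d > 0 and isqrt(d) ** 2 == d:
            return True
    return False


def max_square_free_size(n_max):
    """Brute-force D(n_max): the largest subset of {1, ..., n_max} with no square differences."""
    best = 0

    def extend(next_value, chosen):
        nonlocal best
        best = max(best, len(chosen))
        for v in range(next_value, n_max + 1):
            # Prune: even taking every remaining value cannot beat the current best.
            if len(chosen) + (n_max - v + 1) <= best:
                break
            # Elements are added in increasing order, so v exceeds everything in chosen.
            if not has_square_difference(v, chosen):
                chosen.append(v)
                extend(v + 1, chosen)
                chosen.pop()

    extend(1, [])
    return best
```

For example, `[max_square_free_size(n) for n in range(1, 9)]` evaluates to `[1, 1, 2, 2, 2, 3, 3, 4]`; the value 4 for N = 8 is attained by the set {1, 3, 6, 8}.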
It was conjectured by Lovász that D(N) ≤ δN for any δ > 0, provided that N is sufficiently large, or equivalently that any subset of the natural numbers of positive upper density necessarily contains two distinct elements (and hence infinitely many pairs of distinct elements) whose difference is a perfect square. This conjecture was subsequently proven, independently, by Sárközy and Furstenberg.
Theorem 1 (Sárközy [17]/Furstenberg [3]).
The purpose of this note is to give a new and simple proof of this result by adapting an argument that was originally developed by Croot and Sisask [2] to give a new proof of Roth's theorem on three term arithmetic progressions. In particular we will establish the following result, which clearly implies Theorem 1.
Remark on quantitative bounds. Although its proof is simple, Theorem 2 patently leads to quantitative upper bounds of the quality N/log* N for D(N) that are extremely weak in comparison to the current best known upper bound, namely (2) D(N) ≤ CN/(log N) leading to intermediate bounds of the quality N/(log log N)^{1/11} and N log log N/log N, see Green [4] and Lyall and Magyar [13], respectively.
We further note that it is conjectured that D(N) ≥ N^{1−ε} for any ε > 0, provided N is sufficiently large (with respect to ε), and that Ruzsa [16] has demonstrated this conjecture to be true for all ε ≥ 0.267.
Remark on other polynomial differences. At this point the reader is presumably curious to know what is so special about square differences. The following Theorem gives a complete answer to this question.
Theorem 3 (Kamae and Mendès France [8]). Let f ∈ Z[n] and let D_f(N) denote the maximum size of a subset of {1, ..., N} that contains no two distinct elements whose difference is given by f(n) for some n ∈ Z. Then, if and only if f is an intersective polynomial, namely if f has a root modulo q for every q ≥ 2.
The approach of Kamae and Mendès France in [8] was indirect and gave no quantitative bounds for D_f(N), and while the methods of Pintz, Steiger and Szemerédi were later extended by Balog, Pelikán, Pintz and Szemerédi [1] to establish the quantitative bounds for any integer k ≥ 2, the current best known upper bounds for general intersective polynomials f ∈ Z[n] are due to Lucier [9], who showed that where k = deg(f) and µ = 3 if k = 2 and µ = 2 if k ≥ 3. However, bounds of the same quality as (4) have recently been obtained for general intersective quadratic polynomials by Hamel, Lyall and Rice in [7].
The methods used to prove Theorem 2 can in fact be extended, using some (rather technical) additional results of Lucier, to also establish Theorem 3; these arguments will appear elsewhere.
A result almost as general as Theorem 3, namely that (3) holds whenever f is a polynomial in Z[n] with at least one integer root, follows in a more straightforward manner using the same methods as those used in the proof of Theorem 2; see [10] (a preliminary version of this paper) for a brief outline of how to extend the proof of Theorem 2 in this direction. For the current best known upper bounds for this class of polynomials see [11] and [12].
In this note we shall focus exclusively on the case of square differences and proving Theorem 2.

Proof of Theorem 2
Let A ⊆ {1, ..., N} with no square differences and |A| = D(N). Key to the argument we present is to construct, from this extremal set A, a new set B ⊆ {1, ..., N} with the following properties:
This construction, which will amount to defining B to be A ∪ (A + t^2) for some appropriate (large) value of t, will be carried out in Section 2.2 below. Having constructed a set with such properties we will then establish Theorem 2 by combining this with the following lower bound on the number of square differences contained in any given set B ⊆ {1, ..., N}.
The proof of this result is a straightforward exercise using ideas that were first exploited by Varnavides [18] in the context of counting three term arithmetic progressions. While, in our context of counting square differences, this quantitative result can easily be deduced by adapting the proof of Theorem 3.1 in [6] (for example), we will, for the sake of completeness, include a proof of Lemma 1 in Section 3.1 below.
We should also note at this point that the standard application of Varnavides' argument is to show that Theorem 1 is equivalent to the statement that for any δ > 0 and for some c(δ) > 0. In other words, provided N is sufficiently large, B will contain not only one square difference, but a positive proportion of all the square differences in {1, ..., N}. This result clearly follows easily from Lemma 1.
2.1. Proof of Theorem 2. It follows immediately from the upper bound on the number of square differences in B given by property (ii) and the lower bound given by Lemma 1, that N which follows immediately from property (i) of our constructed set B, gives the desired inequality.

2.2. Construction of the set B. Given any set B ⊆ {1, ..., N}, it is easy to see that where B(x) = 1_B(x) denotes the indicator function of the set B. Using the familiar orthogonality relation we can, as is standard, express our count (6) on the "transform side" as
Key to our proof (and essentially the only true "machinery" used in the proof) is the following well-known estimate for the Weyl sum S(α), which states that the only possible obstruction to cancellation in this exponential sum arises if α is "close" to a rational with "small" denominator.
provided N is sufficiently large with respect to ε; in particular N ≥ Cε^{−50} would be sufficient.
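The behavior described by this estimate can be observed numerically. The sketch below is illustrative only: the definition of S(α) is not reproduced in this text, so the code assumes the normalisation S(α) = Σ_{1 ≤ n ≤ √N} e^{2πi n^2 α}, which is consistent with the trivial bound |S(α)| ≤ √N used later in the argument. The function name `weyl_sum` and the sample values of α are hypothetical choices.

```python
import cmath
from math import isqrt, pi


def weyl_sum(alpha, n_max):
    """Assumed normalisation: sum of exp(2*pi*i*n^2*alpha) over 1 <= n <= sqrt(n_max)."""
    m = isqrt(n_max)
    total = 0j
    for n in range(1, m + 1):
        # Reduce n^2 * alpha modulo 1 so that the phase argument stays small.
        phase = (n * n * alpha) % 1.0
        total += cmath.exp(2j * pi * phase)
    return total


N = 10**6
# Sample points: zero, a rational with small denominator, a nearby point, an irrational.
for alpha in (0.0, 1 / 3, 1 / 3 + 1e-7, 2**0.5 - 1):
    print(f"alpha = {alpha:.7f}   |S(alpha)| = {abs(weyl_sum(alpha, N)):.1f}")
```

With N = 10^6 the sum has 1000 terms. At α = 0 there is no cancellation at all, and at α = 1/3 the modulus is still of order √N (roughly √N/√3, since n^2 takes only the values 0 and 1 modulo 3). Values of α that are not close to a rational with small denominator typically give a much smaller modulus.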
We are now ready to define our set B. Recalling that A ⊆ {1, ..., N} is an extremal set with no square differences, we define (for a value of ε > 0 to be determined) , and consequently also that property (i) for our set B will hold, provided ε > 0 is chosen large enough for
In order to see what actual restriction this places on our choice of ε > 0, we recall, as one can verify using only elementary properties of the prime numbers, that and hence that inequality (10) will hold whenever ε^{−2} ≪ log N.
Remark (on "≪ notation"). Whenever we write E ≪ F for any two quantities E and F we shall mean that E ≤ cF, for some sufficiently small constant c > 0.
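The elementary fact about primes invoked above is not reproduced in this text. A standard estimate of this type is that the least common multiple of 1, ..., X is at most 4^X. The sketch below checks this numerically; it assumes (as the later remark that q divides q_ε^2 suggests, but which is not confirmed here) that q_ε is built from such a least common multiple with X comparable to ε^{−2}. The function name `lcm_up_to` is hypothetical.

```python
from math import gcd, log


def lcm_up_to(x):
    """Least common multiple of the integers 1, 2, ..., x."""
    result = 1
    for q in range(1, x + 1):
        result = result * q // gcd(result, q)
    return result


# log(lcm(1..x)) / x stays bounded, i.e. lcm(1..x) <= exp(C * x) for an absolute C.
for x in (10, 50, 100, 200):
    print(x, round(log(lcm_up_to(x)) / x, 3))
```

Under the stated assumption, a bound of the form exp(Cε^{−2}) for the shift explains why a condition of the shape ε^{−2} ≪ log N is what keeps the shift small relative to N.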
We therefore now fix (11) ε := C_1 (log N)^{−1/2} with C_1 > 0 a sufficiently large (but absolute) constant. In order to establish that our set B also satisfies property (ii) it will suffice to show, for this choice of ε > 0, that (12) # of square differences in B ≤ 20 ε N^{3/2} for all sufficiently large N.
To establish (12) we first note that since A′ ⊆ A contains no square differences, it follows that ε are disjoint, and hence, using the familiar and easily verified property that Fourier transformation takes translations to modulations, that Multiplying this expression for B(α) by its complex conjugate, we see that In light of (7), and the fact that A′ contains no square differences, it follows that and hence that # of square differences in B = 2 ∫ … dα.
A crucial observation at this point, which completes the proof of inequality (12), is the fact that where to establish the final inequality we have invoked the Plancherel identity, namely whose validity in this setting can be easily verified (using orthogonality), together with the simple observation that
It remains to verify the uniform estimate (13). Since |cos(2π q_ε^2 α) − 1| ≤ 2 for all α ∈ [0, 1], it follows from Proposition 1 that (13) will hold whenever α ∉ M_{a/q}(ε) for any (a, q) = 1 with 1 ≤ q ≤ ε^{−2}, since N = exp(C_1^2 ε^{−2}) ≫ ε^{−50}. If instead α ∈ M_{a/q}(ε) for some (a, q) = 1 with 1 ≤ q ≤ ε^{−2}, then by definition we know that |α − a/q| ≤ ε^{−2} N^{−1}. Moreover, since q | q_ε^2 (by the definition of q_ε) it follows that cos(2π q_ε^2 α) = cos(2π q_ε^2 (α − a/q)) and hence, by the Mean Value Theorem, we see that The result then follows, provided the constant C_1 in our choice of fixed ε > 0 is chosen sufficiently large, since 2π q_ε^2 ε^{−2} N^{−1} ≤ ε whenever ε^{−2} ≪ log N (again) and we trivially know that |S(α)| ≤ √N for all α ∈ [0, 1].
This completes the proof of Theorem 2 modulo Lemma 1 and Proposition 1. The proofs of these two results are given in Section 3 below.
A simple counting argument, which we give below, shows that Now while, as noted above, each of these good progressions contributes at least one square difference in B, it is of course also the case that some of these square differences could be over counted. However, as we shall also see below, each square difference in B is over counted at most M^{3/2} times, from which it follows that as required. We are thus left with the straightforward tasks of verifying (14) and the claim that each square difference in B is over counted in this argument at most M^{3/2} times.
We will address the over counting argument first. Suppose we are given a pair {b, b + n^2} in B. If this pair is contained in P_{a,r}, then r must be a divisor of n and moreover n^2 ≤ M r^2. It therefore follows that there are at most √M choices for r and it is easy to see that each choice of r fixes a in at most M ways; thus each square difference is indeed over counted at most M^{3/2} times.

3.2. Proof of Proposition 1. We first recall Dirichlet's (pigeonhole) principle: Given any α ∈ R and Q ∈ N, there exist (a, q) = 1 with 1 ≤ q ≤ Q such that |α − a/q| ≤ 1/(qQ).
The proof of the following key result is completely standard, see for example [14] or [5].
Proposition 2 (The Weyl inequality). If |α − a/q| ≤ q^{−2} and (a, q) = 1, then
We note (by Dirichlet's principle) that for any given α ∈ R and Q ∈ N, there always exist (a, q) = 1 with 1 ≤ q ≤ Q that satisfy the hypothesis of the Weyl inequality. Moreover, it is easy to see that this inequality gives a non-trivial conclusion whenever N^µ ≤ q ≤ N^{1−µ} for some 0 < µ < 1/2. For the purposes of this exposition we shall take Q = N^{1−µ} with µ = 1/20 and define It is customary to say that α is in a major arc if α ∈ M′_{a/q} for some (a, q) = 1 with 1 ≤ q ≤ N^{1/20}, and to call the complement of these major arcs the minor arcs. If α is in one of these minor arcs, then it follows from Dirichlet's principle that there must exist a reduced fraction a/q with N^{1/20} ≤ q ≤ N^{19/20} such that |α − a/q| ≤ q^{−2}, and hence, by the Weyl inequality, that
In order to obtain the full conclusion of Proposition 1, which is valid on a subset of [0, 1] which is strictly larger than the collection of classical minor arcs defined above, we must perform a careful analysis of the behavior of our exponential sum S(α) on the major arcs. In particular, we will invoke the following.
The proof of Lemma 2 is standard, but for the sake of completeness we have chosen to include a proof in Appendix A below.
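Dirichlet's principle, used in Section 3.2 to place every α near a reduced fraction, is easy to realise computationally. One standard form of it guarantees coprime (a, q) with 1 ≤ q ≤ Q and |α − a/q| ≤ 1/(qQ). The following is a minimal sketch that finds such a fraction by direct search; the function name `dirichlet_approximation` is illustrative, and floating point arithmetic is assumed to be accurate enough for the chosen α and Q.

```python
from math import gcd


def dirichlet_approximation(alpha, q_max):
    """Return coprime (a, q) with 1 <= q <= q_max and |alpha - a/q| <= 1/(q * q_max)."""
    for q in range(1, q_max + 1):
        a = round(alpha * q)
        # |alpha*q - a| <= 1/q_max is the same as |alpha - a/q| <= 1/(q * q_max).
        if abs(alpha * q - a) <= 1.0 / q_max:
            g = gcd(a, q)
            return a // g, q // g
    # Dirichlet's principle rules this out in exact arithmetic.
    raise ValueError("no approximation found (floating point round-off?)")
```

For instance, `dirichlet_approximation(2 ** 0.5, 100)` returns `(99, 70)`, and indeed |√2 − 99/70| is about 7.2 × 10^{−5}, below 1/(70 · 100). In the language of the section above, whether the denominator q found in this way is small or large relative to N is what separates the major arcs from the minor arcs.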
Appendix A. Proof of Lemma 2 (Major arc estimate)
The proof of Lemma 2 hinges on the key observation that for each α in a major arc corresponding to a rational a/q, our exponential sum S(α) breaks naturally into an arithmetic part S(a, q) and a continuous part I_N(α − a/q), up to a manageable error term. In particular we have
Remark (on "big O notation"). Whenever we write E = O(F) for any two quantities E and F we shall mean that |E| ≤ CF, for some constant C > 0.