Sumsets and fixed points of substitutions

By F. Michel Dekking

Abstract

In this paper we introduce a technique to determine the sumset $A+A$, where $A$ is the indicator function of the 0’s occurring in a fixed point $x$ of a substitution on the alphabet $\{0,1\}$.

1. Introduction

Let $A$ be the sequence given by $A(n)= \lfloor n\alpha \rfloor$ for $n\ge 1$, where $\alpha = (1+\sqrt {5})/2$ is the golden mean. In the recent paper Reference 5, it was proved that the sumset $A+A$ is equal to the set of natural numbers with the exception of the numbers 1 and 3. Here the sumset $A+A$ is defined by

$$\begin{equation*} A+A = \{a+b: a\in A, b\in A\}. \end{equation*}$$

It was also shown that if $B$ is defined by $B(n)= \lfloor n\alpha ^2\rfloor$, then the set $B+B$ has a complement that is infinite. The determination of $B+B$ takes more than 8 pages, and the complicated structure of $B+B$ is described as containing “some fractal patterns”. The goal of the present paper is to elucidate what the source of these fractal patterns is, and at the same time to introduce a vast generalization of these types of sumsets.

Interestingly, Shallit in Reference 7 also positioned the sumset problem of the paper Reference 5 in the combinatorics on words context, but in a completely different way. The focus there is on the Fibonacci representation (also known as Zeckendorf representation) of the natural numbers, and the proofs are computer assisted. See also the paper Reference 9.

The crucial observation that we make is that the sequences $A=(\lfloor n\alpha \rfloor ) =(1, 3, 4, 6, 8, 9, 11,\dots )$ and $B=(\lfloor n\alpha ^2\rfloor )=(2, 5, 7, 10, 13, 15,\dots )$ give the positions of 0’s, respectively the positions of 1’s in the infinite Fibonacci word $x_{\mathrm{F}}=010010100100\dots$, fixed point of the substitution $0\mapsto 01$, $1\mapsto 0$ (see, e.g., Reference 6). Let $\sigma$ be any substitution on the monoid $\{0,1\}^*$, admitting a fixed point $x=\sigma (x)$. Then, let $A$ give the positions of 0 in $x$, and let $B$ give the positions of 1 in $x$.
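This observation can be checked numerically. The Python sketch below is only an illustration, and the helper names `fixed_point_prefix`, `positions` and `floor_n_alpha` are ours. It iterates $0\mapsto 01$, $1\mapsto 0$, reads off the 1-indexed positions of 0's and 1's, and compares them with $\lfloor n\alpha\rfloor$ and $\lfloor n\alpha^2\rfloor=\lfloor n\alpha\rfloor+n$. To avoid floating-point rounding, the floors are computed exactly as $\lfloor (n+\sqrt{5n^2})/2\rfloor$ with an integer square root.

```python
from math import isqrt


def fixed_point_prefix(sub, start, length):
    """First `length` letters of the fixed point of `sub` that begins with `start`.

    Assumes sub[start] begins with `start` and that the iterates keep growing,
    so that each iterate is a prefix of the next one."""
    w = start
    while len(w) < length:
        w = "".join(sub[c] for c in w)
    return w[:length]


def positions(word, letter):
    """1-indexed positions of `letter` in `word`."""
    return [i + 1 for i, c in enumerate(word) if c == letter]


def floor_n_alpha(n):
    """floor(n * (1 + sqrt(5)) / 2), computed exactly with integer arithmetic."""
    return (n + isqrt(5 * n * n)) // 2


FIB = {"0": "01", "1": "0"}
x_F = fixed_point_prefix(FIB, "0", 2000)
A = positions(x_F, "0")
B = positions(x_F, "1")

assert A == [floor_n_alpha(n) for n in range(1, len(A) + 1)]
assert B == [floor_n_alpha(n) + n for n in range(1, len(B) + 1)]  # floor(n*alpha^2)
print(x_F[:12], A[:7], B[:6])
# 010010100100 [1, 3, 4, 6, 8, 9, 11] [2, 5, 7, 10, 13, 15]
```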

Problem.

Determine $A+A$, $A+B$ and $B+B$.

The proofs in Reference 5 are purely arithmetical. Our approach, inspired by Reference 3, is to analyse $A+A$ by studying the product set $A\times A$. This amounts to passing from fixed points of substitutions to fixed points of two-dimensional substitutions. See, e.g., Reference 4 for an overview of the theory of two-dimensional substitutions. Our application of two-dimensional substitutions to the sumset problem is self-contained.

In Section 3 we give a very simple proof of the $A+A$ result of Reference 5 using a two-dimensional substitution which has $x_{\mathrm{F}}\times x_{\mathrm{F}}$ as fixed point. A proof of the $B+B$ result from Reference 5 along the lines of the proof of Theorem 4 would be very complex: the number of columns on the right side of the 2D fixed point of the product substitution that one has to use is unbounded for the Fibonacci substitution, whereas it is 15 for the von Neumann substitution.

In Section 4 we solve the $A+A$, $A+B$ and $B+B$ problem for the Thue-Morse word $t=0110100110010110\dots$, fixed point of the substitution $0\mapsto 01$, $1\mapsto 10$. We remark that our result is equivalent to Theorem 2 of the paper Reference 9, which is proved in a completely different, computer-assisted way.

In Section 5 we solve the $A+A$ and $B+B$ problem for the von Neumann word. This example is of interest because it belongs to a large family of words for which the method of Reference 7 does not work (see Reference 8).

In Section 6 we present the classical ‘sum of two squares’ problem in our 2D substitution context. This example also illustrates the fact that our technique extends from fixed points of substitutions to 0-1-valued morphic sequences, i.e., fixed points of substitutions on arbitrary alphabets which are mapped to $\{0,1\}^*$ by a letter-to-letter map.

2. Substitutions and their products

A substitution $\sigma$ is a homomorphism of the monoid $\mathcal{A}^*$ of words over an alphabet $\mathcal{A}$, that is, $\mathcal{A}^*$ consists of concatenations $a_1\dots a_m$ with $a_i$ from $\mathcal{A}$ for $i=1,\dots ,m$, and $\sigma (vw)=\sigma (v)\sigma (w)$ for all $v,w\in \mathcal{A}^*$. Since we are interested in the characteristic functions of 0-1-words, we simplify the presentation and take $\mathcal{A}=\{0,1\}$.

Let $\sigma$ on $\mathcal{A}^*=\{0,1\}^*$ be a substitution given by

$$\begin{equation*} \sigma (0)=a_1\dots a_m, \quad \sigma (1)=b_1\dots b_{m'}, \end{equation*}$$

for two natural numbers $m$ and $m'$. We then define the direct product substitution $\sigma \times \sigma$ on the alphabet $\mathcal{A}\times \mathcal{A}$ by

$$\begin{align*} \sigma \times \sigma ((0,0))_{(k,\ell )} & =(a_k,a_\ell ), \quad \sigma \times \sigma ((0,1))_{(k,\ell ')}=(a_k,b_{\ell '}), \\ \sigma \times \sigma ((1,0))_{(k',\ell )} & =(b_{k'},a_\ell ), \quad \sigma \times \sigma ((1,1))_{(k',\ell ')}=(b_{k'},b_{\ell '}), \end{align*}$$

where $1\le k \le m, 1\le \ell \le m, 1\le k' \le m', 1\le \ell ' \le m'$.
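In a product word, all symbols in one row share their first coordinate. The image blocks of $\sigma\times\sigma$ therefore fit together into a rectangle even when $m\neq m'$. The following Python sketch illustrates this for the Fibonacci substitution. The array layout (a list of rows, row index $k$, column index $\ell$) and the function name `product_step` are our own choices and do not follow the drawing convention of the figures. The sketch applies $\sigma\times\sigma$ six times to $(0,0)$ and checks that the result is the array $\big((w_k,w_\ell)\big)$ with $w=\sigma^6(0)$.

```python
FIB = {"0": "01", "1": "0"}


def product_step(grid, sub):
    """Apply sigma x sigma to a 2D word stored as grid[k][l] = (i, j).

    Assumes grid is a product array, grid[k][l] = (w[k], w[l]); then all cells
    of row k share the first letter w[k], so the image blocks of the row all
    have height len(sub[w[k]]) and fit together into a rectangle."""
    new_grid = []
    for row in grid:
        i = row[0][0]                      # common first coordinate of this row
        for a in sub[i]:                   # one new row per letter of sigma(i)
            new_grid.append([(a, b) for (_, j) in row for b in sub[j]])
    return new_grid


def iterate(sub, word, n):
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word


n = 6
grid = [[("0", "0")]]
for _ in range(n):
    grid = product_step(grid, FIB)

w = iterate(FIB, "0", n)                   # sigma^n(0)
assert grid == [[(p, q) for q in w] for p in w]
print(len(grid), len(grid[0]))             # 21 21
```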

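Let $x\times x$ denote the two-dimensional word with $[x\times x]_{(k,\ell )}=(x_k,x_\ell )$ for $k,\ell \ge 1$; it is a fixed point of $\sigma \times \sigma$. The diagonal words $d_n$ are defined for $n\ge 2$ by

$$\begin{equation} d_n =[x\times x]_{(1,n-1)}\,[x\times x]_{(2,n-2)}\cdots [x\times x]_{(n-1,1)}, \tag{1} \end{equation}$$

that is, $d_n$ consists of the symbols $[x\times x]_{(k,\ell )}$ with $k+\ell =n$, $k\ge 1$, $\ell \ge 1$, read in order of increasing $k$. Since $[x\times x]_{(k,\ell )}=(0,0)$ exactly when $k\in A$ and $\ell \in A$, we have $n\in A+A$ if and only if $d_n$ contains a symbol $(0,0)$. So we have obtained the following theorem.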

Theorem 1.

Let $\sigma$ be a substitution on $\{0,1\}$, such that $\sigma (0)$ has prefix 0, and let $x$ be the fixed point of $\sigma$ with prefix $0$. Let $A$ be the sequence of positions of $0$ in $x$. Let $d_n$ be the diagonal words occurring in $x\times x$, for $n\ge 2$. Then $A+A= \{n\ge 2: \text{the symbol }(0,0) \text{ occurs in }d_n\}$.

Note that we also have that if $B$ is the sequence of positions of $1$ in $x$, then $B+B= \{n\ge 2: \text{the symbol }(1,1)\text{ occurs in } d_n\}$, and $A+B= \{n\ge 2: \text{the symbol }(0,1) \text{ or } (1,0) \text{ occurs in } d_n\}$.
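Theorem 1 and the note after it translate directly into a small computation. In the Python sketch below, the function names are ours and positions are 1-indexed as in Theorem 1. The three sumsets are read off from the symbols occurring in the diagonal words $d_n$ and compared with a brute-force computation, here for the Fibonacci word.

```python
from itertools import product


def fixed_point_prefix(sub, start, length):
    w = start
    while len(w) < length:
        w = "".join(sub[c] for c in w)
    return w[:length]


def diagonal_word(x, n):
    """d_n as a list of pairs (x_k, x_{n-k}), k = 1..n-1; x_k is stored in x[k-1]."""
    return [(x[k - 1], x[n - k - 1]) for k in range(1, n)]


def sumsets_from_diagonals(x, n_max):
    """A+A, A+B and B+B intersected with {2, ..., n_max}, read off as in Theorem 1."""
    AA, AB, BB = set(), set(), set()
    for n in range(2, n_max + 1):
        symbols = set(diagonal_word(x, n))
        if ("0", "0") in symbols:
            AA.add(n)
        if ("0", "1") in symbols or ("1", "0") in symbols:
            AB.add(n)
        if ("1", "1") in symbols:
            BB.add(n)
    return AA, AB, BB


def brute_force(S, T, n_max):
    return {s + r for s, r in product(S, T) if s + r <= n_max}


N = 300
x = fixed_point_prefix({"0": "01", "1": "0"}, "0", N)
A = [k + 1 for k, c in enumerate(x) if c == "0"]
B = [k + 1 for k, c in enumerate(x) if c == "1"]

AA, AB, BB = sumsets_from_diagonals(x, N)
assert AA == brute_force(A, A, N)
assert AB == brute_force(A, B, N)
assert BB == brute_force(B, B, N)
```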

3. The Fibonacci word

The Fibonacci word $x_{\mathrm{F}}$ is the unique fixed point of the substitution $\sigma : 0\mapsto 01$, $1\mapsto 0$. Let $\varphi \coloneq \sigma \times \sigma$ be the direct product of $\sigma$.

It is convenient to code $a\coloneq (0,0), b\coloneq (0,1), c\coloneq (1,0), d\coloneq (1,1)$, and to assign colors to these four letters, which makes the structure of $\varphi ^n(a)$ easier to see. The direct product substitution $\varphi$ on this new alphabet is given by

Here are some examples of 2D words obtained by iterating $\varphi$.

Note that

$$\begin{equation} d_2=a, \:d_3=bc, \:d_4=ada, \:d_5=acba,\: d_6=bcabc. \tag{2} \end{equation}$$
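As a quick check of (2), the following Python sketch computes the diagonal words of $x_{\mathrm{F}}\times x_{\mathrm{F}}$ directly from the pairs $(x_k,x_{n-k})$, ordered by increasing $k$, and prints them in the $abcd$-coding. The helper names are ours.

```python
CODE = {("0", "0"): "a", ("0", "1"): "b", ("1", "0"): "c", ("1", "1"): "d"}


def fibonacci_word(length):
    """Prefix of the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join({"0": "01", "1": "0"}[c] for c in w)
    return w[:length]


def diagonal_code(x, n):
    """d_n in the abcd-coding: symbols (x_k, x_{n-k}) for k = 1..n-1 (1-indexed x)."""
    return "".join(CODE[(x[k - 1], x[n - k - 1])] for k in range(1, n))


x_F = fibonacci_word(50)
print([diagonal_code(x_F, n) for n in range(2, 7)])
# ['a', 'bc', 'ada', 'acba', 'bcabc']
```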

We are now in a position to give a completely different proof of Theorem 3.1 of Reference 5.

Theorem 2 (Reference 5).

Let $A=1$, $3$, $4$, $6$, $8$, $9$, $11$, $12$, $14$, $16$, … be the sequence $(\lfloor n\alpha \rfloor )$, where $\alpha$ is the golden mean. Then

$$\begin{equation*} A+A=\mathbb{N}\setminus \{1,3\}. \end{equation*}$$

Remark.

The proof shows that if $n=a+a'$, with $a,a'$ from $A$, then $a$ can always be chosen from the set $\{1,3,4\}$.
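Theorem 2 and the Remark can be tested up to a finite bound, which is of course no substitute for the proof. In the Python sketch below, the bound and the names are our own choices, and $A$ is generated as the 1-indexed positions of 0 in $x_{\mathrm{F}}$. The Remark can also be seen directly: the word $11$ does not occur in $x_{\mathrm{F}}$, so for $n\ge 5$ the numbers $n-3$ and $n-4$ cannot both lie in $B$.

```python
def fibonacci_zero_positions(length):
    """1-indexed positions of 0 in the first `length` letters of the Fibonacci word."""
    w = "0"
    while len(w) < length:
        w = "".join({"0": "01", "1": "0"}[c] for c in w)
    return [i + 1 for i, c in enumerate(w[:length]) if c == "0"]


N = 2000
A = fibonacci_zero_positions(N)
A_set = set(A)
AA = {a + b for a in A for b in A if a + b <= N}

assert sorted(set(range(1, N + 1)) - AA) == [1, 3]             # Theorem 2 up to N
assert all(any(n - a in A_set for a in (1, 3, 4)) for n in AA)  # the Remark up to N
```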

4. The Thue-Morse word

The Thue-Morse word $t$ is the fixed point of the substitution $\theta :0\mapsto 01$, $1\mapsto 10$. Although it is general practice (as in Reference 5) to index Beatty sequences, and similarly fixed points of substitutions, starting from $n=1$, this is not the case for the Thue-Morse sequence, which is customarily indexed starting from $n=0$: $t=t_0t_1\dots =01101001\dots$. Thus $A=0$, $3$, $5$, $6$, … gives the positions of 0 in $t$, and $B=1$, $2$, $4$, $7$, … the positions of 1 in $t$.

Let $\overline{0}=1$, $\overline{1}=0$ be the symmetry operator on $\{0,1\}^*$. Note that $\theta$ is symmetric, i.e., $\overline{\theta (i)}=\theta (\overline{i})$ for $i=0$, $1$.

The direct product substitution of $\theta$ is the 2D substitution $\tau$ defined on the symbols $(i,j)$ for $i,j=0,1$ (written as $ij$) by

$$\begin{equation*} \tau :\;ij\mapsto \begin{array}{cc} i\overline{j} & \overline{i}\,\overline{j}\\ ij & \overline{i}j \end{array}. \end{equation*}$$

When we code $a\coloneq 00, b\coloneq 01, c\coloneq 10, d\coloneq 11$, and color the squares of the symbols $a,b,c$ and $d$, then the substitution $\tau$ takes the form

Since we start $t$ at index 0, we have to reconsider the definition of the diagonal words. It turns out that it is convenient to index these by their lengths. So $d_1=a, d_2=bc$, etc. The iterate $\tau ^8(a)$ with the diagonal words $d_1$, $d_2$, …, $d_{16}$ indicated by lines is given by

Our goal is to describe the diagonal words $d_n$ for $n=1,2,\dots$. In particular, we see that

$$\begin{equation} d_{1}=a, \:d_2=bc, \:d_3=bdc, \:d_4=adda. \cssId{texmlid1}{\tag{3}} \end{equation}$$
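With indexing from 0, the diagonal word of length $n$ consists of the symbols $(t_k,t_{n-1-k})$, $k=0,\dots ,n-1$. Ordering them by increasing $k$ reproduces (3). The Python sketch below generates $t$ by iterating $\theta$ and prints $d_1,\dots ,d_4$; the helper names are ours. With this indexing, a symbol $a$ in $d_n$ corresponds to a representation of $n-1$ as a sum of two elements of $A$.

```python
CODE = {("0", "0"): "a", ("0", "1"): "b", ("1", "0"): "c", ("1", "1"): "d"}


def thue_morse(length):
    """Prefix of the fixed point of theta: 0 -> 01, 1 -> 10 (indexed from 0)."""
    w = "0"
    while len(w) < length:
        w = "".join({"0": "01", "1": "10"}[c] for c in w)
    return w[:length]


def diagonal(t, n):
    """Diagonal word of length n: symbols (t_k, t_{n-1-k}) for k = 0..n-1."""
    return "".join(CODE[(t[k], t[n - 1 - k])] for k in range(n))


t = thue_morse(64)
print(t[:16])                                  # 0110100110010110
print([diagonal(t, n) for n in range(1, 5)])   # ['a', 'bc', 'bdc', 'adda']
```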

Proposition 1.

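Let $\sigma$ be the morphism given by $\sigma (ij) = (i\overline{j})(\overline{i}j)$ for $i,j=0,1$, i.e., $\sigma$ maps the symbol $ij$ to the two-letter word with first letter $i\overline{j}$ and second letter $\overline{i}j$. Then

$$\begin{equation*} d_{2n}=\sigma (d_n), \quad \text{for}\; n=1,2,\dots . \end{equation*}$$

Let $\beta$ be the sliding 2-block map that replaces every pair of consecutive symbols $ij$, $i'j'$ in a word by the single symbol $ij'$, for $i,j,i',j'=0,1$. Thus $\beta$ maps a word of length $m$ to a word of length $m-1$. Then

$$\begin{equation*} d_{2n+1}=\beta (d_{2n+2}), \quad \text{for}\; n=1,2,\dots . \end{equation*}$$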
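Proposition 1 can be tested numerically. In the Python sketch below (the names are ours), `beta` is implemented as a sliding block map, i.e., it is applied to every pair of consecutive symbols. Under this reading, $\beta$ maps the word $d_{2n+2}$ of length $2n+2$ to a word of length $2n+1$.

```python
CODE = {("0", "0"): "a", ("0", "1"): "b", ("1", "0"): "c", ("1", "1"): "d"}
DECODE = {v: k for k, v in CODE.items()}


def thue_morse(length):
    w = "0"
    while len(w) < length:
        w = "".join({"0": "01", "1": "10"}[c] for c in w)
    return w[:length]


def diagonal(t, n):
    """Diagonal word of length n: symbols (t_k, t_{n-1-k}) for k = 0..n-1."""
    return "".join(CODE[(t[k], t[n - 1 - k])] for k in range(n))


def flip(i):
    return "1" if i == "0" else "0"


def sigma(word):
    """sigma(ij) = (i, not j)(not i, j), in the abcd-coding."""
    out = []
    for s in word:
        i, j = DECODE[s]
        out.append(CODE[(i, flip(j))] + CODE[(flip(i), j)])
    return "".join(out)


def beta(word):
    """Sliding 2-block map: consecutive symbols ij, i'j' give the symbol ij'."""
    return "".join(CODE[(DECODE[p][0], DECODE[q][1])] for p, q in zip(word, word[1:]))


t = thue_morse(1024)
for n in range(1, 250):
    assert diagonal(t, 2 * n) == sigma(diagonal(t, n))
    assert diagonal(t, 2 * n + 1) == beta(diagonal(t, 2 * n + 2))
```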

Theorem 3.

Let $A=0$, $3$, $5$, $6$, $9$, $10$, … give the positions of $0$’s in the Thue-Morse sequence, and $B=1$, $2$, $4$, $7$, $8$, $11$, … the positions of $1$’s in the Thue-Morse sequence. Then

$$\begin{gather*} A+A=\mathbb{N}_0\setminus \{2,4,\,2^{2n+1}-1, n\ge 0\}.\\ B+B=\mathbb{N}\setminus \{2^{2n+1}-1, n\ge 0\}.\\ A+B=\mathbb{N}_0\setminus \{2^{2n}-1, n\ge 0\}. \end{gather*}$$
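The three identities can be checked by brute force up to a bound. The Python sketch below uses $N=1024$, an arbitrary choice of ours, and 0-indexed positions as in this section. Restricted to $\{0,\dots ,N\}$, the set of exceptions for $B+B$ contains $0$ as well as the numbers $2^{2n+1}-1$, because the smallest element of $B$ is 1.

```python
def thue_morse(length):
    """Prefix of the fixed point of theta: 0 -> 01, 1 -> 10 (indexed from 0)."""
    w = "0"
    while len(w) < length:
        w = "".join({"0": "01", "1": "10"}[c] for c in w)
    return w[:length]


def sumset(S, T, n_max):
    return {s + r for s in S for r in T if s + r <= n_max}


N = 1024
t = thue_morse(N + 1)
A = [k for k in range(N + 1) if t[k] == "0"]
B = [k for k in range(N + 1) if t[k] == "1"]
everything = set(range(N + 1))

odd = {2 ** (2 * n + 1) - 1 for n in range(6)} & everything   # 1, 7, 31, 127, 511
even = {2 ** (2 * n) - 1 for n in range(6)} & everything      # 0, 3, 15, 63, 255, 1023

assert everything - sumset(A, A, N) == {2, 4} | odd
assert everything - sumset(B, B, N) == {0} | odd
assert everything - sumset(A, B, N) == even
```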

Proof.

We only prove the $A+A$ result. The other two can be proved in a similar way.

According to Theorem 1, $n\in A+A$ if and only if $d_{n+1}$ contains a symbol $(0,0)=a$. The index is shifted because in this section the diagonal words are indexed by their lengths: $d_{n+1}$ consists of the symbols $(t_k,t_\ell )$ with $k+\ell =n$, $k,\ell \ge 0$.

Let $\sigma$ be the morphism from Proposition 1 in $abcd$-coding, $\sigma (a)=bc$, $\sigma (b)=ad$, $\sigma (c)=da$, $\sigma (d)=cb$. Then $\sigma ^2$ is given by

$$\begin{equation*} \sigma ^2(a)=adda, \,\sigma ^2(b)=bccb, \,\sigma ^2(c)=cbbc, \,\sigma ^2(d)=daad. \end{equation*}$$

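By Proposition 1, $d_{4m}=\sigma ^2(d_m)$ for $m\ge 1$. It follows from this, the formulas for $\sigma ^2$ and Equation (3) (which gives $d_1=a$ and $d_2=bc$) that $d_{2^{2n+1}}\in \{b,c\}^*$ and $d_{2^{2n}}\in \{a,d\}^*$ for $n\ge 0$.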
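The two claims about the words $d_{2^k}$ can be checked mechanically. The Python sketch below (the names are ours) iterates $\sigma$ starting from $d_1=a$, using $d_{2m}=\sigma (d_m)$ from Proposition 1, and compares the result with the diagonal words computed directly from $t$. It also checks that every $d_{2^{2n}}$ begins with $a$, as it must since $\sigma ^2(a)=adda$ does.

```python
SIGMA = {"a": "bc", "b": "ad", "c": "da", "d": "cb"}
CODE = {("0", "0"): "a", ("0", "1"): "b", ("1", "0"): "c", ("1", "1"): "d"}


def apply(m, word):
    return "".join(m[s] for s in word)


def thue_morse(length):
    w = "0"
    while len(w) < length:
        w = "".join({"0": "01", "1": "10"}[c] for c in w)
    return w[:length]


def diagonal(t, n):
    """Diagonal word of length n: symbols (t_k, t_{n-1-k}) for k = 0..n-1."""
    return "".join(CODE[(t[k], t[n - 1 - k])] for k in range(n))


assert [apply(SIGMA, apply(SIGMA, s)) for s in "abcd"] == ["adda", "bccb", "cbbc", "daad"]

t = thue_morse(1024)
d = "a"                                   # d_1
for k in range(1, 11):
    d = apply(SIGMA, d)                   # d_{2^k} = sigma(d_{2^(k-1)})
    assert d == diagonal(t, 2 ** k)
    if k % 2 == 1:
        assert set(d) <= {"b", "c"}       # no symbol a, so 2^k - 1 is not in A+A
    else:
        assert set(d) <= {"a", "d"} and d[0] == "a"
```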

This takes care of the $2^{2n+1}-1$ part of the theorem. Obviously 2 and 4 are not in $A+A$. It remains to prove that all other numbers are in $A+A$. Since $\sigma ^2(a)=adda$ begins with $a$, every $d_{2^{2n}}$ begins with $a$. So this is equivalent to proving that a symbol $a$ occurs in all diagonal words $d_n$ with $n>5$ and $n$ not equal to a power of 2.

It looks attractive to use Proposition 1 to accomplish this. Indeed, let $T$ be the set consisting of the twelve 2-blocks $ab$, $ac$, …, $dc$ with two different symbols. Then one may check that all elements of $T$ occur in the length 4 words $\sigma (\tau )$, $\tau \in T$, and also in the length 3 words $\beta (\sigma (\tau ))$, $\tau \in T$. This implies that if all elements of $T$ occur in a diagonal word $d_n$, then they also occur in $d_{2n}$ and $d_{2n-1}$. However, there are many diagonal words in which not all blocks from $T$ occur; in fact, this happens for all $n$ of the form $n=2^N+2^K$ with non-negative integers $K$ and $N$. This makes an induction proof very complex, and so we will give a completely different proof.
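The finite check on $T$ is easily mechanized. In the Python sketch below (the names are ours), $\beta$ is again read as a sliding 2-block map. The code verifies that the 2-blocks of the twelve words $\sigma (\tau )$, $\tau \in T$, together contain all of $T$, and that the same holds for the twelve words $\beta (\sigma (\tau ))$.

```python
from itertools import permutations

SIGMA = {"a": "bc", "b": "ad", "c": "da", "d": "cb"}
CODE = {("0", "0"): "a", ("0", "1"): "b", ("1", "0"): "c", ("1", "1"): "d"}
DECODE = {v: k for k, v in CODE.items()}


def apply_sigma(word):
    return "".join(SIGMA[s] for s in word)


def beta(word):
    """Sliding 2-block map: consecutive symbols ij, i'j' give the symbol ij'."""
    return "".join(CODE[(DECODE[p][0], DECODE[q][1])] for p, q in zip(word, word[1:]))


def two_blocks(word):
    return {word[i:i + 2] for i in range(len(word) - 1)}


T = {"".join(p) for p in permutations("abcd", 2)}     # the twelve 2-blocks
images = [apply_sigma(tau) for tau in T]             # length-4 words sigma(tau)

assert len(T) == 12
assert T <= set().union(*(two_blocks(w) for w in images))
assert T <= set().union(*(two_blocks(beta(w)) for w in images))
```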

In Figure 1 we depict the $\tau ^3$-squares. Here the red lines indicate diagonal words without the symbol $a$, and the green lines indicate diagonal words with a symbol $a$.

We have also indicated (parts of) the diagonal words above the main diagonal. These are denoted by $d_1^+$, …, $d_7^+$, counted from the main diagonal. For instance, the block $\tau ^3(a)$ has the two red diagonal lines $d_6^+$ and $d_7^+$.

Figure 2 gives the red and green line structure for the $\tau ^4$-blocks inherited from the $\tau ^3$-blocks.

Observe that any diagonal that is partly red and partly green is in fact a green diagonal, since it contains a symbol $a$. This leads to Figure 3.

We finish the proof by induction. For $n\ge 4$, the blocks $\tau ^n(a)$, $\tau ^n(b)$, $\tau ^n(c)$, $\tau ^n(d)$ have the following properties:

(1): There are some red diagonals $d_i$ with $i\in \{1,2,3,4,5\}$.
(2): There are some red diagonals $d^+_i$ with $i\in \{2^n-5,\dots ,2^n-1\}$.
(3): There are some red diagonals $d_i$ and $d^+_i$ with $i\in \{2^N, N\ge 2 \}$.
(4): All other diagonals $d_i$ and $d^+_i$ are green.

One checks that these four properties hold for $n=4$. Using the induction hypothesis, the step from $n$ to $n+1$ is then made in the same way as the step from $n=3$ to $n=4$ in Figures 1, 2 and 3.

■

5. The von Neumann word

The von Neumann word $u$ was introduced in Reference 1. One has $u=u_1u_2\dots =1101100110110\dots$, the fixed point of the substitution $\nu :\quad 0\mapsto 0$, $1\mapsto 110$.

The 2D von Neumann substitution $\psi =\nu \times \nu$ is given by

Before we continue, we formulate a simple lemma on the language $L_\nu$ of the substitution $\nu$, i.e., on the collection of words that can occur in $u$.

Lemma 1.

The words $010, 111$ and $101101$ do not occur in $L_\nu$.

Using Lemma 1 one simply derives the next lemma.

Lemma 2.

The word $11$ occurs uniquely as suffix of $011$. The word $101$ occurs uniquely as suffix of $001101$. The word $10110$ occurs uniquely as suffix of $0110110$.
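Lemmas 1 and 2 can be tested on a long prefix of $u$; this is only a sanity check, not a proof. In the Python sketch below, the names and the prefix length are our own choices. In the Lemma 2 test, an occurrence at the very beginning of $u$ is skipped, since the required left extension does not exist there (for instance, $u$ starts with $11$).

```python
def von_neumann(length):
    """Prefix of the fixed point of nu: 0 -> 0, 1 -> 110, starting with 1."""
    w = "1"
    while len(w) < length:
        w = "".join({"0": "0", "1": "110"}[c] for c in w)
    return w[:length]


def occurrences(word, pattern):
    """Start indices of all (possibly overlapping) occurrences of pattern."""
    i = word.find(pattern)
    while i != -1:
        yield i
        i = word.find(pattern, i + 1)


u = von_neumann(100_000)

# Lemma 1: these words never occur.
for forbidden in ("010", "111", "101101"):
    assert forbidden not in u

# Lemma 2: away from the start of u, `word` is always preceded so as to form `extension`.
for word, extension in (("11", "011"), ("101", "001101"), ("10110", "0110110")):
    extra = len(extension) - len(word)
    for i in occurrences(u, word):
        if i >= extra:
            assert u[i - extra:i + len(word)] == extension
```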

We split the possible occurrences of words of $L_\nu$ in the von Neumann word $u$ according to a tree of prefixes, dividing them into six cases. This prefix tree is given in Figure 4, where $\Lambda$ is the empty word. Note that we used Lemma 1 again to determine the offspring of the node labeled 10.

Theorem 4.

Let $A=3$, $6$, $7$, $10$, $13$, … give the positions of $0$’s in the von Neumann sequence, and $B=1$, $2$, $4$, $5$, $8$, $9$, … the positions of $1$ in the von Neumann sequence. Then

$$\begin{gather*} A+A=\mathbb{N}\setminus \{1,2,3,4,5,7,8,11,15\}.\\ B+B=\mathbb{N}\setminus \{1\}. \end{gather*}$$
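A brute-force check up to a bound agrees with Theorem 4. The Python sketch below uses $N=1500$, our own choice. Positions are 1-indexed as in the statement of the theorem, and $1\notin A+A$ because the smallest element of $A$ is 3.

```python
def von_neumann(length):
    """Prefix of the fixed point of nu: 0 -> 0, 1 -> 110, starting with 1."""
    w = "1"
    while len(w) < length:
        w = "".join({"0": "0", "1": "110"}[c] for c in w)
    return w[:length]


def sumset(S, n_max):
    return {s + r for s in S for r in S if s + r <= n_max}


N = 1500
u = von_neumann(N)
A = [i + 1 for i, c in enumerate(u) if c == "0"]   # 3, 6, 7, 10, 13, ...
B = [i + 1 for i, c in enumerate(u) if c == "1"]   # 1, 2, 4, 5, 8, 9, ...
everything = set(range(1, N + 1))

assert everything - sumset(A, N) == {1, 2, 3, 4, 5, 7, 8, 11, 15}
assert everything - sumset(B, N) == {1}
```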

Proof.

The proof of $B+B=\mathbb{N}\setminus \{1\}$ is almost trivial.

The proof for $A+A$ is based on Theorem 1, with the trivial change that we consider the fixed point starting with 1. The proof runs as the proof for the Fibonacci word, except that here we need the 15 leftmost columns.

First one checks easily that the numbers $2,3,4,5,7,8,11,15$ are not in $A+A$.

To handle the diagonal words $d_n$ for $n\ge 16$, we split into the six cases introduced in Figure 4. In Cases 2, 3 and 4 we consider the three diagonals starting at the left border in the blocks $\psi (d)$. In Cases 1, 5 and 6 we consider the single diagonal starting at the left border in $\psi (c)$. See Figure 5 for the three cases 00, 11 and 001. This means that in Case 1 the top two blocks are $\psi (c)\psi (c)$, in Case 2 they are $\psi (d)\psi (d)$, and in Case 3 the top three blocks are $\psi (c)\psi (c)\psi (d)$.

In Case 2 a block $\psi (c)$ must precede the top blocks $\psi (d)\psi (d)$, by Lemma 2. In a similar way there are forced blocks in Case 4 and Case 6.

The other blocks are then inserted following the von Neumann words $ccaccaa\dots$ or $ddbddbb\dots$. In all cases except the last, a row $ccaccaacca\dots$ has been added at the bottom. This is allowed because both $\psi (c)$ and $\psi (d)$ have the letter $c$ as prefix.

The red diagonals are those parts of the three diagonals starting in the top block $\psi (d)$, respectively of the single diagonal starting in $\psi (c)$, that do not cross a square with label $a$. In other words, a red diagonal ends just before its first $a$-square. See Figure 6 for the three remaining cases.

Since a square with label $a$ is encountered in all six cases, this finishes the proof.

■

6. The sum of squares

Here we give a classical example of a sumset. Let $A=\{n^2: n\ge 1\}$. It is well known (see, e.g., Reference 2) that the characteristic function of $A$, viewed as a word indexed from 0, is the image under a letter-to-letter map $\lambda$ of the fixed point with prefix 0 of the morphism

$$\begin{equation*} 0\mapsto 01, \quad 1\mapsto 221, \quad 2\mapsto 2. \end{equation*}$$

The letter-to-letter map is given by $\lambda (0)=0$, $\lambda (1)=1$, $\lambda (2)=0$.
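This is easy to confirm on a prefix. The Python sketch below iterates the morphism from $0$, applies $\lambda$, and checks that the 1's sit exactly at the positive squares, with positions counted from 0. The prefix length and the names are our own choices.

```python
from math import isqrt

SUB = {"0": "01", "1": "221", "2": "2"}
LAMBDA = {"0": "0", "1": "1", "2": "0"}


def morphic_prefix(length):
    """lambda applied to the first `length` letters of the fixed point starting with 0."""
    w = "0"
    while len(w) < length:
        w = "".join(SUB[c] for c in w)
    return "".join(LAMBDA[c] for c in w[:length])


N = 10_000
y = morphic_prefix(N)
ones = [i for i, c in enumerate(y) if c == "1"]
assert ones == [k * k for k in range(1, isqrt(N - 1) + 1)]
print(y[:17])   # 01001000010000001
```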

The corresponding 2D substitution is the morphism $\mu$ given by

The numbers $s=n^2+m^2$ that are a sum of two squares occur as sums of the indices $(n^2,m^2)$ of the squares with the symbols $11$ in the fixed point $S$ of $\mu$, limit of $\mu ^n(00)$ as $n\rightarrow \infty$.
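As a small illustration, the Python sketch below builds $\mu ^4(00)$. This is a $17\times 17$ array, because the word obtained from $0$ after four iterations of the one-dimensional morphism has length 17. The sketch then locates the squares carrying the symbol $11$ and lists their index sums. Positions are counted from 0 as for the one-dimensional word, and the function `product_step` and the list-of-rows layout are our own choices.

```python
from math import isqrt

SUB = {"0": "01", "1": "221", "2": "2"}   # the morphism from the text
LAM = {"0": "0", "1": "1", "2": "0"}      # the letter-to-letter map lambda


def product_step(grid, sub):
    """Apply the direct product substitution to grid[k][l] = (i, j).

    All cells of a row share their first letter, so the image blocks fit together."""
    new_grid = []
    for row in grid:
        i = row[0][0]
        for a in sub[i]:
            new_grid.append([(a, b) for (_, j) in row for b in sub[j]])
    return new_grid


grid = [[("0", "0")]]
for _ in range(4):
    grid = product_step(grid, SUB)          # mu^4(00)

size = len(grid)                            # 17
cells = {(k, l) for k in range(size) for l in range(size)
         if LAM[grid[k][l][0]] == "1" and LAM[grid[k][l][1]] == "1"}
squares = [n * n for n in range(1, isqrt(size - 1) + 1)]   # 1, 4, 9, 16
assert cells == {(p, q) for p in squares for q in squares}
print(sorted({k + l for k, l in cells}))   # [2, 5, 8, 10, 13, 17, 18, 20, 25, 32]
```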

At first sight it is surprising that this set has a fractal structure. However, because the morphism $\mu$ is not primitive (its incidence matrix is reducible), there is no exponential scaling structure in the 2D word $S$, but rather a polynomial one. The latter simply amounts to the recursion $(n+1)^2=n^2+(2n+1)$. This recursion is clearly visible in Figure 7, which displays the two-dimensional word $\mu ^4(00)$.
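The recursion can already be seen in one dimension. Write $s$ for the morphism $0\mapsto 01$, $1\mapsto 221$, $2\mapsto 2$ (a name we introduce here). Then $s^k(1)$ consists of $2k$ letters $2$ followed by a single $1$, a word of length $2k+1$, and the fixed point is $0\,1\,s(1)\,s^2(1)\cdots$. The Python sketch below checks this for small $k$.

```python
SUB = {"0": "01", "1": "221", "2": "2"}


def apply(word):
    return "".join(SUB[c] for c in word)


blocks = ["1"]                                  # s^0(1), s^1(1), ..., s^7(1)
for _ in range(7):
    blocks.append(apply(blocks[-1]))
for k, block in enumerate(blocks):
    assert block == "2" * (2 * k) + "1"         # s^k(1) has length 2k + 1

word = "0"
for _ in range(8):
    word = apply(word)                          # s^8(0)
assert word == "0" + "".join(blocks)            # s^8(0) = 0 1 s(1) ... s^7(1)

# The 1 closing s^k(1) follows the 1 closing s^(k-1)(1) after 2k + 1 letters,
# so consecutive 1's sit at n^2 and (n+1)^2 = n^2 + (2n + 1).
assert [i for i, c in enumerate(word) if c == "1"] == [n * n for n in range(1, 9)]
```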