# Completely uniformly distributed sequences based on de Bruijn sequences

By Emilio Almansi and Verónica Becher

## Abstract

We study a construction published by Donald Knuth in 1965 yielding a completely uniformly distributed sequence of real numbers. Knuth’s work is based on de Bruijn sequences of increasing orders and alphabet sizes, which grow exponentially in each of the successive segments composing the generated sequence. In this work we present a similar, albeit simpler, construction using linearly increasing alphabet sizes, and we give an elementary proof showing that the generated sequence is also completely uniformly distributed. In addition, we present an alternative proof of the same result based on Weyl’s criterion.

## 1. Introduction

Let be an infinite sequence of -dimensional vectors of real numbers. For a given set , we denote by the number of indexes in such that lies in . A sequence of -dimensional vectors in the unit cube is uniformly distributed if for every set included in ,

where is the measure of .

For the theory of uniform distribution see the monograph of Kuipers and Niederreiter Reference 15. Starting with a sequence of real numbers in the unit interval, we define for every integer the -dimensional sequence where for . The sequence is said to be completely uniformly distributed if for every integer , the sequence is uniformly distributed in the -dimensional unit cube.

If a sequence is completely uniformly distributed, then it also satisfies many other statistical properties common to all random sequences; see the work of Franklin Reference 5. For example, for every fixed integer , the probability of  consecutive terms having any specific relative order is .

The first reference to the concept of complete uniform distribution is the work of Korobov in 1948 Reference 8Reference 12Reference 13Reference 14, where he proves that the sequence with

is completely uniformly distributed. Since then, there have been many constructions of completely uniformly distributed sequences; see Reference 15Reference 19Reference 21 and the references cited there. In Reference 16 Levin gives a lower bound of decay of discrepancy of completely uniformly distributed sequences and gives constructions, using nets obtained with Sobol matrices and using Halton sequences, which achieve this low discrepancy bound.

Independently of the school of Korobov, in 1965 Knuth Reference 7 gives a construction of a completely uniformly distributed sequence of real numbers based on de Bruijn sequences of increasing orders and alphabet sizes. In each of the successive segments composing the sequence, the alphabet sizes of the underlying de Bruijn sequences increase exponentially. Knuth gives an elementary proof that the sequence is completely uniformly distributed by reasoning about the binary representation of the rational numbers . These binary representations are immediate because the alphabet size is always a power of .

In the present work we give a similar, albeit simpler, construction than the one given by Knuth in Reference 7 while preserving the property of complete uniform distribution. Our construction is based on de Bruijn sequences of linearly increasing orders and linearly increasing alphabet sizes. The argument used by Knuth to prove that his sequence is completely uniformly distributed does not hold in our case. We give two proofs: one is elementary, and the other is based on Weyl’s criterion. The construction and the proofs we present here can be generalized in a straightforward manner to other growths of alphabet sizes. A first version of this work appears in Reference 1.

It may be possible to construct completely uniformly distributed sequences with low discrepancy by using variants of de Bruijn sequences such as those considered by Becher and Carton in Reference 2 (nested perfect necklaces). In that work, they show that the binary expansion of the real number defined by Levin in Reference 17 using Pascal matrices is indeed the concatenation of such variants of de Bruijn sequences.

We finish this section with a short presentation of de Bruijn sequences. In Section 2 we present Knuth’s sequence, and we devote Section 3 to our construction.

As usual we write to denote the sets of positive integers, integers, real numbers, and complex numbers, respectively.

### 1.1. de Bruijn sequences

A -ary de Bruijn sequence of order  is a sequence of length  which, when viewed as a cycle, contains every possible -ary sequence of length  exactly once as a contiguous subsequence. For example, listed next are two distinct binary de Bruijn sequences of order :

Every binary sequence of length appears exactly once as a contiguous subsequence in each example. This includes those instances such as which wrap around the right-hand end of the sequences.

Each -ary de Bruijn sequence of order  is the label of a Hamiltonian cycle in the directed graph , where is the set of all -ary sequences of length  and in . Due to good properties of de Bruijn graphs, each -ary de Bruijn sequence of order  can also be identified with the label of an Eulerian cycle in . By the BEST theorem there are exactly different -ary de Bruijn sequences of order .

A -ary Ford sequence of order , denoted by , is the lexicographically least -ary de Bruijn sequence of order . In the example above, the first sequence is the binary Ford sequence of order , or . In Reference 6, Fredricksen and Maiorana introduce an algorithm for generating the -ary Ford sequence of order , which has constant amortized running time; see Reference 20.

The name de Bruijn sequence came after the work of the Dutch mathematician N. G. de Bruijn Reference 4. A historical background can be read from Reference 3. Korobov made significant contributions in this area and constructed uniformly distributed sequences based on de Bruijn sequences Reference 9Reference 10Reference 11.

## 2. Knuth’s sequence

We follow Knuth’s presentation in Reference 7. Knuth’s sequence can be defined from any given family of de Bruijn sequences. Here, we use Ford sequences because they are conveniently defined and because they can be efficiently generated.

### Definition.

Given in , let denote the -ary Ford sequence of order . An -sequence of order , denoted by , is the finite sequence of rational numbers obtained by dividing every term in by :

A -sequence of order , denoted by , is the sequence obtained from the concatenation of copies of :

By construction, the size of is

and the size of is

Notice that, for any given , all terms in and in are numbers in the set . For example, when ,

and , .

### Definition.

Knuth’s sequence, denoted by , is the infinite sequence of rational numbers resulting from the concatenation of the sequences for :

### Theorem (Knuth 1965, Reference 7, page 268).

The sequence is completely uniformly distributed.

Knuth provides an elementary proof of this theorem. Two choices play an important role in his proof. One is that the number of repetitions of each -sequence within a -sequence is . To ensure complete uniform distribution it is sufficient that the number of repetitions of each -sequence within a -sequence grows faster than . For instance, suffices.

The other choice in Knuth’s construction is that the alphabet sizes of the Ford sequences grow exponentially as . This allows us to reason about the rational numbers comprised in the sequence in terms of their binary representations. Knuth considers the distribution of the most-significant bits of each of the terms in a -ary Ford sequence, where , see Reference 7, Lemma 2.

## 3. Main result

We now present the main contribution of this work, which is a variant of Knuth’s construction based on Ford sequences with linearly increasing alphabet sizes.

### Definition.

Given in and a function , let denote an -ary Ford sequence of order . A -sequence of order , denoted by , is the finite sequence of rational numbers obtained by dividing each of the terms in by :

and a -sequence of order , denoted by , is the sequence obtained from the concatenation of consecutive copies of :

The size of is

and the size of is

For any given , all terms in and are numbers in the set . The key difference between the way -sequences are constructed when compared to -sequences from the previous section is that, as the order of the sequence grows, the alphabet size for the underlying Ford sequence grows linearly () rather than exponentially (). For example, when and :

and , .

### Definition.

Given , the sequence is the infinite sequence of real numbers obtained from the concatenation of the sequences for :

### Theorem 1.

If is a nondecreasing function and , then the sequence is completely uniformly distributed.

### Example.

If , then

and is completely uniformly distributed.

### 3.1. Proof of Theorem 1

Within the current section, let be an arbitrary but fixed function. For convenience we write and in place of and . Consider a prefix of of length , denoted by . Let be the integers determined by such that

where and . Here, is the order of the rightmost, possibly incomplete -sequence present in . The number is the amount of complete -sequences of order appearing before the rightmost, possibly incomplete -sequence, while is the amount of terms in it. Thus,

Let be a positive integer, and let be a set such that . Let range freely over the natural numbers, and let the quantity denote the number of windows of of size starting at indices that belong to the set :

where for .

Consider sufficiently large values of such that . This is always possible since is an unbounded, nondecreasing function of . We can decompose into four consecutive sections; namely, sequences , , , and :

Notice that and can potentially be empty, such as when or , respectively.

We denote the cumulative sums of the sizes of the sequences defined above as , and for . Now, we can similarly decompose into five parts. If we let

for , then

for some .

For each , the quantity accounts for windows contained entirely within the sequence , and accounts for all windows crossing any of the three borders between the four sections. This is enough to account for all possible windows, since any given window is either entirely contained in some section or it starts at a given section and ends at a subsequent one, thereby crossing a border.

We can rewrite the value of each in terms of windows over each sequence . If we let

for and , then

Before obtaining more precise expressions for these quantities, we state the following three technical propositions.

#### Proposition 2.

If is in and are in such that , then the number of integers from the set contained in is equal to for some in .

#### Proof.

Since , there are exactly nonnegative integers in the set for some in . Similarly for , there are exactly nonnegative integers in the set for some in . The difference between these two quantities is equal to the number of nonnegative integers contained in the set , which is . Observing that in and that all nonnegative integers between and belong to the set , the proof is complete.

#### Proposition 3.

If is a positive integer and and are two sequences of real numbers of length , then the product of their element-by-element sums can be expressed as follows:

#### Proof.

By induction on . First, notice that the property holds for :

Next, we see that the inductive step holds for any . First,

Since for every the value and is therefore even, we can add the factor to the product in the second term simply by raising the upper limit to . Similarly, in the fourth term we can add the factor to the product by raising the upper limit to and changing the limits in the sum to . This is true because adding to does not change the value of for any , but when the value and is therefore odd. The third term can be rewritten as a similar product for a value of , and by substituting it into the equation above,

which completes the proof.

#### Proposition 4.

Given in , the following holds:

#### Proof.

By induction on . The property holds for and :

and the inductive step holds for :

Therefore, the property holds for all in .

We now obtain an expression for the number of windows of a -sequence which are contained in the set . This is useful for evaluating .

#### Lemma 5.

Given a positive integer and a set where , let in such that and consider the sequence as a cyclic sequence. If we let for , then the number of windows of size from the sequence that lie in is

for some in .

#### Proof.

First, notice that any given window is contained in the set if and only if the following is true:

where and indices are taken modulo .

Since all terms in are numbers in the set , we multiply both sides of each inequality by , allowing us to reason about integers belonging to a Ford sequence instead of rational numbers. We obtain the following:

According to Proposition 2, for each inequality above with there are exactly possible solutions in the set for some value in . This yields a total of

possible solutions to the system of inequalities. Each solution, when seen as an -ary sequence of length , appears exactly times as a contiguous subsequence in . This is true because there are ways of extending an -ary sequence of length to one of length and, by construction, each of these appears exactly once in when viewed as a cycle. Since ranges exactly once over each possible window of , then

Using Proposition 3, we can expand this into the following:

If we define for as

then, for each , the value is in . This is true because the product on the right-hand side is composed of terms in and in and, since , there is always at least one term of the second kind. Given that

we can further simplify equation Equation 4 to get

Finally, since , there exist some in such that

and hence, by Equation 3 and Equation 5,

We can now give the proof of our main result.

#### Proof of Theorem 1 $1$.

Let be a positive integer, and let be a set such that . Let range freely over the natural numbers. We now obtain an expression for and compute its limit when .

Recall the following definitions:

where

for and .

The sequences and are entirely composed of complete -sequences of increasing order which are larger than or equal to . Moreover, with the exception of the last, rightmost instance in each of and , every single -sequence is immediately succeeded by another -sequence of the same or the following order, including those which are part of a -sequence. Additionally, any window starting at the right-hand end of a -sequence necessarily finishes within the first elements of the following -sequence, all of which are guaranteed to be .

Therefore, the amount of windows of size contained in ranging over and is equal to the sum over each composing -sequence viewed as a cycle, with an error of at most due to the fact that we are counting only windows entirely contained within each sequence. If we let

for and , then

for some values , and . From Lemma 5

for some values of in , . Substituting back into from equation Equation 2,

and factoring out terms multiplied by , we get

We now rewrite the first term using the relationship between , , , and from equation Equation 1:

After dividing both sides by and rearranging terms we obtain

Taking limits on both sides as , the third term on the right-hand side approaches since the contents of the brackets are dependent on and bounded as a function of . For the second term, using the fact that is nondecreasing together with Proposition 4, we can see that

and

Since is an unbounded, nondecreasing function of , this term approaches  as well.

Finally, consider the first term on the right-hand side. Since both and are values between and , then in . Moreover, using the identity

with , and using that , we can see that for large values of ,

which by hypothesis also approaches as . Hence,

Since and were chosen arbitrarily, the sequence is completely uniformly distributed and the proof of Theorem 1 is complete.

### 3.2. Alternative Proof of Theorem 1

Weyl’s criterion for the multidimensional case states that if is a sequence of -dimensional vectors of real numbers, then is uniformly distributed in the unit cube if and only if for all nonzero -dimensional vectors of integers ,

where denotes the dot product of and ; see Reference 15. Before applying Weyl’s criterion to the sequence , we first prove the following lemma based on a well-known fact about the roots of unity.

#### Lemma 6.

Given a positive integer and a nonzero -dimensional vector of integers , let be in such that and consider the sequence as a cyclic sequence. If we let for , then

#### Proof.

If we view as a cyclic sequence and let

for , then from the definition of a -sequence it follows that . Then, if we define , every in appears exactly times as a contiguous subsequence of . This is true because there are ways of extending an -ary sequence of length to one of length and, by construction, each of these appears exactly once in when viewed as a cycle. Substituting into the left-hand side of equation Equation 6,

Since in and the function , is periodic with a period equal to , then

where the congruence is taken modulo . The conditions for the existence of solutions to equations of the form

are well understood (see Reference 18, page 114). In particular, this equation only has solutions when divides and, in such a case, the total number of solutions is equal to . Therefore,

where the last step comes from substituting . If we also substitute and observe that , then

The term on the right-hand side is the sum of all the roots of unity of order . It is a well-known fact that this sum is equal to whenever . Finally, since , then , and the proof is complete.

#### Alternative Proof of Theorem 1 $1$.

Let be a positive integer, and let be a nonzero -dimensional vector of integers. Let range freely over the natural numbers. As in Section 3.1, we consider a prefix of the sequence of length , denoted , and we consider the values , , and as functions of . Let for .

Analogously to the first proof of Theorem 1, we define the complex value as the Weyl sum over the first windows of size  of :

Let and consider sufficiently large values of such that . As before, this is always possible since is an unbounded, nondecreasing function of . We can decompose into four consecutive sections; namely, sequences , , , and :

and define , , , and as the Weyl sums over windows entirely contained within each respective section such that

for some complex number with , which accounts for all windows crossing over any border.

Following the same reasoning as in Section 3.1, the values of and can be computed as the Weyl sums over the -sequences composing and viewed as cycles, plus two error terms accounting for right borders. However, in this case, due to Lemma 6 and the fact that all sequences in and have orders greater than , these sums vanish to zero. Therefore,

for some complex values with . We now consider the limit of as . Observe that

The numerator in the first term is dependent on and , and constant as a function of . As shown before, the second term approaches as . Therefore, the sum of both terms approaches as , which in turn implies that vanishes as well.

Given that and were chosen arbitrarily, Weyl’s criterion is satisfied for all values of and the sequence is completely uniformly distributed.