# Completely uniformly distributed sequences based on de Bruijn sequences

## Abstract

We study a construction published by Donald Knuth in 1965 yielding a completely uniformly distributed sequence of real numbers. Knuth’s work is based on de Bruijn sequences of increasing orders and alphabet sizes, which grow exponentially in each of the successive segments composing the generated sequence. In this work we present a similar, albeit simpler, construction using linearly increasing alphabet sizes, and we give an elementary proof showing that the generated sequence is also completely uniformly distributed. In addition, we present an alternative proof of the same result based on Weyl’s criterion.

## 1. Introduction

Let be an infinite sequence of vectors of real numbers. For a given set -dimensional we denote by , the number of indexes in such that lies in A sequence . of vectors in the unit cube is uniformly distributed if for every set -dimensional included in ,

where is the measure of .

For the theory of uniform distribution see the monograph of Kuipers and Niederreiter Reference 15. Starting with a sequence of real numbers in the unit interval, we define for every integer the sequence -dimensional where for The sequence . is said to be completely uniformly distributed if for every integer the sequence , is uniformly distributed in the unit cube. -dimensional

If a sequence is completely uniformly distributed, then it also satisfies many other statistical properties common to all random sequences; see the work of Franklin Reference 5. For example, for every fixed integer the probability of , consecutive terms having any specific relative order is .

The first reference to the concept of complete uniform distribution is the work of Korobov in 1948 Reference 8Reference 12Reference 13Reference 14, where he proves that the sequence with

is completely uniformly distributed. Since then, there have been many constructions of completely uniformly distributed sequences; see Reference 15Reference 19Reference 21 and the references cited there. In Reference 16 Levin gives a lower bound of decay of discrepancy of completely uniformly distributed sequences and gives constructions, using nets obtained with Sobol matrices and using Halton sequences, which achieve this low discrepancy bound.

Independently of the school of Korobov, in 1965 Knuth Reference 7 gives a construction of a completely uniformly distributed sequence of real numbers based on de Bruijn sequences of increasing orders and alphabet sizes. In each of the successive segments composing the sequence, the alphabet sizes of the underlying de Bruijn sequences increase exponentially. Knuth gives an elementary proof that the sequence is completely uniformly distributed by reasoning about the binary representation of the rational numbers These binary representations are immediate because the alphabet size is always a power of . .

In the present work we give a similar, albeit simpler, construction than the one given by Knuth in Reference 7 while preserving the property of complete uniform distribution. Our construction is based on de Bruijn sequences of linearly increasing orders and linearly increasing alphabet sizes. The argument used by Knuth to prove that his sequence is completely uniformly distributed does not hold in our case. We give two proofs: one is elementary, and the other is based on Weyl’s criterion. The construction and the proofs we present here can be generalized in a straightforward manner to other growths of alphabet sizes. A first version of this work appears in Reference 1.

It may be possible to construct completely uniformly distributed sequences with low discrepancy by using variants of de Bruijn sequences such as those considered by Becher and Carton in Reference 2 (nested perfect necklaces). In that work, they show that the binary expansion of the real number defined by Levin in Reference 17 using Pascal matrices is indeed the concatenation of such variants of de Bruijn sequences.

We finish this section with a short presentation of de Bruijn sequences. In Section 2 we present Knuth’s sequence, and we devote Section 3 to our construction.

As usual we write to denote the sets of positive integers, integers, real numbers, and complex numbers, respectively.

### 1.1. de Bruijn sequences

A de Bruijn sequence of order -ary is a sequence of length which, when viewed as a cycle, contains every possible sequence of length -ary exactly once as a contiguous subsequence. For example, listed next are two distinct binary de Bruijn sequences of order :

Every binary sequence of length appears exactly once as a contiguous subsequence in each example. This includes those instances such as which wrap around the right-hand end of the sequences.

Each de Bruijn sequence of order -ary is the label of a Hamiltonian cycle in the directed graph where , is the set of all sequences of length -ary and in Due to good properties of de Bruijn graphs, each . de Bruijn sequence of order -ary can also be identified with the label of an Eulerian cycle in By the BEST theorem there are exactly . different de Bruijn sequences of order -ary .

A Ford sequence of order -ary denoted by , is the lexicographically least , de Bruijn sequence of order -ary In the example above, the first sequence is the binary Ford sequence of order . or , In .Reference 6, Fredricksen and Maiorana introduce an algorithm for generating the Ford sequence of order -ary which has constant amortized running time; see ,Reference 20.

The name *de Bruijn* sequence came after the work of the Dutch mathematician N. G. de Bruijn Reference 4. A historical background can be read from Reference 3. Korobov made significant contributions in this area and constructed uniformly distributed sequences based on de Bruijn sequences Reference 9Reference 10Reference 11.

## 2. Knuth’s sequence

We follow Knuth’s presentation in Reference 7. Knuth’s sequence can be defined from any given family of de Bruijn sequences. Here, we use Ford sequences because they are conveniently defined and because they can be efficiently generated.

### Definition.

Given in let , denote the Ford sequence of order -ary An . of order -sequence denoted by , is the finite sequence of rational numbers obtained by dividing every term in , by :

A of order -sequence denoted by , is the sequence obtained from the concatenation of , copies of :

By construction, the size of is

and the size of is

Notice that, for any given all terms in , and in are numbers in the set For example, when . ,

and , .

### Definition.

Knuth’s sequence, denoted by is the infinite sequence of rational numbers resulting from the concatenation of the sequences , for :

### Theorem (Knuth 1965, Reference 7, page 268).

The sequence is completely uniformly distributed.

Knuth provides an elementary proof of this theorem. Two choices play an important role in his proof. One is that the number of repetitions of each within a -sequence is -sequence To ensure complete uniform distribution it is sufficient that the number of repetitions of each . within a -sequence grows faster than -sequence For instance, . suffices.

The other choice in Knuth’s construction is that the alphabet sizes of the Ford sequences grow exponentially as This allows us to reason about the rational numbers comprised in the sequence . in terms of their binary representations. Knuth considers the distribution of the most-significant bits of each of the terms in a Ford sequence, where -ary see ,Reference 7, Lemma 2.

## 3. Main result

We now present the main contribution of this work, which is a variant of Knuth’s construction based on Ford sequences with linearly increasing alphabet sizes.

### Definition.

Given in and a function let , denote an Ford sequence of order -ary A . of order -sequence denoted by , is the finite sequence of rational numbers obtained by dividing each of the terms in , by :

and a of order -sequence denoted by , is the sequence obtained from the concatenation of , consecutive copies of :

The size of is

and the size of is

For any given all terms in , and are numbers in the set The key difference between the way . are constructed when compared to -sequences from the previous section is that, as the order of the sequence grows, the alphabet size for the underlying Ford sequence grows linearly -sequences( rather than exponentially )( For example, when ). and :

and , .

### Definition.

Given the sequence , is the infinite sequence of real numbers obtained from the concatenation of the sequences for :

### Theorem 1.

If is a nondecreasing function and then the sequence , is completely uniformly distributed.

### Example.

If then ,

and is completely uniformly distributed.

### 3.1. Proof of Theorem 1

Within the current section, let be an arbitrary but fixed function. For convenience we write and in place of and Consider a prefix of . of length denoted by , Let . be the integers determined by such that

where and Here, . is the order of the rightmost, possibly incomplete present in -sequence The number . is the amount of complete of order -sequences appearing before the rightmost, possibly incomplete while -sequence, is the amount of terms in it. Thus,

Let be a positive integer, and let be a set such that Let . range freely over the natural numbers, and let the quantity denote the number of windows of of size starting at indices that belong to the set :

where for .

Consider sufficiently large values of such that This is always possible since . is an unbounded, nondecreasing function of We can decompose . into four consecutive sections; namely, sequences , , and , :

Notice that and can potentially be empty, such as when or respectively. ,

We denote the cumulative sums of the sizes of the sequences defined above as and , for Now, we can similarly decompose . into five parts. If we let

for then ,

for some .

For each the quantity , accounts for windows contained entirely within the sequence and , accounts for all windows crossing any of the three borders between the four sections. This is enough to account for all possible windows, since any given window is either entirely contained in some section or it starts at a given section and ends at a subsequent one, thereby crossing a border.

We can rewrite the value of each in terms of windows over each sequence If we let .

for and then ,

Before obtaining more precise expressions for these quantities, we state the following three technical propositions.

#### Proposition 2.

If is in and are in such that then the number of integers from the set , contained in is equal to for some in .

#### Proof.

Since there are exactly , nonnegative integers in the set for some in Similarly for . there are exactly , nonnegative integers in the set for some in The difference between these two quantities is equal to the number of nonnegative integers contained in the set . which is , Observing that . in and that all nonnegative integers between and belong to the set the proof is complete. ,

■#### Proposition 3.

If is a positive integer and and are two sequences of real numbers of length then the product of their element-by-element sums can be expressed as follows: ,

#### Proof.

By induction on First, notice that the property holds for . :

Next, we see that the inductive step holds for any First, .

Since for every the value and is therefore even, we can add the factor to the product in the second term simply by raising the upper limit to Similarly, in the fourth term we can add the factor . to the product by raising the upper limit to and changing the limits in the sum to This is true because adding . to does not change the value of for any but when , the value and is therefore odd. The third term can be rewritten as a similar product for a value of and by substituting it into the equation above, ,

which completes the proof.

■#### Proposition 4.

Given in the following holds: ,

#### Proof.

By induction on The property holds for . and :

and the inductive step holds for :

Therefore, the property holds for all in .

■We now obtain an expression for the number of windows of a which are contained in the set -sequence This is useful for evaluating . .

#### Lemma 5.

Given a positive integer and a set where let , in such that and consider the sequence as a cyclic sequence. If we let for then the number of windows of size , from the sequence that lie in is

for some in .

#### Proof.

First, notice that any given window is contained in the set if and only if the following is true:

where and indices are taken modulo .

Since all terms in are numbers in the set we multiply both sides of each inequality by , allowing us to reason about integers belonging to a Ford sequence instead of rational numbers. We obtain the following: ,

According to Proposition 2, for each inequality above with there are exactly possible solutions in the set for some value in This yields a total of .

possible solutions to the system of inequalities. Each solution, when seen as an sequence of length -ary appears exactly , times as a contiguous subsequence in This is true because there are . ways of extending an sequence of length -ary to one of length and, by construction, each of these appears exactly once in when viewed as a cycle. Since ranges exactly once over each possible window of then ,

Using Proposition 3, we can expand this into the following:

If we define for as

then, for each the value , is in This is true because the product on the right-hand side is composed of terms . in and in and, since there is always at least one term of the second kind. Given that ,

we can further simplify equation Equation 4 to get

Finally, since there exist some , in such that

and hence, by Equation 3 and Equation 5,

■

We can now give the proof of our main result.

#### Proof of Theorem .

Let be a positive integer, and let be a set such that Let . range freely over the natural numbers. We now obtain an expression for and compute its limit when .

Recall the following definitions:

where

for and .

The sequences and are entirely composed of complete of increasing order which are larger than or equal to -sequences Moreover, with the exception of the last, rightmost instance in each of . and every single , is immediately succeeded by another -sequence of the same or the following order, including those which are part of a -sequence Additionally, any window starting at the right-hand end of a -sequence. necessarily finishes within the first -sequence elements of the following all of which are guaranteed to be -sequence, .

Therefore, the amount of windows of size contained in ranging over and is equal to the sum over each composing -sequence*viewed as a cycle*, with an error of at most due to the fact that we are counting only windows entirely contained within each sequence. If we let

for and then ,

for some values and , From Lemma .5

for some values of in , Substituting back into . from equation Equation 2,

and factoring out terms multiplied by we get ,

We now rewrite the first term using the relationship between , , and , from equation Equation 1:

After dividing both sides by and rearranging terms we obtain

Taking limits on both sides as the third term on the right-hand side approaches , since the contents of the brackets are dependent on and bounded as a function of For the second term, using the fact that . is nondecreasing together with Proposition 4, we can see that

and

Since is an unbounded, nondecreasing function of this term approaches , as well.

Finally, consider the first term on the right-hand side. Since both and are values between and then , in Moreover, using the identity .

with and using that , , we can see that for large values of ,

which by hypothesis also approaches as Hence, .

Since and were chosen arbitrarily, the sequence is completely uniformly distributed and the proof of Theorem 1 is complete.

■### 3.2. Alternative Proof of Theorem 1

Weyl’s criterion for the multidimensional case states that if is a sequence of vectors of real numbers, then -dimensional is uniformly distributed in the unit cube if and only if for all nonzero vectors of integers -dimensional ,

where denotes the dot product of and see ;Reference 15. Before applying Weyl’s criterion to the sequence we first prove the following lemma based on a well-known fact about the roots of unity. ,

#### Lemma 6.

Given a positive integer and a nonzero vector of integers -dimensional let , be in such that and consider the sequence as a cyclic sequence. If we let for then ,

#### Proof.

If we view as a cyclic sequence and let

for then from the definition of a , it follows that -sequence Then, if we define . every , in appears exactly times as a contiguous subsequence of This is true because there are . ways of extending an sequence of length -ary to one of length and, by construction, each of these appears exactly once in when viewed as a cycle. Substituting into the left-hand side of equation Equation 6,

Since in and the function , is periodic with a period equal to then ,

where the congruence is taken modulo The conditions for the existence of solutions to equations of the form .

are well understood (see Reference 18, page 114). In particular, this equation only has solutions when divides and, in such a case, the total number of solutions is equal to Therefore, .

where the last step comes from substituting If we also substitute . and observe that then ,

The term on the right-hand side is the sum of all the roots of unity of order It is a well-known fact that this sum is equal to . whenever Finally, since . then , and the proof is complete. ,

■#### Alternative Proof of Theorem .

Let be a positive integer, and let be a nonzero vector of integers. Let -dimensional range freely over the natural numbers. As in Section 3.1, we consider a prefix of the sequence of length denoted , and we consider the values , , and , as functions of Let . for .

Analogously to the first proof of Theorem 1, we define the complex value as the Weyl sum over the first windows of size of :

Let and consider sufficiently large values of such that As before, this is always possible since . is an unbounded, nondecreasing function of We can decompose . into four consecutive sections; namely, sequences , , and , :

and define , , and , as the Weyl sums over windows entirely contained within each respective section such that

for some complex number with which accounts for all windows crossing over any border. ,

Following the same reasoning as in Section 3.1, the values of and can be computed as the Weyl sums over the composing -sequences and viewed as cycles, plus two error terms accounting for right borders. However, in this case, due to Lemma 6 and the fact that all sequences in and have orders greater than these sums vanish to zero. Therefore, ,

for some complex values with We now consider the limit of . as Observe that .

The numerator in the first term is dependent on and and constant as a function of , As shown before, the second term approaches . as Therefore, the sum of both terms approaches . as which in turn implies that , vanishes as well.

Given that and were chosen arbitrarily, Weyl’s criterion is satisfied for all values of and the sequence is completely uniformly distributed.

■