On measure-preserving ${\mathcal C}^1$ transformations of compact-open subsets of non-archimedean local fields

We introduce the notion of a \emph{locally scaling} transformation defined on a compact-open subset of a non-archimedean local field. We show that this class encompasses the Haar measure-preserving transformations defined by ${\mathcal C}^1$ (in particular, polynomial) maps, and prove a structure theorem for locally scaling transformations. We use the theory of polynomial approximation on compact-open subsets of non-archimedean local fields to demonstrate the existence of ergodic Markov, and mixing Markov transformations defined by such polynomial maps. We also give simple sufficient conditions on the Mahler expansion of a continuous map $\mathbb Z_p \to \mathbb Z_p$ for it to define a Bernoulli transformation.


Introduction
The $p$-adic numbers have arisen in a natural way in the study of some dynamical systems, for example in the study of group automorphisms of solenoids in Lind and Schmidt [LS94]; other situations in dynamics where the $p$-adic numbers come up are surveyed in Ward [War]. At the same time there has been interest in studying the dynamics (topological, complex, or measurable) of naturally arising maps (such as polynomials) defined on the $p$-adics; see for example Benedetto [Ben01], Khrennikov and Nilson [KN04], and Rivera-Letelier [RL03]. In particular, Bryk and Silva in [BS05] studied the measurable dynamics of simple polynomials on balls and spheres in the field $\mathbb Q_p$ of $p$-adic numbers. The maps they studied are ergodic but not totally ergodic, and they asked whether there exist polynomials on $\mathbb Q_p$ that define (Haar) measure-preserving transformations that are mixing. Woodcock and Smart in [WS98] show that the polynomial map $x \mapsto \frac{x^p - x}{p}$ defines a Bernoulli, hence mixing, transformation on $\mathbb Z_p$. A consequence of our work is a significant extension of the result for this map, placing it in a greater context (see in particular Example 8.6).
Rather than working on $\mathbb Q_p$ we find that the natural setting for our work is over a non-archimedean local field $K$. We introduce a class of transformations, called locally scaling, and show in Lemmas 4.3 and 4.4 that measure-preserving $\mathcal C^1$ (in particular, polynomial) maps are locally scaling. In Section 5 we apply the theory of Markov shifts to classify the dynamics of locally scaling transformations, decomposing the transformation into a disjoint union of ergodic Markov transformations and local isometries. In particular, we show that a weakly mixing locally scaling transformation must be mixing. We also show the existence of polynomials defining transformations exhibiting nearly the full range of behaviors possible for locally scaling transformations, such as ergodic Markov, mixing Markov, and Bernoulli transformations.
Given a polynomial defined on a compact-open subset of K, our work shows that a finite computation may check whether it defines a measure-preserving transformation and whether it defines a mixing transformation; the question of ergodicity is also answered, except in the case where the polynomial is 1-Lipschitz, which has been studied by Anashin in [Ana02].
We briefly mention related works studying measurable dynamics of certain maps on spaces related to the $p$-adics. These works [RB], [FRL04], and [FRL06] construct a natural invariant measure for a wide class of rational functions, as in existing constructions in complex dynamics. The natural domain for these constructions is the so-called Berkovich projective space, a space much larger than the ordinary $p$-adics.
We now indicate an outline of the rest of the paper. Section 2 reviews results on Markov shifts, Section 3 reviews preliminaries on non-archimedean local fields as well as some analytic definitions, and Section 6 recalls some of the theory of polynomial approximation on rings of integers of non-archimedean local fields.
Section 4 establishes the fact that measure-preserving $\mathcal C^1$ maps are locally scaling, and Section 5 proves our main structural results, in particular Proposition 5.5 and Theorem 5.6. Section 7, in particular Theorem 7.2, shows that polynomial maps are in a sense a representative class of locally scaling transformations, and demonstrates the existence of polynomial maps defining locally scaling transformations with various behaviors, including mixing. Section 8 and Section 10 are devoted to demonstrating two interesting classes of locally scaling maps on $\mathbb Z_p$ that arise naturally in the study of polynomial approximations. Specifically, Section 8 studies maps which are isometrically conjugate to the natural realization of the (one-sided) Bernoulli shift, and shows for instance that the map $x \mapsto \binom{x}{p^\ell}$ on $\mathbb Z_p$ is Bernoulli. Section 10 then studies similar binomial-coefficient maps which are locally scaling and so have very regular structures but fail to be Haar measure-preserving.
1.1. Acknowledgements. This paper is based on research by the Ergodic Theory group of the 2005 SMALL summer research project at Williams College. Support for the project was provided by National Science Foundation REU Grant DMS -0353634 and the Bronfman Science Center of Williams College. The authors would like to thank several anonymous referees for careful readings of the paper and valuable suggestions.

Markov shifts
Let $H$ be a finite non-empty set. By a stochastic matrix on $H$ we mean a map $A : H^2 \to \mathbb R_{\ge 0}$ such that $\sum_{j \in H} A(i,j) = 1$ for each $i \in H$. Putting $H$ into a bijection with the set $\{0, \dots, \#H - 1\}$ we may regard $A$ as a $\#H \times \#H$ matrix with non-negative entries and the entries in each row summing to 1. In analogy with this case, we will refer to the sets $\{A(i,\cdot)\}$ and $\{A(\cdot,j)\}$ as rows and columns of $A$, respectively. By a row vector on $H$ we mean a map $w : H \to \mathbb R$. For $w$ a row vector and $A$ a stochastic matrix, we define their product as the row vector $wA$ defined by
$$(wA)(j) = \sum_{i \in H} w(i)A(i,j).$$
We will say that $w$ is non-negative (resp. positive) if it takes values in $\mathbb R_{\ge 0}$ (resp. $\mathbb R_{>0}$).
To any stochastic matrix $A$ we may associate the following symbolic dynamical system:
(i) Let $X_A = \{x \in \prod_{i \ge 0} H : A(\pi_n(x), \pi_{n+1}(x)) \ne 0 \text{ for all } n \ge 0\}$, where $\pi_n : \prod_{i \ge 0} H \to H$ is projection to the $n$th coordinate. Give each finite factor the discrete topology, and $X_A$ the subspace topology inherited from the product topology.
(ii) Let $T_A : X_A \to X_A$ be defined by $\pi_n \circ T_A = \pi_{n+1}$; that is, $T_A$ is simply "shifting left." Then, $(X_A, T_A)$ is a topological dynamical system.
(iii) If in addition we are given a non-negative row vector $w$, then we may define a measure $\mu_{A,w}$ on $X_A$ by
$$\mu_{A,w}([d_0 \dots d_\ell]) = w(d_0)A(d_0,d_1) \cdots A(d_{\ell-1},d_\ell), \qquad [d_0 \dots d_\ell] = \{x \in X_A : \pi_i(x) = d_i \text{ for } 0 \le i \le \ell\}.$$
We call a set of the form $[d_0 \dots d_\ell]$ a cylinder set; we may observe that the cylinder sets form a base for the topology on $X_A$. Note that if $w$ is in fact positive, then $\mu_{A,w}$ assigns positive measure to each cylinder set and hence to each open set. We may check that if $w = wA$ then $T_A$ is measure-preserving with respect to $\mu_{A,w}$. We call such a dynamical system a Markov shift. We say that a dynamical system is Markov if it is isomorphic to some Markov shift.
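The invariance claim can be checked by hand on small examples. The following Python sketch computes cylinder-set measures and verifies that the preimage of each short cylinder under the shift has the same measure. It assumes the standard product formula $w(d_0)A(d_0,d_1)\cdots A(d_{\ell-1},d_\ell)$ for the measure of a cylinder; the two-state matrix `A` and the vector `w` are hypothetical illustrations, not taken from the text.

```python
from fractions import Fraction
from itertools import product

def cylinder_measure(A, w, word):
    """Measure of the cylinder [d_0 ... d_l], assuming the standard formula
    w(d_0) * A(d_0, d_1) * ... * A(d_{l-1}, d_l)."""
    m = w[word[0]]
    for a, b in zip(word, word[1:]):
        m *= A[a][b]
    return m

def is_stationary(A, w):
    """Check w = wA, i.e. sum_i w(i) A(i, j) == w(j) for every j."""
    n = len(A)
    return all(sum(w[i] * A[i][j] for i in range(n)) == w[j] for j in range(n))

def shift_preserves_cylinders(A, w, max_len=3):
    """The preimage of [d_0 ... d_l] under the left shift is the disjoint union
    of the cylinders [d d_0 ... d_l] over d in H, so invariance on cylinders
    means that these measures add up to the measure of [d_0 ... d_l]."""
    n = len(A)
    for length in range(1, max_len + 1):
        for word in product(range(n), repeat=length):
            pre = sum(cylinder_measure(A, w, (d,) + word) for d in range(n))
            if pre != cylinder_measure(A, w, word):
                return False
    return True

# Hypothetical example with H = {0, 1}; exact rationals avoid rounding issues.
A = [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1), Fraction(0)]]
w = [Fraction(2, 3), Fraction(1, 3)]
print(is_stationary(A, w), shift_preserves_cylinders(A, w))
```

Only cylinders up to a fixed length are tested, so this is a sanity check rather than a proof; the general statement follows from the same one-line computation with $(wA)(d_0) = w(d_0)$.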
We say that a stochastic matrix $A$ is irreducible or ergodic if for each $i, j \in H$ there exists an $n \in \mathbb N$ such that $A^n(i,j) > 0$. This condition has a natural interpretation in terms of the connectedness of a certain directed graph associated with $A$, as we shall see in the proof of Proposition 2.1. We say that a stochastic matrix $A$ is primitive if there exists an $n \in \mathbb N$ such that $A^n(i,j) > 0$ for all $i, j \in H$.
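Both conditions depend only on which entries of $A$ are non-zero, so they can be tested with boolean matrix powers. The sketch below is one way to do this in Python; it takes $n \ge 1$ in both definitions (an assumption about the convention for $\mathbb N$) and uses Wielandt's bound, which says that a primitive $m \times m$ matrix already has a strictly positive power of exponent at most $(m-1)^2 + 1$. The function names are illustrative.

```python
def bool_mul(S, T):
    """Boolean product of two square 0/1 (True/False) matrices."""
    n = len(S)
    return [[any(S[i][k] and T[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def support(A):
    """Pattern of non-zero entries of A."""
    return [[a > 0 for a in row] for row in A]

def is_irreducible(A):
    """For all i, j there is some n >= 1 with A^n(i, j) > 0.
    Paths of length 1..#H suffice, since a shortest path never repeats a vertex."""
    n = len(A)
    S = support(A)
    reach = [[False] * n for _ in range(n)]
    P = S
    for _ in range(n):
        reach = [[reach[i][j] or P[i][j] for j in range(n)] for i in range(n)]
        P = bool_mul(P, S)
    return all(all(row) for row in reach)

def is_primitive(A):
    """Some single power A^n has all entries positive.
    By Wielandt's bound it is enough to test n = 1, ..., (#H - 1)^2 + 1."""
    n = len(A)
    S = support(A)
    P = S
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in P):
            return True
        P = bool_mul(P, S)
    return False

# A cyclic permutation is irreducible but not primitive.
swap = [[0, 1], [1, 0]]
print(is_irreducible(swap), is_primitive(swap))
```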
Using the Perron-Frobenius Theorem on non-negative irreducible and primitive matrices, along with a graph theoretic interpretation of the stochastic matrix, one may obtain an ergodic decomposition result for Markov shifts:
Proposition 2.1. Let $A$ be a stochastic matrix, and $w$ a positive row vector such that $w = wA$. Then, we may partition $H$ into disjoint sets $H = H_1 \sqcup \dots \sqcup H_n$ such that $A_k = A|_{H_k \times H_k}$ is an irreducible stochastic matrix for $k = 1, \dots, n$. Then, $w_k = w|_{H_k}$ satisfies $w_k = w_k A_k$, and we have the ergodic decomposition of $(X_A, \mu_{A,w}, T_A)$ as
$$(X_A, \mu_{A,w}, T_A) = \bigsqcup_{k=1}^{n} (X_{A_k}, \mu_{A_k,w_k}, T_{A_k}).$$
Moreover, the $k$th summand is mixing if $A_k$ is primitive.
Proof. Construct a graph on $H$ as follows. We place a directed edge from $i \to j$ if and only if $A(i,j) > 0$. As $w$ is strictly positive, this is equivalent to the condition that $w(i)A(i,j) > 0$. We say that the flow or flux associated to this edge is $w(i)A(i,j)$. Now, the flow out of $i$ is $\sum_{j \in H} w(i)A(i,j) = w(i)$, as $A$ is stochastic, and the flow into $i$ is $\sum_{j \in H} w(j)A(j,i) = (wA)(i) = w(i)$, as $w = wA$. So, we see that the flux into and out of $i$ are both equal to $w(i)$. This implies that for every finite subset of $H$, the in-flux and out-flux will be equal. For $i \in H$, let $R(i)$ be the set of points reachable from $i$, and $B(i)$ the set of points which can reach $i$. Note that $R(i)$ has out-flux 0 by construction, and $B(i)$ has in-flux 0 by construction; as $H$ is finite, these subsets are finite, so both have in-flux and out-flux equal to 0. So, we can have no edges into or out of either of these two sets. But, if $t \in R(i)$ and $y \in B(i)$, then there is a path from $y$ to $t$; so we must have $t \in B(i)$ and $y \in R(i)$, and so $B(i) = R(i)$. So, $B(i) = R(i)$ is strongly connected, and there are no edges into or out of this set.
For $\ell > 0$, note that $A^\ell(i,j) > 0$ is equivalent to there being a path of length precisely $\ell$ from $i$ to $j$. It follows that the collection $\{R(i) : i \in H\}$ gives our desired decomposition of $H$.
We readily note that $w_k = w_k A_k$ for $k = 1, \dots, n$. Then, as $A_k$ is irreducible, [Wal82, Theorem 1.19] implies that the $k$th summand is ergodic, from which the ergodic decomposition follows. Finally, [Wal82, Theorem 1.31] implies that the $k$th summand is mixing if $A_k$ is primitive.
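The graph argument in the proof translates directly into a small computation. The following Python sketch, under the hypotheses of Proposition 2.1 (a positive row vector $w$ with $w = wA$), collects the reachable sets $R(i)$ and returns them as the blocks of the decomposition; the helper names and the three-state example are hypothetical.

```python
from fractions import Fraction

def reachable(A, i):
    """R(i): states reachable from i along edges u -> v with A(u, v) > 0.
    The state i itself is included via the empty path."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v, a in enumerate(A[u]):
            if a > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def irreducible_classes(A, w):
    """Blocks H_1, ..., H_n of the decomposition, as sorted lists of states.
    The argument relies on a positive stationary vector, so we check for one."""
    n = len(A)
    assert all(x > 0 for x in w), "w must be positive"
    assert all(sum(w[i] * A[i][j] for i in range(n)) == w[j] for j in range(n)), \
        "w must satisfy w = wA"
    classes, covered = [], set()
    for i in range(n):
        if i not in covered:
            R = reachable(A, i)
            classes.append(sorted(R))
            covered |= R
    return classes

def restrict(A, Hk):
    """The block A_k = A restricted to H_k x H_k."""
    return [[A[i][j] for j in Hk] for i in Hk]

# Hypothetical example: a 2-cycle on {0, 1} together with a fixed state 2.
one, zero = Fraction(1), Fraction(0)
A = [[zero, one, zero],
     [one, zero, zero],
     [zero, zero, one]]
w = [Fraction(1, 3)] * 3
classes = irreducible_classes(A, w)
print(classes)
```

Without the stationarity check the sets $R(i)$ need not be strongly connected, which is why the sketch refuses inputs that violate the hypotheses instead of silently returning a wrong partition.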
Analytic definitions, preliminaries, and notation
Let $K$ be a non-archimedean local field, which we take to be either a finite field extension of $\mathbb Q_p$ or $\mathbb F_{p^n}((t))$ for some prime $p$.
Let $|\cdot|$ be a non-archimedean multiplicative valuation (sometimes called a "non-archimedean absolute value") on $K$, such that $|\cdot|$ generates the topology on $K$. Denote $V = |K^\times| = \{|x| : x \in K^\times\}$, $\mathcal O = \{x \in K : |x| \le 1\}$ and $\mathfrak p = \{x \in K : |x| < 1\}$ (note that $\mathcal O$, $\mathfrak p$ are independent of the choice of valuation). It is the case that $\mathcal O$ is a ring with maximal ideal $\mathfrak p$ and that $\mathcal O/\mathfrak p$ is a finite field (the residue field). Let $p = \operatorname{char} \mathcal O/\mathfrak p$, $q = \#\mathcal O/\mathfrak p$, both finite with $q$ a power of $p$.
We denote $B_r(x) = \{y \in K : |x - y| \le r\}$, and call such a set (for any value of $r$) a ball. A ball of radius precisely $r$ will be called an $r$-ball. Let $\mu$ be Haar measure on $K$, normalized such that $\mu(\mathcal O) = 1$; define $\rho : V \to \mathbb R_{>0}$ by $\rho(r) = \mu(B_r(0))$. Now, we recall the following standard results:
(i) $\mathcal O$ is a discrete valuation ring with unique maximal ideal $\mathfrak p$;
(ii) $\mathfrak p = \pi\mathcal O$ for any $\pi \in \mathfrak p \setminus \mathfrak p^2$; we call any such $\pi$ a uniformizing parameter;
(iii) $V$ is the discrete abelian (multiplicative) subgroup of $\mathbb Q$ generated by $|\pi|$; in light of this, we may define a map $v : K \to \mathbb Z \cup \{+\infty\}$ by $v(0) = +\infty$ and $v(x) = \log_{|\pi|} |x|$ for $x \in K^\times$; this is the additive valuation (sometimes just "valuation") on $K$;
(iv) For $r = |\pi|^k$, $k \ge 0$, it is the case that $\rho(r) = q^{-k}$. Indeed, for $r \in V$ we see that $\rho(r) = q^{-\log_{|\pi|} r}$;
(v) A subset $X \subseteq K$ is compact-open if and only if $X$ is a finite union of balls.
We direct the interested reader to [Ser62] for a thorough treatment of related topics.
We will continue to use the symbols $K, \mu, p, q, |\cdot|, \mathcal O, \mathfrak p, \pi, v, V, \rho, B_r$ with these meanings below.
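Let $X$ be an open subset of $K$ and $a \in X$. Then, we say that a function $f : X \to K$ is strictly differentiable or $\mathcal C^1$ at $a$ (denoted $f \in \mathcal C^1(a)$) if the limit
$$f'(a) = \lim_{\substack{(x,y) \to (a,a) \\ x \ne y}} \frac{f(x) - f(y)}{x - y}$$
exists. We say that $f$ is $\mathcal C^1$ on $X$ (denoted $f \in \mathcal C^1(X)$) if $f \in \mathcal C^1(a)$ for each $a \in X$. For more on this notion, see [Sch84] or [Rob00].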
Measure preserving $\mathcal C^1$ maps on non-archimedean local fields
Definition 4.1. For $X \subseteq K$ compact-open, we say that a transformation $T : X \to X$ is locally scaling for $r \in V$ if $X$ is a finite union of $r$-balls and if there exists a function $C : X \to \mathbb R_{\ge 1}$ such that
$$|T(x) - T(y)| = C(x)|x - y| \quad \text{whenever } |x - y| \le r.$$
We will refer to $C$ as the scaling function.
Remark 4.2. Let us make the following observations about locally scaling transformations: (i) By the symmetry of $x$ and $y$ in the previous displayed equation, $C$ is constant on cosets of $B_r(0)$. We will write $H = X/B_r(0)$ for the set of cosets of $B_r(0)$ contained in $X$ (recall that $X$ is a union of such cosets); we treat elements of $H$ as subsets of $X$. Then, $C$ induces a map $C : H \to \mathbb R_{\ge 1}$. (ii) The terminology "locally scaling" is convenient but perhaps slightly misleading: For us, such transformations must not only locally scale distances, but must do so by a factor that is at least 1.
This definition is motivated by the ease of analyzing the structure of such maps together with the following easy lemma:
Lemma 4.3. Let $X \subseteq K$ be open and let $f \in \mathcal C^1(X)$ with $f'(a) \ne 0$ for all $a \in X$. Then, $|f'(x)|$ is locally constant on $X$. If, moreover, $X$ is compact, $f(X) \subset X$, and $|f'(a)| \ge 1$ for all $a \in X$, then the induced transformation $f : X \to X$ is locally scaling for some $r \in V$.
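Proof. Fix $a \in X$. Since $X$ is open and $f \in \mathcal C^1(a)$, there exists $r_a \in V$ so that $B_{r_a}(a) \subset X$ and
$$\left|\frac{f(x) - f(y)}{x - y} - f'(a)\right| < |f'(a)| \quad \text{for distinct } x, y \in B_{r_a}(a).$$
By the strong triangle inequality, it follows that $|f(x) - f(y)| = |f'(a)||x - y|$ for $x, y \in B_{r_a}(a)$; in particular $|f'(x)| = |f'(a)|$ for $x \in B_{r_a}(a)$, so $|f'|$ is locally constant. If $X$ is compact, then $\{B_{r_a}(a) : a \in X\}$ is an open cover of $X$, and so admits a finite subcover $B_{r_{a_0}}(a_0), \dots, B_{r_{a_k}}(a_k)$. Taking $r \le \min_{i=0}^{k} r_{a_i}$, we see that $X$ is a union of $r$-balls and that $f$ restricted to each $B_r(a)$ scales distance by $|f'(a)| \ge 1$ by the previous displayed equation. So, $f$ is locally scaling for $r \in V$ with scaling function $C(x) = |f'(x)|$.

It follows, by the strong triangle inequality, that $|f(x) - f(y)| \le \alpha|x - y|$ for $x, y \in B_r(a)$. So, $B_{\alpha r}(f(a)) \subseteq X$ by construction and moreover $f^{-1}(B_{\alpha r}(f(a))) \supseteq B_r(a)$. Taking measures, and using that $f$ is measure-preserving, we note that
$$\rho(\alpha r) = \mu(B_{\alpha r}(f(a))) = \mu(f^{-1}(B_{\alpha r}(f(a)))) \ge \mu(B_r(a)) = \rho(r).$$
This proves that $|f'(a)| \ge 1$ for all $a \in X$. The remaining part of the claim follows from Lemma 4.3.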
Remark 4.5. Note that for $f \in K[x]$, $f \in \mathcal C^1(K)$. So, if $f(X) \subset X$ and $f$ induces a measure-preserving transformation $f : X \to X$, then Lemma 4.4 and Lemma 4.3 imply that $f$ is locally scaling.
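Example 4.6. Consider the map $f : \mathbb Z_2 \to \mathbb Z_2$ given by $f(x) = \binom{x}{2} = \frac{x(x-1)}{2}$. Then,
$$|f(x) - f(y)| = \left|\frac{(x - y)(x + y - 1)}{2}\right| = 2|x - y||x + y - 1|.$$
So, $f$ is locally scaling for $r = 1/2$, since $|x + y - 1| = 1$ when $|x - y| \le 1/2$. We will see in Section 8 that $f : \mathbb Z_2 \to \mathbb Z_2$ is actually measure-preserving and in fact Bernoulli.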
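The scaling identity in this example is easy to test on ordinary integers, which are dense in $\mathbb Z_2$. The Python sketch below assumes that the map in question is $f(x) = x(x-1)/2$ (which is what the factor $|x + y - 1|$ suggests) and checks $|f(x) - f(y)|_2 = 2|x - y|_2$ for all pairs in a finite range with $|x - y|_2 \le 1/2$. The helper `abs_p` and the range bound are illustrative choices.

```python
from fractions import Fraction
from math import comb

def abs_p(n, p):
    """p-adic absolute value of an integer n, normalized so that |p|_p = 1/p."""
    if n == 0:
        return Fraction(0)
    n, v = abs(n), 0
    while n % p == 0:
        n //= p
        v += 1
    return Fraction(1, p ** v)

def f(x):
    # Assumed form of the map in the example: f(x) = x(x-1)/2 = binomial(x, 2).
    return comb(x, 2)

def scales_by_two(limit=64):
    """Check |f(x) - f(y)|_2 == 2 |x - y|_2 for 0 <= x, y < limit
    whenever x != y and |x - y|_2 <= 1/2 (i.e. x - y is even)."""
    for x in range(limit):
        for y in range(limit):
            if x != y and (x - y) % 2 == 0:
                if abs_p(f(x) - f(y), 2) != 2 * abs_p(x - y, 2):
                    return False
    return True

print(scales_by_two())
```

The restriction to even differences matters: for instance $f(1) = f(0) = 0$, so the identity fails at distance 1, which is why $r = 1/2$ is the relevant radius.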
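defines a transformation $f : \mathbb Z_p \to \mathbb Z_p$. Since $f$ is polynomial, we have $f \in \mathcal C^1(\mathbb Z_p)$ and thus, by Lemma 4.3, is locally scaling for some $r \in V$. We can use the Taylor expansion of $f$ to find such an $r$ (this idea is similar to that in [KN04, p. 33, Lemma 1.6]). Specifically, writing
$$f(x + h) = f(x) + f'(x)h + \sum_{n \ge 2} \frac{f^{(n)}(x)}{n!} h^n,$$
we note by the strong triangle inequality that it suffices to choose $r$ so that
$$\left|\frac{f^{(n)}(x)}{n!}\right| r^{n-1} < |f'(x)|$$
for all $x \in \mathbb Z_p$ and $n \ge 2$.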

Structure of locally scaling transformations
is a bijection.
. This implies that T (B) ⊆ B ′ , so our restriction is well-defined. It also implies that the restriction is injective. For each k ≥ 0 we may take coset representatives a 0 , . . . , a q k −1 for B/B r ′ |π| k (0). Then for i, j ∈ {0, . . . , q k − 1} we have So, T (a 0 ), . . . , T (a q k −1 ) are precisely the q k coset representatives for is the continuous image of a compact set, thus compact, and so closed. So, T (B) = B ′ . This proves surjectivity, and the lemma is proved.
is either the empty set or a ball of radius $r'/C(i)$, according as whether is a bijection. It follows that $i \cap T^{-1}(B)$ is non-empty, and we may in fact assume that $y \in i \cap T^{-1}(B)$. Then, as

Definition 5.3. Let $X \subseteq K$ be compact-open, and let $T : X \to X$ be locally scaling for $r \in V$. Let $H = X/B_r(0)$ and $C : H \to \mathbb R_{\ge 1}$ be the scaling function. Then, we define the associated transition matrix to be the map $A : H^2 \to \mathbb R_{\ge 0}$ given by, for $i, j \in H$,
$$A(i,j) = \begin{cases} \rho(1/C(i)) & \text{if } i \cap T^{-1}(j) \ne \emptyset, \\ 0 & \text{otherwise.} \end{cases}$$

Lemma 5.4. Let $X \subseteq K$ be compact-open and $T : X \to X$ be locally scaling for $r \in V$; let $H = X/B_r(0)$ and let $A : H^2 \to \mathbb R_{\ge 0}$ be the associated transition matrix. Then:

By disjoint additivity of $\mu$, it suffices to prove the equality in the case $S \subseteq j$ for some $j \in H$. As the balls form a sufficient semi-ring in the Borel $\sigma$-algebra of $X$, we may in addition assume that $S$ is a ball. Say $S = B_{r'}(a)$ for $r' \le r$ and $a \in j$. Then, by Corollary 5.2 we know that $i \cap T^{-1}(S)$ is either the empty set or a ball of radius $r'/C(i)$, according as whether $i \cap T^{-1}(j)$ is empty or not. Taking measures we get
$$\mu(i \cap T^{-1}(S)) = A(i,j)\mu(S).$$
Note that for each $i \in H$, by disjoint additivity of $\mu$ along with (ii) we have

(iv):
If T is measure-preserving then for each j ∈ H we have, by disjoint additivity of µ, For the converse we use (i) and disjoint additivity: Proposition 5.5. Let X ⊆ K be compact-open, let T : X → X be a locally scaling transformation for r ∈ V, with Σ = (X, µ, T ) the corresponding measurable dynamical system. Let H = X/B r (0), let A : H 2 → R ≥0 be the associated transition matrix, and w : H → R ≥0 the positive row vector given by Then, there exists a continuous, measure-preserving surjection Φ : Proof. For each n ≥ 0, let π n : X A → H denote projection to the n th coordinate. Let φ : X → H be the canonical projection. Consider the map Φ : X → X A defined by We will prove by induction on the number of slots specified (the "length" of the cylinder set [d 0 . . . d ℓ ]) the claim that the pre-image of the cylinder set [d 0 . . . d ℓ ] is a ball of the same measure as the cylinder set. Note that Φ −1 ([d 0 ]) = d 0 is a ball of the correct measure as By the inductive hypothesis, this is the intersection of two balls, and is thus again a ball; so Φ is continuous. Noting = d 1 and applying claim (i) of Lemma 5.4, along with the inductive hypothesis, we see that this ball has the correct measure As the cylinder sets are a sufficient semi-ring in the Borel σ-algebra of X A , this shows that Φ is measure-preserving.
Note that Φ continuous and measure-preserving implies Φ surjective: X is compact and X A is Hausdorff, so the image must be closed. However, the image must have full measure and so must be dense (w positive implies that all cylinder sets, hence all open sets, have strictly positive measure).
then we have an isomorphism of topological and measurable dynamical systems Moreover, each term in this decomposition is either locally an isometry or ergodic Markov, according as whether #X A k < ∞ or not.
The decomposition of Σ ′ induces the following decomposition of Σ: To complete the proof of the proposition, it suffices to show that (C k , µ k , T k ) ∼ = Σ k for k = 1, . . . , n as topological and measurable dynamical systems, and to classify them as being locally isometries and ergodic Markov in the two cases. We now handle the two cases separately: Case 1: #X A k < ∞ If #X A k < ∞, then the isomorphism (C k , µ k , T k ) ∼ = Σ k follows by definition. Note that the measure on Σ ′ k is necessarily atomic; as it is ergodic, it must in fact be the inverse orbit of a single atom. As T A , hence T A k , is measure-preserving, each of the atoms must have equal measure. It follows that each element x ∈ X A k is of the form (here, C is that from the definition of locally scaling).
So, C k must be a collection of r-balls with C(x) = 1 for x ∈ C k . Then, for x, y ∈ C k with |x − y| ≤ r we have |T (x) − T (y)| = C(x)|x − y| = |x − y|. This shows that Σ k is locally an isometry, as desired. Case 2: #X A k = ∞ If #X A k = ∞, then we claim that Φ induces an isomorphism (C k , µ k , T k ) ∼ = Σ k . In a measure-preserving Markov shift, any atoms must have finite inverse orbit; so Σ ′ k ergodic and #X A k = ∞ implies that µ A k ,w k is non-atomic. Recall that Φ is surjective. We claim that it is also injective. For x ∈ X let d n = π H T n (x) for n = 0, 1, . . .. Then, We have from Proposition 5.5 that each of these pre-images is a ball. Then, Φ −1 (Φ(x)) is the intersection of a nested family of balls. If the intersection contains more than a single point, then the radii of the balls do not go to 0, and so the intersection has non-empty interior and thus positive measure. Now, the measure on X A k is non-atomic, so µ A k ,w k (Φ(x)) = 0. As Φ is measure-preserving, this implies that µ(Φ −1 (Φ(x))) = 0; by the above considerations this implies that Φ −1 (Φ(x)) contains at most one point. So, Φ is injective.
Then, $\Phi$ is a continuous, measure-preserving bijection. Observe that $\Phi$ takes closed sets to closed sets by compactness, so its inverse is also continuous. This also implies that $\Phi^{-1}$ is measurable, and then $\Phi$ measure-preserving implies $\Phi^{-1}$ measure-preserving. So, $\Phi$ is an isomorphism of topological and measurable dynamical systems $(C_k, \mu_k, T_k) \cong \Sigma_k$ as desired. As the latter is ergodic Markov, the former is as well.

Corollary 5.7. Let $X \subseteq K$ be compact-open and $T : X \to X$ a measure-preserving locally scaling transformation. If $T$ is ergodic then it is either Markov or locally an isometry. In particular, if it is weakly mixing then it is also Markov and so mixing. So, for a measure-preserving locally scaling transformation on a compact-open $X$, weakly mixing implies mixing.
Proof. If T is ergodic, then the decomposition in Theorem 5.6 must be trivial. So, T must be either Markov or locally an isometry. If it is locally an isometry, then it cannot be weakly mixing. So, weakly mixing implies weakly mixing Markov which in turn implies mixing.
Corollary 5.8. For a locally scaling transformation, the following properties depend only on the associated transition matrix: (i) Measure-preserving; (ii) Weakly mixing, mixing, exact, Bernoulli.
Proof. By Lemma 5.4, the property of being measure-preserving depends only on the associated transition matrix. Note that the decomposition in Theorem 5.6 depends only on the associated transition matrix. Given an associated transition matrix, we have the following cases: (i) The decomposition is trivial, and the system is a local isometry. Then, it is not weakly mixing (or any of the stronger properties listed). (ii) The decomposition is trivial, and the system is ergodic Markov. In this case, the system is determined up to isomorphism by the matrix. (iii) The decomposition is not trivial. In this case, the system is not ergodic and cannot satisfy any of the stronger properties listed.

Polynomial approximation in O
The above results dealt with C 1 functions, extending to polynomial maps as a special case. In the next sections we will be interested in finding polynomial maps with specified associated transition matrices. In preparation for this, we will need some results on the approximation of continuous maps O → K. For the reader's convenience, we will sketch here the definitions and results of [Ami64], slightly simplified for our applications. Say X ⊆ O is compact-open. Moreover, assume that X is a finite union of r-balls for r ∈ V. Then, for r ′ ≤ r each r ′ -ball contained in X is a union of precisely q balls of radius |π|r ′ contained in X. In the terminology of [Ami64], this makes X a regular valued compact (compact valué régulier in the original French).
For k ≥ log |π| r, we may define H k = X/B |π| k (0), and a projection map π k : X → H k . Then, we say that a sequence {u k ∈ X : k ∈ N} is very well distributed (très bien répartie) if for each k ≥ log |π| r, h ∈ H k , and m ≥ 1 we have #{i < m#H k : u i ∈ h} = m.
That is, the terms of the sequence must be equally distributed among the possible values mod $\mathfrak p^k$ for $k \ge \log_{|\pi|} r$. Note that the condition that the $\{u_k\}$ are very well distributed implies that they are distinct. Now, given such a sequence $\{u_0, u_1, \dots\}$, we may define the corresponding interpolating polynomials for $k \ge 0$:
$$P_k(x) = \prod_{i=0}^{k-1} (x - u_i), \qquad Q_k(x) = \frac{P_k(x)}{P_k(u_k)}.$$
Then, we may summarize some of the results of [Ami64, §II.6.2] as follows: . Then: The $a_k$ are determined by (ii); (iv) $\sup_{x \in X} |f(x)| = \sup_{k \in \mathbb N} |a_k|$.
A very well distributed sequence $\{u_k\}$ is said to be well ordered (bien ordonnée) if $|u_n - u_m| = |\pi|^{v_q(n-m)}$ for all $n, m \ge 0$, where $v_q(n - m)$ is the exact power of $q$ dividing $n - m \in \mathbb Z$. Following our sources, we will call such a sequence T.B.R.B.O. (très bien répartie bien ordonnée). This allows us to state results of Helsmoortel and Barsky, characterizing Lipschitz and $\mathcal C^1$ functions on $\mathcal O$ in terms of the coefficients in their expansions. This result may be found in [Bar73].
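For $K = \mathbb Q_p$ (so $q = p$ and $\pi = p$) both conditions can be tested on a finite prefix of an integer sequence. The Python sketch below does this, and also builds a candidate sequence from a set of coset representatives by substituting representatives for base-$p$ digits. That digit construction, $u_k = \sum_i a_{k_i} p^i$ for $k = \sum_i k_i p^i$, is an assumption made here for illustration (with the representative of the zero class taken to be $0$ so that the sum is finite); the function names are hypothetical.

```python
def val(n, p):
    """Exact power of p dividing the integer n (infinity for n = 0)."""
    if n == 0:
        return float("inf")
    n, v = abs(n), 0
    while n % p == 0:
        n //= p
        v += 1
    return v

def digit_sequence(reps, p, count):
    """Assumed construction: replace each base-p digit d of k by reps[d]."""
    assert reps[0] == 0, "this sketch needs 0 to represent the zero class"
    seq = []
    for k in range(count):
        u, i, rest = 0, 0, k
        while rest > 0:
            u += reps[rest % p] * p ** i
            rest //= p
            i += 1
        seq.append(u)
    return seq

def is_well_ordered(seq, p):
    """|u_n - u_m| = |p|^{v_p(n - m)}, i.e. val(u_n - u_m) == val(n - m)."""
    return all(val(seq[n] - seq[m], p) == val(n - m, p)
               for n in range(len(seq)) for m in range(n))

def is_very_well_distributed(seq, p, k):
    """Each class mod p^k contains exactly m of the first m * p^k terms,
    for every m such that m * p^k terms are available."""
    pk = p ** k
    for m in range(1, len(seq) // pk + 1):
        counts = {}
        for u in seq[: m * pk]:
            counts[u % pk] = counts.get(u % pk, 0) + 1
        if len(counts) != pk or any(c != m for c in counts.values()):
            return False
    return True

seq = digit_sequence([0, 1, 5], 3, 27)  # 0, 1, 5 represent the classes mod 3
print(is_well_ordered(seq, 3), is_very_well_distributed(seq, 3, 2))
```

A finite prefix can only refute the conditions, not establish them for the whole sequence, so a `True` result here is evidence rather than proof.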
the expansion of f in the sense of Theorem 6.1. For k ≥ 1, define Then: Example 6.3. Note that {0, 1, 2, . . .} ⊆ Z p satisfies the conditions for being a very well distributed sequence, and is in fact trivially T.B.R.B.O. Then, So, in this case the above reduces to the Mahler expansion. More generally: Let a 0 , . . . , a q−1 be a complete set of coset representatives for O/p. For k ∈ N, we will define u k in terms of the base-q expansion of k: Then, say we have n, m ∈ N with n = i≥0 n i q i and m = i≥0 Proof. Claim (i) follows by applying Theorem 6.1(iv) with f = Q k (so that a i = 1 for i = k and 0 otherwise). Claim (ii) follows similarly from Theorem 6.2.
Define a polynomial Then, we may observe that for any x ′ , y ′ ∈ O. Then, We will prove the following two statements, which together with the strong triangle inequality and the previous expression imply our desired result: That this suffices is clear, for the j = 1 term will dominate in valuation.
Observe that Suppose {v n } is a very well distributed sequence. Then, it is easy to check that {v 0 , . . . , v k−1 } must contain precisely q ℓ ′ elements bounded by π ℓ ′ for each ℓ ′ ≤ ℓ. Now, observe that both {u m − u n : n ∈ N} and {u k − u n : n ∈ N} are very well distributed. Let m ′ be the unique index in {0, . . . , k − 1} such that |u k − u m ′ | ≤ 1/κ k ; the very well distributed property of {u n } implies that this is in fact an equality. Our previous count implies that we must have for |u m − u i 1 |, . . . , |u m − u i j−1 | > 1/κ k as {u n } is very well distributed (and so the first k elements must be in disjoint 1/κ k -balls). This completes our proof.

Polynomial maps on O realizing locally scaling transformations
Sections 4 and 5 characterize measure-preserving polynomial transformations on O in terms of locally scaling transformations. However, we have shown the existence of only a handful of such maps. In this section, we will show that in fact the polynomials, in a sense, provide a representative class among the measure-preserving locally scaling maps.
We begin with a lemma giving sufficient conditions for two maps to have the same associated transition matrices.
Then, S is locally scaling for r ∈ V, with scaling function C and associated transition matrix A.
Proof. For 0 < |x − y| ≤ r we have by the strong triangle inequality. Indeed, |T (x) − T (y)| = C(x)|x − y|, which by (i) is strictly greater than |R(x) − R(y)|. So, S is locally scaling for r ∈ V, with scaling function C. Now, it remains to verify that i ∩ T −1 (j) = ∅ ⇔ i ∩ S −1 (j) = ∅ for i, j ∈ H. For this, it suffices to show that T (B r (x)) = S(B r (x)) for all x ∈ O. Indeed, applying Lemma 5.1 and (ii) yields For S ⊆ K we say that T : S → K is affine if it is given by x → ax + b for some constants a, b ∈ K. We say that T : O → K is locally affine if for each x ∈ O there exists a r ∈ V such that T | Br(x) is affine. (ii): Let T ∈ T A . By (i), we may assume that T is locally affine and hence strictly differentiable. Let {u k } be a T.B.R.B.O. sequence in O (which must exist by Example 6.3). Let be the decomposition of T in the sense of Theorem 6.1 Take α ∈ V with α < 1. By Theorem 6.2, κ k |a k | → 0 as k → ∞, so there exists an N ∈ N such that for k > N we have κ k |a k | ≤ α < 1. Moreover, take N such that N > #H. Let Note that f is a polynomial. Set Our bound on κ k |a k | along with the choice N > #H = q log |π| r implies that |a k | < 1/κ N ≤ r for k > N; so Theorem 6.1(iv) implies that |R(x)| ≤ r for all x ∈ O. Lemma 6.4 implies that for all x ∈ O, we observe that we may apply Lemma 7.1 to conclude that f ∈ T A . Note that if is a polynomial such that κ k |b k | ≤ α, then the above argument also shows that f + g ∈ T A . So, there are indeed infinitely many polynomials in T A .
In particular, Theorem 7.2 shows the existence of measure-preserving mixing transformations on the p-adics given by polynomial maps. We can also use this method to compute explicit examples of such maps, but it is not particularly enlightening to do so.

Polynomial Bernoulli maps on O
The construction of the preceding section gives infinite classes of measure-preserving polynomials with different kinds of measurable dynamics. Among these maps are Markov mixing maps. We will now study the class of such polynomials whose associated transition matrix has all entries equal, in which case the Markov transformation is in fact Bernoulli. The main upshot of this study is a class of explicitly given and relatively simple measure-preserving Bernoulli polynomial maps.
Definition 8.1. We say that a measure-preserving locally scaling map T : O → O is isometrically Bernoulli for r ∈ V if it is locally scaling for r ∈ V and all entries of the associated transition matrix are equal.
where µ V is the product probability measure, and T V the left-shift. We may let d ′ V be the quotient metric on V . Then, we may define a metric d V on i≥0 V by d V ((a 0 , a 1 , a 2 , . . .), (b 0 , b 1 , b 2 We give two justifications for this metric: (i) View elements of V as ℓ-tuples under the isomorphism F ℓ q ∼ = V corresponding to π-adic expansion (i.e., the isomorphism induced by the map shown in (ii)). Then, expanding each element in the product to a ℓ-tuple, d V is just the dictionary metric (with base |π|). (ii) For each a ∈ V we may let a ∈ O be a coset representative for the quotient. Then, the map gives a bijection i≥0 V → O. This metric is the unique metric making this map an isometry. Now, the term isometrically Bernoulli is partially motivated by the following: Lemma 8.2. Let T : O → O be a transformation and let ℓ ≥ 1. Then, the following are equivalent: (i) T is isometrically Bernoulli for r = |π| ℓ ; (ii) For all x, y ∈ O satisfying |x − y| ≤ |π| ℓ ,

Proof. (i)⇒(ii):
Let $H = \mathcal O/B_r(0)$, and $A : H^2 \to \mathbb R_{\ge 0}$ the associated transition matrix. Note that if $T$ is isometrically Bernoulli, then each entry of $A$ must be equal, and hence must be equal to $\frac{1}{\#H} = \rho(r)$. Now, $T$ must be locally scaling for $r \in V$, so $|T(x) - T(y)| = C(x)|x - y|$ for $|x - y| \le r = |\pi|^\ell$. But, we must have $\rho(1/C(x)) = \rho(r)$, so $C(x) = 1/r = |\pi|^{-\ell}$.

(ii)⇒(iii): Observe that (ii) implies that $T$ is locally scaling for $r$. Letting $A$ be the associated transition matrix, we readily note that all non-zero entries of $A$ must be equal to $\rho(|\pi|^\ell)$; as $A$ is a stochastic matrix, this implies that all entries of $A$ are non-zero. Now, let $\Sigma' = (X_A, \mu_{A,w}, T_A)$ be as in Theorem 5.6. We see that $B_V = \Sigma'$. We observed above that all entries of $A$ are non-zero; then, $A$ is irreducible and Theorem 5.6 gives us a topological and measurable isomorphism $\Phi : \mathcal O \to X_A$. Note that the balls of $X_A$ with respect to $d_V$ are just the cylinder sets. Moreover, one may check that for each $m \ge 0$, $X_A$ is a disjoint union of $q^m$ balls of radius $r = |\pi|^m$, which must then each have measure $q^{-m} = \rho(r)$. Then, Proposition 5.5 implies that $\Phi^{-1}$ takes balls of a given radius to balls of the same radius; moreover, $\Phi^{-1}$ must take each of the $q^m$ distinct balls of radius $|\pi|^m$ in $X_A$ to a distinct ball of radius $|\pi|^m$ in $\mathcal O$. So each ball of radius $|\pi|^m$ in $\mathcal O$ must be the pre-image of precisely one ball of the same radius in $X_A$. It follows that $\Phi$ and $\Phi^{-1}$ are both isometries.

(iii)⇒(i):
Note that for x, y ∈ O we have Then, for d V (Φ(x), Φ(y)) = |x − y| ≤ |π| ℓ we compute Assume that (i) M = max k≥0 κ k |a k | exists, where κ k is as in Theorem 6.2; (ii) There is a unique k M ≥ 0 attaining this maximum, and moreover it is of the form Then, T is isometrically Bernoulli for r = 1/M ∈ V.
In this context, the polynomials $\binom{x}{p}$ and $\frac{x^p - x}{p}$ are in a sense the most natural isometrically Bernoulli maps:
Example 8.6. Take a set of coset representatives for $\mathbb Z_p/p\mathbb Z_p$. Then, using Example 6.3 we may form a T.B.R.B.O. sequence, and then the $p$th corresponding interpolating polynomial (and unit multiples of it) will be Bernoulli by Corollary 8.4.
Let's look at the two most common sets of coset representatives for the quotient $\mathbb Z_p/p\mathbb Z_p$:
(i) Take as coset representatives $0, 1, 2, \dots, p - 1$. The resulting T.B.R.B.O. sequence is $\{0, 1, \dots\}$. Then, $P_p(x) = x(x - 1) \cdots (x - p + 1)$ and $Q_p(x) = \binom{x}{p}$ is the $p$th corresponding interpolating polynomial.
(ii) Take as coset representatives 0 and the $(p - 1)$st roots of unity (there are exactly $p - 1$ by Hensel's Lemma); these are called the "Teichmüller representatives." Then, $P_p(x) = x(x^{p-1} - 1) = x^p - x$, and $Q_p(x)$ is a unit multiple of $\frac{x^p - x}{p}$.
So, the polynomials $\binom{x}{p}$ and $\frac{x^p - x}{p}$ (up to unit) are analogs, arising by the same construction from the two most common choices for the coset representatives of $\mathbb Z_p/p\mathbb Z_p$.
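Both polynomials can be tested numerically against the scaling behavior that the isometrically Bernoulli property for $r = |p|$ requires, namely $C(x) = 1/r$, that is, $|T(x) - T(y)|_p = p\,|x - y|_p$ whenever $|x - y|_p \le 1/p$. The Python sketch below checks this identity on non-negative integers for a few small primes. It is a finite sanity check on integer samples only, and the helper names are illustrative.

```python
from fractions import Fraction
from math import comb

def abs_p(n, p):
    """p-adic absolute value of an integer n, normalized so that |p|_p = 1/p."""
    if n == 0:
        return Fraction(0)
    n, v = abs(n), 0
    while n % p == 0:
        n //= p
        v += 1
    return Fraction(1, p ** v)

def binom_map(x, p):
    """x -> binomial(x, p), for a non-negative integer x."""
    return comb(x, p)

def fermat_map(x, p):
    """x -> (x^p - x) / p; the division is exact by Fermat's little theorem."""
    return (x ** p - x) // p

def expands_by_p(T, p, limit=60):
    """Check |T(x) - T(y)|_p == p |x - y|_p for 0 <= y < x < limit with p | (x - y)."""
    for x in range(limit):
        for y in range(x):
            if (x - y) % p == 0:
                if abs_p(T(x, p) - T(y, p), p) != p * abs_p(x - y, p):
                    return False
    return True

for p in (2, 3, 5):
    print(p, expands_by_p(binom_map, p), expands_by_p(fermat_map, p))
```

Passing a map that does not expand distances, such as the identity, makes `expands_by_p` return `False`, which gives a quick way to see that the check is not vacuous.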
. We may construct a T.B.R.B.O. sequence as in Example 6.3, having 0, 1, 2, . . . , q − 1, t as its first q + 1 terms. Then which is isometrically Bernoulli by Corollary 8.4. However, t − 1, t − 2, . . . , t − q + 1 are all units in O, thus , and e = log |π| |p|. It is a standard result that ef = n. It is evident that the nature of how O compares to Z p depends on the values of e and f . The two extreme cases are f = 1, e = n (in which case we say that the extension is totally ramified ) and e = 1, f = n (in which case we say that the extension is unramified ). We give an example from each of these two extremes. For more background on the relevant theory, including the "standard" results invoked in this paragraph and in the following two examples see [Ser62], particularly Ch. I §7, 8., Ch. III §5, Ch. IV §4.
Example 8.8. Take p > 2 and let K = Q p (ζ p ) where ζ p is a primitive p th root of unity.
Example 8.9. Take f > 1 and let K = Q p (ζ) where ζ is a primitive (p f − 1) th root of unity.
It is a standard result that $K$ is the unique unramified extension of $\mathbb Q_p$ of degree $f$. So, $p$ may be taken as a uniformizing parameter. Let $\bar\zeta \in \mathcal O/\mathfrak p$ be the image of $\zeta$ under the quotient map. We note that $\bar\zeta$ must generate the residue field extension, i.e., $\mathcal O/\mathfrak p = \mathbb F_p(\bar\zeta) = \mathbb F_p[\bar\zeta]$. So, $S = \{a_0 + a_1\zeta + \dots + a_{f-1}\zeta^{f-1}\}$, with $0 \le a_0, a_1, \dots, a_{f-1} < p$, is a complete set of coset representatives for $\mathcal O/\mathfrak p$. Applying the construction of Example 6.3 we may construct a T.B.R.B.O. sequence whose first $q = p^f$ terms are precisely the elements of $S$, with the following term being $p$. Then, applying Corollary 8.4 shows that the transformation defined by the polynomial
$$\frac{1}{p} \prod_{0 \le a_0, a_1, \dots, a_{f-1} < p} (x - a_0 - a_1\zeta - \cdots - a_{f-1}\zeta^{f-1})$$
is isometrically Bernoulli for $r = |p|$.

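Bernoulli maps on $\hat{\mathbb Z}$
Define, as usual, $\hat{\mathbb Z} = \varprojlim_n \mathbb Z/n\mathbb Z \cong \prod_p \mathbb Z_p$. We briefly note that the results of this section allow us to produce examples of maps $\mathbb N \to \mathbb Z$ which extend to Bernoulli maps on $\mathbb Z_p$ for each $p$, and hence to a Bernoulli map on $\hat{\mathbb Z}$. More explicitly, we obtain the following Proposition:
Proposition 9.1. Let $a_0, a_1, \dots$ be a sequence of integers satisfying the following conditions for each rational prime $p$: (i) $|a_k|_p = 1$ for $k = p$; (ii) $|a_k|_p < p^{-\lfloor \log_p k \rfloor}$ for $k > p$. Then, the map $f : \mathbb N \to \mathbb Z$ given by $f(x) = \sum_{k \ge 0} a_k \binom{x}{k}$ extends to an isometrically Bernoulli transformation $f : \mathbb Z_p \to \mathbb Z_p$ for each prime $p$.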
Proof. For each prime p, note that the quantity p ⌊log p k⌋ |a k | p attains its maximum for k = p (and for no other k) and that |a k | p = 1. Then, the result is immediate by Corollary 8.4.
Remark 10.2. Suppose the hypotheses of the Proposition hold. Then, we may compute the value of $f(x) \pmod p$ by performing a careful but easy computation involving cancelling corresponding powers in $x(x - 1) \cdots (x - n + 1)$ (henceforth, "the numerator") and $n!$. The only terms which are not obviously matched are those corresponding to terms divisible by $p^\ell$. Suppose $up^\ell, \dots, (u + a - 1)p^\ell$ are the terms in the numerator divisible by $p^\ell$, where now we need not assume that $u, \dots, u + a - 1$ are coprime to $p$. Then, The simplest family of maps satisfying the hypotheses of Prop. 10.1 is that in the following Corollary: This corresponds to an $f$-invariant measure $\mu$ on $\mathbb Z_p$ defined on any $\mu$-measurable set $S \subset \mathbb Z_p$ by
$$\mu(S) = \frac{p - 2}{p}\,\mu(S \cap B_r(0)) + \frac{1}{p}\,\mu(S \cap B_r(1)) + \frac{1}{p}\,\mu(S \cap B_r(-1)).$$
Then, the map Φ defined in Proposition 5.5 gives a measurable isomorphism of (Z p , µ, f ) with (X A , µ A,w , T A ), where the latter dynamical system is mixing Markov.