Entropy and dimension of disintegrations of stationary measures

We extend a result of Ledrappier, Hochman, and Solomyak on exact dimensionality of stationary measures for $\text{SL}_2(\mathbb{R})$ to disintegrations of stationary measures for $\text{GL}(\mathbb{R}^d)$ onto the one dimensional foliations of the space of flags obtained by forgetting a single subspace. The dimensions of these conditional measures are expressed in terms of the gap between consecutive Lyapunov exponents, and a certain entropy associated to the group action on the one dimensional foliation they are defined on. It is shown that the entropies thus defined are also related to simplicity of the Lyapunov spectrum for the given measure on $\text{GL}(\mathbb{R}^d)$.


Introduction
It was shown by Ledrappier [Led84], Hochman and Solomyak [HS17], that if ν is a probability on the projective space of $\mathbb{R}^2$ which is stationary with respect to a probability µ on $\text{SL}_2(\mathbb{R})$ with finite Lyapunov exponents, then ν is exact dimensional and its dimension is $\kappa/(2\chi)$, where κ is the Furstenberg entropy and χ is the largest Lyapunov exponent (hence 2χ is the gap between the two Lyapunov exponents).
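The two ingredients of this formula can be explored numerically. The following Python sketch (the function name and sampling scheme are our own, purely illustrative) estimates the top Lyapunov exponent of an i.i.d. product of $2 \times 2$ matrices from the growth rate of norms:

```python
import numpy as np

def top_lyapunov(matrices, n=5000, seed=0):
    """Estimate the largest Lyapunov exponent of an i.i.d. product of
    matrices drawn uniformly from `matrices`, via (1/n) log ||A(n)...A(1)v||."""
    rng = np.random.default_rng(seed)
    v = np.array([1.0, 0.0])
    total = 0.0
    for _ in range(n):
        A = matrices[rng.integers(len(matrices))]
        v = A @ v
        norm = np.linalg.norm(v)
        total += np.log(norm)
        v /= norm  # renormalize to avoid overflow
    return total / n
```

For a single hyperbolic matrix $\mathrm{diag}(2, 1/2)$ this returns $\log 2$, while for a rotation it returns 0, reflecting the fact that the gap 2χ vanishes exactly when no direction is exponentially favored.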
Suppose now that µ is a probability on $\text{SL}_3(\mathbb{R})$ and ν is a µ-stationary probability on the space of flags in $\mathbb{R}^3$ (i.e. pairs (L, P) where $L \subset P$, L is a one dimensional subspace, and P is a two dimensional subspace), which is a three-dimensional manifold.
We consider here the two foliations of the space of flags obtained by partitioning into sets of flags sharing the same one dimensional subspace on the one hand, and flags sharing the same two dimensional subspace on the other. These are foliations by circles, and furthermore the action of any invertible linear self mapping of R 3 preserves both foliations.
In this context we show that the conditional measures obtained by disintegrating ν with respect to these two foliations are exact dimensional. Furthermore we express the dimension of these disintegrations in terms of the gap between consecutive Lyapunov exponents as well as two entropies $\kappa_1, \kappa_2$. Before establishing the dimension formula we show that the entropies $\kappa_i$ bound the gaps between exponents from below and therefore, in principle, yield a criterion for simplicity of the Lyapunov spectrum.
We prove our results in a slightly more general context, that of actions of $\text{GL}(\mathbb{R}^d)$ on the space of complete flags in $\mathbb{R}^d$. In this context there are d − 1 associated one dimensional foliations which correspond to "forgetting" the i-dimensional subspace of all flags for some $i \in \{1, \ldots, d-1\}$.
We denote by $\text{Flags}(\mathbb{R}^d)$ the space of complete flags in $\mathbb{R}^d$; an element $F \in \text{Flags}(\mathbb{R}^d)$ is of the form $F = (S_0, S_1, \ldots, S_d)$ where $S_i$ is an i-dimensional subspace of $\mathbb{R}^d$ for each $i = 0, \ldots, d$ and $S_i \subset S_{i+1}$ for $i = 0, \ldots, d-1$.
Let Flags i (R d ) denote the space of flags missing their i-dimensional subspace. For a given complete flag F = (S 0 , . . . , S d ) we denote by F i its projection to Flags i (R d ) (i.e. the sequence obtained by removing S i from F ).
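Concretely, a complete flag may be encoded by the nested column spans of an invertible matrix; the following Python sketch (the helper names `flag_from_basis` and `forget` are ours, purely illustrative) encodes a flag and the projection to $\text{Flags}_i(\mathbb{R}^d)$:

```python
import numpy as np

def flag_from_basis(B):
    """Complete flag (S_1, ..., S_d): S_i is spanned by the first i columns
    of the invertible matrix B; orthonormal bases are obtained via QR."""
    Q, _ = np.linalg.qr(B)
    return [Q[:, :i] for i in range(1, B.shape[1] + 1)]

def forget(flag, i):
    """Project to Flags_i by removing the i-dimensional subspace."""
    return [S for S in flag if S.shape[1] != i]

B = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])  # invertible
flag = flag_from_basis(B)
```

The nesting $S_i \subset S_{i+1}$ holds by construction, since the first i columns of the QR factor span the same subspace as the first i columns of B.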
We use the notation $X \overset{(d)}{=} Y$ for equality in distribution between random elements X and Y, and $\nu_1 \ll \nu_2$ to mean that the probability $\nu_1$ is absolutely continuous with respect to $\nu_2$.
If X and Y are random elements taking values in complete separable metric spaces, (a version of) the conditional distribution of X given Y is a σ(Y)-measurable random probability $\nu_Y$ on the range of X such that for all continuous bounded real functions f one has $$\int f(x)\, d\nu_Y(x) = \mathbb{E}\left(f(X) \mid Y\right)$$ (here the right-hand side is the conditional expectation of f(X) with respect to the σ-algebra generated by Y). Such a conditional distribution is well defined up to sets of zero measure but we will abuse notation slightly, referring to 'the conditional distribution'.
It is always the case that there exists a Borel mapping y → ν(y) from the range of Y to the space of probabilities on the range of X such that ν(Y ) is a version of the conditional distribution of X given Y . Fixing such a mapping one may speak of ν y for y non-random in the range of Y .
The lower local dimension of a probability measure ν on a metric space at a point x is defined by $$\underline{\dim}(\nu, x) = \liminf_{r \to 0} \frac{\log\left(\nu(B_r(x))\right)}{\log(r)},$$ while the upper local dimension is defined by $$\overline{\dim}(\nu, x) = \limsup_{r \to 0} \frac{\log\left(\nu(B_r(x))\right)}{\log(r)},$$ where $B_r(x)$ is the ball of radius r centered at x.
If the lower and upper dimensions of ν are equal to the same constant ν-almost everywhere then we say that ν is exact dimensional and define its global dimension dim(ν) as the given constant.
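Numerically, the local dimension at x is the slope of $\log \nu(B_r(x))$ against $\log r$ over small radii; the sketch below (our own illustration) estimates it for the empirical measure of uniform samples on $[0, 1]$, whose exact dimension is 1:

```python
import numpy as np

def local_dimension(samples, x, radii):
    """Slope of log(empirical mass of B_r(x)) versus log(r)."""
    log_mass = [np.log(np.mean(np.abs(samples - x) < r)) for r in radii]
    slope, _ = np.polyfit(np.log(radii), log_mass, 1)
    return slope

rng = np.random.default_rng(0)
samples = rng.random(200000)  # uniform on [0, 1], exact dimension 1
dim = local_dimension(samples, 0.5, np.logspace(-3, -1, 10))
```

Here the estimate is close to 1 since $\nu(B_r(1/2)) = 2r$ for the uniform measure, so the log-log slope is exactly 1 in the limit.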

Statement of main results
Suppose that A is a random element of $\text{GL}(\mathbb{R}^d)$ with distribution µ such that $\mathbb{E}\left(|\log(\sigma_i(A))|\right) < +\infty$ for $i = 1, \ldots, d$, and let $F = (S_0, \ldots, S_d)$ be a random element of $\text{Flags}(\mathbb{R}^d)$ with distribution ν which is independent from A and such that $AF \overset{(d)}{=} F$. The existence of such a pair (A, F) is equivalent to the fact that ν is a µ-stationary probability, as first defined in [Fur63].
The Lyapunov exponents $\chi_1, \ldots, \chi_d$ of µ relative to ν are defined by the equations $$\chi_1 + \cdots + \chi_i = \mathbb{E}\left(\log\left(|\det{}_{S_i}(A)|\right)\right), \quad i = 1, \ldots, d,$$ where $|\det_S(A)|$ is the Jacobian of the restriction of A to the subspace S (where the volume measure induced by the standard inner product is used on S and its image). In the degenerate case where $S = \{0\}$ one has $|\det_S(A)| = 1$, and if S is one dimensional one has $|\det_S(A)| = \|A|_S\|$.
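The quantity $|\det_S(A)|$ can be computed as a Gram determinant: if the columns of a matrix S form an orthonormal basis of the subspace, then $|\det_S(A)| = \sqrt{\det\left((AS)^\top AS\right)}$. A short Python sketch:

```python
import numpy as np

def det_on_subspace(A, S):
    """|det_S(A)| for a subspace given by a matrix S with orthonormal columns:
    the volume distortion of A restricted to the subspace."""
    if S.shape[1] == 0:
        return 1.0  # degenerate case S = {0}
    M = A @ S
    return float(np.sqrt(np.linalg.det(M.T @ M)))
```

When S is all of $\mathbb{R}^d$ this recovers $|\det(A)|$, and when S is the line spanned by a unit vector v it recovers $\|Av\|$, in agreement with the conventions above.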
The Lyapunov exponents given by the multiplicative ergodic theorem of [Ose68] for a product of i.i.d. random matrices of distribution µ are obtained by maximizing the sums χ 1 + · · · + χ i over all stationary probabilities ν as shown in [FK83].
Fix i ∈ {1, . . . , d − 1}, let ν i be the projection of ν to Flags i (R d ), and let ν Fi be the conditional distribution of F given F i .
Theorem 1 (Inequality between entropy and gap between exponents). If ν is the unique stationary probability on $\text{Flags}(\mathbb{R}^d)$ which projects to $\nu_i$ then $A\nu_{F_i} \ll \nu_{AF_i}$ almost surely, the entropy $$\kappa_i = \mathbb{E}\left(\log\left(\frac{dA\nu_{F_i}}{d\nu_{AF_i}}(AF)\right)\right)$$ satisfies $0 \le \kappa_i \le \chi_i - \chi_{i+1}$, and $\kappa_i = 0$ if and only if $A\nu_{F_i} = \nu_{AF_i}$ almost surely.
Theorem 2 (Dimension of conditional measures). If ν is ergodic, is the unique stationary probability on $\text{Flags}(\mathbb{R}^d)$ which projects to $\nu_i$, and $\kappa_i > 0$, then almost surely $\nu_{F_i}$ is exact dimensional and $$\dim\left(\nu_{F_i}\right) = \frac{\kappa_i}{\chi_i - \chi_{i+1}}.$$

In the case d = 2 both theorems above are known. A proof of Theorem 1 in this case was first given in [Led84]. In the same work the formula for the dimension in Theorem 2 is shown to hold for a slightly different notion of dimension. The exact dimensionality of stationary measures when d = 2 was first proved in [HS17] and this implies the formula above for the same notion of dimension we use here.
Theorem 1 implies that the Lyapunov spectrum is simple (i.e. all exponents are different) if there does not exist a family of conditional probabilities F i → ν Fi satisfying Aν Fi = ν AFi for µ almost every A. This suggests a connection to criteria for simplicity dating back to [GdM89] and [GR89] though we do not explore this issue further here.

Acknowledgment
I am grateful to François Ledrappier for many helpful discussions.
Part I: Entropy, mutual information, and Lyapunov exponent gaps

Entropy and mutual information
We will define below $I(A, AF \mid AF_i)$, the conditional mutual information between A and AF given $AF_i$. This is a non-negative $\sigma(AF_i)$-measurable random variable which may take the value $+\infty$.
The purpose of this section is to prove that:

Lemma 1 (Entropy and mutual information). If $I(A, AF \mid AF_i) < +\infty$ almost surely then $A\nu_{F_i} \ll \nu_{AF_i}$ almost surely and $\kappa_i = \mathbb{E}\left(I(A, AF \mid AF_i)\right)$.

Conversely, if $A\nu_{F_i} \ll \nu_{AF_i}$ almost surely then $\kappa_i = \mathbb{E}\left(I(A, AF \mid AF_i)\right)$, whether both sides are finite or not.
This result reduces the problem of showing that Aν Fi ≪ ν AFi almost surely and that 0 ≤ κ i < +∞ to that of bounding the conditional mutual information between A and AF given AF i .
A general reference covering mutual information including Dobrushin's theorem and the Gelfand-Yaglom-Perez theorem is [Pin64].

Mutual information
Let X and Y be random elements of two Polish spaces $\mathcal{X}$ and $\mathcal{Y}$, and denote by $\mu_X$, $\mu_Y$, $\mu_{(X,Y)}$ the distributions of X, Y, and (X, Y) respectively.
The mutual information between X and Y is defined by $$I(X, Y) = \sup_{\mathcal{P}} \sum_{P \in \mathcal{P}} \mu_{(X,Y)}(P) \log\left(\frac{\mu_{(X,Y)}(P)}{(\mu_X \times \mu_Y)(P)}\right),$$ where the supremum is over all finite partitions $\mathcal{P}$ of $\mathcal{X} \times \mathcal{Y}$ into Borel sets.
Directly from the definition one sees that I(X, Y ) = I(Y, X).
By Jensen's inequality $0 \le I(X, Y) \le +\infty$, with $I(X, Y) = 0$ if and only if X and Y are independent. If X takes countably many values and has finite entropy H(X) in the sense of [Sha48] one has $I(X, Y) \le H(X)$.
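For finite-valued X and Y the supremum is attained by the partition into points and the definition reduces to a finite sum; a Python sketch:

```python
import numpy as np

def mutual_information(joint):
    """I(X, Y) = sum_{x,y} p(x,y) log(p(x,y) / (p(x) p(y))) for the joint
    distribution of two finite random variables given as a 2-D array."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())
```

For independent variables the result is 0, while for X = Y uniform on two values it equals $H(X) = \log 2$, illustrating both the vanishing criterion and the bound $I(X, Y) \le H(X)$.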
It was shown in [Dob59] that I(X, Y) is the supremum over any sequence of finite partitions which generate the Borel σ-algebra of $\mathcal{X} \times \mathcal{Y}$. This has the following important corollary:

Proposition 1 (Semi-continuity of mutual information). If $(X_n, Y_n) \to (X, Y)$ in distribution then $I(X, Y) \le \liminf\limits_{n \to +\infty} I(X_n, Y_n)$.

It was shown in [GfY59] and [Per59] that $I(X, Y) = +\infty$ unless $\mu_{(X,Y)} \ll \mu_X \times \mu_Y$, in which case $$I(X, Y) = \int \log\left(\frac{d\mu_{(X,Y)}}{d(\mu_X \times \mu_Y)}\right) d\mu_{(X,Y)},$$ whether the right hand side is finite or not. These results are usually called the Gelfand-Yaglom-Perez Theorem.
In our context, when d = 2, this yields the following result:

Proposition 2. One has $I(A, AF) < +\infty$ if and only if $A\nu \ll \nu$ almost surely and $\mathbb{E}\left(\log\left(\frac{dA\nu}{d\nu}(AF)\right)\right) < +\infty$, in which case $I(A, AF) = \mathbb{E}\left(\log\left(\frac{dA\nu}{d\nu}(AF)\right)\right)$.

Proof. The marginal distributions of (A, AF) are µ and ν respectively, while the conditional distribution of AF given A is Aν. Hence, the distribution of (A, AF) is absolutely continuous with respect to µ × ν if and only if Aν ≪ ν almost surely, and in this case the Radon-Nikodym derivative between the two at (A, AF) is given by $\frac{dA\nu}{d\nu}(AF)$. The result now follows from the Gelfand-Yaglom-Perez Theorem.

Conditional mutual information
Let F be a σ-algebra of measurable sets in the probability space on which the random elements X and Y are defined.
The mutual information between X and Y conditioned on F is the unique up to modifications on null sets random variable I(X, Y |F ) obtained as above but using the conditional distribution of (X, Y ) conditioned on F . In the case F = σ(Z 1 , Z 2 , . . . , Z k ) we use the notation I(X, Y |Z 1 , Z 2 , . . . , Z k ) = I(X, Y |F ).
One still has 0 ≤ I(X, Y |F ) = I(Y, X|F ) ≤ +∞ almost surely. Almost sure equality to zero occurs if and only if X and Y are conditionally independent given F .
In general there is no relation between I(X, Y ) and I(X, Y |F ) or even E (I(X, Y |F )).
To see this suppose for example that X, Y are i.i.d. taking the values ±1 with probability 1/2 and Z = XY , then one has I(X, Y ) = 0 while I(X, Y |Z) = log(2) almost surely.
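This example can be checked directly by enumerating the joint distribution; a small Python verification (the helper `mi` is our own):

```python
import numpy as np
from itertools import product

def mi(pairs):
    """Mutual information of a finite joint distribution {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), q in pairs.items():
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q
    return sum(q * np.log(q / (px[x] * py[y])) for (x, y), q in pairs.items() if q > 0)

# distribution of (X, Y, Z) with X, Y fair on {-1, 1} and Z = XY
p = {(x, y, x * y): 0.25 for x, y in product([-1, 1], repeat=2)}

# I(X, Y): marginalize out Z -- the variables are independent
pxy = {}
for (x, y, z), q in p.items():
    pxy[(x, y)] = pxy.get((x, y), 0.0) + q

# I(X, Y | Z = 1): conditionally Y = X, so the information is log(2)
pz1 = {(x, y): q / 0.5 for (x, y, z), q in p.items() if z == 1}
```

By symmetry the conditional information given $Z = -1$ is also $\log 2$, so $I(X, Y \mid Z) = \log 2$ almost surely while $I(X, Y) = 0$.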
On the other hand for any Markov chain $X_1, X_2, X_3$ one has $I(X_1, X_3 \mid X_2) = 0$ almost surely, and one may construct such examples with $I(X_1, X_3) > 0$.

The following semi-continuity property holds (in contrast to Proposition 1 here one needs almost sure convergence; notice also that the σ-algebra is fixed throughout):

Proposition 3 (Semi-continuity of conditional mutual information). If $(X_n, Y_n) \to (X, Y)$ almost surely then $I(X, Y \mid \mathcal{F}) \le \liminf\limits_{n \to +\infty} I(X_n, Y_n \mid \mathcal{F})$ almost surely.

Proof. If $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is continuous and bounded then one has $\mathbb{E}\left(f(X_n, Y_n) \mid \mathcal{F}\right) \to \mathbb{E}\left(f(X, Y) \mid \mathcal{F}\right)$ almost surely. By considering functions f as above in a countable set which is dense in the space of bounded 1-Lipschitz functions, this implies that almost surely the conditional distribution of $(X_n, Y_n)$ given $\mathcal{F}$ converges weakly to the conditional distribution of (X, Y) given $\mathcal{F}$.
The result now follows from Proposition 1.

Proof of Lemma 1
We will calculate the marginal distributions and the joint distribution of (A, AF ) conditioned on AF i and apply the Gelfand-Yaglom-Perez Theorem as in Proposition 2.
To begin we simply let µ AFi be the conditional distribution of A given AF i .
By stationarity of ν the conditional distribution of AF given AF i is ν AFi .
For the joint distribution notice that the distribution of AF conditioned on σ(A, AF i ) is the same as conditioned on σ(A, F i ) and therefore it is Aν Fi .
Hence the joint conditional distribution of (A, AF) given $AF_i$ satisfies (and is determined by) the equation $$\mathbb{E}\left(f(A, AF) \mid AF_i\right) = \int\!\!\int f(a, x)\, d\!\left(a\nu_{a^{-1}AF_i}\right)\!(x)\, d\mu_{AF_i}(a)$$ for all continuous bounded f.
By the Gelfand-Yaglom-Perez Theorem, if $I(A, AF \mid AF_i) < +\infty$ almost surely then $A\nu_{F_i} \ll \nu_{AF_i}$ almost surely and $$I(A, AF \mid AF_i) = \mathbb{E}\left(\log\left(\frac{dA\nu_{F_i}}{d\nu_{AF_i}}(AF)\right) \,\middle|\, AF_i\right).$$ And conversely, if $A\nu_{F_i} \ll \nu_{AF_i}$ almost surely, the same equality holds whether both sides are finite or not. The result now follows by taking expectation.

Proof of Theorem 1
In this section we will prove Theorem 1.
The strategy is to approximate (A, F ) by pairs with the property that the conditional distributions ν Fi are absolutely continuous with respect to the natural geometric measure on their domain of definition.
For the approximating pairs there is a direct relation between the distortion of the conditional measures by a linear mapping A and its determinants on certain subspaces. This argument establishes equality between the entropy κ i and the Lyapunov exponent gap χ i − χ i+1 for the approximating pairs.
The result is then obtained by passing to the limit using the properties of conditional mutual information discussed in the previous section. At this step equality is lost, and one obtains only an inequality between the entropy and the Lyapunov exponent gap.
An important technical issue is that one must maintain the same conditioning σ-algebra for the approximating pairs and the limit pair (A, F ) in order to apply Proposition 3.
The idea of approximating a probability µ by one whose stationary probability is absolutely continuous with respect to the natural geometric measure is already present in [Fur63, Theorem 8.6].

Jacobians of linear actions on flags
We will now briefly, for the duration of this subsection, abandon the context where A and F are random satisfying $AF \overset{(d)}{=} F$ in order to discuss a result for a deterministic transformation A and flag F.
Denote by $\pi_i$ the mapping $F \mapsto F_i$ which removes from each flag in $\text{Flags}(\mathbb{R}^d)$ its i-dimensional subspace, and notice that the fibers $\text{Flags}_{F_i}(\mathbb{R}^d) = \pi_i^{-1}(F_i)$ are one dimensional. We consider on each $\text{Flags}_{F_i}(\mathbb{R}^d)$ the unique probability measure $\eta_{F_i}$ which is invariant under the action of orthogonal transformations which fix $F_i$.
Notice that any element $A \in \text{GL}(\mathbb{R}^d)$ leaves the family of measures $\eta_{F_i}$ quasi-invariant. We will need the explicit Jacobian of the action of A on this family of measures.

Lemma 2 (Jacobian on the fibers). For every $A \in \text{GL}(\mathbb{R}^d)$ and every flag $F = (S_0, \ldots, S_d)$ one has $$\frac{dA\eta_{F_i}}{d\eta_{AF_i}}(AF) = \frac{|\det{}_{S_i}(A)|^2}{|\det{}_{S_{i-1}}(A)|\,|\det{}_{S_{i+1}}(A)|}.$$

Proof. We begin by proving the case d = 2 (this case is included in the statement of [Fur63, Lemma 8.8] though the proof is omitted there).
In this case F = (S 0 , S 1 , S 2 ) and the only non-trivial subspace is S 1 which has dimension 1 in R 2 . Therefore, we are looking to calculate the Jacobian of the action of A on the projective space of lines in R 2 at the line S 1 with respect to the unique rotationally invariant probability η.
For this purpose consider a unit length vector v ∈ S 1 and an orthogonal vector w of length δ. Let R be the rectangle {sv + tw : s, t ∈ [0, 1]}.
Since we are considering the action of A on projective space, it is equivalent to consider the transformation $B = A/|\det_{S_1}(A)| = A/\|Av\|$, so that Bv has length one.

Notice that BR is a parallelogram with a side in $AS_1$ of length 1, and area ε equal to the length of the orthogonal projection of BR onto the subspace orthogonal to $AS_1$. Calculating the determinant of B one obtains explicitly $$\varepsilon = \delta\,|\det(B)| = \delta\,\frac{|\det(A)|}{|\det{}_{S_1}(A)|^2}.$$ Taking the limit as $\delta \to 0$ we obtain that the derivative of the action of A on projective space at the point $S_1$ is $\frac{|\det(A)|}{|\det_{S_1}(A)|^2}$, from which it follows that $$\frac{dA\eta}{d\eta}(AS_1) = \frac{|\det{}_{S_1}(A)|^2}{|\det(A)|}.$$

We will now show that the general case may be reduced to the two dimensional case.
Notice that the quotient space $S_{i+1}/S_{i-1}$ is two dimensional and inherits an inner product from $\mathbb{R}^d$ which makes it isometric to the orthogonal complement of $S_{i-1}$ inside $S_{i+1}$. Letting $\overline{A} : S_{i+1}/S_{i-1} \to AS_{i+1}/AS_{i-1}$ be the induced map, one obtains $$\frac{dA\eta_{F_i}}{d\eta_{AF_i}}(AF) = \frac{d\overline{A}\eta}{d\eta}\left(\overline{A}\,\overline{S_i}\right),$$ where on the right hand side the space $\overline{S_i} = S_i/S_{i-1}$ is considered as a one-dimensional subspace of the two dimensional space $S_{i+1}/S_{i-1}$. The result then follows from the case d = 2 together with the identities $|\det(\overline{A})| = |\det_{S_{i+1}}(A)|/|\det_{S_{i-1}}(A)|$ and $|\det_{\overline{S_i}}(\overline{A})| = |\det_{S_i}(A)|/|\det_{S_{i-1}}(A)|$.
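The two dimensional formula can be checked numerically by comparing a finite-difference derivative of the induced circle map with $|\det(A)|/|\det_{S_1}(A)|^2$; a sketch (matrix and base point chosen arbitrarily for illustration):

```python
import numpy as np

def proj_action(A, theta):
    """Action of A on the projective line in angle coordinates (mod pi)."""
    v = A @ np.array([np.cos(theta), np.sin(theta)])
    return np.arctan2(v[1], v[0]) % np.pi

A = np.array([[2.0, 1.0], [0.5, 1.5]])
theta = 0.3
h = 1e-6
numerical = (proj_action(A, theta + h) - proj_action(A, theta)) / h
v = np.array([np.cos(theta), np.sin(theta)])  # unit vector spanning S_1
formula = abs(np.linalg.det(A)) / np.linalg.norm(A @ v) ** 2
```

The two quantities agree up to the finite-difference error, confirming the derivative computation of the proof.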

Proof of Theorem 1
We return now to the notation and context of the statement of Theorem 1. In particular A and $F = (S_0, \ldots, S_d)$ are independent random elements with distributions µ and ν respectively and such that $AF \overset{(d)}{=} F$. Recall that $\nu_i$ is the projection of ν onto $\text{Flags}_i(\mathbb{R}^d)$ and $\nu_{F_i}$ is the conditional distribution of F given $F_i$.

Representation
To begin we will give a technical argument which informally justifies that the pair (A, F) may be thought of as having been constructed in the following three steps:

1. A random incomplete flag with the i-dimensional space missing, $AF_i$, is chosen with distribution $\nu_i$ (in spite of the notation, at this step A is still undetermined).

2. A linear mapping A is chosen with the correct conditional distribution given $AF_i$. At this step $F_i$ is determined by the equation $F_i = A^{-1}AF_i$.

3. A random i-dimensional subspace is added to $F_i$ to obtain a complete flag F with the correct distribution.
The advantage of having the pair (A, F ) constructed in this way is that one may construct nearby pairs by perturbing the conditional measures under consideration slightly. We will now justify this picture formally.
For this purpose fix a Borel mapping (u, m) → ρ(u, m) where u ∈ [0, 1], m is a Borel probability on GL(R d ), and ρ(u, m) ∈ GL(R d ), such that if U is a uniformly distributed random variable on [0, 1] then ρ(U, m) has distribution m.
Assume furthermore for any convergent sequence of probabilities m n → m one has ρ(U, m n ) → ρ(U, m) almost surely. Such a representation ρ exists by the main result of [BD83].
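On the real line, a representation with the required properties is provided by the quantile function; the following Python sketch of this one dimensional analogue (the paper needs the corresponding statement for probabilities on $\text{GL}(\mathbb{R}^d)$ and $\text{Flags}(\mathbb{R}^d)$, which is what [BD83] provides) may clarify the construction:

```python
import numpy as np

def rho(u, atoms, weights):
    """Quantile representation of a discrete probability m on the reals
    (atoms sorted increasingly): returns the u-quantile, so that rho(U, m)
    has distribution m when U is uniform on [0, 1)."""
    cum = np.cumsum(weights)
    return atoms[np.searchsorted(cum, u, side="left")]
```

Feeding a uniform variable through the quantile function of m reproduces the distribution m, and weak convergence of the measures yields convergence of the quantiles at almost every u.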
In the same way fix a representation (u, m) → ρ Flags (u, m) where this time m is a Borel probability on Flags(R d ).
Suppose that u and v are uniform random variables in [0, 1] such that AF i , u, v are independent.
Let µ AFi be the conditional distribution of A given AF i , and recall that ν Fi is the conditional distribution of F given F i .
Setting B = ρ(u, µ AFi ) and G = ρ Flags (v, ν Fi ) we claim that (B, G) has the same distribution as (A, F ).
To establish the claim first notice that $G_i = F_i$ and $BF_i = AF_i$ almost surely. Furthermore, by construction $(F_i, AF_i, A)$ has the same distribution as $(F_i, AF_i, B)$. Hence, it suffices to establish that the conditional distribution of F given $(F_i, AF_i, A)$ is $\nu_{F_i}$.
For this purpose notice that conditioning on (F i , AF i , A) is equivalent to conditioning simply on (F i , A). Since F is independent from A its conditional distribution relative to (F i , A) coincides almost surely with the conditional distribution relative to only F i which is ν Fi . This completes the claim.
In view of the above, to simplify notation we assume from now on that $A = \rho(u, \mu_{AF_i})$ and $F = \rho_{\text{Flags}}(v, \nu_{F_i})$.

Perturbation
Let {R t , t ≥ 0} be defined so that conditioned on AF i it is a Brownian motion starting at the identity on the group of orthogonal transformations which fix AF i . To clarify dependence on the other random elements we assume {R t , t ≥ 0} is σ(AF i , w)-measurable where w is uniform on [0, 1] and independent from all previously considered random elements. Now for each t ≥ 0 let A t = R t A and notice that A t F i = AF i almost surely and A t → A when t → 0 almost surely.
Lemma 3. For each t > 0 there exists a measurable mapping $G_i \mapsto \nu_{t,G_i}$ from $\text{Flags}_i(\mathbb{R}^d)$ to the space of probabilities on $\text{Flags}(\mathbb{R}^d)$ such that:

1. Almost surely $\nu_{t,F_i}$ is supported on $\text{Flags}_{F_i}(\mathbb{R}^d)$ and is continuous with respect to $\eta_{F_i}$.

2. There is a compact subinterval $I_t \subset (0, +\infty)$ such that $\frac{d\nu_{t,F_i}}{d\eta_{F_i}}$ takes values in $I_t$ almost surely.

3. Letting $F_t = \rho_{\text{Flags}}(v, \nu_{t,F_i})$, one has $A_tF_t \overset{(d)}{=} F_t$.

Proof. Notice that whatever the choice of mapping $G_i \mapsto \nu_{t,G_i}$, the conditional distribution of $A_tF_t$ given $AF_i$ is absolutely continuous with respect to $\eta_{AF_i}$. Furthermore, if $c_t$ and $C_t$ are the minimum and maximum values of the density at time t of a Brownian motion on the group of rotations of $\mathbb{R}^2$, then the conditional distribution of $A_tF_t$ given $AF_i$ has density between $c_t$ and $C_t$ relative to $\eta_{AF_i}$ almost surely.
Let ν i be the projection of ν onto the space of incomplete flags (missing their i-dimensional subspace) Flags i (R d ).
For each t, the family of probabilities on $\text{Flags}(\mathbb{R}^d)$ which project to $\nu_i$ and whose disintegration over $\text{Flags}_i(\mathbb{R}^d)$ satisfies the above density bounds is weakly compact. Therefore the procedure above has a fixed point by the Markov-Kakutani fixed point theorem.

Conclusion of the proof
Recall that µ is the distribution of A, ν is the distribution of F , and ν i is the projection of ν to Flags i (R d ).
We consider for each t a mapping G i → ν t,Gi and F t given by Lemma 3.
Notice that the distribution $\nu_t$ of $F_t$ projects to $\nu_i$ on $\text{Flags}_i(\mathbb{R}^d)$ for all t. Since $F_t$ and $A_tF_t$ have the same distribution for all t and $A_t \to A$ when $t \to 0$, one obtains that any limit point of $\nu_t$ as $t \to 0$ is a µ-stationary measure. Hence $\nu_t \to \nu$ when $t \to 0$ by the assumed uniqueness.
It follows that for some subsequence t k → 0 the conditionals ν t k ,Fi converge to ν Fi almost surely when k → +∞. Therefore by continuity of the Blackwell-Dubins representation ρ Flags one obtains F t k → F almost surely when k → +∞.
In particular, letting S t,i be the i-dimensional subspace of F t so that F = (S 0 , . . . , S d ) and F t = (S 0 , . . . , S t,i , . . . , S d ), one has S t k ,i → S i when k → +∞.
Setting $\varphi_{t,F_i} = \frac{d\nu_{t,F_i}}{d\eta_{F_i}}$ and using Lemma 2 we obtain $$\mathbb{E}\left(I(A_t, A_tF_t \mid A_tF_i)\right) = \mathbb{E}\left(\log\left(\varphi_{t,F_i}(F_t)\right)\right) - \mathbb{E}\left(\log\left(\varphi_{t,A_tF_i}(A_tF_t)\right)\right) + \mathbb{E}\left(\log\left(\frac{|\det{}_{S_{t,i}}(A)|^2}{|\det{}_{S_{i-1}}(A)|\,|\det{}_{S_{i+1}}(A)|}\right)\right) = \mathbb{E}\left(\log\left(\frac{|\det{}_{S_{t,i}}(A)|^2}{|\det{}_{S_{i-1}}(A)|\,|\det{}_{S_{i+1}}(A)|}\right)\right),$$ where for the last equality one uses that $F_t$ and $A_tF_t$ have the same distribution, as well as the fact that $A_t = R_tA$ where $R_t$ is an orthogonal transformation so that the determinants of $A_t$ and A coincide on all subspaces.
Notice that $$\sigma_{d-i+1}(A)^2 \cdots \sigma_d(A)^2 \le |\det{}_{S_{t,i}}(A)|^2 \le \sigma_1(A)^2 \cdots \sigma_i(A)^2,$$ so taking logarithms and using dominated convergence one has $$\lim_{k \to +\infty} \mathbb{E}\left(\log\left(|\det{}_{S_{t_k,i}}(A)|\right)\right) = \mathbb{E}\left(\log\left(|\det{}_{S_i}(A)|\right)\right).$$ Combining this with Fatou's lemma and the semi-continuity of mutual information yields $$\mathbb{E}\left(I(A, AF \mid AF_i)\right) \le \liminf_{k \to +\infty} \mathbb{E}\left(I(A_{t_k}, A_{t_k}F_{t_k} \mid A_{t_k}F_i)\right) = \chi_i - \chi_{i+1} < +\infty.$$ In view of this the desired result follows from Lemma 1.
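The singular value bounds on $|\det_S(A)|$ used above hold for every i-dimensional subspace S and are easy to check numerically; a sketch with a random matrix and a random subspace (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d, i = 5, 2
A = rng.normal(size=(d, d))
sigma = np.linalg.svd(A, compute_uv=False)    # singular values, decreasing
S, _ = np.linalg.qr(rng.normal(size=(d, i)))  # random i-dim subspace, orthonormal basis
M = A @ S
det_S = np.sqrt(np.linalg.det(M.T @ M))       # |det_S(A)| as a Gram determinant
lower = np.prod(sigma[-i:])                   # sigma_{d-i+1} ... sigma_d
upper = np.prod(sigma[:i])                    # sigma_1 ... sigma_i
```

The bounds hold because the restriction of A to any i-dimensional subspace distorts volume by a product of i numbers, each lying between the smallest and largest singular values in the appropriate minimax sense.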

Exact dimensionality and dimension of conditional probabilities
In this part of the article we will prove Theorem 2. We now specify notation and context that will be used throughout.
Recall that µ is a probability on GL(R d ) with respect to which the logarithm of all singular values are integrable and ν is a µ-stationary probability on Flags(R d ).
A dimension i ∈ {1, . . . , d − 1} is fixed throughout, ν i is the projection of ν on the space Flags i (R d ) of incomplete flags missing their i-dimensional subspace. It is assumed that ν is the unique stationary probability with projection ν i .
We consider an i.i.d. sequence $(A(n))_{n \in \mathbb{Z}}$ with common distribution µ and a stationary sequence of random flags $(F(n))_{n \in \mathbb{Z}}$ with common distribution ν such that $A(n+k-1) \cdots A(n)F(n) = F(n+k)$ for all $n \in \mathbb{Z}$ and $k \ge 0$. We will use $S_j(n)$ for the j-dimensional subspace of the flag F(n), and $F_i(n)$ as before for the incomplete flag obtained by removing the subspace $S_i(n)$.
By hypothesis ν is ergodic (i.e. extremal among stationary probabilities); this implies that the stationary sequence $((F(n), A(n)))_{n \in \mathbb{Z}}$ is ergodic.
As before, Lyapunov exponents $\chi_1, \ldots, \chi_d$ are defined by the equations $$\chi_1 + \cdots + \chi_j = \mathbb{E}\left(\log\left(|\det{}_{S_j(n)}(A(n))|\right)\right), \quad j = 1, \ldots, d.$$ By Theorem 1 one has $A(n)\nu_{F_i(n)} \ll \nu_{F_i(n+1)}$ almost surely and $$\kappa_i = \mathbb{E}\left(\log\left(\frac{dA(n)\nu_{F_i(n)}}{d\nu_{F_i(n+1)}}(F(n+1))\right)\right) \le \chi_i - \chi_{i+1}.$$ We assume from now on that $\kappa_i > 0$.

Non-atomicity of conditional measures
Our first step in the proof of Theorem 2 is to show that $\nu_{F_i(n)}$ is almost surely non-atomic (i.e. all points have measure zero).
Lemma 4. Almost surely ν Fi(n) is non-atomic for all n.
Proof. By ergodicity the maximal mass of an atom of $\nu_{F_i(n)}$ is an almost sure constant, and a positive value of this constant is incompatible with the hypothesis that $\kappa_i > 0$; hence $\nu_{F_i(n)}$ is almost surely non-atomic.

The multiplicative ergodic theorem
From Theorem 1 and the hypothesis that $\kappa_i > 0$ one obtains that $\chi_i > \chi_{i+1}$. We will now apply the multiplicative ergodic theorem of [Ose68] to the mappings induced by the sequence A(n) between the quotient spaces $S_{i+1}(n)/S_{i-1}(n)$ to obtain the following result:

Lemma 5. Almost surely for each n one has $$\lim_{k \to +\infty}\frac{1}{k}\log\left(|\det{}_{S_i(n)}\left(A(n+k-1)\cdots A(n)\right)|\right) = \chi_1 + \cdots + \chi_{i-1} + \chi_i,$$ and there exists a unique i-dimensional subspace $S_i'(n)$ containing $S_{i-1}(n)$ and contained in $S_{i+1}(n)$ such that $$\lim_{k \to +\infty}\frac{1}{k}\log\left(|\det{}_{S_i'(n)}\left(A(n+k-1)\cdots A(n)\right)|\right) = \chi_1 + \cdots + \chi_{i-1} + \chi_{i+1}.$$ Furthermore, $S_i(n)$ and $S_i'(n)$ are conditionally independent given $F_i(n)$, and $S_i(n) \neq S_i'(n)$ almost surely.
Finally, the logarithm of the angle between the projections of S i (n) and S ′ i (n) to S i+1 (n)/S i−1 (n) is o(|n|) when n → ±∞.
Proof. For each n consider the quotient space $V(n) = S_{i+1}(n)/S_{i-1}(n)$ with the induced inner product coming from $\mathbb{R}^d$, let $E^u(n)$ be the one-dimensional subspace of V(n) which is the projection of $S_i(n)$, and let $T(n) : V(n) \to V(n+1)$ be the mapping induced by A(n).
Notice that almost surely each V(n) is isometric to $\mathbb{R}^2$ with the usual inner product. Furthermore the random sequence $((V(n), T(n)))_{n \in \mathbb{Z}}$ is stationary and ergodic.
One has $\mathbb{E}\left(\log\left(|\det_{E^u(n)}(T(n))|\right)\right) = \chi_i$, which implies by Birkhoff's theorem that almost surely $$\lim_{k \to +\infty}\frac{1}{k}\log\left(|\det{}_{E^u(n)}\left(T(n+k-1)\cdots T(n)\right)|\right) = \chi_i.$$ On the other hand $\mathbb{E}\left(\log\left(|\det(T(n))|\right)\right) = \chi_i + \chi_{i+1}$.

By hypothesis $\kappa_i > 0$, which implies by Theorem 1 that $\chi_i > \chi_{i+1}$. Hence, one obtains from the multiplicative ergodic theorem of [Ose68] that almost surely there exists a unique one-dimensional subspace $E^s(n) \subset V(n)$ with $T(n)E^s(n) = E^s(n+1)$ and such that almost surely $$\lim_{k \to +\infty}\frac{1}{k}\log\left(|\det{}_{E^s(n)}\left(T(n+k-1)\cdots T(n)\right)|\right) = \chi_{i+1}.$$ Setting $S_i'(n)$ to be the subspace of $S_{i+1}(n)$ containing $S_{i-1}(n)$ which projects to $E^s(n)$ in $S_{i+1}(n)/S_{i-1}(n)$, one obtains the desired result.
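The dichotomy of growth rates in the lemma can already be seen for a single hyperbolic matrix acting on $\mathbb{R}^2$: every line other than the contracted eigendirection grows at the top exponent. A Python sketch (deterministic toy case, not the random setting of the lemma):

```python
import numpy as np

A = np.diag([3.0, 0.5])  # exponents chi_i = log(3) > chi_{i+1} = log(1/2)

def line_exponent(v, k=60):
    """(1/k) log ||A^k v||: the exponential growth rate of the line R v."""
    w = np.array(v, dtype=float)
    total = 0.0
    for _ in range(k):
        w = A @ w
        n = np.linalg.norm(w)
        total += np.log(n)
        w /= n
    return total / k

generic = line_exponent([1.0, 1.0])  # any line except the second axis
stable = line_exponent([0.0, 1.0])   # the unique contracted line, playing the role of E^s
```

The generic line picks up the rate $\log 3$ while the distinguished line realizes $\log(1/2)$, mirroring the roles of $E^u(n)$ and $E^s(n)$ above.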
Proof of Theorem 2

Random circle diffeomorphisms
We fix from now on a Borel measurable projection from $\text{Flags}_i(\mathbb{R}^d)$ to $\mathbb{R}^2$ which consists of mapping $S_{i+1}/S_{i-1}$ to $\mathbb{R}^2$ isometrically (where $S_j$ denotes the j-dimensional subspace of the flag). Furthermore we fix an isometry between the unit circle $S^1$, with the usual arc-length distance scaled by one half, denoted dist, and the space of one-dimensional subspaces of $\mathbb{R}^2$ with the distance given by the angle. The composition of these mappings will be used to identify each fiber of the projection from $\text{Flags}(\mathbb{R}^d)$ to $\text{Flags}_i(\mathbb{R}^d)$ with the unit circle. Equivalently, given an incomplete flag $F_i = (S_0, \ldots, S_d)$ we have chosen an isometry from the projective space of $S_{i+1}/S_{i-1}$ to the unit circle, and therefore each i-dimensional subspace between $S_{i-1}$ and $S_{i+1}$ corresponds to a point on the unit circle.
With these identifications let $\mathcal{F}_n = \sigma(F_i(n))$, let $\nu_n$ be the projection of $\nu_{F_i(n)}$ to $S^1$, $x_n$ the projection of $S_i(n)$, $y_n$ the projection of $S_i'(n)$ (given by Lemma 5), and $T_n$ the diffeomorphism of $S^1$ obtained by projecting the action of A(n) between $S_{i+1}(n)/S_{i-1}(n)$ and $S_{i+1}(n+1)/S_{i-1}(n+1)$. For convenience let $\kappa = \kappa_i$ and $\chi = \chi_i - \chi_{i+1}$. Finally, we let η be the rotationally invariant probability on the unit circle.
The proof of Theorem 2 will proceed as follows: We will construct a sequence of random intervals $I_n$ containing $x_n$ and such that $T_{-1} \circ \cdots \circ T_{-n}(I_{-n})$ is roughly of size $e^{-\chi n}$. We will then show that $\nu_0(T_{-1} \circ \cdots \circ T_{-n}(I_{-n}))$ is roughly $e^{-\kappa n}$. These two facts will yield that the local dimension of $\nu_0$ at $x_0$ is almost surely κ/χ, so that in particular $\nu_0$ is exact dimensional.
A few technical issues arise which we have concealed with the word 'roughly' in the previous paragraph. For example, the estimates for the measure of the intervals will hold only for some values of n, but these values are sufficiently dense to imply the needed dimension estimates.

Stationary intervals
We now construct the sequence of intervals that will be used in our argument.
The key points for what follows are that: the construction is stationary, the intervals contain $x_n$ but not $y_n$, their size is controlled by $\text{dist}(x_n, y_n)$, and frequently $\nu_n(I_n)$ is not close to zero. Specifically, we set $I_n = S^1 \setminus B_{\text{dist}(x_n, y_n)}(y_n)$ and prove:

Lemma 6 (Stationary intervals). For each n one has $x_n \in I_n$, $y_n \notin I_n$, and $\mathbb{P}\left(\nu_n(I_n) \ge 1/2\right) \ge 1/2$.

Proof. Since almost surely $\nu_n$ is non-atomic, there is a smallest positive radius $r_n$ such that $\nu_n(B_{r_n}(y_n)) = \nu_n(S^1 \setminus B_{r_n}(y_n)) = 1/2$.
Conditioned on F n one has that x n has distribution ν n and is independent from r n and y n . Therefore P(x n ∈ B rn (y n )|F n ) = ν n (B rn (y n )) = 1/2 and taking expected value P(x n ∈ B rn (y n )) = 1/2.
In the event that x n ∈ B rn (y n ) one has that S 1 \ B rn (y n ) ⊂ I n and therefore that ν n (I n ) ≥ 1/2. This proves the claim.
What remains is to estimate the size and ν 0 probability of the sequence T −1 • · · · • T −n (I −n ).

Length of distinguished intervals
The point of what follows is that the intervals $T_{-1} \circ \cdots \circ T_{-n}(I_{-n})$ contain $x_0$ and are roughly of size $e^{-\chi n}$.

Lemma 7 (Length of distinguished intervals). Almost surely $x_0 \in T_{-1} \circ \cdots \circ T_{-n}(I_{-n})$ for all n, and $$\lim_{n \to +\infty} \frac{1}{n}\log\left(\eta\left(T_{-1} \circ \cdots \circ T_{-n}(I_{-n})\right)\right) = -\chi.$$

Proof. By Lemma 2 one has $$\frac{dT_{k-1}\eta}{d\eta}(x_k) = \frac{|\det{}_{S_i(k-1)}(A(k-1))|^2}{|\det{}_{S_{i-1}(k-1)}(A(k-1))|\,|\det{}_{S_{i+1}(k-1)}(A(k-1))|}$$ for all k.
For each n let $J_n$ be the connected component of $I_n \setminus \{x_n\}$ which is counterclockwise from $x_n$.

Since $y_{k-n} \notin J_{k-n}$, one has $y_k \notin T_{k-1} \circ \cdots \circ T_{k-n}(J_{k-n})$ almost surely for all k.

Notice that for each $x \in S^1$ one has, again by Lemma 2, that $$T_{k-1}'(x) = \frac{|\det{}_{S_{i-1}(k-1)}(A(k-1))|\,|\det{}_{S_{i+1}(k-1)}(A(k-1))|}{|\det{}_S(A(k-1))|^2}$$ for some i-dimensional subspace S between $S_{i-1}(k-1)$ and $S_{i+1}(k-1)$.

In particular this implies that $$\sup_{x \in S^1}\left|\log\left(T_{k-1}'(x)\right)\right| \le 4d\sum_{j=1}^d \left|\log\left(\sigma_j(A(k-1))\right)\right|.$$ This yields that $\sup\limits_{x \in S^1}\left|\log\left(T_{k-1}'(x)\right)\right|$ has finite expectation for all k.

Probability of distinguished intervals
We will now essentially repeat the argument of the previous subsection replacing the rotationally invariant probability measure (which is equivalent to length up to a factor) with the random probabilities ν n .
In this case one wishes to replace (in the ergodic averages) the terms of the form $\log\left(\frac{dT_{k-1}\nu_{k-1}}{d\nu_k}(x_k)\right)$ with approximating terms calculated using the intervals $I_n$. Almost sure convergence of the approximating terms boils down to the theorem on differentiation of measures. However, the integrability of the supremum of the approximating terms is more subtle.
The issue is that the singular values of A(k − 1) do not directly control the maximum and minimum of $\frac{dT_{k-1}\nu_{k-1}}{d\nu_k}$ on the circle. In fact, this density may be unbounded with positive probability. Instead, control of the approximation comes from the $x\log(x)$-integrability of the density with respect to $\nu_k$, which follows from the fact that $\kappa < +\infty$ (that is, Theorem 1).

Orlicz regularity and a maximal inequality
For each k let $f_k(x) = \frac{dT_{k-1}\nu_{k-1}}{d\nu_k}(x)$ and notice that it is $\sigma(\mathcal{F}_{k-1}, \mathcal{F}_k, T_{k-1})$-measurable.
The conditional distribution of $x_k$ given $\sigma(\mathcal{F}_{k-1}, \mathcal{F}_k, T_{k-1})$ has density $f_k$ with respect to $\nu_k$. Therefore, one obtains $$\mathbb{E}\left(\int f_k \log(f_k)\, d\nu_k\right) = \mathbb{E}\left(\log\left(f_k(x_k)\right)\right) = \kappa < +\infty.$$
In particular f k log(f k ) is almost surely integrable with respect to ν k . In other words, f k almost surely belongs to an Orlicz space which is slightly smaller than L 1 (ν k ) and the expected value of the corresponding Orlicz norm is finite. This fact, which follows from the finiteness of κ given by Theorem 1, will allow us to control the maximal function of f k .
We define the maximal function of a function $f : S^1 \to \mathbb{R}$ with respect to a probability λ as $$M_\lambda f(x) = \sup_{I \ni x} \frac{1}{\lambda(I)}\int_I |f|\, d\lambda,$$ where the supremum is over all intervals I containing x with $\lambda(I) > 0$.
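A discrete analogue on a circle of n points may clarify the definition; this brute-force Python sketch (our own illustration) takes the supremum of weighted interval averages over all arcs containing the given point:

```python
import numpy as np

def maximal_function(f, weights, x):
    """Discrete M_lambda f at index x on a circle of n points: supremum of
    interval averages of |f| (weighted by `weights`) over arcs containing x."""
    n = len(f)
    best = 0.0
    for start in range(n):
        for length in range(1, n + 1):
            idx = [(start + j) % n for j in range(length)]
            if x in idx:
                w = sum(weights[j] for j in idx)
                if w > 0:
                    avg = sum(abs(f[j]) * weights[j] for j in idx) / w
                    best = max(best, avg)
    return best
```

With uniform weights and f the indicator of a single point, the maximal function equals 1 at that point and decays like the reciprocal of the length of the smallest arc reaching it.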
We will need the following maximal inequality the proof of which is adapted from the proof of [Ste70, Theorem 1].
Lemma 8 (Maximal inequality). There exists a constant C > 0 such that for any probability λ on $S^1$ and any λ-integrable function f one has $$\lambda\left(\{M_\lambda f > t\}\right) \le \frac{C}{t}\int_{\{|f| > t/2\}} |f|\, d\lambda \quad \text{for all } t > 0.$$

Proof. Given λ, f, and t, consider a compact set $K \subset \{M_\lambda f > t\}$ whose λ-measure is arbitrarily close to that of $\{M_\lambda f > t\}$. By definition, each point in K belongs to an interval I such that $t\lambda(I) < \int_I |f|\, d\lambda$.
Since K is compact one may cover it with finitely many such intervals.
Applying the Besicovitch covering lemma (e.g. see [dG75, Theorem 1.1]) there exists a constant c (which does not depend on λ nor f ) such that a subcover may be found so that no more than c intervals intersect simultaneously.
Summing over such a subcover $I_1, \ldots, I_N$ one has $$t\lambda(K) \le \sum_{j=1}^N t\lambda(I_j) \le \sum_{j=1}^N \int_{I_j} |f|\, d\lambda \le c\int |f|\, d\lambda.$$ This inequality has been established for all λ-integrable f and all t > 0. Applying it to $g = f1_{\{|f| > t/2\}}$ one obtains (observing that $\{M_\lambda f > t\} \subset \{M_\lambda g > t/2\}$) $$\lambda\left(\{M_\lambda f > t\}\right) \le \lambda\left(\{M_\lambda g > t/2\}\right) \le \frac{2c}{t}\int_{\{|f| > t/2\}} |f|\, d\lambda,$$ which establishes the claim.
We now use Lemma 8 to control the typical maximal function of f k . The argument is adapted from [Nev75, Proposition IV-2-10], see the appendix of said work for discussion of this type of results in the context of general Orlicz spaces.
Lemma 9 (Average maximal function). In the context above one has $\mathbb{E}\left(\log\left(M_{\nu_k}f_k(x_k)\right)\right) < +\infty$.

Proof. Conditioning on $\nu_{k-1}$, $\nu_k$ and $T_{k-1}$ we obtain $$\mathbb{E}\left(\log\left(M_{\nu_k}f_k(x_k)\right)\right) = \mathbb{E}\left(\int f_k \log\left(M_{\nu_k}f_k\right) d\nu_k\right).$$ The lower bound $f_k\log(f_k) \le f_k\log\left(M_{\nu_k}f_k\right)$, which holds $\nu_k$-almost everywhere, reduces the problem to showing that the expected value on the right is not $+\infty$.

Domination of approximating terms
We will now establish the main estimate needed to apply Maker's theorem as in Lemma 7. For the needed upper bound Lemma 9 suffices. For the lower bound we mimic the argument of [Chu61].

Probability estimates
Having solved the main technical issues we now repeat the argument of Lemma 7, replacing the uniform measure η with the random measure $\nu_0$, to obtain the desired estimate on the $\nu_0$-measure of a sequence of intervals shrinking to $x_0$.
Notice that for each n the sequence $(X_{k,n})_k$ is stationary and almost surely $$\lim_{n \to +\infty} X_{k,n} = \log\left(\frac{dT_{k-1}\nu_{k-1}}{d\nu_k}(x_k)\right).$$
Furthermore sup n |X k,n | is integrable by Lemma 10.
A technical issue in what follows is that the asymptotic lower bound for $\nu_0(T_{-1} \circ \cdots \circ T_{-n}(I_{-n}))$ just obtained is bad when $\nu_{-n}(I_{-n})$ is small. However, in view of Lemma 6, $\nu_{-n}(I_{-n}) \ge 1/2$ 'half of the time', and this suffices for our needs.

Proof of Theorem 2
Let n 1 < n 2 < · · · be the (random) sequence of values of n for which ν −n (I −n ) ≥ 1/2. By Lemma 6 this occurs with probability at least 1/2 for each fixed n.
Hence, by the ergodic theorem, taking a subsequence we may assume that n k = 2k + o(k) almost surely.
Notice that eventually one has $R_{n_{k(r)}} \le r \le r_{n_{\ell(r)}}$ and therefore by Lemma 7 almost surely $$J_{k(r)} \subset B_{R_{n_{k(r)}}}(x_0) \subset B_r(x_0) \subset B_{r_{n_{\ell(r)}}}(x_0) \subset J_{\ell(r)}$$ for all r small enough.
By intersecting over the corresponding full measure sets for a countable sequence $\epsilon_n \to 0$ one obtains that almost surely $\nu_0$ is exact dimensional with dimension κ/χ as claimed.