Tight inequalities among set hitting times in Markov chains

Given an irreducible discrete-time Markov chain on a finite state space, we consider the largest expected hitting time $T(\alpha)$ of a set of stationary measure at least $\alpha$ for $\alpha\in(0,1)$. We obtain tight inequalities among the values of $T(\alpha)$ for different choices of $\alpha$. One consequence is that $T(\alpha) \le T(1/2)/\alpha$ for all $\alpha<1/2$. As a corollary we have that, if the chain is lazy in a certain sense as well as reversible, then $T(1/2)$ is equivalent to the chain's mixing time, answering a question of Peres. We furthermore demonstrate that the inequalities we establish give an almost everywhere pointwise limiting characterisation of possible hitting time functions $T(\alpha)$ over the domain $\alpha\in(0,1/2]$.


Introduction
Hitting times are a classical topic in the theory of finite Markov chains, with connections to mixing times, cover times and electrical network representations [5,6]. In this paper, we consider a natural family of extremal problems for maximum expected hitting times. In contrast to most earlier work on hitting times, which considered the maximum expected hitting times of individual states, we focus on hitting sets of states of at least a given stationary measure. Informally, we are interested in the following basic question: how much more difficult is it to hit a smaller set than a larger one? (We note that other, quite different extremal problems about hitting times have been considered, e.g. [3].) Following the notation of Levin, Peres and Wilmer [5], we let a sequence of random variables $X = (X_t)_{t=0}^{\infty}$ denote an irreducible Markov chain with finite state space $\Omega$, transition matrix $P$, and stationary distribution $\pi$. We denote by $\mu_0$ some initial distribution of the chain and by $\mathbb{P}_{\mu_0}$ the corresponding law. In the case that $X_0 = x$ almost surely, for some $x \in \Omega$, we write $\mathbb{P}_x$ for the corresponding law. Given a subset $A \subseteq \Omega$, the hitting time of $A$ is the random variable $\tau_A$ defined as follows:
$$\tau_A \equiv \min\{t \ge 0 : X_t \in A\}.$$
We shall take particular interest in the maximum expected hitting times of sets of at least a given size. For $\alpha \in (0,1)$ we define $T(\alpha) = T^P(\alpha)$ as follows:
$$T(\alpha) \equiv \max\left\{\mathbb{E}_x[\tau_A] : x \in \Omega,\ A \subseteq \Omega,\ \pi(A) \ge \alpha\right\}.$$
In other words, $T(\alpha) = T^P(\alpha)$ is the maximum, over all starting states $X_0 = x \in \Omega$ and all sets $A \subseteq \Omega$ of stationary measure at least $\alpha$, of the expected hitting time of $A$ from $x$.
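To make the definition of $T(\alpha)$ concrete, here is a minimal brute-force sketch in Python (the helper names `hitting_times` and `T_alpha` are ours, purely for illustration, not from the paper): it computes $\mathbb{E}_x[\tau_A]$ by first-step analysis, i.e. by solving the linear system $h(x) = 1 + \sum_y P(x,y)\,h(y)$ for $x \notin A$ with $h \equiv 0$ on $A$, and then maximises over starting states and over all sets of stationary measure at least $\alpha$.

```python
from itertools import combinations

def hitting_times(P, A):
    """Expected hitting times h(x) = E_x[tau_A], by first-step analysis:
    h = 0 on A, and h(x) = 1 + sum_y P[x][y] h(y) for x not in A."""
    n = len(P)
    others = [x for x in range(n) if x not in A]
    m = len(others)
    # Augmented matrix of the linear system (I - P restricted) h = 1.
    M = [[(1.0 if i == j else 0.0) - P[others[i]][others[j]]
          for j in range(m)] + [1.0] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    h = [0.0] * n
    for i, x in enumerate(others):
        h[x] = M[i][m] / M[i][i]
    return h

def T_alpha(P, pi, alpha):
    """T(alpha): max of E_x[tau_A] over states x and sets A with pi(A) >= alpha."""
    n = len(P)
    best = 0.0
    for k in range(1, n + 1):
        for A in combinations(range(n), k):
            if sum(pi[a] for a in A) >= alpha:
                best = max(best, max(hitting_times(P, set(A))))
    return best

# Two-state example: P(0 -> 1) = 0.1, P(1 -> 0) = 0.5, so pi = (5/6, 1/6).
P = [[0.9, 0.1], [0.5, 0.5]]
pi = [5/6, 1/6]
```

For the two-state chain shown, $T(\alpha) = \mathbb{E}_0[\tau_{\{1\}}] = 10$ when $\alpha \le 1/6$, and $T(\alpha) = \mathbb{E}_1[\tau_{\{0\}}] = 2$ when $1/6 < \alpha \le 5/6$; in particular $T(0.1) = 10 \le T(0.5)/0.1 = 20$, as the results of this paper guarantee.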
1.1. The extremal ratio problem. Note the obvious fact that, given $0 < \alpha < \beta < 1$, $T(\alpha)$ is always at least $T(\beta)$: informally, it is more difficult to hit smaller subsets of the state space. A natural problem, then, is to determine how much more difficult this is, i.e. how large the ratio between $T(\alpha)$ and $T(\beta)$ can be. We dub this the extremal ratio problem.

Theorem 1.2. Fix $0 < \alpha < \beta < 1/2$. There exists a constant $C_\beta > 0$ such that, for any irreducible finite Markov chain,
$$T(\alpha) \le \frac{C_\beta}{\alpha}\, T(\beta).$$

This can be shown via Cesàro mixing time, specifically as a consequence of an equivalence between $T(\beta)$ for $\beta \in (0,1/2)$ and the Cesàro mixing time for any irreducible chain. This equivalence, which was recently proved independently by the third author [7] and by Peres and Sousi [9], is discussed in more detail in Subsection 1.3. In this paper, we improve upon the above result significantly, without recourse to any results on mixing time. Our first main result implies that the optimal constant in Theorem 1.2 is $C_\beta = 1$ and, moreover, that the case $\beta = 1/2$ may be included.

Theorem 1.3. Fix $0 < \alpha < \beta \le 1/2$. For any irreducible finite Markov chain,
$$T(\alpha) \;\le\; T(\beta) + \left(\frac{1}{\alpha} - 1\right) T(1-\beta) \;\le\; \frac{1}{\alpha}\, T(\beta). \qquad (\star)$$
This bound on $T(\alpha)$ is tight: for any $0 < \alpha < \beta \le 1/2$, there exists an irreducible finite Markov chain for which the three terms in $(\star)$ are all equal.

Furthermore, $\beta = 1/2$ represents a boundary case for Theorem 1.3: for each $\beta > 1/2$, there is a class of irreducible finite Markov chains such that $T(\alpha)/T(\beta)$ is arbitrarily large. Thus we have completely settled the extremal ratio problem.
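A one-line check, which we record for convenience: the second inequality in the chain $(\star)$ follows from the first using only the monotonicity of $T$, since $\beta \le 1/2 \le 1 - \beta$ gives $T(1-\beta) \le T(\beta)$:

```latex
T(\alpha)
  \;\le\; T(\beta) + \Bigl(\tfrac{1}{\alpha} - 1\Bigr) T(1-\beta)
  \;\le\; T(\beta) + \Bigl(\tfrac{1}{\alpha} - 1\Bigr) T(\beta)
  \;=\; \tfrac{1}{\alpha}\, T(\beta).
```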
As an application of Theorem 1.3, we show in Subsection 1.3 how mixing time is equivalent to T (1/2) for any irreducible chain, under the added restriction that the chain is lazy in a certain sense as well as reversible; this resolves a problem posed by Peres [4].
Our strategy for proving Theorem 1.3 relies on a simple but useful proposition, which can be deduced from the ergodic properties of irreducible finite Markov chains. We require the following definitions. Given two sets $A, B \subseteq \Omega$, we define
$$d^+(A,B) \equiv \max_{x \in A} \mathbb{E}_x[\tau_B] \qquad\text{and}\qquad d^-(A,B) \equiv \min_{x \in A} \mathbb{E}_x[\tau_B].$$

Proposition 1.4. Given an irreducible Markov chain with finite state space $\Omega$ and stationary distribution $\pi$, let $A, C \subseteq \Omega$. Then
$$d^-(C,A) \le \left(\frac{1}{\pi(A)} - 1\right) d^+(A,C).$$

Problem 1.5. What is the minimal set of constraints on the possible "shape" of the function $T(\alpha)$, over the domain $\alpha \in (0,1/2]$, as the chain ranges over all irreducible finite Markov chains (on at least two states)?
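As a sanity check of Proposition 1.4, written in the form $d^-(C,A) \le (\pi(A)^{-1} - 1)\, d^+(A,C)$ with $d^+$ (resp. $d^-$) the maximum (resp. minimum) expected hitting time between sets, consider an irreducible two-state chain with transition probability $p$ from state $0$ to state $1$ and $q$ from $1$ to $0$, and take $A = \{0\}$, $C = \{1\}$:

```latex
\begin{gather*}
\pi(A) = \frac{q}{p+q}, \qquad
d^-(C,A) = \mathbb{E}_1[\tau_0] = \frac{1}{q}, \qquad
d^+(A,C) = \mathbb{E}_0[\tau_1] = \frac{1}{p}, \\
\Bigl(\frac{1}{\pi(A)} - 1\Bigr) d^+(A,C)
  = \frac{p}{q} \cdot \frac{1}{p} = \frac{1}{q} = d^-(C,A).
\end{gather*}
```

Thus the proposition holds with equality for every irreducible two-state chain, one sense in which it cannot be improved.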
We show that, in the appropriate limit, the constraints imposed by $(\star)$ in Theorem 1.3 are the only non-trivial constraints on $T(\alpha)$ over the domain $\alpha \in (0,1/2]$. (The trivial constraint is that $T$ must be a decreasing function.) We now make this statement rigorous. Let $\mathcal{F}$ denote the set of decreasing functions $f : (0,1/2] \to \mathbb{R}$ given by $f(\alpha) = T(\alpha)/T(1/2)$ for some irreducible finite Markov chain (on at least two states). We also consider limits of such functions. Let $\overline{\mathcal{F}}$ denote the set of decreasing functions $f : (0,1/2] \to \mathbb{R}$ each of which may be obtained as the almost everywhere (a.e.) pointwise limit of functions in $\mathcal{F}$. Our second main result is as follows.

Theorem 1.6. The set $\overline{\mathcal{F}}$ consists precisely of the decreasing functions $f : (0,1/2] \to \mathbb{R}$ with $f(1/2) = 1$ that satisfy $f(\alpha) \le 1/\alpha$ for all $\alpha \in (0,1/2)$.

We prove this by way of a class of chains we call L-shaped Markov chains, for which the hitting time functions $T(\alpha)$ can be straightforwardly determined. We show Theorem 1.6 in Section 3.
As it turns out, the constraints given by (⋆) for 0 < α < β ≤ 1/2 are not the only non-trivial constraints on T (α) over the larger domain α ∈ (0, 1). We demonstrate this in Section 4. The shape problem over that larger domain remains an interesting open problem.
1.3. The connection to mixing times. To put our results into wider context, we now describe the relationship between Theorem 1.3 and mixing times. Recall that the (standard) mixing time of a chain with state space $\Omega$, transition matrix $P$, and stationary distribution $\pi$ is defined as
$$t^P_{\mathrm{mix}} \equiv \min\left\{t \in \mathbb{N} : \forall x \in \Omega,\ \forall A \subseteq \Omega,\ |P^t(x,A) - \pi(A)| \le \frac{1}{4}\right\}.$$
This parameter has various connections to the analysis of MCMC algorithms, to phase transitions in statistical mechanics, and to other pure and applied problems [5]. Aldous [1] showed that it is also related to other parameters of the chain, including the following hitting time parameter:
$$t^P_{\mathrm{prod}} \equiv \max\left\{\pi(A)\, \mathbb{E}_x[\tau_A] : x \in \Omega,\ \emptyset \ne A \subseteq \Omega\right\}.$$

Theorem 1.7. There exists a universal constant $C > 0$ such that the following holds. Consider a reversible, irreducible finite Markov chain with transition matrix $P$ that is lazy in the sense that $P_{xx} \ge 1/2$ for all $x$ in the state space. Then
$$C^{-1}\, t^P_{\mathrm{prod}} \le t^P_{\mathrm{mix}} \le C\, t^P_{\mathrm{prod}}.$$

We remark that Aldous proved Theorem 1.7 in continuous time, but there are standard methods to transfer his result to discrete time (cf. [5, Theorem 20.3]).
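On small chains these parameters can be computed by direct enumeration. The following sketch (pure Python; the function names are ours, for illustration only) computes $t^P_{\mathrm{mix}}$ by taking matrix powers and testing the total variation criterion $\max_x \tfrac{1}{2}\sum_y |P^t(x,y) - \pi(y)| \le 1/4$, which is equivalent to requiring $|P^t(x,A) - \pi(A)| \le 1/4$ for all states $x$ and sets $A$.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tv_from_pi(row, pi):
    """Total variation distance between the distribution `row` and pi."""
    return 0.5 * sum(abs(r - p) for r, p in zip(row, pi))

def mixing_time(P, pi, eps=0.25):
    """Smallest t >= 1 with max_x || P^t(x, .) - pi ||_TV <= eps."""
    Q, t = P, 1
    while max(tv_from_pi(row, pi) for row in Q) > eps:
        Q, t = mat_mul(Q, P), t + 1
    return t

# Two-state example with pi = (5/6, 1/6): after one step the TV distance
# from state 1 is 1/3 > 1/4, but after two steps both rows are within 1/4.
P = [[0.9, 0.1], [0.5, 0.5]]
pi = [5/6, 1/6]
```

For this chain `mixing_time(P, pi)` returns `2`, matching the hand computation of the two rows of $P^2 = \begin{pmatrix} 0.86 & 0.14 \\ 0.7 & 0.3 \end{pmatrix}$.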
Aldous's theorem is typically summed up by saying that $t^P_{\mathrm{mix}}$ and $t^P_{\mathrm{prod}}$ are "equivalent up to universal constants", or simply "equivalent". A similar equivalence was proved for all irreducible finite Markov chains (not necessarily lazy or reversible), with $t^P_{\mathrm{mix}}$ replaced by the Cesàro mixing time [2]:
$$t^P_{\mathrm{Ces}} \equiv \min\left\{t \in \mathbb{N} : \forall x \in \Omega,\ \left\| \frac{1}{t}\sum_{s=0}^{t-1} P^s(x,\cdot) - \pi \right\|_{\mathrm{TV}} \le \frac{1}{4}\right\}.$$
A drawback of Theorem 1.7 and its Cesàro mixing version is that they might suggest that the mixing time depends on the hitting times of arbitrarily small sets. On the contrary, it transpires that the maximum hitting time of sets that are large enough is, on its own, also equivalent to $t^P_{\mathrm{mix}}$ and $t^P_{\mathrm{Ces}}$ (in the analogous senses). The following was proved independently by Peres and Sousi [9] and by the third author [7].

Theorem 1.8. For each $\alpha \in (0,1/2)$, there exists a constant $c(\alpha) > 0$ such that the following holds. Consider a reversible, irreducible finite Markov chain with transition matrix $P$ that is lazy in the sense that $P_{xx} \ge 1/2$ for all $x$ in the state space. Then
$$c(\alpha)\, t^P_{\mathrm{mix}} \le T^P(\alpha) \le c(\alpha)^{-1}\, t^P_{\mathrm{mix}}.$$
Moreover, for any irreducible finite Markov chain (not necessarily reversible or lazy),
$$c(\alpha)\, t^P_{\mathrm{Ces}} \le T^P(\alpha) \le c(\alpha)^{-1}\, t^P_{\mathrm{Ces}}.$$

Note that, together with the Cesàro mixing time form of Theorem 1.7, Theorem 1.2 now follows.
There is no analogue of Theorem 1.8 if one allows $\alpha > 1/2$: a simple counterexample is given by a random walk on a graph consisting of two large cliques connected by a single edge [8]. Until now, it was not known whether $T^P(1/2)$ is also equivalent to $t^P_{\mathrm{mix}}$ and $t^P_{\mathrm{Ces}}$. We prove here that this is the case, answering a question of Peres [4].

Theorem 1.9. There exists a universal constant $c > 0$ such that the following holds. Consider a reversible, irreducible finite Markov chain with transition matrix $P$ that is lazy in the sense that $P_{xx} \ge 1/2$ for all $x$ in the state space. Then
$$c\, t^P_{\mathrm{mix}} \le T^P(1/2) \le c^{-1}\, t^P_{\mathrm{mix}}.$$
Moreover, for any irreducible finite Markov chain (not necessarily reversible or lazy),
$$c\, t^P_{\mathrm{Ces}} \le T^P(1/2) \le c^{-1}\, t^P_{\mathrm{Ces}}.$$

Proof. By Theorem 1.7 and its Cesàro mixing time version, it suffices to show that $t^P_{\mathrm{prod}}$ is equivalent to $T^P(1/2)$. But this is simple. On the one hand, if $\pi(A) \le 1/2$, then Theorem 1.3 (with $\beta = 1/2$) gives
$$\pi(A)\, \mathbb{E}_x[\tau_A] \le \pi(A)\, T^P(\pi(A)) \le T^P(1/2),$$
and the fact that $T^P(\cdot)$ is monotone decreasing implies the above inequality also holds if $\pi(A) > 1/2$; hence $t^P_{\mathrm{prod}} \le T^P(1/2)$. On the other hand, any $x$ and any $A$ with $\pi(A) \ge 1/2$ satisfy $\mathbb{E}_x[\tau_A] \le 2\pi(A)\,\mathbb{E}_x[\tau_A] \le 2\, t^P_{\mathrm{prod}}$, so that $T^P(1/2) \le 2\, t^P_{\mathrm{prod}}$.
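The two-sided comparison in the proof above, $t^P_{\mathrm{prod}} \le T^P(1/2) \le 2\, t^P_{\mathrm{prod}}$, where $t^P_{\mathrm{prod}} = \max\{\pi(A)\,\mathbb{E}_x[\tau_A] : x \in \Omega,\ \emptyset \ne A \subseteq \Omega\}$ is Aldous's hitting parameter, can be verified by hand on the two-state chain with transition probability $p$ from state $0$ to $1$ and $q$ from $1$ to $0$, where $p \le q$:

```latex
\pi = \Bigl( \frac{q}{p+q},\, \frac{p}{p+q} \Bigr), \qquad
t^P_{\mathrm{prod}}
  = \max\Bigl\{ \frac{p}{p+q} \cdot \frac{1}{p},\;
                \frac{q}{p+q} \cdot \frac{1}{q} \Bigr\}
  = \frac{1}{p+q}, \qquad
T^P(1/2) = \mathbb{E}_1[\tau_0] = \frac{1}{q}.
```

Indeed $\frac{1}{p+q} \le \frac{1}{q} \le \frac{2}{p+q}$, the second inequality precisely because $p \le q$ (with equality when $p = q$).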

1.4. Organisation. The remainder of the article is organised as follows. In Section 2, we prove Theorem 1.3. In Section 3, we show Theorem 1.3 is tight by presenting some two- and three-state Markov chains; we also prove Theorem 1.6 there. Finally, in Section 4 we consider the behaviour of $T(\alpha)$ over the larger domain $\alpha \in (0,1)$ and make some concluding remarks.

Proofs for Theorem 1.3
We begin by showing that Theorem 1.3 is an easy consequence of Proposition 1.4.
Proof of Theorem 1.3. Consider an irreducible Markov chain with finite state space $\Omega$ and stationary distribution $\pi$. Fix a state $x \in \Omega$ and a set $A \subseteq \Omega$ with $\pi(A) \ge \alpha$. We prove that
$$\mathbb{E}_x[\tau_A] \le T(\beta) + \left(\frac{1}{\alpha} - 1\right) T(1-\beta).$$
Since x and A are arbitrary, this will suffice to prove the theorem.
Define the set $C = C^\beta_A$ as follows:
$$C \equiv \left\{ y \in \Omega : \mathbb{E}_y[\tau_A] > \left(\frac{1}{\alpha} - 1\right) T(1-\beta) \right\}.$$
We claim that $\pi(C) < 1 - \beta$. Indeed, if, on the contrary, $\pi(C)$ were at least $1 - \beta$, then every state would hit $C$ in expected time at most $T(1-\beta)$, so that $d^+(A,C) \le T(1-\beta)$, while the definition of $C$ gives $d^-(C,A) > (\frac{1}{\alpha} - 1)\, T(1-\beta)$. This would imply, by Proposition 1.4, that $\pi(A) < \alpha$, a contradiction. Thus, letting $B \equiv \Omega \setminus C$, we have established that $\pi(B) > \beta$. Our route from $x$ to $A$ is now clear: proceed from $x$ to $B$ and then on from $B$ to $A$. See Figure 1. That is, using the Markov property of the chain, the expected hitting time of $A$ from $x$ may be bounded by
$$\mathbb{E}_x[\tau_A] \;\le\; \mathbb{E}_x[\tau_B] + \max_{y \in B} \mathbb{E}_y[\tau_A] \;\le\; T(\beta) + \left(\frac{1}{\alpha} - 1\right) T(1-\beta),$$
as required.
All that remains is to prove Proposition 1.4. As remarked in the introduction, we have replaced our original proof by a shorter and more elegant argument suggested by Peres and Sousi [10]. Our original proof, which may be obtained at http://arxiv.org/abs/1209.0039v1, relied on the ergodic theorem for irreducible Markov chains combined with a martingale concentration inequality.
Proof of Proposition 1.4 [10]. Denote the Markov chain by $X$. Our approach is to define a distribution $\mu$ on $A$ and a distribution $\nu$ on $C$ such that
$$\pi(A)\left( \mathbb{E}_\mu[\tau_C] + \mathbb{E}_\nu[\tau_A] \right) \le \mathbb{E}_\mu[\tau_C]. \qquad (2.1)$$
Doing so will complete the proof of the proposition. Indeed, re-arranging inequality (2.1), we obtain
$$d^-(C,A) \;\le\; \mathbb{E}_\nu[\tau_A] \;\le\; \left(\frac{1}{\pi(A)} - 1\right) \mathbb{E}_\mu[\tau_C] \;\le\; \left(\frac{1}{\pi(A)} - 1\right) d^+(A,C),$$
as required.
We now define the distributions $\mu$ and $\nu$ to satisfy inequality (2.1). Consider an auxiliary Markov chain on $A$ defined by the following transitions: for each $x, y \in A$, let $Q_{xy}$ be the probability that, started from $x$, the first state of $A$ hit by $X$ after time $\tau_C$ is $y$ (i.e. that $y$ is the first state of $A$ hit after the original chain has reached $C$ from $x$). Let $\mu$ denote a stationary distribution of this new chain, and let $\nu$ be the hitting distribution on $C$ when the original chain is started from $\mu$, i.e. $\nu(y) = \mathbb{P}_\mu(X_{\tau_C} = y)$ for each $y \in C$.
It remains to prove that (2.1) holds for this choice of $\mu$ and $\nu$. First observe that, started from the distribution $\mu$, the expected time the chain $X$ spends in $A$ before it reaches $C$ and returns to $A$ is given by $\mathbb{E}_\mu[\tau]\, \pi(A)$, where $\tau$ denotes the number of steps in such a cycle (from $A$ to $C$ and then back to $A$). This observation is not difficult to verify, but we have included a proof below in Lemma A.1 of the appendix. Next, since all visits to $A$ occur before the chain reaches $C$, the time spent in $A$ during a cycle is at most $\tau_C$, and hence
$$\mathbb{E}_\mu[\tau]\, \pi(A) \le \mathbb{E}_\mu[\tau_C],$$
which, since $\mathbb{E}_\mu[\tau] = \mathbb{E}_\mu[\tau_C] + \mathbb{E}_\nu[\tau_A]$ by the strong Markov property, is precisely inequality (2.1).

Examples and a proof of Theorem 1.6
This section is devoted to exhibiting classes of Markov chains which demonstrate that Theorem 1.3 is tight, in a few different senses.
We now turn to the proof of Theorem 1.6. We must prove that each decreasing function $f : (0,1/2] \to \mathbb{R}$ satisfying $f(\alpha) \le \frac{1}{\alpha}$ for all $\alpha \in (0,1/2)$ may be obtained as the a.e. pointwise limit of a sequence of functions $f_1, f_2, \ldots$ in $\mathcal{F}$ (i.e. functions $f_i$ such that $f_i(\alpha) = T^{P_i}(\alpha)/T^{P_i}(1/2)$ for some irreducible finite Markov chain with transition matrix $P_i$). We first prove this for a certain class of step functions; we then consider general functions as limits of these step functions in order to obtain the theorem.

The class of decreasing step functions $f : (0,1/2] \to \mathbb{R}$ we consider are those that may be written in the form
$$f(\alpha) = 1 + \sum_{i=1}^{k} \lambda_i \cdot \mathbf{1}_{\alpha \le \alpha_i},$$
where the $\lambda_i$ and $\alpha_i$ are positive reals satisfying
$$1 + \sum_{j=1}^{i} \lambda_j \le \frac{1}{\alpha_i} \quad \text{for each } i \in \{1, \ldots, k\}, \qquad (3.1)$$
and $0 < \alpha_k < \cdots < \alpha_1 < 1/2$. We call such a step function hittable. We note that if $f$ is a hittable step function then $f(1/2) = 1$ and $f(\alpha) \le 1/\alpha$ for all $\alpha \in (0,1/2)$. Given a hittable step function $f(\alpha) = 1 + \sum_{i=1}^{k} \lambda_i \cdot \mathbf{1}_{\alpha \le \alpha_i}$ and $\varepsilon > 0$, we define the $\varepsilon$-error set for $f$ to be the set
$$\mathrm{Err}_f(\varepsilon) \equiv \bigcup_{i=0}^{k} \left( \alpha_i, \alpha_i + \varepsilon \right],$$
where we interpret $\alpha_0 = 0$.

Lemma 3.1. For every hittable step function $f$ and every $\varepsilon \in (0, 1/2 - \alpha_1)$, there exists a function $g \in \mathcal{F}$ such that $g(\alpha) = f(\alpha)$ for all $\alpha \in (0,1/2] \setminus \mathrm{Err}_f(\varepsilon)$.

The examples of Markov chains we shall use in the proof of the lemma are all of the same type. An L-shaped Markov chain is a chain whose state space may be labelled $\Omega = \{v_{-1}, v_0, v_1, \ldots, v_k\}$ in such a way that the transition matrix of the chain has non-zero entries only at $P_{v_i v_{i-1}}$, $P_{v_i v_i}$, $P_{v_{i-1} v_i}$ and $P_{v_i v_0}$ for $i \in \{0, 1, \ldots, k\}$. Note that $v_0$ is the only state that may be reached directly from a non-adjacent state.

Lemma 3.2. Consider an L-shaped Markov chain with stationary distribution $\pi$, and suppose that $i \in \{0, \ldots, k\}$ and $\alpha \in (0,1)$ satisfy
$$\pi(\{v_{i+1}, \ldots, v_k\}) + \pi(v_{-1}) < \alpha \le \pi(\{v_i, \ldots, v_k\}).$$
Then $T(\alpha) = \mathbb{E}_{v_{-1}}\left[\tau_{\{v_i, \ldots, v_k\}}\right] = \mathbb{E}_{v_{-1}}[\tau_{v_i}]$.

Proof. The second equality is obvious since, starting from $v_{-1}$, the chain first arrives in the set $\{v_i, \ldots, v_k\}$ at $v_i$. It is also immediate that $T(\alpha) \ge \mathbb{E}_{v_{-1}}[\tau_{\{v_i, \ldots, v_k\}}]$, by the definition of $T(\alpha)$ and the assumption that $\pi(\{v_i, \ldots, v_k\}) \ge \alpha$. Thus all that remains is to prove, for any state $v_j$ and set $A$ with $\pi(A) \ge \alpha$, that
$$\mathbb{E}_{v_j}[\tau_A] \le \mathbb{E}_{v_{-1}}\left[\tau_{\{v_i, \ldots, v_k\}}\right].$$
Fix $j \in \{-1, \ldots, k\}$ and a set $A$ with $\pi(A) \ge \alpha$. Let $i'$ be the minimal non-negative integer for which $v_{i'} \in A$. The condition on $\alpha$ implies that $i' \le i$.
Now, using the property that $\mathbb{E}_{v_j}[\tau_{v_0}]$ is maximised at $j = -1$, and the fact that $i' \le i$, we have that
$$\mathbb{E}_{v_j}[\tau_A] \;\le\; \mathbb{E}_{v_j}[\tau_{v_{i'}}] \;\le\; \mathbb{E}_{v_j}[\tau_{v_0}] + \mathbb{E}_{v_0}[\tau_{v_{i'}}] \;\le\; \mathbb{E}_{v_{-1}}[\tau_{v_0}] + \mathbb{E}_{v_0}[\tau_{v_i}].$$
Since any path from $v_{-1}$ to $v_i$ necessarily passes through $v_0$, the final expression is equal to $\mathbb{E}_{v_{-1}}[\tau_{\{v_i, \ldots, v_k\}}]$, completing the proof.
The intuition behind the above lemma (at least for our intended application) is that if $v_{-1}$ has very small stationary measure ($\varepsilon$, say), then for almost all values of $\alpha$ (all except a set of measure at most $k\varepsilon$) we know how to express $T(\alpha)$ directly as a hitting time. This is central to our proof of Lemma 3.1.
We now prove the above assertion by giving explicitly the entries of the transition matrix $P$: we first specify the transition probabilities out of $v_{-1}$ and $v_0$, then, for each $i \in \{2, \ldots, k\}$, those out of $v_i$; last, we set $P_{v_1 v_1} = 1 - P_{v_1 v_0} - P_{v_1 v_2}$. It is routine to verify that each entry of the transition matrix $P$ of our Markov chain lies in $[0,1]$, using (3.1), $0 < \varepsilon < 1/2 - \alpha_1$, $0 < \alpha_k < \cdots < \alpha_2 < \alpha_1$, and a large enough choice of $N$. Some straightforward calculations confirm that the resulting stationary distribution $\pi$ satisfies condition (i) above, and condition (ii) follows easily from a short calculation. To verify condition (iii) for each $i \in \{1, \ldots, k\}$, we compute the expected hitting time from $v_{i-1}$ to $v_i$ by considering the chain started at $v_{i-1}$ and conditioning on the first step. We use induction on $i$. For the base case ($i = 1$), conditioning on the first step from $v_0$ yields, after substitution and rearrangement, $\mathbb{E}_{v_0}[\tau_{v_1}] = \lambda_1 N$. Next, for $i \ge 2$,
$$\mathbb{E}_{v_0}[\tau_{v_i}] = \mathbb{E}_{v_0}[\tau_{v_{i-1}}] + \mathbb{E}_{v_{i-1}}[\tau_{v_i}] = (\lambda_1 + \cdots + \lambda_{i-1})N + \lambda_i N,$$
where the second equality uses the inductive assumption that $\mathbb{E}_{v_0}[\tau_{v_{i-1}}] = (\lambda_1 + \cdots + \lambda_{i-1})N$.

It is now straightforward to deduce Theorem 1.6. Let $f : (0,1/2] \to \mathbb{R}$ be a decreasing function with $f(1/2) = 1$ satisfying $f(\alpha) \le 1/\alpha$ for all $\alpha \in (0,1/2)$, and let $D$ denote its set of discontinuity points, which is countable and hence of measure zero. For each $n \ge 2$, define $f_n : (0,1/2] \to \mathbb{R}$ by $f_n(x) = f(\alpha_i)$ for $x \in (\alpha_{i+1}, \alpha_i]$, where $\alpha_i = 1/2 - i2^{-n}$ for $i \in \{0, 1, \ldots, 2^{n-1} - 1\}$ and $\alpha_{2^{n-1}} \equiv 0$. One easily notes that $f_n(x) \to f(x)$ for all $x \in (0,1/2] \setminus D$.
We observe that each $f_n$ is a hittable step function, because it can be written
$$f_n(\alpha) = 1 + \sum_{i=1}^{k_n} \lambda_i \cdot \mathbf{1}_{\alpha \le \alpha_i}, \qquad k_n = 2^{n-1} - 1,$$
where $\alpha_i = 1/2 - i2^{-n}$ and $\lambda_i = f(\alpha_i) - f(\alpha_{i-1})$. Condition (3.1) is easily seen to hold, since the sum telescopes: $1 + \sum_{j=1}^{i} \lambda_j = f(\alpha_i) \le 1/\alpha_i$ for each $i$. To prove the theorem we must find a sequence of functions $g_n \in \mathcal{F}$ such that $g_n(x) \to f(x)$ except on a set of measure zero. By Lemma 3.1, there exists for each $n$ a function $g_n \in \mathcal{F}$ such that $g_n(x) = f_n(x)$ for all $x \in (0,1/2] \setminus \mathrm{Err}_{f_n}(2^{-2n})$; note that $\mathrm{Err}_{f_n}(2^{-2n})$ is a union of at most $2^{n-1}$ intervals, each of length $2^{-2n}$, and so has measure at most $2^{-n-1}$. We now prove that $g_n(x) \to f(x)$ as $n \to \infty$ for each $x \in (0,1/2] \setminus (D \cup D')$, where $D'$ denotes the set of points that lie in infinitely many of the sets $\mathrm{Err}_{f_n}(2^{-2n})$. Since $\sum_n 2^{-n-1} < \infty$, the set $D'$ has measure zero by the Borel--Cantelli lemma; thus $D \cup D'$ has measure zero, and this will complete the proof of the theorem.
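The discretisation step can be illustrated concretely. The sketch below (pure Python; `step_approx` is our name, and we assume the convention $f_n(x) = f(\alpha_i)$ for $x \in (\alpha_{i+1}, \alpha_i]$ on the grid $\alpha_i = 1/2 - i2^{-n}$ used above) maps $x$ to the value of $f$ at the smallest grid point $\ge x$; the resulting step function is decreasing and still satisfies the constraint $f_n(x) \le 1/x$ whenever $f$ does.

```python
def step_approx(f, n):
    """Step approximation of a decreasing f on (0, 1/2] using the grid
    alpha_i = 1/2 - i * 2**-n, i = 0, ..., 2**(n-1) - 1 (smallest point 2**-n).
    Returns the function x -> f(alpha_i) for the smallest grid point alpha_i >= x."""
    grid = [0.5 - i * 2.0 ** (-n) for i in range(2 ** (n - 1))]  # decreasing list

    def f_n(x):
        # Smallest grid point >= x; for 0 < x <= 2**-n this is the last point.
        candidates = [a for a in grid if a >= x]
        a = min(candidates) if candidates else grid[-1]
        return f(a)

    return f_n

# Example: f(x) = 1/x is decreasing and meets the constraint with equality.
f = lambda x: 1.0 / x
f3 = step_approx(f, 3)   # grid: 0.5, 0.375, 0.25, 0.125
```

For instance `f3(0.3)` returns $f(0.375)$, since $0.375$ is the smallest grid point above $0.3$; and $f_3(x) = f(\alpha_i) \le 1/\alpha_i \le 1/x$ pointwise, as claimed.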
To this end, fix $x \in (0,1/2] \setminus (D \cup D')$. Since $x \notin D$, we have that $f_n(x) \to f(x)$ as $n \to \infty$. Furthermore, since $x \notin D'$, there exists $n_0$ such that $x \notin \mathrm{Err}_{f_n}(2^{-2n})$ for all $n \ge n_0$, and so $g_n(x) = f_n(x)$ for all $n \ge n_0$. Thus $\lim_{n\to\infty} g_n(x) = \lim_{n\to\infty} f_n(x) = f(x)$, completing the proof of the theorem.

One further result and concluding remarks
For $0 < \alpha < \beta \le 1/2$ we proved the tight inequality $T(\alpha) \le T(\beta) + (\alpha^{-1} - 1)\,T(1-\beta)$ relating hitting times of large enough sets in irreducible finite Markov chains. Furthermore, we demonstrated that this is the only non-trivial restriction on $T(\alpha)$ as a function over $\alpha \in (0,1/2]$, in the sense made rigorous in Theorem 1.6. The most obvious remaining question, then, is whether there are other non-trivial inequalities relating the values of $T(\alpha)$ for $\alpha \in (0,1)$. In one further result, we demonstrate that $T : (0,1) \to \mathbb{R}$ is indeed further constrained. However, determining the set of all inequalities that hold among the values of $T(\alpha)$ for $\alpha \in (0,1)$, and thereby giving a characterisation in the spirit of Theorem 1.6 of the possible behaviour of $T : (0,1) \to \mathbb{R}$, remains an interesting open problem.
To demonstrate that $T : (0,1) \to \mathbb{R}$ is further constrained, it suffices to give a single example of such an additional restriction, which is as follows.

Proof. Let $y \in B$. Consider running the chain for $10T$ steps and denote by $p_y$ the probability $\mathbb{P}_y(\tau_A \le 10T)$. The assumptions on the hitting time of $A$ imply that $p_y < 0.111 < 1/8$. On the other hand, $\mathbb{P}_y(\tau_{A \cup C} \le 10T) \ge 9/10$ by Markov's inequality, and so $\mathbb{P}_y(\tau_C \le 10T) \ge 9/10 - 1/8 > 3/4$.
We may now bound $d^+(B,C)$ as follows. Note that, in the event that the chain does not hit $C$ within $10T$ steps, the expected remaining time to hit $C$ may be bounded by $T$ (an upper bound on the expected time to return to $B$) plus $d^+(B,C)$ (an upper bound on the expected time to hit $C$ from an element of $B$). Thus
$$d^+(B,C) \le 10T + \frac{1}{4}\left( T + d^+(B,C) \right).$$