A Class of Markov Chains with no Spectral Gap

In this paper we extend the results of the research started by the first author, in which Karlin-McGregor diagonalization of certain reversible Markov chains over countably infinite general state spaces by orthogonal polynomials was used to estimate the rate of convergence to a stationary distribution. We use a method of Koornwinder to generate a large and interesting family of random walks which exhibits a lack of spectral gap, and a polynomial rate of convergence to the stationary distribution. For the Chebyshev type subfamily of Markov chains, we use asymptotic techniques to obtain an upper bound of order $O({\log{t} \over \sqrt{t}})$ and a lower bound of order $O({1 \over \sqrt{t}})$ on the distance to the stationary distribution regardless of the initial state. Due to the lack of a spectral gap, these results lie outside the scope of geometric ergodicity theory.


Introduction
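Let $P = \big(p(i,j)\big)_{i,j\in\Omega}$ be a reversible Markov chain over a sample space $\Omega$, that is, it must satisfy the following detailed balance conditions:
$$\pi_i\, p(i,j) = \pi_j\, p(j,i) \qquad \text{for all } i,j\in\Omega,$$
where $\pi$ is a non-trivial non-negative function over $\Omega$. If $P$ admits a unique stationary distribution $\nu$, then $\frac{1}{\sum_{i\in\Omega}\pi_i}\,\pi = \nu$.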
It can be shown that the reversible $P$ is a self-adjoint operator in $\ell^2(\pi)$, the space generated by the following inner product induced by $\pi$:
$$\langle f, g\rangle_\pi = \sum_{i\in\Omega} f(i)\,g(i)\,\pi_i .$$
If $P$ is a tridiagonal operator (i.e. a nearest-neighbor random walk) on $\Omega = \{0, 1, 2, \dots\}$, then it must have a simple spectrum, and is diagonalizable via orthogonal polynomials, as was studied in the 1950s by Karlin and McGregor; see [3], [10], and [2]. There, the extended eigenfunctions $Q_j(\lambda)$ satisfying $Q_0 \equiv 1$ and
$$P\,\big(Q_0(\lambda), Q_1(\lambda), Q_2(\lambda), \dots\big)^T = \lambda\,\big(Q_0(\lambda), Q_1(\lambda), Q_2(\lambda), \dots\big)^T$$
are orthogonal polynomials with respect to a probability measure $\psi$. If we let $p_t(i,j)$ denote the entries of the operator $P^t$ that represent $t$-step transition probabilities from state $i$ to state $j$, then
$$p_t(i,j) = \pi_j \int_{[-1,1]} \lambda^t\, Q_i(\lambda)\, Q_j(\lambda)\, d\psi(\lambda) \qquad \text{for all } i,j \ge 0,$$
where $\pi_j$ with $\pi_0 = 1$ is the reversibility measure of $P$.
We will use the following distance to measure the deviation from the stationary distribution on a scale from zero to one.
Definition 1. If $\mu$ and $\nu$ are two probability distributions over a sample space $\Omega$, then the total variation distance is
$$\|\nu - \mu\|_{TV} = \frac{1}{2}\sum_{x\in\Omega} |\nu(x) - \mu(x)| = \sup_{A\subset\Omega} |\nu(A) - \mu(A)| .$$
Let $\rho = \sum_{k=0}^{\infty}\pi_k$. Observe that $\rho < \infty$ if and only if the random walk $P$ is positive recurrent. Recall that $\nu = \frac{1}{\rho}\pi$ is the stationary probability distribution. If, in addition to being positive recurrent, the aperiodic nearest-neighbor Markov chain originates at site $j$, then the total variation distance between the distribution $\mu_t = \mu_0 P^t$ and $\nu$ is given by
$$\|\nu - \mu_t\|_{TV} = \frac{1}{2}\sum_{n=0}^{\infty} \pi_n \left| \int_{[-1,1)} \lambda^t\, Q_j(\lambda)\, Q_n(\lambda)\, d\psi(\lambda) \right| \tag{1.1}$$
as the measure $\psi$ contains a point mass of weight $\frac{1}{\rho}$ at $1$. See [6].
The rates of convergence are quantified via mixing times, which for an infinite state space with a unique stationary distribution are defined as follows. Here the notion of a mixing time depends on the state of origination $j$ of the Markov chain. See [7].
Definition 2. Suppose $P$ is a Markov chain with a stationary probability distribution $\nu$ that commences at $X_0 = j$. Given an $\epsilon > 0$, the mixing time $t_{mix}(\epsilon)$ is defined as
$$t_{mix}(\epsilon) = \min\big\{ t : \|\nu - \mu_t\|_{TV} \le \epsilon \big\}.$$
In the case of a nearest-neighbor process on $\Omega = \{0, 1, 2, \dots\}$ commencing at $j$, the corresponding mixing time has the following simple expression in orthogonal polynomials:
$$t_{mix}(\epsilon) = \min\left\{ t : \sum_{n=0}^{\infty} \pi_n \left| \int_{[-1,1)} \lambda^t\, Q_j(\lambda)\, Q_n(\lambda)\, d\psi(\lambda) \right| \le 2\epsilon \right\}.$$
Investigations into the use of orthogonal polynomial techniques (see [3], [10]) in the estimation of mixing times and distance to the stationary distribution have been carried out in [7] for certain classes of random walks. In this paper we consider the problem from the other direction. Namely, given a large class of orthogonal polynomials, we outline how to find the corresponding random walk and estimate the rate for the distance to the stationary distribution.
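Definitions 1 and 2 can be illustrated on a toy example. The sketch below is not taken from the paper: the three-state nearest-neighbor chain `P` and the helper names are hypothetical, chosen only so that the total variation distance and the mixing time can be computed exactly with rational arithmetic.

```python
from fractions import Fraction as F

# Hypothetical 3-state nearest-neighbor chain (an assumption, not from the paper).
P = [
    [F(1, 2), F(1, 2), F(0)],
    [F(1, 4), F(1, 2), F(1, 4)],
    [F(0), F(1, 2), F(1, 2)],
]

def reversibility_measure(P):
    """pi_0 = 1 and pi_{n+1} = pi_n * p(n, n+1) / p(n+1, n), from detailed balance."""
    pi = [F(1)]
    for n in range(len(P) - 1):
        pi.append(pi[-1] * P[n][n + 1] / P[n + 1][n])
    return pi

def tv_distance(mu, nu):
    """Total variation distance: half of the l^1 distance (Definition 1)."""
    return sum(abs(a - b) for a, b in zip(mu, nu)) / 2

def step(mu, P):
    """One step of the chain acting on a row vector: mu -> mu P."""
    size = len(P)
    return [sum(mu[i] * P[i][k] for i in range(size)) for k in range(size)]

def mixing_time(P, j, eps, t_max=10_000):
    """Smallest t with TV distance <= eps for the chain started at j (Definition 2)."""
    pi = reversibility_measure(P)
    rho = sum(pi)
    nu = [x / rho for x in pi]          # stationary distribution nu = pi / rho
    mu = [F(int(i == j)) for i in range(len(P))]
    for t in range(t_max + 1):
        if tv_distance(mu, nu) <= eps:
            return t
        mu = step(mu, P)
    raise ValueError("not mixed within t_max steps")

print(reversibility_measure(P))
print(mixing_time(P, 0, F(1, 10)))
```

For this finite chain the distance halves at every step, which is the geometric behaviour that the infinite chains studied in this paper fail to exhibit.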
More specifically, beginning with the Jacobi polynomials, whose weight function is supported on (−1, 1), we use Koornwinder's techniques [5] to attach a point mass at 1. For the class of Jacobi type polynomials $Q_n$ thus obtained, the three-term recurrence relation is understood [4]. The tridiagonal operator corresponding to these polynomials is not a Markov chain; however, the operator can be deformed to become one. The corresponding changes in the polynomials are easy to trace. This gives a four-parameter family of nearest-neighbor Markov chains whose distance to the stationary distribution decays in a non-geometric way. In principle the asymptotic analysis presented in this paper can be applied to the entire four-parameter family. We outline how this proceeds for the Chebyshev-type subfamily obtained by taking α = β = −1/2 in the Koornwinder class.
We would like to point out the important results of V. B. Uvarov [11] on transformation of orthogonal polynomial systems by attaching point masses to the orthogonality measure, predating the Koornwinder results by fifteen years. The results of V. B. Uvarov can potentially be used in order to significantly extend the scope of convergence rate problems covered in this current manuscript.
The paper is organized as follows. In Section 2 we discuss constructing positive recurrent Markov chains from the Jacobi family of orthogonal polynomials adjusted by using Koornwinder's techniques to place a point mass at x = 1. Next, we derive an asymptotic upper bound on the total variation distance to the stationary distribution in the case of general α > −1 and β > −1 in Section 3. Our main result, Theorem 2, is presented in Section 4. There, for the case of Chebyshev type polynomials corresponding to α = β = −1/2, we produce both asymptotic lower and upper bounds for the total variation distance. Finally, in Section 5 we compare our main result to related results obtained by other techniques.
As we have mentioned earlier, the tridiagonal operator $H$ corresponding to the recurrence relation of the orthogonal polynomials may not be a Markov chain operator. Let $p_i$, $r_i$ and $q_i$ denote the coefficients in the tridiagonal recursion
$$p_i\, Q_{i+1}(x) + r_i\, Q_i(x) + q_i\, Q_{i-1}(x) = x\, Q_i(x), \qquad i \ge 0, \quad Q_{-1} \equiv 0 .$$
Notice that because the polynomials are normalized so that $Q_i(1) = 1$, it follows immediately that $p_i + r_i + q_i = 1$. However, some of the coefficients $p_i$, $r_i$, or $q_i$ may turn out to be negative, in which case the rows of the tridiagonal operator $H$ would add up to one, but will not necessarily consist of all nonnegative entries.
In the case when all the negative entries are located on the main diagonal, this may be overcome by considering the operator $\frac{1}{\lambda+1}(H + \lambda I)$. For $\lambda \ge -\inf_i r_i$ this ensures all entries in the matrix $\frac{1}{\lambda+1}(H + \lambda I)$ are nonnegative and hence can be thought of as transition probabilities. More generally, if a polynomial $p(\cdot)$ with coefficients adding up to one is found to satisfy $p(H) \ge 0$ coordinatewise, then such $p(H)$ would be a Markov chain.
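The claim about the shifted operator can be checked row by row. The following short verification uses only the recursion coefficients $p_i$, $r_i$, $q_i$ of $H$ and assumes $p_i, q_i \ge 0$ and $1+\lambda > 0$; it is included as a reading aid rather than as part of the original argument.

```latex
\[
\underbrace{\frac{p_i}{1+\lambda}
 + \frac{r_i+\lambda}{1+\lambda}
 + \frac{q_i}{1+\lambda}}_{\text{$i$-th row sum of } \frac{1}{\lambda+1}(H+\lambda I)}
 = \frac{(p_i+r_i+q_i)+\lambda}{1+\lambda} = 1,
\qquad
\frac{r_i+\lambda}{1+\lambda} \ge 0
 \iff \lambda \ge -r_i .
\]
```

Requiring the second condition for every $i$ simultaneously gives exactly $\lambda \ge -\inf_i r_i$.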

An Asymptotic Upper Bound for Jacobi type Polynomials
In this section we derive asymptotic estimates for the distance to the stationary distribution when our operator given by $P_\lambda = \frac{1}{\lambda+1}(H + \lambda I)$ is a Markov chain. In this case the Karlin-McGregor orthogonal polynomials for $P_\lambda$ are $Q_j\big((1+\lambda)x - \lambda\big)$ and the orthogonality probability measure is $\frac{1}{1+\lambda}\, d\psi\big((1+\lambda)x - \lambda\big)$ over $\left[\frac{\lambda-1}{\lambda+1}, 1\right]$, where the $Q_j$ are the Jacobi type polynomials of Koornwinder introduced in the previous section.
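Of course the new operator $P_\lambda$ is again tridiagonal. For the $n$-th row of $P_\lambda$, let us denote the $(n-1)$-st, $n$-th, and $(n+1)$-st entries by $q_n^\lambda$, $r_n^\lambda$, and $p_n^\lambda$ respectively. Here the entries of $P_\lambda$ can be expressed via the entries of $H$ as follows:
$$p_n^\lambda = \frac{p_n}{1+\lambda}, \qquad r_n^\lambda = \frac{r_n+\lambda}{1+\lambda}, \qquad \text{and} \qquad q_n^\lambda = \frac{q_n}{1+\lambda} .$$
Clearly we still have that $p_n^\lambda + r_n^\lambda + q_n^\lambda = 1$. With the probabilities in hand we now compute the corresponding reversibility function $\pi_n^\lambda$ of $P_\lambda$, which is equal to the corresponding function of $H$ defined as $\pi_n = \frac{p_0 \cdots p_{n-1}}{q_1 \cdots q_n}$. Here $\pi_0^\lambda = 1 = \pi_0$ and
$$\pi_n^\lambda = \frac{p_0^\lambda \cdots p_{n-1}^\lambda}{q_1^\lambda \cdots q_n^\lambda} = \frac{p_0 \cdots p_{n-1}}{q_1 \cdots q_n} = \pi_n .$$
Changing variables in (1.1) yields
$$\|\nu - \mu_t\|_{TV} = \frac{1}{2}\sum_{n=0}^{\infty} \pi_n \left| \int_{[-1,1)} \left(\frac{x+\lambda}{1+\lambda}\right)^{t} Q_j(x)\, Q_n(x)\, d\psi(x) \right| .$$
Lemma 1. Consider the case when $p_n > 0$ and $q_n > 0$ for all $n \ge 0$, and $\infty > \lambda \ge -\inf_i r_i$. Then, for the Jacobi type polynomials $Q_j$ the distance to the stationary distribution satisfies the following bound for a certain constant $C_{\alpha,\beta,\lambda}$.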
Thus $I$ is clearly bounded by the right-hand side of (3.1).
For the second term, we make the change of variables $s = -\log\left(\frac{x+\lambda}{1+\lambda}\right)$, and for simplicity let $x(s) = (1+\lambda)e^{-s} - \lambda$. Then the integral reduces to
Using the fact that $(1 - e^{-s})^{\alpha} = s^{\alpha}\big(1 + O(s)\big)$ and $\big(1 - \lambda + (1+\lambda)e^{-s}\big)^{\beta} = 2^{\beta} + O(s)$, the above integral becomes
where the upper bounds $O(s)$ can be made specific. Next, applying the standard asymptotic methods of Laplace to this yields the following asymptotics
Thus one can obtain a large enough constant $C_{\alpha,\beta,\lambda}$ such that
In order to derive effective bounds on $\|\nu - \mu_t\|_{TV}$ it is necessary to gain a more detailed understanding of $\pi_n$ and $\|Q_n\|_\infty$. When $\min(\alpha, \beta) \ge -\frac{1}{2}$, the $\|Q_n\|_\infty$ can be estimated using the known maximum for the Jacobi polynomials found in Lemma 4.2.1 on page 85 of [2] together with Koornwinder's definition of these polynomials.
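To indicate where the polynomial rate comes from, the Laplace-type step can be illustrated on a model integral. This is a simplified sketch and not the integral from the proof: the polynomial factors $Q_j(x(s))\,Q_n(x(s))$ and the normalizing constants of $\psi$ are replaced by constants, and $\delta > 0$ is an arbitrary fixed cutoff introduced here for illustration.

```latex
\[
\int_0^{\delta} e^{-ts}\, s^{\alpha}\,\bigl(2^{\beta} + O(s)\bigr)\, ds
 = \frac{2^{\beta}\,\Gamma(\alpha+1)}{t^{\alpha+1}}
   \Bigl(1 + O\bigl(t^{-1}\bigr)\Bigr),
 \qquad t \to \infty,\quad \alpha > -1 .
\]
```

Here $e^{-ts}$ is the factor $\left(\frac{x+\lambda}{1+\lambda}\right)^t$ after the substitution above. For $\alpha = \beta = -\frac{1}{2}$ the leading term equals $\sqrt{\pi/(2t)}$, which is consistent with the $t^{-1/2}$ scale appearing in the Chebyshev case.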
One way to derive estimates for $\pi_n$ is to use the expression for $\pi_n$ in terms of $p_n$, $r_n$, and $q_n$. For Koornwinder's class of polynomials these expressions are derived for all $\alpha, \beta, M, N$ in [4]. It can be verified directly that in the case when $M = 0$, we have $p_0 = \frac{2(\alpha+1)}{(1+N)(\alpha+\beta+2)} > 0$. After taking into account the normalization $Q_n(1) = 1$, and correcting for a small typo, it can be verified from equations (41)-(45) in [4] that $p_n$ and $q_n$ are positive for $n \ge 1$. Thus the conditions for Lemma 1 are satisfied for all $\alpha, \beta > -1$. Furthermore, from (18), (19) and (32) in [4] it can be easily seen that $p_n \to \frac{1}{2}$ and $q_n \to \frac{1}{2}$ as $n \to \infty$, and hence $r_n = 1 - p_n - q_n \to 0$ as $n \to \infty$. Thus for $\lambda$ large enough the operator $P_\lambda$ corresponds to a Markov chain.
As the expressions for these quantities are laborious to write down, we instead focus our attention on a specific case in which our calculations are easy to follow. Specifically, we focus on the Chebyshev polynomials.

Chebyshev Polynomials: Upper and Lower Bounds
By applying Koornwinder's results to the Chebyshev polynomials of the first kind, which correspond to the case of $\alpha = \beta = -\frac{1}{2}$, we arrive at a family of orthogonal polynomials with respect to the measure $\frac{1}{1+N}\left(\frac{1}{\pi\sqrt{1-x^2}}\, dx + N\,\delta_1(x)\right)$. Using (2.1) we find that here,
where $T_n$ and $U_n$ denote the Chebyshev polynomials of the first and second kind respectively. Notice that $U_n(1) = n + 1$ and $T_n(1) = 1$, which allows us to immediately verify that $Q_n(1) = 1$.

Once again we consider the tridiagonal operator $H$ associated with the recurrence relation. Specifically, the numbers $p_n$, $r_n$, and $q_n$ satisfy $p_0\, Q_1(x) + r_0\, Q_0(x) = x$ for $n = 0$, and
$$p_n\, Q_{n+1}(x) + r_n\, Q_n(x) + q_n\, Q_{n-1}(x) = x\, Q_n(x) \tag{4.1}$$
for $n \ge 1$. Kiesel and Wimp [4] give expressions for $p_n$, $r_n$ and $q_n$ for $n \ge 0$. To find the expressions directly in this case one could use (4.1) to derive three linearly independent equations, and solve for $p_n$, $r_n$, and $q_n$.
For the case $n = 0$ the equation immediately gives us that $p_0 = \frac{1}{N+1}$ and $r_0 = \frac{N}{N+1}$. Evaluating at convenient choices of $x$, such as $-1, 0, 1$, does not yield linearly independent equations for all $n$. One solution to this is to evaluate at $x = 1$ and $x = -1$, and to differentiate (4.1) and then evaluate at $x = 0$. This gives three linearly independent equations, and a direct calculation then shows that for $n \ge 1$,
$$p_n = \frac{1}{2}\cdot\frac{1 + (2n-1)N}{1 + (2n+1)N}, \qquad q_n = \frac{1}{2}\cdot\frac{1 + (2n+1)N}{1 + (2n-1)N}, \tag{4.2}$$
and
$$r_n = 1 - p_n - q_n = -\frac{2N^2}{\big(1 + (2n-1)N\big)\big(1 + (2n+1)N\big)} .$$
As $r_n \le 0$ for $n \ge 1$, the operator $H$ fails to correspond to a Markov chain. However, this is the case we addressed at the end of Section 2 of the current paper. Thus consider $P_\lambda = \frac{1}{1+\lambda}(H + \lambda I)$. Now $|r_n|$ is a decreasing sequence for $n \ge 1$, so provided that $\lambda \ge |r_1| = \frac{2N^2}{(1+N)(1+3N)}$, we have $p_n^\lambda, r_n^\lambda, q_n^\lambda \ge 0$. Thus we can consider these coefficients $p_n^\lambda$, $r_n^\lambda$, and $q_n^\lambda$ as the transition probabilities in a nearest-neighbor random walk.
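The coefficient formulas above lend themselves to an exact sanity check. The sketch below is an illustration rather than part of the paper: the value `N = 3/2` and all function names are hypothetical, and the closed forms for $\pi_n$ and for the partial sums of $\pi_n$ that it tests were obtained here by telescoping the products in (4.2), not quoted from the source.

```python
from fractions import Fraction as F

def a(n, N):
    """Shorthand a_n = 1 + (2n - 1) N for the factors appearing in (4.2)."""
    return 1 + (2 * n - 1) * N

def coefficients(n, N):
    """Return (q_n, r_n, p_n), the n-th row of the tridiagonal operator H."""
    if n == 0:
        return F(0), N / (N + 1), 1 / (N + 1)
    p = F(1, 2) * a(n, N) / a(n + 1, N)
    q = F(1, 2) * a(n + 1, N) / a(n, N)
    return q, 1 - p - q, p

def lazy_row(n, N, lam):
    """n-th row of P_lambda = (H + lam * I) / (1 + lam)."""
    q, r, p = coefficients(n, N)
    return q / (1 + lam), (r + lam) / (1 + lam), p / (1 + lam)

def pi(n, N):
    """Reversibility function pi_n = p_0 ... p_{n-1} / (q_1 ... q_n)."""
    value = F(1)
    for k in range(n):
        value *= coefficients(k, N)[2] / coefficients(k + 1, N)[0]
    return value

N = F(3, 2)                      # hypothetical point-mass parameter
lam = -coefficients(1, N)[1]     # smallest admissible shift, lam = |r_1|

rows_ok = all(
    sum(lazy_row(n, N, lam)) == 1 and min(lazy_row(n, N, lam)) >= 0
    for n in range(60)
)
# Closed form derived here by telescoping: pi_n = 2(1+N) / (a_n a_{n+1}), n >= 1.
pi_ok = all(pi(n, N) == 2 * (1 + N) / (a(n, N) * a(n + 1, N)) for n in range(1, 30))
print(lam, rows_ok, pi_ok)
```

Under these assumptions the partial sums of $\pi_n$ increase to $\rho = \frac{1+N}{N}$, which matches the weight $\frac{1}{\rho} = \frac{N}{1+N}$ of the point mass at $1$ in the orthogonality measure.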
Proof. For the upper bound we simply need to estimate the sums appearing in Lemma 1. Since $\pi_n = O\left(\frac{1}{(n+1)^2}\right)$, it is easy to see that the second sum $\sum_{n=j+t+1}^{\infty} \pi_n$ is bounded by $C_N/(t + j + 1)$. The main term turns out to be the first sum.
In the case of the Chebyshev type polynomials we have the bound $\|Q_n\|_\infty \le 4Nn + 1$. Thus the first sum in Lemma 1 is bounded by $\hat{C}_{\alpha,\beta,\lambda,N}\, \frac{j \log(t+j+2)}{\sqrt{t}}$ for an appropriate constant $\hat{C}_{\alpha,\beta,\lambda,N}$. And so, for an appropriate $C$ and large $t$,
On the other hand, recalling that $Q_0(x) = \pi_0 = 1$, we have that:
However, we have already shown that for large enough $t$, the above right-hand side is asymptotic to $\frac{\tilde{C}}{\sqrt{1+t}}$.
We finish with some concluding remarks. At first the bound $\|Q_n\|_\infty \le 4Nn + 1$ may appear somewhat imprecise since at $x = 1$ we have $Q_n(1) = 1$. It is tempting to suggest that the correct asymptotic for the total variation norm is $C/\sqrt{t}$. However, on closer examination, in the neighborhood of $x = 1$ we have $Q_n'(x) \approx n^3$. This $n^3$ causes the errors to be at least of the order of the main term. Overall it seems unlikely to the authors that $C/\sqrt{t}$ is the correct asymptotic for the Chebyshev-type polynomials.
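The decay discussed in this section can also be observed numerically. The following sketch is an experiment added for illustration and is not part of the proof: it evolves the law of the chain $P_\lambda$ with $\lambda = |r_1|$ in floating point, using the fact that a nearest-neighbor walk started at $j$ cannot pass $j + t$ in $t$ steps, so no truncation error is introduced. The parameter values `N = 1.0`, `j = 0`, `t_max = 400` and the function names are hypothetical, and the stationary weights use the telescoped form $\nu(n) = \frac{2N}{(1+(2n-1)N)(1+(2n+1)N)}$ for $n \ge 1$ derived from (4.2).

```python
import math

def coefficients(n, N):
    """Return (q_n, r_n, p_n) from (4.2); the row n = 0 is handled separately."""
    if n == 0:
        return 0.0, N / (N + 1), 1 / (N + 1)
    lo, hi = 1 + (2 * n - 1) * N, 1 + (2 * n + 1) * N
    p, q = 0.5 * lo / hi, 0.5 * hi / lo
    return q, 1 - p - q, p

def tv_profile(N, j, t_max):
    """Total variation distances at t = 0..t_max for P_lambda started at state j."""
    lam = -coefficients(1, N)[1]          # lambda = |r_1|
    size = j + t_max + 1                  # states reachable within t_max steps
    rows = []
    for n in range(size):
        q, r, p = coefficients(n, N)
        rows.append((q / (1 + lam), (r + lam) / (1 + lam), p / (1 + lam)))
    # Stationary law nu = pi / rho on the first `size` states, plus its exact tail.
    nu = [N / (1 + N)] + [
        2 * N / ((1 + (2 * n - 1) * N) * (1 + (2 * n + 1) * N))
        for n in range(1, size)
    ]
    nu_tail = 1 / (1 + (2 * size - 1) * N)
    mu = [0.0] * size
    mu[j] = 1.0
    profile = []
    for t in range(t_max + 1):
        profile.append(0.5 * (sum(abs(x - y) for x, y in zip(nu, mu)) + nu_tail))
        if t == t_max:
            break
        new = [0.0] * size
        for n in range(min(size, j + t + 1)):   # current support of mu
            q, r, p = rows[n]
            new[n] += mu[n] * r
            if n > 0:
                new[n - 1] += mu[n] * q
            if n + 1 < size:
                new[n + 1] += mu[n] * p
        mu = new
    return profile

profile = tv_profile(N=1.0, j=0, t_max=400)
for t in (25, 100, 400):
    print(t, profile[t], math.sqrt(t) * profile[t])
```

Printing $\sqrt{t}\,\|\nu - \mu_t\|_{TV}$ for a few values of $t$ gives a rough feel for whether the distance behaves like $C/\sqrt{t}$ or carries an additional slowly varying factor; a finite experiment of this kind cannot settle the question raised above.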

Comparison to other methods
An ergodic Markov chain $P = \big(p(i,j)\big)_{i,j\in\Omega}$ with stationary distribution $\nu$ is said to be geometrically ergodic if and only if there exist $0 < R < 1$ and a function $M : \Omega \to \mathbb{R}_+$ such that for each initial state $i \in \Omega$, the total variation distance decreases exponentially as follows:
$$\|p_t(i,\cdot) - \nu(\cdot)\|_{TV} \le M(i)\, R^t \qquad \text{for all } t \ge 0 .$$
In other words, an ergodic Markov chain is geometrically ergodic when the rate of convergence to the stationary distribution is exponential. See [8] and references therein.
If the state space $\Omega$ is finite, $|\Omega| = d < \infty$, and the Markov chain is irreducible and aperiodic, then $P$ will have eigenvalues that can be ordered as follows:
$$\lambda_1 = 1 > |\lambda_2| \ge |\lambda_3| \ge \dots \ge |\lambda_d| .$$
In which case, the Perron-Frobenius Theorem will imply geometric ergodicity with
where $m_2$ is the algebraic multiplicity of $\lambda_2$. Here the existence of a positive spectral gap, $1 - |\lambda_2| > 0$, implies geometric ergodicity with the exponent $-\log|\lambda_2| \approx 1 - |\lambda_2|$ whenever the spectral gap is small enough.
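The approximation of the exponent by the spectral gap is a first-order Taylor expansion. Writing $g = 1 - |\lambda_2|$ for the gap (a symbol introduced here only for this remark), one has:

```latex
\[
-\log|\lambda_2| = -\log(1-g)
 = g + \frac{g^2}{2} + \frac{g^3}{3} + \cdots
 = g\,\bigl(1 + O(g)\bigr),
 \qquad 0 < g < 1 .
\]
```

So the relative error in replacing $-\log|\lambda_2|$ by $1 - |\lambda_2|$ is of the order of the gap itself.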
When dealing with Markov chains over a general countably infinite state space $\Omega$, the existence of a positive spectral gap of the operator $P$ is essentially equivalent to the chain being geometrically ergodic. For instance, the orthogonal polynomial approach in [7]  over $\Omega = \mathbb{Z}_+$, together with establishing the value of the spectral gap, $1 - r > 0$.
As for the Markov chain $P_\lambda$ considered in Theorem 2 of this paper, its spectral measure $\frac{1}{1+\lambda}\, d\psi\big((1+\lambda)x - \lambda\big)$ over $\left[\frac{\lambda-1}{\lambda+1}, 1\right]$ admits no spectral gap between the point mass at 1 and the rest of the spectrum, implying sub-geometric ergodicity. The sub-exponential rate in total variation norm is then estimated to be of polynomial order between $\frac{1}{\sqrt{t}}$ and $\frac{\log t}{\sqrt{t}}$.
In the field of probability and stochastic processes, there is great interest in finding methods for analyzing Markov chains over general state spaces that have polynomial rates of convergence to the stationary distribution. In Menshikov and Popov [9] a one-dimensional version of Lamperti's problem is considered. There, a class of ergodic Markov chains on a countably infinite state space with sub-exponential convergence to the stationary probabilities is studied via probabilistic techniques. One of their results relates to our main result, Theorem 2. Namely, Theorem 3.1 of [9], when applied to our case, implies for any $\varepsilon > 0$ the existence of positive real constants $C_1$ and $C_2$ such that
Thus for the Markov chain considered in Theorem 2, the orthogonal polynomials approach provides a closed form expression for the difference $\nu - \mu_t$, and a significantly sharper estimate on convergence of $\mu_t$ to the stationary distribution $\nu$, for both the single state distance $|\nu(0) - \mu_t(0)|$ and the much stronger total variation norm $\|\nu - \mu_t\|_{TV}$.

Acknowledgments
We would like to thank Yuan Xu of the University of Oregon for his helpful comments that initiated this work. We would also like to thank Michael Anshelevich of Texas A&M for the feedback he provided during the conference on orthogonal polynomials in probability theory in July of 2010. We would like to thank Andrew R. Wade of the University of Strathclyde for his helpful comments on the preprint of this paper. Finally, we would like to thank the anonymous referee for the many helpful corrections and suggestions.