Subgroups generated by two pseudo-Anosov elements in a mapping class group. II. Uniform bound on exponents

Let $S$ be a compact orientable surface, and $\Mod(S)$ its mapping class group. Then there exists a constant $M(S)$, which depends on $S$, with the following property. Suppose $a,b \in \Mod(S)$ are independent (i.e., $[a^n,b^m]\not=1$ for any $n,m \not=0$) pseudo-Anosov elements. Then for any $n,m \ge M$, the subgroup $\langle a^n, b^m \rangle$ is free of rank two, and convex-cocompact in the sense of Farb-Mosher. In particular all non-trivial elements in $\langle a^n, b^m \rangle$ are pseudo-Anosov. We also show that there exists a constant $N$, which depends on $a,b$, such that $\langle a^n, b^m \rangle$ is free of rank two and convex-cocompact if $|n|+|m| \ge N$ and $nm \not=0$.


Introduction
This is the second half of our study on subgroups generated by two pseudo-Anosov elements in a mapping class group. We will improve the results we obtained in the first half of the study [4], but one can read this paper independently. We explain the improvement after we state the main results in this section.

Hyperbolic isometry and (quasi-)axis
A geodesic space is called δ-hyperbolic for δ ≥ 0 if for any geodesics α, β, γ which form a triangle, α is contained in the δ-neighborhood of β ∪ γ ([5]). Let Γ be a δ-hyperbolic graph. Let a be an isometry of Γ. If there exist a point x ∈ Γ and a constant C > 0 such that $d(x, a^n(x)) \ge Cn$ for any n > 0, then a is called hyperbolic.
Suppose a is a hyperbolic isometry. If there exists a geodesic α such that a(α) is contained in the C-neighborhood of α for some C ≥ 0, α is called a quasi-axis of a. By δ-hyperbolicity of Γ, it then follows that a(α) is in the 2δ-neighborhood of α. If α and β are quasi-axes of a (they are geodesics by definition), then they are contained in the 2δ-neighborhood of each other. If C = 0 we say α is an axis. We remark that an axis and even a quasi-axis may not exist for a, but there is always a quasi-geodesic which is invariant under a. That will be good enough for our argument (see Section 2.6). In the literature, quasi-axis sometimes means a quasi-geodesic which is invariant under a. Our definition is different.
For two points x, y ∈ Γ, we denote a geodesic joining them by [x, y], although it is not unique. We write the distance between the two points as |x − y|.
For an isometry a, we define its translation length, tr(a), by $\mathrm{tr}(a) = \lim_{n\to\infty} |x - a^n(x)|/n \ge 0$ for a point x. It is easy to see that tr(a) does not depend on the choice of x. The isometry a is hyperbolic iff tr(a) > 0. It is known (§7, 8 of [5]) that a is hyperbolic if there is a point p ∈ Γ such that the following is satisfied. In particular, the element a has infinite order. We remark that if δ > 0 we can replace (δ + 1) by δ.
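The definition of the translation length gives a scaling rule for powers, which is what makes passing from a to a power of a useful in the estimates of Section 2. The following short computation is a sketch of that standard fact; the index k is a name introduced here only for illustration.

```latex
% Scaling of the translation length under powers (k >= 1 is an
% illustrative index, not notation from the paper).
% The limit defining tr(a) exists, so it may be computed along the
% subsequence kn.
\[
\operatorname{tr}(a^{k})
  = \lim_{n\to\infty} \frac{|x - a^{kn}(x)|}{n}
  = k \lim_{n\to\infty} \frac{|x - a^{kn}(x)|}{kn}
  = k\,\operatorname{tr}(a).
\]
% Since |x - a^{-n}(x)| = |a^{n}(x) - x|, also tr(a^{-1}) = tr(a),
% hence tr(a^{k}) = |k| tr(a) for every integer k.
```

In particular, a lower bound on tr(a) turns into a lower bound on tr(a^k) that grows linearly in k.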
In section 2 we use a similar geometric idea (Proposition 1) to give a sufficient condition for two isometries to generate a free group.

Convex co-compact subgroup in Mod(S)
Suppose that a finitely generated group G is acting on Γ by isometries. Fix a finite generating set and let |a| be the word length of a ∈ G. Let x ∈ Γ be a point and consider the map from G to Γ defined by sending a ∈ G to a(x) ∈ Γ. We call this map the embedding by an orbit of the action of G. If there exist constants L, C > 0 such that for all a ∈ G, $|a|/L - C \le d(x, a(x)) \le L|a| + C$, then we say the map is quasi-isometric.
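The upper bound in this definition always holds, so only the lower bound carries information. A minimal sketch of why, assuming a finite generating set called S here (the letter S and the factorization of a are notation introduced for this illustration):

```latex
% Write a = s_1 s_2 \cdots s_k with s_i \in S \cup S^{-1} and k = |a|.
% Each s_1 \cdots s_{i-1} acts as an isometry, and
% d(x, s^{-1}(x)) = d(s(x), x), so the triangle inequality gives
\begin{align*}
d(x, a(x))
  &\le \sum_{i=1}^{k} d\bigl(s_1\cdots s_{i-1}(x),\, s_1\cdots s_{i}(x)\bigr) \\
  &=   \sum_{i=1}^{k} d\bigl(x, s_i(x)\bigr) \\
  &\le |a| \cdot \max_{s \in S} d\bigl(x, s(x)\bigr).
\end{align*}
```

So one may take any L at least the maximum over S of d(x, s(x)) for the right-hand inequality, with no additive constant needed.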
Our main application is regarding subgroups of mapping class groups. Let S be a compact orientable surface, and Mod(S) its mapping class group. Let C(S) be the curve graph of S, on which Mod(S) acts by isometries (see for example [10], [16] for the definition).
Masur-Minsky [16] showed that C(S) is δ-hyperbolic and an element a ∈ Mod(S) is pseudo-Anosov if and only if it acts as a hyperbolic isometry on C(S), and moreover that there always exists a quasi-axis.
For a subgroup G < Mod(S), Farb-Mosher [3] introduced the notion of convex-cocompactness in terms of the action on Teichmüller space. It has been shown ([6], [12]) that G is convex-cocompact iff for a point c ∈ C(S), the map from G to C(S) sending g to g(c) is quasi-isometric. Note that the choice of the generating set and the point c is not important.

Main results
In Section 2 we discuss subgroups generated by powers of two hyperbolic isometries on a hyperbolic graph, and obtain sufficient conditions for them to be free. That section is the main technical part of the paper. We put an overview of the argument in Section 2.1. The final result is the following. The point is that although the constant M depends on Γ and the action of G on Γ, it does not depend on a and b. It will become clear how the constant M depends on the action.
Theorem 14. Suppose G acts acylindrically on a δ-hyperbolic graph Γ. Then there exists a constant M with the following property.
Suppose a, b ∈ G act hyperbolically. Assume that for any p, q ≠ 0, $[a^p, b^q] \ne 1$ in G. Then for any n, m ≥ M, $\langle a^n, b^m \rangle$ is free of rank two. Moreover, the embedding of $\langle a^n, b^m \rangle$ by an orbit in Γ is quasi-isometric. In particular, all non-trivial elements in $\langle a^n, b^m \rangle$ are hyperbolic on Γ.
The acylindricity of an action (see Section 2.3 for the definition) is a weak assumption on properness. In particular, the result applies to a word-hyperbolic group and its action on a Cayley graph. Therefore, if G is a word-hyperbolic group, then there exists M such that for any two elements a, b ∈ G of infinite order, either the subgroup $\langle a, b \rangle$ is elementary or else for any n, m ≥ M, $\langle a^n, b^m \rangle$ is free and quasi-convex in G. It seems this claim is new (see [5, 8.2 E] for the statement without a bound on n, m). The result also immediately applies to mapping class groups. That is our motivation, and we show the following in Section 3.
Theorem 16. Let S be a compact orientable surface, and Mod(S) its mapping class group. Then there exists a constant M(S) with the following property. Suppose a, b ∈ Mod(S) are pseudo-Anosov elements such that $[a^n, b^m] \ne 1$ for any n, m ≠ 0. (a, b are called independent.) Then for any n, m ≥ M, the subgroup $\langle a^n, b^m \rangle$ is free of rank two, and convex-cocompact in the sense of Farb-Mosher. In particular all non-trivial elements in $\langle a^n, b^m \rangle$ are pseudo-Anosov.
It was known ([10], [17]) that $\langle a^n, b^m \rangle$ is free for sufficiently large n, m. A uniform bound on n, m is new. In our previous study [4], a uniform bound on one of n, m was shown. Namely, there exists a constant N(S) such that $\langle a^n, b^m \rangle$ is free and convex-cocompact if one of n, m is at least N and the other one is sufficiently large. The previous study has also been used to show the uniform exponential growth of a mapping class group by Mangahas [15].
The theorem concerns only pseudo-Anosov elements a, b. It is unknown if a uniform bound such as M(S) exists for two elements a, b of infinite order in general such that the subgroup $\langle a^n, b^m \rangle$ is free if n, m ≥ M. Note that the subgroup is never convex-cocompact unless both a and b are pseudo-Anosov. In the case that both a, b are Dehn twists ([8]), and more generally, positive multi-twists ([7, Theorem 3.2]), it is known that $\langle a^n, b^m \rangle$ is free or abelian if n, m ≥ 2. Recently, Leininger and Margalit [14] have shown that if S is the n-times punctured sphere, then for any two elements a, b ∈ Mod(S), $\langle a^N, b^N \rangle$ is either free or abelian for N = n!.
It would be interesting to know for which (n, m) the subgroup $\langle a^n, b^m \rangle$ is free for given a, b in the above theorem. The following theorem says that for given a, b, the subgroup $\langle a^n, b^m \rangle$ is free except for finitely many pairs (n, m). It is not clear if the number of those exceptional pairs (n, m) is bounded, but we know that the constant N depends on a, b in the following theorem (see Example 18).
Theorem 17. Let S be a compact orientable surface and a, b two independent pseudo-Anosov elements. Then there exists N such that for any n ≥ N, both $\langle a, b^n \rangle$ and $\langle b, a^n \rangle$ are free of rank two, and convex-cocompact. In particular, $\langle a^n, b^m \rangle$ is free of rank two, and convex-cocompact if |n| + |m| ≥ 2N and nm ≠ 0.
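The last sentence of Theorem 17 follows from the first by a short reduction. The following is one possible way to see it, written as a sketch; the symbol $F_2$ for the free group of rank two and the appeal to standard facts about free groups are additions made here, not taken from the text.

```latex
% Sketch: from the first statement of Theorem 17 to the "in particular".
% Assume |n| + |m| >= 2N and nm \neq 0. Then max(|n|, |m|) >= N;
% by symmetry say |m| >= N. Note <a, b^{m}> = <a, b^{|m|}>.
\[
\langle a^{n}, b^{m} \rangle \;\le\; \langle a, b^{|m|} \rangle \;\cong\; F_2 .
\]
% (1) A subgroup of a free group is free. It is generated by two
%     elements and is non-abelian, since [a^n, b^m] \neq 1 by
%     independence; hence its rank is exactly two.
% (2) A finitely generated subgroup of a finitely generated free group
%     is quasi-isometrically embedded in it. Composing with the
%     quasi-isometric orbit map of <a, b^{|m|}> into C(S) shows that
%     the orbit map of <a^n, b^m> is quasi-isometric as well, which is
%     the characterization of convex-cocompactness recalled above.
```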
The author would like to thank Z. Sela for many insightful suggestions, and L. Mosher for his interest and comments. He is grateful to M. Bestvina and T. Delzant.
In this section, we will find sufficient conditions for certain powers of two independent hyperbolic isometries a, b to generate a free group. The final results are Theorem 9 and Theorem 14.
It is well-known that for sufficiently large n, m > 0, $a^n, b^m$ generate a free group (Proposition 2). The argument is an application of a geometric fact on a δ-hyperbolic space (Proposition 1). The goal of this section is to give an upper bound on n, m which does not depend on a and b.
In Section 2.3, by analyzing the argument for Proposition 2 carefully, we first show that there is an upper bound on n and m if the translation lengths of a and b are comparable (Proposition 6). A more difficult case is when one of the translation lengths, say that of b, is much smaller than the translation length of a. In this case, if we use the same argument, we need to take the exponent m for b very large so that $a^n, b^m$ generate a free group. In Section 2.4, we use a different idea to deal with this case and show that there is an upper bound on n and m if the translation length of one of a and b is much smaller than the translation length of the other (Proposition 7). This part is tedious (the idea is elementary, but we include many details), but we think it is a main technical achievement of the paper.
As we explained, the two propositions are complementary to each other, and combining them, we obtain in Section 2.5 an upper bound on both exponents which does not depend on a and b (Theorem 9).
We first prove those results under the assumption that a and b have quasi-axes. Then in Section 2.6 we explain that the assumption is indeed redundant, which gives Theorem 14. For our application in this paper, we only need Theorem 9, but we prove Theorem 14 for potential application in the future.
Another technical issue is the properness of an action. We argue under the assumption of acylindricity, which is weaker than the action being properly discontinuous (see Section 2.3). We will need that when we discuss applications in Section 3.

Nielsen condition
In this section, we review a well-known fact (Proposition 2) and its proof. We start with a fundamental result from [5] (see §7 and 8).
Proposition 1 (7.2C [5], Three points condition). Let Γ be a δ-hyperbolic graph. Let ε > 100δ be a constant. Let $p_i \in \Gamma$ ($i \ge 1$) be points such that for all $i \ge 1$ Then, for each $i \ge 3$,
We call the inequality (2) the three points condition (for $p_i, p_{i+1}, p_{i+2}$ and a constant ε). If the three points condition is satisfied for any three consecutive points in a sequence, then we say the sequence satisfies the three points condition.
Proposition 1 has been used to derive a condition for sufficiently large powers of hyperbolic isometries a, b of Γ with quasi-axes α, β to generate a free group in terms of α, β (Proposition 2). In that argument, it will be important how much of β is contained in the 10δ-neighborhood of α and vice versa. We thus define the 10δ-overlap of α and β, denoted by $\alpha \cap_{10\delta} \beta$, as follows.
Let $|\alpha \cap_{10\delta} \beta|$ denote the diameter of this set. $|\alpha \cap_{10\delta} \beta|$ can be ∞. If it is finite, by the δ-hyperbolicity, the longest segment of α, the longest segment of β and the longest geodesic which are contained in $\alpha \cap_{10\delta} \beta$ all have length between $|\alpha \cap_{10\delta} \beta| - 20\delta$ and $|\alpha \cap_{10\delta} \beta| + 20\delta$, and those segments are in the 20δ-neighborhood of each other.
The following fact is well-known ( [5], see also [13]). Notice that the exponents n and m which satisfy the inequalities depend on a and b. For readers who have not seen a proof, we give details of the argument, since we will generalize the statement using the same idea.
Proof. We use Proposition 1 with ε = 100(δ + 1). We first show that $\langle a^n, b^m \rangle$ is free, then prove the "moreover" part independently. It turns out that for a certain choice of x, the embedding is not only quasi-isometric, but also bi-Lipschitz, which implies that $\langle a^n, b^m \rangle$ is free. In that sense, the first part is not necessary, but we hope it will make the whole argument more transparent in this way.
Set $A = a^n$, $B = b^m$.
We remark that α, β are quasi-axes of A, B, respectively. Let w be a non-empty reduced word on A, B and we prove that the action of w on Γ is non-trivial, therefore, w is a non-trivial element in Isom(Γ). It suffices to find a point p ∈ Γ with $w(p) \ne p$. (In this proof the point p does not depend on the word w.)
Suppose $|\alpha \cap_{10\delta} \beta| = 0$. Let ℓ be a geodesic segment which realizes the distance between α and β, and p ∈ ℓ the midpoint. (See Figure 1.) We claim $w(p) \ne p$. To see it, let $w = A^{n_1} B^{m_1} \cdots A^{n_i} B^{m_i}$, where $n_1, m_i$ are possibly 0. We discuss the case such that both $n_1, m_i$ are not 0. Set $p_0 = p$, $p_1 = A^{n_1}(p)$, $p_2 = A^{n_1} B^{m_1}(p)$, $p_3 = A^{n_1} B^{m_1} A^{n_2}(p)$, $\cdots$, $p_{2i} = w(p)$. Then the sequence of points $p_j$ satisfies the inequalities (2) in Proposition 1. Indeed, for example, for $p_1, p_2, p_3$, if we apply the element $B^{-m_1} A^{-n_1}$, then we get $B^{-m_1}(p)$, $p$, $A^{n_2}(p)$. Since $m_1, n_2 \ne 0$, those three points satisfy the three points condition by the δ-hyperbolicity of Γ. Now, by Proposition 1, we get $|p_0 - p_{2i}| > 0$, namely $|p - w(p)| > 0$. We can argue similarly if $n_1$ or $m_i$ is 0, and omit the details.
Suppose $|\alpha \cap_{10\delta} \beta| = D > 0$. Let ℓ be the longest geodesic which is contained in $\alpha \cap_{10\delta} \beta$. Then $D - 20\delta \le |\ell| \le D + 20\delta$. Let p be the midpoint of ℓ. Define the points $p_j$ in the same way as before; then, by using the assumption on $\mathrm{tr}(a^n)$ and $\mathrm{tr}(b^m)$, we get $|p - w(p)| > 0$ by Proposition 1.
Now we argue that the embedding by the orbit of a point x is quasi-isometric with respect to the word metric for $\langle a^n, b^m \rangle$. We recall that the choice of the point x is not important. Since $\langle a^n, b^m \rangle$ is finitely generated, the embedding is always Lipschitz with constant $\max(|a^n(x) - x|, |b^m(x) - x|)$. We will show that there exist a point x and constants L > 0, C ≥ 0 such that for any non-trivial reduced word w on $a^n, b^m$, where |w| is the word metric with respect to $a^n, b^m$. Indeed we will The argument is a modification of the previous one, so that we use Proposition 1. For each j set They are geodesic segments. Note that Thus the right hand side of the inequality (3) is proportional to the word length of w when we vary w. The left hand side of the inequality is $\lambda |p_0 - p_{2i}| = \lambda |p - w(p)|$, so that if there is an upper bound on λ > 0 when we vary the word w, then we would be done. But since $\lambda = \max_j |p_j - p_{j+1}|$, λ can be arbitrarily large when we vary w.
As a remedy, we will divide each of the geodesic segments $A_j, B_j$, namely, introduce certain points on them such that there is an upper bound on the distance between any two consecutive points, then apply Proposition 1 to this new sequence of points. The point is that introducing new points does not change the property that the right hand side of the inequality (3) is proportional to |w|, while the constant λ for the new sequence will have an upper bound.
We start the argument. Again, we discuss the case that both $n_1, m_i$ are not 0. First, between $p_0$ and $p_1$, if $n_1 > 0$, define points by and if $n_1 < 0$, define the points $p_{0,k}$ for $0 \le k \le -n_1$ similarly using actions by $A^{-1}, A^{-2}, \cdots, A^{n_1}$ on p. Next, between $p_1$ and $p_2$, if $m_1 > 0$, define points by and if $m_1 < 0$, define the points similarly as follows.
We define points similarly between $p_{2j}$ and $p_{2j+1}$, and also between $p_{2j}$ and $p_{2j-1}$ for all j. We obtain a sequence of points $p_{j,k}$ with the canonical order (the lexicographical order on (j, k)). By definition, the distance between any two consecutive points is either tr(A) or tr(B). Also, by our assumption, the sequence of points satisfies the three points condition of Proposition 1 with $\lambda = \max(\mathrm{tr}(A), \mathrm{tr}(B))$, which no longer depends on w. Here, we regard, for example, $p_{0,n_1}$ and $p_{1,0}$ as the same point, which is $p_1$. By the proposition (for the first inequality), we get The same bound holds if $n_1$ or $m_i$ is 0 as well. We have shown that the embedding is bi-Lipschitz, for this particular choice of a base point and a generating set.
We are interested in finding an upper bound on n, m > 0 such that $\langle a^n, b^m \rangle$ is not free under some condition on the action of G on Γ, provided that $D = |\alpha \cap_{10\delta} \beta| < \infty$. Before we discuss that, we analyze the case when D = ∞, namely, the hyperbolic isometries a and b have a common quasi-axis. For example, suppose G is a word-hyperbolic group and Γ is a Cayley graph, which is δ-hyperbolic. Then D = ∞ implies that $[a, b^k] = 1$ for some k > 0 and $[b, a^l] = 1$ for some l > 0, where the commutator of two elements is defined by To see the first claim, take a point x ∈ α, the common quasi-axis, and look at the set of points They are all in the 20δ-neighborhood of x. Since the action of G is proper, there are only finitely many elements g ∈ G with |x − g(x)| ≤ 20δ, therefore there must be distinct integers n, m > 0 such that
Notice that in the previous argument, we do not need D to be infinite; it is enough that D is sufficiently large. To formulate a precise statement (Lemma 5), we consider a certain condition, acylindricity, on the action in the next section.

Acylindrical action
In this section, we assume certain properness of an action, acylindricity, and improve Proposition 2 to Proposition 6.
Let Γ be a δ-hyperbolic graph, and G a group acting on Γ by isometries. Bowditch [1] defined that the action is acylindrical if for any R > 0, there exist K(R), L(R) ≥ 1 such that for any vertices x, y ∈ Γ with d(x, y) ≥ L(R), the following set has at most K(R) elements: $\{ g \in G \mid d(x, g(x)) \le R,\ d(y, g(y)) \le R \}$.
Lemma 3. Suppose G acts on a δ-hyperbolic graph Γ. If the action is acylindrical with constants K(R), L(R), then there exists an integer P ≥ 1 such that for any element a ∈ G which acts hyperbolically on Γ with a quasi-axis, we have $\mathrm{tr}(a^P) \ge 1$. The constant P depends only on δ and K(200δ).
Convention 4 (Subscript of a constant). To keep track of constants, we may number a constant by the number of the claim in which the constant first appears; for example, the constant P in Lemma 3 will be $P_3$. We may omit the subscript if there is no confusion.
Proof. If δ = 0, Γ is a tree. Then tr(a) ≥ 1. Set P = 1. Suppose δ > 0. Set R = 100δ. Let α be a (geodesic) quasi-axis of a. Take a point By the acylindricity, this implies that there is .
Choose an integer P such that P ≥ K(200δ) 90δ .
Lemma 5. Let Γ be a δ-hyperbolic graph, and G a group acting on Γ acylindrically with constants K(R), L(R). Suppose a, b ∈ G act hyperbolically with quasi-axes α, β ⊂ Γ, respectively. If $a^n b \ne b a^n$ for all n ≠ 0 or $b^n a \ne a b^n$ for all n ≠ 0, then By Convention 4, $P_3$ is the constant from Lemma 3.
Proof. To argue by contradiction, suppose that the inequality is false. Set K = K(20δ), L = L(20δ). For concreteness, suppose tr(b) ≤ tr(a). By our assumption, since $|\alpha \cap_{10\delta} \beta|$ is much larger than 2δ, the set $\alpha \cap_{10\delta} \beta$ looks like a narrow tube. Let ℓ ⊂ α be the longest segment which is contained in $\alpha \cap_{10\delta} \beta$. Then, by our assumption, $|\ell| \ge 4PKL\,\mathrm{tr}(a) + 80\delta$. Take a point p ∈ ℓ such that the following points are in ℓ (see Figure 2). Since $y = a^{PKL}(x)$, and by Lemma 3,
Figure 2: Apply the acylindricity to the pair x, y.
We first consider the special case that δ = 0, namely, Γ is a tree. Then, $\alpha \cap_{10\delta} \beta$ coincides with the segment α ∩ β, and also with the segment ℓ; therefore, all the above points $a^n(p)$, $1 \le n \le 4PKL$, are in α ∩ β. We , but this is obvious since when we apply If δ > 0, we can show that the point moves in the 10δ-neighborhood of ℓ when we apply $a^i$, $b$, $a^{-i}$ followed by $b^{-1}$ to x. Therefore, we get $d(x, [b, a^i](x)) \le 20\delta$ by estimating the error terms from the tree case using the triangle inequality. We leave the details to readers. (See Figure 3.) We can show $d(y, [b, a^i](y)) \le 20\delta$ in the same way. We got the claim.
Since |x − y| ≥ L = L(20δ), by the acylindricity of the action, it follows from the claim that there are at most K distinct elements in the We get $[b, a^n] = 1$ for some n ≠ 0. The same argument applies to the elements $[a, b^i]$ since tr(b) ≤ tr(a), therefore we also get $[a, b^n] = 1$ for some n ≠ 0. This is a contradiction.
Combining Proposition 2 and Lemma 5, we obtain the following. This says that we can find a global bound on one of the exponents, but the other bound depends on the ratio tr(a)/tr(b).
Proposition 6. Let G be a group which acts on a δ-hyperbolic graph Γ acylindrically with constants K(R), L(R). Then, there exists a constant $N = N_6 \ge 1$ with the following property. N depends only on δ, K(20δ), L(20δ) and K(200δ).
Suppose a, b ∈ G act hyperbolically with quasi-axes. Assume $[a^n, b^m] \ne 1$ for all n, m ≠ 0. Suppose there exists a number q ≥ 1 such that Then, for any n ≥ N and m ≥ qN, $\langle a^n, b^m \rangle$ is free. Moreover, the embedding of $\langle a^n, b^m \rangle$ in Γ is quasi-isometric.

Another condition for freeness
In this section we discuss the case when tr(b)/tr(a) is small, as opposed to Proposition 6. We also apply Proposition 1 to a certain sequence of points, but we need a slightly different idea to construct the sequence from a given word w. The argument is elementary but lengthy, and takes up most of this section.
If n ≥ N, then $\langle g, f^n \rangle$ is free of rank two. Moreover, the embedding of $\langle g, f^n \rangle$ in Γ by an orbit is quasi-isometric.
A few remarks are in order before the lengthy proof. Unlike Proposition 6, the roles of the two elements f and g are not symmetric. For example, in the conclusion we take powers only of f. Also, when we construct a sequence of points $p_i$ in the argument, we detect the action of $a = f^n$ more closely than the action of b = g, since a moves a base point much more than b does. That is summarized as the condition (*) in the claim. As usual, in Part 1, we first show the subgroup $\langle g, f^n \rangle$ is free, by constructing a certain sequence of points $p_i$ which satisfies the three points condition because of the condition (*). Then in Part 2, we show that the embedding of the subgroup by an orbit is quasi-isometric by interpolating the points in the sequence $p_i$ as before. Part 2 is the most complicated part of the paper.
Proof. First of all, D < ∞ by our assumption. As in the proof for Proposition 2, the argument is slightly different if D = 0 than the case D > 0, and from the view point of Proposition 1, it is easier than the case D > 0. So, we discuss the case D > 0 in detail, and then discuss the case D = 0 briefly.
Assume D > 0. Part 1. As usual, we first prove that $\langle g, f^n \rangle$ is free. Set K = K(20δ), L = L(20δ). This is the only place where the constants K(R), L(R) are used. Set There exists a constant N, which depends only on K, L, P, δ, such that if n ≥ N, then $\mathrm{tr}(f^n) \ge 1000E$. We used tr(f) ≥ 1/P and 2tr(f) ≥ D. This is the only place where the condition 2tr(f) ≥ D is used. Fix such a constant N. Note that N ≥ 10000 by the definition of E.
For n ≥ N, set $a = f^n$, b = g. We will show that $\langle a, b \rangle$ is free. Let w be a non-empty reduced word on a, b, and we show w is non-trivial in Isom(Γ). There are three cases according to the form of w. In the case (O) it is clear that w ≠ 1 in Isom(Γ).
(I) $w = a^{n_1} b^{m_1} \cdots a^{n_i} b^{m_i}$ ($i \ge 1$) such that $n_1 \ne 0$, $n_i \ne 0$ and $m_i$ is possibly 0,
(II) $w = b^{m_0} a^{n_1} b^{m_1} \cdots a^{n_i} b^{m_i}$ ($i \ge 1$) such that $m_0 \ne 0$, $n_i \ne 0$ and $m_i$ is possibly 0.
For (I) and (II), we can find a point p ∈ Γ such that |p − w(p)| > 0, therefore w is not 1 in Isom(Γ). The arguments are very similar to each other, so we only discuss the case (I) in detail. Assume we are in the case (I). Let m be the midpoint of a longest segment ℓ which is contained in $\alpha \cap_{10\delta} \beta$. We can take a point x ∈ α and a point y ∈ β such that |x − y| ≤ 2δ and |m − x|, |m − y| ≤ 4δ. It would be easier to follow the discussion if we imagine that α and β coincide in ℓ and that m = x = y, although there are actually errors of the order of δ.
To show |x − w(x)| > 0, we interpolate x and w(x) by the following points, which gives a sequence satisfying the three points condition.
$s_0 = y$,
$p_1 = x$, $q_1 = a^{n_1}(x)$, $r_1 = a^{n_1}(y)$, $s_1 = a^{n_1} b^{m_1}(y)$,
$p_2 = a^{n_1} b^{m_1}(x)$, $q_2 = a^{n_1} b^{m_1} a^{n_2}(x)$, $r_2 = a^{n_1} b^{m_1} a^{n_2}(y)$, $s_2 = a^{n_1} b^{m_1} a^{n_2} b^{m_2}(y)$,
$\cdots$
$p_i = a^{n_1} b^{m_1} \cdots a^{n_{i-1}} b^{m_{i-1}}(x)$, $q_i = a^{n_1} b^{m_1} \cdots a^{n_{i-1}} b^{m_{i-1}} a^{n_i}(x)$, $r_i = a^{n_1} b^{m_1} \cdots a^{n_{i-1}} b^{m_{i-1}} a^{n_i}(y)$, $s_i = a^{n_1} b^{m_1} \cdots a^{n_{i-1}} b^{m_{i-1}} a^{n_i} b^{m_i}(y)$.
Claim. The following conditions are satisfied for all 1 ≤ j.
We verify those later, and proceed to show that the set of the conditions (1)-(5) implies that the sequence $\{p_j\}$ satisfies the three points condition for 990E, therefore we get $|p_1 - p_{i+1}| > 0$, namely, |w(x) − x| > 0.
For j ≥ 1, we define The conditions (3), (4), (5) imply that $d(C_j, B_j) \le 2E$ for all j. It follows using (3) and (4) . This is because, for each j, the geodesic quadrilateral with the corners $p_j, q_j, p_{j+1}, q_{j+1}$ is 2δ-thin (see Figures 4, 5). Therefore, for all 1 ≤ j ≤ i, $|D_j \cap_{10\delta} D_{j+1}| \le 5E$. It implies that for all 1 ≤ j ≤ i − 1, We checked the three points condition for the constant 990E and the sequence $\{p_j\}$, therefore by Proposition 1, since E > 100δ, we get that
We are left to verify the condition (*). (1) follows from |x − y| ≤ 2δ. Since $n_j \ne 0$ and $|A_j| = \mathrm{tr}(a^{n_j})$, we have $|A_j| \ge \mathrm{tr}(a) \ge 1000E$. We get (2).
To show (5), suppose not, i.e., $|A_j \cap_{10\delta} A_{j+1}| > E$ for some j ≥ 1. Set $w_{j-1} = a^{n_1} b^{m_1} \cdots a^{n_{j-1}} b^{m_{j-1}}$, $w_j = a^{n_1} b^{m_1} \cdots a^{n_j} b^{m_j}$. (If j = 1, then set $w_0 = 1$.) Then $A_j$ is contained in $w_{j-1}(\alpha)$, which is a quasi-axis of $w_{j-1} f w_{j-1}^{-1}$, and also $A_{j+1}$ is contained in $w_j(\alpha)$, which is a quasi-axis of $w_j f w_j^{-1}$. Let $\ell = [u, v] \subset A_{j+1}$ be a geodesic of length E which is contained in $A_j \cap_{10\delta} A_{j+1}$. For concreteness, suppose the point u is mapped toward v by (sufficiently big) positive powers of $w_{j-1} f w_{j-1}^{-1}$, and the point v is mapped toward u by (sufficiently big) positive powers of $w_j f w_j^{-1}$. (Otherwise we take the inverse of the elements in the following.) The direction of the actions makes sense since the set $A_j \cap_{10\delta} A_{j+1}$ looks like a long narrow tube, which contains ℓ. Note that $\mathrm{tr}(w_j f w_j^{-1}) = \mathrm{tr}(w_{j-1} f w_{j-1}^{-1}) = \mathrm{tr}(f)$ and $10\,\mathrm{tr}(f) PKL \le |\ell|$. (See Figure 6.) We apply the following elements to the point v.
Each of the elements moves v by at most 10δ. Set $v' = w_j f^{PKL} w_j^{-1}(v)$. Then, each of the above elements also moves $v'$ by at most 10δ. On the other hand $|v - v'| = PKL\,\mathrm{tr}(f) \ge PKL/P \ge L$. By the acylindricity, there must be $1 \le I < J \le PKL$ such that Since $w_{j-1}^{-1} w_j = a^{n_j} b^{m_j}$, this implies that $[f^{I-J}, a^{n_j} b^{m_j}] = 1$. Since $a = f^n$, it follows that $[f^{I-J}, b^{m_j}] = 1$. This is a contradiction (since b = g). We showed (5). The case (I) is completed.
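The step from the commutator with $a^{n_j} b^{m_j}$ to the commutator with $b^{m_j}$ uses only that a is a power of f. A short worked version follows, as a sketch; the abbreviation $k = I - J$ is introduced here for readability, and the argument is phrased in terms of commuting elements so that it does not depend on the sign convention for the commutator.

```latex
% Put k = I - J (so k \neq 0). The relation [f^{k}, a^{n_j} b^{m_j}] = 1
% says that f^{k} commutes with a^{n_j} b^{m_j}:
\[
f^{k}\, a^{n_j} b^{m_j} \;=\; a^{n_j} b^{m_j}\, f^{k}.
\]
% Since a = f^{n}, the elements f^{k} and a^{n_j} commute, so the
% left-hand side equals a^{n_j} f^{k} b^{m_j}. Cancelling a^{n_j}:
\[
a^{n_j} f^{k} b^{m_j} = a^{n_j} b^{m_j} f^{k}
\quad\Longrightarrow\quad
f^{k} b^{m_j} = b^{m_j} f^{k},
\qquad\text{i.e.}\qquad [f^{k}, b^{m_j}] = 1 .
\]
% With b = g, k \neq 0 and m_j \neq 0 (w is reduced and j < i), this is a
% non-trivial commuting relation between powers of f and g.
```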
For the case (II), we show |y − w(y)| > 0. The argument is similar, and we omit the details. So far, we have shown that $\langle g, f^n \rangle$ is free.
Part 2. Now, we show that the embedding of $\langle g, f^n \rangle$ in Γ by an orbit is quasi-isometric. Our situation is the same as in the proof of Proposition 2. Since the upper bound is trivial as before, we prove a lower bound. Note that it is enough to get a desired uniform lower bound for the cases (O), (I) and (II) separately. The case (O) is trivial, and the argument is similar for (I) and (II). We only discuss the case (I) in detail.
As in the proof of Proposition 2, the reason why the above argument does not give a desired uniform lower bound in terms of |w| is that $|p_{j+1} - p_j|$, the length of $D_j$, is unbounded when we vary w. As before, we introduce interpolating points such that there is a uniform (for all w) upper bound on the distance between two consecutive points, and the three points condition is satisfied for ε = E > 100δ. Between $p_j$ and $q_j$ for each j, we define points using the action of $f^n = a$ as follows: if $n_j > 0$, define (see Figure 7)
$p_j = p_{j,0} = a^{n_1} b^{m_1} \cdots a^{n_{j-1}} b^{m_{j-1}}(x)$,
$p_{j,1} = a^{n_1} b^{m_1} \cdots a^{n_{j-1}} b^{m_{j-1}} a(x)$,
$\cdots$
$p_{j,n_j} = a^{n_1} b^{m_1} \cdots a^{n_{j-1}} b^{m_{j-1}} a^{n_j}(x) = q_j$.
Note that for each j, the distance between any two consecutive points is tr(a). If $n_j < 0$, then define points similarly by $p_{j,k} = a^{n_1} b^{m_1} \cdots a^{n_{j-1}} b^{m_{j-1}} a^{-k}(x)$ for $0 \le k \le -n_j$.
Figure 7: New points between $p_j$, $q_j$ ($n_j > 0$).
Next, we also define points between $r_j$ and $s_j$. In order to choose them with control on the distance between two consecutive new points, we fix an integer Q ≥ 1 such that Such Q, which depends on n, exists since $\mathrm{tr}(b) \le \mathrm{tr}(f) \le \mathrm{tr}(a)/100$. This is the only place where the condition tr(g) ≤ tr(f) is used. (Remember b = g.) Note that we use this Q for all j. To define points, we write (uniquely) for each j, For each j, define points as follows between $r_j$ and $s_j$: if $m_j \ge 0$ and $o_j \ge 2$ then (see Figure 8)
$r_j = r_{j,0} = a^{n_1} b^{m_1} \cdots a^{n_{j-1}} b^{m_{j-1}} a^{n_j}(y)$,
$r_{j,1} = a^{n_1} b^{m_1} \cdots a^{n_{j-1}} b^{m_{j-1}} a^{n_j} b^{Q}(y)$,
$\cdots$
$r_{j,o_j-1} = a^{n_1} b^{m_1} \cdots a^{n_{j-1}} b^{m_{j-1}} a^{n_j} b^{(o_j-1)Q}(y)$,
$r_{j,o_j} = a^{n_1} b^{m_1} \cdots a^{n_{j-1}} b^{m_{j-1}} a^{n_j} b^{m_j}(y) = s_j$.
Here, if $o_j \ne 0$, then the distance between any two consecutive points is $\mathrm{tr}(b^Q)$ except for the last pair of points, and it is between $\mathrm{tr}(b^Q)$ and $2\,\mathrm{tr}(b^Q)$ for the last pair. If $o_j = 1$, we do not produce any new points. In this case, $\mathrm{tr}(b^Q) \le |r_j - s_j| \le 2\,\mathrm{tr}(b^Q)$. Also, if $o_j = 0$, then we do not produce any new points. In that case, $|r_j - s_j| < \mathrm{tr}(b^Q)$.
If $m_j < 0$, we define points similarly, by $r_{j,k} = a^{n_1} b^{m_1} \cdots a^{n_{j-1}} b^{m_{j-1}} a^{n_j} b^{-kQ}(y)$. Otherwise we do not produce new points.
In this way, we obtain a new set of points $\{p_{j,k}, r_{j,k}\}$ with the order which it naturally inherits from the order on the sequence $p_1, q_1, r_1, s_1, \cdots, p_i, q_i, r_i, s_i$.
Figure 8: New points between $r_j$, $s_j$ ($m_j > 0$).
The distance between any two consecutive points in this new sequence is at most tr(a). But as a trade-off, the new sequence may not satisfy the three points condition any more. Therefore we modify the sequence by removing points. First, remove all $r_j$. ($r_j$ is at most 2δ-close to $q_j$.) Next, remove all $s_j$ except for the last point $s_i$. ($s_j$ is at most 2δ-close to $p_{j+1}$.) Finally, if $o_j = 0$, then remove $q_j$. We get a subsequence of points with order, which we denote by $\{u_k\}$.
See Figures 9, 10, 11. In those figures, two consecutive points $u_k, u_{k+1}$ are joined by a solid line, where an interval drawn as a thin solid line appears after we remove points, while an interval drawn as a thick solid line exists before we remove points. The intervals drawn as dashed lines do not exist because we removed points. Removed points are described by white dots in the figures. Now, we argue that the sequence $\{u_k\}$ satisfies the three points condition of Proposition 1, and that there is a uniform upper bound on the distance between any two consecutive points in the sequence. In the following argument, it would help to keep in mind the following: $\delta \ll E \ll \mathrm{tr}(b^Q) \ll \mathrm{tr}(a)$.
First, we remark that we removed points such that the distance between any two consecutive points in $\{u_k\}$ is in one of the following intervals; in particular there is a uniform upper bound: (i) between $\frac{4}{5}\mathrm{tr}(a)$ and $\frac{6}{5}\mathrm{tr}(a)$; (ii) between $\frac{\mathrm{tr}(a)}{100}$ and $\frac{\mathrm{tr}(a)}{10}$. Moreover, (iii) the sequence of intervals (between two consecutive points) starts with $|n_1|$ intervals of length (i), followed by $|o_1|$ intervals of length (ii), followed by $|n_2|$ intervals of length (i), followed by $|o_2|$ intervals of length (ii), $\cdots$, $|n_i|$ intervals of length (i), then $|o_i|$ intervals of length (ii) at the end. In particular, each interval has length $\ge \frac{\mathrm{tr}(a)}{100} \ge 10E \ge 1000$. This is because removing $r_j$ and $s_j$ changes the distance between two consecutive points by at most 4δ. And removing $q_j$ happens only when $o_j = 0$, therefore $|r_j - s_j| \le \mathrm{tr}(b^Q) \le \frac{\mathrm{tr}(a)}{50}$. In conclusion, the distance between any two consecutive points in the subsequence is at most $\mathrm{tr}(a) + 4\delta + \frac{\mathrm{tr}(a)}{50} \le \frac{6}{5}\mathrm{tr}(a)$.
Next, we argue that the sequence $\{u_k\}$ satisfies the three points condition of Proposition 1 for the constant E. This is by δ-hyperbolicity, and essentially for the same reason that the sequence $D_j$ satisfies the condition in the previous discussion (see Figures 4, 5). If $o_j \ge 1$ (Figures 10, 11), the three points (in other words, two consecutive intervals) condition is nearly obvious since an interval drawn as a thin line is in the 10δ-neighborhood of the corresponding thick dashed line interval. The less obvious case is when $o_j = 0$. See Figure 9. We explain why the three points condition is satisfied for $u_{k-1} = p_{j,|n_j|-1}$, $u_k = p_{j+1}$, $u_{k+1} = p_{j+1,1}$. It suf- 1000 and $|r_j - s_j| \le \frac{\mathrm{tr}(a)}{50}$. The first inequality is by one of the conditions we obtained, which is $|A_j \cap_{10\delta} A_{j+1}| \le E$ for all j. Therefore, since $|r_j - q_j| \le 2\delta$, the diameter of the union of those two sets is smaller than $\frac{39}{100}\mathrm{tr}(a)$, which shows our claim.
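The upper bound in (i) is a matter of arithmetic once the sizes of the constants are compared. The following check is a sketch that uses only two inequalities stated in the proof, namely E > 100δ and tr(a) ≥ 1000E (recall $a = f^n$ with $\mathrm{tr}(f^n) \ge 1000E$).

```latex
% Reduce the claimed bound to an inequality on delta:
\[
\operatorname{tr}(a) + 4\delta + \frac{\operatorname{tr}(a)}{50}
  \le \frac{6}{5}\operatorname{tr}(a)
\iff
4\delta \le \Bigl(\frac{6}{5} - 1 - \frac{1}{50}\Bigr)\operatorname{tr}(a)
        = \frac{9}{50}\operatorname{tr}(a).
\]
% Now use E > 100\delta and tr(a) \ge 1000E:
\[
4\delta \;<\; \frac{4E}{100} \;=\; \frac{E}{25}
        \;\le\; \frac{\operatorname{tr}(a)}{25000}
        \;\le\; \frac{9}{50}\operatorname{tr}(a).
\]
```

The margin is large, which reflects the ordering of scales that the proof asks the reader to keep in mind: δ is far smaller than E, which is far smaller than tr(a).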
We have verified the three points condition for $E$ for the sequence $\{u_k\}$, which starts at $x$ and ends at $w(x)$. By Proposition 1 (for the first inequality below), we get

The second inequality follows from remark (iii) above. We have the third inequality because $|n_j| \ge 1$ for all $j$, and the fourth one because $Q(|o_j| + 1) \ge |m_j|$ for all $j$. Moreover, since $|u_k - u_{k+1}|$ is at most $2\mathrm{tr}(a)$, we have $\lambda \le (\frac{E}{100} - \delta)^{-1}\, 2\mathrm{tr}(a) = \lambda_0$ for all $w$. Set $L = \frac{Q\lambda_0}{500}$; then we get $L|w(x) - x| \ge |w|$ for all $w$. $L$ depends on $a, b$ but not on $w$, since the same holds for $Q$. This completes case (I). Note that we showed that the embedding is bi-Lipschitz for the point $x$ and the word metric $|w|$ for the collection of words $w$ of case (I).
Case (II) is similar, and we omit the details. This finishes the argument under the assumption that $D > 0$.
Finally, assume that $D = 0$. The argument is essentially the same as in the case $D > 0$, so we discuss only the part that differs (see the proof of Proposition 2, where we discussed the case $D = 0$ and then the case $D > 0$ as well). Take the constants $K, L, P, E, N$ as before. Set $a, b$ as before. Now let $\ell$ be a geodesic which realizes the distance between $\alpha$ and $\beta$. Let $m, x, y$ all be the midpoint of $\ell$. Then argue as before. We define segments $A_j, B_j$ which satisfy the conditions (1)-(5). (In this case, $q_j = r_j$ and $p_j = s_{j-1}$.) This case is easier: since $\alpha \cap_{10\delta} \beta = \emptyset$, we have $A_j \cap_{10\delta} B_j = \emptyset$ and $B_j \cap_{10\delta} A_{j+1} = \emptyset$, so the union of those geodesics, whose metric we need to analyze, looks nearly like a tree with those segments as edges. We omit the details.
Before we state the main theorem, we state a proposition which can be shown similarly to Proposition 7. The conclusion is weaker since $N_8$ depends on $f, g$, but we do not require conditions 1 and 2 of Proposition 7 regarding $\mathrm{tr}(f), \mathrm{tr}(g)$.
Then there exists a constant $N_8 = N > 0$, which depends on $f, g$, such that if $n \ge N$, then $\langle g, f^n \rangle$ is free of rank two. Moreover, the embedding of $\langle g, f^n \rangle$ in $\Gamma$ by an orbit is quasi-isometric.
Proof. The argument is very similar to the proof of Proposition 7, and easier, since the difficult part there was to obtain a constant $N_7$ that is uniform over all $f, g$. We only indicate where the argument needs to be modified.
Choose constants $K, L, P$ in the same way. Let $D = |\alpha \cap_{10\delta} \beta|$. By Lemma 5, $D < \infty$. Define the constant $E$ in the same way. Now fix a constant $N > 0$ such that if $n \ge N$, then $\mathrm{tr}(f^n) \ge 1000E$ and $\mathrm{tr}(g) \le \mathrm{tr}(f^n)/100$.
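One way to see that such an $N$ exists is sketched below. It assumes that the translation length is homogeneous, $\mathrm{tr}(f^n) = n\,\mathrm{tr}(f)$, with $\mathrm{tr}(f) > 0$ (as holds for the stable translation length of a hyperbolic isometry). If $\mathrm{tr}$ is defined differently, the formula for $N$ is only indicative.

```latex
\begin{align*}
\mathrm{tr}(f^n) \ge 1000E
  &\iff n \ge \frac{1000E}{\mathrm{tr}(f)}, \\
\mathrm{tr}(g) \le \frac{\mathrm{tr}(f^n)}{100}
  &\iff n \ge \frac{100\,\mathrm{tr}(g)}{\mathrm{tr}(f)}, \\
\text{so one may take}\quad
N &= \max\Bigl\{ \Bigl\lceil \frac{1000E}{\mathrm{tr}(f)} \Bigr\rceil,\
                 \Bigl\lceil \frac{100\,\mathrm{tr}(g)}{\mathrm{tr}(f)} \Bigr\rceil,\ 1 \Bigr\}.
\end{align*}
% Assumption (not from the source): tr(f^n) = n tr(f) and tr(f) > 0.
```

Both thresholds involve $\mathrm{tr}(f)$, $\mathrm{tr}(g)$ and $E$, which is consistent with $N$ depending on $f, g$.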
The constant $N$ depends on $f, g$, as opposed to Proposition 7, where conditions 1 and 2 allowed us to choose $N_7$ uniformly in $f, g$ so that those two inequalities hold for all $n \ge N_7$. Note that condition 2 (respectively, condition 1) in Proposition 7 was used only to choose $N_7$ uniformly in $f, g$ so that the first (respectively, the second) inequality above holds for all $n \ge N_7$. Let $n \ge N$. Then we have the two inequalities above. Set $a = f^n$, $b = g$ as before. Then, using the first inequality, we can show that $\langle a, b \rangle$ is free exactly as for Proposition 7. What is essential in the argument is that $\mathrm{tr}(a)$ is much larger than $D$, the $10\delta$-overlap of the quasi-axes of $a, b$. (Remember that $E \ge D$.) We remark that condition 1 is irrelevant up to this point, since it does not matter even if $\mathrm{tr}(g) = \mathrm{tr}(b)$ is much larger than $\mathrm{tr}(f)$ or $\mathrm{tr}(a) = \mathrm{tr}(f^n)$.
To show that the embedding is quasi-isometric for Proposition 7, we chose a constant $Q$ such that $\mathrm{tr}(a)/100 \le \mathrm{tr}(b^Q) \le \mathrm{tr}(a)/50$. This was possible since $\mathrm{tr}(b) \le \mathrm{tr}(a)/100$. (For this we used condition 1.) This is exactly the second inequality above, and we have chosen $N$ so that this inequality holds if $n \ge N$. With this $Q$ we apply the same argument for the rest. We omit the details.
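The existence of $Q$ can be made explicit as follows. This is a sketch under the assumption, not stated in this passage, that $\mathrm{tr}(b^Q) = Q\,\mathrm{tr}(b)$ and $\mathrm{tr}(b) > 0$; the particular choice of $Q$ below is ours.

```latex
% Assumption: tr(b^Q) = Q tr(b), tr(b) > 0, and tr(b) <= tr(a)/100.
\begin{align*}
Q &:= \Bigl\lceil \frac{\mathrm{tr}(a)}{100\,\mathrm{tr}(b)} \Bigr\rceil \ \ge 1, \\
\mathrm{tr}(b^Q) = Q\,\mathrm{tr}(b) &\ge \frac{\mathrm{tr}(a)}{100}, \\
\mathrm{tr}(b^Q) = Q\,\mathrm{tr}(b)
  &< \Bigl(\frac{\mathrm{tr}(a)}{100\,\mathrm{tr}(b)} + 1\Bigr)\mathrm{tr}(b)
   = \frac{\mathrm{tr}(a)}{100} + \mathrm{tr}(b)
   \le \frac{\mathrm{tr}(a)}{100} + \frac{\mathrm{tr}(a)}{100}
   = \frac{\mathrm{tr}(a)}{50}.
\end{align*}
```

The last step is the only place where $\mathrm{tr}(b) \le \mathrm{tr}(a)/100$ is used, which matches the role of the second inequality in the text.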

Upper bound on both exponents
The following is the main theorem of Section 2.
Theorem 9. Suppose $G$ acts acylindrically with constants $K(R), L(R)$ on a $\delta$-hyperbolic graph $\Gamma$. Then there exists a constant $M_9$, which depends only on $\delta$ and $K(20\delta), L(20\delta), L(200\delta)$, with the following property.
Suppose $a, b \in G$ act hyperbolically with quasi-axes $\alpha, \beta$. Assume $[a^p, b^q] \neq 1$ in $G$ for any $p, q \neq 0$. Then for any $n, m \ge M$, $\langle a^n, b^m \rangle$ is free of rank two. Moreover, the embedding of $\langle a^n, b^m \rangle$ by an orbit in $\Gamma$ is quasi-isometric. In particular, all non-trivial elements in $\langle a^n, b^m \rangle$ are hyperbolic on $\Gamma$.
$M$ depends only on $\delta$ and $K(20\delta), L(20\delta), L(200\delta)$ and does not depend on $a, b$. It suffices to show the following.

Claim. $\langle a^n, b^m \rangle$ is free and the embedding into $\Gamma$ is quasi-isometric if $n, m \ge M$.
Step 1. We may assume $\mathrm{tr}(b)/\mathrm{tr}(a) \le N_6/M$. This is because otherwise we can show the Claim as follows. Assume $\mathrm{tr}(b)/\mathrm{tr}(a) > N_6/M$, and apply Proposition 6 to $a, b$. Then, if $n \ge N_6$ and $m \ge qN_6 = M$, then $\langle a^n, b^m \rangle$ is free and the embedding is quasi-isometric. Since $M > N_6$, we get the Claim.
Note that it follows that $10KLP\,\mathrm{tr}(b) \le \mathrm{tr}(a)$, since $N_6/M < \frac{1}{10KLP}$.
Step 3. We have $D \le 2\mathrm{tr}(a)$ (assuming the inequalities in Steps 1 and 2). To argue by contradiction, assume $D > 2\mathrm{tr}(a)$. Then we get a contradiction using the same idea as for Lemma 5 concerning the action of the commutators $[b^i, a]$. Since $D \ge 1000\delta$ by Step 2, the set $\alpha \cap_{10\delta} \beta$ looks like a narrow tube. Therefore, it makes sense to talk about the direction of the action of $a$ and of $b$ along this tube, and furthermore, the direction of $a$ coincides with the direction of one of $b$ or $b^{-1}$. In the following, we assume that the actions of $a, b$ have the same direction along $\alpha \cap_{10\delta} \beta$; otherwise, we consider $b^{-1}$ instead of $b$.
Let $\ell = [p, p'] \subset \alpha$ be the longest segment contained in $\alpha \cap N_{2\delta}(\beta)$ such that $a$ moves $p$ toward $p'$, i.e., $a(p) \in [p, p']$. We know $|\ell| \ge D - 100\delta$. Since $D \ge 1000\delta$ by Step 2, it follows that $|\ell| \ge \frac{9}{10} D$. We claim that for all $1 \le i \le PKL$, we have $d(p, [b^i, a](p)) \le 20\delta$. If $\delta = 0$ then this is obvious. Indeed, first of all, $\alpha$ and $\beta$ coincide on $\ell$ in this case. Also, if we apply $a$, $b^i$, $a^{-1}$, then $b^{-i}$ in this order to $p$, the point moves within $\ell$. This is because since $10KLP\,\mathrm{tr}(b) \le \mathrm{tr}(a)$ by

But if $D \le 2\mathrm{tr}(a)$, we can apply Proposition 7 to $a, b$. Therefore, if $n \ge N_7$, then $\langle b, a^n \rangle$ is free and the embedding is quasi-isometric. Since $M > N_7$, we have shown the Claim. Note that $\langle b^m, a^n \rangle$ is a subgroup of $\langle b, a^n \rangle$.
Remark 10. It is more difficult to deal with the normal subgroup generated by $a^n, b^m$, or even just by $a^n$ (see Question 11 [11]). See the work of Delzant [2].
Remark 11. Theorem 9 concerns two elements, but one can ask if there exists a constant $M$ such that if $a, b, c \in G$ are hyperbolic elements satisfying a certain condition (for example, pairwise independence), then $\langle a^\ell, b^m, c^n \rangle$ is free for any $\ell, m, n \ge M$. We remark that the rank of the free subgroup may not be three. Take two hyperbolic elements $a, b \in G$ which satisfy the commutator assumption in Theorem 9 (i.e., are independent). For any $M > 0$, set $c = a^M b a^{-M}$. Then the pairs $a, c$ and $b, c$ are also independent, but $\langle a^M, b^M, c^M \rangle$ is equal to $\langle a^M, b^M \rangle$.
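The equality of subgroups in Remark 11 follows from a one-line computation, written out here for convenience. It uses only the definition $c = a^M b a^{-M}$ and the fact that conjugation is a homomorphism.

```latex
\begin{align*}
c^M = \bigl(a^M b\, a^{-M}\bigr)^M = a^M b^M a^{-M} \in \langle a^M, b^M \rangle,
\qquad\text{hence}\qquad
\langle a^M, b^M, c^M \rangle = \langle a^M, b^M \rangle .
\end{align*}
```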

Hyperbolic isometries without quasi-axes
So far, we have been discussing hyperbolic isometries with quasi-axes. The existence of quasi-axes, which are geodesics by our definition, is a restriction, but it is not really necessary for our arguments: we can use certain quasi-geodesics instead and modify the original arguments accordingly. We discuss this issue in this section. Readers may skip this section, since we do not use it for our main application to mapping class groups in Section 3.
The following fact is elementary (see [4] for details).
Fact 12 (quasi-geodesic axis). If $a$ is a hyperbolic isometry of a $\delta$-hyperbolic graph $\Gamma$, there exists a $(K, \varepsilon)$-quasi-geodesic $\alpha$ for some $K, \varepsilon$ such that
1. $a^n(\alpha)$ and $\alpha$ are in the $30\delta$-neighborhood of each other for any $n$. (Namely, $\alpha$ is almost invariant by $a$.)
2. Let $p, q \in \alpha$. Then the subpath of $\alpha$ between $p, q$ and a geodesic $[p, q]$ are in the $10\delta$-neighborhood of each other.
We call such a path $\alpha$ a quasi-geodesic axis of $a$ in this paper. To be precise, we should use the term quasi-geodesic quasi-axis, but we shorten it.
One can easily show from (1) and (2) that any two quasi-geodesic axes of $a$ are in the $30\delta$-neighborhood of each other. Note that (2) concerns only the path, not the element $a$. Also, the quasi-geodesic constants of $\alpha$ are not important for our purpose. What is useful for us is (2).
For example, Lemma 3 gives a uniform positive lower bound on $\mathrm{tr}(a)$ for every hyperbolic isometry $a \in G$ with a quasi-axis if the action of $G$ is acylindrical, but in fact the assumption on the existence of quasi-axes is redundant. The proof of Lemma 3 easily generalizes by using quasi-geodesic axes instead of quasi-axes (see [4] for the precise argument), and we obtain the following.
Lemma 13. Suppose $G$ acts on a $\delta$-hyperbolic graph $\Gamma$. If the action is acylindrical with constants $K(R)$ and $L(R)$, then there exists an integer $P \ge 1$ such that for any element $a \in G$ which acts hyperbolically on $\Gamma$, we have $\mathrm{tr}(a^P) \ge 1$. The constant $P$ depends only on $\delta$ and $K(200\delta)$.
Of course, this constant $P$ may be larger than $P_3$. The existence of such a $P$, not its actual value, is what is essential for our argument.
We restate Theorem 9 in the following form for a potential application. The only difference is that we do not assume that there are quasi-axes for a and b.
Suppose $a, b \in G$ act hyperbolically. Assume $[a^p, b^q] \neq 1$ in $G$ for any $p, q \neq 0$. Then for any $n, m \ge M'$, $\langle a^n, b^m \rangle$ is free of rank two. Moreover, the embedding of $\langle a^n, b^m \rangle$ by an orbit in $\Gamma$ is quasi-isometric. In particular, all non-trivial elements in $\langle a^n, b^m \rangle$ are hyperbolic on $\Gamma$.
Proof. The proof is very similar to the one for Theorem 9. Basically, we use quasi-geodesic axes instead of quasi-axes for hyperbolic isometries. The proof of Theorem 9 relies on Proposition 2, Lemma 3, Lemma 5, Proposition 6 and Proposition 7. We have already generalized Lemma 3 to Lemma 13. We modify the statement and the proof of each of the other ones, which we only outline here.
Then, the proof is the same after an appropriate modification regarding constants.
As for Proposition 6 and Proposition 7, replace the quasi-axes $\alpha, \beta$ by quasi-geodesic axes. Use $\alpha \cap_{1000\delta} \beta$ instead of $\alpha \cap_{10\delta} \beta$ in the statement. Then the original proof works with minor modifications. Having done all of this, we modify the proof of Theorem 9 to fit our setting. Of course, we always use the constant $P_{13}$ instead of $P_3$. The constant $M'_{14}$ may be larger than $M_9$. We omit the details.

Remark 15. Proposition 8 also holds if we drop the assumption on the existence of quasi-axes $\alpha, \beta$. The argument is also very similar to the original one.

Application to mapping class group
We discuss mapping class groups in this section. We apply the results from Section 2 to pseudo-Anosov elements. Theorems 16 and 17 are the main results of the paper.

Uniform estimate
We apply Theorem 9 to the mapping class group, $\mathrm{Mod}(S)$, of a compact orientable surface $S$. Let $C(S)$ be the curve graph of $S$ (see for example [10], [16] for the definition). Masur-Minsky [16] showed that $C(S)$ is $\delta$-hyperbolic and that an element $a \in \mathrm{Mod}(S)$ is pseudo-Anosov if and only if it acts as a hyperbolic isometry on $C(S)$; moreover ([1]), there always exists a quasi-axis. Bowditch [1] showed that the action is acylindrical.
For a subgroup $G < \mathrm{Mod}(S)$, Farb-Mosher [3] introduced the notion of convex-cocompactness. It has been shown ([6], [12]) that $G$ is convex-cocompact if and only if, for a point $c \in C(S)$, the map from $G$ to $C(S)$ sending $g$ to $g(c)$, namely the embedding by an orbit, is quasi-isometric.
The following is an immediate consequence of Theorem 9: apply it to the action of $\mathrm{Mod}(S)$ on $C(S)$. Two pseudo-Anosov elements $a, b$ are called independent if $[a^n, b^m] \neq 1$ for any $n, m \neq 0$ (cf. [10]).
Theorem 16. Let $S$ be a compact orientable surface, and $\mathrm{Mod}(S)$ its mapping class group. Then there exists a constant $M(S)$ with the following property. Suppose $a, b \in \mathrm{Mod}(S)$ are pseudo-Anosov elements such that $[a^n, b^m] \neq 1$ for any $n, m \neq 0$. Then for any $n, m \ge M$, $\langle a^n, b^m \rangle$ is free of rank two, and convex-cocompact. In particular, all non-trivial elements in $\langle a^n, b^m \rangle$ are pseudo-Anosov.

Non-uniform estimate and example
Let $a, b \in \mathrm{Mod}(S)$ be two independent pseudo-Anosov elements. It would be interesting to know for which $(n, m)$ the subgroup $\langle a^n, b^m \rangle$ is free of rank two and convex-cocompact. The following theorem says that this is the case except for finitely many $(n, m)$. We do not know if the number of exceptional pairs is bounded.
Theorem 17. Let $S$ be a compact orientable surface and $a, b$ two independent pseudo-Anosov elements. Then there exists $N_{17} = N$, which depends on $a, b$, such that for any $n \ge N$, both $\langle a, b^n \rangle$ and $\langle b, a^n \rangle$ are free of rank two, and convex-cocompact. In particular, $\langle a^n, b^m \rangle$ is free of rank two, and convex-cocompact if $|n| + |m| \ge 2N$ and $nm \neq 0$.
Proof. Apply Proposition 8 to $a, b$ for the action on $C(S)$. The constant $N_8$ will do.
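The "in particular" part of Theorem 17 is not spelled out in the proof. One possible way to deduce it from the first statement is sketched below. The two facts about free groups used here (a non-abelian two-generator subgroup of a free group is free of rank two, and a finitely generated subgroup of a finitely generated free group is quasi-isometrically embedded) are standard, but they are our addition and are not quoted from the text.

```latex
% Sketch; the case |n| >= N is symmetric with the roles of a and b exchanged.
\begin{align*}
|n| + |m| \ge 2N
  &\implies \max\{|n|, |m|\} \ge N, \quad\text{say } |m| \ge N, \\
\langle a^n, b^m \rangle = \langle a^n, b^{|m|} \rangle
  &\le \langle a, b^{|m|} \rangle
  \quad\text{(free of rank two and convex-cocompact by the first statement)}, \\
[a^n, b^m] \neq 1 \ (nm \neq 0)
  &\implies \langle a^n, b^m \rangle \text{ is non-abelian, hence free of rank two}.
\end{align*}
% The orbit map of <a^n, b^m> is the composition of the inclusion into
% <a, b^{|m|}> (a quasi-isometric embedding) with the orbit map of
% <a, b^{|m|}>, so it is again a quasi-isometric embedding.
```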
The constant $N_{17}$ must depend on $a, b$, as the following example shows.
Example 18. Let $S$ be a compact orientable surface which is not a sphere with less than four punctures or a torus. If $n > 0$ is sufficiently large, then there exist two independent pseudo-Anosov elements $f, g \in \mathrm{Mod}(S)$ such that $\langle g, f^n \rangle$ is not free.
To see this, take $f, a \in \mathrm{Mod}(S)$ such that $f$ is pseudo-Anosov, $a$ is a non-trivial torsion element, and $\langle f, a \rangle$ is not virtually cyclic. To find such $a, f$, first take a non-trivial torsion element $a \in \mathrm{Mod}(S)$ such that there is a non-trivial, non-peripheral simple closed curve $\sigma$ on $S$ which is not homotopic to $a(\sigma)$. One can find such $a$ easily. Then one can find a desired $f$. For example, take any pseudo-Anosov element $h$ on $S$. Let $d$ be a Dehn twist along $\sigma$. Set $f = d^m h d^{-m}$, where $m > 0$ is a sufficiently large integer to be chosen later. It is clear that $f$ is pseudo-Anosov, and the two laminations which are invariant by $f$, which we regard as a set of two points, $\mathrm{fix}(f)$, in the boundary of the Teichmüller space of $S$, must be moved by $a$ (i.e. $\mathrm{fix}(f) \cap a(\mathrm{fix}(f)) = \emptyset$) if $m$ is sufficiently large. For such $m$, it follows by a standard argument that $\langle f, a \rangle$ is not virtually cyclic (cf. [9]).

Now, for sufficiently large $n$, $f^n a$ is pseudo-Anosov, and independent from $f$. One can show this using the curve graph $C(S)$ of $S$, which is $\delta$-hyperbolic. If necessary, replace $f$ by some power of it in advance, so that we may assume that $f$ leaves a geodesic $\gamma$ in $C(S)$ invariant. By our assumption, $\gamma \cap_{10\delta} a(\gamma)$ is bounded. For each $n > 0$, one can find a line which is invariant by $f^n a$ using a piece of $\gamma$, a fundamental domain for the action of $f^n$, and the action of $a$. Then, for sufficiently large $n$, using the $\delta$-hyperbolic geometry of $C(S)$, one can show that the line is indeed a quasi-geodesic, and therefore $f^n a$ is pseudo-Anosov. Moreover, for sufficiently large $n$, the quasi-geodesic has two points at infinity of $C(S)$ which are disjoint from the two points for $\gamma$.
This implies that $f^n a$ and $f$ are independent. Set $g = f^n a$. Then $\langle g, f^n \rangle$ is not free since it contains the torsion element $a$.
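The last step can be written out explicitly. In the sketch below, $k$ denotes the order of the torsion element $a$; this symbol is introduced here for illustration and does not appear in the text.

```latex
\begin{align*}
g = f^n a
  &\implies a = (f^n)^{-1} g \in \langle g, f^n \rangle, \\
a \neq 1,\quad a^k = 1 \ \text{for some } k \ge 2
  &\implies \langle g, f^n \rangle \text{ contains a non-trivial element of finite order}.
\end{align*}
% A free group is torsion-free, so <g, f^n> cannot be free.
```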