Schröder's problems and scaling limits of random trees

In a classic paper, Schröder posed four combinatorial problems about the number of certain types of bracketings of words and sets. Here we address what these bracketings look like on average. For each of the four problems we prove that a uniform pick from the appropriate set of bracketings, when considered as a tree, has the Brownian continuum random tree as its scaling limit as the size of the word or set goes to infinity.


Introduction
In his now classic paper [23], Schröder posed four combinatorial problems about bracketings of words and sets: how many binary bracketings are there of a word of length n? how many bracketings are there of a word of length n? how many binary bracketings are there of a set of size n? and how many bracketings are there of a set of size n? These questions are well studied, and [24] gives a good account of the solutions. In this paper we are concerned with a probabilistic variation on these questions: for each of the above questions, if you select a bracketing uniformly at random, what does it look like? To answer these questions, we will use the well known correspondence between the bracketings described above and various types of trees. We will then apply Aldous's theory of continuum trees, originally developed in the series [1,2,3] and subsequently studied by many authors, to study the scaling limits of these trees. Let us briefly describe the correspondence between bracketings and trees.
The first problem: The correspondence is best illustrated by example. For n = 4 the binary word bracketings are (xx)(xx), x(x(xx)), ((xx)x)x, x((xx)x), (x(xx))x.
A binary bracketing of a word with n letters corresponds to a rooted ordered binary tree with n leaves in a natural way. This is most easily described if we put brackets around the entire word and each letter, which are left out of our example because they are visually cumbersome. The tree corresponding to a bracketing is constructed recursively. A single bracketed letter is a leaf. For a word with more than one letter, the bracketing of the whole word is the root. Attached as subtrees to the root are, in order of appearance, the trees corresponding to the maximal proper bracketed subwords. For n = 4, this is illustrated by Figure 1.
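As a quick computational aside (not part of the original development; the function names are ours), one can enumerate the binary bracketings recursively and confirm that a word of length n has Catalan(n − 1) of them, recovering the five bracketings listed above for n = 4:

```python
from math import comb

def count_binary_bracketings(n):
    # A word of length n has Catalan(n - 1) = C(2n-2, n-1) / n binary bracketings.
    return comb(2 * (n - 1), n - 1) // n

def binary_bracketings(word):
    # Recursively enumerate binary bracketings: split the word at every
    # position and bracket the two halves independently.
    if len(word) == 1:
        return [word]
    out = []
    for i in range(1, len(word)):
        for left in binary_bracketings(word[:i]):
            for right in binary_bracketings(word[i:]):
                out.append(f"({left}{right})")
    return out

assert len(binary_bracketings("xxxx")) == count_binary_bracketings(4) == 5
```

Note that, in line with the convention described above, this representation also brackets the entire word, so the five outputs for n = 4 carry an outer pair of parentheses.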
It is worth noting that these trees are in bijection with rooted ordered trees with n vertices, but this correspondence is not as natural as the one above.
The second problem: General word bracketings are defined similarly to binary word bracketings and correspond to rooted ordered trees with n leaves and no vertices with out degree equal to one. We remark that these trees were recently studied in [5] due to their connection with non-crossing plane configurations.
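The counting sequence for this problem is the little Schröder numbers 1, 1, 3, 11, 45, . . . (indexed by word length). A short dynamic program, written for illustration (the helper names are ours), counts ordered trees with n leaves and no out-degree-one vertices by summing over the root's degree k ≥ 2 and the leaf counts of its subtrees:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def trees(n):
    # Ordered trees with n leaves and no vertex of out-degree one.
    if n == 1:
        return 1
    return sum(forests(n, k) for k in range(2, n + 1))

@lru_cache(maxsize=None)
def forests(n, k):
    # Ordered forests of k such trees with n leaves in total.
    if k == 1:
        return trees(n)
    return sum(trees(i) * forests(n - i, k - 1) for i in range(1, n - k + 2))

assert [trees(n) for n in range(1, 6)] == [1, 1, 3, 11, 45]
```

For n = 4 this gives 11 general bracketings of a four-letter word, compared with the 5 binary ones.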
The third problem: The trees associated to binary set bracketings are constructed similarly to those associated to binary word bracketings. They are rooted, unordered, leaf-labeled binary trees. Figure 2 shows a sample of the correspondence for n = 4 (for n = 4 there are 15 bracketings, so showing the whole correspondence is unwieldy).
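For this problem the count admits the closed form (2n − 3)!! = 1 · 3 · · · (2n − 3), which the following few lines (illustrative; our naming) confirm, recovering in particular the 15 bracketings mentioned above for n = 4:

```python
def count_binary_set_bracketings(n):
    # Rooted unordered leaf-labeled binary trees with n leaves: (2n-3)!!
    count = 1
    for k in range(1, 2 * n - 2, 2):  # product of the odd numbers 1, 3, ..., 2n-3
        count *= k
    return count

assert count_binary_set_bracketings(4) == 15
```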
The fourth problem: General set bracketings are defined similarly to binary set bracketings and correspond to rooted unordered leaf-labeled trees with n leaves and no vertices with out degree equal to one. In the literature, these trees are also called fragmentation trees [10] and hierarchies [8]. The correspondence for n = 3 is in Figure 3.
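The number of general set bracketings can be computed by recursing over the partition at the root: choose an ordered sequence of k ≥ 2 nonempty blocks (then divide by k!, since the blocks are unordered) and bracket each block recursively. The sketch below (our code, not from the paper) reproduces the four set bracketings of a 3-set shown in Figure 3:

```python
from math import comb, factorial
from functools import lru_cache

@lru_cache(maxsize=None)
def hierarchies(n):
    # Set bracketings of an n-set (Schröder's fourth problem): 1, 1, 4, 26, 236, ...
    if n == 1:
        return 1
    return sum(ordered_blocks(n, k) // factorial(k) for k in range(2, n + 1))

@lru_cache(maxsize=None)
def ordered_blocks(n, k):
    # Ordered sequences of k nonempty blocks partitioning an n-set,
    # each block carrying its own bracketing.
    if k == 1:
        return hierarchies(n)
    return sum(
        comb(n, i) * hierarchies(i) * ordered_blocks(n - i, k - 1)
        for i in range(1, n - k + 2)
    )

assert [hierarchies(n) for n in range(1, 6)] == [1, 1, 4, 26, 236]
```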
Figure 3. Set bracketings and rooted unordered leaf-labeled trees for n = 3

Scaling limits of uniform picks from the trees appearing in the first and third problems are well studied. A uniform pick from rooted ordered binary trees with n leaves has the same distribution as a Galton-Watson tree with offspring distribution ξ 0 = ξ 2 = 1/2 conditioned to have 2n − 1 vertices. Thus it falls within the scope of the results in [3]. Similarly, a uniform pick from rooted unordered leaf-labeled binary trees with n leaves is a uniform binary fragmentation tree with n leaves, and scaling limits of these are studied in [10]. In this paper we present a unified approach that is able to handle all four of these types of trees simultaneously. Our method is essentially to link the trees appearing in these four problems to Galton-Watson trees conditioned on their number of leaves. In particular, we obtain the following result, which is proved in Section 2.2.
Theorem 1. For each i and n, let T i n be distributed like a Galton-Watson tree with offspring distribution ξ (i) conditioned to have n leaves.
(1) T 1 n and T 2 n are distributed like uniform random picks from the trees in Schröder's first and second problems respectively.
(2) Let U n be a uniform random ordering of {1, . . . , n}, independent of the trees, and for i ∈ {3, 4} let T̃ i n be constructed from T i n by labeling the leaves of T i n from left to right by U n and then forgetting the order structure. Then T̃ 3 n and T̃ 4 n are distributed like uniform random picks from the trees in Schröder's third and fourth problems respectively.
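Theorem 1 also suggests a direct way to simulate these uniform bracketings: run the corresponding Galton-Watson tree and realize the conditioning on the leaf count by rejection. The sketch below (our code, not from the paper) does this for the first problem, with offspring distribution ξ 0 = ξ 2 = 1/2, encoding a tree by its out-degrees in depth-first order:

```python
import random

def sample_binary_conditioned(n_leaves, rng):
    # Rejection sampling: grow a critical binary Galton-Watson tree
    # (offspring distribution xi_0 = xi_2 = 1/2) depth-first and keep it
    # only if it has exactly n_leaves leaves.
    while True:
        degrees = []
        open_slots = 1   # vertices generated but not yet given children
        leaves = 0
        while open_slots > 0 and leaves <= n_leaves:
            d = rng.choice([0, 2])
            degrees.append(d)
            open_slots += d - 1
            if d == 0:
                leaves += 1
        if open_slots == 0 and leaves == n_leaves:
            return degrees  # out-degrees in depth-first order

rng = random.Random(2023)
t = sample_binary_conditioned(6, rng)
assert t.count(0) == 6 and len(t) == 2 * 6 - 1  # binary: 2n - 1 vertices
```

Rejection is inefficient for large n, but for moderate n it yields an exactly uniform binary bracketing, by Theorem 1.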
Scaling limits for these Galton-Watson trees were recently proven in [22], with an alternate approach given independently in [13], and as a result we obtain the following theorem, the notation for which will be fully explained later.
Theorem 2. For i = 1, 2, 3, 4, let T i n be a uniform random tree of the type appearing in Schröder's i'th problem with n leaves. For each i and n, equip T i n with the graph metric, where edges have length one, and with the uniform probability measure on its leaves. We then have the following limits with respect to the rooted Gromov-Hausdorff-Prokhorov topology, where T Br is the Brownian continuum random tree.
As noted above, parts (i) and (iii) were originally proven in [3] and [10] respectively. Parts (ii) and (iv) appear to be new, though an alternate approach to (ii) was independently obtained in [13]. Though, as mentioned, this theorem follows from Theorem 1 and [22, Theorem 1], we include an independent proof in Section 3. The proof we give here exploits the fact that the distributions ξ (i) in Theorem 1 have some exponential moments, and it is considerably simpler than the proof of [22, Theorem 1]. Furthermore, our approach lets us obtain asymptotic results for other quantities associated to these trees, as indicated in Section 3.
The paper is organized as follows. In Section 2 we rigorously introduce the models of random trees under consideration here. In Section 3 we introduce the analytic setting for Theorem 2 and end with the proof of this theorem. This section also includes a detailed analysis of the depth-first processes associated with these trees. Finally, in Section 4 we use elementary methods from analytic combinatorics to compute some asymptotic properties of these trees explicitly.

Combinatorial models and Galton-Watson trees
In this section we develop several combinatorial and probabilistic models of trees. There are two primary types of trees we will be dealing with in the sequel: rooted ordered unlabeled trees and rooted unordered leaf-labeled trees. Combinatorial relations between rooted ordered unlabeled trees and rooted unordered labeled trees are well known when the size of a tree is its number of vertices (see e.g. [20,2,8,6]). In this section we develop analogous relations when the size of a tree is its number of leaves. Particularly important for us is Corollary 2, which relates Schröder's problems to particular Galton-Watson trees conditioned on their number of leaves.
We briefly give an account of the formal constructions of the trees we will be considering. Fix a countably infinite set S; we will consider the vertex sets of all graphs discussed to be subsets of S. Let T (ℓ) n denote the set of rooted unordered trees with n leaves (where the root is considered a leaf if and only if it is the only vertex in the tree) whose leaves are labeled by {1, 2, . . . , n}. More precisely, we consider the set T S n of all trees whose vertex sets are contained in S that have a distinguished root and n leaves (where the root is considered a leaf if and only if it is the only vertex in the tree) whose leaves are labeled by {1, 2, . . . , n}, and we set T (ℓ) n to be the set of equivalence classes of trees in T S n , where t and s are equivalent if there is a root and label preserving isomorphism from t to s. This is the only time we shall go through this formal construction, but all other sets of trees we discuss should be considered as formally constructed in an analogous fashion. We also let T (ℓ) = ∪ n≥1 T (ℓ) n . Similarly, we let T (o) n denote the set of rooted ordered trees with n leaves and T (o) = ∪ n≥1 T (o) n . We will be proving analogous results for trees in T (ℓ) and T (o) where the only differences in the statements and proofs will be whether the superscript is (ℓ) or (o). To avoid repetition we will use T * and T * n to mean that the statements and proofs are valid both when all of the *'s are replaced by (ℓ)'s and when they are replaced by (o)'s. For a tree t ∈ T * , we define |t| to be the number of leaves in t and #t to be the number of vertices in t.
2.1. Probabilities on trees. In this subsection we introduce models that are analogous to the simply generated trees introduced by Meir and Moon [17], but where the size of a tree is its number of leaves rather than its number of vertices. Let ζ = (ζ i ) i≥0 be a sequence of nonnegative numbers. We may then define the weight of a tree t ∈ T * to be w ζ (t) = ∏ v∈t ζ deg(v) . Here and throughout, deg(v) is the out degree of v, i.e., the number of children of v. We will assume the following conditions (Condition 1): (i) ζ 0 > 0, (ii) ζ k > 0 for some k ≥ 2, and (iii) ∑ t∈T * n w ζ (t) < ∞ for every n ≥ 1. Observe that if (i) and (ii) are satisfied, then (iii) is also satisfied whenever ζ 1 = 0, as is the case for Schröder's problems, since then only finitely many trees with n leaves have positive weight. For each n such that w ζ (t) > 0 for some t ∈ T * n we may define a probability measure on T * n by Q ζ* n (t) = w ζ (t) / ∑ s∈T * n w ζ (s).
We wish to consider generating functions, but we want an ordinary generating function for T (o) and an exponential generating function for T (ℓ) . In order to do this all at once, for z ∈ C we define y (o) n (z) = z n and y (ℓ) n (z) = z n /n!, both for n ≥ 0, and we use y * n in the same fashion as T * . The weighted generating function induced on T * by ζ with the weights defined above is C * ζ (z) = ∑ t∈T * w ζ (t) y * |t| (z). Setting G ζ,(o) (w) = ∑ k≥1 ζ k w k and G ζ,(ℓ) (w) = ∑ k≥1 ζ k w k /k!, it is then easy to see that C * ζ satisfies the functional equation (2.1) C * ζ (z) = ζ 0 z + G ζ,* (C * ζ (z)), in the sense of formal power series. Our interest is in the measures Q ζ* n and, in particular, we would like to find a Galton-Watson tree T such that Q ζ(o) n is the law of T conditioned to have exactly n leaves. Recall that if (ξ i ) i≥0 is a distribution on Z + with mean less than or equal to one and ξ 0 > 0, a Galton-Watson tree with offspring distribution ξ is a random tree in which each vertex independently has a number of children distributed like ξ; T is called critical if ξ has mean equal to one.
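Equation (2.1) pins down the coefficients of C * ζ recursively, which gives a practical way to compute the weighted counts. The sketch below (our code; it assumes the ordered-case form G ζ,(o) (w) = ∑ k≥1 ζ k w k ) iterates the fixed-point equation on truncated power series. For the binary weights ζ 0 = ζ 2 = 1 it recovers the counts 1, 1, 2, 5, 14 of binary ordered trees by number of leaves:

```python
def mult(a, b, order):
    # Product of two power series truncated at z^order.
    out = [0] * (order + 1)
    for i, ai in enumerate(a):
        if ai:
            for j in range(order - i + 1):
                out[i + j] += ai * b[j]
    return out

def solve_C(zeta, order):
    # Iterate C <- zeta[0]*z + sum_{k>=1} zeta[k] * C^k on truncated series.
    # When zeta[1] = 0 the coefficient of z^n is exact after n iterations.
    C = [0] * (order + 1)
    for _ in range(order):
        G = [0] * (order + 1)
        power = [1] + [0] * order  # C^0
        for k in range(1, len(zeta)):
            power = mult(power, C, order)
            if zeta[k]:
                G = [g + zeta[k] * p for g, p in zip(G, power)]
        G[1] += zeta[0]
        C = G
    return C

# Binary weights zeta_0 = zeta_2 = 1: C(z) = z + C(z)^2.
assert solve_C([1, 0, 1], 5) == [0, 1, 1, 2, 5, 14]
```

With ζ i = 1 for i ≠ 1 and ζ 1 = 0 (Schröder's second problem) the same iteration produces the counts 1, 1, 3, 11, 45.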
Proposition 1. If ζ is a probability distribution with mean less than or equal to one and T is a Galton-Watson tree with offspring distribution ζ, then the law of T conditioned to have exactly n leaves is Q ζ(o) n .

JIM PITMAN AND DOUGLAS RIZZOLO
This leads to the notion of tilting, which is similar to exponential tilting for Galton-Watson trees conditioned on their number of vertices (see [6, p. 11]). Proposition 1. Suppose that ζ satisfies Condition 1 and suppose that a, b > 0. Define ζ̃ by ζ̃ 0 = aζ 0 and ζ̃ i = b i−1 ζ i for i ≥ 1. Then Q ζ* n = Q ζ̃* n for all n ≥ 1. Proof. This follows immediately from the computation that, for t ∈ T * n , w ζ̃ (t) = a n b n−1 w ζ (t).
A consequence of this is that we can find a Galton-Watson tree T such that Q ζ(o) n is the law of T conditioned to have n leaves if we can find a, b > 0 such that the tilted weights ζ̃ form a probability distribution with mean at most one. An immediate consequence of this is the following corollary. Corollary 1. Let ξ (2) be as defined in Theorem 1. Note that ξ (2) has mean 1 and variance 4 √ 2. Let T be a Galton-Watson tree with offspring distribution ξ (2) . Then the law of T conditioned to have n leaves is uniform on the subset of T (o) n consisting of trees with no vertices of out degree one. Proof. The proof follows immediately from the discussion above by noting that, if ζ i = 1 for i ≠ 1 and ζ 1 = 0, then Q ζ(o) n is uniform on this subset, and tilting as in Proposition 1.
Given the similarities in the construction of Q ζ(ℓ) n and Q ζ(o) n , there should be a natural way to go back and forth between them.
Proposition 2. Suppose that ζ satisfies Condition 1 for * = (o). Define ζ̃ by ζ̃ n = n!ζ n . Then ζ̃ satisfies Condition 1 for * = (ℓ). Suppose that T is distributed like Q ζ(o) n and let U be a uniformly random ordering of {1, . . . , n} independent of T . Define T̃ ∈ T (ℓ) n to be the tree obtained from T by labeling the leaves of T by U and forgetting the ordering of T . Then T̃ is distributed like Q ζ̃(ℓ) n . Results of this type connecting plane and labeled trees where the size of a tree is given by the number of its vertices can be traced back to [12,18,19]. See [20] for a more complete history. Our proposition is analogous to an implicit discussion in [1,2] as well as Theorem 7.1 in [20], which considered the case where the size of a tree is given by the number of its vertices. To prove this proposition, we will need some notation. For a rooted ordered tree x let shape(x) be the rooted unordered tree obtained by forgetting the order on x. Similarly, for t ∈ T (ℓ) , shape(t) is defined to be the rooted unlabeled tree obtained from forgetting the labeling of t. For t ∈ T (ℓ) , x ∈ T (o) , and a rooted unordered tree y, define #labels t (x) to be the number of ways to label the leaves of x such that when the order on x is forgotten the resulting tree is t, and #ordered(y) to be the number of ordered trees whose shape is y. Observe that #labels t (x) depends only on shape(x), so we will abuse our notation and write #labels t (shape(x)).

Proof. Let t be an element of T (ℓ) n . Observe that #labels t (shape(x)) = 0 unless shape(t) = shape(x), and that P(T = x) depends only on shape(x). Furthermore, #ordered(shape(t)) · #labels t (shape(t)) = ∏ v∈t deg(v)!. This is because both sides count the number of distinct leaf-labeled ordered trees that equal t upon forgetting their order. On the left hand side, count by picking an ordered tree and then labeling it and, on the right hand side, count by labeling an unordered tree with the appropriate shape and then ordering the children of each vertex. The last step is to observe that n! ∑ s∈T (o) n w ζ (s) = ∑ s∈T (ℓ) n w ζ̃ (s). This is because for s ∈ T (o) n , there are n! rooted ordered leaf-labeled trees whose ordered tree is s upon forgetting the labeling, so the left hand side is the weighted number of rooted ordered leaf-labeled trees with n leaves. Furthermore, we have already noted above that for s ∈ T (ℓ) n , there are ∏ v∈s deg(v)! rooted ordered leaf-labeled trees whose labeled tree is s upon forgetting the ordering. Thus the right hand side is also the weighted number of rooted ordered leaf-labeled trees with n leaves. Note that this step also shows that ζ̃ satisfies Condition 1 for * = (ℓ).
Combining this with tilting, we have the following corollary.
2.2. Schröder's problems. In this section we record which of the trees above correspond to the trees that appear in Schröder's problems. The proofs of the claims here are simple applications of the results in Section 2.1.
The first problem: The trees here are uniform binary rooted ordered unlabeled trees. We can obtain these by taking * = (o) and ζ 0 = ζ 2 = 1 and ζ i = 0 for i / ∈ {0, 2}. Letting ξ be the probability distribution given by ξ 0 = ξ 2 = 1/2 and T be a Galton-Watson tree with offspring distribution ξ, we have that T conditioned to have n leaves is a uniform binary rooted ordered unlabeled tree with n leaves. Also note that T is critical and the variance of ξ is equal to one.
The second problem: These are uniform rooted ordered trees with no vertices of out degree one. These were dealt with in Corollary 1.
The third problem: These are uniform binary unordered leaf-labeled trees. We can obtain these by taking * = (ℓ) and ζ 0 = ζ 2 = 1 and ζ i = 0 for i ∉ {0, 2}. In this case, if T is the Galton-Watson tree defined in the first problem and T̃ is defined as in Corollary 2, then T̃ conditioned to have n leaves is a uniform binary unordered leaf-labeled tree with n leaves.
The fourth problem: These are uniform rooted unordered leaf-labeled trees with no vertices of out degree one. We can obtain these by taking * = (ℓ), ζ 1 = 0, and ζ i = 1 for i ≠ 1. We define a probability distribution ξ (4) as in Theorem 1. Note that ξ (4) has mean 1 and variance var(ξ (4) ) = 2 log 2. Letting T be a Galton-Watson tree with offspring distribution ξ (4) and defining T̃ as in Corollary 2, we have that T̃ conditioned to have n leaves is a uniform unordered leaf-labeled tree with no vertices of out degree one and n leaves.
2.3. Gibbs trees. Above we saw a natural way to put probability measures on T (ℓ) n that are concentrated on fragmentation trees (the trees appearing in Schröder's fourth problem); namely, take ζ 1 = 0. Another natural type of probability measure to put on fragmentation trees is a Gibbs model, which we now describe. First, we need to set up the natural framework in which to view fragmentation trees. The idea is that, while in Schröder's fourth problem we have an arbitrary set bracketing, for fragmentations we recursively partition a set. This dynamic view of constructing a set bracketing makes Gibbs models quite natural.
We can naturally consider t B as a tree whose vertices are the elements of t B and whose edges are defined by the parent-child relationship. Considering the properties of such a tree leads naturally to the following definition of a fragmentation tree on B.
Definition 2. A fragmentation tree T on n leaves is a rooted tree such that (1) The root of T does not have degree 1, (2) T has no non-root vertices of degree 2, (3) The leaves of T are labeled by a set B with #B = n. We denote the label of a leaf v by ℓ(v).
The idea of the Gibbs model is that, at each step in the fragmentation, the next step is distributed according to multiplicative weights depending on the block sizes. We first take a sequence {α k }, α k ≥ 0, of weights and a Gibbs weight, which is a function g : Z + → R + with g(0) = 0 and g(1) > 0. Then, for n ≥ 2, define a normalization constant Z(n) = ∑ {B 1 ,...,B k } α k g(#B 1 ) · · · g(#B k ), where the sum is over unordered partitions of [n] into at least two blocks. Whenever we write a formula like this, we assume that each block B i is nonempty. For n such that Z(n) > 0, define the probability of a partition of [n] by p({B 1 , . . . , B k }) = α k g(#B 1 ) · · · g(#B k )/Z(n). The probability of a fragmentation X of [n] is then defined as the product of these partition probabilities over the recursive steps of X: each block B of X with at least two elements contributes the factor corresponding to the partition of B into its children {B 1 , . . . , B k }, with [n] replaced by B. Using the correspondence between fragmentations and fragmentation trees, for T n ∈ T (ℓ) n , we define P g,α n (T n ) to be P g,α n (X) where X is the fragmentation determined by T n . The probabilistic properties of Gibbs models are studied in [16].
Theorem 3. Suppose that ζ satisfies Condition 1 with * = (ℓ) and ζ 1 = 0. Define α k = ζ k and g(k) = k![z k ]C (ℓ) ζ (z). Then Q ζ(ℓ) n = P g,α n whenever they are defined. Furthermore, given a nonnegative weight sequence α and a Gibbs weight g such that Z(n) = g(n), there is a ζ satisfying Condition 1 with * = (ℓ) and ζ 1 = 0 such that Q ζ(ℓ) n = P g,α n . Proof. Since the number of partitions of [n] into k ordered nonempty blocks with sizes n 1 , . . . , n k is given by the multinomial coefficient, we see that for n ≥ 2, Z(n) = ∑ k≥2 (α k /k!) ∑ n 1 +···+n k =n (n choose n 1 , . . . , n k ) g(n 1 ) · · · g(n k ), where the 1/k! appears because the partitions {B 1 , . . . , B k } are unordered. Note also that one way to count the weighted number of trees of size n is to decompose by the degree of the root and the sizes of the subtrees attached to the root. Doing so yields the same formula for n![z n ]C (ℓ) ζ (z). Since g(n) = n![z n ]C (ℓ) ζ (z) by definition, it follows that Z(n) = g(n). Using this, one proves inductively that P g,α n (T n ) = Q ζ(ℓ) n (T n ). Furthermore, observe that the condition Z(n) = g(n) implies that there is a weight sequence (ζ i ) i≥0 from which the fragmentation model can be derived in the above manner; just take ζ 0 = g(1), ζ 1 = 0, and ζ k = α k for k ≥ 2.
When we have Z(n) = g(n), the model is called a combinatorial Gibbs model. This is justified by the fact that, in this case, Z(n) (and thus g(n)) is the weighted number of trees with n leaves. For example, if we let g(n) be the number of fragmentation trees with n leaves, and α k = 1 for k ≥ 2, we then see that Z(n) = ∑ {B 1 ,...,B k } g(#B 1 ) · · · g(#B k ). The right hand side of this equation is just the sum, over partitions at the root of a fragmentation tree with n leaves, of the number of fragmentation trees with that partition at the root, which is precisely the number of fragmentation trees with n leaves. That is, Z(n) = g(n).
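The EGF bookkeeping behind this example can be checked mechanically. Writing F for the exponential generating function of fragmentation trees, the root decomposition used in the proof above says F = x + ∑ k≥2 F k /k!, or equivalently (our rewriting) F = x + (e F − 1 − F). The following sketch (our code, exact rational arithmetic) iterates this equation on truncated series and recovers the counts g(n) = 1, 1, 4, 26, 236 of fragmentation trees with n leaves:

```python
from fractions import Fraction
from math import factorial

def mult(a, b, order):
    # Product of two power series truncated at x^order.
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        if ai:
            for j in range(order - i + 1):
                out[i + j] += ai * b[j]
    return out

def series_exp(F, order):
    # exp of a series with zero constant term, truncated at x^order.
    out = [Fraction(0)] * (order + 1)
    out[0] = Fraction(1)
    power = [Fraction(1)] + [Fraction(0)] * order
    for k in range(1, order + 1):
        power = mult(power, F, order)
        out = [o + p / factorial(k) for o, p in zip(out, power)]
    return out

def fragmentation_counts(order):
    # Iterate F <- x + (e^F - 1 - F); the right side involves only F^2 and
    # higher, so the coefficient of x^n is exact after n iterations.
    F = [Fraction(0)] * (order + 1)
    for _ in range(order):
        E = series_exp(F, order)
        F = [e - f for e, f in zip(E, F)]
        F[0] -= 1
        F[1] += 1
    return [int(c * factorial(n)) for n, c in enumerate(F)]

assert fragmentation_counts(5) == [0, 1, 1, 4, 26, 236]
```

These are the same numbers that count the set bracketings in Schröder's fourth problem, illustrating Z(n) = g(n) numerically.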
Note that combinatorial Gibbs models are a generalization of the hierarchies studied in [8] and, as previously observed, a special case of the Gibbs models introduced in [16].

Scaling limits
We now turn to scaling limits of the models of trees we have been discussing. Fortunately for us, the heavy lifting has already been done in [22]. In order to use the results from that paper, we must first introduce the formalism required to handle limits of random metric measure spaces.
3.1. Trees as metric measure spaces. The trees we have been discussing can naturally be considered as metric spaces with the graph metric. That is, the distance between two vertices is the number of edges on the path connecting them. Let (t, d) be a tree equipped with the graph metric. For a > 0, we define at to be the metric space (t, ad), i.e. the metric is scaled by a. This is equivalent to saying the edges have length a rather than length 1 in the definition of the graph metric. More generally, we can attach a positive length to each edge in t and use these lengths in the definition of the graph metric. Moreover, the trees we are dealing with are rooted, so we consider (t, d) as a pointed metric space with the root as the point. Since we are concerned with the leaves, we also attach a measure µ t , which is the uniform probability measure on the leaves of t. If we have a random tree T , this gives rise to a random pointed metric measure space (T, d, root, µ T ). To make this last concept rigorous, we need to put a topology on pointed metric measure spaces. This is hard to do in general, but note that the pointed metric measure spaces that come from the trees we are discussing are compact.
Let M w be the set of equivalence classes of compact pointed metric measure spaces (equivalence here being up to point and measure preserving isometry). It is worth pointing out that M w actually is a set in the sense of ZFC, though this takes some work to show. We metrize M w with the pointed Gromov-Hausdorff-Prokhorov metric (see [9]). Fix (X, d, ρ, µ), (X ′ , d ′ , ρ ′ , µ ′ ) ∈ M w and define d GHP ((X, d, ρ, µ), (X ′ , d ′ , ρ ′ , µ ′ )) = inf inf [ d H (φ(X), φ ′ (X ′ )) ∨ δ(φ(ρ), φ ′ (ρ ′ )) ∨ d P (φ ∗ µ, φ ′ ∗ µ ′ ) ], where the first infimum is over metric spaces (M, δ), the second infimum is over isometric embeddings φ and φ ′ of X and X ′ into M, d H is the Hausdorff distance on compact subsets of M, and d P is the Prokhorov distance between the pushforward φ ∗ µ of µ by φ and the pushforward φ ′ ∗ µ ′ of µ ′ by φ ′ . Again, the definition of this metric has the potential to run into set-theoretic difficulties, but they are not terribly difficult to resolve.
An R-tree is a complete metric space (T, d) such that, for all x, y ∈ T , there is a unique isometric embedding φ x,y : [0, d(x, y)] → T with φ x,y (0) = x and φ x,y (d(x, y)) = y, and every injective continuous path from x to y has the same range as φ x,y . Definition 3. A continuum tree is an R-tree (T, d, ρ, µ) with a choice of root and probability measure such that µ is non-atomic, µ(L(T )) = 1, where L(T ) is the set of leaves of T , and for every non-leaf vertex w, µ{v ∈ T : w is on the path from ρ to v} > 0. The last condition says that there is a positive mass of leaves above every non-leaf vertex. We will usually just refer to a continuum tree T , leaving the metric, root, and measure implicit. A continuum random tree (CRT) is an (M w , d GHP ) valued random variable that is almost surely a continuum tree. Note that the elementary properties of Brownian excursion imply that T Br actually is almost surely a continuum tree. It is also worth noting that in our formalism the Brownian continuum random tree originally defined by Aldous in [3] corresponds to (T 2e , d 2e , ρ 2e (0), µ 2e ), but the convention has since shifted to the one we have adopted here (see e.g. [14]). The following lemma shows that this construction is above board, measure theoretically speaking. We let C + [0, 1] denote the set of continuous functions from [0, 1] to [0, ∞) that map 0 and 1 to 0. This lemma has been in the folklore for quite some time, with variants dating back to [3, Theorem 23]. However, as far as we can tell, no formal statement or proof appears in the literature, so we include one here.
The distortion of R is defined as dis(R) = sup{|d f (x, y) − d g (x ′ , y ′ )| : (x, x ′ ), (y, y ′ ) ∈ R}. We define a metric on the disjoint union Z := T f ⊔ T g of T f and T g by d Z (x, y) = inf{d f (x, u) + d g (v, y) + dis(R)/2 : (u, v) ∈ R} if x ∈ T f and y ∈ T g , and extend by symmetry. For the remainder of the proof we identify T f and T g with their natural embeddings in Z. Observe that d H (T f , T g ) ≤ dis(R)/2 ≤ 2||f − g|| and d Z (ρ f (0), ρ g (0)) = dis(R)/2 since (ρ f (0), ρ g (0)) ∈ R.
To finish proving the continuity of F , it remains to show that the Prokhorov distance between µ f and ν g can be made small if (f, µ) is sufficiently close to (g, ν). For h > 0, we define ω(f, h) = sup{|f (s) − f (t)| : s, t ∈ [0, 1], |s − t| ≤ h} to be the h-modulus of continuity of f . For any subset B of a metric space (E, δ) and ǫ > 0, we define B ǫ = {x ∈ E : δ(x, B) < ǫ}. The two key observations are that for r > dis(R)/2 and I ⊆ [0, 1] we have ρ g (I) ⊆ ρ f (I) r and, if κ, ǫ 0 > 0, we have ρ g (I κ ) ⊆ ρ g (I) 2ω(g,κ)+ǫ 0 . Combining these, we see that if d P (µ, ν) < κ and A is measurable, then µ f and ν g assign nearby masses to A and a suitable enlargement of A. Since ω(·, h) is continuous on C + [0, 1], it is easy to see from these inequalities that d P (µ f , ν g ) can be made small by making ||f − g|| + d P (µ, ν) small.

3.3. Excursions of random walks. The basis of our approach to scaling limits of Galton-Watson trees with n leaves is a new conditioned limit theorem for excursions of random walks. Let µ be a probability distribution on {−1, 0, 1, 2, . . . } that has mean 0 and finite, nonzero variance σ 2 . Further let {X i } i≥1 be i.i.d. distributed like µ. We will restrict ourselves to the canonical situation where the X i are the coordinate functions on R N . For x ∈ R N , we let x 0 = 0 and define x n (t), for 0 ≤ t ≤ 1, as the n'th time scaled linearly interpolated process x n (t) = x [nt] + (nt − [nt])(x [nt]+1 − x [nt] ), where [nt] is the integer part of nt.
Theorem 4. Assume that P(N n = τ −1 ({S i } i≥0 )) > 0 for all sufficiently large n, and recall that γ := EN. For n ≥ 1, define the law P n on C[0, 1] as the law of the rescaled walk conditioned on this event. Let W ex be the law of standard (positive) Brownian excursion. We then have P n ⇒ W ex as n → ∞. That is, P n converges weakly to W ex in C[0, 1].
Proof. Define X̃ i = X N i−1 +1 + · · · + X N i . Note that the X̃ i are i.i.d. and, by Wald's equation, have mean 0 and finite, nonzero variance σ 2 EN. Let S̃ 0 = 0 and S̃ n = ∑ n i=1 X̃ i , and observe that S̃ n = S N n . A consequence of this is an identity between the conditioned laws of the two walks, with the last equality being a standard consequence of the Otter-Dwass formula and the local limit theorem (see e.g. [22]).
We consider these processes as elements of C[0, 1] with the uniform topology. Since E exp(λN) < ∞ for some λ > 0, Petrov's lemmas (as formulated in [15, Appendix A.1]) show that for each ǫ > 0 there exist constants c 1 , c 2 > 0 such that the renewal counts N k concentrate around γk with exponentially small error probabilities. Let Ψ n (t) = (N n ) −1 N n (t). Using the standard (if seldom written, see e.g. [11]) fact that Ψ n converges uniformly to the identity in probability, we thus have the joint convergence of the rescaled processes, and it follows from the continuous mapping theorem that the composed processes converge as well. Now, observe that for 0 ≤ k ≤ n we have S̃ n (Ψ −1 n (N k /N n )) = S̃ n (k/n) = S̃ k = S N k . Therefore, to deduce the convergence of the first passage bridges of (σ √ γn) −1 S N n from the convergence of the first passage bridges of (σ √ γn) −1 S̃ n • Ψ −1 n , we need only control how the processes differ in time intervals of the form [N k /N n , N k+1 /N n ]. We will do this in terms of the modulus of continuity of S̃ n . Since µ is supported on {−1, 0, 1, . . . }, the walk has negative jumps of size one only. Suppose δ > 0 and suppose that n > δ −1 . We then have that |S̃ n (t) − S̃ n (Ψ −1 n (t))| ≤ ω(S̃ n , δ) + 1, where ω(S̃ n , δ) is the δ-modulus of continuity.

3.4. Limits of Galton-Watson trees conditioned on their number of leaves
Suppose that t ∈ T (o) has n vertices. The depth-first walk of t is a function f t : {0, . . . , 2n} → t defined as follows: f t (0) is the root of t, and f t (i) is the smallest child of f t (i − 1) that is not in {f t (0), . . . , f t (i − 1)} if one exists, and the parent of f t (i − 1) otherwise. Index the vertices V of t from 1 to n by order of appearance on the depth-first walk of t, so that V = {v 1 , . . . , v n }.
The increments of the depth-first queue of t are DQ t k = deg(v k ) − 1 for k ≤ n and 0 for k > n. Note that DQ t ∈ D. Furthermore, t → DQ t is a bijection from T (o) to D (see e.g. [21]). We will also be interested in several other processes associated to t. Two of them are easily described in terms of the depth-first order of the vertices. They are the contour and height processes of t, which are defined by C t k = d(root(t), f t (k)) and H t k = d(root(t), v k ) respectively. Two others are breadth-first processes. The breadth-first order of the vertices of a tree t with n vertices is defined as follows: Let n k be the number of vertices of t at distance k from the root and let ht(t) be the height of t. We define v ′ 1 to be the root of t and for 1 ≤ k ≤ ht(t) we define (v ′ n 0 +···+n k−1 +1 , . . . , v ′ n 0 +···+n k ) to be the vertices of t at height k listed from left to right. The complete list (v ′ 1 , . . . , v ′ n ) is the vertices of t listed in breadth-first order. The breadth-first queue B t is defined analogously to the depth-first queue, using the breadth-first order of the vertices. The level profile of t is defined by L t k = n k . Observe that the level profile and breadth-first queue are related by L t k = B t (1 + n 1 + · · · + n k−1 ). See [11, p. 7] for the details regarding this relation.
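These encodings are easy to compute for a concrete tree, which may help fix the definitions. The sketch below (our code; trees are nested lists, so [[], []] is a root with two leaf children) computes the depth-first out-degree sequence, the associated queue-type path with increments deg(v k ) − 1, and the height process, for the tree of the bracketing x((xx)x) from Figure 1:

```python
def dfs_degrees(tree):
    # Out-degrees of the vertices in depth-first order.
    degs = [len(tree)]
    for child in tree:
        degs += dfs_degrees(child)
    return degs

def queue_path(degs):
    # Partial sums of deg(v_k) - 1, started at 0; for a tree with m
    # vertices this path first hits -1 exactly at step m.
    path, s = [0], 0
    for d in degs:
        s += d - 1
        path.append(s)
    return path

def heights(tree, h=0):
    # Distance from the root, for the vertices in depth-first order.
    out = [h]
    for child in tree:
        out += heights(child, h + 1)
    return out

t = [[], [[[], []], []]]  # the tree of the bracketing x((xx)x)
assert dfs_degrees(t) == [2, 0, 2, 2, 0, 0, 0]
assert queue_path(dfs_degrees(t)) == [0, 1, 0, 1, 2, 1, 0, -1]
assert heights(t) == [0, 1, 1, 2, 3, 3, 2]
```

The fact that the path determines the tree (and first hits −1 exactly at the last step) is the bijection t → DQ t mentioned above, in its standard Łukasiewicz-path form.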
Proposition 4. For n such that P(|T | = n) > 0, the depth-first queue of T n is distributed like the corresponding conditioned excursion of the random walk. Similar arguments show the following. Proposition 5. For n such that P(|T | = n) > 0, the breadth-first queue of T n satisfies the analogous identity. Now, if we were to follow our previous notational convention, the time scaled linear interpolation of Q Tn would be denoted by Q Tn #Tn , which seems congested. We simplify this by dropping the superscript and the hash sign in the subscript; that is, we write Q Tn (s) for 0 ≤ s ≤ 1. Theorem 5. Let (ξ i ) i≥0 be a probability distribution on Z + with mean 1 and 0 < var(ξ) := σ 2 < ∞. Suppose that T is a Galton-Watson tree with offspring distribution ξ. Assume that for all sufficiently large n we have P(|T | = n) > 0. For such n, let T n be distributed like T conditioned to have n leaves. We then have convergence in distribution of the rescaled depth-first and breadth-first queues, where e has distribution W ex .
Note that this says nothing about the convergence of the joint distribution of the scaled depth and breadth first queues, which we leave as an open problem.
Assuming ξ has some exponential moments, we can strengthen this result.
Theorem 6. In addition to the conditions of Theorem 5, assume that ∫ exp(αx) ξ(dx) < ∞ for some α > 0. Let (e(t), 0 ≤ t ≤ 1) have distribution W ex . We then have the convergence of the rescaled depth-first processes to e. A similar result with weaker hypotheses appears in [13, Theorem 5.9], and the result for the scaled contour function can also be derived from [22, Theorem 1]. However, for us, it follows immediately from Theorem 5 and the next theorem. By exploiting the existence of exponential moments in our setting, we are able to provide a much simpler proof than those appearing in [13,22]. Because of this, Theorem 6 can also be used to simplify the approach to non-crossing configurations of the plane in [5], which makes use of this theorem applied to the trees in Schröder's second problem.
Theorem 7. For each ν > 0 there exist constants N and α > 0 such that for n ≥ N the corresponding tail bounds hold. Proof. Theorems 2 and 3 in [15] prove the analogous result when T n is distributed like T conditioned to have n vertices. Our theorem follows from those by decomposing by the number of vertices in T n . For example, (ii) follows from Theorem 2 in [15], where the second inequality and α ′ > 0 are given by Theorem 2 in [15].
Corollary 3. Maintaining the notation and hypotheses from Theorem 6, we have the corresponding convergence of the rescaled level profile (√ ξ 0 /(σ √ n)) L Tn . Proof. The proof of Theorem 1 in [11] goes through almost verbatim. We need only verify a condition involving the cumulative height process, namely that P( ∫ 1 0 1(e(s) ≤ u) ds = 0 ) = 0 for all u > 0. The proof of Theorem 1 in [11, p. 18] now goes through exactly, using Theorem 5 in place of [11, Theorem 11] and Equation 3.2 in place of [11, Lemma 9].
We now show how to obtain scaling limits for weighted trees from the limits for depth-first processes obtained in Theorem 6.
Theorem 8. Let \(\xi\) be an offspring distribution satisfying the hypotheses of Theorem 6 and let \(T_n\) be distributed like a Galton–Watson tree with offspring distribution \(\xi\) conditioned to have \(n\) leaves. Consider \(T_n\) as a rooted weighted metric space, equipped with the graph metric and the uniform probability measure \(\mu_n\) on the leaves of \(T_n\). We have, with respect to the Gromov–Hausdorff–Prokhorov topology,
Proof. Let \(\nu_n\) be the empirical distribution of the locations of leaves along the scaled contour process \(n^{-1/2} C^{T_n}(s)\) of \(T_n\). It is clear that

Therefore, by Theorem 6 and Lemma 1, all that remains to be shown is that \(\nu_n \Rightarrow \lambda\) in probability, where \(\lambda\) is Lebesgue measure on \([0, 1]\). Let \(\tilde\nu_n\) be the empirical distribution of the locations of leaves along the height process of \(T_n\), which we note is the same as the empirical distribution of the locations of leaves along the depth-first queue of \(T_n\). We denote the vertices of \(T_n\), listed in order of appearance on the depth-first walk of \(T_n\), by \((v_1, \dots, v_{\#T_n})\). For \(1 \le l \le \#T_n\), define \(m(l) = \inf\{k : f^{T_n}(k) = v_l\}\), where we recall that \(f^{T_n}\) is the depth-first walk of \(T_n\). From [15, Lemma 2], we see that \(m(l) = 2l - 1 - H^{T_n}_l\), where our formula is slightly different from that in [15] due to indexing considerations. If we let \(N_k\) denote the location of the \(k\)th leaf along the depth-first queue of \(T_n\), we see that

It follows from Proposition 4 and Equation (3.1) that \(\tilde\nu_n \Rightarrow \lambda\) in probability. Furthermore, it follows from Theorem 6 that

in probability. As a result, we have that \(\nu_n \Rightarrow \lambda\) in probability as well, which completes the proof.

Explicit computations using analytic combinatorics
The convergence result above is a powerful theorem for obtaining asymptotics of various tree statistics, but it is difficult to prove and, as a result, asymptotics obtained from it can seem mysterious. Consequently, it is worth noting that a number of asymptotic results can be obtained directly using analytic combinatorics. This analytic approach is based on considering the asymptotics of generating functions. The primary source for asymptotics in general is [8], which develops the theory with extensive examples.
Our main goal in this section is to develop the general framework of additive functionals for leaf-labeled trees whose size is counted by their number of leaves, and to use this to find the asymptotic distribution of the height of a uniformly chosen leaf. We also find the limit of the expected height of a random leaf. These computations are meant to be illustrative and by no means exhaust the power of the analytic combinatorics framework. Indeed, it seems that most of the techniques used to study simple varieties of trees (see [8] for a summary of the extensive work in this area) have close analogs that will provide results about the trees we are considering here.

Analytic background.
In this subsection we recall from [8] some fundamental results from analytic combinatorics; the next subsection applies them to our setting. The approach is based on the asymptotics of several universal functions. Recall that if \(f(z)\) is a formal power series, \([z^n] f(z)\) denotes the coefficient of \(z^n\). Similarly, if \(f : \mathbb{C} \to \mathbb{C}\) is analytic at 0, then \([z^n] f(z)\) denotes the coefficient of \(z^n\) in the power series expansion of \(f\) at 0.
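Coefficient extraction is a purely formal operation, and for a rational generating function it can be carried out by a linear recurrence. As a minimal illustration (an example outside the paper, not one of the generating functions studied here), the following sketch extracts \([z^n]\) of \(P(z)/Q(z)\) with \(Q(0) \ne 0\), applied to \(1/(1 - z - z^2)\), whose coefficients are the Fibonacci numbers.

```python
# Coefficient extraction [z^n] f(z) for a power series given as a ratio
# P(z)/Q(z) with Q(0) != 0: writing Q(z) f(z) = P(z) and comparing
# coefficients gives a linear recurrence for the f_n. Illustrative
# helper, not from the paper.

def series_coeffs(P, Q, n):
    """First n+1 coefficients of P(z)/Q(z), with Q[0] != 0.

    P, Q are coefficient lists, lowest degree first."""
    f = []
    for k in range(n + 1):
        pk = P[k] if k < len(P) else 0
        s = sum(Q[j] * f[k - j] for j in range(1, min(k, len(Q) - 1) + 1))
        f.append((pk - s) / Q[0])
    return f

coeffs = series_coeffs([1], [1, -1, -1], 10)
print(coeffs)  # Fibonacci numbers: 1, 1, 2, 3, 5, 8, ...
```

The generating functions in this paper are algebraic rather than rational, which is why the singularity analysis recalled below is needed to obtain their coefficient asymptotics.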
To use these classical results we need a special type of analyticity called ∆-analyticity, which we now define.
Definition 5 (Definition VI.1 p. 389 [8]). Given two numbers \(\phi\) and \(R\) with \(R > 1\) and \(0 < \phi < \pi/2\), the open domain \(\Delta(\phi, R)\) is defined as
\[ \Delta(\phi, R) = \{ z : |z| < R,\ z \ne 1,\ |\arg(z - 1)| > \phi \}. \]
For a complex number \(\zeta\), a domain \(D\) is a ∆-domain at \(\zeta\) if there exist \(\phi\) and \(R\) such that \(D = \zeta \Delta(\phi, R)\), the image of \(\Delta(\phi, R)\) under multiplication by \(\zeta\). A function is ∆-analytic if it is analytic on a ∆-domain.
Define \(\lambda(z)\) by \(\lambda(z) := \frac{1}{z} \log \frac{1}{1 - z}\), and let \(S\) denote the set of functions \(\{(1 - z)^{-a} \lambda(z)^b : a, b \in \mathbb{R}\}\).

Theorem 9 (Theorem VI.4 p. 393 [8]). Let \(f(z)\) be a function analytic at 0 with a singularity at \(\zeta\), such that \(f(z)\) can be continued to a domain of the form \(\zeta \Delta_0\), for a ∆-domain \(\Delta_0\). Assume that there exist two functions \(\sigma\) and \(\tau\), where \(\sigma\) is a (finite) linear combination of elements of \(S\) and \(\tau \in S\), so that
\[ f(z) = \sigma(z/\zeta) + O(\tau(z/\zeta)) \quad \text{as } z \to \zeta \text{ in } \zeta \Delta_0. \]
Then the coefficients of \(f(z)\) satisfy the asymptotic estimate
\[ [z^n] f(z) = \zeta^{-n} [z^n] \sigma(z) + O(\zeta^{-n} \tau^\star_n), \]
where \(\tau^\star_n = n^{a-1} (\log n)^b\) if \(\tau(z) = (1 - z)^{-a} \lambda(z)^b\). Occasionally we will also need to deal with derivatives, and the next theorem shows us how this is done.
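The transfer estimate is easy to check numerically on a standard example (not one of the paper's generating functions): for \(f(z) = (1-z)^{-1/2}\), the exact coefficient \([z^n] f(z) = \binom{2n}{n} 4^{-n}\) should approach \(n^{-1/2}/\Gamma(1/2) = 1/\sqrt{\pi n}\).

```python
# Numerical sanity check of a transfer-theorem asymptotic, on the
# illustrative function f(z) = (1-z)^{-1/2} (not from the paper):
# [z^n] f(z) = binom(2n, n) / 4^n  ~  1 / sqrt(pi * n).
import math

def exact(n):
    return math.comb(2 * n, n) / 4 ** n

def asymptotic(n):
    return 1.0 / math.sqrt(math.pi * n)

for n in (10, 100, 1000):
    print(n, exact(n), asymptotic(n), exact(n) / asymptotic(n))
```

The printed ratios approach 1, with an error of order \(1/n\), consistent with the error term in Theorem 9.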
Theorem 10 (Theorem VI.8 p. 419 [8]). Let \(f(z)\) be ∆-analytic with a singular expansion near its singularity of the simple form
\[ f(z) = \sum_j c_j (1 - z)^{a_j} + O\big((1 - z)^A\big), \]
with \(A, a_1, a_2, \dots \in \mathbb{R}\). Then, for each integer \(r > 0\), the derivative \(f^{(r)}(z)\) is ∆-analytic, and the expansion of the derivative at its singularity is obtained through term-by-term differentiation.

The generating functions we will work with fall into the smooth implicit-function schema, which provides a way to derive coefficient asymptotics from functional equations.
Definition 6 (Definition VII.4 p. 467 [8]). Let \(y(z)\) be a function analytic at 0, \(y(z) = \sum_{n \ge 0} y_n z^n\), with \(y_0 = 0\) and \(y_n \ge 0\). The function is said to belong to the smooth implicit-function schema if there exists a bivariate function \(G(z, w)\) such that \(y(z) = G(z, y(z))\), where \(G(z, w)\) satisfies the following conditions.
(i) \(G(z, w) = \sum_{m,n \ge 0} g_{m,n} z^m w^n\) is analytic in a domain \(|z| < R\) and \(|w| < S\), for some \(R, S > 0\).
(ii) The coefficients of \(G\) satisfy \(g_{m,n} \ge 0\), \(g_{0,0} = 0\), \(g_{0,1} \ne 1\), and \(g_{m,n} > 0\) for some \(m\) and for some \(n \ge 2\).
(iii) There exist two numbers \(r\) and \(s\) with \(0 < r < R\) and \(0 < s < S\) satisfying the system of equations \(G(r, s) = s\), \(G_w(r, s) = 1\), which is called the characteristic system.
Definition 7 (Definition IV.5 p. 266 [8]). Consider the formal power series \(f(z) = \sum_{n \ge 0} f_n z^n\). The series \(f\) is said to admit span \(d\) if, for some \(r\),
\[ \{ n : f_n \ne 0 \} \subseteq r + d \mathbb{Z}_{\ge 0}. \]
The largest span is the period of \(f\). If \(f\) has period 1, then \(f\) is aperiodic.
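The period of a series is determined by the indices of its nonzero coefficients: with \(r\) the smallest such index, the largest admissible \(d\) is \(\gcd\{n - r : f_n \ne 0\}\). A small illustrative helper (not from the paper) makes this concrete.

```python
# Computing the period (largest span) of a power series from the set of
# indices of its nonzero coefficients. Illustrative helper, not from the
# paper.
from math import gcd
from functools import reduce

def period(support):
    """Largest span of a series whose nonzero coefficients sit at `support`."""
    support = sorted(support)
    r = support[0]
    diffs = [n - r for n in support[1:]]
    if not diffs:
        return 0  # a single nonzero term: every d works, no finite period
    return reduce(gcd, diffs)

print(period([1, 3, 5, 7]))  # z + z^3 + z^5 + z^7 has period 2
print(period([0, 2, 3]))     # aperiodic: period 1
```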
With this definition, we get the following theorem. It is worth noting that this result appears in several places in the literature. We give the version that appears as Theorem VII.3 on page 468 of [8]. In that source it is footnoted that many statements occurring previously in the literature contained errors, so caution is advised.
Theorem 11 (Theorem VII.3 p. 468 [8]). Let \(y(z)\) belong to the smooth implicit-function schema defined by \(G(z, w)\), with \((r, s)\) the positive solution of the characteristic system. Then \(y(z)\) converges at \(z = r\), where it has a square root singularity,
\[ y(z) \underset{z \to r}{=} s - \gamma \sqrt{1 - z/r} + O(1 - z/r), \qquad \gamma := \sqrt{\frac{2 r G_z(r, s)}{G_{ww}(r, s)}}, \]
the expansion being valid in a ∆-domain. If, in addition, \(y(z)\) is aperiodic, then \(r\) is the unique dominant singularity of \(y\) and the coefficients satisfy
\[ [z^n] y(z) \underset{n \to \infty}{=} \frac{\gamma}{2 \sqrt{\pi n^3}}\, r^{-n} \big( 1 + O(n^{-1}) \big). \]
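Theorem 11 can be checked numerically on the classical example of plane binary trees counted by leaves (an illustrative instance, not a computation from the paper): they satisfy \(y = G(z, y)\) with \(G(z, w) = z + w^2\), the characteristic system gives \(s = 1/2\), \(r = 1/4\), \(\gamma = 1/2\), and the exact coefficient is the Catalan number \(C_{n-1}\).

```python
# Checking Theorem 11 on plane binary trees counted by leaves:
# y = z + y^2, characteristic system r + s^2 = s, 2s = 1, so s = 1/2,
# r = 1/4, and gamma = sqrt(2*r*G_z/G_ww) = sqrt(2*(1/4)*1/2) = 1/2.
# Predicted: [z^n] y ~ (gamma / (2*sqrt(pi*n^3))) * r^(-n).
import math

def exact(n):
    return math.comb(2 * n - 2, n - 1) // n   # Catalan number C_{n-1}

def asymptotic(n):
    gamma, r = 0.5, 0.25
    return gamma / (2 * math.sqrt(math.pi * n ** 3)) * r ** (-n)

for n in (10, 100, 500):
    print(n, exact(n) / asymptotic(n))  # ratios tend to 1
```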
We will also need the following theorem.

4.2. Restricting the generality. So far we have been considering a very general situation. In order to simplify the computations in the following section, we will focus on a more restricted setting. In particular, we let \(\zeta = (\zeta_i)_{i \ge 0}\) be a sequence of non-negative weights such that \(\zeta_0 = 1\), \(\zeta_1 = 0\), \(\gcd\{k : \zeta_k \ne 0\} = 1\), and the generating function \(G_\zeta(w) := \sum_{k \ge 2} \zeta_k w^k\) is entire. These conditions can be relaxed, but doing so makes the analysis more difficult.
Proof. All that really needs to be checked is that the characteristic system has a positive solution. For \(G(z, w) = z + G_\zeta(w)\), the characteristic system is \(s = r + G_\zeta(s)\) and \(G'_\zeta(s) = 1\). Since \(G_\zeta\) is entire with \(G'_\zeta(0) = 0\) and \(G'_\zeta(+\infty) = +\infty\), and \(G'_\zeta\) is increasing on \(\mathbb{R}_+\), the intermediate value theorem yields an \(s > 0\) with \(G'_\zeta(s) = 1\). An easy computation yields \(G_\zeta(s) < s G'_\zeta(s) = s\), so \(r = s - G_\zeta(s) > 0\) as well.
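The characteristic system can be solved numerically for any concrete weight sequence. The sketch below uses a hypothetical admissible choice (not one from the paper): \(\zeta_0 = 1\), \(\zeta_1 = 0\), \(\zeta_k = 1/k!\) for \(k \ge 2\), so that \(G_\zeta(w) = e^w - 1 - w\) is entire; here the system has the closed-form solution \(s = \log 2\), \(r = 2\log 2 - 1 > 0\).

```python
# Solving the characteristic system s = r + G_zeta(s), G_zeta'(s) = 1
# for the hypothetical weights zeta_k = 1/k! (k >= 2), i.e.
# G_zeta(w) = e^w - 1 - w. Then G_zeta'(s) = e^s - 1 = 1 gives
# s = log 2 and r = s - G_zeta(s) = 2*log(2) - 1 > 0.
import math

def G(w):
    return math.exp(w) - 1 - w

def Gp(w):
    return math.exp(w) - 1

# bisection for G'(s) = 1 on [0, 2], where G' is increasing
lo, hi = 0.0, 2.0
for _ in range(80):
    mid = (lo + hi) / 2
    if Gp(mid) < 1:
        lo = mid
    else:
        hi = mid
s = (lo + hi) / 2
r = s - G(s)
print(s, r)  # s = log 2 ~ 0.6931, r = 2*log 2 - 1 ~ 0.3863
```

The positivity of \(r\) found numerically matches the intermediate-value argument in the proof above.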
4.3. The height of a random leaf. Let \(H_n\) be the height of a randomly chosen leaf from a tree in \(T^{(\ell)}_n\). Specifically, to obtain \(H_n\), we choose a tree \(T_n\) from \(T^{(\ell)}_n\) according to \(Q^{\zeta(\ell)}_n\) and then choose a leaf uniformly at random from \(T_n\). Our main result in this section is the following theorem.