Large Deviations for Small Noise Diffusions Over Long Time

We study two problems. First, we consider the large deviation behavior of empirical measures of certain diffusion processes as, simultaneously, the time horizon becomes large and the noise becomes vanishingly small. The law of large numbers (LLN) of the empirical measure in this asymptotic regime is given by the unique equilibrium of the noiseless dynamics. Due to degeneracy of the noise in the limit, the methods of Donsker and Varadhan (1976) are not directly applicable and new ideas are needed. Second, we study a system of slow-fast diffusions where both the slow and the fast components have vanishing noise on their natural time scales. This time the LLN is governed by a degenerate averaging principle in which local equilibria of the noiseless system obtained from the fast dynamics describe the asymptotic evolution of the slow component. We establish a large deviation principle that describes probabilities of divergence from this behavior. On the one hand our methods require stronger assumptions than in the nondegenerate setting; on the other hand, the rate functions take simple and explicit forms that have striking differences from their nondegenerate counterparts.


Introduction
In this work we study the large deviation behavior of certain stochastic dynamical systems with small noise over long time horizons. In order to motivate the problem of interest we begin with the following classical setting of Donsker-Varadhan large deviation theory [9,10,11] for ergodic diffusions. Let Z be an R^d-valued continuous stochastic process given as the solution of the following stochastic differential equation (SDE)

dZ(t) = −∇ϕ(Z(t)) dt + dB(t), Z(0) = z_0,

where B is a d-dimensional Brownian motion given on some probability space (Ω, F, P), z_0 ∈ R^d, and ϕ : R^d → R is a twice continuously differentiable function. Suppose in addition that ϕ is bounded from below, has a bounded Hessian, and ∥∇ϕ(x)∥ → ∞ as ∥x∥ → ∞. Consider the empirical measure process associated with Z defined as

µ_t := (1/t) ∫_0^t δ_{Z(s)} ds, t > 0, (1.1)

where δ_x denotes the Dirac probability measure at the point x and B(S) for a topological space S denotes the associated Borel σ-field. From [11] it follows that, under the above conditions on ϕ, the collection {µ_t} of P(R^d)-valued random variables, where P(R^d) is the space of probability measures on R^d equipped with the topology of weak convergence, satisfies a large deviation principle (LDP) with rate function I_Z : P(R^d) → [0, ∞] and speed t, namely for all continuous and bounded F : P(R^d) → R,

lim_{t→∞} −(1/t) log E exp(−t F(µ_t)) = inf_{µ ∈ P(R^d)} {F(µ) + I_Z(µ)},

where the rate function I_Z is given as

I_Z(µ) := − inf_{g ∈ D^+} ∫_{R^d} (Lg(x)/g(x)) µ(dx), µ ∈ P(R^d), (1.2)

where L is the infinitesimal generator of the Markov process Z, whose evaluation for g ∈ C^2_b(R^d) (the space of twice continuously differentiable bounded functions with bounded derivatives) is given as

(Lg)(x) := −∇ϕ(x) · ∇g(x) + (1/2)∆g(x),
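For orientation, the variational formula (1.2) can be evaluated in closed form in a simple reversible example; the following computation is standard and is not part of the present paper's results.

```latex
% Worked example (standard): the Donsker--Varadhan rate for a reversible
% diffusion, evaluated via its Dirichlet form. Take
\[
d=1,\qquad \phi(z)=\tfrac{z^{2}}{2},\qquad
\pi(dx)\propto e^{-2\phi(x)}\,dx = N\!\big(0,\tfrac12\big),
\]
% for which (1.2) reduces to the Dirichlet-form expression
\[
I_Z(\mu)=\tfrac12\int_{\mathbb{R}}|g'(x)|^{2}\,\pi(dx),
\qquad g=\sqrt{\tfrac{d\mu}{d\pi}},\quad \mu\ll\pi .
\]
% For the shifted Gaussian \mu = N(m, 1/2):
\[
\tfrac{d\mu}{d\pi}(x)=e^{2mx-m^{2}},\qquad
g(x)=e^{mx-m^{2}/2},\qquad g'(x)=m\,g(x),
\]
\[
I_Z\big(N(m,\tfrac12)\big)=\tfrac12\,m^{2}\int_{\mathbb{R}} g^{2}\,d\pi
=\frac{m^{2}}{2}.
\]
```

The quadratic growth in the shift m of the stationary mean is a useful point of comparison for the explicit small noise rate function discussed below.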
where ∆ is the d-dimensional Laplacian, and D^+ is the space of functions g in the domain of L that are uniformly bounded below by a positive constant. The above large deviation principle gives asymptotics of probabilities of deviations of the empirical measure process µ_t from its law of large numbers (LLN) limit, which is the unique stationary distribution of the Markov process Z, for large values of t. This basic result has been extended in subsequent works in many different directions (see e.g. [17,6,7,8,18,20,15,14]). Our first interest in the current work is the study of analogous long-time behavior for small noise diffusions. Specifically, we consider the following setting. Let Z^ε be an R^d-valued continuous stochastic process given as the solution of the SDE

dZ^ε(t) = −∇ϕ(Z^ε(t)) dt + s(ε) dB(t), Z^ε(0) = z_0, (1.3)

where B and ϕ are as before and s : (0, ∞) → (0, ∞) satisfies s(ε) → 0 as ε → 0. Due to the degeneracy of the noise in the limit as ε → 0, we will need stronger conditions than those needed for a LDP for {µ_t} defined by (1.1). Specifically, we assume that, in addition to ϕ being twice continuously differentiable with a bounded Hessian, ϕ is strongly convex and ∇ϕ(0) = 0. Under these assumptions on ϕ, 0 is the unique equilibrium of the ordinary differential equation (ODE)

ẏ(t) = −∇ϕ(y(t)). (1.4)

Also, it follows (see e.g. the proof of Lemma 4.5) that, as ε → 0,

µ^ε := ε ∫_0^{1/ε} δ_{Z^ε(s)} ds → δ_0 (1.5)

in P(R^d). Long time behavior of SDE with small noise as in (1.3) is of interest, for example, in the study of stochastic approximation schemes for approximating zeroes of a nonlinear function (cf. [2,19,1]). One of the crucial ingredients in the proofs of [11] and other works on related themes is the nondegeneracy of the noise in the dynamics. This property is key in the proof of the lower bound, where one invokes an ergodic theorem in order to suitably approximate near optimal paths in the variational problem describing the large deviation rate function.
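The LLN concentration described above can be illustrated numerically. The following is a minimal Euler-Maruyama sketch of (1.3) with ϕ(z) = z²/2, reporting the mass the empirical measure places near the equilibrium 0; all numerical choices here (step size, horizon, noise level, starting point) are illustrative and not from the paper.

```python
import numpy as np

def empirical_mass_near_zero(s_eps=0.2, t_horizon=50.0, dt=0.01,
                             z0=2.0, radius=0.5, seed=0):
    """Euler-Maruyama for dZ = -Z dt + s_eps dB (i.e. phi(z) = z^2/2 in (1.3)),
    returning the mass the empirical measure mu_t places on (-radius, radius)."""
    rng = np.random.default_rng(seed)
    n_steps = int(t_horizon / dt)
    z = z0
    hits = 0
    for _ in range(n_steps):
        # one Euler-Maruyama step of the small noise gradient flow
        z += -z * dt + s_eps * np.sqrt(dt) * rng.standard_normal()
        hits += abs(z) < radius
    return hits / n_steps
```

Smaller noise levels (with a long horizon) push the reported mass toward 1, consistent with the convergence of the empirical measure to δ_0.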
This feature of nondegeneracy is the main point of departure in the current work: instead of empirical measures converging to the stationary distribution of an ergodic nondegenerate diffusion, in the current setting these measures converge to a point mass given by the fixed point of the noiseless ODE in (1.4). The usual methods of studying empirical measure large deviations for Markov processes exploit nondegeneracy in the dynamics by considering relative entropies of near optimal measures with respect to the stationary distribution of the given diffusion. However these methods are not applicable here, as typical measures of interest in our setting will be mutually singular, and therefore one needs new tools. We also remark that if, on the right side of (1.2), one naively replaces the second order operator with the limiting first order operator associated with the diffusion in (1.3), namely L_0 g := −∇ϕ · ∇g, the maximization in (1.2) gives +∞ for any µ with compact support, and so even a candidate for the rate function for µ^ε is not immediate from (1.2) in this degenerate setting.
Our first main result (Theorem 2.2) shows that {µ^ε} satisfies a LDP and, in the setting of (1.3), identifies the rate function as

I(µ) = (1/2) ∫_{R^d} ∥∇ϕ(y)∥² µ(dy), µ ∈ P(R^d). (1.6)

We note that unlike the rate function I_Z associated with the ergodic diffusion Z, given in (1.2), which is described through a variational formula, the rate function in this small noise setting takes a surprisingly explicit form. The precise result we establish allows for a somewhat more general drift function and a state-dependent diffusion coefficient. The conditions on the coefficients and the form of the rate function in this more general setting are given in Section 2. The above result gives a LDP when simultaneously time becomes large and the noise intensity becomes small. A similar theme has recently been considered in [13], where, motivated by the problem of design of Monte-Carlo schemes, certain large deviation estimates have been established for suitable integrals in the specific case where s(ε) = (log(1/ε))^{−1/2}. In this case the relevant techniques are those based on the Freidlin-Wentzell theory of quasipotentials of small noise diffusions [16]. Note that in our result we do not make any assumptions on how s(ε) approaches 0. Furthermore, the paper [13] does not give a LDP for the empirical measure µ^ε. The second focus of this work is the study of asymptotic behavior of fast-slow diffusions when both slow and fast components have small noise on their natural time scales. The precise model of interest is described by an (m + d)-dimensional diffusion (X^ε, Y^ε) given as follows.
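To see why an explicit form is plausible, one can evaluate the rate function at Dirac measures; the following sketch assumes the quadratic-cost form established in Section 4 (with σ = Id) and is meant only as orientation.

```latex
% Illustration (assuming the explicit quadratic-cost form from Section 4,
% I_1(\mu)=\tfrac12\int\|\sigma^{T}\nabla\phi\|^{2}\,d\mu, with \sigma=\mathrm{Id}):
\[
I(\delta_y)=\tfrac12\,\|\nabla\phi(y)\|^{2},\qquad y\in\mathbb{R}^{d},
\]
% so for \phi(z)=\|z\|^2/2 the cost of concentrating the empirical measure at
% y \ne 0 is \|y\|^2/2. Moreover, for a general \mu,
\[
I(\mu)=\tfrac12\int_{\mathbb{R}^{d}}\|\nabla\phi(y)\|^{2}\,\mu(dy)
=\int I(\delta_y)\,\mu(dy),
\]
% i.e. the rate function is affine in \mu: the cost of a mixture is the
% mixture of the costs of its components.
```

This affine structure, where mutually singular measures are charged additively, is a structural feature typically not shared by the Donsker-Varadhan rate (1.2) of a nondegenerate ergodic diffusion.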
dX^ε(t) = b(X^ε(t), Y^ε(t)) dt + s(ε)√ε α(X^ε(t)) dW(t), X^ε(0) = x_0, 0 ≤ t ≤ T,
dY^ε(t) = −(1/ε) ∇_y U(X^ε(t), Y^ε(t)) dt + (s(ε)/√ε) dB(t), Y^ε(0) = y_0, 0 ≤ t ≤ T. (1.7)

Here, T ∈ (0, ∞) is some fixed time horizon, b : R^{m+d} → R^m, α : R^m → R^{m×k}, U : R^{m+d} → R are suitable coefficient functions, s(ε) is as before, and W, B are k- and d-dimensional mutually independent Brownian motions, respectively. Note that the natural time scale for Y^ε is O(ε) while that of X^ε is O(1). On these natural time scales the noise variances of the two processes are O(s²(ε)) and O(εs²(ε)) respectively, both of which converge to 0 as ε → 0. In contrast to the setting considered here, when s(ε) = 1 the above multiscale system falls within the framework of (nondegenerate) stochastic averaging principles, for which the associated large deviations theory has been well developed (cf. [16,22,21,12,5]). Under appropriate conditions on the coefficient functions, these large deviations results give probabilities of deviations of the trajectory X^ε, regarded as a random variable in the space C([0, T] : R^m) (the space of continuous functions from [0, T] to R^m equipped with the usual uniform convergence topology), from its law of large numbers limit X⁰, given as the solution of the ODE

Ẋ⁰(t) = ∫_{R^d} b(X⁰(t), y) µ_{X⁰(t)}(dy), X⁰(0) = x_0,

where, for x ∈ R^m, µ_x is the unique stationary distribution of the diffusion dY^x(t) = −∇_y U(x, Y^x(t)) dt + dB(t).
The main insight that emerges from this LLN behavior is that the slow process, over the time scales at which the fast process equilibrates towards its stationary distribution, stays approximately unchanged and its limit is governed by a parametrized family of stationary distributions associated with the fast diffusion where each stationary distribution corresponds to the local equilibrium of the fast process for a given value of the state of the slow process.
In the setting of the current work (s(ε) → 0) the fast process on its natural time scale is driven by a small noise, and thus in the scaling regime we consider, the asymptotics of the slow process (under suitable conditions) are governed by the family of equilibria of the parametrized family of ODE

ẏ(t) = −∇_y U(x, y(t)), x ∈ R^m.

Under our assumptions, for each x ∈ R^m the above ODE has a unique equilibrium point y(x) ∈ R^d, and the LLN limit of X^ε defined in (1.7) is given by

Ẋ⁰(t) = b(X⁰(t), y(X⁰(t))), X⁰(0) = x_0.

The goal of this work is to study the behavior of probabilities of large deviations of the process X^ε from its LLN limit X⁰. The key challenge in the degenerate setting considered here is that, unlike the case s(ε) = 1 where the local equilibria are mutually absolutely continuous, when s(ε) → 0 as ε → 0 the family of equilibria are mutually singular (except in the trivial case when they coincide). Once again the proof techniques used in the nondegenerate setting are not applicable here and different ideas are needed. In Theorem 2.7 we establish a large deviation principle for X^ε under appropriate conditions on the coefficient functions. Our results in fact give a stronger statement which provides a LDP for the pair (X^ε, Λ^ε), where the first coordinate takes values in C([0, T] : R^m) and the second in the space M_1 of measures on R^d × [0, T] whose second marginal is the Lebesgue measure on [0, T], equipped with the weak convergence topology, and Λ^ε is a M_1-valued random variable defined as

Λ^ε(dy ds) := δ_{Y^ε(s)}(dy) ds. (1.8)

The precise rate function governing the LDP can be found in Section 2.2 (see (2.5)), but we note here that in the special case where m = k and α is the identity matrix, the rate function takes the simple explicit form

I(ξ, µ) = (1/2) ∫_0^T ∫_{R^d} ∥∇_y U(ξ(s), y)∥² µ(s, dy) ds + (1/2) ∫_0^T ∥ξ̇(s) − ∫_{R^d} b(ξ(s), y) µ(s, dy)∥² ds,

where µ(dy ds) = µ(s, dy) ds. Roughly, the second term in the rate function arises from the large deviations of the Brownian motion W whereas the first term captures the deviations of the fast process from the collection of its local equilibria. More precisely, the first term can be interpreted as the instantaneous cost associated with the deviations of a set of points described by the measure µ(s, dy), for each time instant s, from its equilibrium point y(ξ(s)).
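A toy choice of the potential U makes the degenerate averaging picture concrete; the quadratic example below is ours, chosen purely for illustration.

```latex
% Toy example (illustrative choice of U, not from the paper): a potential
% that is quadratic in the fast variable,
\[
U(x,y)=\tfrac12\,\|y-m(x)\|^{2}
\quad\Longrightarrow\quad
\dot y = -\nabla_y U(x,y) = -(y-m(x)),
\]
% so for each frozen x the noiseless fast dynamics has the unique, globally
% asymptotically stable equilibrium y(x)=m(x), and the LLN dynamics of the
% slow component reads
\[
\dot X^{0}(t) = b\big(X^{0}(t),\, m(X^{0}(t))\big),\qquad X^{0}(0)=x_{0}.
\]
% When s(\varepsilon)=1 the local equilibria would instead be the Gaussian
% measures \mu_x = N(m(x), \tfrac12\,\mathrm{Id}), which are mutually
% absolutely continuous; their small-noise limits \delta_{m(x)} are mutually
% singular as soon as m(x) \neq m(x').
```

The contrast between the Gaussian local equilibria and their Dirac limits is exactly the degeneracy that forces the new proof techniques discussed below.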
Once again the form of the rate function has striking differences from that in the nondegenerate setting (cf. [16,22,12]). We remark that the proof of a LDP for µ^ε defined in (1.5) is a simpler analogue of the proof of Theorem 2.7 and, in the setting of the Introduction, can in fact be deduced from it. However we present these results separately for two reasons. First, the basic idea of the proof (particularly of the LDP lower bound) is significantly simpler and clearer to see in the setting of Theorem 2.2, and it sets the general framework for the more involved setting of Theorem 2.7. Second, Theorem 2.2 treats a more general setting than the one discussed in the Introduction, and for this reason it cannot be immediately deduced from Theorem 2.7.
We now make comments on proof ideas.

Proof Strategy.
The starting point for the proofs of both Theorems 2.2 and 2.7 is a variational formula for moments of nonnegative functionals of finite dimensional Brownian motions due to Boué and Dupuis [3] (see Theorem 3.1). Using this formula, the basic problem of large deviations reduces to establishing convergence of costs associated with certain stochastic control problems to those associated with suitable deterministic optimization problems. This convergence is shown by establishing a complementary pair of asymptotic inequalities between the costs, one giving the large deviation upper bound (see (4.1) and (5.1)) and the other the large deviation lower bound (see (4.2) and (5.2)). The proof of the upper bound proceeds by weak convergence arguments that also reveal the precise form of the large deviations rate function. This form emerges from a key orthogonality property (see (4.15) for Theorem 2.2 and (5.16) for Theorem 2.7) that is behind the inequalities in Lemmas 4.4 and 5.4, which in turn give the large deviation upper bound. Proofs of the lower bounds are somewhat long and involved and require several approximating constructions. We only comment on the arguments for Theorem 2.7, as those used for Theorem 2.2 are simpler analogues. The basic approach in the proof of the lower bound is the construction of near optimal paths ξ* and occupation measures ν* of a simple form for the deterministic optimization problem on the right side of (5.2) (see Lemma 5.5). This is then used to construct suitable controls and controlled processes for the prelimit stochastic system which appropriately converge to the chosen near optimum. In doing so one needs to ensure that the corresponding prelimit occupation measure Λ̄^ε in (5.5) charges the asymptotically correct periods of time in the correct regions of the state space of the fast process that are dictated by the near optimum occupation measure ν*.
One also needs to ensure that the cost incurred in doing so is suitably close to the cost associated with the near optimum (ξ*, ν*). In achieving these dual goals one needs to design suitable controls that appropriately modify the dynamics of the fast process so that the local equilibria of the process are sufficiently close to ν*(s, dy) at all time instants s. This construction, which is given in Section 5.3, is at the heart of the lower bound proof. The idea is for the control to move the state process from one point in the support of ν*(s, dy) to the next in a very small amount of time with negligible cost, and then keep the process near this latter point for the correct amount of time as dictated by ν*(s, dy) while incurring the optimum amount of cost. This basic idea takes a somewhat simpler form in the proof of Theorem 2.2, and we refer the reader to Section 4.3 for a more detailed outline of the strategy for this setting. The rest of the paper is organized as follows. Section 2 presents our two main results, and Section 3 recalls the variational formula that is the starting point of our proofs. Sections 4 and 5 contain the proofs of Theorems 2.2 and 2.7 respectively; the organization of these proofs is summarized at the beginning of the corresponding sections. We close this section by summarizing the basic notation and terminology used.

Notation and Terminology.
The following notation will be used. For a Polish space X, we will denote the space of continuous, real-valued functions on X by C(X), and by C_c(X) (resp. C_b(X)) the subset of C(X) consisting of functions with compact support (resp. that are bounded). We say a function f : R^d → R is C^k if it is k times continuously differentiable; such a function is said to be in C^k_b if the function and all its derivatives up to the k-th order are bounded. We denote by C^∞(R^d) the space of infinitely differentiable functions from R^d to R. C([0, T] : R^d) will denote the space of continuous functions from [0, T] to R^d, which will be equipped with the usual uniform topology induced by the sup-norm. For f : R^{m+d} → R, Hf will denote the Hessian matrix of f with respect to all variables, and for (x, y) ∈ R^m × R^d, H_x f(x, y) will denote the m × m Hessian matrix of f with respect to x ∈ R^m; H_y f is defined analogously. Similarly, ∇f denotes the (m+d)-dimensional vector that is the gradient of f, ∇_x f the m-dimensional vector that is the gradient of f with respect to the variables in x, and ∇_y f is defined similarly. For a matrix a, we denote its transpose by a^T and its trace (when meaningful) by tr(a). Id will denote the identity matrix, with dimension clear from the context. We denote by B(X) the Borel σ-field on X. P(X) will denote the space of probability measures on (X, B(X)), equipped with the topology of weak convergence. This topology can be metrized using the bounded-Lipschitz distance, defined for µ, ν ∈ P(X) as

d_bl(µ, ν) := sup_{f ∈ BL_1(X)} |∫_X f dµ − ∫_X f dν|,

where BL_1(X) is the space of all Lipschitz functions from X to R that are bounded by 1 and have Lipschitz constant bounded by 1. For x ∈ X, δ_x will denote the Dirac probability measure concentrated at the point x. For X-valued random variables X_n, X, we denote the convergence in distribution (resp. in probability) of X_n to X as X_n ⇒ X (resp. X_n →_P X). BM(X) will denote the set of bounded and measurable real-valued functions on X.
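As a quick illustration of the bounded-Lipschitz distance (a standard computation, included for orientation): for Dirac measures on a metric space (X, d),

```latex
% For Dirac measures the bounded-Lipschitz distance has the closed form
\[
d_{\mathrm{bl}}(\delta_x,\delta_y)
=\sup_{f\in \mathrm{BL}_1(\mathcal{X})}|f(x)-f(y)|
=\min\{\,d(x,y),\,2\,\},
\]
% since |f(x)-f(y)| \le \min\{d(x,y), 2\} for every f \in BL_1(X) (the
% Lipschitz bound and the sup bound respectively), and the supremum is
% attained by the admissible test function
\[
f(z)=\min\{\,d(z,y),\,2\,\}-1 .
\]
```

In particular Dirac measures at distinct points stay uniformly apart in d_bl, which is one way to see the mutual singularity issues emphasized in the Introduction.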
For a bounded R^d-valued function f on X, we denote ∥f∥_∞ := sup_{x∈X} ∥f(x)∥. A function I : X → [0, ∞] is called a rate function (on X) if it has compact level sets, i.e. for each M ∈ (0, ∞) the level set {x ∈ X : I(x) ≤ M} is a compact subset of X. As a convention, the infimum over an empty set is taken to be ∞. We will consider collections indexed by a positive parameter ε, and by convention ε will always take values in (0, 1).
A collection of X-valued stochastic processes {X^ε} is said to satisfy the Laplace principle on X with rate function I and speed α(ε) if for all F ∈ C_b(X),

lim_{ε→0} −α(ε)^{−1} log E[exp(−α(ε) F(X^ε))] = inf_{x ∈ X} {F(x) + I(x)}.

We say the Laplace upper (resp. lower) bound holds if the left side is bounded below (resp. above) by the right side. We recall that the collection {X^ε} satisfies the Large Deviation Principle on X with rate function I and speed α(ε) if and only if it satisfies the Laplace principle.

Main Results
In this section we present our two main results. The first result concerns the LDP for the empirical measure of certain small noise diffusions while the second result studies large deviations for a class of slow-fast system of diffusions with vanishing noise. The results are described below in Sections 2.1 and 2.2 respectively.

Empirical measure for Small Noise Diffusions.
We will consider a somewhat more general setting than the one considered in the Introduction, and after stating the main result we remark on how the model considered in the Introduction is covered by this result. The collection of diffusions we study takes the form

dY^ε(t) = −(1/ε) ψ(Y^ε(t)) dt + (s(ε)/√ε) σ(Y^ε(t)) dB(t), Y^ε(0) = y_0, 0 ≤ t ≤ 1, (2.1)

where B is an r-dimensional {F_t}_{0≤t≤1} standard Brownian motion given on some filtered probability space (Ω, F, {F_t}, P) satisfying the usual conditions, y_0 ∈ R^d, and s(ε) → 0 as ε → 0. Throughout, without loss of generality, we assume that s(ε) ∈ (0, 1). We will make the following assumptions on the coefficient functions ψ and σ.

[Drift Coefficient]
There is a C² function ϕ : R^d → R such that (a) ψ(y) = a(y)∇ϕ(y) for all y ∈ R^d, (c) ∇ϕ(y) = 0 if and only if y = 0, and the ODE ξ̇ = −∇V_x(ξ) has 0 as the unique fixed point, which is globally asymptotically stable.
The following is our first main result.
Theorem 2.2. The function I_1 : P(R^d) → [0, ∞] defined as

I_1(µ) := (1/2) ∫_{R^d} ∥σ^T(y)∇ϕ(y)∥² µ(dy), µ ∈ P(R^d), (2.2)

is a rate function. Additionally, the collection {µ^ε} of P(R^d)-valued random variables defined as

µ^ε := ∫_0^1 δ_{Y^ε(t)} dt

satisfies a LDP on P(R^d) with rate function I_1 and speed (εs²(ε))^{−1}.

Multiscale System of Diffusions with Vanishing Noise.
In this section we present our main result for the multiscale system (X^ε, Y^ε) introduced in (1.7). We begin with our main assumptions on the coefficients.

1. [Coefficients of the slow component] b and α are Lipschitz: there exist some L_b, L_α ∈ (0, ∞) such that for all x, x′ ∈ R^m and y, y′ ∈ R^d, ∥b(x, y) − b(x′, y′)∥ ≤ L_b(∥x − x′∥ + ∥y − y′∥) and ∥α(x) − α(x′)∥ ≤ L_α∥x − x′∥. Furthermore α is a bounded function.

2.
[Coefficients of the fast component] U : R^{m+d} → R is a C² function such that the following hold: (c) [Lower bound on U and its y-gradient] there exist constants L^1_low, L^2_low ∈ (0, ∞) for which the corresponding lower bounds hold; moreover, the associated noiseless ODE for the fast variable has 0 as the unique fixed point, which is globally asymptotically stable.
Recall the space M_1 introduced above (1.8), which is equipped with the weak convergence topology. Once more we will metrize it using the bounded-Lipschitz distance as in (1.9). Also recall the collection of M_1-valued random variables {Λ^ε} defined in (1.8). We now introduce the rate function associated with a LDP for (X^ε, Λ^ε). Let X := C([0, T] : R^m). Note that any ν ∈ M_1 can be disintegrated as ν(dy ds) = ν̄_s(dy) ds, where s ↦ ν̄_s is a measurable map from [0, T] to P(R^d).
Now, for (ξ, ν) ∈ X × M_1, let U(ξ, ν) be the class of all v ∈ L²([0, T] : R^k) such that ξ solves

ξ(t) = x_0 + ∫_0^t ∫_{R^d} b(ξ(s), y) ν̄_s(dy) ds + ∫_0^t α(ξ(s)) v(s) ds, 0 ≤ t ≤ T.

Define

I_2(ξ, ν) := inf_{v ∈ U(ξ,ν)} { (1/2) ∫_0^T ∥v(s)∥² ds + (1/2) ∫_0^T ∫_{R^d} ∥∇_y U(ξ(s), y)∥² ν̄_s(dy) ds }. (2.5)

The following is the second main result of this work.

A Variational Formula.
In this section we recall a basic variational formula for exponential moments of functionals of finite dimensional Brownian motions that was established in [3]. In the form stated below, the result can be found in [4, Theorem 8.3]. Let (Ω, F, P, {F_t}_{0≤t≤T}) be a filtered probability space where the filtration {F_t}_{0≤t≤T} satisfies the usual conditions. For p ∈ N, denote by A_p the collection of all {F_t}-progressively measurable, R^p-valued stochastic processes {u(t)}_{0≤t≤T} that satisfy E ∫_0^T ∥u(s)∥² ds < ∞. Also, for M ∈ (0, ∞), we denote by A^p_{b,M} the collection of u ∈ A_p satisfying ∫_0^T ∥u(s)∥² ds ≤ M a.s., set A^p_b := ∪_{M ∈ N} A^p_{b,M}, and let S^p_M := {f ∈ L²([0, T] : R^p) : ∫_0^T ∥f(s)∥² ds ≤ M}. The space S^p_M will be equipped with the inherited weak topology on L²([0, T] : R^p). When clear from the context, we will drop p from the notation in A_p, A^p_{b,M}, A^p_b, and S^p_M. Let β be a p-dimensional, standard, {F_t}-Brownian motion on this filtered probability space.

Theorem 3.1 ([3]; see also [4, Theorem 8.3]). For any bounded measurable G : C([0, T] : R^p) → R,

−log E[exp(−G(β))] = inf_{u ∈ A_p} E[ (1/2) ∫_0^T ∥u(s)∥² ds + G(β + ∫_0^· u(s) ds) ].
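The content of the variational formula can be checked numerically in the simplest case of a linear functional of the terminal value, G(β) = c β(T), for which −log E[exp(−G(β))] = −c²T/2 and the infimum is attained at the constant deterministic control u ≡ −c. The sketch below (constants, sample sizes and discretization are ours, chosen for illustration) compares a Monte Carlo estimate of the left side with a grid search over constant controls.

```python
import numpy as np

def mc_log_laplace(c=0.5, T=1.0, n_samples=400_000, seed=1):
    """Monte Carlo estimate of -log E[exp(-G(beta))] for G(beta) = c * beta(T),
    using beta(T) ~ N(0, T); the exact value is -c^2 * T / 2."""
    rng = np.random.default_rng(seed)
    beta_T = np.sqrt(T) * rng.standard_normal(n_samples)
    return -np.log(np.mean(np.exp(-c * beta_T)))

def variational_value(c=0.5, T=1.0):
    """Right side of the variational formula restricted to constant
    deterministic controls u: the cost is T*u^2/2 and, since E[beta(T)] = 0,
    E[G(beta + int_0^. u ds)] = c*u*T, so we minimize T*(u^2/2 + c*u)."""
    us = np.linspace(-3.0, 3.0, 2001)
    return float(np.min(T * (us ** 2 / 2.0 + c * us)))
```

Both quantities should be close to −c²T/2 = −0.125 for the default parameters, illustrating that restricting to simple controls already achieves the infimum in this Gaussian example.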
The above theorem will be used in the proofs of both Theorems 2.2 and 2.7. In the first case, T = 1 and the role of β will be played by the Brownian motion B (in particular p = r), while in the second case β = (W, B) and p = k + d.

Proof of Theorem 2.2.
In order to prove the theorem, we will first show in Section 4.1 the LDP upper bound, which in terms of Laplace asymptotics corresponds to the statement: for every F ∈ C_b(P(R^d)),

lim inf_{ε→0} −εs²(ε) log E[exp(−(εs²(ε))^{−1} F(µ^ε))] ≥ inf_{µ ∈ P(R^d)} {F(µ) + I_1(µ)}. (4.1)

Then, in Section 4.3 we will prove the complementary lower bound: for every F ∈ C_b(P(R^d)),

lim sup_{ε→0} −εs²(ε) log E[exp(−(εs²(ε))^{−1} F(µ^ε))] ≤ inf_{µ ∈ P(R^d)} {F(µ) + I_1(µ)}. (4.2)

Finally, in Section 4.4 we show that the function I_1 has compact level sets. Together these three results will complete the proof of Theorem 2.2. Assumption 2.1 will be taken to hold throughout this section.

LDP Upper Bound
In this section, we prove the inequality in (4.1). Fix F ∈ C_b(P(R^d)). Note that the coefficients ψ and σ are Lipschitz maps and thus the SDE in (2.1) has a unique pathwise solution. This says that there exists a measurable map G̃^ε : C([0, 1] : R^r) → P(R^d) such that µ^ε = G̃^ε(B). Fix ε > 0 and apply Theorem 3.1 with p = r, β = B, and G replaced by (εs²(ε))^{−1} F(G̃^ε(·)); this gives the variational representation (4.3). Fix δ > 0 and choose for each ε > 0 a ṽ^ε ∈ A_b that is δ-optimal for the right side, so that the inequality (4.5) holds for all ε > 0. Since F is bounded, by a standard localization argument (see [4, Theorem 3.17]) it follows that there is an M ∈ (0, ∞) and, for each ε > 0, a v^ε ∈ A_{b,M} with the same near optimality property. Also, by an application of Girsanov's theorem, it is easy to see that the corresponding controlled process Ȳ^ε solves the controlled SDE (4.4). We begin with the following moment estimate.
Proof. We note that from Assumption 2.1 parts 2 and 3, ψ = ∇V_0. Using this fact and applying Itô's lemma to V_0(Ȳ^ε(t)), we obtain, for 0 ≤ t ≤ 1, the expansion in (4.6). Let, for m ∈ N, τ_m := inf{t : ∥Ȳ^ε(t)∥ ≥ m}. Taking expectations and rearranging terms, we obtain a bound in which we have used the nonnegativity of V_0. With c_1(·), c_2(·) as in Assumption 2.1 (part 4), we can estimate the drift term. Using the linear growth of ∇V_0 (which follows from ∥HV_0∥_∞ < ∞) and Young's inequality, we can find κ_1 ∈ (0, ∞) such that the resulting estimate holds for all ε > 0. Then, by sending m → ∞ in the previous display and recalling that v^ε ∈ A_{b,M}, we obtain the claimed bound. This proves the first statement in the lemma.
Finally, using Assumption 2.1 (part 4) in (4.6) again and taking expectations, we have, for 0 ≤ t ≤ 1, a bound in which we have used the observation that the expected value of the stochastic integral in (4.6) is 0, in view of (4.8) and the linear growth of ∇V_0. The second statement in the lemma is now immediate from (4.7) and (4.8), on recalling that v^ε ∈ A_{b,M}.
We now introduce certain occupation measures which will play an important role in the proof of the upper bound. For ε > 0, define a P(R^{d+r})-valued random variable Q̄^ε as

Q̄^ε(dy dz) := ∫_0^1 δ_{Ȳ^ε(t)}(dy) δ_{v^ε(t)}(dz) dt.

The following proposition gives the tightness of the collection {Q̄^ε}.
The estimate in (4.11) now follows from Lemma 4.1 and on using the fact that v ε ∈ A b,M for every ε ∈ (0, 1).
The next step will be to give a suitable characterization of the weak limit points of Q̄^ε. To this end we present the following approximation lemma, which will also be used in the proof of Theorem 2.7.
Proof. Define for r ∈ N, g̃_r : R → R as the piecewise smooth function displayed below. We now suitably mollify this function. Let ξ : R → R be defined as ξ(u) := c exp(−1/(1 − u²)) for |u| < 1 and ξ(u) := 0 otherwise, where c is a normalization constant that makes ξ a probability density. Defining the mollified functions accordingly, one checks the required approximation properties for all r. The result follows.
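The mollifier used here can be sketched numerically; the following implements the standard bump density ξ(u) = c exp(−1/(1−u²)) on (−1, 1) (the grid resolution used to compute the normalization is an arbitrary choice).

```python
import numpy as np

def bump(u):
    """Standard bump: exp(-1/(1 - u^2)) on (-1, 1), extended by 0 outside."""
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    inside = np.abs(u) < 1
    out[inside] = np.exp(-1.0 / (1.0 - u[inside] ** 2))
    return out

# Normalization constant c making xi(u) = c * bump(u) a probability density,
# computed by a Riemann sum on a fine grid (resolution is arbitrary).
grid = np.linspace(-1.0, 1.0, 200_001)
du = grid[1] - grid[0]
c = 1.0 / (bump(grid).sum() * du)
```

The function is smooth, compactly supported, and vanishes with all derivatives at ±1, which is what makes it suitable for smoothing the piecewise smooth truncations in the proof.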
We now proceed with obtaining a characterization for the weak limit points of {Q ε }. Recall from Proposition 4.2 that this collection is tight.
Lemma 4.4. Let Q̄ be a weak limit point of {Q̄^ε}. Then, a.s., the identity displayed below holds. Multiplying the above equation by ε, sending ε to 0, and using the fact that η, ∇η and Hη are bounded (as η has compact support), we obtain the stated limit in probability. We relabel the subsequence along which Q̄^ε ⇒ Q̄ as Q̄^ε. Then, from the square integrability in (4.10) and since η has compact support, we have that the remaining terms vanish as ε → 0. Combining the above two displays we have, a.s., the desired identity, where [Q̄]_1 is the first marginal of Q̄ and q is the regular conditional probability distribution (r.c.p.d.) of the second coordinate given the first. Now define u(y) := ∫_{R^r} z q(y, dz).

Proof of the LDP upper bound.
We now complete the proof of the upper bound, namely of the inequality in (4.1). Denote by [Q̄^ε]_i, i = 1, 2, the two marginals of Q̄^ε on R^d and R^r respectively. Let Q̄ be a weak limit point of {Q̄^ε}. By a usual subsequential argument we can assume that the convergence Q̄^ε ⇒ Q̄ holds along the full sequence. Note that, in the displayed chain of inequalities, the first inequality is from (4.5), the second line uses the definition of Q̄^ε, the third line uses the convergence of Q̄^ε to Q̄, the lower semicontinuity of the L² norm and Fatou's lemma, and the decomposition Q̄(dy dz) = [Q̄]_1(dy) q(y, dz), the fourth line uses Lemma 4.4, and the fifth uses the definition of I_1. Since δ > 0 is arbitrary, the proof of the Laplace upper bound is complete.

Construction of a Stabilizing Control
In proving the lower bound we will need to construct certain controlled versions of (2.1) that stay in the neighborhood of a specified state in R d for a given length of time. The following lemma will be key in such constructions.
where V x (y) is as in Assumption 2.1(3). Then, the following hold.
1. There exists a uniform moment estimate as displayed. Proof. Let x ∈ R^d be as in the statement of the theorem. In the following, dependence of various bounds on x will not be noted explicitly. Let, for y ∈ R^d, σ_x(y) be defined as below. Using Assumption 2.1(4) again and sending m → ∞, we have that for some κ_2 ∈ (0, ∞) the stated bound holds for all ε > 0. This estimate, together with the linear growth of ∇V_x, says in particular that the expected value of the stochastic integral in (4.19) is 0. From (4.19) and Assumption 2.1(4) we also see that for some a_i ∈ (0, ∞), i = 1, 2, 3, 4, and for all t ∈ [0, 1], the displayed inequality holds. By our choice of k_1 we see that (a_3 k_1 − a_4) ≥ 0 and so, from the above display, and using the definitions of κ_1 and k_1, we arrive at a statement which is clearly false. This gives us a contradiction, which proves the claim and completes the proof of part (1) of the proposition. Now we prove part (2) of the proposition. Fix κ ∈ (0, 1) and a collection {x^ε} as in the statement. We argue by contradiction. Suppose that (4.18) is false. Then there exist some δ > 0, κ > 0, a sequence ε_n → 0, and t_n ∈ [κ, 1] such that for every n ≥ 1, with π_n := θ^{ε_n}_{t_n}, E d_bl(π_n, δ_0) ≥ δ. Using (4.20) and our assumption on {x^ε} we see that {π_n} is tight as a sequence of P(R^d)-valued random variables. Then, along a subsequence (labeled again by n), π_n ⇒ π for some P(R^d)-valued random variable π. Along the lines of Lemma 4.4 we now see that (4.24) holds for every η in the relevant class. Indeed, by Itô's formula, we have an identity for all n ∈ N. Rearranging the terms and taking the limit as n → ∞, the right hand side converges to 0 since ε_n, s(ε_n) → 0 and η ∈ C²_b. Also, the left hand side can be rewritten appropriately. Since t_n ∈ [κ, 1] for all n, we have, from the weak convergence of π_n to π, the square integrability estimate in (4.20), and the linear growth of ∇V_x from Assumption 2.1, that, a.s., (4.24) holds. Now, from the global asymptotic stability of the unique fixed point 0 of the ODE ẏ = −∇V_x(y) (Assumption 2.1(3)), we see that π = δ_0.
Indeed, denoting the solution of the above ODE with y(0) = z ∈ R^d by y_z(t), and defining the corresponding quantity as below, we obtain an identity which, on sending t → ∞ and using the global asymptotic stability property, proves the identity π = δ_0. However, since π_n ⇒ π, this contradicts the inequality in (4.23), completing the proof of part (2) of the proposition.

LDP Lower Bound
In this section, we prove the lower bound for the LDP, namely the inequality in (4.2). Fix δ > 0 and let γ* ∈ P(R^d) be δ-optimal for the infimum on the right side of (4.2), namely,

F(γ*) + I_1(γ*) ≤ inf_{µ ∈ P(R^d)} {F(µ) + I_1(µ)} + δ. (4.25)

Denote by P_dis the class of all probability distributions on R^d that are supported on a finite set. Then we can find a µ̄ ∈ P_dis such that

F(µ̄) + I_1(µ̄) ≤ F(γ*) + I_1(γ*) + δ. (4.26)

Indeed, consider an iid sequence of R^d-valued random variables ξ_1, ξ_2, . . . distributed as γ* on some probability space (Ω̄, F̄, P̄). From the Glivenko-Cantelli theorem and the strong law of large numbers, with µ_n := (1/n) Σ_{i=1}^n δ_{ξ_i}, a.s., µ_n → γ* weakly and I_1(µ_n) → I_1(γ*). Now fix an ω̄ in the set of full measure on which the above two convergence results hold. The statement in (4.26) then follows for µ̄ = µ_n(ω̄) with n sufficiently large, and gives

F(µ̄) + I_1(µ̄) ≤ inf_{µ ∈ P(R^d)} {F(µ) + I_1(µ)} + 2δ. (4.27)

In order to prove the lower bound in (4.2) we will once more use the variational representation in (4.3). With this variational representation, the proof reduces to the construction of a suitable sequence of controls v^ε and controlled empirical measures µ̄^ε such that the associated costs are asymptotically close, as ε → 0, to I_1(µ̄). For this asymptotic behavior, the controls should keep the empirical measure µ̄^ε = ∫_0^1 δ_{Ȳ^ε(s)} ds asymptotically close to the discrete measure µ̄ and keep the associated cost (1/2) ∫_0^1 ∥v^ε(s)∥² ds asymptotically close to (1/2) ∫_{R^d} ∥σ^T(y)∇ϕ(y)∥² µ̄(dy). Our strategy will be to construct controls that make the stochastic process Ȳ^ε visit sequentially the k points x_1, . . . , x_k in the support of µ̄ and spend approximately p_i units of time in the vicinity of x_i while incurring a control cost of approximately (p_i/2) ∥σ^T(x_i)∇ϕ(x_i)∥². More precisely, the control we will construct will have the following features: • In a short time and with negligible cost the process Ȳ^ε(t) travels to a neighbourhood of x_1.
• For approximately p_1 units of time the state is controlled to stay in the vicinity of x_1 while paying a total cost that is approximately (p_1/2) ∥σ^T(x_1)∇ϕ(x_1)∥². • The process is then moved to x_2 in a short time, while expending a negligible control cost. Then, it is controlled to stay in the vicinity of x_2 for p_2 units of time while paying a control cost of approximately (p_2/2) ∥σ^T(x_2)∇ϕ(x_2)∥². • This is continued until we finish with all the k positions.
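Heuristically, the per-phase holding cost quoted above comes from the control having to cancel the drift at x_i; the following back-of-the-envelope computation (ignoring the time and noise normalizations, and assuming, as is natural here, a = σσ^T) indicates why the minimal cost rate is ½∥σ^T(x_i)∇ϕ(x_i)∥².

```latex
% To freeze the controlled state near x_i the control v must cancel the
% drift \psi(x_i) = a(x_i)\nabla\phi(x_i) through the diffusion matrix,
% i.e. \sigma(x_i)\,v = \psi(x_i). The candidate
\[
v^{*} \;=\; \sigma^{T}(x_i)\,\nabla\phi(x_i)
\quad\text{satisfies}\quad
\sigma(x_i)\,v^{*} \;=\; \sigma\sigma^{T}(x_i)\,\nabla\phi(x_i) \;=\; \psi(x_i),
\]
% and, lying in \mathrm{Range}(\sigma^{T}) = \mathrm{Ker}(\sigma)^{\perp},
% it is the minimal-norm such solution. Holding the state near x_i for
% p_i units of time therefore incurs the running cost
\[
\tfrac12\int \|v^{*}\|^{2}\,ds
\;\approx\; \tfrac{p_i}{2}\,\big\|\sigma^{T}(x_i)\nabla\phi(x_i)\big\|^{2}.
\]
```

Summing over the k positions recovers the target cost (1/2) ∫ ∥σ^T(y)∇ϕ(y)∥² µ̄(dy) for the discrete measure µ̄.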
(A table summarizes the successive phases of the construction: the state of Ȳ^ε, the corresponding time interval, and the control process v^ε(s).) The controlled state process is then given by the equation (4.4) with the control process v^ε defined in state feedback form as in (4.29). From the Lipschitz property of ∇ϕ and Λ, and the boundedness and Lipschitz property of σ^T, we see that the above state feedback control is well defined and the corresponding SDE in (4.4) has a unique solution. For the rest of the section, Ȳ^ε will denote the solution of this SDE with the above feedback control.
The following lemma gives a key moment bound.
Proof. We will show via induction that the stated bound holds for i = 0, 1, . . . , k. Clearly the result is true for i = 0. Suppose now that the result is true for some i ∈ {0, 1, . . . , k − 1} and consider i + 1. Then, for t ∈ [P_i, P_{i+1}], the dynamics take the displayed form. Thus, with c as in (4.28), we can estimate the corresponding terms. Using the induction hypothesis we now have the displayed bound, where σ_x(y) is as introduced in the proof of Lemma 4.5 and B̄_i(s) is defined accordingly. Combining the above with (4.30) we now have that sup_{0≤s≤P_{i+1}} sup_{ε∈(0,1)} E∥Ȳ^ε(s)∥² < ∞. The proof is complete by induction.
The next lemma shows that the control process moves the state process to the vicinity of x j+1 in the time interval from P j to P j + ε.
The following lemma shows that the empirical measure ofȲ ε (t) over the interval [P i + ε, P i+1 ] is concentrated near x i+1 .
where the last line follows on recalling the definition of the process Ỹ^ε_{i+1} from (4.31). The expectation on the last line can be rewritten in terms of the process Y^ε(s, x_{i+1}, x) introduced in Lemma 4.5.
From Lemma 4.5, the convergence holds uniformly over initial conditions in any compact set K ⊂ R^d. From the tightness of {Ỹ^ε_{i+1}(0)}, which follows from Lemma 4.7, we now obtain the desired convergence. The result follows.
The following lemma shows that the asymptotic cost under the control v^ε defined in (4.29) is as desired.
Proposition 4.9. We have the lim sup bound stated above. Proof. Note that the cost decomposes over the successive phases of the control. We first argue that the cost associated with travel from x_i to x_{i+1} is negligible, i.e.,
We now show the convergence of the empirical measure associated with Ȳ^ε. Recall μ̂ introduced above (4.26). Proof. Note that one can write μ̂^ε as

It then follows that
The result now follows on sending ε → 0 and using Lemma 4.8.

Proof of the LDP lower bound.
We now complete the proof of (4.2). We will apply Theorem 3.1 with R = A. By an argument similar to that in Section 4.1, the variational bound holds with μ̂^ε as introduced in Proposition 4.10 and v^ε as constructed in (4.29). Thus, the displayed chain of (in)equalities follows, where the first equality is from Propositions 4.9 and 4.10 and the last inequality is from (4.27). Since δ > 0 is arbitrary, the inequality in (4.2) follows.
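Schematically, the chain completing the lower bound is the following (a reconstruction, assuming that (4.3) is a variational representation of Boué-Dupuis type with the normalizations used in this section):

```latex
\limsup_{\varepsilon\to 0}\Bigl(-\varepsilon\log E\bigl[e^{-F(\mu^{\varepsilon})/\varepsilon}\bigr]\Bigr)
 \le \limsup_{\varepsilon\to 0} E\Bigl[F(\hat\mu^{\varepsilon})
   + \tfrac12\int_0^1\|v^{\varepsilon}(s)\|^2\,ds\Bigr]
 = F(\hat\mu) + \tfrac12\int_{\mathbb{R}^d}\|\sigma^T(y)\nabla\phi(y)\|^2\,\hat\mu(dy)
 \le \inf_{\mu\in\mathcal P(\mathbb{R}^d)}\bigl[F(\mu)+I_1(\mu)\bigr] + 2\delta.
```

Here the equality uses Propositions 4.9 and 4.10, and the final inequality uses (4.26)-(4.27) together with the δ-optimality of γ* in (4.25).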

Compactness of Level Sets of I_1
In this section we show that the function I_1 as defined in (2.2) is a rate function. We now argue that {γ_n} is a tight sequence of probability measures on R^d. Fix κ > 0. From Assumption 2.1 (parts 1 and 2(d)) there exists M_1 ∈ (0, ∞) such that, for ∥y∥ > M_1, ∥σ^T(y)∇ϕ(y)∥² is large enough for the mass of γ_n outside {∥y∥ ≤ M_1} to be at most κ. Since κ > 0 is arbitrary, we have that {γ_n} is tight. Thus we can find a subsequence (labeled again as n) along which γ_n → γ̄ for some γ̄ ∈ P(R^d). From Fatou's lemma
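The tightness step is a Markov-inequality computation. Assuming, as the lower-bound analysis suggests, that I_1(γ) = (1/2)∫ ∥σ^T(y)∇ϕ(y)∥² γ(dy) on its domain, it reads:

```latex
\gamma_n\bigl(\{y : \|y\| > M_1\}\bigr)
 \le \frac{\displaystyle\int_{\|y\|>M_1}\|\sigma^T(y)\nabla\phi(y)\|^2\,\gamma_n(dy)}
          {\displaystyle\inf_{\|y\|>M_1}\|\sigma^T(y)\nabla\phi(y)\|^2}
 \le \frac{2\,I_1(\gamma_n)}{2M/\kappa}
 \le \kappa,
```

where M_1 is chosen via Assumption 2.1 so that ∥σ^T(y)∇ϕ(y)∥² ≥ 2M/κ whenever ∥y∥ > M_1, and I_1(γ_n) ≤ M on the level set.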

Proof of Theorem 2.7.
We now turn to the multiscale system introduced in Section 2.2 and prove our second main result, namely Theorem 2.7. As before, the proof proceeds by establishing the associated Laplace asymptotics.
Recall the definitions of the spaces X and M_1 from Section 2.2. Section 5.1 shows that for every F ∈ C_b(X × M_1) the Laplace upper bound (5.1) holds, which gives the LDP upper bound. Then, in Section 5.3.4 we prove the complementary lower bound, which bounds the corresponding lim sup by

inf_{(ξ,ν)∈X×M_1} [F(ξ, ν) + I_2(ξ, ν)]. (5.2)
Finally Section 5.4 proves that I 2 is a rate function. Theorem 2.7 is an immediate consequence of these three results. Throughout this section, Assumption 2.5 will be taken to hold.
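Before turning to the proofs, the degenerate averaging picture can be illustrated numerically. The following Euler-Maruyama sketch simulates a toy slow-fast pair with √ε noise on both components, so that the fast noise also vanishes on its natural time scale; the coefficients b(x, y) = y − x and U(x, y) = (y − x)²/2 are illustrative stand-ins, not the system (1.7) itself. Under the averaging LLN the slow component should remain near its starting point, since at the local equilibrium y ≈ x the drift vanishes.

```python
import numpy as np

def simulate_slow_fast(eps, T=1.0, n=2000, seed=1):
    """Euler-Maruyama discretization of the toy pair
        dX = (Y - X) dt + sqrt(eps) dW,
        dY = -(1/eps)(Y - X) dt + sqrt(eps) dB,
    whose noiseless fast dynamics relax to the local equilibrium y = x."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x, y = 1.0, 2.0
    xs = np.empty(n)
    for i in range(n):
        dw, db = rng.normal(scale=np.sqrt(dt), size=2)
        x_new = x + (y - x) * dt + np.sqrt(eps) * dw
        # fast component relaxes on the O(eps) time scale
        y += -(y - x) / eps * dt + np.sqrt(eps) * db
        x = x_new
        xs[i] = x
    return xs

xs = simulate_slow_fast(eps=0.01)
print(xs[-1])
```

With ε = 0.01 the fast component relaxes on a time scale of order ε, and the slow path stays within roughly the noise scale √ε of its initial value.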

LDP Upper Bound
In this section, we prove the inequality in (5.1). Under Assumption 2.5, the SDE system in (1.7) has a unique pathwise solution (X^ε, Y^ε), given on a filtered probability space (Ω, F, {F_t}, P) satisfying the usual conditions and equipped with mutually independent k- and d-dimensional {F_t}-Brownian motions {W(t)} and {B(t)}. This says that there exists a measurable map Ḡ^ε representing the solution. Fix ε > 0 and F ∈ C_b(X × M_1), and apply Theorem 3.1 with p = k + d, β = (B, W), and G replaced by G^ε = F ∘ Ḡ^ε. Then, as in Section 4.1, we obtain the corresponding variational representation. The classes A_b = A^p_b and A = A^p are as in Section 3. Note that any v ∈ A^{d+k} can be written as v = (v_1, v_2). Fix δ > 0 and choose for each ε > 0 a (v^ε_1, v^ε_2) ∈ A_b that is δ-optimal for the right side. A localization argument similar to that invoked in Section 4.1 shows that there is an M ∈ (0, ∞) such that, for each ε > 0, the controls can be taken with values in a ball of radius M. Also, by an application of Girsanov's theorem, it follows that the corresponding controlled dynamics take the form in (5.4). Denoting (ds × dy × dz) by dv, the first equation in (5.4) can now be rewritten in integrated form. We begin by establishing the following useful moment bound.
For the rest of this subsection, we will assume that ε ∈ (0, ε 0 ) where ε 0 is as in the statement of previous lemma. In the next result we establish the tightness of various objects of interest. Recall the definition of S p M , for p ∈ N, from (3.1).
The tightness of {B̄^ε} in X is immediate from the moment bounds in Lemma 5.1, the linear growth of b, the boundedness of α, and the fact that v^ε_2 ∈ A^r_{b,M}, while the tightness of {Ā^ε} in X follows from the boundedness of α. This proves the tightness of {X̄^ε} in X and completes the proof of the lemma. By Lemma 5.2, every subsequence has a further subsequence along which (Γ̄^ε, X̄^ε, v^ε_2) converges in distribution to (Γ̄, X̄, v_2). We disintegrate the measure Γ̄ as follows: Γ̄(dt dy dz) = (1/T) dt γ̄_t(dy) q(t, y, dz) = γ(dt dy) q(t, y, dz) = (1/T) dt Γ̄_t(dy dz). (5.9) In this identity, γ(dt dy) = Γ̄(dt × dy × R^d) is the marginal distribution of Γ̄ on the first two coordinates and q is the r.c.p.d. of the third coordinate given the first two. Also, since γ(A × R^d) = λ(A)/T for A ∈ B([0, T]), where λ is the Lebesgue measure, the probability measure γ can be disintegrated as γ(dt dy) = (1/T) dt γ̄_t(dy), which gives the first two identities in the display. For the third identity we disintegrate the probability measure Γ̄ into the marginal on the first coordinate (the normalized Lebesgue measure (1/T) dt) and the r.c.p.d. of the last two coordinates given the first, denoted Γ̄_t(dy dz). The following lemma gives a characterization of the limit points (Γ̄, X̄, v_2). Recall the class U(ξ, ν) defined in Section 2.2.
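The disintegration in (5.9) can be made concrete on a finite joint law, where the r.c.p.d. is simply row normalization (a toy computation, not tied to the particular Γ̄ of the proof):

```python
import numpy as np

# A discrete joint law mirroring Gamma(dy dz) = gamma(dy) q(y, dz):
# rows index y, columns index z; entries are probabilities summing to 1.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.05, 0.40]])
gamma = joint.sum(axis=1)          # marginal law of y
q = joint / gamma[:, None]         # conditional law of z given y (r.c.p.d.)

# Recomposing marginal and conditional recovers the joint law exactly.
recomposed = gamma[:, None] * q
print(gamma, q)
```

The same computation, with integrals in place of sums, is what (5.9) expresses for Γ̄.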
We assume without loss of generality that the convergence of (Γ̄^ε, X̄^ε, v^ε_2, R^ε) to (Γ̄, X̄, v_2, 0) in P(M_T) × X × S^k_M × X holds along the full sequence and, by appealing to the Skorohod representation theorem, that the convergence holds a.s. We need to show that (Γ̄, X̄, v_2) satisfies the limiting equation a.e., for all t ∈ [0, T]. Note that X̄^ε satisfies the controlled equation displayed above. Using the Lipschitz property of b and α from Assumption 2.5(1), together with Lemma 5.1 and the uniform bound on v^ε_2, the error terms vanish. Finally, for each t ∈ [0, T], the remaining terms converge as ε → 0. The result now follows on sending ε → 0 in (5.10).
The following lemma gives an important inequality for the costs that will be useful for the proof of the upper bound. Recall the disintegration in (5.9).
As in the proof of Lemma 5.3, we assume without loss of generality that the convergence of (Γ̄^ε, X̄^ε) to (Γ̄, X̄) holds along the full sequence in the a.s. sense. From the Lipschitz property of ∇_y U we now obtain the convergence of the corresponding terms as ε → 0. Finally, from the convergence of Γ̄^ε to Γ̄, the square integrability in (5.11), and the compact support property of η, we see that for all t ∈ [0, T] the remaining term converges. Combining the last three convergence statements we have, a.s., for all t ∈ [0, T],

∫_{M_t} ∇η(y) · (∇_y U(X̄(s), y) − z) Γ̄(dv) = 0. (5.12)

Recall the disintegration in (5.9) and define the quantity in (5.13). Note that the integral is well defined a.s. for γ-a.e. (t, y), since from (5.11) and Fatou's lemma

∫ (∥y∥² + ∥z∥²) Γ̄_t(dy dz) < ∞, (5.14)

and for every η ∈ C²_c the identity (5.15) holds. Fix t ∈ [0, T] for which the above two statements hold and denote f(y) := U(X̄(t), y). Now an argument similar to the one used in the proof of (4.15) shows that we can replace η with f in the above identity. Indeed, from Assumption 2.5 the function f satisfies the conditions in Lemma 4.3. Also, from the properties of U, an estimate similar to (4.16) shows that, for some κ_1 ∈ (0, ∞), the displayed bound holds, where the second inequality on the second line is from Jensen's inequality, while the finiteness asserted in the last inequality is from (5.14). Now, exactly as in (4.17) and the discussion below it, we see that (5.15) holds with η replaced by f(·) = U(X̄(t), ·), namely (5.16). The proof is now completed as in Lemma 4.4: since the third term equals 0 from (5.16), the lemma follows.

Proof of the LDP upper bound.
We now complete the proof of the inequality in (5.1). Recall the weak limit point (Γ̄, X̄, v_2) of (Γ̄^ε, X̄^ε, v^ε_2) from Lemmas 5.3 and 5.4. By a standard subsequential argument we can assume without loss of generality that the convergence holds along the full sequence. From (5.3) (and the observation below it), the displayed chain of inequalities follows, where the third line uses lower semicontinuity of the L² norm and Fatou's lemma, the fourth line uses Lemma 5.4, and the fifth line uses the definition of I_2, the definition of Λ̄, and the property v_2 ∈ U(X̄, Λ̄) a.s. shown in Lemma 5.3. Since δ > 0 is arbitrary, the result follows.

Simple-form near-optimal paths.
In preparation for the proof of the LDP lower bound, we first prove a preliminary result which provides near-optimal paths of a simple form that can then be well approximated by suitable controlled processes.
We now proceed with the proof of the lemma. This proof will be completed at the end of Section 5.2.3 by constructing a series of approximations for a near-optimal (ξ̃, ν̃).
Fix a bounded Lipschitz function F : X × M_1 → R and δ_0 ∈ (0, 1). Choose (ξ̃, ν̃) ∈ X × M_1 such that the near-optimality in the first display holds. Next, using the definition of I_2, choose ṽ ∈ U(ξ̃, ν̃) such that, with ν̃(dy ds) = ν̃_s(dy) ds, the second display holds. Note that ξ̃, ν̃, ṽ satisfy the stated relations. From the lower bound on ∥∇_y U(x, y)∥² in Assumption 2.5(2c) we obtain the stated integrability. We also remark that ν̃_0 can be taken to be an arbitrary probability measure on R^d, and we will assume without loss of generality that it is as stated. We now proceed to our series of approximations.

Approximating with continuous ν, v.
Fix δ ∈ (0, 1); its choice will be identified at the end of this section. Using the uniform continuity of ξ̃, choose 0 < η < δ² such that ∥ξ̃(s) − ξ̃(s′)∥ ≤ δ whenever |s − s′| ≤ η. Here we set ṽ(r) := 0 and ν̃_r := ν̃_0 for r ≤ 0. Note that v*_η and µ*_η are continuous maps on [0, T] with values in R^k and P(R^d), respectively. Let ξ*_η be given as the solution of the displayed equation. Due to the Lipschitz property of b and α, this equation has a unique solution. Also note that ξ̃ can be represented in the displayed form, involving the term α(ξ̃(s))ṽ(s) ds.
This uses an interchange of the order of integration.
From the Lipschitz property of b, combined with its linear growth, we obtain, for some C(b) ∈ (0, ∞) depending only on the function b, the first estimate; a similar calculation gives the second. Thus we obtain a bound valid for all t ∈ [0, T]. Combining this estimate with (5.22) and (5.23), and using the Lipschitz property of b, we have by Grönwall's lemma the desired bound, where κ_2 := κ_1 e^{L_b T + T^{1/2} L_α ∥ṽ∥_2}. Now we estimate the cost. From (5.19) and (5.20), and using the Lipschitz property of ∇_y U, we see that for a C(U) ∈ (0, ∞) depending only on U the displayed bounds hold. From this the cost estimate follows, with κ_3 defined accordingly.
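The continuous approximation of this subsection replaces ṽ by a running average, presumably v*_η(s) = (1/η)∫_{s−η}^{s} ṽ(r) dr with ṽ extended by 0 to the left (the defining display was lost in extraction). By Jensen's inequality such averaging never increases the L² cost, as the following discretized sketch confirms; the grid and the random test function are illustrative:

```python
import numpy as np

def moving_average(v, eta_steps):
    """Causal moving average over the last eta_steps grid points,
    with v extended by 0 to the left of the time origin."""
    padded = np.concatenate([np.zeros(eta_steps - 1), v])
    kernel = np.ones(eta_steps) / eta_steps
    return np.convolve(padded, kernel, mode="valid")

rng = np.random.default_rng(2)
v = rng.normal(size=1000)          # a rough control on a uniform grid
v_eta = moving_average(v, 50)      # its smoothed version

l2 = np.sum(v**2)                  # discretized L^2 cost of v
l2_eta = np.sum(v_eta**2)          # cost after smoothing
print(l2, l2_eta)
```

The averaging operator is a convex combination of shifts, hence an L² contraction, so `l2_eta` never exceeds `l2`.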

Approximating using discrete measures
Finally, we approximate the piecewise constant trajectory of measures µ*_s from the last section by a similar trajectory in which the measures are discrete. Fix δ ∈ (0, 1); an appropriate choice of δ will be identified at the end of the section. Let 0 = t_0 < t_1 < · · · < t_{K+1} = T be the partition over which µ*_s and v*(s) are piecewise constant. Let μ_i := µ*_{t_i}, i = 0, . . . , K. Note that, from (5.19), the stated bound holds for each i. Recall the space P_dis defined in Subsection 4.3. Then, as in (4.26), for each i there is a μ_{i,d} ∈ P_dis satisfying the two displayed approximation properties, and µ_{0,d} := µ_{t_0,d}. Then, from (5.36), for t ∈ [0, T], the first estimate holds. From the Lipschitz property of b we obtain the second. Also, from (5.35), (5.37), and the lower bound in Assumption 2.5(2c), the cost estimate follows. The conclusion follows from the last two estimates and (5.26).
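The approximation of each μ_i by a discrete measure rests, as in (4.26), on the Glivenko-Cantelli theorem: empirical measures of iid draws converge to the sampled law. The following minimal numerical sketch illustrates this with a standard Gaussian as a stand-in for the target measure and the one-dimensional Kolmogorov-Smirnov distance as the discrepancy; all names and distributional choices here are illustrative, not from the paper:

```python
import numpy as np
from math import erf

def ks_distance(samples, cdf):
    """Kolmogorov-Smirnov distance between the empirical measure of
    `samples` and the law with cumulative distribution function `cdf`."""
    x = np.sort(samples)
    n = len(x)
    F = cdf(x)
    # sup_x |F_n(x) - F(x)|, evaluated just before and after each jump
    d_plus = np.max(np.arange(1, n + 1) / n - F)
    d_minus = np.max(F - np.arange(n) / n)
    return max(d_plus, d_minus)

# standard normal CDF as the target law's distribution function
gauss_cdf = np.vectorize(lambda t: 0.5 * (1 + erf(t / np.sqrt(2))))

rng = np.random.default_rng(0)
d_small = ks_distance(rng.normal(size=200), gauss_cdf)
d_large = ks_distance(rng.normal(size=20000), gauss_cdf)
print(d_small, d_large)
```

The discrepancy shrinks at the usual n^{-1/2} rate, which is what allows the discrete measures μ_{i,d} to be taken arbitrarily close to μ_i.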
Proof of Lemma 5.5. Lemma 5.5 now follows on combining the above display with (5.17) and setting ν*_i = µ_{i,d} and ν* = µ_d.

Construction of the controlled process
We will now use the trajectory ξ*, the control v*, and the measures ν*_i given by Lemma 5.5 to construct suitable controls for the pre-limit stochastic system in (5.4).
We will show that this controlled process X̄^ε and the corresponding Λ̄^ε defined by (5.5) converge suitably to the near-optimal (ξ*, ν*) in Lemma 5.5, and that the associated costs converge appropriately as well. In preparation for this result we first prove a stabilization lemma analogous to Lemma 4.5 used in the proof of Theorem 2.2. We suppress some details in the proof that are similar to those in Lemma 4.5.
Here the constant is sup_{x,y} (1/2)∥H_y U(x, y)∥. Thus, sending m → ∞, for some c_5 ∈ (0, ∞), the displayed bound holds. Also, from (5.46) and (5.48) we obtain the drift estimate. This, using an argument similar to that below (4.21), shows that for some c_6 ∈ (0, ∞) and all t ≤ ∆_ε the moment bound holds. The first part of the lemma now follows on taking κ_1 = c_6. In order to prove the second statement we argue via contradiction. Fix a ∈ (0, 1) and R < ∞. Suppose that the convergence in (5.45) fails to hold. Then there are γ > 0, a sequence ε_n → 0, t_n ∈ [a, 1], and (x_n, y_n) ∈ A_R such that, for every n ≥ 1,

E[ d_bl( (1/(t_n ∆_{ε_n})) ∫_0^{t_n ∆_{ε_n}} δ_{Y^{ε_n}(s, z, x_n, y_n)} ds, δ_0 ) ] > γ. (5.50)

Introduce random probability measures Q̂_n on R^d as

Q̂_n(A) := (1/(t_n ∆_{ε_n})) ∫_0^{t_n ∆_{ε_n}} 1_A(Y^{ε_n}(s, z, x_n, y_n)) ds, A ∈ B(R^d).
Using (5.49) and our assumption on {(x_n, y_n)}, we see that {Q̂_n} is tight. Suppose that it converges in distribution along a subsequence to Q̂, along which we also have x_n → x for some x ∈ R^m. Then, using the Lipschitz property of ∇_y U, we have, for all η :

Some preliminary estimates.
In this subsection we collect some estimates that will be used to show the convergence ofX ε and the associated costs to appropriate limits. We begin with the following lemma which gives a key moment bound.
Next, by our choice of the control, note that for any t ∈ [s_{0,N_ε}, t_1] the displayed identity holds. Applying Itô's formula with the function f(y) = ∥y∥², and using an argument similar to that below (4.21), we now see that, for some c_8, c_9 ∈ (0, ∞) and all ε and t ∈ [s_{0,N_ε}, t_1], the stated bound holds. Using this bound together with (5.57) and (5.42), and the linear growth property of b, we now see by a straightforward application of Grönwall's lemma that, for some c_10 ∈ (0, ∞) and all ε and t ∈ [s_{0,N_ε}, t_1], the desired estimate holds. The result follows.
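The Grönwall step used here, and repeatedly in this section, asserts that u′(t) ≤ a u(t) + c forces u(t) ≤ (u(0) + c/a) e^{at} − c/a. A quick numerical confirmation with an illustrative forcing term bounded by c:

```python
import numpy as np

# Compare an Euler solution of u' = a*u + c*cos(t)^2 (forcing <= c)
# against the Gronwall envelope (u0 + c/a) * exp(a*t) - c/a.
a, c, u0, T, n = 2.0, 1.5, 0.3, 1.0, 10000
dt = T / n
t = np.linspace(0, T, n + 1)
u = np.empty(n + 1)
u[0] = u0
for i in range(n):
    u[i + 1] = u[i] + dt * (a * u[i] + c * np.cos(t[i]) ** 2)
bound = (u0 + c / a) * np.exp(a * t) - c / a
print(u[-1], bound[-1])
```

The trajectory stays below the envelope for every grid point, with equality only at t = 0.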
The next lemma is analogous to Lemma 4.7 and uses the properties of the control ϱ * in (5.41).
The next lemma is analogous to Lemma 4.8 and uses the stabilization lemma from Section 5.3.1.
The first statement in the lemma now follows on applying Lemmas 5.6 and 5.7, sending ε → 0 and then R → ∞. The second statement is immediate from the first. Proof. Fix i ∈ {0, 1, . . . , K}. Then, for all j = 0, . . . , N^i_ε − 1, the displayed estimate holds. The result now follows from Lemma 5.9 on sending ε → 0.
In order to prove the convergence of X̄^ε to ξ*, we now introduce an approximation to X̄^ε. Fix L_ε > 0 such that, as ε → 0, L_ε → 0 and L_ε/∆_ε → ∞. Define the approximating process for t ∈ [0, T] by the displayed equation, where we take Ȳ^ε(r) := y_0 for r ≤ 0. Unique solvability of this equation follows from the Lipschitz property of b and α. The following lemma shows that the approximating process is close to X̄^ε for small ε.
Proof. We begin by noting the displayed decomposition. Next, using the Lipschitz property of b, we bound the leading term. We now consider the remainder terms. Note that, for some c_1 ∈ (0, ∞) depending only on the coefficient b, the first remainder is controlled. Next, there is a c_2 ∈ (0, ∞) depending only on b, α, v*, and T such that the analogous bound holds for the term denoted R̄^ε_2. Also, from the linear growth of b, the final term is controlled. The result now follows on taking expectations of squares and applying Grönwall's lemma together with Lemma 5.7.
The next lemma estimates the cost.
The result follows on using Lemmas 5.7 and 5.12, first sending ε → 0 and then ς → 0.

Proof of the LDP lower bound.
Now we complete the proof of the lower bound in (5.2). From [4, Corollary 1.10], without loss of generality we can assume that F is a bounded Lipschitz function. From the variational formula in Section 3 it follows that the corresponding representation holds, where Ḡ^ε is as in Section 5.1 and A is as in Section 3 with p = d + k. Then, since (v_1, v_2) = (u^ε, v*) ∈ A, where u^ε is as constructed in (5.43) and v* is as in Lemma 5.5, we have from the above variational formula, with X̄^ε as in (5.42) and Λ̄^ε defined via (5.5) with Ȳ^ε as in (5.42), a bound of the form inf_{(ξ,ν)} [F(ξ, ν) + I_2(ξ, ν)] + δ_0,
where the third line is from Lemmas 5.12 and 5.13, and the last line follows from Lemma 5.5(4). The bound in (5.2) now follows on sending δ 0 → 0.

Compactness of level sets of I_2.
In this section we show that the function I_2 defined in (2.5) is a rate function. For this it suffices to show that, for every M < ∞, the set B_M := {(ξ, ν) ∈ X × M_1 : I_2(ξ, ν) ≤ M} is a compact subset of X × M_1. Now fix an M ∈ (0, ∞) and consider a sequence {(ξ_n, ν_n)} ⊂ B_M. It suffices to argue that the sequence is relatively compact and that some limit point belongs to B_M. From the definition of I_2, there is a v_n ∈ U(ξ_n, ν_n) such that (1/2)∫_0^T ∥v_n(s)∥² ds ≤ M + 1. This estimate, together with (5.70), shows that {ξ_n} is relatively compact in X. Now let {(ξ_n, ν_n, v_n)} converge along some subsequence (labeled again as n) in X × M_1 × S^{2(M+1)} to (ξ, ν, v). Note that for every t ∈ [0, T] the displayed identity holds. From the convergence of (ξ_n, v_n) → (ξ, v) in X × S^{2(M+1)}, the last term converges to 0 as n → ∞. This shows that, as n → ∞, for each t ∈ [0, T],

∫_0^t α(ξ_n(s)) v_n(s) ds → ∫_0^t α(ξ(s)) v(s) ds.