Nonautonomous Kolmogorov parabolic equations with unbounded coefficients

We study a class of elliptic operators $A$ with unbounded coefficients defined in $I\times\mathbb{R}^d$ for some unbounded interval $I\subset\mathbb{R}$. We prove that, for any $s\in I$, the Cauchy problem $u(s,\cdot)=f\in C_b(\mathbb{R}^d)$ for the parabolic equation $D_tu=Au$ admits a unique bounded classical solution $u$. This allows us to associate an evolution family $\{G(t,s)\}$ with $A$ in a natural way. We study the main properties of this evolution family and prove gradient estimates for the function $G(t,s)f$. Under suitable assumptions, we show that there exists an evolution system of measures for $\{G(t,s)\}$, and we study the first properties of the extension of $G(t,s)$ to the $L^p$-spaces with respect to such measures.


Introduction and summary
Parabolic partial differential equations with unbounded coefficients occur naturally in the study of stochastic processes. Let us consider the stochastic differential equation

(1.1) dX_t = µ(t, X_t) dt + σ(t, X_t) dW_t, t > s, X_s = x.
Here, W_t is a standard d-dimensional Brownian motion and µ (resp. σ) is a regular R^d-valued (resp. R^{d×d}-valued) coefficient. If (1.1) has a solution X_t = X(t, s, x) for all x ∈ R^d, it follows from Itô's formula that, for ϕ ∈ C^2_b(R^d) and t ∈ R, the function u(s, x) := E(ϕ(X(t, s, x))) solves the partial differential equation

(1.2) u_s(s, x) = −(1/2) Tr(σ(s, x)σ*(s, x) D^2_x u(s, x)) − ⟨µ(s, x), ∇_x u(s, x)⟩, s < t.

This shows how probability theory may be used to obtain information about the solutions of second-order evolution PDEs. In the case of Lipschitz continuous coefficients, there are many results stating conditions on µ and σ under which (1.1) is well posed; see, e.g., [12,13,14]. It is also possible to take (1.2) as a starting point and work in a purely analytic manner. This has been done in several papers in the autonomous case (see, e.g., the book [3] and its bibliography). To the best of our knowledge, the literature contains no systematic treatment of the nonautonomous case, except in the particular case when the elliptic operator in (1.2) is the nonautonomous Ornstein-Uhlenbeck operator (see [6,10,11]). In this paper we set the basis for the general theory of nonautonomous operators. More precisely, we consider the Cauchy problem

(1.3) D_t u(t, x) = (A(t)u(t, ·))(x), t > s, x ∈ R^d; u(s, x) = f(x), x ∈ R^d,

where the operators A(t) are defined on smooth functions ϕ by

(A(t)ϕ)(x) = Σ_{i,j=1}^d q_{ij}(t, x) D_{ij}ϕ(x) + Σ_{i=1}^d b_i(t, x) D_iϕ(x).

The time index t varies over an interval I which is either R or a right halfline. Note that the equation in (1.3) is forward in time, in contrast to equation (1.2). However, reverting time, solutions of (1.3) are transformed into solutions of (1.2) and vice versa. Our standing hypotheses on the data b = (b_i) and Q = (q_{ij}) are collected in Hypothesis 1.1.
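The probabilistic representation above can be made concrete with a small numerical sketch (our illustration, not part of the original text): we approximate u(s, x) = E(ϕ(X(t, s, x))) by simulating (1.1) with the Euler-Maruyama scheme. The drift µ(t, x) = −x and diffusion σ(t, x) = √2 are illustrative choices for which the law of X(t, s, x) is Gaussian with known mean and variance, so the Monte Carlo estimate can be compared with the exact value.

```python
import numpy as np

def euler_maruyama_expectation(phi, mu, sigma, s, t, x, n_paths=20000, dt=1e-3, seed=0):
    """Monte Carlo estimate of u(s, x) = E[phi(X(t, s, x))] via Euler-Maruyama."""
    rng = np.random.default_rng(seed)
    n_steps = int(round((t - s) / dt))
    X = np.full(n_paths, float(x))
    time = s
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        X = X + mu(time, X) * dt + sigma(time, X) * dW
        time += dt
    return phi(X).mean()

# Illustrative check: mu = -x, sigma = sqrt(2) give X(t,s,x) ~ N(x e^{-(t-s)}, 1 - e^{-2(t-s)}),
# and E[cos(N(m, v))] = cos(m) * exp(-v/2).
s, t, x = 0.0, 1.0, 0.7
est = euler_maruyama_expectation(np.cos, lambda r, y: -y, lambda r, y: np.sqrt(2.0), s, t, x)
m = x * np.exp(-(t - s))
v = 1.0 - np.exp(-2.0 * (t - s))
exact = np.cos(m) * np.exp(-v / 2.0)
```

With these parameters the Monte Carlo estimate agrees with the exact Gaussian expectation up to statistical and discretization error of order 10^{-2}.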
Conditions (i) and (ii) are standard regularity and ellipticity assumptions in parabolic PDEs. It is well known that, assuming only (i) and (ii), problem (1.3) may admit several bounded solutions, even in the autonomous case. Condition (iii) is mainly used to ensure uniqueness of the bounded classical solution u of (1.3) (i.e., uniqueness of a function u ∈ C^{1,2}((s, +∞) × R^d) ∩ C_b([s, T] × R^d), for any T > s, that satisfies (1.3)).
In Section 2 we will be concerned with the well-posedness of (1.3) in the space C_b(R^d).
In the autonomous case the solutions to (1.3) are governed by a semigroup {T(t)}, which is the transition semigroup of the Markov process obtained from (1.1). In the nonautonomous setting the semigroup is replaced by an evolution family {G(t, s)}. We will establish several properties of this family in Section 3. Note that, while regularity of (G(t, s)ϕ)(x) with respect to (t, x) is a classical topic in the theory of PDEs, regularity with respect to s is less standard. It is treated in the literature in the case of bounded coefficients because of its importance in several applications, such as control theory. In our case, to get continuity with respect to s we have to sharpen Hypothesis 1.1(iii), assuming that A(t)ϕ is bounded from above in J × R^d for any bounded interval J ⊂ I.
In Section 4 we will study smoothing properties of G(t, s), proving several estimates on the spatial derivatives of G(t, s)ϕ for ϕ ∈ C_b(R^d). We will consider the following additional hypothesis.

Hypothesis 1.2.
(i) The data q_{ij} and b_i (i, j = 1, . . . , d) and their first-order spatial derivatives belong to C^{α/2,α}_loc(I × R^d);
(ii) there exists a continuous function k : I → R such that ⟨∇_x b(t, x)ξ, ξ⟩ ≤ k(t)|ξ|^2 for any ξ ∈ R^d and any (t, x) ∈ I × R^d;
(iii) there exists a continuous function ρ : I → [0, +∞) such that, for every i, j, k ∈ {1, . . . , d}, we have

Under this hypothesis we will prove uniform spatial gradient estimates for the function G(t, s)f when f ∈ C^k_b(R^d), k = 0, 1, by means of the classical Bernstein method (see [2]). We will also prove more refined pointwise gradient estimates under either one of the following more restrictive conditions.

Hypothesis 1.3.
(i) there exist a function r : I × R^d → R and a constant p_0 ∈ (1, +∞) such that
(ii) Hypothesis 1.2(ii) holds true with the function k replaced by a real constant k_0. Moreover, there exists a positive constant ρ_0 such that, for every i, j, k = 1, . . . , d, we have

Then we get pointwise estimates for every p ≥ p_0 and some real constant σ_p. In the autonomous case (see [3]) these estimates are interesting not only in their own right, but also for the study of the behavior of the semigroup {T(t)} in L^p-spaces with respect to invariant measures. An invariant measure corresponds to a stationary distribution of the Markov process with transition semigroup {T(t)}. In the analytical setting, an invariant measure for a Markov semigroup {T(t)} is a Borel probability measure µ such that

∫_{R^d} T(t)f dµ = ∫_{R^d} f dµ, t > 0, f ∈ C_b(R^d).

The interest in invariant measures is due to the following: (i) the invariant measure arises naturally in the asymptotic behaviour of the semigroup.
If µ is the (necessarily unique) invariant measure of {T(t)}, then

lim_{t→+∞} (T(t)f)(x) = ∫_{R^d} f dµ

for any f ∈ C_b(R^d) and any x ∈ R^d; (ii) the realizations of elliptic and parabolic operators in L^p-spaces with respect to invariant measures are dissipative. In our nonautonomous case we cannot hope to find a single invariant measure. Instead, we look for systems of invariant measures (see, e.g., [5,7]), that is, families of Borel probability measures {µ_t : t ∈ I} such that

∫_{R^d} G(t, s)f dµ_t = ∫_{R^d} f dµ_s, s < t, f ∈ C_b(R^d).

In Section 5 we will prove the existence of a system of invariant measures for our problem (1.3), replacing Hypothesis 1.1(iii) with the following stronger condition.

Hypothesis 1.4. There exist a nonnegative function ϕ ∈ C^2(R^d), diverging to +∞ as |x| → +∞, and constants a, c > 0 and t_0 ∈ I such that

(A(t)ϕ)(x) ≤ a − cϕ(x), t ≥ t_0, x ∈ R^d.

In contrast to the autonomous case, systems of invariant measures are, in general, not unique. However, using a pointwise gradient estimate, we will prove that uniqueness holds in the class of systems of invariant measures {µ_t : t ∈ I} that admit finite moments of some order p > 0, which may blow up as t → +∞ with a certain exponential rate. By definition, {µ_t : t ∈ I} admits finite moments of order p if, for any t ∈ I,

∫_{R^d} |x|^p µ_t(dx) < +∞.

Still using a uniform gradient estimate, we show that, also in the nonautonomous case, the asymptotic behaviour is determined by "the" system of invariant measures, in the sense that, for any f ∈ C_b(R^d), any s ∈ I and any x ∈ R^d,

lim_{t→+∞} (G(t, s)f)(x) = ∫_{R^d} f dµ_s,

and the convergence is uniform in each compact set in R^d.
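As a minimal autonomous illustration of the notion of invariant measure (a toy example of ours, not taken from the text): for the one-dimensional Ornstein-Uhlenbeck operator Aϕ = ϕ'' − xϕ', the standard Gaussian measure is invariant, as a single integration by parts shows.

```latex
% Toy check: \mu = N(0,1), d\mu = (2\pi)^{-1/2} e^{-x^2/2}\,dx, and A\varphi = \varphi'' - x\varphi'.
% Invariance of \mu reduces to \int_{\mathbb{R}} A\varphi \, d\mu = 0 for \varphi \in C^2_c(\mathbb{R}):
\int_{\mathbb{R}} \varphi''(x)\, e^{-x^2/2}\, dx
  = -\int_{\mathbb{R}} \varphi'(x)\, \bigl(e^{-x^2/2}\bigr)'\, dx
  = \int_{\mathbb{R}} x\,\varphi'(x)\, e^{-x^2/2}\, dx,
% so that \int_{\mathbb{R}} (\varphi'' - x\varphi')\, e^{-x^2/2}\, dx = 0.
```

Differentiating t ↦ ∫ T(t)ϕ dµ and using this identity at every time gives the invariance of µ under the semigroup.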
Concerning point (ii), we note that, since we have to deal with a family of probability measures (µ_t), we also have a family of Lebesgue spaces L^p(µ_t) which, in general, are not mutually equivalent. This prevents us from extending the operators G(t, s) to a single L^p-space, because G(t, s) does not map L^p(µ_s) into itself in general; rather, it maps L^p(µ_s) into L^p(µ_t). However, it is possible to define an evolution semigroup associated with G(t, s) on a single L^p-space of functions defined in I × R^d. This was already done in [6,10,11] in the special case of Ornstein-Uhlenbeck operators.
The evolution semigroup associated with an evolution family {G(t, s)} is known to be a useful tool to determine several qualitative properties of the evolution family; see, e.g., the book [4] and the references therein. In the case of time-dependent Ornstein-Uhlenbeck operators, the use of the evolution semigroup was essential to establish optimal regularity results for evolution equations and also to get precise asymptotic behavior estimates for G(t, s), see [10,11]. However, the general theory of evolution semigroups is well established only for evolution families acting on a fixed Banach space X, which is not our case. Therefore, the study of the asymptotic behavior of G(t, s)ϕ for ϕ ∈ L^p(µ_s) through the evolution semigroup is deferred to a future paper. Here, we just describe the first properties of the evolution semigroup, in Section 6.
In the last section we consider a simple example and see how our conditions may be verified in this setting.
Notations. We denote, respectively, by B_b(R^d) and C_b(R^d) the set of all bounded Borel measurable functions f : R^d → R and its subset of all continuous functions. We endow both spaces with the sup-norm ‖·‖_∞.
For any k ∈ R_+ (possibly k = +∞) we denote by C^k_b(R^d) the set of all functions f : R^d → R that are continuously differentiable in R^d up to the [k]-th order, with bounded derivatives, and such that the [k]-th-order derivatives are (k − [k])-Hölder continuous. C^k_c(R^d) denotes the subset of C^k_b(R^d) of all compactly supported functions. C_0(R^d) denotes the set of all continuous functions vanishing at infinity.
Suppose that f depends on both time and spatial variables. If there is danger of confusion, we denote by ∇_x f and D^2_x f the gradient and the Hessian matrix of the function f(t, ·). When f is a vector-valued function, ∇_x f denotes the Jacobian matrix of f(t, ·).
Let D ⊂ R^{d+1} be a domain or the closure of a domain. By C^{k+α/2,2k+α}_loc(D) (k = 0, 1, α ∈ (0, 1)) we denote the set of all functions f : D → R such that the time derivatives up to the k-th order and the spatial derivatives up to the 2k-th order are Hölder continuous with exponent α, with respect to the parabolic distance, in any compact set D_0 ⊂ D.
For any r > 0 we denote by B_r ⊂ R^d the open ball centered at 0 with radius r. Given a measurable set E, we denote by 1l_E the characteristic function of E, i.e., 1l_E(x) = 1 if x ∈ E and 1l_E(x) = 0 otherwise. Finally, we use the notation u_f for the (unique) bounded classical solution to problem (1.3).

Solutions in C_b(R^d)
In this section we want to solve our parabolic problem (1.3) with data s ∈ I and f ∈ C_b(R^d). By a solution of (1.3) we mean a bounded classical solution, i.e., a function u ∈ C_b([s, +∞) × R^d) ∩ C^{1,2}((s, +∞) × R^d) such that (1.3) is satisfied. Throughout the section we assume that Hypothesis 1.1 is fulfilled.
We already mentioned that Hypothesis 1.1(iii) ensures uniqueness of the solution to (1.3). In fact, it implies a maximum principle, which we state as our first theorem.

Theorem 2.1. Let s ∈ I and T > s.
Theorem 2.2. For any f ∈ C_b(R^d), problem (1.3) admits a unique bounded classical solution u = u_f. Moreover,

(2.1) ‖u_f(t, ·)‖_∞ ≤ ‖f‖_∞, t ≥ s.

Proof. Uniqueness follows from applying Theorem 2.1 to u − v and to v − u, if u and v are two solutions. Estimate (2.1) follows by applying the same theorem to ±u − ‖f‖_∞. The existence part can be obtained in a classical way, solving Cauchy-Dirichlet problems in the balls B_n and then letting n → +∞; see, e.g., [8, ]. Since there are some technicalities, for the reader's convenience we go into the details. We split the proof into three steps.
Step 1. Here we assume that f belongs to C^{2+α}_c(R^d). Denote by n_0 the smallest integer such that supp(f) is contained in the ball B_{n_0}. Further, for any n ≥ n_0, we consider the Cauchy-Dirichlet problem

(2.2) D_t u_n = A(t)u_n in (s, +∞) × B_n; u_n = 0 on (s, +∞) × ∂B_n; u_n(s, ·) = f in B_n.

By classical results (see, e.g., [9] or [15]) and Hypotheses 1.1(i)-(ii), for any n ≥ n_0 the problem (2.2) admits a unique solution u_n ∈ C^{1+α/2,2+α}_loc([s, +∞) × B_n). Moreover, the classical Schauder estimates imply that, for any m ∈ N with m > n_0, there exists a constant C = C(m), independent of n, such that (2.3) holds for any n > m, where D_m = (s, m) × B_m. By the Arzelà-Ascoli theorem, there exists a subsequence (u_n^m) of (u_n) which converges in C^{1,2}(D_m) to some function u^m ∈ C^{1+α/2,2+α}(D_m). Of course, u^m satisfies the differential equation D_t u^m = A(·)u^m in D_m and it equals f on {s} × B_m. Without loss of generality, we can assume that (u_n^{m+1}) is a subsequence of (u_n^m). Note that, in this case, u^{m+1}|_{D_m} ≡ u^m. Hence, we can define a function u by setting u|_{D_m} := u^m. A standard procedure shows that u belongs to C^{1+α/2,2+α}_loc((s, +∞) × R^d). Note that the sequence (u_n) itself converges to u as n tends to +∞, locally uniformly in [s, +∞) × R^d. Indeed, the above arguments show that any convergent subsequence of (u_n) converges to a classical solution of (1.3).
Step 2. Assume now that f ∈ C_0(R^d). Then, there exists a sequence (f_n) ⊂ C^{2+α}_c(R^d) converging to f uniformly in R^d as n tends to +∞. Estimate (2.1) yields ‖u_{f_n} − u_{f_m}‖_∞ ≤ ‖f_n − f_m‖_∞ for any n, m ∈ N. Therefore, there exists a bounded and continuous function u such that u_{f_n} converges to u, uniformly in [s, +∞) × R^d. Moreover, applying the interior Schauder estimates to the sequence (u_{f_n}), we deduce that u_{f_n} converges in C^{1,2}_loc((s, +∞) × R^d) to u. Hence, u is the bounded classical solution of problem (1.3).
Step 3. Now, fix f ∈ C_b(R^d) and consider a bounded sequence (f_n) ⊂ C^{2+α}_c(R^d) converging to f locally uniformly in R^d as n tends to +∞. The same arguments as in Step 2 show that, up to a subsequence, u_{f_n} converges, in C^{1,2}_loc((s, +∞) × R^d), to some function u ∈ C^{1+α/2,2+α}_loc((s, +∞) × R^d), as n tends to +∞. In particular, u solves the differential equation in (1.3). To prove that u is actually a classical solution of the problem (1.3), we fix a compact set K ⊂ R^d and a smooth and compactly supported function ϕ such that 0 ≤ ϕ ≤ 1 and ϕ ≡ 1 in K. Further, we split u_{f_n} = u_{ϕf_n} + u_{(1−ϕ)f_n} for any n ∈ N. Since the function ϕf is compactly supported in R^d, it follows from Step 2 that u_{ϕf_n} converges to u_{ϕf} uniformly in [s, +∞) × R^d.
Let us now consider the sequence (u_{(1−ϕ)f_n}). Fix m ∈ N. We claim that

(2.4) |u_{(1−ϕ)f_m}| ≤ M(1 − u_ϕ) in (s, +∞) × R^d,

where M = sup_{n∈N} ‖f_n‖_∞. Indeed, a straightforward computation shows that 1 − u_ϕ = u_{1−ϕ}. Therefore, the function w := u_{(1−ϕ)f_m} − M(1 − u_ϕ) satisfies w_t = Aw and, moreover, w(s, ·) = (1 − ϕ)(f_m − M) ≤ 0. The maximum principle of Theorem 2.1 immediately implies that w is nonpositive in (s, +∞) × R^d. To prove the other inequality in (2.4), it suffices to observe that −u_{(1−ϕ)f_m} = u_{−(1−ϕ)f_m} and repeat the above arguments with f_m replaced by −f_m. Now, since u_{f_n} converges pointwise to u, for any (t, x) ∈ (s, +∞) × R^d we have u(t, x) = lim_{n→+∞} u_{f_n}(t, x), and, for each n ∈ N, estimate (2.5) holds. The right-hand side of (2.5) converges to 0 uniformly in K as t tends to s^+. Hence, u can be continuously extended up to t = s by setting u(s, ·) = f. This completes the proof.
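The approximation scheme used in the proof (Cauchy-Dirichlet problems on the balls B_n, then n → +∞) can be illustrated numerically in the simplest model case A = D_x^2, the one-dimensional heat equation; the grid parameters and the Gaussian datum below are illustrative choices of ours, not taken from the text.

```python
import numpy as np

def dirichlet_heat(n, f, T=0.5, dx=0.05, dt=0.001):
    """Explicit finite differences for u_t = u_xx on (-n, n), with u = 0 on the boundary."""
    x = np.arange(-n, n + dx / 2, dx)
    u = f(x)
    r = dt / dx**2          # stability requires r <= 1/2 (here r = 0.4)
    for _ in range(int(round(T / dt))):
        u[1:-1] = u[1:-1] + r * (u[2:] - 2 * u[1:-1] + u[:-2])
        u[0] = u[-1] = 0.0  # Dirichlet condition on the boundary of B_n
    return x, u

f = lambda x: np.exp(-x**2)
# Exact bounded solution on the whole line: u(t, x) = (1 + 4t)^{-1/2} exp(-x^2 / (1 + 4t)).
exact_at_0 = (1 + 4 * 0.5) ** -0.5

_, u5 = dirichlet_heat(5, f)     # solution of the problem on B_5
_, u10 = dirichlet_heat(10, f)   # solution of the problem on B_10
err5 = abs(u5[len(u5) // 2] - exact_at_0)
err10 = abs(u10[len(u10) // 2] - exact_at_0)
```

Both truncated solutions agree at x = 0 with the whole-space solution up to the discretization error, and with each other far more closely, since the boundary effect decays like a Gaussian tail in the distance to ∂B_n.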

Remark 2.3. Let us observe that the choice of approximating problem (1.3) by
Cauchy-Dirichlet problems in the balls B_n is not essential. Indeed, repeating step by step the proof of Theorem 2.2, we can see that problem (1.3) can also be approximated by the Cauchy-Neumann problems

(2.6) D_t u_n = A(t)u_n in (s, +∞) × B_n; ∂u_n/∂ν = 0 on (s, +∞) × ∂B_n; u_n(s, ·) = f in B_n.

We will use this approach in Section 4 to prove estimates for the space derivatives of G(t, s)f.

Now we define the evolution family associated with our problem (1.3). Let Λ := {(t, s) ∈ I × I : t ≥ s}. We put G(t, t) := id_{C_b(R^d)} and, for t > s, we define the operator G(t, s) by setting

(2.7) G(t, s)f := u_f(t, ·), f ∈ C_b(R^d),

where u_f is the unique solution of problem (1.3). We call the family {G(t, s) : (t, s) ∈ Λ} the evolution family associated with the problem (1.3). It is immediate from Theorem 2.1 that, for (t, s) ∈ Λ, the operator G(t, s) is a positive contraction on C_b(R^d). From the uniqueness assertion in Theorem 2.2, the law of evolution G(t, r) = G(t, s)G(s, r), for r ≤ s ≤ t, easily follows. The connection with Markov processes suggests that every operator G(t, s) should be associated with a transition kernel. Recall that a transition kernel p is a mapping from R^d × B(R^d) to [0, 1] such that p(x, ·) is a sub-probability measure for fixed x and p(·, A) is measurable for fixed A ∈ B(R^d). The following proposition states that this is indeed the case; in fact, the transition kernels p_{t,s} form the nonautonomous analogue of a conservative, stochastically continuous transition function, cf. [7, Sections 2.1 and 2.8].
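The transition-kernel viewpoint and the evolution law G(t, r) = G(t, s)G(s, r) can be checked by hand in a one-dimensional example with Gaussian kernels (an illustrative nonautonomous Ornstein-Uhlenbeck-type choice of ours, not from the text): for dX = −X dt + σ(t) dW_t with σ(t) = 1 + (1/2) sin t, the kernel p_{t,s}(x, ·) is N(U(t, s)x, V(t, s)) with U(t, s) = e^{−(t−s)} and V(t, s) = ∫_s^t U(t, r)^2 σ(r)^2 dr; composing kernels must reproduce U and V.

```python
import numpy as np

def U(t, s):
    """Deterministic propagator for dX = -X dt + sigma(t) dW: U(t, s) = exp(-(t - s))."""
    return np.exp(-(t - s))

def V(t, s, n=20000):
    """Variance of p_{t,s}(x, .): the integral of U(t, r)^2 sigma(r)^2 over (s, t) (midpoint rule)."""
    dr = (t - s) / n
    r = s + (np.arange(n) + 0.5) * dr
    sigma2 = (1.0 + 0.5 * np.sin(r)) ** 2
    return float(np.sum(U(t, r) ** 2 * sigma2) * dr)

# Composing N(U(s,r)x, V(s,r)) with N(U(t,s)y, V(t,s)) gives mean U(t,s)U(s,r)x and
# variance U(t,s)^2 V(s,r) + V(t,s); the evolution law says these equal U(t,r) and V(t,r).
r0, s0, t0 = 0.0, 0.7, 2.0
mean_err = abs(U(t0, s0) * U(s0, r0) - U(t0, r0))
var_err = abs(U(t0, s0) ** 2 * V(s0, r0) + V(t0, s0) - V(t0, r0))
```

Both errors vanish up to quadrature accuracy, which is the Chapman-Kolmogorov identity for this family of Gaussian kernels.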
Proposition 2.4. For every (t, s) ∈ Λ and every x ∈ R^d there exists a unique probability measure p_{t,s}(x, ·) such that

(2.8) (G(t, s)f)(x) = ∫_{R^d} f(y) p_{t,s}(x, dy), f ∈ C_b(R^d).

Furthermore, the following properties hold: (i) for every t ∈ I, p_{t,t}(x, ·) is the Dirac measure concentrated at x; (ii) for t > s the measure p_{t,s}(x, ·) is equivalent to the Lebesgue measure, i.e., they have the same sets of zero measure; (iii) for every A ∈ B(R^d), the function (t, s, x) ↦ p_{t,s}(x, A) is measurable in Λ × R^d.

Proof. For any (Lebesgue) measurable set A ⊂ R^d and any t > s, let us define

p_{t,s}(x, A) := ∫_A g(t, s, x, y) dy,

where g is the Green function of problem (1.3), which can be obtained as the pointwise limit of the increasing (with respect to n) sequence of Green functions g_n associated with problem (2.2). For the existence of these latter kernels, see, e.g., [9, Theorem 3.16]. The function g is measurable in all its entries and it is positive since the g_n's are.
Notice that, since G(t, s)1l ≡ 1l by uniqueness, we have p_{t,s}(x, R^d) = 1, so that p_{t,s}(x, ·) is a probability measure. Now formula (2.8) and properties (i) and (ii) immediately follow.
To prove (iii), let A ∈ B(R^d). Then, there exists a bounded sequence (f_n) ⊂ C_b(R^d) converging almost everywhere to 1l_A. Hence, by the dominated convergence theorem, p_{t,s}(x, A) = lim_{n→+∞} (G(t, s)f_n)(x) for any (t, s) ∈ Λ and any x ∈ R^d, and this implies that the function (t, s, x) ↦ p_{t,s}(x, A) is measurable. Finally, (v) is an immediate consequence of (2.7).
Corollary 2.5. For t > s the measure p_{t,s}(x, ·) is equivalent to the Lebesgue measure, and (G(t, s)f)(x) > 0 for any nonnegative f ∈ B_b(R^d) which does not vanish almost everywhere, any (t, s) ∈ Λ with t > s, and any x ∈ R^d.
Proof. The statement follows from the inequality g(t, s, x, y) > 0 ((t, s) ∈ Λ with t > s, x, y ∈ R^d), which holds since the Green function g is the pointwise limit of the increasing sequence of Green functions g_n associated with problem (2.2).
Remark 2.6. The representation (2.8) implies that we can extend our evolution family G(t, s) to the space B_b(R^d) of bounded measurable functions. More generally, in the sequel we set (G(t, s)f)(x) := ∫_{R^d} f(y) p_{t,s}(x, dy) also for unbounded functions f : R^d → R, whenever the integral makes sense. Formula (2.8) also implies that the adjoint operators G*(t, s) leave the space of signed measures invariant.

Continuity properties of the evolution family {G(t, s)}
In this section we prove some useful continuity properties of the function G(t, s)f. To begin with, let us prove the following proposition.
Proof. (i) By the representation formula (2.8) and the dominated convergence theorem, we obtain that G(·, s)f_n converges to G(·, s)f pointwise in [s, +∞) × R^d, as n tends to +∞. Moreover, the classical interior Schauder estimates yield a bound for the norms of the functions G(·, s)f_n in C^{1,2}([s + ε, T] × K), for any ε > 0, any compact set K ⊂ R^d and some positive constant independent of n, so that the pointwise convergence can be upgraded by compactness.

(ii) The proof follows adapting the arguments used in the proof of Theorem 2.2. Let K be a compact set and let ϕ ∈ C^{2+α}_c(R^d) be such that ϕ ≡ 1 in K. Split u_{f_n} = u_{ϕf_n} + u_{(1−ϕ)f_n}. By Step 2 in the proof of Theorem 2.2, u_{ϕf_n} converges to u_{ϕf} uniformly in [s, +∞) × R^d.
To complete the proof, it suffices to show that u_{(1−ϕ)f_n} converges to u_{(1−ϕ)f} uniformly in [s, T] × K as n tends to +∞, for any T > s. The arguments in Step 2 of the proof of Theorem 2.2 show that, up to a subsequence, u_{(1−ϕ)f_n} converges in C^{1,2}([s + ε, T] × B_r), for any ε > 0 and any r > 0, to a function v ∈ C^{1+α/2,2+α}_loc((s, +∞) × R^d). Moreover, letting m go to +∞ in (2.4) gives (3.1). Since u_ϕ is continuous on {s} × R^d and ϕ ≡ 1 in K, from (3.1) we deduce that v(t, x) converges to 0 as t → s^+, uniformly with respect to x ∈ K.
Let us now fix ε > 0 and let δ > 0 be sufficiently small that |u_{(1−ϕ)f_n}| + |v| ≤ ε/2 in [s, s + δ] × K for any n ∈ N. Moreover, we fix n large enough that |u_{(1−ϕ)f_n} − v| ≤ ε/2 in [s + δ, T] × K. For such n and δ we get |u_{(1−ϕ)f_n} − v| ≤ ε in [s, T] × K. Summing up, the sequence u_{f_n} = u_{ϕf_n} + u_{(1−ϕ)f_n} converges, as n tends to +∞, to the function u = u_{ϕf} + v, which belongs to C^{1+α/2,2+α}_loc((s, +∞) × R^d), and the convergence is uniform in [s, T] × K. Since K is arbitrary, u_{f_n} converges locally uniformly to u in [s, +∞) × R^d, so that u is continuous up to t = s, where it equals f. Moreover, since u_{f_n} converges to u in C^{1,2}([s + ε, T] × B_R) for any ε ∈ (0, T − s) and any R > 0, we have D_t u − Au = 0 for t > s. Thus, u is a bounded classical solution of (1.3) and, by Theorem 2.1, u = u_f. This completes the proof.
3.1. Continuity of the function G(t, s)f with respect to the variable s. Since evolution families depend on the two parameters t and s, it is natural to investigate also the smoothness of the function G(t, ·)f. In the following lemma we prove a very useful generalization of the well-known formula ∂_s(G(t, s)f)(x) = −(G(t, s)A(s)f)(x), which holds in the case of bounded coefficients. This lemma will play a fundamental role in proving the existence of evolution systems of invariant measures in Section 5.
Lemma 3.2. Let t ∈ I and let f ∈ C^2(R^d) be constant outside a compact set K. Then, for any x ∈ R^d and any s_0 < s_1 ≤ t, the function r ↦ (G(t, r)A(r)f)(x) is integrable in (s_0, s_1) and we have

(3.3) ∂_s(G(t, s)f)(x) = −(G(t, s)A(s)f)(x), s ∈ I_t := {s ∈ I : s ≤ t}.

In particular, the function s ↦ (G(t, s)f)(x) is continuously differentiable in I_t. Finally, for any g ∈ C_0(R^d), the function G(t, ·)g is continuous in I_t.

Proof. By assumption, we can write f = g + c · 1l for some g ∈ C^2_c(R^d) and some c ∈ R. Since G(t, s)1l ≡ 1l, the assertion is trivially satisfied by any constant function. Thus, it remains to prove it when f ∈ C^2_c(R^d). Choose n_0 such that supp(f) ⊂ B_{n_0}, and denote by {G_n(t, s)} the evolution family associated with problem (2.2) for n ≥ n_0 (cf. [1, Theorem 6.3]). By [1, Theorem 2.3(ix)], formula (3.3) holds with G replaced by G_n. Integrating this equality with respect to s and recalling that, by Step 1 in the proof of Theorem 2.2, for any (t, r) ∈ Λ, G_n(t, r)f converges to G(t, r)f pointwise in R^d as n tends to +∞, we obtain

(3.4) (G(t, s_1)f)(x) − (G(t, s_0)f)(x) = lim_{n→+∞} ( −∫_{s_0}^{s_1} (G_n(t, r)A(r)f)(x) dr ) = −∫_{s_0}^{s_1} (G(t, r)A(r)f)(x) dr,

where the last equality follows by dominated convergence. Now, observe that (3.4) implies that the function G(t, ·)f is continuous in I_t. To prove that the function G(t, ·)f is differentiable, it is enough to show that the function G(t, ·)A(·)f is continuous in I_t. Indeed, for any r, r_0 ∈ I_t,

|(G(t, r)A(r)f)(x) − (G(t, r_0)A(r_0)f)(x)| ≤ ‖A(r)f − A(r_0)f‖_∞ + |(G(t, r)A(r_0)f)(x) − (G(t, r_0)A(r_0)f)(x)|,

and the last side of the previous chain of inequalities goes to 0 as r → r_0, since A(r_0)f ∈ C_c(R^d). Now, (3.4) implies that the function G(t, ·)f is differentiable in I_t, and (3.3) follows. This completes the proof.
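The integrated form of the identity in the lemma, G(t, s)f = f + ∫_s^t G(t, r)A(r)f dr, can be verified explicitly in the autonomous model case A = D_x^2 acting on a single Fourier mode (an illustrative choice of ours; cos(kx) is not compactly supported, but the identity is fully explicit for it):

```python
import numpy as np

k, s, t, x = 1.5, 0.0, 2.0, 0.3

def G(t, s, g_hat):
    # Action of the heat evolution family on the mode cos(kx):
    # G(t, s)[c * cos(k .)] = c * exp(-k^2 (t - s)) * cos(k .); we track the coefficient c.
    return g_hat * np.exp(-k**2 * (t - s))

f_hat = 1.0                  # f(x) = cos(kx)
Af_hat = -k**2 * f_hat       # A f = f'' = -k^2 cos(kx)

# Right-hand side of G(t, s)f = f + \int_s^t G(t, r) A f dr (midpoint rule in r).
n = 20000
dr = (t - s) / n
r = s + (np.arange(n) + 0.5) * dr
rhs_hat = f_hat + np.sum(G(t, r, Af_hat)) * dr

lhs = G(t, s, f_hat) * np.cos(k * x)
rhs = rhs_hat * np.cos(k * x)
```

The quadrature reproduces 1 − (1 − e^{−k^2(t−s)}) = e^{−k^2(t−s)}, so both sides agree up to the midpoint-rule error.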
To prove that (t, s, x) ↦ (G(t, s)f)(x) is continuous in Λ × R^d for any function f ∈ C_b(R^d), we need an intermediate assumption between Hypothesis 1.1(iii) and Hypothesis 1.4. More precisely, in the rest of this section we assume that the following hypothesis is satisfied.

Hypothesis 3.3. For every bounded interval J ⊂ I there exist a function ϕ = ϕ_J ∈ C^2(R^d), diverging to +∞ as |x| tends to +∞, and a positive constant M_J such that (A(t)ϕ)(x) ≤ M_J for any (t, x) ∈ J × R^d.

Hypothesis 3.3 allows us to define G(t, s) on a larger class of functions than B_b(R^d). Namely, we show that the right-hand side of (2.8) makes sense for f = ϕ, where ϕ is any of the functions in Hypothesis 3.3.
Let us begin with the following fundamental lemma. If J ⊂ I is any interval, we set Λ_J := {(t, s) ∈ J × J : t ≥ s}.

Lemma 3.4. Assume that Hypotheses 1.1(i)-(ii) and 3.3 are satisfied, and let J ⊂ I be a bounded interval. Then, (G(t, s)ϕ)(x) is well defined for any (t, s) ∈ Λ_J and any x ∈ R^d, and it is bounded on Λ_J × B_r for any r > 0.

Proof. We may assume (possibly adding a constant) that ϕ(x) ≥ 0 for each x ∈ R^d. For every n ∈ N choose a function ψ_n ∈ C^∞([0, +∞)) such that (i) ψ_n(t) = t for t ∈ [0, n], (ii) ψ_n(t) ≡ const for t ≥ n + 1, (iii) 0 ≤ ψ′_n ≤ 1 and ψ″_n ≤ 0. Then, the function ϕ_n := ψ_n ∘ ϕ belongs to C^2_b(R^d) and is constant outside a compact set. By Lemma 3.2, we have

(3.5) (G(t, s)ϕ_n)(x) = ϕ_n(x) + ∫_s^t (G(t, r)A(r)ϕ_n)(x) dr

for any (t, s) ∈ Λ and any x ∈ R^d. We claim that, for each s, t ∈ J, letting n → +∞ in (3.5) we obtain

(3.7) (G(t, s)ϕ)(x) = ϕ(x) + ∫_s^t (G(t, r)A(r)ϕ)(x) dr,

so that, in particular, the above integral is finite. It is clear that lim_{n→+∞} ϕ_n(x) = ϕ(x) for each x ∈ R^d. Concerning the integral in the right-hand side of (3.5), we split it into the sum (3.6). Since ψ′_n(ϕ)(y) is increasing in n and converges to 1 for each y, both integrals in the right-hand side of (3.6) converge by the monotone convergence theorem. The claim follows.
Letting n → +∞ in (3.5) thus yields (3.7) and, since (A(r)ϕ)(y) ≤ M_J for each y ∈ R^d and r ∈ J,

(3.8) ∫_s^t (G(t, r)A(r)ϕ)(x) dr ≤ M_J (t − s).

Estimates (3.7) and (3.8) imply that (G(t, s)ϕ)(x) ≤ ϕ(x) + M_J (t − s) for any s, t ∈ J with s ≤ t and any x ∈ R^d. It follows that

(3.9) M_{J,r} := sup{(G(t, s)ϕ)(x) : (t, s) ∈ Λ_J, x ∈ B_r} < +∞ for any r > 0.

This completes the proof.
Having (G(t, s)ϕ)(x) bounded for (t, s) ∈ Λ_J, we may prove in a standard way that, for each r > 0, the family of measures {p_{t,s}(x, dy) : (t, s, x) ∈ Λ_J × B_r} is tight. We recall that a family of (probability) measures {µ_α : α ∈ F} is tight if, for any ε > 0, there exists ϱ > 0 such that µ_α(R^d \ B_ϱ) ≤ ε for any α ∈ F.

Lemma 3.5. Assume that Hypotheses 1.1(i)-(ii) and 3.3 are satisfied, and let J ⊂ I be a bounded interval. Then, for each r > 0, the family {p_{t,s}(x, dy) : (t, s, x) ∈ Λ_J × B_r} is tight.

Proof. Fix ε > 0 and consider the function ϕ = ϕ_J in Hypothesis 3.3. As in the proof of Lemma 3.4, we assume that ϕ is nonnegative. Since ϕ blows up as |x| → +∞, there exists ϱ > 0 such that

(3.10) ϕ(y) ≥ M_{J,r}/ε for any y ∈ R^d \ B_ϱ,

where M_{J,r} is given by (3.9). Then, for (t, s) ∈ Λ_J and x ∈ B_r, we have

p_{t,s}(x, R^d \ B_ϱ) ≤ (ε/M_{J,r}) ∫_{R^d} ϕ(y) p_{t,s}(x, dy) = (ε/M_{J,r}) (G(t, s)ϕ)(x) ≤ ε,

and the statement follows.
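The tail estimate in the proof is a Markov-type inequality: if ∫ ϕ dµ ≤ M and ϕ ≥ M/ε outside B_ϱ, then µ(R^d \ B_ϱ) ≤ ε. A quick numerical sanity check, on a toy family of Gaussian measures of ours with ϕ(x) = x^2 (not from the text):

```python
import math

# Markov-type bound behind the tightness argument: for a probability measure mu with
# second moment at most M, one gets mu(R \ B_rho) <= M / rho^2, uniformly over the family.
variances = [0.5, 0.8, 1.3, 2.0]   # illustrative family: centered Gaussians N(0, v)
M = max(variances)                 # sup of the second moments over the family
rho = 3.0
uniform_bound = M / rho**2         # common tail bound, independent of the member

# Exact tail mass P(|X| > rho) = erfc(rho / sqrt(2 v)) for X ~ N(0, v).
tails = [math.erfc(rho / math.sqrt(2.0 * v)) for v in variances]
```

Every exact Gaussian tail lies below the common Markov bound, which is exactly the uniformity needed for tightness of the whole family.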
As usual, tightness yields some convergence result.
Proposition 3.6. Assume that Hypotheses 1.1(i)-(ii) and 3.3 are satisfied. Further, let (f_n) be a bounded sequence in C_b(R^d), with ‖f_n‖_∞ ≤ M for each n ∈ N, converging to f ∈ C_b(R^d) locally uniformly in R^d. Then, G(·, ·)f_n converges to G(·, ·)f locally uniformly in Λ × R^d.
Proof. Fix any bounded interval J ⊂ I and any ε, r > 0. Let ϱ be such that (3.10) holds and, for (t, s, x) ∈ Λ_J × B_r, split

(G(t, s)f_n)(x) − (G(t, s)f)(x) = ∫_{B_ϱ} (f_n − f)(y) p_{t,s}(x, dy) + ∫_{R^d \ B_ϱ} (f_n − f)(y) p_{t,s}(x, dy).

Since f_n converges to f uniformly in B_ϱ, there exists n_0 ∈ N such that sup_{B_ϱ} |f_n − f| ≤ ε for n ≥ n_0. For n ≥ n_0 we get

|(G(t, s)f_n)(x) − (G(t, s)f)(x)| ≤ ε + 2M p_{t,s}(x, R^d \ B_ϱ) ≤ (1 + 2M)ε.

Thus, G(·, ·)f_n converges to G(·, ·)f uniformly in Λ_J × B_r. Now we are ready to prove that (t, s, x) ↦ (G(t, s)f)(x) is continuous, for each f ∈ C_b(R^d).
Theorem 3.7. Under the assumptions of Proposition 3.6, the function (t, s, x) ↦ (G(t, s)f)(x) is continuous in Λ × R^d, for any f ∈ C_b(R^d).

Proof. Fix f ∈ C_b(R^d) and let (f_n) ⊂ C^∞_c(R^d) be a sequence of smooth functions converging to f locally uniformly in R^d and such that sup_{n∈N} ‖f_n‖_∞ < +∞.
By Proposition 3.6, the sequence of functions (t, s, x) ↦ (G(t, s)f_n)(x) converges to (t, s, x) ↦ (G(t, s)f)(x) locally uniformly. Therefore, it suffices to show that each function (t, s, x) ↦ (G(t, s)f_n)(x) is continuous in Λ × R^d. For this purpose, we observe that the classical interior Schauder estimates, as in [9, Theorem 3.5], imply an estimate slightly more general than (2.3), namely (3.11), which holds for any a, b ∈ I with a < b and some positive constant C independent of n > m.
Combining (3.12) and (3.13) yields the desired continuity in this case. Let us now assume that s < s_0 and split accordingly. Since (t, x) ↦ (G(t, s)g)(x) is continuous in [s, +∞) × R^d, locally uniformly with respect to s, from (3.13) and (3.14) we deduce the desired continuity in this case as well. This completes the proof.

Gradient estimates
In this section we prove both uniform and pointwise gradient estimates. Besides being interesting in their own right, we will need them in the next section to prove uniqueness of systems of invariant measures in a suitable class and convergence results.
Throughout the section we assume that Hypotheses 1.1 and 1.2 hold. Under these assumptions, the bounded classical solution of problem (1.3) is such that its first-order spatial derivatives belong to C^{1+α/2,2+α}_loc((s, +∞) × R^d) (see, e.g., [9, Theorem 3.10] and [16]). We will use this fact in the sequel to apply a variant of the Bernstein method to get our gradient estimates.
First, we prove uniform gradient estimates.
Theorem 4.1. Let s ∈ I and T > s. Then, there exist positive constants C_1 and C_2, depending on s and T, such that:

(i) ‖∇_x G(t, s)f‖_∞ ≤ C_1 ‖f‖_{C^1_b(R^d)} for any t ∈ [s, T] and any f ∈ C^1_b(R^d);
(ii) ‖∇_x G(t, s)f‖_∞ ≤ C_2 (t − s)^{−1/2} ‖f‖_∞ for any t ∈ (s, T] and any f ∈ C_b(R^d).

Proof. It suffices to prove the statement for f ∈ C^{2+α}_c(R^d), since we may approximate an arbitrary f by a sequence (f_n) ⊂ C^{2+α}_c(R^d) as in Section 2.

(i) Let u_n be the unique solution of the Cauchy-Neumann problem (2.6), where n is so large that the support of f is contained in B_n, and set z_n := u_n^2 + a|∇_x u_n|^2 for a parameter a > 0 to be chosen. By Remark 2.3, u_n converges to u(t, x) := (G(t, s)f)(x) in C^{1,2}([s, T] × K) as n → +∞, for any compact set K ⊂ B_n. Since B_n is convex, the matrix Dν = (D_j ν_i) is positive definite. Moreover, differentiating the equality ∂u_n/∂ν = 0, one easily verifies that the normal derivative of |∇_x u_n|^2 is nonpositive on ∂B_n, which, in its turn, implies that the normal derivative of z_n on ∂B_n is nonpositive. We claim that we may choose a > 0 in such a way that D_t z_n − A(t)z_n ≤ 0 for s < t < T. Then, the classical maximum principle yields |z_n| ≤ ‖f‖^2_{C^1_b(R^d)}. Letting n → +∞, statement (i) follows with C_1 = a^{−1/2}. From now on we omit the subscript n, as well as the dependence on t and x, to simplify the notation. To prove the claim, we compute D_t z − A(t)z explicitly, obtaining (4.1). Using Hypothesis 1.2(iii), we estimate the last term in (4.1); the other terms are easily estimated using Hypotheses 1.1(ii) and 1.2(ii). Eventually, we get D_t z − A(t)z ≤ 0 for a suitable choice of a.

(ii) We proceed similarly to (i), defining z_n := u_n^2 + a(t − s)|∇_x u_n|^2. As above, in what follows we omit the subscript n as well as the dependence on t and x.
If we proceed as in part (i), we see that z satisfies an equality similar to (4.1), with a replaced by a(t − s) and a further addendum a|∇_x u|^2, coming from the differentiation of the factor t − s. Hence, for a suitable choice of a, D_t z − A(t)z ≤ 0. By the maximum principle we obtain z_n ≤ ‖f‖^2_∞, and statement (ii) follows, with C_2 = a^{−1/2}, letting n → +∞.
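In the model case A = Δ the Bernstein computation of part (ii) can be carried out in two lines (a classical illustration with the choice a = 1, not the general operator treated above):

```latex
% Model case u_t = \Delta u with datum f at time s; set z := u^2 + (t-s)\,|\nabla u|^2.
\begin{aligned}
(\partial_t-\Delta)(u^2) &= 2u(u_t-\Delta u) - 2|\nabla u|^2 = -2|\nabla u|^2,\\
(\partial_t-\Delta)|\nabla u|^2 &= 2\langle \nabla u, \nabla(u_t-\Delta u)\rangle - 2|D^2u|^2 = -2|D^2u|^2,\\
(\partial_t-\Delta)z &= -2|\nabla u|^2 + |\nabla u|^2 - 2(t-s)|D^2u|^2 \le 0.
\end{aligned}
% The maximum principle then gives z \le \|f\|_\infty^2, i.e.
% |\nabla u(t,\cdot)| \le (t-s)^{-1/2}\,\|f\|_\infty,
```

which is exactly the shape of the estimate in statement (ii), with C_2 = 1 in this model case.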
Remark 4.2. In the proof of Theorem 4.1 we have chosen to approximate G(t, s)f by solutions of Cauchy-Neumann problems instead of Cauchy-Dirichlet problems as in the first part of the paper. Approximation by Cauchy-Dirichlet problems is in fact possible, but it requires stronger conditions on the coefficients (see e.g., [3, Section 6.1] for the autonomous case), that we want to avoid here.
As a consequence of Theorem 4.1, our evolution family enjoys the strong Feller property.
Proof. Let f ∈ B_b(R^d). Then, there exists a bounded sequence (f_n) ⊂ C_b(R^d) which converges to f almost everywhere in R^d. As a consequence of Theorem 4.1, for any fixed t > s, the function x ↦ (G(t, s)f_n)(x) is Lipschitz continuous, with Lipschitz constant independent of n. The statement follows, observing that, by the dominated convergence theorem and (2.8), G(t, s)f_n converges to G(t, s)f pointwise.

Proof. We only have to show continuity at t = s. For any n ∈ N, we have the decomposition (4.2). From Theorems 2.2 and 4.1, it follows that the functions u and ∇_x u are bounded and continuous in (s, T] × R^d, for any T > s. Since ϕ is compactly supported in B_n, ψ ∈ C((s, s + 1], C_0(B_n)). Moreover, Theorem 4.1 yields an estimate for ψ with some constant C > 0.
By the classical gradient estimates of [15, Chapter IV, Theorem 17], for any s < σ < t ≤ s + 1 we get an estimate with some positive constants C_1 and C_2. Hence, we can differentiate (4.2), obtaining an analogous formula for ∇_x u. Therefore, for any x, x_0 ∈ B_{n−1} we can estimate |∇_x u(t, x) − ∇_x u(s, x_0)|, and this implies that ∇_x G(·, s)f is continuous at the point (s, x_0), since the function ∇_x G_n(·, s)(ϕf) is continuous on {s} × B_n by classical results. Since n is arbitrary, the statement follows.
Next, we prove a pointwise gradient estimate.
Theorem 4.5. Assume that Hypotheses 1.1, 1.2(i),(iii) and 1.3(i) are satisfied. Then, for every p ≥ p_0 and any f ∈ C^1_b(R^d),

|(∇_x G(t, s)f)(x)|^p ≤ e^{σ_p(t−s)} (G(t, s)|∇_x f|^p)(x), (t, s) ∈ Λ, x ∈ R^d,

where σ_p is a real constant.

Corollary 4.6. Under the hypotheses of Theorem 4.5, there exists a constant C such that the corresponding uniform gradient estimate holds for every p ≥ p_0, if Hypothesis 1.3(i) is satisfied, and for every p > 1, if Hypothesis 1.3(ii) is satisfied.

Evolution systems of measures
Definition 5.1. Let {U(t, s)} be an evolution family of bounded operators on B_b(R^d). A family (ν_t) of probability measures on R^d is an evolution system of measures for {U(t, s)} if, for every f ∈ B_b(R^d) and every s < t, we have

(5.1) ∫_{R^d} U(t, s)f dν_t = ∫_{R^d} f dν_s.

Formula (5.1) may be rewritten as U*(t, s)ν_t = ν_s. It implies that, if we know a single measure ν_{t_0} of an evolution system of measures for {U(t, s)}, then we know all the measures ν_t for t ≤ t_0. In particular, an evolution system of measures is uniquely determined by its tail (ν_t)_{t≥t_0}.
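In the autonomous case the definition reduces to the invariance identity, which can be checked numerically for the one-dimensional Ornstein-Uhlenbeck transition kernel (an illustrative example of ours, where every ν_t equals the standard Gaussian measure):

```python
import numpy as np

rng = np.random.default_rng(1)
tau = 0.8            # t - s; in this autonomous example G(t, s) depends only on t - s
n = 400_000

# G(t, s)f(x) = E[f(x e^{-tau} + sqrt(1 - e^{-2 tau}) Z)], Z ~ N(0, 1): the transition
# kernel associated with the 1-d Ornstein-Uhlenbeck operator A f = f'' - x f'.
x = rng.normal(size=n)                       # x ~ mu = N(0, 1), the invariant measure
z = rng.normal(size=n)
y = x * np.exp(-tau) + np.sqrt(1.0 - np.exp(-2.0 * tau)) * z

lhs = np.cos(y).mean()                       # Monte Carlo for \int G(t, s) cos d\mu
rhs = np.exp(-0.5)                           # \int cos d\mu = e^{-1/2} for mu = N(0, 1)
```

Since y is again exactly N(0, 1)-distributed, the two sides of (5.1) coincide up to the Monte Carlo error.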
In this section we give sufficient conditions for the existence of an evolution system (µ t ) of measures associated with the evolution family {G(t, s)} and we study the main properties of (µ t ). As a first step, we note that, for our evolution family {G(t, s)}, evolution systems of measures necessarily consist of measures which are equivalent to the Lebesgue measure.
Proposition 5.2. If (µ_t) is an evolution system of measures for {G(t, s)}, then each µ_t is equivalent to the Lebesgue measure.
Proof. For each A ∈ B(R^d) and t ∈ I we have µ_t(A) = ∫_{R^d} G(t + 1, t)1l_A dµ_{t+1}. By Corollaries 2.5 and 4.3, if the Lebesgue measure |A| of A is positive, then (G(t + 1, t)1l_A)(x) is positive for each x ∈ R^d; therefore µ_t(A) > 0. On the other hand, by Proposition 2.4(ii), if |A| = 0 then G(t + 1, t)1l_A ≡ 0, hence µ_t(A) = 0.
To prove the existence of evolution systems of measures we use a procedure similar to the Krylov-Bogoliubov theorem, which states that, in the autonomous case, the existence of an invariant measure is equivalent to the tightness of a certain set of probability measures. In our case, the corresponding tightness property is proved under Hypothesis 1.4, through the Prokhorov theorem. The latter states that a set {P_α : α ∈ F} of probability measures is tight if and only if, for any sequence (α_n) in F, there exists a subsequence (α_{n_k}) such that P_{α_{n_k}} converges weakly* to some probability measure P, i.e., ∫_{R^d} f dP_{α_{n_k}} → ∫_{R^d} f dP for any f ∈ C_b(R^d).

Lemma 5.3. Assume that Hypotheses 1.1 and 1.4 are satisfied. Then, G(t, s)ϕ is well defined for any t_0 ≤ s ≤ t in I. Moreover, for any fixed x ∈ R^d, the function s ↦ (G(t, s)ϕ)(x) is measurable in [t_0, t] and bounded, uniformly with respect to t.

Proof. Lemma 3.4 implies that G(t, s)ϕ is well defined for (t, s) ∈ Λ with s ≥ t_0, and that the function (t, s, x) ↦ (G(t, s)ϕ)(x) is locally bounded. To complete the proof, we fix t > t_0 and x ∈ R^d, and consider the function g defined in [t_0, t] by g(s) := (G(t, s)ϕ)(x). The function g is measurable, because (G(t, s)ϕ)(x) is the pointwise limit of the functions (G(t, s)ϕ_n)(x) from the proof of Lemma 3.4, which are continuous with respect to s. The procedure of Lemma 3.4 yields (5.2), and we claim that (5.2) implies the uniform bound.

Theorem 5.4. Assume that Hypotheses 1.1 and 1.4 are satisfied. Then, there exists an evolution system of measures (µ_t) for {G(t, s)}.

Proof. Fix s ∈ I and x_0 ∈ R^d. For any t > s, define the measure µ_{t,s} := p_{t,s}(x_0, ·). Lemma 5.3 implies that the family (µ_{t,s})_{t>s≥t_0} is tight, by the same proof as that of Lemma 3.5. The Prokhorov theorem and a diagonal argument yield the existence of a sequence (t_k), diverging to +∞, and of probability measures µ_n (n ∈ N, n > t_0) such that µ_{t_k,n} ⇀* µ_n. To define µ_s also for noninteger s, we show preliminarily that G*(n, m)µ_n = µ_m for m < n; indeed, this follows from the evolution law G(t_k, m) = G(t_k, n)G(n, m), letting k → +∞. Thus, we can extend the definition of the measures µ_s to any s ∈ I, by setting µ_s := G*(n, s)µ_n, where n is any integer of I greater than s. Since G*(n, s) = G*(m, s)G*(n, m), this definition is independent of n.
It is immediate to check that (µ t ) is an evolution system of measures for {G(t, s)}.
To complete the proof, we observe that, since ∫_{R^d} ϕ dµ_{t,s} = (G(t,s)ϕ)(x_0) is bounded for t > s ≥ t_0 by the same constant, letting t → +∞ we get (5.4).

In the following, if (µ_t) is a family of probability measures on R^d, we denote by

µ_t(p) := ∫_{R^d} |x|^p µ_t(dx)

the p-th moment function. We note that, if ϕ(x) = |x|^p satisfies Hypothesis 1.4, then Theorem 5.4 implies that {G(t,s)} admits an evolution system of measures (µ_t) such that µ_t(p) = O(1) as t → +∞, i.e., there exists t_0 ∈ I such that the p-th moments of µ_t exist and are uniformly bounded for t ≥ t_0.
Let us see the connection between evolution systems of measures and the asymptotic behaviour of solutions to problem (1.2). We assume that there exists a negative constant ω such that, for large t − s, we have

‖∇_x G(t,s)f‖_∞ ≤ C e^{ω(t−s)} ‖f‖_∞.  (5.5)

A sufficient condition for this may be obtained from Corollary 4.6.
Theorem 5.6. Assume that there exists ω < 0 such that (5.5) holds for all t ≥ s + 1, all f ∈ C_b(R^d) and some positive constant C. Further, assume that {G(t,s)} admits an evolution system of measures (µ_t) such that, for some p > 0, lim_{t→+∞} µ_t(p) e^{ωpt} = 0. Then,

lim_{t→+∞} (G(t,s)f)(x) = ∫_{R^d} f dµ_s

for all s ∈ I and f ∈ C_b(R^d). If I = R, then we also have

lim_{s→−∞} ( (G(t,s)f)(x) − ∫_{R^d} f dµ_s ) = 0.

In both cases the convergence is uniform on compact subsets of R^d.
Proof. Without loss of generality, we may assume that p < 1. Since ∫_{R^d} G(t,s)f dµ_t = ∫_{R^d} f dµ_s, we have

(G(t,s)f)(x) − ∫_{R^d} f dµ_s = ∫_{R^d} [ (G(t,s)f)(x) − (G(t,s)f)(y) ] µ_t(dy),

and, using the mean value theorem and (5.5), we get

|(G(t,s)f)(x) − (G(t,s)f)(y)| ≤ min{ 2‖f‖_∞, C e^{ω(t−s)} ‖f‖_∞ |x−y| } ≤ 2^{1−p} C^p ‖f‖_∞ e^{ωp(t−s)} |x−y|^p.

Hence, we have

| (G(t,s)f)(x) − ∫_{R^d} f dµ_s | ≤ 2^{1−p} C^p ‖f‖_∞ e^{ωp(t−s)} ( |x|^p + ∫_{R^d} |y|^p µ_t(dy) ),  (5.6)

and the right-hand side vanishes as t → +∞ (and also as s → −∞, if I = R), uniformly for x in compact sets.
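The reduction to p < 1 at the beginning of the proof is what makes the interpolation step work; a short sketch of that step, assuming the gradient bound (5.5) in the form ‖∇_x G(t,s)f‖_∞ ≤ C e^{ω(t−s)}‖f‖_∞ (this explicit form is our reading of (5.5)):

```latex
% Elementary interpolation: for a, b \ge 0 and 0 < p < 1, \min\{a,b\} \le a^{1-p} b^{p}.
% Apply it with a = 2\|f\|_\infty (trivial sup-norm bound) and
% b = C e^{\omega(t-s)}\|f\|_\infty |x-y| (mean value theorem plus the gradient estimate):
\min\bigl\{\,2\|f\|_\infty,\; C e^{\omega(t-s)}\|f\|_\infty\,|x-y|\,\bigr\}
   \le \bigl(2\|f\|_\infty\bigr)^{1-p}\bigl(C e^{\omega(t-s)}\|f\|_\infty\,|x-y|\bigr)^{p}
   = 2^{1-p} C^{p}\,\|f\|_\infty\, e^{\omega p (t-s)}\,|x-y|^{p}.
% Since 0 < p < 1, the map r \mapsto r^{p} is subadditive, so |x-y|^{p} \le |x|^{p} + |y|^{p};
% integrating in y then produces the p-th moment term appearing in (5.6).
```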
Corollary 5.7. Under the hypotheses of Theorem 5.6, there exists at most one evolution system of measures (µ_t) such that lim_{t→+∞} µ_t(p) e^{ωpt} = 0 for some p > 0.
Proof. Let (µ_t), (ν_t) be two evolution systems of measures with the above property. By Theorem 5.6, for each f ∈ C_b(R^d) and s ∈ I we have

∫_{R^d} f dµ_s = ∫_{R^d} f dν_s,

since both integrals coincide with lim_{t→+∞} (G(t,s)f)(0). The statement follows.

Evolution semigroups in L^p spaces with respect to invariant measures
In this section we assume that I = R, and that Hypotheses 1.1 and 1.4 are satisfied.
Let us define the evolution semigroup {T(t)} associated with the evolution family {G(t,s)} on the space C_b(R^{d+1}) by

(T(t)f)(s,x) := (G(s, s−t)f(s−t, ·))(x), t ≥ 0, (s,x) ∈ R^{d+1}.

Proposition 6.1. The family of operators {T(t) : t ≥ 0} is a semigroup of positive contractions in C_b(R^{d+1}). Moreover, T(t)f tends to f locally uniformly in R^{d+1} as t → 0^+, for any f ∈ C_b(R^{d+1}).
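The semigroup law in Proposition 6.1 is a direct consequence of the evolution law G(t,r)G(r,s) = G(t,s); a sketch, assuming T(t) is given by the standard formula (T(t)f)(s,x) = (G(s, s−t)f(s−t, ·))(x), which is consistent with Remark 6.2 below:

```latex
(T(t)T(r)f)(s,x)
  = \bigl(G(s,\,s-t)\,(T(r)f)(s-t,\cdot)\bigr)(x)
  = \bigl(G(s,\,s-t)\,G(s-t,\,s-t-r)\,f(s-t-r,\cdot)\bigr)(x)
  = \bigl(G(s,\,s-(t+r))\,f(s-(t+r),\cdot)\bigr)(x)
  = (T(t+r)f)(s,x).
% Contractivity and positivity are inherited pointwise in s: each G(s,s-t) is
% positivity preserving with G(s,s-t)\mathbb{1} = \mathbb{1}, so
% \|T(t)f\|_\infty \le \|f\|_\infty.
```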
Proof. The positivity of T(t) follows from the positivity of the evolution family {G(t,s)}.
Finally, the fact that T(t)f converges to f locally uniformly in R^{d+1} as t → 0^+ is an immediate consequence of the continuity of the function (p, r, x) ↦ (G(p,r)f)(x) in {(p, r, x) ∈ R^{d+2} : p ≥ r} and of Proposition 3.6.
Remark 6.2. Since G(s, s−t)1l = 1l for each t ≥ 0, if f = f(s) depends only on time, then (T(t)f)(s,x) = f(s−t), i.e., T(t) acts as a translation semigroup. Therefore, T(t) cannot have any smoothing or summability improving property in the s variable. In particular, it is not strong Feller and not hypercontractive.

Now, let (µ_t) be an evolution system of measures for {G(t,s)}. Note that the function s ↦ µ_s(A) is measurable in I for any Borel set A. Indeed, by Lemma 3.2, the function s ↦ (G(t,s)f)(x) is bounded and continuous in (−∞, t), for any x ∈ R^d and any f ∈ C_0(R^d). Hence, the function s ↦ ∫_{R^d} f dµ_s = ∫_{R^d} (G(t,s)f)(x) µ_t(dx) is continuous in (−∞, t). Since 1l_A is the pointwise limit of a sequence (f_n) ⊂ C_0(R^d), bounded with respect to the sup-norm, the measurability of the function s ↦ µ_s(A) follows by dominated convergence. Therefore, we can define

ν(J × K) := ∫_J µ_s(K) ds

for Borel sets J ⊂ R and K ⊂ R^d. Of course, ν may be uniquely extended in a standard way to a measure on B(R^{d+1}).
In the following, we denote by G the differential operator associated with {T(t)}. We state a preliminary lemma about T(t) and G.

Lemma 6.3.
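The displays defining G and the identity (6.3) are not reproduced here. For evolution semigroups of this type, G formally acts as Gu = A(s)u − D_s u, and the infinitesimal invariance (6.3) referred to in Remark 6.4 would then follow from a formal integration in s; a hedged sketch for smooth u with compact support in R^{d+1}, under that assumption on G:

```latex
% Assuming (\mathcal{G}u)(s,x) = (A(s)u(s,\cdot))(x) - (D_s u)(s,x) and
% d\nu = \mu_s(dx)\,ds as defined above:
\int_{\mathbb{R}^{d+1}} \mathcal{G}u \, d\nu
  = \int_{\mathbb{R}} \Bigl( \int_{\mathbb{R}^d} A(s)u(s,\cdot)\, d\mu_s
      - \int_{\mathbb{R}^d} D_s u(s,\cdot)\, d\mu_s \Bigr) ds
  = -\int_{\mathbb{R}} \frac{d}{ds} \int_{\mathbb{R}^d} u(s,\cdot)\, d\mu_s \, ds
  = 0.
% The middle equality uses the formal consequence of the evolution-system
% property \int G(t,s)\varphi \, d\mu_t = \int \varphi \, d\mu_s, namely
% \frac{d}{ds}\int_{\mathbb{R}^d} \varphi \, d\mu_s
%   = -\int_{\mathbb{R}^d} A(s)\varphi \, d\mu_s for fixed smooth \varphi;
% the last equality holds because u has compact support in s.
```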
Remark 6.4. In view of (6.3) we say that ν is infinitesimally invariant, although it is not a probability measure.
Finally, Hypothesis 1.3(ii) is trivially satisfied by virtue of Hypothesis 7.1(iii)(a). Hence, we obtain the following result.
Theorem 7.2. Let A be defined by (7.1) with the function b satisfying Hypothesis 7.1. Then, problem (1.2) is well posed in C_b(R^d). The corresponding evolution family {G(t,s)} is irreducible and maps bounded measurable functions into bounded continuous functions. Moreover, {G(t,s)} admits an evolution system of measures (µ_t) whose moments of every order N ∈ N are bounded for t ≥ t_0, where t_0 is the number in (7.4). In particular, every polynomial is integrable with respect to µ_t for any t ≥ t_0.