Error Estimates for Gaussian Beam Superpositions

Gaussian beams are asymptotically valid high frequency solutions to hyperbolic partial differential equations, concentrated on a single curve through the physical domain. They can also be extended to some dispersive wave equations, such as the Schr\"odinger equation. Superpositions of Gaussian beams provide a powerful tool to generate more general high frequency solutions that are not necessarily concentrated on a single curve. This work is concerned with the accuracy of Gaussian beam superpositions in terms of the wavelength $\epsilon$. We present a systematic construction of Gaussian beam superpositions for all strictly hyperbolic and Schr\"odinger equations subject to highly oscillatory initial data of the form $Ae^{i\Phi/\epsilon}$. Through a careful estimate of an oscillatory integral operator, we prove that the $k$-th order Gaussian beam superposition converges to the original wave field at a rate proportional to $\epsilon^{k/2}$ in the appropriate norm dictated by the well-posedness estimate. In particular, we prove that the Gaussian beam superposition converges at this rate for the acoustic wave equation in the standard, $\epsilon$-scaled, energy norm and for the Schr\"odinger equation in the $L^2$ norm. The obtained results are valid for any number of spatial dimensions and are unaffected by the presence of caustics. We present a numerical study of convergence for the constant coefficient acoustic wave equation in $\Real^2$ to analyze the sharpness of the theoretical results.


Introduction
In simulations of high frequency wave propagation, a large number of grid points is needed to resolve and maintain an accurate in time representation of the wave field. Consequently, in this regime, direct numerical simulations are computationally expensive and at sufficiently high frequencies, such simulations are no longer feasible. To circumvent this difficulty, approximate high frequency asymptotically valid methods are often used. One such popular approach is geometrical optics [7,33], which is obtained in the limit when the frequency tends to infinity. This method is also known as the WKB method or ray-tracing. The solution of the partial differential equation (PDE) is assumed to be of the form a(t, y, ε)e iφ(t,y)/ε , solution. They can therefore be computed at a computational cost independent of the frequency. However, the geometrical optics approximation breaks down at caustics, where rays concentrate and the predicted amplitude is unbounded [24,19]. The consideration of difficulties caused by caustics, beginning with Keller in [16] and Maslov and Fedoriuk (see [25]), led to the development of the theory of Fourier integral operators, e.g., as given by Hörmander in [10]. Gaussian beams form another high frequency asymptotic model which is closely related to geometrical optics. However, unlike geometrical optics, Gaussian beams do not breakdown at caustics. For Gaussian beams, the solution is also assumed to be of the geometrical optics form (1), but a Gaussian beam is a localized solution that concentrates near a single ray of geometrical optics in space-time. Although the phase function is real-valued along the central ray, Gaussian beams have a complex-valued phase function off their central ray. The imaginary part of the phase is chosen such that the solution decays exponentially away from the central ray, maintaining a Gaussian-shaped profile. To form a Gaussian beam solution, we first pick a ray and solve a system of ordinary differential equations (ODEs) along it to find the values of the phase, its first and second order derivatives and the amplitude on the ray. To define the phase and amplitude away from this ray to all of space-time, we extend them using a Taylor expansion. Heuristically speaking, along each ray we propagate information about the phase and amplitude and their derivatives that allows us to reconstruct the wave field locally in a Gaussian envelope. The existence of Gaussian beam solutions has been known since sometime in the 1960's, first in connection with lasers, see Babič and Buldyrev [2]. Later, they were used in the analysis of propagation of singularities in partial differential equations by Hörmander [11] and Ralston [30].
As above, we will assume that Φ ∈ C ∞ (K 0 ; R) and A j ∈ C ∞ 0 (K 0 ; C). Furthermore, we assume the potential V (y) is smooth and bounded along with all its derivatives, ∂ β y V ∈ C ∞ b (R n ) for all β. For the Schrödinger equation, the asymptotic parameter ε appears in the equation and there is no need to assume the bound on |∇Φ| that is necessary for strictly hyperbolic equations. Since these partial differential equations are linear, it is a natural extension to consider sums of Gaussian beams to represent more general high frequency solutions that are not necessarily concentrated on a single ray. This idea was first introduced by Babič and Pankratova in [3] and was later proposed as a method for wave propagation by Popov in [28]. The sum, or rather the integral superposition, of Gaussian beams in the simplest first order form can be written as u GB (t, y) = 1 2πε n 2 K0 a(t; z)e iφ(t,y−x(t;z);z)/ε dz , where K 0 is a compact subset of R n and the phase that defines the Gaussian beam is given by φ(t, y; z) = φ 0 (t; z) + y · p(t; z) + y · 1 2 M (t; z)y .
The real vector p(t; z) is the direction of wave propagation and the matrix M (t; z) has a positive definite imaginary part and it gives Gaussian beams their profile. Extensions of the above superposition are possible in several directions, including using higher order Gaussian beams in the superposition and using a sum of several superpositions to approximate the different modes of wave propagation. Higher order Gaussian beams are created by using an asymptotic series for the amplitude and using higher order Taylor expansions to define the phase and the amplitude functions, see (16) and (17). Also, for higher order beams, a cutoff function (13) is necessary to avoid spurious growth away from the central ray. Superpositions with higher order Gaussian beams have an improved asymptotic convergence rate. For m-th order strictly hyperbolic PDEs, which have m pieces of initial data, we use m different Gaussian beam superpositions chosen in such a way so that their sum approximates the initial data. Each of these superpositions corresponds to one of the m distinct modes of wave propagation. Accuracy studies for a Gaussian beam solution u GB have traditionally focused on how well it asymptotically satisfies the PDE, i.e. the size of the norm of P u GB in terms of ε. The question of determining the error of the Gaussian beam superposition compared to the exact solution was thought to be a rather difficult problem decades ago, see the conclusion section of the review article by Babič and Popov [4]. However, some progress on estimates of the error has been made in the past few years. This accuracy study was initiated by Tanushev in [34], where a convergence rate was obtained for the initial data. Some earlier results on this were also established by Klimeš in [18]. The part of the error that is due to the Taylor expansion off the central ray was considered by Motamed and Runborg in [27] for the Helmholtz equation. Liu and Ralston [22,23] gave rigorous convergence rates in terms of ε for the acoustic wave equation in the scaled energy norm and for the Schrödinger equation in the L 2 norm. However, the error estimates they obtained depend on the number of space dimensions in the presence of caustics, since the projected Hamiltonian flow to physical space becomes singular at caustics. The superpositions in (5) can also be carried out over both x and p in full phase space through the Hamiltonian map, (z, p 0 ) → (x(t; z, p 0 ), p(t; z, p 0 )), as shown in [22,23]. In this formulation the Hamiltonian flow is regular and there are no caustics, so we expect to obtain a dimensionally independent error estimate. This has been confirmed for the wave equation by Bougacha, Akian and Alexandre in [5] for the case of initial data based on the Fourier-Bros-Iaglonitzer (FBI) transform, a result which is inspired by the work of Rousse and Swart on the Herman-Kluk propagator for the Schrödinger equation [31,32]. From a computational stand point, the full phase space formulation is more expensive.
Building upon these recent advances, together with an application of a non-squeezing argument proved in Lemma 3, we are able to provide a definite answer to the question of accuracy for Gaussian beam superposition solutions. More precisely, we obtain dimensionally independent estimates for the superposition in physical space for general m-th order strictly hyperbolic PDEs and the Schrödinger equation. Our main result is the following theorem.
Theorem 1 Let u be the exact solution to the PDEs considered, (2), (3) and (4), under the stated assumptions on initial data and partial differential operators. Moreover, let u k be the corresponding k-th order Gaussian beam superposition given in Section 2.2 and Section 2.3, with a sufficiently small cutoff parameter η when k > 1. We then have the following estimates. For the m-th order strictly hyperbolic PDE (2), For the acoustic wave equation (3), where || · || E is the scaled energy norm (21). Finally, for the Schrödinger equation (4), This improves on the results in [22,23] where the last two error estimates were also proved, but with an additional factor ε −γ in the right hand side, where γ = (n − 1)/4 for the wave equation and γ = n/4 for the Schrödinger equation. Note that the rescaling by ε m−1 in the first estimate is convenient here, since it exactly balances the rate at which the corresponding norm of the initial data for the PDE (2) goes to infinity as ε → 0.
At present there is considerable interest in using numerical methods based on superpositions of beams to resolve high frequency waves near caustics, which began in the 1980's with numerical methods for wave propagation in [28,15,6] and more specifically in geophysical applications in [17,8,9]. Recent work in this direction includes simulations of gravity waves [36], of the semiclassical Schrödinger equation [14,20], and of acoustic wave equations [26,34]. Numerical techniques based on both Lagrangian and Eulerian formulations of the problem have been devised [14,21,20,26]. A numerical approach for general high frequency initial data closely related to the FBI transform, but avoiding the cost of superposing over all of phase space, is presented in [29] for the Schrödinger equation. Numerical approaches for treating general high frequency initial data for superposition over physical space were considered in [35,1] for the wave equation. Our theoretical results show that the numerical solutions found in these papers will be accurate when ε ≪ 1.
To test the sharpness of the theoretical convergence rates, we present a short numerical study in the case of the acoustic wave equation with constant sound speed. Our numerical results indicate that the theoretical rates are sharp for even order k, but similar to the result in [27], we observe a gain in the convergence rate of a factor of ε 1/2 for beams of odd order k, which suggests that the actual convergence rate is O(ε ⌈k/2⌉ ). This paper is organized as follows: Section 2 introduces Gaussian beams and their superpositions for m-th order strictly hyperbolic equations. Furthermore, we construct Gaussian beams for the Schrödinger equation. Section 3 is devoted to error estimates for Gaussian beam superpositions. Detailed norm estimates of the oscillatory operators used in obtaining the error estimates are given in Section 4. Numerical validation of our results is finally presented in Section 5.

Construction of Gaussian Beams
In this section, we outline the construction of Gaussian beam superpositions for strictly hyperbolic PDEs. We also construct Gaussian beams for the Schrödinger equation.

Hyperbolic Equations
Let P = P m + L be a linear strictly hyperbolic m-th order partial differential operator (PDO) in n dimensions with and L a differential operator of order m − 1. The principal symbol of P , denoted by σ m (t, y, τ, p), is defined by the formal relationship P m = σ m (t, y, −i∂ t , −i∂ y ). Following [13,30], we make the assumptions: (H1) The coefficients g β (t, y) are smooth functions, bounded in t and y along with all their derivatives, ∂ ℓ t ∂ α y g β ∈ C ∞ b (R n ) for all ℓ, α. (H2) For |p| = 0, the principal symbol σ m (t, y, τ, p) has m distinct real roots, when it is considered as a polynomial in τ .
The following lemma summarizes some results related to the Hamiltonian flow above. We will use the second point to argue that changing variables s → t is always allowed. The last point is needed in the proof of the non-squeezing lemma (Lemma 3).
We next show that there is a constant C such that |τ (s)| ≤ C|p(s)| for all s and initial data. This is obviously true if τ = 0. When τ = 0, let ξ = p/τ and observe that, by homogeneity, 0 = σ m (t, y, τ, p) = τ m σ m (t, y, 1, ξ) , on the null bicharacteristic. Hence ξ is a root of the polynomial equation Since the coefficients g β (t, y) are bounded it follows that there is a constant such that |ξ| ≥ C > 0, which proves that |τ (s)| ≤ C|p(s)| for all s.
We can now boundṗ(s) as follows.
for some constant c 1 independent of s and initial data. Let λ = c 1 /c 0 where c 0 is the constant in (8). Moreover, sett(s) = t(s) − t(0) ≥ 0 and note thatṫ(s) =ṫ(s) > 0. Then, Consequently, |p(s)| 2 e 2λt(s) ≥ |p(0)| 2 and (10) follows. With λ m = (m − 1)λ, we obtaiṅ It follows thatt(s) ≥ c 0 s for m = 1, since λ 1 = 0. For m > 1, This proves the stated logarithmic growth oft(s) for m > 1. ✷ As mentioned above, an immediate consequence of this lemma is that we can use t to parametrize the Hamiltonian flow (9) instead of s, since the lemma guarantees that for a fixed t 0 ∈ [0, T ], there exists a unique s 0 such that t(s 0 ) = t 0 . With a slight abuse of notation we will now write x(t) and p(t) for the Hamiltonian flow parametrized by t.
Following [30], we define a phase function φ and amplitude functions a j via Taylor polynomials. After changing variables s → t in the formulation used in [30] we can write: We can now define the preliminary k-th order Gaussian beamṽ k (t, y) as: Applying the operator P to this beam and collecting terms containing the same power of ε, we have where c r (t, y) are smooth functions independent of ε. The construction in [30] then proceeds to make c r (t, y) vanishe up to order k−2(r+m)+1 on y = x(t). To obtain this, (x(t), p(t)) must follow the Hamiltonian flow and the coefficients in the Taylor polynomials should satisfy ODEs, which are given as follows. By assumption (H2) above, whenever p = 0, we can define m Hamiltonians H ℓ (t, x, p) implicitly by the relations σ m (t, x, −H ℓ (t, x, p), p) = 0 for ℓ = 0, . . . , m − 1. For any choice H = H ℓ , the first several ODEs arė By (10) the Hamitonian will be well-defined for all times if p(0) = 0. Moreover, we have the following result for the Hessian matrix M (t) of the phase that guarantees that the leading order shape of the beam stays Gaussian for all time: Before we can fully define a Gaussian beam, one last point needs to be addressed. Since x(t) and p(t) are real, if φ 0 (0) is real and M (0) chosen as in Lemma 2, then the imaginary part of φ(t, y) will be a positive quadratic plus higher order terms about x(t). Thus, we must only construct the Gaussian beam in a domain on which the quadratic part is dominant. To this end, we use a cutoff function ρ η ∈ C ∞ (R n ; R) with cutoff radius 0 < η ≤ ∞ satisfying, Now, by choosing η > 0 sufficiently small, we can ensure that on the support of ρ η (y − x(t)), However, note that for first order beams the imaginary part of the phase is quadratic with no higher order terms, so that the cutoff is unnecessary. Thus, to include this case we let the cutoff function be defined for η = ∞ by ρ ∞ ≡ 1. We are now ready to finally define the k-th order Gaussian beam v k (t, y) as:

Superpositions of Gaussian Beams
In the previous section, we introduced Gaussian beam solutions that satisfy P u = 0 in an asymptotic sense (see [30] for a precise statement), without too much concern for the values of the solution at t = 0. The initial value problem that a single Gaussian beam v k (t, y) approximates has initial data that is simply given by its values (and the values of its time derivatives) at t = 0. While they resemble the initial conditions for (2), they are quite different since for example the phase, φ, for v k is complex valued and v k is concentrated in y.
Our goal is to create an asymptotically valid solution to the PDE (2), thus we must also consider the m distinct pieces of initial data given in the form of time derivatives of u at t = 0. To generate solutions based on Gaussian beams that approximate the initial data for (2), we exploit the linearity properties of P . That is, we use the fact that a linear combination of two Gaussian beams, with different initial parameters, will also be an asymptotic solution to P u = 0, since each Gaussian beam is itself an asymptotic solution. Building on this idea, we take a family of Gaussian beams that is indexed by a parameter z ∈ K 0 , where K 0 is the compact subset of R n discussed in the introduction that contains the support of the amplitudes of the initial data for (2). We will use the notation x ℓ (t; z), p ℓ (t; z), φ ℓ (t, y − x(t; z); z), etc, to denote the dependence of these quantities on the indexing parameter z and on the m choices for H(t, x, p) = H ℓ (t, x, p) denoted by ℓ = 0, . . . , m − 1. Thus, we will write the k-th order Gaussian beam v k,ℓ (t, y; z). Note that the cutoff radius η may vary with z and ℓ between beams, however, as the beam superposition is taken over the compact set K 0 and 0 ≤ ℓ ≤ m − 1, there is a minimum value for η that will work for all beams in the superposition. Thus, we form the k-th order superposition solution u k (t, y) as where the phase and amplitude that define the Gaussian beam, We remind the reader that each v k,ℓ (t, y; z) requires initial values for the ray and all of the amplitude and phase Taylor coefficients. The appropriate choice of these initial values will make u k (0, y) asymptotically converge the initial conditions in (2). The first step is to choose the origin of the rays and the initial coefficients of φ ℓ up to order k + 1. Letting the ray begin at a point z and expanding Φ(y) in a Taylor series about this point, we initialize the ray and Gaussian beam phase associated with τ ℓ as Determining the initial coefficients for the amplitudes involves a more complicated procedure. As with the phase, we expand all of the amplitude functions in Taylor series, Next, we look at the time derivatives of u k at t = 0, to the lowest order in ε. We equate the coefficients of (y − z) to the the corresponding terms in the Taylor expansions of B ℓ,j and recalling that we obtain the following m × m system of linear equations: Since the τ ℓ are distinct, this Vandermonde matrix is invertible, so the solution will give the initial coefficients for the first amplitude for each of the m Gaussian beams. If we proceed with the next orders in ε, we would obtain the same m × m linear system for [a 0,j,β , . . . , a m−1,j,β ] T , except that the right hand side will not only depend on the Taylor coefficients B ℓ,j,β , but also on previously computed a ℓ,q,γ , q < j, coefficients and their time derivatives. Thus, all of the necessary initial coefficients for each of the m Gaussian beams can be computed sequentially. Summarizing this construction, we have that at t = 0 and ℓ = 0, . . . , m − 1, where φ(y) is the Taylor expansion of Φ(y) + i|y − z| 2 /2 to order k + 1 and each b ℓ,j is the same as the Taylor expansion of B ℓ,j up to order k − 2j − 1. The O(ε ∞ ) term is present because some of the time derivatives fall on the cutoff function ρ η (y − x ℓ (t; z)). The contributions of such terms decays exponentially as ε → 0, since the derivatives of ρ η (y − x ℓ (t; z)) are compactly supported and vanish near y = z.
Remark 2.1 For ease of notation and exposition, in (2) we have taken the phase Φ to be the same for all of the m initial data, however, this is not a requirement. We can form m different Gaussian beam superpositions that each satisfy one of these m conditions with a specific phase and the rest with zero initial data. Then, summing these m solutions we obtain a more general solution of (2) with m different phase functions for each of initial data piece.

Remark 2.2
In the initialization of M ℓ (0; z) we take its imaginary part to be given by i Id n×n for simplicity. All of the results in this paper can be carried out if we instead took the imaginary part to be given by iγ Id n×n , for some constant γ > 0, and adjusted the normalization constant in (15) appropriately. However, it is important to note that the constants throughout this paper will depend on γ and that in general, we expect that increasing γ will increase the evolution error in the Gaussian beams and that decreasing γ will increase the error in approximating the initial data.
This completes the construction of the Gaussian beam superpositions u k for the initial value problem (2) for general m-th order strictly hyperbolic operators. From this point, we will assume that the parameter η is chosen as η = ∞ for the first order superposition u 1 and that for higher order superpositions it is taken small enough (and independent of z and ℓ) to make ℑφ ℓ (t, y−x ℓ (t; z); z) > δ|y − x ℓ (t; z)| 2 for t ∈ [0, T ] and y ∈ supp{ρ η (y − x ℓ (t; z))}. Lemma 4 below shows that this is always possible.

The Schrödinger Equation
The construction of Gaussian beams for hyperbolic equations can be extended to the Schrödinger equation by replacing the operator P with a semiclassical operator P ε . Then, we can similarly construct asymptotic solutions to P ε u = 0 as ε → 0. In this section, we briefly review the construction presented in [23] for the Schrödinger equation (4) with a smooth external potential V (y). Note that the small parameter ε represents the fast space and time scale introduced in the equation, as well as the typical wavelength of oscillations of the initial data.
We recall that the k-th order Gaussian beam solutions are of the form where the phase and amplitudes are given in (16) and (17) and ρ η is the cutoff function (13). Furthermore, the subindex "ℓ" has been suppressed since for the Schrödinger equation there is only one choice for H(t, x, p), namely The system (12) for the bicharacteristics (x(t; z), p(t; z)) is then given bẏ The equations for the phase and amplitude Taylor coefficients are derived in the same way as for the strictly hyperbolic equations. The phase coefficients along the bicharacteristic curve satisfẏ The amplitude coefficients are obtained by recursively solving transport equations for a j,β with |β| ≤ k − 2j − 1, starting fromȧ These equations when equipped with the following initial data The k-th order Gaussian beam superposition is finally formed as As in the case of strictly hyperbolic PDEs, we will assume that the cutoff parameter η is chosen as η = ∞ for the first order superposition u 1 and that for higher order superpositions it is taken small enough (and independent of z) to make ℑφ(t, y − x(t; z); z) > δ|y − x(t; z)| 2 for t ∈ [0, T ] and y ∈ supp{ρ η (y − x(t; z))}. Again, Lemma 4 ensures that this can be done. Furthermore, we note that Remark 2.2 concerning the initial choice of the imaginary part of M (0; z) also applies to the superposition for the Schrödinger equation.

Error Estimates for Gaussian Beams
In this section we prove the asymptotic convergence results for superpositions of Gaussian beams given in our main result, Theorem 1. The corner stone of our error estimates are the well-posedness estimates for each PDE. Since they are crucial to our analysis we summarize them here.
Proof: The results for the wave and Schrödinger equation are standard and can be found in most books on PDEs. The result for m-th order equations is a bit more technical to prove and appears in Section 23.2 of [13] (Lemma 23.2.1). ✷ Remark 3.1 Since the wave equation is a second order strictly hyperbolic PDE, we have two distinct well-posedness estimates in terms of two different norms. Furthermore, we note that · E is only a norm over the class of functions that tend to zero at infinity, which we are considering here.
When Theorem 2 is applied to the difference between the Gaussian beam superposition, u k , and the true solution, u, for any one of the PDEs that we are considering, we obtain the following estimate for t ∈ [0, T ], with the appropriate choices for Θ, q and · S . We will refer to the first term on the right hand side as the error in approximating the initial data or the initial data error and to the second term as the evolution error.
Using the ideas in [34], we prove Theorem 4, which shows the convergence rate in ε of the Gaussian beam superposition to the initial data for any given Sobolev norm. Thus, this theorem extends a result of [34], so that we can use it to estimate the initial data error in the more general well-posedness estimates above.
The evolution error has been estimated in the work of previous authors [22,23,5,32]. The necessary steps used by those authors are quite general and can be applied to any strictly hyperbolic equation as well as any linear dispersive wave equation as long as it is semi-classically rescaled. Following these ideas, we show in Lemma 5 that for all of the PDEs that we consider, Θ[u k ] can be written in the form so that the key to estimating the evolution error is precise norm estimates in terms of ε of the oscillatory integral operators Q αj ,gj ,η : L 2 (K 0 ) → L ∞ ([0, T ]; L 2 (R n )), defined as follows. For a fixed t ∈ [0, T ], a multi-index α, a compact set K 0 ⊂ R n , a cutoff function ρ η (13) with cutoff radius 0 < η ≤ ∞ and a function g(t, y; z), we let (Q α,g,η w)(t, y) w(z)g(t, y; z)(y − x(t; z)) α e iφ(t,y−x(t;z);z)/ε ρ η (y − x(t; z))dz , with the functions g(t, y; z), φ(t, y; z) and x(t; z) satisfying for all t ∈ [0, T ], With this definition, the following norm estimate of Q α,g,η will be proved in Section 4. This theorem improves the norm estimate of Q α,g,η given in [22,23], which has an additional factor ε −γ in the right hand side, with γ = (n − 1)/4 for the wave equation and γ = n/4 for the Schrödinger equation. Instead of estimating the integral directly, we follow the arguments in [5,31] to relate the estimate of the oscillatory integral to the operator norm, through the use of an adjoint operator. An essential ingredient in estimating the operator norm is the non-squeezing lemma (Lemma 3), which states that the distance between two physical points is comparable to the distance between their Hamiltonian trajectories measured in phase space, even in the presence of caustics. Using Theorem 3 we are able to prove the same convergence rate for Gaussian beam superpositions over physical space that is achieved in [5] for beam superposition carried out in full phase space. Thus, we improve on the error estimates given in [22,23] to obtain error estimates for the Gaussian beam superposition that are independent of dimension as given in Theorem 1. We conclude this section with two remarks.

Remark 3.2
The assumption of C ∞ smoothness for all functions is made for simplicity to avoid a too technical discussion about precise regularity requirements. In this sense, Theorem 1 and Theorem 3 can be sharpened, since they will be true also for less regular functions.

Remark 3.3
If the condition in assumption (A4) is satisfied for all y there is no need for the cutoff function in the definition of the operator Q in (24). We treat this case by taking η = ∞ and defining ρ ∞ ≡ 1. The operators with η = ∞ are used in the case of first order Gaussian beams.

Gaussian Beam Phase
In this section, we show that the Gaussian beam phase φ given in (16) is an admissible phase for the operators Q α,g,η . We begin with a lemma based on the regularity of the Hamiltonian flow map, S t , stating that the difference |z − z ′ | is comparable to the sum |p(t; z) − p(t; z ′ )| + |x(t; z) − x(t; z ′ )|. Note, however, that because of caustics it is not true that |z − z ′ | is related in this way to either of the individual terms |p(t; z) − p(t; z ′ )| or |x(t; z) − x(t; z ′ )|.
Lemma 3 (Non-squeezing lemma) Let S t be a Hamiltonian flow map, (x(t; z), p(t; z)) = S t (x(0; z), p(0; z)), associated to a strictly hyperbolic PDO (12) or to the Schrödinger operator (19). Let K 0 be a compact subset of R n and assume that p(0; z) is Lipschitz continuous in z ∈ K 0 for the flow associated with the Schrödinger operator. Additionally, assume that inf z∈K0 |p(0; z)| = δ > 0 for the flow associated with the strictly hyperbolic PDO. Under these conditions, there exist positive constants c 1 and c 2 depending on T and δ, such that for all z, z ′ ∈ K 0 and t ∈ [0, T ].
Proof: We prove the result for the flows associated with the two types of operators separately, however, we use the common notation, We begin with the flow associated with the Hamiltonian for the Schrödinger operator. Let us introduce the set K 0 = {(z, p(0; z)) : z ∈ K 0 } and note that S t is invertible with inverse S −t and regular for all t so that where conv(E) denotes the convex hull of the set E. Now, noting that and taking ℓ 1 norms, since sZ + (1 − s)Z ′ ∈ conv(K 0 ), we have where we have used the Lipschitz continuity of p 0 in z. This gives the right half of (25). By the equivalent of (26) for S −t , we have which completes the proof of the lemma for the Hamiltonian flow associated with the Schrödinger operator.
For the flow associated with the strictly hyperbolic operator, we follow the same idea, but we have to be careful near |p| = 0. Thus, in addition to K 0 , we introduce the sets Note that S t is regular away from |p 0 | = 0 by Lemma 1 as |p(t)| = 0, for all t so Thus, again by (26) we obtain, provided thatp(s) = (1 − s)p 0 + sp ′ 0 satisfies inf 0≤s≤1 |p(s)| ≥ δ/2, which guarantees that sZ + (1 − s)Z ′ ∈K 0 \ B δ/2 for 0 ≤ s ≤ 1. On the other hand, suppose that inf 0≤s≤1 |p(s)| < δ/2. We define p * as the point on the line connecting p 0 to p ′ 0 with smallest norm and let s * = argmin 0≤s≤1 |p(s)| so that p * =p(s * ). Then which shows that (27) holds for all Z, Z ′ ∈ K 0 . This gives the right half of (25). For the left half of (25), we note that by Lemma 1, inf t∈[0,T ],z∈K0 |p(t; z)| ≥δ > 0, so that we can consider the inverse map S −t in exactly the same way as S t above and show that ||Z − Z ′ || 1 ≤ C||X − X ′ || 1 .
Thus we obtain the left half of (25), Thus, the proof of the lemma is complete. ✷ We are now ready to show that the phase function for a k-th order Gaussian beam is an admissible phase for the operators Q α,g,η .
Recalling that by Lemma 2, ℑM (t; z) is positive definite, we therefore have that when |y| ≤ 2η, where the constant δ(η) is positive for small enough η and independent of t ∈ [0, T ], since φ β (t; z) are smooth functions of t. This shows that φ(t, y; z) satisfies (A4). When k = 1, there are only quadratic terms in the phase and in this case φ(t, y; z) will satisfy (A4) for any choice of η ∈ (0, ∞]. ✷

Representations of P [u k ] in Terms of Q α,g,η
In this section, we show that several of the intermediate quantities in the proof of Theorem 1 can be written as sums involving the operators Q α,g,η .
Lemma 5 Let P be an m-th order strictly hyperbolic operator and P ε a semiclassical Schrödinger operator, satsifying the assumptions stated for (2) (in Section 2.1) and (4) respectively. Let u k be the corresponding Gaussian beam superpositions given in Section 2.2 and Section 2.3. Then P [u k ] and P ε [u k ] can be expressed as a finite sum of the operators Q: with χ K0 (z) the characteristic function on K 0 and Q αj ,gj ,η satisfying assumptions (A1)-(A5) if η is sufficiently small. In the case of k = 1, η can take any value in (0, ∞]. Proof: For a single Gaussian beam v k,ℓ (t, y; z) for the hyperbolic operator, following the discussion in [30] and Section 2.1, we have where E k contains terms that are multiplied by derivatives of the cutoff function. For first order beams η = ∞ and ρ ∞ ≡ 1 making E 1 (t, y; z) ≡ 0. For higher order beams, E k ≡ 0 in an η neighborhood of x ℓ (t; z) and since η is small enough so that the imaginary part of φ ℓ is strictly positive for η ≤ |y − x ℓ (t; z)| ≤ 2η, E k will decay exponentially as ε → 0. Thus, E k = O(ε ∞ ) for all orders of beams and t ∈ [0, T ].
To simplify the notation, let j = 1, . . . , J < ∞ enumerate all of the combination of ℓ, r, and α in the triple sum above and rewrite the sums as with ℓ j ≥ (k/2 − m + 1) and g j = c rj,ℓj ,αj . Thus, we have the desired result for P . Now, for each Gaussian beam v k (t, y; z) for the Schrödinger equation defined in Section 2.3, following [23], we compute Thus, we have where for convenience a j ≡ 0 for j ∈ [0, ⌈k/2⌉ − 1] and By construction of the phase coefficients in Section 2.3, we have that on y = x(t; z), G vanishes to order k + 1 and by construction of the amplitude coefficients, the quantity −iLa r−1 − 1 2 △ y a r−2 vanishes to order k − 2r + 1. Thus, d r vanishes to order k − 2r + 1 on y = x(t; z), where again we remark that if k − 2r + 1 is negative, d r does not vanish on y = x(t; z). By Taylor's remainder formula, we have d r (t, y; z) = |α|=[k+2−2r]+ d r,α (t, y; z) (y − x(t; z)) α , with d r,α (t, y; z) compactly supported in z ∈ K 0 . Hence, For the superposition, we have As above, the smoothness of V (y), φ β and a j,β guarantees that d r,α satisfies (A5), while Lemma 4 ensures that also (A1)-(A4) are true for Q α,dr,α,η under the condition on η. Again, we let j = 1, . . . , J < ∞ enumerate all of the combination of r and α in the double sum above. Thus, with ℓ j ≥ (k/2 + 1) and g j = d rj ,αj . Thus, the proof is complete. ✷

Initial Data Errors
In addition to establishing the role of the Q-operators and estimating their norms, we need to consider the convergence of the Gaussian beam superposition to the initial data to be able to prove Theorem 1. We start with the following lemma.
where φ(y − z; z) is the k + 1 order Taylor series of Φ about z, a j (y − z; z) is the same as the Taylor series of A j about z up to order k − 2j − 1 (but may differ in higher order terms) and ρ η is the cutoff function (13) with 0 < η ≤ ∞. Then for some constant C s , Proof: The proof of this lemma is based on a discussion in [34]. We first assume that η < ∞. Looking at each of the terms in the sum above separately, the estimate depends on how well is approximated by 1 2πε n 2 R n ε j ρ η (y − z)a j (y − z; z)e iφ(y−z;z)/ε−|y−z| 2 /2ε dz .
Since we want to estimate the difference between these two functions in a Sobolev norm, we need to consider differences between their ∂ β y derivatives in the L 2 norm. Since derivatives of (29) that fall on the cutoff function ρ η (y − z) vanish in a neighborhood of z = y and the integrand is compactly supported in y and z, they will be O(ε ∞ ) in the L 2 norm and do not contribute to the estimate. Thus, when we differentiate (28) and (29) by ∂ β y and pair the results from the sequence of differentiations, the terms that will contribute to the estimate will be of the form 1 2πε for (28) and for (29), where C {γ ℓ } ℓ are combinatorial coefficients. These terms are integrated over R n in z and summed over the multi-indexes δ 1 + γ = β, the index ℓ = 1 . . . |γ|, and multi-indexes |γ 1 |, . . . , |γ ℓ | ≥ 1, γ 1 + . . . + γ ℓ = γ. Furthermore, we have that max m [γ m ] ≤ |γ| − ℓ + 1. These formulas can be obtained through long but straightforward calculations. The important part is to recall that iφ(y − z; z) − |y − z| 2 /2 is the k + 1 order Taylor series of iΦ(y) − |y − z| 2 /2 about z and a j (y − z; z) agrees with the Taylor series of A j (y) about z up to order k − 2j − 1, so that (31) will agree with the Taylor series of (30) up to order Thus, we have that ∂ β y (u k − u) L 2 can be estimated by a sum of terms of the form where ℓ ≤ |β|, |B ℓ − b ℓ | = O(|y − z| k−2j−|β|+ℓ ) and |Φ − φ| = O(|y − z| k+2 ). Now, the proof of Theorem 2.1 in [34] can be applied directly to (32) to obtain the estimate, Thus, we have the result for η < ∞. The extension for ρ ∞ ≡ 1 follows directly, since the cutoff ρ ν for ν < ∞ introduces O(ε ∞ ) errors in the L 2 norm: as 1 − ρ ν vanishes in a neighborhood z = y and the integrand is compactly supported in z. ✷ Using Lemma 6, we can estimate the asymptotic convergence rate of the superposition solution to the initial data.
Theorem 4 For the Gaussian beam superposition, u k , given in Section 2.2 and the solution, u, to the strictly hyperbolic PDE (2), we have for some constant C ℓ,s and 0 ≤ ℓ ≤ m − 1.
Similarly, for the Gaussian beam superposition, u k , given in Section 2.2 and the solution, u, to the wave equation (3), we have Furthermore, for the superposition, u k , given in Section 2.3 and the solution, u, to the Schrödinger equation (4), we have Proof: The proof of this theorem for hyperbolic PDEs follows directly from Lemma 6, since for each power of ε, ∂ ℓ t u k (0, y) and ∂ ℓ t u(0, y) given in (18) and (2), respectively, are exactly in the assumed form in the Lemma 6.
The result for the wave equation follows after noting that Similarly, the result for the Schrödinger equation follows directly from the definition of the u k and u at t = 0 in Section 2.3. ✷

Proof of Theorem 1
We prove the results for each type of PDE separately. For the strictly hyperbolic m-th order PDE (2), applying the well-posedness estimate given in Theorem 2 to the difference between the true solution u and the k-th order Gaussian beam superposition, u k , defined in Section 2.2, we obtain for t ∈ [0, T ], The first term of the right hand side, which represents the difference in the initial data, can be estimated by Theorem 4 and the second term, which represents the evolution error, can be estimated by Lemma 5 to obtain with ℓ j ≥ (k/2 − m + 1) and Q αj ,gj ,η satisfying (A1)-(A5), for small enough η when k > 1. Thus, using Theorem 3, we obtain which completes the proof for strictly hyperbolic PDEs. Since the wave equation is a second order strictly hyperbolic PDE, applying the above estimate to (3), we obtain for t ∈ [0, T ], which completes the proof of Theorem 1 for the wave equation. For the Schrödinger equation (4), applying the well-posedness estimate given in Theorem 2 to the difference between the true solution u and the k-th order Gaussian beam superposition, u k , defined in Section 2.3, we obtain for t ∈ [0, T ], The initial data part of the right hand side can be estimated by Theorem 4 to obtain u(0, ·) − u k (0, ·) L 2 ≤ Cε k 2 . With the help of Lemma 5, we can estimate the second part of the right hand side as with ℓ j ≥ (k/2 + 1) and, as above, Q αj ,gj ,η satisfying (A1)-(A5), for small enough η when k > 1. Again, using Theorem 3 and combining, we obtain, Thus, the proof of Theorem 1 is complete.

Norm Estimates of Q α,g,η
In this section we prove Theorem 3. We follow the ideas in [5,31] to relate the estimate of the oscillatory integral to the operator norm, through the use of an adjoint operator. A key ingredient in estimating the operator norm is the non-squeezing lemma (Lemma 3), which allows us to obtain a dimensionally independent estimates for the oscillatory integral operator.

Utility results
We will prove a few general results that will be useful in the proof of Theorem 3.
For the second result, we have Using assumption (A3) we have for E 1 , To estimate E 2 , we first note that on Ω(t, µ), Then, by the Fundamental Theorem of Calculus and the smoothness assumption (A2), for y ∈ Ω(t, µ) we have, with C 1 independent of t ∈ [0, T ]. Similarly, with y ∈ Ω(t, µ) and C 2 independent of t ∈ [0, T ] and s ∈ K 0 , where C(θ, µ) is independent of t ∈ [0, T ] and positive if θ and µ are small enough. ✷ Next we have a version of the non-stationary phase lemma.
Proof: This is a classical result. A proof can be obtained by modifying the proof of Lemma 7.7.1 of [12]. However, we omit the details for the sake of brevity. ✷ With this lemma in hand we can estimate |I ε α,g (t, z, z ′ )|.
The integral I 1 will correspond to the first part of the right hand side of the estimate in the lemma and I 2 to the second part. We begin estimating I 1 . By Lemma 8 and (A5), for a fixed t, we compute, |y − ∆x| |α| |y + ∆x| |α| e −δ(|y−∆x| 2 +|y+∆x| 2 )/ε dy .
Now, using the same argument as for the K = 0 case, we have β12≤α and consequently, Using the fact that |I 1 | will be bounded by the minimum of the K = 0 and K > 0 estimates, we have Noting that for positive a, b, and c, This shows the I 1 contribution to the estimate (34). It remains to show the smallness of I 2 . Indeed, since either |y + ∆x| > µ or |y − ∆x| > µ on Ω(t, η) \ Ω(t, µ), we get in the same way as for I 1 in the K = 0 case, for any s > 0 with C s independent of t. This concludes the proof of the lemma. ✷

Proof of Theorem 3
We now have all of the ingredients to complete the proof of Theorem 3. We fix t ∈ [0, T ] and start with the estimate, derived in the beginning of Section 4.1 and we turn our attention to estimating the integral of |I ε α,g (t, z, z ′ )|. We will use the shorthand notation I ε D (t, z, z ′ ) ≡ χ D (z, z ′ )I ε α,g (t, z, z ′ ), where χ D (z, z ′ ) is the characteristic function on a domain D ⊆ K 0 × K 0 . We will consider two disjoint subsets of K 0 × K 0 given by where θ is the small parameter θ in Lemma 8. Note that D 1 ∪ D 2 = K 0 × K 0 and D 1 ∩ D 2 = ∅. Thus, The set D 1 corresponds to the non-caustic region of the solution. There, we estimate I ε D1 (t, z, z ′ ) by taking K = 0 and s = n + |α| in Lemma 10.
The set D 2 corresponds to the region near caustics of the solution. On D 2 , we estimate I ε D2 (t, z, z ′ ) using Lemma 10 again where we pick µ small enough to allow us to use also Lemma 8. Letting R = sup z,z ′ ∈K0 |z − z ′ | < ∞ be the diameter of K 0 and s = n + |α|, we compute if we take K = n + 1. Since all of the constants are independent of the fixed t ∈ [0, T ], by putting all of these estimate together we obtain for all t ∈ [0, T ], which proves the theorem.

Numerical study of convergence
In this section, we perform numerical convergence analyses to study the sharpness of the theoretical estimates in this paper for the constant coefficient wave equation with sound speed c(y) = 1, for which H(t, x, p) = ±|p|. The ODEs that define the Gaussian beams are solved numerically using an explicit Runge-Kutta (4, 5) method (MATLAB's ode45). We use the fast Fourier transform to obtain the "exact solution" and use it to determine the error in the Gaussian beam solution. When the norms require it, we compute derivatives via analytical forms rather than numerical differentiation.

Single Gaussian Beams
First, we study the convergence rate for a single Gaussian beam to show the sharpness of the estimate proved in [30]. For 2D, this estimate states that for a single Gaussian beam v k (t, y), Using the well-posedness estimate for the wave equation, we obtain the rescaled energy norm estimate for t ∈ [0, T ], where u is the exact solution to the wave equation. Taking the initial conditions for u to be the same as the 1-st, 2-nd and 3-rd order Gaussian beams at t = 0 (modulo the cutoff function), we obtain the asymptotic error estimate, To test the sharpness of this estimate, we investigate the numerical convergence as follows. We let the Gaussian beam parameters for the 1-st order Gaussian beam be given by and pick H = −|p|. The additional coefficients necessary for the Taylor polynomials of phase and amplitudes for 2-nd and 3-rd order Gaussian beams are all initially taken to be 0. Note that even though the phase and amplitudes for 1-st, 2-nd and 3-rd order beams are the same at t = 0, their time derivatives will not be, as the ODEs for the higher order coefficients are inhomogeneous. Thus, since the initial data for the exact solution, u, matches the initial data for the Gaussian beam, u depends on the order of the beam. With this choice of parameters, we generate the 1-st, 2-nd and 3-rd order Gaussian beam solution at t = {0.5, 1}. For 2-nd and 3-rd order beams we additionally use a cutoff function with η = 1/10. At each t, we compute the rescaled energy norm of the difference v k − u. The asymptotic convergence as ε → 0 is shown in Figure 1. We draw the attention of the reader to the following features of the plots in Figure 1: 1. For 1-st order beam: The error decays as ε 1 .
2. For 2-nd and 3-rd order beams: The error for larger ε is dominated by the error induced by the cutoff function. In this region, the error decays exponentially fast. As ε gets smaller, the error decays as ε 3/2 for the 2-nd order beam and ε 2 for the 3-rd order beam.
The numerical results agree with the estimate given in [30] and, thus, the estimate for single Gaussian beams is sharp.

Cusp Caustic
We consider an example in 2D that develops a cusp caustic. The initial data for u at t = 0 is given by u(0, y) = e −10|y| 2 e i(−y1+y 2 2 )/ε .
We draw the attention of the reader to the following features in the plots: 1. At all t: (a) The asymptotic behavior of 1-st and 2-nd order solutions is the same and of order O(ε 1 ).
(b) The asymptotic behavior of 3-rd order solution is of order O(ε 2 ).
(c) For large value of ε the error for 2-nd and 3-rd order solutions is dominated by the error induced by the cutoff function. In this region the error decays exponentially. (d) For midrange values of ε, 2-nd order solutions experience a fortuitous error cancellation. This is due to the cutoff function. Changing the cutoff radius η shifts this region.
2. At t = 0.5 and t > 0.5, the asymptotic behavior of the error is unaffected by the cusp and fold caustics, respectively.
For this example we note that the convergence rate for odd beams k ∈ {1, 3} is in fact half an order better than our theoretical estimates. This improvement was also observed and proved for a simplified setting in [27], where the analysis shows that the gain is due to error cancellations between adjacent beams; it is therefore not present for single Gaussian beams. The result in [27] is for the Helmholtz equation and only concerns the pointwise error away from caustics. Here the numerical results indicate that the same improvement appears for the wave equation in the energy norm even when caustics are present. We conjecture that this gain of convergence is due to error cancellations as well and that it is present for all Gaussian beam superpositions with beams of odd order k. Moreover, we conjecture that the theoretical result is sharp for beams of even order k, giving us an optimal error estimate in the energy norm of O(ε ⌈k/2⌉ ) for all k.

Concluding Remarks
Gaussian beams are asymptotically valid high frequency solutions to strictly hyperbolic PDEs concentrated on a single curve through the physical domain. They can also be constructed for the Schrödinger equation. Superpositions of Gaussian beams provide a powerful tool to generate more general high frequency solutions. In this work, we establish error estimates of the Gaussian beam superposition for all strictly hyperbolic PDEs and the Schrödinger equation. Our study gives the surprising conclusion that even if the superposition is done over physical space, the error is still independent of the number of dimension and of the presence of caustics. Thus, we improve upon earlier results by Liu and Ralston [22,23].