A Priori Generalization Error Analysis of Two-Layer Neural Networks for Solving High Dimensional Schrödinger Eigenvalue Problems

This paper analyzes the generalization error of two-layer neural networks for computing the ground state of the Schrödinger operator on a $d$-dimensional hypercube. We prove that the convergence rate of the generalization error is independent of the dimension $d$, under the a priori assumption that the ground state lies in a spectral Barron space. We verify this assumption by proving a new regularity estimate for the ground state in the spectral Barron space. The latter is achieved by a fixed point argument based on the Krein-Rutman theorem.


Introduction
High dimensional partial differential equations (PDEs) arise ubiquitously in scientific and engineering problems that involve many degrees of freedom; examples include many-body quantum mechanics, the phase space description of chemical dynamics, learning and control of complex systems, and spectral methods for high dimensional data, just to name a few. While numerical methods for PDEs in low dimensions are quite standard, the numerical solution of high dimensional PDEs has remained an outstanding challenge due to the well-known curse of dimensionality: the computational cost can grow exponentially as the dimension increases. Perhaps the most celebrated and important example of such a challenge is determining the ground state of many-body quantum systems, which amounts to solving eigenvalue problems for high dimensional PDEs.
In recent years, neural networks have shown great success in representing high-dimensional classifiers and probability distributions in a variety of machine learning tasks, and have led to the tremendous success and popularity of deep learning [32,38]. Motivated by these recent successes, researchers have been actively exploring deep learning techniques for solving high dimensional PDEs [10,18,22,23,29,37,45,48] by using neural networks to parameterize the unknown solution. Thanks to the flexibility of neural network approximations, such methods have achieved remarkable results for various kinds of PDE problems, including eigenvalue problems for many-body quantum systems (see e.g. [7,9,12,19,24-26,36]), where the high dimensional wave functions are parameterized by neural networks with specific architecture designs that address the symmetry properties of many-body quantum systems.
Despite the wide popularity and many successful applications of neural network ansatzes for solving PDEs, their theoretical analysis is still sparse. In [40,41], the authors obtained convergence error estimates for PINNs based on both strong and variational formulations in the context of solving linear elliptic and parabolic PDEs. In [35], the authors proposed a general framework for studying a posteriori-type generalization error estimates for PINNs. It is worth noting that in the aforementioned works, the error estimates were proved under the assumption that the solution belongs to Sobolev or Hölder spaces, and hence those estimates suffer from the curse of dimensionality. In [27,34,47], the authors established dimension-explicit a priori estimates for the generalization error of two-layer neural networks for solving elliptic PDEs, assuming (but without verifying) that the solutions of the PDEs lie in certain Barron spaces. In our recent work [33], we proved a dimension-independent convergence rate of the generalization error bound for the deep Ritz method [18] for solving elliptic PDEs when the solutions lie in a spectral Barron space, and, more importantly, we also established a new regularity theory for the PDEs in the spectral Barron space.
Nonetheless, to the best of our knowledge, a numerical analysis of neural network methods for high dimensional eigenvalue problems has not yet been established. The goal of this paper is to provide an a priori generalization analysis of variational methods for computing the ground state of the Schrödinger operator in high dimension based on the two-layer neural network ansatz.
Our generalization error analysis largely follows the framework established in our previous work [33], where the a priori generalization error of the deep Ritz method for solving elliptic equations was analyzed. In particular, to establish approximation results that do not deteriorate as the dimension increases, we work in a spectral Barron space, first introduced in the seminal work of Barron [5]. It has been shown in [5,14,30,33,43] that spectral Barron functions have "lower complexity" than the more familiar regularity-based classes such as Sobolev or Hölder functions, in the sense that the former can be efficiently approximated by two-layer neural networks without the curse of dimensionality. It is also worth mentioning that another notion of Barron space, based on an integral representation, is defined in [15], for which a similar neural network approximation result holds. Discussions on the relationship between the two notions of Barron spaces and their properties can be found in [8] and [16].
On the other hand, since the Barron space is rather different from Sobolev or Hölder spaces, the main challenge is to establish a regularity theory for high dimensional PDEs in such a space. Our previous work [33] established the appropriate solution theory for elliptic equations. The key contribution of the present work is to extend this solution theory to Schrödinger eigenvalue problems in high dimension. Since we work in the spectral Barron space, which is a general Banach space without an inner product structure, the Lax-Milgram and Courant-Fischer theorems are not applicable, and thus we must rely on fixed point arguments to establish the existence of solutions. In our previous work on elliptic PDEs [33], the Fredholm alternative was used; in this work, to establish the existence of nontrivial eigenfunctions, we rely on the Krein-Rutman theorem [31]. We also remark that similar regularity estimates in an alternative Barron space [15] were obtained for some nonlinear PDEs [17] and linear elliptic PDEs [11].
The remainder of this paper is organized as follows. In Section 2 we first set up the ground state problem of the Schrödinger operator and present the main generalization results (see Theorems 2.3 and 2.4) as well as the new regularity estimate for the ground state in the spectral Barron space (see Theorem 2.5). In Section 3 we prove a key stability estimate for the ground state, which allows us to bound the $H^1$-error between the ground state and its approximation in terms of the energy excess. We present the proof of the main generalization result in Section 4 and the proof of the new regularity estimate for the ground state in Section 5. Throughout the paper we make the following minimal assumption on the potential function.

Assumption 2.1.
There exist finite positive constants $V_{\min}$ and $V_{\max}$ such that $V_{\min} \le V(x) \le V_{\max}$ for every $x \in \Omega$.
Note that we assume without loss of generality that $V_{\min}$ is positive, since one can always add a constant to $V$ without changing the eigenfunctions.
For certain results to hold, we may also need to make the following additional spectral assumption on the Schrödinger operator $\mathcal{H}$.
Notice that in our setting, where the Schrödinger equation is posed on a compact domain, $\mathcal{H}$ always has a discrete spectrum thanks to the well-known Hilbert-Schmidt theorem and the fact that $(-\Delta + V)^{-1}$ is a compact self-adjoint operator on $L^2$ (assuming the validity of Assumption 2.1). Moreover, the eigenvectors of $\mathcal{H}$ form an orthonormal basis of $L^2$. As for the spectral gap, it is well known that the Schrödinger operator with Dirichlet boundary conditions has a positive spectral gap under certain convexity assumptions on the domain and the potential; see e.g. [2,44]. However, less is known about the spectral gap of the Neumann Schrödinger operator; some limited results were obtained in [3,28] in the one-dimensional setting when the potential is a single well.
To avoid confusion with subscripts in the notation, let us denote the ground state eigenpair, on which our study focuses, by $(\lambda^*, u^*) := (\lambda_0, u_0)$.
The natural idea is to seek an approximate solution to Problem (2.1) within some hypothesis class $\mathcal{F} \subset H^1(\Omega)$ parameterized by neural networks. In practice, the Monte Carlo method is employed to compute the high dimensional integrals defined by the inner products in (2.1), which leads to an empirical loss (or risk) minimization problem. More concretely, let us denote by $P_\Omega$ the uniform probability distribution on the domain $\Omega$. Then the population loss $\mathcal{R}$ can be written as
$$\mathcal{R}(u) = \frac{\mathbb{E}_{X\sim P_\Omega}\big[|\nabla u(X)|^2 + V(X)\,|u(X)|^2\big]}{\mathbb{E}_{X\sim P_\Omega}\big[|u(X)|^2\big]}.$$
Let $\{X_i\}_{i=1}^n$ be a sequence of random variables that are independent and identically distributed (i.i.d.) according to $P_\Omega$. The population loss is approximated by the following empirical loss
$$\mathcal{R}_n(u) = \frac{\mathcal{E}_{n,1}(u)}{\mathcal{E}_{n,2}(u)},$$
where $\mathcal{E}_{n,1}$ and $\mathcal{E}_{n,2}$ are defined by
$$\mathcal{E}_{n,1}(u) = \frac{1}{n}\sum_{i=1}^n \big(|\nabla u(X_i)|^2 + V(X_i)\,|u(X_i)|^2\big), \qquad \mathcal{E}_{n,2}(u) = \frac{1}{n}\sum_{i=1}^n |u(X_i)|^2.$$
Note that we have used the fact that $|\Omega| = 1$ in deriving the Monte Carlo approximation above. Let $u_n$ be a minimizer of $\mathcal{R}_n$ within $\mathcal{F}$, i.e., $u_n = \arg\min_{u\in\mathcal{F}} \mathcal{R}_n(u)$. Again, since $\mathcal{R}_n(u)$ is scaling-invariant, we may assume that $\|u_n\|_{L^2} = 1$. Our goal is to obtain quantitative estimates for the error between $u_n$ and $u^*$; following the statistical learning literature, we call this error the generalization error. We quantify the error between $u_n$ and $u^*$ in terms of two criteria. The first is the energy excess $\mathcal{R}(u_n) - \mathcal{R}(u^*)$, which quantifies the approximation of $\mathcal{R}(u_n)$ to the leading eigenvalue $\lambda^* = \mathcal{R}(u^*)$.
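To illustrate the empirical loss, the following minimal sketch (our own, not the authors' code) evaluates a Monte Carlo Rayleigh quotient of the form reconstructed above for a two-layer network with a rescaled Softplus activation on uniform samples from $\Omega = [0,1]^d$; the parameterization, the potential $V$, and all constants are illustrative placeholders.

```python
import numpy as np

def sp_tau(z, tau=0.1):
    """Rescaled Softplus (assumed form tau * log(1 + exp(z / tau)))."""
    return tau * np.logaddexp(0.0, z / tau)

def sp_tau_prime(z, tau=0.1):
    """Derivative of the rescaled Softplus: a sigmoid in z / tau (stable via tanh)."""
    return 0.5 * (1.0 + np.tanh(z / (2.0 * tau)))

def two_layer(x, a, W, b, c, tau=0.1):
    """Two-layer network u(x) = sum_i a_i SP_tau(w_i . x + b_i) + c and its gradient in x.
    x: (n, d) sample points, W: (m, d), a, b: (m,), c: scalar."""
    z = x @ W.T + b                              # (n, m) pre-activations
    u = sp_tau(z, tau) @ a + c                   # (n,) network values
    grad_u = (sp_tau_prime(z, tau) * a) @ W      # (n, d) gradients via the chain rule
    return u, grad_u

def empirical_rayleigh_quotient(x, V, a, W, b, c, tau=0.1):
    """Monte Carlo estimate of E[|grad u|^2 + V u^2] / E[u^2] over uniform samples x."""
    u, grad_u = two_layer(x, a, W, b, c, tau)
    numerator = np.mean(np.sum(grad_u**2, axis=1) + V(x) * u**2)
    denominator = np.mean(u**2)
    return numerator / denominator

# Usage with an illustrative bounded potential V(x) = 1 + |x|^2 on [0,1]^d.
rng = np.random.default_rng(0)
d, m, n = 10, 64, 4096
x = rng.random((n, d))                           # i.i.d. uniform samples from P_Omega
a = rng.normal(size=m) / m
W = rng.normal(size=(m, d)) / np.sqrt(d)
b = rng.normal(size=m)
V = lambda pts: 1.0 + np.sum(pts**2, axis=1)
print(empirical_rayleigh_quotient(x, V, a, W, b, c=0.0))
```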
To introduce the second quantity for measuring the error, we define the projection operator $\mathcal{P}$ onto the ground-state subspace by setting $\mathcal{P}u = \langle u, u^* \rangle u^*$.
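For later use (and assuming the normalization $\|u^*\|_{L^2(\Omega)} = 1$, so that $\mathcal{P}$ is indeed an orthogonal projection), this induces the decomposition employed in Section 3:
$$u = \mathcal{P}u + u^{\perp}, \qquad u^{\perp} := u - \langle u, u^*\rangle\, u^*, \qquad \langle u^{\perp}, u^*\rangle = 0,$$
so that the second error criterion measures the size of the offset $u^{\perp}$ orthogonal to the ground state.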

Main results.
We aim to establish quantitative generalization error estimates between the approximate ground state $u_n$ parameterized by neural networks and the exact ground state $u^*$. Our particular interest is to show that, under certain circumstances, the generalization error of the neural network solution does not suffer from the curse of dimensionality. To this end, we assume (and prove below) that the exact ground state $u^*$ lies in a function space smaller than the usual Sobolev spaces, within which functions can be approximated by neural networks without the curse of dimensionality. Specifically, we consider the spectral Barron space [33] defined as follows.
Recall the domain $\Omega = [0,1]^d$. Let us first define the set of cosine functions $\{\Phi_k\}_{k\in\mathbb{N}_0^d}$ with $\Phi_k(x) = \prod_{i=1}^d \cos(\pi k_i x_i)$. Let $\{\hat{u}(k)\}_{k\in\mathbb{N}_0^d}$ be the expansion coefficients of a function $u \in L^1(\Omega)$ under the basis $\{\Phi_k\}_{k\in\mathbb{N}_0^d}$. For $s \ge 0$, the spectral Barron space $\mathcal{B}^s(\Omega)$ on $\Omega$ is defined by
$$\mathcal{B}^s(\Omega) := \Big\{ u \in L^1(\Omega) : \sum_{k\in\mathbb{N}_0^d} \big(1 + \pi^s |k|_1^s\big)\,|\hat{u}(k)| < \infty \Big\},$$
which is equipped with the spectral Barron norm
$$\|u\|_{\mathcal{B}^s(\Omega)} := \sum_{k\in\mathbb{N}_0^d} \big(1 + \pi^s |k|_1^s\big)\,|\hat{u}(k)|.$$
Note that we use $|k|_1$ to denote the $\ell^1$-norm of a vector $k$. It is clear that $\mathcal{B}^s(\Omega)$ is a Banach space, since it can be viewed as a weighted $\ell^1$ space $\ell^1_{W_s}(\mathbb{N}_0^d)$ of the cosine coefficients defined on the lattice $\mathbb{N}_0^d$ with the weight $W_s(k) = 1 + \pi^s |k|_1^s$. Moreover, since functions in $\mathcal{B}^s(\Omega)$ have summable cosine coefficients, we have $\mathcal{B}^s(\Omega) \hookrightarrow C(\Omega)$. When $s = 2$, we adopt the short notation $\mathcal{B}(\Omega)$ for $\mathcal{B}^2(\Omega)$. Our notion of spectral Barron space is an adaptation of the Barron space defined in the seminal work [5]; see also the recent works [4,15,30,42] on other variants of Barron spaces. The original Barron function $f$ in [5] is defined on the whole space $\mathbb{R}^d$ and its Fourier transform $\hat{f}(\omega)$ satisfies $\int |\hat{f}(\omega)|\,|\omega|\,d\omega < \infty$. Our spectral Barron space $\mathcal{B}^s(\Omega)$ with $s = 1$, defined on the bounded domain $\Omega$, can be viewed as a finite-domain analog of the original Barron space from [5].
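As a small illustration (our own, using the weight $1 + \pi^s |k|_1^s$ from the definition above), the spectral Barron norm of a function with finitely many nonzero cosine coefficients can be evaluated directly from the definition:

```python
import numpy as np

def spectral_barron_norm(u_hat, s=2):
    """Spectral Barron norm sum_k (1 + pi^s |k|_1^s) |u_hat(k)| for a function given
    by (a truncation of) its cosine coefficients; u_hat maps multi-indices k in N_0^d
    (tuples of nonnegative integers) to coefficient values."""
    return sum((1.0 + np.pi**s * sum(k)**s) * abs(coeff) for k, coeff in u_hat.items())

# Usage: u(x) = 1 + cos(pi x_1) cos(pi x_2) on [0,1]^2 has coefficients at k = (0,0), (1,1).
u_hat = {(0, 0): 1.0, (1, 1): 1.0}
print(spectral_barron_norm(u_hat, s=2))   # equals (1 + 0) + (1 + pi^2 * 2^2)
```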
Remark 2.1. Notice that since the spectral Barron space embeds continuously into the space of bounded continuous functions, the interaction potential $V$ defined by (2.5) in the example above unfortunately excludes some important physical potentials that are singular, such as the Coulomb potential. Note also that, in practical applications to many-body electron systems, when the Coulomb potential is involved, specific ansatzes incorporating cusp conditions are often used for the wave function. In such practical settings, the effective potential does not contain singularities and the theory developed here still applies. Nonetheless, how to establish analogous Barron regularity for the Schrödinger equation with singular potentials remains an interesting open question. We leave such cases for future studies.
Functions in the spectral Barron space differ substantially from those in Sobolev or Hölder spaces; most importantly, they can be approximated with respect to the $H^1$-norm by two-layer neural networks without the curse of dimensionality. To make this precise, we recall an approximation result from [33]. Let us define, for an activation function $\sigma$, a constant $B > 0$ and a number of hidden neurons $m$, the set of functions $\mathcal{F}_{\sigma,m}(B)$ given in (2.6). Following our earlier work [33], we focus on the rescaled Softplus activation function $\mathrm{SP}_\tau$, where $\tau > 0$ is a rescaling parameter. Observe that $\mathrm{SP}_\tau \to \mathrm{ReLU}$ pointwise as $\tau \to 0$ (see [33, Lemma 4.6]). Let $\mathcal{F}_{\mathrm{SP}_\tau, m}(B)$ be the set of neural networks obtained by setting $\sigma = \mathrm{SP}_\tau$ in (2.6). Lemma 2.2 shows that functions in $\mathcal{B}(\Omega)$ can be well approximated by functions in $\mathcal{F}_{\mathrm{SP}_\tau, m}(B)$ without the curse of dimensionality.
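If one adopts the common parameterization $\mathrm{SP}_\tau(z) = \tau \ln(1 + e^{z/\tau})$ (an assumption on our part; the paper's exact definition is the one from [33]), the convergence to ReLU is in fact uniform, since for every $z \in \mathbb{R}$
$$0 \;\le\; \mathrm{SP}_\tau(z) - \mathrm{ReLU}(z) \;=\; \tau \ln\!\big(1 + e^{-|z|/\tau}\big) \;\le\; \tau \ln 2 \;\to\; 0 \quad \text{as } \tau \to 0.$$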
The proof of the approximation bound (2.7) relies on first establishing a similar bound for two-layer networks with the ReLU activation function and then replacing ReLU by $\mathrm{SP}_\tau$; the latter step introduces the factor $\ln m$ on the right-hand side of (2.7). We remark that the bound (2.7) holds not only for two-layer networks with the ReLU or $\mathrm{SP}_\tau$ activation, but remains valid for other activation functions. In fact, the recent works [27,43] have shown that the convergence rate $O(m^{-1/2})$ of Lemma 2.2 can be improved to $O(m^{-\frac{1}{2}-\delta})$ if $\mathrm{ReLU}^k$ is used as the activation function, where $\delta > 0$ depends on $k$ and $d$. Since the focus of the present paper is not to achieve the sharpest convergence estimate, we are content with Lemma 2.2, as it suffices to obtain a dimension-independent rate for the generalization error.
With the above approximation result in Lemma 2.2 at hand, we are ready to state the main generalization theorem as follows.

$\sqrt{m}$,
where $C_1$ depends polynomially on $\|u^*\|_{\mathcal{B}(\Omega)}$, $d$ and $V_{\max}$, and $C_2$ depends linearly on $\|u^*\|_{\mathcal{B}(\Omega)}$. In particular, with a choice of parameters that balances the two error terms, there exists $C_3 > 0$ such that the resulting bound holds with probability at least $1 - 3\delta$. The proof of Theorem 2.3 relies on decomposing the generalization error into the sum of the approximation error (the second term on the right-hand side of (2.9)) and the statistical error (the first term on the right-hand side of (2.9)) arising from the Monte Carlo approximation. The statistical error is further bounded by controlling the Rademacher complexity of certain neural network classes associated with the loss formulation (see Section 4.3). It is worth commenting that the numerator of the statistical error scales like $\sqrt{m}$ (up to a logarithmic factor) for $m \gg 1$, which appears worse than the bound proved in [15], where the statistical error (or the Rademacher complexity) scales like $O(\|u\|_{\mathcal{B}}\, n^{-1/2})$. This is mainly because [15] only considers the Rademacher complexity of Barron functions with finite Barron norm, while we need to bound the Rademacher complexities of several neural network classes with $m$-dependent network parameters.
Let us also comment on the largeness assumption on $m$ and $n$. In fact, by tracking the proof of Theorem 2.3, the condition (2.8) holds as long as $m$ and $n$ are larger than a quantity depending only on $\|u^*\|_{\mathcal{B}(\Omega)}$. Therefore $m$ and $n$ need not grow exponentially in $d$ as long as $\|u^*\|_{\mathcal{B}(\Omega)}$ does not grow exponentially in $d$.
Thanks to Proposition 2.1, the generalization error in terms of the energy excess translates directly into a bound on the $H^1$-error. With the same parameter choice as above, the resulting bound holds for some $C_6 > 0$ with probability at least $1 - 3\delta$. Theorems 2.3 and 2.4 show that, with high probability, the convergence rate of the generalization error of a two-layer network for approximating the ground state $u^*$ and the corresponding leading eigenvalue $\lambda^* = \mathcal{R}(u^*)$ does not suffer from the curse of dimensionality, provided that the ground state $u^* \in \mathcal{B}(\Omega)$.
Finally, we justify the regularity assumption on the ground state in Theorem 2.5. This gives a novel solution theory for high dimensional eigenvalue problems in Barron-type spaces. Theorem 2.5. Assume that $V \in \mathcal{B}^s(\Omega)$ with $s \ge 0$ and that $V$ satisfies Assumption 2.1. Then the ground state $u^* \in \mathcal{B}^{s+2}(\Omega)$.
Our approach to proving Theorem 2.5 differs from the standard proof of Sobolev regularity of eigenfunctions, which usually relies on bootstrapping estimates on the weak derivatives of the eigenfunctions. Instead, we prove Theorem 2.5 by reformulating the ground state problem as a fixed point problem on the spectral Barron space $\mathcal{B}^s(\Omega)$; the existence of a nontrivial fixed point is proved by employing the celebrated Krein-Rutman theorem [31]. See Section 5 for the complete proof.

3. Stability estimate of the ground state (Proof of Proposition 2.1)
In this section, we show that the offset $\|u^\perp\|_{L^2(\Omega)}$ of any $u \in H^1(\Omega)$ can be bounded by the energy excess $\mathcal{R}(u) - \mathcal{R}(u^*)$.
To obtain the bound on $\nabla u^\perp$, we notice that the desired estimate follows after rearranging the terms.

Here $u_{\mathcal{F}} = \arg\min_{u\in\mathcal{F}} \mathcal{R}(u)$. Note that $\mathcal{R}_n(u_n) - \mathcal{R}_n(u_{\mathcal{F}}) \le 0$ since $u_n$ is the minimizer of $\mathcal{R}_n$. Therefore
$$\mathcal{R}(u_n) - \mathcal{R}(u^*) \;\le\; \underbrace{\big[\mathcal{R}(u_n) - \mathcal{R}_n(u_n)\big]}_{=:E_1} \;+\; \underbrace{\big[\mathcal{R}_n(u_{\mathcal{F}}) - \mathcal{R}(u_{\mathcal{F}})\big]}_{=:E_2} \;+\; \underbrace{\big[\mathcal{R}(u_{\mathcal{F}}) - \mathcal{R}(u^*)\big]}_{=:E_3}.$$
Note that the first term $E_1$ is the statistical error arising from the random approximation of integrals, the second term $E_2$ is the Monte Carlo error, and the third term $E_3$ is the approximation error due to restricting the minimization from the set $H^1(\Omega)$ to $\mathcal{F}$; see an upper bound for $E_3$ in Theorem 4.5, where $\mathcal{F}$ is chosen as a set of two-layer neural networks. To control the statistical errors, we employ the well-known tool of Rademacher complexity, whose definition we recall in Definition 4.1 below, where the expectation $\mathbb{E}_\sigma$ is taken with respect to an independent uniform Bernoulli (Rademacher) sequence $\{\sigma_i\}_{i=1}^n$.
Proof. The proof follows directly from [33, Lemma 5.5] and [33, Lemma 5.7] by slightly adjusting the constants. We thus omit the details. □

Proof of Theorem 4.6. First, from the definition of $\mathcal{F}_{\mathrm{SP}_\tau, m}(B)$ and the fact established in [33, Lemma 4.6], one has a uniform bound on the functions in this class. Applying Theorem 4.7 (with the relevant parameter set to $0$), we then obtain the desired estimate from Lemma 4.8.

In this section we aim to prove the regularity of the ground state $u^*$ in the spectral Barron space, as stated in Theorem 2.5. Since our proof relies heavily on the spectral theory of positive linear operators on ordered Banach spaces (especially the Krein-Rutman theorem), we first recall some relevant terminology and useful facts from linear functional analysis. Recall the Beurling-Gelfand formula $r_X(T) = \lim_{n\to\infty} \|T^n\|_X^{1/n}$. We also note that if $T: X \to X$ is compact, then $\sigma_X(T) \setminus \{0\} = \sigma_{X,p}(T) \setminus \{0\}$, i.e., every nonzero spectral value is an eigenvalue.
Lemma 5.1. Let $T: H \to H$ be a compact linear operator on a Hilbert space $H$ equipped with an inner product $(\cdot,\cdot)_H$ and the associated norm $\|\cdot\|_H$. Let $X \subsetneq H$ be a dense subspace of $H$. Assume that $X$ is a Banach space equipped with the norm $\|\cdot\|_X$ and that $T: X \to X$ is also compact. Then $r_X(T) = r_H(T)$.
Next we show that $r_X(T) = r_H(T)$. Assume, to the contrary, that $r_X(T) > r_H(T)$. By the definition of $r_X(T)$ and the assumption that $T: X \to X$ is a compact operator, there exists an eigenvalue $\mu \in \sigma_{X,p}(T)$ such that $r_X(T) \ge |\mu| > r_H(T)$. This implies that $\mathrm{Ker}_H(T - \mu) := \{u \in H : (T - \mu)u = 0\} = \{0\}$. Since $X$ is a (dense) subset of $H$, it follows that $\mathrm{Ker}_X(T - \mu) \subset \mathrm{Ker}_H(T - \mu) = \{0\}$. This contradicts the fact that $\mu$ is an eigenvalue of $T$ on $X$ and completes the proof of the lemma. □

5.2. Krein-Rutman theorem and the leading eigenvalue. In this section, we recall the famous Krein-Rutman theorem [31] on the leading eigenvalue and eigenfunction of positive operators on ordered Banach spaces. To this end, let us first recall some terminology on ordered Banach spaces. Given a Banach space $X$, a closed convex subset $P \subset X$ is called a cone in $X$ if $\lambda P \subset P$ for every $\lambda > 0$ and $P \cap (-P) = \{0\}$. A cone $P$ induces a natural partial ordering $\le$ on the Banach space $X$: $u \le v$ if and only if $v - u \in P$. A Banach space $X$ with a cone $P$ is called an ordered Banach space, denoted by $(X, P)$. If the cone $P$ satisfies $P - P = X$, then $P$ is called a total cone. We define $\dot{P} := P \setminus \{0\}$ and denote by $P^\circ$ the interior of $P$. If $P$ has nonempty interior $P^\circ$, then $P$ is called a solid cone. It is not hard to see that a solid cone is total.
Example 5.1. Consider the Banach space $C(\Omega)$ of continuous functions on a bounded domain $\Omega \subset \mathbb{R}^d$. The space $C(\Omega)$ is an ordered Banach space with the cone $C_+(\Omega)$ consisting of nonnegative functions in $C(\Omega)$. This cone is solid, since any strictly positive function is an interior point.

As an important consequence of the Krein-Rutman theorem, Theorem 5.3 establishes the simplicity of the leading eigenvalue of a strongly positive compact operator on an ordered Banach space.

Theorem 5.3 ([1, Theorem 3.2]). Let $X$ be an ordered Banach space with a solid cone $P$. Let $T: X \to X$ be a strongly positive compact operator. Then (i) the spectral radius $r(T) > 0$; (ii) $r(T)$ is a simple eigenvalue with an eigenvector $v \in P^\circ$, and there is no other eigenvalue with a positive eigenvector.

Recall the spectral Barron space $\mathcal{B}^s(\Omega)$ defined in (2.4). We also recall from [33] the next important lemma, which shows that the operator $\mathcal{T}: \mathcal{B}^s(\Omega) \to \mathcal{B}^{s+2}(\Omega)$ is bounded.
The proof of the Barron-space estimates in Lemma 5.4 differs from the standard proofs of well-posedness and regularity estimates in Sobolev spaces. In fact, the lack of a Hilbert structure in the spectral Barron space prevents us from using the Lax-Milgram theory to obtain existence and uniqueness. Instead, we rewrite the stationary Schrödinger equation with a source term as an equivalent Fredholm integral equation of the second kind for the cosine coefficients of the solution in the weighted space $\ell^1_{W_s}(\mathbb{N}_0^d)$. Thanks to the celebrated Fredholm alternative, the existence and stability estimate of the solution then follow from uniqueness, where the latter holds as a result of the standard energy estimate. A detailed proof of Lemma 5.4 can be found in [33, Appendix D.2].
Moreover, as a result of the semigroup property, the solution operator $\mathcal{T}$ admits an integral representation in terms of the semigroup $e^{-t\mathcal{H}}$; owing to the upper bound in (5.1), the integral is finite. It follows from this identity and the lower bound in (5.1), after multiplying by $e^{-tV_{\max}}$ and integrating over $t \in [1,2]$, that $(\mathcal{T}f)(x)$ admits a positive lower bound in terms of the integral of $f$.

We have $u^* = \lambda^* \mathcal{T} u^*$. An application of Lemma 5.4 implies that $u^* \in \mathcal{B}^{s+2}(\Omega)$ if and only if $u^* \in \mathcal{B}^s(\Omega)$. To show the latter, let us consider the operator $\mathcal{T}$ on the ordered Banach space $\mathcal{B}^s(\Omega)$ with the solid cone $\mathcal{B}^s_+(\Omega)$. Observe that $\mathcal{T}: L^2(\Omega) \to L^2(\Omega)$ is compact and that, by Corollary 5.5, $\mathcal{T}: \mathcal{B}^s(\Omega) \to \mathcal{B}^s(\Omega)$ is also compact. Therefore, by Lemma 5.1, the spectral radii of $\mathcal{T}$ are identical when viewed as an operator on $\mathcal{B}^s(\Omega)$ and on $L^2(\Omega)$, respectively. It follows from Theorem 5.3 and the strong positivity of $\mathcal{T}$ established in Lemma 5.6 that there exists a unique (up to a multiplicative constant) eigenfunction $u^* \in \mathcal{B}^s(\Omega)$ corresponding to the spectral radius $r(\mathcal{T}) = 1/\lambda^*$. Moreover, $u^*$ is strictly positive on $\Omega$. This completes the proof. □
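The precise integral representation is not fully recoverable here; a natural candidate, consistent with the finiteness claim and with Assumption 2.1 ($V \ge V_{\min} > 0$), is the standard semigroup identity for a self-adjoint operator bounded below by a positive constant,
$$\mathcal{T} = \mathcal{H}^{-1} = \int_0^\infty e^{-t\mathcal{H}}\,dt, \qquad \|e^{-t\mathcal{H}}\|_{L^2\to L^2} \le e^{-t V_{\min}},$$
so that the integral converges absolutely; this is our hedged reading, not necessarily the exact display used in the original proof.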

2.1. Set-up of the problem.
Let $\Omega = [0,1]^d$ be the unit hypercube in $\mathbb{R}^d$ with boundary $\partial\Omega$. Consider the Neumann eigenvalue problem for the Schrödinger operator
$$\mathcal{H}u = -\Delta u + Vu = \lambda u \ \text{ in } \Omega, \qquad \partial_\nu u = 0 \ \text{ on } \partial\Omega,$$
where $\mathcal{H} := -\Delta + V$ is the Schrödinger operator with potential function $V$, equipped with the Neumann boundary condition. We are particularly interested in computing the ground state of $\mathcal{H}$, that is, the eigenfunction associated with the smallest eigenvalue of $\mathcal{H}$.

Definition 4.1.
For a set of random variables $\{X_i\}_{i=1}^n$ independently distributed according to $P_\Omega$ and a function class $\mathcal{G}$, we define the Rademacher complexity of $\mathcal{G}$ as the random variable
$$\mathrm{Rad}_n(\mathcal{G}) := \mathbb{E}_\sigma\Big[\sup_{g\in\mathcal{G}}\Big|\frac{1}{n}\sum_{i=1}^n \sigma_i\, g(X_i)\Big|\Big],$$
where the expectation $\mathbb{E}_\sigma$ is taken with respect to an independent uniform Bernoulli sequence $\{\sigma_i\}_{i=1}^n$ with $\sigma_i \in \{\pm 1\}$.
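To make the definition concrete, here is a small illustrative sketch (our own, not from the paper) that estimates the empirical Rademacher complexity of a finite collection of functions by averaging over random sign vectors; the finite function class and sample sizes are placeholders.

```python
import numpy as np

def empirical_rademacher(fun_values, num_sign_draws=1000, rng=None):
    """Monte Carlo estimate of Rad_n(G) = E_sigma[ sup_g |(1/n) sum_i sigma_i g(X_i)| ].
    fun_values: array of shape (num_functions, n) holding g(X_i) for each g in a
    finite (illustrative) class G, evaluated at fixed sample points X_1, ..., X_n."""
    rng = np.random.default_rng() if rng is None else rng
    num_funcs, n = fun_values.shape
    total = 0.0
    for _ in range(num_sign_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)     # uniform Bernoulli signs
        corr = np.abs(fun_values @ sigma) / n       # |(1/n) sum_i sigma_i g(X_i)| per g
        total += corr.max()                         # sup over the function class
    return total / num_sign_draws

# Usage: a toy class of linear functions g_w(x) = w . x with |w|_1 = 1,
# evaluated at n uniform samples on [0,1]^d.
rng = np.random.default_rng(0)
d, n = 5, 200
X = rng.random((n, d))
W = rng.dirichlet(np.ones(d), size=50) * rng.choice([-1, 1], size=(50, 1))
print(empirical_rademacher(W @ X.T, rng=rng))
```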
Theorem 2.4. Suppose that the assumptions of Theorem 2.3 hold, and suppose further that $\mathcal{H}$ has a spectral gap. Then there exist positive constants $C_4$ and $C_5$, depending polynomially on $\|u^*\|_{\mathcal{B}(\Omega)}$, $d$, $V_{\min}$, $V_{\max}$ and $\lambda_1 - \lambda_0$, such that the corresponding $H^1$ error bound holds with probability at least $1 - 3\delta$.

4. Proof of Theorem 2.3

4.1. Oracle inequality for the generalization error.
We first introduce an oracle inequality for the empirical loss that decomposes the generalization error into the sum of the approximation error and the statistical error. Recall the population loss $\mathcal{R}$ and the empirical loss $\mathcal{R}_n$ defined in (2.2) and (2.3), respectively. Consider the minimization of $\mathcal{R}_n$ over a function class $\mathcal{F}$, and denote by $u_n$ a minimizer of $\mathcal{R}_n$ within $\mathcal{F}$, i.e., $u_n = \arg\min_{u\in\mathcal{F}} \mathcal{R}_n(u)$. We aim to bound the energy excess $\Delta\mathcal{R}_n := \mathcal{R}(u_n) - \mathcal{R}(u^*)$, where $u^*$ is the exact ground state. Let us first decompose $\Delta\mathcal{R}_n$ as follows.
Consider two ordered Banach spaces $X$ and $Y$, with cones $P$ and $Q$ respectively. A linear operator $T: X \to Y$ is called positive if $T(P) \subset Q$, and strictly positive if $T(\dot{P}) \subset \dot{Q}$. If in addition $Q$ is solid, then $T$ is called strongly positive if $T(\dot{P}) \subset Q^\circ$.
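To illustrate these notions with an example of our own (not taken from the text): on $C(\Omega)$ with the solid cone $C_+(\Omega)$ from Example 5.1, an integral operator with a continuous kernel bounded below by a positive constant is strongly positive, since
$$(Tf)(x) = \int_\Omega K(x,y)\, f(y)\,dy \;\ge\; c \int_\Omega f(y)\,dy \;>\; 0 \qquad \text{for all } x \in \Omega,$$
whenever $K \ge c > 0$ and $f \in \dot{C}_+(\Omega)$, so that $Tf$ is strictly positive and hence lies in the interior of the cone.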