Adaptive wavelet methods for elliptic operator equations: Convergence rates

By Albert Cohen, Wolfgang Dahmen, and Ronald DeVore

Abstract

This paper is concerned with the construction and analysis of wavelet-based adaptive algorithms for the numerical solution of elliptic equations. These algorithms approximate the solution u of the equation by a linear combination of N wavelets. Therefore, a benchmark for their performance is provided by the rate of best approximation to u by an arbitrary linear combination of N wavelets (so-called N-term approximation), which would be obtained by keeping the N largest wavelet coefficients of the real solution (which of course is unknown). The main result of the paper is the construction of an adaptive scheme which produces an approximation to u with error O(N^{-s}) in the energy norm, whenever such a rate is possible by N-term approximation. The range of s for which this holds is only limited by the approximation properties of the wavelets together with their ability to compress the elliptic operator. Moreover, it is shown that the number of arithmetic operations needed to compute the approximate solution stays proportional to N. The adaptive algorithm applies to a wide class of elliptic problems and wavelet bases. The analysis in this paper puts forward new techniques for treating elliptic problems as well as the linear systems of equations that arise from the wavelet discretization.

1. Introduction

1.1. Background

Adaptive methods, such as adaptive finite element methods (FEM), are frequently used to numerically solve elliptic equations when the solution is known to have singularities. A typical algorithm uses information gained during a given stage of the computation to produce a new mesh for the next iteration. Thus, the adaptive procedure depends on the current numerical resolution of the solution. Accordingly, these methods produce a form of nonlinear approximation of the solution, in contrast with linear methods in which the numerical procedure is set in advance and does not depend on the solution to be resolved.

The motivation for adaptive methods is that they provide flexibility to use finer resolution near singularities of the solution and thereby improve the approximation efficiency. Since the startling papers Reference 2, Reference 3, the understanding and practical realization of adaptive refinement schemes in a finite element context has been documented in numerous publications Reference 3, Reference 4, Reference 5, Reference 13, Reference 36. A key ingredient in most adaptive algorithms is a-posteriori error estimators or indicators derived from the current residual or from the solution of local problems. They consist of local quantities such as jumps of derivatives across the interface between adjacent triangles or simplices. One often succeeds in bounding the (global) error of the current solution with respect to, say, the energy norm from below and above by sums of these quantities. Thus refining the mesh where these local quantities are large is hoped to reduce the bounds and hence the error in the next computation. Computational experience frequently confirms the success of such techniques for elliptic boundary value problems in the sense that adaptively generated, highly nonuniform meshes indeed give rise to an accuracy that would otherwise require the solution of much larger systems of equations based on uniform refinements. However, on a rigorous level the quantitative gain of adaptive techniques is usually not clear. The central question is whether the mesh refinements actually result, at each step, in some fixed error reduction. To our knowledge, only in Reference 35 has convergence of an adaptive scheme been established for a rather special case, namely a piecewise linear finite element discretization of the classical Dirichlet problem for Laplace’s equation. There is usually no rigorous proof of the overall convergence of such schemes unless one assumes some quantitative information about the unknown solution, such as the saturation property Reference 13. Saturation properties are assumed but not proven to hold.

Moreover, the derivation of error indicators in conventional discretizations hinges on the locality of differential operators. Additional difficulties are therefore encountered when considering elliptic operators with nonlocal Schwartz kernel arising, for instance, in connection with boundary integral equations.

In summary, there seem to be at least two reasons for this state of affairs:

(i) There is an inherent difficulty, even for local operators, in utilizing the information available at a given stage in the adaptive computation to guarantee that a suitable reduction will occur in the residual error during the next adaptive step.

(ii) Finite element analysis is traditionally based on Sobolev regularity (see e.g., Reference 14 or Reference 15), which is known to govern the performance of linear methods. Only recent developments in the understanding of nonlinear methods have revealed that Besov regularity is a decidedly different and more appropriate smoothness scale for the analysis of adaptive schemes, see e.g., Reference 31.

In view of the significant computational overhead and the severe complications caused by handling appropriate data structures for adaptive schemes, not only guaranteeing convergence but above all knowing its speed is of paramount importance for deciding whether or under which circumstances adaptive techniques actually pay off. To our knowledge nothing is known so far about the actual rate of convergence of adaptive FEM solvers, by which we mean the relation between the accuracy of the approximate solution and the involved degrees of freedom, or better yet the number of arithmetic operations.

1.2. Wavelet methods

An alternative to FEM is provided by wavelet-based methods. Similarly to mesh refinement in FEM, these methods offer the possibility to compress smooth functions with isolated singularities into high-order adaptive approximations involving a small number of basis functions. In addition, it has been recognized for some time Reference 11 that for a large class of operators (including integral operators) wavelet bases give rise to matrix representations that are quasi-sparse (see Sections 2 and 3 for a definition of quasi-sparse) and admit simple diagonal preconditioners in the case of elliptic operators. Therefore, it is natural to develop adaptive strategies based on wavelet discretizations in order to solve elliptic operator equations numerically.

Wavelet-based solvers are still in their infancy, and certain inherent impediments to their numerical use remain. These are mainly due to the difficulty of dealing with realistic domain geometries. Nevertheless, these solvers show great promise, especially for adaptive approximation (see e.g., Reference 1, Reference 12, Reference 16, Reference 18, Reference 25). Most adaptive strategies exploit the fact that wavelet coefficients convey detailed information on the local regularity of a function and thereby allow the detection of its singularities. The rule of thumb is that wherever wavelet coefficients of the currently computed solution are large in modulus, additional refinements are necessary. In some sense, this amounts to using the size of the computed coefficients as local a-posteriori error indicators. Note that here refinement has a somewhat different meaning than in the finite element setting. There the adapted spaces result from refining a mesh. The mesh is the primary controlling device and may create its own problems (of geometric nature) that have nothing to do with the underlying analytic task. In the wavelet context, refinement means adding suitably selected further basis functions to those that are used to approximate the current solution. We refer to this as space refinement.

In spite of promising numerical performance, the problem remains (as in the finite element context) of quantifying these strategies, that is, of deciding which and how many additional wavelets need to be added in a refinement step in order to guarantee a fixed error reduction rate at the next resolution step. An adaptive wavelet scheme based on a-posteriori error estimators has recently been developed in Reference 20, which ensures this fixed error reduction for a wide class of elliptic operators, including those of negative order. This shows that exploiting the characteristic features of wavelet expansions, such as the sparsification and preconditioning of elliptic operators, allows one to go beyond what is typically known in the conventional framework of adaptive FEM. However, as for FEM, there are so far no results about the rate of convergence of adaptive wavelet-based solvers, i.e., the dependence of the error on the number of degrees of freedom.

1.3. The objectives

The purpose of the present paper is twofold. First, we provide analytical tools that can be utilized in studying the theoretical performance of adaptive algorithms. Second, we show how these tools can be used to construct and analyze wavelet based adaptive algorithms which display optimal approximation and complexity properties in the sense that we describe below.

The adaptive methods we analyze in this paper take the following form. We assume that we have in hand a wavelet basis to be used for numerically resolving the elliptic equation. Our adaptive scheme will iteratively produce finite index sets Λ_j, j = 1, 2, …, and the Galerkin approximation u_{Λ_j} to the solution u from the space spanned by the wavelets indexed by Λ_j. The function u_{Λ_j} is a linear combination of N_j := #(Λ_j) wavelets. Thus the adaptive method can be viewed as a particular form of nonlinear N-term wavelet approximation, and a benchmark for the performance of such an adaptive method is provided by comparison with best N-term approximation (in the energy norm) when full knowledge of u is available.

Much is known about N-term approximation. In particular, there is a characterization of the functions that can be approximated in the energy norm with accuracy O(N^{-s}) by using linear combinations of N wavelets. As we already mentioned, this class is typically a Besov space, which is substantially larger than the corresponding Sobolev space which ensures accuracy O(N^{-s}) for uniform discretizations with N parameters. In several instances of the elliptic problems, e.g., when the right hand side has singularities, or when the boundary of the domain has corners, the Besov regularity of the solution will exceed its Sobolev regularity (see Reference 19 and Reference 21). So these solutions can be approximated better by best N-term approximation than by uniformly refined spaces, and the use of adaptive methods is suggested. Another important feature of N-term approximation is that a near best approximation is produced by thresholding, i.e., simply keeping the N largest contributions (measured in the same metric as the approximation error) of the wavelet expansion of u.
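To make the thresholding idea concrete, the following small sketch (in Python, not taken from the paper; the coefficient sequence and the decay rate in the example are hypothetical) computes a best N-term approximation of a given coefficient vector by keeping its N largest entries in modulus and measuring the discarded tail in ℓ_2.

```python
# Illustrative sketch: best N-term approximation of a coefficient sequence.
# The test sequence and printed diagnostics are hypothetical examples.
import numpy as np

def best_n_term(v, N):
    """Keep the N largest coefficients of v in modulus; return approximation and l2 error."""
    v = np.asarray(v, dtype=float)
    if N >= v.size:
        return v.copy(), 0.0
    keep = np.argsort(np.abs(v))[::-1][:N]   # indices of the N largest entries
    vN = np.zeros_like(v)
    vN[keep] = v[keep]
    return vN, float(np.linalg.norm(v - vN))

# Coefficients decaying like k^(-1): the N-term error then behaves like N^(-1/2).
v = 1.0 / np.arange(1, 10001)
for N in (10, 100, 1000):
    print(N, best_n_term(v, N)[1])
```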

Of course, since best N-term approximation requires complete information on the approximated function, it cannot be applied directly to the unknown solution. It is certainly not clear beforehand whether a concrete numerical scheme can produce at least asymptotically the same convergence rate. Thus ideally an optimal adaptive wavelet algorithm should produce a result similar to thresholding the exact solution. In more quantitative terms this means that whenever the solution u can be approximated with rate N^{-s} by best N-term approximation, the approximations u_{Λ_j} should satisfy

(1.1) ‖u − u_{Λ_j}‖ ≤ C (#Λ_j)^{-s},

where ‖·‖ is the energy norm and the constant C depends on u through its norm in the corresponding approximation class. Since in practice one is mostly interested in controlling a prescribed accuracy with a minimal number of parameters, we shall rather say that the adaptive algorithm is of optimal order if, whenever the solution admits such a rate s, then for all ε > 0, there exists a set Λ(ε) such that

(1.2) ‖u − u_{Λ(ε)}‖ ≤ ε

and such that

(1.3) #(Λ(ε)) ≤ C ε^{-1/s}.

Such a property ensures an optimal memory size for the description of the approximate solution.

Another crucial aspect of the adaptive algorithms is their computational complexity: we shall say that the adaptive algorithm is computationally optimal if, in addition to Equation 1.2 and Equation 1.3, the number of arithmetic operations needed to derive the approximate solution stays proportional to the number #(Λ(ε)) of degrees of freedom. Note that an instance of computational optimality in the context of linear methods is provided by the full multigrid algorithm, where the relevant count is the number of unknowns necessary to achieve a given accuracy on a uniform grid. We are thus interested in algorithms that exhibit the same type of computational optimality with respect to an optimal adaptive grid which is not known in advance and should itself be generated by the algorithm.

The main accomplishment of this paper is the development and analysis of an adaptive numerical scheme which for a wide class of operator equations (including those of negative order) is optimal with regard to best N-term approximation and is also computationally optimal in the above sense. Let us mention that a simplified version of this algorithm has been developed and tested in Reference 18, as well as a more elaborate version in Reference 6. In this last version, appropriate object-oriented data structures play a crucial role for handling properly the sparse representation of the solution. In both cases, the numerical experiments confirm that the numerical error generated by such adaptive algorithms behaves similarly to the error obtained by thresholding the exact solution. We note, however, that depending on the concrete application at hand, these implementations still suffer from currently suboptimal schemes for a central task, namely evaluating the entries of the stiffness matrices. Since, in the case of piecewise polynomial wavelets and (piecewise) constant-coefficient PDEs, these entries are given by explicit formulae, the cost of this task is negligible. However, a much more sophisticated treatment appears to be necessary in the general case, where, for instance, the techniques recently proposed in Reference 10 should lead to significant improvements.

Although the present investigations are confined to symmetric elliptic problems, the results provide, in our opinion, a core ingredient for the treatment of more complex tasks. For instance, the first steps in this direction are the development of a-posteriori wavelet strategies for saddle point problems in Reference 22, the use of stabilized variational formulations in Reference 9, or least squares formulations in Reference 24.

1.4. Organization of the paper

In Section 2, we introduce the general setting of elliptic operator equations where our results apply. In this context, after applying a diagonal preconditioner, wavelet discretizations allow us to view the equation as a discrete well conditioned linear system.

In Section 3, we review certain aspects of nonlinear approximation, quasi-sparse matrices and fast multiplication using such matrices. The main result of this section is an algorithm for the fast computation of the application of a quasi-sparse matrix to a vector.

In Section 4, we analyze the rate of convergence of the refinement procedure introduced earlier in Reference 20. We will refer to this scheme here as Algorithm I. We show that this algorithm is optimal for a small range of s. However, the full range of optimality should be limited only by the properties of the wavelet basis (smoothness and vanishing moments) and the operator; this is not the case for Algorithm I. The analysis in Section 4, however, identifies the barrier that keeps Algorithm I from being optimal in the full range of s.

In Section 5, we introduce a second strategy—Algorithm II—for adaptively generating the sets Λ_j that is shown to provide optimal approximation of order N^{-s} for the full range of s. The new ingredient that distinguishes Algorithm II from Algorithm I is the addition of thresholding steps which delete some indices from the current set. This would be the analogue of coarsening the mesh in FEM.

Although we have qualified so far both procedures in Sections 4 and 5 as “algorithms”, we have actually ignored any issue concerning practical realization. They are idealized in the sense that the exact assessment of residuals and Galerkin solutions is assumed. This was done in order to clearly identify the essential analytical tasks. Practical realizations require truncations and approximations of these quantities. Section 6 is devoted to developing the ingredients of a realistic numerical scheme. This includes quantitative thresholding procedures, approximate matrix/vector multiplication, approximate Galerkin solvers and the approximate evaluation of residuals.

In Section 7 we employ these ingredients to formulate a computable version of Algorithm II, which is shown to be computationally optimal for the full range of s. Recall that this means that it realizes for this range the order of best N-term approximation at the expense of a number of arithmetic operations that stays proportional to the number of significant coefficients. Computational optimality hinges to a great extent on the fast approximate matrix/vector multiplication from Section 3.

It should be noted, however, that an additional cost in our wavelet adaptive algorithm is incurred by sorting the coefficients in the currently computed solution. This cost at a given stage is of order N log N, where N is the number of coefficients currently in play, and is thus slightly larger than the cost in arithmetic operations. It should be stressed that the complexity of the algorithm is analyzed under the assumption that the solution exhibits a certain rate of best N-term approximation, which is, for instance, implied by a certain Besov regularity. The algorithm itself does not require any a-priori assumption of that sort.

We have decided to carry out the (admittedly more technical) analysis of the numerical ingredients in some detail in order to substantiate our claim that the optimality analysis is not based on any hidden assumptions (beyond those hypotheses that are explicitly stated) such as accessing infinitely many data. Nevertheless the main message of this paper can be read in Sections 4 and 5: optimal adaptive approximations of elliptic equations can be computed by iterative wavelet refinements using a-posteriori error estimators, provided that the computed solution is regularly updated by appropriate thresholding procedures. This fact was already suggested by numerical experiments in Reference 18, which show that the numerical error generated by such adaptive algorithms behaves similarly to the error obtained by thresholding the exact solution.

2. The setting

In this section, we shall introduce the setting in which our results apply. In essence, our analysis applies whenever the elliptic operator equation takes place on a manifold or domain which admits a biorthogonal wavelet basis.

2.1. Ellipticity assumptions

This subsection gives the assumptions we make on the operator equation to be numerically solved. These assumptions are quite mild and apply in great generality.

Let Ω denote a bounded open domain in Euclidean space R^d with Lipschitz boundary or, more generally, a Lipschitz manifold of dimension d. In particular, Ω could be a closed surface which arises as a domain for a boundary integral equation. The space L_2(Ω) consists of all square integrable functions on Ω with respect to the (canonically induced) Lebesgue measure. The corresponding inner product is denoted by

Let A be a linear operator mapping a Hilbert space H into its dual H′ (relative to the pairing introduced above), where H is a space with the property that either H or its dual H′ is embedded in L_2(Ω). The operator A induces the bilinear form a(·,·) defined on H × H by

where ⟨·,·⟩ denotes the duality product.

(A1): We assume that the bilinear form is symmetric positive definite and elliptic in the sense that

Here, and throughout this paper, the notation ∼ means that both quantities can be uniformly bounded by constant multiples of each other. Likewise ≲ indicates inequalities up to constant factors.

It follows that H is also a Hilbert space with respect to the inner product a(·,·) and that this inner product induces an equivalent norm (called the energy norm) on H by

By duality, A thus defines an isomorphism from H onto H′. We shall study the equation

with f ∈ H′. From our assumptions, it follows that for any f ∈ H′, this equation has a unique solution in H, which will always be denoted by u. This is also the unique solution of the variational equation

The typical examples included in the above assumptions are the Poisson or the biharmonic equations on bounded domains in R^d; single or double layer potentials, and hypersingular operators on closed surfaces arising in the context of boundary integral equations. In these examples H is a Sobolev space of suitable (possibly fractional or negative) order; see Reference 23, Reference 20, Reference 41 for examples.

2.2. Wavelet assumptions

By now wavelet bases are available for various types of domains that are relevant for the formulation of operator equations. This covers, for instance, polyhedral surfaces of dimension two and three Reference 29 as well as manifolds or domains that can be represented as a disjoint union of smooth regular parametric images of a simple parameter domain such as the unit cube Reference 27.

There are many excellent accounts of wavelets on R^d (see e.g., Reference 38 or Reference 30). For the construction and description of wavelet bases on domains and manifolds, we refer the reader to the survey paper Reference 23 and the references therein. This survey also sets forth the notation we shall employ below for indexing the elements in a wavelet basis. To understand this notation, it may be useful for the reader to keep in mind the case of wavelet bases on R^d. In this setting, a typical biorthogonal wavelet basis of compactly supported functions is given by the shifted dilates of a finite set of generating functions. Namely, the collection of functions

forms a Riesz basis for L_2(R^d). The dual basis is given by

with the dual generators again forming a finite set of functions. The integer j gives the dyadic level (2^j the frequency) of the wavelet. The multi-integer k gives the position (2^{-j}k) of the wavelet. Namely, the wavelet has support contained in a cube of diameter proportional to 2^{-j} centered at the point 2^{-j}k. Note that there are 2^d − 1 functions with the same dyadic level j and position k.
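For orientation, a standard realization of such a basis on R^d (written here in a common textbook normalization, which need not coincide exactly with the convention of Equation 2.7) is

$$\psi_{j,k,e}(x) \;=\; 2^{jd/2}\,\psi_e\bigl(2^{j}x-k\bigr),\qquad j\in\mathbb{Z},\; k\in\mathbb{Z}^{d},\; e\in E,$$

where the $2^{d}-1$ generators $\psi_e$, $e\in E$, are typically obtained as tensor products of a univariate scaling function and a univariate wavelet.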

Another way to construct a wavelet basis for L_2(R^d) is to start the multiscale decomposition at a finite dyadic level j_0. In this case, the basis consists of the functions of Equation 2.7 with j ≥ j_0, together with a family of functions

with φ a fixed (scaling) function. Wavelet bases for domains take a similar form except that some alterations are necessary near the boundary.

We shall denote wavelet bases by {ψ_λ : λ ∈ ∇}. In the particular case above, this notation incorporates the three parameters j, k, e (level, position, and type) into the one index λ. We use |λ| to denote the dyadic level of the wavelet ψ_λ. We let ∇_j consist of the indices of the wavelets at level j.

In all classical constructions of compactly supported wavelets, there exist fixed constants and such that and such that for all there are at most indices such that .

Since we shall consider only bounded domains in this paper, the wavelet decomposition will begin at some fixed level . For notational convenience only, we assume . We define to be the set of scaling functions in the wavelet basis. We shall assume that is a domain or manifold which admits two sets of functions:

that form biorthogonal wavelet bases on L_2(Ω): writing, for any two collections of functions in L_2(Ω), the matrix of their pairwise inner products, one has

where I is the identity matrix.

A typical feature in the theory of biorthogonal bases is that the sequences are Riesz bases. That is, using the shorthand notation , one has

This property means that the wavelet bases characterize L_2(Ω). In the present context of elliptic equations, we shall need not Equation 2.12 but rather the fact that these bases provide a characterization of H and H′ in terms of wavelet coefficients. This is expressed by the following specific assumption.

(A2): Let the energy space H be equipped with the norm ‖·‖_H and its dual space H′ with the norm ‖·‖_{H′}. We assume that the wavelets of the primal basis are in H, whereas those of the dual basis are in H′ (in this context, we can assume that Equation 2.11 simply holds in the sense of the duality between H and H′). We assume that each v in H has a wavelet expansion in the primal basis (with coordinates given by the dual pairings) and that

with D a fixed positive diagonal matrix.

Observe that Equation 2.13 implies that , and that (resp. ) is an unconditional (resp. Riesz) basis for . By duality, one easily obtains that each has a wavelet expansion (with coordinates ) that satisfies

One should keep in mind, though, that the dual basis is only needed for analysis purposes. The Galerkin schemes to be considered below only involve the primal basis, while the dual basis never enters any computation and need not even be known explicitly.

It is well known (see e.g., Reference 27) that wavelet bases provide such characterizations for a large variety of spaces (in particular the Sobolev and Besov spaces for a certain parameter range which depends on the smoothness of the wavelets). In the context of elliptic equations, is typically some Sobolev space . In this case (A2) is satisfied whenever the wavelets are sufficiently smooth, with . For instance, when , one has .

2.3. Discretization and preconditioning of the elliptic equation

Using wavelets, we can rewrite Equation 2.5 as an infinite system of linear equations. We take primal and dual wavelet bases satisfying (A2) and write the unknown solution in the primal basis and the given right hand side in the dual basis. This gives the system of equations

The solution to Equation 2.15 gives the wavelet coefficients of the solution to Equation 2.5.

An advantage of wavelet bases is that they allow for trivial preconditioning of the linear system Equation 2.15. This preconditioning is given by the diagonal matrix D of (A2) and results in the system of equations

or, more compactly,

where

Let us briefly explain the effect of the above diagonal scaling with regard to preconditioning. To this end, note that by (A1), the matrix is symmetric positive definite. We define its associated bilinear form by

where ⟨·,·⟩ is the standard inner product in ℓ_2, and we denote the norm associated with this bilinear form by |||·|||. In other words,

Combining the ellipticity assumption (A1) together with the wavelet characterization of (A2), we obtain that |||·||| and the ℓ_2 norm are equivalent norms, i.e., there exist constants c_1, c_2 such that

It is immediate that these constants are also such that

and

In particular, the condition number of the preconditioned matrix A satisfies

The fact that the diagonal scaling turns the original operator into an isomorphism A on ℓ_2 will be a cornerstone of the subsequent development. Denoting by a_{λ,λ′} the entries of A and by A_Λ the section of A restricted to a set Λ of indices, we see from the positive definiteness of A that

and that the condition numbers of the submatrices A_Λ remain uniformly bounded for any subset Λ, i.e.,

Finally, it is easy to check that the constants c_1 and c_2 also provide the equivalence

Here and later, we adopt the following rule about denoting constants. We shall denote constants which appear later in our analysis by . Other constants, whose value is not so important for us, will be denoted by or incorporated into the notation.

A typical instance of the above setting involves Sobolev spaces H = H^t, in which case the entries of the diagonal matrix D can be chosen as D_{λλ} = 2^{t|λ|}. Of course, the constants in Equation 2.24 will then depend on the relation between the energy norm Equation 2.20 and the Sobolev norm. In some cases such a detour through a Sobolev space is not necessary, and Equation 2.13 can be arranged to hold for a suitable D when ‖·‖_H already coincides with the energy norm. A simple example is , where is an appropriate choice. In fact, Equation 2.13 will then hold independently of .
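To fix ideas, in the Sobolev case just described the preconditioned system has the schematic form below (a sketch under the assumption H = H^t; the precise normalization is the one fixed in Equation 2.16 through Equation 2.18, and boldface denotes the discrete quantities):

$$\mathbf{A} \;=\; D^{-1}\bigl(a(\psi_{\lambda},\psi_{\lambda'})\bigr)_{\lambda,\lambda'}\,D^{-1},\qquad \mathbf{f}\;=\;D^{-1}\bigl(\langle f,\psi_{\lambda}\rangle\bigr)_{\lambda},\qquad D\;=\;\operatorname{diag}\bigl(2^{t|\lambda|}\bigr)_{\lambda},$$

so that the unknown vector of the preconditioned system collects the correspondingly rescaled wavelet coefficients of the solution.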

2.4. Quasi-sparsity assumptions on the stiffness matrix

Another advantage of the wavelet basis is that for a large class of elliptic operators, the resulting preconditioned matrix exhibits fast decay away from the diagonal. This will later be crucial with regard to storage economy and efficiency of (approximate) matrix/vector multiplication.

Consider for example the (typical) case when H is the Sobolev space H^t of order t or a closed subspace of it. Then, for a large class of elliptic operators, we have

with and and

The validity of Equation 2.28 has been established in numerous settings (see e.g., Reference 23, Reference 11, Reference 40, Reference 42). Decay estimates of the form Equation 2.28 were initially introduced in Reference 37 in the context of Littlewood-Paley analysis. The constant depends on the smoothness of the wavelets, whereas is related to the approximation order of the dual multiresolution (resp. the vanishing moments of the wavelets) and the order of the operator . Estimates of the type Equation 2.28 are known to hold for a wide range of cases, including classical pseudo-differential operators and Calderón-Zygmund operators (see e.g., Reference 26, Reference 41). In particular, the single and double layer potential operators fall into this category. We refer the reader to Reference 23 for a full discussion of settings where Equation 2.28 is valid.

We introduce the class of all matrices which satisfy

with defined by Equation 2.29. We say that a matrix is quasi-sparse if it is in the class for some and . Properties of quasi-sparse matrices will be discussed in Section 3.

(A3): We assume that, for some , , the matrix of Equation 2.17 is in the class .

Let us note that in the case discussed earlier, we obtain Equation 2.30 from Equation 2.28 because .

2.5. Wavelet Galerkin methods

A wavelet based Galerkin method for solving Equation 2.5 takes the following form. We choose a finite set Λ of wavelet indices and use the space S_Λ spanned by the corresponding wavelets as our trial and analysis space. The approximate Galerkin solution u_Λ from S_Λ is defined by the conditions

We introduce some notation which will help embed the finite dimensional problem Equation 2.31 into the infinite dimensional space ℓ_2. For any set Λ, we let

Thus, we will for convenience identify a vector with finitely many components with the sequence obtained by setting all components outside its support equal to zero. Moreover, let P_Λ denote the orthogonal projector from ℓ_2 onto ℓ_2(Λ); that is, P_Λ v is simply obtained from v by setting all coordinates outside Λ equal to zero.

Using the preconditioning matrix D, Equation 2.31 is equivalent to the finite linear system

with unknown vector u_Λ, and where A and f refer to the preconditioned system given in Equation 2.18. The solution u_Λ to Equation 2.32 determines the wavelet coefficients of the Galerkin solution. Namely,

Of course, coefficients corresponding to indices outside Λ are zero.

We shall work almost exclusively in the remainder of this paper with the preconditioned discrete system Equation 2.17. Note that the solution to Equation 2.32 can be viewed as its Galerkin approximation. In turn, it has the property that

Our problem then is to find a good set Λ of indices such that the Galerkin solution is a good approximation to the exact solution. In view of the equivalences (see Equation 2.21, Equation 2.3, Equation 2.20)

any estimate for the error of the discrete Galerkin solution translates into an estimate for how well the Galerkin solution from the wavelet space S_Λ approximates u.

3. -term approximation and quasi-sparse matrices

We have seen in the previous section how the problem of finding Galerkin solutions to Equation 2.5 from the wavelet space S_Λ is equivalent to finding Galerkin approximations to the coefficient sequence of u from the sequence spaces ℓ_2(Λ). This leads us to understand first what properties of a vector determine its approximability from the spaces ℓ_2(Λ). It turns out that this is a simple and well understood problem in approximation theory, which we now review.

3.1. -term approximation

In this subsection, we want to understand the properties of a vector v in ℓ_2 that determine its approximability by a vector supported on a set Λ of small cardinality. This is a special case of what is called N-term approximation, which is completely understood in our setting. We shall recall the simple results in this subject that are pertinent to our analysis.

For each N, let Σ_N denote the (nonlinear) subset of ℓ_2 consisting of all vectors with at most N nonzero coordinates. Given v in ℓ_2, we introduce the error of approximation

A best approximation to v from Σ_N is obtained by taking a set Λ with #(Λ) = N on which v takes its N largest values in absolute value. The set Λ is not unique, but all such sets yield best approximations from Σ_N. Indeed, given such a set Λ, we let v_Λ be the vector in Σ_N which agrees with v on Λ. Then

We next want to understand which vectors can be approximated efficiently by the elements of . For each , we let denote the set of all vectors such that the quantity

is finite, where . Thus consists of all vectors which can be approximated with order by the elements of .

It is easy to characterize these approximation classes for any s > 0. For this we introduce the decreasing rearrangement of v. For each n, let γ_n(v) be the n-th largest of the numbers |v_λ|, λ ∈ ∇, and let γ(v) denote the resulting nonincreasing sequence. For each τ > 0, we let ℓ^w_τ denote the collection of all vectors v for which the quantity

is finite. The space ℓ^w_τ is called weak ℓ_τ and is a special case of a Lorentz sequence space. The expression Equation 3.3 defines its quasi-norm (it does not in general satisfy the triangle inequality). We shall only consider the case τ < 2 in this paper. In this case ℓ^w_τ is contained in ℓ_2, and for notational convenience we define

If , are two sequences, then

with depending on when tends to zero.

We have v in ℓ^w_τ if and only if γ_n(v) ≤ M n^{-1/τ}, n ≥ 1, and the smallest such M is equal to the quasi-norm defined above. In other words, the coordinates of v when rearranged in decreasing order are required to decay at the rate n^{-1/τ}. Another description of this space is given by

and the smallest which satisfies Equation 3.6 is equivalent to .
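A small computational illustration (Python; the test sequence and the value of τ are hypothetical) of the quantities just introduced: the decreasing rearrangement, the weak-ℓ_τ quasi-norm sup_n n^{1/τ} γ_n(v), and the best N-term error σ_N(v).

```python
# Illustrative sketch: decreasing rearrangement, weak-l_tau quasi-norm, N-term error.
import numpy as np

def weak_ltau_quasinorm(v, tau):
    gamma = np.sort(np.abs(v))[::-1]               # decreasing rearrangement
    n = np.arange(1, gamma.size + 1)
    return float(np.max(n ** (1.0 / tau) * gamma))

def sigma_N(v, N):
    gamma = np.sort(np.abs(v))[::-1]
    return float(np.sqrt(np.sum(gamma[N:] ** 2)))  # l2 norm of the discarded tail

# A hypothetical coefficient sequence with roughly algebraic decay.
v = np.random.default_rng(0).standard_normal(5000) / np.arange(1, 5001) ** 1.2
print(weak_ltau_quasinorm(v, tau=1.0), sigma_N(v, 100))
```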

Remark 3.1.

An alternative description of is

Moreover, the smallest such is equivalent to .

We recall that contains , and we trivially have and therefore

i.e.,

The following well known result characterizes .

Proposition 3.2.

Given , let be defined by

Then the sequence belongs to if and only if and

with constants of equivalency depending only on when tends to zero (respectively, only on when tends to ). In particular, if , then

with the constant depending only on when tends to zero.

For the simple proof of this proposition, we refer the reader to Reference 34 or the survey Reference 31.
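For the reader's orientation, the relation between s and τ in Proposition 3.2 and the resulting error bound take the standard form used throughout nonlinear approximation (stated here with the usual normalization, which we assume matches Equation 3.9 and Equation 3.11):

$$\frac{1}{\tau}\;=\;s+\frac{1}{2},\qquad\qquad \sigma_{N}(v)\;\le\;C\,N^{-s}\,\|v\|_{\ell^{w}_{\tau}}\quad\text{whenever } v\in\ell^{w}_{\tau}.$$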

Conditions like or , are equivalent to smoothness conditions on the function . We describe a typical situation when and . Then, the condition is equivalent to the requirement that the wavelet coefficients , , satisfy

For a certain range of (and hence ) depending on the smoothness and vanishing moments of the wavelet basis, the condition Equation 3.12 describes membership in a certain Besov space. Namely, for and related by Equation 3.9, we have

with the usual Besov space measuring orders of smoothness in ”. The weaker condition gives a slightly larger space endowed with the (quasi) norm

In view of Equation 2.35, the space consists exactly of those functions whose best -term wavelet approximation in the energy norm produces an error .

3.2. Quasi-sparse matrices

In this subsection, we shall consider some of the properties of the quasi-sparse matrices that appear in the discrete reformulation Equation 2.17 of the elliptic equation Equation 2.5. We recall that such matrices are in the class for some , ; and therefore they satisfy Equation 2.30

We begin by discussing the mapping properties of matrices . We denote by the spectral norm of . We shall use the following version of the Schur lemma: if for the matrix there are a sequence , , and a positive constant such that

then . An instance of the application of this lemma to the classes is the following result (which can be found in Reference 37).

Proposition 3.3.

If and , then every defines a bounded operator on .

Proof.

We apply Schur’s lemma with the weights , . To establish the first inequality in Equation 3.15, let and let . Then, using the estimate for the summation in space, we obtain

A symmetric argument confirms the second estimate in Equation 3.15, proving that is bounded.

While Proposition 3.3 is of general interest, it does not give us any additional information when applied to the matrix of Equation 2.17, since our ellipticity assumption (A1) already implies that is bounded on .

It is well-known that decay estimates of the type Equation 2.30 form the basis of matrix compression Reference 11, Reference 26, Reference 40, Reference 41. The following proposition employs a compression technique which is somewhat different from the results in those papers.

Proposition 3.4.

For each , let

and assume that . Then, given any , there exists for every a matrix which contains at most nonzero entries in each row and column and provides the approximation efficiency

Moreover, this result also holds for provided .

Proof.

Let be in . We fix and we first apply a truncation in scale, defining , where

In order to estimate the spectral norm , we can employ the Schur lemma with the same weights as in the proof of Proposition 3.3. As in that proof, we obtain, for any and ,

It follows that

We also need a truncation in space provided by the new matrix , where

and where is a polynomially decreasing sequence such that . Specifically, we take .

We can then immediately estimate the maximal number of non-zero entries in each row and column of by

In view of Equation 3.18, it remains only to prove that . In order to estimate the spectral norm , we again use the Schur lemma with the same weights. For each and , we have

Therefore, for any ,

In the case where (resp.), the factor on the right of is bounded by (resp. ) with a constant independent of and . Thus, when , we obtain the desired estimate of for all . On the other hand, when , this factor is still bounded by a fixed constant provided .

Remark 3.5.

In the case that the matrix of Proposition 3.4 is the preconditioned matrix representation of an elliptic operator which is local (i.e., , ) then the truncation in space in the proof of this proposition is not needed and the proposition holds for .

3.3. Fast multiplication

We now come to the main result of this section, which is the fast computation of quasi-sparse matrices applied to vectors. We continue to denote the spectral norm of a matrix by .

We have seen that decay estimates like Equation 2.28 imply compressibility in the sense of Proposition 3.4. To emphasize that only this compressibility (which may actually hold also for other operators than those discussed in connection with Equation 2.28) matters for the subsequent analysis we introduce the following class of compressible matrices.

Definition 3.6.

We say a matrix is in the class if there are two positive sequences and that are both summable and for every there exists a matrix with at most nonzero entries per row and column such that

We further define

where the minimum is taken over all such sequences and .

We record the following consequence of Proposition 3.4.

Corollary 3.7.

Let be defined by Equation 3.16. Then for every one has

Note that the sequences , can in this case be chosen to decay exponentially and that grows when approaches .

The main result of this section reads as follows.

Proposition 3.8.

If the matrix is in the class , then maps boundedly into itself for ; that is, for any , we have

with the constant depending only on and the spectral norm .

Proof.

Let , and for any denote by a best -term approximation to in . We recall that is obtained by retaining the biggest coefficients of and setting all other coefficients equal to zero. Then, from Proposition 3.2, we have

with the constant depending only on . Using the matrices of Equation 3.19, we define

This gives

It follows then from the summability of the that

where for the last term we have used the simple inequalities

The number of nonzero entries of is estimated by

We now apply Proposition 3.2 and obtain Equation 3.22.
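The following Python sketch mimics the multiplication scheme used in the proof: the increments between best 2^l-term approximations of v are multiplied by increasingly compressed versions of A. The band truncation standing in for the compressed matrices A_j of Definition 3.6, the test matrix, and the test vector are all hypothetical stand-ins.

```python
# Schematic sketch of the approximate matrix/vector multiplication of Proposition 3.8.
import numpy as np

def compress(A, j):
    """A stand-in for the compressed matrix A_j: keep a band of width about 2^j."""
    i, k = np.indices(A.shape)
    return np.where(np.abs(i - k) <= 2 ** j, A, 0.0)

def best_2l_term(v, l):
    keep = np.argsort(np.abs(v))[::-1][: 2 ** l]
    out = np.zeros_like(v)
    out[keep] = v[keep]
    return out

def approx_matvec(A, v, J):
    """w_J = sum_{l=0..J} A_{J-l} (v_[2^l] - v_[2^(l-1)]), with v_[2^(-1)] := 0."""
    prev = np.zeros_like(v)
    w = np.zeros_like(v)
    for l in range(J + 1):
        cur = best_2l_term(v, l)
        w += compress(A, J - l) @ (cur - prev)
        prev = cur
    return w

# Hypothetical test: a matrix with off-diagonal decay and an algebraically decaying vector.
A = np.exp(-np.abs(np.subtract.outer(np.arange(200), np.arange(200))))
v = np.random.default_rng(1).standard_normal(200) / np.arange(1, 201) ** 1.5
for J in (2, 4, 6):
    print(J, np.linalg.norm(A @ v - approx_matvec(A, v, J)))
```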

We state an immediate consequence of Corollary 3.7.

Corollary 3.9.

The conclusions of Proposition 3.8 hold for any matrix provided .

Note that the number of arithmetic operations needed to compute in Equation 3.24 is estimated as above, so that this multiplication algorithm is optimal. This is stated in the following corollary, in which we also reformulate our result in terms of a prescribed tolerance.

Corollary 3.10.

Under the hypotheses of Proposition 3.8, for each , and for each , there is a such that

and

with and related as in Equation 3.9. Moreover, the approximation can be computed with arithmetic operations. In both of these statements, the constant depends only on and the spectral norm of .

4. An adaptive Galerkin scheme

We have shown in §2 that the elliptic equation Equation 2.5 is equivalent to the infinite system of equations Equation 2.17

where A is an isomorphism on ℓ_2. This system results from expanding the solution and right hand side of Equation 2.5 in a primal and dual wavelet basis, respectively, and then using a diagonal preconditioning. We also noted in that section that, for a given set Λ, solving Equation 4.1 with trial space ℓ_2(Λ) is the same as solving Equation 2.5 with the trial space spanned by the wavelets indexed by Λ.

We are interested not only in rapidly solving the linear system Equation 2.32 of equations for a given selection of basis functions for the trial space, but also in adaptively generating possibly economic sets Λ needed to achieve a desired accuracy. Since adaptive approximation is a form of nonlinear approximation, it is reasonable to benchmark the performance of such an adaptive method against nonlinear N-term approximation as discussed in Section 3. We recall that the results of subsection 3.1 show that a vector can be approximated with order N^{-s} by N-term approximation (i.e., by a vector with at most N nonzero coordinates) if and only if it belongs to ℓ^w_τ with 1/τ = s + 1/2. We shall strive therefore to meet the following goal.

Goal: Construct an adaptive algorithm so that the following property holds for a wide range of s: whenever the solution of Equation 4.1 belongs to ℓ^w_τ with 1/τ = s + 1/2, the algorithm generates sets Λ_j, j ≥ 0, such that the Galerkin approximation u_{Λ_j} provides the approximation error

Recall that this goal can also be expressed in terms of achieving a certain tolerance with an optimal number of degrees of freedom as stated in Equation 1.2 and Equation 1.3.

In this section, we shall describe a first adaptive algorithm, initially developed in Reference 20, for solving Equation 4.1. Starting with an initial set Λ_0, this algorithm adaptively generates a sequence of (nested) sets Λ_j, j ≥ 0. The Galerkin solutions u_{Λ_j}, j ≥ 0, to Equation 4.1 provide our numerical approximation to the solution of Equation 4.1, and these in turn determine our approximations to the solution of the original elliptic equation Equation 2.5.

At present, we can only show that the algorithm of this section meets our goal for a small range of s (see Corollary 4.10). Nevertheless, this algorithm is simple and interesting in several respects, and the analysis of this algorithm brings forward natural questions concerning Galerkin approximations.

In Section 5 we shall present a second adaptive algorithm which will meet our goal for a natural range of s. This range of s is limited only by the decay properties of the stiffness matrix A, which in turn are related to properties of the wavelet basis (smoothness and vanishing moments) and the order of the operator.

The analysis we give in this and the following section for these adaptive algorithms is idealized, since it will address only questions of approximation order in terms of the cardinality of the sets . At this stage we shall ignore certain computational issues. In particular, we will assume that we are able to access the values of possibly infinitely many wavelet coefficients, e.g., of residuals, which is of course unrealistic. However, this will facilitate a more transparent analysis of the adaptive algorithms and their ingredients. Later, in Sections 6 and 7, we will develop corresponding computable counterparts by introducing suitable truncation and approximation procedures. Moreover, we will provide a complete analysis of their computational complexity.

4.1. Algorithm I

The idea behind our first adaptive algorithm is to generate step by step an ascending sequence of (nested) sets Λ_j so that on the one hand #(Λ_j) stays as small as possible, while on the other hand the error for the corresponding Galerkin solutions is reduced by some fixed factor, that is, for some fixed ρ < 1 one has

We remind the reader that |||·||| is the discrete energy norm when applied to vectors. The sets Λ_j will be generated adaptively; that is, Λ_{j+1} depends on the given data f and on the previous solution u_{Λ_j}.

We will first explain the basic principle that has already been used in Reference 13, Reference 20, Reference 35 to guarantee a reduction of the form Equation 4.3. The idea is, given Λ, to find a set Λ̃ containing Λ such that the inequality

holds for some . By the orthogonality of the Galerkin solutions with respect to the energy inner product, we have

Hence Equation 4.4 (applied with and ) implies Equation 4.3 with

Therefore, our strategy is to establish Equation 4.4. This is also a common approach in the context of finite element discretizations, see e.g., Reference 13. There the role of is played by an approximate solution of higher order or with respect to a finer mesh. In most studies, however, the property Equation 4.4, often referred to as a saturation property, is assumed and not proven to be valid.

We shall show how such sets can be selected. For this we shall use the residual

Since A and f are known to us, the coordinates of this residual can in principle be computed to any desired accuracy. We leave aside until Section 6 the issue of the computational cost for a given accuracy in this residual, and work with the simplified assumption that we have the exact knowledge of its coordinates.

We recall that P_Λ denotes the orthogonal projector from ℓ_2 onto ℓ_2(Λ) in the ℓ_2 norm. For v in ℓ_2, P_Λ v is the vector in ℓ_2(Λ) which agrees with v on Λ.

Lemma 4.1.

Let and let be the residual associated to . If , and is any set that satisfies

then

where and , are the constants of Equation 2.21. As a consequence,

with .

Proof.

From Equation 2.27, we have

where the second to last equality uses the fact that and agree on . This proves Equation 4.9, while Equation 4.10 follows from Equation 4.5.

We consider now our first algorithm for choosing the sets , in which we take (similar algorithms and analysis hold for any ). We introduce the following steps, which will be part of our adaptive algorithms.

GALERKIN: Given a set Λ, GALERKIN determines the Galerkin approximation u_Λ to u by solving the finite system of equations Equation 2.32.

GROW: Given a set Λ and the Galerkin solution u_Λ, GROW produces the smallest set Λ̃ which contains Λ and satisfies

We note that Λ̃ is obtained by adding to Λ the indices of the largest coefficients of the residual; the number of these indices to be chosen is determined by the criterion Equation 4.11.

Algorithm I:

Let Λ_0 = ∅ and u_{Λ_0} = 0.

For j = 0, 1, 2, …, determine Λ_{j+1} from Λ_j by first applying GALERKIN (in order to find u_{Λ_j}) and then applying GROW.
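The loop below is a schematic Python rendering of Algorithm I on a finite truncation of the preconditioned system. The bulk-chasing fraction mu in GROW, the direct solver used in GALERKIN, and the fixed number of iterations are hypothetical stand-ins for the quantities fixed by Equation 4.11 and by the analysis.

```python
# Schematic sketch of Algorithm I (GALERKIN + GROW) with hypothetical parameters.
import numpy as np

def galerkin(A, f, idx):
    """Solve the Galerkin system restricted to the index set idx."""
    u = np.zeros_like(f)
    if idx.size:
        u[idx] = np.linalg.solve(A[np.ix_(idx, idx)], f[idx])
    return u

def grow(A, f, u, idx, mu=0.5):
    """Add the smallest set of largest residual entries capturing a fraction mu of the residual norm."""
    r = f - A @ u
    order = np.argsort(np.abs(r))[::-1]
    energy = np.cumsum(r[order] ** 2)
    n_new = int(np.searchsorted(energy, mu ** 2 * energy[-1])) + 1
    return np.union1d(idx, order[:n_new])

def algorithm_I(A, f, n_iter=10):
    idx = np.array([], dtype=int)
    for _ in range(n_iter):
        u = galerkin(A, f, idx)
        idx = grow(A, f, u, idx)
    return galerkin(A, f, idx), idx

# Hypothetical usage on a small synthetic symmetric positive definite system.
n = 300
A = np.eye(n) + 0.4 * np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))))
f = np.random.default_rng(0).standard_normal(n) / np.arange(1, n + 1) ** 1.5
u, idx = algorithm_I(A, f)
print(len(idx), np.linalg.norm(f - A @ u))
```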

As a consequence of Lemma 4.1, we have the following.

Corollary 4.2.

For the sets given by Algorithm I, the corresponding Galerkin approximations of satisfy

where

Consequently,

Proof.

The inequality Equation 4.12 follows from Equation 4.10, while Equation 4.14 follows by repeatedly applying Equation 4.12.

4.2. Error analysis for Algorithm I

While the last corollary shows that for each the sequence converges in the energy norm to , we would like to go further and understand how the error decreases with . In particular, we would like to see if this algorithm meets our goal for certain . We begin with the following lemma.

Lemma 4.3.

Let , let be in the class , and let and . Given any set , let be the smallest set such that and

Then

where is a constant depending only on when is large.

Proof.

We will make frequent use of the following simple fact.

Remark 4.4.

Since is finite, is in . By assumption and hence is also in . Applying Proposition 3.8 we see that is also in .

Now, for any , let denote the indices of the largest coefficients of in absolute value. According to Proposition 3.2,

where depends only on when is large. We may assume that . We choose as the smallest integer such that

and define . Then, clearly, Equation 4.15 is satisfied. Moreover,

and so Equation 4.16 is also satisfied.

Lemma 4.3 gives our first hint of the importance of controlling the norms of the residuals . The following theorem and corollary will draw this out more and will provide our first error estimate for Algorithm I.

Theorem 4.5.

Let , let be in the class , and let , . Define

with the constants of subsection 2.3. Then, the Galerkin approximations , , generated by Algorithm I satisfy

where

and the constants , , satisfy

with the constant of Lemma 4.3.

Proof.

We use the abbreviations , , , . The constants , , are defined by Equation 4.18. For any , we know that

and from Lemma 4.3 and Equation 2.27, we obtain

This means that for ,

This proves Equation 4.20. The same argument gives Equation 4.19, because and .

Theorem 4.5 reveals that the growth of the constants can be controlled by the size of the residual norms . The following corollary shows that if these norms are bounded, then so are the constants .

Corollary 4.6.

If the hypotheses of Theorem 4.5 are valid and in addition

then

with a constant such that depends only on when . Consequently,

Proof.

We use the same notation as in the proof of Theorem 4.5. We define , and find that

Now, , and from Equation 4.19 and Proposition 3.8

This proves Equation 4.22. The estimate Equation 4.23 then follows from Equation 4.18.

Remark 4.7.

Corollary 4.6 shows that if is bounded independently of , then we are successful in the goal that we fixed in the beginning of this section. One can also check that optimality is achieved in the sense of a target accuracy : Let be the smallest such that . Then, since , we obtain the estimate from Equation 4.23. From Equation 4.16, we also derive that . It follows that we have the desired estimate .

4.3. Bounding

Corollary 4.6 shows that if for each , , the boundedness condition Equation 4.21 holds with , then the algorithm meets our goal for . We can give sufficient conditions for the validity of Equation 4.21 in terms of the (finite) sections

of the matrix . Note that in terms of these sections the Galerkin equations Equation 2.32 take the form

where according to our convention we always employ the same notation for the finitely supported vector and the infinite sequence obtained by setting all components outside equal to zero. Likewise, depending on the context, it will be convenient to treat for either as an infinite sequence with zero entries outside or as a finitely supported vector defined on .

Recall also from Equation 2.26 that the ellipticity of implies the boundedness of and its inverse in the spectral norm, uniformly in . Also, from Proposition 3.8, it follows that is a bounded operator on . Therefore, the matrices are uniformly bounded (independently of ) on (where is defined in analogy to ).

Remark 4.8.

Under the assumptions of Lemma 4.3, if the inverse matrices are uniformly bounded on , i.e.,

with , then

with the constant independent of .

Proof.

By assumption, . From Proposition 3.8, we find that is also in and, for all ,

where is the norm of on . By our assumptions on , we derive

This gives

which implies Equation 4.27.

There is a soft functional analysis argument which shows that the boundedness condition Equation 4.26 is satisfied for a certain range of close to .

Theorem 4.9.

Let for some . Then there are a and a constant such that for all and all ,

Proof.

First recall from Equation 2.26 that the condition numbers of the matrices satisfy for any . Let , where . Then, , where .

Now let . Then, both and are bounded on . Hence, we have for some positive constant independent of . Using the Riesz-Thorin interpolation theorem for and , we can find some such that , uniformly in and . By the standard Neumann series argument, we obtain Equation 4.29.

Corollary 4.10.

If for some , then there is an such that Algorithm I meets our goal for all . That is, for each , with , Algorithm I generates a sequence of sets , , such that

with a constant.

Proof.

From Theorem 4.9, there is a such that Equation 4.29 holds uniformly for all and . Remark 4.8 then shows the validity of Equation 4.27. We now apply Corollary 4.6 and obtain Equation 4.30 from Equation 4.23.

We close this section by making some observations about the growth of and for an arbitrary range of which is only limited by the properties of the wavelet bases. We shall use these observations in the following section when we modify Algorithm I.

Lemma 4.11.

Suppose that and with . Then, for any one has

with the constant depending only on when tends to .

Proof.

First note that if , then has at most nonzero coordinates. Using Equation 3.8 and Hölder’s inequality gives for such the inverse estimate

Now let denote the best -term approximation to , which we recall is obtained by retaining the largest coefficients. We use Equation 4.32 to conclude that

where we have used Equation 3.11 of Proposition 3.2. We add to both sides of Equation 4.33 and observe that

to finish the proof.

We next apply this lemma to bound residuals.

Lemma 4.12.

Let , let and let the solution to Equation 4.1 be in . For any index set generated by Algorithm I, we have

with the constant independent of and .

Proof.

The algorithm determines the set from in the same way for each . Therefore, we can assume that . By Equation 4.31 we have

We use Equation 2.21 and Theorem 4.5 to bound the second term:

Because of Proposition 3.8 we can replace by .

5. A second adaptive algorithm

In this section, we shall present a second adaptive algorithm which will meet our goal for the full range of that is permitted by the wavelet basis. We begin with some heuristics which motivate the structure of the second algorithm.

The deficiency of Algorithm I of the last section is that it is only proven to meet our goal for a small range of s. This in turn is caused by our inability to prevent the possible growth of the ℓ^w_τ norms of the computed quantities as j increases. Since by assumption the solution lies in ℓ^w_τ, such growth can only occur if the ℓ^w_τ norm of the computed Galerkin solutions gets large with j. On the other hand, we know that their ℓ_2 norms are bounded uniformly. Typically, for a vector, its ℓ^w_τ norm is much larger than its ℓ_2 norm when it has many small entries which do not affect its ℓ_2 norm but combine to have a serious effect on the ℓ^w_τ norm. We can try to prevent this from happening by thresholding the coefficients of the computed solution and keeping only the large coefficients. In our application, this is very hopeful since the large coefficients contain the main source of the approximation to the solution.

Motivated by the above heuristics, we would like to use thresholding in our second algorithm. We introduce the thresholding operator T_η, which for a threshold η > 0 and a sequence v is defined by

We shall use the following trivial estimates for thresholding (see Section 7 of Reference 31): for any , we have

and

with a constant depending only on τ as τ tends to zero (of course, Equation 5.2 holds with the corresponding constant).
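A minimal Python sketch of the thresholding operator (the decay of the test sequence and the thresholds are illustrative only): entries whose modulus exceeds η are kept, all others are set to zero.

```python
# Minimal sketch of the thresholding operator T_eta.
import numpy as np

def threshold(v, eta):
    return np.where(np.abs(v) > eta, v, 0.0)

v = 1.0 / np.arange(1, 100001) ** 0.8      # a hypothetical slowly decaying sequence
for eta in (1e-1, 1e-2, 1e-3):
    w = threshold(v, eta)
    print(eta, int(np.count_nonzero(w)), float(np.linalg.norm(v - w)))
```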

Lemma 5.1.

Suppose that , , and that satisfies

for some . Then, for any , we have

and

Proof.

Let and consider the sets , , . Then,

where we used Equation 5.1 and the fact that for . This proves Equation 5.4.

For the proof of Equation 5.5, we consider the two sets and . Then, from Equation 5.2,

and

which proves Equation 5.5.

We shall use our previous notation which for an integer and a vector defines as the vector whose largest coordinates agree with those of and whose other coordinates are zero.

Corollary 5.2.

Suppose that , , and that satisfies

for some . Let be chosen as the smallest integer such that

Then

and

with and a constant depending only on as .

Proof.

We clearly have Equation 5.8. To prove Equation 5.9, we shall give a bound for .

In the case where , we trivially have Equation 5.7 with and . In the case where , let be the absolute value of the smallest nonzero coefficient in . For any , we have

On account of Equation 5.4, we have

so that Equation 5.6 and Equation 5.10 ensure that

for all . Therefore,

On the other hand, from Equation 5.5 we find that

We use Equation 5.12 to estimate each of the two terms on the right of Equation 5.13. For example, for the second term, we have

A similar estimate shows that the first term on the right of Equation 5.13 does not exceed . In other words,

where the last equality serves to define . When this estimate for is used in Equation 5.8, we arrive at Equation 5.9.

Algorithm II will modify Algorithm I by the introduction of the following step:

COARSE: Given a set and a Galerkin solution associated to this set, take and apply Corollary 5.2 with and to produce the vector . Then, COARSE produces the set of indices for the nonzero coordinates of and then applies GALERKIN to this new set to obtain .

Remark 5.3.

If is any set, it follows from Corollary 5.2 that the input of into COARSE yields a set with a Galerkin solution which satisfies

where with from Equation 2.21 and from Equation 5.9.

Proof.

We have

because the Galerkin projection gives the best approximation to from in the energy norm. We bound the right side of Equation 5.17 by Equation 5.9.

Remark 5.4.

It also follows from Corollary 5.2 together with Lemma 4.11 that the input of into COARSE yields a set with a Galerkin solution such that

with the constant depending only on as . Of course, this also implies that . Thus, the thresholding step allows the control of the norm of the residual.

We can now describe Algorithm II.

Algorithm II:

Let and .

For , determine from as follows. Let . For determine from by applying GALERKIN and then GROW to . Apply COARSE to to determine and . If , then define , , and stop the iteration on . Otherwise advance and continue.
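As an illustration of the structure of Algorithm II only, the outer loop can be organized as in the Python sketch below; grow, galerkin_solve, coarse and stop_test are assumed callables realizing GROW, GALERKIN, COARSE and the stopping criterion, and K is the fixed number of intermediate steps. All interfaces are assumptions made for this sketch.

    def algorithm_ii(Lambda0, K, max_outer, grow, galerkin_solve, coarse, stop_test):
        """Schematic outer loop of Algorithm II (all routines are assumed callables)."""
        Lambda = set(Lambda0)
        u = galerkin_solve(Lambda)
        for j in range(max_outer):
            for _ in range(K):                    # K intermediate steps
                u = galerkin_solve(Lambda)        # GALERKIN on the current set
                Lambda = grow(Lambda, u)          # GROW the set using the residual of u
            Lambda, u = coarse(galerkin_solve(Lambda))   # COARSE: threshold and re-solve
            if stop_test(Lambda):                 # stop once the coarsened set is small enough
                break
        return Lambda, u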

Theorem 5.5.

If , for some , then Algorithm II satisfies our goal for this . Namely, if , then the algorithm produces sets , , such that

with the constant of Remark 5.3. In addition, for we have

with and the constants of Equation 2.21.

Proof.

Since the set is the output of COARSE, the estimate Equation 5.19 follows from Remark 5.3. By the definition in Algorithm II, we have

Iterating this inequality, we obtain

Since , we arrive at Equation 5.20.

The following remark will be important in the next section on numerical computation. It shows that the intermediate steps between and do not generate sets which might be very large in comparison to and .

Remark 5.6.

Under the assumptions of Theorem 5.5 we have

with the smallest integer such that , where , are the constants of Equation 2.21 and is given by Equation 4.13.

Proof.

This follows from the following string of inequalities, where we denote by the intermediate output of COARSE obtained by thresholding before computing the new Galerkin solution:

The fifth inequality follows from Equation 5.8 and the fact that in the application of COARSE to . All other inequalities use norm equivalences of the type Equation 2.21. From this estimate we see that the criterion is met whenever .

Note that this remark, combined with Lemma 4.12, has the following consequence.

Remark 5.7.

The residuals in the intermediate steps are also uniformly bounded in , and the cardinalities can always be controlled by . The intermediate steps remain within our goal of optimal accuracy with respect to the number of parameters.

Theorem 5.5 shows that Algorithm II is optimal for the full range of permitted by the wavelet bases. By the same considerations as in Remark 4.7, this algorithm is also optimal in the sense of achieving a prescribed tolerance .

6. Numerical realization: Basic ingredients

The previous sections have introduced and analyzed the performance of two adaptive methods for resolving elliptic equations. The analysis however was more from a theoretical perspective and did not incorporate computational issues. Our purpose now is to address these computational issues. More precisely, we want to develop a numerically realizable version of Algorithm II and to analyze its complexity. In the present section, we shall introduce the basic subroutines that constitute the resulting Algorithm III which will be described and analyzed in the final section.

Let us first explain the basic principle of Algorithm III. This algorithm will iteratively generate a sequence of index sets and approximate solutions supported in ( differs in general from the Galerkin solution ), with the property that

where is an estimate from above of (which will allow us to take as an admissible starting point and empty). This progression toward finer accuracy will be performed by the main subroutine PROG, which will be assembled in §7 from the ingredients that we shall introduce in the present section.

If we are given a tolerance that gives the target accuracy with which we wish to resolve the solution to Equation 2.17, we shall thus need applications of PROG, where is the smallest integer such that .

We then ask what the total computational cost will be to attain this accuracy. There will be two sources of computational expense: arithmetic operations and sorting. Arithmetic operations include additions, multiplications, and square roots. We shall ignore error due to roundoff. We shall estimate the number of arithmetic computations and the number of sorts needed to achieve this accuracy. We shall see that can be related to in the same way that the error analysis of the preceding section related error to the size of the sets . The sorting will introduce an additional logarithmic factor.

Our subroutines will be described so as to apply to any vectors, and therefore Algorithm III will allow us to solve Equation 2.17 for any right hand side . However, we shall analyze its performance only when is in , , for some in the same range of optimality as for Algorithm II. Note that this range is limited only by in Equation 3.16, i.e., the compressibility order of the operator in the wavelet bases.

Our analysis will show that if has smoothness, then the computational cost and memory size needed to achieve accuracy is controlled by , so that the last step dominates the overall computational cost. This should be compared to the optimality analysis of the full multigrid algorithm, for which the complexity is also dominated by the last step of the nested iteration. However, in the multigrid algorithm, each step of the nested iteration is associated to a uniform discretization at a scale , which corresponds to requiring that be the set of all indices , rather than an adaptive set. In this case, the new layer updating the computation thus corresponds to a scale level, while in our adaptive algorithm it is rather associated to a certain size level of the wavelet coefficients of . Accordingly the classical Sobolev smoothness which enters the analysis of multigrid algorithms is replaced by the weaker Besov smoothness expressed by the property.

Algorithm III will involve numerical versions of procedures like GROW, COARSE or GALERKIN. In these subroutines exact calculations will have to be replaced by approximate counterparts whose accuracy is controlled by corresponding parameters. Thus the input will consist of objects like index sets or vectors to be processed as well as control and threshold parameters. To keep track of these parameters and their interdependencies we will consistently use the following format for such subroutines:

meaning that, given the input , the procedure NAME generates output quantities .

Some of the subroutines will make use of estimates of several constants like from previous sections, in which case we shall specify them and explain how such estimates can be obtained. All other constants entering the analysis of the algorithm but not its concrete implementation will be denoted by without further distinction. Their specific value only affects the constant in our asymptotic estimates. In particular, they are independent of the data , the solution, or its various approximations. If necessary their dependence on the parameters or will be explained.

6.1. The assembly of

We shall take the viewpoint that we have complete knowledge about the data , in the sense that we already know or can compute its wavelet coefficients to any desired accuracy by an appropriate quadrature. This in turn enables us to approximate to any accuracy by a finite wavelet expansion. We formulate this as

Assumption N1: We assume that for any given tolerance , we are provided with the set of minimal size such that satisfies

For the purpose of our asymptotic analysis, we could actually replace “minimal” by “nearly minimal”, in the sense that the following property holds: if is in for some , then we have the estimate

with and a constant that depends only on as tends to . This modified assumption is much more realistic, since in practice one can only have approximate knowledge of the index set corresponding to the largest coefficients in , using some a-priori information on the smooth and singular parts of the function . However, in order to simplify the notation and analysis, in what follows we shall assume that the set is minimal.

In the implementation of Algorithm III, the above tolerance will typically be related to the target accuracy of the solution by a fixed multiplicative constant. We perform the following two preprocessing steps on :

(i):

Sort the entries in to determine the vector of indices which gives the decreasing rearrangement . The cost of this operation is in operations.

(ii):

Compute . The cost of this is arithmetic operations.

The second step gives us the estimate . We store and the vector .
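For instance, the two preprocessing steps could be realized as in the following Python/NumPy sketch; the index/value arrays and function names are illustrative assumptions. The reversed cumulative sum makes the squared l2 norm of every tail of the decreasing rearrangement available at constant cost later on.

    import numpy as np

    def preprocess_data(indices, values):
        """Sort the data coefficients by decreasing modulus and precompute the
        squared l2 norms of all tails of the decreasing rearrangement."""
        values = np.asarray(values, dtype=float)
        order = np.argsort(-np.abs(values))                  # decreasing rearrangement
        sorted_idx = [indices[i] for i in order]
        sorted_val = values[order]
        tail_sq = np.cumsum((sorted_val ** 2)[::-1])[::-1]   # tail_sq[k] = sum of squares of entries k, k+1, ...
        return sorted_idx, sorted_val, tail_sq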

6.2. A numerical version of COARSE

Algorithm III will also make use of less accurate approximations of in its intermediate steps. This is one instance of the frequent need to provide a good coarser approximation to a finitely supported vector. Such approximations will be generated by the routine NCOARSE that we shall now describe; a schematic sketch follows the listing below.

NCOARSE:

(i):

Define and sort the nonzero entries of into decreasing order. Thereby one obtains the vector of indices which gives the decreasing rearrangement of the nonzero entries of . Then compute .

(ii):

For , form the sum in order to find the smallest value such that this sum exceeds . For this , define and set ; define by for and for .
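A schematic Python version of NCOARSE, under the interpretation of step (ii) above, is given below; the dictionary representation and the precise stopping rule written here are assumptions made for illustration.

    def ncoarse(eta, w):
        """Keep the smallest number of largest entries of w whose squared sum
        exceeds ||w||_2^2 - eta^2, so that the discarded tail has l2 norm <= eta."""
        items = sorted(w.items(), key=lambda kv: -abs(kv[1]))   # decreasing rearrangement
        total_sq = sum(val * val for _, val in items)
        partial, k = 0.0, 0
        while k < len(items) and partial <= total_sq - eta * eta:
            partial += items[k][1] ** 2
            k += 1
        kept = dict(items[:k])
        return kept, set(kept)       # coarsened vector and its support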

We first describe the computational cost of NCOARSE.

Properties 6.1.

For any and , we need at most arithmetic operations and sorting operations, , to compute the output of NCOARSE which, by construction, satisfies

We shall also apply NCOARSE to the initial approximation of the data in order to produce other near optimal -term approximations of with fewer parameters. Thanks to the preprocessing steps, in this case we can save on the computational cost of this procedure. An immediate consequence of Equation 5.15 and Properties 6.1 is the following.

Properties 6.2.

Assume that is an optimal -term approximation of the data with accuracy , as described by Equation 6.2. Then, for , NCOARSE produces an approximation to with support , such that . In addition, if , we have

with depending only on .

Moreover, determining requires at most arithmetic operations and no sorts, since sorting of was done in the preprocessing stage.

To simplify notation throughout the remainder of the paper we will denote by NCOARSE the output of NCOARSE, since it has the same optimal approximation properties as thresholding the exact data.

We now turn to the primary purpose of the coarsening procedure. Recall that the role of COARSE in Algorithm II above is the following. If is a given vector from and is a good (finitely supported) approximation to in the -norm but has large norm, then COARSE uses thresholding to produce a new approximation with slightly worse -approximation properties but guaranteed good norms. The routine NCOARSE above gives the numerical form of COARSE that we shall use. The following additional properties of NCOARSE follow from the results in Section 5.

Properties 6.3.

Given a vector , a tolerance , and a finitely supported approximation to that satisfies

the algorithm NCOARSE produces a new approximation to , supported on , which satisfies

Moreover, the following properties hold:

(i)

If , for some , then the outputs and of NCOARSE satisfy

(ii)

If , for some , then the output of NCOARSE satisfies

where depends only on as .

(iii)

The cardinality of the support of is bounded by

Proof.

The estimate Equation 6.7 is an immediate consequence of the steps in NCOARSE. (i) follows from Corollary 5.2 (see Equation 5.9). (ii) is proved in a similar fashion to Lemma 4.11. Let and let be the best approximation to from . Then, as in Equation 4.33, we derive

where we used Equation 4.32. We insert Equation 6.8 into Equation 6.11, add to both sides, and arrive at Equation 6.9. The estimate (iii) is an immediate consequence of Equation 5.15.

6.3. The assembly of

We shall need to compute a certain finite number of entries of . The entries that need to be computed will be prescribed as the adaptive algorithm proceeds and are not known in advance. They are associated to the application of one of the compressed matrices to a finite vector , as will be discussed below. Therefore, the entries are computed as the need arises. When we compute an entry of we store it for possible future use. We shall make the following

Assumption N2: Any entry of can be computed (up to roundoff error) at unit cost.

In some cases, this assumption is completely justified. For example, if the operator is a constant coefficient differential operator and the domain is a union of cubes, then suitable biorthogonal wavelet bases are known where the primal multiresolution spaces are generated by -splines. In this case, the functions which appear in the integrals defining the entries of are piecewise polynomials. Therefore, they can be computed exactly. When is a differential operator with varying coefficients or when is a singular integral operator, the entries of have to be approximated with an accuracy depending on the desired final accuracy of the solution. It is then far less obvious how to realize Assumption N2, and a detailed discussion of this issue (which very much depends on the individual concrete properties of ) is beyond the scope of this paper. We therefore content ourselves with the following indications that (N2) is not quite unreasonable. A central issue in Reference 28Reference 40Reference 41 is to design suitably adapted quadrature schemes for computing the significant entries of the wavelet representation of the underlying singular integral operator in the following sense. The trial spaces under consideration are spanned by all wavelets up to a highest level , say. Then, it is shown how to compute a compressed matrix having only the order of nonzero entries (up to possible factors in some studies) at a computational expense which also stays proportional to (again possibly times a factor). Since the compression in these papers is slightly different from the one used here and since only fully refined spaces have been investigated, these results do not apply here directly. Nevertheless, they indicate that the development of schemes that keep the computational work per entry low is not completely unrealistic.

In the development of the numerical algorithm, we shall need constants and such that Equation 2.21 holds. In practice, it is not difficult to obtain sharp estimates of the optimal constants, since as grows, they are well approximated by the smallest and largest eigenvalues of the preconditioned matrix corresponding to the set associated to the uniform discretization through the trial space . For simplicity we will take as an estimate for the condition number of , see Equation 2.24.

We next discuss the quasi-sparsity assumptions that we shall make on the matrix .

Assumption N3: We assume that the matrix is quasi-sparse in the sense that it is known to be in the class of §3.3 for for some . We recall that implies that for each , there is a matrix , with at most entries in each row and column, that satisfies

with a summable sequence of positive numbers. We will assume that the positions of the entries in are known to us.

We have discussed previously how this assumption follows from the original elliptic equations and the wavelet basis. In particular, it is implied by decay properties of the type Equation 2.30. The compression rules leading to the matrices and, in particular, the positions of the significant entries are in this case explicitly given in the proof of Proposition 3.4 and depend only on .

In the development of the numerical algorithm, we shall make use of the estimates Equation 6.12 in the form

where the constants are upper bounds for the compression error . The might simply correspond to a rough estimate of in Equation 6.12 or result from a more precise estimate of that can in practice be obtained by means of the Schur lemma.

The entries we compute in are determined by the vectors to which is applied. We only apply to vectors with finite support. To compute requires only that we know the nonzero entries of in the columns corresponding to the nonzero entries of . Hence, at most entries will need to be computed. We shall keep track of the number of these computations in the analysis that follows.

6.4. Matrix/vector multiplication

It is clear from Algorithm II that the main numerical tasks are the computation of Galerkin solutions and the evaluation of residuals. Both rest on the repeated application of the quasi-sparse matrix to a vector with finite support. Since the matrices and vectors are in general only quasi-sparse, this operation can be carried out only approximately in order to retain efficiency. For this, we shall use the algorithm of subsection 3.3 applied to . We recall our convention concerning the application of an infinite matrix to a finite vector : we consider the vector to be extended to the infinite vector on obtained by setting all new entries equal to zero. The extended vector will also be denoted by .

Given a vector of finite support and , we sort the entries of and form the vectors , , , . For , we define . Recall from Section 3 that agrees with in its largest entries and is zero otherwise. This process requires at most operations to sort.

We shall numerically approximate by using the vector

for a certain value of determined by the desired numerical accuracy. As noted earlier, this vector can be computed by using operations and requires the computation of at most this same number of entries in (recall Corollary 3.10). Note that if , then some of the terms in Equation 6.14 will be zero and therefore need not be computed.

By increasing , we increase the accuracy of the approximation to . In particular, as derived in subsection 3.3, see Equation 3.25, we have the error estimate

where is the compression bound from Equation 6.13. Note that and . Hence, the right hand side of Equation 6.15 can be computed for any with at most operations.

With these remarks in hand, we introduce the following numerical procedure for approximating ; a schematic sketch follows the listing.

APPLY A:

(i):

Sort the nonzero entries of the vector and form the vectors , , with . Define for .

(ii):

Compute , , , .

(iii):

Set .

(a):

Compute the right hand side of Equation 6.15 for the given value of .

(b):

If stop and output ; otherwise replace by and return to (a).

(iv):

For the output of (iii) and for , compute the nonzero entries in the matrices which have a column index in common with one of the nonzero entries of .

(v):

For the output of (iii), compute as in Equation 6.14 and take and .
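The Python sketch below mirrors the structure of APPLY A. The compressed matrices and the computable error bound of Equation 6.15 enter as assumed callables: A_compressed(j) is taken to apply the j-th compressed matrix to a sparse vector, and error_bound(k, chunks) is taken to evaluate the right-hand side of Equation 6.15; neither is specified here, so this is a sketch under those assumptions only.

    def apply_A(v, eps, A_compressed, error_bound):
        """Schematic version of APPLY A on a finitely supported vector v (a dict)."""
        items = sorted(v.items(), key=lambda kv: -abs(kv[1]))    # step (i): sort the entries
        chunks, start, j = [], 0, 0
        while start < len(items):                                # the j-th chunk holds the entries
            end = min(2 ** j, len(items))                        # between ranks 2^(j-1) and 2^j
            chunks.append(dict(items[start:end]))
            start, j = end, j + 1
        k = 0                                                    # step (iii): smallest k with bound <= eps
        while k < len(chunks) and error_bound(k, chunks) > eps:
            k += 1
        w = {}                                                   # steps (iv)-(v): truncated sum,
        for j in range(min(k + 1, len(chunks))):                 # applying A_{k-j} to the j-th chunk
            for lam, val in A_compressed(k - j)(chunks[j]).items():
                w[lam] = w.get(lam, 0.0) + val
        return w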

Properties 6.4.

Given a tolerance and a vector with finite support, the algorithm APPLY A produces a vector which satisfies

Moreover, if , with and , then the following properties hold:

(i)

The size of the output is bounded by

and the number of entries of needing to be computed is .

(ii)

The number of arithmetic operations needed to compute does not exceed with .

(iii)

The number of sorting operations needed to assemble the , , of does not exceed .

(iv)

The output vector satisfies

Proof.

The estimate Equation 6.16 follows from the preceding remarks centering upon Equation 6.15. Properties (i)-(iii) follow from the results of subsection 3.3 (see Corollary 3.10). Property (iv) is proved in the same way that we proved Proposition 3.8. Namely, for , we prove that as in Equation 3.25. This then proves Equation 6.18 because of Proposition 3.2.

6.5. The numerical computation of residuals

Recall that Algorithm II heavily utilizes knowledge of residuals. We suppose that is any given finite subset of , and we denote as usual by the Galerkin solution associated to the set . Since we cannot compute or its residual exactly, we shall introduce a numerical algorithm which begins with an approximation to and approximately computes the residual . For this computation, we introduce the following procedure, which involves two tolerance parameters reflecting the desired accuracy of the computation of and of , respectively.

NRESIDUAL:

(i):

APPLY A.

(ii):

NCOARSE.

(iii):

Set and .

Note that, due to the various approximations, the output is not necessarily supported in , in contrast to the exact residual .
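In the same illustrative conventions, NRESIDUAL combines the two previous routines; here apply_A and coarse_data are assumed callables, the second one returning a coarse approximation of the data at the prescribed tolerance. The interfaces are assumptions for this sketch.

    def nresidual(u_bar, eps1, eps2, apply_A, coarse_data):
        """Approximate the residual f - A u_bar: an approximate matrix-vector
        product with tolerance eps1 minus a coarsened data vector with tolerance eps2."""
        w = apply_A(u_bar, eps1)              # approximation of A applied to u_bar
        f_bar = coarse_data(eps2)             # coarse approximation of the data
        keys = set(w) | set(f_bar)
        return {lam: f_bar.get(lam, 0.0) - w.get(lam, 0.0) for lam in keys}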

Properties 6.5.

The output of NRESIDUAL satisfies

Furthermore, if , with and (which in particular implies , see Proposition 3.8), then the following hold:

(i)

The support size of the output is bounded by

(ii)

The number of arithmetic operations used in NRESIDUAL does not exceed with

(iii)

The number of sorts needed to compute does not exceed .

(iv)

The output satisfies

Proof.

The estimate Equation 6.19 follows from

and Equation 2.22. All other properties are direct consequences of Properties 6.3 and 6.4 of NCOARSE and APPLY A.

6.6. A sparse Galerkin solver

This subsection will be concerned with the computation of a numerical approximation of for any given set . We shall discuss this issue in the context of gradient methods. A similar discussion applies to conjugate gradient methods. Given a set , we thus wish to solve

Suppose that we are provided with a current known approximation to with supported on , and that we want to produce an approximation , supported on , such that for some prescribed tolerance .

The gradient method (or damped Richardson iteration) takes as the next approximation

where is to be chosen. Then is also supported on , and, using Equation 6.22, we have

where

with the identity matrix.

To turn this into a numerical algorithm, we need to provide: (i) a value for , and (ii) an approximation for . We shall take

where is our bound for given in (2.22). With this choice, it follows that

with , the estimated condition number.
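For illustration, one damped Richardson step in these conventions might look as follows. The damping parameter is written here as the reciprocal of an upper spectral bound for the preconditioned matrix, which is one standard choice; this is only an assumption about the constant left unspecified in the text.

    def richardson_step(u, internal_residual, spectral_upper_bound):
        """One damped Richardson (gradient) step u <- u + alpha * residual on the
        current index set; the residual is assumed already restricted to that set."""
        alpha = 1.0 / spectral_upper_bound        # illustrative damping choice
        keys = set(u) | set(internal_residual)
        return {lam: u.get(lam, 0.0) + alpha * internal_residual.get(lam, 0.0)
                for lam in keys}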

We next discuss the computation of , which we call the “internal residual”. In contrast to the residual of the full equation, the internal residual can be computed exactly at finite cost. However, this cost remains too large for the purpose of obtaining a computationally optimal algorithm, so that in practice we shall need to replace the internal residual by a numerical approximation . We next examine the properties we shall want for the numerical approximation in order that the modified iterations still converge. Suppose for a moment that our initial approximation satisfies

for some . We shall show in a moment how to compute an such that

Given such an , we define

Since, by Equation 6.23, Equation 6.26 and Equation 6.29, , we conclude that

with

The vector is our numerical computation of one step of the gradient algorithm with a given initial approximation and error estimate . Notice that Equation 6.31 gives an error estimate which allows us to iterate this algorithm. For example, at the next iteration, we would replace by , and by .

We next discuss how we shall compute an approximation to the internal residual which will satisfy Equation 6.29. For this, we shall use a variant of the routine NRESIDUAL from subsection 6.5, in which we shall confine all vectors to be supported in . We shall denote this new subroutine by INRESIDUAL. It is obtained by replacing by in the NCOARSE step and by in the APPLY A step.

INRESIDUAL

(i):

APPLY ;

(ii):

NCOARSE.

(iii):

Set .

Here APPLY means that is replaced by in the fast matrix vector multiplication. From Properties 6.2 and 6.4 we know that the output of satisfies

Thus the choice

suffices to ensure the validity of Equation 6.29.

Obviously the number of iterations needed to guarantee a target accuracy of the approximate Galerkin solution depends on the error bound of the initial approximation of . In fact, the number of iterations necessary to reach this accuracy is bounded by

While the above analysis gives an upper bound for the number of iterations we shall need to achieve our target accuracy, it will also be important for our analysis to note that this target accuracy may be reached in fewer iterations if the currently computed approximation to the internal residual is small enough. The following remark (which follows from Equation 6.33) makes this statement more precise.

Remark 6.6.

If we choose , where is the target accuracy, and if is the corresponding output of INRESIDUAL, then we have

so that

Note that conversely, since we also have by Equation 6.33

we are ensured that

i.e., the criterion will be met when the exact internal residual is small enough.

Proof.

To prove Equation 6.36, we write

Since Equation 2.22 and Equation 2.25 imply , Equation 6.36 follows. Clearly, Equation 6.36 implies Equation 6.37. The rest of the claim follows from Equation 2.22.

After these considerations, we are now in a position to give our numerical algorithm for computing Galerkin approximations. Given a set , an initial approximation to , an estimate and a target accuracy , with , the approximate Galerkin solver is defined by the following steps (a schematic sketch follows the listing):

GALERKIN:

(i):

Apply INRESIDUAL. If

define the output to be and STOP, else go to (ii).

(ii):

Set

Since , we know that . Replace by , by and go to (i).
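A schematic Python version of GALERKIN is given below. The routine inresidual and the precise stopping and error-update formulas of steps (i)-(ii) involve constants hidden in the text, so the expressions used here (a residual-based error bound divided by a lower spectral bound, and a geometric update of the error estimate) are illustrative assumptions only, not the paper's exact rules.

    from math import sqrt

    def ngalerkin(Lambda, u0, delta0, eps, inresidual, lam_lower, lam_upper, max_iter=100):
        """Approximate Galerkin solver: damped Richardson iterations driven by the
        approximate internal residual, stopped once the error bound drops below eps."""
        alpha = 1.0 / lam_upper                       # illustrative damping parameter
        rho = 1.0 - lam_lower / lam_upper             # illustrative contraction factor
        u, delta = dict(u0), delta0
        for _ in range(max_iter):
            tol = min(delta, eps) / 2.0               # tolerance for the residual computation (assumption)
            r = inresidual(Lambda, u, tol)            # approximate internal residual
            bound = sqrt(sum(val * val for val in r.values())) / lam_lower + tol
            if bound <= eps:                          # step (i): accuracy reached, stop
                return u
            for lam in Lambda:                        # step (ii): one damped Richardson update
                u[lam] = u.get(lam, 0.0) + alpha * r.get(lam, 0.0)
            delta = rho * delta + tol                 # crude new error estimate (assumption)
        return u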

The relevant properties of GALERKIN can be summarized as follows.

Properties 6.7.

Given as input a set , an initial approximation to the exact Galerkin solution which is supported on , an initial error estimate for and a target accuracy , the routine GALERKIN produces an approximation to which is supported on and satisfies

Moreover, if is the number of modified gradient iterations which have been used in GALERKIN to produce , one also has

with defined by Equation 6.32. Consequently, the number of iterations is always bounded by

Moreover, if , with and , then the following are true:

(i)

The output of GALERKIN satisfies

where the constant depends only on the number of iterations .

(ii)

The number of arithmetic operations used in GALERKIN is less than

where the constant depends only on the number of iterations . The number of sorting operations does not exceed .

Proof.

The first part of the assertion has already been established in the course of the preceding discussion. In particular, the bound on the maximal number of iterations clearly follows from Equation 6.35.

As for property (i), we simply remark that (iv) in Properties 6.5 of NRESIDUAL also applies in the case of the modified procedure INRESIDUAL, so that after one modified gradient iteration we have

Assertion (i) therefore follows by iterating this argument: denoting by the current approximation after iterations, we obtain

To estimate the number of arithmetic operations in this algorithm, we can use the bound on the number of operations for NRESIDUAL ((ii) in Properties 6.5), which also applies to INRESIDUAL. According to this property, at the -th iteration, the application of INRESIDUAL to requires at most arithmetic operations. We add each of these estimates for operation count over and use the estimate on to obtain the estimate in (ii).

Finally, at each iteration, the number of sorting operations is clearly bounded by , which implies the bound in for the global procedure.

The possible growth of the constants in Equation 6.42 shows the importance of controlling the number of iterations . The estimate Equation 6.41 expresses that this is feasible if the initial accuracy bound is within a fixed factor of the desired target accuracy in each application of GALERKIN. In the setting of Algorithm III below this will indeed be the case.

7. Numerical realization: The adaptive algorithm

We now have collected all the ingredients that are needed to construct an optimal adaptive algorithm, in terms of both memory size and computational cost. The purpose of this section is to describe this algorithm and to prove its optimality.

7.1. General principles of the algorithm

Recall from subsection 6.1 that we start with an estimate . Introducing the sequence of tolerances

we see that and are an admissible initialization in the sense that .

Algorithm III conceptually parallels the idealized version Algorithm II. Its core ingredient is a routine called NPROG that associates to a triplet such that is supported in and , a new pair such that is supported in and .

Iterating this procedure thus builds a sequence with supported in such that

If is the target accuracy, the algorithm thus stops after steps, where is the smallest integer such that .

As in Algorithm II, the routine NPROG itself will consist of possibly several applications of a procedure NGROW described below, which parallels GROW in Algorithm II, followed by NCOARSE for exactly the same reasons that came up in §5.

In contrast to Algorithm II, the selection of the next larger index set done by NGROW will have to be based on an approximate residual obtained by NRESIDUAL rather than on the exact one. We shall also use the approximate Galerkin solver defined by NGALERKIN to derive the intermediate approximations of the solution after each growing step. Thus, the error reduction in this growing procedure requires a more refined analysis, involving the various tolerances in these procedures. We shall first address this analysis, which will result in several constraints on the tolerance parameters.

7.2. The growing procedure

At the start of the growing procedure that will define NPROG, we are given a set , an approximate solution supported on , and a known estimate .

We set and . The growing procedure will iteratively build larger sets , , and approximate solutions , and will be stopped at some such that we are ensured that

so that applying NCOARSE will output the new set and approximate solution such that . The choice in Equation 7.3 is justified by Properties 6.3 of the thresholding procedure: it ensures the optimality of the approximate solution and the control of its norm (see (i) and (ii) in Properties 6.3).

As in Algorithm II, the growing procedure will ensure a geometric reduction of the error in the energy norm , where is the exact Galerkin solution. Although it will not ensure such a reduction for , we shall still reach Equation 7.3 after a controlled number of steps.

The procedure NGROW generating the sets can be described as follows: given a set and an approximation supported on , we compute an approximate residual and select the new set as small as possible such that

for some fixed in . This can be done by taking , where

This procedure can thus be summarized as follows.

NGROW Given an initial approximation to the Galerkin solution supported on , the procedure NGROW consists of the following steps:

(i):

Apply NRESIDUAL.

(ii):

Apply NCOARSE and define .

It is interesting to note that we allow the situation where , in which case we simply have . This was not possible with GROW in Algorithm II, since could then be the full infinite set .
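The set selection in NGROW amounts to a “bulk chasing” step on the approximate residual. In the same illustrative dictionary conventions, and with alpha the fixed parameter of Equation 7.4, a sketch could read as follows; the greedy selection written here is an assumption consistent with taking the smallest admissible set.

    def ngrow_select(Lambda, r_approx, alpha):
        """Add to Lambda the largest entries of the approximate residual until the
        selected indices capture at least the fraction alpha of its l2 norm."""
        target = alpha ** 2 * sum(val * val for val in r_approx.values())
        captured = sum(val * val for lam, val in r_approx.items() if lam in Lambda)
        new_Lambda = set(Lambda)
        outside = sorted(((lam, val) for lam, val in r_approx.items() if lam not in Lambda),
                         key=lambda kv: -abs(kv[1]))
        for lam, val in outside:                      # greedily add the largest outside entries
            if captured >= target:
                break
            new_Lambda.add(lam)
            captured += val * val
        return new_Lambda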

Properties 7.1.

The residual computed by NGROW satisfies the estimate

If , with and , then the following are true:

(i)

The cardinality of the output of NGROW can be bounded by

where .

(ii)

The number of arithmetic operations used in NGROW is less than

(iii)

The number of sorting operations does not exceed .

Proof.

The first part of the assertion follows from Equation 6.19. The claims (i), (ii) and (iii) follow from (i), (ii) and (iii) in Properties 6.5 of NRESIDUAL.

In our growing procedure, the tolerance parameters and will be related to the initial accuracy by and , where and are fixed parameters that we shall specify below through our analysis. Similarly, we shall always set the tolerance parameter in the applications of NGALERKIN in such a way that the approximate solutions will always satisfy

where is another parameter to be specified later and is the exact Galerkin solution.

Note that Equation 7.8 is not ensured for , so the very first step of our growing procedure should be to replace by the output of

The growing procedure will then proceed as follows: for , we shall define as the first output of NGROW. We then define as the output of NGALERKIN, with the constant still to be specified. It follows that Equation 7.8 will automatically be satisfied by Equation 6.39.

Regarding the parameter , we need to choose its value so that at each iteration we have

because we are using as the input for the next application of NGALERKIN. Now, for each , we have

where we have used the monotonicity of the error as the sets are growing. Hence, we see that we can take . With this choice of and with any fixed choice of , Equation 6.41 and Properties 6.7 show that the number of iterations within each application of NGALERKIN is uniformly bounded independently of and .

Note also that, in terms of the parameters , from Equation 7.5 and Equation 7.8 we deduce that

where is the second output of NGROW.

In order to analyze the error reduction in our growing procedure, we shall need to relate the property Equation 7.4 that defines NGROW with the property Equation 4.8 which is known to ensure a fixed reduction of the error . Using our error estimate Equation 7.10, we obtain

Of course, we wish to ensure that our above choice of the expanded set , which was based on the approximate residual , also captures a sufficient bulk of the true residual . This can indeed be inferred from the above estimate, provided that the perturbation on the right hand side is small compared with the first summand. If this is not the case, the choice of the parameters should ensure that the residual itself and hence the error is already small enough. The following observation describes this in more detail.

Remark 7.2.

Given any , suppose that the parameters are chosen small enough that

Then, for constructed from as explained above, one has either

or

where

Proof.

Let . We distinguish two cases. If , then by Equation 2.22 we have . Combining this with the estimate Equation 7.8, we obtain

which, in view of Equation 7.12, proves Equation 7.13. Alternatively, when , we infer from Equation 7.11 that

which is the desired prerequisite for error reduction in the energy norm. In fact, we can invoke Lemma 4.1 to conclude that Equation 7.14 holds for defined in Equation 7.15.

It remains to adjust the various parameters . To this end, one should keep in mind that the growing procedure aims to achieve the accuracy in Equation 7.3 after a finite number of steps .

In view of Remark 7.2, a first natural choice seems to be , since the occurrence of case one in Remark 7.2 would then imply Equation 7.3. However, with such a choice, it could still happen that at the -th stage of the growing procedure, case one comes up but is not discovered by any error control. In this case, we will need to make sure that subsequent steps still satisfy Equation 7.3. For , we have

where we have again made standard use of Equation 2.21, the best approximation property of Galerkin solutions, and Equation 7.8. Thus our first requirement is

We next have to make sure that if case one never occurs a uniformly bounded finite number of steps suffices to reach Equation 7.3. In fact, we infer from Equation 7.14 that

Thus our second requirement in order to achieve Equation 7.3 is that, for sufficiently large ,

but this is always implied by our first requirement Equation 7.17 for sufficiently large but fixed.

Finally, we wish to install intermediate error controls to avoid unnecessarily many steps in the above growing procedure. To this end, we write

and deduce from Equation 7.8 that at any intermediate stage

Therefore, using Equation 7.10, we find that

Thus, imposing the requirement

we can stop the iteration if the following test of the current approximate residual is answered affirmatively:

Choice of parameters: In summary, possible choices for these parameters are limited by Equation 7.12, Equation 7.17 and Equation 7.21. A simple possibility is to take

Then choose such that Equation 7.12, Equation 7.17 and Equation 7.21 hold. We then define .

Thus one finally sees from Equation 7.18 that the maximal number of steps needed to achieve Equation 7.3 is bounded by

7.3. Description of the algorithm

We are now in a position to describe the main step NPROG in our algorithm. We fix a value of with , and we choose and fix parameters as in the Choice of parameters of the previous subsection. The analysis of the previous subsection shows that a uniformly bounded finite number of applications of NGROW suffices to reduce the initial error by the desired amount. NPROG can thus be summarized as follows (a schematic sketch is given after the listing).

NPROG Given a set and an approximation to the exact solution of Equation 4.1 whose support is contained in and such that , the procedure NPROG consists of the following steps:

(i):

Apply GALERKIN. Set , , .

(ii):

Apply NGROW.

(iii):

If or defined in Equation 7.24, go to (iv); otherwise apply GALERKIN. Replace by , by , by , and go to (ii).

(iv):

Apply NCOARSE, set , and STOP.
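A schematic Python rendering of NPROG is given below, with every constituent routine passed in as an assumed callable, K_star the bound of Equation 7.24 on the number of growing steps, and the stopping tests of step (iii) abstracted into stop_test; all interfaces are assumptions made for this sketch.

    def nprog(Lambda, u_bar, eps, K_star, ngalerkin, ngrow, stop_test, ncoarse):
        """One accuracy-reduction step: grow and re-solve at most K_star times, then coarsen."""
        u = ngalerkin(Lambda, u_bar, eps)                    # step (i): initial approximate Galerkin solve
        for _ in range(K_star):
            new_Lambda, r = ngrow(Lambda, u)                 # step (ii): grow the index set
            if new_Lambda == Lambda or stop_test(r, eps):    # step (iii): stopping tests
                break
            Lambda = new_Lambda
            u = ngalerkin(Lambda, u, eps)                    # re-solve on the larger set
        return ncoarse(u, eps)                               # step (iv): coarsening (thresholding)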

The relevant properties of NPROG can be summarized as follows.

Properties 7.3.

The output of NPROG satisfies

Moreover, if , with and , then the following are true:

(i)

One has the bound

and the cardinality of is bounded by

(ii)

The cardinality of all intermediate sets produced by NGROW can be bounded by

(iii)

The number of arithmetic operations used in NPROG is less than

The number of sorting operations does not exceed .

Proof.

By our choice of parameters , Remark 7.2 and the subsequent discussion show that after at most steps, given by Equation 7.24, the reduction Equation 7.3 is achieved. The estimate Equation 7.25 is then an immediate consequence of Equation 6.6 and Equation 6.7 in Properties 6.3. Moreover, when , with , then (i) is a direct consequence of (ii) and (iii) in Properties 6.3. By a repeated application of Equation 6.42 in Properties 6.7 we conclude that

Here we have used the fact that only a uniformly bounded number of applications of NGROW and GALERKIN are used in NPROG. Combining Equation 7.30 with Equation 7.6 in Properties 7.1 yields the estimate Equation 7.28 in (ii).

Note that the same is true for the possibly somewhat larger sets generated in NGROW, since we accept the case . The remaining assertion (iii) is also obtained by combining Equation 7.30 with Equation 7.7 in Properties 7.1.

We are now prepared to describe

Algorithm III

(i):

Initialization: Let be the target accuracy. Set , and , where is defined at the beginning of this section. Select the parameters according to the above Choice of parameters, and fix these parameters.

(ii):

If , accept , as the final solution and STOP. Otherwise, apply NPROG.

(iii):

If , accept , as the solution, where , are the last outputs of NGROW in NPROG before thresholding. Otherwise, replace by , by and by , and go to (ii).

Remark 7.4.

We see that the finest accuracy needed on the data is in the last application of NPROG, so we can start with an estimate with in Equation 6.2.

Remark 7.5.

The proper choice of is meant to ensure the convergence of Algorithm III, as well as the control of the operation count in each application of NPROG. This, in turn, allows us to prove the optimality of this algorithm, as shown below. Roughly speaking, convergence is ensured if these parameters are sufficiently small, but choosing them too small typically increases the constants that enter the optimality analysis (e.g., the number of iterations needed in GALERKIN or APPLY A), so that a proper tuning should really be effective in practice. In particular, it might be that our requirements in the Choice of parameters are too pessimistic and that the algorithm still works with larger tolerances.

7.4. The main result

The convergence properties of Algorithm III can be summarized as follows.

Theorem 7.6.

Assume that , with , and that is an isomorphism on , and suppose that the assumptions (N1)–(N3) are satisfied. Let be the solution of Equation 2.17, that is, . Then for any and any , Algorithm III produces an approximation with satisfying

Moreover, Algorithm III is optimal in the following sense. If , for some , then and the computation of requires at most arithmetic operations and at most sorting operations, where the constants are independent of and .

Proof.

Let , . Let be the smallest integer such that . The algorithm shuts down when at an iteration either (i) or (ii) . In the first case, Equation 7.31 is satisfied because of Equation 7.20. When case (i) is not met for any , then Equation 7.25 in Properties 7.3 shows that Algorithm III produces a sequence such that , where . Hence the desired target accuracy is reached when . In either case, the algorithm will need at most steps to reach Equation 7.31.

As for the complexity analysis, if , for some , we conclude from Equation 7.26 that for all , and from Equation 7.27 that with .

Thus, on account of (ii) and (iii) in Properties 7.3, the number of arithmetic operations and the number of sorting operations at the th stage of the algorithm can be bounded respectively by and . The assertion now follows by summing these estimates over .

We conclude by briefly summarizing the consequences of the above theorem with regard to the original operator equation Equation 2.5.

Corollary 7.7.

Assume that is an isomorphism, and let denote the exact solution of for some . Suppose that and the wavelet bases satisfy assumptions (A1)–(A3), so that, in particular, the preconditioned wavelet representation of belongs to . Let . Then for any and every , Algorithm III produces a sequence such that

Moreover, if for some and the solution belongs to the Besov space , then the number is bounded by . At most arithmetic operations and sorting operations are needed for the computation of .

Acknowledgment

We are indebted to Dietrich Braess for valuable suggestions concerning the presentation of the material.

References

Reference [1]
A. Averbuch, G. Beylkin, R. Coifman, and M. Israeli, Multiscale inversion of elliptic operators, in: Signal and Image Representation in Combined Spaces, J. Zeevi, R. Coifman (eds.), Academic Press, 1998, pp. 341–359. MR 99a:65162
Reference [2]
I. Babuška and A. Miller, A feedback finite element method with a-posteriori error estimation: Part I. The finite element method and some basic properties of the a-posteriori error estimator, Comput. Methods Appl. Mech. Engrg. 61 (1987), 1–40. MR 88d:73036
Reference [3]
I. Babuška and W.C. Rheinboldt, Error estimates for adaptive finite element computations, SIAM J. Numer. Anal. 15 (1978), 736–754. MR 58:3400
Reference [4]
R.E. Bank, A.H. Sherman and A. Weiser, Refinement algorithms and data structures for regular local mesh refinement, in: R. Stepleman et al. (eds.), Scientific Computing, Amsterdam: IMACS, North-Holland, 1983, 3–17. MR 85i:00014
Reference [5]
R.E. Bank and A. Weiser, Some a posteriori error estimates for elliptic partial differential equations, Math. Comp., 44 (1985), 283–301. MR 86g:65207
Reference [6]
A. Barinka, T. Barsch, P. Charton, A. Cohen, S. Dahlke, W. Dahmen, K. Urban, Adaptive wavelet schemes for elliptic problems—implementation and numerical experiments, IGPM Report # 173, RWTH Aachen, June 1999.
[7]
S. Bertoluzza, A-posteriori error estimates for wavelet Galerkin methods, Appl Math. Lett., 8 (1995), 1–6. MR 96e:65053
[8]
S. Bertoluzza, An adaptive collocation method based on interpolating wavelets, in: Multiscale Wavelet Methods for PDEs, W. Dahmen, A. J. Kurdila, P. Oswald (eds.), Academic Press, San Diego, 1997, 109–135. MR 98d:65149
Reference [9]
S. Bertoluzza, Wavelet stabilization of the Lagrange multiplier method, to appear in Numer. Math.
Reference [10]
S. Bertoluzza, C. Canuto, K. Urban, On the adaptive computation of integrals of wavelets, Preprint No. 1129, Istituto di Analisi Numerica del C.N.R. Pavia, 1999, to appear in Appl. Numer. Anal.
Reference [11]
G. Beylkin, R. R. Coifman, and V. Rokhlin, Fast wavelet transforms and numerical algorithms I, Comm. Pure and Appl. Math., 44 (1991), 141–183. MR 92c:65061
Reference [12]
G. Beylkin and J. M. Keiser, An adaptive pseudo-wavelet approach for solving nonlinear partial differential equations, in: Multiscale Wavelet Methods for PDEs, W. Dahmen, A. J. Kurdila, P. Oswald (eds.), Academic Press, San Diego, 1997, 137–197. MR 98i:65124
Reference [13]
F. Bornemann, B. Erdmann, and R. Kornhuber, A posteriori error estimates for elliptic problems in two and three space dimensions, SIAM J. Numer. Anal., 33 (1996), 1188–1204. MR 98a:65161
Reference [14]
D. Braess, Finite Elements: Theory, Fast Solvers, and Applications in Solid Mechanics, Cambridge University Press, 1997. MR 98f:65002
Reference [15]
S. Brenner and L.R. Scott, The mathematical theory of finite element methods, Springer Verlag, New York, 1994. MR 95f:65001
Reference [16]
C. Canuto and I. Cravero, Wavelet-based adaptive methods for advection-diffusion problems, Preprint, University of Torino, 1996.
[17]
A. Cohen, Wavelet methods in Numerical Analysis, to appear in the Handbook of Numerical Analysis, vol. VII, 1998.
Reference [18]
A. Cohen and R. Masson, Wavelet adaptive methods for elliptic equations - Preconditioning and adaptivity, Preprint, LAN, Université Pierre et Marie Curie, Paris, 1997, to appear in SIAM J. Sci. Comp.
Reference [19]
S. Dahlke, W. Dahmen, and R. DeVore, Nonlinear approximation and adaptive techniques for solving elliptic equations, in: Multiscale Techniques for PDEs, W. Dahmen, A. Kurdila, and P. Oswald (eds), Academic Press, 1997, San Diego, 237–284. MR 99a:65001
Reference [20]
S. Dahlke, W. Dahmen, R. Hochmuth, and R. Schneider, Stable multiscale bases and local error estimation for elliptic problems, Applied Numerical Mathematics, 23 (1997), 21–47. MR 98a:65075
Reference [21]
S. Dahlke and R. DeVore, Besov regularity for elliptic boundary value problems, Communications in PDEs, 22(1997), 1–16. MR 97k:35047
Reference [22]
S. Dahlke, R. Hochmuth, K. Urban, Adaptive wavelet methods for saddle point problems, IGPM Report # 170, RWTH Aachen, Feb. 1999.
Reference [23]
W. Dahmen, Wavelet and multiscale methods for operator equations, Acta Numerica 6, Cambridge University Press, 1997, 55–228. MR 98m:65102
Reference [24]
W. Dahmen, A. Kunoth, R. Schneider, Wavelet least squares methods for boundary value problems, IGPM Report # 175, RWTH Aachen, Sep. 1999.
Reference [25]
W. Dahmen, S. Müller, and T. Schlinkmann, Multigrid and multiscale decompositions, in: Large-Scale Scientific Computations of Engineering and Environmental Problems, M. Griebel, O.P. Iliev, S.D. Margenov, and P.S. Vassilevski, eds., Notes on Numerical Fluid Mechanics, Vol. 62, Vieweg, Braunschweig/Wiesbaden, 1998, 18–41. MR 2000b:65228
Reference [26]
W. Dahmen, S. Prößdorf, and R. Schneider, Multiscale methods for pseudo-differential equations on smooth manifolds, in: Proceedings of the International Conference on Wavelets: Theory, Algorithms, and Applications, C.K. Chui, L. Montefusco, and L. Puccio (eds.), Academic Press, 1994, 385-424. MR 96c:65208
Reference [27]
W. Dahmen and R. Schneider, Wavelets on manifolds I, Construction and domain decomposition, IGPM-Report # 149, RWTH Aachen, Jan 1998.
Reference [28]
W. Dahmen and R. Schneider, Wavelets on manifolds II, Application to boundary integral equations, in preparation.
Reference [29]
W. Dahmen, R. Stevenson, Element-by-element construction of wavelets—stability and moment conditions, IGPM-Report # 145, RWTH Aachen, Dec. 1997.
Reference [30]
I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, 61, SIAM, Philadelphia, 1992. MR 93e:42045
Reference [31]
R. DeVore, Nonlinear approximation, Acta Numerica 7, Cambridge University Press, 1998, 51–150. CMP 99:16
[32]
R. DeVore, B. Jawerth and V. Popov, Compression of wavelet decompositions, Amer. J. Math., 114 (1992), 737–785. MR 94a:42045
[33]
R. DeVore, V. Popov, Interpolation spaces and nonlinear approximation, in: Function Spaces and Approximation, M. Cwikel et al, eds., Lecture Notes in Mathematics, vol. 1302, 1988, Springer, 191–205. MR 89d:41035
Reference [34]
R. DeVore and V. Temlyakov, Some remarks on greedy algorithms, Advances in Computational Math., 5 (1996), 173–187. MR 97g:41029
Reference [35]
W. Dörfler, A convergent adaptive algorithm for Poisson’s equation, SIAM J. Numer. Anal., 33 (1996), 1106–1124. MR 97e:65139
Reference [36]
K. Eriksson, D. Estep, P. Hansbo, and C. Johnson, Introduction to adaptive methods for differential equations, Acta Numerica 4, Cambridge University Press, (1995), 105–158. MR 96k:65057
Reference [37]
M. Frazier, B. Jawerth, and G. Weiss, Littlewood-Paley theory and the study of function spaces, CBMS Conference Lecture Notes 79, (AMS, Providence, RI), 1991. MR 92m:42021
Reference [38]
Y. Meyer, Ondelettes et Operateurs, Vol 1 and 2, Hermann, Paris, 1990. MR 93i:42002; MR 93i:42003
[39]
E. Novak, On the power of adaptation, J. Complexity, 12(1996), 199–237.
Reference [40]
T. von Petersdorff, and C. Schwab, Fully discrete multiscale Galerkin BEM, in: Multiscale Wavelet Methods for PDEs, W. Dahmen, A. Kurdila, and P. Oswald (eds.), Academic Press, San Diego, 1997, 287–346. MR 99a:65158
Reference [41]
R. Schneider, Multiskalen- und Wavelet-Matrixkompression: Analysisbasierte Methoden zur effizienten Lösung großer vollbesetzter Gleichungssysteme, Habilitationsschrift, Technische Hochschule, Darmstadt, 1995.
Reference [42]
P. Tchamitchian, Wavelets, Functions, and Operators, in: Wavelets: Theory and Applications, G. Erlebacher, M.Y. Hussaini, and L. Jameson (eds.), ICASE/LaRC Series in Computational Science and Engineering, Oxford University Press, 1996, 83–181.

Article Information

MSC 2000
Primary: 41A25 (Rate of convergence, degree of approximation), 41A46 (Approximation by arbitrary nonlinear expressions; widths and entropy), 65F99 (None of the above, but in this section), 65N12 (Stability and convergence of numerical methods), 65N55 (Multigrid methods; domain decomposition)
Keywords
  • Elliptic operator equations
  • quasi-sparse matrices and vectors
  • best N-term approximation
  • fast matrix vector multiplication
  • thresholding
  • adaptive space refinement
  • convergence rates
Author Information
Albert Cohen
Laboratoire d’Analyse Numérique, Université Pierre et Marie Curie, 4 Place Jussieu, 75252 Paris cedex 05, France
cohen@ann.jussieu.fr
Wolfgang Dahmen
Institut für Geometrie und Praktische Mathematik, RWTH Aachen, Templergraben 55, 52056 Aachen, Germany
dahmen@igpm.rwth-aachen.de
Ronald DeVore
Department of Mathematics, University of South Carolina, Columbia, SC 29208
devore@math.sc.edu
Additional Notes

This work has been supported in part by the Deutsche Forschungsgemeinschaft grant Da 117/8–2, the Office of Naval Research Contract N0014-91-J1343, the Army Research Office Contract DAAG 55-98-1-D002, and the TMR network “Wavelets in Numerical Simulation”.

Journal Information
Mathematics of Computation, Volume 70, Issue 233, ISSN 1088-6842, published by the American Mathematical Society, Providence, Rhode Island.
Publication History
This article appears in Mathematics of Computation, Volume 70, Issue 233 (January 2001), pages 27–75.
Copyright Information
Copyright 2000 American Mathematical Society
Article References
  • DOI 10.1090/S0025-5718-00-01252-7
  • MathSciNet Review: 1803124
  • rawAMSref: \bib{1803124}{article}{ author={Cohen, Albert}, author={Dahmen, Wolfgang}, author={DeVore, Ronald}, title={Adaptive wavelet methods for elliptic operator equations: Convergence rates}, journal={Math. Comp.}, volume={70}, number={233}, date={2001-01}, pages={27-75}, issn={0025-5718}, review={1803124}, doi={10.1090/S0025-5718-00-01252-7}, }
