Tractability of sampling recovery on unweighted function classes

It is well-known that the problem of sampling recovery in the $L_2$-norm on unweighted Korobov spaces (Sobolev spaces with mixed smoothness) as well as classical smoothness classes such as H\"older classes suffers from the curse of dimensionality. We show that the problem is tractable for those classes if they are intersected with the Wiener algebra of functions with summable Fourier coefficients. In fact, this is a relatively simple implication of powerful results from the theory of compressed sensing. Tractability is achieved by the use of non-linear algorithms, while linear algorithms cannot do the job.

We consider the problem of recovering a high-dimensional function $f \colon [0,1]^d \to \mathbb{C}$ from a class $F_d$ using algorithms of the form
$$(1) \qquad A_n \colon F_d \to L_2, \qquad A_n(f) = \varphi\big(f(x_1), \dots, f(x_n)\big),$$
with sampling points $x_j \in [0,1]^d$ and a recovery map $\varphi \colon \mathbb{C}^n \to L_2$. The error is measured in the $L_2$-norm and in a worst-case setting, i.e.,
$$\operatorname{err}(A_n, F_d, L_2) := \sup_{f \in F_d} \|f - A_n(f)\|_{L_2}.$$
It is known that this approximation problem suffers from the curse of dimensionality for most classical function classes $F_d$, including the smoothness classes of $k$-times continuously differentiable functions [16,17] or Sobolev spaces of mixed smoothness [15]. That is, there exist $C, \gamma, \varepsilon_0 > 0$ such that the (information) complexity
$$(2) \qquad n(\varepsilon, F_d, L_2) := \min\big\{n \in \mathbb{N}_0 \ \big|\ \exists A_n \colon \operatorname{err}(A_n, F_d, L_2) \le \varepsilon \big\}$$
satisfies $n(\varepsilon, F_d, L_2) \ge C(1+\gamma)^d$ for all $\varepsilon \le \varepsilon_0$ and $d \in \mathbb{N}$. A classical approach to make this problem tractable in high dimensions is to consider weighted function classes $F_d$, assuming a decreasing importance of the input variables. This approach goes back to [27] and has gained significant popularity; see also Remark 14 about weighted Korobov spaces. Unfortunately, it is much harder to provide reasonable unweighted function classes where the approximation problem is tractable. By unweighted, we mean that all variables are of equal importance. Formally, one may call a function class $F_d$ unweighted if $f \circ \pi \in F_d$ for any $f \in F_d$ and any permutation matrix $\pi$.
It was recently discovered by Goda [13] that numerical integration is polynomially tractable on the unweighted class $F_d^{\log}$ defined in (3). Namely, the number of sampling points that are needed to approximate the integral of any function from $F_d^{\log}$ up to an error $\varepsilon$ is bounded above by a polynomial in $\varepsilon^{-1}$ and $d$. Goda refines a method of Dick, who already showed in [6] that integration is tractable on the class of $\alpha$-H\"older continuous functions with absolutely convergent Fourier series. The class $F_d^{\log}$ avoids the H\"older condition by strengthening the condition of absolute convergence in a very slight manner.
Those results raise the question whether the problem of $L_2$-approximation on $F_d^{\log}$, whose complexity can only be higher than the complexity of the integration problem, is polynomially tractable as well. We give a positive answer.
Theorem 1. There is a constant $c > 0$ such that for all $d \in \mathbb{N}$ and $\varepsilon \in (0, 1/2)$, we have
$$n(\varepsilon, F_d^{\log}, L_2) \ \le\ c\, d\, \varepsilon^{-3} \log^3(\varepsilon^{-1}).$$
More generally, the problem of sampling recovery on $F_d^{\log}$ in $L_p$ is tractable for all $1 \le p < \infty$, see Corollary 13. In fact, we will show that $L_p$-approximation is polynomially tractable for many classical unweighted classes of functions, like Sobolev spaces of mixed smoothness or H\"older continuous functions, if it is additionally assumed that the Fourier series converges absolutely. See Theorem 11 for a general result and Corollaries 15 and 17 for the corresponding examples. Those findings are consequences of powerful results from the theory of compressive sensing, as will be discussed in Section 1.
Interestingly, the curse of dimensionality is only mitigated due to the use of non-linear sampling algorithms. If one only allows linear algorithms, using linear recovery maps $\varphi$ in (1), the $L_2$-approximation problem on $F_d^{\log}$ suffers from the curse of dimensionality, see Lemma 18. This fact also implies that sampling recovery on $F_d^{\log}$ in the uniform norm suffers from the curse of dimensionality, see Corollary 19.

Remark 2. For numerical integration on the class $F_d^{\log}$, Goda [13] obtained an upper bound of order $d^3 \varepsilon^{-3}$. Since the upper bound from Theorem 1 also holds for numerical integration, we improve upon this bound if the dimension $d$ is larger than $\log^{3/2}(\varepsilon^{-1})$. While the present paper was under review, Goda's upper bound on the complexity of integration has been further improved to $d\varepsilon^{-3}$, see [3].

Remark 3. The function class $F_d^{\log}$ defined in (3) is from the first version of the paper [13] that appeared on arXiv in October 2022. In a later version, Goda replaced the term $\|k\|_\infty$ by $\min_{i \in \operatorname{supp}(k)} |k_i|$ and thus obtained tractability on an even larger class. I believe that this replacement is only possible for the integration problem and that it would lead to intractability in the case of the approximation problem.
Remark 4. Other unweighted classes of functions where the approximation problem is tractable are high-dimensional functions that show some "low-dimensional structure", like tensor products of univariate functions [1,21,24], sums of low-dimensional functions [30], or compositions of univariate functions with linear functionals [23]. Moreover, it was proven in [31] that the approximation problem is weakly tractable for the class of analytic functions defined on the cube with directional derivatives of all orders bounded by 1. For the integration problem, we mention the star-discrepancy as a further example [14].
Remark 5. Recently, the papers [5,19] also derived new bounds on the complexity of sampling recovery in $L_2$ using non-linear algorithms. It is shown in [19, Thm. 3.2] that the $n$-th minimal error is essentially bounded by the $s$-term widths of the class $F_d$ in $L_\infty$, where it suffices to take $n$ quasi-linear in both $s$ and $d$. The paper [19] provides several examples of classical smoothness classes $F_d$ where non-linear sampling algorithms have a better rate of convergence than any linear algorithm. Here, we use instead a bound by $s^{-1/2}$ times the $s$-term widths of the class $F_d$ in the Wiener algebra, see Lemma 9 and Remark 12. The extra factor $s^{-1/2}$ eases the tractability analysis significantly. It would be interesting to know whether [19, Thm. 3.2] also leads to tractability results for the class $F_d^{\log}$ or related ones.

Remark 6. It is remarkable that the complexity bounds in this paper are not obtained with a sophisticated choice of interpolation nodes but rather with i.i.d. uniformly distributed sampling points, see Lemma 9. Recently, there has been much interest in the surprising power of i.i.d. random information in comparison to optimal information. I refer to the survey paper [28].

A general complexity bound
Let $\mu$ be a probability measure on a set $D$ and let $B = \{b_k \mid k \in I\}$ be a set of functions that is orthonormal in $L_2(D, \mu)$ and uniformly bounded, i.e.,
$$C_B := \sup_{k \in I} \|b_k\|_\infty < \infty.$$
We say that $B$ is a bounded orthonormal system. Whenever $(c_k)_{k \in I}$ is an absolutely summable sequence of complex numbers, the series $f = \sum_{k \in I} c_k b_k$ converges uniformly on $D$ and we have $\hat f(k) := \langle f, b_k \rangle = c_k$ for all $k \in I$. We consider the Wiener algebra
$$(5) \qquad A = A(B) := \Big\{ f = \sum_{k \in I} \hat f(k)\, b_k \ \Big|\ \|f\|_A := \sum_{k \in I} |\hat f(k)| < \infty \Big\}$$
and denote its unit ball by $A_0(B)$. For a finite index set $\Lambda \subset I$ and $f \in L_2$, we write $P_\Lambda f := \sum_{k \in \Lambda} \hat f(k)\, b_k$ and denote by $T(\Lambda)$ the span of the functions $b_k$ with $k \in \Lambda$. Given a class $F \subset A$, we define the projection error
$$E_\Lambda^\infty(F) := \sup_{f \in F} \|f - P_\Lambda f\|_\infty.$$
We consider the following algorithm.
Algorithm 7. Given a class $F \subset A$, a finite index set $\Lambda \subset I$, and a finite and non-empty point set $X \subset D$, we consider the algorithm $A_{F,\Lambda,X}$ which maps a function $f \in F$ to any solution of
$$(6) \qquad \min_{g} \ \|g\|_A \quad \text{subject to} \quad g \in T(\Lambda) \ \text{ and } \ |g(x) - f(x)| \le E_\Lambda^\infty(F) \ \text{ for all } x \in X.$$
The algorithm is well-defined since the function $g = P_\Lambda(f)$ satisfies the constraints of the minimization problem. Moreover, the output of the algorithm merely depends on $f$ via the data $(f(x))_{x \in X}$ and thus $A_{F,\Lambda,X}$ is a sampling algorithm of the form (1).

Remark 8. Algorithm 7 is a direct translation of a classical algorithm from compressed sensing, called basis pursuit denoising. Namely, if we denote by $y = (f(x))_{x \in X}$ the vector of samples and identify $g \in T(\Lambda)$ with its coefficient vector $c \in \mathbb{C}^\Lambda$, then (6) becomes an $\ell_1$-minimization problem (7) with noise level $E_\Lambda^\infty(F)$. The insight that $\ell_1$-minimization can be of great advantage over linear algorithms goes back at least to [2,8,11] and the problem of $\ell_2$-recovery on $\ell_1$-balls in $\mathbb{R}^m$. Different methods of computing a solution to (7) can be found in [10, Ch. 15].
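To make the mechanism of Remark 8 concrete, here is a minimal, self-contained sketch of basis pursuit as a linear program. It is not the paper's exact setup: it is real-valued and noiseless, the sizes $N$, $s$, $m$ and the cosine system are illustrative choices, and the split $c = p - q$ with $p, q \ge 0$ is the standard LP reformulation of $\ell_1$-minimization.

```python
# Sketch (illustration only): recover a sparse trigonometric polynomial
# from i.i.d. random samples via l1-minimization (basis pursuit),
# a real-valued, noiseless analogue of Algorithm 7 / problem (7).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

N, s, m = 32, 2, 25          # ambient dimension, sparsity, number of samples
x = rng.uniform(0, 1, m)     # i.i.d. uniform sampling points on [0, 1]

# bounded orthonormal system: b_0 = 1, b_k(x) = sqrt(2) cos(2 pi k x)
G = np.ones((m, N))
for k in range(1, N):
    G[:, k] = np.sqrt(2) * np.cos(2 * np.pi * k * x)

c_true = np.zeros(N)         # an s-sparse coefficient vector
c_true[[3, 17]] = [1.0, -0.5]
y = G @ c_true               # noiseless samples y = G c

# basis pursuit  min ||c||_1  s.t.  G c = y,  as an LP via c = p - q
obj = np.ones(2 * N)                     # sum(p) + sum(q) = ||c||_1
A_eq = np.hstack([G, -G])                # G (p - q) = y
res = linprog(obj, A_eq=A_eq, b_eq=y, method="highs")  # p, q >= 0 by default
c_hat = res.x[:N] - res.x[N:]

print(np.linalg.norm(G @ c_hat - y))     # constraint residual (tiny)
print(np.abs(c_hat).sum(), np.abs(c_true).sum())
```

By LP feasibility and optimality, the solution interpolates the data and has $\ell_1$-norm at most that of the true coefficient vector; with this many random samples, exact recovery of the sparse vector is the typical outcome.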
We now give an error bound for Algorithm 7. The error is expressed in terms of the best $s$-term widths of $F$ in $A$, defined by
$$\sigma_s(F, A) := \sup_{f \in F} \ \inf_{\substack{\Lambda' \subset I \\ \#\Lambda' \le s}} \|f - P_{\Lambda'} f\|_A.$$
We derive our error bound as a simple implication of [26, Thm. 6.1], where we reinterpret our data on $f$ as noisy data on $P_\Lambda f$. The key ingredient to error bounds such as [26, Thm. 6.1] is the restricted isometry property of the matrix $G = (b_k(x))_{k \in \Lambda, x \in X}$: for $m$ random points, the matrix $m^{-1/2} G$ acts as a quasi-isometry on the set of vectors with at most $s$ non-zero entries, where it suffices that $m$ is quasi-linear in $s$ and logarithmic in $\#\Lambda$, see [10, Thm. 12.32] or [26, Thm. 5.2]. This property implies that vectors $c$ from the unit ball in $\ell_1(\Lambda)$ can be recovered well from data of the form $y = Gc + e$ if the noise $e$ is sufficiently small, see [10, Thm. 6.12]. In our case, we want to recover the vector $c = (\hat f(k))_{k \in \Lambda}$. The data is given by $y = (f(x))_{x \in X}$, while $Gc = (P_\Lambda f(x))_{x \in X}$, and so the entries of $e$ are bounded by $E_\Lambda^\infty(F)$.
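The quasi-isometry on sparse vectors can be observed numerically. The following sketch is an illustration, not a proof: the real cosine system, the sizes $m = 400$, $N = 16$, $s = 2$, and the number of test vectors are arbitrary assumed parameters.

```python
# Illustration: for i.i.d. uniform samples, the normalized sampling
# matrix m^{-1/2} G with G = (b_k(x_j)) acts almost isometrically on
# s-sparse coefficient vectors once m is moderately large.
import numpy as np

rng = np.random.default_rng(1)
N, m, s = 16, 400, 2
x = rng.uniform(0, 1, m)

# real trigonometric system: b_0 = 1, b_k(x) = sqrt(2) cos(2 pi k x)
G = np.ones((m, N))
for k in range(1, N):
    G[:, k] = np.sqrt(2) * np.cos(2 * np.pi * k * x)

ratios = []
for _ in range(5):                 # a few random s-sparse vectors
    c = np.zeros(N)
    support = rng.choice(N, size=s, replace=False)
    c[support] = rng.standard_normal(s)
    ratios.append(np.linalg.norm(G @ c) / (np.sqrt(m) * np.linalg.norm(c)))

print([round(r, 3) for r in ratios])   # all close to 1
```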
Lemma 9 (Compare [26, Thm. 6.1]). There is a universal constant $c \ge 1$ such that the following holds. Let $B = \{b_k \mid k \in I\}$ be a bounded orthonormal system with respect to a probability measure $\mu$ and let $F \subset A(B)$. For $\gamma \in (0,1)$, $s \ge 2C_B^2$, and $\Lambda \subset I$, let $X_m$ be a set of
$$m \ \ge\ c\, \log(\gamma^{-1})\, s \log^3(s) \log(\#\Lambda)$$
independent random points with distribution $\mu$. Then, with probability at least $1 - \gamma$, we have for all $2 \le p \le \infty$ and $f \in F$ that the error $\|f - A_{F,\Lambda,X_m}(f)\|_{L_p}$ is bounded by a constant multiple of $s^{-1/p}\, \sigma_s(F, A)$ plus a term controlled by the projection error $E_\Lambda^\infty(F)$.

Proof. We use [26, Thm. 6.1] with $b_j$ instead of $\varphi_j$ and $\omega_j = C_B$. We choose $c_0$ as the maximum of all constants appearing in [26, Thm. 6.1] and $c = c_0 + 1$. Then $X_m = \{t_1, \dots, t_m\}$ satisfies the conclusion of [26, Thm. 6.1] with probability $1 - \gamma$. Given $g \in F$, we apply this conclusion to the function $f = P_\Lambda g$ and the samples $(g(t_i))_{i \le m}$, which we interpret as noisy samples of $f$ with pointwise noise at most $E_\Lambda^\infty(F)$; the approximant obtained in [26, Thm. 6.1] equals the function $A_{F,\Lambda,X_m}(g)$. The claim then follows by the triangle inequality and Jensen's inequality.
This leads to the following complexity bound. Recall the definition of the (information) complexity of sampling recovery in $L_2$ from (2), which is defined analogously for sampling recovery in $L_p$. We also define the minimal size of an index set needed for a projection error $\varepsilon > 0$, i.e.,
$$(8) \qquad N(F, \varepsilon) := \min\big\{ \#\Lambda \ \big|\ \Lambda \subset I \ \text{finite},\ E_\Lambda^\infty(F) \le \varepsilon \big\}.$$

Proposition 10. There is a universal constant $C \ge 2$ such that, for any bounded orthonormal system $B$, any $F \subset A_0(B)$, all $1 \le p < \infty$ and $\varepsilon \in (0,1)$, we have
$$n(\varepsilon, F, L_p) \ \le\ C\, \tilde\varepsilon^{\,-r} \log^3(\tilde\varepsilon^{\,-1}) \log\big(N(F, \tilde\varepsilon)\big)$$
with $\tilde\varepsilon = \varepsilon/(CC_B)$ and $r = \max\{p, 2\}$.
Proof. The case $p < 2$ follows from the case $p = 2$, so let $p \ge 2$. Put $C = 2c$ with $c$ from Lemma 9. We fix $\gamma = e^{-1}$, choose $s$ of order $\tilde\varepsilon^{\,-r}$, and choose a finite set $\Lambda$ with $\#\Lambda = N(F, \tilde\varepsilon)$ and $E_\Lambda^\infty(F) \le \tilde\varepsilon$. Lemma 9 yields the statement.
Proposition 10 leads to our main result on the tractability of $L_p$-approximation. Here, given classes $F_d \subset A_d$ for each $d \in \mathbb{N}$, we say that sampling recovery on $F_d$ in $L_p$ is polynomially tractable if
$$\exists\, C, q, r \ge 0 \colon\ \forall d \in \mathbb{N}\ \forall \varepsilon > 0 \colon\quad n(\varepsilon, F_d, L_p) \le C d^{\,q} \varepsilon^{-r}.$$
Theorem 11. For every $d \in \mathbb{N}$, let $B_d$ be a bounded orthonormal system and $F_d \subset A_0(B_d)$. Assume that there are positive constants $c_1, c_2, \alpha, \beta$, and $\gamma$ such that
$$N(F_d, \varepsilon) \ \le\ c_1 \exp\big( c_2\, d^{\,\alpha}\, \varepsilon^{-\beta} \log^{\gamma}(\varepsilon^{-1}) \big)$$
for all $d \in \mathbb{N}$ and $\varepsilon \in (0,1)$. Then sampling recovery on $F_d$ in $L_p$ is polynomially tractable for all $1 \le p < \infty$. That is, sampling recovery in $L_p$ on classes with absolutely convergent (generalized) Fourier series is tractable even if the number of Fourier coefficients needed for an $\varepsilon$-approximation of functions $f \in F_d$ in the uniform norm grows super-exponentially in $d$ and $\varepsilon^{-1}$, as long as the growth is not double-exponential.
Remark 12 (Rate of convergence). For our tractability analysis, it was enough to observe that $\sigma_n(F, A) \le 1$ for all $F \subset A_0$. In fact, it would not help for the classes considered in this paper to take the decay of the widths $\sigma_n(F, A)$ into account, as $n$ would have to be exponentially large in $d$ if we want widths significantly smaller than one. This is different if one is interested in studying the rate of convergence of the $n$-th minimal error
$$\operatorname{err}(n, F_d, L_p) := \inf_{A_n} \operatorname{err}(A_n, F_d, L_p).$$
Often (e.g., for Sobolev classes of mixed smoothness) one can choose the size of $\Lambda$ polynomial in $n$ in order to obtain from Lemma 9 that
$$\operatorname{err}(Cn \log^4 n, F, L_p) \ \lesssim\ n^{-1/p}\, \sigma_n(F, A).$$
In comparison, the paper [19] recently revealed estimates of the form
$$\operatorname{err}(Cn \log^4 n, F, L_2) \ \lesssim\ \sigma_n(F, B)_{L_\infty}$$
by means of the best $n$-term widths in $L_\infty$ instead of $A$, where the power of the logarithm can be reduced to 3 for the trigonometric system.

Results for specific classes
We now consider the Fourier system $B_d = \{ e^{2\pi i \langle k, \cdot \rangle} \mid k \in \mathbb{Z}^d \}$ on $[0,1]^d$, which is orthonormal with respect to the Lebesgue measure and satisfies $C_{B_d} = 1$. We obtain the classical Wiener algebra $A_d := A(B_d)$ of periodic functions with absolutely convergent Fourier series. We consider three examples, starting with the class $F_d^{\log}$ from the introduction. The following corollary also proves Theorem 1.

Corollary 13. Let $2 \le p < \infty$. There is a constant $c > 0$ such that for all $d \ge 2$ and $\varepsilon \in (0, 1/2)$, we have
$$n(\varepsilon, F_d^{\log}, L_p) \ \le\ c\, d\, \varepsilon^{-(p+1)} \log^3(\varepsilon^{-1}).$$

Proof. For any $m \in \mathbb{N}$ and $f \in F_d^{\log}$, the defining summability condition of $F_d^{\log}$ yields a bound on $\|f - P_\Lambda f\|_\infty$ for $\Lambda = \{-m, \dots, m\}^d$, and thus a corresponding bound on $N(F_d^{\log}, \varepsilon)$. Now the bound is obtained from Proposition 10.
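As a quick numerical sanity check (illustration only, in the univariate case and with arbitrary small parameters), one can verify that the Fourier system is orthonormal and satisfies $C_{B_d} = 1$:

```python
# Check that b_k(x) = exp(2 pi i k x) on [0, 1) is a bounded orthonormal
# system w.r.t. the Lebesgue measure (here d = 1, a few frequencies only).
import numpy as np

K = range(-3, 4)                      # frequencies k = -3, ..., 3
n = 4096                              # quadrature nodes
x = (np.arange(n) + 0.5) / n          # midpoint rule on [0, 1)

B = np.array([np.exp(2j * np.pi * k * x) for k in K])   # rows b_k
gram = (B @ B.conj().T) / n           # approximates the inner products

print(np.max(np.abs(gram - np.eye(len(gram)))))  # close to 0: orthonormal
print(np.max(np.abs(B)))                          # 1.0: C_B = 1
```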
As a second example, we consider Sobolev spaces $H^{s,d}_{\mathrm{mix}}$ of mixed smoothness $s > 1/2$ (also called unweighted Korobov spaces). It is well known that $L_2$-approximation on $H^{s,d}_{\mathrm{mix}}$ suffers from the curse of dimensionality, see, e.g., [15]. The curse is relinquished in the presence of absolutely summable Fourier coefficients, namely, for the class $F^{s,d}_{\mathrm{mix}}$ obtained by intersecting the unit ball of $H^{s,d}_{\mathrm{mix}}$ with the Wiener algebra unit ball $A_0(B_d)$.

Remark 14 (Weighted Korobov spaces). A classical approach to relinquish the curse of dimensionality on $H^{s,d}_{\mathrm{mix}}$ is to introduce (product) weights $1 \ge \gamma_1 \ge \gamma_2 \ge \dots > 0$ and to consider the class $H^{s,d,\gamma}_{\mathrm{mix}}$, see, e.g., [9,22,25] and [7, Ch. 13] and the references therein. The sequence $\gamma$ models a decreasing importance of the variables. It is known that $L_2$-approximation on $H^{s,d,\gamma}_{\mathrm{mix}}$ is polynomially tractable if the weights decay fast enough, namely, if a suitable sum of powers of the weights is bounded by a constant $K > 0$, see [25, Thm. 1]. The present approach is more general: for any $\gamma$ as above, a simple computation shows the inclusion
$$(9) \qquad H^{s,d,\gamma}_{\mathrm{mix}} \ \subset\ C d^{\,q}\, F^{s,d}_{\mathrm{mix}}$$
with constants $C, q > 0$ that only depend on $K$ and $s$.
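The "simple computation" behind an inclusion of the type (9) can be sketched via the Cauchy--Schwarz inequality. The following assumes the standard product-weight Korobov norm with weight $w_\gamma(k) = \prod_{j \colon k_j \ne 0} \gamma_j^{-1} |k_j|^{2s}$; the paper's exact normalization may differ.

```latex
\|f\|_{A_d}
  = \sum_{k \in \mathbb{Z}^d} |\hat f(k)|
  \le \Big( \sum_{k \in \mathbb{Z}^d} w_\gamma(k)^{-1} \Big)^{1/2}
      \Big( \sum_{k \in \mathbb{Z}^d} w_\gamma(k)\, |\hat f(k)|^2 \Big)^{1/2}
  = \prod_{j=1}^{d} \big( 1 + 2\,\zeta(2s)\,\gamma_j \big)^{1/2}
    \, \|f\|_{H^{s,d,\gamma}_{\mathrm{mix}}},
```

where the product formula uses $\sum_{k \ne 0} |k|^{-2s} = 2\zeta(2s) < \infty$ for $s > 1/2$. For instance, if $\sum_j \gamma_j \le K$, the first factor is bounded by $e^{\zeta(2s) K}$, uniformly in $d$.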
Corollary 15. For any $s > 1/2$ and $1 \le p < \infty$, the problem of $L_p$-approximation is polynomially tractable on $F^{s,d}_{\mathrm{mix}}$. More precisely, for $p \ge 2$, there is a constant $C > 0$, depending only on $s$ and $p$, such that for all $d \ge 2$ and $\varepsilon \in (0, 1/2)$, we have a bound of the form
$$n(\varepsilon, F^{s,d}_{\mathrm{mix}}, L_p) \ \le\ C\, d^{\,2}\, \varepsilon^{-p} \log^4(\varepsilon^{-1}).$$

Proof. For $m \in \mathbb{N}$ and $f \in F^{s,d}_{\mathrm{mix}}$, we get from H\"older's inequality a bound on the tail $\sum_{k \notin \Lambda} |\hat f(k)|$ for $\Lambda = \{-m, \dots, m\}^d$, with a constant $c_s > 0$ depending only on $s$, and thus a bound on $N(F^{s,d}_{\mathrm{mix}}, \varepsilon)$. Now the bound is obtained from Proposition 10.
Remark 16. In view of (9), we note that
$$(10) \qquad n(\varepsilon, rF_d, L_p) \ \le\ n\big(\varepsilon/(2r), F_d, L_p\big)$$
holds for any convex and symmetric class $F_d$ and any $r \ge 1$, so that tractability on $F^{s,d}_{\mathrm{mix}}$ indeed implies tractability on the classes $H^{s,d,\gamma}_{\mathrm{mix}}$. Equation (10) follows from the optimality of homogeneous algorithms for linear problems, see [20, Thm. 1].
As a third example, we consider the class of H\"older continuous functions that has been studied by Dick [6] for numerical integration, namely, the class $F^\alpha_d$ of 1-periodic functions with absolutely convergent Fourier series satisfying the $\alpha$-H\"older condition with constant one, for some $0 < \alpha \le 1$. Here, the H\"older condition is imposed for all $x, y \in \mathbb{R}^d$, considering $f$ as a 1-periodic function on $\mathbb{R}^d$.
Corollary 17. For any $\alpha \in (0, 1]$ and $1 \le p < \infty$, the problem of $L_p$-approximation is polynomially tractable on $F^\alpha_d$. More precisely, for $p \ge 2$, there is a constant $c > 0$, depending only on $\alpha$, such that for all $d \ge 2$ and $\varepsilon \in (0, 1/2)$, we have a bound of the form
$$n(\varepsilon, F^\alpha_d, L_p) \ \le\ c\, d^{\,2}\, \varepsilon^{-p} \log^3(\varepsilon^{-1}).$$

Proof. Let $S_m = P_{[-m,m]}$ be the univariate Fourier sum operator and $C^\alpha_{\mathrm{per}}$ be the class of univariate $\alpha$-H\"older continuous functions with H\"older constant one. It is well known that
$$\|f - S_m f\|_\infty \ \le\ c_1\, m^{-\alpha} \log(c_2\, m)$$
for all $m \ge 2$ and all $f \in C^\alpha_{\mathrm{per}}$, where $c_1, c_2 > 0$ depend on nothing but $\alpha$, see [18]. Let now $f \in F^\alpha_d$. For fixed $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_d \in [0,1]$, the function $f$ as a function of $x_i$ is in $C^\alpha_{\mathrm{per}}$. Applying the univariate bound coordinate-wise, the triangle inequality yields a bound on $E_\Lambda^\infty(F^\alpha_d)$ for $\Lambda = \{-m, \dots, m\}^d$. Now the bound is obtained from Proposition 10.
It is interesting to note that the polynomial order $d^2 \varepsilon^{-2}$ that we obtained for $L_2$-approximation is the same as the one obtained in [6] for numerical integration. It is surely an interesting open problem to improve the complexity bounds in Corollaries 13, 15 and 17 and/or to provide matching lower bounds.

Intractability with linear algorithms
We only have tractability for the class $F_d^{\log}$ thanks to non-linear algorithms. If we restrict to linear algorithms, we still have the curse of dimensionality. This is implied by a result of [12], see also [29, Lem. 3.2]. Namely, it is known that for any $m, n \in \mathbb{N}$ with $m > n$ and any linear mapping $T \colon \mathbb{C}^m \to \mathbb{C}^m$ of rank at most $n$, there is some $x \in \mathbb{C}^m$ with $\|x\|_1 \le 1$ such that
$$(11) \qquad \|x - Tx\|_2 \ \ge\ \sqrt{\frac{m-n}{m}}.$$
Note that the curse of dimensionality even holds for the class of all linear algorithms, using arbitrary linear information, instead of only sampling-based algorithms.

Lemma 18. The problem of $L_2$-approximation on the classes $F_d^{\log}$ with linear algorithms suffers from the curse of dimensionality. For any linear mapping $A_n \colon F_d^{\log} \to L_2$ with rank $n \le 5^d/2$ we have $\operatorname{err}(A_n, F_d^{\log}, L_2) \ge 2^{-1/2}$.
Proof. Let $A_n \colon F_d^{\log} \to L_2$ be a linear operator of rank $n \le 5^d/2$ and let $\Lambda = \{-2, -1, 0, 1, 2\}^d$, so that $\#\Lambda = 5^d$. Consider the linear mapping $T \colon \mathbb{C}^\Lambda \to T(\Lambda)$ that maps a vector of coefficients to the corresponding trigonometric polynomial. Then $T_n = T^{-1} P_\Lambda A_n T \colon \mathbb{C}^\Lambda \to \mathbb{C}^\Lambda$ is a linear mapping with rank at most $n$, and (11) yields that there is some $x \in \mathbb{C}^\Lambda$ with $\|x\|_1 \le 1$ such that $\|x - T_n x\|_2 \ge 2^{-1/2}$. The function $f = Tx$ is contained in $F_d^{\log}$. Moreover, $f \in T(\Lambda)$ and thus
$$(12) \qquad \|f - A_n(f)\|_{L_2} \ \ge\ \|P_\Lambda(f - A_n(f))\|_{L_2} \ =\ \|x - T_n x\|_2 \ \ge\ 2^{-1/2},$$
as claimed.
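The mechanism behind lower bounds of the type (11) can be checked numerically via a Frobenius-norm argument: for any rank-$n$ map $T$ on $\mathbb{C}^m$, the best rank-$n$ approximation of the identity leaves $\|I - T\|_F^2 \ge m - n$, so some coordinate vector $e_j$ (of $\ell_1$-norm one) has error at least $\sqrt{(m-n)/m}$. The sizes $m, n$ and the random trials below are arbitrary illustrative choices.

```python
# For ANY linear map T of rank n on C^m, some unit coordinate vector e_j
# satisfies ||e_j - T e_j||_2 >= sqrt((m - n)/m), since
# ||I - T||_F^2 >= m - n and the maximum column error dominates the mean.
import numpy as np

rng = np.random.default_rng(2)
m, n = 25, 12
bound = np.sqrt((m - n) / m)

worst = []
for _ in range(20):                          # random rank-n maps T = U V
    T = rng.standard_normal((m, n)) @ rng.standard_normal((n, m))
    errs = np.linalg.norm(np.eye(m) - T, axis=0)   # ||e_j - T e_j||_2
    worst.append(errs.max())

print(bound, min(worst))    # every trial respects the lower bound

# The bound is sharp: projecting onto n Fourier vectors equalizes the
# error over all coordinates, attaining sqrt((m - n)/m) exactly.
F = np.fft.fft(np.eye(m), norm="ortho")
P = F[:, :n] @ F[:, :n].conj().T
errs = np.linalg.norm(np.eye(m) - P, axis=0)
print(errs.max())
```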
Clearly, the lower bound (12) also holds for the uniform norm instead of the $L_2$-norm. Since linear algorithms are optimal for the recovery problem in the uniform norm, see [4], Lemma 18 implies that the problem of uniform recovery on the class $F_d^{\log}$ is intractable. Namely, the following holds for all algorithms of the form
$$(13) \qquad A_n \colon F_d^{\log} \to L_\infty([0,1]^d), \qquad A_n(f) = \varphi\big(L_1(f), \dots, L_n(f)\big),$$
where $L_1, \dots, L_n \colon F_d^{\log} \to \mathbb{C}$ are linear functionals, possibly chosen adaptively, and $\varphi \colon \mathbb{C}^n \to L_\infty$. This includes the sampling algorithms of the form (1).

Corollary 19. The problem of (sampling) recovery in the uniform norm on the classes $F_d^{\log}$ suffers from the curse of dimensionality. For any mapping $A_n \colon F_d^{\log} \to L_\infty$ of the form (13) with $n \le 5^d/2$ we have $\operatorname{err}(A_n, F_d^{\log}, L_\infty) \ge 2^{-1/2}$.

In a similar way, one can show that polynomial tractability cannot be achieved with linear algorithms for the classes $F^{s,d}_{\mathrm{mix}}$ and $F^\alpha_d$ from Section 2. We omit the details.
