Approximation by sums of ridge functions with fixed directions

Ismailov, V.

doi:10.1090/spmj/1471


by V. E. Ismailov
Translated by: S. Kislyakov

St. Petersburg Math. J. 28 (2017), 741-772


Published electronically: October 2, 2017


Abstract:

The paper surveys some results on the approximation of functions of several variables by sums of ridge functions with fixed directions. It also proves some new theorems, both for uniform approximation and for approximation in $L_{2}$, which generalize some of the author's earlier results. The paper concludes with a study of the role of ridge functions in a problem of approximation by neural networks.
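The objects in question are sums $g_1(\mathbf{a}^1\cdot\mathbf{x})+\dots+g_r(\mathbf{a}^r\cdot\mathbf{x})$ in which the directions $\mathbf{a}^i$ are fixed in advance and only the univariate functions $g_i$ vary. As a rough illustration only (not a method taken from the paper), the following Python/NumPy sketch computes the best discrete $L_2$ approximation of this kind on a uniform grid in $[0,1]^2$ by ordinary least squares. It assumes integer direction vectors, so that the level sets of $\mathbf{a}\cdot\mathbf{x}$ on the grid can be indexed exactly; the helper names `ridge_basis` and `best_ridge_l2` are hypothetical.

```python
import numpy as np


def ridge_basis(n, directions):
    """Indicator columns for the level sets of a . x on the grid {i/n}^2.

    On the grid, every function g(a . x) is a linear combination of these
    indicators, so their span is a discrete stand-in for the ridge space
    of direction a. Assumes integer directions: a . x = (a1*i + a2*j)/n,
    so each level set is labelled by the integer key a1*i + a2*j.
    """
    i, j = np.meshgrid(np.arange(n + 1), np.arange(n + 1), indexing="ij")
    i, j = i.ravel(), j.ravel()
    cols = []
    for a1, a2 in directions:
        key = a1 * i + a2 * j
        for k in np.unique(key):
            cols.append((key == k).astype(float))
    return np.column_stack(cols)


def best_ridge_l2(f, n, directions):
    """RMS error of the best discrete L2 approximation of f by ridge sums."""
    t = np.arange(n + 1) / n
    x, y = np.meshgrid(t, t, indexing="ij")  # same (i, j) order as ridge_basis
    fv = f(x, y).ravel()
    A = ridge_basis(n, directions)
    # lstsq tolerates the rank deficiency (constants lie in every ridge space)
    coef, *_ = np.linalg.lstsq(A, fv, rcond=None)
    residual = fv - A @ coef
    return np.sqrt(np.mean(residual ** 2))


def xy(x, y):
    return x * y


n = 20
print(best_ridge_l2(xy, n, [(1, 0), (0, 1)]), (n + 2) / (12 * n))  # both about 0.0917
print(best_ridge_l2(xy, n, [(1, 0), (0, 1), (1, 1)]))  # essentially zero
```

With only the two coordinate directions, $xy$ is not of the form $\varphi(x)+\psi(y)$: the residual is $(x-\bar{x})(y-\bar{y})$, whose RMS on this grid equals the grid variance $(n+2)/(12n)$. Adding the direction $(1,1)$ makes $xy=\tfrac12\big((x+y)^2-x^2-y^2\big)$ exactly representable, which shows in a small case how the choice of fixed directions determines what can be approximated.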
