AMS eBook CollectionsOne of the world's most respected mathematical collections, available in digital format for your library or institution
Generalized Mercer Kernels and Reproducing Kernel Banach Spaces
About this Title
Yuesheng Xu, School of Data and Computer Science, Guangdong Province Key Laboratory of Computational Science, Sun Yat-Sen University, Guangzhou, Guangdong 510275 P. R. China and Qi Ye, School of Mathematical Sciences, South China Normal University, Guangzhou, Guangdong 510631 P. R. China
Publication: Memoirs of the American Mathematical Society
Publication Year:
2019; Volume 258, Number 1243
ISBNs: 978-1-4704-3550-9 (print); 978-1-4704-5077-9 (online)
DOI: https://doi.org/10.1090/memo/1243
Published electronically: February 21, 2019
Keywords: Reproducing Kernel Banach Spaces,
Generalized Mercer Kernels,
Positive Definite Kernels,
Machine Learning,
Support Vector Machines,
Sparse Learning Methods.
MSC: Primary 68Q32, 68T05; Secondary 46E22, 68P01
Table of Contents
Chapters
- 1. Introduction
- 2. Reproducing Kernel Banach Spaces
- 3. Generalized Mercer Kernels
- 4. Positive Definite Kernels
- 5. Support Vector Machines
- 6. Concluding Remarks
- Acknowledgments
Abstract
This article studies constructions of reproducing kernel Banach spaces (RKBSs) which may be viewed as a generalization of reproducing kernel Hilbert spaces (RKHSs). A key point is to endow Banach spaces with reproducing kernels such that machine learning in RKBSs can be well-posed and of easy implementation. First we verify many advanced properties of the general RKBSs such as density, continuity, separability, implicit representation, imbedding, compactness, representer theorem for learning methods, oracle inequality, and universal approximation. Then, we develop a new concept of generalized Mercer kernels to construct p-norm RKBSs for 1 less than or equal to p less than or equal to infinity. The p-norm RKBSs preserve the same simple format as the Mercer representation of RKHSs. Moreover, the p-norm RKBSs are isometrically equivalent to the standard p-norm spaces of countable sequences. Hence, the p-norm RKBSs possess more geometrical structures than RKHSs including sparsity. To be more precise, the suitable countable expansion terms of the generalized Mercer kernels can be used to represent the pairs of Schauder bases and biorthogonal systems of the p-norm RKBSs such that the generalized Mercer kernels become the reproducing kernels of the p-norm RKBSs. The generalized Mercer kernels also cover many well-known kernels, for example, min kernels, Gaussian kernels, and power series kernels. Finally, we propose to solve the support vector machines in the p-norm RKBSs, which are to minimize the regularized empirical risks over the p-norm RKBSs. We show that the infinite dimensional support vector machines in the p-norm RKBSs can be equivalently transferred to finite dimensional convex optimization problems such that we obtain the finite dimensional representations of the support vector machine solutions for practical applications. In particular, we verify that some special support vector machines in the 1-norm RKBSs are equivalent to the classical 1-norm sparse regressions. This gives fundamental supports of a novel learning tool called sparse learning methods to be investigated in our next research project.- Andreas Argyriou, Charles A. Micchelli, and Massimiliano Pontil, When is there a representer theorem? Vector versus matrix regularizers, J. Mach. Learn. Res. 10 (2009), 2507–2529. MR 2576327
- Andreas Argyriou, Charles A. Micchelli, and Massimiliano Pontil, On spectral learning, J. Mach. Learn. Res. 11 (2010), 935–953. MR 2600635, DOI 10.1093/protein/6.4.383
- N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc. 68 (1950), 337–404. MR 51437, DOI 10.1090/S0002-9947-1950-0051437-7
- Alfred M. Bruckstein, David L. Donoho, and Michael Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Rev. 51 (2009), no. 1, 34–81. MR 2481111, DOI 10.1137/060657704
- José M. Bioucas-Dias and Mário A. T. Figueiredo, A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration, IEEE Trans. Image Process. 16 (2007), no. 12, 2992–3004. MR 2472806, DOI 10.1109/TIP.2007.909319
- B. E. Boser, I. Guyon, and V. Vapnik, A training algorithm for optimal margin classifiers, The Fifth Annual ACM Workshop on Computational Learning Theory (Pittsburgh, PA) (D. Haussler, ed.), ACM Press, 1992, pp. 144–152.
- Ronald E. Bruck Jr., On the weak convergence of an ergodic iteration for the solution of variational inequalities for monotone operators in Hilbert space, J. Math. Anal. Appl. 61 (1977), no. 1, 159–164. MR 636416, DOI 10.1016/0022-247X(77)90152-4
- Amir Beck and Marc Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci. 2 (2009), no. 1, 183–202. MR 2486527, DOI 10.1137/080716542
- Alain Berlinet and Christine Thomas-Agnan, Reproducing kernel Hilbert spaces in probability and statistics, Kluwer Academic Publishers, Boston, MA, 2004. With a preface by Persi Diaconis. MR 2239907
- M. D. Buhmann, Radial basis functions: theory and implementations, Cambridge Monographs on Applied and Computational Mathematics, vol. 12, Cambridge University Press, Cambridge, 2003. MR 1997878
- Antonin Chambolle, Ronald A. DeVore, Nam-yong Lee, and Bradley J. Lucier, Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage, IEEE Trans. Image Process. 7 (1998), no. 3, 319–335. MR 1669536, DOI 10.1109/83.661182
- Zhongying Chen, Charles A. Micchelli, and Yuesheng Xu, Multiscale methods for Fredholm integral equations, Cambridge Monographs on Applied and Computational Mathematics, vol. 28, Cambridge University Press, Cambridge, 2015. MR 3381986
- Emmanuel J. Candès, Justin K. Romberg, and Terence Tao, Stable signal recovery from incomplete and inaccurate measurements, Comm. Pure Appl. Math. 59 (2006), no. 8, 1207–1223. MR 2230846, DOI 10.1002/cpa.20124
- Felipe Cucker and Steve Smale, On the mathematical foundations of learning, Bull. Amer. Math. Soc. (N.S.) 39 (2002), no. 1, 1–49. MR 1864085, DOI 10.1090/S0273-0979-01-00923-5
- C. Cortes and V. Vapnik, Support vector networks, Mach. Learn. 20 (1995), 273–297.
- Ingrid Daubechies, Michel Defrise, and Christine De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Appl. Math. 57 (2004), no. 11, 1413–1457. MR 2077704, DOI 10.1002/cpa.20042
- David L. Donoho and Michael Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via $l^1$ minimization, Proc. Natl. Acad. Sci. USA 100 (2003), no. 5, 2197–2202. MR 1963681, DOI 10.1073/pnas.0437847100
- Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl. 21 (2000), no. 4, 1253–1278. MR 1780272, DOI 10.1137/S0895479896305696
- Jean Duchon, Splines minimizing rotation-invariant semi-norms in Sobolev spaces, Constructive theory of functions of several variables (Proc. Conf., Math. Res. Inst., Oberwolfach, 1976) Springer, Berlin, 1977, pp. 85–100. Lecture Notes in Math., Vol. 571. MR 0493110
- Dean G. Duffy, Green’s functions with applications, Studies in Advanced Mathematics, Chapman & Hall/CRC, Boca Raton, FL, 2001. MR 1888091
- John F. Erickson and Gregory E. Fasshauer, Generalized native spaces, Approximation theory XII: San Antonio 2007, Mod. Methods Math., Nashboro Press, Brentwood, TN, 2008, pp. 133–142. MR 2537124
- Ivar Ekeland and Thomas Turnbull, Infinite-dimensional optimization and convexity, Chicago Lectures in Mathematics, University of Chicago Press, Chicago, IL, 1983. MR 769469
- Gregory E. Fasshauer, Meshfree approximation methods with MATLAB, Interdisciplinary Mathematical Sciences, vol. 6, World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, 2007. With 1 CD-ROM (Windows, Macintosh and UNIX). MR 2357267
- G. E. Fasshauer, F. J. Hickernell, and H. Woźniakowski, Average case approximation: convergence and tractability of Gaussian kernels, Monte Carlo and quasi-Monte Carlo methods 2010, Springer Proc. Math. Stat., vol. 23, Springer, Heidelberg, 2012, pp. 329–344. MR 3173842, DOI 10.1007/978-3-642-27440-4_{1}6
- Gregory E. Fasshauer, Fred J. Hickernell, and Henryk Woźniakowski, On dimension-independent rates of convergence for function approximation with Gaussian kernels, SIAM J. Numer. Anal. 50 (2012), no. 1, 247–271. MR 2888312, DOI 10.1137/10080138X
- Gregory E. Fasshauer, Fred J. Hickernell, and Qi Ye, Solving support vector machines in reproducing kernel Banach spaces with positive definite functions, Appl. Comput. Harmon. Anal. 38 (2015), no. 1, 115–139. MR 3273290, DOI 10.1016/j.acha.2014.03.007
- Bengt Fornberg, Elisabeth Larsson, and Natasha Flyer, Stable computations with Gaussian radial basis functions, SIAM J. Sci. Comput. 33 (2011), no. 2, 869–892. MR 2801193, DOI 10.1137/09076756X
- Gregory E. Fasshauer and Michael J. McCourt, Stable evaluation of Gaussian radial basis function interpolants, SIAM J. Sci. Comput. 34 (2012), no. 2, A737–A762. MR 2914302, DOI 10.1137/110824784
- Mário A. T. Figueiredo and Robert D. Nowak, An EM algorithm for wavelet-based image restoration, IEEE Trans. Image Process. 12 (2003), no. 8, 906–916. MR 2008658, DOI 10.1109/TIP.2003.814255
- Gregory E. Fasshauer and Qi Ye, Reproducing kernels of generalized Sobolev spaces via a Green function approach with distributional operators, Numer. Math. 119 (2011), no. 3, 585–611. MR 2845629, DOI 10.1007/s00211-011-0391-2
- Gregory E. Fasshauer and Qi Ye, Reproducing kernels of Sobolev spaces via a green kernel approach with differential operators and boundary operators, Adv. Comput. Math. 38 (2013), no. 4, 891–921. MR 3044643, DOI 10.1007/s10444-011-9264-6
- K. F. Gauss, Theory of Motion of the Heavenly Bodies Moving About the Sun in Conic Sections: A Translation of Theoria Motus, Hamburg: Friedrich Perthes and I. H. Besser, 1809.
- J. R. Giles, Classes of semi-inner-product spaces, Trans. Amer. Math. Soc. 129 (1967), 436–446. MR 217574, DOI 10.1090/S0002-9947-1967-0217574-1
- Antonio G. García and Alberto Portal, Sampling in reproducing kernel Banach spaces, Mediterr. J. Math. 10 (2013), no. 3, 1401–1417. MR 3080216, DOI 10.1007/s00009-012-0234-0
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The elements of statistical learning, 2nd ed., Springer Series in Statistics, Springer, New York, 2009. Data mining, inference, and prediction. MR 2722294
- Robert C. James, Orthogonality and linear functionals in normed linear spaces, Trans. Amer. Math. Soc. 61 (1947), 265–292. MR 21241, DOI 10.1090/S0002-9947-1947-0021241-4
- L. A. Karlovitz, Construction of nearest points in the $L^{p}$, $p$ even, and $L^{\infty }$ norms. I, J. Approximation Theory 3 (1970), 123–127. MR 265829, DOI 10.1016/0021-9045(70)90019-5
- Tamara G. Kolda and Brett W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (2009), no. 3, 455–500. MR 2535056, DOI 10.1137/07070111X
- Rainer Kress, Linear integral equations, Applied Mathematical Sciences, vol. 82, Springer-Verlag, Berlin, 1989. MR 1007594
- George S. Kimeldorf and Grace Wahba, A correspondence between Bayesian estimation on stochastic processes and smoothing by splines, Ann. Math. Statist. 41 (1970), 495–502. MR 254999, DOI 10.1214/aoms/1177697089
- Qia Li, Charles A. Micchelli, Lixin Shen, and Yuesheng Xu, A proximity algorithm accelerated by Gauss-Seidel iterations for L1/TV denoising models, Inverse Problems 28 (2012), no. 9, 095003, 20. MR 2959279, DOI 10.1088/0266-5611/28/9/095003
- Qia Li, Lixin Shen, Yuesheng Xu, and Na Zhang, Multi-step fixed-point proximity algorithms for solving a class of optimization problems arising from image processing, Adv. Comput. Math. 41 (2015), no. 2, 387–422. MR 3337498, DOI 10.1007/s10444-014-9363-2
- G. Lumer, Semi-inner-product spaces, Trans. Amer. Math. Soc. 100 (1961), 29–43. MR 133024, DOI 10.1090/S0002-9947-1961-0133024-2
- Robert E. Megginson, An introduction to Banach space theory, Graduate Texts in Mathematics, vol. 183, Springer-Verlag, New York, 1998. MR 1650235
- Herbert Meschkowski, Hilbertsche Räume mit Kernfunktion, Die Grundlehren der mathematischen Wissenschaften, Bd. 113, Springer-Verlag, Berlin-Göttingen-Heidelberg, 1962 (German). MR 0140912
- Charles A. Micchelli and Massimiliano Pontil, A function representation for learning in Banach spaces, Learning theory, Lecture Notes in Comput. Sci., vol. 3120, Springer, Berlin, 2004, pp. 255–269. MR 2177914, DOI 10.1007/978-3-540-27819-1_{1}8
- Charles A. Micchelli, Lixin Shen, and Yuesheng Xu, Proximity algorithms for image models: denoising, Inverse Problems 27 (2011), no. 4, 045009, 30. MR 2781033, DOI 10.1088/0266-5611/27/4/045009
- Charles A. Micchelli, Lixin Shen, Yuesheng Xu, and Xueying Zeng, Proximity algorithms for the L1/TV image denoising model, Adv. Comput. Math. 38 (2013), no. 2, 401–426. MR 3019155, DOI 10.1007/s10444-011-9243-y
- Charles A. Micchelli, Yuesheng Xu, and Haizhang Zhang, Universal kernels, J. Mach. Learn. Res. 7 (2006), 2651–2667. MR 2274454
- Gregory B. Passty, Ergodic convergence to a zero of the sum of monotone operators in Hilbert space, J. Math. Anal. Appl. 72 (1979), no. 2, 383–390. MR 559375, DOI 10.1016/0022-247X(79)90234-8
- N. Parikh and S. Boyd, Proximal algorithms, Found. Trends Optim. 1 (2014), no. 3, 127–239.
- B. D. Rao, K. Engan, S. F. Cotter, J. Palmer, and K. Kreutz-Delgado, Subset selection in noise based on diversity measure minimization, IEEE Trans. Signal Process. 51 (2003), 760–770.
- F. Riesz, Sur une espèce de géométrie analytique des systèmes de fonctions sommables, C. R. Acad. Sci. Paris 144 (1907), 1409–1411.
- F. Riesz, Sur les opérations fonctionnelles linéaires, C. R. Acad. Sci. Paris 149 (1909), 974–977.
- Bhaskar D. Rao and Kenneth Kreutz-Delgado, An affine scaling methodology for best basis selection, IEEE Trans. Signal Process. 47 (1999), no. 1, 187–200. MR 1729739, DOI 10.1109/78.738251
- Walter Rudin, Real and complex analysis, 3rd ed., McGraw-Hill Book Co., New York, 1987. MR 924157
- Ingo Steinwart and Andreas Christmann, Support vector machines, Information Science and Statistics, Springer, New York, 2008. MR 2450103
- B. K. Sriperumbudur, K. Fukumizu, and G. R. G. Lanckriet, Learning in Hilbert vs.Banach spaces: a measure embedding viewpoint., Advances in Neural Information Processing Systems (Cambridge) (J. Shawe-Taylor, R.S. Zemel, P.L. Bartlett, F. Pereira, and K.Q. Weinberger, eds.), MIT Press, 2011, pp. 1773–1781.
- Bernhard Schölkopf, Ralf Herbrich, and Alex J. Smola, A generalized representer theorem, Computational learning theory (Amsterdam, 2001) Lecture Notes in Comput. Sci., vol. 2111, Springer, Berlin, 2001, pp. 416–426. MR 2042050, DOI 10.1007/3-540-44581-1_{2}7
- B. Schölkopf and A. J. Smola, Learning with Kernels, MIT Press, Cambridge, Massachusetts, 2002.
- Ingo Steinwart, Sparseness of support vector machines, J. Mach. Learn. Res. 4 (2004), no. 6, 1071–1105. MR 2125346, DOI 10.1162/1532443041827925
- Robert Schaback and Holger Wendland, Kernel techniques: from machine learning to meshless methods, Acta Numer. 15 (2006), 543–639. MR 2269747, DOI 10.1017/S0962492906270016
- Guohui Song and Yuesheng Xu, Approximation of high-dimensional kernel matrices by multilevel circulant matrices, J. Complexity 26 (2010), no. 4, 375–405. MR 2669664, DOI 10.1016/j.jco.2010.02.003
- Steve Smale and Ding-Xuan Zhou, Shannon sampling and function reconstruction from point values, Bull. Amer. Math. Soc. (N.S.) 41 (2004), no. 3, 279–305. MR 2058288, DOI 10.1090/S0273-0979-04-01025-0
- Guohui Song and Haizhang Zhang, Reproducing kernel Banach spaces with the $\ell ^1$ norm II: Error analysis for regularized least square regression, Neural Comput. 23 (2011), no. 10, 2713–2729. MR 2866617, DOI 10.1162/NECO_{a}_{0}0178
- Guohui Song, Haizhang Zhang, and Fred J. Hickernell, Reproducing kernel Banach spaces with the $\ell ^1$ norm, Appl. Comput. Harmon. Anal. 34 (2013), no. 1, 96–116. MR 2981335, DOI 10.1016/j.acha.2012.03.009
- T. Villmann, S. Haase, and M. Kästner, Gradient based learning in vector quantization using differentiable kernels, Advances in Self-Organizing Maps (Santiago, Chile) (P.A. Estévez, J.C. Príncipe, and P. Zegers, eds.), Springer-Verlag, 2013, pp. 193–204.
- V. N. Vapnik and A. Lerner, Pattern recognition using generalized portrait method, Auto. Rmote Control 24 (1963), 774–780.
- Grace Wahba, Spline models for observational data, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990. MR 1045442
- Holger Wendland, Scattered data approximation, Cambridge Monographs on Applied and Computational Mathematics, vol. 17, Cambridge University Press, Cambridge, 2005. MR 2131724
- Q. Ye, Analyzing reproducing kernel approximation method via a Green function approach, Ph.D. thesis, Illinois Institute of Technology, Chicago, 2012.
- Qi Ye, Support vector machines in reproducing kernel Hilbert spaces versus Banach spaces, Approximation theory XIV: San Antonio 2013, Springer Proc. Math. Stat., vol. 83, Springer, Cham, 2014, pp. 377–395. MR 3218589, DOI 10.1007/978-3-319-06404-8_{2}3
- Ding-Xuan Zhou, Capacity of reproducing kernel spaces in learning theory, IEEE Trans. Inform. Theory 49 (2003), no. 7, 1743–1752. MR 1985575, DOI 10.1109/TIT.2003.813564
- Barbara Zwicknagl, Power series kernels, Constr. Approx. 29 (2009), no. 1, 61–84. MR 2465291, DOI 10.1007/s00365-008-9012-4
- Haizhang Zhang, Yuesheng Xu, and Jun Zhang, Reproducing kernel Banach spaces for machine learning, J. Mach. Learn. Res. 10 (2009), 2741–2775. MR 2579912, DOI 10.1109/IJCNN.2009.5179093
- Haizhang Zhang and Jun Zhang, Regularized learning in Banach spaces as an optimization problem: representer theorems, J. Global Optim. 54 (2012), no. 2, 235–250. MR 2979626, DOI 10.1007/s10898-010-9575-z