A note on the prediction error of principal component regression in high dimensions
Authors:
Laura Hucker and Martin Wahl
Journal:
Theor. Probability and Math. Statist. 109 (2023), 37–53
MSC (2020):
Primary 62H25
DOI:
https://doi.org/10.1090/tpms/1196
Published electronically:
October 3, 2023
MathSciNet review:
4652993
Abstract: We analyze the prediction error of principal component regression (PCR) and prove high-probability bounds for the corresponding squared risk conditional on the design. Our first main result shows that PCR performs comparably to the oracle method obtained by replacing the empirical principal components with their population counterparts, provided that an effective rank condition holds. If the latter condition is violated, however, the empirical eigenvalues start to exhibit a significant upward bias, resulting in a self-induced regularization of PCR. Our approach relies on the behavior of empirical eigenvalues, empirical eigenvectors, and the excess risk of principal component analysis in high-dimensional regimes.
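To make the comparison in the abstract concrete, the following is a minimal sketch of PCR against the oracle method in a toy spiked-covariance model. It is illustrative only: the model parameters (n, p, k, the spike sizes) and the diagonal covariance are assumptions chosen for the example, not the setting of the paper, and the reported quantity is the population-level squared prediction risk.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: regress y on the top-k
    empirical principal components of the n x p design X."""
    Sigma_hat = X.T @ X / X.shape[0]              # empirical covariance
    eigvals, eigvecs = np.linalg.eigh(Sigma_hat)  # eigenvalues ascending
    U_k = eigvecs[:, ::-1][:, :k]                 # top-k empirical eigenvectors
    scores = X @ U_k                              # projected design (n x k)
    theta, *_ = np.linalg.lstsq(scores, y, rcond=None)
    return U_k @ theta                            # coefficient vector in R^p

# Hypothetical spiked-covariance example: Sigma = diag(spec).
rng = np.random.default_rng(0)
n, p, k = 200, 500, 5
spec = np.concatenate([np.full(k, 10.0), np.full(p - k, 0.1)])
X = rng.standard_normal((n, p)) * np.sqrt(spec)
beta = np.zeros(p)
beta[:k] = 1.0
y = X @ beta + rng.standard_normal(n)

beta_pcr = pcr_fit(X, y, k)

# Oracle method: replace the empirical eigenvectors by the population
# ones, which here are the first k standard basis vectors.
theta_or, *_ = np.linalg.lstsq(X[:, :k], y, rcond=None)
beta_or = np.zeros(p)
beta_or[:k] = theta_or

# Squared prediction risk (b - beta)^T Sigma (b - beta),
# computed exactly since Sigma is diagonal.
for name, b in [("PCR", beta_pcr), ("oracle", beta_or)]:
    print(name, np.sum(spec * (b - beta) ** 2))
```

Under an effective rank condition of the kind described in the abstract, one expects the two printed risks to be of comparable size; shrinking n relative to p pushes the example toward the regime where the empirical eigenvalues are upward biased.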
References
- Peter L. Bartlett, Philip M. Long, Gábor Lugosi, and Alexander Tsigler, Benign overfitting in linear regression, Proc. Natl. Acad. Sci. USA 117 (2020), no. 48, 30063–30070. MR 4263288, DOI 10.1073/pnas.1907378117
- Peter L. Bartlett, Andrea Montanari, and Alexander Rakhlin, Deep learning: a statistical viewpoint, Acta Numer. 30 (2021), 87–201. MR 4295218, DOI 10.1017/S0962492921000027
- Florent Benaych-Georges and Raj Rao Nadakuditi, The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices, Adv. Math. 227 (2011), no. 1, 494–521. MR 2782201, DOI 10.1016/j.aim.2011.02.007
- Gilles Blanchard and Nicole Mücke, Optimal rates for regularization of statistical inverse learning problems, Found. Comput. Math. 18 (2018), no. 4, 971–1013. MR 3833647, DOI 10.1007/s10208-017-9359-7
- Alex Bloemendal, Antti Knowles, Horng-Tzer Yau, and Jun Yin, On the principal components of sample covariance matrices, Probab. Theory Related Fields 164 (2016), no. 1–2, 459–552. MR 3449395, DOI 10.1007/s00440-015-0616-x
- Stéphane Boucheron, Gábor Lugosi, and Pascal Massart, Concentration inequalities, Oxford University Press, Oxford, 2013. A nonasymptotic theory of independence; With a foreword by Michel Ledoux. MR 3185193, DOI 10.1093/acprof:oso/9780199535255.001.0001
- Élodie Brunel, André Mas, and Angelina Roche, Non-asymptotic adaptive prediction in functional linear models, J. Multivariate Anal. 143 (2016), 208–232. MR 3431429, DOI 10.1016/j.jmva.2015.09.008
- Hervé Cardot and Jan Johannes, Thresholding projection estimators in functional linear models, J. Multivariate Anal. 101 (2010), no. 2, 395–408. MR 2564349, DOI 10.1016/j.jmva.2009.03.001
- Alain Celisse and Martin Wahl, Analyzing the discrepancy principle for kernelized spectral filter learning algorithms, J. Mach. Learn. Res. 22 (2021), Paper No. 76, 59. MR 4253769
- László Györfi, Michael Kohler, Adam Krzyżak, and Harro Walk, A distribution-free theory of nonparametric regression, Springer Series in Statistics, Springer-Verlag, New York, 2002. MR 1920390, DOI 10.1007/b97848
- Peter Hall and Joel L. Horowitz, Methodology and convergence rates for functional linear regression, Ann. Statist. 35 (2007), no. 1, 70–91. MR 2332269, DOI 10.1214/009053606000000957
- Lajos Horváth and Piotr Kokoszka, Inference for functional data with applications, Springer Series in Statistics, Springer, New York, 2012. MR 2920735, DOI 10.1007/978-1-4614-3655-3
- Tailen Hsing and Randall Eubank, Theoretical foundations of functional data analysis, with an introduction to linear operators, Wiley Series in Probability and Statistics, John Wiley & Sons, Ltd., Chichester, 2015. MR 3379106, DOI 10.1002/9781118762547
- Moritz Jirak and Martin Wahl, Relative perturbation bounds with applications to empirical covariance operators, Adv. Math. 412 (2023), Paper No. 108808, 59. MR 4517351, DOI 10.1016/j.aim.2022.108808
- I. T. Jolliffe, Principal component analysis, 2nd ed., Springer Series in Statistics, Springer-Verlag, New York, 2002. MR 2036084
- Vladimir Koltchinskii and Karim Lounici, Concentration inequalities and moment bounds for sample covariance operators, Bernoulli 23 (2017), no. 1, 110–133. MR 3556768, DOI 10.3150/15-BEJ730
- Shuai Lu and Sergei V. Pereverzev, Regularization theory for ill-posed problems, Inverse and Ill-posed Problems Series, vol. 58, De Gruyter, Berlin, 2013. Selected topics. MR 3114700, DOI 10.1515/9783110286496
- Song Mei, Theodor Misiakiewicz, and Andrea Montanari, Generalization error of random feature and kernel methods: hypercontractivity and kernel matrix concentration, Appl. Comput. Harmon. Anal. 59 (2022), 3–84. MR 4412180, DOI 10.1016/j.acha.2021.12.003
- Boaz Nadler, Finite sample approximation results for principal component analysis: a matrix perturbation approach, Ann. Statist. 36 (2008), no. 6, 2791–2817. MR 2485013, DOI 10.1214/08-AOS618
- Debashis Paul, Asymptotics of sample eigenstructure for a large dimensional spiked covariance model, Statist. Sinica 17 (2007), no. 4, 1617–1642. MR 2399865
- Markus Reiss and Martin Wahl, Nonasymptotic upper bounds for the reconstruction error of PCA, Ann. Statist. 48 (2020), no. 2, 1098–1123. MR 4102689, DOI 10.1214/19-AOS1839
- Alexander Tsigler and Peter L. Bartlett, Benign overfitting in ridge regression, J. Mach. Learn. Res. 24 (2023), Paper No. [123], 76. MR 4583284
- Roman Vershynin, Introduction to the non-asymptotic analysis of random matrices, Compressed sensing, Cambridge Univ. Press, Cambridge, 2012, pp. 210–268. MR 2963170
- Roman Vershynin, High-dimensional probability, Cambridge Series in Statistical and Probabilistic Mathematics, vol. 47, Cambridge University Press, Cambridge, 2018. An introduction with applications in data science; With a foreword by Sara van de Geer. MR 3837109, DOI 10.1017/9781108231596
- Ernesto De Vito, Lorenzo Rosasco, Andrea Caponnetto, Umberto De Giovannini, and Francesca Odone, Learning from examples as an inverse problem, J. Mach. Learn. Res. 6 (2005), 883–904. MR 2249842
Additional Information
Laura Hucker
Affiliation:
Institut für Mathematik, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
Email:
huckerla@math.hu-berlin.de
Martin Wahl
Affiliation:
Fakultät für Mathematik, Universität Bielefeld, Postfach 100131, 33615 Bielefeld, Germany
MR Author ID:
1054755
Email:
martin.wahl@math.uni-bielefeld.de
Keywords:
Principal component regression,
prediction error,
principal component analysis,
excess risk,
eigenvalue upward bias,
benign overfitting
Received by editor(s):
June 28, 2022
Accepted for publication:
January 10, 2023
Published electronically:
October 3, 2023
Additional Notes:
The first author was supported by Deutsche Forschungsgemeinschaft (DFG) - FOR5381 - 460867398.
The research of the second author was partially funded by Deutsche Forschungsgemeinschaft (DFG) - SFB1294 - 318763901.
Article copyright:
© Copyright 2023
Taras Shevchenko National University of Kyiv