Abstract
We study the predictive performance of ℓ1-regularized linear regression in a model-free setting, including the case where the number of covariates is substantially larger than the sample size. We introduce a new analysis method that avoids the boundedness problems that typically arise in model-free empirical minimization. Our technique resolves a conjecture of Greenshtein and Ritov (Bernoulli 10(6):971–988, 2004) regarding the “persistence” rate for linear regression and allows us to prove an oracle inequality for the error of the regularized minimizer. It also shows that empirical risk minimization achieves the optimal rate (up to logarithmic factors) for convex aggregation of a set of estimators of a regression function.
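For a concrete picture of the estimator class studied here, the following minimal Python sketch implements the penalized (Lagrangian) form of ℓ1-regularized least squares, the Lasso, via cyclic coordinate descent. The function names, the toy data, and the penalty level lam are illustrative assumptions, not part of the paper, which analyzes the estimator's predictive (persistence) behavior rather than any particular optimization algorithm.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/n) * ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_norm_sq = (X ** 2).sum(axis=0) / n  # (1/n) * ||X_j||^2 for each column j
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with coordinate j removed from the fit.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            if col_norm_sq[j] > 0:
                beta[j] = soft_threshold(rho, lam / 2.0) / col_norm_sq[j]
    return beta

# Toy usage: a sparse linear model with p > n, the regime the paper considers.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_hat = lasso_coordinate_descent(X, y, lam=0.2)
print(np.flatnonzero(np.abs(beta_hat) > 1e-6))  # roughly recovers {0, 1, 2}
```

The equivalent constrained form, empirical risk minimization over an ℓ1 ball, is the formulation in which Greenshtein and Ritov's persistence question is posed; the penalized form above is its Lagrangian counterpart.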
References
Bartlett P.L.: Fast rates for estimation error and oracle inequalities for model selection. Econom. Theory 24(2), 545–552 (2008)
Bartlett P.L., Jordan M.I., McAuliffe J.D.: Convexity, classification, and risk bounds. J. Am. Stat. Assoc. 101(473), 138–156 (2006)
Bartlett P.L., Mendelson S.: Empirical minimization. Probab. Theory Relat. Fields 135(3), 311–334 (2006)
Bickel P.J., Ritov Y., Tsybakov A.B.: Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 37(4), 1705–1732 (2009)
Bunea F., Tsybakov A., Wegkamp M.: Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1, 169–194 (2007) (electronic)
Bunea, F., Tsybakov, A.B., Wegkamp, M.H.: Aggregation and sparsity via ℓ1 penalized least squares. In: Proceedings of the 19th Annual Conference on Learning Theory (COLT 2006). Lecture Notes in Artificial Intelligence, vol. 4005, pp. 379–391. Springer, Berlin (2006)
Candès E.J., Plan Y.: Near-ideal model selection by ℓ1 minimization. Ann. Stat. 37(5A), 2145–2177 (2009)
Carl B.: Inequalities of Bernstein–Jackson-type and the degree of compactness of operators in Banach spaces. Ann. Inst. Fourier (Grenoble) 35(3), 79–118 (1985)
Catoni, O.: Statistical learning theory and stochastic optimization. École d’Été de Probabilités de Saint-Flour 2001. Lecture Notes in Mathematics, vol. 1851. Springer, Berlin (2004)
de la Peña V.H., Giné E.: Decoupling: From Dependence to Independence, Probability and its Applications (New York). Springer, New York (1999)
Donoho D.L., Elad M., Temlyakov V.N.: Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52(1), 6–18 (2006)
Donoho D.L., Johnstone I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
Dudley R.M.: The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. In: Giné, E., Koltchinskii, V., Norvaisa, R. (eds) Selected Works of R.M. Dudley, Selected Works in Probability and Statistics, pp. 125–165. Springer, New York (2010)
Giné E., Zinn J.: Some limit theorems for empirical processes. Ann. Probab. 12(4), 929–998 (1984) (with discussion)
Gordon Y., Litvak A.E., Mendelson S., Pajor A.: Gaussian averages of interpolated bodies and applications to approximate reconstruction. J. Approx. Theory 149(1), 59–73 (2007)
Greenshtein E.: Best subset selection, persistence in high-dimensional statistical learning and optimization under ℓ1 constraint. Ann. Stat. 34(5), 2367–2386 (2006)
Greenshtein E., Ritov Y.: Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10(6), 971–988 (2004)
Guédon O., Mendelson S., Pajor A., Tomczak-Jaegermann N.: Subspaces and orthogonal decompositions generated by bounded orthogonal systems. Positivity 11(2), 269–283 (2007)
Guédon O., Mendelson S., Pajor A., Tomczak-Jaegermann N.: Majorizing measures and proportional subsets of bounded orthonormal systems. Rev. Mat. Iberoamericana 24(3), 1075–1095 (2008)
Hoeffding W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)
Koltchinskii V.: Sparsity in penalized empirical risk minimization. Ann. Inst. Henri Poincaré Probab. Stat. 45(1), 7–57 (2009)
Lecué, G., Mendelson, S.: General oracle inequalities and applications to high dimensional data analysis (preprint)
Leng C., Lin Y., Wahba G.: A note on the lasso and related procedures in model selection. Stat. Sinica 16(4), 1273–1284 (2006)
Lounici K.: Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2, 90–102 (2008)
Meinshausen N., Bühlmann P.: High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34(3), 1436–1462 (2006)
Meinshausen N., Yu B.: Lasso-type recovery of sparse representations for high-dimensional data. Ann. Stat. 37(1), 246–270 (2009)
Mendelson S.: Improving the sample complexity using global data. IEEE Trans. Inform. Theory 48(7), 1977–1991 (2002)
Mendelson S.: On the performance of kernel classes. J. Mach. Learn. Res. 4(5), 759–771 (2004)
Mendelson S., Neeman J.: Regularization in kernel learning. Ann. Stat. 38(1), 526–565 (2010)
Milman V.D., Schechtman G.: Asymptotic theory of finite-dimensional normed spaces. Lecture Notes in Mathematics, vol. 1200. Springer, Berlin (1986)
Pajor A., Tomczak-Jaegermann N.: Remarques sur les nombres d’entropie d’un opérateur et de son transposé. C. R. Acad. Sci. Paris Sér. I Math. 301(15), 743–746 (1985)
Paouris G.: Concentration of mass on convex bodies. Geom. Funct. Anal. 16(5), 1021–1049 (2006)
Pisier, G.: Some applications of the metric entropy condition to harmonic analysis. In: Banach Spaces, Harmonic Analysis, and Probability Theory. Lecture Notes in Mathematics, vol. 995, pp. 123–154. Springer, Berlin (1983)
Pisier, G.: The Volume of Convex Bodies and Banach Space Geometry. Cambridge Tracts in Mathematics, vol. 94. Cambridge University Press, Cambridge (1989)
Talagrand M.: Regularity of Gaussian processes. Acta Math. 159, 99–149 (1987). doi:10.1007/BF02392556
Talagrand M.: The Generic Chaining: Upper and Lower Bounds of Stochastic Processes. Springer Monographs in Mathematics. Springer, Berlin (2005)
Tibshirani R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B 58(1), 267–288 (1996)
Tsybakov, A.B.: Optimal rates of aggregation. In: Computational Learning Theory 2003. Lecture Notes in Artificial Intelligence, vol. 2777, pp. 303–313. Springer, Berlin (2003)
van de Geer S.A.: High-dimensional generalized linear models and the lasso. Ann. Stat. 36(2), 614–645 (2008)
van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, New York (1996)
Zhang C.H., Huang J.: The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Stat. 36(4), 1567–1594 (2008)
Zhang T.: Some sharp performance bounds for least squares regression with ℓ1 regularization. Ann. Stat. 37(5A), 2109–2144 (2009)
Additional information
The research leading to these results was supported by the Centre for Mathematics and its Applications, The Australian National University, and received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013), ERC grant agreement no. 203134; from the Israel Science Foundation, grant 666/06; and from the Australian Research Council, grant DP0986563. We gratefully acknowledge the support of the NSF through grant DMS-0707060.
Cite this article
Bartlett, P.L., Mendelson, S. & Neeman, J. ℓ1-regularized linear regression: persistence and oracle inequalities. Probab. Theory Relat. Fields 154, 193–224 (2012). https://doi.org/10.1007/s00440-011-0367-2