Abstract
We study the predictive performance of ℓ1-regularized linear regression in a model-free setting, including the case where the number of covariates is substantially larger than the sample size. We introduce a new analysis method that avoids the boundedness problems that typically arise in model-free empirical minimization. Our technique resolves a conjecture of Greenshtein and Ritov (Bernoulli 10(6):971–988, 2004) regarding the “persistence” rate for linear regression and allows us to prove an oracle inequality for the error of the regularized minimizer. It also shows that empirical risk minimization achieves the optimal rate (up to logarithmic factors) for convex aggregation of a set of estimators of a regression function.
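For a concrete picture of the estimator class studied here, the following minimal Python sketch implements the penalized (Lagrangian) form of ℓ1-regularized least squares, the Lasso, via cyclic coordinate descent. The function names, the toy data, and the penalty level lam are illustrative assumptions, not part of the paper, which analyzes the estimator's predictive (persistence) behavior rather than any particular optimization algorithm.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/n) * ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_norm_sq = (X ** 2).sum(axis=0) / n  # (1/n) * ||X_j||^2 for each column j
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with coordinate j removed from the fit.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            if col_norm_sq[j] > 0:
                beta[j] = soft_threshold(rho, lam / 2.0) / col_norm_sq[j]
    return beta

# Toy usage: a sparse linear model with p > n, the regime the paper considers.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_hat = lasso_coordinate_descent(X, y, lam=0.2)
print(np.flatnonzero(np.abs(beta_hat) > 1e-6))  # roughly recovers {0, 1, 2}
```

The equivalent constrained form, empirical risk minimization over an ℓ1 ball, is the formulation in which Greenshtein and Ritov's persistence question is posed; the penalized form above is its Lagrangian counterpart.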
References
Bartlett P.L.: Fast rates for estimation error and oracle inequalities for model selection. Econom. Theory 24(2), 545–552 (2008)
Bartlett P.L., Jordan M.I., McAuliffe J.D.: Convexity, classification, and risk bounds. J. Am. Stat. Assoc. 101(473), 138–156 (2006)
Bartlett P.L., Mendelson S.: Empirical minimization. Probab. Theory Relat. Fields 135(3), 311–334 (2006)
Bickel P.J., Ritov Y., Tsybakov A.B.: Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 37(4), 1705–1732 (2009)
Bunea F., Tsybakov A., Wegkamp M.: Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1, 169–194 (2007) (electronic)
Bunea, F., Tsybakov, A.B., Wegkamp, M.H.: Aggregation and sparsity via ℓ1 penalized least squares. In: Proceedings of the 19th Annual Conference on Learning Theory (COLT 2006). Lecture Notes in Artificial Intelligence, vol. 4005, pp. 379–391. Springer, Berlin (2006)
Candès E.J., Plan Y.: Near-ideal model selection by ℓ1 minimization. Ann. Stat. 37(5A), 2145–2177 (2009)
Carl B.: Inequalities of Bernstein–Jackson-type and the degree of compactness of operators in Banach spaces. Ann. Inst. Fourier (Grenoble) 35(3), 79–118 (1985)
Catoni, O.: Statistical learning theory and stochastic optimization. École d’Été de Probabilités de Saint-Flour 2001. Lecture Notes in Mathematics, vol. 1851. Springer, Berlin (2004)
de la Peña V.H., Giné E.: Decoupling: From Dependence to Independence, Probability and its Applications (New York). Springer, New York (1999)
Donoho D.L., Elad M., Temlyakov V.N.: Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52(1), 6–18 (2006)
Donoho D.L., Johnstone I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
Dudley R.M.: The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. In: Giné, E., Koltchinskii, V., Norvaisa, R. (eds) Selected Works of R.M. Dudley, Selected Works in Probability and Statistics, pp. 125–165. Springer, New York (2010)
Giné E., Zinn J.: Some limit theorems for empirical processes. Ann. Probab. 12(4), 929–998 (1984) (with discussion)
Gordon Y., Litvak A.E., Mendelson S., Pajor A.: Gaussian averages of interpolated bodies and applications to approximate reconstruction. J. Approx. Theory 149(1), 59–73 (2007)
Greenshtein E.: Best subset selection, persistence in high-dimensional statistical learning and optimization under ℓ1 constraint. Ann. Stat. 34(5), 2367–2386 (2006)
Greenshtein E., Ritov Y.: Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10(6), 971–988 (2004)
Guédon O., Mendelson S., Pajor A., Tomczak-Jaegermann N.: Subspaces and orthogonal decompositions generated by bounded orthogonal systems. Positivity 11(2), 269–283 (2007)
Guédon O., Mendelson S., Pajor A., Tomczak-Jaegermann N.: Majorizing measures and proportional subsets of bounded orthonormal systems. Rev. Mat. Iberoamericana 24(3), 1075–1095 (2008)
Hoeffding W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)
Koltchinskii V.: Sparsity in penalized empirical risk minimization. Ann. Inst. Henri Poincaré Probab. Stat. 45(1), 7–57 (2009)
Lecué, G., Mendelson, S.: General oracle inequalities and applications to high dimensional data analysis (preprint)
Leng C., Lin Y., Wahba G.: A note on the lasso and related procedures in model selection. Stat. Sinica 16(4), 1273–1284 (2006)
Lounici K.: Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2, 90–102 (2008)
Meinshausen N., Bühlmann P.: High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34(3), 1436–1462 (2006)
Meinshausen N., Yu B.: Lasso-type recovery of sparse representations for high-dimensional data. Ann. Stat. 37(1), 246–270 (2009)
Mendelson S.: Improving the sample complexity using global data. IEEE Trans. Inform. Theory 48(7), 1977–1991 (2002)
Mendelson S.: On the performance of kernel classes. J. Mach. Learn. Res. 4(5), 759–771 (2004)
Mendelson S., Neeman J.: Regularization in kernel learning. Ann. Stat. 38(1), 526–565 (2010)
Milman V.D., Schechtman G.: Asymptotic theory of finite-dimensional normed spaces. Lecture Notes in Mathematics, vol. 1200. Springer, Berlin (1986)
Pajor A., Tomczak-Jaegermann N.: Remarques sur les nombres d’entropie d’un opérateur et de son transposé. C. R. Acad. Sci. Paris Sér. I Math. 301(15), 743–746 (1985)
Paouris G.: Concentration of mass on convex bodies. Geom. Funct. Anal. 16(5), 1021–1049 (2006)
Pisier, G.: Some applications of the metric entropy condition to harmonic analysis. In: Banach Spaces, Harmonic Analysis, and Probability Theory. Lecture Notes in Mathematics, vol. 995, pp. 123–154. Springer, Berlin (1983)
Pisier, G.: The Volume of Convex Bodies and Banach Space Geometry. Cambridge Tracts in Mathematics, vol. 94. Cambridge University Press, Cambridge (1989)
Talagrand M.: Regularity of Gaussian processes. Acta Math. 159, 99–149 (1987). doi:10.1007/BF02392556
Talagrand M.: The Generic Chaining: Upper and Lower Bounds of Stochastic Processes. Springer Monographs in Mathematics. Springer, Berlin (2005)
Tibshirani R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B 58(1), 267–288 (1996)
Tsybakov, A.B.: Optimal rates of aggregation. In: Computational Learning Theory 2003. Lecture Notes in Artificial Intelligence, vol. 2777, pp. 303–313. Springer, Berlin (2003)
van de Geer S.A.: High-dimensional generalized linear models and the lasso. Ann. Stat. 36(2), 614–645 (2008)
van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, New York (1996)
Zhang C.H., Huang J.: The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Stat. 36(4), 1567–1594 (2008)
Zhang T.: Some sharp performance bounds for least squares regression with ℓ1 regularization. Ann. Stat. 37(5A), 2109–2144 (2009)
Additional information
The research leading to these results was supported by the Centre for Mathematics and its Applications, The Australian National University, and received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013), ERC grant agreement no. 203134; from the Israel Science Foundation, grant 666/06; and from the Australian Research Council, grant DP0986563. We gratefully acknowledge the support of the NSF through grant DMS-0707060.
Cite this article
Bartlett, P.L., Mendelson, S. & Neeman, J. ℓ1-regularized linear regression: persistence and oracle inequalities. Probab. Theory Relat. Fields 154, 193–224 (2012). https://doi.org/10.1007/s00440-011-0367-2