Universal halting times in optimization and machine learning
Authors:
Levent Sagun, Thomas Trogdon and Yann LeCun
Journal:
Quart. Appl. Math. 76 (2018), 289-301
MSC (2010):
Primary 68Q32, 78M50
DOI:
https://doi.org/10.1090/qam/1483
Published electronically:
September 20, 2017
MathSciNet review:
3769897
Full-text PDF
Abstract |
References |
Similar Articles |
Additional Information
Abstract: We present empirical evidence that the halting times for a class of optimization algorithms are universal. The algorithms we consider come from quadratic optimization, spin glasses and machine learning. A universality theorem is given in the case of the quadratic gradient descent flow. More precisely, given an algorithm, which we take to be both the optimization routine and the form of the random landscape, the fluctuations of the halting time of the algorithm follow a distribution that, after centering and scaling, appears invariant under changes in the distribution on the landscape — universality is present.
References
- Robert J. Adler and Jonathan E. Taylor, Random fields and geometry, Springer Monographs in Mathematics, Springer, New York, 2007. MR 2319516
- Antonio Auffinger, Gérard Ben Arous, and Jiří Černý, Random matrices and complexity of spin glasses, Comm. Pure Appl. Math. 66 (2013), no. 2, 165–201. MR 2999295, DOI https://doi.org/10.1002/cpa.21422
- Yuri Bakhtin and Joshua Correll, A neural computation model for decision-making times, J. Math. Psych. 56 (2012), no. 5, 333–340. MR 2983392, DOI https://doi.org/10.1016/j.jmp.2012.05.005
- Alex Bloemendal, Antti Knowles, Horng-Tzer Yau, and Jun Yin, On the principal components of sample covariance matrices, Probab. Theory Related Fields 164 (2016), no. 1-2, 459–552. MR 3449395, DOI https://doi.org/10.1007/s00440-015-0616-x
- Léon Bottou, Large-scale machine learning with stochastic gradient descent, Proceedings of COMPSTAT’2010, Physica-Verlag/Springer, Heidelberg, 2010, pp. 177–186. MR 3362066
- Percy A. Deift, Govind Menon, Sheehan Olver, and Thomas Trogdon, Universality in numerical computations with random data, Proc. Natl. Acad. Sci. USA 111 (2014), no. 42, 14973–14978. MR 3276499, DOI https://doi.org/10.1073/pnas.1413446111
- Percy A. Deift, Thomas Trogdon, and Govind Menon, On the condition number of the critically-scaled Laguerre unitary ensemble, Discrete Contin. Dyn. Syst. 36 (2016), no. 8, 4287–4347. MR 3479516, DOI https://doi.org/10.3934/dcds.2016.36.4287
- Percy Deift and Thomas Trogdon, Universality for the Toda algorithm to compute the largest eigenvalue of a random matrix, Commun. Pure Appl. Math. (2017), 1–27. doi:10.1002/cpa.21715.
- Percy Deift and Thomas Trogdon, Universality for eigenvalue algorithms on sample covariance matrices, SIAM J. Num. Anal. (2017), 1–31.
- A. Greenbaum, Behavior of slightly perturbed Lanczos and conjugate-gradient recurrences, Linear Algebra Appl. 113 (1989), 7–63. MR 978581, DOI https://doi.org/10.1016/0024-3795%2889%2990285-1
- A. Greenbaum and Z. Strakoš, Predicting the behavior of finite precision Lanczos and conjugate gradient computations, SIAM J. Matrix Anal. Appl. 13 (1992), no. 1, 121–137. MR 1146656, DOI https://doi.org/10.1137/0613011
- Magnus R. Hestenes and Eduard Stiefel, Methods of conjugate gradients for solving linear systems, J. Research Nat. Bur. Standards 49 (1952), 409–436 (1953). MR 0060307
- Yann LeCun, Corinna Cortes, and Burges Christopher J. C. Mnist database. http://yann.lecun.com/exdb/mnist/, accessed 2017.
- Christian W. Pfrang, Percy Deift, and Govind Menon, How long does it take to compute the eigenvalues of a random symmetric matrix?, Random matrix theory, interacting particle systems, and integrable systems, Math. Sci. Res. Inst. Publ., vol. 65, Cambridge Univ. Press, New York, 2014, pp. 411–442. MR 3380694
- Natesh S. Pillai and Jun Yin, Universality of covariance matrices, Ann. Appl. Probab. 24 (2014), no. 3, 935–1001. MR 3199978, DOI https://doi.org/10.1214/13-AAP939
- Levent Sagun, V Uğur Güney, Gérard Ben Arous, and Yann LeCun, Explorations on high dimensional landscapes, arXiv preprint arXiv:1412.6615, 2014.
- Craig A. Tracy and Harold Widom, Level-spacing distributions and the Airy kernel, Comm. Math. Phys. 159 (1994), no. 1, 151–174. MR 1257246
References
- Robert J. Adler and Jonathan E. Taylor, Random fields and geometry, Springer Monographs in Mathematics, Springer, New York, 2007. MR 2319516
- Antonio Auffinger, Gérard Ben Arous, and JiříČerný, Random matrices and complexity of spin glasses, Comm. Pure Appl. Math. 66 (2013), no. 2, 165–201. MR 2999295, DOI https://doi.org/10.1002/cpa.21422
- Yuri Bakhtin and Joshua Correll, A neural computation model for decision-making times, J. Math. Psych. 56 (2012), no. 5, 333–340. MR 2983392, DOI https://doi.org/10.1016/j.jmp.2012.05.005
- Alex Bloemendal, Antti Knowles, Horng-Tzer Yau, and Jun Yin, On the principal components of sample covariance matrices, Probab. Theory Related Fields 164 (2016), no. 1-2, 459–552. MR 3449395, DOI https://doi.org/10.1007/s00440-015-0616-x
- Léon Bottou, Large-scale machine learning with stochastic gradient descent, Proceedings of COMPSTAT’2010, Physica-Verlag/Springer, Heidelberg, 2010, pp. 177–186. MR 3362066
- Percy A. Deift, Govind Menon, Sheehan Olver, and Thomas Trogdon, Universality in numerical computations with random data, Proc. Natl. Acad. Sci. USA 111 (2014), no. 42, 14973–14978. MR 3276499, DOI https://doi.org/10.1073/pnas.1413446111
- Percy A. Deift, Thomas Trogdon, and Govind Menon, On the condition number of the critically-scaled Laguerre unitary ensemble, Discrete Contin. Dyn. Syst. 36 (2016), no. 8, 4287–4347. MR 3479516, DOI https://doi.org/10.3934/dcds.2016.36.4287
- Percy Deift and Thomas Trogdon, Universality for the Toda algorithm to compute the largest eigenvalue of a random matrix, Commun. Pure Appl. Math. (2017), 1–27. doi:10.1002/cpa.21715.
- Percy Deift and Thomas Trogdon, Universality for eigenvalue algorithms on sample covariance matrices, SIAM J. Num. Anal. (2017), 1–31.
- A. Greenbaum, Behavior of slightly perturbed Lanczos and conjugate-gradient recurrences, Linear Algebra Appl. 113 (1989), 7–63. MR 978581, DOI https://doi.org/10.1016/0024-3795%2889%2990285-1
- A. Greenbaum and Z. Strakoš, Predicting the behavior of finite precision Lanczos and conjugate gradient computations, SIAM J. Matrix Anal. Appl. 13 (1992), no. 1, 121–137. MR 1146656, DOI https://doi.org/10.1137/0613011
- Magnus R. Hestenes and Eduard Stiefel, Methods of conjugate gradients for solving linear systems, J. Research Nat. Bur. Standards 49 (1952), 409–436 (1953). MR 0060307
- Yann LeCun, Corinna Cortes, and Burges Christopher J. C. Mnist database. http://yann.lecun.com/exdb/mnist/, accessed 2017.
- Christian W. Pfrang, Percy Deift, and Govind Menon, How long does it take to compute the eigenvalues of a random symmetric matrix?, Random matrix theory, interacting particle systems, and integrable systems, Math. Sci. Res. Inst. Publ., vol. 65, Cambridge Univ. Press, New York, 2014, pp. 411–442. MR 3380694
- Natesh S. Pillai and Jun Yin, Universality of covariance matrices, Ann. Appl. Probab. 24 (2014), no. 3, 935–1001. MR 3199978, DOI https://doi.org/10.1214/13-AAP939
- Levent Sagun, V Uğur Güney, Gérard Ben Arous, and Yann LeCun, Explorations on high dimensional landscapes, arXiv preprint arXiv:1412.6615, 2014.
- Craig A. Tracy and Harold Widom, Level-spacing distributions and the Airy kernel, Comm. Math. Phys. 159 (1994), no. 1, 151–174. MR 1257246
Similar Articles
Retrieve articles in Quarterly of Applied Mathematics
with MSC (2010):
68Q32,
78M50
Retrieve articles in all journals
with MSC (2010):
68Q32,
78M50
Additional Information
Levent Sagun
Affiliation:
Department of Mathematics, New York University, New York, New York 10012
Email:
sagun@cims.nyu.edu
Thomas Trogdon
Affiliation:
Department of Mathematics, University of California, , Irvine, California 92697
MR Author ID:
965414
ORCID:
0000-0002-6955-4154
Email:
ttrogdon@math.uci.edu
Yann LeCun
Affiliation:
Department of Computer Science, New York University, New York,, New York 10012
MR Author ID:
759382
Email:
yann@cs.nyu.edu
Received by editor(s):
June 16, 2017
Published electronically:
September 20, 2017
Article copyright:
© Copyright 2017
Brown University