Universal halting times in optimization and machine learning

Sagun, Levent; Trogdon, Thomas; LeCun, Yann

doi:10.1090/qam/1483

Authors: Levent Sagun, Thomas Trogdon and Yann LeCun
Journal: Quart. Appl. Math. 76 (2018), 289-301
MSC (2010): Primary 68Q32, 78M50
DOI: https://doi.org/10.1090/qam/1483
Published electronically: September 20, 2017
MathSciNet review: 3769897
Full-text PDF

Abstract | References | Similar Articles | Additional Information

Abstract: We present empirical evidence that the halting times for a class of optimization algorithms are universal. The algorithms we consider come from quadratic optimization, spin glasses and machine learning. A universality theorem is given in the case of the quadratic gradient descent flow. More precisely, given an algorithm, which we take to be both the optimization routine and the form of the random landscape, the fluctuations of the halting time of the algorithm follow a distribution that, after centering and scaling, appears invariant under changes in the distribution on the landscape — universality is present.

References

Robert J. Adler and Jonathan E. Taylor, Random fields and geometry, Springer Monographs in Mathematics, Springer, New York, 2007. MR 2319516
Antonio Auffinger, Gérard Ben Arous, and Jiří Černý, Random matrices and complexity of spin glasses, Comm. Pure Appl. Math. 66 (2013), no. 2, 165–201. MR 2999295, DOI https://doi.org/10.1002/cpa.21422
Yuri Bakhtin and Joshua Correll, A neural computation model for decision-making times, J. Math. Psych. 56 (2012), no. 5, 333–340. MR 2983392, DOI https://doi.org/10.1016/j.jmp.2012.05.005
Alex Bloemendal, Antti Knowles, Horng-Tzer Yau, and Jun Yin, On the principal components of sample covariance matrices, Probab. Theory Related Fields 164 (2016), no. 1-2, 459–552. MR 3449395, DOI https://doi.org/10.1007/s00440-015-0616-x
Léon Bottou, Large-scale machine learning with stochastic gradient descent, Proceedings of COMPSTAT’2010, Physica-Verlag/Springer, Heidelberg, 2010, pp. 177–186. MR 3362066
Percy A. Deift, Govind Menon, Sheehan Olver, and Thomas Trogdon, Universality in numerical computations with random data, Proc. Natl. Acad. Sci. USA 111 (2014), no. 42, 14973–14978. MR 3276499, DOI https://doi.org/10.1073/pnas.1413446111
Percy A. Deift, Thomas Trogdon, and Govind Menon, On the condition number of the critically-scaled Laguerre unitary ensemble, Discrete Contin. Dyn. Syst. 36 (2016), no. 8, 4287–4347. MR 3479516, DOI https://doi.org/10.3934/dcds.2016.36.4287

Percy Deift and Thomas Trogdon, Universality for the Toda algorithm to compute the largest eigenvalue of a random matrix, Commun. Pure Appl. Math. (2017), 1–27. doi:10.1002/cpa.21715.

Percy Deift and Thomas Trogdon, Universality for eigenvalue algorithms on sample covariance matrices, SIAM J. Num. Anal. (2017), 1–31.

A. Greenbaum, Behavior of slightly perturbed Lanczos and conjugate-gradient recurrences, Linear Algebra Appl. 113 (1989), 7–63. MR 978581, DOI https://doi.org/10.1016/0024-3795%2889%2990285-1

A. Greenbaum and Z. Strakoš, Predicting the behavior of finite precision Lanczos and conjugate gradient computations, SIAM J. Matrix Anal. Appl. 13 (1992), no. 1, 121–137. MR 1146656, DOI https://doi.org/10.1137/0613011

Magnus R. Hestenes and Eduard Stiefel, Methods of conjugate gradients for solving linear systems, J. Research Nat. Bur. Standards 49 (1952), 409–436 (1953). MR 0060307

Yann LeCun, Corinna Cortes, and Burges Christopher J. C. Mnist database. http://yann.lecun.com/exdb/mnist/, accessed 2017.

Christian W. Pfrang, Percy Deift, and Govind Menon, How long does it take to compute the eigenvalues of a random symmetric matrix?, Random matrix theory, interacting particle systems, and integrable systems, Math. Sci. Res. Inst. Publ., vol. 65, Cambridge Univ. Press, New York, 2014, pp. 411–442. MR 3380694

Natesh S. Pillai and Jun Yin, Universality of covariance matrices, Ann. Appl. Probab. 24 (2014), no. 3, 935–1001. MR 3199978, DOI https://doi.org/10.1214/13-AAP939

Levent Sagun, V Uğur Güney, Gérard Ben Arous, and Yann LeCun, Explorations on high dimensional landscapes, arXiv preprint arXiv:1412.6615, 2014.

Craig A. Tracy and Harold Widom, Level-spacing distributions and the Airy kernel, Comm. Math. Phys. 159 (1994), no. 1, 151–174. MR 1257246