LSOS: Line-search second-order stochastic optimization methods for nonconvex finite sums
- by Daniela di Serafino, Nataša Krejić, Nataša Krklec Jerinkić and Marco Viola
- Math. Comp. 92 (2023), 1273-1299
- DOI: https://doi.org/10.1090/mcom/3802
- Published electronically: December 2, 2022
Abstract:
We develop a line-search second-order algorithmic framework for minimizing finite sums. We do not make any convexity assumptions, but require the terms of the sum to be continuously differentiable and have Lipschitz-continuous gradients. The methods fitting into this framework combine line searches with suitably decaying step lengths. A key ingredient is a two-step sampling at each iteration, which allows us to control the error present in the line-search procedure. Stationarity of limit points is proved in the almost-sure sense, while almost-sure convergence of the sequence of approximations to the solution holds under the additional hypothesis that the functions are strongly convex. Numerical experiments, including comparisons with state-of-the-art stochastic optimization methods, show the efficiency of our approach.
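To make the abstract's description concrete, below is a minimal Python sketch of one plausible method in this family: a subsampled, regularized Newton direction combined with Armijo backtracking whose initial step length decays with the iteration counter, and with two independent samples drawn per iteration (one to build the second-order model, one used only inside the line search). This is an illustrative reading of the abstract, not the authors' LSOS algorithm; the function names, the regularization, and all parameter choices are assumptions.

```python
# Illustrative sketch (NOT the authors' exact LSOS method): subsampled
# Newton-type steps with Armijo backtracking and a decaying step-length
# cap, for the finite sum f(x) = (1/N) * sum_i f_i(x).
import numpy as np

def lsos_sketch(f_i, grad_i, hess_i, N, x0, n_iters=50, batch=32,
                c1=1e-4, tau=0.5, lam=1e-3, seed=0):
    """f_i(i, x), grad_i(i, x), hess_i(i, x) return the i-th term of the
    sum, its gradient, and its Hessian (all choices here are assumed)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = x.size
    for k in range(n_iters):
        # Two independent samples per iteration: one defines the
        # second-order model, the other is used only in the line search.
        s_model = rng.choice(N, size=min(batch, N), replace=False)
        s_line = rng.choice(N, size=min(batch, N), replace=False)

        g = np.mean([grad_i(i, x) for i in s_model], axis=0)
        H = np.mean([hess_i(i, x) for i in s_model], axis=0)
        # Shift the sampled Hessian so that d is a descent direction
        # even when the problem is nonconvex.
        d = -np.linalg.solve(H + lam * np.eye(n), g)

        # Armijo backtracking on the line-search sample, starting from a
        # step length that decays like 1/(k+1).
        phi = lambda z: np.mean([f_i(i, z) for i in s_line])
        g_line = np.mean([grad_i(i, x) for i in s_line], axis=0)
        t = 1.0 / (k + 1)
        while phi(x + t * d) > phi(x) + c1 * t * g_line.dot(d) and t > 1e-10:
            t *= tau
        x = x + t * d
    return x

# Toy usage on a least-squares finite sum (1/N) * sum_i (a_i^T x - b_i)^2.
A = np.random.default_rng(1).normal(size=(200, 5))
b = A @ np.ones(5)
f_i = lambda i, x: (A[i] @ x - b[i]) ** 2
grad_i = lambda i, x: 2.0 * (A[i] @ x - b[i]) * A[i]
hess_i = lambda i, x: 2.0 * np.outer(A[i], A[i])
x_est = lsos_sketch(f_i, grad_i, hess_i, N=200, x0=np.zeros(5))
```

The decaying cap on the initial step length and the separate sample used inside the line search are how this sketch mimics the "decaying step lengths" and "two-step sampling" mentioned in the abstract; the actual mechanism and its analysis are given in the paper.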
Bibliographic Information
- Daniela di Serafino
- Affiliation: Department of Mathematics and Applications “Renato Caccioppoli”, University of Naples Federico II, Via Cintia, Monte S. Angelo, 80126 Napoli, Italy
- MR Author ID: 811010
- Email: daniela.diserafino@unina.it
- Nataša Krejić
- Affiliation: Department of Mathematics and Informatics, Faculty of Science, University of Novi Sad, Trg Dositeja Obradovića 4, 21000 Novi Sad, Serbia
- ORCID: 0000-0003-3348-7233
- Email: natasak@uns.ac.rs
- Nataša Krklec Jerinkić
- Affiliation: Department of Mathematics and Informatics, Faculty of Science, University of Novi Sad, Trg Dositeja Obradovića 4, 21000 Novi Sad, Serbia
- ORCID: 0000-0001-5195-9295
- Email: natasa.krklec@dmi.uns.ac.rs
- Marco Viola
- Affiliation: Department of Mathematics and Physics, University of Campania “Luigi Vanvitelli”, Viale A. Lincoln 5, 81100 Caserta, Italy
- MR Author ID: 1197355
- ORCID: 0000-0002-2140-8094
- Email: marco.viola@unicampania.it
- Received by editor(s): June 10, 2021
- Received by editor(s) in revised form: June 15, 2022, and June 20, 2022
- Published electronically: December 2, 2022
- Additional Notes: This research was supported by the Executive Programme of Scientific and Technological Cooperation between the Italian Republic and the Republic of Serbia for the years 2019-21 (Italian grant no. RS19MO05, Serbian grant no. 451-03-9/2021-14/200125). The first and fourth authors were also supported by the Istituto Nazionale di Alta Matematica - Gruppo Nazionale per il Calcolo Scientifico (INdAM-GNCS) and by the 2019 V:ALERE Program of the University of Campania “L. Vanvitelli”. The second author was also supported by the Serbian Academy of Sciences and Arts, grant F10.
- © Copyright 2022 American Mathematical Society
- Journal: Math. Comp. 92 (2023), 1273-1299
- MSC (2020): Primary 65K05, 90C15, 62L20
- DOI: https://doi.org/10.1090/mcom/3802
- MathSciNet review: 4550326