Local conditions for global convergence of gradient flows and proximal point sequences in metric spaces
HTML articles powered by AMS MathViewer
- by Lorenzo Dello Schiavo, Jan Maas and Francesco Pedrotti;
- Trans. Amer. Math. Soc. 377 (2024), 3779-3804
- DOI: https://doi.org/10.1090/tran/9156
- Published electronically: April 11, 2024
- HTML | PDF | Request permission
Abstract:
This paper deals with local criteria for the convergence to a global minimiser for gradient flow trajectories and their discretisations. To obtain quantitative estimates on the speed of convergence, we consider variations on the classical Kurdyka–Łojasiewicz inequality for a large class of parameter functions. Our assumptions are given in terms of the initial data, without any reference to an equilibrium point. The main results are convergence statements for gradient flow curves and proximal point sequences to a global minimiser, together with sharp quantitative estimates on the speed of convergence. These convergence results apply in the general setting of lower semicontinuous functionals on complete metric spaces, generalising recent results for smooth functionals on $\mathbb {R}^n$. While the non-smooth setting covers very general spaces, it is also useful for (non)-smooth functionals on $\mathbb {R}^n$.References
- Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré, Gradient flows in metric spaces and in the space of probability measures, 2nd ed., Lectures in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, 2008. MR 2401600
- Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré, Calculus and heat flow in metric measure spaces and applications to spaces with Ricci bounds from below, Invent. Math. 195 (2014), no. 2, 289–391. MR 3152751, DOI 10.1007/s00222-013-0456-1
- Hedy Attouch and Jérôme Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Math. Program. 116 (2009), no. 1-2, Ser. B, 5–16. MR 2421270, DOI 10.1007/s10107-007-0133-5
- Hédy Attouch, Jérôme Bolte, Patrick Redont, and Antoine Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality, Math. Oper. Res. 35 (2010), no. 2, 438–457. MR 2674728, DOI 10.1287/moor.1100.0449
- Hedy Attouch, Jérôme Bolte, and Benar Fux Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program. 137 (2013), no. 1-2, Ser. A, 91–129. MR 3010421, DOI 10.1007/s10107-011-0484-9
- Peter L. Bartlett, Andrea Montanari, and Alexander Rakhlin, Deep learning: a statistical viewpoint, Acta Numer. 30 (2021), 87–201. MR 4295218, DOI 10.1017/S0962492921000027
- Adrien Blanchet and Jérôme Bolte, A family of functional inequalities: Łojasiewicz inequalities and displacement convex functions, J. Funct. Anal. 275 (2018), no. 7, 1650–1673. MR 3832005, DOI 10.1016/j.jfa.2018.06.014
- Jérôme Bolte, Aris Daniilidis, Olivier Ley, and Laurent Mazet, Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity, Trans. Amer. Math. Soc. 362 (2010), no. 6, 3319–3363. MR 2592958, DOI 10.1090/S0002-9947-09-05048-X
- J. Bolte, L. Miclo, and S. Villeneuve, Swarm gradient dynamics for global optimization: the density case, arXiv:2204.01306, 2022.
- S. Bombari, M. H. Amani, and M. Mondelli, Memorization and optimization in deep neural networks with minimum over-parameterization, Adv. Neural Inf. Process. Syst. 35 (2022), 7628–7640
- E. Boursier, L. Pillaud-Vivien, and N. Flammarion, Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs, arXiv:2206.00939, 2022.
- S. Chatterjee, Convergence of gradient descent for deep neural networks, arXiv:2203.16462, 2022.
- Jean Dolbeault, Bruno Nazaret, and Giuseppe Savaré, A new class of transport distances between measures, Calc. Var. Partial Differential Equations 34 (2009), no. 2, 193–231. MR 2448650, DOI 10.1007/s00526-008-0182-5
- D. Drusvyatskiy, A. D. Ioffe, and A. S. Lewis, Curves of descent, SIAM J. Control Optim. 53 (2015), no. 1, 114–138. MR 3299130, DOI 10.1137/130920216
- Daniel Hauer and José M. Mazón, Kurdyka-Łojasiewicz-Simon inequality for gradient flows in metric spaces, Trans. Amer. Math. Soc. 372 (2019), no. 7, 4917–4976. MR 4009443, DOI 10.1090/tran/7801
- A. D. Ioffe, On lower semicontinuity of integral functionals. I, SIAM J. Control Optim. 15 (1977), no. 4, 521–538. MR 637234, DOI 10.1137/0315035
- A. D. Ioffe, Metric regularity and subdifferential calculus, Uspekhi Mat. Nauk 55 (2000), no. 3(333), 103–162 (Russian, with Russian summary); English transl., Russian Math. Surveys 55 (2000), no. 3, 501–558. MR 1777352, DOI 10.1070/rm2000v055n03ABEH000292
- Richard Jordan, David Kinderlehrer, and Felix Otto, The variational formulation of the Fokker-Planck equation, SIAM J. Math. Anal. 29 (1998), no. 1, 1–17. MR 1617171, DOI 10.1137/S0036141096303359
- H. Karimi, J. Nutini, and M. Schmidt, Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition, Machine Learning and Knowledge Discovery in Databases, Springer International Publishing, Cham, 2016, pp. 795–811
- Stanislav Kondratyev and Dmitry Vorotnikov, Spherical Hellinger-Kantorovich gradient flows, SIAM J. Math. Anal. 51 (2019), no. 3, 2053–2084. MR 3952677, DOI 10.1137/18M1213063
- Krzysztof Kurdyka, On gradients of functions definable in o-minimal structures, Ann. Inst. Fourier (Grenoble) 48 (1998), no. 3, 769–783 (English, with English and French summaries). MR 1644089, DOI 10.5802/aif.1638
- Linshan Liu, Mateusz B. Majka, and Łukasz Szpruch, Polyak-Łojasiewicz inequality on the space of measures and convergence of mean-field birth-death processes, Appl. Math. Optim. 87 (2023), no. 3, Paper No. 48, 27. MR 4565015, DOI 10.1007/s00245-022-09962-0
- S. Łojasiewicz, A topological property of real analytic subsets, Equ. Derivees partielles, Paris 1962, Colloques Internat. Centre Nat. Rech. Sci., vol. 117, 1963, pp. 87–89.
- Stanislas Łojasiewicz, Sur la géométrie semi- et sous-analytique, Ann. Inst. Fourier (Grenoble) 43 (1993), no. 5, 1575–1595 (French, with English and French summaries). MR 1275210, DOI 10.5802/aif.1384
- Jan Maas, Gradient flows of the entropy for finite Markov chains, J. Funct. Anal. 261 (2011), no. 8, 2250–2292. MR 2824578, DOI 10.1016/j.jfa.2011.06.009
- B. Martinet, Régularisation d’inéquations variationnelles par approximations successives, Rev. Française Informat. Recherche Opérationnelle 4 (1970), no. Sér, Sér. R-3, 154–158 (French). MR 298899
- Alexander Mielke, A gradient structure for reaction-diffusion systems and for energy-drift-diffusion systems, Nonlinearity 24 (2011), no. 4, 1329–1346. MR 2776123, DOI 10.1088/0951-7715/24/4/016
- Matteo Muratori and Giuseppe Savaré, Gradient flows and evolution variational inequalities in metric spaces. I: Structural properties, J. Funct. Anal. 278 (2020), no. 4, 108347, 67. MR 4044740, DOI 10.1016/j.jfa.2019.108347
- S. Oymak and M. Soltanolkotabi, Overparameterized nonlinear learning: gradient descent takes the shortest path? Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, June 9–15, 2019, pp. 4951–4960.
- B. Polyak. Gradient methods for the minimisation of functionals. USSR Computational Mathematics and Mathematical Physics, 3(4):864–878, 1963.
- R. Tyrrell Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optim. 14 (1976), no. 5, 877–898. MR 410483, DOI 10.1137/0314056
- Leon Simon, Asymptotics for a class of nonlinear evolution equations, with applications to geometric problems, Ann. of Math. (2) 118 (1983), no. 3, 525–571. MR 727703, DOI 10.2307/2006981
Bibliographic Information
- Lorenzo Dello Schiavo
- Affiliation: Institute of Science and Technology Austria, Am Campus 1, 3400 Klosterneuburg, Austria
- MR Author ID: 1123044
- ORCID: 0000-0002-9881-6870
- Email: lorenzo.delloschiavo@ist.ac.at
- Jan Maas
- Affiliation: Institute of Science and Technology Austria, Am Campus 1, 3400 Klosterneuburg, Austria
- MR Author ID: 822765
- ORCID: 0000-0002-0845-1338
- Email: jan.maas@ist.ac.at
- Francesco Pedrotti
- Affiliation: Institute of Science and Technology Austria, Am Campus 1, 3400 Klosterneuburg, Austria
- ORCID: 0009-0002-8099-2611
- Email: francesco.pedrotti@ist.ac.at
- Received by editor(s): April 20, 2023
- Published electronically: April 11, 2024
- Additional Notes: The authors gratefully acknowledges support by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 716117). This research was funded in part by the Austrian Science Fund (FWF) project 10.55776/ESP208. This research was funded in part by the Austrian Science Fund (FWF) project 10.55776/F65
- © Copyright 2024 American Mathematical Society
- Journal: Trans. Amer. Math. Soc. 377 (2024), 3779-3804
- MSC (2020): Primary 45J05, 49Q20; Secondary 39B62, 37N40, 49J52, 65K10
- DOI: https://doi.org/10.1090/tran/9156
- MathSciNet review: 4748608