Stochastic alternating structure-adapted proximal gradient descent method with variance reduction for nonconvex nonsmooth optimization
HTML articles powered by AMS MathViewer
- by Zehui Jia, Wenxing Zhang, Xingju Cai and Deren Han;
- Math. Comp. 93 (2024), 1677-1714
- DOI: https://doi.org/10.1090/mcom/3867
- Published electronically: March 8, 2024
- HTML | PDF | Request permission
Abstract:
The blocky optimization has gained a significant amount of attention in far-reaching practical applications. Following the recent work (M. Nikolova and P. Tan [SIAM J. Optim. 29 (2019), pp. 2053–2078]) on solving a class of nonconvex nonsmooth optimization, we develop a stochastic alternating structure-adapted proximal (s-ASAP) gradient descent method for solving blocky optimization problems. By deploying some state-of-the-art variance reduced gradient estimators (rather than full gradient) in stochastic optimization, the s-ASAP method is applicable to nonconvex optimization whose objective is the sum of a nonsmooth data-fitting term and a finite number of differentiable functions. The sublinear convergence rate of s-ASAP is built upon the proximal point algorithmic framework, whilst the linear convergence rate of s-ASAP is achieved under the error bound condition. Furthermore, the convergence of the sequence produced by s-ASAP is established under the Kurdyka-Łojasiewicz property. Preliminary numerical simulations on some image processing applications demonstrate the compelling performance of the proposed method.References
- P.-A. Absil, R. Mahony, and B. Andrews, Convergence of the iterates of descent methods for analytic cost functions, SIAM J. Optim. 16 (2005), no. 2, 531–547. MR 2197994, DOI 10.1137/040605266
- Hedy Attouch and Jérôme Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Math. Program. 116 (2009), no. 1-2, Ser. B, 5–16. MR 2421270, DOI 10.1007/s10107-007-0133-5
- Hédy Attouch, Jérôme Bolte, Patrick Redont, and Antoine Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality, Math. Oper. Res. 35 (2010), no. 2, 438–457. MR 2674728, DOI 10.1287/moor.1100.0449
- J. F. Aujol, G. Gilboa, T. Chan, and S. Osher, Structure-texture image decomposition—modeling, algorithms, and parameter selection, Int. J. Comput. Vis. 67 (2006), 111–136.
- A. Auslender, Asymptotic properties of the Fenchel dual functional and applications to decomposition problems, J. Optim. Theory Appl. 73 (1992), no. 3, 427–449. MR 1164802, DOI 10.1007/BF00940050
- Alfred Auslender, Marc Teboulle, and Sami Ben-Tiba, Coupling the logarithmic-quadratic proximal method and the block nonlinear Gauss-Seidel algorithm for linearly constrained convex minimization, Ill-posed variational problems and regularization techniques (Trier, 1998) Lecture Notes in Econom. and Math. Systems, vol. 477, Springer, Berlin, 1999, pp. 35–47. MR 1737311, DOI 10.1007/978-3-642-45780-7_{3}
- Amir Beck, Aharon Ben-Tal, and Christian Kanzow, A fast method for finding the global solution of the regularized structured total least squares problem for image deblurring, SIAM J. Matrix Anal. Appl. 30 (2008), no. 1, 419–443. MR 2399588, DOI 10.1137/070709013
- Amir Beck, Shoham Sabach, and Marc Teboulle, An alternating semiproximal method for nonconvex regularized structured total least squares problems, SIAM J. Matrix Anal. Appl. 37 (2016), no. 3, 1129–1150. MR 3539896, DOI 10.1137/15M1017557
- Dimitri P. Bertsekas, Nonlinear programming, 2nd ed., Athena Scientific Optimization and Computation Series, Athena Scientific, Belmont, MA, 1999. MR 3444832
- Edward Bierstone and Pierre D. Milman, Semianalytic and subanalytic sets, Inst. Hautes Études Sci. Publ. Math. 67 (1988), 5–42. MR 972342, DOI 10.1007/BF02699126
- Jérôme Bolte, Aris Daniilidis, and Adrian Lewis, The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM J. Optim. 17 (2006), no. 4, 1205–1223. MR 2274510, DOI 10.1137/050644641
- Jérôme Bolte, Shoham Sabach, and Marc Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Program. 146 (2014), no. 1-2, Ser. A, 459–494. MR 3232623, DOI 10.1007/s10107-013-0701-9
- S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn. 3 (2011), no. 1, 1–122.
- Patrick Breheny and Jian Huang, Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection, Ann. Appl. Stat. 5 (2011), no. 1, 232–253. MR 2810396, DOI 10.1214/10-AOAS388
- R. Cabral, F. De la Torre, J. Paulo Costeira, and A. Bernardino, Matrix completion for weakly-supervised multi-label image classification, IEEE Trans. Pattern Anal. Mach. Intell. 37 (2014), no. 1, 121–135.
- Antonin Chambolle and Thomas Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Imaging Vision 40 (2011), no. 1, 120–145. MR 2782122, DOI 10.1007/s10851-010-0251-1
- Julianne Chung and James G. Nagy, An efficient iterative approach for large-scale separable nonlinear inverse problems, SIAM J. Sci. Comput. 31 (2009/10), no. 6, 4654–4674. MR 2594997, DOI 10.1137/080732213
- D. Davis, The asynchronous PALM algorithm for nonsmooth nonconvex problems, arXiv:1604.00526, 2016.
- A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives, NIPS, 2014, pp. 1646–1654.
- D. A. D’Esopo, A convex programming procedure, Naval Res. Logist. Quart. 6 (1959), 33–42. MR 104520, DOI 10.1002/nav.3800060105
- Derek Driggs, Junqi Tang, Jingwei Liang, Mike Davies, and Carola-Bibiane Schönlieb, A stochastic proximal alternating minimization for nonsmooth and nonconvex optimization, SIAM J. Imaging Sci. 14 (2021), no. 4, 1932–1970. MR 4354998, DOI 10.1137/20M1387213
- Francisco Facchinei and Jong-Shi Pang, Finite-dimensional variational inequalities and complementarity problems. Vol. I, Springer Series in Operations Research, Springer-Verlag, New York, 2003. MR 1955648
- Jérôme Fehrenbach, Pierre Weiss, and Corinne Lorenzo, Variational algorithms to remove stationary noise: applications to microscopy imaging, IEEE Trans. Image Process. 21 (2012), no. 10, 4420–4430. MR 2975542, DOI 10.1109/TIP.2012.2206037
- Roland Glowinski, On alternating direction methods of multipliers: a historical perspective, Modeling, simulation and optimization for science and technology, Comput. Methods Appl. Sci., vol. 34, Springer, Dordrecht, 2014, pp. 59–82. MR 3330832, DOI 10.1007/978-94-017-9054-3_{4}
- R. Glowinski and A. Marrocco, Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires, Rev. Française Automat. Informat. Recherche Opérationnelle Sér. Rouge Anal. Numér. 9 (1975), no. R-2, 41–76 (French, with English summary). MR 388811, DOI 10.1051/m2an/197509R200411
- M. Guillaumin, J. Verbeek, and C. Schmid, Multimodal semi-supervised learning for image classification, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 902–909.
- Per Christian Hansen, James G. Nagy, and Dianne P. O’Leary, Deblurring images, Fundamentals of Algorithms, vol. 3, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2006. Matrices, spectra, and filtering. MR 2271138, DOI 10.1137/1.9780898718874
- Clifford Hildreth, A quadratic programming procedure, Naval Res. Logist. Quart. 4 (1957), 79–85. MR 89100, DOI 10.1002/nav.3800040113
- Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal, Convex analysis and minimization algorithms. I, Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 305, Springer-Verlag, Berlin, 1993. Fundamentals. MR 1261420
- R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, NIPS, vol. 26, 2013, pp. 315–323.
- R. Kimmel, M. Elad, D. Shaked, R. Keshet, and I. Sobel, A variational framework for retinex, Int. J. Comput. Vis. 52 (2003), no. 1, 7–23.
- Krzysztof Kurdyka, On gradients of functions definable in o-minimal structures, Ann. Inst. Fourier (Grenoble) 48 (1998), no. 3, 769–783 (English, with English and French summaries). MR 1644089, DOI 10.5802/aif.1638
- Valentin Leplat, Nicolas Gillis, and Andersen M. S. Ang, Blind audio source separation with minimum-volume beta-divergence NMF, IEEE Trans. Signal Process. 68 (2020), 3400–3410. MR 4114764, DOI 10.1109/TSP.2020.2991801
- S. Łojasiewicz, Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles (Paris, 1962) Colloq. Internat. CNRS, No. 117, CNRS, Paris, 1963, pp. 87–89 (French). MR 160856
- Z. Q. Luo and P. Tseng, On the convergence of the coordinate descent method for convex differentiable minimization, J. Optim. Theory Appl. 72 (1992), no. 1, 7–35. MR 1141764, DOI 10.1007/BF00939948
- Nicola Mastronardi, Philippe Lemmerling, and Sabine Van Huffel, Fast structured total least squares algorithm for solving the basic deconvolution problem, SIAM J. Matrix Anal. Appl. 22 (2000), no. 2, 533–553. MR 1780199, DOI 10.1137/S0895479898345813
- Rahul Mazumder, Jerome H. Friedman, and Trevor Hastie, SparseNet: coordinate descent with nonconvex penalties, J. Amer. Statist. Assoc. 106 (2011), no. 495, 1125–1138. MR 2894769, DOI 10.1198/jasa.2011.tm09738
- Yurii Nesterov, Introductory lectures on convex optimization, Applied Optimization, vol. 87, Kluwer Academic Publishers, Boston, MA, 2004. A basic course. MR 2142598, DOI 10.1007/978-1-4419-8853-9
- L. M. Nguyen, J. Liu, K. Scheinberg, and M. Takáč, SARAH: a novel method for machine learning problems using stochastic recursive gradient, International Conference on Machine Learning, 2017, pp. 2613–2621.
- M. Nikolova, Optimization, Applications to Image Processing, Citeseer, Philadelphia, Pennsylvania, 2014.
- Mila Nikolova and Pauline Tan, Alternating structure-adapted proximal gradient descent for nonconvex nonsmooth block-regularized problems, SIAM J. Optim. 29 (2019), no. 3, 2053–2078. MR 3986564, DOI 10.1137/17M1142624
- Armin Pruessner and Dianne P. O’Leary, Blind deconvolution using a regularized structured total least norm algorithm, SIAM J. Matrix Anal. Appl. 24 (2003), no. 4, 1018–1037. MR 2003319, DOI 10.1137/S0895479801395446
- Herbert Robbins and Sutton Monro, A stochastic approximation method, Ann. Math. Statistics 22 (1951), 400–407. MR 42668, DOI 10.1214/aoms/1177729586
- H. Robbins and D. Siegmund, A convergence theorem for non negative almost supermartingales and some applications, Optimizing methods in statistics (Proc. Sympos., Ohio State Univ., Columbus, Ohio, 1971) Academic Press, New York-London, 1971, pp. 233–257. MR 343355
- R. Tyrrell Rockafellar and R. J. Wets, Variational Analysis, Springer, 2009.
- Hayden Schaeffer and Stanley Osher, A low patch-rank interpretation of texture, SIAM J. Imaging Sci. 6 (2013), no. 1, 226–262. MR 3032953, DOI 10.1137/110854989
- Otmar Scherzer, Markus Grasmair, Harald Grossauer, Markus Haltmeier, and Frank Lenzen, Variational methods in imaging, Applied Mathematical Sciences, vol. 167, Springer, New York, 2009. MR 2455620
- Min Tao and Xiaoming Yuan, Recovering low-rank and sparse components of matrices from incomplete and noisy observations, SIAM J. Optim. 21 (2011), no. 1, 57–81. MR 2765489, DOI 10.1137/100781894
- P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl. 109 (2001), no. 3, 475–494. MR 1835069, DOI 10.1023/A:1017501703105
- Paul Tseng and Sangwoon Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program. 117 (2009), no. 1-2, Ser. B, 387–423. MR 2421312, DOI 10.1007/s10107-007-0170-0
- Stephen A. Vavasis, On the complexity of nonnegative matrix factorization, SIAM J. Optim. 20 (2009), no. 3, 1364–1377. MR 2587731, DOI 10.1137/070709967
- Stephen J. Wright, Coordinate descent algorithms, Math. Program. 151 (2015), no. 1, Ser. B, 3–34. MR 3347548, DOI 10.1007/s10107-015-0892-3
Bibliographic Information
- Zehui Jia
- Affiliation: School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, People’s Republic of China
- MR Author ID: 1037600
- Email: jiazehui90@126.com
- Wenxing Zhang
- Affiliation: School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 611731, People’s Republic of China
- ORCID: 0000-0003-2250-4309
- Email: zhangwx@uestc.edu.cn
- Xingju Cai
- Affiliation: School of Mathematical Sciences, Nanjing Normal University, Nanjing, People’s Republic of China
- MR Author ID: 729080
- ORCID: 0000-0002-6957-477X
- Email: caixingju@njnu.edu.cn
- Deren Han
- Affiliation: School of Mathematical Sciences, Beihang University, Beijing, People’s Republic of China
- MR Author ID: 664477
- ORCID: 0000-0003-2250-4309
- Email: handr@buaa.edu.cn
- Received by editor(s): March 26, 2022
- Received by editor(s) in revised form: February 9, 2023
- Published electronically: March 8, 2024
- Additional Notes: The first author was supported by the NSFC grant 11801279. The second author was supported by the NSFC grant 11971003. The third author was supported by the NSFC grant 11871279. The fourth author was supported by the NSFC grants 12131004, 12126603.
The fourth author is the corresponding author - © Copyright 2024 American Mathematical Society
- Journal: Math. Comp. 93 (2024), 1677-1714
- MSC (2020): Primary 90C25, 65K10, 94A08, 68W10
- DOI: https://doi.org/10.1090/mcom/3867
- MathSciNet review: 4730246