Sampling-based methods for multi-block optimization problems over transport polytopes
HTML articles powered by AMS MathViewer
- by Yukuan Hu, Mengyu Li, Xin Liu and Cheng Meng;
- Math. Comp. 94 (2025), 1281-1322
- DOI: https://doi.org/10.1090/mcom/3989
- Published electronically: August 15, 2024
- HTML | PDF | Request permission
Abstract:
This paper focuses on multi-block optimization problems over transport polytopes, which underlie various applications including strongly correlated quantum physics and machine learning. Conventional block coordinate descent-type methods for the general multi-block problems store and operate on the matrix variables directly, resulting in formidable expenditure for large-scale settings. On the other hand, optimal transport problems, as a special case, have attracted extensive attention and numerical techniques that waive the use of the full matrices have recently emerged. However, it remains nontrivial to apply these techniques to the multi-block, possibly nonconvex problems with theoretical guarantees. In this work, we leverage the benefits of both sides and develop novel sampling-based block coordinate descent-type methods, which are equipped with either entropy regularization or Kullback-Leibler divergence. Each iteration of these methods solves subproblems restricted on the sampled degrees of freedom. Consequently, they involve only sparse matrices, which amounts to considerable complexity reductions. We explicitly characterize the sampling-induced errors and establish convergence and asymptotic properties for the methods equipped with the entropy regularization. Numerical experiments on typical strongly correlated electron systems corroborate their superior scalability over the methods utilizing full matrices. The advantage also enables the first visualization of approximate optimal transport maps between electron positions in three-dimensional contexts.References
- D. Achlioptas, Z. S. Karnin, and E. Liberty, Near-optimal Entrywise Sampling for Data Matrices, Advances in Neural Information Processing Systems, Vol. 26, Curran Associates, Inc., 2013, pp. 1565–1573.
- Dimitris Achlioptas and Frank McSherry, Fast computation of low-rank matrix approximations, J. ACM 54 (2007), no. 2, Art. 9, 19. MR 2295993, DOI 10.1145/1219092.1219097
- Masoud Ahookhosh, Le Thi Khanh Hien, Nicolas Gillis, and Panagiotis Patrinos, Multi-block Bregman proximal alternating linearized minimization and its application to orthogonal nonnegative matrix factorization, Comput. Optim. Appl. 79 (2021), no. 3, 681–715. MR 4274279, DOI 10.1007/s10589-021-00286-3
- Mingyao Ai, Fei Wang, Jun Yu, and Huiming Zhang, Optimal subsampling for large-scale quantile regression, J. Complexity 62 (2021), Paper No. 101512, 25. MR 4174536, DOI 10.1016/j.jco.2020.101512
- Aurélien Alfonsi, Rafaël Coyaud, and Virginie Ehrlacher, Constrained overdamped Langevin dynamics for symmetric multimarginal optimal transportation, Math. Models Methods Appl. Sci. 32 (2022), no. 3, 403–455. MR 4404218, DOI 10.1142/S0218202522500105
- Aurélien Alfonsi, Rafaël Coyaud, Virginie Ehrlacher, and Damiano Lombardi, Approximation of optimal transport problems with marginal moments constraints, Math. Comp. 90 (2021), no. 328, 689–737. MR 4194160, DOI 10.1090/mcom/3568
- J. Altschuler, F. Bach, A. Rudi, and J. Niles-Weed, Massively Scalable Sinkhorn Distances via the Nyström Method, Advances in Neural Information Processing Systems, Vol. 32, Curran Associates, Inc., 2019, pp. 4427–4437.
- M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein Generative Adversarial Networks, Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 70, PMLR, 2017, pp. 214–223.
- Amir Beck, Edouard Pauwels, and Shoham Sabach, The cyclic block conditional gradient method for convex optimization problems, SIAM J. Optim. 25 (2015), no. 4, 2024–2049. MR 3404688, DOI 10.1137/15M1008397
- J.-D. Benamou, Y. Brenier, and K. Guittet, The Monge-Kantorovitch mass transfer and its computational fluid mechanics formulation, Internat. J. Numer. Methods Fluids 40 (2002), no. 1-2, 21–30. ICFD Conference on Numerical Methods for Fluid Dynamics (Oxford, 2001). MR 1928314, DOI 10.1002/fld.264
- Roland Glowinski and Stanley J. Osher (eds.), Splitting methods in communication, imaging, science, and engineering, Scientific Computation, Springer, Cham, 2016. MR 3587821, DOI 10.1007/978-3-319-41589-5
- Dimitri P. Bertsekas, Nonlinear programming, 3rd ed., Athena Scientific Optimization and Computation Series, Athena Scientific, Belmont, MA, 2016. MR 3587371
- J. Bigot and T. Klein, Consistent estimation of a population barycenter in the Wasserstein space, arXiv:1212.2562 (2012).
- Jérôme Bolte, Shoham Sabach, and Marc Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Program. 146 (2014), no. 1-2, Ser. A, 459–494. MR 3232623, DOI 10.1007/s10107-013-0701-9
- A. Borzì and U. Hohenester, Multigrid optimization schemes for solving Bose-Einstein condensate control problems, SIAM J. Sci. Comput. 30 (2007/08), no. 1, 441–462. MR 2377449, DOI 10.1137/070686135
- G. Braun, A. Carderera, C. W. Combettes, H. Hassani, A. Karbasi, A. Mokhtari, and S. Pokutta, Conditional gradient methods, arXiv:2211.14103 (2022).
- V. Braverman, R. Krauthgamer, A. R. Krishnan, and S. Sapir, Near-optimal Entrywise Sampling of Numerically Sparse Matrices, Proceedings of the 34th Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 134, PMLR, 2021, pp. 759–773.
- L. M. Brègman, A relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming, Ž. Vyčisl. Mat i Mat. Fiz. 7 (1967), 620–631 (Russian). MR 215617
- Yann Brenier, A homogenized model for vortex sheets, Arch. Rational Mech. Anal. 138 (1997), no. 4, 319–353. MR 1467558, DOI 10.1007/s002050050044
- G. Buttazzo, L. De Pascale, and P. Gori-Giorgi, Optimal-transport formulation of electronic density-functional theory, Phys. Rev. A 85 (2012), no. 6, 062502., DOI 10.1103/PhysRevA.85.062502
- Guillaume Carlier, Adam Oberman, and Edouard Oudet, Numerical methods for matching for teams and Wasserstein barycenters, ESAIM Math. Model. Numer. Anal. 49 (2015), no. 6, 1621–1642. MR 3423268, DOI 10.1051/m2an/2015033
- Caihua Chen, Min Li, Xin Liu, and Yinyu Ye, Extended ADMM and BCD for nonseparable convex minimization models with quadratic coupling terms: convergence analysis and insights, Math. Program. 173 (2019), no. 1-2, Ser. A, 37–77. MR 3904457, DOI 10.1007/s10107-017-1205-9
- H. Chen, G. Friesecke, and C. B. Mendl, Numerical methods for a Kohn-Sham density functional model based on optimal transport, J. Chem. Theory Comput. 10 (2014), no. 10, 4360–4368., DOI 10.1021/ct500586q
- Jingrun Chen and Carlos J. García-Cervera, An efficient multigrid strategy for large-scale molecular mechanics optimization, J. Comput. Phys. 342 (2017), 29–42. MR 3649263, DOI 10.1016/j.jcp.2017.04.035
- Maria Colombo, Luigi De Pascale, and Simone Di Marino, Multimarginal optimal transport maps for one-dimensional repulsive costs, Canad. J. Math. 67 (2015), no. 2, 350–368. MR 3314838, DOI 10.4153/CJM-2014-011-x
- Codina Cotar, Gero Friesecke, and Claudia Klüppelberg, Density functional theory and optimal transportation with Coulomb cost, Comm. Pure Appl. Math. 66 (2013), no. 4, 548–599. MR 3020313, DOI 10.1002/cpa.21437
- M. Cuturi, Sinkhorn Distances: Lightspeed Computation of Optimal Transport, Advances in Neural Information Processing Systems, Vol. 26, Curran Associates, Inc., 2013, pp. 2292–2300.
- M. Cuturi and A. Doucet, Fast Computation of Wasserstein Barycenters, Proceedings of the 31st International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 32, PMLR, 2014, pp. 685–693.
- E. Dagotto, Complexity in strongly correlated electronic systems, Science 309 (2005), no. 5732, 257–262., DOI 10.1126/science.1107559
- S. Di Marino and A. Gerolin, Optimal transport losses and Sinkhorn algorithm with general convex regularization, arXiv:2007.00976 (2020).
- Derek Driggs, Junqi Tang, Jingwei Liang, Mike Davies, and Carola-Bibiane Schönlieb, A stochastic proximal alternating minimization for nonsmooth and nonconvex optimization, SIAM J. Imaging Sci. 14 (2021), no. 4, 1932–1970. MR 4354998, DOI 10.1137/20M1387213
- Petros Drineas and Anastasios Zouzias, A note on element-wise matrix sparsification via a matrix-valued Bernstein inequality, Inform. Process. Lett. 111 (2011), no. 8, 385–389. MR 2760960, DOI 10.1016/j.ipl.2011.01.010
- P. Dvurechensky, A. Gasnikov, and A. Kroshnin, Computational Optimal Transport: Complexity by Accelerated Gradient Descent is Better than by Sinkhorn’s Algorithm, Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 80, PMLR, 2018, pp. 1367–1376.
- V. Elvira and L. Martino, Advances in importance sampling, Wiley Statist. Ref. Stat. Ref. Online , posted on (2021), 1–14., DOI 10.1002/9781118445112.stat08284
- Olivier Fercoq and Peter Richtárik, Optimization in high dimensions via accelerated, parallel, and proximal coordinate descent, SIAM Rev. 58 (2016), no. 4, 739–771. MR 3567606, DOI 10.1137/16M1085905
- M. Filatov, Spin-restricted ensemble-referenced Kohn-Sham method: Basic principles and application to strongly correlated ground and excited states of molecules, Wiley Interdiscip. Rev. Comput. Mol. Sci. 5 (2015), no. 1, 146–167., DOI 10.1002/wcms.1209
- G. Friesecke, A. Gerolin, and P. Gori-Giorgi, The strong-interaction limit of density functional theory, Density Functional Theory: Modeling, Mathematical Analysis, Computational Methods, and Applications, Mathematics and Molecular Modeling, Springer, 2023, pp. 183–266., DOI 10.1007/978-3-031-22340-2_4
- Gero Friesecke, Andreas S. Schulz, and Daniela Vögler, Genetic column generation: fast computation of high-dimensional multimarginal optimal transport problems, SIAM J. Sci. Comput. 44 (2022), no. 3, A1632–A1654. MR 4443671, DOI 10.1137/21M140732X
- X. Geng, Label distribution learning, IEEE Trans. Knowl. Data Eng. 28 (2016), no. 7, 1734–1748., DOI 10.1109/TKDE.2016.2545658
- Johannes Hertrich and Gabriele Steidl, Inertial stochastic PALM and applications in machine learning, Sampl. Theory Signal Process. Data Anal. 20 (2022), no. 1, Paper No. 4, 33. MR 4414488, DOI 10.1007/s43670-022-00021-x
- Bamdad Hosseini and Stefan Steinerberger, Intrinsic sparsity of Kantorovich solutions, C. R. Math. Acad. Sci. Paris 360 (2022), 1173–1175 (English, with English and French summaries). MR 4491851, DOI 10.5802/crmath.392
- Yukuan Hu, Huajie Chen, and Xin Liu, A global optimization approach for multimarginal optimal transport problems with Coulomb cost, SIAM J. Sci. Comput. 45 (2023), no. 3, A1214–A1238. MR 4599325, DOI 10.1137/21M1455164
- Yukuan Hu and Xin Liu, The convergence properties of infeasible inexact proximal alternating linearized minimization, Sci. China Math. 66 (2023), no. 10, 2385–2410. MR 4646978, DOI 10.1007/s11425-022-2074-7
- Y. Hu and X. Liu, The exactness of the $\ell _1$ penalty function for a class of mathematical programs with generalized complementarity constraints, Fundam. Res. , posted on (2023), in press., DOI 10.1016/j.fmre.2023.04.006
- L. Kantorovitch, On the translocation of masses, C. R. (Doklady) Acad. Sci. URSS (N.S.) 37 (1942), 199–201. MR 9619
- Tanguy Kerdoncuff, Rémi Emonet, and Marc Sebban, Sampled Gromov Wasserstein, Mach. Learn. 110 (2021), no. 8, 2151–2186. MR 4301115, DOI 10.1007/s10994-021-06035-1
- Yuehaw Khoo, Lin Lin, Michael Lindsey, and Lexing Ying, Semidefinite relaxation of multimarginal optimal transport for strictly correlated electrons in second quantization, SIAM J. Sci. Comput. 42 (2020), no. 6, B1462–B1489. MR 4181101, DOI 10.1137/20M1310977
- Yuehaw Khoo and Lexing Ying, Convex relaxation approaches for strictly correlated density functional theory, SIAM J. Sci. Comput. 41 (2019), no. 4, B773–B795. MR 3984313, DOI 10.1137/18M1207478
- S. Kullback and R. A. Leibler, On information and sufficiency, Ann. Math. Statistics 22 (1951), 79–86. MR 39968, DOI 10.1214/aoms/1177729694
- Abhisek Kundu, Petros Drineas, and Malik Magdon-Ismail, Recovering PCA and sparse PCA via hybrid-$(\ell _1,\ell _2)$ sparse sampling of data elements, J. Mach. Learn. Res. 18 (2017), Paper No. 75, 34. MR 3714238
- S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher, Block-coordinate Frank-Wolfe Optimization for Structural SVMs, Proceedings of the 30th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 28, PMLR, 2013, pp. 53–61.
- Ching-pei Lee, Accelerating inexact successive quadratic approximation for regularized optimization through manifold identification, Math. Program. 201 (2023), no. 1-2, Ser. A, 599–633. MR 4620237, DOI 10.1007/s10107-022-01916-2
- Mengyu Li, Jun Yu, Tao Li, and Cheng Meng, Importance sparsification for Sinkhorn algorithm, J. Mach. Learn. Res. 24 (2023), Paper No. [247], 44. MR 4633974
- Mengyu Li, Jun Yu, Hongteng Xu, and Cheng Meng, Efficient approximation of Gromov-Wasserstein distance using importance sparsification, J. Comput. Graph. Statist. 32 (2023), no. 4, 1512–1523. MR 4669265, DOI 10.1080/10618600.2023.2165500
- Q. Li, Z. Zhu, G. Tang, and M. B. Wakin, Provable Bregman-divergence based methods for nonconvex and non-Lipschitz problems, arXiv:1904.09712 (2019).
- J. S. Liu, Metropolized independent sampling with comparisons to rejection sampling and importance sampling, Stat. Comput. 6 (1996), no. 2, 113–119., DOI 10.1007/BF00162521
- Jun S. Liu, Monte Carlo strategies in scientific computing, Springer Series in Statistics, Springer, New York, 2008. MR 2401592
- Jialin Liu, Wotao Yin, Wuchen Li, and Yat Tin Chow, Multilevel optimal transport: a fast approximation of Wasserstein-1 distances, SIAM J. Sci. Comput. 43 (2021), no. 1, A193–A220. MR 4198574, DOI 10.1137/18M1219813
- Zhi-Quan Luo and Paul Tseng, On the convergence rate of dual ascent methods for linearly constrained convex minimization, Math. Oper. Res. 18 (1993), no. 4, 846–867. MR 1251683, DOI 10.1287/moor.18.4.846
- M. Ma, X. Wang, Y. Duan, S. H. Frey, and X. Gu, Optimal mass transport based brain morphometry for patients with congenital hand deformities, Vis. Comput. 35 (2019), 1311–1325., DOI 10.1007/s00371-018-1543-5
- P. Ma, M. Mahoney, and B. Yu, A Statistical Perspective on Algorithmic Leveraging, Proceedings of the 31st International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 32, PMLR, 2014, pp. 91–99.
- C. B. Mendl and L. Lin, Kantorovich dual solution for strictly correlated electrons in atoms and molecules, Phys. Rev. B 87 (2013), no. 12, 125106., DOI 10.1103/PhysRevB.87.125106
- C. Meng, Y. Ke, J. Zhang, M. Zhang, W. Zhong, and P. Ma, Large-scale Optimal Transport Map Estimation using Projection Pursuit, Advances in Neural Information Processing Systems, Vol. 32, Curran Associates, Inc., 2019, pp. 8118–8129.
- G. Monge, Mémoire sur la théorie des déblais et des remblais, Hist. Acad. R. Sci. (1781), 666–704.
- A. B. Owen, Monte Carlo Theory, Methods and Examples, Stanford University, 2013.
- O. Pele and M. Werman, Fast and Robust Earth Mover’s Distances, IEEE 12th International Conference on Computer Vision, IEEE, 2009, pp. 460–467., DOI 10.1109/ICCV.2009.5459199
- G. Peyré and M. Cuturi, Computational optimal transport: With applications to data science, Found. Trends Mach. Learn. 11 (2019), no. 5-6, 355–607., DOI 10.1561/2200000073
- Y. Rubner, L. J. Guibas, and C. Tomasi, The Earth Mover’s Distance, Multi-Dimensional Scaling, and Color-Based Image Retrieval, Proceedings of the ARPA Image Understanding Workshop, ARPA, 1997, pp. 661–668.
- M. Seidl, Strong-interaction limit of density-functional theory, Phys. Rev. A 60 (1999), no. 6, 4387., DOI 10.1103/PhysRevA.60.4387
- M. Seidl, J. P. Perdew, and S. Kurth, Simulation of all-order density-functional perturbation theory, using the second order and the strong-correlation limit, Phys. Rev. Lett. 84 (2000), no. 22, 5070., DOI 10.1103/PhysRevLett.84.5070
- M. Seidl, J. P. Perdew, and M. Levy, Strictly correlated electrons in density-functional theory, Phys. Rev. A 59 (1999), no. 1, 51., DOI 10.1103/PhysRevA.59.51
- C. E. Shannon, A mathematical theory of communication, Bell System Tech. J. 27 (1948), 379–423, 623–656. MR 26286, DOI 10.1002/j.1538-7305.1948.tb01338.x
- Richard Sinkhorn and Paul Knopp, Concerning nonnegative matrices and doubly stochastic matrices, Pacific J. Math. 21 (1967), 343–348. MR 210731, DOI 10.2140/pjm.1967.21.343
- Ruoyu Sun, Zhi-Quan Luo, and Yinyu Ye, On the efficiency of random permutation for ADMM and coordinate descent, Math. Oper. Res. 45 (2020), no. 1, 233–271. MR 4066996, DOI 10.1287/moor.2019.0990
- Cédric Villani, Topics in optimal transportation, Graduate Studies in Mathematics, vol. 58, American Mathematical Society, Providence, RI, 2003. MR 1964483, DOI 10.1090/gsm/058
- H. Wang and J. Zou, A Comparative Study on Sampling with Replacement vs Poisson Sampling in Optimal Subsampling, Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 130, PMLR, 2021, pp. 289–297.
- Jing Wang, Jiahui Zou, and HaiYing Wang, Sampling with replacement vs Poisson sampling: a comparative study in optimal subsampling, IEEE Trans. Inform. Theory 68 (2022), no. 10, 6605–6630. MR 4502557, DOI 10.1109/tit.2022.3176955
- Stephen J. Wright, Coordinate descent algorithms, Math. Program. 151 (2015), no. 1, Ser. B, 3–34. MR 3347548, DOI 10.1007/s10107-015-0892-3
- Q. Xia and T. Shi, A cascadic multilevel optimization algorithm for the design of composite structures with curvilinear fiber based on Shepard interpolation, Compos. Struct. 188 (2018), 209–219., DOI 10.1016/j.compstruct.2018.01.013
- Y. Xie, X. Wang, R. Wang, and H. Zha, A Fast Proximal Point Method for Computing Exact Wasserstein Distance, Proceedings of the 35th Uncertainty in Artificial Intelligence Conference, Jul. 22-25, Proceedings of Machine Learning Research, vol. 115, PMLR, 2020, pp. 433–453.
- L. Xu, H. Sun, and Y. Liu, Learning with Batch-wise Optimal Transport Loss for 3D Shape Recognition, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3333–3342.
- Lei Yang and Kim-Chuan Toh, Bregman proximal point algorithm revisited: a new inexact version and its inertial variant, SIAM J. Optim. 32 (2022), no. 3, 1523–1554. MR 4451317, DOI 10.1137/20M1360748
- Jun Yu, HaiYing Wang, Mingyao Ai, and Huiming Zhang, Optimal distributed subsampling for maximum quasi-likelihood estimators with massive data, J. Amer. Statist. Assoc. 117 (2022), no. 537, 265–276. MR 4399084, DOI 10.1080/01621459.2020.1773832
- P. Zhao and Z. Zhou, Label Distribution Learning by Optimal Transport, Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32, AAAI Press, 2018, pp. 4506–4513., DOI 10.1609/aaai.v32i1.11609
Bibliographic Information
- Yukuan Hu
- Affiliation: State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, People’s Republic of China; and University of Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
- MR Author ID: 1563353
- ORCID: 0000-0002-5372-3814
- Email: ykhu@lsec.cc.ac.cn
- Mengyu Li
- Affiliation: Institute of Statistics and Big Data, Renmin University of China, Beijing 100872, People’s Republic of China
- MR Author ID: 1511457
- ORCID: 0000-0002-5286-7525
- Email: limengyu516@ruc.edu.cn
- Xin Liu
- Affiliation: State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, People’s Republic of China; and University of Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
- Email: liuxin@lsec.cc.ac.cn
- Cheng Meng
- Affiliation: Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China, Beijing 100872, People’s Republic of China
- MR Author ID: 1398867
- Email: chengmeng@ruc.edu.cn
- Received by editor(s): September 22, 2023
- Received by editor(s) in revised form: March 17, 2024, and April 19, 2024
- Published electronically: August 15, 2024
- Additional Notes: The work of the first author was supported by the National Key R&D Program of China (2020YFA0711900). The work of the second author was supported by the Outstanding Innovative Talents Cultivation Funded Programs 2021 of Renmin University of China. The work of the third author was supported in part by the National Natural Science Foundation of China (12125108, 12226008, 11991021, 11991020, 12021001, 12288201), Key Research Program of Frontier Sciences, Chinese Academy of Sciences (ZDBS-LY-7022), and CAS-Croucher Funding Scheme for Joint Laboratories “CAS AMSS-PolyU Joint Laboratory of Applied Mathematics: Nonlinear Optimization Theory, Algorithms and Applications”. The work of the fourth author was supported by Beijing Municipal Natural Science Foundation (1232019), the National Natural Science Foundation of China (12101606), and Renmin University of China research fund program for young scholars.
- © Copyright 2024 American Mathematical Society
- Journal: Math. Comp. 94 (2025), 1281-1322
- MSC (2020): Primary 52-08, 65C05, 65F50, 81V05, 90C30
- DOI: https://doi.org/10.1090/mcom/3989