Properties of discrete sliced Wasserstein losses
- by Eloi Tanguy, Rémi Flamary and Julie Delon
- Math. Comp. 94 (2025), 1411-1465
- DOI: https://doi.org/10.1090/mcom/3994
- Published electronically: June 20, 2024
Abstract:
The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems share the same sub-problem: minimising the SW energy. In this paper we study the properties of $\mathcal {E}: Y \longmapsto \mathrm {SW}_2^2(\gamma _Y, \gamma _Z)$, i.e. the SW distance between two uniform discrete measures with the same number of points, as a function of the support $Y \in \mathbb {R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal {E}_p$ (which estimates the expectation in SW using only $p$ samples), and show convergence results for the critical points of $\mathcal {E}_p$ to those of $\mathcal {E}$, as well as an almost-sure uniform convergence and a uniform Central Limit result for the process $\mathcal {E}_p$. Finally, we show that, in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal {E}$ and $\mathcal {E}_p$ converge towards (Clarke) critical points of these energies.
References
- H. Alghamdi, M. Grogan, and R. Dahyot, Patch-based colour transfer with optimal transport, 2019 27th European Signal Processing Conference (EUSIPCO), IEEE, 2019, pp. 1–5.
- M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein generative adversarial networks, Proceedings of the 34th International Conference on Machine Learning (Doina Precup and Yee Whye Teh, eds.), Proceedings of Machine Learning Research, vol. 70, PMLR, 6–11 August 2017, pp. 214–223.
- Dario Azzimonti and David Ginsbourger, Estimating orthant probabilities of high-dimensional Gaussian vectors with an application to set estimation, J. Comput. Graph. Statist. 27 (2018), no. 2, 255–267. MR 3816262, DOI 10.1080/10618600.2017.1360781
- J. Backhoff-Veraguas, J. Fontbona, G. Rios, and F. Tobar, Stochastic gradient descent for barycenters in Wasserstein space, 2022.
- Erhan Bayraktar and Gaoyue Guo, Strong equivalence between metrics of Wasserstein type, Electron. Commun. Probab. 26 (2021), Paper No. 13, 13. MR 4236683, DOI 10.3390/mca26010013
- Pascal Bianchi, Walid Hachem, and Sholom Schechtman, Convergence of constant step stochastic gradient descent for non-smooth non-convex functions, Set-Valued Var. Anal. 30 (2022), no. 3, 1117–1147. MR 4455163, DOI 10.1007/s11228-022-00638-z
- Pascal Bianchi, Walid Hachem, and Sholom Schechtman, Stochastic subgradient descent escapes active strict saddles on weakly convex functions, Math. Oper. Res. (2023).
- Xin Bing, Florentina Bunea, and Jonathan Niles-Weed, The sketched Wasserstein distance for mixture distributions, Preprint, arXiv:2206.12768, 2022.
- Jérôme Bolte, Aris Daniilidis, Adrian Lewis, and Masahiro Shiota, Clarke subgradients of stratifiable functions, SIAM J. Optim. 18 (2007), no. 2, 556–572. MR 2338451, DOI 10.1137/060670080
- Jérôme Bolte and Edouard Pauwels, Conservative set valued fields, automatic differentiation, stochastic gradient methods and deep learning, Math. Program. 188 (2021), no. 1, Ser. A, 19–51. MR 4276581, DOI 10.1007/s10107-020-01501-5
- Nicolas Bonneel, Julien Rabin, Gabriel Peyré, and Hanspeter Pfister, Sliced and Radon Wasserstein barycenters of measures, J. Math. Imaging Vision 51 (2015), no. 1, 22–45. MR 3300482, DOI 10.1007/s10851-014-0506-3
- Nicolas Bonneel, James Tompkin, Kalyan Sunkavalli, Deqing Sun, Sylvain Paris, and Hanspeter Pfister, Blind video temporal consistency, ACM Trans. Graph. 34 (2015), no. 6, 1–9.
- Nicolas Bonnotte, Unidimensional and evolution methods for optimal transportation, Ph.D. Thesis, Paris, 2013.
- F. H. Clarke, Optimization and nonsmooth analysis, 2nd ed., Classics in Applied Mathematics, vol. 5, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990. MR 1058436, DOI 10.1137/1.9781611971309
- Marco Cuturi, Sinkhorn distances: lightspeed computation of optimal transport, Advances in Neural Information Processing Systems, vol. 26, 2013.
- Damek Davis and Dmitriy Drusvyatskiy, Proximal methods avoid active strict saddles of weakly convex functions, Found. Comput. Math. 22 (2022), no. 2, 561–606. MR 4407752, DOI 10.1007/s10208-021-09516-w
- Damek Davis, Dmitriy Drusvyatskiy, and Liwei Jiang, Active manifolds, stratifications, and convergence to local minima in nonsmooth optimization, 2023.
- Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, and Jason D. Lee, Stochastic subgradient method converges on tame functions, Found. Comput. Math. 20 (2020), no. 1, 119–154. MR 4056927, DOI 10.1007/s10208-018-09409-5
- Ishan Deshpande, Ziyu Zhang, and Alexander G. Schwing, Generative modeling using the sliced Wasserstein distance, 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, 2018, Computer Vision Foundation/IEEE Computer Society, 2018, pp. 3483–3491.
- R. M. Dudley, The speed of mean Glivenko-Cantelli convergence, Ann. Math. Statist. 40 (1969), 40–50. MR 236977, DOI 10.1214/aoms/1177697802
- Kilian Fatras, Younes Zine, Szymon Majewski, Rémi Flamary, Rémi Gribonval, and Nicolas Courty, Minibatch optimal transport distances; analysis and applications, Preprint, arXiv:2101.01792, 2021.
- Rémi Flamary, Nicolas Courty, Alexandre Gramfort, Mokhtar Z. Alaya, Aurélie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, Léo Gautheron, Nathalie T. H. Gayraud, Hicham Janati, Alain Rakotomamonjy, Ievgen Redko, Antoine Rolet, Antony Schutz, Vivien Seguy, Danica J. Sutherland, Romain Tavenard, Alexander Tong, and Titouan Vayer, POT: Python optimal transport, J. Mach. Learn. Res. 22 (2021), no. 78, 1–8.
- Aude Genevay, Gabriel Peyré, and Marco Cuturi, Learning generative models with Sinkhorn divergences, International Conference on Artificial Intelligence and Statistics, PMLR, 2018, pp. 1608–1617.
- Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville, Improved training of Wasserstein GANs, Advances in Neural Information Processing Systems, vol. 30, 2017.
- Charles R. Harris, K. Jarrod Millman, Stéfan J. Van Der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, et al., Array programming with NumPy, Nature 585 (2020), no. 7825, 357–362.
- Eric Heitz, Kenneth Vanhoey, Thomas Chambon, and Laurent Belcour, A sliced Wasserstein loss for neural texture synthesis, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9412–9420.
- Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, and Michael I. Jordan, How to escape saddle points efficiently, International Conference on Machine Learning, PMLR, 2017, pp. 1724–1732.
- Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, and Michael I. Jordan, On nonconvex optimization for machine learning: gradients, stochasticity, and saddle points, J. ACM 68 (2021), no. 2, Art. 11, 29. MR 4267060, DOI 10.1145/3418526
- Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen, Progressive growing of GANs for improved quality, stability, and variation, Preprint, arXiv:1710.10196, 2017.
- Soheil Kolouri, Phillip E. Pope, Charles E. Martin, and Gustavo K. Rohde, Sliced Wasserstein auto-encoders, International Conference on Learning Representations, 2019.
- Francis Hirsch and Gilles Lacombe, Elements of functional analysis, Graduate Texts in Mathematics, vol. 192, Springer-Verlag, New York, 1999. Translated from the 1997 French original by Silvio Levy. MR 1678925, DOI 10.1007/978-1-4612-1444-1
- Shiying Li and Caroline Moosmueller, Measure transfer via stochastic slicing and matching, 2023.
- Antoine Liutkus, Umut Simsekli, Szymon Majewski, Alain Durmus, and Fabian-Robert Stöter, Sliced-Wasserstein flows: nonparametric generative modeling via optimal transport and diffusions, Proceedings of the 36th International Conference on Machine Learning (Kamalika Chaudhuri and Ruslan Salakhutdinov, eds.), Proceedings of Machine Learning Research, vol. 97, PMLR, 9–15 June 2019, pp. 4104–4113.
- Szymon Majewski, Błażej Miasojedow, and Eric Moulines, Analysis of nonsmooth stochastic approximation: the differential inclusion approach, Preprint, arXiv:1805.01916, 2018.
- Quentin Mérigot, Filippo Santambrogio, and Clément Sarrazin, Non-asymptotic convergence bounds for Wasserstein approximation using point clouds, Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 12810–12821.
- Kimia Nadjahi, Valentin De Bortoli, Alain Durmus, Roland Badeau, and Umut Şimşekli, Approximate Bayesian computation with the sliced-Wasserstein distance, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 5470–5474.
- Kimia Nadjahi, Alain Durmus, Lénaïc Chizat, Soheil Kolouri, Shahin Shahrampour, and Umut Simsekli, Statistical and topological properties of sliced probability divergences, Advances in Neural Information Processing Systems (H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, eds.), vol. 33, Curran Associates, Inc., 2020, pp. 20802–20812.
- Kimia Nadjahi, Alain Durmus, Umut Simsekli, and Roland Badeau, Asymptotic guarantees for learning generative models with the sliced-Wasserstein distance, Advances in Neural Information Processing Systems (H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett, eds.), vol. 32, Curran Associates, Inc., 2019.
- Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala, PyTorch: an imperative style, high-performance deep learning library, Advances in Neural Information Processing Systems (H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett, eds.), vol. 32, Curran Associates, Inc., 2019.
- G. Peyré and M. Cuturi, Computational optimal transport, Found. Trends Mach. Learn. 11 (2019), no. 5-6, 355–607.
- Julien Rabin, Gabriel Peyré, Julie Delon, and Marc Bernot, Wasserstein barycenter and its application to texture mixing, Scale Space and Variational Methods in Computer Vision: Third International Conference, SSVM 2011, Ein-Gedi, Israel, May 29–June 2, 2011, Revised Selected Papers 3, Springer, 2012, pp. 435–446.
- Filippo Santambrogio, Optimal transport for applied mathematicians, Progress in Nonlinear Differential Equations and their Applications, vol. 87, Birkhäuser/Springer, Cham, 2015. Calculus of variations, PDEs, and modeling. MR 3409718, DOI 10.1007/978-3-319-20828-2
- Eloi Tanguy, Rémi Flamary, and Julie Delon, Reconstructing discrete measures from projections. Consequences on the empirical sliced Wasserstein distance, Preprint, arXiv:2304.12029, 2023.
- Guillaume Tartavel, Gabriel Peyré, and Yann Gousseau, Wasserstein loss for image synthesis and restoration, SIAM J. Imaging Sci. 9 (2016), no. 4, 1726–1755. MR 3564778, DOI 10.1137/16M1067494
- Joel A. Tropp, User-friendly tail bounds for sums of random matrices, Found. Comput. Math. 12 (2012), no. 4, 389–434. MR 2946459, DOI 10.1007/s10208-011-9099-z
- A. W. van der Vaart, Asymptotic statistics, Cambridge Series in Statistical and Probabilistic Mathematics, vol. 3, Cambridge University Press, Cambridge, 1998. MR 1652247, DOI 10.1017/CBO9780511802256
- Jean-Philippe Vial, Strong and weak convexity of sets and functions, Math. Oper. Res. 8 (1983), no. 2, 231–259. MR 707055, DOI 10.1287/moor.8.2.231
- Cédric Villani, Optimal transport, Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 338, Springer-Verlag, Berlin, 2009. Old and new. MR 2459454, DOI 10.1007/978-3-540-71050-9
- Seiichiro Wakabayashi, Remarks on semi-algebraic functions, Online Notes, January 2008.
- J. Wu, Z. Huang, D. Acharya, W. Li, J. Thoma, D. Paudel, and L. Van Gool, Sliced Wasserstein generative models, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Los Alamitos, CA, USA), IEEE Computer Society, June 2019, pp. 3708–3717.
- Jiaqi Xi and Jonathan Niles-Weed, Distributional convergence of the sliced Wasserstein process, Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 13961–13973.
- X. Xu and Z. Huang, Central limit theorem for the sliced 1-Wasserstein distance and the max-sliced 1-Wasserstein distance, Preprint, arXiv:2205.14624, 2022.
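As an aside, the Monte-Carlo energy $\mathcal {E}_p$ defined in the abstract admits a short numerical sketch: draw $p$ random directions on the unit sphere, project both point clouds onto each direction, sort the one-dimensional projections, and average the squared differences (in 1D, optimal transport between uniform measures with the same number of atoms matches sorted points). The NumPy snippet below is an illustrative sketch under these conventions, with a hypothetical function name; it is not the authors' implementation.

```python
import numpy as np

def sw2_squared_mc(Y, Z, p, rng=None):
    """Monte-Carlo estimate E_p(Y) of SW_2^2(gamma_Y, gamma_Z), where gamma_Y
    and gamma_Z are uniform discrete measures on the rows of Y and Z (n x d),
    using p random projection directions. Illustrative sketch only."""
    rng = np.random.default_rng(rng)
    n, d = Y.shape
    # Draw p directions uniformly on the unit sphere S^{d-1}
    # (normalised Gaussian vectors).
    theta = rng.standard_normal((p, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both point clouds onto each direction: shape (p, n).
    Y_proj = theta @ Y.T
    Z_proj = theta @ Z.T
    # In 1D the optimal coupling between two uniform n-point measures pairs
    # sorted projections, so W_2^2 is a mean of squared sorted differences;
    # averaging over the p directions gives the Monte-Carlo estimate.
    Y_proj.sort(axis=1)
    Z_proj.sort(axis=1)
    return np.mean((Y_proj - Z_proj) ** 2)
```

With `Y` and `Z` equal the estimate is exactly zero, and fixing the random seed makes the estimator symmetric in its two arguments, matching the behaviour expected of $\mathrm {SW}_2^2$.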
Bibliographic Information
- Eloi Tanguy
- Affiliation: Université Paris Cité, CNRS, MAP5, F-75006 Paris, France
- Email: eloi.tanguy@u-paris.fr
- Rémi Flamary
- Affiliation: CMAP, CNRS, Ecole Polytechnique, Institut Polytechnique de Paris, France
- ORCID: 0000-0002-4212-6627
- Email: remi.flamary@polytechnique.edu
- Julie Delon
- Affiliation: Université Paris Cité, CNRS, MAP5, F-75006 Paris, France
- MR Author ID: 743632
- ORCID: 0000-0002-7182-7537
- Email: julie.delon@u-paris.fr
- Received by editor(s): August 31, 2023
- Received by editor(s) in revised form: March 2, 2024, and May 13, 2024
- Published electronically: June 20, 2024
- Additional Notes: This research was funded, in part, by the Agence nationale de la recherche (ANR), through the SOCOT project (ANR-23-CE40-0017), and the PEPR PDE-AI project (ANR-23-PEIA-0004).
- © Copyright 2024 by the authors
- Journal: Math. Comp. 94 (2025), 1411-1465
- MSC (2020): Primary 49Q22
- DOI: https://doi.org/10.1090/mcom/3994