Sparse Cholesky factorization for solving nonlinear PDEs via Gaussian processes
- by Yifan Chen, Houman Owhadi, and Florian Schäfer
- Math. Comp.
- DOI: https://doi.org/10.1090/mcom/3992
- Published electronically: June 14, 2024
Abstract:
In recent years, machine learning-based approaches to automating the solution of partial differential equations (PDEs) have been widely adopted. Among these approaches, Gaussian processes (GPs) and kernel methods have garnered considerable interest due to their flexibility, robust theoretical guarantees, and close ties to traditional methods. They transform the solution of general nonlinear PDEs into quadratic optimization problems with nonlinear, PDE-induced constraints. However, the complexity bottleneck lies in computing with dense kernel matrices whose entries are pointwise evaluations of the covariance kernel and its partial derivatives; the derivative entries arise from the PDE constraints, and fast algorithms for such matrices are scarce.
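To make the bottleneck concrete, the sketch below assembles such a kernel matrix for a one-dimensional model problem in which the measurements are pointwise values and second derivatives of a GP with a Gaussian (squared-exponential) kernel, so the matrix mixes evaluations of the kernel with its derivatives. The one-dimensional setting, the Gaussian kernel, and the length scale `s` are illustrative assumptions for this sketch, not the paper's exact setup.

```python
import numpy as np

def gaussian_kernel_blocks(x, s=0.2):
    """Dense kernel matrix for pointwise and second-derivative
    measurements of a GP with the 1D Gaussian kernel
    k(x, y) = exp(-(x - y)^2 / (2 s^2)).

    Returns the 2N x 2N block matrix
        Theta = [[ K,     K_yy   ],
                 [ K_xx,  K_xxyy ]],
    where K[i, j] = k(x_i, x_j), K_yy[i, j] = d^2k/dy^2 (x_i, x_j), etc.
    Since k depends only on r = x - y, every derivative block reduces
    to a derivative of k in r.
    """
    r = x[:, None] - x[None, :]
    t2 = (r / s) ** 2
    k = np.exp(-t2 / 2.0)
    k2 = k * (t2 - 1.0) / s**2                  # d^2k/dx^2 = d^2k/dy^2 = k''(r)
    k4 = k * (t2**2 - 6.0 * t2 + 3.0) / s**4    # d^2/dx^2 d^2/dy^2 k = k''''(r)
    return np.block([[k, k2], [k2, k4]])

x = np.linspace(0.0, 1.0, 200)
Theta = gaussian_kernel_blocks(x)   # fully dense (2N x 2N), N = 200
print(Theta.shape)                  # (400, 400)
```

Every entry of this matrix is nonzero, so storing and factoring it directly costs $O(N^2)$ memory and $O(N^3)$ time; this is the dense scaling that the paper's sparse factorization avoids.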
The primary goal of this paper is to provide a near-linear complexity algorithm for working with such kernel matrices. We present a sparse Cholesky factorization algorithm for these matrices based on the near-sparsity of the Cholesky factor under a novel ordering of pointwise and derivative measurements. The near-sparsity is rigorously justified by directly connecting the factor to GP regression and exponential decay of basis functions in numerical homogenization. We then employ the Vecchia approximation of GPs, which is optimal in the Kullback-Leibler divergence, to compute the approximate factor. This enables us to compute $\epsilon$-approximate inverse Cholesky factors of the kernel matrices with complexity $O(N\log ^d(N/\epsilon ))$ in space and $O(N\log ^{2d}(N/\epsilon ))$ in time. We integrate sparse Cholesky factorizations into optimization algorithms to obtain fast solvers of the nonlinear PDE. We numerically illustrate our algorithm’s near-linear space/time complexity for a broad class of nonlinear PDEs such as the nonlinear elliptic, Burgers, and Monge-Ampère equations. In summary, we provide a fast, scalable, and accurate method for solving general PDEs with GPs and kernel methods.
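As a minimal illustration of the Vecchia/KL-minimization step, the sketch below implements the columnwise formula for the KL-optimal sparse inverse Cholesky factor: given a lower-triangular sparsity pattern, each column of $L$ solves a small dense linear system on its sparsity set, and $(LL^\top)^{-1}$ approximates the kernel matrix $\Theta$. The construction of the ordering and sparsity pattern (the paper's novel ordering of pointwise and derivative measurements) is omitted, and `sparsity` is a hypothetical input; this is a sketch of the KL-minimization formula, not the authors' implementation.

```python
import numpy as np

def kl_optimal_inverse_cholesky(Theta, sparsity):
    """Columnwise KL-optimal sparse factor L with (L L^T)^{-1} ~ Theta.

    sparsity[i] lists the row indices j >= i allowed in column i, with
    i itself first.  Each column only touches the small submatrix
    Theta[s, s], so the columns are independent and parallelizable.
    """
    N = Theta.shape[0]
    L = np.zeros((N, N))
    for i in range(N):
        s = sparsity[i]                         # s[0] == i
        e1 = np.zeros(len(s))
        e1[0] = 1.0
        c = np.linalg.solve(Theta[np.ix_(s, s)], e1)
        L[s, i] = c / np.sqrt(c[0])             # column of the factor
    return L

# Sanity check: with the full lower-triangular pattern the formula is
# exact, i.e. L L^T equals Theta^{-1}.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
Theta = A @ A.T + 6.0 * np.eye(6)               # small SPD test matrix
pattern = [list(range(i, 6)) for i in range(6)]
L = kl_optimal_inverse_cholesky(Theta, pattern)
print(np.allclose(L @ L.T, np.linalg.inv(Theta)))  # True
```

With sparsity sets of size $m$, the naive loop above costs $O(Nm^3)$ time; the paper attains the stated $O(N\log^{2d}(N/\epsilon))$ time and $O(N\log^d(N/\epsilon))$ space by taking $m = O(\log^d(N/\epsilon))$ under its ordering and, as in the cited KL-minimization approach, aggregating nearby columns so that they share submatrix factorizations.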
Bibliographic Information
- Yifan Chen
- Affiliation: Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
- MR Author ID: 1291574
- ORCID: 0000-0001-5494-4435
- Email: yifan.chen@nyu.edu
- Houman Owhadi
- Affiliation: Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125
- MR Author ID: 695100
- ORCID: 0000-0002-5677-1600
- Email: owhadi@caltech.edu
- Florian Schäfer
- Affiliation: School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332
- ORCID: 0000-0002-4891-0172
- Email: florian.schaefer@cc.gatech.edu
- Received by editor(s): April 14, 2023
- Received by editor(s) in revised form: March 8, 2024
- Published electronically: June 14, 2024
- Additional Notes: The first and second authors were supported by the Air Force Office of Scientific Research under MURI award number FA9550-20-1-0358 (Machine Learning and Physics-Based Modeling and Simulation). The first author was partly supported by NSF grant DMS-2205590. The second author was supported by the Department of Energy under award number DE-SC0023163 (SEA-CROGS: Scalable, Efficient and Accelerated Causal Reasoning Operators, Graphs and Spikes for Earth and Embedded Systems). The third author was supported by the Office of Naval Research under award number N00014-23-1-2545 (Untangling Computation).
- The first author is the corresponding author
- © Copyright 2024 American Mathematical Society
- Journal: Math. Comp.
- MSC (2020): Primary 65F08, 60G15, 65N75, 65M75, 65F50, 68W40
- DOI: https://doi.org/10.1090/mcom/3992