Skip to Main Content

Mathematics of Computation

Published by the American Mathematical Society since 1960 (published as Mathematical Tables and other Aids to Computation 1943-1959), Mathematics of Computation is devoted to research articles of the highest quality in computational mathematics.

ISSN 1088-6842 (online) ISSN 0025-5718 (print)

The 2020 MCQ for Mathematics of Computation is 1.78.

What is MCQ? The Mathematical Citation Quotient (MCQ) measures journal impact by looking at citations over a five-year period. Subscribers to MathSciNet may click through for more detailed information.

 

Stochastic subspace correction methods and fault tolerance
HTML articles powered by AMS MathViewer

by Michael Griebel and Peter Oswald HTML | PDF
Math. Comp. 89 (2020), 279-312 Request permission

Abstract:

We present convergence results in expectation for stochastic subspace correction schemes and their accelerated versions to solve symmetric positive-definite variational problems, and discuss their potential for achieving fault tolerance in an unreliable compute network. We employ an overlapping domain decomposition algorithm for PDE discretizations to discuss the latter aspect.
References
  • E. Agullo, L. Giraud, and M. Zounon, On the resilience of a parallel hybrid solver, In IEEE 22nd International Conference on High Performance Computing (HiPC), 2015, 75–84; also RR-8744, INRIA Bordeaux 2015 (hal-01165186v2).
  • Mark Ainsworth and Christian Glusa, Is the multigrid method fault tolerant? The two-grid case, SIAM J. Sci. Comput. 39 (2017), no. 2, C116–C143. MR 3633785, DOI 10.1137/16M1100691
  • Mark Ainsworth and Christian Glusa, Is the multigrid method fault tolerant? The multilevel case, SIAM J. Sci. Comput. 39 (2017), no. 6, C393–C416. MR 3719032, DOI 10.1137/16M1097274
  • M. Altenbernd and D. Göddeke, Soft fault detection and correction for multigrid, Int. J. High Perf. Comput. Appl., vol. 32:6 (2017), 897–912. DOI: 10.1177/1094342016684006
  • Randolph E. Bank and Michael Holst, A new paradigm for parallel adaptive meshing algorithms, SIAM Rev. 45 (2003), no. 2, 291–323. Reprinted from SIAM J. Sci. Comput. 22 (2000), no. 4, 1411–1443 [MR1797889]. MR 2010380, DOI 10.1137/S003614450342061
  • Olivier Fercoq and Peter Richtárik, Optimization in high dimensions via accelerated, parallel, and proximal coordinate descent, SIAM Rev. 58 (2016), no. 4, 739–771. MR 3567606, DOI 10.1137/16M1085905
  • Z. Chen and J. Dongarra, Algorithm-based fault tolerance for fail-stop failures, IEEE Transactions on Parallel and Distributed Systems 19:12 (2008), 1628–1641.
  • Tao Cui, Jinchao Xu, and Chen-Song Zhang, An error-resilient redundant subspace correction method, Comput. Vis. Sci. 18 (2017), no. 2-3, 65–77. MR 3603807, DOI 10.1007/s00791-016-0270-6
  • Aymeric Dieuleveut and Francis Bach, Nonparametric stochastic approximation with large step-sizes, Ann. Statist. 44 (2016), no. 4, 1363–1399. MR 3519927, DOI 10.1214/15-AOS1391
  • Michael Griebel and Peter Oswald, Greedy and randomized versions of the multiplicative Schwarz method, Linear Algebra Appl. 437 (2012), no. 7, 1596–1610. MR 2946344, DOI 10.1016/j.laa.2012.04.052
  • Michael Griebel and Peter Oswald, Stochastic subspace correction in Hilbert space, Constr. Approx. 48 (2018), no. 3, 501–521. MR 3869450, DOI 10.1007/s00365-018-9447-1
  • M. Griebel and P. Oswald, Stochastic subspace correction methods and fault tolerance, INS Preprint No. 1809, Bonn Univ., 2018; arXiv:1807.11315 (July 30, 2018).
  • Michael Griebel, Michael Schneider, and Christoph Zenger, A combination technique for the solution of sparse grid problems, Iterative methods in linear algebra (Brussels, 1991) North-Holland, Amsterdam, 1992, pp. 263–281. MR 1159736
  • Michael Griebel and Gerhard Zumbusch, Parallel adaptive subspace correction schemes with applications to elasticity, Comput. Methods Appl. Mech. Engrg. 184 (2000), no. 2-4, 303–332. Vistas in domain decomposition and parallel processing in computational mechanics. MR 1764193, DOI 10.1016/S0045-7825(99)00233-9
  • Markus Huber, Björn Gmeiner, Ulrich Rüde, and Barbara Wohlmuth, Resilience for massively parallel multigrid solvers, SIAM J. Sci. Comput. 38 (2016), no. 5, S217–S239. MR 3565560, DOI 10.1137/15M1026122
  • S. Kavila, P. Raju, S. Satapathy, A. Machiraju, G. Kinnera, and K. Rasly, A survey on fault management techniques in distributed computing, In: Proc. of Int. Conf. on Front. of Intell. Comput., AISC 199 S. Satapathy et al. (Eds.), 2013, pp. 593–602, DOI: 10.1007/978364235314767
  • David E. Keyes and William D. Gropp, A comparison of domain decomposition techniques for elliptic partial differential equations and their parallel implementation, SIAM J. Sci. Statist. Comput. 8 (1987), no. 2, S166–S202. Parallel processing for scientific computing (Norfolk, Va., 1985). MR 879402, DOI 10.1137/0908020
  • Yin Tat Lee and Aaron Sidford, Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science—FOCS 2013, IEEE Computer Soc., Los Alamitos, CA, 2013, pp. 147–156. MR 3246216, DOI 10.1109/FOCS.2013.24
  • Ji Liu and Stephen J. Wright, An accelerated randomized Kaczmarz algorithm, Math. Comp. 85 (2016), no. 297, 153–178. MR 3404446, DOI 10.1090/mcom/2971
  • Yurii Nesterov, Introductory lectures on convex optimization, Applied Optimization, vol. 87, Kluwer Academic Publishers, Boston, MA, 2004. A basic course. MR 2142598, DOI 10.1007/978-1-4419-8853-9
  • Yu. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Optim. 22 (2012), no. 2, 341–362. MR 2968857, DOI 10.1137/100802001
  • Peter Oswald, Multilevel finite element approximation, Teubner Skripten zur Numerik. [Teubner Scripts on Numerical Mathematics], B. G. Teubner, Stuttgart, 1994. Theory and applications. MR 1312165, DOI 10.1007/978-3-322-91215-2
  • Peter Oswald and Weiqi Zhou, Convergence analysis for Kaczmarz-type methods in a Hilbert space framework, Linear Algebra Appl. 478 (2015), 131–161. MR 3342418, DOI 10.1016/j.laa.2015.03.028
  • S. Pauli, P. Arbenz, and C. Schwab, Intrinsic fault tolerance of multi-level Monte Carlo methods, In Parallel Computing: Accelerating Computational Science and Engineering (CSE), M. Bader et al. (Eds.), IOS Press, 2014, pp. 471–480.
  • Boris T. Polyak, Introduction to optimization, Translations Series in Mathematics and Engineering, Optimization Software, Inc., Publications Division, New York, 1987. Translated from the Russian; With a foreword by Dimitri P. Bertsekas. MR 1099605
  • P. Richtárik and M. Takǎć, Stochastic reformulations of linear systems: Algorithms and convergence theory, arxiv:1706.01109v2 (6 Jun 2017).
  • F. Rizzi, K. Morris, K. Sargsyan, P. Mycek, C. Safta, O. Le Maître, O. Knio, and B. Debusschere, Partial differential equations preconditioner resilient to soft and hard faults, Int. J. High Perf. Comput. Appl. 32:5 (2017), 658–673, DOI: 10.1177/1094342016684975
  • Khachik Sargsyan, Francesco Rizzi, Paul Mycek, Cosmin Safta, Karla Morris, Habib Najm, Olivier Le Maître, Omar Knio, and Bert Debusschere, Fault resilient domain decomposition preconditioner for PDEs, SIAM J. Sci. Comput. 37 (2015), no. 5, A2317–A2345. MR 3395134, DOI 10.1137/15M1014474
  • Barry F. Smith, A parallel implementation of an iterative substructuring algorithm for problems in three dimensions, SIAM J. Sci. Comput. 14 (1993), no. 2, 406–423. MR 1204238, DOI 10.1137/0914025
  • M. Snir, R. Wisniewski et al., Addressing failures in exascale computing, Int. J. High Perf. Comput. Appl. 28:2 (2014), 129–173, DOI: 10.1177/1094342014522573
  • L. Stals, Algorithm-based fault recovery of adaptively refined parallel multilevel grids, Int. J. High Perf. Comput. Appl. 33:1 (2017), 189–211, DOI: 10.1177/1094342017720801
  • Miroslav Stoyanov and Clayton Webster, Numerical analysis of fixed point algorithms in the presence of hardware faults, SIAM J. Sci. Comput. 37 (2015), no. 5, C532–C553. MR 3394363, DOI 10.1137/140991406
  • Andrea Toselli and Olof Widlund, Domain decomposition methods—algorithms and theory, Springer Series in Computational Mathematics, vol. 34, Springer-Verlag, Berlin, 2005. MR 2104179, DOI 10.1007/b137868
  • M. Treaster, A survey of fault-tolerance and fault-recovery techniques in parallel systems, ACM Computing Research Repository (CoRR 501002, 1–11) (2005); arxiv:0501002v1.
  • Jinchao Xu, Iterative methods by space decomposition and subspace correction, SIAM Rev. 34 (1992), no. 4, 581–613. MR 1193013, DOI 10.1137/1034116
Similar Articles
Additional Information
  • Michael Griebel
  • Affiliation: Institute for Numerical Simulation, Universität Bonn, Endenicher Allee 19b, 53115 Bonn, Germany; and Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754 Sankt Augustin, Germany, Corresponding author, tel.: +49-228-73-69829, fax: +49-228-73-69847
  • MR Author ID: 270664
  • Email: griebel@ins.uni-bonn.de
  • Peter Oswald
  • Affiliation: Institute for Numerical Simulation, Universität Bonn, Endenicher Allee 19b, 53115 Bonn, Germany
  • Email: agp.oswald@gmail.com
  • Received by editor(s): July 16, 2018
  • Received by editor(s) in revised form: March 8, 2019
  • Published electronically: August 5, 2019
  • Additional Notes: The first author acknowledges the support from the DFG priority program 1648 “Software for Exascale Computing” within the project “EXAHD - An Exa-Scalable Two-Level Sparse Grid Approach for Higher-Dimensional Problems in Plasma Physics and Beyond”.
    The main results of this paper were obtained during a yearlong stay of the second author at the Institute for Numerical Simulation (INS) sponsored by the Hausdorff Center for Mathematics of the University of Bonn and funded by the Deutsche Forschungsgemeinschaft. He is grateful for this support.
  • © Copyright 2019 American Mathematical Society
  • Journal: Math. Comp. 89 (2020), 279-312
  • MSC (2010): Primary 65F10, 65N22, 65N55, 65Y05, 68W20
  • DOI: https://doi.org/10.1090/mcom/3459
  • MathSciNet review: 4011543