Error bounds on complex floating-point multiplication with an FMA
- by Claude-Pierre Jeannerod, Peter Kornerup, Nicolas Louvet and Jean-Michel Muller
- Math. Comp. 86 (2017), 881-898
- DOI: https://doi.org/10.1090/mcom/3123
- Published electronically: July 15, 2016
Abstract:
The accuracy analysis of complex floating-point multiplication done by Brent, Percival, and Zimmermann [Math. Comp., 76:1469–1481, 2007] is extended to the case where a fused multiply-add (FMA) operation is available. Considering floating-point arithmetic with rounding to nearest and unit roundoff $u$, we show that their bound $\sqrt 5 u$ on the normwise relative error $|\widehat z/z-1|$ of a complex product $z$ can be decreased further to $2u$ when using the FMA in the most naive way. Furthermore, we prove that the term $2u$ is asymptotically optimal not only for this naive FMA-based algorithm but also for two other algorithms, which use the FMA operation as an efficient way of implementing rounding error compensation. Thus, although highly accurate in the componentwise sense, these two compensated algorithms bring no improvement to the normwise accuracy $2u$ already achieved using the FMA naively. Asymptotic optimality is established for each algorithm thanks to the explicit construction of floating-point inputs for which we prove that the normwise relative error then generated satisfies $|\widehat z/z-1| \to 2u$ as $u\to 0$. All our results hold for IEEE floating-point arithmetic, with radix $\beta$, precision $p$, and rounding to nearest; it is only assumed that underflows and overflows do not occur and that $\beta ^{p-1} \geqslant 24$.
References
- M. Baudin, Error bounds of complex arithmetic, June 2011, available at http://forge.scilab.org/upload/compdiv/files/complexerrorbounds_v0.2.pdf.
- Sylvie Boldo, Pitfalls of a full floating-point proof: example on the formal proof of the Veltkamp/Dekker algorithms, Automated reasoning, Lecture Notes in Comput. Sci., vol. 4130, Springer, Berlin, 2006, pp. 52–66. MR 2354672, DOI 10.1007/11814771_6
- Richard Brent, Colin Percival, and Paul Zimmermann, Error bounds on complex floating-point multiplication, Math. Comp. 76 (2007), no. 259, 1469–1481. MR 2299783, DOI 10.1090/S0025-5718-07-01931-X
- M. Cornea, J. Harrison, and P. T. P. Tang, Scientific Computing on Itanium®-based Systems, Intel Press, Hillsboro, OR, USA, 2002.
- T. J. Dekker, A floating-point technique for extending the available precision, Numer. Math. 18 (1971/72), 224–242. MR 299007, DOI 10.1007/BF01397083
- Nicholas J. Higham, Accuracy and stability of numerical algorithms, 2nd ed., Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002. MR 1927606, DOI 10.1137/1.9780898718027
- IEEE Computer Society, IEEE Standard for Floating-Point Arithmetic, IEEE Standard 754-2008, August 2008, available at http://ieeexplore.ieee.org/servlet/opac?punumber=4610933.
- C.-P. Jeannerod, A radix-independent error analysis of the Cornea-Harrison-Tang method, ACM Trans. Math. Software 42 (2016), no. 3, Art. 19, 20 pp.
- Claude-Pierre Jeannerod, Nicolas Louvet, and Jean-Michel Muller, Further analysis of Kahan’s algorithm for the accurate computation of $2\times 2$ determinants, Math. Comp. 82 (2013), no. 284, 2245–2264. MR 3073198, DOI 10.1090/S0025-5718-2013-02679-8
- W. Kahan, Further remarks on reducing truncation errors, Communications of the ACM 8 (1965), no. 1, 40.
- Seppo Linnainmaa, Analysis of some known methods of improving the accuracy of floating-point sums, Nordisk Tidskr. Informationsbehandling (BIT) 14 (1974), 167–202. MR 483373, DOI 10.1007/bf01932946
- Seppo Linnainmaa, Software for doubled-precision floating-point computations, ACM Trans. Math. Software 7 (1981), no. 3, 272–283. MR 630437, DOI 10.1145/355958.355960
- Ole Møller, Quasi double-precision in floating point addition, Nordisk Tidskr. Informationsbehandling (BIT) 5 (1965), 37–50. MR 181130, DOI 10.1007/bf01937505
- O. Møller, Note on quasi double-precision, Nordisk Tidskr. Informationsbehandling (BIT) 5 (1965), 251–255.
- Jean-Michel Muller, On the error of computing $ab+cd$ using Cornea, Harrison and Tang’s method, ACM Trans. Math. Software 41 (2015), no. 2, Art. 7, 8 pp. MR 3318079, DOI 10.1145/2629615
- Jean-Michel Muller, Nicolas Brisebarre, Florent de Dinechin, Claude-Pierre Jeannerod, Vincent Lefèvre, Guillaume Melquiond, Nathalie Revol, Damien Stehlé, and Serge Torres, Handbook of floating-point arithmetic, Birkhäuser Boston, Ltd., Boston, MA, 2010. MR 2568265, DOI 10.1007/978-0-8176-4705-6
- M. Pichat, Correction d’une somme en arithmétique à virgule flottante, Numer. Math. 19 (1972), 400–406 (French, with English summary). MR 324892, DOI 10.1007/BF01404922
- M. Pichat, Contributions à l’étude des erreurs d’arrondi en arithmétique à virgule flottante, Ph.D. thesis, Université Scientifique et Médicale de Grenoble, Grenoble, France, 1976.
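As an illustration of the naive FMA-based multiplication discussed in the abstract, the following Python sketch computes $(a+ib)(c+id)$ with two rounded products and two fused operations, and evaluates the normwise relative error $|\widehat z/z-1|$ from exact rationals. It is a sketch under stated assumptions, not the authors' code: the names `fma`, `cmul_fma`, and `normwise_rel_error` are hypothetical, the choice of which product is rounded first is an assumption, and the FMA is emulated with exact rational arithmetic in binary64 (signed zeros ignored) rather than taken from hardware.

```python
import math
from fractions import Fraction


def fma(x: float, y: float, z: float) -> float:
    """Emulated fused multiply-add: x*y + z with a single rounding.

    The exact value is formed with rationals; converting the Fraction to a
    float rounds once, to nearest. Assumes no overflow; the sign of a zero
    result is not tracked.
    """
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def cmul_fma(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Approximate (a + ib)(c + id) with two plain products and two FMAs.

    Assumption: bd and bc are the products rounded first; other orderings
    are possible.
    """
    p_bd = b * d               # RN(bd)
    p_bc = b * c               # RN(bc)
    re = fma(a, c, -p_bd)      # RN(ac - RN(bd))
    im = fma(a, d, p_bc)       # RN(ad + RN(bc))
    return re, im


def normwise_rel_error(a, b, c, d, re, im) -> float:
    """Return |zhat/z - 1| = |zhat - z| / |z|, with z computed exactly.

    Assumes the exact product z is nonzero.
    """
    exact_re = Fraction(a) * Fraction(c) - Fraction(b) * Fraction(d)
    exact_im = Fraction(a) * Fraction(d) + Fraction(b) * Fraction(c)
    num = (Fraction(re) - exact_re) ** 2 + (Fraction(im) - exact_im) ** 2
    den = exact_re ** 2 + exact_im ** 2
    # Only this last step (ratio to float, then square root) is inexact.
    return math.sqrt(num / den)


if __name__ == "__main__":
    u = 2.0 ** -53  # unit roundoff of binary64
    a, b, c, d = 0.1, 0.2, 0.3, 0.4
    re, im = cmul_fma(a, b, c, d)
    err = normwise_rel_error(a, b, c, d, re, im)
    print(f"zhat = {re!r} + {im!r}i, error / u = {err / u:.3f}")
```

For binary64, $u = 2^{-53}$, so the bound stated in the abstract predicts a printed ratio of at most 2 for inputs that cause no underflow or overflow.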
Bibliographic Information
- Claude-Pierre Jeannerod
- Affiliation: Inria, Laboratoire LIP (CNRS, ENS de Lyon, Inria, UCBL), Université de Lyon, 46, allée d’Italie, 69364 Lyon cedex 07, France
- MR Author ID: 644190
- Email: claude-pierre.jeannerod@inria.fr
- Peter Kornerup
- Affiliation: Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark
- Email: kornerup@imada.sdu.dk
- Nicolas Louvet
- Affiliation: UCBL, Laboratoire LIP (CNRS, ENS de Lyon, Inria, UCBL), Université de Lyon, 46, allée d’Italie, 69364 Lyon cedex 07, France
- MR Author ID: 893389
- Email: nicolas.louvet@ens-lyon.fr
- Jean-Michel Muller
- Affiliation: CNRS, Laboratoire LIP (CNRS, ENS de Lyon, Inria, UCBL), Université de Lyon, 46, allée d’Italie, 69364 Lyon cedex 07, France
- Email: jean-michel.muller@ens-lyon.fr
- Received by editor(s): September 26, 2013
- Received by editor(s) in revised form: July 25, 2014, May 15, 2015, and September 28, 2015
- Published electronically: July 15, 2016
- © Copyright 2016 American Mathematical Society
- Journal: Math. Comp. 86 (2017), 881-898
- MSC (2010): Primary 65G50
- DOI: https://doi.org/10.1090/mcom/3123
- MathSciNet review: 3584553