Error bounds on complex floating-point multiplication with an FMA

Jeannerod, Claude-Pierre; Kornerup, Peter; Louvet, Nicolas; Muller, Jean-Michel

doi:10.1090/mcom/3123


by Claude-Pierre Jeannerod, Peter Kornerup, Nicolas Louvet and Jean-Michel Muller

Math. Comp. 86 (2017), 881–898

Abstract:

The accuracy analysis of complex floating-point multiplication done by Brent, Percival, and Zimmermann [Math. Comp., 76:1469–1481, 2007] is extended to the case where a fused multiply-add (FMA) operation is available. Considering floating-point arithmetic with rounding to nearest and unit roundoff $u$, we show that their bound $\sqrt 5 u$ on the normwise relative error $|\widehat z/z-1|$ of a complex product $z$ can be decreased further to $2u$ when using the FMA in the most naive way. Furthermore, we prove that the term $2u$ is asymptotically optimal not only for this naive FMA-based algorithm but also for two other algorithms, which use the FMA operation as an efficient way of implementing rounding error compensation. Thus, although highly accurate in the componentwise sense, these two compensated algorithms bring no improvement to the normwise accuracy $2u$ already achieved using the FMA naively. Asymptotic optimality is established for each algorithm thanks to the explicit construction of floating-point inputs for which we prove that the normwise relative error then generated satisfies $|\widehat z/z-1| \to 2u$ as $u\to 0$. All our results hold for IEEE floating-point arithmetic, with radix $\beta$, precision $p$, and rounding to nearest; it is only assumed that underflows and overflows do not occur and that $\beta ^{p-1} \geqslant 24$.
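As an illustration of the naive FMA-based product mentioned in the abstract, here is a minimal Python sketch. It assumes binary64 inputs ($\beta = 2$, $p = 53$, $u = 2^{-53}$) and one common arrangement of the operations, namely $\mathrm{RN}(ac - \mathrm{RN}(bd))$ for the real part and $\mathrm{RN}(ad + \mathrm{RN}(bc))$ for the imaginary part; the paper's exact formulation may differ. The names `fma`, `cmul_fma` and `rel_err_sq` are hypothetical helpers introduced here, and the FMA is emulated with exact rational arithmetic rather than taken from hardware.

```python
from fractions import Fraction

# Unit roundoff of binary64 (assumption: radix 2, precision 53).
U = Fraction(1, 2**53)


def fma(x: float, y: float, z: float) -> float:
    """Emulated fused multiply-add: x*y + z with a single rounding.

    The value x*y + z is formed exactly as a Fraction; converting it to
    float then rounds once, to nearest. Finite inputs only.
    """
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def cmul_fma(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Approximate (a + ib)(c + id) with one rounded product and one FMA per part."""
    re = fma(a, c, -(b * d))  # RN(ac - RN(bd))
    im = fma(a, d, b * c)     # RN(ad + RN(bc))
    return re, im


def rel_err_sq(a: float, b: float, c: float, d: float) -> Fraction:
    """Exact squared normwise relative error |z_hat/z - 1|^2 (requires z != 0)."""
    A, B, C, D = (Fraction(v) for v in (a, b, c, d))
    re, im = A * C - B * D, A * D + B * C          # exact product z
    re_hat, im_hat = (Fraction(v) for v in cmul_fma(a, b, c, d))
    return ((re_hat - re) ** 2 + (im_hat - im) ** 2) / (re ** 2 + im ** 2)
```

Comparing `rel_err_sq(a, b, c, d)` with `(2 * U) ** 2` checks the bound $2u$ exactly for a given input, without introducing further rounding in the check itself. The sketch does not detect underflow or overflow, which the abstract assumes do not occur.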

References

M. Baudin, Error bounds of complex arithmetic, June 2011, available at http://forge.scilab.org/upload/compdiv/files/complexerrorbounds_v0.2.pdf.
Sylvie Boldo, Pitfalls of a full floating-point proof: example on the formal proof of the Veltkamp/Dekker algorithms, Automated reasoning, Lecture Notes in Comput. Sci., vol. 4130, Springer, Berlin, 2006, pp. 52–66. MR 2354672, DOI 10.1007/11814771_6
Richard Brent, Colin Percival, and Paul Zimmermann, Error bounds on complex floating-point multiplication, Math. Comp. 76 (2007), no. 259, 1469–1481. MR 2299783, DOI 10.1090/S0025-5718-07-01931-X
M. Cornea, J. Harrison, and P. T. P. Tang, Scientific Computing on Itanium®-based Systems, Intel Press, Hillsboro, OR, USA, 2002.
T. J. Dekker, A floating-point technique for extending the available precision, Numer. Math. 18 (1971/72), 224–242. MR 299007, DOI 10.1007/BF01397083
Nicholas J. Higham, Accuracy and stability of numerical algorithms, 2nd ed., Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002. MR 1927606, DOI 10.1137/1.9780898718027
IEEE Computer Society, IEEE Standard for Floating-Point Arithmetic, IEEE Standard 754-2008, August 2008, available at http://ieeexplore.ieee.org/servlet/opac?punumber=4610933.
C.-P. Jeannerod, A radix-independent error analysis of the Cornea-Harrison-Tang method, ACM Trans. Math. Software 42 (2016), no. 3, Art. 19, 20 pp.
Claude-Pierre Jeannerod, Nicolas Louvet, and Jean-Michel Muller, Further analysis of Kahan’s algorithm for the accurate computation of $2\times 2$ determinants, Math. Comp. 82 (2013), no. 284, 2245–2264. MR 3073198, DOI 10.1090/S0025-5718-2013-02679-8
W. Kahan, Further remarks on reducing truncation errors, Communications of the ACM 8 (1965), no. 1, 40.
Seppo Linnainmaa, Analysis of some known methods of improving the accuracy of floating-point sums, Nordisk Tidskr. Informationsbehandling (BIT) 14 (1974), 167–202. MR 483373, DOI 10.1007/BF01932946
Seppo Linnainmaa, Software for doubled-precision floating-point computations, ACM Trans. Math. Software 7 (1981), no. 3, 272–283. MR 630437, DOI 10.1145/355958.355960
Ole Møller, Quasi double-precision in floating point addition, Nordisk Tidskr. Informationsbehandling (BIT) 5 (1965), 37–50. MR 181130, DOI 10.1007/BF01937505
O. Møller, Note on quasi double-precision, Nordisk Tidskr. Informationsbehandling (BIT) 5 (1965), 251–255.
Jean-Michel Muller, On the error of computing $ab+cd$ using Cornea, Harrison and Tang’s method, ACM Trans. Math. Software 41 (2015), no. 2, Art. 7, 8 pp. MR 3318079, DOI 10.1145/2629615
Jean-Michel Muller, Nicolas Brisebarre, Florent de Dinechin, Claude-Pierre Jeannerod, Vincent Lefèvre, Guillaume Melquiond, Nathalie Revol, Damien Stehlé, and Serge Torres, Handbook of floating-point arithmetic, Birkhäuser Boston, Ltd., Boston, MA, 2010. MR 2568265, DOI 10.1007/978-0-8176-4705-6
M. Pichat, Correction d’une somme en arithmétique à virgule flottante, Numer. Math. 19 (1972), 400–406 (French, with English summary). MR 324892, DOI 10.1007/BF01404922
M. Pichat, Contributions à l’étude des erreurs d’arrondi en arithmétique à virgule flottante, Ph.D. thesis, Université Scientifique et Médicale de Grenoble, Grenoble, France, 1976.