Differentiation of matrix functionals using triangular factorization

de Hoog, F.; Anderssen, R.; Lukas, M.

doi:10.1090/S0025-5718-2011-02451-8

Differentiation of matrix functionals using triangular factorization
HTML articles powered by AMS MathViewer

by F. R. de Hoog, R. S. Anderssen and M. A. Lukas PDF

Math. Comp. 80 (2011), 1585-1600 Request permission

Abstract:

In various applications, it is necessary to differentiate a matrix functional $w(\textbf {A}(\textbf {x}))$, where $\textbf {A}(\textbf {x})$ is a matrix depending on a parameter vector $\textbf {x}$. Usually, the functional itself can be readily computed from a triangular factorization of $\textbf {A}(\textbf {x})$. This paper develops several methods that also use the triangular factorization to efficiently evaluate the first and second derivatives of the functional. Both the full and sparse matrix situations are considered. There are similarities between these methods and algorithmic differentiation. However, the methodology developed here is explicit, leading to new algorithms. It is shown how the methods apply to several applications where the functional is a log determinant, including spline smoothing, covariance selection and restricted maximum likelihood.

References

Ronald Paul Barry and R. Kelley Pace, Monte Carlo estimates of the log determinant of large sparse matrices, Linear Algebra Appl. 289 (1999), no. 1-3, 41–54. Linear algebra and statistics (Istanbul, 1997). MR 1670972, DOI 10.1016/S0024-3795(97)10009-X
R. A. Berk. Statistical Learning from a Regression Perspective. Springer-Verlag, New York, 2008.
A. Dempster. Covariance selection. Biometrics, 28:57–175, 1972.
Jack J. Dongarra, Iain S. Duff, Danny C. Sorensen, and Henk A. van der Vorst, Numerical linear algebra for high-performance computers, Software, Environments, and Tools, vol. 7, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1998. MR 1654239, DOI 10.1137/1.9780898719611
A. M. Erisman and W. F. Tinney, On computing certain elements of the inverse of a sparse matrix, Comm. ACM 18 (1975), 177–179. MR 0378372, DOI 10.1145/360680.360704
Alan George and Joseph W. H. Liu, The evolution of the minimum degree ordering algorithm, SIAM Rev. 31 (1989), no. 1, 1–19. MR 986480, DOI 10.1137/1031001
Gene H. Golub and Robert J. Plemmons, Large-scale geodetic least-squares adjustment by dissection and orthogonal decomposition, Linear Algebra Appl. 34 (1980), 3–28. MR 591422, DOI 10.1016/0024-3795(80)90156-1
Andreas Griewank, Evaluating derivatives, Frontiers in Applied Mathematics, vol. 19, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2000. Principles and techniques of algorithmic differentiation. MR 1753583
David A. Harville, Matrix algebra from a statistician’s perspective, Springer-Verlag, New York, 1997. MR 1467237, DOI 10.1007/b98818
T. J. Hastie and R. J. Tibshirani, Generalized additive models, Monographs on Statistics and Applied Probability, vol. 43, Chapman and Hall, Ltd., London, 1990. MR 1082147
M. F. Hutchinson and F. R. de Hoog, Smoothing noisy data with spline functions, Numer. Math. 47 (1985), no. 1, 99–106. MR 797880, DOI 10.1007/BF01389878
K. Kubota, Matrix inversion algorithms by means of automatic differentiation, Appl. Math. Lett. 7 (1994), no. 4, 19–22. MR 1350388, DOI 10.1016/0893-9659(94)90004-3
Steffen L. Lauritzen, Graphical models, Oxford Statistical Science Series, vol. 17, The Clarendon Press, Oxford University Press, New York, 1996. Oxford Science Publications. MR 1419991
Mark A. Lukas, Robust generalized cross-validation for choosing the regularization parameter, Inverse Problems 22 (2006), no. 5, 1883–1902. MR 2261272, DOI 10.1088/0266-5611/22/5/021
Mark A. Lukas, Frank R. de Hoog, and Robert S. Anderssen, Efficient algorithms for robust generalized cross-validation spline smoothing, J. Comput. Appl. Math. 235 (2010), no. 1, 102–107. MR 2671036, DOI 10.1016/j.cam.2010.05.016
Harry M. Markowitz, The elimination form of the inverse and its application to linear programming, Management Sci. 3 (1957), 255–269. MR 112244, DOI 10.1287/mnsc.3.3.255
Stephen G. Nash, A survey of truncated-Newton methods, J. Comput. Appl. Math. 124 (2000), no. 1-2, 45–59. Numerical analysis 2000, Vol. IV, Optimization and nonlinear equations. MR 1803293, DOI 10.1016/S0377-0427(00)00426-X
H. Niessner and K. Reichert. On computing the inverse of a sparse matrix. Inter. J. Numer. Methods Eng., 19:1513–1526, 1983.
P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing $\ell _1$-penalized log-determinant divergence. IEEE, 2008.
Samuel David Silvey, Optimal design, Monographs on Applied Probability and Statistics, Chapman & Hall, London-New York, 1980. An introduction to the theory for parameter estimation. MR 606742
S. P. Smith. Differentiation of the Cholesky algorithm. J. Comp. Graph. Stat., 4:134–147, 1995.
K. Takahashi, J. Fagan, and M. S. Chin. Formation of a sparse bus impedance matrix and its application to short circuit study. Power Industry Computer Applications Conf. Proc. Minneapolis, Minn., 8:63–69, June 4-6, 1973.
Grace Wahba, Bayesian “confidence intervals” for the cross-validated smoothing spline, J. Roy. Statist. Soc. Ser. B 45 (1983), no. 1, 133–150. MR 701084
Frederick V. Waugh and Paul S. Dwyer, Compact computation of the inverse of a matrix, Ann. Math. Statistics 16 (1945), 259–271. MR 13919, DOI 10.1214/aoms/1177731089
Joe Whittaker, Graphical models in applied multivariate statistics, Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics, John Wiley & Sons, Ltd., Chichester, 1990. MR 1112133