
What is... an Inductive Mean?

Frank Nielsen

Communicated by Notices Associate Editor Richard Levine

Notions of means

The notion of means [10] is central to mathematics and statistics, and plays a key role in machine learning and data analytics. The three classical Pythagorean means of two positive reals $x$ and $y$ are the arithmetic (A), geometric (G), and harmonic (H) means, given respectively by

$$A(x,y)=\frac{x+y}{2},\qquad G(x,y)=\sqrt{xy},\qquad H(x,y)=\frac{2}{\frac{1}{x}+\frac{1}{y}}=\frac{2xy}{x+y}.$$

These Pythagorean means were originally studied geometrically to define proportions, and the harmonic mean led to a beautiful connection between mathematics and music. The Pythagorean means enjoy the following inequalities:

$$\min(x,y)\leq H(x,y)\leq G(x,y)\leq A(x,y)\leq \max(x,y),$$

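with equality if and only if $x=y$. These Pythagorean means belong to a broader parametric family of means, the power means $M_p(x,y)=\left(\frac{x^p+y^p}{2}\right)^{\frac{1}{p}}$ defined for $p\in\mathbb{R}\setminus\{0\}$. We have $A(x,y)=M_1(x,y)$ and $H(x,y)=M_{-1}(x,y)$, and in the limits: $G(x,y)=\lim_{p\to 0}M_p(x,y)$, $\max(x,y)=\lim_{p\to+\infty}M_p(x,y)$, and $\min(x,y)=\lim_{p\to-\infty}M_p(x,y)$. Power means are also called binomial, Minkowski, or Hölder means in the literature.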

There are many ways to define and axiomatize means, with a rich literature [8]. An important class of means are the quasi-arithmetic means induced by strictly increasing and differentiable real-valued functional generators $f$:

$$M_f(x,y)=f^{-1}\left(\frac{f(x)+f(y)}{2}\right).$$

Quasi-arithmetic means satisfy the in-betweenness property of means: $\min(x,y)\leq M_f(x,y)\leq\max(x,y)$, and are called so because $f(M_f(x,y))=\frac{f(x)+f(y)}{2}$ is the arithmetic mean on the $f$-representation of numbers.

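The power means are quasi-arithmetic means, $M_p=M_{f_p}$, obtained for the following continuous family of generators:

$$f_p(u)=\begin{cases}\dfrac{u^p-1}{p}, & p\in\mathbb{R}\setminus\{0\},\\[1ex] \log u, & p=0,\end{cases}\qquad f_p^{-1}(u)=\begin{cases}(1+pu)^{\frac{1}{p}}, & p\in\mathbb{R}\setminus\{0\},\\[1ex] \exp(u), & p=0.\end{cases}$$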

Power means are the only homogeneous quasi-arithmetic means, where a mean $M(x,y)$ is said to be homogeneous when $M(\lambda x,\lambda y)=\lambda M(x,y)$ for any $\lambda>0$.
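The relationship between generators and power means can be checked numerically. The following is a minimal Python sketch, assuming the generator family $f_p(u)=(u^p-1)/p$ for $p\neq 0$ and $f_0(u)=\log u$; the function names `quasi_arithmetic_mean`, `power_generator`, and `power_mean` are illustrative choices and do not come from the article.

```python
import math

def quasi_arithmetic_mean(f, f_inv, x, y):
    """Mean of two positive reals induced by generator f with inverse f_inv."""
    return f_inv((f(x) + f(y)) / 2.0)

def power_generator(p):
    """Return the pair (f_p, f_p^{-1}); assumed family (u^p - 1)/p, log u at p = 0."""
    if p == 0:
        return math.log, math.exp
    return (lambda u: (u ** p - 1.0) / p,
            lambda v: (1.0 + p * v) ** (1.0 / p))

def power_mean(p, x, y):
    """Power mean M_p(x, y); p = 0 is taken as the geometric mean (limit case)."""
    if p == 0:
        return math.sqrt(x * y)
    return ((x ** p + y ** p) / 2.0) ** (1.0 / p)

x, y = 2.0, 8.0
for p in (-1, 0, 1, 2):
    f, f_inv = power_generator(p)
    # Both columns should agree up to floating-point rounding.
    print(p, quasi_arithmetic_mean(f, f_inv, x, y), power_mean(p, x, y))
```

Scaling both arguments by the same positive factor scales `power_mean` by that factor, which is the homogeneity property stated above.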

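Quasi-arithmetic means can also be defined for $n$-variable means (i.e., $M_f(x_1,\ldots,x_n)=f^{-1}\left(\frac{1}{n}\sum_{i=1}^n f(x_i)\right)$), and more generally for calculating expected values of random variables [10]: We denote by $\mathbb{E}^f[X]=f^{-1}(\mathbb{E}[f(X)])$ the quasi-arithmetic expected value of a random variable $X$ induced by a strictly monotone and differentiable function $f$. For example, the geometric and harmonic expected values of $X$ are defined by $\mathbb{E}^G[X]=\exp(\mathbb{E}[\log X])$ and $\mathbb{E}^H[X]=\frac{1}{\mathbb{E}[1/X]}$, respectively. The ordinary expectation is recovered for $f(u)=u$: $\mathbb{E}^A[X]=\mathbb{E}[X]$. The quasi-arithmetic expected values satisfy a strong law of large numbers and a central limit theorem [10, Theorem 1]: Let $X_1,\ldots,X_n$ be independent and identically distributed (i.i.d.) with finite variance $V[f(X)]<\infty$ and derivative $f'(\mathbb{E}^f[X])\neq 0$ at $\mathbb{E}^f[X]$. Then we have

$$M_f(X_1,\ldots,X_n)\xrightarrow{\text{a.s.}}\mathbb{E}^f[X],\qquad \sqrt{n}\left(M_f(X_1,\ldots,X_n)-\mathbb{E}^f[X]\right)\xrightarrow{d}N\left(0,\frac{V[f(X)]}{\left(f'(\mathbb{E}^f[X])\right)^2}\right)$$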

as $n\to\infty$, where $N(\mu,\sigma^2)$ denotes a normal distribution of expectation $\mu$ and variance $\sigma^2$.
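As a worked instance of this central limit theorem, take the generator $f(u)=\log u$, whose derivative is $f'(u)=1/u$. The sketch below assumes the limiting variance has the delta-method form $V[f(X)]/f'(\mathbb{E}^f[X])^2$; the symbol $G_n$ for the sample geometric mean is introduced here only for illustration, and $\mathbb{E}^G[X]=\exp(\mathbb{E}[\log X])$ is the geometric expected value.

```latex
G_n = \Bigl(\prod_{i=1}^{n} X_i\Bigr)^{1/n},
\qquad
\sqrt{n}\,\bigl(G_n - \mathbb{E}^G[X]\bigr)
\ \xrightarrow{d}\
N\!\left(0,\ \frac{V[\log X]}{\bigl(1/\mathbb{E}^G[X]\bigr)^{2}}\right)
= N\!\left(0,\ \bigl(\mathbb{E}^G[X]\bigr)^{2}\, V[\log X]\right).
```

The asymptotic standard deviation of the sample geometric mean therefore scales with the geometric expected value itself.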

Inductive means

An inductive mean is a mean defined as the limit of a convergent sequence of other means [15]. The notion of inductive means defined as limits of sequences was pioneered independently by Lagrange and Gauss [7], who studied the following double sequence of iterations:

$$a_{t+1}=A(a_t,g_t)=\frac{a_t+g_t}{2},\qquad g_{t+1}=G(a_t,g_t)=\sqrt{a_t g_t},$$

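initialized with $a_0=x>0$ and $g_0=y>0$. We have, for all $t\geq 1$,

$$g_1\leq\cdots\leq g_t\leq \mathrm{AGM}(x,y)\leq a_t\leq\cdots\leq a_1,$$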

where the homogeneous arithmetic-geometric mean (AGM) is obtained in the limit:

$$\mathrm{AGM}(x,y)=\lim_{t\to\infty}a_t=\lim_{t\to\infty}g_t.$$

There is no closed-form formula for the AGM in terms of elementary functions, as this inductive mean is related to the complete elliptic integral of the first kind [7]:

$$\mathrm{AGM}(x,y)=\frac{\pi}{4}\,\frac{x+y}{K\left(\frac{x-y}{x+y}\right)},$$

where $K(u)=\int_0^{\frac{\pi}{2}}\frac{\mathrm{d}\theta}{\sqrt{1-u^2\sin^2\theta}}$ is the elliptic integral. The fast quadratic convergence [11] of the AGM iterations makes it computationally attractive, and the AGM iterations have been used to numerically calculate digits of $\pi$ or to approximate the perimeters of ellipses, among others [7].
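The AGM double sequence is short to implement. The following Python sketch is illustrative only: the function names `agm` and `agm_gaps`, the relative tolerance, and the iteration cap are assumptions made for this example. It iterates the arithmetic and geometric means until the two sequences agree, and records the gap $|a_t-g_t|$ at each step to show how quickly it collapses.

```python
import math

def agm(x, y, tol=1e-15, max_iter=64):
    """Arithmetic-geometric mean of two positive reals via the double sequence."""
    a, g = float(x), float(y)
    for _ in range(max_iter):
        if abs(a - g) <= tol * max(a, g):
            break
        # Both updates use the values from the previous step.
        a, g = (a + g) / 2.0, math.sqrt(a * g)
    return (a + g) / 2.0

def agm_gaps(x, y, steps=5):
    """Return |a_t - g_t| for t = 0..steps to illustrate quadratic convergence."""
    a, g = float(x), float(y)
    gaps = [abs(a - g)]
    for _ in range(steps):
        a, g = (a + g) / 2.0, math.sqrt(a * g)
        gaps.append(abs(a - g))
    return gaps

print(agm(1.0, 2.0))
print(agm_gaps(1.0, 2.0))
```

Each printed gap is roughly proportional to the square of the previous one until floating-point precision is reached, so a handful of iterations suffices in double precision.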

Some inductive means admit closed-form formulas: For example, the arithmetic-harmonic mean (AHM) obtained as the limit of the double sequence

$$a_{t+1}=A(a_t,h_t),\qquad h_{t+1}=H(a_t,h_t),$$

initialized with $a_0=x$ and $h_0=y$ converges to the geometric mean:

$$\mathrm{AHM}(x,y)=\lim_{t\to\infty}a_t=\lim_{t\to\infty}h_t=\sqrt{xy}=G(x,y).$$
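One way to see why the arithmetic-harmonic limit is the geometric mean is that the product of the two iterates never changes: $A(a,h)\,H(a,h)=\frac{a+h}{2}\cdot\frac{2ah}{a+h}=ah$. The short Python sketch below (the function name `ahm` and the fixed step count are assumptions for illustration) runs the iteration and returns both sequences' final values.

```python
import math

def ahm(x, y, steps=8):
    """Arithmetic-harmonic double sequence on positive reals; returns (a_t, h_t)."""
    a, h = float(x), float(y)
    for _ in range(steps):
        # The product a * h is preserved by this update, so the common
        # limit of the two sequences must equal sqrt(x * y).
        a, h = (a + h) / 2.0, 2.0 * a * h / (a + h)
    return a, h

a, h = ahm(2.0, 8.0)
print(a, h, math.sqrt(2.0 * 8.0))
```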

In general, inductive means defined as the limits of double sequences with respect to two smooth symmetric means $M'$ and $M''$:

$$a_{t+1}=M'(a_t,b_t),\qquad b_{t+1}=M''(a_t,b_t),$$

are proven to converge quadratically [11] to the common limit $\lim_{t\to\infty}a_t=\lim_{t\to\infty}b_t$ (order-$2$ convergence).

Inductive means and matrix means

We have obtained so far three ways to get the geometric scalar mean $G(x,y)=\sqrt{xy}$ between positive reals $x$ and $y$:

1. As an inductive mean with the arithmetic-harmonic double sequence: $G(x,y)=\mathrm{AHM}(x,y)$,

2. As a quasi-arithmetic mean obtained for the generator $f(u)=\log u$: $G(x,y)=M_{\log}(x,y)$, and

3. As the limit of power means: $G(x,y)=\lim_{p\to 0}M_p(x,y)$.

Let us now consider the geometric mean $G(X,Y)$ of two symmetric positive-definite (SPD) matrices $X$ and $Y$ of size $d\times d$. SPD matrices generalize positive reals. We shall investigate the three generalizations of the above approaches of the scalar geometric mean, and show that they yield different notions of matrix geometric means when $d>1$.

First, the AHM iterations can be extended to SPD matrices instead of reals:

$$A_{t+1}=A(A_t,H_t),\qquad H_{t+1}=H(A_t,H_t),$$

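where the matrix arithmetic mean is $A(X,Y)=\frac{X+Y}{2}$ and the matrix harmonic mean is $H(X,Y)=2\left(X^{-1}+Y^{-1}\right)^{-1}$. The AHM iterations initialized with $A_0=X$ and $H_0=Y$ yield in the limit $t\to\infty$ the matrix arithmetic-harmonic mean [3, 14] (AHM):

$$\mathrm{AHM}(X,Y)=\lim_{t\to\infty}A_t=\lim_{t\to\infty}H_t.$$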

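Remarkably, the matrix AHM enjoys quadratic convergence to the following SPD matrix:

$$\mathrm{AHM}(X,Y)=X^{\frac{1}{2}}\left(X^{-\frac{1}{2}}\,Y\,X^{-\frac{1}{2}}\right)^{\frac{1}{2}}X^{\frac{1}{2}}=G(X,Y).$$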

When $X=x$ and $Y=y$ are positive reals, we recover $G(X,Y)=\sqrt{xy}$. When $X=I$, the identity matrix, we get $G(I,Y)=Y^{\frac{1}{2}}$, the positive square root of the SPD matrix $Y$. Thus the matrix AHM iterations provide a fast method in practice to numerically approximate matrix square roots by bypassing the matrix eigendecomposition. When matrices $X$ and $Y$ commute (i.e., $XY=YX$), we have $G(X,Y)=\sqrt{XY}$. The geometric mean $G(X,Y)$ is proven to be the unique SPD solution $Z$ to the matrix Riccati equation $ZX^{-1}Z=Y$, is invariant under inversion (i.e., $G(X,Y)=G(X^{-1},Y^{-1})^{-1}$), and satisfies the determinant property $\det(G(X,Y))=\sqrt{\det(X)\det(Y)}$.
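The matrix AHM iteration only needs matrix addition and inversion. The sketch below is a dependency-free Python illustration restricted to $2\times 2$ matrices stored as nested lists; the helper names (`mat_add`, `mat_scale`, `mat_mul`, `mat_inv`, `matrix_ahm`) and the fixed step count are assumptions made for this example, and a numerical linear algebra library would be the practical choice for general sizes. Started from the identity matrix, the iteration approximates a matrix square root.

```python
def mat_add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def mat_scale(c, X):
    return [[c * X[i][j] for j in range(2)] for i in range(2)]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_inv(X):
    """Inverse of a 2x2 matrix by the adjugate formula (assumes det != 0)."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]

def matrix_ahm(X, Y, steps=20):
    """Matrix arithmetic-harmonic double sequence for 2x2 SPD matrices."""
    A, H = X, Y
    for _ in range(steps):
        # Arithmetic mean (A + H)/2 and harmonic mean 2 (A^{-1} + H^{-1})^{-1},
        # both computed from the previous pair (A, H).
        A, H = (mat_scale(0.5, mat_add(A, H)),
                mat_scale(2.0, mat_inv(mat_add(mat_inv(A), mat_inv(H)))))
    return A

identity = [[1.0, 0.0], [0.0, 1.0]]
Y = [[2.0, 1.0], [1.0, 3.0]]
root = matrix_ahm(identity, Y)
# Squaring the result should reproduce Y up to rounding.
print(mat_mul(root, root))
```

The same routine can be used to check the Riccati equation numerically: for SPD inputs `X` and `Y`, the product `G X^{-1} G` with `G = matrix_ahm(X, Y)` should be close to `Y`.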

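Let $\mathbb{P}$ denote the set of symmetric positive-definite $d\times d$ matrices. The matrix geometric mean can be interpreted using a Riemannian geometry [5] of the cone $\mathbb{P}$: Equip $\mathbb{P}$ with the trace metric tensor, i.e., a collection of smoothly varying inner products $g_P$ for $P\in\mathbb{P}$ defined by

$$g_P(S_1,S_2)=\mathrm{tr}\left(P^{-1}S_1P^{-1}S_2\right),$$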

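where $S_1$ and $S_2$ are matrices belonging to the vector space of symmetric $d\times d$ matrices (i.e., $S_1$ and $S_2$ are geometrically vectors of the tangent plane $T_P$ of the SPD cone $\mathbb{P}$ at $P$). The geodesic length distance on the Riemannian manifold $(\mathbb{P},g)$ is

$$\rho(P_1,P_2)=\left\|\log\left(P_1^{-\frac{1}{2}}P_2P_1^{-\frac{1}{2}}\right)\right\|_F=\sqrt{\sum_{i=1}^{d}\log^2\lambda_i\left(P_1^{-\frac{1}{2}}P_2P_1^{-\frac{1}{2}}\right)},$$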

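where $\lambda_i(M)$ denotes the $i$-th largest real eigenvalue of a symmetric matrix $M$, $\|\cdot\|_F$ denotes the Frobenius norm, and $\log P$ is the unique matrix logarithm of an SPD matrix $P$. Interestingly, the matrix geometric mean $G(X,Y)$ can also be interpreted as the Riemannian center of mass of $X$ and $Y$:

$$G(X,Y)=\arg\min_{P\in\mathbb{P}}\ \frac{1}{2}\rho^2(X,P)+\frac{1}{2}\rho^2(Y,P).$$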

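This Riemannian least squares mean is also called the Cartan, Karcher, or Fréchet mean in the literature. More generally, the Riemannian geodesic $\gamma(X,Y;t)$ between $X$ and $Y$ of $(\mathbb{P},g)$ for $t\in[0,1]$ is expressed using the weighted matrix geometric mean $X\#_tY$ minimizing

$$(1-t)\,\rho^2(X,P)+t\,\rho^2(Y,P)\quad\text{over } P\in\mathbb{P}.$$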

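This Riemannian barycenter can be solved as

$$\gamma(X,Y;t)=X\#_tY=X^{\frac{1}{2}}\left(X^{-\frac{1}{2}}\,Y\,X^{-\frac{1}{2}}\right)^{t}X^{\frac{1}{2}},$$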

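with $\gamma(X,Y;0)=X$, $\gamma(X,Y;1)=Y$, and $\rho(X,\gamma(X,Y;t))=t\,\rho(X,Y)$, i.e., $t$ is the arc length parameterization of the constant speed geodesic $\gamma(X,Y;t)$. When matrices $X$ and $Y$ commute, we have $\gamma(X,Y;t)=X^{1-t}Y^{t}$. We thus interpret the matrix geometric mean $G(X,Y)=\gamma\left(X,Y;\frac{1}{2}\right)$ as the Riemannian geodesic midpoint.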

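Second, let us consider the matrix geometric mean as the limit of matrix quasi-arithmetic power means, which can be defined [13] as $Q_p(X,Y)=\left(\frac{X^p+Y^p}{2}\right)^{\frac{1}{p}}$ for $p\in\mathbb{R}\setminus\{0\}$, with $Q_1(X,Y)=A(X,Y)$ and $Q_{-1}(X,Y)=H(X,Y)$. We get $\lim_{p\to 0}Q_p(X,Y)=\mathrm{LE}(X,Y)$, the log-Euclidean matrix mean defined by

$$\mathrm{LE}(X,Y)=\exp\left(\frac{\log X+\log Y}{2}\right),$$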

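where $\exp$ and $\log$ denote the matrix exponential and the matrix logarithm, respectively. We have $\mathrm{LE}(X,Y)\neq G(X,Y)$ in general (the two means coincide when $X$ and $Y$ commute). Consider the Loewner partial order $\preceq$ on the cone $\mathbb{P}$: $P\preceq Q$ if and only if $Q-P$ is positive semi-definite. A mean $M$ is said to be operator monotone [5] if for $X'\preceq X$ and $Y'\preceq Y$, we have $M(X',Y')\preceq M(X,Y)$. The log-Euclidean mean is not operator monotone, but the Riemannian geometric matrix mean is operator monotone.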

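Third, we can define matrix power means $M_p(X,Y)$ for $p\in(0,1]$ by uniquely solving the following matrix equation [13]:

$$M=\frac{1}{2}\,M\#_pX+\frac{1}{2}\,M\#_pY,\tag{2}$$

where $M\#_pX=M^{\frac{1}{2}}\left(M^{-\frac{1}{2}}XM^{-\frac{1}{2}}\right)^{p}M^{\frac{1}{2}}$ denotes the weighted matrix geometric mean.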

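Let $M_p(X,Y)$ denote the unique solution of Eq. 2. This equation is the matrix analogue of the scalar equation $m=\frac{1}{2}m^{1-p}x^p+\frac{1}{2}m^{1-p}y^p$, which can be solved as $m=\left(\frac{1}{2}x^p+\frac{1}{2}y^p\right)^{\frac{1}{p}}=M_p(x,y)$, i.e., the scalar $p$-power mean. In the limit case $p\to 0$, this matrix power mean yields the matrix geometric/Riemannian mean [13]:

$$\lim_{p\to 0^+}M_p(X,Y)=G(X,Y).$$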
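For the scalar analogue mentioned above, the solution can be checked in three lines. This is a short verification sketch, assuming the scalar equation reads $m=\frac{1}{2}m^{1-p}x^p+\frac{1}{2}m^{1-p}y^p$ with $m,x,y>0$ and $p\neq 0$; dividing both sides by $m^{1-p}$ isolates $m^p$.

```latex
\begin{aligned}
m &= \tfrac{1}{2}\,m^{1-p}x^{p} + \tfrac{1}{2}\,m^{1-p}y^{p} \\
m^{p} &= \tfrac{1}{2}\,x^{p} + \tfrac{1}{2}\,y^{p}
  && \text{(divide by } m^{1-p} > 0\text{)} \\
m &= \Bigl(\tfrac{1}{2}\,x^{p} + \tfrac{1}{2}\,y^{p}\Bigr)^{1/p}
\end{aligned}
```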

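In general, we get the following closed-form expression [13] of this matrix power mean for $p\in(0,1]$:

$$M_p(X,Y)=X\#_{\frac{1}{p}}\left(\frac{1}{2}X+\frac{1}{2}\left(X\#_pY\right)\right)=X^{\frac{1}{2}}\left(\frac{I+\left(X^{-\frac{1}{2}}YX^{-\frac{1}{2}}\right)^{p}}{2}\right)^{\frac{1}{p}}X^{\frac{1}{2}}.$$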

Inductive means, circumcenters, and medians of several matrices

To extend these various binary matrix means of two matrices to matrix means of $n$ matrices $P_1,\ldots,P_n$ of $\mathbb{P}$, we can use induction sequences [9]. First, the $n$-variable matrix geometric mean $G(P_1,\ldots,P_n)$ can be defined as the unique Riemannian center of mass: