Communicated by Notices Associate Editor Richard Levine
Notions of means
The notion of means [10] is central to mathematics and statistics, and plays a key role in machine learning and data analytics. The three classical Pythagorean means of two positive reals $x$ and $y$ are the arithmetic (A), geometric (G), and harmonic (H) means, given respectively by

$$A(x,y)=\frac{x+y}{2},\qquad G(x,y)=\sqrt{xy},\qquad H(x,y)=\frac{2}{\frac{1}{x}+\frac{1}{y}}=\frac{2xy}{x+y}.$$
These Pythagorean means were originally studied geometrically to define proportions, and the harmonic mean led to a beautiful connection between mathematics and music. The Pythagorean means enjoy the following inequalities:

$$\min(x,y)\leq H(x,y)\leq G(x,y)\leq A(x,y)\leq \max(x,y),$$
with equality if and only if $x=y$. These Pythagorean means belong to a broader parametric family of means, the power means $M_p(x,y)=\left(\frac{x^p+y^p}{2}\right)^{\frac{1}{p}}$, defined for $p\in \mathbb{R}\backslash \{0\}$. We have $A(x,y)=M_1(x,y)$, $H(x,y)=M_{-1}(x,y)$, and in the limits: $G(x,y)=\lim _{p\rightarrow 0} M_p(x,y)$, $\max (x,y)=\lim _{p\rightarrow +\infty } M_p(x,y)$, and $\min (x,y)=\lim _{p\rightarrow -\infty } M_p(x,y)$. Power means are also called binomial, Minkowski, or Hölder means in the literature.
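To make these limits concrete, here is a minimal Python sketch (the function name `power_mean` is ours, not from the article) of the power means and their limiting behavior:

```python
import numpy as np

def power_mean(x, y, p):
    """Power (Holder) mean M_p(x, y) = ((x^p + y^p) / 2)^(1/p) for p != 0;
    its limit as p -> 0 is the geometric mean sqrt(x * y)."""
    if p == 0:
        return np.sqrt(x * y)
    return ((x**p + y**p) / 2) ** (1 / p)

x, y = 2.0, 8.0
print(power_mean(x, y, 1.0))    # arithmetic mean A(2, 8) = 5.0
print(power_mean(x, y, -1.0))   # harmonic mean H(2, 8) = 3.2
print(power_mean(x, y, 1e-9))   # ~= geometric mean G(2, 8) = 4.0
print(power_mean(x, y, 100.0))  # -> max(x, y) = 8 as p -> +infinity
```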
There are many ways to define and axiomatize means, with a rich literature [8]. An important class of means are the quasi-arithmetic means induced by strictly increasing and differentiable real-valued functional generators $f(u)$:

$$M_f(x,y)=f^{-1}\left(\frac{f(x)+f(y)}{2}\right).$$
Quasi-arithmetic means satisfy the in-betweenness property of means, $\min (x,y)\leq M_f(x,y)\leq \max (x,y)$, and are so called because $f(M_f(x,y))=\frac{f(x)+f(y)}{2}=A(f(x),f(y))$ is the arithmetic mean on the $f$-representation of numbers.
The power means are quasi-arithmetic means, $M_p=M_{f_p}$, obtained for the following continuous family of generators:

$$f_p(u)=\begin{cases}\frac{u^p-1}{p}, & p\neq 0,\\ \log u, & p=0.\end{cases}$$
Power means are the only homogeneous quasi-arithmetic means, where a mean $M(x,y)$ is said to be homogeneous when $M(\lambda x,\lambda y)=\lambda M(x,y)$ for any $\lambda >0$.
Quasi-arithmetic means can also be defined for $n$-variable means (i.e., $M_f(x_1,\ldots ,x_n)=f^{-1}(\frac{1}{n}\sum _{i=1}^n f(x_i))$), and more generally for calculating expected values of random variables [10]: We denote by $\mathbb{E}_f[X]=f^{-1}(\mathbb{E}[f(X)])$ the quasi-arithmetic expected value of a random variable $X$ induced by a strictly monotone and differentiable function $f(u)$. For example, the geometric and harmonic expected values of $X$ are defined by $\mathbb{E}^G[X]=\mathbb{E}_{\log u}[X]=\exp (\mathbb{E}[\log X])$ and $\mathbb{E}^H[X]=\mathbb{E}_{u^{-1}}[X]=\frac{1}{\mathbb{E}[1/X]}$, respectively. The ordinary expectation is recovered for $f(u)=u$: $\mathbb{E}^A[X]=\mathbb{E}_u[X]=\mathbb{E}[X]$. The quasi-arithmetic expected values satisfy a strong law of large numbers and a central limit theorem [10, Theorem 1]: Let $X_1,\ldots , X_n$ be independent and identically distributed (i.i.d.) with finite variance $\mathbb{V}[f(X)]<\infty$ and nonvanishing derivative $f'(\mathbb{E}_f[X])\not =0$. Then we have

$$\sqrt{n}\left(f^{-1}\!\left(\frac{1}{n}\sum_{i=1}^n f(X_i)\right)-\mathbb{E}_f[X]\right)\ \longrightarrow\ N\!\left(0,\frac{\mathbb{V}[f(X)]}{\left(f'(\mathbb{E}_f[X])\right)^2}\right)$$
as $n\rightarrow \infty$, where $N(\mu ,\sigma ^2)$ denotes a normal distribution of expectation $\mu$ and variance $\sigma ^2$.
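The empirical quasi-arithmetic means above are straightforward to compute; the following short sketch (the helper `quasi_arithmetic_mean` is our own name) estimates the arithmetic, geometric, and harmonic expected values from samples:

```python
import numpy as np

def quasi_arithmetic_mean(samples, f, f_inv):
    """Empirical quasi-arithmetic expected value f^{-1}(mean of f(X_i)),
    which converges to E_f[X] by the strong law of large numbers."""
    return f_inv(np.mean(f(samples)))

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=100_000)  # i.i.d. positive samples

A = quasi_arithmetic_mean(X, lambda u: u, lambda u: u)          # ordinary expectation
G = quasi_arithmetic_mean(X, np.log, np.exp)                    # exp(E[log X])
H = quasi_arithmetic_mean(X, lambda u: 1 / u, lambda u: 1 / u)  # 1 / E[1/X]
print(A, G, H)  # observe H <= G <= A, as in the Pythagorean inequalities
```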
Inductive means
An inductive mean is a mean defined as the limit of a convergent sequence of other means [15]. The notion of inductive means defined as limits of sequences was pioneered independently by Lagrange and Gauss [7], who studied the following double sequence of iterations:

$$a_{t+1}=A(a_t,b_t)=\frac{a_t+b_t}{2},\qquad b_{t+1}=G(a_t,b_t)=\sqrt{a_t b_t},$$

initialized with $a_0=x>0$ and $b_0=y>0$; the common limit of the two sequences defines the arithmetic-geometric mean $\mathrm{AGM}(x,y)$.
There is no closed-form formula for the AGM in terms of elementary functions, as this inductive mean is related to the complete elliptic integral of the first kind $K(\cdot)$ [7]:

$$\mathrm{AGM}(x,y)=\frac{\pi}{4}\,\frac{x+y}{K\left(\frac{x-y}{x+y}\right)},$$
where $K(u)=\int _0^{\frac{\pi }{2}} \frac{\mathrm{d}\theta }{\sqrt {1-u^2\sin ^2(\theta )}}$ denotes the complete elliptic integral of the first kind. The fast quadratic convergence [11] of the AGM iterations makes them computationally attractive, and the AGM iterations have been used to numerically calculate digits of $\pi$ and to approximate the perimeters of ellipses, among other applications [7].
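The quadratic convergence can be checked numerically against the elliptic-integral formula. The sketch below is our own illustration; note the assumption that SciPy's `ellipk` takes the parameter $m=k^2$ rather than the modulus $k$:

```python
import numpy as np
from scipy.special import ellipk  # complete elliptic integral K, parameter m = k^2

def agm(x, y, iters=8):
    """Gauss's arithmetic-geometric mean: iterate the double sequence
    a <- A(a, b), b <- G(a, b); quadratic convergence makes ~8 rounds
    plenty at double precision."""
    a, b = x, y
    for _ in range(iters):
        a, b = (a + b) / 2, np.sqrt(a * b)
    return a

x, y = 1.0, 3.0
k = (x - y) / (x + y)
# AGM(x, y) = pi (x + y) / (4 K((x - y)/(x + y))); SciPy's ellipk expects m = k^2.
print(agm(x, y), np.pi * (x + y) / (4 * ellipk(k**2)))  # both ~= 1.8636...
```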
Some inductive means admit closed-form formulas: For example, the arithmetic-harmonic mean $\mathrm{AHM}(x,y)$, obtained as the limit of the double sequence

$$a_{t+1}=A(a_t,b_t)=\frac{a_t+b_t}{2},\qquad b_{t+1}=H(a_t,b_t)=\frac{2a_t b_t}{a_t+b_t},$$

with $a_0=x$ and $b_0=y$, coincides with the geometric mean: $\mathrm{AHM}(x,y)=\sqrt{xy}=G(x,y)$ (indeed, the product $a_t b_t$ is invariant under the iterations). More generally, double sequences built from a pair of means $M_1$ and $M_2$,

$$a_{t+1}=M_1(a_t,b_t),\qquad b_{t+1}=M_2(a_t,b_t),$$

are proven to converge quadratically [11] to a common limit $\mathrm{DS}_{M_1,M_2}(a_0,b_0)=\lim _{t\rightarrow \infty } a_t=\lim _{t\rightarrow \infty } b_t$ (order-$2$ convergence).
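A generic double sequence is a few lines of code; the sketch below (our own naming `double_sequence`) recovers the AHM as an instance:

```python
def double_sequence(M1, M2, a0, b0, iters=20):
    """Inductive mean DS_{M1,M2}(a0, b0): update (a, b) simultaneously with
    the pair of means until the two sequences merge."""
    a, b = a0, b0
    for _ in range(iters):
        a, b = M1(a, b), M2(a, b)
    return a

A = lambda x, y: (x + y) / 2          # arithmetic mean
H = lambda x, y: 2 / (1 / x + 1 / y)  # harmonic mean

print(double_sequence(A, H, 2.0, 8.0))  # AHM(2, 8) = sqrt(2 * 8) = 4.0
```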
Inductive means and matrix means
We have obtained so far three ways to get the geometric scalar mean $G(x,y)=\sqrt {xy}$ between positive reals $x$ and $y$:
1. As an inductive mean with the arithmetic-harmonic double sequence: $G(x,y)=\mathrm{AHM}(x,y)$,
2. As a quasi-arithmetic mean obtained for the generator $f(u)=\log u$: $G(x,y)=M_{\log }(x,y)$, and
3. As the limit of power means: $G(x,y)= \lim _{p\rightarrow 0} M_p(x,y)$.
Let us now consider the geometric mean $G(X,Y)$ of two symmetric positive-definite (SPD) matrices $X$ and $Y$ of size $d\times d$; SPD matrices generalize the positive reals. We shall investigate how each of the three approaches to the scalar geometric mean above generalizes to matrices, and show that they yield different notions of matrix geometric means when $d>1$.
First, the AHM iterations can be extended to SPD matrices instead of reals:

$$A_{t+1}=A(A_t,H_t)=\frac{A_t+H_t}{2},\qquad H_{t+1}=H(A_t,H_t)=2\left(A_t^{-1}+H_t^{-1}\right)^{-1},$$
where the matrix arithmetic mean is $A(X,Y)=\frac{X+Y}{2}$ and the matrix harmonic mean is $H(X,Y)=2(X^{-1}+Y^{-1})^{-1}$. The AHM iterations initialized with $A_0=X$ and $H_0=Y$ yield in the limit $t \rightarrow \infty$ the matrix arithmetic-harmonic mean [3, 14] (AHM):

$$\mathrm{AHM}(X,Y)=\lim_{t\rightarrow\infty} A_t=\lim_{t\rightarrow\infty} H_t.$$
Remarkably, the matrix AHM enjoys quadratic convergence to the following SPD matrix:
$$\mathrm{AHM}(X,Y)=X^{\frac{1}{2}} (X^{-\frac{1}{2}} Y X^{-\frac{1}{2}})^{\frac{1}{2}} X^{\frac{1}{2}}=G(X,Y).$$
When $X=x$ and $Y=y$ are positive reals, we recover $G(X,Y)=\sqrt {xy}$. When $X=I$, the identity matrix, we get $G(I,Y)=Y^{\frac{1}{2}}=\sqrt {Y}$, the positive square root of the SPD matrix $Y$. Thus the matrix AHM iterations provide a fast method in practice to numerically approximate matrix square roots while bypassing the matrix eigendecomposition. When matrices $X$ and $Y$ commute (i.e., $XY=YX$), we have $G(X,Y)=\sqrt {XY}$. The geometric mean $G(A,B)$ is proven to be the unique SPD solution to the matrix Riccati equation $XA^{-1}X=B$, is invariant under inversion (i.e., $G(A,B)=G(A^{-1},B^{-1})^{-1}$), and satisfies the determinant property $\det (G(A,B))=\sqrt {\det (A)\,\det (B)}$.
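A minimal NumPy/SciPy sketch (our own illustration; `matrix_ahm` and `random_spd` are hypothetical helper names) verifies both the closed form and the square-root special case:

```python
import numpy as np
from numpy.linalg import inv
from scipy.linalg import sqrtm  # used only to check against the closed form

def matrix_ahm(X, Y, iters=16):
    """Matrix arithmetic-harmonic mean: A <- (A + H)/2, H <- 2(A^{-1} + H^{-1})^{-1}
    (simultaneous updates), converging quadratically to G(X, Y)."""
    A, H = X, Y
    for _ in range(iters):
        A, H = (A + H) / 2, 2 * inv(inv(A) + inv(H))
    return A

rng = np.random.default_rng(1)
def random_spd(d):
    B = rng.standard_normal((d, d))
    return B @ B.T + d * np.eye(d)  # well-conditioned SPD test matrix

X, Y = random_spd(3), random_spd(3)
Xh = sqrtm(X); Xih = inv(Xh)
G_closed = Xh @ sqrtm(Xih @ Y @ Xih) @ Xh  # X^{1/2}(X^{-1/2} Y X^{-1/2})^{1/2} X^{1/2}

print(np.allclose(matrix_ahm(X, Y), G_closed))           # True
print(np.allclose(matrix_ahm(np.eye(3), Y), sqrtm(Y)))   # G(I, Y) = sqrt(Y)
```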
Let $\mathbb{P}$ denote the set of symmetric positive-definite $d\times d$ matrices. The matrix geometric mean can be interpreted using a Riemannian geometry [5] of the cone $\mathbb{P}$: Equip $\mathbb{P}$ with the trace metric tensor, i.e., a collection of smoothly varying inner products $g_P$ for $P\in \mathbb{P}$ defined by

$$g_P(S_1,S_2)=\mathrm{tr}\left(P^{-1} S_1 P^{-1} S_2\right),$$
where $S_1$ and $S_2$ are matrices belonging to the vector space of symmetric $d\times d$ matrices (i.e., $S_1$ and $S_2$ are geometrically vectors of the tangent plane $T_P$ at $P\in \mathbb{P}$). The geodesic length distance on the Riemannian manifold $(\mathbb{P},g)$ is

$$\rho(X,Y)=\left\|\log\left(X^{-\frac{1}{2}} Y X^{-\frac{1}{2}}\right)\right\|_F=\sqrt{\sum_{i=1}^d \log^2 \lambda_i\!\left(X^{-\frac{1}{2}} Y X^{-\frac{1}{2}}\right)},$$
where $\lambda _i(M)$ denotes the $i$-th largest real eigenvalue of a symmetric matrix $M$, $\|\cdot \|_F$ denotes the Frobenius norm, and $\log P$ is the unique matrix logarithm of an SPD matrix $P$. Interestingly, the matrix geometric mean $G(X,Y)=\mathrm{AHM}(X,Y)$ can also be interpreted as the Riemannian center of mass of $X$ and $Y$:

$$G(X,Y)=\arg\min_{P\in\mathbb{P}}\ \frac{1}{2}\rho^2(X,P)+\frac{1}{2}\rho^2(Y,P).$$
This Riemannian least squares mean is also called the Cartan, Karcher, or Fréchet mean in the literature. More generally, the Riemannian geodesic $\gamma (X,Y;t)=X\#_t Y$ between $X$ and $Y$ of $(\mathbb{P},g)$ for $t\in [0,1]$ is expressed using the weighted matrix geometric mean $G(X,Y;1-t,t)=X\#_t Y$ minimizing
$$(1-t) \rho ^2(X,P)+t\rho ^2(Y,P).$$
This Riemannian barycenter can be solved as
$$X\#_t Y= X^{\frac{1}{2}}\left(X^{-\frac{1}{2}} Y X^{-\frac{1}{2}}\right)^t X^{\frac{1}{2}},$$
with $G(X,Y)=X\#_{\frac{1}{2}} Y$,$X\#_t Y=Y\#_{1-t} X$, and $\rho (X\#_t Y,X)=t \rho (X,Y)$, i.e., $t$ is the arc length parameterization of the constant speed geodesic $\gamma (X,Y;t)$. When matrices $X$ and $Y$ commute, we have $X\#_t Y=X^{1-t}Y^t$. We thus interpret the matrix geometric mean $G(X,Y)=X\#Y=X\#_{\frac{1}{2}} Y$ as the Riemannian geodesic midpoint.
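The geodesic properties stated above are easy to test numerically. The following sketch (our own helper names `sharp` and `rho`) relies on SciPy's `fractional_matrix_power` for the matrix power $Z^t$:

```python
import numpy as np
from numpy.linalg import inv
from scipy.linalg import sqrtm, fractional_matrix_power, logm

def sharp(X, Y, t):
    """Weighted geometric mean X #_t Y = X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2}."""
    Xh = sqrtm(X); Xih = inv(Xh)
    return Xh @ fractional_matrix_power(Xih @ Y @ Xih, t) @ Xh

def rho(X, Y):
    """Riemannian trace-metric distance rho(X, Y) = ||log(X^{-1/2} Y X^{-1/2})||_F."""
    Xih = inv(sqrtm(X))
    return np.linalg.norm(logm(Xih @ Y @ Xih), 'fro')

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3)); X = B @ B.T + 3 * np.eye(3)
C = rng.standard_normal((3, 3)); Y = C @ C.T + 3 * np.eye(3)

t = 0.25
print(np.allclose(sharp(X, Y, t), sharp(Y, X, 1 - t)))    # X #_t Y = Y #_{1-t} X
print(np.isclose(rho(X, sharp(X, Y, t)), t * rho(X, Y)))  # constant-speed geodesic
```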
Second, let us consider the matrix geometric mean as the limit of matrix quasi-arithmetic power means, which can be defined [13] as $Q_p(X,Y)=\left(\frac{X^p+Y^p}{2}\right)^{\frac{1}{p}}$ for $p\in \mathbb{R}$, $p\not =0$, with $Q_1(X,Y)=A(X,Y)$ and $Q_{-1}(X,Y)=H(X,Y)$. We get $\lim _{p\rightarrow 0} Q_p(X,Y)=\mathrm{LE}(X,Y)$, the log-Euclidean matrix mean defined by

$$\mathrm{LE}(X,Y)=\exp\left(\frac{\log X+\log Y}{2}\right),$$
where $\exp$ and $\log$ denote the matrix exponential and the matrix logarithm, respectively. We have $\mathrm{LE}(X,Y)\not =G(X,Y)$ in general. Consider the Loewner partial order $\preceq$ on the cone $\mathbb{P}$: $P\preceq Q$ if and only if $Q-P$ is positive semi-definite. A mean $M(X,Y)$ is said to be operator monotone [5] if for $X'\preceq X$ and $Y'\preceq Y$, we have $M(X',Y')\preceq M(X,Y)$. The log-Euclidean mean $\mathrm{LE}(X,Y)$ is not operator monotone, but the Riemannian geometric matrix mean $G(X,Y)$ is operator monotone.
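The gap between the two means is visible numerically; the sketch below (our own illustration) also checks that both means share the determinant property $\det=\sqrt{\det X\,\det Y}$, which for $\mathrm{LE}$ follows from $\det\exp S=e^{\mathrm{tr}\,S}$:

```python
import numpy as np
from numpy.linalg import inv, det
from scipy.linalg import expm, logm, sqrtm

def log_euclidean(X, Y):
    """Log-Euclidean mean LE(X, Y) = exp((log X + log Y) / 2)."""
    return expm((logm(X) + logm(Y)) / 2)

rng = np.random.default_rng(3)
B = rng.standard_normal((3, 3)); X = B @ B.T + 3 * np.eye(3)
C = rng.standard_normal((3, 3)); Y = C @ C.T + 3 * np.eye(3)

Xh = sqrtm(X); Xih = inv(Xh)
G = Xh @ sqrtm(Xih @ Y @ Xih) @ Xh  # Riemannian geometric mean

LE = log_euclidean(X, Y)
print(np.allclose(LE, G))  # False in general for d > 1
# Both means share the determinant property det = sqrt(det X det Y):
print(np.isclose(np.real(det(LE)), np.sqrt(det(X) * det(Y))))
```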
Third, we can define matrix power means $M_p(X,Y)$ for $p\in (0,1]$ by uniquely solving the following matrix equation [13]:
$$M=\frac{1}{2}\,(M\#_p X) + \frac{1}{2}\,(M\#_p Y). \tag{2}$$
Let $M_p(X,Y)=M$ denote the unique solution of Eq. 2. This equation is the matrix analogue of the scalar equation $m=\frac{1}{2} m^{1-p}x^p + \frac{1}{2} m^{1-p}y^p$, which can be solved as $m=\left(\frac{1}{2}x^p+\frac{1}{2}y^p\right)^{\frac{1}{p}}=M_p(x,y)$, i.e., the scalar $p$-power mean. In the limit case $p\rightarrow 0$, this matrix power mean $M_p$ yields the matrix geometric/Riemannian mean [13]:
$$\lim _{p\rightarrow 0^+} M_p(X,Y)=G(X,Y).$$
In general, we get the following closed-form expression [13] of this matrix power mean for $p\in (0,1)$:
$$M_p(X,Y)= X \#_{\frac{1}{p}}\left(\frac{1}{2}X+ \frac{1}{2}(X \#_p Y)\right).$$
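One can check numerically that this closed form satisfies the fixed-point equation (2) and recovers $G(X,Y)$ for small $p$. The sketch below is our own illustration (helper names `sharp` and `matrix_power_mean` are ours):

```python
import numpy as np
from numpy.linalg import inv
from scipy.linalg import sqrtm, fractional_matrix_power

def sharp(X, Y, t):
    """X #_t Y = X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2}, valid for any real t."""
    Xh = sqrtm(X); Xih = inv(Xh)
    return Xh @ fractional_matrix_power(Xih @ Y @ Xih, t) @ Xh

def matrix_power_mean(X, Y, p):
    """Closed form M_p(X, Y) = X #_{1/p} ((X + X #_p Y) / 2) for p in (0, 1)."""
    return sharp(X, (X + sharp(X, Y, p)) / 2, 1 / p)

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3)); X = B @ B.T + 3 * np.eye(3)
C = rng.standard_normal((3, 3)); Y = C @ C.T + 3 * np.eye(3)

p = 0.5
M = matrix_power_mean(X, Y, p)
# Fixed-point check of Eq. (2): M = (M #_p X)/2 + (M #_p Y)/2.
print(np.allclose(M, (sharp(M, X, p) + sharp(M, Y, p)) / 2))
# Small p recovers the Riemannian geometric mean G(X, Y) = X #_{1/2} Y.
print(np.allclose(matrix_power_mean(X, Y, 1e-4), sharp(X, Y, 0.5), atol=1e-3))
```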
Inductive means, circumcenters, and medians of several matrices
To extend these various binary matrix means of two matrices to matrix means of $n$ matrices $P_1,\ldots , P_n$ of $\mathbb{P}$, we can use induction sequences 9. First, the $n$-variable matrix geometric mean $G(P_1,\ldots , P_n)$ can be defined as the unique Riemannian center of mass:
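As a concrete baseline (distinct from the inductive sequences discussed here), this center of mass can be approximated by a standard Riemannian fixed-point iteration; the sketch below is our own illustration, with `karcher_mean` a hypothetical helper name:

```python
import numpy as np
from numpy.linalg import inv
from scipy.linalg import sqrtm, logm, expm

def karcher_mean(Ps, iters=50):
    """Riemannian center of mass (Karcher/Frechet mean) of SPD matrices via
    a standard fixed-point iteration: move M along the average of the
    Riemannian log-maps of the P_i until the gradient vanishes."""
    M = sum(Ps) / len(Ps)  # initialize at the arithmetic mean
    for _ in range(iters):
        Mh = sqrtm(M); Mih = inv(Mh)
        T = sum(logm(Mih @ P @ Mih) for P in Ps) / len(Ps)
        M = np.real(Mh @ expm(T) @ Mh)  # discard tiny imaginary round-off
    return M

rng = np.random.default_rng(5)
Ps = []
for _ in range(4):
    B = rng.standard_normal((3, 3))
    Ps.append(B @ B.T + 3 * np.eye(3))

M = karcher_mean(Ps)
# At the optimum the Riemannian gradient sum_i log(M^{-1/2} P_i M^{-1/2}) vanishes.
Mih = inv(sqrtm(M))
print(np.linalg.norm(sum(logm(Mih @ P @ Mih) for P in Ps)))  # ~= 0
```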