Three-point bounds for energy minimization

Three-point semidefinite programming bounds are one of the most powerful known tools for bounding the size of spherical codes. In this paper, we use them to prove lower bounds for the potential energy of particles interacting via a pair potential function. We show that our bounds are sharp for seven points in RP^2. Specifically, we prove that the seven lines connecting opposite vertices of a cube and of its dual octahedron are universally optimal. (In other words, among all configurations of seven lines through the origin, this one minimizes energy for all potential functions that are completely monotonic functions of squared chordal distance.) This configuration is the only known universal optimum that is not distance regular, and the last remaining universal optimum in RP^2. We also give a new derivation of semidefinite programming bounds and present several surprising conjectures about them.


1. Introduction
Consider the seven lines connecting opposite vertices of a cube and of its dual octahedron. Although the symmetry group does not act transitively on the lines, they are exceedingly well distributed within RP^2. In this paper, we prove that they form a universally optimal configuration; in other words, they minimize a wide variety of natural notions of energy. Universal optima are rare, and we show that this configuration is the largest universal optimum in RP^2.
To prove universal optimality, we use semidefinite programming bounds, which are a powerful technique for proving bounds in coding theory. We give a new derivation of these bounds and extend them from coding to energy minimization. Proving universal optimality involves challenges that have not arisen in previous applications of semidefinite programming bounds, and we provide a general methodology for meeting these challenges. Furthermore, we conjecture that in certain other cases our bounds remain sharp throughout a phase transition between two different ground states, which would be a remarkable phenomenon. We have not yet been able to prove these conjectures, but the techniques we introduce here represent the first steps in a program to do so.
1.1. Background. How can one distribute N points as uniformly as possible in a compact metric space X with metric d? There are many possible answers, such as forming a good error-correcting code, i.e., maximizing the distance between the closest points. One particularly important family of answers generalizing coding theory is given by potential energy minimization. Given a decreasing, continuous function f : (0, max_{x,y in X} d(x,y)^2] -> R called the potential function, define the energy of a finite configuration C ⊆ X by

E_f(C) = (1/2) Σ_{x,y ∈ C, x ≠ y} f(d(x,y)^2).
Note that the use of squared distance instead of distance is not standard in physics, but it is mathematically convenient (and of course there is no loss of generality).
We wish to choose C so as to minimize E_f(C) subject to |C| = N. Because f is decreasing, this amounts to moving the points as far apart as possible, but all the distances matter, not just the minimal distance. This energy minimization problem arises naturally in physics, as the problem of determining the ground state of a classical particle system with isotropic pair interactions. Even in the case of particles confined to two-dimensional surfaces, these models capture important features of many real-world materials, such as colloidal particles adsorbing to the surface of a droplet (see [BG09] for details and other examples).
Small systems frequently display beautiful symmetry, such as twelve particles on a sphere forming the vertices of an icosahedron, but in larger systems the symmetry is often broken by the appearance of defects. One might expect that these defects would occur only in local minima for energy, and that the global minimum would have a perfect crystalline structure. Sometimes that is the case, but in many systems the defects actually contribute to lowering energy. (See for example the discussion of spherical crystals in Section 3 of [BG09].) One of the fundamental problems in this area is understanding when highly symmetrical configurations are optimal and why [C10]. Even when the answer is easy to guess, it is usually not easy to prove, and guessing the answer can itself be tricky.
Energy minimization can also be viewed as a generalization of coding theory. Specifically, it includes coding theory as a degenerate special case: if we take f(r) = 1/r^s, then as s → ∞, the problem of minimizing E_f(C) turns into the problem of maximizing the minimal distance in C. The same problem of understanding symmetry occurs here as well: when should one expect an optimal code to be highly symmetrical? Large or high-dimensional codes are frequently much less highly structured than small codes.
Linear programming bounds are a powerful technique for proving lower bounds for energy. In the simplest cases, they deal with the distance distribution of the configuration (in physics terms, the radial pair correlation function), which counts how many times each distance occurs between pairs of points. In other words, the distance distribution of C is the function δ : [0, ∞) → R defined by

δ(r) = |{(x, y) ∈ C × C : d(x, y) = r}|.

Clearly δ(0) = |C|, δ(r) ≥ 0 for all r, and Σ_r δ(r) = |C|^2 (note that only finitely many terms are nonzero), but we will see that these are far from the only constraints on δ. The distance distribution plays a key role because energy is a linear function of δ:

E_f(C) = (1/2) Σ_{r > 0} δ(r) f(r^2).

By contrast, the underlying configuration C is not always determined by δ (see [P44]).
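As a concrete sanity check of these definitions, the following short Python/NumPy sketch computes the distance distribution of a regular tetrahedron on S^2 and verifies that the energy is the stated linear function of δ. The tetrahedron and the potential f(r) = 1/r are illustrative choices, not taken from the text.

```python
import itertools
from collections import Counter
import numpy as np

# Vertices of a regular tetrahedron on the unit sphere S^2.
C = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

# Distance distribution delta(r): counts ordered pairs at each distance.
dists = [round(np.linalg.norm(x - y), 9) for x, y in itertools.product(C, C)]
delta = Counter(dists)

assert delta[0.0] == len(C)                # delta(0) = |C|
assert sum(delta.values()) == len(C) ** 2  # sum_r delta(r) = |C|^2

# Energy is a linear function of the distance distribution:
# E_f(C) = (1/2) sum_{r > 0} delta(r) f(r^2), here with f(r) = 1/r.
f = lambda r2: 1.0 / r2
E_from_delta = 0.5 * sum(m * f(r * r) for r, m in delta.items() if r > 0)
E_direct = 0.5 * sum(f(np.dot(x - y, x - y))
                     for x, y in itertools.product(C, C)
                     if not np.allclose(x, y))
assert abs(E_from_delta - E_direct) < 1e-9
```

For the tetrahedron, all 12 ordered pairs of distinct points have squared distance 8/3, so both computations give (1/2)(12)(3/8) = 2.25.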
Delsarte [D72] realized that in addition to the obvious constraints on δ mentioned above, there are often many other linear constraints. Using these constraints, one can derive bounds on energy via linear programming, because one is optimizing a linear function of δ subject to linear constraints. This was first carried out for energy minimization by Yudin [Y92]. We refer to these as two-point bounds because they depend only on the distances between pairs of points. Linear programming bounds are generally not sharp, or even close to sharp, but in certain cases they are unexpectedly powerful. For example, Cohn and Kumar [CK07] used linear programming bounds to prove that a number of exceptional structures in spheres or projective spaces are universally optimal. In other words, these configurations minimize energy for all completely monotonic functions of squared Euclidean distance on spheres or squared chordal distance in projective space. (Recall that a completely monotonic function is a smooth, nonnegative function whose derivatives alternate in sign: it is decreasing, convex, etc. For example, inverse power laws are completely monotonic.) Examples include the vertices of any regular polytope with simplicial facets, the E_8 root system, and the minimal vectors in the Leech lattice.
It is surprising that linear programming bounds are ever sharp, because when they are sharp, pair distance information alone suffices to identify the true ground state. As one might expect, that is very rarely the case. To prove stronger bounds, it is natural to try to take into account triples as well as pairs (i.e., how many times each triangle of distances occurs among three points), and Schrijver [S05] found the right approach. The constraints are no longer linear, but rather semidefinite. Bachoc and Vallentin [BV08] developed a representation-theoretic explanation and extended the method from binary codes to spheres, with an approach that applies also to more general spaces (and this method was further developed by Musin [M07]). These three-point semidefinite programming bounds are among the most powerful general tools known for proving bounds in coding theory. Using them, Bachoc and Vallentin [BV09b] determined the optimal 10-point code in S^3, which was the first new optimality proof for a spherical code in decades. They also conjectured that the bounds are sharp for the optimal 8-point code in S^2 (a square antiprism, first proved optimal by Schütte and van der Waerden [SW51], so there was less motivation to verify the sharpness in this case).
1.2. Our results. In this paper, we give a new derivation of the semidefinite programming bounds, which has the advantage of requiring no explicit calculation beyond what is necessary for the linear programming bounds. We then prove semidefinite bounds for potential energy minimization. Using three-point bounds, we prove universal optimality for a seven-point code in RP^2. It is given by the seven lines through opposite vertices of a cube and its dual octahedron. Equivalently, the lines connect opposite vertices of a rhombic dodecahedron. In the dual picture of planes through the origin in R^3, the planes are parallel to the facets of a cuboctahedron (the dual polyhedron to the rhombic dodecahedron).
Theorem 1.1. The rhombic dodecahedron code is universally optimal in RP^2. It is a global minimum for energy for each completely monotonic potential function of squared chordal distance, and it is the unique global minimum unless the potential function is linear. It is also the unique optimal seven-point code in RP^2.
It is straightforward to check that two-point bounds cannot prove Theorem 1.1. It was already known that this configuration is an optimal projective code (i.e., it maximizes the minimal distance), as a consequence of the orthoplex bound of Conway, Hardin, and Sloane [CHS96]. Uniqueness was conjectured in [CHS96] but does not follow from the orthoplex bound; see Appendix A for an explanation.
Note that the uniqueness asserted in Theorem 1.1 is as strong as possible: for linear potential functions, uniqueness fails because one can rotate the cube and octahedron relative to each other (so they are no longer in dual position) without changing the energy. That can be checked directly, but conceptually it holds because both polyhedra already define projective 1-designs. More generally, k-designs automatically minimize potential energy for completely monotonic polynomials of degree at most k.
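The invariance for linear potentials is easy to check numerically. The sketch below (Python/NumPy; the particular rotation and the specific linear potential f(r) = 2 - r are arbitrary illustrative choices) verifies that rotating the octahedron's lines relative to the cube's leaves the energy unchanged for a linear potential of squared chordal distance, while a strictly convex potential such as f(r) = 1/r does detect the rotation.

```python
import numpy as np

# One unit vector per line: 4 cube diagonals and 3 octahedron axes.
cube = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
octa = np.eye(3)

def energy(lines, f):
    """Energy for a potential f of squared chordal distance 1 - <x,y>^2."""
    n = len(lines)
    return sum(f(1 - np.dot(lines[i], lines[j]) ** 2)
               for i in range(n) for j in range(i + 1, n))

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

f_linear = lambda r: 2.0 - r  # a decreasing linear (degree-1) potential

# Rotating the octahedron relative to the cube changes the configuration,
# but not the linear-potential energy: both pieces are projective 1-designs.
E0 = energy(np.vstack([cube, octa]), f_linear)
E1 = energy(np.vstack([cube, octa @ rot_z(0.7).T]), f_linear)
assert abs(E0 - E1) < 1e-9

# A strictly convex completely monotonic potential does see the rotation.
f_inv = lambda r: 1.0 / r
assert abs(energy(np.vstack([cube, octa]), f_inv)
           - energy(np.vstack([cube, octa @ rot_z(0.7).T]), f_inv)) > 1e-6
```

For the linear potential above, the energy works out to 77/3 in both positions.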
One noteworthy aspect of the rhombic dodecahedron code is that it is the first case in which linear or semidefinite programming bounds are sharp for a code that is not distance regular. In other words, there is a distance such that the points in the code do not all have the same number of neighbors at that distance. All universal optima known or conjectured so far have been distance regular, so this example is the least regular one known. Theorem 1.1 thus extends the types of configurations that can be analyzed rigorously.
We have searched extensively for other cases in which the three-point semidefinite programming bounds are sharp (in spheres, real projective spaces, and complex projective spaces), but the rhombic dodecahedron is the only new case we have found. It would be very surprising if it were the only example in projective space, and we think there must be others. However, our failure to find them suggests that they are either large or high-dimensional. In the process of searching for other examples, we have formulated several remarkable conjectures about semidefinite programming bounds (Conjectures 5.2 through 5.4): in the few sharp cases that are known, the bounds appear to remain sharp all the way through a phase transition between two different structures.
Theorem 1.1 provides the last remaining universal optimum in RP^2:

Theorem 1.2. The complete list of universally optimal line configurations in R^3 (up to isometry) is as follows:
(1) Up to three orthogonal lines.
(2) The four lines through opposite vertices of a cube.
(3) The six lines through opposite vertices of an icosahedron.
(4) The seven lines through opposite vertices of a rhombic dodecahedron.
Case (1) is trivial, while cases (2) and (3) follow from Theorem 8.2 in [CK07]. Case (4) is Theorem 1.1, and the completeness of the list is proved in Appendix B.

1.3. Notation and definitions.
A code is simply a finite subset of a metric space.It is optimal if it maximizes the minimal distance between points, given the number of points and the metric space.
An (n, N, t) spherical code is a set of N points in the unit sphere S^{n-1} such that the maximal inner product between distinct points is at most t. (In other words, the minimal angle between them is at least cos^{-1} t.) An antipodal code is a spherical code that is closed under multiplication by -1. Real projective codes are of course equivalent to antipodal codes, and the rhombic dodecahedron code corresponds to a (3, 14, 1/√3) antipodal code. (Note that the vertices of the rhombic dodecahedron do not lie on a common sphere, so they must be rescaled to form this code.) The chordal metric is defined on real, complex, or quaternionic projective space as follows (in the octonionic case it is defined using the Frobenius norm on the exceptional Jordan algebra; see [CK07, p. 130]). If we represent points in projective space by unit vectors, then the chordal distance between x and y is √(1 - |⟨x, y⟩|^2), where ⟨x, y⟩ is the Hermitian inner product. This metric, first introduced in [CHS96], is equivalent to the Fubini-Study metric but is in many ways more convenient.
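The correspondence with the (3, 14, 1/√3) antipodal code can be checked directly. The following Python/NumPy sketch builds the 14 points (the ± normalized cube diagonals together with the ± octahedron axes) and verifies the maximal inner product; the tolerance values are arbitrary.

```python
import numpy as np

# The (3, 14, 1/sqrt(3)) antipodal code: +/- the four normalized cube
# diagonals together with +/- the three octahedron axes.
diags = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
axes = np.eye(3)
lines = np.vstack([diags, axes])      # one unit vector per line (7 lines)
code = np.vstack([lines, -lines])     # 14 points on S^2

assert code.shape == (14, 3)
assert np.allclose(np.linalg.norm(code, axis=1), 1.0)

# Maximal inner product between distinct points of the antipodal code.
G = code @ code.T
np.fill_diagonal(G, -np.inf)
t_max = G.max()
assert abs(t_max - 1 / np.sqrt(3)) < 1e-12

# As lines: |<x,y>| for distinct lines takes only the values 0, 1/3, 1/sqrt(3).
H = np.abs(lines @ lines.T)
np.fill_diagonal(H, 0)
uniq = np.unique(np.round(H, 9))
assert np.allclose(uniq, [0.0, 1/3, 1/np.sqrt(3)], atol=1e-8)
```

The cube diagonals meet each other at |inner product| 1/3, the axes are orthogonal, and the cross terms give 1/√3, which is the minimal angle of the projective code.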
A code in a sphere or projective space is universally optimal if it minimizes the energy E_f for all completely monotonic potential functions f (compared to all other codes of the same size), with the metric chosen to be the Euclidean metric on spheres or the chordal metric on projective spaces. Recall that distance is squared in the definition of E_f. This strengthens the notion of universal optimality, compared to defining it with unsquared distance, and it improves the connections with topics such as spherical or projective designs. (See [CK07, p. 101].) Let P^n_k(t) denote the degree k ultraspherical (i.e., Gegenbauer) polynomial for S^{n-1}, normalized so that P^n_k(1) = 1. (This is not the most common notation.) These polynomials are orthogonal with respect to the measure (1 - t^2)^{(n-3)/2} dt on [-1, 1]. See Chapters 6 and 9 of [AAR99] for more details.
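For readers who want to experiment with these polynomials, here is a minimal sketch (Python/NumPy) computing P^n_k via the standard Gegenbauer recurrence with parameter λ = (n-2)/2 and normalizing by the value at t = 1; the orthogonality check uses Gauss-Chebyshev (second kind) quadrature, whose weight √(1 - t^2) is exactly the n = 4 measure. The function name and the quadrature size are our own choices.

```python
import numpy as np

def P(n, k, t):
    """Ultraspherical polynomial P^n_k(t) for S^{n-1}, normalized so P^n_k(1) = 1.

    Computed from the Gegenbauer three-term recurrence with lambda = (n-2)/2,
    then divided by the value at t = 1 (a sketch; requires n >= 3).
    """
    lam = (n - 2) / 2
    t = np.asarray(t, dtype=float)
    def gegen(x):
        c_prev, c = np.ones_like(x), 2 * lam * x
        if k == 0:
            return c_prev
        for m in range(2, k + 1):
            c_prev, c = c, (2 * (m + lam - 1) * x * c
                            - (m + 2 * lam - 2) * c_prev) / m
        return c
    return gegen(t) / gegen(np.array(1.0))

# Orthogonality w.r.t. (1 - t^2)^((n-3)/2) dt on [-1, 1], checked for n = 4
# with Gauss-Chebyshev quadrature of the second kind (exact for polynomials).
m = 50
i = np.arange(1, m + 1)
nodes = np.cos(i * np.pi / (m + 1))
weights = np.pi / (m + 1) * np.sin(i * np.pi / (m + 1)) ** 2
inner = lambda j, k_: np.sum(weights * P(4, j, nodes) * P(4, k_, nodes))

assert abs(P(4, 5, 1.0) - 1.0) < 1e-12   # normalization at t = 1
assert abs(inner(2, 4)) < 1e-10          # distinct degrees are orthogonal
assert inner(3, 3) > 0                   # nonzero norm
```

For n = 3 this recurrence reproduces the Legendre polynomials, e.g. P^3_2(t) = (3t^2 - 1)/2.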
In the literature, two-point bounds are typically referred to as linear programming bounds and three-point bounds as semidefinite programming bounds. These names are a reasonable reference to the underlying inequalities, but we propose calling them "k-point bounds," which focuses attention on the most geometrically relevant feature, namely the number of points being simultaneously considered.
Given a matrix A, we write A ⪰ 0 if A is Hermitian (i.e., it equals its own conjugate transpose) and positive semidefinite. The inner product on Hermitian matrices is defined by ⟨A, B⟩ = tr(AB) (i.e., the entry-by-entry Hermitian inner product), and angle brackets will also denote the usual inner product on vectors. We will use J to denote a matrix with all entries 1.
When we deal with points in real projective space RP^{n-1}, we will represent each point by an arbitrary lift to the double cover S^{n-1}. That introduces sign ambiguities, so we must ensure that all formulas are invariant under the choice of lift. For example, if we apply a function to the inner product between two points, then it should be an even function.
2. Linear and semidefinite programming bounds

2.1. Positive-definite kernels. Linear and semidefinite programming bounds are based on the theory of positive-definite kernels. Let G be a topological group acting continuously on a topological space X. Call a continuous function K : X × X → C a positive-definite kernel if for all N ∈ N and all x_1, ..., x_N ∈ X, the N × N matrix (K(x_i, x_j))_{1 ≤ i,j ≤ N} is Hermitian and positive semidefinite. (Of course, saying it is Hermitian simply means that K(x, y) is the complex conjugate of K(y, x) for all x, y ∈ X.) We call the kernel G-invariant if K(gx, gy) = K(x, y) for all g ∈ G and x, y ∈ X.
Given any unitary representation V of G with inner product ⟨·, ·⟩ and any continuous map ϕ : X → V that is G-equivariant (i.e., ϕ(gx) = gϕ(x) for g ∈ G and x ∈ X), setting K(x, y) = ⟨ϕ(x), ϕ(y)⟩ defines a G-invariant positive-definite kernel on X. In fact, every such kernel arises in this way:

Theorem 2.1. For every G-invariant positive-definite kernel K on X, there exists a unitary representation V of G and a continuous, G-equivariant map ϕ : X → V such that K(x, y) = ⟨ϕ(x), ϕ(y)⟩ for all x, y ∈ X.

This theorem is a variant of a characterization of positive-definite kernels due to Bochner [B41]. See, for example, §5 of Chapter IV in [L85] for a closely related result (with the same idea behind it).

Proof. Let W be the complex vector space formally spanned by the points of X, and define a form ⟨·, ·⟩ on W by setting ⟨x, y⟩ = K(x, y) for x, y ∈ X and extending linearly in the first coordinate and conjugate linearly in the second. The action of G on X extends to an action of G on W.
Because K is a G-invariant positive-definite kernel, the form ⟨·, ·⟩ on W is G-invariant and positive semidefinite. Let W_0 be the subspace of vectors that have norm 0. Then the inner product on W/W_0 is positive definite, and the completion V of W/W_0 is a Hilbert space. Let ϕ : X → V be the obvious map (the composition of the trivial embedding of X in W, the quotient map by W_0, and the embedding in the completion). Then ϕ is continuous and G-equivariant, and K(x, y) = ⟨ϕ(x), ϕ(y)⟩ for all x, y ∈ X.
To complete the proof, we just need to verify that V is a unitary representation of G. In other words, for each v ∈ V, the map g ↦ gv must be a continuous function from G to V. By G-invariance it suffices to verify continuity at g = 1. Thus, we wish to show that gv → v as g → 1. Furthermore, it suffices to show this convergence for a dense subset of V, and we choose W/W_0 as that subset. Therefore, we can assume that

v = Σ_{x ∈ X} c_x [x],

where [x] denotes the vector in V corresponding to x ∈ X and where all but finitely many of the coefficients c_x vanish. Then

|gv - v|^2 = Σ_{x,y ∈ X} c_x \bar{c}_y (K(gx, gy) - K(gx, y) - K(x, gy) + K(x, y)).

By the continuity of K and the action of G on X, the finitely many terms on the right side with nonvanishing coefficients can be made arbitrarily small by making g close to the identity. Thus, V is indeed a unitary representation of G.
Suppose that G is compact. Then by the Peter-Weyl theorem, every unitary representation of G is an orthogonal direct sum of finite-dimensional irreducible representations. Thus, the cone of G-invariant positive-definite kernels is spanned by those arising from irreducible representations.
When G acts transitively on X, we can identify X with G/H, where H is the stabilizer of a point e ∈ X. Then a G-equivariant map ϕ : X → V is completely determined by ϕ(e) via ϕ(ge) = gϕ(e), and ϕ(e) can be any vector in V that is fixed by H.
The simplest case is when (G, H) is a Gelfand pair: then the fixed space V^H has dimension at most 1 when V is irreducible, so each irreducible representation is associated with at most one positive-definite kernel (up to scaling). This occurs, for example, when X is a sphere, projective space, or Grassmannian and G is the group of isometries of X.
When the fixed space V^H has dimension greater than 1, the situation is more complicated. Then V is associated with several positive-definite kernels, but in fact there is a richer structure behind them. For x_1, x_2 ∈ X, define a sesquilinear form K_{x_1,x_2} on V^H as follows. Let x_i = g_i e with g_1, g_2 ∈ G, and for v, w ∈ V^H define

K_{x_1,x_2}(v, w) = ⟨g_1 v, g_2 w⟩.

(This is well defined because v and w are fixed by H, so the choice of g_1 and g_2 does not matter.) The key property of these forms is that for all N ∈ N, x_1, ..., x_N ∈ X, and w_1, ..., w_N ∈ V^H,

Σ_{1 ≤ i,j ≤ N} K_{x_i,x_j}(w_i, w_j) ≥ 0.

To prove this inequality, note that

Σ_{1 ≤ i,j ≤ N} K_{x_i,x_j}(w_i, w_j) = |Σ_{1 ≤ i ≤ N} g_i w_i|^2 ≥ 0.

As a consequence (by letting the w_i range over a basis of V^H), the matrices obtained by evaluating these forms on a basis satisfy A ⪰ 0, where A ⪰ 0 means that A is positive semidefinite and Hermitian. In particular, each w ∈ V^H yields the scalar-valued G-invariant positive-definite kernel (x, y) ↦ K_{x,y}(w, w).
In general, define a matrix-valued positive-definite kernel to be a map that takes x, y ∈ X to a sesquilinear form K_{x,y} on a fixed complex vector space W (not necessarily finite-dimensional), with the following properties: for all v, w ∈ W, the map (x, y) ↦ K_{x,y}(v, w) is continuous; K_{y,x}(v, w) is the complex conjugate of K_{x,y}(w, v) for all x, y ∈ X and v, w ∈ W; and for all N ∈ N and all x_1, ..., x_N ∈ X and w_1, ..., w_N ∈ W,

Σ_{1 ≤ i,j ≤ N} K_{x_i,x_j}(w_i, w_j) ≥ 0.

The last two properties are equivalent to requiring that the matrices (K_{x_i,x_j}(w_i, w_j))_{1 ≤ i,j ≤ N} be Hermitian and positive semidefinite. We say K is defined on X and over W. The matrix-valued kernel is G-invariant if K_{gx,gy} = K_{x,y} for all g ∈ G and x, y ∈ X. The construction above defines a G-invariant matrix-valued positive-definite kernel over V^H whenever G acts transitively on X. Note that when dim W = 1, a matrix-valued positive-definite kernel over W is the same as an ordinary (scalar-valued) kernel.
When G does not act transitively on X, the identification of the vector space Hom_G(X, V) of continuous, G-equivariant maps with V^H breaks down. Nevertheless, essentially the same construction works. One can define a G-invariant matrix-valued positive-definite kernel over Hom_G(X, V) by setting

K_{x,y}(ϕ, ψ) = ⟨ϕ(x), ψ(y)⟩.

This construction is fully general, aside from changing variables in trivial ways:

Theorem 2.2. For every G-invariant matrix-valued positive-definite kernel K on X and over W, there exists a unitary representation V of G and a linear map ϕ : W → Hom_G(X, V) (written w ↦ ϕ_w) such that for all x, y ∈ X and v, w ∈ W,

K_{x,y}(v, w) = ⟨ϕ_v(x), ϕ_w(y)⟩.

Proof. Let U = W ⊗_C CX, where CX denotes the complex vector space formally spanned by the points of X. We define a form ⟨·, ·⟩ on U by

⟨v ⊗ x, w ⊗ y⟩ = K_{x,y}(v, w),

extended linearly in the first coordinate and conjugate linearly in the second. Because K is a matrix-valued positive-definite kernel, this defines a Hermitian, positive-semidefinite form on U. Let U_0 denote the subspace of vectors with norm 0, and let V be the completion of U/U_0, so V is a Hilbert space. Define ϕ_w(x) to be the image of w ⊗ x in V.
The Hilbert space V is a unitary representation of G, with the trivial action of G on W and the usual action on CX. (Strong continuity follows as in the proof of Theorem 2.1.) By construction,

K_{x,y}(v, w) = ⟨ϕ_v(x), ϕ_w(y)⟩,

which completes the proof.
When X = S^{n-1} and G is the stabilizer in O(n) of a point in X, the matrices constructed in Theorem 3.1 of [BV08] (by a different method from that used here) define matrix-valued positive-definite kernels, and we were led to our construction in an attempt to develop a more abstract approach to Bachoc and Vallentin's discovery.

2.2. Linear and semidefinite programming bounds. Given the machinery of positive-definite kernels, it is easy to write down linear and semidefinite programming bounds for energy minimization. We will write X^2/G for the set of orbits of G acting on pairs, and [x, y] will denote the orbit of the pair (x, y).
The linear programming bounds involve linear constraints on the G-invariant pair distribution of a code. Given a finite subset C ⊆ X, define

A_{[x,y]} = |{(x', y') ∈ C × C : (x', y') ∈ [x, y]}|.

These numbers satisfy some obvious linear constraints. They are all nonnegative, and their sum over the set X^2/G of orbits is |C|^2. Furthermore, A_{[x,x]} = |C| if G acts transitively on X, and even if that is not the case, the diagonal terms A_{[x,x]} count the points of C in each orbit of G acting on X.
In addition to these obvious constraints, each G-invariant positive-definite kernel K : X × X → C gives a more subtle constraint. Specifically,

Σ_{[x,y] ∈ X^2/G} A_{[x,y]} K(x, y) ≥ 0,

because the left side equals Σ_{x,y ∈ C} K(x, y), which is nonnegative by positive definiteness. Of course, the sum over [x, y] ∈ X^2/G means a sum over representatives of the orbits. Positive-definite kernels are remarkable because they have this property despite typically being negative at many points. Pfender [P07] discovered that they are not the only such functions, and one can occasionally improve the linear programming bounds by incorporating his functions. Unfortunately the improvement seems to be small in general, and the method becomes much less systematic (because the representation-theoretic context is lost), so we will not use Pfender's approach here.
Given a G-invariant potential function f : X × X → R, define the energy of C by

E_f(C) = (1/2) Σ_{x,y ∈ C, x ≠ y} f(x, y) = (1/2) Σ_{[x,y] ∈ X^2/G, x ≠ y} A_{[x,y]} f(x, y).

Potential energy is a linear functional of the pair distribution A, so by solving a linear programming problem (possibly infinite-dimensional) one can optimize it subject to the linear constraints described above. To prove a lower bound for energy, valid for arbitrary sets of |C| points in X, one need only find a feasible point in the dual linear program. When X = G/H and (G, H) is a Gelfand pair, linear programming bounds express all systematically available information about the pair distribution. In more general cases, linear programming bounds can be generalized to two-point semidefinite programming bounds. They work the same way, except they use the matrix-valued positive-definite kernels K_{x,y} developed in Subsection 2.1. For each such kernel,

Σ_{[x,y] ∈ X^2/G} A_{[x,y]} K_{x,y} ⪰ 0,

which is a semidefinite constraint on A. Thus, optimizing energy subject to these constraints becomes a semidefinite programming problem.
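These kernel constraints are easy to test numerically for the sphere. The following sketch (Python/NumPy; the icosahedron and the range of k are illustrative choices) checks that Σ_{x,y ∈ C} P_k(⟨x, y⟩) ≥ 0 for each Legendre polynomial P_k, which spans the invariant kernels for S^2, and that the sum vanishes for 1 ≤ k ≤ 5 because the icosahedron is a spherical 5-design.

```python
import numpy as np
from numpy.polynomial import legendre

# Vertices of a regular icosahedron on S^2 (golden ratio construction).
phi = (1 + np.sqrt(5)) / 2
V = []
for s1 in (1, -1):
    for s2 in (1, -1):
        V += [(0, s1, s2 * phi), (s1, s2 * phi, 0), (s2 * phi, 0, s1)]
C = np.array(V, dtype=float)
C /= np.linalg.norm(C, axis=1, keepdims=True)

# Each Legendre polynomial P_k gives a linear constraint on the pair
# distribution: sum over all ordered pairs of P_k(<x,y>) is nonnegative.
G = C @ C.T
for k in range(13):
    total = legendre.legval(G, [0] * k + [1]).sum()
    assert total >= -1e-8
    if 1 <= k <= 5:  # the icosahedron is a spherical 5-design
        assert abs(total) < 1e-8
```

Note that the k = 0 kernel is identically 1, so its sum is simply |C|^2 = 144.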
2.3. Three-point bounds and beyond. The most important application of semidefinite programming bounds is to prove three-point bounds (as in [BV08]). Fix a point e ∈ X and let H = Stab_G(e) be its stabilizer. The semidefinite programming bounds give constraints on H-invariant pair distributions on X, and symmetrizing e with the other two points turns them into constraints on G-invariant triple distributions. One can then attempt to optimize energy (or other quantities) subject to these constraints.
Of course, there is nothing sacred about three points, and one can prove k-point bounds in the same way. (Musin [M07] was the first to formulate these bounds for spherical codes.) However, as k increases the bounds become increasingly difficult to compute with. The difficulty is that k-point distributions are functions on X^k/G, so k-point bounds involve optimization over functions on this space. When X^k/G is one-dimensional, the optimization problem is usually doable, but it rapidly becomes intractable as the dimension increases.
For a concrete example, suppose X = S^{n-1} and G = O(n). Then elements of X^k/G are determined by their pairwise distances, so dim X^k/G = C(k, 2) = k(k-1)/2 (at least for k ≤ n). For k = 3, functions of three variables are barely tractable, but for k = 4, functions of six variables are too complicated: the space of polynomials of degree at most m in six variables has dimension C(m+6, 6), which grows too rapidly to allow for extensive computations.
The situation can be much worse for other spaces. For example, a pair of points in CP^{n-1} is determined up to isometries by the distance between them, but for triples of points there is a fourth parameter, namely a complex phase. (Two unit vectors in C^n can easily be phase shifted so that their inner product is real, but for three vectors that is generally impossible.) The Grassmannian of k-dimensional subspaces in R^n is even worse, since min(k, n - k) parameters are required to determine a pair up to isometries (see, for example, [B06]).

2.4. Explicit computations. Of course, to apply any of these bounds in practice, one must carry out the representation-theoretic computations explicitly. Fortunately, in the cases of interest in this paper, setting up k-point bounds requires no more calculation than two-point bounds.
Recall that for the sphere S^{n-1}, it is a theorem of Schoenberg [S42] that the cone of O(n)-invariant positive-definite kernels is spanned by the functions (x, y) ↦ P^n_k(⟨x, y⟩). Here, P^n_k denotes the degree k Gegenbauer polynomial for S^{n-1}. See Subsection 2.2 of [CK07] for a brief account of these calculations. In terms of Theorem 2.1, these positive-definite kernels correspond to the irreducible representations of O(n) that contain a nonzero vector fixed by O(n - 1) (and the other irreducible representations do not have corresponding kernels).
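Schoenberg's characterization is straightforward to probe numerically. The sketch below (Python/NumPy; the number of points, the random seed, and the range of degrees are arbitrary) checks that for arbitrary points on S^2 the matrix (P_k(⟨x_i, x_j⟩))_{i,j} is positive semidefinite, where P_k is the Legendre polynomial, i.e., the Gegenbauer polynomial P^3_k.

```python
import numpy as np
from numpy.polynomial import legendre

# Random points on S^2: normalize Gaussian samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
G = X @ X.T

# Schoenberg: applying P_k entrywise to a Gram matrix of unit vectors
# yields a positive semidefinite matrix, for every degree k.
for k in range(8):
    M = legendre.legval(G, [0] * k + [1])
    assert np.linalg.eigvalsh(M).min() > -1e-9
```

The eigenvalue tolerance only absorbs floating-point roundoff; the theorem guarantees the exact matrices are positive semidefinite.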
Once this characterization of positive-definite kernels is known, one can extend it to k-point bounds without needing any additional representation theory or information about special functions. For example, three-point bounds can be handled as follows. This gives a new proof of Corollary 3.5 in [BV08].
Theorem 2.3. Let H be the stabilizer in O(n) of a point e in S^{n-1}. For each k ≥ 0, there is an H-invariant matrix-valued positive-definite kernel on S^{n-1} that takes x_1, x_2 ∈ S^{n-1} to the infinite matrix whose (i_1, i_2) entry (indexed starting with 0) is

u_1^{i_1} u_2^{i_2} ((1 - u_1^2)(1 - u_2^2))^{k/2} P^{n-1}_k((t - u_1 u_2)/√((1 - u_1^2)(1 - u_2^2))),

where u_j = ⟨e, x_j⟩ and t = ⟨x_1, x_2⟩.
Of course, for numerical computations one uses finite submatrices of these infinite matrices.
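As a numerical sanity check for n = 3, the sketch below sums a finite truncation of the matrix with entry u_1^{i_1} u_2^{i_2} ((1-u_1^2)(1-u_2^2))^{k/2} P^2_k(s), s = (t - u_1 u_2)/√((1-u_1^2)(1-u_2^2)), over all ordered pairs of code points and verifies positive semidefiniteness. Here P^2_k is a Chebyshev polynomial (the Gegenbauer polynomial for S^1), and the entry formula, truncation size, and point set are our own illustrative choices, so treat this as a sketch.

```python
import numpy as np

def kernel_matrix(x1, x2, e, k, size):
    """(size x size) truncation of the degree-k three-point kernel for n = 3.

    Entry (i1, i2) is u1^i1 * u2^i2 * ((1-u1^2)(1-u2^2))^(k/2) * T_k(s),
    where T_k(s) = cos(k arccos s) is the Chebyshev polynomial and
    s = (t - u1*u2)/sqrt((1-u1^2)(1-u2^2)).
    """
    u1, u2, t = e @ x1, e @ x2, x1 @ x2
    f1, f2 = max(1 - u1 ** 2, 0.0), max(1 - u2 ** 2, 0.0)
    denom = np.sqrt(f1 * f2)
    if denom < 1e-12:
        # The ((1-u1^2)(1-u2^2))^(k/2) factor vanishes for k > 0.
        cheb = 1.0 if k == 0 else 0.0
    else:
        s = np.clip((t - u1 * u2) / denom, -1.0, 1.0)
        cheb = np.cos(k * np.arccos(s))
    i = np.arange(size)
    return np.outer(u1 ** i, u2 ** i) * (f1 * f2) ** (k / 2) * cheb

# Summing the kernel over all ordered pairs of points must give a
# positive semidefinite matrix, for any point set and any k.
rng = np.random.default_rng(1)
C = rng.normal(size=(20, 3))
C /= np.linalg.norm(C, axis=1, keepdims=True)
e = np.array([0.0, 0.0, 1.0])

for k in range(5):
    M = sum(kernel_matrix(x1, x2, e, k, 6) for x1 in C for x2 in C)
    assert np.linalg.eigvalsh(M).min() > -1e-8
```

Positivity here reflects the proof of Theorem 2.3: the summed matrix is a Gram-type matrix built from the functions ϕ_i, so only roundoff can make its eigenvalues slightly negative.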
Proof. Let X = S^{n-1}, with e and H as in Theorem 2.3. We must compute matrix-valued positive-definite kernels on the vector space Hom_H(X, V), where V is an irreducible unitary representation of H. Of course, Hom_H(X, V) is infinite-dimensional, but we will choose a basis for a dense subset. Let Y = {y ∈ S^{n-1} : ⟨e, y⟩ = 0} be the equator relative to e. To specify a map ϕ ∈ Hom_H(X, V), we just need to specify its restriction to the equator and to all the parallel slices of the sphere. Fortunately, that is relatively manageable. As in Subsection 2.1, Hom_H(Y, V) ≅ V^H because H acts transitively on Y, and the Gelfand pair property implies that V^H is at most one-dimensional; the same is true for all the parallel slices. Thus, the map ϕ is determined by specifying a scaling factor on each slice (in other words, a function of one variable). Suppose dim V^H = 1, and choose an element φ ∈ Hom_H(Y, V) that is not identically zero. We will use φ to define ϕ on the equator Y. To complete the determination of ϕ, we must specify the scaling factor on each parallel slice. We will use scaling factors that are polynomials in the inner product with e (up to multiplication by some fixed function f to be specified later). Specifically, for -1 ≤ u ≤ 1 and y ∈ Y, define

ϕ_i(ue + √(1 - u^2) y) = f(u) u^i φ(y).

The function ϕ_i is H-equivariant because φ is, and as i varies these functions span a dense subspace of Hom_H(X, V).
To compute the matrix-valued positive-definite kernel, we need to compute ⟨ϕ_{i_1}(x_1), ϕ_{i_2}(x_2)⟩ for x_1, x_2 ∈ X. If we write x_j = u_j e + √(1 - u_j^2) y_j with y_j ∈ Y, then

⟨ϕ_{i_1}(x_1), ϕ_{i_2}(x_2)⟩ = f(u_1) f(u_2) u_1^{i_1} u_2^{i_2} ⟨φ(y_1), φ(y_2)⟩.

The inner product ⟨φ(y_1), φ(y_2)⟩ can be computed using Schoenberg's theorem, because Y is a sphere and H is the full isometry group of Y. Specifically, there is some k (depending on V) such that ⟨φ(y_1), φ(y_2)⟩ = P^{n-1}_k(⟨y_1, y_2⟩), if φ is rescaled appropriately, which we can assume.
If we let t = ⟨x_1, x_2⟩, then

⟨y_1, y_2⟩ = (t - u_1 u_2)/√((1 - u_1^2)(1 - u_2^2)).

Now take f(u) = (1 - u^2)^{k/2}, so that ⟨ϕ_{i_1}(x_1), ϕ_{i_2}(x_2)⟩ is exactly the matrix entry in the theorem statement. This choice has the advantage that the right side is a polynomial in u_1, u_2, and t (because P^{n-1}_k is an even or odd function, according as k is even or odd).
The same approach works straightforwardly for k-point bounds: choosing H to be the stabilizer of several points recovers the results of Musin [M07]. It also works for projective spaces. As in the case of spheres, the calculations for an n-dimensional projective space with respect to the stabilizer of a point reduce immediately to those for an (n - 1)-dimensional space with respect to the full group.
The calculations in real projective space are almost the same as those in the sphere. If we lift points arbitrarily to the sphere (as discussed in Subsection 1.3), then we just need to avoid any sign ambiguity. Specifically, in terms of the three inner products u_1, u_2, and t from Theorem 2.3, we want only terms that are invariant under changing the signs of two of the three variables. That means we take only the entries for which i_1 ≡ i_2 ≡ k (mod 2). This submatrix defines a matrix-valued positive-definite kernel on real projective space.

3. Three-point bounds for energy minimization
In this section, we write down three-point bounds for energy minimization in spheres or real projective spaces. We begin with spheres, after which we can easily adapt the answer to real projective spaces. All of the required theory is in Section 2, but there are a number of details that must be worked out carefully.
It will be convenient to write potential functions in terms of the inner product. Given a potential function f : [-1, 1) → R, define

E_f(C) = (1/2) Σ_{x,y ∈ C, x ≠ y} f(⟨x, y⟩).

If f(t) = g(2 - 2t), then this agrees with the energy E_g defined in terms of squared distance, because |x - y|^2 = 2 - 2⟨x, y⟩ for unit vectors x and y. The function f is absolutely monotonic (i.e., it is infinitely differentiable and all of its derivatives are nonnegative) if and only if g is completely monotonic. Thus, C is universally optimal if and only if it minimizes E_f for all absolutely monotonic f. Given a configuration C ⊆ S^{n-1} with |C| = N > 2, define the corresponding triple distribution by

A_{u,v,t} = |{(x, y, z) ∈ C^3 : ⟨x, z⟩ = u, ⟨y, z⟩ = v, ⟨x, y⟩ = t}|.

This function counts ordered triples of points modulo the action of O(n). Of course,
A_{u,v,t} ≥ 0, and it vanishes unless -1 ≤ u, v, t ≤ 1 and

1 + 2uvt - u^2 - v^2 - t^2 ≥ 0.

In other words, the 3 × 3 Gram matrix

[ 1  t  u ]
[ t  1  v ]
[ u  v  1 ]

must be positive semidefinite. Furthermore, A_{u,v,t} is symmetric in u, v, and t. It satisfies the identities

Σ_{u,v,t} A_{u,v,t} = N^3   and   A_{1,1,1} = N.

Let D be the set of triples (u, v, t) with -1 ≤ u, v, t < 1 and 1 + 2uvt - u^2 - v^2 - t^2 ≥ 0. Because the potential function is not necessarily defined at inner product 1, it will be important to work with sums over D. For example,

Σ_{(u,v,t) ∈ D} A_{u,v,t} = N(N - 1)(N - 2).

Equivalently, summing over D counts ordered triples of distinct points.
We can express E_f(C) as a sum over D by pairing each ordered pair of distinct points with each of the N − 2 choices of a third point. It follows that

E_f(C) = (1/(3(N − 2))) ∑_{(u,v,t)∈D} A_{u,v,t} (f(u) + f(v) + f(t)).   (3.1)

Let S^n_k(u, v, t) be the infinite matrix whose (i, j) entry (indexed starting with 0) is the symmetrization of the corresponding entry from Theorem 2.3 in u, v, and t (i.e., the average over all permutations of the variables). The sum of this matrix over the code is positive semidefinite by Theorem 2.3; in terms of A_{u,v,t},

∑_{u,v,t} A_{u,v,t} S^n_k(u, v, t) ⪰ 0.
If we break the sum up according to which variables equal 1, we find that it splits into a part over D, a boundary part coming from triples with exactly two of the points equal, and a contribution N S^n_k(1, 1, 1) from triples with x = y = z. (It might appear that S^n_k(u, v, t) is undefined when one of the variables equals 1, but in fact all its entries are polynomials in u, v, and t.) The same trick as we used for (3.1) turns the boundary part into a sum over D (in fact, this is simply (3.1) with f(u) replaced with 2S^n_k(u, u, 1)). Thus, if we multiply through by N − 2 and define

T^n_k(u, v, t) = (N − 2) S^n_k(u, v, t) + S^n_k(u, u, 1) + S^n_k(v, v, 1) + S^n_k(t, t, 1),

we obtain an inequality over D alone. Furthermore, note that S^n_k(1, 1, 1) is the zero matrix unless k = 0, in which case all of its entries are 1. If we let J denote the all-ones matrix, then our inequality becomes

∑_{(u,v,t)∈D} A_{u,v,t} T^n_k(u, v, t) + N(N − 2) δ_{k,0} J ⪰ 0,

where δ denotes the Kronecker delta function.
Theorem 3.1. The minimal value of E_f for N > 2 points in S^{n-1} is greater than or equal to the optimum of the following semidefinite program: minimize

(1/(3(N − 2))) ∑_{(u,v,t)∈D} A_{u,v,t} (f(u) + f(v) + f(t))

over all A_{u,v,t} ≥ 0 satisfying ∑_{(u,v,t)∈D} A_{u,v,t} = N(N − 1)(N − 2) and the semidefinite constraints above.

To compute a lower bound for energy, we will use the dual semidefinite program. Suppose we define a function H on D by

H(u, v, t) = c + ∑_k ⟨F_k, T^n_k(u, v, t)⟩,

where c is a constant, F_k denotes an infinite symmetric matrix, and the inner product on symmetric matrices is the trace of the product. We assume only finitely many entries of F_k are nonzero, so the inner product is well defined.
Theorem 3.2. With the notation established above, if F_k ⪰ 0 for all k and H(u, v, t) ≤ (f(u) + f(v) + f(t))/3 for all (u, v, t) ∈ D, then the minimal value of E_f for N points in S^{n-1} is at least N((N − 1)c − ⟨F_0, J⟩).

Proof. For any N-point code C with triple distribution A,

∑_{(u,v,t)∈D} A_{u,v,t} (f(u) + f(v) + f(t)) = 3(N − 2) E_f(C)

by (3.1). On the other hand, the inner product of two positive semidefinite matrices is nonnegative, from which it follows that

⟨F_k, ∑_{(u,v,t)∈D} A_{u,v,t} T^n_k(u, v, t) + N(N − 2) δ_{k,0} J⟩ ≥ 0.

Summing over k yields

∑_{(u,v,t)∈D} A_{u,v,t} (H(u, v, t) − c) ≥ −N(N − 2) ⟨F_0, J⟩,

and combining these inequalities with ∑_{(u,v,t)∈D} A_{u,v,t} = N(N − 1)(N − 2) completes the proof.
As discussed at the end of Section 2, the real projective case is almost identical. If f is an even function, then we can lift points arbitrarily to S^{n-1} without introducing any ambiguity in E_f. It will prove convenient to use a third variant of the energy notation (in addition to the two used so far); that may seem unnecessary, but the first variant has the clearest connection to physics, the second is the most convenient for spheres, and the third is the most convenient for real projective space. Given a potential function f: [0, 1) → R, define

E_f(C) = ∑ f(⟨x, y⟩^2),

where the sum ranges over unordered pairs of distinct points. Here x and y represent lifts to S^{n-1} of points in RP^{n-1}. Note that each point in RP^{n-1} has two lifts, but we only include one of them in the sum (chosen arbitrarily).
Let S̄^n_k be the submatrix of S^n_k whose rows and columns are indexed by numbers with the same parity as k, and define T̄^n_k from S̄^n_k in the same way that T^n_k is defined from S^n_k. Then the three-point energy bounds for RP^{n-1} are exactly the same as those for S^{n-1}, except with S̄ and T̄ replacing S and T:

Theorem 3.3. With the notation established above, if F_k ⪰ 0 for all k and H(u, v, t) ≤ (f(u^2) + f(v^2) + f(t^2))/3 for all (u, v, t) ∈ D (with H now built from the matrices T̄^n_k), then the corresponding lower bound holds for the energy E_f of N points in RP^{n-1}.

The choice of c and F_0, F_1, ... in Theorems 3.2 and 3.3 can be optimized using semidefinite programming. First, assume F_k = 0 for k beyond some bound, and that for all k the (i, j) entry of F_k vanishes if i and j are sufficiently large, so that only finitely many variables need to be considered. Assume also that f is a polynomial (although that assumption can be relaxed, at the cost of additional complications). To express the constraint that H(u, v, t) ≤ (f(u) + f(v) + f(t))/3 on D, we will use a sum of squares representation due to Putinar. Unfortunately, there is a nontrivial error in the paper [P93] (see [M08, p. 98]), so we must be careful here. The closure of D is defined by the constraints 1 − u^2 ≥ 0, 1 − v^2 ≥ 0, 1 − t^2 ≥ 0, and 1 + 2uvt − u^2 − v^2 − t^2 ≥ 0, and this representation is stably compact (i.e., it defines a compact set even if the coefficients are perturbed); furthermore, only the last constraint has odd degree. It follows from Corollary 7.2.5 in [M08] that if a polynomial p is strictly positive on D, then it can be expressed in the form

p = G_0 + (1 − u^2) G_1 + (1 − v^2) G_2 + (1 − t^2) G_3 + (1 + 2uvt − u^2 − v^2 − t^2) G_4,

where G_0(u, v, t), ..., G_4(u, v, t) are sums of squares of polynomials in u, v, t. A polynomial is a sum of squares if and only if it is of the form ⟨z(u, v, t), M z(u, v, t)⟩, where M is a symmetric, positive-semidefinite matrix and z(u, v, t) is a vector whose entries are monomials in u, v, t; thus, being a sum of squares is a semidefinite condition. If we apply this approach to the difference (f(u) + f(v) + f(t))/3 − H(u, v, t), then we get semidefinite programs that are guaranteed to come arbitrarily close to the true optimum in Theorem 3.2. Of course, we hope they will actually achieve the true optimum, and in practice that occurs.
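The semidefinite encoding of "sum of squares" in the last step can be illustrated on a toy example (ours, not from the paper): exhibiting a positive semidefinite matrix M with p = ⟨z, Mz⟩ certifies that p is a sum of squares.

```python
import numpy as np

# A polynomial is a sum of squares iff it equals <z, M z> for some symmetric
# positive semidefinite M and some vector z of monomials.  Toy example:
# p(u, v, t) = u^2 + v^2 + t^2 + 2uv = (u + v)^2 + t^2, with z = (u, v, t).
M = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

def p(u, v, t):
    return u*u + v*v + t*t + 2*u*v

# Verify the identity p = <z, M z> at random points.
rng = np.random.default_rng(1)
for _ in range(100):
    u, v, t = rng.uniform(-1, 1, size=3)
    z = np.array([u, v, t])
    assert abs(z @ M @ z - p(u, v, t)) < 1e-12

# M is positive semidefinite, certifying that p is a sum of squares.
assert np.linalg.eigvalsh(M).min() >= -1e-12
```

In the actual bounds, z runs over all monomials up to a degree cutoff and M becomes one of the matrix variables of the semidefinite program.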

Proof of universal optimality
In this section we prove Theorem 1.1. There are two fundamental difficulties. One is that, although we can solve a semidefinite program numerically to obtain a bound for any given potential function, the solutions are very cumbersome, and it is not easy to produce a rigorous, exact solution. Bachoc and Vallentin dealt with this difficulty in [BV09b], but their problem is substantially simpler, and they employed ad hoc techniques. We will develop a more systematic approach.
The second difficulty is that to prove universal optimality, we must prove optimality for infinitely many different potential functions, namely all completely monotonic functions. Of course we only need to deal with the extreme rays of this cone, but there are infinitely many of them as well. To address this difficulty, we will construct a finite set of functions (not all completely monotonic) such that optimality for all of them implies universal optimality. In other words, we will replace the cone of completely monotonic functions with a larger cone that has only finitely many extreme rays, so we will prove a slightly stronger result than universal optimality.
Recall that

E_f(C) = ∑ f(⟨x, y⟩^2),

where the sum is over unordered pairs of distinct points and x and y represent lifts to S^{n-1} of points in RP^{n-1} (with only one lift of each point being used). One advantage of squaring the inner product is that it becomes invariant under sign changes, but it also relates well to the chordal distance: the squared chordal distance between x and y is 1 − ⟨x, y⟩^2. Thus, C is universally optimal in RP^{n-1} if and only if it minimizes E_f for each absolutely monotonic function f. The cone of absolutely monotonic functions on [0, 1) is spanned by the monomials f(t) = t^i (see Theorem 9b in [W41, p. 154]), so it suffices to prove optimality for these monomials.
To reduce proving universal optimality to a finite problem, we will apply Hermite interpolation. Recall that given a nonempty, finite multiset T of points in R (with the multiplicity of t ∈ T denoted mult_T(t)), the Hermite interpolation H_T(f) of a function f is the unique polynomial of degree less than ∑_{t∈T} mult_T(t) that agrees with f to order mult_T(t) at each t ∈ T (i.e., f^{(i)}(t) = (H_T(f))^{(i)}(t) for all i < mult_T(t)). See Subsection 2.1 of [CK07] for background on Hermite interpolation. The following observation of Yudin will be crucial:

Lemma 4.1 (Yudin [Y92]). Let T be a finite, nonempty multisubset of an interval I such that each point in T has even multiplicity, except for the endpoints of I (which are allowed to have even or odd multiplicity). For each absolutely monotonic function f: I → R,

f(t) ≥ H_T(f)(t)

for all t ∈ I.
Lemma 4.1 follows from the remainder formula for Hermite interpolation (see, for example, Lemmas 2.1 and 5.1 in [CK07] for a proof).
Lemma 4.2. Let T = {t_1, ..., t_M} be a nonempty multisubset of an interval I (written with t_i repeated according to its multiplicity). If f: I → R is absolutely monotonic, then there exist nonnegative coefficients λ_0, ..., λ_{M−1} such that

H_T(f)(t) = ∑_{j=0}^{M−1} λ_j (t − t_1)(t − t_2)···(t − t_j).

What Lemma 4.2 says depends on the ordering of t_1, ..., t_M, but it is true for every ordering.
Proof. We prove Lemma 4.2 by induction on M. For M = 1, H_T(f) is the constant function f(t_1) and the lemma is trivial. Otherwise, let T′ = {t_2, ..., t_M}. Then

H_T(f)(t) = f(t_1) + (t − t_1) H_{T′}(g)(t),   (4.1)

where g(t) = (f(t) − f(t_1))/(t − t_1) (and g(t_1) = f′(t_1)). The function g is absolutely monotonic on I, by Proposition 2.2 in [CK07]; alternatively, that can be seen directly via

g^{(k)}(t) = ∫_0^1 s^k f^{(k+1)}(t_1 + s(t − t_1)) ds,

which is the fundamental theorem of calculus for k = 0 and can be proved by induction (or by using a Taylor series expansion for (f(t) − f(t_1))/(t − t_1) about t_1). Now applying the lemma inductively to H_{T′}(g) and using (4.1) completes the proof.
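To make Lemmas 4.1 and 4.2 concrete, the following sketch (ours, not from the paper) computes H_T(f) in Newton form via divided differences for T = {0, 0, 0, 1/9, 1/9, 1/3, 1/3} and the absolutely monotonic f(t) = t^7. The Newton coefficients are the nonnegative λ_j of Lemma 4.2, and the remainder f − H_T(f) equals t³(t − 1/9)²(t − 1/3)², which is nonnegative on [0, 1) as Lemma 4.1 predicts.

```python
import numpy as np
from math import factorial

def hermite_coeffs(nodes, derivs):
    """Newton-form coefficients for Hermite interpolation at `nodes`
    (listed with multiplicity, equal nodes adjacent).  `derivs(x, j)`
    returns the j-th derivative of the interpolated function at x."""
    n = len(nodes)
    col = [derivs(x, 0) for x in nodes]
    coeffs = [col[0]]
    for j in range(1, n):
        new_col = []
        for i in range(n - j):
            if nodes[i + j] == nodes[i]:  # repeated node: use derivatives
                new_col.append(derivs(nodes[i], j) / factorial(j))
            else:
                new_col.append((col[i + 1] - col[i]) / (nodes[i + j] - nodes[i]))
        coeffs.append(new_col[0])
        col = new_col
    return coeffs

def newton_eval(coeffs, nodes, t):
    """Evaluate the Newton-form polynomial by Horner's scheme."""
    r = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        r = r * (t - nodes[k]) + coeffs[k]
    return r

# T = {0, 0, 0, 1/9, 1/9, 1/3, 1/3} and f(t) = t^7 (absolutely monotonic).
T = [0.0, 0.0, 0.0, 1/9, 1/9, 1/3, 1/3]
f_derivs = lambda x, j: factorial(7) // factorial(7 - j) * x**(7 - j)

lam = hermite_coeffs(T, f_derivs)
assert all(c >= 0 for c in lam)  # the nonnegative coefficients of Lemma 4.2

# Lemma 4.1: f >= H_T(f) on [0, 1); here the gap is t^3 (t-1/9)^2 (t-1/3)^2.
for t in np.linspace(0, 0.999, 200):
    gap = t**7 - newton_eval(lam, T, t)
    assert abs(gap - t**3 * (t - 1/9)**2 * (t - 1/3)**2) < 1e-9
    assert gap >= -1e-12
```

The exact form of the remainder follows from uniqueness of the interpolant: t^7 minus t³(t − 1/9)²(t − 1/3)² has degree at most 6 and agrees with f to the required orders at the nodes.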
Combining Lemmas 4.1 and 4.2 reduces proving universal optimality to a finite number of cases, as follows.
Corollary 4.3. Let C be a finite subset of RP^{n-1} (represented via arbitrary lifts to S^{n-1}), and let T = {t_1, ..., t_M} be any finite multisubset of [0, 1) (written with multiplicities) such that all elements other than 0 have even multiplicity and every squared inner product between distinct points of C occurs in T. If C minimizes E_{f_j} among |C|-point configurations for each of the potential functions f_j(t) = (t − t_1)···(t − t_j) with 1 ≤ j ≤ M − 1, then C is universally optimal.

Proof. Let f: [0, 1) → R be absolutely monotonic. By Lemma 4.1, f ≥ H_T(f) on [0, 1), and by Lemma 4.2, there are nonnegative coefficients λ_0, ..., λ_{M−1} such that H_T(f) = ∑_j λ_j f_j. Thus, for any configuration D with |D| = |C|,

E_f(D) ≥ E_{H_T(f)}(D) = ∑_j λ_j E_{f_j}(D) ≥ ∑_j λ_j E_{f_j}(C) = E_{H_T(f)}(C) = E_f(C),

where the last equation holds because f = H_T(f) at each squared inner product between distinct points in C.
Unfortunately, even if C is universally optimal, there is no guarantee that it is optimal for the potential functions constructed in Corollary 4.3. That seems to depend on luck and the proper choice of T.
The proof of the main theorem in [CK07] can be recast in this framework, which simplifies the argument given there: specifically, it replaces the use of conductivity. As a side benefit, this approach allows us to give a substantially simpler proof of the universal optimality of the regular 600-cell in S^3 than was given in [CK07]. (That is the one case that was not proved by a conceptual argument, but rather by somewhat complicated calculations.) See Appendix C for more details.
The triple repetition of 0 in T is not essential for the proof, but it is helpful. Our approach fails if we take mult_T(0) = 1; with mult_T(0) = 2, it works, but the numerical calculations are more cumbersome.
For the rhombic dodecahedron code C, the squared inner products between distinct points are 0, 1/9, and 1/3, occurring in 3, 6, and 12 pairs, respectively. Thus, we wish to prove a lower bound of 3f(0) + 6f(1/9) + 12f(1/3) for energy with respect to each of the potential functions f(t) = t^2, t^3, t^3(t − 1/9), t^3(t − 1/9)^2, and t^3(t − 1/9)^2(t − 1/3). In fact, in each case we prove something stronger, namely that the same lower bound holds not just for the potential function f(t) but also for a modified potential function f_0 with f_0(t) ≤ f(t) for all t ≥ 0 and f_0 = f at 0, 1/9, and 1/3. Because f_0(t) ≤ f(t) for all t ≥ 0, we have E_{f_0} ≤ E_f, and equality holds for the rhombic dodecahedron code.
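The pair counts behind the bound 3f(0) + 6f(1/9) + 12f(1/3) are easy to verify directly from the cube-and-octahedron description of the seven lines; the following sketch (ours) does so numerically.

```python
import numpy as np
from collections import Counter
from itertools import combinations

# The seven lines: the three coordinate axes (octahedron) together with the
# four main diagonals of the cube, as described in the introduction.
lines = [np.eye(3)[i] for i in range(3)]
lines += [np.array([1.0, s1, s2]) / np.sqrt(3) for s1 in (1, -1) for s2 in (1, -1)]

# Squared inner products between distinct lines.
counts = Counter()
for x, y in combinations(lines, 2):
    counts[round(float(np.dot(x, y))**2, 12)] += 1

assert counts == {0.0: 3, round(1/9, 12): 6, round(1/3, 12): 12}

# Energy for a potential f of the squared inner product (unordered pairs):
def energy(f):
    return sum(f(float(np.dot(x, y))**2) for x, y in combinations(lines, 2))

f = lambda t: t**2
assert abs(energy(f) - (3*f(0) + 6*f(1/9) + 12*f(1/3))) < 1e-9
```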
For each potential function, we use an auxiliary function H(u, v, t) of the form used in Theorem 3.3, where the matrix F_0 is 5 × 5 (i.e., all entries outside of the upper left 5 × 5 block are zero), F_1 and F_2 are 4 × 4, F_3 and F_4 are 3 × 3, and F_5 is 2 × 2. We then optimize the bound obtainable from Theorem 3.3. To enforce the constraint that H(u, v, t) ≤ (f_0(u^2) + f_0(v^2) + f_0(t^2))/3 on D, we require that the difference equal ⟨z(u, v, t), M_0 z(u, v, t)⟩, where M_0 is a 120 × 120 positive-semidefinite matrix and z(u, v, t) is the vector consisting of all monomials in u, v, t of degree at most 7. Note that this condition is very strong (for example, it forces the degree of the left side to be at most 14 and it forces the inequality to hold for all (u, v, t) ∈ R^3), but in fact it works.
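As a small check on the size of this semidefinite program (ours, for illustration): the vector z(u, v, t) of monomials of degree at most 7 in three variables indeed has C(10, 3) = 120 entries.

```python
from itertools import combinations_with_replacement
from math import comb

# Monomials of degree exactly d in three variables correspond to multisets of
# size d drawn from {u, v, t}; summing over d = 0, ..., 7 gives C(10, 3).
monomials = [m for d in range(8)
             for m in combinations_with_replacement("uvt", d)]
assert len(monomials) == comb(10, 3) == 120
```

Hence the matrix M_0 acting on z(u, v, t) is 120 × 120, as stated above.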
The program SDPA-GMP can do arbitrary-precision semidefinite programming [N+08], and CSDP can do sufficiently high-precision semidefinite programming for our purposes [B99]. Using either of them, one can solve this semidefinite program numerically and verify that the bound is nearly sharp. However, getting a rigorous proof takes more work. We would like to round an approximate solution to get an exact solution, but the rounding process can violate the constraints at locations where there is equality, so it must be done carefully.
To get a sharp bound, we will impose two types of conditions. First, there are necessary constraints on the matrices F_k from complementary slackness (i.e., the conditions under which equality can hold in the proof of Theorem 3.3). Specifically, for 0 ≤ k ≤ 5, the inner product of F_k with

35 δ_{k,0} J + 6 T̄^3_k(0, 0, 0) + 24 T̄^3_k(1/3, 1/3, −1/3) + 72 T̄^3_k(0, 1/√3, 1/√3) + 72 T̄^3_k(1/√3, 1/√3, 1/3) + 36 T̄^3_k(1/√3, −1/√3, 1/3)

must vanish. The coefficients and arguments of T̄^3_k come from the triple distribution of the rhombic dodecahedron code. Second, we require that H(u, v, t) must equal (f_0(u^2) + f_0(v^2) + f_0(t^2))/3 at the triples of inner products that occur, namely the five arguments to T̄^3_k in the formula above, and all their first partial derivatives must agree as well. These conditions are linear in the variables c, F_0, ..., F_5, and M_0.
To perform the rounding correctly, we write the problem in a basis for the set of all c, F_0, ..., F_5, and M_0 that satisfy the constraints listed in the previous paragraph, together with the defining equations, such as the identity expressing (f_0(u^2) + f_0(v^2) + f_0(t^2))/3 − H(u, v, t) as ⟨z(u, v, t), M_0 z(u, v, t)⟩. We then solve the semidefinite program numerically for the coefficients in this basis, and we round the coefficients to eight or nine decimal places to get an exact solution. It is not guaranteed that this rounding process will work: the problem is the semidefinite constraints, because unexpected zero eigenvalues may become negative due to the rounding. However, we do not run into that difficulty, because there turn out to be no zero eigenvalues except for the ones forced by the constraints we have built into our basis. Note that the exact auxiliary function is far from unique, and our rounding strategy makes essential use of this freedom.
Finally, for uniqueness as a spherical code, we use f(t) = t^3(t − 1/9)^2(t − 1/3). Every code whose minimal distance is as large as the rhombic dodecahedron code's minimal distance has E_f ≤ 0, and hence it minimizes E_f (because the minimal energy is 0). Thus, uniqueness for this potential function implies uniqueness as a spherical code. This completes the proof of Theorem 1.1.

Open questions
5.1. Three-point bounds and generalizations. One pressing question is why three-point bounds are not sharp more often. Of course, it is unreasonable to expect that sharp bounds will ever be common in packing or energy minimization problems. The phenomena seem to be intrinsically complicated (see, for example, [B+09]), and the most one can hope for is to prove optimality in exceptional cases. However, even just for spheres, the two-point bounds are sharp for three infinite families and a dozen sporadic cases (see Table 1 in [CK07]), and they are sharp even more often in projective spaces. Thus, it is surprising that only three sharp cases are known for three-point bounds, excluding the cases where two-point bounds already suffice. Surely there must be more examples, but so far we have not been able to find them.
It is natural to ask what sort of bounds are required to prove optimality for an N-point configuration. Of course N-point bounds suffice, but only because in the process of optimizing over N-point distributions they implicitly optimize over all N-point configurations. For what sorts of families of configurations might o(N)-point bounds suffice?
It would also be very interesting to do explicit calculations with four-point bounds. This project would involve exceptionally time-consuming calculations, but it is possible in principle and perhaps in practice. Gijswijt, Mittelmann, and Schrijver have carried out the analogue for binary error-correcting codes [GMS10], which suggests that the continuous version may also be tractable.
If we had further examples of sharp three-point bounds, they might suggest organizing principles that could lead to a deeper understanding. For example, for two-point bounds in two-point homogeneous spaces, Levenshtein [L92] proved a beautiful criterion for being an optimal code in terms of strength as a design: any m-distance set that is a (2m − 1)-design (or even an antipodal (2m − 2)-design) is an optimal code. In these cases, the two-point bounds are sharp and can be understood conceptually, with no need for numerical calculations. The same criterion applies more generally to prove universal optimality (Theorems 1.2 and 8.2 in [CK07]). Every known case in which two-point bounds are sharp fits into this framework, with one exception, namely the regular 600-cell [A99, CK07].
We cannot even propose a conjectural generalization of Levenshtein's criterion to three-point bounds. However, we hope that some general principle will explain the sharp cases, offer guidance for how to locate more of them, and lead to proofs that involve no explicit numerical computations.
Another question is whether there are good applications of three-point bounds to potential functions that depend on triples of points, rather than pairs. Of course, there is no obstacle to writing down such bounds, but it is not clear that there are any exciting examples. Pair potentials are far more common in physics, and we do not know which higher-order generalizations may be worthy of investigation.

5.2. Projective spaces. Theorem 1.2 settles the question of universal optimality in RP^2, but as we will see in Subsection 5.3, there may be cases in which three-point bounds are sharp but universal optimality does not hold. We are unaware of any cases in projective space besides the rhombic dodecahedron code in which three-point bounds are sharp and two-point bounds are not, but it is difficult to imagine that it is the only example.
It is especially intriguing that the three-point bounds prove universal optimality for this code, and it would be fascinating to find other such cases. It is unlikely that there are any in RP^3. In addition to the universal optima that occur in every dimension (up to n orthogonal lines in R^n, or n + 1 lines connecting the vertices of a regular simplex to its centroid), there are at least two other universally optimal line configurations in R^4: the 12 lines through opposite vertices of a regular 24-cell and the 60 from a regular 600-cell. In each of these cases, universal optimality follows from two-point bounds: the last one can be proved using the techniques from Section 7 of [CK07], and all the others follow from Theorem 8.2 in [CK07]. We know of no other candidates for universal optimality in RP^3.
Five-dimensional line configurations are more promising. In addition to the generic cases of up to six lines, two-point bounds prove universal optimality for a ten-line configuration from [CHS96] (by Theorem 8.2 in [CK07]). In the subspace of R^6 consisting of all points with coordinate sum zero, this configuration consists of the lines through all the permutations of (1, 1, 1, −1, −1, −1). If we also include the lines through the permutations of (5, −1, −1, −1, −1, −1), we get a 16-line configuration also studied in [CHS96]. The orthoplex bound is sharp for it, so it is an optimal projective code, but nothing more is known about energy minimization or uniqueness.
Conjecture 5.1. The 16-line configuration in R^5 described above is universally optimal and is the unique optimal 16-point code in RP^4.
The parallels between this code and the rhombic dodecahedron code are noteworthy. The seven lines in R^3 can be constructed completely analogously, using the permutations of the points (1, 1, −1, −1) and (3, −1, −1, −1) in the subspace of R^4 consisting of all points with coordinate sum zero. Equivalently, both codes can be constructed by starting with the lines through the vertices of a regular simplex centered at the origin, and then filling in all the holes (i.e., the lines at maximal distance from the original lines).
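Both permutation constructions are easy to verify numerically. The following sketch (ours, not from the paper) builds the two codes and checks their squared inner products, including that the maximal squared cosine is 1/n in each case (with n = 3 and n = 5), as sharpness of the orthoplex bound requires.

```python
import numpy as np
from itertools import combinations, permutations

def lines_from(*patterns):
    """Distinct lines through the permutations of the given coordinate
    patterns (antipodal vectors give the same line)."""
    lines = []
    for pattern in patterns:
        for p in set(permutations(pattern)):
            v = np.array(p, dtype=float)
            v /= np.linalg.norm(v)
            if not any(np.allclose(v, w) or np.allclose(v, -w) for w in lines):
                lines.append(v)
    return lines

def squared_cosines(lines):
    return {round(float(np.dot(x, y))**2, 9) for x, y in combinations(lines, 2)}

# Seven lines in the sum-zero subspace of R^4 (the rhombic dodecahedron code):
seven = lines_from((1, 1, -1, -1), (3, -1, -1, -1))
assert len(seven) == 7
assert squared_cosines(seven) == {0.0, round(1/9, 9), round(1/3, 9)}

# Sixteen lines in the sum-zero subspace of R^6 (a 5-dimensional code):
sixteen = lines_from((1, 1, 1, -1, -1, -1), (5, -1, -1, -1, -1, -1))
assert len(sixteen) == 16
assert squared_cosines(sixteen) == {round(1/25, 9), round(1/9, 9), round(1/5, 9)}

# Maximal squared cosine is 1/n in both cases (n = 3 and n = 5).
assert max(squared_cosines(seven)) == round(1/3, 9)
assert max(squared_cosines(sixteen)) == round(1/5, 9)
```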
Both codes can be proved optimal using the orthoplex bound. Nevertheless, the three-point bounds for potential energy do not seem to settle Conjecture 5.1, although they suffice to prove Theorem 1.1. Perhaps four-point bounds can prove Conjecture 5.1, but the calculations required to investigate this could be formidable. The pattern certainly does not continue to R^7: the analogous configuration there (43 lines, from the permutations of (1, 1, 1, 1, −1, −1, −1, −1) and (7, −1, −1, −1, −1, −1, −1, −1)) no longer has these properties.

5.3. Spheres. Although in this paper we focus on real projective space, we have also applied three-point bounds to potential energy minimization on spheres. We have not found any sharp cases beyond the two identified by Bachoc and Vallentin [BV09b], namely the Petersen code in R^4 (the ten edge midpoints of a regular simplex) and the square antiprism in R^3. Using three-point bounds, they proved rigorously that the Petersen code is an optimal code, and based on their calculations they conjectured that the bounds are also sharp for the antiprism.
For ten points in S^3, we observe a remarkable phenomenon in the three-point bounds for energy. The Petersen code is not universally optimal, because it sometimes has greater potential energy than the code consisting of two regular pentagons on orthogonal planes in R^4. However, the three-point bounds appear to be sharp in every case:

Conjecture 5.2. For every completely monotonic potential function f on (0, 4], either the Petersen code or the orthogonal pentagon code minimizes the energy E_f among all ten-point codes in S^3. Furthermore, the three-point bounds are always sharp (for whichever code is optimal).
In other words, the three-point bounds remain sharp throughout the phase transition between the two ground states. If true, this would represent an unprecedented phenomenon in coding and energy minimization.
We believe that Conjecture 5.2 should be provable, although we have not been able to complete a proof. The cone of completely monotonic functions on (0, 4] is spanned by the functions f(r) = (4 − r)^k (see Theorem 9b in [W41, p. 154]). For k ≤ 2, both codes have the same energy and the two-point bounds are sharp. For 3 ≤ k ≤ 6, the three-point bounds are sharp for the orthogonal pentagons, and for k ≥ 7 we can prove that they are sharp for the Petersen code (by using the spherical analogue of Corollary 4.3 to reduce to a finite basis). To prove Conjecture 5.2, it would suffice to prove it just for the functions f(r) = (4 − r)^j + α_{j,k}(4 − r)^k with 3 ≤ j ≤ 6 and k ≥ 7, where α_{j,k} > 0 is chosen to make the two codes have equal energy. Unfortunately, we have found it much more difficult to deal with these cases.
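The energy comparisons above can be checked numerically. The following sketch (ours, not from the paper) builds the Petersen code and the orthogonal pentagon code and verifies that their energies for f(r) = (4 − r)^k agree for k ≤ 2, while the pentagons have lower energy for k = 3 and the Petersen code has lower energy for k = 7. (Energies depend only on pairwise distances, so it is harmless that the two codes are realized in ambient spaces of different dimensions.)

```python
import numpy as np
from itertools import combinations

# Vertices of a regular simplex with pairwise inner product -1/4,
# realized in the sum-zero subspace of R^5.
verts = [np.eye(5)[i] - 0.2 for i in range(5)]
verts = [v / np.linalg.norm(v) for v in verts]

# Petersen code: normalized edge midpoints of the simplex (10 points).
petersen = [(u + v) / np.linalg.norm(u + v) for u, v in combinations(verts, 2)]

# Two regular pentagons in orthogonal planes (10 points in R^4).
angles = 2 * np.pi * np.arange(5) / 5
pentagons = [np.array([np.cos(a), np.sin(a), 0, 0]) for a in angles]
pentagons += [np.array([0, 0, np.cos(a), np.sin(a)]) for a in angles]

def energy(code, k):
    """Energy for f(r) = (4 - r)^k, with r the squared Euclidean distance,
    summed over unordered pairs."""
    return sum((4 - np.dot(x - y, x - y))**k
               for x, y in combinations(code, 2))

for k in (1, 2):
    assert abs(energy(petersen, k) - energy(pentagons, k)) < 1e-6
assert energy(pentagons, 3) < energy(petersen, 3)
assert energy(petersen, 7) < energy(pentagons, 7)
```

So the ground state really does change as k grows, which is the phase transition discussed above.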
The same phenomenon as in Conjecture 5.2 seems to occur also for antiprisms. In that case, there is a one-parameter family of configurations. Each consists of two squares in parallel planes, offset by a 45° angle, but the distance between the planes can vary.
Conjecture 5.3. For every completely monotonic potential function f on (0, 4], some square antiprism minimizes the energy E_f among all eight-point codes in S^2. Furthermore, the three-point bounds are always sharp (for whichever code is optimal).
Because this conjecture involves a continuous family of optima, it may be more difficult to prove than Conjecture 5.2. It seems to require a new idea beyond what suffices for Theorem 1.1 and our partial progress on Conjecture 5.2. The dependence of the optimal energy on the potential function is also extraordinarily complicated, because of the need to optimize over all square antiprisms. For example, for the Coulomb potential, the minimal energy within the family of antiprisms is a root of an irreducible polynomial of degree 48.
These conjectures suggest a broad generalization of the mechanism behind the known proofs of universal optimality:

Conjecture 5.4. Suppose X is a sphere or projective space, C is a finite subset of X, and f is a completely monotonic function other than a polynomial. If k-point bounds prove that C optimizes energy under f (or prove that C is an optimal code), then for every completely monotonic potential function, k-point bounds will prove a sharp bound for energy for |C| points in X, although C may not itself be optimal.
For low-degree polynomials, there are numerous counterexamples: for example, for each spherical k-design, two-point bounds prove that it is optimal for f(r) = (4 − r)^k. However, that is the only loophole we have found. Conjecture 5.4 is very strong, and perhaps we have been misled by the few known examples with k > 2. It may hold in these examples merely because the optimal structures are in some sense of comparable complexity. However, Conjecture 5.4 seems to be the simplest explanation of the available evidence.
The most dramatic test case will be 24 points in S^3. The vertices of a regular 24-cell are almost certainly the unique optimal code, but they are not universally optimal [CCEK07]. Instead, there is a one-parameter family of competing configurations that improve on the 24-cell for some potential functions. Numerical evidence suggests that either the 24-cell or one of these competitors is always optimal. If that is true, and Conjecture 5.4 is true as well, then k-point bounds cannot settle the optimality of the 24-cell without also dealing with its more exotic competitors. If they can accomplish both with a small value of k, it will be truly remarkable.