Skip to Main Content

Sums of Squares: A Real Projective Story

Grigoriy Blekherman
Rainer Sinn
Gregory G. Smith
Mauricio Velasco

Communicated by Notices Associate Editor Steven Sam

Article cover

How do we find the minimum of the function on the unit sphere? An algebraic way to see that the minimum equals involves the decomposition

The unit sphere in is the zero-locus of the polynomial . This decomposition establishes that the function is a sum of squares modulo the defining equation of the unit sphere, so this function is nonnegative on the sphere. Thus, we have a certificate that the minimum of on the unit sphere is at least . This lower bound is optimal because the sum of squares vanishes at the points on the unit sphere. Figure 1 illustrates that the unit sphere is tangent to the zero-locus of at these two points.

Figure 1.

The unit sphere in red and the zero-locus of in teal.

This example indicates that nonnegativity is intimately related to polynomial optimization and sum-of-squares representations give rise to lower bounds on minima. Does this approach lead to sharp bounds for all quadratic polynomials? Does it apply to higher-degree polynomials on the unit sphere? What happens when we replace the unit sphere with another algebraic variety? This article examines these basic questions. In the process, we also uncover fascinating connections between real and complex algebraic geometry. Remarkably, convex geometry provides the bridge between these two worlds.

As a second motivating example, let be the cubic surface defined as the real zero-locus of the polynomial and consider . This quadratic polynomial is not a sum of squares modulo the defining ideal of because no nonzero polynomial of degree less than vanishes on the algebraic variety and is not globally nonnegative. Nevertheless, we claim that is nonnegative on the algebraic variety . We prove this in Section 1 by recognizing as the Motzkin polynomial in disguise. More directly, we certify the nonnegativity of on with the sum-of-squares decomposition

where is a sum of squares, and the squares on the right are given by , , , , , and . The important new ingredient is the multiplier . As is nonnegative and not divisible by the irreducible polynomial , this decomposition confirms that whenever . Equivalently, the function is the sum of nonnegative rational functions on the variety . Restricting to , Figure 2 depicts the variety as tangent to the sphere defined by at the points , , , and . Thus, the restriction of the function to has a minimum value equal to zero.

Figure 2.

The real zero-loci of in red and in teal.

To understand the relationship between nonnegativity and sums of squares, it suffices to consider homogeneous polynomials. Indeed, homogenizing any sum-of-squares decomposition by introducing a new variable produces a homogeneous sum-of-squares decomposition. It follows that a polynomial is a sum of squares modulo the ideal of an affine subvariety if and only if an appropriate homogenization (having sufficiently high degree) is a sum of squares modulo the homogeneous ideal of the projective completion .

Working with projective varieties has conceptual and technical advantages, so we concentrate on nonnegative functions and sums of squares on real subvarieties of projective space. We examine three problems:

(1)

Identify all of the real projective varieties on which every nonnegative quadratic function is a polynomial sum of squares.

(2)

Knowing that a nonnegative function is a polynomial sum of squares on a projective variety, control the number of squares needed in such a decomposition.

(3)

Bound the degree of a sum-of-squares multiplier such that its product with a nonnegative function decomposes into a polynomial sum of squares.

Each of the subsequent sections is devoted to one of these problems. Sections begin with some historial context before describing more recent results.

Sums of squares of polynomials are indispensable in the theory of nonnegative polynomials. Beyond forming a self-evident subset of nonnegative polynomials, there are practical algorithms for deciding whether a given polynomial is a sum of squares. The existence of a polynomial sum-of-squares decomposition is equivalent to the feasibility of a semidefinite programming problem: a convex optimization problem with a linear objective function on the intersection of the cone of positive semidefinite matrices with an affine linear space. There are robust and efficient software packages, typically based on interior point methods, for solving these semidefinite programming problems. More theoretically, polynomial sums of squares generate an infinite hierarchy of approximations for the set of nonnegative polynomials. The resulting sum-of-squares relaxations (also known as Lasserre relaxations) play a prominent role in engineering applications and computer science; see Chapter 7 in 4.

1. Geometry of Sums of Squares

When is every nonnegative polynomial a sum of squares? For which degrees and in how many variables is there an equivalence? David Hilbert gave a eulogy 17 in 1910 for Hermann Minkowski (1864–1909) to the Royal Society of Sciences in Göttingen. Hilbert explained that the history of this question began in 1885 with Minkowski’s thesis defense. Hilbert participated in this defense as an examiner. Two years prior to the defense, Minkowski made a name for himself by solving a problem posed by Eisenstein in a competition for the Academy of Sciences (Paris) about the number of representations of a positive integer as a sum of five squares. His doctoral thesis was a continuation of this prize-winning work. In his defense, Minkowski argued that there are polynomials with real coefficients that are nonnegative functions on Euclidean space but cannot be written as sums of squares of polynomials.

Minkowski never published these insights, but he did inspire Hilbert’s seminal work in the area. Hilbert 15 demonstrated, in 1888, that Minkowski’s claim is correct by completely characterizing when every nonnegative polynomial is a polynomial sum of squares.

Theorem 1.1 (Hilbert).

Every nonnegative homogeneous polynomial on having degree is a sum of squares of polynomials if and only if

, (quadratic forms)

, or (binary forms)

and . (ternary quartics)

In a second paper, Hilbert 16 also characterized nonnegative polynomials in two variables (or equivalently homogeneous polynomials in three variables) as sums of squares of rational functions. These results undoubtably prompted Hilbert to formulate the 17th problem on his celebrated list of 23 problems.

For the first two cases in Theorem 1.1, establishing that all nonnegative polynomials are sums of squares of polynomials is relatively straightforward. The third case is noticeably more challenging. Arguably, the hardest part is proving that there exist nonnegative polynomials that are not polynomial sums of squares in all of the remaining cases. Hilbert accomplished this task with an impressive nonconstructive argument.

Curiously, Theodore Motzkin published in 1967, nearly 80 years after Hilbert’s paper, the first explicit example of a nonnegative polynomial that cannot be expressed as a polynomial sum of squares. The Motzkin polynomial is the ternary sextic . By taking suitable means of , , and , the nonnegativity of this homogeneous polynomial on follows immediately from the arithmetic-geometric mean inequality.

Real projective varieties

As alluded in the preceding section, it is advantageous to work with homogeneous polynomials and real projective varieties. Within this broader framework, we not only generalize Theorem 1.1 but discover a geometric characterization of the equality between nonnegative polynomials and sums of squares.

We consider the -dimensional projective space to be the set of lines in passing through the origin. The point is the equivalence class

The point is real if it has a representative such that for all .

A real subvariety is the set of common zeroes of some homogeneous polynomials with real coefficients in the variables , , …, . Whenever a homogeneous polynomial vanishes at , it also vanishes at all scalar multiples because homogeneity implies that for all nonzero scalars . In other words, the homogeneous polynomial vanishes at the point and the set of common zeroes of a collection of homogeneous polynomials is a subset of . For example, the twisted cubic curve

is the set of common zeroes of the quadratic polynomials , , and .

Throughout, we assume that the subvariety is nondegenerate and totally real. The first assumption means that the subvariety does not lie in a hyperplane. This mild hypothesis guarantees that the ambient projective space is not unnecessarily large. The second assumption is more significant—it ensures that has enough real points. By definition, the subvariety is totally real if the set of real points in , regarded as a subset of the complex points in , is dense in the Zariski topology. Equivalently, every irreducible component of the algebraic variety has a nonsingular real point. The twisted cubic curve in is totally real whereas the zero set of the polynomial is not because it does not contain a real point in .

In this geometric context, polynomials are replaced by elements in another ring. The homogeneous coordinate ring of the subvariety is the quotient of the polynomial ring by the ideal generated by all homogeneous polynomials that vanish on . For any nonnegative integer , let denote the real vector space spanned by the homogeneous elements in of degree . Generalizing the concept of a polynomial sum of squares is not difficult. A homogeneous element is a sum of squares on if there exist homogeneous elements such that .

In comparison, defining nonnegativity requires more care. A homogeneous element of even degree is nonnegative on if its evaluation at each real point in is greater than or equal to . Since elements in the ring and points in the space may both be thought of as equivalence classes, the evaluation process involves choosing a polynomial in to represent the ring element and a point in to represent the real point in . Although a homogeneous element in does not determine a function from to , evaluating its polynomial representative at a representative point in does have a well-defined sign because the degree of polynomial is even. We recover our original notion of nonnegativity for polynomials when .

Example 1.2.

Since we have

both and represent the same element in the homogeneous coordinate ring of the twisted cubic curve . Choosing or as representatives for the real point , either of the evaluations

or show that this element in the homogeneous coordinate ring is positive at this point.

Sums of squares on real projective varieties

The real vector space of quadrics on the real subvariety contains two fundamental subsets:

The set consists of those elements whose evaluations at every real point in is nonnegative;

The set consists of sums of squares;

As the square of any real number is nonnegative, we have the inclusion . Following Hilbert 15, we ask the following question.

Question 1.3.

For which subvarieties is every nonnegative quadratic element in a sum of squares? Equivalently, when does the equality hold?

At first glance, focusing on just the quadratic elements seems to be a limited generalization of Hilbert’s work. However, the elbow room gained by considering all real projective subvarieties alleviates this concern. Suppose that we are interested in the homogeneous elements of degree on a subvariety . Set and let be the th Veronese map that sends the point to the point in whose coordinates are all possible monomials of degree evaluated at . By re-embedding the subvariety , it is enough to understand the quadratic elements on the image .

Example 1.4.

On the twisted cubic curve , the set may be identified with the homogeneous polynomials in having degree that are nonnegative on . Likewise, the set may be identified with sums of squares of homogeneous polynomials in having degree . By Theorem 1.1, these sets coincide, so .

Example 1.5.

A variant of the Veronese map shows that the Motzkin polynomial is not a sum of squares of cubic polynomials. If it were, then an easy (if somewhat tedious) case study shows that the cubic polynomials would only involve the monomials . Alternatively, one may use Newton polytopes (the convex hull of the exponent vectors) to identify these monomials. The Newton polytope of a sum of squares contains the Newton polytopes of every summand, and the Newton polytope of a product is the Minkowski sum of the Newton polytopes of the factors. So consider the map defined by

The image of this map is the zero-locus of the polynomial where the homogeneous coordinate ring of is . Hence, the Motzkin polynomial is the restriction of the function to the image variety. As explained in the first section, this quadratic polynomial is not a sum of squares modulo the defining ideal of the surface because no nonzero polynomial of degree less than vanishes on the image variety and is not nonnegative on . Therefore, the Motzkin polynomial cannot be a polynomial sum of squares.

The surprising answer to Question 1.3, restated in the next theorem, first appeared as Theorem 1.1 in 8. To formulate this result, recall that a variety is irreducible if it is not the union of two proper subvarieties. From the algebraic point of view, a variety is irreducible if and only if the ideal of polynomials vanishing on it is prime.

Theorem 1.6 (Blekherman, Smith, and Velasco).

Let be an irreducible nondegenerate totally-real subvariety in . Every nonnegative quadratic function on is a polynomial sum of squares, modulo the defining ideal of , if and only if we have .

This theorem reveals a remarkable connection between a semialgebraic property (any nonnegative polynomial being a polynomial sum of squares) and the fundamental geometric invariants of the complex variety. For any subvariety , the codimension is a simple numerical measure of its relative size; . The degree is a second numerical invariant depending on the embedding of the variety in . Geometrically, the degree, denoted by , is the number of points in the intersection of the variety and a general linear subspace of dimension . For instance, if we intersect the twisted cubic with the generic hyperplane given by the linear equation , then the intersection points correspond to the three (typically distinct) roots of the binary cubic , so . As the codimension of the curve in is , Theorem 1.6 again implies that ; compare with Example 1.4.

The irreducible nondegenerate subvarieties satisfying are called varieties of minimal degree. As the terminology suggests, these subvarieties do have the smallest possible degree. The complete classification of varieties of minimal degree is one of the outstanding achievements of the classical Italian school of algebraic geometry. Pasquale del Pezzo (1886) classified the surfaces of minimal degree and Eugenio Bertini (1907) extended it to varieties of any dimension; see 13 for an account. The next subsection summarizes this result and highlights links with sums of squares. The remainder of this subsection is devoted to two ideas: the impact of the minimal degree condition and the pivotal role played by convex geometry, which has so far been hidden.

To get a feel for the degree condition, we temporarily assume that the subvariety is a finite set of real points in . Although this variety is reducible whenever there is more than one point, this zero-dimensional case nevertheless develops some valuable intuition. If is a variety of minimal degree, then it is a set of real points which span the ambient space . Hence, we may choose coordinates on such that , where are the standard basis vectors for .

Why can we express every nonnegative quadratic form as a polynomial sum of squares on this zero-dimensional variety? The answer essentially comes from interpolation. The coordinate function vanishes at all points in except for all . It follows that, for any nonnegative quadratic form on , the equality

holds in the homogeneous coordinate ring of , proving that is a polynomial sum of squares. If the subvariety has more than real points (which implies that ), then the interpolation is no longer possible, because there exists a nontrivial linear relation between the values of the linear forms at the points on . These linear relations impose constraints on the possible values of quadratic sums of squares on allowing us to separate and .

For the surface , Hilbert 15 already used these linear relations to prove that the cone of sums of squares is properly contained in the cone of nonnegative polynomials. A hyperplane section of the subvariety corresponds to a cubic curve in the plane. Cutting with two hyperplanes, one obtains the intersection of two cubic curves in the plane. By Bézout’s Theorem, two general cubics intersect in points, so . By choosing the two cubics appropriately, we can arrange for the intersection points to all be real. These nine points of intersection lie in and are therefore linearly dependent. Moreover, the values of cubic forms on these points are also linearly related. In classical algebraic geometry, this is known as the Cayley–Bacharach theorem and is stated as: any cubic passing through any eight points of the intersection must also pass through the ninth point. Exploiting this linear relation, Hilbert showed that sum-of-square cone is properly contained in the nonnegativity cone . Robinson later used Hilbert’s technique to construct an explicit nonnegative ternary form that is not a sum of squares; see 23.

What serves as the bridge between complex algebraic geometry and semialgebraic nonnegativity results? For the work under discussion, the unifying answer is convex geometry. Both and are more than just subsets of the vector space of quadrics on the subvariety . They are closed convex cones. As convex cones, these subsets are closed under taking linear combinations with nonnegative coefficients. Being closed means that these subsets are closed sets in the natural Euclidean topology on the real vector space .

Duality is an intrinsic feature of convex geometry. Every closed convex cone is equal to the intersection of the closed half-spaces that contain it. The linear inequalities defining these closed half-spaces form a convex cone in the dual vector space, unimaginatively called the dual cone. The most accessible convex cones are polyhedral, defined by finitely many linear inequalities. Neither nor is a polyhedral cone. Their dual cones can be quite complicated. Fortuitously, the dual cone belongs to the next best class of convex cones.

For this part of our story, the cone of positive semidefinite matrices plays a distinguished role. The positive semidefinite quadratic forms constitute the closed convex cone inside the real vector space of quadratic polynomials. Moreover, the sum-of-squares cone is the image of the cone under the canonical linear projection from to . It follows that the dual cone is spectrahedral—this cone can be represented as a linear matrix inequality or equivalently as the intersection of with a linear affine subspace. Understanding the minimal generators of the spectrahedral cone underpins our advancements.

Convex duality also produces fruitful connections between real algebraic geometry and real analysis. The dual cone consists of the linear functionals that are nonnegative on . Given a linear functional that evaluates nonnegatively on nonnegative functions, one hopes that it may be represented as integration with respect to a measure supported on . This forecast is often correct. For example, the equality between nonnegative polynomials and sums of squares in the univariate leads to a solution of the Hamburger, Hausdorff, and Stiltjes moment problems; see Chapters 3 and 9 in 25.

The classification

Varieties of minimal degree come in three flavors. The most palatable are quadratic hypersurfaces. The real quadratic forms satisfying the hypothesis of Theorem 1.6 are necessarily indefinite. Specializing to this case, the theorem asserts that, when a quadratic form is nonnegative on the set of real points at which an indefinite quadratic form vanishes, there exists a linear combination of the two forms that is positive semidefinite. This statement is equivalent to the well-known -Lemma (or -procedure) in the optimization community. When , this covers the quadratic forms in Theorem 1.1.

The second flavor is an infinite family of projective varieties called rational normal scrolls. Every member in this family is a smooth toric variety, but there are infinitely many in each dimension. The one-dimensional family members are the rational normal curves arising as the Veronese embeddings of . This family corresponds to the binary forms in Theorem 1.1.

The third flavor consists of just one variety, namely the Veronese surface in . This projective variety is defined by . This exceptional variety corresponds to the sporadic case of the ternary quartics in Theorem 1.1.

The structure of the rational normal scrolls warrants a closer look. The surfaces in this family are toric varieties corresponding to lattice polygons of the form ; see Figure 3.

Figure 3.

The polygon of the rational normal scroll when and .

This polygon defines a map from the torus to given by , where monomials have exponent vectors equal to the lattice points of the polygon. The closure of the image is the rational normal scroll corresponding to the polygon. It is a projective embedding of the th Hirzebruch surface. There are two rational normal curves, one of degree (for ) and one of degree (for ), contained in this surface. The scroll is swept out by the lines joining the two points on the rational normal curves for the same values of . Assuming that , one may homogenize and write as the map sending the pair to

When and , we recoup the first map.

Higher-dimensional rational normal scrolls are constructed in the same way. To obtain a variety of dimension , there are rational normal curves in some (whose spans do not intersect) and the scroll is swept out by the linear spaces spanned by the points on the curves for the same value of .

More generally, a lattice polytope determines a monomial map by interpreting the lattice points lying in the polytope as exponent vectors. Toric geometry provides a dictionary between the properties of the polytopes and the properties of the (closure of the) image of the corresponding monomial map. A toric surface is a variety of minimal degree if and only if the polygon does not contain any lattice points in its interior. Excluding the polygons of the rational normal scrolls that have height one (like the one in Figure 3), there is only one more example: the triangle corresponding to the Veronese surface .

Restricting a quadratic form to a rational normal scroll of dimension yields an element in two variables that has degree in and degree in (say ). In the higher-dimensional cases, it is a form that has degree in variables , , …, and some degree in . Homogenizing gives a form that is homogeneous of degree in the variables , , …, and homogeneous of degree in , . In 11, Man Duen Choi, T. Y. Lam, and Bruce Reznick called these elements biforms. A biform can also naturally be thought of as a symmetric matrix with real polynomial entries that is pointwise positive semidefinite on as in 14.

From a historical perspective, Theorem 1.6 unites two fundamental results that developed independently in the 1880s. As we discuss the case of reducible varieties, we will see that a similar story repeated about 100 years later.

The reducible case

Thus far, we have emphasized sums of squares on irreducible varieties. However, Theorem 1.6 extends to all varieties including the reducible ones. In this subsection, we discuss some of the ideas involved in this generalization as well as some of its applications.

The varieties of minimal degree have the smallest possible degree among irreducible varieties which span . In the reducible case, this inequality no longer holds (think of two skew lines in ). The right generalization of this geometric concept is not immediately clear, but it turns out to be algebraic and involves syzygies. Consider homogeneous quadratic polynomials and in three variables defining two quadratic curves in . A syzygy is a linear relation between and where the coefficients are elements in the polynomial ring . If at least one of or is irreducible, then every syzygy between them is generated by the obvious one: . When the polynomials and have a common linear factor, then a linear syzygy exists. For example, when and , we have .

For a general set of polynomials, there can be syzygies among the syzygies. This data is usually organized into a minimal free resolution. One good way to gauge the complexity of such a resolution is called the Castelnuovo–Mumford regularity. A linear space has Castelnuovo–Mumford regularity . Varieties of minimal degree have the smallest Castelnuovo–Mumford regularity among the varieties that are not linear spaces, namely . For our purposes, -regular varieties are the right generalization for the concept of minimal degree. The next result first appeared as Theorem 9 in 6.

Theorem 1.7 (Blekherman, Sinn, and Velasco).

Let be a nondegenerate totally-real projective variety in . Every nonnegative quadratic form on is a sum of squares, modulo the defining ideal of , if and only if is a -regular variety.

What are all the -regular varieties? In 2006, David Eisenbud, Mark Green, Klaus Hulek, and Sorin Popescu gave a complete classification and a beautiful geometric characterization. A subvariety is -regular if and only if, for any linear subspace such that is a finite set, this finite set is linearly independent (linear independence has to be defined carefully when the intersection is a nonreduced zero-dimensional scheme).

Instead of digging into the details of this classification, we concentrate on a special case, namely arrangements of coordinate subspaces. We consider projective varieties , where each is a linear subspace in spanned by some set of standard coordinate vectors , where . We also assume that corresponding homogeneous ideals are generated in degree . For the combinatorially inclined, these subspace arrangements correspond to flag complexes. The definition ensures that each arrangement is determined by the coordinate lines in that it contains.

Alternatively, each such arrangement of coordinate subspaces is determined by a graph. The graph has vertices corresponding to coordinates in . There is an edge between vertex and vertex when the coordinate subspace spanned by the th and th variables is contained in . In other words, for each irreducible component , we add the clique on all the vertices satisfying . Hence, the ideal defining the subvariety is generated by all the monomials such that is not an edge in our graph; see Figure 4. To stress that the subvariety comes from a graph , we write .

Figure 4.

The graph corresponding to the subspace arrangement in defined by the ideal .

The bijection between these subspace arrangements and graphs allows us to translate combinatorial properties of the graph into geometric properties of the subvariety . For instance, the dimension of is one less than the clique number (the size of the largest clique in ). For which graphs is a -regular variety?

Theorem 1.8 (Fröberg).

The variety is -regular if and only if the graph is chordal.

A graph is chordal if it contains no induced cycles of length at least four. In spite of their seemingly innocuous definition, chordal graphs are very useful and arise naturally in combinatorics, numerical linear algebra, and semidefinite programming; see 26. Theorem 1.7 shows that chordal graphs determine the only varieties on which nonnegative quadratic forms are sums of squares.

What does it mean for the quadratic form , where , to be nonnegative when restricted to ? The restriction remembers certain minors of the matrix , but does not know about other entries. For the graph in Figure 4, the restriction of the quadratic form to the -dimensional linear subspace defined by the ideal corresponds to the upper left