Functions preserving positive definiteness for sparse matrices

We consider the problem of characterizing entrywise functions that preserve the cone of positive definite matrices when applied to every off-diagonal element. Our results extend theorems of Schoenberg [Duke Math. J. 9], Rudin [Duke Math. J. 26], Christensen and Ressel [Trans. Amer. Math. Soc. 243], and others, where similar problems were studied when the function is applied to all elements, including the diagonal ones. It is shown that functions that are guaranteed to preserve positive definiteness cannot at the same time induce sparsity, i.e., set elements to zero. These results have important implications for the regularization of positive definite matrices, where functions are often applied to only the off-diagonal elements to obtain sparse matrices with better properties (e.g., Markov random field/graphical model structure, better condition number). As a particular case, it is shown that \emph{soft-thresholding}, a commonly used operation in modern high-dimensional probability and statistics, is not guaranteed to maintain positive definiteness, even if the original matrix is sparse. This result has a deep connection to graphs, and in particular, to the class of trees. We then proceed to fully characterize functions which do preserve positive definiteness. This characterization is in terms of absolutely monotonic functions and turns out to be quite different from the case when the function is also applied to diagonal elements. We conclude by giving bounds on the condition number of a matrix which guarantee that the regularized matrix is positive definite.


Introduction
In one of his celebrated papers, Positive definite functions on spheres [12], I.J. Schoenberg proved that every continuous function $f : (-1, 1) \to \mathbb{R}$ with the property that the matrix $(f(a_{ij}))$ is positive semidefinite for every symmetric positive semidefinite matrix $(a_{ij})$ with entries in $(-1, 1)$ has a power series representation with nonnegative coefficients. Functions admitting such a representation are often referred to as absolutely monotonic functions. This result was generalized by Rudin [11], who showed that the class of absolutely monotonic functions fully characterizes the class of (not necessarily continuous) functions mapping every positive (semi)definite sequence to a positive (semi)definite sequence. Equivalently, the absolutely monotonic functions are exactly the functions mapping sequences of Fourier-Stieltjes coefficients to sequences of Fourier-Stieltjes coefficients.
In this paper, we revisit and extend Schoenberg's results with important modern applications in mind. Positive definite matrices arise naturally as covariance or correlation matrices. Consider an n × n covariance (or correlation) matrix Σ. In modern high-dimensional probability and statistics, two of the most common techniques employed to improve the properties of Σ are the so-called hard-thresholding and soft-thresholding procedures. Hard-thresholding a positive definite matrix entails setting small off-diagonal elements of Σ to zero. This technique has the advantage of eliminating spurious or insignificant correlations, and leads to sparse estimates of the matrix Σ. These thresholded matrices generally have better properties (such as better conditioning, graphical model structure) and lead to models that are easier to store, interpret, and work with. At the same time, in contrast with most "regularization" techniques, this procedure incurs very little computational cost. Hence it can be applied to ultra high-dimensional matrices, as required by many modern-day applications (see [15,9,1,4,3,5]).
An important property of thresholded covariance matrices that is generally required for applications is positive definiteness. Nonetheless, regularization procedures such as hard-thresholding are often used indiscriminately, with very little attention paid to the algebraic properties of the resulting thresholded matrices. It is therefore critical to understand whether or not the cone of positive definite matrices is invariant with respect to hard-thresholding (and other similar operations), especially in order for these regularization methods to be widely applicable. We now formalize some notation. Given ε > 0, the hard-thresholding operation is equivalent to applying the function $f_H : \mathbb{R} \to \mathbb{R}$ defined by
\[ f_H(x) = \begin{cases} x & \text{if } |x| > \epsilon, \\ 0 & \text{if } |x| \le \epsilon, \end{cases} \tag{1.1} \]
to every off-diagonal element of the matrix Σ. As mentioned above, modern probability and statistics require that the thresholding function be applied only to off-diagonal elements. As a consequence, previous results from the mathematics literature cannot be directly used to determine whether hard-thresholding and other similar techniques preserve positive definiteness. The aim of this paper is to investigate this question, which is of considerable significance in contemporary mathematical sciences. Algebraic properties of hard-thresholded matrices have been studied in detail in [3], where it is shown that, even if the original matrix is sparse, hard-thresholding is not guaranteed to preserve positive definiteness. Thus applying $f_H$ to the off-diagonal elements does not map the cone of positive definite matrices into itself.
A function that is equally common in the literature is the so-called soft-thresholding function $f_S : \mathbb{R} \to \mathbb{R}$, given by
\[ f_S(x) = \operatorname{sgn}(x)(|x| - \epsilon)_+, \tag{1.2} \]
where sgn(x) denotes the sign of x and $(a)_+ = \max(a, 0)$. Compared to hard-thresholding, soft-thresholding shrinks every element continuously toward zero, which gives more hope of preserving positive definiteness. To the authors' knowledge, a detailed analysis of whether or not this is true has not been undertaken in the literature. It is also natural to ask whether the hard- or soft-thresholding function can be replaced by other functions that induce sparsity (i.e., zeros) in positive definite matrices while, at the same time, maintaining positive definiteness.
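Both thresholding rules, and the operation of applying a function only to the off-diagonal elements of a matrix, can be sketched in a few lines of Python with NumPy. The helper names `hard_threshold`, `soft_threshold` and `apply_offdiag` are our own (hypothetical), and the sketch assumes that hard-thresholding keeps x when |x| > ε and sets it to zero when |x| ≤ ε.

```python
import numpy as np

def hard_threshold(x, eps):
    """Hard-thresholding: keep x when |x| > eps, return 0 otherwise (elementwise)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > eps, x, 0.0)

def soft_threshold(x, eps):
    """Soft-thresholding: sgn(x) * (|x| - eps)_+ (elementwise)."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def apply_offdiag(f, A):
    """Apply f to every off-diagonal element of A and leave the diagonal untouched."""
    A = np.asarray(A, dtype=float)
    out = f(A)                          # f acts elementwise and returns a new array
    np.fill_diagonal(out, np.diag(A))   # restore the original diagonal
    return out

Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, -0.4],
                  [0.2, -0.4, 1.0]])
eps = 0.3
print(apply_offdiag(lambda M: hard_threshold(M, eps), Sigma))
print(apply_offdiag(lambda M: soft_threshold(M, eps), Sigma))
```

Both rules set the entry 0.2 to zero; hard-thresholding leaves the surviving entries unchanged, whereas soft-thresholding moves each of them toward zero by ε.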
The first theorem of this paper extends results from [3] and shows the rather surprising result that, for a given positive definite matrix, even if it is already sparse, there is generally no guarantee that its soft-thresholded version will remain positive definite. We state this result below: Theorem. Let G = (V, E) be a connected undirected graph and denote by $P^+_G$ the cone of symmetric positive definite matrices with zeros according to G, where $P^+$ denotes the cone of all symmetric positive definite matrices. For ε > 0, denote by $\eta_\epsilon(A)$ the soft-thresholded matrix
\[ (\eta_\epsilon(A))_{ij} = \begin{cases} a_{ij} & \text{if } i = j, \\ \operatorname{sgn}(a_{ij})(|a_{ij}| - \epsilon)_+ & \text{if } i \ne j. \end{cases} \]
Then the following are equivalent: (1) There exists ε > 0 such that for every $A \in P^+_G$, we have $\eta_\epsilon(A) > 0$; (2) For every ε > 0 and every $A \in P^+_G$, we have $\eta_\epsilon(A) > 0$; (3) G is a tree.
Note that for a given matrix $A \in P^+_G$, by the continuity of the eigenvalues, there exists ε > 0 such that $\eta_\epsilon(A) > 0$. However, different matrices can lose positive definiteness for different values of ε. The existence of a "universal" value $\epsilon_0 > 0$ with the property that $\eta_{\epsilon_0}(A) > 0$ for every $A \in P^+_G$ would have tremendous practical implications. Indeed, if such an $\epsilon_0$ existed, matrices could be safely soft-thresholded to remove some of their small entries while retaining positive definiteness. The previous theorem asserts that, except when the structure of zeros of A corresponds to a tree, such an $\epsilon_0$ unfortunately does not exist.
Following the previous result, we extend Schoenberg's results by fully characterizing the functions that preserve positive definiteness when applied to every off-diagonal element. The statement of the main theorem of the paper is given below.
The above result does come as a surprise. It formally demonstrates that, except in trivial cases, no guarantee can be given that applying a function to the off-diagonal elements of a matrix will preserve positive definiteness. There are thus no theoretical safeguards that thresholding procedures used in innumerable applications will maintain positive definiteness.
The remainder of the paper is structured as follows. Section 2 reviews recently established results on hard-thresholding. In Section 3, we characterize the graphs G for which every matrix in $P^+_G$ retains positive definiteness upon soft-thresholding; the characterization turns out to have a non-trivial relationship to graphs and to the structure of zeros in the original matrix. Section 4 then studies the behavior of positive semidefinite matrices when an arbitrary function f is applied to every element of the matrix. A review of previous results from the literature is first given. The results are then extended to the case where the function is applied only to the off-diagonal elements of the matrix, and a complete characterization of functions preserving positive definiteness in this modern setting is given. Finally, Section 5 gives sufficient conditions on a matrix A and a function f under which the matrix $f^*[A]$, obtained by applying f to the off-diagonal elements of A, remains positive definite. In particular, it is shown that $f^*[A]$ is guaranteed to be positive definite as long as the condition number of A is smaller than an explicit bound.
Notation: Throughout the paper, we shall make use of the following graph theoretic notation. Let G = (V, E) be an undirected graph with n ≥ 1 vertices V = {1, . . . , n} and edge set E. Two vertices a, b ∈ V, a ≠ b, are said to be adjacent in G if (a, b) ∈ E. A graph is simple if it is undirected and has no multiple edges or self-loops. We will only work with finite simple graphs in this paper.
We say that the graph $G' = (V', E')$ is a subgraph of G = (V, E), denoted by $G' \subset G$, if $V' \subseteq V$ and $E' \subseteq E$. In addition, if $G' \subset G$ and $E' = (V' \times V') \cap E$, we say that $G'$ is an induced subgraph of G. A graph G is called complete if every pair of distinct vertices is adjacent. A path of length k ≥ 1 from vertex i to vertex j is a finite sequence of distinct vertices $v_0 = i, \dots, v_k = j$ in V together with edges $(v_0, v_1), \dots, (v_{k-1}, v_k) \in E$. A k-cycle in G is a path of length k − 1 with an additional edge connecting its two end points. A graph G is called connected if for any pair of distinct vertices i, j ∈ V there exists a path between them.
A special class of graphs are trees. These are connected graphs on n vertices with exactly n − 1 edges. A tree can also be defined as a connected graph with no cycle of length k ≥ 3, or as a connected graph with a unique path between any two vertices.
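Graphs provide a useful way to encode patterns of zeros in symmetric matrices by letting (i, j) ∈ E if and only if $a_{ij} \ne 0$ (for i ≠ j). Denote by $P^+_n$ the cone of n × n symmetric positive definite matrices, and by $P^+$ the cone of symmetric positive definite matrices (of any dimension). We shall write A > 0 whenever $A \in P^+$, and A > B if $A - B \in P^+$. Similarly, we write A ≥ 0 whenever A is symmetric positive semidefinite, and A ≥ B if A − B ≥ 0. We define the cone of symmetric positive definite matrices with zeros according to a given graph G with n vertices by
\[ P^+_G := \{ A = (a_{ij}) \in P^+_n : a_{ij} = 0 \text{ whenever } i \ne j \text{ and } (i, j) \notin E \}. \]
Denoting the space of n × n matrices by $M_n$, recall that a $(n_1 + n_2) \times (n_1 + n_2)$ symmetric block matrix
\[ M = \begin{pmatrix} A & B \\ B^T & C \end{pmatrix}, \qquad A \in M_{n_1},\ C \in M_{n_2},\ B \text{ an } n_1 \times n_2 \text{ matrix}, \]
is positive definite if and only if C > 0 and the Schur complement $A - B C^{-1} B^T$ of C in M is positive definite. Finally, for a symmetric matrix A, we shall denote by $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ its smallest and largest eigenvalues respectively.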
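The Schur-complement criterion recalled above is the main tool in the proof of Theorem 3.2, where the lower-right block is a scalar α. The minimal NumPy sketch below, with helper names of our own and two small matrices chosen only for illustration, checks the criterion together with the identity det M = det C · det(A − BC⁻¹Bᵀ).

```python
import numpy as np

def schur_complement(M, n1):
    """Schur complement A - B C^{-1} B^T of the lower-right block C in M = [[A, B], [B^T, C]]."""
    A, B, C = M[:n1, :n1], M[:n1, n1:], M[n1:, n1:]
    return A - B @ np.linalg.solve(C, B.T)

def is_pd(M):
    """Symmetric positive definiteness via the smallest eigenvalue."""
    return bool(np.linalg.eigvalsh(M).min() > 0)

M1 = np.array([[4.0, 1.0, 1.0],
               [1.0, 3.0, 0.0],
               [1.0, 0.0, 2.0]])   # positive definite
M2 = np.array([[1.0, 2.0, 0.0],
               [2.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])   # not positive definite
for M in (M1, M2):
    C, S = M[2:, 2:], schur_complement(M, 2)
    # Criterion: M > 0 if and only if C > 0 and S > 0.
    print(is_pd(M), is_pd(C) and is_pd(S))
```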

Review of relevant results on hard-thresholding
Algebraic properties of hard-thresholding have been studied in [3]. In particular, two types of hard-thresholding operations have been considered. Let G be a graph with n vertices. The graph G induces a hard-thresholding operation, mapping every symmetric n × n matrix $A = (a_{ij})$ to the matrix $A_G$ defined by
\[ (A_G)_{ij} = \begin{cases} a_{ij} & \text{if } i = j \text{ or } (i, j) \in E, \\ 0 & \text{otherwise}. \end{cases} \]
We say that the matrix $A_G$ is obtained from A by thresholding A with respect to the graph G.
The following result from [3] fully characterizes the graphs preserving positive definiteness upon thresholding.
Theorem 2.1 ([3, Theorem 3.1]). Let A be an arbitrary symmetric n × n matrix such that A > 0, i.e., $A \in P^+_n$. Threshold A with respect to a graph G = (V, E) and denote the resulting thresholded matrix by $A_G$. Then $A_G > 0$ for every $A \in P^+_n$ if and only if $G = \bigcup_{i=1}^{\tau} G_i$, where $G_i$, i = 1, . . . , τ, denote disconnected, complete components of G.
The above theorem asserts that a positive definite matrix A is guaranteed to retain positive definiteness upon thresholding with respect to a graph G only in the trivial case when the thresholded matrix can be reorganized as a block diagonal matrix where, within each block, there is no thresholding. This result can be further generalized to matrices in $P^+_G$ which are thresholded with respect to a subgraph H of G. The following theorem shows that thresholding matrices from this class yields essentially the same results as in the complete graph case. Theorems 2.1 and 2.2 treat the case of thresholding elements regardless of their magnitude. In practical applications, however, hard-thresholding is often performed only on the smaller elements of the positive definite matrix, in order to induce sparsity. The following result shows that only matrices with zeros according to a tree are guaranteed to retain positive definiteness when hard-thresholded at a given level ε > 0. Theorem 2.4. Let G = (V, E) be a connected undirected graph. Then the following are equivalent: (1) There exists ε > 0 such that for every $A \in P^+_G$, the hard-thresholded version of A at level ε is positive definite; (2) For every ε > 0 and every $A \in P^+_G$, the hard-thresholded version of A at level ε is positive definite; (3) G is a tree.
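Theorems 2.1 and 2.4 can both be illustrated on a single 3 × 3 correlation matrix of our own choosing (hypothetical data, not taken from [3]). Its graph is the 3-cycle, which is not a tree, and thresholding it with respect to the path 1-2-3 (not a union of complete components) destroys positive definiteness. Hard-thresholding at level 0.7 removes exactly the same entry, so it fails as well. Helper names are ours.

```python
import numpy as np

def threshold_graph(A, edges):
    """A_G: keep the diagonal and the entries on the edges of G, set all other entries to zero."""
    A = np.asarray(A, dtype=float)
    keep = np.eye(len(A), dtype=bool)
    for i, j in edges:
        keep[i, j] = keep[j, i] = True
    return np.where(keep, A, 0.0)

def hard_threshold_offdiag(A, eps):
    """Hard-threshold the off-diagonal entries of A at level eps (diagonal untouched)."""
    A = np.asarray(A, dtype=float)
    out = np.where(np.abs(A) > eps, A, 0.0)
    np.fill_diagonal(out, np.diag(A))
    return out

def min_eig(M):
    return np.linalg.eigvalsh(M).min()

# Illustrative correlation matrix (our choice); its graph is the 3-cycle C_3.
A = np.array([[1.0, 0.8, 0.6],
              [0.8, 1.0, 0.8],
              [0.6, 0.8, 1.0]])
path = [(0, 1), (1, 2)]   # the path 1-2-3 with 0-based labels

print(min_eig(A))                               # positive: A is positive definite
print(min_eig(threshold_graph(A, path)))        # negative: A_G is not positive definite
print(min_eig(hard_threshold_offdiag(A, 0.7)))  # negative: same matrix as A_G here
```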
The result above demonstrates that hard-thresholding positive definite matrices at a given level can also quickly lead to a loss of positive definiteness, though it is not as severe as when thresholding with respect to a graph. Recall that hard-thresholding a matrix A at level is equivalent to applying the hard-thresholding function given in (1.1) to every off-diagonal element of A. It is thus natural to replace the hard-thresholding function by other functions to see if positive definiteness can be retained. A popular alternative is the soft-thresholding function (see (1.2), (1.4), and Figure 1). The next section is devoted to studying the algebraic properties of soft-thresholded positive definite matrices. We conclude this section by noting that Theorem 2.4 also yields a characterization of trees via thresholding matrices.

Soft-thresholding
We now proceed to the more intricate task of characterizing the graphs G for which every matrix $A \in P^+_G$ retains positive definiteness when soft-thresholded at a given level ε > 0. Since soft-thresholding, as opposed to hard-thresholding, is a continuous function, it would seem that soft-thresholding may have better properties in terms of retaining positive definiteness. Definition 3.1. For a matrix $A = (a_{ij})$ and ε > 0, the soft-thresholded version of A at level ε is the matrix $\eta_\epsilon(A)$ given by:
\[ (\eta_\epsilon(A))_{ij} = \begin{cases} a_{ij} & \text{if } i = j, \\ \operatorname{sgn}(a_{ij})(|a_{ij}| - \epsilon)_+ & \text{if } i \ne j. \end{cases} \]
Theorem 3.2. Let G = (V, E) be a connected undirected graph. Then the following are equivalent: (1) There exists ε > 0 such that for every $A \in P^+_G$, we have $\eta_\epsilon(A) > 0$; (2) For every ε > 0 and every $A \in P^+_G$, we have $\eta_\epsilon(A) > 0$; (3) G is a tree. Despite the continuity of the soft-thresholding function $f_S$, Theorem 3.2 demonstrates that soft-thresholding has the same effect as hard-thresholding when it comes to retaining positive definiteness (see Theorem 2.4). Theorem 3.2 also gives yet another characterization of trees.
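A concrete 3 × 3 instance of Theorem 3.2 for the cycle $C_3$ can be checked directly. The matrix below is a hypothetical example of our own construction; it is not the matrix $A_3$ used in the proof, whose entries are not reproduced here. It is positive definite, but soft-thresholding its off-diagonal entries at ε = 0.1 makes its determinant negative: the large entries 7.7 are barely affected relative to their size, whereas the small entry 0.2 is halved. The loop illustrates the scaling identity $\eta_{t\epsilon}(tA) = t\,\eta_\epsilon(A)$ for t > 0, which turns a counterexample at one level into a counterexample at any other level.

```python
import numpy as np

def soft_threshold_offdiag(A, eps):
    """eta_eps(A): soft-threshold every off-diagonal entry of A at level eps."""
    A = np.asarray(A, dtype=float)
    out = np.sign(A) * np.maximum(np.abs(A) - eps, 0.0)
    np.fill_diagonal(out, np.diag(A))
    return out

def min_eig(M):
    return np.linalg.eigvalsh(M).min()

# Hypothetical example in P^+_{C_3}; it is not the matrix A_3 of the proof.
A3 = np.array([[1.0, 0.2, 7.7],
               [0.2, 1.0, 7.7],
               [7.7, 7.7, 100.0]])
eps0 = 0.1
print(np.linalg.det(A3))                                # approximately 1.136
print(np.linalg.det(soft_threshold_offdiag(A3, eps0)))  # approximately -4.968

# Scaling: eta_{t * eps0}(t * A) = t * eta_{eps0}(A), so (eps / eps0) * A3 fails at level eps.
for eps in (0.01, 0.5, 3.0):
    t = eps / eps0
    scaled = soft_threshold_offdiag(t * A3, eps)
    assert np.allclose(scaled, t * soft_threshold_offdiag(A3, eps0))
    print(eps, min_eig(t * A3) > 0, min_eig(scaled) > 0)
```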
Remark 3.4. The proof of Theorem 3.2 given below for soft-thresholding is more challenging than the proof of Theorem 2.4 for hard-thresholding. In [2], an explicit example of a matrix $A \in P^+_{C_n}$ losing positive definiteness upon hard-thresholding is constructed for all n ≥ 3. A direct construction of a matrix losing positive definiteness when soft-thresholded is elusive. The proof below proceeds by induction: we start with a matrix $A_3 \in P^+_{C_3}$ losing positive definiteness when soft-thresholded at level ε = 0.1. The matrix $A_3$ is first determined numerically. A matrix $A_n \in P^+_{C_n}$ losing positive definiteness when soft-thresholded at the same level is then constructed inductively by exploiting properties of Schur complements.
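Proof of Theorem 3.2. (1 ⇒ 3) We shall prove the contrapositive. Let $C_n$ denote the cycle graph with n vertices. Recall that a tree is a connected graph without cycles of length k ≥ 3. Thus, if G is not a tree, then it contains a cycle of length k ≥ 3. If a matrix $A_k \in P^+_{C_k}$ loses positive definiteness when soft-thresholded, then placing $A_k$ on the vertices of such a cycle and completing it with an identity block on the remaining vertices yields a matrix in $P^+_G$ that also loses positive definiteness, since soft-thresholding acts separately on each diagonal block. Therefore, to prove this part of the result, it is sufficient to construct, for every n ≥ 3, a positive definite matrix $A_n \in P^+_{C_n}$ which does not retain positive definiteness when soft-thresholded at the given level ε > 0. We will begin by providing such examples of matrices for the fixed value ε = 0.1. We will then show how matrices with the same properties can be built for arbitrary values of ε > 0.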
The following matrix provides an example for n = 3, with threshold level ε = 0.1. Also, notice that 1) the matrix $\tilde{A}_3$, which is $A_3$ with the (1, 3) and (3, 1) elements set to zero, is positive definite, and 2) the matrix $A_3$ stays positive definite when only the (1, 3) and (3, 1) elements are soft-thresholded at level ε = 0.1. We will construct a similar matrix $A_n$ for n ≥ 4 inductively. Properties 1) and 2) will be important to perform the induction step. Indeed, assume that, for some n ≥ 3, there exists a matrix $A_n \in P^+_{C_n}$ which loses positive definiteness when soft-thresholded at level ε = 0.1. Let us also assume that the matrix $\tilde{A}_n$ obtained from $A_n$ by setting the (1, n) and (n, 1) elements to 0 is positive definite, and that the matrix obtained from $A_n$ by soft-thresholding only the (1, n) and (n, 1) elements at level ε is positive definite. These properties are satisfied for n = 3 by the matrix $A_3$ given above. We will build a matrix $A_{n+1} \in P^+_{C_{n+1}}$ satisfying the same properties. Let $a_n$ denote the (1, n) element of $A_n$. For every real number r, let $r_\epsilon := \operatorname{sgn}(r)(|r| - \epsilon)_+$ denote the value of r soft-thresholded at level ε. To simplify the notation, let us denote by $a_{n,\epsilon}$ the value of $(a_n)_\epsilon$. Now consider the matrix Notice that $A_{n+1}$ has zeros according to $C_{n+1}$. We will prove that $a_{n+1}$, b and α can be chosen so that $A_{n+1}$ satisfies the required properties. Let us first choose the value of $a_{n+1}$ as a function of α and b in such a way that This is always possible if |b| > ε. Indeed, if |b| > ε, then $a_{n+1}$ satisfies equation (3.5) for We claim that we can choose α > 0 and b > ε such that: (1) $A_{n+1}$ is positive definite; (2) the matrix $\tilde{A}_{n+1}$, obtained from $A_{n+1}$ by setting its (1, n + 1) and (n + 1, 1) elements to 0, is positive definite; (3) $A_{n+1}$ is not positive definite when soft-thresholded at level ε, i.e., $\eta_\epsilon(A_{n+1}) \not> 0$; (4) $A_{n+1}$ is positive definite when only its (1, n + 1) and (n + 1, 1) elements are soft-thresholded at level ε.
Conditions (1) and (3) are the two conditions needed to prove that the matrix $A_{n+1}$ provides the desired example. Conditions (2) and (4) are required in the induction step. First, note that the matrix $A_{n+1}$ has been constructed in such a way that the Schur complement of α in $\eta_\epsilon(A_{n+1})$ is equal to $\eta_\epsilon(A_n)$. Therefore, by the induction hypothesis, $\eta_\epsilon(A_{n+1})$ is not positive definite for any value of |b| > ε and α > 0. This proves (3).
Since α > 0, to prove properties (1), (2) and (4), we only need to study the Schur complement of α in three matrices: $A_{n+1}$, $\tilde{A}_{n+1}$ (the matrix $A_{n+1}$ with its (1, n + 1) and (n + 1, 1) elements set to zero), and the matrix obtained from $A_{n+1}$ by soft-thresholding its (1, n + 1) and (n + 1, 1) elements. We will prove that properties (1), (2) and (4) hold asymptotically as α, b → ∞. The result will therefore follow by choosing sufficiently large values of α and b.
The Schur complement of α in $A_{n+1}$ is given by where the dots in the above matrices represent zeros. Let us take $\alpha = b^3$. Since $a_{n+1}$ and α depend on the value of b, and since ε is fixed, b becomes the only "free" parameter. We will prove that properties (1), (2) and (4) hold for large values of b. We begin by studying the limiting behavior of different quantities related to the Schur complement (3.7). We will show that $a_{n+1,\epsilon}$ Now, to prove (3.9), recall that, by construction, $a_{n+1} = a_{n+1,\epsilon} \pm \epsilon$, where the sign depends on the sign of $a_{n+1}$. Therefore The first term tends to 0 as b → ∞, as shown above. Also, since $\alpha = b^3$, we have α → ∞ as b → ∞, and so $\epsilon^2/\alpha \to 0$ as b → ∞. This proves equation (3.9).
Using the results in equations (3.8)-(3.10), we now show that properties (1), (2) and (4) hold for sufficiently large values of b. To prove (1), we only need to show that the Schur complement given by (3.7) is positive definite for large values of b. Indeed, notice that from (3.9) and (3.10), we have as b → ∞. This matrix is exactly the matrix $A_n$ with the (1, n) and (n, 1) elements soft-thresholded at level ε. Therefore, by the induction hypothesis, this limit matrix is positive definite; since the set of positive definite matrices is open, the Schur complement (3.7), and hence $A_{n+1}$, is positive definite for large values of b. This proves property (1).
To prove property (2), note that the Schur complement of α in $\tilde{A}_{n+1}$ is given by Notice that the (1, 1) entry of the right-hand term is always positive, whereas the (n, n) element tends to 0 as b → ∞. Since the matrix $\tilde{A}_n$ is positive definite by the induction hypothesis, the Schur complement of α in $\tilde{A}_{n+1}$ is therefore also positive definite when b is sufficiently large. This proves (2). Similarly, to prove (4), let us consider the Schur complement of α in the matrix $A_{n+1}$ with the (1, n + 1) and (n + 1, 1) entries soft-thresholded at level ε. We have From (3.5) and (3.8), we therefore have as b → ∞, and so the preceding Schur complement is asymptotic to the matrix $A_n$ with the (1, n) and (n, 1) elements soft-thresholded at level ε. By the induction hypothesis, this matrix is positive definite, and therefore the same is true for the matrix $A_{n+1}$ with the (1, n + 1) and (n + 1, 1) entries soft-thresholded at level ε when b is large enough. This proves (4). Consequently, a matrix $A_{n+1}$ satisfying properties (1) to (4) can be obtained by choosing a value of b large enough. This completes the induction. Therefore, for every n ≥ 3, there exists a matrix $A_n \in P^+_{C_n}$ such that $\eta_\epsilon(A_n)$ is not positive definite for $\epsilon = \epsilon_0 := 0.1$. Now let ε > 0 be arbitrary. Notice that for α > 0 and any matrix A, it holds that
\[ \eta_{\alpha\epsilon}(\alpha A) = \alpha\, \eta_\epsilon(A). \tag{3.21} \]
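As a consequence, for a given value of n, consider the matrix $A := \frac{\epsilon}{\epsilon_0} A_n$. Then $A \in P^+_{C_n}$ since $A_n \in P^+_{C_n}$. Moreover, by (3.21) applied with $\alpha = \epsilon/\epsilon_0$,
\[ \eta_\epsilon(A) = \eta_{\frac{\epsilon}{\epsilon_0}\epsilon_0}\Big(\frac{\epsilon}{\epsilon_0} A_n\Big) = \frac{\epsilon}{\epsilon_0}\, \eta_{\epsilon_0}(A_n). \]
Since $\eta_{\epsilon_0}(A_n)$ is not positive definite by construction, it follows that $\eta_\epsilon(A)$ is not positive definite either. This provides the desired example of a matrix $A \in P^+_{C_n}$ such that $\eta_\epsilon(A)$ is not positive definite. Therefore, if every matrix $A \in P^+_G$ retains positive definiteness when soft-thresholded at a given level ε > 0, the graph G cannot contain any cycle and so is a tree.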
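The induction step can be checked in exact arithmetic. The matrix $A_{n+1}$ and equation (3.5) are not displayed in the text above, so the sketch below is a hypothetical reconstruction from the Schur-complement properties used in the proof: we assume that $A_{n+1}$ is obtained from $\tilde{A}_n$ by adding $a_{n+1,\epsilon}^2/\alpha$ and $b_\epsilon^2/\alpha$ (with $b_\epsilon = b - \epsilon$) to its (1, 1) and (n, n) entries, and by attaching a new vertex n + 1 with entry $a_{n+1}$ at position (1, n + 1), b at (n, n + 1) and α = b³ at (n + 1, n + 1), where $a_{n+1,\epsilon}$ is chosen so that $-a_{n+1,\epsilon} b_\epsilon/\alpha = a_{n,\epsilon}$. With these choices the Schur complement of α in $\eta_\epsilon(A_{n+1})$ is exactly $\eta_\epsilon(A_n)$. The base matrix is a 3 × 3 example of our own (not the paper's $A_3$) satisfying properties 1) and 2). Python's `fractions` module is used because the entries grow very quickly; `extend` multiplies b by 10 until properties (1) to (4) hold, which the asymptotic argument suggests must eventually happen.

```python
from fractions import Fraction

def soft(r, eps):
    """r_eps = sgn(r)(|r| - eps)_+, computed exactly on Fractions."""
    if r > eps:
        return r - eps
    if r < -eps:
        return r + eps
    return Fraction(0)

def is_pd(M):
    """Exact test for a symmetric matrix: elimination without pivoting, all pivots > 0."""
    M = [row[:] for row in M]
    n = len(M)
    for k in range(n):
        if M[k][k] <= 0:
            return False
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k + 1, n):
                M[i][j] -= factor * M[k][j]
    return True

def eta(M, eps):
    """Soft-threshold every off-diagonal entry of M at level eps."""
    n = len(M)
    return [[M[i][j] if i == j else soft(M[i][j], eps) for j in range(n)] for i in range(n)]

def with_corner(M, value):
    """Copy of M with its (1, n) and (n, 1) entries replaced by value."""
    M = [row[:] for row in M]
    M[0][-1] = M[-1][0] = value
    return M

def has_properties(M, eps):
    """Properties (1) to (4) of the induction."""
    return (is_pd(M)                                          # (1)
            and is_pd(with_corner(M, Fraction(0)))            # (2)
            and not is_pd(eta(M, eps))                        # (3)
            and is_pd(with_corner(M, soft(M[0][-1], eps))))   # (4)

def extend(A, eps, max_tries=1000):
    """Hypothetical induction step A_n -> A_{n+1} with alpha = b**3 (our reconstruction)."""
    n = len(A)
    a_eps = soft(A[0][-1], eps)                 # a_{n, eps}; assumed nonzero
    b = Fraction(10)
    for _ in range(max_tries):
        alpha, b_eps = b ** 3, b - eps
        new_eps = -alpha * a_eps / b_eps        # forces -new_eps * b_eps / alpha == a_eps
        new = new_eps + (eps if new_eps > 0 else -eps)   # soft(new, eps) == new_eps
        M = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
        for i in range(n):
            M[i][:n] = A[i][:]
        M[0][n - 1] = M[n - 1][0] = Fraction(0)          # top-left block starts from A~_n
        M[0][0] += new_eps ** 2 / alpha                  # diagonal corrections
        M[n - 1][n - 1] += b_eps ** 2 / alpha
        M[0][n] = M[n][0] = new                          # new vertex joined to vertex 1 ...
        M[n - 1][n] = M[n][n - 1] = b                    # ... and to vertex n
        M[n][n] = alpha
        if has_properties(M, eps):
            return M
        b *= 10                                          # properties hold for b large enough
    raise RuntimeError("no suitable b found")

eps = Fraction(1, 10)
A = [[Fraction(v) for v in row]
     for row in [["1", "0.2", "7.7"], ["0.2", "1", "7.7"], ["7.7", "7.7", "100"]]]
assert has_properties(A, eps)       # hypothetical base case
for _ in range(2):                  # build A_4, then A_5
    A = extend(A, eps)
    print(len(A), is_pd(A), is_pd(eta(A, eps)))
```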
(3 ⇒ 2) The implication in this direction holds for more general functions than the soft-thresholding function. The proof is therefore postponed to Section 4 (see Theorem 4.18).
Finally, since 2 ⇒ 1 trivially, the three statements of the theorem are equivalent. This completes the proof of the theorem. Corollary 3.5 (Complete graph case). For every n ≥ 3 and every ε > 0, there exists a matrix $A \in P^+_n$ such that $\eta_\epsilon(A) \notin P^+_n$.

General thresholding and entrywise maps
The result of the previous section shows that the commonly used soft-thresholding procedure does not map the cone of positive definite matrices into itself. A natural question to ask, therefore, is whether other mappings are better at preserving positive definiteness.
In this section, we completely characterize the functions that do so when applied to every off-diagonal element of a positive definite matrix. We begin by introducing some notation and reviewing previous results from the literature for the case where the function is also applied to the diagonal.

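For a function f : R → R and a matrix $A = (a_{ij})$, denote by $f[A] := (f(a_{ij}))$ the matrix obtained by applying f to every element of A, and by $f^*[A]$ the matrix obtained by applying f only to the off-diagonal elements of A. Then $f^*[A] = f[A] + D_A$, where $D_A = \operatorname{diag}(a_{11} - f(a_{11}), \dots, a_{nn} - f(a_{nn}))$. As a consequence, if f[A] > 0 and the elements of $D_A$ are nonnegative, then $f^*[A] > 0$. Such is the case when |f(x)| ≤ |x|, since then $a_{ii} - f(a_{ii}) \ge a_{ii} - |a_{ii}| = 0$ for every positive definite matrix A.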
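The decomposition $f^*[A] = f[A] + D_A$, with $D_A = \operatorname{diag}(a_{11} - f(a_{11}), \dots, a_{nn} - f(a_{nn}))$, is easy to check numerically. In the sketch below, the function f(x) = 0.5 sin x and the matrix A are arbitrary illustrative choices of ours; since |f(x)| ≤ |x| and the diagonal of A is positive, the diagonal of $D_A$ is nonnegative.

```python
import numpy as np

def entrywise(f, A):
    """f[A]: apply f to every entry of A, diagonal included."""
    return np.vectorize(f, otypes=[float])(A)

def offdiag(f, A):
    """f*[A]: apply f to the off-diagonal entries only."""
    out = entrywise(f, A)
    np.fill_diagonal(out, np.diag(A))
    return out

f = lambda x: 0.5 * np.sin(x)      # illustrative choice with |f(x)| <= |x|
A = np.array([[2.0, -1.0, 0.5],
              [-1.0, 3.0, 1.0],
              [0.5, 1.0, 1.5]])
D_A = np.diag(np.diag(A) - entrywise(f, np.diag(A)))   # diag(a_ii - f(a_ii))
print(np.allclose(offdiag(f, A), entrywise(f, A) + D_A))
print(np.diag(D_A))                                    # nonnegative since |f(x)| <= |x|
```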
Remark 4.2. The condition |f(x)| ≤ |x| is a mild restriction which allows us to conclude that $f[A] > 0 \Rightarrow f^*[A] > 0$. As we shall see below, the converse is generally false for matrices of a given dimension. Hence the previous results in the literature characterizing functions which preserve positive definiteness when the function is also applied to the diagonal elements are unnecessarily restrictive. In this sense, previous results in the field are not directly applicable to problems that arise in modern-day applications.

Background material: Results for f[A]
It is well known that functions preserving positive definiteness when applied to every element of the matrix must have a certain degree of smoothness and nonnegative derivatives. As we will see later, this is no longer true when the diagonal is left untouched. Proof. This follows easily from the non-differentiability of the soft-thresholding function. Corollary 4.5 provides a necessary condition for a function f to preserve positive definiteness when applied elementwise to a positive definite matrix. We shall show below that this condition is also sufficient. We first recall some facts about absolutely monotonic functions and the Hadamard product.
Remark 4.8. Let 0 < α ≤ ∞. A function f : (−α, α) → R can be represented as
\[ f(x) = \sum_{n=0}^{\infty} a_n x^n, \qquad x \in (-\alpha, \alpha), \]
for some $a_n \ge 0$ if and only if f extends analytically to D(0, α) and is absolutely monotonic on (0, α). Combining Corollary 4.5 and Lemma 4.9, and assuming f is continuous, we obtain the following characterization of functions preserving positive definiteness for every positive semidefinite matrix with positive entries. The same result also appears in [14], where it is shown that the continuity assumption is not required. The following theorem shows that the result remains the same if the entries of the positive semidefinite matrix A are constrained to be in a given interval. Special cases of this result have been proved by different authors; we state only the most general version here.
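Remark 4.8 and the Schur product theorem explain why a power series with nonnegative coefficients preserves positive semidefiniteness when applied to every entry: $f[A] = \sum_k a_k A^{\circ k}$ is a nonnegative combination of Hadamard powers $A^{\circ k}$, each of which is positive semidefinite ($A^{\circ 0}$ being the all-ones matrix). The sketch below uses f = exp on a random correlation matrix (our illustrative choice) and compares the truncated Hadamard series with NumPy's entrywise `np.exp`.

```python
import math
import numpy as np

def hadamard_series(coeffs, A):
    """sum_k coeffs[k] * A^{ok}, where A^{ok} is the k-th Hadamard (entrywise) power of A."""
    out = np.zeros(A.shape)
    power = np.ones(A.shape)            # A^{o0}: the all-ones matrix, PSD of rank one
    for c in coeffs:
        out += c * power
        power = power * A               # Schur product: stays PSD when A is PSD
    return out

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 3))
G = X @ X.T                             # PSD of rank 3
d = np.sqrt(np.diag(G))
A = G / np.outer(d, d)                  # correlation matrix, entries in [-1, 1]

coeffs = [1.0 / math.factorial(k) for k in range(25)]   # exp(x) = sum x^k / k!, all >= 0
E = hadamard_series(coeffs, A)
print(np.abs(E - np.exp(A)).max())           # truncation error of the series
print(np.linalg.eigvalsh(np.exp(A)).min())   # exp applied entrywise keeps A PSD
```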
Recall that one of the primary goals of regularizing positive definite matrices is to "induce sparsity", i.e., set small elements to zero. The following result shows that no thresholding function that induces sparsity is guaranteed to preserve positive definiteness. Proof. Assume f[A] is positive semidefinite for every symmetric positive semidefinite matrix A with entries in (−α, α). Then, by Theorem 4.11, Proof. Assume first that |V| = 2, and without loss of generality assume (1, 2) ∈ E. Since |f(ξ)| > |ξ|, there exists ε > 0 such that |f(ξ)| = |ξ| + ε. Now consider the matrix Recall from Theorem 4.3 that functions preserving positive definiteness when applied to every element of a matrix (including the diagonal) of a given dimension have to be sufficiently smooth, and have nonnegative derivatives on the positive real axis. However, when the diagonal is left untouched, the situation changes quite drastically. More precisely, a far larger class of functions preserves positivity, as the following result shows.
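Proposition 4.14. Let G = (V, E) be a graph with maximum vertex degree Δ ≥ 1, and assume f : R → R satisfies |f(x)| ≤ c|x| for every x ∈ R, for some $0 \le c < \frac{1}{\Delta}$. Then $f^*[A] \in P^+_G$ for every $A \in P^+_G$. Proof. For every $A \in P^+_G$, denote by $M_A$ the matrix with entries
\[ (M_A)_{ij} = \begin{cases} 1 & \text{if } i = j, \\ f(a_{ij})/a_{ij} & \text{if } a_{ij} \ne 0 \text{ and } i \ne j, \\ 0 & \text{otherwise}. \end{cases} \]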
The matrix $f^*[A]$ can be written as $f^*[A] = A \circ M_A$, where ∘ denotes the Hadamard (entrywise) product; here we use f(0) = 0, which follows from |f(0)| ≤ 0. Since $0 \le c < \frac{1}{\Delta}$, each row of $M_A$ has diagonal entry 1 and at most Δ nonzero off-diagonal entries, each of absolute value at most c, so an application of Gershgorin's circle theorem demonstrates that $M_A > 0$. As a consequence, by the Schur product theorem, $A \circ M_A > 0$, and so $f^*[A] > 0$ for every $A \in P^+_G$. Corollary 4.15 (Complete graph case). Let n ≥ 2 and assume f : R → R satisfies
\[ |f(x)| \le c|x| \quad \forall x \in \mathbb{R}, \tag{4.11} \]
for some $0 \le c < \frac{1}{n-1}$. Then $f^*[A] > 0$ for every n × n symmetric positive definite matrix A. The following corollary asserts that when operating on the off-diagonal elements, as compared to all the elements (including the diagonal), there are non-trivial functions "inducing sparsity" (i.e., setting elements to zero) that preserve positive definiteness.
Corollary 4.16. Let G be a graph and let 0 ∈ S ⊂ R. Then there exists a function f : R → R such that: (1) f(x) = 0 if and only if x ∈ S; (2) $f^*[A] > 0$ for every $A \in P^+_G$. Remark 4.17. Despite the simplicity of the above proofs (especially in contrast to Theorems 3.2, 4.18, and 4.21 of this paper), Proposition 4.14, Corollary 4.15 and Corollary 4.16 have important consequences, namely: (1) Contrary to the case where the function is also applied to the diagonal elements of the matrix (see Theorem 4.3), Corollary 4.15 shows that, when the diagonal is left untouched, preserving positive definiteness for every n × n positive definite matrix does not imply any differentiability condition on f. Even continuity is not required. This is in stark contrast with previous results in the area. (2) Proposition 4.14 shows that preserving positive definiteness is relatively easier for matrices that are already very sparse in terms of connectivity, i.e., matrices with bounded vertex degree. (3) Corollary 4.15 suggests that preserving positive definiteness for non-sparse matrices becomes increasingly difficult as the dimension n gets larger.
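Proposition 4.14 can be probed numerically on the cycle graph $C_8$, whose maximum degree is Δ = 2. The function used below satisfies |f(x)| ≤ 0.45|x| with 0.45 < 1/Δ, is discontinuous, and vanishes on [−1, 1], so it also induces sparsity in the spirit of Corollary 4.16. The random matrices, constants and helper names are our own illustrative choices. Since $f^*[A] = A \circ M_A$ with $M_A$ positive definite with unit diagonal, the classical bound $\lambda_{\min}(A \circ B) \ge \lambda_{\min}(A) \min_i b_{ii}$ for positive semidefinite A, B suggests that the smallest eigenvalue should never drop below the margin 0.1 used to generate A.

```python
import numpy as np

def random_pd_with_pattern(edges, n, rng, margin=0.1):
    """Random symmetric matrix supported on the diagonal and on `edges`, shifted to be > 0."""
    M = np.zeros((n, n))
    for i, j in edges:
        M[i, j] = M[j, i] = rng.uniform(-5.0, 5.0)
    M += np.diag(rng.uniform(-1.0, 1.0, n))
    shift = max(0.0, -np.linalg.eigvalsh(M).min()) + margin
    return M + shift * np.eye(n)          # smallest eigenvalue is at least `margin`

def f_sparse(x, c=0.45, t=1.0):
    """|f(x)| <= c|x| with c < 1/2, and f(x) = 0 on [-t, t]: a sparsity-inducing choice."""
    return np.where(np.abs(x) > t, c * x, 0.0)

n = 8
cycle = [(i, (i + 1) % n) for i in range(n)]   # every vertex has degree Delta = 2
rng = np.random.default_rng(2)
worst = np.inf
for _ in range(500):
    A = random_pd_with_pattern(cycle, n, rng)
    B = f_sparse(A)
    np.fill_diagonal(B, np.diag(A))             # B = f*[A]
    worst = min(worst, np.linalg.eigvalsh(B).min())
print(worst)   # stays positive
```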

4.3. Characterization of functions preserving positive definiteness for trees. Recall that a class of sparse positive definite matrices that is always guaranteed to retain positive definiteness upon either hard- or soft-thresholding is the class of matrices with zeros according to a tree (see Theorems 2.4 and 3.2). A natural question to ask, therefore, is whether functions other than hard- and soft-thresholding also retain positive definiteness for this class. Recall from Lemma 4.13 that for every nonempty graph G, the functions f such that $f^*[A] \in P^+_G$ for every $A \in P^+_G$ are necessarily contained in the family
\[ \mathcal{C} := \{ f : \mathbb{R} \to \mathbb{R} : |f(x)| \le |x| \text{ for every } x \in \mathbb{R} \}. \tag{4.12} \]
Note that $\mathcal{C}$ is the class of functions contracting at the origin. This "shrinkage" property is often required in practice.
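It is natural to ask if we can characterize the set of graphs G for which the functions mapping $P^+_G$ into itself constitute all of $\mathcal{C}$. The following theorem answers this question. Theorem 4.18. Let G = (V, E) be a connected undirected graph. Then $f^*[A] \in P^+_G$ for every $A \in P^+_G$ and every $f \in \mathcal{C}$ if and only if G is a tree. Thus, the result provides a complete characterization of trees in terms of the maximal family $\mathcal{C}$.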
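Proof. (⇐) Let G be a tree and assume |f(x)| ≤ |x| for all x. We will prove that $f^*[A] \in P^+_G$ for every $A \in P^+_G$ by induction on n = |V|. Consider first the case n = 3. Then G is the path graph on 3 vertices and, after relabeling the vertices if necessary, A has the form
\[ A = \begin{pmatrix} a & u & 0 \\ u & b & v \\ 0 & v & c \end{pmatrix}. \]
By computing the leading principal minors, the positive definiteness of A is equivalent to
\[ a > 0, \qquad ab - u^2 > 0, \qquad abc - av^2 - cu^2 > 0. \]
Since |f(x)| ≤ |x| and a, c > 0, it follows that
\[ ab - f(u)^2 \ge ab - u^2 > 0, \qquad abc - a f(v)^2 - c f(u)^2 \ge abc - av^2 - cu^2 > 0, \tag{4.17} \]
and so $f^*[A] > 0$. The result is therefore true for n = 3.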
Assume the result is true for every tree with n vertices and consider a tree G with n + 1 vertices. Let $\tilde{G}$ be the sub-tree obtained by removing a vertex connected to only one other vertex. Without loss of generality, assume this vertex is labeled n + 1 and its neighbor is labeled n. Let $A \in P^+_G$. The matrix A has the form By the induction hypothesis, the n × n principal submatrix $\tilde{A}$ of A stays positive definite when f is applied to its off-diagonal elements, i.e., $f^*[\tilde{A}] > 0$. It remains to be shown that the Schur complement of α in $f^*[A]$ is positive definite. Note first that the Schur complement of α in A is given as: Since by assumption A > 0, we have S > 0. We also have $S \in P^+_{\tilde{G}}$: writing α for the (n + 1, n + 1) entry of A and β for its (n, n + 1) entry, S is obtained from $\tilde{A}$ by subtracting $\beta^2/\alpha$ from its (n, n) entry, so S has the same zeros as $\tilde{A}$, and by the induction hypothesis $f^*[S] > 0$. Since the diagonal is left untouched, the Schur complement of α in $f^*[A]$ equals $f^*[S] + \frac{\beta^2 - f(\beta)^2}{\alpha} E_{nn}$, where $E_{nn}$ is the matrix whose only nonzero entry is a 1 in position (n, n); this matrix is positive definite because |f(β)| ≤ |β|. Hence $f^*[A] > 0$, which completes the induction. (⇒) Conversely, assume now that G is not a tree and let ε > 0. Then, by Theorem 3.2, there exists a matrix $A \in P^+_G$ such that $(f_S)^*[A] \notin P^+_G$, where $f_S$ denotes the soft-thresholding function (see (1.2)). Since $f_S \in \mathcal{C}$, this concludes the proof. Remark 4.19. A similar result also holds for hard-thresholding with respect to a graph. Indeed, note that every subgraph of a graph G is a union of disconnected induced subgraphs if and only if G is a tree. As a consequence, matrices in $P^+_G$ are guaranteed to retain positive definiteness when thresholded with respect to any subgraph of G if and only if G is a tree (see Theorem 2.2 and [3, Corollary 3.5]). Hence, trees can be characterized by all four types of thresholding operations that have been considered: 1) graph thresholding, 2) hard-thresholding, 3) soft-thresholding, and 4) general thresholding.
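Theorem 4.18 can be illustrated with random tree-patterned matrices. In the sketch below (random trees, functions and constants are our own illustrative choices), several members of $\mathcal{C}$, namely soft- and hard-thresholding at level 0.5, x cos x, and the sign flip x ↦ −x, are applied to the off-diagonal entries. On a tree one can flip the signs of rows and columns simultaneously so that all off-diagonal entries become nonpositive, and this yields $\lambda_{\min}(f^*[A]) \ge \lambda_{\min}(A)$ for f ∈ $\mathcal{C}$; the reported minima are therefore expected to stay above the margin 0.1 used to generate A. On the 3-cycle, by contrast, soft-thresholding can fail, as Theorem 3.2 shows.

```python
import numpy as np

def random_tree_edges(n, rng):
    """Random tree on {0, ..., n-1}: each vertex k >= 1 is joined to a random earlier vertex."""
    return [(int(rng.integers(0, k)), k) for k in range(1, n)]

def random_pd_with_pattern(edges, n, rng, margin=0.1):
    """Random symmetric matrix supported on the diagonal and on `edges`, shifted to be > 0."""
    M = np.zeros((n, n))
    for i, j in edges:
        M[i, j] = M[j, i] = rng.uniform(-5.0, 5.0)
    M += np.diag(rng.uniform(-1.0, 1.0, n))
    shift = max(0.0, -np.linalg.eigvalsh(M).min()) + margin
    return M + shift * np.eye(n)

def offdiag(f, A):
    """f*[A]: f applied to the off-diagonal entries only."""
    B = f(A)
    np.fill_diagonal(B, np.diag(A))
    return B

# Illustrative members of the class C = {f : |f(x)| <= |x|}.
funcs = {
    "soft, eps=0.5": lambda x: np.sign(x) * np.maximum(np.abs(x) - 0.5, 0.0),
    "hard, eps=0.5": lambda x: np.where(np.abs(x) > 0.5, x, 0.0),
    "x cos x": lambda x: x * np.cos(x),
    "-x": lambda x: -x,
}
rng = np.random.default_rng(3)
worst = {name: np.inf for name in funcs}
for _ in range(300):
    n = int(rng.integers(2, 10))
    A = random_pd_with_pattern(random_tree_edges(n, rng), n, rng)
    for name, f in funcs.items():
        worst[name] = min(worst[name], np.linalg.eigvalsh(offdiag(f, A)).min())
print(worst)   # every entry stays positive on trees
```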
Remark 4.20. Though Theorem 4.18 establishes that the class $\mathcal{C}$ is maximal when G is a tree, it is nevertheless important to recognize that even when G is not a tree, there are sparsity-inducing functions which retain positive definiteness for all $A \in P^+_G$ (see Corollary 4.16). 4.4. Proof of the main result. We now proceed to completely characterize the functions f preserving positive definiteness for matrices of arbitrary dimension, when the diagonal is not thresholded. (1) g is analytic on the disc D(0, α); (2) $\|g\|_\infty \le 1$; (3) g is absolutely monotonic on (0, α). When α = ∞, the only functions satisfying the above conditions are the linear functions f(x) = ax for 0 ≤ a ≤ 1. Note that Denoting by $1_m$ the m × m matrix with every entry equal to 1, we obtain for every m ≥ 1, Equivalently, using (4.23), Equivalently, Dividing both sides by m and letting m → ∞, it follows that f[A] is positive semidefinite for every symmetric positive semidefinite n × n matrix A with entries in (−α, α). Hence, by Theorem 4.11, f is analytic on D(0, α) and is absolutely monotonic on (0, α), i.e., $f^{(k)}(0) \ge 0$ for every k ≥ 0. In other words, $f(x) = \sum_{k \ge 1} a_k x^k$ with $a_k \ge 0$ (note that f(0) = 0 since |f(0)| ≤ 0). Finally, since f satisfies |f(x)| ≤ |x| (see Lemma 4.13), the function g defined by $g(0) = f'(0)$ and $g(x) = f(x)/x$ for x ≠ 0 satisfies |g(x)| ≤ 1 for every x, i.e., $\|g\|_\infty \le 1$. Therefore, f(x) = x g(x) for a function g that is analytic on D(0, α), absolutely monotonic on (0, α), and satisfies the condition $\|g\|_\infty \le 1$. Conversely, since $\|g\|_\infty \le 1$, we have |f(x)| ≤ |x|, and thus the elements of $D_A$ are nonnegative. Hence, $f^*[A] \ge 0$ for every A ≥ 0 with entries in (−α, α).
In the case when $\alpha = \infty$, the only bounded absolutely monotonic functions $g$ on $(0, \infty)$ are the constant functions $g(x) \equiv a$ for some $a \ge 0$. Since $|f(x)| \le |x|$, we must have $0 \le a \le 1$. This completes the proof of the theorem.

Theorem 4.21 shows that only a very narrow class of functions is guaranteed to preserve positive definiteness for an arbitrary positive definite matrix of any dimension. In practical applications, thresholding is often performed on normalized matrices (such as correlation matrices), which have bounded entries. In that case, more functions preserve positive definiteness. However, as in the case where the function is also applied to the diagonal, the following result shows that no thresholding function can induce sparsity (i.e., set nonzero elements to zero) and, at the same time, be guaranteed to maintain positive definiteness for matrices of every dimension.

Proof. The proof is the same as the proof of Corollary 4.12.
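For $\alpha = \infty$ the sufficiency half is elementary: if $f(x) = ax$, then $f^*[A] = aA + (1-a)\operatorname{diag}(a_{11}, \dots, a_{nn})$, a nonnegative combination of positive semidefinite matrices whenever $0 \le a \le 1$. The short numerical sketch below checks this identity on a random test matrix (the helper `linear_offdiag`, the seed, and the data are illustrative choices, not from the text) and shows that with $a = 2$, which violates $|f(x)| \le |x|$, a $2 \times 2$ correlation matrix already loses positive definiteness.

```python
import numpy as np


def linear_offdiag(A, a):
    """f*[A] for the hypothetical choice f(x) = a * x (diagonal kept)."""
    B = a * A
    np.fill_diagonal(B, np.diag(A))
    return B


rng = np.random.default_rng(3)
X = rng.standard_normal((6, 6))
A = X @ X.T + 0.1 * np.eye(6)  # random positive definite test matrix

for a in (0.0, 0.25, 0.5, 1.0):
    B = linear_offdiag(A, a)
    # f*[A] = a*A + (1 - a)*diag(A): a nonnegative combination of PSD matrices.
    assert np.allclose(B, a * A + (1 - a) * np.diag(np.diag(A)))
    assert np.linalg.eigvalsh(B).min() > 0

# With a > 1 the bound |f(x)| <= |x| fails, and positive definiteness can be lost.
A2 = np.array([[1.0, 0.6],
               [0.6, 1.0]])
print(np.linalg.eigvalsh(linear_offdiag(A2, 2.0)).min())  # approximately 1 - 1.2 = -0.2
```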

Eigenvalue inequalities
The results of Section 4 show that only a restricted class of functions is guaranteed to preserve positive definiteness when applied elementwise to matrices of arbitrary dimension. Moreover, no function can both induce sparsity (i.e., have zeros other than at the origin) and preserve positive definiteness for every matrix. Hence, a natural question is whether certain properties of matrices (such as a lower bound on the minimum eigenvalue or an upper bound on the condition number) suffice to maintain positive definiteness when a given function $f$ is applied to the off-diagonal elements of the matrix. We provide such sufficient conditions in this section. The results are first derived in Section 5.1 for the case when $f$ is a polynomial; they are then extended to more general functions in the subsequent subsection.

5.1. Bounds for polynomials. We first establish some notation. For a polynomial $p(x) = \sum_{i=0}^{d} a_i x^i$, define its "positive" and "negative" parts by
$$p_+(x) := \sum_{i:\, a_i > 0} a_i x^i, \qquad p_-(x) := -\sum_{i:\, a_i < 0} a_i x^i,$$
so that $p = p_+ - p_-$. Many of the results in this section are motivated by the following idea. Note that
$$p^*[A] = p_+[A] - p_-[A] + D_A,$$
where $D_A$ is the diagonal matrix $D_A = \operatorname{diag}(a_{11} - p(a_{11}), \dots, a_{nn} - p(a_{nn}))$. Repeated applications of the Schur product theorem show that both $p_+[A]$ and $p_-[A]$ are positive semidefinite when $A$ is symmetric positive definite. Intuitively, a polynomial whose positive part is "larger" than its negative part should preserve positive definiteness for a wider class of matrices than a polynomial with a "large" negative part. This idea is formalized in Proposition 5.3 below. Before stating it, we recall the following classical result, which can be used to bound the eigenvalues of Schur products.
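The decomposition $p^*[A] = p_+[A] - p_-[A] + D_A$, with $D_A = \operatorname{diag}(a_{11} - p(a_{11}), \dots, a_{nn} - p(a_{nn}))$, can be checked directly. The sketch below assumes that $p_+$ collects the terms of $p$ with positive coefficients and $p_-$ the absolute values of the negative ones; the polynomial $p(x) = x - 0.3x^2 + 0.2x^3$, the helper names, and the random matrix are hypothetical. Hadamard powers are computed with NumPy's entrywise `**`. In this instance both parts come out positive definite, since each contains a nonconstant term and Schur powers of a positive definite matrix are positive definite.

```python
import numpy as np


def hadamard_poly(coeffs, A):
    """p[A] = sum_i coeffs[i] * A^(i), where A^(i) is the i-th entrywise (Hadamard) power."""
    return sum(c * A**i for i, c in enumerate(coeffs))


def offdiag_poly(coeffs, A):
    """p*[A]: p applied to the off-diagonal entries only; diagonal of A kept."""
    B = hadamard_poly(coeffs, A)
    np.fill_diagonal(B, np.diag(A))
    return B


coeffs = [0.0, 1.0, -0.3, 0.2]       # hypothetical p(x) = x - 0.3x^2 + 0.2x^3
pos = [max(c, 0.0) for c in coeffs]  # coefficients of p_+
neg = [max(-c, 0.0) for c in coeffs]  # coefficients of p_-

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 5))
A = X @ X.T + 0.5 * np.eye(5)  # symmetric positive definite

P_plus = hadamard_poly(pos, A)
P_minus = hadamard_poly(neg, A)
D_A = np.diag(np.diag(A) - np.diag(hadamard_poly(coeffs, A)))  # diag(a_ii - p(a_ii))

assert np.allclose(offdiag_poly(coeffs, A), P_plus - P_minus + D_A)
print(np.linalg.eigvalsh(P_plus).min(), np.linalg.eigvalsh(P_minus).min())
```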
Theorem 5.1 (Schur [13]). Let $A, B \in \mathbb{P}^+_n$. Then for $i = 1, \dots, n$,
$$\lambda_{\min}(A) \min_{1 \le j \le n} b_{jj} \le \lambda_i(A \circ B) \le \lambda_{\max}(A) \max_{1 \le j \le n} b_{jj}.$$
We now proceed to state the main result of this subsection.

$p(a_{11}), \dots, a_{nn} - p(a_{nn}))$. The second assertion follows by the same argument, but uses Corollary 5.2 to bound the eigenvalues of the Schur product.
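Schur's eigenvalue bound, in the form $\lambda_{\min}(A) \min_j b_{jj} \le \lambda_i(A \circ B) \le \lambda_{\max}(A) \max_j b_{jj}$ for positive definite $A, B$, can be checked numerically. A minimal sketch with random positive definite test matrices follows; the helper `random_pd`, the seed, and the diagonal shift $0.1$ are arbitrary choices, and `A * B` is NumPy's entrywise product, i.e., the Schur product $A \circ B$.

```python
import numpy as np

rng = np.random.default_rng(2)


def random_pd(n):
    """Random symmetric positive definite test matrix (hypothetical helper)."""
    X = rng.standard_normal((n, n))
    return X @ X.T + 0.1 * np.eye(n)


A = random_pd(6)
B = random_pd(6)
schur = A * B                           # Schur (Hadamard) product A o B
eigs = np.linalg.eigvalsh(schur)
eig_A = np.linalg.eigvalsh(A)
lower = eig_A.min() * np.diag(B).min()  # lambda_min(A) * min_j b_jj
upper = eig_A.max() * np.diag(B).max()  # lambda_max(A) * max_j b_jj
print(lower, eigs.min(), eigs.max(), upper)
```

Because the bound involves only the diagonal of $B$, it is convenient when $B$ is itself a Schur power of $A$, whose diagonal entries are powers of the $a_{jj}$.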
The following surprising result shows that some polynomials with negative coefficients can preserve large classes of positive definite matrices. Recall that a correlation matrix is a symmetric positive definite matrix with ones on the diagonal.

Proof. Note that $\lambda_{\max}(A) < n$ for every $n \times n$ correlation matrix $A$ with $n \ge 2$, since $\operatorname{trace}(A) = n$ equals the sum of the eigenvalues of $A$, all of which are positive. The result follows by Corollary 5.4.
Corollary 5.6 below shows that $p^*[A]$ is guaranteed to be positive definite if the condition number of $A$ is sufficiently small. Note that the bound becomes more restrictive as the "negative part" of $p$ becomes larger compared to its "positive part".

5.2. Extension to more general functions. We now proceed to extend the results of Section 5.1 to more general thresholding functions. We first recall the following well-known result.
Lemma 5.8. Let $\mathcal{P}^+$ be the set of polynomials with nonnegative coefficients and let $r > 0$. Then the uniform closure of $\mathcal{P}^+$ over $[-r, r]$ is the restriction to $[-r, r]$ of the set of analytic functions $f(z) = \sum_{n \ge 0} a_n z^n$ on the disc $D(0, r) = \{z \in \mathbb{C} : |z| < r\}$ with $a_n \ge 0$ for every $n \ge 0$ and $\sum_{n \ge 0} a_n r^n < \infty$.

The space $W^+ := W^+(1)$ is often known as the analytic Wiener algebra. The space $W^+(r)$ can be seen as a weighted version of the analytic Wiener algebra.
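One direction of Lemma 5.8 is the Weierstrass M-test: if $a_n \ge 0$ and $\sum_{n \ge 0} a_n r^n < \infty$, the truncations $p_N(x) = \sum_{n \le N} a_n x^n$ are polynomials with nonnegative coefficients and satisfy $\sup_{|x| \le r} |f(x) - p_N(x)| \le \sum_{n > N} a_n r^n$. The sketch below checks this on a grid for the hypothetical example $f = \exp$ (so $a_n = 1/n!$) with $r = 2$ and $N = 8$; the grid, the degree, and the function are illustrative choices, not taken from the text.

```python
import math

# Hypothetical example: f = exp, with a_n = 1/n! >= 0 and sum_n a_n r^n = e^r < infinity.
r = 2.0
N = 8
coeffs = [1.0 / math.factorial(k) for k in range(N + 1)]


def p_N(x):
    """Degree-N truncation of the power series: a polynomial with nonnegative coefficients."""
    return sum(c * x**k for k, c in enumerate(coeffs))


tail = math.exp(r) - p_N(r)  # = sum_{n > N} a_n r^n, the M-test bound
grid = [-r + 2 * r * i / 1000 for i in range(1001)]
max_err = max(abs(math.exp(x) - p_N(x)) for x in grid)
print(max_err, tail)
```

Since all coefficients are nonnegative, the error is largest at $x = r$, where it equals the tail sum itself.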