Skip to Main Content

Persistence Over Posets

Woojin Kim
Facundo Mémoli

Communicated by Notices Associate Editor Emilie Purvine

Article cover

In topological data analysis (TDA), the shape of a dataset is often encoded into a system of vector spaces and linear maps over a partially ordered set (poset). We give an overview of how summaries of such systems can be constructed by using ideas from combinatorics.

Persistent Homology

Datasets are often given as point clouds: finite sets of points in Euclidean space. Examples include the three-dimensional coordinates of all atoms in a sample of a given material, a point cloud produced by a 3D scanner, and high-dimensional data such as a spreadsheet containing clinical features of a group of diabetes patients. The “shape” of a point cloud may provide useful information about the underlying phenomena that generated the data. If stands for the dataset of clinical features of diabetic patients mentioned above, and points in appear to fall into a number of distinct clusters, then this may indicate the presence of different subtypes of diabetes amongst the patients.

In algebraic topology, the shape of a simplicial complex (or of a topological space) can be studied via homology Mun84. Homology provides a way to associate algebraic structures such as groups or vector spaces to a simplicial complex in order to capture some aspect about the shape of ; e.g., is connected? does it have holes?

Given an integer , the -th homology of the simplicial complex with coefficients in a field , denoted , is a vector space over . Its dimension is called the -th Betti number of (with coefficients in ) which is a count of the -dimensional holes in ; e.g., the 0th, 1st, and 2nd Betti numbers equal the numbers of connected components, holes bounded by a closed loop, and cavities bounded by a closed two-dimensional region in , respectively.

Figure 1.

The pipeline of persistent homology. Multiscale clustering features and circular features of the input dataset are encoded into and barcodes respectively. Intervals in the barcode are in bijection with points in the persistence diagram.

Graphic without alt text

Given a simplicial map between simplicial complexes and , homology induces a linear map . Thus, homology is a functor from the category of simplicial complexes and simplicial maps to the category of vector spaces and linear maps over .

Persistent homology

Within TDA, persistent homology refers to a method for associating multiresolution topological features to a given dataset. The usual pipeline is depicted in Figure 1.

In Step 1, one often uses Vietoris–Rips complexes. Given a dataset, modeled as a finite metric space , e.g., a point cloud in , for a real number , the Vietoris–Rips complex is an abstract simplicial complex consisting of all the finite subsets in which every pair of points is within distance . The nested family is called the Vietoris–Rips filtration of . In Step 2, for each , we apply the homology functor to this filtration and obtain a persistence module over .

As we will see momentarily, persistence modules over are a particular instance of the more general concept of persistence modules over posets.

Recall that a poset is a nonempty set equipped with a relation on that is (i) Reflexive: for all . (ii) Transitive: For , if and , then . (iii) Anti-symmetric: If and , then . Throughout this paper, all posets will be assumed to be connected, i.e., for any , there is a sequence in such that and are comparable for each .

Posets are convenient gadgets for “indexing” simplicial filtrations and persistence modules.

Definition 1.

Let be a poset. A persistence module over (-module in short) is a system of finite-dimensional vector spaces , , and linear maps , such that for each , is the identity on , and

The maps are called internal morphisms. Since any poset can be regarded as a category, is actually a functor from to the category of finite-dimensional vector spaces and linear maps over the field .⁠Footnote1 Two -modules and are isomorphic, denoted by , if there are linear isomorphisms for all , so that for ,

1

In the literature, is often referred to as a pointwise finite-dimensional persistence module.

When is the result of applying the homology functor to the Vietoris–Rips filtration of a dataset , a great deal of information about the shape of can be absorbed by the persistence diagram or into the barcode of (cf. Figure 1). The definitions of these invariants as well as their relationship will be recalled in later sections.

The need for a more general framework

Practical data analysis scenarios necessitate methods that can cope with more than one parameter. For instance, a dataset might have nonuniform density (see Figure 2), possibly due to noise produced during the acquisition process or due to underlying scientific phenomena. In such scenarios, in addition to a geometric scale parameter, one may wish to incorporate a (co)density parameter and obtain an increasing family of simplicial complexes indexed by with the product order , i.e., iff for . The result of applying the homology functor to such an -indexed family, or more generally, to an analogously defined -indexed family, is called a multiparameter persistence module CZ09.

Figure 2.

(Left) A dataset consisting of two circular clusters together with some outliers (i.e., “noise”). Define the codensity function of as for . Notice that attains small values only in ‘dense’ regions of . (Right) For , let be the subset of points with . Notice that for small, does not contain any of the outliers. The figure depicts the Vietoris–Rips complexes for some values and .

Graphic without alt text

There are scenarios that give rise to persistence modules indexed by posets other than other than . For example, the time-evolution of the positions of animals during collective motion can lead to considering zigzag posets

where, for each , stands for either or CdS10KM22.

Beyond these scenarios, and with a great deal of foresight, in BdSS15, Bubenik et al. proposed to consider the phenomenon of persistence for parameters taken from general posets.

Remark 1.

In this article, we restrict ourselves to persistence modules over finite connected posets , which are general enough indexing sets for modeling the type of datasets that arise in practice.

For example, for an integer , the linearly ordered poset

is often used to succinctly encode - or -modules.

Example 1 (-module).

Given a finite metric space , the Vietoris–Rips filtration of consists of finitely many distinct simplicial complexes. Hence, for the persistence module , there exist in so that is a linear isomorphism whenever with , for some where . In such a case, the -module

determines the isomorphism type of .

Connections with quiver representations

A quiver is a finite directed graph. Given a quiver , the assignment of a finite-dimensional vector space to each vertex and a linear map to each arrow (between the participating vector spaces) is called a representation of ; see DW05.

A finite poset induces the quiver on the vertex set with arrows for all pairs such that there is no with . Note that a -module canonically induces a representation of : To each vertex of , the vector space is assigned. To each arrow of , the linear map is assigned. Note that the resulting representation of satisfies the following commutativity condition: For every , if there are multiple directed paths from to in , then the compositions of the linear maps along each of those paths agrees with . Conversely, a representation of satisfying the commutativity condition induces a -module in the obvious way.

Example 2.

The poset from Equation 2 induces the quiver

Example 3.

For integers , consider the poset equipped with the product order. Then, for example, the poset () induces the quiver

The poset is often used to encode -modules.

Example 4.

The following commutative diagram defines an -module which can be obtained by applying the 0-th homology functor to the -indexed simplicial filtration depicted next:

Graphic without alt text

Rank Invariant and Persistence Diagrams

We first recall the classical notions of rank invariant and persistence diagrams of -modules CSEH07CZ09, and then we describe a natural way to extend those notions to the setting of -modules.

Rank invariant

For any integer and any , we call an interval in . Let be the set of all intervals in . The rank invariant of a given persistence module is defined to be the function

It is important to note that (i) the rank invariant is preserved under isomorphism and that (ii) it encodes the dimensions of all vector spaces for since we have whenever . Note also that (iii)  is monotone, i.e.,

whenever ; This follows from the fact that the map factors through the map . By convention, we set for every . Next, we utilize the rank invariant to compute, for each , a count of the “persistent features” that start at and end at , leading to the notion of persistence diagram of .

Figure 3.

The rank invariant and the persistence diagram of a given . At each point with in the grid, nonzero and are recorded (e.g., and ).⁠Footnote2

2

If encodes an or -module (as in the scenario of Example 1), the visualization of may require a scale readjustment CSEH07, p.106.

Graphic without alt text

Persistence diagrams

Fix an integer . Let and a vector . We say that is born at the point where

We say that lives until the point (or dies at ) where

The lifespan of is the interval . The persistence diagram of a given persistence module is then defined to be the function

sending each to the maximal number of linearly independent vectors in whose lifespans are exactly .

From to

It turns out that can be computed in terms of . Indeed, let . This implies that there exist linearly independent vectors in that are born at or before , and live until or later, i.e., . Hence,

equals the maximal number of independent vectors in that were born at precisely and live until or later. Similarly,

equals the maximal number of independent vectors in that were born at precisely and live until or later. Hence, the difference equals the maximal number of independent vectors in whose lifespans are exactly and thus we arrive at the following formula:

In practice, persistence diagrams are represented as points (with multiplicity) in the two-dimensional grid: only those intervals for which are recorded; see Figure 3.

The earliest appearance of formula 7 in the TDA community that is known to the authors is LF97. This expression appears prominently in the work of Cohen-Steiner et al. on the stability of persistence diagrams CSEH07.

From to

Let us consider as a poset ordered by containment . The poset consists of elements. By the order-extension principle, we can index the intervals in as so that implies . Below we also use the convenient notation .

Since consists of elements, we identify with a vector in whose -th entry is . Similarly, is identified with a vector of the same dimension. Consider the square matrix of length whose -entry is

The matrix is upper-triangular and all of its diagonal entries are . Therefore, is invertible. Note that Equation 7 amounts to

which implies that and determine each other.

Equation 9 permits computing in terms of as follows. First, one verifies that the inverse of has entries

Therefore, the equality implies that

Example 5 (An application of Equation 10).

The equality in Figure 3 can be derived from the fact that there are exactly two points in the upper-left quadrant with corner point for which .

Example 6 (Persistence diagram of an -module).

Let be an -module:

Let the rank invariant of be given by

for integers . Then, the persistence diagram of is given by

where and are the dimensions of the kernel and cokernel of the map . Under the identification

in this case, Equation 9 reads:

Remark 2.

From Equation 11, we infer that there are bases of the vector spaces and