PDFLINK |

# Persistence Over Posets

Communicated by *Notices* Associate Editor Emilie Purvine

In topological data analysis (TDA), the shape of a dataset is often encoded into a system of vector spaces and linear maps over a partially ordered set (poset). We give an overview of how summaries of such systems can be constructed by using ideas from combinatorics.

## Persistent Homology

Datasets are often given as *point clouds*: finite sets of points in Euclidean space. Examples include the three-dimensional coordinates of all atoms in a sample of a given material, a point cloud produced by a 3D scanner, and high-dimensional data such as a spreadsheet containing clinical features of a group of diabetes patients. The “shape” of a point cloud may provide useful information about the underlying phenomena that generated the data. If stands for the dataset of clinical features of diabetic patients mentioned above, and points in appear to fall into a number of distinct clusters, then this may indicate the presence of different subtypes of diabetes amongst the patients.

In algebraic topology, the shape of a simplicial complex (or of a topological space) can be studied via *homology* Mun84. Homology provides a way to associate algebraic structures such as groups or vector spaces to a simplicial complex in order to capture some aspect about the shape of e.g., is ; connected? does it have holes?

Given an integer the , homology of the simplicial complex -th with coefficients in a field denoted , is a vector space over , Its dimension is called the . -th*Betti number* of (with coefficients in which is a count of the ) holes in -dimensional e.g., the 0th, 1st, and 2nd Betti numbers equal the numbers of ;*connected components*, *holes* bounded by a closed loop, and *cavities* bounded by a closed two-dimensional region in respectively. ,

Given a simplicial map between simplicial complexes and homology induces a linear map , Thus, homology . is a *functor* from the category of simplicial complexes and simplicial maps to the category of vector spaces and linear maps over .

### Persistent homology

Within TDA, *persistent homology* refers to a method for associating multiresolution topological features to a given dataset. The usual pipeline is depicted in Figure 1.

In Step 1, one often uses *Vietoris–Rips complexes*. Given a dataset, modeled as a finite metric space e.g., a point cloud in , for a real number , the Vietoris–Rips complex , is an abstract simplicial complex consisting of all the finite subsets in which every pair of points is within distance The nested family . is called the *Vietoris–Rips filtration* of In Step 2, for each . we apply the homology functor , to this filtration and obtain a *persistence module over *.

As we will see momentarily, persistence modules over are a particular instance of the more general concept of persistence modules over *posets.*

Recall that a *poset* is a nonempty set equipped with a relation on that is (i) Reflexive: for all (ii) Transitive: For . if , and then , (iii) Anti-symmetric: If . and then , Throughout this paper, all posets . will be assumed to be *connected*, i.e., for any there is a sequence , in such that and are comparable for each .

Posets are convenient gadgets for “indexing” simplicial filtrations and persistence modules.

When is the result of applying the homology functor to the Vietoris–Rips filtration of a dataset a great deal of information about the shape of , can be absorbed by the *persistence diagram* or into the *barcode* of (cf. Figure 1). The definitions of these invariants as well as their relationship will be recalled in later sections.

### The need for a more general framework

Practical data analysis scenarios necessitate methods that can cope with more than one parameter. For instance, a dataset might have nonuniform density (see Figure 2), possibly due to noise produced during the acquisition process or due to underlying scientific phenomena. In such scenarios, in addition to a geometric scale parameter, one may wish to incorporate a (co)density parameter and obtain an increasing family of simplicial complexes indexed by with the product order i.e., , iff for The result of applying the homology functor to such an . family, or more generally, to an analogously defined -indexed family, is called a -indexed*multiparameter persistence module* CZ09.

There are scenarios that give rise to persistence modules indexed by posets other than other than For example, the time-evolution of the positions of animals during collective motion can lead to considering zigzag posets .

where, for each , stands for either or CdS10KM22.

Beyond these scenarios, and with a great deal of foresight, in BdSS15, Bubenik et al. proposed to consider the phenomenon of persistence for parameters taken from general posets.

For example, for an integer the linearly ordered poset ,

is often used to succinctly encode or - -modules.

### Connections with quiver representations

A quiver is a finite directed graph. Given a quiver the assignment of a finite-dimensional vector space to each vertex and a linear map to each arrow (between the participating vector spaces) is called a ,*representation* of see ;DW05.

A finite poset induces the quiver on the vertex set with arrows for all pairs such that there is no with Note that a . -module canonically induces a representation of To each vertex : of the vector space , is assigned. To each arrow of the linear map , is assigned. Note that the resulting representation of satisfies the following commutativity condition: For every if there are multiple directed paths from , to in then the compositions of the linear maps along each of those paths agrees with , Conversely, a representation of . satisfying the commutativity condition induces a in the obvious way. -module

The poset is often used to encode -modules.

## Rank Invariant and Persistence Diagrams

We first recall the classical notions of rank invariant and persistence diagrams of -modulesCSEH07CZ09, and then we describe a natural way to extend those notions to the setting of -modules.

### Rank invariant

For any integer and any we call , an *interval* in Let . be the set of all intervals in The .*rank invariant* of a given persistence module is defined to be the function

It is important to note that (i) the rank invariant is preserved under isomorphism and that (ii) it encodes the dimensions of all vector spaces for since we have whenever Note also that (iii) . is *monotone*, i.e.,

whenever This follows from the fact that the map ; factors through the map By convention, we set . for every Next, we utilize the rank invariant to compute, for each . a count of the “persistent features” that start at , and end at leading to the notion of persistence diagram of , .

**Figure 3.**

The rank invariant and the persistence diagram of a given At each point . with in the grid, nonzero and are recorded (e.g., and ).Footnote^{2}

^{2}

If encodes an or (as in the scenario of Example -module1), the visualization of may require a scale readjustment CSEH07, p.106.

### Persistence diagrams

Fix an integer Let . and a vector We say that . is *born* at the point where

We say that lives until the point (or *dies at* where )

The *lifespan* of is the interval The persistence diagram of a given persistence module . is then defined to be the function

sending each to the maximal number of linearly independent vectors in whose lifespans are exactly .

### From to

It turns out that can be computed in terms of Indeed, let . This implies that there exist . linearly independent vectors in that are born at or before and live until , or later, i.e., Hence, .

equals the maximal number of independent vectors in that were born at *precisely* and live until or later. Similarly,

equals the maximal number of independent vectors in that were born at precisely and live until or later. Hence, the difference equals the maximal number of independent vectors in whose lifespans are exactly and thus we arrive at the following formula:

In practice, persistence diagrams are represented as points (with multiplicity) in the two-dimensional grid: only those intervals for which are recorded; see Figure 3.

The earliest appearance of formula 7 in the TDA community that is known to the authors is LF97. This expression appears prominently in the work of Cohen-Steiner et al. on the stability of persistence diagrams CSEH07.

### From to

Let us consider as a poset ordered by containment The poset . consists of elements. By the *order-extension principle*, we can index the intervals in as so that implies Below we also use the convenient notation . .

Since consists of elements, we identify with a vector in whose entry is -th Similarly, . is identified with a vector of the same dimension. Consider the square matrix of length whose is -entry

The matrix is upper-triangular and all of its diagonal entries are Therefore, . is invertible. Note that Equation 7 amounts to

which implies that and determine each other.

Equation 9 permits computing in terms of as follows. First, one verifies that the inverse of has entries

Therefore, the equality implies that

The equality in Figure 3 can be derived from the fact that there are exactly two points in the upper-left quadrant with corner point for which .

Let be an -module:

Let the rank invariant of be given by

for integers Then, the persistence diagram of . is given by

where and are the dimensions of the kernel and cokernel of the map Under the identification .

in this case, Equation 9 reads:

From Equation 11, we infer that there are bases of the vector spaces and