# Persistence Over Posets

Woojin Kim
Facundo Mémoli

Communicated by Notices Associate Editor Emilie Purvine

In topological data analysis (TDA), the shape of a dataset is often encoded into a system of vector spaces and linear maps over a partially ordered set (poset). We give an overview of how summaries of such systems can be constructed by using ideas from combinatorics.

## Persistent Homology

Datasets are often given as point clouds: finite sets of points in Euclidean space. Examples include the three-dimensional coordinates of all atoms in a sample of a given material, a point cloud produced by a 3D scanner, and high-dimensional data such as a spreadsheet containing clinical features of a group of diabetes patients. The “shape” of a point cloud may provide useful information about the underlying phenomena that generated the data. If stands for the dataset of clinical features of diabetic patients mentioned above, and points in appear to fall into a number of distinct clusters, then this may indicate the presence of different subtypes of diabetes amongst the patients.

In algebraic topology, the shape of a simplicial complex (or of a topological space) can be studied via homology Mun84. Homology provides a way to associate algebraic structures such as groups or vector spaces to a simplicial complex in order to capture some aspect about the shape of ; e.g., is connected? does it have holes?

Given an integer , the -th homology of the simplicial complex with coefficients in a field , denoted , is a vector space over . Its dimension is called the -th Betti number of (with coefficients in ) which is a count of the -dimensional holes in ; e.g., the 0th, 1st, and 2nd Betti numbers equal the numbers of connected components, holes bounded by a closed loop, and cavities bounded by a closed two-dimensional region in , respectively.

Given a simplicial map between simplicial complexes and , homology induces a linear map . Thus, homology is a functor from the category of simplicial complexes and simplicial maps to the category of vector spaces and linear maps over .

### Persistent homology

Within TDA, persistent homology refers to a method for associating multiresolution topological features to a given dataset. The usual pipeline is depicted in Figure 1.

In Step 1, one often uses Vietoris–Rips complexes. Given a dataset, modeled as a finite metric space , e.g., a point cloud in , for a real number , the Vietoris–Rips complex is an abstract simplicial complex consisting of all the finite subsets in which every pair of points is within distance . The nested family is called the Vietoris–Rips filtration of . In Step 2, for each , we apply the homology functor to this filtration and obtain a persistence module over .

As we will see momentarily, persistence modules over are a particular instance of the more general concept of persistence modules over posets.

Recall that a poset is a nonempty set equipped with a relation on that is (i) Reflexive: for all . (ii) Transitive: For , if and , then . (iii) Anti-symmetric: If and , then . Throughout this paper, all posets will be assumed to be connected, i.e., for any , there is a sequence in such that and are comparable for each .

Posets are convenient gadgets for “indexing” simplicial filtrations and persistence modules.

When is the result of applying the homology functor to the Vietoris–Rips filtration of a dataset , a great deal of information about the shape of can be absorbed by the persistence diagram or into the barcode of (cf. Figure 1). The definitions of these invariants as well as their relationship will be recalled in later sections.

### The need for a more general framework

Practical data analysis scenarios necessitate methods that can cope with more than one parameter. For instance, a dataset might have nonuniform density (see Figure 2), possibly due to noise produced during the acquisition process or due to underlying scientific phenomena. In such scenarios, in addition to a geometric scale parameter, one may wish to incorporate a (co)density parameter and obtain an increasing family of simplicial complexes indexed by with the product order , i.e., iff for . The result of applying the homology functor to such an -indexed family, or more generally, to an analogously defined -indexed family, is called a multiparameter persistence module CZ09.

There are scenarios that give rise to persistence modules indexed by posets other than other than . For example, the time-evolution of the positions of animals during collective motion can lead to considering zigzag posets

where, for each , stands for either or CdS10KM22.

Beyond these scenarios, and with a great deal of foresight, in BdSS15, Bubenik et al. proposed to consider the phenomenon of persistence for parameters taken from general posets.

For example, for an integer , the linearly ordered poset

is often used to succinctly encode - or -modules.

### Connections with quiver representations

A quiver is a finite directed graph. Given a quiver , the assignment of a finite-dimensional vector space to each vertex and a linear map to each arrow (between the participating vector spaces) is called a representation of ; see DW05.

A finite poset induces the quiver on the vertex set with arrows for all pairs such that there is no with . Note that a -module canonically induces a representation of : To each vertex of , the vector space is assigned. To each arrow of , the linear map is assigned. Note that the resulting representation of satisfies the following commutativity condition: For every , if there are multiple directed paths from to in , then the compositions of the linear maps along each of those paths agrees with . Conversely, a representation of satisfying the commutativity condition induces a -module in the obvious way.

The poset is often used to encode -modules.

## Rank Invariant and Persistence Diagrams

We first recall the classical notions of rank invariant and persistence diagrams of -modules CSEH07CZ09, and then we describe a natural way to extend those notions to the setting of -modules.

### Rank invariant

For any integer and any , we call an interval in . Let be the set of all intervals in . The rank invariant of a given persistence module is defined to be the function

It is important to note that (i) the rank invariant is preserved under isomorphism and that (ii) it encodes the dimensions of all vector spaces for since we have whenever . Note also that (iii)  is monotone, i.e.,

whenever ; This follows from the fact that the map factors through the map . By convention, we set for every . Next, we utilize the rank invariant to compute, for each , a count of the “persistent features” that start at and end at , leading to the notion of persistence diagram of .

### Persistence diagrams

Fix an integer . Let and a vector . We say that is born at the point where

We say that lives until the point (or dies at ) where

The lifespan of is the interval . The persistence diagram of a given persistence module is then defined to be the function

sending each to the maximal number of linearly independent vectors in whose lifespans are exactly .

### From ${\mathrm{rk}}_M$ to ${\mathrm{dgm}}_M$

It turns out that can be computed in terms of . Indeed, let . This implies that there exist linearly independent vectors in that are born at or before , and live until or later, i.e., . Hence,

equals the maximal number of independent vectors in that were born at precisely and live until or later. Similarly,

equals the maximal number of independent vectors in that were born at precisely and live until or later. Hence, the difference equals the maximal number of independent vectors in whose lifespans are exactly and thus we arrive at the following formula:

In practice, persistence diagrams are represented as points (with multiplicity) in the two-dimensional grid: only those intervals for which are recorded; see Figure 3.

The earliest appearance of formula 7 in the TDA community that is known to the authors is LF97. This expression appears prominently in the work of Cohen-Steiner et al. on the stability of persistence diagrams CSEH07.

### From ${\mathrm{dgm}}_M$ to ${\mathrm{rk}}_M$

Let us consider as a poset ordered by containment . The poset consists of elements. By the order-extension principle, we can index the intervals in as so that implies . Below we also use the convenient notation .

Since consists of elements, we identify with a vector in whose -th entry is . Similarly, is identified with a vector of the same dimension. Consider the square matrix of length whose -entry is

The matrix is upper-triangular and all of its diagonal entries are . Therefore, is invertible. Note that Equation 7 amounts to

which implies that and determine each other.

Equation 9 permits computing in terms of as follows. First, one verifies that the inverse of has entries

Therefore, the equality implies that