
Koopman Operator, Geometry, and Learning of Dynamical Systems

Igor Mezić

Communicated by Notices Associate Editor Reza Malek-Madani


Introduction

The end of the 20th century and the beginning of the 21st century have seen a revolutionary increase in the availability of data. Indeed, we are in the middle of the sensing revolution, where sensing is used in the broadest meaning of data acquisition. Most of this data goes unprocessed, unanalyzed, and, consequently, unused. This causes missed opportunities in domains of vast societal importance—health, commerce, technology, network security, to mention just a few.

A variety of mathematical methods have emerged out of that need. Perhaps the most popular, the methodology of Deep Neural Networks has as the underlying learning elements “neuron functions” that are modeled after biological neurons. The algorithms based on deep learning have achieved substantial success in image recognition, speech recognition, and natural language processing, deploying the “supervised” machine learning philosophy. Convolutional neural networks provided a superstructure to the deep neural network architecture that resembles the organization of the animal visual cortex. This led to an enormous success in image recognition, and even in realistic image generation, via Generative Adversarial Networks (GANs). All of these examples are essentially static pattern recognition or generation tasks. Deep learning methodologies are less successful in dynamically changing contexts, present, for example, in autonomous driving. This is because the learning architectures are not adapted to physical properties of the time variable. In contrast, the symmetry associated with translation in time naturally occurs in the Koopman operator framework and imbues it with the fundamental group structure. It is interesting that, in contrast to biological modules responsible for vision, even the basic issue of finding specific brain structures that are responsible for perception of time, and thus understanding of dynamics, is still being investigated.

Koopman operator theory has recently emerged as one of the main candidates for machine learning of dynamical processes. In this paper, we briefly describe its history, emerging from efforts to extend the methodology used in quantum mechanics, and describe the current focus, setting it within the new concept of dynamic process representation, and connecting along the way to the geometric dynamical systems theory methods that enable data-driven discovery of essential elements of the theory, for example stable and unstable manifolds. What emerges is a powerful framework for unsupervised learning from small amounts of data, enabling self-supervised learning that is much more in line with the theory of human learning than the machine learning methods of the second wave.

History

Driven by the success of the operator-based framework in quantum theory, Bernard Koopman proposed in his 1931 paper Koo31 to treat classical mechanics in a similar way, using the spectral properties of the composition operator associated with dynamical system evolution. The work, restricted to Hamiltonian dynamical systems, did not attract much attention originally, as evidenced by the fact that between 1931 and 1990 the Koopman paper Koo31 was cited only 100 times, according to Google Scholar. This can be attributed largely to the major success of the geometric picture of dynamical systems theory in its state-space realization advocated by Poincaré. In fact, with Lorenz's discovery of a strange attractor in 1963, the dynamical systems community turned to studying dissipative systems, and much progress has been made since. Within the current research in dynamical systems, some of the crucial roadblocks are associated with the high dimensionality of the problems and the necessity of understanding behavior globally (away from the attractors) in the state space. However, the weaknesses of the geometric approach are related exactly to its locality—as it often relies on perturbative expansions around a known geometrical object—and low dimensionality, as it is hard to make progress in higher dimensional systems using geometry tools.

Out of today's 1200+ citations of Koopman's original work, Koo31, about 80% come from the last 20 years. It was only in the 1990s and 2000s that the potential for wider applications of the Koopman operator-theoretic approach was realized Mez94, Mez05, RMB09. In the past decade the trend of applications of this approach has continued. This is partially due to the fact that strong connections have been made between the spectral properties of the Koopman operator for dissipative systems and the geometry of the state space. In fact, the hallmark of the work on the operator-theoretic approach in the last two decades is the linkage between geometrical properties of dynamical systems—whose study has been advocated and strongly developed by Poincaré and followers—with the geometrical properties of the level sets of Koopman eigenfunctions Mez94, MM12, MMM13. The operator-theoretic approach has been shown capable of detecting objects of key importance in geometric study, such as invariant sets, but doing so globally, as opposed to locally as in the geometric approach. It also provides an opportunity for study of high-dimensional evolution equations in terms of dynamical systems concepts Mez05, RMB09 via a spectral decomposition, and links with associated numerical methods for such evolution equations Sch10, RMB09.

Even the early work in Mez94 and its continuation in MB04, Mez05, RMB09 already led to the realization that spectral properties, and thus geometrical properties, can be learned from data, thus initiating the strong connection that is forming today between the machine learning and dynamical systems communities LDBK17, YKH19, LKB18, TKY17. The key notion driving these developments is that of representation of a—possibly nonlinear—dynamical system as a linear operator on a typically infinite-dimensional space of functions. This then leads to a search for linear, finite-dimensional invariant subspaces. In this paper I formalize the concept of dynamical system representation, enabling the study of finite-dimensional linear and nonlinear representations, learning, and the geometry of state-space partitions.

Dynamical System Representations

State space vs. observables space

It is customary, since Poincaré, to start the discussion of mathematics of dynamical systems with the notion of the state space, which already includes a numerical representation of the state of a system. However, to set the operator-theoretic approach properly, it is useful to start with just the primitive notion of a set $M$ of (nonnumerically described) states of a given system. Elements $p \in M$ are abstract to start with, and the dynamics is given by a rule $T^t$ that assigns to $p$ the state $T^t p \in M$ for any element $t$ of the time set $\mathbb{T}$. The time set can be $\mathbb{Z}^+$ or $\mathbb{R}^+$, but more complicated cases can be considered as well. For example, we could be given two transformations $T_0, T_1$ on $M$, and the dynamics starting from an initial point $p$ can be given by sequences of transformations $T_{s_n} \circ \cdots \circ T_{s_1} p$, where $s_j \in \{0,1\}$. This case, that is of interest in control theory, will not be expanded on further here. As we are interested in framing the process of learning and modeling dynamics from data in the Koopman (composition) operator framework, we begin by describing the basic notions of representation of dynamics using functions.

Discrete dynamical systems

The set $\mathcal{F}$ of all complex functions $f: M \to \mathbb{C}$ is called the space of observables. It is a linear vector space over the field of complex numbers. A discrete deterministic dynamical system on $M$ is a map $T: M \to M$, and the time set is $\mathbb{Z}^+$. For $n \in \mathbb{Z}^+$ the iteration of the map is defined by $T^n = T \circ T^{n-1}$, with $T^0$ the identity. Any such map defines an operator $U: \mathcal{F} \to \mathcal{F}$ by
$$Uf = f \circ T. \tag{1}$$

The operator $U$ is linear, as composition distributes over addition:
$$U(c_1 f_1 + c_2 f_2) = (c_1 f_1 + c_2 f_2) \circ T = c_1 f_1 \circ T + c_2 f_2 \circ T = c_1 U f_1 + c_2 U f_2. \tag{2}$$
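To make this concrete, here is a minimal Python sketch of the definitions (1) and (2); the logistic map and the observables below are arbitrary illustrative choices, not examples from the text.

```python
# Minimal sketch: the Koopman operator acts on observables by composition,
# (Uf)(p) = f(T(p)). The map T and observables are arbitrary choices.
import numpy as np

T = lambda x: 4.0 * x * (1.0 - x)   # a nonlinear map on M = [0, 1]

def U(f):
    """Koopman operator: send the observable f to f o T."""
    return lambda x: f(T(x))

f1 = lambda x: np.sin(x)
f2 = lambda x: x**2
c1, c2 = 2.0, -3.0

x = np.linspace(0.0, 1.0, 11)
lhs = U(lambda p: c1 * f1(p) + c2 * f2(p))(x)   # U(c1 f1 + c2 f2)
rhs = c1 * U(f1)(x) + c2 * U(f2)(x)             # c1 U f1 + c2 U f2
print(np.allclose(lhs, rhs))                    # True: U is linear
```

Note that $U$ is linear even though $T$ itself is nonlinear; this is the basic trade of the operator-theoretic viewpoint.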

Finite-dimensional representations

Ultimately, data is about numbers. We can understand a lot about the map $T$ on $M$ by collecting data on observables. To formalize this, we need the notion of representation.

Definition 1.

A finite-dimensional representation of $T$ in $\mathcal{F}$ is a set of functions $\mathbf{f} = (f_1, \dots, f_m)$, $f_j \in \mathcal{F}$, and a mapping $S: \mathbb{C}^m \to \mathbb{C}^m$ such that
$$\mathbf{f} \circ T = S \circ \mathbf{f}, \tag{3}$$

where $m$ is the dimension of the representation. If $\mathbf{f}$ is a real set of functions, then the representation is real.

The image of $M$ in $\mathbb{C}^m$ under $\mathbf{f}$—the space $\mathbf{f}(M)$—is called the state space. We might be used to thinking about $M$ itself as the "state space." But the original notion in Poincaré's work refers to $\mathbf{f}(M)$, where $\mathbf{f}$ is comprised of observables that are positions and momenta of a mechanical system. Perhaps $M$ can be called the "abstract state space." The simplest examples of state spaces are Euclidean spaces of $m$-tuples of real numbers, $\mathbb{R}^m$. Consider an injective $\mathbf{f}$ such that $\mathbf{f}(M) = \mathbb{R}^m$. Any mapping $T$ on $M$ then has a real representation (3), where $S = \mathbf{f} \circ T \circ \mathbf{f}^{-1}$. The representation (3) is called linear provided $S$ is a linear mapping. Finite-dimensional representations are key to learning dynamical systems from data.

Example 2.

Let $M = \mathbb{T}^2$, the two-dimensional torus, and let $T$ be the mapping that translates points on the torus by angle $\alpha$ in the direction of rotation around the symmetry axis, and by $\beta$ in the direction of the cross-sectional circle. I intentionally do not start with the notation $T(\theta_1, \theta_2)$ to emphasize that we can start with abstract points on the torus and physically describe the transformation on it. Consider the representation
$$\mathbf{f}(p) = \left(e^{i\theta_1(p)}, e^{i\theta_2(p)}\right),$$

where $\theta_1(p), \theta_2(p) \in [0, 2\pi)$, $\theta_1$ being the angle along the rotational symmetry and $\theta_2$ the angle along the cross-sectional circle. We have
$$\mathbf{f}(Tp) = \left(e^{i(\theta_1(p) + \alpha)}, e^{i(\theta_2(p) + \beta)}\right) = A\,\mathbf{f}(p),$$

where $A$ is a diagonal matrix with $e^{i\alpha}, e^{i\beta}$ as diagonal elements. Thus, $(\mathbf{f}, A)$ is a complex, linear representation of $T$. Note that the angle representation $(\theta_1, \theta_2)$, in which the dynamics is translation mod $2\pi$, is not a linear representation.
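The computation in Example 2 can be verified numerically. Below is a short sketch (the angles $\alpha$, $\beta$ are assumed values chosen for illustration) checking $\mathbf{f}(Tp) = A\,\mathbf{f}(p)$.

```python
# Sketch of Example 2: the observables (exp(i*theta1), exp(i*theta2))
# represent the torus rotation linearly, f(Tp) = A f(p).
import numpy as np

alpha, beta = 0.7, 1.3                     # rotation angles (assumed values)

def T(theta):                              # the rotation in angle coordinates
    return (theta + np.array([alpha, beta])) % (2 * np.pi)

def f(theta):                              # the complex representation
    return np.exp(1j * theta)

A = np.diag(np.exp(1j * np.array([alpha, beta])))

theta = np.array([0.4, 2.9])               # angle coordinates of a point p
print(np.allclose(f(T(theta)), A @ f(theta)))   # True: linear representation
```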

Mathematically, one of the key questions in this context is whether a finite-dimensional representation exists. Namely, a set of functions $\mathbf{f} = (f_1, \dots, f_m)$ does not necessarily satisfy $\mathbf{f} \circ T = S \circ \mathbf{f}$ for any $S$. If one considers trajectories of $\mathbf{f}$, then it is easy to see that there isn't necessarily an $S$ such that
$$\mathbf{f}(T^{n+1}p) = S(\mathbf{f}(T^n p)),$$

i.e., the next value of $\mathbf{f}$ cannot always be obtained uniquely even if we know the whole history of the evolution of $\mathbf{f}$ on the trajectory. The representation relationship requires that the next value of $\mathbf{f}$ is uniquely determined by the current value. This is the Markov property. If a representation does not satisfy the Markov property, but its dynamics depends only on a finite number of previous trajectory points, i.e.,
$$\mathbf{f}(T^{n+1}p) = \hat{S}\left(\mathbf{f}(T^n p), \mathbf{f}(T^{n-1}p), \dots, \mathbf{f}(T^{n-k}p)\right),$$

then the so-called time-delay embedding can be used to make it Markovian: Let
$$\mathbf{F}_n = \left(\mathbf{f}(T^n p), \mathbf{f}(T^{n-1}p), \dots, \mathbf{f}(T^{n-k}p)\right).$$

Then, setting
$$S(\mathbf{F}_n) = \left(\hat{S}(\mathbf{F}_n),\, \mathbf{f}(T^n p), \dots, \mathbf{f}(T^{n-k+1}p)\right),$$

we obtain $\mathbf{F}_{n+1} = S(\mathbf{F}_n)$, a representation with the Markov property.
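In code, the delay construction amounts to stacking consecutive values of the observable. The sketch below is illustrative (the function name and test signal are my choices, not from the text).

```python
# Sketch of the time-delay embedding: stack k+1 consecutive values of the
# observable into F_n so that F_{n+1} is a function of F_n alone.
import numpy as np

def delay_vectors(series, k):
    """Rows are (f_n, f_{n-1}, ..., f_{n-k}) for n = k, ..., len(series)-1."""
    n = len(series) - k
    return np.column_stack([series[k - j : k - j + n] for j in range(k + 1)])

# A scalar observable of two coupled rotations is not Markovian on its own,
# but it satisfies a linear recurrence of order 4, so k = 3 suffices.
t = np.arange(200)
series = np.cos(0.3 * t) + 0.5 * np.cos(1.1 * t)
F = delay_vectors(series, k=3)   # rows F_n evolve by a fixed linear map S
print(F.shape)                   # (197, 4)
```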

Physically, there is the additional problem of whether experimental observations can provide all the information necessary to describe the finite-dimensional representation. Historically, the analysis of the problem of finite representations has often been considered in the context of the Mori-Zwanzig theory that has close links to Koopman operator theory. We discuss this connection later in the paper.

Representations and conjugacies

There are representations that are capable of separating points on $M$. We call these faithful.

Definition 3.

A representation $(\mathbf{f}, S)$ is called faithful provided $\mathbf{f}$ is injective:

(i)

$p \neq q \implies \mathbf{f}(p) \neq \mathbf{f}(q)$, or equivalently

(ii)

$\mathbf{f}(p) = \mathbf{f}(q) \implies p = q$.

In terms of representations, the Takens embedding theorem shows that a faithful representation of dynamics on an $n$-dimensional Riemannian manifold can be obtained by using a sufficiently large set of time-delayed observables $f, f \circ T, f \circ T^2, \dots$ for generic pairs of smooth real functions $f$ and dynamical systems $T$.

Theorem 4 (Takens).

Let $M$ be a compact Riemannian manifold of dimension $n$, $T: M \to M$ a $C^2$ diffeomorphism, and $f: M \to \mathbb{R}$ a $C^2$ function. For generic $(T, f)$ the map $\Phi_{T,f}: M \to \mathbb{R}^{2n+1}$ given componentwise by
$$\left(\Phi_{T,f}\right)_j = f \circ T^{\,j-1}, \qquad j = 1, \dots, 2n+1,$$

is an embedding and $\Phi_{T,f}(M)$ is a compact submanifold of $\mathbb{R}^{2n+1}$. Thus, for generic $(T, f)$, $\Phi_{T,f}$ is a faithful real representation of $T$.

Time-delayed observables have been used in approximations of Koopman operators since MB04.
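As a hedged illustration of how delayed observables enter such approximations, the sketch below fits a linear map $S$ on delay coordinates by least squares, in the spirit of dynamic mode decomposition Sch10; it is not the specific construction of MB04, and the signal and number of delays are illustrative choices. The eigenvalues of $S$ recover the rotation frequencies of the signal, which is how Koopman spectra are estimated from data in practice.

```python
# Hedged sketch: least-squares fit of a linear map S on delay coordinates
# (DMD-style); the signal and the number of delays are arbitrary choices.
import numpy as np

t = np.arange(400)
series = np.cos(0.3 * t) + 0.5 * np.cos(1.1 * t)   # data on one observable

k = 3                                              # number of delays
H = np.column_stack([series[j : len(series) - k + j] for j in range(k + 1)])
X, Y = H[:-1].T, H[1:].T                           # columns: F_n and F_{n+1}
S = Y @ np.linalg.pinv(X)                          # least-squares linear model

# The eigenvalue angles approximate the frequencies +-0.3 and +-1.1:
print(np.sort(np.angle(np.linalg.eigvals(S))))
```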

A representation might provide redundant information: for example, it might contain two functions $f_i$ and $f_j$ such that $f_j = h \circ f_i$ for some function $h$. If it does not, it is called efficient.

Definition 5.

An $m$-dimensional faithful representation $(\mathbf{f}, S)$ is called efficient provided there are no $i \neq j$ and $h$ such that
$$f_j = h \circ f_i.$$

An $m$-dimensional efficient faithful representation is called minimal provided any other $k$-dimensional efficient faithful representation satisfies $k \geq m$.

Example 6.

In Example 2, $(\mathbf{f}, A)$ is a minimal, faithful, efficient representation of $T$.

It is clear that all minimal efficient faithful representations have the same dimension. Thus, the dimension of the system can be defined as the dimension of its minimal efficient representation. The underlying space $M$ can have a fractal dimension—e.g., in the case of the Lorenz attractor—but the representation is integer dimensional. Additionally, different faithful representations of the underlying mapping $T$ play nicely with each other, as they are related by a conjugacy.

Proposition 7.

Let $(\mathbf{f}, S_1)$ and $(\mathbf{g}, S_2)$ be two different faithful $m$-dimensional representations of $T$. Then there is a bijection $h: \mathbf{f}(M) \to \mathbf{g}(M)$ such that
$$\mathbf{g} = h \circ \mathbf{f},$$

and $h$ is a conjugacy of representations, i.e.,
$$S_2 = h \circ S_1 \circ h^{-1}.$$

Proof.

Since $\mathbf{f}$ and $\mathbf{g}$ are faithful, $\mathbf{f}(p)$ and $\mathbf{g}(p)$ are unique for every $p \in M$, and thus for any $\mathbf{f}(p) \in \mathbf{f}(M)$ there is a unique $\mathbf{g}(p) \in \mathbf{g}(M)$. The resulting mapping $h = \mathbf{g} \circ \mathbf{f}^{-1}$ is a bijection. Further, we know
$$\mathbf{f} \circ T = S_1 \circ \mathbf{f}, \qquad \mathbf{g} \circ T = S_2 \circ \mathbf{g}.$$

Now,
$$S_2 \circ \mathbf{g} = \mathbf{g} \circ T = h \circ \mathbf{f} \circ T = h \circ S_1 \circ \mathbf{f} = h \circ S_1 \circ h^{-1} \circ \mathbf{g},$$

and thus
$$S_2 = h \circ S_1 \circ h^{-1}. \qquad \square$$

The concept of conjugacy has classically been used in dynamical systems for local linearization theorems. Since the Koopman operator description is global, extensions of the local theory are needed (see Mez20 and the Lan and Mezić (2013) reference therein).
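Proposition 7 can be illustrated numerically: two faithful representations of the same circle rotation, one in the angle variable and one on the unit circle in $\mathbb{C}$, are related by the conjugacy $h(x) = e^{ix}$. The sketch below is my construction, with an arbitrary rotation angle.

```python
# Sketch of Proposition 7: two faithful representations of a circle rotation
# are conjugate via h = g o f^{-1}.
import numpy as np

alpha = 0.7
S1 = lambda x: (x + alpha) % (2 * np.pi)    # dynamics in the angle variable
S2 = lambda z: np.exp(1j * alpha) * z       # dynamics on the unit circle
h = lambda x: np.exp(1j * x)                # the conjugacy h
h_inv = lambda z: np.angle(z) % (2 * np.pi)

z = np.exp(2.1j)                            # a point in the second state space
print(np.isclose(S2(z), h(S1(h_inv(z)))))   # True: S2 = h o S1 o h^{-1}
```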

Faithful representations are capable of describing all of the dynamics of $T$. However, that dynamics is often high dimensional and has components that are irrelevant for understanding the problem at hand. In this context, the notion of the reduced representation is useful.

Definition 8.

A representation is called reduced provided it is not faithful.

Note that, by the definition of the concept of representation, even for reduced representations we have
$$\mathbf{f}(T^n p) = S^n(\mathbf{f}(p)),$$

since, if $\mathbf{f}(p) = \mathbf{f}(q)$, then
$$\mathbf{f}(Tp) = S(\mathbf{f}(p)) = S(\mathbf{f}(q)) = \mathbf{f}(Tq).$$

We note here the hierarchy of the introduced forms of representations: a faithful representation might not be efficient. However, an efficient representation is “minimally” faithful. A reduced representation is not faithful.

The concept of reduced representations is exemplified in the notion of factors in ergodic theory, for which we need to equip $M$ with a measure $\mu$.

Definition 9.

Let $T: M \to M$ be a measure-preserving dynamical system with respect to a measure $\mu$ on $M$. Then a map $R: N \to N$ is a factor of $T$ provided it preserves the measure $\nu = \mu \circ \pi^{-1}$ on $N$, where $\pi: M \to N$ is a measurable mapping such that $\pi \circ T = R \circ \pi$.

Let $(\mathbf{f}, S)$ be a reduced representation of $T$, where the components of $\mathbf{f}$ are measurable functions on $M$. Then we have the following.

Proposition 10.

The dynamical system $S$ on $\mathbf{f}(M)$, equipped with the measure $\nu$ defined by $\nu(A) = \mu(\mathbf{f}^{-1}(A))$, is a factor of $T$.

Proof.

We have
$$\mathbf{f} \circ T = S \circ \mathbf{f}.$$

Since $\mathbf{f}$ is measurable and $\nu = \mu \circ \mathbf{f}^{-1}$, for any measurable $A \subseteq \mathbf{f}(M)$ we get $\nu(S^{-1}(A)) = \mu(\mathbf{f}^{-1}(S^{-1}(A))) = \mu(T^{-1}(\mathbf{f}^{-1}(A))) = \mu(\mathbf{f}^{-1}(A)) = \nu(A)$, so $S$ preserves $\nu$ and is a factor of $T$. $\square$

In the context of factors, $\mathbf{f}$ is required to be measurable, in contrast with the notion of semiconjugacy in topological dynamics, where the representation is required to be continuous.

Proposition 11.

Let $\mathbf{f}: M \to \mathbf{f}(M)$ be a continuous (proper) surjection, i.e., there are at least two points in $M$ that map to a single point in $\mathbf{f}(M)$, and let $(\mathbf{f}, S)$ be a (nonfaithful) representation. Then $S$ is semiconjugate to $T$.

Proof.

We again have
$$\mathbf{f} \circ T = S \circ \mathbf{f},$$

and thus $S$ and $T$ are semiconjugate. $\square$

Both of these concepts—factors and semiconjugacies—are key in model reduction of dynamical systems Mez05, Mez20. In the larger context of machine learning, factors and semiconjugacies—that, as shown below, can be realized using eigenfunctions—play the role of autoencoders, helping reduce the dimension and reduce "noise" in the dynamic dataset. We discuss continuous time evolutions next.

Representations of continuous time evolution

In the case of continuous time $\mathbb{T} = \mathbb{R}$, the evolution on $M$ consists of a group of transformations $T^t: M \to M$, $t \in \mathbb{R}$, satisfying
$$T^{t+s} = T^t \circ T^s, \qquad T^0 = \mathrm{Id}.$$

Any such evolution group defines an operator group $U^t: \mathcal{F} \to \mathcal{F}$ by
$$U^t f = f \circ T^t.$$

A representation of $T^t$ then consists of a set of real or complex functions $\mathbf{f} = (f_1, \dots, f_m)$ and a group of transformations $S^t$ that satisfy
$$\mathbf{f} \circ T^t = S^t \circ \mathbf{f}.$$

For fixed $t$, $U^t$ is a linear composition operator associated with $T^t$.
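The group property transfers directly to the composition operators: $U^{t+s} = U^t U^s$. A quick numerical check for a flow with a closed form (my example, the flow $T^t(x) = x e^{-t}$ of $\dot{x} = -x$):

```python
# Sketch: the composition operators of a flow form a group, U^{t+s} = U^t U^s.
import numpy as np

flow = lambda t, x: x * np.exp(-t)          # T^t for the ODE xdot = -x

def U(t, f):
    """Composition operator U^t f = f o T^t."""
    return lambda x: f(flow(t, x))

f = lambda x: np.sin(x) + x**3              # an arbitrary observable
t, s = 0.4, 1.1
x = np.linspace(-2.0, 2.0, 9)
print(np.allclose(U(t + s, f)(x), U(t, U(s, f))(x)))   # True: group property
```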

Definition 12.

A representation $(\mathbf{f}, S^t)$ is called ordinary differential if it is finite and
$$G(\mathbf{x}) = \lim_{t \to 0^+} \frac{S^t(\mathbf{x}) - \mathbf{x}}{t}$$

exists. In this case, the evolution is represented by a finite set of ordinary differential equations
$$\dot{\mathbf{x}} = G(\mathbf{x}), \qquad \mathbf{x} \in \mathbf{f}(M).$$

Example 13.

Consider the set $M$ of all the states of a mass-spring system, and the real representation $(x, p)$, where $x$ is a numerical function that represents the deviation of the mass position from the unstretched length of the spring and $p = m\dot{x}$ is the linear momentum, where $m$ is the mass parameter, assumed constant, and $\dot{x}$ is an observable representing change of $x$ with time $t$. Derivatives with respect to time are labeled by $\dot{\ }$. Then, denoting the spring constant by $k$,
$$\dot{x} = \frac{p}{m}, \qquad \dot{p} = -kx$$

is a two-dimensional, faithful, efficient, ordinary differential representation. Setting $z = x + i\,p/\sqrt{km}$, we have a one-dimensional, faithful, efficient, minimal representation
$$\dot{z} = -i\omega z, \qquad \omega = \sqrt{k/m}.$$

On the other hand, using the energy observable $E = \frac{p^2}{2m} + \frac{kx^2}{2}$, we obtain a one-dimensional, real reduced representation
$$\dot{E} = 0.$$
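The representations in Example 13 are easy to probe numerically. The sketch below uses assumed values for $m$ and $k$ and a symplectic Euler step (my choices, for simplicity); it checks that $|z|$ and $E$ remain nearly constant along a trajectory, as $\dot{z} = -i\omega z$ and $\dot{E} = 0$ require.

```python
# Sketch of Example 13 with assumed parameters: the complex observable z
# rotates at constant speed (|z| constant) and the energy E is conserved.
import numpy as np

m, k = 1.0, 4.0                  # assumed mass and spring constant
dt, steps = 1e-3, 5000

x, p = 1.0, 0.0
zs, Es = [], []
for _ in range(steps):
    p -= k * x * dt              # symplectic Euler step for pdot = -k x
    x += p / m * dt              # ... and xdot = p / m
    zs.append(x + 1j * p / np.sqrt(k * m))
    Es.append(p**2 / (2 * m) + k * x**2 / 2)

zs, Es = np.array(zs), np.array(Es)
print(np.allclose(np.abs(zs), np.abs(zs[0]), atol=1e-2))  # |z| ~ constant
print(np.allclose(Es, Es[0], atol=1e-2))                  # E ~ conserved
```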

As the next example shows, simple representations can exist even for strange sets $M$.

Example 14 (Lorenz representation).

Let $M$ be the Lorenz attractor, which is a subset of $\mathbb{R}^3$. Let $\mathbf{f} = (x, y, z)$, where $M$ is viewed as embedded in $\mathbb{R}^3$ and $x, y, z$ are projections of $M$ on the coordinate axes. Then
$$\begin{aligned} \dot{x} &= \sigma(y - x),\\ \dot{y} &= x(\rho - z) - y,\\ \dot{z} &= xy - \beta z \end{aligned} \tag{27}$$

is a 3-dimensional efficient ordinary differential equation representation. Note that the underlying set $M$ is fractal, and yet the dynamics on it possesses a differential representation. It is of interest to note that the ordinary differential equations (27) are valid off the set $M$ when it is viewed as embedded in $\mathbb{R}^3$, but from the current point of view, the representation itself is only valid when restricted to $M$.

With respect to the full dynamic process it is supposed to represent, the Lorenz representation is reduced and in fact inexact: the dynamics it models is that of a Boussinesq approximation of thermal convection, reduced by truncating the Fourier series expansion of the solution.
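For readers who want to reproduce the attractor, here is a short integration of the representation (27) with the classical parameter values $\sigma = 10$, $\rho = 28$, $\beta = 8/3$ (the values from Lorenz's 1963 paper; the integrator and step size are my choices).

```python
# Sketch: integrate the Lorenz representation (27) with a standard RK4 step.
import numpy as np

sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0   # classical Lorenz parameters

def G(v):
    x, y, z = v
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(v, dt):
    k1 = G(v)
    k2 = G(v + 0.5 * dt * k1)
    k3 = G(v + 0.5 * dt * k2)
    k4 = G(v + dt * k3)
    return v + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

v = np.array([1.0, 1.0, 1.0])
traj = [v]
for _ in range(10000):
    v = rk4_step(v, 1e-2)
    traj.append(v)
traj = np.array(traj)   # after a transient, points lie near the attractor M
print(traj[-1])
```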

Remark 15.

In the case where the representation of $T^t$ is not finite, and thus involves a field of observables $f(x)$, $x \in X$, for some continuous space $X$ (an example is a two-dimensional fluid-flow domain $X \subset \mathbb{R}^2$), we speak of a field representation. The scalar vorticity field
$$\omega(x, t), \qquad x \in X,$$

of a two-dimensional, incompressible, inviscid fluid satisfies the equation
$$\dot{\omega} = G(\omega),$$

where $G$ is a nonlinear operator.

Representations for control systems

The relationship between Koopman operator theory and control theory has been explored intensely over the last decade MMS20. Control systems in discrete time are defined on the product space $M \times \mathcal{U}$ of states and control inputs,