
# On the Population Size in Stochastic Differential Games

Communicated by *Notices* Associate Editor Reza Malek-Madani

Commuters looking for the shortest path to their destinations, the security of networked computers, hedge funds trading on the same stocks, governments and populations acting to mitigate an epidemic, or employers and employees agreeing on a contract, are all examples of (dynamic) stochastic differential games. In essence, *game theory* deals with the analysis of strategic interactions among multiple decision-makers. The theory has had enormous impact in a wide variety of fields, but its rigorous mathematical analysis is rather recent. It started with the pioneering work of von Neumann and Morgenstern vNM44 published in 1944. Since then, game theory has taken center stage in applied mathematics and related areas, especially in economics, with several game theorists such as John F. Nash Jr, Robert J. Aumann, and Thomas C. Schelling being awarded the Nobel Memorial Prize in Economic Sciences. Game theory has also played an important role in unsuspected areas: for instance in military applications, when the analysis of guided interceptor missiles in the 1950s motivated the study of games evolving dynamically in time. Such games (when possibly subject to randomness) are called *stochastic differential* games. Their study started with the work of Isaacs Isa54, who crucially recognized the importance of (stochastic) control theory in the area. Over the past few decades since Isaacs’s work, a rich theory of stochastic differential games has emerged and branched into several directions. This paper will review recent advances in the study of solvability of stochastic differential games, with a focus on a purely probabilistic technique to approach the problem. Unsurprisingly, the number of players involved in the game is a major factor in the analysis. We will explain how the size of the population impacts the analysis and solvability of the problem, and discuss mean field games as well as the convergence of finite player games to mean field games.

## 1. Two-Player Games

Games involving only two players are arguably the most basic differential games in continuous time. In this section we discuss both zero-sum and nonzero-sum games, where the main difference stems from the existence of some symmetry—or maybe more precisely antisymmetry here—between the players’ objectives. Since our goal is to provide intuition, we will use a simple example as our Ariadne’s thread throughout the paper, emphasising the differences and new features emerging as we make the modelling more complex.

We fix throughout this section a probability space $(\Omega,\mathcal{F},\mathbb{P})$ carrying a one-dimensional Brownian motion $W$, and we let $\mathbb{F}:=(\mathcal{F}_t)_{t\in[0,T]}$ be the $\mathbb{P}$-completed natural filtration of $W$. We also fix a time horizon $T>0$ and assume that $\mathcal{F}=\mathcal{F}_T$. The players are identified by numbers $1$ and $2$, and their (uncontrolled) state process is given, for some $x_0\in\mathbb{R}$, by $X_t:=x_0+W_t$, $t\in[0,T]$.

The players choose processes $a$ and $b$, where both are taken in some set of so-called *admissible controls*, denoted $\mathcal{A}$, which we take here to consist essentially of functions of Brownian paths,Footnote^{1} taking values in some fixed set $A$ for simplicity. That is, players make decisions based on the randomness of the problem given by $W$. Such controls are called *open-loop*. In fact, throughout these notes we consider only open-loop controls. We adopt here the so-called *weak formulation*, and consider that a choice $(a,b)$ from the players generates uncertainty on the distribution of $X$ by considering it under a new probability measure $\mathbb{P}^{a,b}$ (it is implicitly assumed that the definition of $\mathcal{A}$ ensures that this probability measure is well-defined), whose density with respect to $\mathbb{P}$ is given by

Using standard results of stochastic calculus,Footnote^{2} there is then a $\mathbb{P}^{a,b}$–Brownian motion $W^{a,b}$ such that
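To make the weak formulation concrete, here is a small numerical sketch under purely illustrative assumptions (constant controls and a drift equal to $a+b$; the article leaves the model data generic). The density of the new measure with respect to $\mathbb{P}$ is a stochastic exponential, and the change of measure is well-defined precisely when this density has unit expectation, which we can check by Monte Carlo.

```python
import numpy as np

# Sketch of the weak formulation: a Girsanov-type change of measure.
# The drift a_t + b_t and constant controls are hypothetical choices
# made only for illustration.

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 50, 50_000
dt = T / n_steps

a, b = 0.3, 0.2          # constant open-loop controls (illustrative)
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W_T = dW.sum(axis=1)     # terminal value of the Brownian paths

# Density dP^{a,b}/dP = exp( int (a+b) dW - 0.5 int (a+b)^2 dt )
drift = a + b
density = np.exp(drift * W_T - 0.5 * drift**2 * T)

# The change of measure is well-defined iff the density has unit mean.
print(density.mean())    # close to 1.0 up to Monte Carlo error
```

With deterministic controls the density is an exact exponential martingale, so the empirical mean concentrates around $1$; for genuinely path-dependent controls one would need a condition (e.g., boundedness) guaranteeing the same property.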

Of course we need to clarify how the players decide which controls they would like to use, and this will be linked to their *criterion*. These criteria will also be exactly what is going to differentiate zero-sum and nonzero-sum games, which we will exemplify in the next sections.

### 1.1. Zero-sum games and equilibria

In zero-sum games, the goals pursued by the two players are antagonistic in the following sense: Suppose that player 1’s objective is to minimize the functional criterion

with the expectation taken under $\mathbb{P}^{a,b}$. That is, given some control $b$ chosen by player 2, player 1 aims at solving the minimisation problemFootnote^{3}

^{3}

We will not discuss practical applications in these notes. To simplify the exposition, we rather present generic examples which extend to more general situations we will consider later. For practical examples, we refer the readers for instance to Car16.

Then, given some control $a$ chosen by player 1, player 2 aims at solving the minimisation problem

In other words, while player 1 *minimizes* the criterion, player 2 *maximizes* it. Thus the term “zero-sum,” as the sum of the criteria of the two players is always $0$. The process $X$ is the (common) path or state of the players, $a$ and $b$ are their respective actions, and the criterion is the cost (resp. reward) of player 1 (resp. player 2). In this game, we are interested in a so-called *Nash equilibrium*, corresponding to a pair $(a^\star,b^\star)$ such that

Thus, whenever one player plays the control corresponding to the Nash equilibrium, the other player will never be better off by not also playing according to the equilibrium. Observe that the Nash equilibrium is not necessarily optimal in the sense that it does not yield the highest reward (or lowest cost) to any one player; it is simply a pair of strategies that is simultaneously optimal for both players.

The above definition of Nash equilibria anticipates our soon-to-come discussion of nonzero-sum games in Section 2. But this is not the most standard way to attack zero-sum games. Indeed, using the (anti-)symmetry between the players’ objectives, one can directly realize that if one defines instead a *saddle-point* as being a pair $(a^\star,b^\star)$ such that

then it is also a Nash equilibrium. It is important to notice at this stage that while the notion of Nash equilibrium extends to nonzero-sum games, that of saddle-points makes sense only in a zero-sum setting. Moreover, their interpretation is a bit different: a saddle-point corresponds to each player trying to optimize their criterion assuming that the other player has chosen the *worst possible* control from their point of view. In general, even in the symmetric case we are describing here, not all Nash equilibria are saddle-points, but finding a saddle-point is a standard way to identify a Nash equilibrium.
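The relationship between saddle-points and the players’ opposed objectives can be illustrated on a finite toy analogue, where the minimax values and the saddle-points can be enumerated directly (the matrix below is an arbitrary illustrative example, not taken from the text):

```python
import numpy as np

# A finite toy analogue of a zero-sum game: player 1 picks a row to
# minimize J, player 2 picks a column to maximize it.
J = np.array([[1.0, 2.0],
              [4.0, 3.0]])

minimax = J.max(axis=1).min()   # inf over rows of sup over columns
maximin = J.min(axis=0).max()   # sup over columns of inf over rows
assert minimax == maximin == 2.0   # the game has a value

# A saddle-point (a*, b*): J[a*, b*] is the minimum of its column and
# the maximum of its row, so no unilateral deviation is profitable.
saddles = [(i, j) for i in range(2) for j in range(2)
           if J[i, j] == J[:, j].min() == J[i, :].max()]
print(saddles)   # [(0, 1)]
```

As in the continuous-time setting, every saddle-point found this way is in particular a Nash equilibrium of the associated nonzero-sum reformulation.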

### 1.2. Intuition and solution

Let us now use the simple setting we are considering to explain how one can, in general, find a saddle-point or a Nash equilibrium. Exactly as in standard control problems involving only one player, there are two main tools available: an analytic one using partial differential equations (PDEs)—the celebrated Hamilton–Jacobi–Bellman (HJB) or Hamilton–Jacobi–Bellman–Isaacs equations—and a probabilistic one using so-called backward stochastic differential equations (BSDEs). This work emphasizes the probabilistic approach (more specifically in the weak formulation), and we refer the interested reader to FS06 for more details on the alternative one.

Despite these distinctions, the point of both methods is exactly the same: understanding the structure of the so-called *best-reaction function* of a player, namely the two quantities we defined above. Both of these are actually value functions of a control problem faced by each player. As such, it has become part of the folklore in the corresponding literature that, under mild assumptions, the following result holds.

This result relies on appropriately using the so-called *dynamic programming principle* and the associated *martingale optimality principle*, as well as results from BSDE theory. We refer for instance to Zha17 for details. The intuition behind these equations is the following:

- (i)
$Y^1_t$ (resp. $Y^2_t$) represents the value at time $t$ of the (natural) dynamic version of the value function of player 1 (resp. player 2) whenever player 2 (resp. player 1) has chosen the control $b$ (resp. $a$). In essence, this corresponds to looking at the game over the time period $[t,T]$ only;

- (ii)
the functions appearing in the Lebesgue integrals in 1.1 and 1.2 are linked to the Hamiltonians of each player, which in this example are given by maps $h^1$ and $h^2$ defined by

where, for ,

- (iii)
the processes $Z^1$ and $Z^2$ should be understood at an informal level as “derivatives”Footnote^{4} of $Y^1$ and $Y^2$, respectively. In practice, they directly allow one to compute the optimal control for player 1 (resp. player 2) when player 2 (resp. player 1) plays $b$ (resp. $a$), in the sense that it corresponds to any maximizer in the definition of the corresponding Hamiltonian.

Once we have Proposition 1.1 in hand, it becomes relatively straightforward to realize that to obtain a Nash equilibrium, one should be solving *simultaneously* 1.1 and 1.2, so that the behaviors of both players are concomitantly optimal. In other words, the following result holds true.

The previous theorem deserves some comments, especially on how one can find Nash equilibria from the Hamiltonians of the players: the point here is that this more or less boils down to finding “fixed points” for the vector-valued map formed by the two Hamiltonians. For any fixed value of its arguments, what we mean by a fixed-point here is a pair $(a^\star,b^\star)$ such that

In our simple example, such a fixed-point is trivial to find and is uniquely given by

which is how we identified the Nash equilibrium in 1.2.
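As a sketch of the fixed-point idea, one can run a best-response iteration on two hypothetical quadratic Hamiltonians (the article’s actual maps depend on model data that we keep generic here); when the best-response map is a contraction, the iteration converges to the unique fixed point:

```python
# Best-response iteration for a "fixed point of the Hamiltonians".
# Hypothetical quadratic stand-ins: given z1, z2,
#   player 1 maximizes  a*z1 + 0.5*a*b - a**2/2   over a,
#   player 2 maximizes  b*z2 + 0.5*a*b - b**2/2   over b.
z1, z2 = 0.2, -0.4
br1 = lambda b: z1 + 0.5 * b        # argmax of player 1's Hamiltonian
br2 = lambda a: z2 + 0.5 * a        # argmax of player 2's Hamiltonian

a, b = 0.0, 0.0
for _ in range(60):                 # the map is a contraction here
    a, b = br1(b), br2(a)           # simultaneous best responses

# At the fixed point, each control is a best response to the other:
# this is exactly a (pointwise) Nash equilibrium of the Hamiltonians.
assert abs(a - br1(b)) < 1e-9 and abs(b - br2(a)) < 1e-9
print(round(a, 6), round(b, 6))    # 0.0 -0.4
```

In the article’s example the fixed point is available in closed form and the iteration is unnecessary, but the same scheme illustrates what "solving simultaneously" the two best-reaction problems means in general.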

Moreover, we would like to insist on the fact that finding a Nash equilibrium in a generic two-player game amounts to solving a two-dimensional BSDE system. Intuitively, the dimension of the aforementioned system should increase accordingly with the number of players, and this is exactly what we will make clear in Section 2 below. However, for zero-sum games (where we look for saddle-points) something interesting happens: it is enough to solve only one equation. In order to understand why, we need to introduce the so-called upper and lower values of the game, respectively denoted by $\overline{V}$ and $\underline{V}$, with

as well as the upper and lower Hamiltonians, defined for

Even if there is one major simplification in our example, since the upper and lower Hamiltonians only depend on $z$, the main point to notice is rather that, in general, these two Hamiltonians coincide. This is a minimax property which constitutes what is usually referred to as *Isaacs’s condition*, and is a typical necessary condition for the existence of a saddle-point. Now, in order to characterize these two values, it is useful to rely on the best-reaction functions from Proposition 1.1. More precisely, one can show that computing the upper value amounts to formally taking an infimum over $a$ in (the opposite of) 1.2, while computing the lower value amounts to formally taking a supremum over $b$ in 1.1. This means that we are naturally led to considering the pairs of processes $(\overline{Y},\overline{Z})$ and $(\underline{Y},\underline{Z})$ satisfying, for

and

From there, it is immediate (by uniqueness) that the two pairs of processes coincide, and this leads to the following result.
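Isaacs’s condition can also be checked numerically on a toy convex-concave criterion for which the infimum and supremum can be exchanged; the function below is a hypothetical stand-in for the article’s generic Hamiltonians:

```python
import numpy as np

# Numerical check of Isaacs's condition on a toy convex-concave
# criterion (hypothetical stand-in for the Hamiltonians):
#   f(a, b) = a**2/2 - b**2/2 + a*b,  a, b in [-1, 1],
# with player 1 minimizing over a and player 2 maximizing over b.
grid = np.linspace(-1.0, 1.0, 401)
A, B = np.meshgrid(grid, grid, indexing="ij")
f = A**2 / 2 - B**2 / 2 + A * B

upper = f.max(axis=1).min()   # inf over a of sup over b (upper value)
lower = f.min(axis=0).max()   # sup over b of inf over a (lower value)

# Convex-concave structure: minimax = maximin, so the game has a value.
print(upper, lower)           # both close to 0.0 (saddle at a = b = 0)
assert abs(upper - lower) < 1e-8
```

When the upper and lower values agree like this, solving a single equation (rather than a two-dimensional system) suffices, which is precisely the simplification described above for zero-sum games.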

## 2. Games with an Arbitrary Number of Players

Let us now turn our attention to the analysis of games with $N$ players, for some integer $N$. In the two-player games discussed thus far, the main difference that we stressed was the one between zero-sum and nonzero-sum games. With three players or more, things get much harder as the possibility of forming coalitions becomes significant, and this interesting feature has not been studied so far in the stochastic differential game literature, due to the apparent difficulty of the question. While one can consider zero-sum and cooperative versions of the $N$-player games, we will focus here on the (fully) noncooperative case, where one is interested in Nash equilibria and no coalition is formed. In this case, for tractability of the problem, the symmetry assumption becomes essential! We will come back to this in the final section of the article. By symmetry here, we mean that the game will be set up in a way that players are, roughly speaking, exchangeable. In other words, the $N$-player game is exactly the same from each player’s vantage point. Of course, the reader should remark that *symmetric players* does not mean *independent players*, as players will still impact each other’s actions and trajectories.

We will again discuss a probabilistic approach to the solvability of $N$-player stochastic differential games through a simple tractable example. Assume that the probability space is now rich enough to carry $N$ independent Brownian motions $W^1,\dots,W^N$. Each player is identified by an index $i\in\{1,\dots,N\}$ and seeks to solve the stochastic control problem

with where ,

and, for a vector of controls $a:=(a^1,\dots,a^N)$, the probability measure $\mathbb{P}^{a}$ is given, for any vector