
# Pandemic Control, Game Theory, and Machine Learning

Communicated by *Notices* Associate Editor Reza Malek-Madani

## COVID-19 and Control Policies

The coronavirus disease 2019 (COVID-19) pandemic has had an enormous impact on our lives. According to data from the World Health Organization, as of May 2022 there have been more than 520 million confirmed cases of infection and more than 6 million deaths globally; in the United States, there have been more than 83 million confirmed cases and more than one million deaths. Needless to say, the economic impact has also been catastrophic, resulting in unprecedented unemployment and the bankruptcy of many restaurants, recreation centers, shopping malls, etc.

Control policies play a crucial role in alleviating the COVID-19 pandemic. For example, lockdown and work-from-home policies and mask requirements on public transport and in public areas have proven effective in stopping the spread of COVID-19. On the other hand, governors also have to be aware of the loss of economic activity caused by these pandemic control policies. Therefore, a thorough understanding of the evolution of COVID-19 and of the decision-making it provokes will be beneficial for future events and for other interconnected systems around the world.

### Epidemiology

Epidemiology is the science of analyzing the distribution and determinants of health-related states and events in specified populations. It is also the application of this study to the control of health problems. Infectious diseases, including the ongoing novel coronavirus disease (COVID-19), are one such kind of health problem.

Since March 2020, when the World Health Organization declared the COVID-19 outbreak a global pandemic, epidemiologists have made tremendous efforts to understand how COVID-19 infections emerge and spread and how they may be prevented and controlled. Many epidemiological methods involve mathematical tools, e.g., using causal inference to identify causative agents and the factors driving their propagation, and molecular methods to simulate disease transmission dynamics.

The first mathematical model of epidemic spreading dates back to 1760 and is due to Daniel Bernoulli [Ber60]. Since then, many papers have been dedicated to this field and, later on, to epidemic control. Among control strategies, quarantine, first introduced in 1377 in Dubrovnik on Croatia’s Dalmatian Coast [GB97], has proven a powerful component of the public health response to emerging and reemerging infectious diseases. However, quarantine and other measures for controlling epidemic diseases have always been controversial because of the political, ethical, and socioeconomic issues they can raise. Such complications naturally call for the inclusion of decision-making in epidemic control, as it helps to answer how to take *optimal* actions that balance public interest and individual rights. Yet research in this direction has appeared only in recent years. Moreover, when multiple authorities are involved in the decision-making process, it is challenging to analyze how they make decisions collectively or competitively, because the resulting problem is high-dimensional and difficult to solve.

In this article, we focus on decision-making for the intervention of COVID-19. We aim to provide mathematical models and efficient numerical methods, to justify related policies that have been implemented in the past, and to explain, from a game theory viewpoint, how the authorities’ decisions affect their neighboring regions.

### Mathematical models

In a classic compartmental epidemiological model, each individual in a geographical region is assigned a label, e.g., **S**usceptible, **E**xposed, **I**nfectious, **R**emoved, **V**accinated. Different labels represent different statuses: **S**, those who are not yet infected; **E**, those who have been infected but are not yet infectious themselves; **I**, those who have been infected and are capable of spreading the disease to those in the susceptible category; **R**, those who have been infected and then removed from the disease due to recovery or death; and **V**, those who have been vaccinated and are immune to the infection. As COVID-19 progressed, it was learned that spread from asymptomatic cases was an important driving force. More refined models may further split **I** into mild-symptomatic or asymptomatic individuals who recover at home and seriously symptomatic ones who need hospitalization. We point to [AZM20], which considers a similar problem in the optimal control setting and includes asymptomatic individuals and the effect of impulses.

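Individuals transit between these compartments, and the order of the labels in a model indicates the flow patterns between the compartments. For instance, in a simple SEIR model [LHL87] (see also Figure 1a), a susceptible individual becomes exposed after close contact with infected individuals; exposed individuals become infectious after a latency period; and infected individuals become removed afterward due to recovery or death. Let $S(t)$, $E(t)$, $I(t)$, and $R(t)$ be the proportion of the population in each compartment at time $t$. The following differential equations provide the mathematical model:

$$
\begin{aligned}
\frac{dS}{dt} &= -\beta S(t) I(t), \\
\frac{dE}{dt} &= \beta S(t) I(t) - \gamma E(t), \\
\frac{dI}{dt} &= \gamma E(t) - \lambda I(t), \\
\frac{dR}{dt} &= \lambda I(t),
\end{aligned}
\tag{1}
$$

where $\beta$ is the infection rate, $\gamma$ is the rate at which exposed individuals become infectious (the inverse of the mean latency period), and $\lambda$ is the rate at which infectious individuals are removed through recovery or death.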
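As a rough numerical illustration of the SEIR dynamics described above, the following sketch integrates the four compartments with the forward Euler method. The function name `simulate_seir`, the symbols `beta`, `gamma`, `lam` (infection rate, rate of leaving the exposed state, removal rate), and all parameter values are assumptions chosen for illustration; they are not calibrated to COVID-19 data.

```python
def simulate_seir(beta, gamma, lam, s0, e0, i0, r0, t_end, dt=0.01):
    """Integrate a deterministic SEIR model with the forward Euler method.

    beta, gamma, lam are hypothetical rate names: infection rate,
    rate of leaving the exposed state, and removal rate.
    Returns a list of (t, S, E, I, R) tuples, one per time step.
    """
    s, e, i, r = s0, e0, i0, r0
    steps = int(round(t_end / dt))
    path = [(0.0, s, e, i, r)]
    for k in range(1, steps + 1):
        new_exposed = beta * s * i * dt     # flow S -> E
        new_infectious = gamma * e * dt     # flow E -> I
        new_removed = lam * i * dt          # flow I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        path.append((k * dt, s, e, i, r))
    return path


# Illustrative parameters only (not fitted to any data set).
path = simulate_seir(beta=0.5, gamma=0.2, lam=0.1,
                     s0=0.99, e0=0.0, i0=0.01, r0=0.0, t_end=200.0)
t_final, s_final, e_final, i_final, r_final = path[-1]
print(f"removed fraction at t={t_final:.0f}: {r_final:.3f}")
```

Because every flow leaves one compartment and enters another, the four proportions sum to one at every step, which is a convenient sanity check on any implementation.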

Many infections, such as measles and chickenpox, confer long-term, if not lifelong, immunity, while others, such as influenza, do not. As evidenced by numerous epidemiological and clinical studies analyzing possible factors for COVID reinfections, COVID-19 falls precisely into the second category [NBN22]. Mathematically, this can be taken into account by adding a transition from **R** back to **S**.

Though deterministic models such as (1) have received more attention in the literature, mainly due to their tractability, stochastic models have some advantages. The spread of an epidemic is by nature stochastic. Moreover, introducing stochasticity into the system can account for numerical and empirical uncertainties and can also provide probabilistic predictions, i.e., a range of possible scenarios together with their likelihoods. This is crucial for understanding the uncertainties in the estimates.

One class of stochastic epidemic models uses continuous-time Markov chains, where the state process takes discrete values but evolves in continuous time and is Markovian. In a simple stochastic SIS (susceptible-infectious-susceptible) model [KL89] with a population of
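A continuous-time Markov chain of this kind can be simulated exactly with the Gillespie algorithm. The sketch below assumes a standard SIS parametrization that is not taken from the article: with `infected` individuals out of `n_pop`, a new infection occurs at rate `beta * infected * (n_pop - infected) / n_pop` and a recovery at rate `gamma * infected`. The function name and parameter values are hypothetical.

```python
import random


def simulate_sis_ctmc(n_pop, i0, beta, gamma, t_end, seed=0):
    """Gillespie simulation of a stochastic SIS model (assumed rates).

    Returns the jump path as a list of (time, number_infected) pairs.
    """
    rng = random.Random(seed)
    t, infected = 0.0, i0
    path = [(t, infected)]
    while infected > 0:  # zero infected is an absorbing state
        rate_infect = beta * infected * (n_pop - infected) / n_pop
        rate_recover = gamma * infected
        total = rate_infect + rate_recover
        t += rng.expovariate(total)  # exponential waiting time to next event
        if t > t_end:
            break
        # Choose which event fires, proportionally to its rate.
        if rng.random() < rate_infect / total:
            infected += 1
        else:
            infected -= 1
        path.append((t, infected))
    return path


sis_path = simulate_sis_ctmc(n_pop=100, i0=5, beta=0.6, gamma=0.2, t_end=50.0)
print(f"{len(sis_path) - 1} events, final infected count: {sis_path[-1][1]}")
```

Running the simulation with different seeds gives different trajectories, which is precisely the range of scenarios a stochastic model is meant to capture.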

Another way to construct a stochastic model is by introducing white noise

### Control of disease spread

After modeling how diseases are transmitted through a population, epidemiologists then design corresponding control measures and recommend health-related policies to the region planner.

In general, there are two types of interventions: pharmaceutical interventions (PIs), such as getting vaccinated and taking medicines, and nonpharmaceutical interventions (NPIs), such as requiring mandatory social distancing, quarantining infected individuals, and deploying protective resources. For the ongoing COVID-19, intervention policies that have been implemented include, but are not limited to, issuing lockdown or work-from-home policies, developing vaccines, and later expanding equitable vaccine distribution, providing telehealth programs, deploying protective resources and distributing free testing kits, educating the public on how the virus transmits, and focusing on surface disinfection.

Mathematically, this can be formulated as a control problem: the planner chooses the level of each policy affecting the transitions in (1) such that the region’s overall cost is minimized. Generally, NPIs help mitigate the spread by lowering the infection rate

meaning that only

A region planner, taking into account the interventions’ effects on the dynamics (1), decides on policy by weighing different costs. These costs may include the economic loss due to decreased productivity during a lockdown, the economic value of the lives lost among infected individuals, and other social-welfare costs due to the aforementioned measures.
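To make the trade-off concrete, the sketch below evaluates a toy planner cost for a constant lockdown level and picks the best level on a grid. Everything specific here is an assumption for illustration: the lockdown is taken to scale the infection rate by `1 - theta * lockdown`, the lockdown cost is proportional to the non-removed population, and a fixed share `death_share` of removals are deaths valued at `life_value`. None of these forms or values is taken from the article.

```python
def planner_cost(lockdown, beta=0.5, gamma=0.2, lam=0.1, theta=0.9,
                 wage=0.002, death_share=0.01, life_value=100.0,
                 t_end=200.0, dt=0.05):
    """Toy cost of a constant lockdown level in [0, 1] (assumed model).

    Returns (total_cost, removed_fraction) for an SEIR epidemic whose
    infection rate is reduced to beta * (1 - theta * lockdown).
    """
    s, e, i = 0.99, 0.0, 0.01
    beta_eff = beta * (1.0 - theta * lockdown)
    cost, removed = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        new_exposed = beta_eff * s * i * dt
        new_infectious = gamma * e * dt
        new_removed = lam * i * dt
        # Lost productivity of everyone still in the workforce.
        cost += wage * lockdown * (s + e + i) * dt
        # Economic value of lives lost among newly removed individuals.
        cost += life_value * death_share * new_removed
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        removed += new_removed
    return cost, removed


levels = [k / 10 for k in range(11)]
costs = [planner_cost(level)[0] for level in levels]
best_level = levels[costs.index(min(costs))]
print(f"cheapest constant lockdown level on the grid: {best_level}")
```

A real planner would choose a time-varying policy rather than a constant one; the grid search only illustrates how the two cost terms pull in opposite directions.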

## Game-theoretic SEIR Model

Game theory studies the strategic interactions among rational players and has applications across social science, computer science, financial mathematics, and epidemiology. A game is noncooperative if players cannot form alliances or if all agreements need to be self-enforcing. Nash equilibrium is the most common kind of self-enforcing agreement [Nas51]: a collective strategy of all players in the game from which no one has an incentive to deviate unilaterally.
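The definition of Nash equilibrium can be checked mechanically in a small finite game. The sketch below enumerates pure-strategy equilibria of a two-player game in which each player minimizes a cost; the two "regions", their two actions, and the cost tables are hypothetical numbers invented for illustration.

```python
def pure_nash_equilibria(cost_a, cost_b):
    """Return all pure-strategy Nash equilibria of a two-player cost game.

    cost_a[i][j] and cost_b[i][j] are the costs paid by players A and B
    when A plays action i and B plays action j (lower is better).
    """
    n_a, n_b = len(cost_a), len(cost_a[0])
    equilibria = []
    for i in range(n_a):
        for j in range(n_b):
            # A cannot lower its cost by changing i while B keeps j.
            a_stays = all(cost_a[i][j] <= cost_a[k][j] for k in range(n_a))
            # B cannot lower its cost by changing j while A keeps i.
            b_stays = all(cost_b[i][j] <= cost_b[i][k] for k in range(n_b))
            if a_stays and b_stays:
                equilibria.append((i, j))
    return equilibria


# Hypothetical costs; action 0 = "lockdown", action 1 = "stay open".
cost_a = [[2, 4],
          [1, 3]]
cost_b = [[2, 1],
          [4, 3]]
equilibria = pure_nash_equilibria(cost_a, cost_b)
print(equilibria)  # [(1, 1)]
```

In this toy example both regions stay open at equilibrium and each pays 3, although both would pay only 2 if they could commit to a joint lockdown, which shows why a Nash equilibrium need not be collectively optimal.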

Nowadays, as the world is more interconnected than ever before, one region’s epidemic policy will inevitably influence the neighboring regions. For instance, in the US, decisions made by the governor of New York will affect the situation in New Jersey, as so many people travel daily between the two states. Imagine that both state governors make decisions in their own interests, take into account each other’s rational decisions, and may even compete for scarce resources (e.g., frontline workers and personal protective equipment). These are precisely the features of a noncooperative game. Computing the Nash equilibrium of such a game will provide valuable qualitative guidance and insights for policymakers on the impact of specific policies.

We now introduce a multi-region stochastic SEIR model [XBH] with four compartments: **S**usceptible, **E**xposed, **I**nfectious, and **R**emoved. Denote by *proportion* of the population in the four compartments of the region

where

We explain the model (2)–(6) in detail:

### S

In (2),

The planner of region *health policy*. It will influence the vaccination availability

### E

In (3),

### I and R

In (4) and (5),

The more effort a region invests (e.g., expanding hospital capacity and creating more drive-thru testing sites), the more clinical resources it will have and the more accessible those resources will be to patients, which could accelerate recovery and reduce deaths. The death rate, denoted by

### Cost

In (6), each region planner faces four types of cost. One is the economic activity loss due to the lockdown policy, where

### Nash equilibria and the HJB system

As explained above, the interaction between region planners can be viewed as a noncooperative game, for which the Nash equilibrium (NE) is the notion of optimality.

Under proper conditions, the NE is obtained by solving

For the sake of simplicity, we omit the actual definition of

with

## Enhanced Deep Fictitious Play

Solving for the NE of the game is equivalent to solving the HJB system (7). To do so, we use an algorithm called *Enhanced Deep Fictitious Play*, broadly motivated by the method of fictitious play introduced by Brown [Bro51].

Deep Learning. Deep learning leverages a class of computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction [LBH15]. Deep neural networks are effective tools for approximating unknown functions in high-dimensional space. In recent years, we have witnessed noticeable success in combining deep learning with computational mathematics to solve high-dimensional differential equations. Specifically, deep neural networks show strong capability in solving stochastic control problems and games [HJE18, HL22]. Below, we use a simple example to illustrate how a deep neural network is determined for function approximation.

Suppose we would like to approximate a map

where
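As a minimal, self-contained illustration of how a network's parameters are determined by minimizing a loss, the sketch below fits a one-hidden-layer `tanh` network to the map x ↦ x² on [-1, 1] by full-batch gradient descent on the mean squared error. The architecture, target function, learning rate, and all names are assumptions made for this example and are far smaller than anything used for the HJB system.

```python
import math
import random


def train_network(target, xs, width=8, lr=0.02, steps=2000, seed=0):
    """Fit y(x) = sum_k v[k] * tanh(w[k] * x + b[k]) + c by gradient descent.

    Returns the trained network as a function and the loss at every step.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(width)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(width)]
    v = [rng.uniform(-0.5, 0.5) for _ in range(width)]
    c = 0.0
    ts = [target(x) for x in xs]
    n = len(xs)
    losses = []
    for _ in range(steps):
        gw, gb, gv, gc = [0.0] * width, [0.0] * width, [0.0] * width, 0.0
        loss = 0.0
        for x, t in zip(xs, ts):
            h = [math.tanh(w[k] * x + b[k]) for k in range(width)]
            err = sum(v[k] * h[k] for k in range(width)) + c - t
            loss += err * err / n
            d = 2.0 * err / n                  # d(loss)/d(output)
            gc += d
            for k in range(width):
                gv[k] += d * h[k]
                back = d * v[k] * (1.0 - h[k] ** 2)  # chain rule through tanh
                gw[k] += back * x
                gb[k] += back
        losses.append(loss)
        for k in range(width):
            w[k] -= lr * gw[k]
            b[k] -= lr * gb[k]
            v[k] -= lr * gv[k]
        c -= lr * gc

    def network(x):
        return sum(v[k] * math.tanh(w[k] * x + b[k]) for k in range(width)) + c

    return network, losses


xs = [-1.0 + 0.1 * j for j in range(21)]
network, losses = train_network(lambda x: x * x, xs)
print(f"loss before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

In practice one would use an automatic-differentiation library and stochastic gradient descent on sampled data; the hand-written gradients here only make the optimization step explicit.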

Note that the HJB system (7) is difficult to solve due to its high dimensionality. We therefore adopt the idea of *fictitious play*, where we update our approximations to the optimal policies of each player iteratively, stage by stage. In each stage, instead of updating the approximations of all the players together by solving the giant system, we update them separately and in parallel. Each player solves for her own optimal policy, assuming that the other players are taking their approximated optimal strategies from the last stage. Let us denote the optimal policy and corresponding value function of the single player
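The stage-by-stage idea can be seen in a static toy game that has nothing to do with the epidemic model. In the sketch below, player `i` is assumed to minimize the hypothetical cost `(a_i - c_i)**2 + coupling * a_i * sum(a_j for j != i)`, whose best response is available in closed form. At each stage every player best-responds to the others' actions from the previous stage, and all players are updated simultaneously; no neural network is involved.

```python
def best_response(i, actions, targets, coupling):
    """Minimizer of (a_i - c_i)^2 + coupling * a_i * sum_{j != i} a_j."""
    others = sum(a for j, a in enumerate(actions) if j != i)
    return targets[i] - 0.5 * coupling * others


def fictitious_play(targets, coupling, stages=100):
    """Iterate simultaneous best responses to the previous stage's actions."""
    actions = [0.0] * len(targets)
    for _ in range(stages):
        # Every player solves her own problem with the others frozen, so
        # the updates are independent and could be computed in parallel.
        actions = [best_response(i, actions, targets, coupling)
                   for i in range(len(targets))]
    return actions


equilibrium = fictitious_play(targets=[1.0, 1.0, 1.0], coupling=0.4)
print([round(a, 6) for a in equilibrium])
```

For these values the iteration contracts and converges to the symmetric Nash equilibrium `a_i = 1 / 1.4`; with stronger coupling the same scheme can fail to converge, which is one reason convergence of fictitious play has to be examined case by case.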

The *Enhanced Deep Fictitious Play* algorithm we have designed, built on the Deep Fictitious Play (DFP) algorithm [HH20], reduces time cost from

We illustrate one stage of enhanced deep fictitious play in Figure 3. At the

with

For simplicity of notation, we omit the stage number