Why 95%?


Bill Casselman
University of British Columbia, Vancouver, Canada


Until recently, measles was almost eliminated as a threat in developed countries, mainly because of nearly universal vaccination programs. But this has changed in the past few years, and measles has again become a problem. Many sources of information will tell you that the threat will disappear only if 95% of the population is vaccinated. Where does this number come from?

My intention here is not to come even close to a detailed analysis, but just to explain a basic phenomenon, that of herd immunity, and to show a simple mathematical model of the development of an epidemic. The way measles epidemics often arise is through a single external agent who has caught the disease elsewhere, and who arrives in a homogeneous and previously healthy population. Let $\iota$ be the fraction of the population that is immune to a disease, either because they have been previously infected and subsequently recovered, or because of vaccination. If this fraction is $1$, then of course new cases of infection will not arise. If the fraction is $0$, then the disease will spread. It turns out that there is a certain critical value of $\iota$ with the property that (1) if $\iota$ is less than the critical value (that is to say, a relatively small proportion of people are immune) the disease will spread, whereas (2) if it is above that critical value, the disease will die out.

This phenomenon is called herd immunity because it is not necessary that everybody be immune in order for the population as a whole (the herd) to be so. What that critical value is depends on the disease--for influenza it is about 50%, whereas for measles it is about 95%. In effect, this very high number for measles means that only those who are really incapable of being vaccinated, such as extremely young children or those with certain disabilities, should go without. There is a paradox, not unlike that in determining the value of a single vote: any one decision is not critical, but a decision en masse has great consequences, and particularly for those who decide not to be vaccinated. It is one more example of a basic dilemma in the human condition.

Mathematical simulation of the progress of epidemics goes back a long way. The basic idea has not changed much. The population is divided into a small number of groups, and as time goes on, the size of each group changes in some more or less predictable way. There are several principal parameters of the process, mainly the time lag between various transitions (for example, the amount of time between an initial infection and the start of the infectious period), and a measure of how easily the disease spreads. These are found, as far as I know, by empirical observation alone.

Measles is ridiculously contagious. It is an extremely small RNA virus particle of about 16,000 nucleotides that encodes only eight proteins. It is typically sprayed into the air when a contagious person coughs. It survives in the air for up to 30 minutes, and infects largely through respiration. It is usually introduced to a susceptible community by a visitor from outside, and can then spread rapidly. In one recent epidemic, one infected traveler introduced to a community in the state of New York spread the disease to 29 others at one crowded event (the story is told in the September New Yorker article listed below). In another recent epidemic in Western Australia (ABC.au), one person traveled from New Zealand to Australia and then moved about widely before he was isolated, and infected at least 22 people.

The path of a measles infection is a bit unusual. From the moment of infection, it takes 10-12 days for symptoms to appear. A case of measles is not reported until this happens. The symptoms (rash, fever, ...) last for another 7-10 days, after which the victim is immune to subsequent infection. One rather nasty feature of measles is that an infected person is contagious for a few days before symptoms appear. A contagious person in this state is extremely dangerous.

The following figure illustrates this account:



The following graph shows the number of cases of measles reported in an epidemic in an American boarding school in 1934. The initial case starts off at day 0. By that time, he had already been contagious for a while. (Redrawn from W. L. Aycock, `Immunity to poliomyelitis', American Journal of Medical Science 204 (1942)) You can see clearly here the lag between infection and symptoms.



Another thing this example illustrates is that there is always a random factor to take into account in any epidemic. As in other matters where chance plays a role, the larger the population under consideration, the more systematic things appear.

A simple simulation in discrete time steps

I'll not restrict myself to measles at first, but describe in somewhat abstract terms what goes into the simplest simulation.

A person in the population can be in one of five states:

  • S susceptible
  • E exposed (and infected) but not contagious
  • A infectious but not symptomatic
  • I infectious and symptomatic
  • R recovered and immune, or isolated and incapable of infecting others
Those who are vaccinated are effectively in the last group (through a kind of virtual infection). In addition, as I'll explain in a moment, it is valuable to keep track of the time elapsed (say, in days) since a person's last transition took place. I'll simplify things quite a bit, and keep track of the state of an epidemic in four numbers--the sizes of each of the relevant categories listed above, except that I'll ignore the distinction between symptomatic and asymptomatic contagious. And we'll keep track of states at fixed intervals of time $n \, dt$. So we are tracking numbers $$ S_{t}, \quad E_{t}, \quad I_{t}, \quad R_{t} $$ at times $t = n \, dt$. How does the state at time $(n+1)\, dt$ change from that at time $n \, dt$?

Answering this comes in two steps, and each of these is again subdivided.

Change of state

(S) At any given moment, any susceptible person is at risk of becoming infected. The measure of this risk is the proportion $\Lambda_{t}$ of susceptibles who will become infected. It depends on circumstances, as we shall see, and is therefore a function of $t$. This gives us the approximate formula (with $t$ measured in days) $$ S_{t + 1} = S_{t} - \Lambda_{t} S_{t} \, . $$

(E) The number of people who are infected but not contagious at time $t$ is increased by the susceptibles who become infected, and is decreased by those who transition to an infectious state. Let $F$ be the rate of transition. It is roughly a constant. Then $$ E_{t + 1} = E_{t} + \Lambda_{t} S_{t} - F E_{t} \, . $$

(I) The number of people who are contagious is increased by those who transition from the previous state, and decreased by those who recover. Let $\Omega$ be the rate of recovery, which is also pretty much a constant. Then $$ I_{t + 1} = I_{t} + F E_{t} - \Omega I_{t} \, . $$

(R) Finally, on the assumption that the size of the population remains constant: $$ R_{t + 1} = R_{t} + \Omega I_{t} \, . $$

This is all very fine, but how are the coefficients $\Lambda_{t}$, $F$, and $\Omega$ determined?

Determining coefficients

It is important to associate to each disease a small number of intrinsic data that generally determine the course of an epidemic, given a particular population. This is not entirely feasible, as I'll explain in a moment. Nonetheless, there are three basic data associated with a disease like measles. (1) The average length $P$ of the pre-infectious (infected but non-contagious) period. For measles, this ranges roughly between $7$ and $9$ days. (2) The average duration $D$ of the contagious period. This ranges from $9$ to $12$ days, for about $3$ of which a person is without symptoms. (3) Some more or less mysterious number you might call the contagion factor $\rho$. This is the average number of people directly infected by one person appearing in a large and totally susceptible population.

This is necessarily a somewhat theoretical notion, since in the modern world it is hard to find such populations. But it can be deduced, at least approximately, from observation in the real world. This is not an exact science, however, and is made less exact by the fact that in the real world this factor is not an intrinsic function of the disease itself, but depends on various additional things--some of which are not easily discernible--such as age, general health, and genetic disposition. It also, unfortunately, depends on the individuals introducing the disease! Do they make a lot of contacts traveling, as did the person in Western Australia? Or do they show up contagious at one event at which transmission was disastrously easy, like the one in New York? Incidentally, the person in Australia was really exceptional, since many of the people he made contact with were immunized, so we are seeing, as it were, the tip of an iceberg. Whereas the population in New York was distinguished precisely because a large proportion had rejected vaccination.

Nonetheless, I ask, how are the coefficients determined by these data?

The risk of infection: $$ \Lambda_{t} = { \rho \over D } \cdot { I_{t} \over N } \, . $$

The rate at which individuals become infectious: $$ F = { 1 \over P } \, . $$

The recovery rate: $$ \Omega = { 1 \over D } \, . $$


In summary, the full system is $$ \eqalign { S_{t + 1} &= S_{t} - \Lambda_{t} S_{t} \cr E_{t + 1} &= E_{t} + \Lambda_{t} S_{t} - F E_{t} \cr I_{t + 1} &= I_{t} + F E_{t} - \Omega I_{t} \cr R_{t+1} &= R_{t} + \Omega I_{t} \, . \cr } $$ The presence of the factor $I_{t}S_{t}$ (hidden in the term $\Lambda_{t} S_{t}$) makes this what a mathematician calls a non-linear system of equations--the various terms do not scale in a linear fashion. In any case, if one starts with known conditions, one can compute approximate values of all these variables for as many days as one wants.
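The daily update can be iterated directly. What follows is only a minimal sketch in Python, not the code behind any figure in this article. The function name `simulate_seir`, the population size of 1000, and the particular choices $P = 8$, $D = 10$ (picked from within the ranges quoted above) and $\rho = 12$ are assumptions made for illustration.

```python
def simulate_seir(N, rho, P, D, days, initial_infectious=1, initial_immune=0):
    """Iterate the daily difference equations for S, E, I, R.

    N   : total population (held constant)
    rho : contagion factor
    P   : average pre-infectious period, in days
    D   : average contagious period, in days
    Returns a list of (S, E, I, R) tuples, one per day, starting at day 0.
    """
    F = 1.0 / P        # rate at which exposed people become infectious
    Omega = 1.0 / D    # recovery rate
    S = float(N - initial_infectious - initial_immune)
    E = 0.0
    I = float(initial_infectious)
    R = float(initial_immune)
    history = [(S, E, I, R)]
    for _ in range(days):
        Lam = (rho / D) * (I / N)   # risk of infection on this day
        new_infected = Lam * S      # S -> E
        new_infectious = F * E      # E -> I
        new_recovered = Omega * I   # I -> R
        S = S - new_infected
        E = E + new_infected - new_infectious
        I = I + new_infectious - new_recovered
        R = R + new_recovered
        history.append((S, E, I, R))
    return history


# Illustrative run: one infectious visitor in a susceptible population of 1000.
history = simulate_seir(N=1000, rho=12, P=8, D=10, days=200)
peak_day = max(range(len(history)), key=lambda t: history[t][2])
print("day on which I is largest:", peak_day)
print("state on the final day:", history[-1])
```

Passing a non-zero `initial_immune` is one way to experiment with a partly vaccinated population. Because the numbers are treated as real quantities rather than head counts, the output is a smooth average picture, with none of the randomness visible in the boarding school data.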

Herd immunity

Given the parameters of a disease, how can we tell whether it will die out or spread extensively? If introduced into a large and totally susceptible population, it will die out soon only if $\rho \lt 1$. Of course in practice what happens is to some extent a random event, but nonetheless if $\rho \lt 1$ one expects no epidemic. But what one really wants to know is, what fraction of the population must be immune in order that the disease not spread? The answer is (roughly) this:


In order that a disease not spread among a large population, the fraction of the population that is immune must be at least $1 - 1/\rho$ .


Why is this? Suppose that the fraction of the population that is immune is $\iota$. An infectious person would infect $\rho$ persons if none were immune. Assuming the people he comes into contact with are randomly mixed, a fraction $\iota$ of those $\rho$ contacts are with immune people, so $\iota \rho$ of them do not lead to infection. The number infected is therefore $\rho - \iota\rho$. The number infected will therefore grow if $\rho - \iota\rho > 1$, that is, if $$ \eqalign { \rho- \iota\rho &> 1 \cr -\iota\rho &> 1 - \rho \cr \iota\rho &< \rho - 1 \cr \iota &< 1 - 1 /\rho \, . \cr } $$

In general, if a disease does spread, even in a totally susceptible population, the proportion of people who have acquired it (and who then recover to be immune) will increase until it reaches at least $1 - 1/\rho$. Only then does the number of people currently infected begin to fall.
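This can be checked within the difference equations of the simple model, at least as a sketch. Assume the risk is $\Lambda_{t} = (\rho/D)(I_{t}/N)$ and the recovery rate is the reciprocal of the contagious period $D$. Adding the equations for $E$ and $I$ gives the change in the total number of people currently infected: $$ \eqalign { (E_{t+1} + I_{t+1}) - (E_{t} + I_{t}) &= \Lambda_{t} S_{t} - \Omega I_{t} \cr &= { I_{t} \over D } \left( \rho \, { S_{t} \over N } - 1 \right) \, . \cr } $$ So this total grows exactly as long as $S_{t}/N > 1/\rho$, that is, as long as the fraction of the population that is no longer susceptible is below $1 - 1/\rho$.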

For measles, $\rho$ is about $12$, so the proportion of immune should be at least 92%. There are other things to be taken into account, however--for example, measles vaccination is not 100% effective, so the 92% is in practice a bit low. Of course this has to be increased if you are looking at a group of what the Australian ABC news site calls "super-spreaders."
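To make the arithmetic concrete, here is a small sketch in Python. The function names are hypothetical, and the vaccine efficacy of 0.97 is an assumed illustrative figure, not one taken from this article. Dividing the threshold by the efficacy also assumes that vaccination is the only source of immunity and that a fixed fraction of vaccinations take.

```python
def herd_immunity_threshold(rho):
    """Fraction of the population that must be immune: 1 - 1/rho."""
    if rho <= 1:
        return 0.0  # the disease dies out even with nobody immune
    return 1.0 - 1.0 / rho


def required_coverage(rho, efficacy):
    """Vaccination coverage needed if only a fraction `efficacy` of
    vaccinated people actually become immune (an assumed simple model)."""
    return herd_immunity_threshold(rho) / efficacy


print("threshold for rho = 2: ", herd_immunity_threshold(2))
print("threshold for rho = 12:", round(herd_immunity_threshold(12), 4))
print("coverage for rho = 12, assumed efficacy 0.97:",
      round(required_coverage(12, 0.97), 4))
```

With $\rho = 2$ the threshold is the 50% quoted earlier for influenza, and with $\rho = 12$ and the assumed efficacy the required coverage lands a little below 95%. A result above $1$ would mean that vaccination at that efficacy could not reach the threshold by itself.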

Final remarks

$\bullet$ The apparently simplest possible trajectory of measles is terrifying, if mercifully impossible. One person infects 29, but once his symptoms appear he is isolated and infects no more. Ten days later, each of those 29 infects 29 ... It's an explosion of measles!
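For a sense of scale, here is the arithmetic of that scenario as a tiny Python sketch. It assumes, unrealistically, that every case infects exactly 29 others and that generations are exactly ten days apart; the function name is hypothetical.

```python
GENERATION_DAYS = 10  # "Ten days later", as in the scenario above


def cases_in_generation(k, per_case=29):
    """New cases in generation k if every case infects `per_case` others."""
    return per_case ** k


for k in range(1, 6):
    day = (k - 1) * GENERATION_DAYS
    print(f"day {day}: {cases_in_generation(k):,} new cases")
```

By the fifth generation, forty days after the first event, the count of new cases exceeds twenty million.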

Why isn't this the way things go? At least part of the answer is that nowadays most people, at least in the developed world, are vaccinated. Unvaccinated people tend to group together, and remain isolated from external contacts. So the "explosion" takes place only among a small population. But a full explosion among even a small population rarely occurs, probably because once an epidemic starts people are warier about their contacts (as they used to be in my youth, when polio ravaged parts of the United States and parents kept their children away from public places), and evidently diseased people are separated from the rest. But also, as an epidemic progresses, the number of people who are immune increases--either (a) because they have had the disease and recovered, or (b) because the first response to an epidemic is to begin a vaccination campaign. One thing to be grateful for about the spread of measles is that vaccination can be effective, although with decreasing efficacy, in the period before infectiousness commences.

This reasoning is all very fine, but I have to say that the details of the interaction between contact and contagion are not something I understand very well. In fact, as far as I can tell, the parameters of diseases are derived mostly from observation, not theory, although with increased computer power comes the ability to investigate contagion in detail, even by keeping track of a huge number of individual interactions.

$\bullet$ Restricting the time interval to one day is a bit artificial, and produces odd behaviour in some solutions. If we change it to an interval $dt$, we get new and more complicated equations. But choosing $dt$ smaller and smaller leads to differential equations rather than difference equations. In solving these one must often resort to approximation by difference equations!
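One plausible way to pass to an interval $dt$ is to treat $\Lambda_{t}$, $F$ and $\Omega$ as daily rates and multiply each transfer by $dt$. That reading is an assumption, since the equations for general $dt$ are not written out here. The sketch below does this in Python; the name `simulate_seir_dt` and the parameter values are illustrative only.

```python
def simulate_seir_dt(N, rho, P, D, total_days, dt, initial_infectious=1):
    """Step the S, E, I, R model with time step dt (in days).

    Each transfer is the daily rate times dt. With dt = 1 this is the
    one-day difference system; for small dt it is the forward Euler
    approximation of the corresponding differential equations.
    Returns the final state (S, E, I, R).
    """
    steps = round(total_days / dt)
    S = float(N - initial_infectious)
    E = 0.0
    I = float(initial_infectious)
    R = 0.0
    for _ in range(steps):
        new_infected = (rho / D) * (I / N) * S * dt  # S -> E
        new_infectious = (E / P) * dt                # E -> I
        new_recovered = (I / D) * dt                 # I -> R
        S -= new_infected
        E += new_infected - new_infectious
        I += new_infectious - new_recovered
        R += new_recovered
    return S, E, I, R


# Compare a one-day step with a finer one over the same 200 days.
for dt in (1.0, 0.1):
    print("dt =", dt, "final state:", simulate_seir_dt(1000, 12, 8, 10, 200, dt))
```

Comparing several values of `dt` over the same total time is a simple way to see how much of a solution's behaviour is an artifact of the step size.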

$\bullet$ These days, however, as I have already mentioned, computers are powerful enough that they can track individual contacts in a population and take probability into account. The real advantage of this, as far as I can see, is that they can match predictions with reality to find basic parameters. And then make even better predictions.

There is one more point worth mentioning. The parameters are not really fixed parameters, even for a single epidemic. They are probability distributions. For example, the length of time from first infection to an infectious state varies from case to case. This also can be taken into account in modern simulations.
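As a rough illustration of what such a simulation might look like, the Python sketch below follows individuals rather than averages. Everything specific in it is an assumption: the function name, the choice of uniform whole-day distributions over the ranges quoted earlier ($7$ to $9$ days pre-infectious, $9$ to $12$ days contagious), and the rule that each susceptible person is infected on a given day with probability $(\rho/D)(I/N)$, where $D$ is taken as the midpoint of the contagious range.

```python
import random


def simulate_individuals(N, rho, days, seed=0,
                         latent_days=(7, 9), contagious_days=(9, 12)):
    """Individual-based epidemic with randomly drawn period lengths.

    Returns a list of (S, E, I, R) head counts, one per day from day 0.
    """
    rng = random.Random(seed)  # private generator, so runs are repeatable
    mean_contagious = sum(contagious_days) / 2
    susceptible = N - 1
    exposed = []                                  # days left until infectious
    infectious = [rng.randint(*contagious_days)]  # days left until recovery
    recovered = 0
    counts = [(susceptible, len(exposed), len(infectious), recovered)]
    for _day in range(days):
        # Chance that any one susceptible person is infected today.
        risk = min(1.0, (rho / mean_contagious) * len(infectious) / N)
        new_cases = sum(1 for _ in range(susceptible) if rng.random() < risk)

        # Advance every personal clock by one day.
        exposed = [d - 1 for d in exposed]
        turning = sum(1 for d in exposed if d == 0)
        exposed = [d for d in exposed if d > 0]
        infectious = [d - 1 for d in infectious]
        recovered += sum(1 for d in infectious if d == 0)
        infectious = [d for d in infectious if d > 0]

        # Each new arrival in a state draws its own duration.
        infectious += [rng.randint(*contagious_days) for _ in range(turning)]
        exposed += [rng.randint(*latent_days) for _ in range(new_cases)]
        susceptible -= new_cases
        counts.append((susceptible, len(exposed), len(infectious), recovered))
    return counts


for seed in (1, 2, 3):
    run = simulate_individuals(N=300, rho=12, days=120, seed=seed)
    print("seed", seed, "final (S, E, I, R):", run[-1])
```

Different seeds give different courses for the same parameters, which is the random factor mentioned in connection with the boarding school example.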

Reading further


Medical features

