# Am I really uninfected?

### COVID-19 and rapid testing


Bill Casselman
University of British Columbia, Vancouver, Canada

I discussed the accuracy of tests for COVID-19 in an earlier column, but I am going to take up the same topic in this one. What's new is the appearance of a large number of rapid tests, for both professional and home use. They are relatively inexpensive, more convenient to administer, and capable of returning results quickly. A number of governments now use them to screen health care workers and travelers regularly. Considering how much government policies now depend on them, it is natural to wonder, how accurate are they? To what extent should they be relied on? These are especially important questions, given the natural urge to want certainty rather than nuanced probabilistic judgements.

There are basically two kinds of diagnostic tests for COVID-19. (Recall that a diagnostic test is one that detects whether the patient is currently infected, as opposed to those tests that detect whether the patient has been infected in the past.) They differ in which of the two viral components, genetic or structural, they measure.

The genetic component of the coronavirus is made up of RNA, and quantities in a sample can be amplified by standard cloning techniques (in a process known as Polymerase Chain Reaction) so that even very small amounts of viral material become evident. Tests using these techniques are called PCR tests. They are extremely accurate, but processing the samples as well as the amplification process unavoidably take time, generally about 24 hours, and more with heavy work loads. Another problem with these is that samples have to be handled very carefully, and thus results are reliable only if the tests are given under strictly controlled conditions.

The structural component is protein, and is detected much more quickly by the same type of lateral flow test that is used in medical tests to detect other proteins, for example those that signify pregnancy. These are called antigen tests, because they rely on essentially the same protein detection techniques that the human body does to recognize foreign matter. The antigen tests have two advantages. They do not have to be given under medical supervision, and they give answers quickly—sometimes within several minutes. But they are less accurate.

A rapid antigen test for COVID-19 (Image by Falco Ermert, CC 2.0)

## How accurate?

The tests for COVID have features in common with all medical screening procedures. The point is to tell whether the patient is afflicted with a certain ailment, and the answer is (almost always) a flat "yes" or "no". Of course you can easily imagine this to be simplistic, but anything more subtle is probably impractical. The real problem is that no medical test is completely accurate. In most COVID tests a swab of the patient's nasal passage or throat is taken, and then analyzed. These swabs are not reputed to be a pleasant experience, so it presumably takes some practice to get it right. In the case of COVID, the accuracy of the test depends very much on the point in the infection cycle at which the test is made. COVID has one really nasty feature which is probably largely responsible for its rapid spread: several days often pass in which a person does not show any symptoms and is yet highly infectious. In fact, some people never show symptoms but nevertheless pass through a highly infectious period. There does not seem to be a completely satisfactory way of dealing with this. Thus, even with fairly good tests, masks and social distancing are extremely important in preventing the spread of COVID.

There are two numbers that are used to characterize the accuracy of a medical testing procedure: sensitivity and specificity. The first is the proportion of infected people whom the test correctly reports as infected. The second is the proportion of uninfected people whom the test correctly reports as uninfected. Ideally, a test would have both sensitivity 100% and specificity 100%, but that doesn't happen. The PCR tests, at least, are very sensitive: up to 98% for some, at least in a clinical environment. But a clinical environment is necessary for these in any case. They are so sensitive, in fact, that they often detect very small fragments of viral RNA that come from an old infection rather than a current one. This has the effect of lowering their specificity.
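To make the two definitions concrete, here is a minimal Python sketch. The counts are invented purely for illustration and do not come from any real validation study, and the function names are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of infected people whose test comes back positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of uninfected people whose test comes back negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical validation study (numbers invented for illustration):
# 200 people known to be infected, 1,000 known to be uninfected.
sens = sensitivity(true_pos=140, false_neg=60)
spec = specificity(true_neg=990, false_pos=10)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Note that each number is computed within one group only: sensitivity looks only at the infected, specificity only at the uninfected.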

Generally, almost all tests have high specificity, because it is easy to say definitely that there is viral material in a sample. This means basically that the virus is distinct from other viruses, and confusion is unlikely. Sensitivity is more difficult to achieve. There seem to be two major causes of error: (i) the test was taken in the wrong part of the infection cycle, or (ii) the swab was taken poorly.

On the other hand, high sensitivity is more important, because infected people who avoid detection (false negatives) are potential sources of further infection. High specificity is not so crucial, since tagging uninfected people as infected (making false positives) will not ordinarily have fatal consequences. Besides, a positive test result is usually checked by further tests.

The claimed sensitivity of the new rapid tests varies widely, from $70\%$ up, but it is not so clear under what circumstances these things have been measured. Have the tests destined for home use been tested in completely realistic circumstances? Can the sensitivity in practice be as low as $50\%$?

## Significance of a test

Let's look at a typical problem. Suppose you take one of the rapid tests, and it says you are not infected. How relieved should you be? I'll rephrase this. Given a negative test result, what is the probability that you are actually not infected?

If you think about it, the meaning of the question might not be clear. On the one hand, whether or not you are infected by COVID depends on a series of random events, more particularly whom you have recently interacted with, in what circumstances, and whether any of them were infectious. So, in principle at least, it makes sense to talk about the probability that you are infected. But it is doubtful that you have even the least chance of calculating it. Furthermore, this probability is in some sense intrinsic to the pattern of your recent behaviour. It is not affected by the results of a test; it is the same before and after you take the test. So I ask, how do we interpret the question?

In practical terms, we can speak only of estimating this probability, and we can do this only by placing you, in imagination, in an environment of people whose exposure is similar to yours. This will involve geography, for example: at the time I write this, the infection rate in the state of Rhode Island is more than twenty times that in Minnesota. But it might include circumstances in which you have recently decreased dramatically your personal interactions with other people. In other words, if we want to look at things statistically, which is inevitable, we want to decide the proportion of people 'like you' who are infected. The problem is to interpret the phrase 'like you'. I'll therefore rephrase my question: how has the negative result of your test changed our estimate of the probability that you are infected? What is the effect of the new information on how we see your environment?

Let's look at an example. We need to know three things before we calculate: (i) the estimated probability $x$ before you took the test, (ii) the sensitivity $s$ of the test, (iii) its specificity $t$. In the example, I'll take $x = 0.03$, $s = 0.70$, and $t = 0.99$. And I'll assume that the test is given to a population of $100,000$.

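In the population, $0.03 \cdot 100,000 = 3,000$ will be infected, and $97,000$ will be uninfected. Of those who are infected, $0.70\cdot 3,000 = 2,100$ will test positive, and $900$ will not. (This test really isn't very sensitive.) Of those who are uninfected, $0.01 \cdot 97,000 = 970$ will test positive, and $96,030$ will test negative. We can summarize this in a table:

$$\matrix { & \hbox{test } + & \hbox{test } - & \hbox{total} \cr \hbox{infected} & 2,100 & 900 & 3,000 \cr \hbox{uninfected} & 970 & 96,030 & 97,000 \cr \hbox{total} & 3,070 & 96,930 & 100,000 \cr }$$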

There are a couple of noteworthy features here. One is that even though the specificity is quite good, there are a lot of false positives. This is because the uninfected so outnumber the infected that even a $1\%$ error rate among them produces $970$ false positives. The other feature offers an answer to our question. Before the test, we have no reason to classify anybody as special, and the probability of infection is $3\%$. But after the test someone who tests negative is one of the $900 + 96,030 = 96,930$ who tested negative, and the probability of being infected is $900/96,930 = 0.93\%$. It has dropped by a factor of slightly more than $3$.
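The arithmetic of this example can be checked with a short Python sketch. The function `tally` and its argument names are hypothetical; the inputs are the values $x = 0.03$, $s = 0.70$, $t = 0.99$ and the population of $100,000$ used above.

```python
def tally(population: int, prevalence: float, sens: float, spec: float) -> dict:
    """Split a population into the four infected/uninfected x positive/negative groups."""
    infected = population * prevalence
    uninfected = population - infected
    true_pos = infected * sens          # infected, test positive
    false_neg = infected - true_pos     # infected, test negative
    true_neg = uninfected * spec        # uninfected, test negative
    false_pos = uninfected - true_neg   # uninfected, test positive
    return {
        "true_pos": true_pos, "false_neg": false_neg,
        "true_neg": true_neg, "false_pos": false_pos,
    }


counts = tally(population=100_000, prevalence=0.03, sens=0.70, spec=0.99)
for name, value in counts.items():
    print(f"{name:>9}: {value:>9,.0f}")

# Among everyone who tests negative, what fraction is actually infected?
negatives = counts["false_neg"] + counts["true_neg"]
p_infected_given_negative = counts["false_neg"] / negatives
print(f"P(infected | negative test) = {p_infected_given_negative:.2%}")
```

Changing `prevalence` or `sens` in the call shows how strongly the final probability depends on each of them.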

I want now to find some formulas to replace this numerical computation. Suppose we start with a population of size $N$, a probability $x$ of being infected, and a test with sensitivity $s$ and specificity $t$. Among the population, $N x$ are infected and $N(1-x)$ are not. Of those who are infected, $Nxs$ will test $+$, and $Nx(1-s)$ will test $-$. Of those who are not infected, $N(1-x)t$ will test $-$ and $N(1-x)(1-t)$ will test $+$.

Let's again ask, what does it mean that you test negative? The number of people who test negative is $Nx(1-s) + N(1-x)t$, and among those $Nx(1-s)$ are in fact infected. So even after a negative test result, your probability of being infected is

$${ Nx(1-s) \over Nx(1-s) + N(1-x)t } = { x(1-s) \over x(1-s) + (1-x)t } = x \cdot { 1-s \over x(1-s) + (1-x)t } \, .$$
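As a sanity check, the formula can be written as a small Python function and compared with the numerical example. The function name is hypothetical; the symbols $x$, $s$, $t$ are those defined above.

```python
def prob_infected_given_negative(x: float, s: float, t: float) -> float:
    """x(1-s) / (x(1-s) + (1-x)t): probability of infection after a negative test."""
    missed = x * (1 - s)     # infected and test negative
    cleared = (1 - x) * t    # uninfected and test negative
    return missed / (missed + cleared)


p = prob_infected_given_negative(x=0.03, s=0.70, t=0.99)
print(f"before the test: {0.03:.2%}, after a negative test: {p:.2%}")
```

The population size $N$ does not appear as an argument, since it cancels in the fraction.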

Now the specificity $t$ is very close to $1$, so this is very well approximated by

$$x \cdot { 1-s \over (x-sx) + (1-x) } = x \cdot { 1-s \over 1-sx } \, .$$

In the following image I display graphs of the factor $(1-s)/(1-sx)$.

For realistic (small) values of $x$ this factor is very close to $1-s$. In the example we looked at first, $s=0.7$, and this factor is about $0.3$. Sure enough, we found that the probability of infection dropped from $0.03$ to $0.0093 \approx 0.03\cdot(1-0.7)$. The improvement is significant, but not spectacular. The amount of relief one can get from a negative test, even a fairly good one, looks definitely limited.
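A few values of the factor can also be tabulated directly. The following Python sketch compares the approximate factor $(1-s)/(1-sx)$ with the exact one, $(1-s)/(x(1-s) + (1-x)t)$. The choice $t = 0.99$ and the grid of values for $s$ and $x$ are assumptions made for illustration, and the function names are hypothetical.

```python
def exact_factor(x: float, s: float, t: float) -> float:
    """Exact multiplier of x after a negative test."""
    return (1 - s) / (x * (1 - s) + (1 - x) * t)


def approx_factor(x: float, s: float) -> float:
    """Multiplier of x when the specificity t is taken to be 1."""
    return (1 - s) / (1 - s * x)


T = 0.99  # assumed specificity, as in the worked example

print("   s      x   approx    exact")
for s in (0.5, 0.7, 0.9):
    for x in (0.003, 0.03, 0.07):
        print(f"{s:>4.2f} {x:>6.3f} {approx_factor(x, s):>8.4f} {exact_factor(x, s, T):>8.4f}")
```

For small $x$ both columns stay close to $1-s$, which is the point made above.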

The values of sensitivity $s$ and specificity $t$ are provided by the test manufacturer, so in order to apply these formulas one has to figure out what $x$ is. The CDC calls this the pretest probability, and advises:

> To help estimate pretest probability, CDC recommends that laboratory and testing professionals who perform antigen testing determine infection prevalence based on a rolling average of the positivity rate of their own SARS-CoV-2 testing over the previous 7–10 days. If a specific testing site, such as a nursing home, has a test positivity rate near zero, the prevalence of disease in the community (e.g., cases among the population) should instead be used to help determine pretest probability. State health departments generally publish COVID-19 data on testing positivity rates and case rates for their communities.
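One possible reading of this advice, sketched in Python: pool the positives and the tests over the most recent days and take their ratio. The daily counts below are invented for illustration, the function name is hypothetical, and pooling over the window (rather than averaging the daily rates) is an assumption about what "rolling average" means here.

```python
def rolling_positivity(positives: list, tests: list, window: int = 7) -> float:
    """Pooled positivity rate over the most recent `window` days."""
    if len(positives) != len(tests):
        raise ValueError("positives and tests must cover the same days")
    recent_positives = sum(positives[-window:])
    recent_tests = sum(tests[-window:])
    if recent_tests == 0:
        raise ValueError("no tests were performed in the window")
    return recent_positives / recent_tests


# Hypothetical daily counts for one testing site, oldest day first.
daily_positives = [4, 6, 3, 5, 7, 2, 5, 6, 4, 8]
daily_tests = [120, 150, 110, 140, 160, 90, 130, 150, 125, 170]

pretest_probability = rolling_positivity(daily_positives, daily_tests, window=7)
print(f"estimated pretest probability x = {pretest_probability:.4f}")
```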

You might take "your community" to be the state you live in. The probability varies enormously. Here are some examples in the low and high ranges (based on data taken December 27 from Worldometer):

$$\matrix { \hbox{Minnesota} & 0.0032 \cr \hbox{Wyoming} & 0.0035 \cr \hbox{Wisconsin} & 0.0055 \cr \hbox{Arizona} & 0.0561 \cr \hbox{Montana} & 0.0691 \cr \hbox{Rhode Island} & 0.0711 \cr }$$
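To see what these differences mean for someone holding a negative result, the following Python sketch applies the formula $x(1-s)/(x(1-s) + (1-x)t)$ to each state. It assumes the same test as in the worked example, $s = 0.70$ and $t = 0.99$; those values are illustrative and are not claimed for any particular product.

```python
# Pretest probabilities from the table above.
prevalence = {
    "Minnesota": 0.0032,
    "Wyoming": 0.0035,
    "Wisconsin": 0.0055,
    "Arizona": 0.0561,
    "Montana": 0.0691,
    "Rhode Island": 0.0711,
}

S, T = 0.70, 0.99  # assumed sensitivity and specificity

after_negative = {}
for state, x in prevalence.items():
    missed = x * (1 - S)      # infected and test negative
    cleared = (1 - x) * T     # uninfected and test negative
    after_negative[state] = missed / (missed + cleared)

print(f"{'state':<14}{'before':>9}{'after':>9}")
for state, x in prevalence.items():
    print(f"{state:<14}{x:>9.4f}{after_negative[state]:>9.4f}")
```

Under these assumptions a negative result in a high-prevalence state still leaves a higher probability of infection than no test at all does in a low-prevalence one.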

## Final remarks

The rapid tests are not absolutely accurate, but are nonetheless a valuable tool in controlling COVID-19. Catching even a fraction of contagious people presumably has a dramatic and disproportionate effect on the spread of the disease. It would be valuable to have a clear explanation of how the use of rapid tests affects models of the pandemic. But it is important to realize that a negative result is not by itself a certificate that someone is in fact not contagious. This is especially important if there are other reasons to suspect infection.