## How do we fix the CV19 testing problem? By re-testing everybody who tested positive!

I guess I’ve re-discovered a form of Bayes’ Theorem regarding the problem posed by the high numbers of false negatives and false positives when testing for the feared coronavirus. What I found is that it doesn’t really even matter whether our tests are super-accurate or not. The solution is to assume that all those who test negative really are negative, and then to give a second test to all those who tested positive the first time. Out of this smaller group, a larger fraction will test positive. You can again forget about those who test negative, but re-test the positives again, and again if you like. By the end of this process, in which each round tests fewer people, you will be over 99% certain that all those who test positive really have been exposed.

Let me show you why.

Have no fear, what I’m gonna do is just spreadsheets. No fancy math, just percents. And it won’t really matter what the starting assumptions are! The results converge to almost perfect accuracy, if repeated!

To start my explanation, let’s start by assuming that 3% of a population (say of the US) has antibodies to CV19, which means that they have definitely been exposed. How they got exposed is not important for this discussion. Whether they felt anything from their exposure or not is not important in this discussion. Whether they got sick and died or recovered, is not going to be covered here. I will also assume that this test has a 7% false positive rate and a 10% false negative rate, and I’m going to assume that we give tests AT RANDOM to a hundred thousand people (not people who we already think are sick!) I’m also assuming that once you have the antibodies, you keep them for the duration.

This table represents that situation:

If you do the simple arithmetic, using those assumptions, then of the 100,000 people we tested, 3%, or three thousand, actually do have those antibodies, but 97%, or ninety-seven thousand, do not (white boxes, first column with data in it).

Of the 3,000 folks who really do have the antibodies – first line of data – we have a false negative rate of 10%, so three hundred of these poor folks are given the false good tidings that they have never been exposed (that’s the upper orange box). The other 90% of them, or two thousand seven hundred, are told, correctly, that they have been exposed (that’s the upper green box).

Now, of the 97,000 people who really do NOT have any antibodies – the second line of data – we have a false positive rate of 7%, so you multiply 0.07 times 97,000 to get six thousand, seven hundred ninety of them who would be told, incorrectly, that they DID test positive for Covid-19 – in the lower orange box. (Remember, positive is bad here, and negative is good.) However, the other 93%, or 90,210, would be told, correctly, that they did not have those antibodies. (That’s in the lower green box.)

Now let’s add up the folks who got the positive test results, which is the third data column. We had 2,700 who correctly tested positive and 6,790 who wrongly tested positive. That’s a total of 9,490 people with a positive CV19 antibody test, which means that of that group of people, only 28.5% were correctly so informed!! That’s between a third and a fourth! Unacceptable!

However, if we look at the last column, notice that almost every single person who was told that they were negative, really was negative. (Donno about you, but I think that 99.7% accuracy is pretty darned good!)
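For anyone who wants to check these numbers without a spreadsheet, here is the same arithmetic as a short Python sketch (the function name and structure are my own; the rates are the ones assumed above):

```python
def one_round(n_positive, n_negative, false_neg_rate=0.10, false_pos_rate=0.07):
    """One round of testing: returns true/false positive and negative counts."""
    true_pos = n_positive * (1 - false_neg_rate)   # exposed, correctly flagged
    false_neg = n_positive * false_neg_rate        # exposed, wrongly cleared
    false_pos = n_negative * false_pos_rate        # unexposed, wrongly flagged
    true_neg = n_negative * (1 - false_pos_rate)   # unexposed, correctly cleared
    return true_pos, false_neg, false_pos, true_neg

# 100,000 random people, 3% of whom actually have the antibodies
tp, fn, fp, tn = one_round(3_000, 97_000)
print(f"{tp:.0f} true positives, {fp:.0f} false positives")
print(f"positive accuracy: {tp / (tp + fp):.1%}")   # 28.5%
print(f"negative accuracy: {tn / (tn + fn):.1%}")   # 99.7%
```

The same function, called with different starting numbers, reproduces every round of testing below.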

However, that 28.5% accuracy among the ‘positives’ (in the left-hand blue box) is really worrisome. What to do?

Simple! Test those folks again! Right away! Let’s do it, and then let’s look at the results:

Wowser! We took the 9490 people who tested positive and gave them another round of tests, using the exact same equipment and protocols and error rates as the first one. The spreadsheet is set up the same; the only thing I changed is the bottom two numbers in the first data column. I’m not going to go through all the steps, but feel free to check my arithmetic. Actually, check my logic. Excel doesn’t really make arithmetic errors, but if I set up the spreadsheet incorrectly, it will spit out incorrect results.

Notice that our error rate (in blue) is much lower in terms of those who tested positive. In fact, of those who test positive, 83.7% really ARE positive this time around, and of those who test negative, 95.9% really ARE negative.

But 84% isn’t accurate enough for me (it’s either a B or a C in most American schools). So what do we do? Test again – all of the nearly three thousand who tested positive this second time. Ignore the rest.

Let’s do it:

At this point, we have much higher confidence, 98.5% (in blue), that the people who tested ‘positive’, really are ‘positive’. Unfortunately, at this point, of the people who tested negative, only about 64% of the time is that correct. 243 people who really have the antibodies tested negative. So perhaps one should test that subgroup again.
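The whole retesting cascade can be written as a short loop instead of a spreadsheet. This is my own sketch, so the decimals can differ by a tenth of a percent from the rounded spreadsheet figures, but it reproduces the same three rounds:

```python
def retest_cascade(n_positive, n_negative, rounds=3,
                   false_neg_rate=0.10, false_pos_rate=0.07):
    """Repeatedly retest only the people whose last test came back positive."""
    history = []
    for _ in range(rounds):
        true_pos = n_positive * (1 - false_neg_rate)
        false_pos = n_negative * false_pos_rate
        history.append(true_pos / (true_pos + false_pos))  # positive accuracy
        # next round: only the people who just tested positive get retested
        n_positive, n_negative = true_pos, false_pos
    return history

for i, acc in enumerate(retest_cascade(3_000, 97_000), start=1):
    print(f"round {i}: positive accuracy {acc:.1%}")
```

Each pass keeps the true positives at 90% strength while cutting the false positives to 7% of their previous number, which is why the positive accuracy climbs so fast.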

The beautiful thing about this method is that it doesn’t even require a terribly exact test! But it does require that you do it repeatedly, and quickly.

Let me assure you that the exact level of accuracy, and the exact number of exposed people, doesn’t matter: If you test and re-test, you can find those who are infected with almost 100% accuracy. With that information you can then discover what the best approaches are to solving this pandemic, what the morbidity and mortality rates are, and eventually to stop it completely.

Why we don’t have enough tests to do this quickly and accurately and repeatedly is a question that I will leave to my readers.

Note that I made some starting assumptions. Let us change them and see what happens. Let’s suppose that the correct percentage of people with COVID-19 antibodies is not 3%, but 8%. Or maybe only 1%. Let’s also assume a 7% false positive and a 10% false negative rate. How would these results change? With a spreadsheet, that’s easy. First, let me start with an 8% infection rate and keep testing repeatedly. Here are the final results:

| Round | Positive accuracy rating | Negative accuracy rating |
|-------|--------------------------|--------------------------|
| 1     | 52.8%                    | 99.1%                    |
| 2     | 93.5%                    | 89.3%                    |
| 3     | 99.5%                    | 39.3%                    |

So after 3 rounds, we have 99.5% accuracy on the positive test results.

Let’s start over with a population where only 1% has the antibodies, and the false positive rate is 7% and the false negative rate is 10%.

| Round | Positive accuracy rating | Negative accuracy rating |
|-------|--------------------------|--------------------------|
| 1     | 11.5%                    | 99.9%                    |
| 2     | 62.6%                    | 98.6%                    |
| 3     | 95.6%                    | 84.7%                    |
| 4     | 99.6%                    | 30.0%                    |

This time it took four rounds, but we still got to 99.6% accuracy at identifying those who really had been exposed to this virus. Yes, towards the end the accuracy of the negative results drops, but I submit that doesn’t matter that much.
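To see that the starting prevalence really doesn’t matter much, here is a sketch that counts how many rounds it takes to push the positive accuracy past 99% (the function name and the 99% cutoff are my own choices):

```python
def rounds_to_confidence(prevalence, population=100_000, target=0.99,
                         false_neg_rate=0.10, false_pos_rate=0.07):
    """Count retesting rounds until a positive result is right `target` of the time."""
    n_positive = population * prevalence
    n_negative = population * (1 - prevalence)
    rounds = 0
    accuracy = 0.0
    while accuracy < target:
        rounds += 1
        true_pos = n_positive * (1 - false_neg_rate)
        false_pos = n_negative * false_pos_rate
        accuracy = true_pos / (true_pos + false_pos)
        # only this round's positives go on to the next round
        n_positive, n_negative = true_pos, false_pos
    return rounds

print(rounds_to_confidence(0.08))  # 3 rounds at 8% prevalence
print(rounds_to_confidence(0.01))  # 4 rounds at 1% prevalence
```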

So Parson Tommy Bayes was right.
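He was: in symbols, with FNR the false negative rate, FPR the false positive rate, and $p$ the fraction of the tested group that is truly exposed going into a round, each round of the procedure above just computes the Bayes posterior

```latex
P(\text{exposed} \mid \text{positive test})
  = \frac{(1-\text{FNR})\,p}{(1-\text{FNR})\,p + \text{FPR}\,(1-p)}
  = \frac{0.90 \times 0.03}{0.90 \times 0.03 + 0.07 \times 0.97} \approx 28.5\%
```

and then feeds that posterior back in as the new $p$ for the next round.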

## More on the “false positive” COVID-19 testing problem

I used my cell phone last night to go into the problem of faulty testing for COVID-19, based on a NYT article. As a result, I couldn’t make any nice tables. Let me remedy that and also look at a few more assumptions.

This table summarizes the testing results on a theoretical group of a million Americans tested, assuming that 5% of the population actually has coronavirus antibodies, and that the tests being given have a false negative rate of 10% and a false positive rate of 3%. Reminder: a ‘false negative’ result means that you are told that you don’t have any coronavirus antibodies but you actually do have them, and a ‘false positive’ result means that you are told that you DO have those antibodies, but you really do NOT. I have tried to highlight the numbers of people who get incorrect results in the color red.

### Table A

Assume 5% really exposed, 3% false positive rate, 10% false negative:

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------------------|-----------|------------|------------------------------|------------------------------|
| Actually Positive | 50,000    | 10%        | 45,000                       | 5,000                        |
| Actually Negative | 950,000   | 3%         | 28,500                       | 921,500                      |
| Totals            | 1,000,000 |            | 73,500                       | 926,500                      |
| Accuracy rating   |           |            | 61.2%                        | 99.5%                        |

As you can see, using those assumptions, if you get a lab test result that says you are positive, that will be correct only about 61% of the time. Which means that you need to take another test, or perhaps two more tests, to see whether they agree.

The next table assumes again a true 5% positive result for the population and a false negative rate of 10%, but a false positive rate of 14%.

### Table B

Assume 5% really exposed, 14% false positive rate, 10% false negative:

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------------------|-----------|------------|------------------------------|------------------------------|
| Actually Positive | 50,000    | 10%        | 45,000                       | 5,000                        |
| Actually Negative | 950,000   | 14%        | 133,000                      | 817,000                      |
| Totals            | 1,000,000 |            | 178,000                      | 822,000                      |
| Accuracy rating   |           |            | 25.3%                        | 99.4%                        |

Note that in this scenario, if you get a test result that says you are positive, that is only going to be correct one-quarter of the time (25.3%)! That is useless!

Now, let’s assume a lower percentage of the population actually has the COVID-19 antibodies, say, two percent. Here are the results if we assume a 3% false positive rate:

### Table C

Assume 2% really exposed, 3% false positive rate, 10% false negative:

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------------------|-----------|------------|------------------------------|------------------------------|
| Actually Positive | 20,000    | 10%        | 18,000                       | 2,000                        |
| Actually Negative | 980,000   | 3%         | 29,400                       | 950,600                      |
| Totals            | 1,000,000 |            | 47,400                       | 952,600                      |
| Accuracy rating   |           |            | 38.0%                        | 99.8%                        |

Notice that in this scenario, if you get a ‘positive’ result, it is likely to be correct only a little better than one-third of the time (38.0%).

And now let’s assume 2% actual exposure, 14% false positive, 10% false negative:

### Table D

Assume 2% really exposed, 14% false positive rate, 10% false negative:

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------------------|-----------|------------|------------------------------|------------------------------|
| Actually Positive | 20,000    | 10%        | 18,000                       | 2,000                        |
| Actually Negative | 980,000   | 14%        | 137,200                      | 842,800                      |
| Totals            | 1,000,000 |            | 155,200                      | 844,800                      |
| Accuracy rating   |           |            | 11.6%                        | 99.8%                        |

This time the chances of a ‘positive’ test result being accurate are only about one in nine (11.6%), which means that this level of accuracy is not going to be useful to the public at large.

Final set of assumptions: 3% actual positive rate, and excellent tests with only 3% false positive and false negative rates:

### Table E

Assume 3% really exposed, 3% false positive rate, 3% false negative:

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------------------|-----------|------------|------------------------------|------------------------------|
| Actually Positive | 30,000    | 3%         | 29,100                       | 900                          |
| Actually Negative | 970,000   | 3%         | 29,100                       | 940,900                      |
| Totals            | 1,000,000 |            | 58,200                       | 941,800                      |
| Accuracy rating   |           |            | 50.0%                        | 99.9%                        |

Once again, if you test positive in this scenario, that result is only going to be correct about half of the time (50.0%).

All is not lost, however. Suppose we re-test all the people who tested positive in this last group (that’s the 58,200 people in Table E). Here are the results:

### Table F

Assume 50.0% really exposed, 3% false positive rate, 3% false negative:

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------------------|-----------|------------|------------------------------|------------------------------|
| Actually Positive | 29,100    | 3%         | 28,227                       | 873                          |
| Actually Negative | 29,100    | 3%         | 873                          | 28,227                       |
| Totals            | 58,200    |            | 29,100                       | 29,100                       |
| Accuracy rating   |           |            | 97.0%                        | 97.0%                        |

Notice that 97% accuracy rating for positive results! Much better!

What about our earlier scenario, in table B, with a 5% overall exposure rating, 14% false positives, and 10% false negatives — what if we re-test all the folks who tested positive? Here are the results:

### Table G

Assume 25.3% really exposed, 14% false positive rate, 10% false negative:

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------------------|-----------|------------|------------------------------|------------------------------|
| Actually Positive | 45,000    | 10%        | 40,500                       | 4,500                        |
| Actually Negative | 133,000   | 14%        | 18,620                       | 114,380                      |
| Totals            | 178,000   |            | 59,120                       | 118,880                      |
| Accuracy rating   |           |            | 68.5%                        | 96.2%                        |

This is still not very good: the re-test is going to be accurate only about two-thirds of the time (68.5%) when it says you really have been exposed, though it would correctly clear you 96.2% of the time. So we would need to run yet another test on those who again tested positive in Table G. If we do it, the results are here:

### Table H

Assume 68.5% really exposed, 14% false positive rate, 10% false negative:

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------------------|-----------|------------|------------------------------|------------------------------|
| Actually Positive | 40,500    | 10%        | 36,450                       | 4,050                        |
| Actually Negative | 18,620    | 14%        | 2,607                        | 16,013                       |
| Totals            | 59,120    |            | 39,057                       | 20,063                       |
| Accuracy rating   |           |            | 93.3%                        | 79.8%                        |

This result is much better, but note that this requires THREE TESTS on each of these supposedly positive people to see if they are in fact positive. It also means that if they get a ‘negative’ result at this stage, that is likely to be correct only about four-fifths of the time (79.8%).
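Feel free to check the chained arithmetic here too. This sketch of mine re-runs the whole chain from Table B’s starting point, retesting only each round’s positives with the same 14% false positive and 10% false negative rates:

```python
def chained_retests(n_positive, n_negative, rounds,
                    false_pos_rate=0.14, false_neg_rate=0.10):
    """Retest only the previous round's positives; return positive accuracy per round."""
    accuracies = []
    for _ in range(rounds):
        true_pos = n_positive * (1 - false_neg_rate)
        false_pos = n_negative * false_pos_rate
        accuracies.append(true_pos / (true_pos + false_pos))
        n_positive, n_negative = true_pos, false_pos
    return accuracies

# 1,000,000 people, 5% truly exposed: 50,000 positive, 950,000 negative to start
for i, acc in enumerate(chained_retests(50_000, 950_000, rounds=3), start=1):
    print(f"round {i}: a positive result is right {acc:.1%} of the time")
```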

So, no wonder that a lot of the testing results we are seeing are difficult to interpret! This is why science requires repeated measurements to separate the truth from fiction! And it also explains some of the snafus committed by our current federal leadership in insisting on not using tests offered from abroad.

============

EDIT at 10:30 pm on 4/25/2020: I found a few minor mistakes and corrected them, and tried to format things more clearly.