## The shutdown, while painful, appears to have saved a LOT of lives so far

If you recall, the growth of the new coronavirus disease in the US (and in many other countries) at first looked exponential, meaning that the number of cases (and deaths) was rising by an alarming, fixed percentage every single day.

Even if you slept through your high school or middle school math lessons on exponential growth, the story of the Shah and the chessboard filled with rice may have taught you that the expression 2^x gets very, very hairy after a while. Pyramid schemes eventually run out of suckers. Or perhaps you have seen a relatively modest credit-card bill get way out of hand as the bank applies 8 percent interest PER MONTH, which ends up multiplying your debt by a factor of about 6 after just 2 years!
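Don't take my word for the credit-card arithmetic; a couple of lines of Python, using the same 8%-per-month, 2-year numbers, will check it:

```python
# Checking the credit-card claim: 8 percent interest PER MONTH,
# compounded for 2 years (24 months).
monthly_rate = 0.08
months = 24
growth_factor = (1 + monthly_rate) ** months
print(round(growth_factor, 2))  # about 6.34, i.e. the debt is multiplied by ~6
```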

(If the total number of deaths were still increasing by 25 percent per day, as they were during the middle of March, and if that trend somehow continued without slowing down, then every single person residing inside America’s borders would be dead before the end of May. Not kidding! But it’s also not happening.)
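If you want to check that claim, here's a quick sketch. The 25%-per-day rate comes from the paragraph above; the starting count of 100 deaths in mid-March is just an illustrative assumption of mine:

```python
import math

# How long until 25%-per-day growth would exceed the US population?
# The starting count of 100 deaths is an assumed, illustrative number;
# only the 25%/day growth rate comes from the text above.
start_deaths = 100
daily_factor = 1.25
population = 330_000_000

days = math.log(population / start_deaths) / math.log(daily_factor)
print(round(days))  # about 67 days, i.e. from mid-March to before the end of May
```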

However, judging by numbers released by the CDC and reported by my former colleague Ron Jenkins, I am quite confident that THE NUMBER OF CASES AND DEATHS FROM COVID-19 IS NO LONGER following a fixed exponential curve. Or at least, the daily rate of increase has been going down. Which is good. But it’s still not zero.

Let me show you the data and fitted curves in a number of graphs, which often make complex things easier to visualize and understand.

My first graph is the total reported number of deaths so far in the US, compared to a best-fit exponential graph:

During the first part of this pandemic – roughly the first 40 days – the data actually fit an exponential curve pretty well; that is, the red dotted line (the exponential curve of best fit) tracked the actual cumulative number of deaths (in blue). And that’s not good. However, since about day 50 (last week), the data have been WAY UNDER the red dots. To give you an idea of how much of a victory that is: find day 70, which is May 9, and follow the vertical line up until it meets the red dotted line. I’ll wait.

Did you find it? If this pandemic were still following exponential growth, now and into the future, at the same rate, we would have roughly a MILLION PEOPLE DEAD BY JUNE 9 in just the US, just from this disease, and 2 million the week after that, and 4 million the next week, then 8 million, then 16 million, and so on.

THAT AIN’T HAPPENIN’! YAY! HUZZAH!

As you can see, the blue and red graphs have diverged. Ignore the relatively high correlation value of 0.935 – that number is dominated by the early data, back when the exponential fit was still good, and it says nothing about the recent divergence.

But what IS the curve of best fit? I don’t know, so I’ll let you look for yourself.

Is it linear?

This particular line of best fit doesn’t match the data very well; however, if we start at day 36 or thereabouts, we could get a line that fits the data from there on pretty well, like so:

The purple line fits the blue dots quite well after about day 37 (about April 6), and the statistics algorithms agree. However, it still calls for over 80,000 Americans dead by May 8. I do not want the slope of that line to stay positive! I want the curve to bend over and run horizontally – meaning NOBODY ELSE DIES ANY MORE FROM THIS DISEASE.

Perhaps it’s not linear? Perhaps it’s one of those other types of equations you might remember from some algebra class, like a parabola, a cubic, or a quartic? Let’s take a look:

This is a parabolic function, or a quadratic. The red dots do fit the data pretty well. Unfortunately, we want the blue dots NOT to fit that graph, because that would, once again, mean about a hundred thousand people dead by May 8. That’s better than a million, but I want the deaths to stop increasing at all. Like this piecewise function (which some of you studied). Note that the purple line cannot go back downwards, because generally speaking, dead people cannot be brought back to life.

Well, does the data fit a cubic?

Unfortunately, this also fits pretty well. If it continues, we would still have about a hundred thousand dead by May 8, and the number would increase without limit (which, fortunately, is impossible).

How about a quartic (fourth-degree polynomial)? Let’s see:

I admit that the actual data, in blue, fit the calculated red quartic curve quite well, in fact the best so far, and the number of deaths by Day 70 is the lowest so far. But it’s impossible: for the curve to go downwards like that would mean that ten thousand people died and later came back to life. Nah, not happening.

What about logarithmic growth? That would actually be sweet – it’s a situation where a number rises quickly at first, but over time rises more and more slowly. Like this, in red:

I wish this described the real situation, but clearly, it does not.

One last option – a ‘power law’ where there is some fixed power of the date (in this case, the computer calculated it to be the date raised to the 5.377 power) which explains all of the deaths, like so:

I don’t think this fits the data very well, either. Fortunately. It’s too low from about day 29 to day 38, and much too high from day 50 onwards. Otherwise we would be looking at about 230,000 dead by day 70 (May 8).

But saying that the entire number of deaths in the US is no longer following a single exponential curve doesn’t quite do the subject justice. Exponential growth (or decay) simply means that in any given time period, the quantity you are measuring is increasing (or decreasing) by a fixed percentage (or fraction). That’s all. And, as you can see, for the past week, the daily percentage of increase in the total number of deaths has been in the range of three to seven percent. However, during the first part of March, the rate of increase in deaths was enormous: 20 to 40 percent PER DAY. And the daily percent of increase in the number of cases was at times over A HUNDRED PERCENT!!! – which is off the chart below.

The situation is still not good! If we stay stuck at a daily increase in deaths of even 3% per day, then we are all dead within a year. Obviously, and fortunately, that’s probably not going to happen, but it’s a bit hard to believe that the math works out that way.

But it does. Let me show you, using logs.

For simple round numbers, let’s say we have 50,000 poor souls who have died so far from this coronavirus in the USA right now, and that number of deaths is increasing at a rate of 3 percent per day. Let’s also say that the US has a population of about 330 million. The question is, when will we all be dead if that exponential growth keeps going on somehow? (Fortunately, it won’t.*) Here is the first equation, and then the steps I went through. Keep in mind that a growth of 3% per day means that you can multiply any day’s value by 1.03, or 103%, to get the next day’s value. Here goes:
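In code form, using the same round numbers, the steps look like this:

```python
import math

# Solve 50,000 * 1.03^n = 330,000,000 for n, step by step:
deaths_now = 50_000
population = 330_000_000
daily_factor = 1.03              # 3% growth per day

ratio = population / deaths_now  # so 1.03^n = 6,600
n = math.log(ratio) / math.log(daily_factor)
print(round(n))                  # about 298 days -- well under a year
```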

Sound unbelievable? To check it, take almost any calculator and raise 1.03 to the 300th power. You’ll get about 7,098. Now multiply that by the approximate number of people dead so far in the US, namely 50,000. You’ll get about 355,000,000 – well more than the total number of Americans.

So we still need to get that rate of increase in fatalities down, to basically zero. We are not there yet. With our current highly-incompetent national leadership, we might not.

===================================================================

* What happens in cases like this is that you get an S-shaped curve, called a logistic curve, in which the total number levels off after a while. That’s shown below. Still not pleasant.

I have no idea how to model this sort of problem with a logistic curve; for one thing, one would need to know what the total ‘carrying capacity’ – or total number of dead — would be if current trends continue and we are unsuccessful at stopping this virus. The epidemiologists and statisticians who make models for this sort of thing know a lot more math, stats, biology, and so on than I do, but even they are working with a whole lot of unknowns, including the rate of infectiousness, what fraction of the people feel really sick, what fraction die, whether you get immunity if you are exposed, what is the effect of different viral loads, and much more. This virus has only been out for a few months…
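That said, just to show the shape, here is a generic logistic function in Python. The carrying capacity K, growth rate r, and inflection day t0 below are made-up numbers for illustration only, not anything fitted to real data:

```python
import math

def logistic(t, K=100_000, r=0.15, t0=50):
    """S-shaped logistic curve: grows almost exponentially at first,
    then levels off at the carrying capacity K.
    K, r, and t0 here are illustrative made-up parameters."""
    return K / (1 + math.exp(-r * (t - t0)))

print(round(logistic(10)))    # early days: still small
print(round(logistic(50)))    # the inflection point: exactly K/2 = 50,000
print(round(logistic(120)))   # much later: has leveled off just under K
```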

What’s the best approach – should we lock down harder, or let people start to go back to work? Some countries have had lockdowns, others have not. How will the future play out? I don’t know. I do know that before we can decide, we need to have fast, plentiful, and accurate tests, so we can quarantine just the people who are infected or are carriers, and let everybody else get back on with their lives. We are doing this lockdown simply because we have no other choice.

Published on April 27, 2020 at 12:33 am

## How do we fix the CV19 testing problem? By re-testing everybody who tested positive!

I guess I’ve rediscovered a form of Bayes’ Theorem, regarding the problem posed by the high numbers of false negatives and false positives when testing for the feared coronavirus. What I found is that it doesn’t really matter whether our tests are super-accurate or not. The solution is to assume that all those who test negative really are negative, and then to give a second test to all those who tested positive the first time. Out of this group, a larger fraction will test positive. You can again forget about those who test negative, but re-test the positives, and if you like, test again. By the end of this process, in which you test fewer people each time, you will be over 99% certain that all those who test positive really have been exposed.

Let me show you why.

Have no fear, what I’m gonna do is just spreadsheets. No fancy math, just percents. And it won’t really matter what the starting assumptions are! The results converge to almost perfect accuracy, if repeated!

To start my explanation, let’s assume that 3% of a population (say, of the US) has antibodies to CV19, which means that they have definitely been exposed. How they got exposed is not important for this discussion. Whether they felt anything from their exposure is not important here either, nor is whether they got sick and died, or recovered. I will also assume that this test has a 7% false positive rate and a 10% false negative rate, and I’m going to assume that we give tests AT RANDOM to a hundred thousand people (not people whom we already think are sick!). I’m also assuming that once you have the antibodies, you keep them for the duration.

This table represents that situation:

If you do the simple arithmetic, using those assumptions, then of the 100,000 people we tested, 3%, or three thousand, actually do have those antibodies, but 97%, or ninety-seven thousand, do not (white boxes, first column with data in it).

Of the 3,000 folks who really do have the antibodies – first line of data – we have a false negative rate of 10%, so three hundred of these poor folks are given the false good tidings that they have never been exposed (that’s the upper orange box). The other 90% of them, or two thousand seven hundred, are told, correctly, that they have been exposed (that’s the upper green box).

Now of the 97,000 people who really do NOT have any antibodies – the second line of data – we have a false positive rate of 7%, so you multiply 0.07 times 97000 to get six thousand, seven hundred ninety of them who would be told, incorrectly, that they DID test positive for Covid-19 – in the lower orange box. (Remember, positive is bad here, and negative is good.) However, 90,210 would be told, correctly, that they did not have those antibodies. (That’s in the lower green box.)

Now let’s add up the folks who got the positive test results, which is the third data column. We had 2,700 who correctly tested positive and 6,790 who wrongly tested positive. That’s a total of 9,490 people with a positive CV19 antibody test, which means that of that group of people, only 28.5% were correctly so informed!! That’s between a third and a fourth! Unacceptable!
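If you’d rather check this with code than with a spreadsheet, here’s the same arithmetic in Python:

```python
# One round of testing 100,000 random people, with the assumptions above:
# 3% true prevalence, 10% false-negative rate, 7% false-positive rate.
tested = 100_000
prevalence, false_neg, false_pos = 0.03, 0.10, 0.07

have_antibodies = tested * prevalence              # 3,000 really exposed
no_antibodies = tested - have_antibodies           # 97,000 not exposed

true_positives = have_antibodies * (1 - false_neg) # 2,700 correctly told positive
false_positives = no_antibodies * false_pos        # 6,790 wrongly told positive

total_positives = true_positives + false_positives # 9,490 positive results in all
accuracy = true_positives / total_positives
print(round(accuracy * 100, 1))                    # 28.5 percent
```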

However, if we look at the last column, notice that almost every single person who was told that they were negative really was negative. (Dunno about you, but I think that 99.7% accuracy is pretty darned good!)

However, that 28.5% accuracy among the ‘positives’ (in the left-hand blue box) is really worrisome. What to do?

Simple! Test those folks again! Right away! Let’s do it, and then let’s look at the results:

Wowser! We took the 9490 people who tested positive and gave them another round of tests, using the exact same equipment and protocols and error rates as the first one. The spreadsheet is set up the same; the only thing I changed is the bottom two numbers in the first data column. I’m not going to go through all the steps, but feel free to check my arithmetic. Actually, check my logic. Excel doesn’t really make arithmetic errors, but if I set up the spreadsheet incorrectly, it will spit out incorrect results.

Notice that our error rate (in blue) is much lower in terms of those who tested positive. In fact, of those who test positive, 83.7% really ARE positive this time around, and of those who test negative, 95.9% really ARE negative.

But 84% isn’t accurate enough for me (it’s either a B or a C in most American schools). So what do we do? Test again – all of the nearly three thousand who tested positive this second time. Ignore the rest.

Let’s do it:

At this point, we have much higher confidence, 98.5% (in blue), that the people who tested ‘positive’, really are ‘positive’. Unfortunately, at this point, of the people who tested negative, only about 64% of the time is that correct. 243 people who really have the antibodies tested negative. So perhaps one should test that subgroup again.

The beautiful thing about this method is that it doesn’t even require a terribly exact test! But it does require that you do it repeatedly, and quickly.

Let me assure you that the exact level of accuracy, and the exact number of exposed people, doesn’t matter: If you test and re-test, you can find those who are infected with almost 100% accuracy. With that information you can then discover what the best approaches are to solving this pandemic, what the morbidity and mortality rates are, and eventually to stop it completely.
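For the spreadsheet-averse, the whole retesting procedure fits in a short Python function. This is just a sketch of my spreadsheet logic, and you should expect tiny rounding differences from the figures quoted above:

```python
def retest(prevalence, false_pos=0.07, false_neg=0.10, rounds=3, n=100_000):
    """Simulate re-testing only the positives, round after round.
    Returns the fraction of test-positives who are truly positive
    after each round."""
    have = n * prevalence        # people who really have antibodies
    lack = n - have              # people who do not
    accuracies = []
    for _ in range(rounds):
        true_pos = have * (1 - false_neg)
        wrong_pos = lack * false_pos
        accuracies.append(true_pos / (true_pos + wrong_pos))
        have, lack = true_pos, wrong_pos   # only the positives get retested
    return accuracies

for a in retest(0.03):
    print(round(a * 100, 1))   # roughly 28.5, 84, and 98.5 percent
```

Swap in any prevalence and error rates you like; the positive-accuracy figure climbs toward 100% either way.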

Why we don’t have enough tests to do this quickly and accurately and repeatedly is a question that I will leave to my readers.

Note that I made some starting assumptions. Let us change them and see what happens. Let’s suppose that the correct percentage of people with COVID-19 antibodies is not 3%, but 8%. Or maybe only 1%. Let’s also assume a 7% false positive and a 10% false negative rate. How would these results change? With a spreadsheet, that’s easy. First, let me start with an 8% infection rate and keep testing repeatedly. Here are the final results:

| Round | Positive accuracy rating | Negative accuracy rating |
|-------|--------------------------|--------------------------|
| 1     | 52.8%                    | 99.1%                    |
| 2     | 93.5%                    | 89.3%                    |
| 3     | 99.5%                    | 39.3%                    |

So after 3 rounds, we have 99.5% accuracy.

Let’s start over with a population where only 1% has the antibodies, and the false positive rate is 7% and the false negative rate is 10%.

| Round | Positive accuracy rating | Negative accuracy rating |
|-------|--------------------------|--------------------------|
| 1     | 11.5%                    | 99.9%                    |
| 2     | 62.6%                    | 98.6%                    |
| 3     | 95.6%                    | 84.7%                    |
| 4     | 99.6%                    | 30.0%                    |

This time, it took four rounds, but we still got to over 99.6% accuracy at distinguishing those who really had been exposed to this virus. Yes, towards the end our false negative rate rises, but I submit that doesn’t matter that much.
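By the way, the reason this works is exactly the Bayes update. Each positive test takes the prevalence p within the retested group and replaces it with p′ = 0.9p / (0.9p + 0.07(1 − p)). In code (again, expect rounding differences of a tenth of a percent or so from my spreadsheet figures):

```python
def bayes_update(p, false_pos=0.07, false_neg=0.10):
    """Posterior probability of true exposure after one more positive test."""
    hit = p * (1 - false_neg)    # truly exposed AND tests positive
    miss = (1 - p) * false_pos   # not exposed, but tests positive anyway
    return hit / (hit + miss)

p = 0.01   # start with 1% of the population actually exposed
for round_number in range(1, 5):
    p = bayes_update(p)
    print(round_number, round(p * 100, 1))
# climbs from about 11.5% in round 1 to more than 99% by round 4
```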

So Parson Tommy Bayes was right.

## More on the “false positive” COVID-19 testing problem

I used my cell phone last night to go into the problem of faulty testing for COVID-19, based on a NYT article. As a result, I couldn’t make any nice tables. Let me remedy that and also look at a few more assumptions.

This table summarizes the testing results on a theoretical group of a million Americans tested, assuming that 5% of the population actually has coronavirus antibodies, and that the tests being given have a false negative rate of 10% and a false positive rate of 3%. Reminder: a ‘false negative’ result means that you are told that you don’t have any coronavirus antibodies but you actually do have them, and a ‘false positive’ result means that you are told that you DO have those antibodies, but you really do NOT. I have tried to highlight the numbers of people who get incorrect results in the color red.

### Table A

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------|-------|------------|-----------------------------|-----------------------------|
| Actually Positive | 50,000 | 10% | 45,000 | 5,000 |
| Actually Negative | 950,000 | 3% | 28,500 | 921,500 |
| Totals | 1,000,000 | | 73,500 | 926,500 |
| Accuracy Rating | | | 61.2% | 99.5% |

Percent we assume are actually positive: 5%

As you can see, using those assumptions, if you get a lab test result that says you are positive, that will only be correct in about 61% of the time. Which means that you need to take another test, or perhaps two more tests, to see whether they agree.
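All of these tables follow the same arithmetic, so here is a small helper function, a sketch rather than anything official, that reproduces the two accuracy ratings for any set of assumptions:

```python
def accuracy_ratings(prevalence, false_pos, false_neg, n=1_000_000):
    """Return (positive accuracy, negative accuracy) for one round of
    testing n people with the given prevalence and error rates."""
    actually_pos = n * prevalence
    actually_neg = n - actually_pos
    test_pos = actually_pos * (1 - false_neg) + actually_neg * false_pos
    test_neg = actually_pos * false_neg + actually_neg * (1 - false_pos)
    pos_acc = actually_pos * (1 - false_neg) / test_pos  # positive accuracy
    neg_acc = actually_neg * (1 - false_pos) / test_neg  # negative accuracy
    return pos_acc, neg_acc

# Table A's assumptions: 5% exposed, 3% false positive, 10% false negative.
pos_acc, neg_acc = accuracy_ratings(0.05, 0.03, 0.10)
print(round(pos_acc * 100, 1), round(neg_acc * 100, 1))   # 61.2 99.5
```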

The next table assumes again a true 5% positive result for the population and a false negative rate of 10%, but a false positive rate of 14%.

### Table B

Assume 5% really exposed, 14% false positive rate, 10% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------|-------|------------|-----------------------------|-----------------------------|
| Actually Positive | 50,000 | 10% | 45,000 | 5,000 |
| Actually Negative | 950,000 | 14% | 133,000 | 817,000 |
| Totals | 1,000,000 | | 178,000 | 822,000 |
| Accuracy Rating | | | 25.3% | 99.4% |

Percent we assume are actually positive: 5%

Note that in this scenario, if you get a test result that says you are positive, that is only going to be correct one-quarter of the time (25.3%)! That is useless!

Now, let’s assume a lower percentage of the population actually has the COVID-19 antibodies, say, two percent. Here are the results if we assume a 3% false positive rate:

### Table C

Assume 2% really exposed, 3% false positive rate, 10% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------|-------|------------|-----------------------------|-----------------------------|
| Actually Positive | 20,000 | 10% | 18,000 | 2,000 |
| Actually Negative | 980,000 | 3% | 29,400 | 950,600 |
| Totals | 1,000,000 | | 47,400 | 952,600 |
| Accuracy Rating | | | 38.0% | 99.8% |

Percent we assume are actually positive: 2%

Notice that in this scenario, if you get a ‘positive’ result, it is likely to be correct only a little better than one-third of the time (38.0%).

And now let’s assume 2% actual exposure, 14% false positive, 10% false negative:

### Table D

Assume 2% really exposed, 14% false positive rate, 10% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------|-------|------------|-----------------------------|-----------------------------|
| Actually Positive | 20,000 | 10% | 18,000 | 2,000 |
| Actually Negative | 980,000 | 14% | 137,200 | 842,800 |
| Totals | 1,000,000 | | 155,200 | 844,800 |
| Accuracy Rating | | | 11.6% | 99.8% |

Percent we assume are actually positive: 2%

Once again, and even worse this time: the chances of a ‘positive’ test result being accurate are only about one in nine (11.6%), which means that this level of accuracy is not going to be useful to the public at large.

Final set of assumptions: 3% actual positive rate, and excellent tests with only 3% false positive and false negative rates:

### Table E

Assume 3% really exposed, 3% false positive rate, 3% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------|-------|------------|-----------------------------|-----------------------------|
| Actually Positive | 30,000 | 3% | 29,100 | 900 |
| Actually Negative | 970,000 | 3% | 29,100 | 940,900 |
| Totals | 1,000,000 | | 58,200 | 941,800 |
| Accuracy Rating | | | 50.0% | 99.9% |

Percent we assume are actually positive: 3%

Once again, if you test positive in this scenario, that result is only going to be correct about half of the time (50.0%).

All is not lost, however. Suppose we re-test all the people who tested positive in this last group (that’s 58,200 people, in Table E). Here are the results:

### Table F

Assume 50.0% really exposed, 3% false positive rate, 3% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------|-------|------------|-----------------------------|-----------------------------|
| Actually Positive | 29,100 | 3% | 28,227 | 873 |
| Actually Negative | 29,100 | 3% | 873 | 28,227 |
| Totals | 58,200 | | 29,100 | 29,100 |
| Accuracy Rating | | | 97.0% | 97.0% |

Percent we assume are actually positive: 50.0%

Notice that 97% accuracy rating for positive results! Much better!

What about our earlier scenario, in table B, with a 5% overall exposure rating, 14% false positives, and 10% false negatives — what if we re-test all the folks who tested positive? Here are the results:

### Table G

Assume 25.3% really exposed, 14% false positive rate, 10% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------|-------|------------|-----------------------------|-----------------------------|
| Actually Positive | 45,000 | 10% | 40,500 | 4,500 |
| Actually Negative | 133,000 | 14% | 18,620 | 114,380 |
| Totals | 178,000 | | 59,120 | 118,880 |
| Accuracy Rating | | | 68.5% | 96.2% |

Percent we assume are really positive: 25.3%

This is still not very good: the re-test is going to be accurate only about two-thirds of the time (68.5%) when it says you really have been exposed, though it would correctly clear you about 96% of the time. So we would need to run yet another test on those who again tested positive in Table G. If we do it, the results are here:

### Table H

Assume 68.5% really exposed, 14% false positive rate, 10% false negative

| Group | Total | Error rate | Test says they are Positive | Test says they are Negative |
|-------|-------|------------|-----------------------------|-----------------------------|
| Actually Positive | 40,500 | 10% | 36,450 | 4,050 |
| Actually Negative | 18,620 | 14% | 2,607 | 16,013 |
| Totals | 59,120 | | 39,057 | 20,063 |
| Accuracy Rating | | | 93.3% | 79.8% |

Percent we assume are really positive: 68.5%

This result is much better, but note that this requires THREE TESTS on each of these supposedly positive people to see if they are in fact positive. It also means that if they get a ‘negative’ result, that’s likely to be correct only about four-fifths of the time (79.8%).

So, no wonder that a lot of the testing results we are seeing are difficult to interpret! This is why science requires repeated measurements to separate the truth from fiction! And it also explains some of the snafus committed by our current federal leadership in insisting on not using tests offered from abroad.

============

EDIT at 10:30 pm on 4/25/2020: I found a few minor mistakes and corrected them, and tried to format things more clearly.


## COVID-19 Numbers in the US do not seem to be growing exponentially

Looking at the past month of CDC-reported infections and deaths from the new corona virus, I conclude that there has been some good news: the total number of infections and deaths are no longer following an exponential growth curve.

The numbers are indeed growing, by either a quadratic (that is, x^2) or a quartic (x^4) curve, which is not good, and there is no sign of numbers decreasing.

BUT it looks as though the physical-social distancing and self-quarantining that I see going on around me is actually having an effect.

Yippee!

Here is my evidence: the actual numbers of infected people are in blue, and the best-fit exponential-growth equation is in red. You can see that they do not match well at all.

If they did match, and if this were in fact exponential growth, we would have just about the entire US population infected by the end of just this month of April – over 300 million! That no longer seems likely. Take a look at the next graph instead, which uses the same data, but polynomial growth:

Just by eyeballing this, you can see that the red dots and blue dots match really, really well. When I extend the graph until the end of April, I get a predicted number of ‘only’ 1.5 million infected. Not good, but a whole lot better than the entire US population!
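If you’d like to reproduce this kind of curve fit without Excel, numpy can do it. The data below are synthetic stand-ins (I’m not reproducing the actual CDC numbers here), just to show the mechanics of fitting a polynomial and reading off R-squared:

```python
import numpy as np

days = np.arange(1, 41)
# Synthetic, roughly-quartic "case counts" with a small wobble added in;
# these are assumed numbers for illustration, not real data.
cases = 0.05 * days**4 * (1 + 0.02 * np.sin(days))

coeffs = np.polyfit(days, cases, deg=4)       # fourth-degree fit, like Excel's
fitted = np.polyval(coeffs, days)

ss_res = np.sum((cases - fitted) ** 2)        # residual sum of squares
ss_tot = np.sum((cases - cases.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))                    # very close to 1 for a good fit
```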

Also, let’s look at total cumulative reported deaths so far. Here are the CDC-reported numbers plotted against a best-fit exponential curve:

Up until just a few days ago, this graph was conforming pretty well to exponential growth. However, since about April 8, that seems to be no longer the case. If the total numbers of deaths were in fact growing at the same percentage rate each day, which is the definition of exponential growth, then by the end of April we would have 1.5 million DEAD. That’s THIS MONTH. Continued exponential growth would have 1.2 BILLION dead in this country alone by the end of May.

Fortunately, that is of course impossible.

Unfortunately all that means is that the virus would run out of people to infect and kill, and we would get logistic growth (which is the very last graph, at the bottom).

This fourth-degree mathematical model seems to me to work much better at describing the numbers of deaths so far, and has a fairly good chance of predicting what may be coming up in the near future. It’s still not a good situation, but it shows to me that the social and physical distancing we are doing is having a positive effect.

But let’s not get complacent: if this model correctly predicts the next month or two, then by the end of April, we would have about 60 thousand dead, and by the end of May we would have 180 thousand dead.

But both of those grim numbers are much, much lower than we would have if we were not doing this self-isolation, and if the numbers continued to grow exponentially.

====================================================================

FYI, a logistic curve is shown below. Bacteria or fungi growing in a broth will grow exponentially at first, but after a while, they not only run out of fresh broth to eat, but they also start fouling their own environment with their own wastes. WE DO NOT WANT THIS SITUATION TO HAPPEN WITH US, NAMELY, THAT WE ALL GET INFECTED!!!

## Various graphs for deaths from COVID-19, so far

I wrote that I would show you what various graphs of various types of simple models look like for deaths so far due to the current corona virus: linear, exponential, polynomial, and so on. I think that a fourth-degree polynomial (not third-degree, like I wrote earlier) seems to fit the data best so far, and that’s better than exponential growth.

First, let’s look at a straight-line best-fit model, superimposed by Excel on the data. (Note: deaths are on the Y, or vertical, axis; the X-axis represents days since the beginning of March, so today, the 6th of April, is day 37 (31 + 6). The dotted red line represents the line of best fit, and the blue dots are the CDC-announced numbers of deaths so far.)

As you can see, the straight dotted line doesn’t fit the data very well at all. R-squared – the square of the correlation coefficient R, also called the coefficient of determination – tells us numerically how well it fits. If R or R^2 equals 1.000, then you have absolutely perfect correlation of the data to your model. Which we do NOT have here. By the way, in that model, by mid-June we would have about 22,000 dead from this disease.
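For anyone curious how a line of best fit and R-squared are actually computed, here’s the arithmetic from scratch, on a toy data set (points lying exactly on a straight line, so R-squared comes out to a perfect 1):

```python
# Least-squares line of best fit and R-squared, computed by hand.
xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]   # toy data lying exactly on the line y = 2x + 1

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept:
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# R-squared = 1 - (residual variation) / (total variation)
fitted = [slope * x + intercept for x in xs]
ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot
print(slope, intercept, r_squared)   # 2.0 1.0 1.0
```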

OK, let’s look at an exponential curve-of-best fit next:

As you can see, this red curve fits the data a LOT better, and R-squared is a lot higher.

Unfortunately.

We do NOT WANT EXPONENTIAL GROWTH OF THIS OR ANY OTHER DISEASE, BECAUSE IT MEANS WE ALL GET IT! In fact, if this model is accurate and isn’t slowed down, then by mid-June, just plugging in the numbers, we would have 3.3 BILLION (not million) people dead in the US alone. Fortunately, that won’t happen.

BUT there are some parts of the data where the curve doesn’t fit perfectly — let me point them out:

At the upper right-hand end, the red dotted line is quite a bit higher than the blue dots. Fortunately. And near the middle of the graph, the blue dots of death are higher than the red line.

OK, let’s look at some polynomial models instead:

This is a fancy version of the simple y=x^2 parabolas you may have graphed in Algebra 1. Once again, this doesn’t do a terrific job of conforming to the actual data. At the right-hand end, the blue dots of death are higher than the curve. In addition, if we continued the red curve to the left, we would find that something like two thousand people had already died in the US, and presumably came back to life. Which is ridiculous.

However, if this model were to hold true until mid-June, we would have 127 thousand dead. Not good.

Let’s try a third-degree polynomial (a cubic):

That’s pretty remarkable agreement between the data and the equation! That’s the equation I was using in my earlier post. The R-squared correlation is amazing. Unfortunately, if this continues to hold, then we would have about 468 thousand dead in the US.

Let’s continue by looking at a fourth-degree polynomial curve fitted to the data:

That is an amazingly good fit to the data! Unfortunately, let’s hope that it won’t continue to fit the data, because if it does, then we are looking at a little over a MILLION dead.

Let’s hope we can get these totals to level off by physically distancing ourselves from other households, washing our hands, and getting proper protective garments and testing technology to our medical personnel.

=============

Here’s another model that unfortunately does NOT fit: logarithmic growth. If it did, then we would have only about 10,700 deaths by mid-June.