For a few weeks now, my ears, eyes and thoughts have been consumed with coronavirus issues - status, policy making, crisis management. Early in March, my co-chairs and I pulled the plug on the in-person version of our TEIS academic workshop (we held a very successful event over Zoom instead of at the original Newport Beach location). Next, I conducted my final class of the quarter over Zoom rather than in person. And then, as the picture started looking bleaker, my friends and I began discussing broader community- and state-level actions that were needed to confront the impending crisis. We put out a blog (Needed: Bold Decisions to Stop Covid-19) and a change.org petition on March 12, calling for suspension of in-person activities at schools, universities and other places of gathering. Around this time, I started laying my hands on daily data about the spread of coronavirus in the US.
This blog is based on looking at the data over the last few days, trying to get a sense of how bad things are, their trends, and whether institutional measures are having a mitigating effect. My analysis is based on daily data from https://covidtracking.com/; see the section on “Data Source” and “Data Limitations and Data Quality” for a discussion of data limitations and the need to interpret any analyses cautiously.
For many weeks after this crisis hit the US, the main story was the growing number of infections and the lack of testing. During March, testing capacity has increased rapidly - from about 500 tests per day on March 2 to 50,000 tests per day on March 22 - although it still appears not to be enough, especially in certain states.
We see a huge growth in infections, and part of it could well be due to increased testing. But, before digging into it, you might want to check the section “Data Limitations and Data Quality”.
Is the situation getting better or worse, and at what rate? Are the control measures working? Let’s look at what the data tell us about whether this is getting worse. Consider the increase in known infections i.e., positive test results.
Some of this increase is because coronavirus infections are spreading, but some of it is also because the volume of testing is growing rapidly.
At one extreme, it could be that a large-enough segment of the population was infected some days ago - and there have been no new infections since then - but as we’re testing more people we’re catching more infected ones. At the other extreme, it could be that more people are getting infected each day - exponentially more - and that is why they show up in the results when we test more people.
To visualize this, suppose that at some day, the mix of infected and non-infected people looked like the picture below. The red dots are infected individuals, the orange ones are not infected but otherwise indistinguishable from them, and the green ones are healthy.
Suppose that the red-orange-green status and mix remained constant. That is, many people are already infected - but no more new infections! Now imagine a circular net, representing people we picked up to test one day. The number of positive results for the day (new infections) will depend on the size of the net, the method for picking whom to test (we’ll ignore this for a moment), and the mix of red and orange (which remains constant). Therefore, if we didn’t change the method for picking whom to test, then the number of positive test results should be somewhat proportional to the size of the net. The more you test, the more show up infected - but the proportion or percentage of people who show up positive should remain the same regardless of the size of the net. (Recall this is the case where the red-orange status remains static.)
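The "net" thought experiment can be tried out with a small simulation. This is only a sketch under made-up assumptions: the population size, the 10% infected share, and the net sizes are all hypothetical, and the net picks people purely at random (which real testing does not).

```python
import random

def positives_in_net(population, net_size, rng):
    """Draw net_size people at random and count how many are infected."""
    sample = rng.sample(population, net_size)
    return sum(sample)

# Fixed seed so repeated runs give the same draw.
rng = random.Random(0)

# Hypothetical static population: 1 = infected, 0 = not infected (10% infected).
population = [1] * 10_000 + [0] * 90_000

results = {}
for net_size in (1_000, 5_000, 20_000):
    positives = positives_in_net(population, net_size, rng)
    rate = positives / net_size
    results[net_size] = (positives, rate)
    print(f"net={net_size:>6}  positives={positives:>5}  positive rate={rate:.3f}")
```

The count of positives grows with the size of the net, while the positive rate stays close to the underlying 10% share, which is the static case described above.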
What do the numbers tell us?
The left panel shows what percentage of daily test results are positive (again, with caveats - no change in the method for deciding whom to test, etc.). The percentage is up and down a bit, and increasing. This suggests we’re somewhere between the two extremes discussed. If there were no new infections the positive rate would have been nearly flat (again, assuming we aren’t picking more from the “more likely to be infected” group).
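For readers who want to reproduce a daily positive rate, here is a minimal sketch. It assumes that the n.pos and n.result columns in the national table (see the data source section of this post) are cumulative totals, which the steadily rising values suggest. The function name and the choice of the last five days are mine, for illustration only.

```python
# (date, cumulative positives, cumulative results), copied from the national table.
rows = [
    ("2020-03-20", 17038, 135185),
    ("2020-03-21", 23203, 179112),
    ("2020-03-22", 31888, 225351),
    ("2020-03-23", 42164, 279485),
    ("2020-03-24", 51970, 344728),
]

def daily_positive_rate(rows):
    """Share of each day's new test results that came back positive."""
    rates = []
    for (_, pos_prev, res_prev), (date, pos, res) in zip(rows, rows[1:]):
        new_pos = pos - pos_prev      # positives reported that day
        new_res = res - res_prev      # all results reported that day
        # Guard against days with no new results (stale or bulk reporting).
        rates.append((date, new_pos / new_res if new_res > 0 else None))
    return rates

rates = daily_positive_rate(rows)
for date, rate in rates:
    print(date, f"{rate:.1%}")
```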
Do the numbers convey good news or dire news? Too early to tell. There’s evidence of increasing spread, but there’s also just too much “noise” in the data. We’re mixing in numbers across states and counties that are extremely different in terms of their infected populations, testing regimes, etc. But - bottom line - the data certainly do not convey “good news”.
One metric everyone is interested in is “how quickly is the cumulative number of known infections growing each day?” For instance, if we had 100 known infections until yesterday, and then learnt about another 20 new infections in the last day, then the growth rate (over cumulative) is 20%. Let’s look at these numbers.
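The growth rate over cumulative can be written as a small helper. This is a sketch, not the code behind the charts: the function name is hypothetical, and the second series is the cumulative n.pos column for March 20-24 from the national table in this post.

```python
def growth_over_cumulative(cumulative):
    """New cases each day divided by the cumulative count up to the day before.

    Assumes every count is positive, so the division is always defined.
    """
    return [
        (today - yesterday) / yesterday
        for yesterday, today in zip(cumulative, cumulative[1:])
    ]

# The worked example from the text: 100 known infections, then 20 new ones.
example = growth_over_cumulative([100, 120])
print(f"{example[0]:.0%}")

# Cumulative positives, March 20-24, from the national table.
national_pos = [17038, 23203, 31888, 42164, 51970]
national_growth = growth_over_cumulative(national_pos)
print([f"{g:.1%}" for g in national_growth])
```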
With just a few days of observations, the daily rate of growth bounces around. It is also very sensitive to small measurement errors or overlaps (allocating infections on a particular day to the previous day or the next day).
However, once we have enough data points then we can look for an approximate constant daily rate of growth (say r).
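One way to look for an approximately constant r is a least-squares line through the logarithm of the cumulative counts: if the count grows like (1+r)^t, then its log is linear in t with slope log(1+r). This is my choice of method for illustration, not necessarily how r is estimated elsewhere in this post, and the function name is hypothetical.

```python
import math

def fit_constant_growth(cumulative):
    """Estimate a constant daily growth rate r from cumulative counts.

    Fits log(count) = a + slope * day by ordinary least squares and
    returns r = exp(slope) - 1. Assumes one observation per day, in order,
    and strictly positive counts.
    """
    n = len(cumulative)
    days = list(range(n))
    logs = [math.log(c) for c in cumulative]
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    sxx = sum((x - mean_x) ** 2 for x in days)
    return math.exp(sxy / sxx) - 1

# Sanity check on synthetic data that grows by exactly 25% per day.
synthetic = [100 * 1.25 ** t for t in range(10)]
r_synthetic = fit_constant_growth(synthetic)
print(f"recovered r = {r_synthetic:.4f}")
```

With real data the fit only summarizes the window it is given, so a rate that is changing (for example, because restrictions are taking effect) will be averaged over.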
The big question - apart from “what is happening” - is “what should we do about it?” We’ve seen several measures such as lockdowns at the state and federal level.
First, there is no question that significant restrictions - such as a lockdown - are needed. Despite some who argue that this is an overreaction, the reality is that the infection has spread undetected among the population, that many more people have the virus than we know, and that these unknown asymptomatic individuals will keep propagating the spread. Slowing the spread down - which the restrictions do - will enable us to “flatten the curve” and make it more manageable for hospitals and other responders. There are others who argue that current lockdown measures are too untargeted (and based on data which is highly imprecise) - and that more precise targeting would still limit the spread but reduce the accompanying harm (see, e.g., https://www.dailywire.com/news/stanford-professor-data-indicates-were-overreacting-to-coronavirus). Stated in simple words, their argument is that it’s like getting out the big cannons to shoot a few ants. If we could compute a precise targeting solution, we should. However, even then, it may not be implementable. The population is simply not capable of reacting properly to a highly precise, multidimensional set of lockdown conditions. A general lockdown is more likely to be enforced and implemented.
The second question is, do we know that the lockdowns are having a positive effect? If the lockdowns and other restrictions are working, that should slow the rate of spread of the infected population (from what it was prior to the lockdowns), and this should show up in the numbers of people who test positive, the rate of positive test results, and the rate of growth of the infected population. But there are a lot of important caveats to keep in mind.
Lesson: Don’t give up. Don’t think that the measures are not working just because you don’t observe quick results.
With all these caveats, looking at the data, there’s some reason to believe that the measures are beginning to take effect - the growth rate curve is trending downwards. It will be useful to look at this by state.
Let’s look ahead to see what we can do with growth rate r once we have it.
An important question on everyone’s mind is “how many days does it take to double the number of infections”. We’re seeing very rapid doubling in some countries, e.g., in Spain it is about 2 days to double.
So, we can ask how many days d(r) it takes to double the number of infections when the daily rate of growth is r. The relationship is (1+r)^d = 2. Taking logs, log(2) = d*log(1+r), so that d = log(2)/log(1+r). (Note: log here is the natural logarithm, base e, although any base gives the same ratio.)
At a 40% growth rate, the total number of infections doubles almost every 2 days. If we can reduce the rate of growth from 40% to 30% to 25%, the doubling-period goes up from about 2 days to 2.6 days to about 3.1 days. Then reducing again from 25% to 20% growth rate takes us from 3.1 days to 3.8 days, and then if we reduce again to 10% it goes up to over 7 days!
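The doubling periods quoted above follow directly from d = log(2)/log(1+r). A short sketch to check them (the function name is mine):

```python
import math

def doubling_time(r):
    """Days needed to double at a constant daily growth rate r (e.g. 0.25 for 25%)."""
    if r <= 0:
        raise ValueError("growth rate must be positive for the count to double")
    return math.log(2) / math.log(1 + r)

for r in (0.40, 0.30, 0.25, 0.20, 0.10):
    print(f"r = {r:.0%}: doubles in about {doubling_time(r):.1f} days")
```

Note how nonlinear the relationship is: each cut in the growth rate buys more doubling time than the previous one, which is why going from 20% to 10% matters so much.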
A few things.
Testing capacity (though still quite low in an absolute sense) is increasing rapidly. That’s good!
Each day, we’re seeing roughly the same percentage of positive results (among all tests) - which is partially good news.
Each day we’re getting many more known infections (20-40% new ones, relative to the cumulative until then) - which is scary - but a lot of this is due to increased testing. However, it does suggest that a large number of untested people are infected.
Drawing sharp conclusions from national-level data is too dicey at the moment, because the national numbers hide a lot of across-state heterogeneity in how tests are done, who’s tested, how they’re counting various things, levels of lockdown and other measures, etc. But the lesson from that is: be cautious until you know solidly that there’s no reason to be concerned.
Next, we’ll want to look at state level data. We do this below - but with a huge caveat: state-level numbers are still too sketchy and finicky, so any insights are probably even more faulty.
Let’s pull in state-level data and focus on a few states.
date | state | n.pos | n.neg | n.result | n.pend | n.death | n.total |
---|---|---|---|---|---|---|---|
2020-03-24 | CA | 2102 | 13452 | 15554 | 12100 | 40 | 27654 |
2020-03-23 | CA | 1733 | 12567 | 14300 | 12100 | 27 | 26400 |
2020-03-22 | CA | 1536 | 11304 | 12840 | NA | 27 | 12840 |
2020-03-21 | CA | 1279 | 11249 | 12528 | NA | 24 | 12528 |
2020-03-20 | CA | 1063 | 10424 | 11487 | NA | 20 | 11487 |
2020-03-19 | CA | 924 | 8787 | 9711 | NA | 18 | 9711 |
2020-03-18 | CA | 611 | 7981 | 8592 | NA | 13 | 8592 |
2020-03-17 | CA | 483 | 7981 | 8464 | NA | 11 | 8407 |
2020-03-16 | CA | 335 | 7981 | 8316 | NA | 6 | 8316 |
2020-03-15 | CA | 293 | 916 | 1209 | NA | 5 | 1209 |
2020-03-14 | CA | 252 | 916 | 1168 | NA | 5 | 1168 |
2020-03-13 | CA | 202 | 916 | 1118 | NA | 4 | 1118 |
2020-03-12 | CA | 202 | 916 | 1118 | NA | 4 | 1118 |
2020-03-11 | CA | 157 | 916 | 1073 | NA | NA | 1073 |
2020-03-10 | CA | 133 | 690 | 823 | NA | NA | 823 |
2020-03-09 | CA | 114 | 690 | 804 | NA | NA | 804 |
2020-03-08 | CA | 88 | 462 | 550 | NA | NA | 550 |
2020-03-07 | CA | 69 | 462 | 531 | NA | NA | 531 |
2020-03-06 | CA | 60 | 462 | 522 | NA | NA | 522 |
2020-03-05 | CA | 53 | 462 | 515 | NA | NA | 515 |
2020-03-04 | CA | 53 | 462 | 515 | NA | NA | 515 |
Let’s look at the data for California.
The reported number of infections is growing in the state.
There’s evidently some flaw in the data here, in terms of tests conducted each day. It seems a few days of test results were reported in bulk and assigned to a single date. More importantly, it is bothersome that the number of tests per day is not increasing, like it is nationally and like it is in states like New York (see below).
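The reporting flaw is easy to see by differencing the cumulative negative counts in the California table above. This sketch assumes n.neg is a cumulative total; the variable names and the choice of dates are mine.

```python
# (date, cumulative negatives) for California, copied from the table above.
ca_neg = [
    ("2020-03-14", 916),
    ("2020-03-15", 916),
    ("2020-03-16", 7981),
    ("2020-03-17", 7981),
    ("2020-03-18", 7981),
    ("2020-03-19", 8787),
    ("2020-03-20", 10424),
]

# Daily change in the cumulative count.
daily = [
    (date, neg - neg_prev)
    for (_, neg_prev), (date, neg) in zip(ca_neg, ca_neg[1:])
]

# Days on which the negative count did not move at all (likely not updated).
stale_days = [date for date, change in daily if change == 0]

# The single largest one-day jump (likely several days reported in bulk).
biggest_jump = max(daily, key=lambda item: item[1])

print("no change reported on:", stale_days)
print("largest one-day jump:", biggest_jump)
```

Days with zero change next to a single very large jump are the pattern one would expect if several days of results were assigned to one date.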
And the rate of positive test results each day?
Now we can compute the daily change as an absolute difference and as a rate of growth.
Let’s look at the data for New York.
date | state | n.pos | n.neg | n.result | n.pend | n.death | n.total |
---|---|---|---|---|---|---|---|
2020-03-24 | NY | 25665 | 65605 | 91270 | NA | 210 | 91270 |
2020-03-23 | NY | 20875 | 57414 | 78289 | NA | 114 | 78289 |
2020-03-22 | NY | 15168 | 46233 | 61401 | NA | 114 | 61401 |
2020-03-21 | NY | 10356 | 35081 | 45437 | NA | 44 | 45437 |
2020-03-20 | NY | 7102 | 25325 | 32427 | NA | 35 | 32427 |
2020-03-19 | NY | 4152 | 18132 | 22284 | NA | 12 | 22284 |
2020-03-18 | NY | 2382 | 12215 | 14597 | NA | 12 | 14597 |
2020-03-17 | NY | 1700 | 5506 | 7206 | NA | 7 | 7206 |
2020-03-16 | NY | 950 | 4543 | 5493 | NA | 7 | 5493 |
2020-03-15 | NY | 729 | 4543 | 5272 | NA | 3 | 5272 |
2020-03-14 | NY | 524 | 2779 | 3303 | NA | NA | 3303 |
2020-03-13 | NY | 421 | 2779 | 3200 | NA | NA | 3200 |
2020-03-12 | NY | 216 | NA | NA | NA | NA | 216 |
2020-03-11 | NY | 216 | NA | NA | NA | NA | 216 |
2020-03-10 | NY | 173 | 92 | 265 | NA | NA | 265 |
2020-03-09 | NY | 142 | 92 | 234 | NA | NA | 234 |
2020-03-08 | NY | 105 | 92 | 197 | NA | NA | 197 |
2020-03-07 | NY | 76 | 92 | 168 | 236 | NA | 404 |
2020-03-06 | NY | 33 | 92 | 125 | 236 | NA | 361 |
2020-03-05 | NY | 22 | 76 | 98 | 24 | NA | 122 |
2020-03-04 | NY | 6 | 48 | 54 | 24 | NA | 78 |
The reported number of infections is growing in the state.
And the rate of positive test results each day?
Now we can compute the daily change as an absolute difference and as a rate of growth.
Let’s look at the data for Washington.
date | state | n.pos | n.neg | n.result | n.pend | n.death | n.total |
---|---|---|---|---|---|---|---|
2020-03-24 | WA | 2221 | 31712 | 33933 | NA | 110 | 33933 |
2020-03-23 | WA | 1996 | 28879 | 30875 | NA | 95 | 30875 |
2020-03-22 | WA | 1793 | 25328 | 27121 | NA | 94 | 27121 |
2020-03-21 | WA | 1524 | 21719 | 23243 | NA | 83 | 23243 |
2020-03-20 | WA | 1376 | 19336 | 20712 | NA | 74 | 20712 |
2020-03-19 | WA | 1187 | 15918 | 17105 | NA | 66 | 17105 |
2020-03-18 | WA | 1012 | 13117 | 14129 | NA | 52 | 14129 |
2020-03-17 | WA | 904 | 11582 | 12486 | NA | 48 | 12486 |
2020-03-16 | WA | 769 | 9451 | 10220 | NA | 42 | 10220 |
2020-03-15 | WA | 642 | 7122 | 7764 | NA | 40 | 7764 |
2020-03-14 | WA | 568 | 6001 | 6569 | NA | 37 | 6569 |
2020-03-13 | WA | 457 | 4350 | 4807 | NA | 31 | 4807 |
2020-03-12 | WA | 337 | 3037 | 3374 | NA | 29 | 3403 |
2020-03-11 | WA | 267 | 2175 | 2442 | NA | 24 | 2466 |
2020-03-10 | WA | 162 | 1110 | 1272 | NA | NA | 1272 |
2020-03-09 | WA | 136 | 1110 | 1246 | NA | NA | 1246 |
2020-03-08 | WA | 102 | 640 | 742 | 60 | NA | 802 |
2020-03-07 | WA | 102 | 370 | 472 | 66 | NA | 538 |
2020-03-06 | WA | 79 | 370 | 449 | NA | NA | 449 |
2020-03-05 | WA | 70 | NA | NA | NA | NA | 70 |
2020-03-04 | WA | 39 | NA | NA | NA | NA | 39 |
The reported number of infections is growing in the state.
And the rate of positive test results each day?
Now we can compute the daily change as an absolute difference and as a rate of growth.
Let’s look at the data for Massachusetts.
date | state | n.pos | n.neg | n.result | n.pend | n.death | n.total |
---|---|---|---|---|---|---|---|
2020-03-24 | MA | 1159 | 12590 | 13749 | NA | 11 | 13749 |
2020-03-23 | MA | 777 | 8145 | 8922 | NA | 9 | 8922 |
2020-03-22 | MA | 646 | 5459 | 6105 | NA | 5 | 6128 |
2020-03-21 | MA | 525 | 4752 | 5277 | NA | 1 | 5277 |
2020-03-20 | MA | 413 | 3678 | 4091 | NA | 1 | 4091 |
2020-03-19 | MA | 328 | 2804 | 3132 | NA | NA | 3132 |
2020-03-18 | MA | 256 | 2015 | 2271 | NA | NA | 2271 |
2020-03-17 | MA | 218 | 1541 | 1759 | NA | NA | 1759 |
2020-03-16 | MA | 164 | 352 | 516 | NA | NA | 516 |
2020-03-15 | MA | 138 | 352 | 490 | NA | NA | 490 |
2020-03-14 | MA | 138 | 352 | 490 | NA | NA | 490 |
2020-03-13 | MA | 123 | 92 | 215 | NA | NA | 215 |
2020-03-12 | MA | 95 | NA | NA | NA | NA | 95 |
2020-03-11 | MA | 92 | NA | NA | NA | NA | 92 |
2020-03-10 | MA | 92 | NA | NA | NA | NA | 92 |
2020-03-09 | MA | 41 | NA | NA | NA | NA | 41 |
2020-03-08 | MA | 13 | NA | NA | NA | NA | 13 |
2020-03-07 | MA | 13 | NA | NA | NA | NA | 13 |
2020-03-06 | MA | 8 | NA | NA | NA | NA | 8 |
2020-03-05 | MA | 2 | NA | NA | NA | NA | 2 |
2020-03-04 | MA | 2 | NA | NA | NA | NA | 2 |
The reported number of infections is growing in the state.
And the rate of positive test results each day?
Now we can compute the daily change as an absolute difference and as a rate of growth.
Let’s look at the data for Texas.
The reported number of infections is growing in the state.
And the rate of positive test results each day?
Now we can compute the daily change as an absolute difference and as a rate of growth.
Let’s look at the data for Florida.
date | state | n.pos | n.neg | n.result | n.pend | n.death | n.total |
---|---|---|---|---|---|---|---|
2020-03-24 | FL | 1412 | 13127 | 14539 | 1008 | 18 | 15547 |
2020-03-23 | FL | 1171 | 11063 | 12234 | 860 | 14 | 13094 |
2020-03-22 | FL | 830 | 7990 | 8820 | 963 | 13 | 9783 |
2020-03-21 | FL | 658 | 6579 | 7237 | 1002 | 12 | 8239 |
2020-03-20 | FL | 520 | 1870 | 2390 | 1026 | 10 | 3416 |
2020-03-19 | FL | 390 | 1533 | 1923 | 1019 | 8 | 2942 |
2020-03-18 | FL | 314 | 1225 | 1539 | 954 | 7 | 2493 |
2020-03-17 | FL | 186 | 940 | 1126 | 872 | 6 | 1998 |
2020-03-16 | FL | 141 | 684 | 825 | 514 | 4 | 1339 |
2020-03-15 | FL | 116 | 678 | 794 | 454 | 4 | 1248 |
2020-03-14 | FL | 77 | 478 | 555 | 221 | 3 | 776 |
2020-03-13 | FL | 50 | 478 | 528 | 221 | 2 | 749 |
2020-03-12 | FL | 32 | 301 | 333 | 147 | 2 | 480 |
2020-03-11 | FL | 28 | 301 | 329 | 147 | 2 | 476 |
2020-03-10 | FL | 19 | 222 | 241 | 155 | NA | 396 |
2020-03-09 | FL | 18 | 140 | 158 | 115 | NA | 273 |
2020-03-08 | FL | 17 | 118 | 135 | 108 | NA | 243 |
2020-03-07 | FL | 14 | 100 | 114 | 88 | NA | 202 |
2020-03-06 | FL | 9 | 55 | 64 | 51 | NA | 115 |
2020-03-05 | FL | 9 | 31 | 40 | 69 | NA | 109 |
2020-03-04 | FL | 2 | 24 | 26 | 16 | NA | 42 |
The reported number of infections is growing in the state.
And the rate of positive test results each day?
Now we can compute the daily change as an absolute difference and as a rate of growth.
The first question is where to get the data, and whose data to believe. For instance, on the morning of March 18, the New York Times’ updated number (March 18, 10 am, based on Johns Hopkins University) was 5,879, but the Johns Hopkins page itself showed 6,519 confirmed cases. The CDC puts out numbers daily (updated at 4pm ET) but this appears to be a substantial undercount relative to others: 4,226. This site shows a history of daily numbers (in US at 4pm EDT): https://covidtracking.com/, with a March 17 update of 5,723 cases.
I’ll use the covidtracking site in this analysis, mainly because it provides a running table of data vs. just current reports. Here’s a look at the data. n.states is the number of states with known infections (including Puerto Rico and other territories). The next 3 columns are about known test results (positive, negative, total) and pending results.
date | n.states | n.pos | n.neg | n.result | n.pend | n.hosp | n.death | n.total |
---|---|---|---|---|---|---|---|---|
2020-03-24 | 56 | 51970 | 292758 | 344728 | 14433 | 4468 | 675 | 359161 |
2020-03-23 | 56 | 42164 | 237321 | 279485 | 14571 | 3325 | 471 | 294056 |
2020-03-22 | 56 | 31888 | 193463 | 225351 | 2842 | 2554 | 398 | 228216 |
2020-03-21 | 56 | 23203 | 155909 | 179112 | 3477 | 1964 | 272 | 182589 |
2020-03-20 | 56 | 17038 | 118147 | 135185 | 3336 | NA | 219 | 138521 |
2020-03-19 | 56 | 11723 | 89119 | 100842 | 3025 | NA | 160 | 103867 |
2020-03-18 | 56 | 7731 | 66225 | 73956 | 2538 | NA | 112 | 76495 |
2020-03-17 | 56 | 5723 | 47604 | 53327 | 1687 | NA | 90 | 54957 |
2020-03-16 | 56 | 4019 | 36104 | 40123 | 1691 | NA | 71 | 41714 |
2020-03-15 | 51 | 3173 | 22548 | 25721 | 2242 | NA | 60 | 27963 |
2020-03-14 | 51 | 2450 | 17102 | 19552 | 1236 | NA | 49 | 20789 |
2020-03-13 | 51 | 1922 | 13613 | 15535 | 1130 | NA | 39 | 16665 |
2020-03-12 | 51 | 1315 | 7949 | 9264 | 673 | NA | 36 | 9966 |
2020-03-11 | 51 | 1053 | 5978 | 7031 | 563 | NA | 27 | 7617 |
2020-03-10 | 51 | 778 | 3807 | 4585 | 469 | NA | NA | 5054 |
2020-03-09 | 51 | 584 | 3367 | 3951 | 313 | NA | NA | 4264 |
2020-03-08 | 51 | 417 | 2335 | 2752 | 347 | NA | NA | 3099 |
2020-03-07 | 51 | 341 | 1809 | 2150 | 602 | NA | NA | 2752 |
2020-03-06 | 36 | 223 | 1571 | 1794 | 458 | NA | NA | 2252 |
2020-03-05 | 24 | 176 | 953 | 1129 | 197 | NA | NA | 1326 |
2020-03-04 | 14 | 118 | 748 | 866 | 103 | NA | NA | 969 |
Test data involves a lag of several days, perhaps 5-15. Reason: testing-and-reporting itself takes a few days, plus most people who get tested probably were infected several days prior to the test.
These are country-level numbers for the US, aggregated across cities, counties, states - with a lot of data mixing across highly heterogeneous reporting sources. For instance, some locations report only positive tests while others report both positive and negative.
The rate of infection varies across states, so does the rate of testing, and the nature of testing - who is tested and how much of a lag there is in test results.
Lower-level reports arrive at different times of day, so “4pm ET” doesn’t really mean that.
Very few tests are being performed in the US - and you might think this means that those being tested are precisely the ones most likely to have an infection. If so, that should show up as very high ratios of positive to negative results each day.
One could look at state-level data (as I do below, just as an exercise to see how a few specific states are doing), but the same problem occurs there because state data is just an aggregate of county-level reporting, and so on. I’d love to get hold of individual-level (or transactional) data.
With that said, let’s see what we have, see what clues are contained in the data … but, at the end, interpret all results cautiously.