This analysis looks at the figures for Covid-19 cases in England. The data is from https://coronavirus.data.gov.uk/downloads/csv/coronavirus-cases_latest.csv. We’ll read it using readr.
The most recent data is from 2020-09-12. This data is quite big: it contains 90564 rows and 7 columns. That’s why we’re processing it in R rather than a Google Spreadsheet or Excel.
Let’s take a look at the first five lines.
| AreaName | AreaCode | AreaType | Date | DailyCases | CumulativeCases | CumulativeRate |
|---|---|---|---|---|---|---|
| Adur | E07000223 | ltla | 2020-09-12 | 0 | 224 | 348.4 |
| Adur | E07000223 | ltla | 2020-09-11 | 2 | 224 | 348.4 |
| Adur | E07000223 | ltla | 2020-09-10 | 1 | 222 | 345.3 |
| Adur | E07000223 | ltla | 2020-09-09 | 2 | 221 | 343.7 |
| Adur | E07000223 | ltla | 2020-09-08 | 4 | 219 | 340.6 |
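As a rough sketch of the loading step: the analysis itself reads the downloaded file with readr, but the example below uses base R’s `read.csv` on a few inlined rows so that it runs without network access or extra packages. The column names are the ones shown in the table above, which is an assumption about how the columns end up being named rather than a claim about the raw file.

```r
# A few rows from the table above, inlined as CSV text so the sketch runs offline.
csv_text <- "AreaName,AreaCode,AreaType,Date,DailyCases,CumulativeCases,CumulativeRate
Adur,E07000223,ltla,2020-09-12,0,224,348.4
Adur,E07000223,ltla,2020-09-11,2,224,348.4
Adur,E07000223,ltla,2020-09-10,1,222,345.3"

cases <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# read.csv leaves dates as text, so convert them explicitly.
cases$Date <- as.Date(cases$Date)

most_recent <- max(cases$Date)  # the most recent date in the data
dim(cases)                      # number of rows and columns
head(cases, 5)                  # the first (up to) five rows
```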
Sometimes this data contains no information for the most recent date (i.e. the number of daily cases for England is zero). When that happens, we’re going to remove the most recent date from the data because it’s not helpful.
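One way that check could be written is sketched below. The function name `drop_empty_latest_date` is made up for illustration, the sketch assumes the national row is identified by `AreaName == "England"`, and the England figure for 2020-09-11 is a placeholder rather than a real value.

```r
# Remove the most recent date when England reports zero cases for it.
# Assumption: the national rows have AreaName "England".
drop_empty_latest_date <- function(cases) {
  latest <- max(cases$Date)
  england_latest <- cases$DailyCases[cases$AreaName == "England" &
                                       cases$Date == latest]
  if (length(england_latest) > 0 && all(england_latest == 0)) {
    cases <- cases[cases$Date != latest, , drop = FALSE]
  }
  cases
}

example <- data.frame(
  AreaName = c("England", "England", "Adur", "Adur"),
  Date = as.Date(c("2020-09-12", "2020-09-11", "2020-09-12", "2020-09-11")),
  DailyCases = c(0, 1500, 0, 2),  # the England value 1500 is a placeholder
  stringsAsFactors = FALSE
)

trimmed <- drop_empty_latest_date(example)
max(trimmed$Date)
```

Because England shows zero cases on 2020-09-12 in this example, both rows for that date are dropped and the latest remaining date is 2020-09-11.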
As you can see, the data contains information about various different types of areas. Let’s get a list of the different types:
We’re going to do the analysis at the most granular of these levels, lower tier local authorities (ltla), so we’ll filter the data down to rows that relate to that geography. We’ll also group by local authority, order with the most recent date first, select only the columns that actually have useful data in them, and have another look at the table.
| AreaName | AreaCode | Date | DailyCases | CumulativeCases | CumulativeRate |
|---|---|---|---|---|---|
| Adur | E07000223 | 2020-09-12 | 0 | 224 | 348.4 |
| Allerdale | E07000026 | 2020-09-12 | 0 | 429 | 438.8 |
| Amber Valley | E07000032 | 2020-09-12 | 0 | 514 | 401.1 |
| Arun | E07000224 | 2020-09-12 | 0 | 385 | 239.5 |
| Ashfield | E07000170 | 2020-09-12 | 0 | 671 | 524.6 |
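The listing, filtering and ordering steps described above might look something like this in base R. The three local authority rows are copied from the tables above; the England row (its `nation` area type and missing cumulative values) is only there to give the filter something to remove and is an assumption about the data.

```r
cases <- data.frame(
  AreaName = c("Adur", "Adur", "Allerdale", "England"),
  AreaCode = c("E07000223", "E07000223", "E07000026", "E92000001"),
  AreaType = c("ltla", "ltla", "ltla", "nation"),  # "nation" is assumed
  Date = as.Date(c("2020-09-11", "2020-09-12", "2020-09-12", "2020-09-12")),
  DailyCases = c(2, 0, 0, 0),
  CumulativeCases = c(224, 224, 429, NA),
  CumulativeRate = c(348.4, 348.4, 438.8, NA),
  stringsAsFactors = FALSE
)

# The different types of area present in the data.
area_types <- unique(cases$AreaType)

# Keep only lower tier local authorities and drop the AreaType column.
keep <- c("AreaName", "AreaCode", "Date", "DailyCases",
          "CumulativeCases", "CumulativeRate")
ltla <- cases[cases$AreaType == "ltla", keep]

# Most recent date first, then alphabetically by area name.
ltla <- ltla[order(-as.numeric(ltla$Date), ltla$AreaName), ]
ltla
```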
Let’s pick an area and see whether we can plot the cumulative lab-confirmed cases for that area.
We’ll pick on Manchester. Let’s find Manchester’s data and look at it in a table.
| AreaName | AreaCode | Date | DailyCases | CumulativeCases | CumulativeRate |
|---|---|---|---|---|---|
| Manchester | E08000003 | 2020-09-12 | 0 | 4983 | 901.3 |
| Manchester | E08000003 | 2020-09-11 | 9 | 4983 | 901.3 |
| Manchester | E08000003 | 2020-09-10 | 48 | 4974 | 899.7 |
| Manchester | E08000003 | 2020-09-09 | 53 | 4926 | 891.0 |
| Manchester | E08000003 | 2020-09-08 | 67 | 4873 | 881.4 |
And now let’s try to plot out the cumulative cases.
Cumulative cases don’t really help with understanding whether the number of cases is going up or down. To do that, we need to look at the daily lab-confirmed cases instead. This is what that shows:
And this isn’t great, for two reasons. First, it’s extremely spiky, especially around weekends. So we’ll calculate a seven-day rolling average, including three days either side of the day in question.
Second, we need to be careful about the figures from the last week: lab-confirmed cases are dated according to the date of the sample rather than when the result comes in. As results come in, the data is updated to increase the figures on previous dates. So figures from the last week or so tend to be underestimates of the eventual average. We’re going to indicate that on the graphs using an annotation.
| Date | DailyCases | AverageDailyCases |
|---|---|---|
| 2020-09-12 | 0 | 27.50000 |
| 2020-09-11 | 9 | 35.40000 |
| 2020-09-10 | 48 | 42.50000 |
| 2020-09-09 | 53 | 41.57143 |
| 2020-09-08 | 67 | 51.00000 |
| 2020-09-07 | 78 | 61.14286 |
| 2020-09-06 | 36 | 62.57143 |
| 2020-09-05 | 66 | 63.57143 |
| 2020-09-04 | 80 | 61.28571 |
| 2020-09-03 | 58 | 53.85714 |
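To make the averaging concrete, here is a minimal sketch of a centred seven-day mean. The function name `centred_rolling_mean` is made up for illustration, and shrinking the window at the ends of the series is an inference from the table (27.5 is the mean of the four values 0, 9, 48 and 53), not something the text states.

```r
# Centred rolling mean: up to `half_width` values either side of each point.
# At the ends of the series the window shrinks to the values that exist.
centred_rolling_mean <- function(x, half_width = 3) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    window <- max(1, i - half_width):min(n, i + half_width)
    mean(x[window])
  }, numeric(1))
}

# Manchester's daily cases from the table above, most recent date first.
daily <- c(0, 9, 48, 53, 67, 78, 36, 66, 80, 58)
averages <- centred_rolling_mean(daily)
round(averages[1:7], 5)
```

The first seven results agree with the AverageDailyCases column. The last three do not, because this short vector stops at 2020-09-03 and so their windows are cut off, whereas the full data carries on further back.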
I’ve created a Google spreadsheet of restrictions to make it easy for the graphs to indicate when restrictions are in place at different levels. Let’s take a look at this data.
The data has missing EndDate values when the restrictions haven’t yet been lifted. To make graphing this work, we need to substitute any missing values with the most recent date.
| AreaCode | AreaName | StartDate | EndDate | LegislationRef | GuidanceRef |
|---|---|---|---|---|---|
| E07000129 | Blaby | 2020-07-04 | 2020-07-18 | NA | https://www.gov.uk/guidance/leicester-lockdown-what-you-can-and-cannot-do |
| E07000130 | Charnwood | 2020-07-04 | 2020-07-18 | NA | https://www.gov.uk/guidance/leicester-lockdown-what-you-can-and-cannot-do |
| E07000135 | Oadby and Wigston | 2020-07-04 | 2020-08-01 | https://www.legislation.gov.uk/uksi/2020/685 | https://www.gov.uk/guidance/leicester-lockdown-what-you-can-and-cannot-do |
| E06000016 | Leicester | 2020-07-04 | 2020-08-03 | https://www.legislation.gov.uk/uksi/2020/685 | https://www.gov.uk/guidance/leicester-lockdown-what-you-can-and-cannot-do |
| E06000008 | Blackburn with Darwen | 2020-07-25 | 2020-08-01 | https://www.legislation.gov.uk/uksi/2020/800 | https://www.gov.uk/guidance/north-west-of-england-local-restrictions-what-you-can-and-cannot-do |
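Filling in the missing end dates could be done along these lines. The Leicester row comes from the table above; the second row is a hypothetical area, invented purely to show a restriction that has not yet been lifted.

```r
restrictions <- data.frame(
  AreaCode = c("E06000016", "E99999999"),       # second code is hypothetical
  AreaName = c("Leicester", "Hypothetical area"),
  StartDate = as.Date(c("2020-07-04", "2020-08-05")),
  EndDate = as.Date(c("2020-08-03", NA)),       # NA: not yet lifted
  stringsAsFactors = FALSE
)

most_recent <- as.Date("2020-09-12")

# Treat ongoing restrictions as running up to the most recent date in the data.
restrictions$EndDate[is.na(restrictions$EndDate)] <- most_recent
restrictions
```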
Now, when we plot the cases in a particular area, we can find any restrictions in that area and also plot those, alongside the nationwide lockdown that ran from 23rd March to 4th July. Here I’ve shown that in blue.
The graphs we’re creating here indicate underestimates due to the time it takes for new cases to filter through into reported data. Another thing that can cause an underestimate in the number of cases shown is the amount of testing that is going on. It would be good to be able to show where that’s likely to have been happening.
In the UK, the numbers of tests that are carried out are only available at the national level. Let’s load the data for England and take a look at it. (I’m changing the names of the columns because the data uses the pillar numbers, which are a bit confusing if you’re not steeped in the data. The request I’m using to get this data doesn’t include surveillance data numbers because they only exist for the whole of the UK, not specifically for England.)
| Date | NewNHSTests | NewCommercialTests | NewAntibodyTests |
|---|---|---|---|
| 2020-09-10 | 60550 | 111755 | 5893 |
| 2020-09-09 | 53739 | 104703 | 6184 |
| 2020-09-08 | 41909 | 107697 | 3797 |
| 2020-09-07 | 36995 | 107859 | 1256 |
| 2020-09-06 | 47526 | 88631 | 2807 |
Let’s graph those tests in a stacked bar chart.
Again this data is very spiky so we’ll smooth it out.
One interesting thing to look at is the relative numbers of tests and lab-confirmed cases. Now this isn’t an exact science as the lab-confirmed cases are based on numbers of people while the tests are based on numbers of tests. The same person could receive lots of tests, and the number of tests each person receives could vary over time (for example as testing policy changes). More comparable measures would be the percentage of positive tests, or the number of people tested, but this data isn’t published (which may mean it doesn’t exist, or simply that it isn’t shared).
Since we’re dealing with data at a national scale, let’s join together the data we have for cases in England with this testing data to take a look at testing percentages. When doing this, we’ll sum together the NHS and commercial tests (but not the antibody tests) to see what population is being reached.
If we plot this data, we can see the scale of testing (in blue) is massive compared to the scale of cases (in orange).
We can work out the percentage of tests that are leading to cases each day, and plot that as follows.
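The join and the percentage calculation described above might be sketched as follows. The testing figures are taken from the table earlier on; the England daily case counts are placeholders made up for illustration, and the column names `NewTests` and `CasePercentage` are hypothetical.

```r
tests <- data.frame(
  Date = as.Date(c("2020-09-10", "2020-09-09", "2020-09-08")),
  NewNHSTests = c(60550, 53739, 41909),
  NewCommercialTests = c(111755, 104703, 107697),
  NewAntibodyTests = c(5893, 6184, 3797)
)

england_cases <- data.frame(
  Date = as.Date(c("2020-09-10", "2020-09-09", "2020-09-08")),
  DailyCases = c(2000, 2500, 2200)  # placeholder values, not real figures
)

# Join on date; merge() returns the rows sorted by Date, oldest first.
combined <- merge(england_cases, tests, by = "Date")

# NHS plus commercial tests, leaving out antibody tests.
combined$NewTests <- combined$NewNHSTests + combined$NewCommercialTests

# Lab-confirmed cases as a percentage of those tests.
combined$CasePercentage <- 100 * combined$DailyCases / combined$NewTests
combined[, c("Date", "DailyCases", "NewTests", "CasePercentage")]
```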
The graph shows how the relationship between number of lab-confirmed cases (all of which will have had a test) and overall numbers of tests has changed dramatically over time. At the start, the number of tests was only about three times the number of cases; more recently, there have been more than 100 tests per positive case.
This is a bit of a misleading picture, however. The data about commercial tests isn’t represented in the data until 14th July, so we have to consider before and after that date separately. Testing data is also only released every week, on a Thursday, so we should ignore everything after last Thursday. Let’s narrow down the data to the period between those dates and take another look.
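Working out that date window could look something like this. The helper `last_thursday` is a hypothetical name, and treating a Thursday as its own "last Thursday" is an assumption.

```r
# The most recent Thursday on or before date `d`.
last_thursday <- function(d) {
  wday <- as.POSIXlt(d)$wday  # 0 = Sunday, 1 = Monday, ..., 4 = Thursday
  d - (wday - 4) %% 7
}

most_recent <- as.Date("2020-09-12")
window_start <- as.Date("2020-07-14")  # commercial tests reported from here
window_end <- last_thursday(most_recent)

# Keep only the dates that fall inside the window.
dates <- seq(as.Date("2020-07-10"), most_recent, by = "day")
in_window <- dates[dates >= window_start & dates <= window_end]
range(in_window)
```

2020-09-12 was a Saturday, so the window ends on Thursday 2020-09-10, which matches the latest date in the testing table.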
The proportion roughly doubled between mid and late August, but it is still very low compared to that at the beginning of the UK epidemic. Given the proportions before the start of June, it’s likely that the case numbers before then were underestimates simply due to lower testing.
An alternative mechanism for understanding the effect of testing is to use government figures for testing capacity. These are only available for the whole of the UK (when in fact capacity might be different in different nations and local authorities). Nevertheless, let’s look at what that data says.
## Parsed with column specification:
## cols(
## date = col_date(format = ""),
## capacityPillarOne = col_double(),
## capacityPillarTwo = col_double(),
## capacityPillarThree = col_double(),
## capacityPillarFour = col_double(),
## newPillarOneTestsByPublishDate = col_double(),
## newPillarTwoTestsByPublishDate = col_double(),
## newPillarThreeTestsByPublishDate = col_double(),
## newPillarFourTestsByPublishDate = col_double()
## )
| Date | NHSCapacity | CommercialCapacity | AntibodyCapacity | SurveillanceCapacity | NewNHSTests | NewCommercialTests | NewAntibodyTests | NewSurveillanceTests |
|---|---|---|---|---|---|---|---|---|
| 2020-09-10 | 82817 | 161000 | 120000 | 11100 | 72385 | 133274 | 5893 | 15913 |
| 2020-09-09 | 82817 | 161000 | 120000 | 1100 | 63688 | 123163 | 6184 | 16574 |
| 2020-09-08 | 82763 | 156000 | 120000 | 11100 | 48888 | 130364 | 3797 | 12643 |
| 2020-09-07 | 79773 | 147000 | 120000 | 11100 | 43098 | 126832 | 1256 | 18788 |
| 2020-09-06 | 79867 | 142000 | 120000 | 11100 | 54511 | 111608 | 2807 | 22662 |
We can calculate what percentage of capacity we’ve been operating at based on these figures, and then plot them. We’ll exclude surveillance data since the reported capacity is inaccurate (the data shows more tests being done than there is capacity for, for example). We’ll also only look at data since 14th July, since that’s when commercial testing data started being properly reported across the UK.
The plot of these percentages shows in particular how reported commercial capacity was almost used up in mid August. The following graph shows overall NHS and commercial testing capacity since mid July.
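A minimal sketch of that calculation, using two rows from the table above. The percentage column names are hypothetical.

```r
capacity <- data.frame(
  Date = as.Date(c("2020-09-10", "2020-09-09")),
  NHSCapacity = c(82817, 82817),
  CommercialCapacity = c(161000, 161000),
  NewNHSTests = c(72385, 63688),
  NewCommercialTests = c(133274, 123163)
)

# Only look at data from 14th July onwards.
capacity <- capacity[capacity$Date >= as.Date("2020-07-14"), ]

# Tests carried out as a percentage of reported capacity.
capacity$NHSPercentage <- 100 * capacity$NewNHSTests / capacity$NHSCapacity
capacity$CommercialPercentage <-
  100 * capacity$NewCommercialTests / capacity$CommercialCapacity

# NHS and commercial combined; surveillance and antibody tests are left out.
capacity$OverallPercentage <-
  100 * (capacity$NewNHSTests + capacity$NewCommercialTests) /
  (capacity$NHSCapacity + capacity$CommercialCapacity)

round(capacity[, c("NHSPercentage", "CommercialPercentage",
                   "OverallPercentage")], 1)
```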
The overall testing capacity isn’t maxed out, but remember these are figures for the whole of the UK. It’s likely (especially given recent news reports) that testing capacity is different in different parts of the country, and perhaps most stretched in places with the most cases. So we might expect figures since mid August to also be underestimates (compared to those from earlier in the summer).
Overall, though, it’s hard to draw solid conclusions from the available testing data about the degree to which lack of testing might be influencing the numbers of cases reported in the data.
First, we’ll have a look at what’s going on in the areas where there are or have been government lockdowns. These are listed in the restrictions table. We can cycle through them to take a look at what’s happening there.
The areas we really care about are those where there’s a high daily rate, and particularly those where that rate is increasing. We’ll add a couple of columns that track the degree to which the average daily rate is increasing or decreasing.
| Date | DailyRate | AverageDailyRate | DailyRateIncreasePercentage | AverageRateIncreasePercentage |
|---|---|---|---|---|
| 2020-09-12 | 0.000000 | 4.974152 | -0.29 | -0.17 |
| 2020-09-11 | 1.627905 | 6.403091 | -0.20 | -0.18 |
| 2020-09-10 | 8.682157 | 7.687327 | 0.02 | -0.15 |
| 2020-09-09 | 9.586548 | 7.519368 | -0.23 | -0.13 |
| 2020-09-08 | 12.118844 | 9.224792 | -0.20 | -0.09 |
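For reference, the two rate columns can be reproduced roughly as below. The population figure is an assumption, back-calculated from the table (9 cases corresponding to about 1.6279 per 100,000), and `per_100k` is a hypothetical helper name. The two increase columns are left out because the text doesn’t spell out how they are defined.

```r
manchester <- data.frame(
  Date = as.Date(c("2020-09-12", "2020-09-11", "2020-09-10",
                   "2020-09-09", "2020-09-08")),
  DailyCases = c(0, 9, 48, 53, 67),
  AverageDailyCases = c(27.5, 35.4, 42.5, 41.57143, 51)
)

# Assumption: population back-calculated from the rates in the table above.
population <- 552858

# Convert a count into a rate per 100,000 population.
per_100k <- function(count, population) 100000 * count / population

manchester$DailyRate <- per_100k(manchester$DailyCases, population)
manchester$AverageDailyRate <- per_100k(manchester$AverageDailyCases, population)
manchester
```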
Now let’s narrow down to places that have seen concerning levels of cases over the last week.
The most recent date is 2020-09-12, so the last seven days run from 2020-09-06. First let’s look at those areas that had widespread daily rates (over 7 cases / day / 100,000 population) at some point in the last week.
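The selection could be sketched like this. The areas and rates are invented for illustration, `areas_over_threshold` is a hypothetical helper, and using the averaged rate rather than the raw daily rate is an assumption.

```r
rates <- data.frame(
  AreaName = rep(c("Area A", "Area B"), each = 3),  # hypothetical areas
  Date = rep(as.Date(c("2020-09-12", "2020-09-08", "2020-09-01")), times = 2),
  AverageDailyRate = c(5.0, 9.2, 3.0, 2.0, 4.5, 8.0),  # illustrative values
  stringsAsFactors = FALSE
)

# Names of areas whose rate went above `threshold` on or after `since`.
areas_over_threshold <- function(rates, threshold, since) {
  recent <- rates[rates$Date >= since, ]
  peak <- tapply(recent$AverageDailyRate, recent$AreaName, max)
  names(peak)[peak > threshold]
}

most_recent <- max(rates$Date)
week_start <- most_recent - 6  # seven days including the most recent date

widespread <- areas_over_threshold(rates, threshold = 7, since = week_start)
widespread
```

Area B’s rate of 8.0 falls on 2020-09-01, outside the last week, so only Area A is returned. Calling the same function with `threshold = 4` gives a broader list that includes both areas.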
Now we’ll create charts for those places where there is widespread infection that do not currently have additional restrictions:
Finally, we’ll make a list of places where there’s been a significant infection (over 4 cases / day / 100,000 population) at some point over the last week and that are seeing increases in those rates.
Now we’ll create charts for those places that do not already have additional restrictions: