This analysis looks at the figures for Covid-19 cases in England. It’s a little dig into what this data can - and can’t - tell us. The source code is available on GitHub.

The data is from https://coronavirus.data.gov.uk/downloads/csv/coronavirus-cases_latest.csv and reports “lab-confirmed positive COVID-19 PCR test on or up to the specimen date”. The specimen date is the date on which someone had the test taken, and it can take a number of days to process the test and report the result. That has some implications for the numbers from more recent days, which we’ll come back to later. The official site has more details about the meaning of case numbers.

The most recent data shown here is from 2020-10-03. This data is quite big: it contains 100,545 rows and 7 columns. That’s why we’re processing it in R rather than in Google Sheets or Excel.

Let’s take a look at the first five lines.

Confirmed Covid-19 cases in England

| AreaName | AreaCode | AreaType | Date | DailyCases | CumulativeCases | CumulativeRate |
| --- | --- | --- | --- | --- | --- | --- |
| Adur | E07000223 | ltla | 2020-10-03 | 0 | 263 | 409.0 |
| Adur | E07000223 | ltla | 2020-10-02 | 1 | 263 | 409.0 |
| Adur | E07000223 | ltla | 2020-10-01 | 2 | 262 | 407.5 |
| Adur | E07000223 | ltla | 2020-09-30 | 3 | 260 | 404.3 |
| Adur | E07000223 | ltla | 2020-09-29 | 2 | 257 | 399.7 |

Sometimes this data contains no information for the most recent date. This shows up as a daily case count of zero for England, which you wouldn’t expect unless we had really reached zero cases across the country (and that is unlikely to happen any time soon). When that happens, we remove the rows for the most recent date, all of which are zeros, because they are inaccurate and throw off the rest of the calculations.
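
The analysis itself is done in R, but the rule just described can be sketched in a few lines of Python. This is only an illustration: the row layout (dictionaries with `AreaName`, `Date` and `DailyCases` keys), the assumption that the national total appears under the area name "England", and all of the sample values are made up for the example.

```python
from datetime import date


def drop_empty_latest_date(rows):
    """Remove every row for the latest date if England reports zero cases on it."""
    latest = max(r["Date"] for r in rows)
    england_latest = [
        r for r in rows
        if r["Date"] == latest and r["AreaName"] == "England"
    ]
    # Only treat the date as empty when an England row exists and is zero.
    if england_latest and all(r["DailyCases"] == 0 for r in england_latest):
        return [r for r in rows if r["Date"] != latest]
    return rows


# Hypothetical rows, not taken from the real file.
sample_rows = [
    {"AreaName": "England", "Date": date(2020, 10, 3), "DailyCases": 0},
    {"AreaName": "Adur", "Date": date(2020, 10, 3), "DailyCases": 0},
    {"AreaName": "England", "Date": date(2020, 10, 2), "DailyCases": 5000},
    {"AreaName": "Adur", "Date": date(2020, 10, 2), "DailyCases": 1},
]
cleaned_rows = drop_empty_latest_date(sample_rows)
```

If England has a non-zero count on the latest date, the rows are returned unchanged.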

As you can see, the data contains information about various different types of areas. Let’s get a list of the different types:

We’re going to do the analysis at the most granular of these levels, lower tier local authorities (ltla), so we’ll filter the data down to rows that relate to that geography. We’ll also group by local authority, order with the most recent date first, select only the columns that actually have useful data in them, and have another look at the table.

Confirmed Covid-19 cases in lower tier local authorities

| AreaName | AreaCode | Date | DailyCases | CumulativeCases | CumulativeRate |
| --- | --- | --- | --- | --- | --- |
| Adur | E07000223 | 2020-10-03 | 0 | 263 | 409.0 |
| Allerdale | E07000026 | 2020-10-03 | 0 | 534 | 546.2 |
| Amber Valley | E07000032 | 2020-10-03 | 0 | 698 | 544.7 |
| Arun | E07000224 | 2020-10-03 | 0 | 470 | 292.4 |
| Ashfield | E07000170 | 2020-10-03 | 0 | 863 | 674.7 |
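
As a rough Python sketch of the filter, sort and column selection described above (the analysis itself uses R): the three ltla rows are copied from the tables above, while the "nation" row and its values are invented purely so that there is something to filter out.

```python
import csv
import io

# Three ltla rows from the tables above plus one made-up non-ltla row.
SAMPLE_CSV = """AreaName,AreaCode,AreaType,Date,DailyCases,CumulativeCases,CumulativeRate
Adur,E07000223,ltla,2020-10-02,1,263,409.0
England,E92000001,nation,2020-10-03,0,0,0.0
Allerdale,E07000026,ltla,2020-10-03,0,534,546.2
Adur,E07000223,ltla,2020-10-03,0,263,409.0
"""

KEEP = ["AreaName", "AreaCode", "Date", "DailyCases",
        "CumulativeCases", "CumulativeRate"]


def lower_tier_rows(csv_text):
    """Keep ltla rows only, most recent date first, then by area name."""
    rows = [
        r for r in csv.DictReader(io.StringIO(csv_text))
        if r["AreaType"] == "ltla"
    ]
    # Two stable sorts: area name first, then date descending.
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    rows.sort(key=lambda r: r["AreaName"])
    rows.sort(key=lambda r: r["Date"], reverse=True)
    return [{k: r[k] for k in KEEP} for r in rows]


ltla_rows = lower_tier_rows(SAMPLE_CSV)
```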

Some of the data that we will look at is only available for regions, not at the local authority level. So we’ll use a lookup file from ONS to map the local authorities we have to regions.

Confirmed Covid-19 cases in lower tier local authorities, with regions

| RegionName | RegionCode | AreaName | AreaCode | Date | DailyCases | CumulativeCases | CumulativeRate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| South East | E12000008 | Adur | E07000223 | 2020-10-03 | 0 | 263 | 409.0 |
| North West | E12000002 | Allerdale | E07000026 | 2020-10-03 | 0 | 534 | 546.2 |
| East Midlands | E12000004 | Amber Valley | E07000032 | 2020-10-03 | 0 | 698 | 544.7 |
| South East | E12000008 | Arun | E07000224 | 2020-10-03 | 0 | 470 | 292.4 |
| East Midlands | E12000004 | Ashfield | E07000170 | 2020-10-03 | 0 | 863 | 674.7 |
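
The region mapping is essentially a lookup keyed on the local authority code. A minimal Python sketch (the analysis itself uses R): the three lookup entries are copied from the table above, the real ONS lookup file covers every local authority, and returning `None` for an unknown code is a choice made for this example.

```python
# Lookup from local authority code to (region code, region name).
# Only three entries, copied from the table above.
REGION_LOOKUP = {
    "E07000223": ("E12000008", "South East"),
    "E07000026": ("E12000002", "North West"),
    "E07000032": ("E12000004", "East Midlands"),
}


def add_region(row, lookup=REGION_LOOKUP):
    """Return a copy of the row with RegionName and RegionCode in front."""
    region_code, region_name = lookup.get(row["AreaCode"], (None, None))
    return {"RegionName": region_name, "RegionCode": region_code, **row}


adur = add_region({"AreaName": "Adur", "AreaCode": "E07000223",
                   "Date": "2020-10-03", "DailyCases": 0})
# A made-up code that is not in the lookup.
unknown = add_region({"AreaName": "Example area", "AreaCode": "X00000000"})
```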

Examining cases in an area

Let’s pick an area and see whether we can plot the cumulative lab-confirmed cases for that area.

We’ll pick on Manchester. We can look at Manchester’s data in a table:

Confirmed Covid-19 cases in Manchester

| RegionName | RegionCode | AreaName | AreaCode | Date | DailyCases | CumulativeCases | CumulativeRate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| North West | E12000002 | Manchester | E08000003 | 2020-10-03 | 2 | 9824 | 1776.9 |
| North West | E12000002 | Manchester | E08000003 | 2020-10-02 | 132 | 9822 | 1776.6 |
| North West | E12000002 | Manchester | E08000003 | 2020-10-01 | 391 | 9690 | 1752.7 |
| North West | E12000002 | Manchester | E08000003 | 2020-09-30 | 560 | 9299 | 1682.0 |
| North West | E12000002 | Manchester | E08000003 | 2020-09-29 | 495 | 8739 | 1580.7 |

Now let’s plot out the cumulative cases.

Cumulative cases don’t really help with understanding whether the number of cases is going up or down. To do that, we need to look at the daily lab-confirmed cases instead. This is what that shows:

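And this isn’t great, for two reasons. First, it’s extremely spiky, especially around weekends when there is less testing. So we’ll calculate a seven-day rolling average, centred on the day in question with three days on each side. For the most recent days there aren’t three later days to include, so the average is taken over whichever days are available.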

Second, we need to be careful about the figures from the last week: lab-confirmed cases are dated according to the date of the specimen (when the test was done) rather than when the result comes in, and they can take time to both process and report. As results come in, the data is updated to increase the figures on previous dates. So figures from the last week or so tend to be underestimates of the eventual numbers. We’re going to indicate that on the graphs using an annotation.

Confirmed Covid-19 cases in Manchester, with rolling average

| Date | DailyCases | AverageDailyCases |
| --- | --- | --- |
| 2020-10-03 | 2 | 271.2500 |
| 2020-10-02 | 132 | 316.0000 |
| 2020-10-01 | 391 | 336.3333 |
| 2020-09-30 | 560 | 345.4286 |
| 2020-09-29 | 495 | 375.7143 |
| 2020-09-28 | 438 | 391.4286 |
| 2020-09-27 | 400 | 369.1429 |
| 2020-09-26 | 214 | 322.2857 |
| 2020-09-25 | 242 | 277.7143 |
| 2020-09-24 | 235 | 233.1429 |
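
To make the averaging concrete, here is a minimal Python sketch of a centred rolling mean that shrinks its window at the ends of the series (the analysis itself uses R, so this is an illustration rather than the code behind the table). The daily counts are the Manchester figures from the table above, most recent first.

```python
def centred_rolling_mean(values, half_window=3):
    """Mean of each value and up to `half_window` neighbours on each side."""
    means = []
    for i in range(len(values)):
        # The slice is cut short at either end of the series.
        window = values[max(0, i - half_window): i + half_window + 1]
        means.append(sum(window) / len(window))
    return means


# Manchester DailyCases from the table above, 2020-10-03 back to 2020-09-24.
daily_cases = [2, 132, 391, 560, 495, 438, 400, 214, 242, 235]
averages = centred_rolling_mean(daily_cases)
```

The first seven results match the AverageDailyCases column: 271.25 for 2020-10-03 is the mean of just four days, and 345.4286 for 2020-09-30 is the first full seven-day window. The last three differ from the table only because this sample stops at 2020-09-24, whereas the real series carries on further back.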

Indicating when restrictions are in place

I’ve created a Google spreadsheet of restrictions to make it easy for the graphs to indicate when restrictions are in place at different levels. Let’s take a look at this data.

Note that the data has missing EndDate values when the restrictions haven’t yet been lifted. To make the graphing work, we need to substitute any missing values with the most recent date.

Restrictions data

| AreaCode | AreaName | StartDate | EndDate | LegislationRef | GuidanceRef |
| --- | --- | --- | --- | --- | --- |
| E07000129 | Blaby | 2020-07-04 | 2020-07-18 | NA | https://www.gov.uk/guidance/leicester-lockdown-what-you-can-and-cannot-do |
| E07000130 | Charnwood | 2020-07-04 | 2020-07-18 | NA | https://www.gov.uk/guidance/leicester-lockdown-what-you-can-and-cannot-do |
| E07000135 | Oadby and Wigston | 2020-07-04 | 2020-08-01 | https://www.legislation.gov.uk/uksi/2020/685 | https://www.gov.uk/guidance/leicester-lockdown-what-you-can-and-cannot-do |
| E06000008 | Blackburn with Darwen | 2020-07-25 | 2020-08-01 | https://www.legislation.gov.uk/uksi/2020/800 | https://www.gov.uk/guidance/north-west-england-local-restrictions |
| E06000032 | Luton | 2020-07-25 | 2020-08-01 | https://www.legislation.gov.uk/uksi/2020/800 | https://www.gov.uk/guidance/luton-local-restrictions |
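
Filling in the missing end dates could look something like the following Python sketch (the analysis itself uses R). The Blaby row is taken from the table above; the open-ended "Example area" row is invented, since none of the five rows shown has a missing EndDate.

```python
from datetime import date


def fill_open_end_dates(restrictions, most_recent):
    """Return copies of the rows with a missing EndDate set to most_recent."""
    return [
        {**r, "EndDate": r["EndDate"] if r["EndDate"] is not None else most_recent}
        for r in restrictions
    ]


restrictions = [
    # Copied from the table above.
    {"AreaName": "Blaby", "StartDate": date(2020, 7, 4),
     "EndDate": date(2020, 7, 18)},
    # Made-up restriction that has not been lifted yet.
    {"AreaName": "Example area", "StartDate": date(2020, 9, 1),
     "EndDate": None},
]
filled = fill_open_end_dates(restrictions, most_recent=date(2020, 10, 3))
```

The original rows are left untouched, so the distinction between "ended" and "still in force" is not lost.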

Now, when we plot the cases in a particular area, we can find any restrictions in that area and also plot those, alongside the nation-wide lockdown that ran from 23rd March to 4th July. Here I’ve shown that in blue.

Here we can see that the national lockdown did have an effect in Manchester, but that the cases started rising again a couple of weeks before the local lockdown was put into effect. The local lockdown doesn’t seem to have had as big an effect on the number of cases.

Understanding testing levels

The graphs we’re creating here flag likely underestimates caused by the time it takes for new cases to filter through into the reported data. Case numbers can also be underestimated because of how much testing is going on and how readily people go and get tested. People might not get tested for various reasons: they might think it will be unpleasant or time-consuming, or they might deprioritise their own testing in favour of others such as health and care workers. It would be good to be able to show where that’s likely to have been happening.

In the UK, the numbers of tests carried out are only available at the national level. We’ll load the data for England and take a look at it. We’ll change the names of the columns, because the data uses pillar numbers, which are a bit confusing if you’re not steeped in the data. The request I’m using to get this data doesn’t include surveillance testing numbers (tests that are sent out randomly to get an understanding of prevalence across the country, rather than tests requested by people who have symptoms) because they only exist for the whole of the UK, not specifically for England.

Number of lab-confirmed positive or negative COVID-19 test results, by pillar

| Date | NewNHSTests | NewCommercialTests | NewAntibodyTests |
| --- | --- | --- | --- |
| 2020-10-02 | 59893 | 143215 | 5282 |
| 2020-10-01 | 64941 | 139425 | 5705 |
| 2020-09-30 | 58870 | 120666 | 5088 |
| 2020-09-29 | 50533 | 119633 | 3786 |
| 2020-09-28 | 48428 | 148897 | 1154 |

Let’s graph those tests in a stacked bar chart.

Again this data is very spiky, so we’ll smooth it out.

Comparing tests to cases

One interesting thing to look at is the relative numbers of tests and lab-confirmed cases. Now this isn’t an exact science as the lab-confirmed cases are based on numbers of people while the tests are based on numbers of tests. The same person could receive lots of tests, and the number of tests each person receives could vary over time (for example as testing policy changes). More comparative measures would be the percentage of positive tests, or the number of people tested, but this data isn’t published (which may mean it doesn’t exist, or simply that it isn’t shared).

Since we’re dealing with data at a national scale, let’s join together the data we have for cases in England with this testing data to take a look at testing percentages. When doing this, we’ll sum together the NHS and commercial tests (but not the antibody tests, which measure when people have had Covid-19 rather than whether they currently have it) to see what population is being reached.

If we plot this data, we can see the scale of testing (in blue) is massive compared to the scale of cases (in orange).

We can work out the percentage of tests that are leading to cases each day, and plot that as follows.
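
The calculation is a simple ratio. Here is a minimal Python sketch (the analysis itself uses R): the test counts are the 2020-10-01 figures from the testing table above, the case count of 2000 is invented for illustration, and returning `None` when no tests are reported is a choice made for this example.

```python
def positive_percentage(cases, nhs_tests, commercial_tests):
    """Lab-confirmed cases as a percentage of NHS plus commercial tests."""
    total_tests = nhs_tests + commercial_tests  # antibody tests are left out
    if total_tests == 0:
        return None  # no tests reported, so the percentage is undefined
    return 100 * cases / total_tests


# Test counts for 2020-10-01 from the table above; the case count is made up.
example = positive_percentage(cases=2000, nhs_tests=64941,
                              commercial_tests=139425)
```

Read the other way round, 100 tests per positive case corresponds to 1%, and three tests per case to roughly 33%.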

The graph shows how the relationship between number of lab-confirmed cases (all of which will have had a test) and overall numbers of tests has changed dramatically over time. At the start, the number of tests was only about three times the number of cases; more recently, there have been more than 100 tests per positive case.

This is a bit of a misleading picture, however. The data about commercial tests isn’t represented in the data until 14th July, so we have to consider before and after that date separately. Testing data is also only released every week, on a Thursday, so we should ignore everything after last Thursday. Let’s narrow down the data to the period between those dates and take another look.
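
Working out that window is mostly date arithmetic. A minimal Python sketch (the analysis itself uses R), assuming that the start date is included and that "last Thursday" means the most recent Thursday on or before the latest date in the data:

```python
from datetime import date, timedelta

COMMERCIAL_DATA_START = date(2020, 7, 14)  # date given in the text


def last_thursday(today):
    """Most recent Thursday on or before `today` (Monday is weekday 0)."""
    return today - timedelta(days=(today.weekday() - 3) % 7)


def in_reliable_window(day, today):
    """True if `day` falls between the start date and the last Thursday."""
    return COMMERCIAL_DATA_START <= day <= last_thursday(today)


most_recent = date(2020, 10, 3)
cutoff = last_thursday(most_recent)
```

With the most recent date of 2020-10-03 (a Saturday), the cutoff is 2020-10-01.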

The proportion roughly doubled between mid and late August, but it is still very low compared to that at the beginning of the UK epidemic. Given the proportions before the start of June, it’s likely that case numbers before then were underestimates simply due to lower testing.

Comparing tests to capacity

An alternative mechanism for understanding the effect of testing is to use government figures for testing capacity. These are only available for the whole of the UK (when in fact capacity might be different in different nations and local authorities). Nevertheless, let’s look at what that data says.

Number and capacity of lab-confirmed positive or negative COVID-19 test results, by pillar

| Date | NHSCapacity | CommercialCapacity | AntibodyCapacity | SurveillanceCapacity | NewNHSTests | NewCommercialTests | NewAntibodyTests | NewSurveillanceTests |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2020-10-02 | 83299 | 213400 | 120000 | 14689 | 70429 | 168669 | 5282 | 27172 |
| 2020-10-01 | 83393 | 227600 | 120000 | 13532 | 77402 | 156459 | 5705 | 22232 |
| 2020-09-30 | 83299 | 205600 | 120000 | 9013 | 69549 | 141378 | 5088 | 21285 |
| 2020-09-29 | 83090 | 204600 | 120000 | 17194 | 57960 | 140442 | 3786 | 29735 |
| 2020-09-28 | 80215 | 188600 | 120000 | 22684 | 56615 | 170285 | 1154 | 39384 |

We can calculate what percentage of capacity we’ve been operating on based on these figures, and then plot them. We’ll exclude surveillance data since the reported capacity is inaccurate (the data shows more tests being done than there is capacity for, for example). We’ll also only look at data since 14th July, since that’s when commercial testing data started being properly reported across the UK.
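
As a worked example of that calculation, here is a small Python sketch (the analysis itself uses R) using the UK figures for 2020-10-02 from the table above. Returning `None` for a zero capacity is a choice made for this example.

```python
def percent_of_capacity(tests, capacity):
    """Tests carried out as a percentage of reported capacity."""
    if capacity == 0:
        return None  # avoid dividing by zero when no capacity is reported
    return 100 * tests / capacity


# UK figures for 2020-10-02, copied from the table above.
row = {
    "NHSCapacity": 83299,
    "CommercialCapacity": 213400,
    "NewNHSTests": 70429,
    "NewCommercialTests": 168669,
}

nhs_usage = percent_of_capacity(row["NewNHSTests"], row["NHSCapacity"])
commercial_usage = percent_of_capacity(row["NewCommercialTests"],
                                       row["CommercialCapacity"])
overall_usage = percent_of_capacity(
    row["NewNHSTests"] + row["NewCommercialTests"],
    row["NHSCapacity"] + row["CommercialCapacity"],
)
```

For that day this comes out at roughly 85% of NHS capacity, 79% of commercial capacity and 81% overall.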

This shows in particular how reported commercial capacity was almost used up in mid August. The following graph shows overall NHS and commercial testing capacity since mid July.

The overall testing capacity isn’t maxed out, but remember these are figures for the whole of the UK. It’s likely (especially given recent news reports) that testing capacity is different in different parts of the country, and perhaps most stretched in places with the most cases. So we might expect figures since mid August to also be underestimates (compared to those from earlier in the summer).

Overall, though, it’s hard to draw solid conclusions from the available testing data about the degree to which lack of testing might be influencing the numbers of cases reported in the data.

Comparing local authorities

Let’s return now to the overall data for local authorities. When we start looking across local authorities, we need to bear in mind that different local authorities have different sizes. A certain absolute number of cases in a small local authority is more concerning than if that number of cases were present in a larger local authority, because it would mean a greater percentage of the population were affected.

So we need to calculate infection rates. To do that, we need population data from the ONS which is only available as an Excel file. I’ve cheated and downloaded and converted it to a CSV file locally so that I can use it, rather than try to load the Excel file from source.

Now we can join that data together with the data we have on cases, and from it calculate rates per 100,000 people. The table shows those areas with the highest current average daily rate (which, remember, are likely to be underestimates because recent case numbers are underestimates).

Confirmed Covid-19 case rates per 100,000

| AreaName | Date | AverageDailyCases | All ages | AverageDailyRate |
| --- | --- | --- | --- | --- |
| Nottingham | 2020-10-03 | 186.00 | 332900 | 55.87263 |
| Knowsley | 2020-10-03 | 83.50 | 150862 | 55.34860 |
| Manchester | 2020-10-03 | 271.25 | 552858 | 49.06323 |
| Liverpool | 2020-10-03 | 226.75 | 498042 | 45.52829 |
| Newcastle upon Tyne | 2020-10-03 | 136.25 | 302820 | 44.99373 |
| Exeter | 2020-10-03 | 41.00 | 131405 | 31.20125 |
| Leeds | 2020-10-03 | 242.50 | 793139 | 30.57472 |
| Sefton | 2020-10-03 | 82.75 | 276410 | 29.93741 |
| Sheffield | 2020-10-03 | 174.75 | 584853 | 29.87930 |
| Rochdale | 2020-10-03 | 64.75 | 222412 | 29.11264 |
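
The rate is the average daily case count divided by the population (the "All ages" column) and scaled to 100,000 people. A short Python sketch using three rows from the table above (the analysis itself uses R):

```python
def rate_per_100k(cases, population):
    """Cases per 100,000 people."""
    return cases / population * 100_000


# (AverageDailyCases, population) for 2020-10-03, copied from the table above.
areas = {
    "Leeds": (242.50, 793139),
    "Nottingham": (186.00, 332900),
    "Manchester": (271.25, 552858),
}

rates = {name: rate_per_100k(cases, pop) for name, (cases, pop) in areas.items()}
# Area names ordered from highest to lowest rate.
ranked = sorted(rates, key=rates.get, reverse=True)
```

Manchester has more average daily cases than Nottingham, but Nottingham’s smaller population puts it ahead on rate, which is the point of normalising.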

What kind of rate of new cases should we worry about? Well, in California, they have defined four levels of risk:

As discussed, it is possible to reduce the number of new cases in the data by simply not testing as many people. In the UK, testing capacity is probably not limited in this way, but there isn’t enough granularity in the public testing figures to be able to tell. Regardless, the figures above give a rough indication of how concerned to be about the level of infection in an area.

We’ll try to plot these on a map. There’s a great tutorial for this, which we’ll just follow. We’re using ultra generalised 2019 boundaries from ONS which, like the population data, we’ve downloaded locally into a folder.

First we load the shape file and ensure that it knows what regions we’re using from it.

Then we filter the local authority data to the most recent date (2020-10-03) and retain only area code and the average daily rate. This is the data that’s relevant for the map.

Then we merge the map data into the shape data and plot the map.

This highlights some areas that have high rates of cases, but unless you’re great at UK geography you might struggle to name them. So we’ll move on to look at some individually in a bit.

Areas where there are or have been government lockdowns

First, we’ll have a look at what’s going on in the areas where there are or have been government lockdowns. These are listed in the restrictions table. We can cycle through them - here we’re sorting the areas based on the date the restriction came into force - to take a look at what’s happening there.

Interesting areas to look at here are:

Current restrictions

Previous restrictions

Identifying worrying areas

The areas we really care about are those where there’s a high daily rate, and particularly those where that rate is increasing. We’ll add a couple of columns that track the degree to which the average daily rate is increasing or decreasing.

Identifying trend in Manchester

| Date | DailyRate | AverageDailyRate | DailyRateIncreasePercentage | AverageRateIncreasePercentage |
| --- | --- | --- | --- | --- |
| 2020-10-03 | 0.3617565 | 49.06323 | -0.16 | -0.08 |
| 2020-10-02 | 23.8759320 | 57.15753 | -0.06 | -0.08 |
| 2020-10-01 | 70.7234046 | 60.83539 | -0.03 | -0.05 |
| 2020-09-30 | 101.2918326 | 62.48052 | -0.09 | -0.03 |
| 2020-09-29 | 89.5347449 | 67.95855 | -0.04 | 0.02 |
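
The printed DailyRateIncreasePercentage values are consistent with taking the day-on-day change in the average daily rate and dividing it by the current day’s average rate. That is an inference from the numbers in the table, not something confirmed from the source code, so treat the following Python sketch as one plausible reading. The smoothed AverageRateIncreasePercentage column is not reproduced here.

```python
def relative_change(current, previous):
    """Change from `previous` to `current`, as a fraction of `current`."""
    if current == 0:
        return None  # undefined when the current rate is zero
    return (current - previous) / current


# Manchester AverageDailyRate from the table above, most recent first.
average_rates = [49.06323, 57.15753, 60.83539, 62.48052, 67.95855]

# Compare each day with the day before it (the next item in the list).
changes = [
    round(relative_change(today, day_before), 2)
    for today, day_before in zip(average_rates, average_rates[1:])
]
```

Note that, despite the column name, the values are fractions rather than percentages: -0.16 means the rate fell by about 16%.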

Now let’s narrow down to places that have seen concerning levels of cases over the last week.

Widespread infection

The most recent date is 2020-10-03 so one week ago is 2020-09-27. First let’s look at those areas that had widespread daily rates (over 7 cases / day / 100,000 population) at some point in the last week.

There are 145 areas where there is widespread infection, 95 of which aren’t currently in lockdown. Those are:

  • Nottingham
  • Sheffield
  • Exeter
  • Stockton-on-Tees
  • Darlington
  • Barrow-in-Furness
  • Rotherham
  • Rushcliffe
  • Craven
  • Redcar and Cleveland
  • Broxtowe
  • York
  • Newark and Sherwood
  • Wakefield
  • Richmondshire
  • Barnsley
  • Cheshire West and Chester
  • Cheshire East
  • Oxford
  • Stafford
  • Doncaster
  • Gedling
  • Walsall
  • Harrogate
  • Hambleton
  • Great Yarmouth
  • High Peak
  • Newcastle-under-Lyme
  • East Riding of Yorkshire
  • Charnwood
  • Coventry
  • Redbridge
  • Hackney and City of London
  • Scarborough
  • Bromsgrove
  • Richmond upon Thames
  • South Staffordshire
  • Lincoln
  • Melton
  • North East Derbyshire
  • South Lakeland
  • Blaby
  • Slough
  • Erewash
  • Ashfield
  • Haringey
  • Kingston upon Hull, City of
  • Ryedale
  • West Lindsey
  • Selby
  • Ealing
  • Bassetlaw
  • Newham
  • Dudley
  • Hertsmere
  • Rugby
  • Elmbridge
  • Tower Hamlets
  • East Northamptonshire
  • North Lincolnshire
  • Hounslow
  • Worcester
  • Derby
  • Barnet
  • Barking and Dagenham
  • Allerdale
  • Islington
  • Harrow
  • Brent
  • Harborough
  • Lichfield
  • Havering
  • Amber Valley
  • Uttlesford
  • Stoke-on-Trent
  • Hammersmith and Fulham
  • Telford and Wrekin
  • Hinckley and Bosworth
  • Luton
  • Waltham Forest
  • St Albans
  • Enfield
  • Nuneaton and Bedworth
  • Windsor and Maidenhead
  • Brentwood
  • Wychavon
  • Bedford
  • Lambeth
  • South Derbyshire
  • Shropshire
  • Watford
  • Epping Forest
  • East Hertfordshire
  • Hillingdon
  • South Kesteven
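
The selection behind this list can be sketched in Python roughly as follows (the analysis itself uses R). The week is taken as the seven days ending on the most recent date, which matches the 2020-09-27 start given above. Applying the threshold to the average daily rate rather than the raw daily rate is an assumption, and the "Example area" and "Old peak" rows are invented.

```python
from datetime import date, timedelta

WIDESPREAD_RATE = 7  # cases / day / 100,000 population


def widespread_unrestricted(rows, most_recent, restricted_areas):
    """Areas over the threshold in the last week that have no restrictions."""
    week_start = most_recent - timedelta(days=6)
    names = {
        r["AreaName"]
        for r in rows
        if r["Date"] >= week_start and r["AverageDailyRate"] > WIDESPREAD_RATE
    }
    return sorted(names - set(restricted_areas))


rows = [
    # Rates copied from the rates table earlier in the analysis.
    {"AreaName": "Nottingham", "Date": date(2020, 10, 3), "AverageDailyRate": 55.87263},
    {"AreaName": "Manchester", "Date": date(2020, 10, 3), "AverageDailyRate": 49.06323},
    # Made-up rows: one below the threshold, one outside the week.
    {"AreaName": "Example area", "Date": date(2020, 10, 3), "AverageDailyRate": 2.0},
    {"AreaName": "Old peak", "Date": date(2020, 9, 20), "AverageDailyRate": 9.0},
]
result = widespread_unrestricted(rows, date(2020, 10, 3),
                                 restricted_areas={"Manchester"})
```

Here only Nottingham is returned: Manchester is over the threshold but under local restrictions, and the other two rows fail the rate or date test.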

Now we’ll create charts for those places where there is widespread infection that do not currently have additional restrictions (some of which might be duplicates of the graphs above, because they previously did have local restrictions):

Significant and increasing infection

Finally, we’ll make a list of places where there’s been a significant infection (over 4 cases / day / 100,000 population) at some point over the last week and that are seeing increases in those rates.

  • Guildford
  • Croydon
  • Dacorum
  • Woking
  • Lewisham
  • Wandsworth
  • Kingston upon Thames
  • North East Lincolnshire
  • Kensington and Chelsea
  • Copeland
  • Cannock Chase
  • East Staffordshire
  • Spelthorne
  • Staffordshire Moorlands
  • Stratford-on-Avon
  • Westminster
  • Three Rivers
  • Basildon
  • Daventry
  • Northampton
  • Wellingborough
  • Kettering
  • South Gloucestershire
  • Epsom and Ewell
  • Huntingdonshire
  • Runnymede
  • Chesterfield
  • Tamworth
  • Derbyshire Dales
  • Vale of White Horse
  • Bristol, City of
  • Bexley
  • Mole Valley
  • South Cambridgeshire
  • Bromley
  • Brighton and Hove
  • Chelmsford
  • Southampton
  • Winchester
  • Tandridge
  • Cambridge
  • Bournemouth, Christchurch and Poole
  • Surrey Heath
  • Torbay
  • Camden
  • Thurrock
  • Eden
  • South Northamptonshire
  • Rushmoor
  • North Hertfordshire
  • Norwich
  • Wokingham
  • Somerset West and Taunton
  • Gloucester
  • Cherwell
  • Welwyn Hatfield
  • Horsham
  • Plymouth
  • Sevenoaks
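
A minimal Python sketch of this second filter (the analysis itself uses R). It rests on several assumptions: "increasing" is read as a positive value in the smoothed rate-change column, the upper bound of 7 is a guess based on the widespread list not being repeated here, and the field name `PeakRateLastWeek` and all the sample values are hypothetical.

```python
SIGNIFICANT_RATE = 4  # cases / day / 100,000 population
WIDESPREAD_RATE = 7   # areas above this are assumed to be listed separately


def significant_and_increasing(latest_rows):
    """Areas with a significant, but not widespread, and rising rate."""
    return [
        r["AreaName"]
        for r in latest_rows
        if SIGNIFICANT_RATE < r["PeakRateLastWeek"] <= WIDESPREAD_RATE
        and r["AverageRateIncreasePercentage"] > 0
    ]


# Hypothetical areas and values, for illustration only.
latest_rows = [
    {"AreaName": "Rising area", "PeakRateLastWeek": 5.2, "AverageRateIncreasePercentage": 0.04},
    {"AreaName": "Falling area", "PeakRateLastWeek": 5.8, "AverageRateIncreasePercentage": -0.02},
    {"AreaName": "Low area", "PeakRateLastWeek": 1.1, "AverageRateIncreasePercentage": 0.10},
    {"AreaName": "Widespread area", "PeakRateLastWeek": 12.0, "AverageRateIncreasePercentage": 0.03},
]
flagged = significant_and_increasing(latest_rows)
```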

Again we’ll create charts for those places that do not currently have additional restrictions:

Identifying areas that aren’t worrying

Finally, for comparison, let’s have a look at the ten places with the lowest current average daily rates.

Areas of interest

Just to highlight areas that are in the news:

London

London is also in the news, so we’ll also look at London boroughs:

  • Hackney and City of London
  • Westminster
  • Kensington and Chelsea
  • Hammersmith and Fulham
  • Wandsworth
  • Lambeth
  • Southwark
  • Tower Hamlets
  • Islington
  • Camden
  • Brent
  • Ealing
  • Hounslow
  • Richmond upon Thames
  • Kingston upon Thames
  • Merton
  • Sutton
  • Croydon
  • Bromley
  • Lewisham
  • Greenwich
  • Bexley
  • Havering
  • Barking and Dagenham
  • Redbridge
  • Newham
  • Waltham Forest
  • Haringey
  • Enfield
  • Barnet
  • Harrow
  • Hillingdon