A step-by-step guide

I wanted to get a list of all acute, general hospitals in England that fulfilled the following conditions:

The source data must be downloaded from a government department or body (e.g. DH, ONS, CQC).
That data must be manipulated and filtered exclusively according to rules as set out in code - no manual selection or exclusion of hospitals by name while browsing through a spreadsheet.

The purpose of the second criteria was to allow for the automation of this process as lists are updated, or for selected historical periods.

I found that this task was surprisingly complicated, and I headed down numerous dead-ends before I hit on a reasonable answer that did not rely on manually editing long lists.

CQC API

The ultimate method ended up using the CQC API.

The CQC API Syndication page has example code in JavaScript, Python and C but not in R so I thought it might be helpful to share the syntax for how to make requests.

The first task was to download the CQC spreadsheet entitled CQC care directory with filters.

The purpose of this is to get the CQC codes for all NHS Hospitals, which we can then use to generate the query URLs for the CQC API.

library(readxl) # excel file reader
library(jsonlite) # json parser
library(leaflet) # maps
library(dplyr) # syntax (piping)
library(magrittr) # syntax (assignment pipe)

fileUrl <- "https://www.cqc.org.uk/sites/default/files/1%20April%202019%20HSCA%20active%20locations.xlsx"
destfile <- "cqcdetailed.xls"

download.file(fileUrl, destfile = destfile) #downloaded 19 April 2019 8.54pm GMT+1

# Read in file
cqc <-read_xlsx(path = "~/hospitaldistances/cqc/hospitals_markdown/cqcdetailed.xls", sheet = 2)

Cleaning the data

There needs to be some tidying of the column names which have spaces and punctuation.

## Clean column names

#replace spaces, forward slash in column names with underscores
names(cqc) <- gsub(" |/", "_", names(cqc))
# remove dashes, commas, question marks
names(cqc) <- gsub("-|,|\\?", "", names(cqc))

# Filter the list to just Acute Hospitals and save that as an object called 'hospitals'
cqc %>% filter(   Location_Inspection_Directorate=="Hospitals" & 
                  Location_Primary_Inspection_Category=="Acute hospital - NHS non-specialist"  
                   ) -> hospitals

# Check how many rows there are
dim(hospitals)[1]

## [1] 634

We can see that there are 634 hospitals in this file, which is far too many. There are 168 acute trusts in England. The true number will be a bit higher than this because there are some Trusts with multiple hospitals we would want included, e.g. Barts Health, or Guy’s and St Thomas’s.

Although there are some sensible filters one might try to apply from the data file, there is no combination of filters which perfectly includes all general hospitals and excludes all specialist and community hospitals (at least that I was able to find).

Essentially, the best single indicator for general hospitals is the presence of urgent care services and this is not a column in the CQC spreadsheet.

Using the CQC API

In order to find out which hospitals have A&E, we need to use the CQC API. The output of the request is in json format. I have used an R json parsing package called jsonlite - there are other similar packages e.g. rjson.

The logic of the code to make API requests is:

Append the CQC Location code (e.g. “RWG02”) from the hospital file to the API Location URL address (“https://api.cqc.org.uk/public/v1/locations/”) to generate the full URL for the request (e.g. “”https://api.cqc.org.uk/public/v1/locations/RWG02“).
Download and parse the API output with the fromJSON function. If you want to see the format of the raw json, visit the page in a browser.
Check if the hospital is inspected for urgent care.
If it is then write it to the list of urgent hospitals.
If it is not, write it to the list of dropped hospitals (purely for a sense check - I won’t keep them).

# Let's get their codes to be fed to the CQC API
hospitalcodes <- hospitals$Location_ID

# Create character vector for us to put the codes of hospitals we want in and
# the dropped hospitals just for a sense check afterwards
urgentHospitals <- character()
droppedHospitals <- character()

# Begin loop through all hospitals here - this is a bit ugly in that we have to make 600+ API requests for one record each, which is slow, but there does not seem to be a way to fetch multiple locations in one request. If anyone knows one, please let me know!

for (each_hospital in hospitalcodes){
  
  # Define URL to request - e.g. for Watford General Hospital this will evaluate
  # to the string "https://api.cqc.org.uk/public/v1/locations/RWG02"
  
  targetUrl <- paste0("https://api.cqc.org.uk/public/v1/locations/", each_hospital)
  
  currentHospital <- fromJSON(targetUrl)
  
  # Check if it contains urgent care
  if( sum(currentHospital$inspectionAreas$inspectionAreaId=="urgent") > 0 ) {
    urgentHospitals <- c(urgentHospitals, currentHospital$locationId)
  } else {
    droppedHospitals <- c(droppedHospitals, currentHospital$locationId)
  }  

}

Let’s just have a peek at the files to see if the output is sensible:

# OK let's look at the urgent hospitals
included <- hospitals %>% filter(Location_ID %in% urgentHospitals) %>% sample_n(5) %>% select(4) %>%  rename("Included" = "Location_Name")

# Now let's look at the dropped hospitals
dropped <- hospitals %>% filter(Location_ID %in% droppedHospitals) %>% sample_n(5) %>% select(4) %>% rename("Dropped" = "Location_Name")

knitr::kable(cbind(included, dropped))

Included	Dropped
University College Hospital & Elizabeth Garrett Anderson Wing	Accident and Emergency Department
Royal Cornwall Hospital	Intermediate Care Assessment and Rehabilitation
Wexham Park Hospital	Bridport Community Hospital
West Cornwall Hospital	Hadley Wood Hospital
Southend University Hospital	Newton Abbot Hospital

Mapping the data

The above looks reasonable - you can check through the whole of each list if you run this code, as I have.

Another good way to check is to have a quick look on a map.

# Define the data
    mapframe <- hospitals %>% filter(Location_ID %in% urgentHospitals) %>%  select(Location_Latitude, Location_Longitude, Location_Name)

# Draw the map    
    mapframe %>% leaflet() %>% addTiles %>% 
      addCircleMarkers( lat = ~Location_Latitude, lng = ~Location_Longitude, color = "red",
                       radius = 10, stroke = FALSE, fillOpacity = 0.45,
                       popup = mapframe$Location_Name )

Is this perfect?

It looks pretty close to what I wanted - certainly all the hospitals I have looked for have been there, and all those I have randomly checked from the list should be there. It may be the case that one or two hospitals have been missed or are false positives - I will look at the data more closely and update this as needed.

Listing all acute, general, NHS hospitals with urgent care services in England using R and the CQC API