I wanted to get a list of all acute, general hospitals in England that fulfilled the following conditions:
The purpose of the second criteria was to allow for the automation of this process as lists are updated, or for selected historical periods.
I found that this task was surprisingly complicated, and I headed down numerous dead-ends before I hit on a reasonable answer that did not rely on manually editing long lists.
The ultimate method ended up using the CQC API.
The CQC API Syndication page has example code in JavaScript, Python and C but not in R so I thought it might be helpful to share the syntax for how to make requests.
The first task was to download the CQC spreadsheet entitled CQC care directory with filters.
The purpose of this is to get the CQC codes for all NHS Hospitals, which we can then use to generate the query URLs for the CQC API.
library(readxl) # excel file reader
library(jsonlite) # json parser
library(leaflet) # maps
library(dplyr) # syntax (piping)
library(magrittr) # syntax (assignment pipe)
fileUrl <- "https://www.cqc.org.uk/sites/default/files/1%20April%202019%20HSCA%20active%20locations.xlsx"
destfile <- "cqcdetailed.xls"
download.file(fileUrl, destfile = destfile) #downloaded 19 April 2019 8.54pm GMT+1
# Read in file
cqc <-read_xlsx(path = "~/hospitaldistances/cqc/hospitals_markdown/cqcdetailed.xls", sheet = 2)
There needs to be some tidying of the column names which have spaces and punctuation.
## Clean column names
#replace spaces, forward slash in column names with underscores
names(cqc) <- gsub(" |/", "_", names(cqc))
# remove dashes, commas, question marks
names(cqc) <- gsub("-|,|\\?", "", names(cqc))
# Filter the list to just Acute Hospitals and save that as an object called 'hospitals'
cqc %>% filter( Location_Inspection_Directorate=="Hospitals" &
Location_Primary_Inspection_Category=="Acute hospital - NHS non-specialist"
) -> hospitals
# Check how many rows there are
dim(hospitals)[1]
## [1] 634
We can see that there are 634 hospitals in this file, which is far too many. There are 168 acute trusts in England. The true number will be a bit higher than this because there are some Trusts with multiple hospitals we would want included, e.g. Barts Health, or Guy’s and St Thomas’s.
Although there are some sensible filters one might try to apply from the data file, there is no combination of filters which perfectly includes all general hospitals and excludes all specialist and community hospitals (at least that I was able to find).
Essentially, the best single indicator for general hospitals is the presence of urgent care services and this is not a column in the CQC spreadsheet.
In order to find out which hospitals have A&E, we need to use the CQC API. The output of the request is in json format. I have used an R json parsing package called jsonlite - there are other similar packages e.g. rjson.
The logic of the code to make API requests is:
# Let's get their codes to be fed to the CQC API
hospitalcodes <- hospitals$Location_ID
# Create character vector for us to put the codes of hospitals we want in and
# the dropped hospitals just for a sense check afterwards
urgentHospitals <- character()
droppedHospitals <- character()
# Begin loop through all hospitals here - this is a bit ugly in that we have to make 600+ API requests for one record each, which is slow, but there does not seem to be a way to fetch multiple locations in one request. If anyone knows one, please let me know!
for (each_hospital in hospitalcodes){
# Define URL to request - e.g. for Watford General Hospital this will evaluate
# to the string "https://api.cqc.org.uk/public/v1/locations/RWG02"
targetUrl <- paste0("https://api.cqc.org.uk/public/v1/locations/", each_hospital)
currentHospital <- fromJSON(targetUrl)
# Check if it contains urgent care
if( sum(currentHospital$inspectionAreas$inspectionAreaId=="urgent") > 0 ) {
urgentHospitals <- c(urgentHospitals, currentHospital$locationId)
} else {
droppedHospitals <- c(droppedHospitals, currentHospital$locationId)
}
}
Let’s just have a peek at the files to see if the output is sensible:
# OK let's look at the urgent hospitals
included <- hospitals %>% filter(Location_ID %in% urgentHospitals) %>% sample_n(5) %>% select(4) %>% rename("Included" = "Location_Name")
# Now let's look at the dropped hospitals
dropped <- hospitals %>% filter(Location_ID %in% droppedHospitals) %>% sample_n(5) %>% select(4) %>% rename("Dropped" = "Location_Name")
knitr::kable(cbind(included, dropped))
| Included | Dropped |
|---|---|
| University College Hospital & Elizabeth Garrett Anderson Wing | Accident and Emergency Department |
| Royal Cornwall Hospital | Intermediate Care Assessment and Rehabilitation |
| Wexham Park Hospital | Bridport Community Hospital |
| West Cornwall Hospital | Hadley Wood Hospital |
| Southend University Hospital | Newton Abbot Hospital |
The above looks reasonable - you can check through the whole of each list if you run this code, as I have.
Another good way to check is to have a quick look on a map.
# Define the data
mapframe <- hospitals %>% filter(Location_ID %in% urgentHospitals) %>% select(Location_Latitude, Location_Longitude, Location_Name)
# Draw the map
mapframe %>% leaflet() %>% addTiles %>%
addCircleMarkers( lat = ~Location_Latitude, lng = ~Location_Longitude, color = "red",
radius = 10, stroke = FALSE, fillOpacity = 0.45,
popup = mapframe$Location_Name )
It looks pretty close to what I wanted - certainly all the hospitals I have looked for have been there, and all those I have randomly checked from the list should be there. It may be the case that one or two hospitals have been missed or are false positives - I will look at the data more closely and update this as needed.