This vignette elaborates on https://rpubs.com/jflowers/221129 to demonstrate how to extract the Public Health Outcomes Framework (PHOF) data held on the database behind https://fingertips.phe.org.uk/.
First, I need to call all the libraries I will be using:
library(jsonlite)
library(dplyr)
library(data.table)
I then want to get a few lookup tables and codes, that will be used subsequently to query the database containing the data:
mainurl <- "http://fingertips.phe.org.uk/api/"
profiles <- fromJSON(paste0(mainurl,"profiles")) %>%
filter(Name == "Public Health Outcomes Framework")
head(profiles)
## Id Name Key GroupIds
## 1 19 Public Health Outcomes Framework public-health-outcomes-framework 1000041, 1000042, 1000043, 1000044, 1000049, 1938132983
PHOFid <- profiles[1,"Id"]
groupids <- as.character(profiles[1,"GroupIds"][[1]]) #this returns the groups that contain indicators in PHOF, eg, the domains
head(groupids)
## [1] "1000041" "1000042" "1000043" "1000044" "1000049" "1938132983"
areatypeUTLA <- fromJSON(paste0(mainurl,"area_types")) %>%
filter(Name == "Counties and Unitary Authorities")
head(areatypeUTLA)
## Id Name Short IsSearchable
## 1 102 Counties and Unitary Authorities County & UA TRUE
areatypeid <- areatypeUTLA[1,"Id"] #this is the code for UTLA
areatypeGORID <- fromJSON(paste0(mainurl,"area_types")) %>%
filter(Name == "Government Office Region") %>%
select(Id) %>%
as.numeric() #this is the code for Government Office Regions
parentareacodes <- fromJSON(paste0(mainurl,"areas/by_area_type?area_type_id=",areatypeGORID)) %>%
select(Code) #this is a numeric vector of codes for all the regions
head(parentareacodes)
## Code
## 1 E12000004
## 2 E12000006
## 3 E12000007
## 4 E12000001
## 5 E12000002
## 6 E12000008
sex <- fromJSON(paste0(mainurl,"sexes")) #sex code lookup
head(sex)
## Id Name
## 1 -1 Not applicable
## 2 4 Persons
## 3 1 Male
## 4 2 Female
age <- fromJSON(paste0(mainurl,"ages")) #age code lookup
head(age)
## Id Name
## 1 -1 Not applicable
## 2 1 All ages
## 3 2 < 1 yr
## 4 3 1-4 yrs
## 5 4 5-9 yrs
## 6 5 10-14 yrs
I begin by creating an empty data frame to bind all the data into. The API won’t allow a batch query of the data, you are only able to export data by region per domain within a profile. Therefore, to get all indicator information for all areas in one profile, you need to perform two loops:
df <- data.frame(".id" = numeric(), "AgeId" = numeric(),"SexId" = numeric(),"AreaCode" = character(),
"Val" = numeric())
for (groupid in groupids){ #set up loop by domain
for (pcode in parentareacodes$Code) { #set up loop for GOR
dataurl <- paste0(mainurl,"latest_data/all_indicators_in_profile_group_for_child_areas",
"?profile_id=",PHOFid,
"&group_id=",groupid,
"&area_type_id=",areatypeid,
"&parent_area_code=",pcode)
data <- fromJSON(dataurl)
data$Data <- lapply(data$Data, function(x) x[(names(x) %in% names(df))])
setattr(data$Data, 'names', data$IID)
df <- rbind(df,rbindlist(data$Data,use.names=TRUE,fill=FALSE,idcol = TRUE))
}
}
head(df)
## .id AgeId SexId AreaCode Val
## 1: 90630 199 4 E06000015 22.1
## 2: 90630 199 4 E06000016 25.5
## 3: 90630 199 4 E06000017 7.1
## 4: 90630 199 4 E06000018 31.6
## 5: 90630 199 4 E10000007 15.1
## 6: 90630 199 4 E10000018 10.6
You can see that we now have the latest values for the indicators by sex, age group and area. We don’t know what the indicator codes refer to yet. We can retrieve the lookup table using the following code:
inds <- unique(df$.id) %>%
paste0(collapse = ",")
indicator <- fromJSON(paste0(mainurl,"indicator_metadata/by_indicator_id?indicator_ids=",
inds)) %>%
lapply(`[[`, 7) %>%
lapply(function(x) x[(names(x) == "Name")]) %>%
rbindlist(use.names=TRUE,fill=FALSE,idcol = TRUE)
head(indicator)
## .id Name
## 1: 338 Deprivation score (IMD 2010)
## 2: 10101 1.01ii - Children in low income families (under 16s)
## 3: 10301 1.03 - Pupil absence
## 4: 10401 1.04 - First time entrants to the youth justice system
## 5: 10501 1.05 - 16-18 year olds not in education employment or training
## 6: 10601 1.06i - Adults with a learning disability who live in stable and appropriate accommodation
We can now bring everything together into one table:
df <- left_join(df, sex, by = c("SexId" = "Id")) %>%
left_join(age, by = c("AgeId" = "Id")) %>%
left_join(indicator) %>%
rename(Sex = Name.x, Age = Name.y, Indicator = Name)
## Joining, by = ".id"
head(df[,c("AreaCode","Indicator","Sex","Age","Val")],25)
## AreaCode Indicator Sex Age Val
## 1 E06000015 1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 22.1000
## 2 E06000016 1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 25.5000
## 3 E06000017 1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 7.1000
## 4 E06000018 1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 31.6000
## 5 E10000007 1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 15.1000
## 6 E10000018 1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 10.6000
## 7 E10000019 1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 15.7000
## 8 E10000021 1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 14.7000
## 9 E10000024 1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 15.9000
## 10 E06000015 1.01ii - Children in low income families (under 16s) Persons <16 yrs 22.9000
## 11 E06000016 1.01ii - Children in low income families (under 16s) Persons <16 yrs 25.9000
## 12 E06000017 1.01ii - Children in low income families (under 16s) Persons <16 yrs 7.2000
## 13 E06000018 1.01ii - Children in low income families (under 16s) Persons <16 yrs 32.7000
## 14 E10000007 1.01ii - Children in low income families (under 16s) Persons <16 yrs 15.9000
## 15 E10000018 1.01ii - Children in low income families (under 16s) Persons <16 yrs 11.2000
## 16 E10000019 1.01ii - Children in low income families (under 16s) Persons <16 yrs 16.5000
## 17 E10000021 1.01ii - Children in low income families (under 16s) Persons <16 yrs 15.4000
## 18 E10000024 1.01ii - Children in low income families (under 16s) Persons <16 yrs 16.8000
## 19 E06000015 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons 5 yrs 60.3372
## 20 E06000016 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons 5 yrs 50.6727
## 21 E06000017 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons 5 yrs 74.8111
## 22 E06000018 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons 5 yrs 57.9687
## 23 E10000007 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons 5 yrs 68.3805
## 24 E10000018 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons 5 yrs 63.5418
## 25 E10000019 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons 5 yrs 69.1450