Purpose of the vignette

This vignette elaborates on https://rpubs.com/jflowers/221129 to demonstrate how to extract the Public Health Outcomes Framework (PHOF) data held on the database behind https://fingertips.phe.org.uk/.

First, I need to call all the libraries I will be using:

library(jsonlite)
library(dplyr)
library(data.table)

Retrieving lookups from the API

I then want to get a few lookup tables and codes, that will be used subsequently to query the database containing the data:

mainurl <- "http://fingertips.phe.org.uk/api/"
profiles <- fromJSON(paste0(mainurl,"profiles")) %>%
        filter(Name == "Public Health Outcomes Framework")

head(profiles)
##   Id                             Name                              Key                                                GroupIds
## 1 19 Public Health Outcomes Framework public-health-outcomes-framework 1000041, 1000042, 1000043, 1000044, 1000049, 1938132983
PHOFid <- profiles[1,"Id"]

groupids <- as.character(profiles[1,"GroupIds"][[1]]) #this returns the groups that contain indicators in PHOF, eg, the domains
head(groupids)
## [1] "1000041"    "1000042"    "1000043"    "1000044"    "1000049"    "1938132983"
areatypeUTLA <- fromJSON(paste0(mainurl,"area_types")) %>%
        filter(Name == "Counties and Unitary Authorities")
head(areatypeUTLA)
##    Id                             Name       Short IsSearchable
## 1 102 Counties and Unitary Authorities County & UA         TRUE
areatypeid <- areatypeUTLA[1,"Id"] #this is the code for UTLA


areatypeGORID <- fromJSON(paste0(mainurl,"area_types")) %>%
        filter(Name == "Government Office Region") %>%
        select(Id) %>%
        as.numeric() #this is the code for Government Office Regions

parentareacodes <- fromJSON(paste0(mainurl,"areas/by_area_type?area_type_id=",areatypeGORID)) %>%
        select(Code) #this is a numeric vector of codes for all the regions

head(parentareacodes)
##        Code
## 1 E12000004
## 2 E12000006
## 3 E12000007
## 4 E12000001
## 5 E12000002
## 6 E12000008
sex <- fromJSON(paste0(mainurl,"sexes")) #sex code lookup
head(sex)
##   Id           Name
## 1 -1 Not applicable
## 2  4        Persons
## 3  1           Male
## 4  2         Female
age <- fromJSON(paste0(mainurl,"ages")) #age code lookup
head(age)
##   Id           Name
## 1 -1 Not applicable
## 2  1       All ages
## 3  2         < 1 yr
## 4  3        1-4 yrs
## 5  4        5-9 yrs
## 6  5      10-14 yrs

Retrieving data from the API

I begin by creating an empty data frame to bind all the data into. The API won’t allow a batch query of the data, you are only able to export data by region per domain within a profile. Therefore, to get all indicator information for all areas in one profile, you need to perform two loops:

  1. by domain
  2. by GOR
df <- data.frame(".id" = numeric(), "AgeId" = numeric(),"SexId" = numeric(),"AreaCode" = character(),
                 "Val" = numeric())

for (groupid in groupids){ #set up loop by domain
        for (pcode in parentareacodes$Code) { #set up loop for GOR
                dataurl <- paste0(mainurl,"latest_data/all_indicators_in_profile_group_for_child_areas",
                                  "?profile_id=",PHOFid,
                                  "&group_id=",groupid,
                                  "&area_type_id=",areatypeid,
                                  "&parent_area_code=",pcode)
                data <- fromJSON(dataurl)
                data$Data <- lapply(data$Data, function(x) x[(names(x) %in% names(df))])
                setattr(data$Data, 'names', data$IID)
                df <- rbind(df,rbindlist(data$Data,use.names=TRUE,fill=FALSE,idcol = TRUE))
                
        }
}

head(df)
##      .id AgeId SexId  AreaCode  Val
## 1: 90630   199     4 E06000015 22.1
## 2: 90630   199     4 E06000016 25.5
## 3: 90630   199     4 E06000017  7.1
## 4: 90630   199     4 E06000018 31.6
## 5: 90630   199     4 E10000007 15.1
## 6: 90630   199     4 E10000018 10.6

Indicator lookups

You can see that we now have the latest values for the indicators by sex, age group and area. We don’t know what the indicator codes refer to yet. We can retrieve the lookup table using the following code:

inds <- unique(df$.id) %>%
        paste0(collapse = ",")
indicator <- fromJSON(paste0(mainurl,"indicator_metadata/by_indicator_id?indicator_ids=",
                             inds)) %>%
        lapply(`[[`, 7) %>%
        lapply(function(x) x[(names(x) == "Name")]) %>%
        rbindlist(use.names=TRUE,fill=FALSE,idcol = TRUE)
head(indicator)
##      .id                                                                                       Name
## 1:   338                                                               Deprivation score (IMD 2010)
## 2: 10101                                       1.01ii - Children in low income families (under 16s)
## 3: 10301                                                                       1.03 - Pupil absence
## 4: 10401                                     1.04 - First time entrants to the youth justice system
## 5: 10501                             1.05 - 16-18 year olds not in education employment or training
## 6: 10601 1.06i - Adults with a learning disability who live in stable and appropriate accommodation

Bringing it all together

We can now bring everything together into one table:

df <- left_join(df, sex, by = c("SexId" = "Id")) %>%
        left_join(age, by = c("AgeId" = "Id")) %>%
        left_join(indicator) %>%
        rename(Sex = Name.x, Age = Name.y, Indicator = Name)
## Joining, by = ".id"
head(df[,c("AreaCode","Indicator","Sex","Age","Val")],25)
##     AreaCode                                                                                                          Indicator     Sex      Age     Val
## 1  E06000015                                          1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 22.1000
## 2  E06000016                                          1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 25.5000
## 3  E06000017                                          1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs  7.1000
## 4  E06000018                                          1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 31.6000
## 5  E10000007                                          1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 15.1000
## 6  E10000018                                          1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 10.6000
## 7  E10000019                                          1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 15.7000
## 8  E10000021                                          1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 14.7000
## 9  E10000024                                          1.01i - Children in low income families (all dependent children under 20) Persons 0-19 yrs 15.9000
## 10 E06000015                                                               1.01ii - Children in low income families (under 16s) Persons  <16 yrs 22.9000
## 11 E06000016                                                               1.01ii - Children in low income families (under 16s) Persons  <16 yrs 25.9000
## 12 E06000017                                                               1.01ii - Children in low income families (under 16s) Persons  <16 yrs  7.2000
## 13 E06000018                                                               1.01ii - Children in low income families (under 16s) Persons  <16 yrs 32.7000
## 14 E10000007                                                               1.01ii - Children in low income families (under 16s) Persons  <16 yrs 15.9000
## 15 E10000018                                                               1.01ii - Children in low income families (under 16s) Persons  <16 yrs 11.2000
## 16 E10000019                                                               1.01ii - Children in low income families (under 16s) Persons  <16 yrs 16.5000
## 17 E10000021                                                               1.01ii - Children in low income families (under 16s) Persons  <16 yrs 15.4000
## 18 E10000024                                                               1.01ii - Children in low income families (under 16s) Persons  <16 yrs 16.8000
## 19 E06000015 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons    5 yrs 60.3372
## 20 E06000016 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons    5 yrs 50.6727
## 21 E06000017 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons    5 yrs 74.8111
## 22 E06000018 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons    5 yrs 57.9687
## 23 E10000007 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons    5 yrs 68.3805
## 24 E10000018 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons    5 yrs 63.5418
## 25 E10000019 1.02i - School Readiness: the percentage of children achieving a good level of development at the end of reception Persons    5 yrs 69.1450