The Government Digital Service (GDS) is promoting a new analytical workflow based on R Markdown. R Markdown is a way of writing reports with the R statistical software and RStudio. It combines analysis and reporting in a single document that can be automated, reproduced, and output as HTML, Word or PDF.
The proposal is to simplify the current workflow for reporting and producing outputs from something like this…
to this:
The proposed data flow means that documents can easily be prepared in a format suitable for publication on GOV.UK. The GDS data science team have produced some graphical templates for use on the GOV.UK platform. This approach cuts down the number of steps involved in creating reports, reduces the risk of error and improves quality assurance. It can also be automated to produce multiple reports in one go, or adapted as a template to report on different topics or issues without much effort.
The `knitr` package makes it straightforward to produce high-quality reports in different formats; the schematic below shows the options.
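As a hedged sketch of how this looks in code: conversion to the different formats is usually run through the `rmarkdown` package, whose `render()` function knits the document with `knitr` and then converts it with pandoc. The helper name `render_all_formats()` and the file name `report.Rmd` are hypothetical, and PDF output additionally assumes a LaTeX installation.

```r
# Hypothetical helper: render one R Markdown file to several output formats.
# Needs the rmarkdown package and pandoc when called; PDF also needs LaTeX.
render_all_formats <- function(input,
                               formats = c("html_document", "word_document", "pdf_document")) {
  # rmarkdown::render() accepts a vector of format names and renders each one
  rmarkdown::render(input, output_format = formats, quiet = TRUE)
}

# Usage ("report.Rmd" is a placeholder file name):
# render_all_formats("report.Rmd")
```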
Fingertips is a major publication platform for Official Statistics in Public Health England (PHE). It currently supports a range of visualisations and graphical PDF reports, but does not support commentary, analysis and interpretation alongside the published statistical data.
We have produced an R package, `fingertipsR`, to facilitate data extraction from the Fingertips Application Programming Interface (API).
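This report shows how to:
- extract data from Fingertips with the `fingertipsR` package
- analyse, plot and comment on the data in a report produced with `rmarkdown`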
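A good starting point for R Markdown is the R Markdown Cheat Sheet. There are 3 parts to any R Markdown document:
- a YAML header, which sets options such as the title and output format
- text written in Markdown
- chunks of R code

In addition, R code can be run inside the document to produce figures and tables, or inline within the text to insert individual results.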
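The parts of an R Markdown document can be seen in the minimal example below: a YAML header, Markdown text, an R code chunk and an inline R expression. The document is held in a character vector and passed to `knitr::knit()`, which returns the knitted Markdown when given `text` instead of a file. The document content is illustrative only.

```r
library(knitr)

# A minimal R Markdown document (illustrative content only):
# YAML header, Markdown text, an R code chunk and an inline R expression.
rmd <- c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "Some narrative text written in Markdown.",
  "",
  "```{r}",
  "x <- c(2, 4, 6)",
  "mean(x)",
  "```",
  "",
  "The mean of x is `r mean(x)`."
)

# knit() runs the chunk and the inline expression and returns Markdown text
md <- knit(text = rmd, quiet = TRUE)
cat(md)
```

The inline expression is evaluated after the chunk, so it can use objects the chunk created; this is what keeps numbers in the commentary in step with the analysis.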
R needs additional packages¹ to perform some functions; these have to be loaded before they can be used. For this analysis we will use:
- `fingertipsR`
- `ggplot2`
- `dplyr`
- `readr`
- `govstyle`

The last of these is a `ggplot2` theme which complies with GOV.UK colours and layouts.
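A minimal sketch for loading these packages, which first reports any that are not installed instead of stopping at the first missing one. It uses only base R; where to install each package from is not covered here.

```r
# Packages used in this analysis (the list given in the text)
pkgs <- c("fingertipsR", "ggplot2", "dplyr", "readr", "govstyle")

# requireNamespace() returns FALSE instead of raising an error when a
# package is absent, so missing packages can be reported together.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Please install first: ", paste(missing_pkgs, collapse = ", "))
}

# Attach the packages that are available
for (p in pkgs[installed]) {
  library(p, character.only = TRUE)
}
```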
To extract the data we will use the `fingertipsR` package and download data for teenage conceptions. There are 3 steps:
- find the indicator ID with the `indicators()` function
- find the area type ID with the `area_types()` function
- extract the data with the `fingertips_data()` function

This returns all the relevant data in a ‘tidy’ data format (Wickham 2014).
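The three steps might be wrapped up as below. This is a sketch and not part of `fingertipsR`: the helper `get_teen_conceptions()` is hypothetical, indicator ID 20401 is the one shown in the output below, and area type 102 (assumed here to be counties and unitary authorities) should be confirmed with `area_types()`. Recent versions of `fingertipsR` are assumed to return breakdowns such as deprivation deciles only when `categorytype = TRUE`; check the documentation of your installed version. Calling the function needs an internet connection.

```r
# Hypothetical helper (not part of fingertipsR) wrapping the three steps.
# Requires the fingertipsR package and internet access when called.
get_teen_conceptions <- function(indicator_id = 20401, area_type_id = 102) {
  # Step 1: list indicators and show the name behind the chosen ID
  inds <- fingertipsR::indicators()
  print(unique(inds$IndicatorName[inds$IndicatorID == indicator_id]))

  # Step 2: list area types and show the name behind the chosen ID
  areas <- fingertipsR::area_types()
  print(unique(areas$AreaTypeName[areas$AreaTypeID == area_type_id]))

  # Step 3: download the data; categorytype = TRUE is an assumption for
  # recent package versions, needed to include deprivation deciles
  fingertipsR::fingertips_data(IndicatorID = indicator_id,
                               AreaTypeID = area_type_id,
                               categorytype = TRUE)
}

# Usage (needs network access):
# teen <- get_teen_conceptions()
```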
We can check that we have the correct indicator:
## Observations: 6,568
## Variables: 21
## $ IndicatorID <int> 20401, 20401, 20...
## $ IndicatorName <chr> "Under 18s conce...
## $ ParentCode <chr> NA, NA, NA, NA, ...
## $ ParentName <chr> NA, NA, NA, NA, ...
## $ AreaCode <chr> "E92000001", "E9...
## $ AreaName <chr> "England", "Engl...
## $ AreaType <chr> "Country", "Coun...
## $ Sex <chr> "Female", "Femal...
## $ Age <chr> "<18 yrs", "<18 ...
## $ CategoryType <chr> NA, "General Pra...
## $ Category <chr> NA, "Most depriv...
## $ Timeperiod <int> 1998, 1998, 1998...
## $ Value <dbl> 46.64402, NA, NA...
## $ LowerCIlimit <dbl> 46.19409, NA, NA...
## $ UpperCIlimit <dbl> 47.09724, NA, NA...
## $ Count <int> 41089, NA, NA, N...
## $ Denominator <int> 880906, NA, NA, ...
## $ Valuenote <chr> NA, NA, NA, NA, ...
## $ RecentTrend <chr> NA, NA, NA, NA, ...
## $ ComparedtoEnglandvalueorpercentiles <chr> "Not compared", ...
## $ Comparedtosubnationalparentvalueorpercentiles <chr> "Not compared", ...
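A quick arithmetic check on the units is useful before writing any commentary. Taking the first row of the output above (England, 1998), `Value` is reproduced by dividing `Count` by `Denominator` and multiplying by 1,000, so the rates are per 1,000 women aged 15 to 17, the standard denominator for this indicator.

```r
# Figures copied from the first row of the glimpse() output (England, 1998)
count       <- 41089   # under-18 conceptions
denominator <- 880906  # female population aged 15-17
value       <- 46.64402

# Recompute the rate per 1,000 and compare it with the published Value
rate_per_1000 <- count / denominator * 1000
round(rate_per_1000, 2)  # 46.64
all.equal(rate_per_1000, value, tolerance = 1e-6)
```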
We can then explore and filter the data to understand the dataset and extract exactly what we need, starting with the `CategoryType` variable. This shows that there are 5 different ways of assigning areas to deprivation deciles, depending on the level of geography (GP practice, county & UA, or district & UA) and the version of the Index of Multiple Deprivation (IMD2010 or IMD2015) used.
CategoryType | Category |
---|---|
NA | NA |
General Practice deprivation deciles in England (IMD2010) | Most deprived decile |
General Practice deprivation deciles in England (IMD2010) | Second most deprived decile |
General Practice deprivation deciles in England (IMD2010) | Third more deprived decile |
General Practice deprivation deciles in England (IMD2010) | Fourth more deprived decile |
General Practice deprivation deciles in England (IMD2010) | Fifth more deprived decile |
General Practice deprivation deciles in England (IMD2010) | Fifth less deprived decile |
General Practice deprivation deciles in England (IMD2010) | Fourth less deprived decile |
General Practice deprivation deciles in England (IMD2010) | Third less deprived decile |
General Practice deprivation deciles in England (IMD2010) | Second least deprived decile |
General Practice deprivation deciles in England (IMD2010) | Least deprived decile |
County & UA deprivation deciles in England (IMD2010) | Most deprived decile (IMD2010) |
County & UA deprivation deciles in England (IMD2010) | Second most deprived decile (IMD2010) |
County & UA deprivation deciles in England (IMD2010) | Third more deprived decile (IMD2010) |
County & UA deprivation deciles in England (IMD2010) | Fourth more deprived decile (IMD2010) |
County & UA deprivation deciles in England (IMD2010) | Fifth more deprived decile (IMD2010) |
County & UA deprivation deciles in England (IMD2010) | Fifth less deprived decile (IMD2010) |
County & UA deprivation deciles in England (IMD2010) | Fourth less deprived decile (IMD2010) |
County & UA deprivation deciles in England (IMD2010) | Third less deprived decile (IMD2010) |
County & UA deprivation deciles in England (IMD2010) | Second least deprived decile (IMD2010) |
County & UA deprivation deciles in England (IMD2010) | Least deprived decile (IMD2010) |
District & UA deprivation deciles in England (IMD2010) | Most deprived decile (IMD2010) |
District & UA deprivation deciles in England (IMD2010) | Second most deprived decile (IMD2010) |
District & UA deprivation deciles in England (IMD2010) | Third more deprived decile (IMD2010) |
District & UA deprivation deciles in England (IMD2010) | Fourth more deprived decile (IMD2010) |
District & UA deprivation deciles in England (IMD2010) | Fifth more deprived decile (IMD2010) |
District & UA deprivation deciles in England (IMD2010) | Fifth less deprived decile (IMD2010) |
District & UA deprivation deciles in England (IMD2010) | Fourth less deprived decile (IMD2010) |
District & UA deprivation deciles in England (IMD2010) | Third less deprived decile (IMD2010) |
District & UA deprivation deciles in England (IMD2010) | Second least deprived decile (IMD2010) |
District & UA deprivation deciles in England (IMD2010) | Least deprived decile (IMD2010) |
County & UA deprivation deciles in England (IMD2015) | Most deprived decile (IMD2015) |
County & UA deprivation deciles in England (IMD2015) | Second most deprived decile (IMD2015) |
County & UA deprivation deciles in England (IMD2015) | Third more deprived decile (IMD2015) |
County & UA deprivation deciles in England (IMD2015) | Fourth more deprived decile (IMD2015) |
County & UA deprivation deciles in England (IMD2015) | Fifth more deprived decile (IMD2015) |
County & UA deprivation deciles in England (IMD2015) | Fifth less deprived decile (IMD2015) |
County & UA deprivation deciles in England (IMD2015) | Fourth less deprived decile (IMD2015) |
County & UA deprivation deciles in England (IMD2015) | Third less deprived decile (IMD2015) |
County & UA deprivation deciles in England (IMD2015) | Second least deprived decile (IMD2015) |
County & UA deprivation deciles in England (IMD2015) | Least deprived decile (IMD2015) |
District & UA deprivation deciles in England (IMD2015) | Most deprived decile (IMD2015) |
District & UA deprivation deciles in England (IMD2015) | Second most deprived decile (IMD2015) |
District & UA deprivation deciles in England (IMD2015) | Third more deprived decile (IMD2015) |
District & UA deprivation deciles in England (IMD2015) | Fourth more deprived decile (IMD2015) |
District & UA deprivation deciles in England (IMD2015) | Fifth more deprived decile (IMD2015) |
District & UA deprivation deciles in England (IMD2015) | Fifth less deprived decile (IMD2015) |
District & UA deprivation deciles in England (IMD2015) | Fourth less deprived decile (IMD2015) |
District & UA deprivation deciles in England (IMD2015) | Third less deprived decile (IMD2015) |
District & UA deprivation deciles in England (IMD2015) | Second least deprived decile (IMD2015) |
District & UA deprivation deciles in England (IMD2015) | Least deprived decile (IMD2015) |
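The table above can be produced with `dplyr::distinct(teen, CategoryType, Category)`, where `teen` is assumed to be the data frame returned by `fingertips_data()`. To see which years each classification covers, a summary like the following sketch could be used; `classification_coverage()` is a hypothetical helper, and the column names are those shown by `glimpse()` above.

```r
library(dplyr)

# Hypothetical helper: for each deprivation classification, count its
# categories and report the first and last year with data.
classification_coverage <- function(df) {
  df %>%
    filter(!is.na(CategoryType)) %>%
    group_by(CategoryType) %>%
    summarise(n_categories = n_distinct(Category),
              first_year   = min(Timeperiod),
              last_year    = max(Timeperiod),
              .groups = "drop")
}

# Usage, with `teen` holding the downloaded data:
# distinct(teen, CategoryType, Category)   # the table above
# classification_coverage(teen)            # years covered by each option
```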
To plot trends in under 18 conception rates by deprivation decile, we need to decide which deprivation classification to use. Plotting the different options shows that, for national data, the longest time series (1998 to 2014) is only available for IMD2010-based deciles; IMD2015-based deciles are only available for 2014 and 2015. To plot the time series we therefore need to use the IMD2010 classification. Rates are similar whether areas are grouped by county/UA or by district. The sharp reduction in under 18 conception rates in the most deprived decile since 2007 is evident.
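A sketch of how the different options might be plotted side by side with `ggplot2`, one panel per classification and one line per decile. The helper `plot_classifications()` is hypothetical; it assumes the column names shown by `glimpse()` above and strips the "(IMD2010)"/"(IMD2015)" suffix so that each decile keeps the same colour across panels.

```r
library(ggplot2)

# Hypothetical helper: one panel per deprivation classification and one
# line per decile, for a single area, to compare the options' time spans.
plot_classifications <- function(df, area = "England") {
  # Keep the chosen area and only rows that belong to a classification
  df_area <- subset(df, AreaName == area & !is.na(CategoryType))
  # Drop the "(IMD2010)"/"(IMD2015)" suffix so each decile gets one colour
  df_area$Decile <- sub(" \\(IMD[0-9]{4}\\)$", "", df_area$Category)
  ggplot(df_area, aes(x = Timeperiod, y = Value, colour = Decile)) +
    geom_line() +
    facet_wrap(~ CategoryType, ncol = 2) +
    labs(x = "Year", y = "Conceptions per 1,000 women aged 15-17", colour = NULL)
}

# Usage, with `teen` holding the downloaded data:
# plot_classifications(teen)
```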
Next we can choose a single area and plot the trend; we’ll use England as an example. We need to filter the data to choose an area and, in this case, the deprivation deciles.
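The filtering step could look like the following `dplyr` sketch. The helper `select_deciles()` is hypothetical, and its default classification (county and UA deciles based on IMD2010) is an assumption, chosen because IMD2010 gives the longest time series.

```r
library(dplyr)

# Hypothetical helper: keep one area and one deprivation classification.
# The default classification is an assumed choice, not the author's.
select_deciles <- function(df, area = "England",
                           classification = "County & UA deprivation deciles in England (IMD2010)") {
  df %>%
    filter(AreaName == area, CategoryType == classification) %>%
    select(AreaName, Timeperiod, Category, Value, LowerCIlimit, UpperCIlimit) %>%
    arrange(Category, Timeperiod)
}

# Usage, with `teen` holding the downloaded data:
# england <- select_deciles(teen)
```

Because `filter()` drops rows where a condition is `NA`, the rows with no `CategoryType` (the overall rate) are excluded automatically.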
We can now plot the data with `ggplot2` and apply the `govstyle` theme.
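A sketch of such a plot, with confidence intervals from the `LowerCIlimit` and `UpperCIlimit` columns. The helper `plot_decile_trend()` is hypothetical, and `theme_gov()` is assumed to be the name of the theme function exported by `govstyle`; if the package is not installed, the sketch falls back to `ggplot2::theme_minimal()`.

```r
library(ggplot2)

# Hypothetical helper: trend lines with confidence intervals for each decile.
plot_decile_trend <- function(df) {
  p <- ggplot(df, aes(x = Timeperiod, y = Value, colour = Category)) +
    geom_line() +
    geom_errorbar(aes(ymin = LowerCIlimit, ymax = UpperCIlimit), width = 0.2) +
    labs(x = "Year", y = "Conceptions per 1,000 women aged 15-17", colour = NULL)
  # Apply the GOV.UK theme if govstyle is available (theme_gov() is an
  # assumed function name); otherwise use a plain ggplot2 theme.
  if (requireNamespace("govstyle", quietly = TRUE)) {
    p + govstyle::theme_gov()
  } else {
    p + theme_minimal()
  }
}

# Usage, with `england` holding one area and one classification:
# plot_decile_trend(england)
```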
[Commentary can easily be added, with results from the analysis coded into the text, so that it stays consistent with the analysis and is updated automatically.]
For example, under 18 conception rates have fallen substantially since 1998, and the ‘gap’ between rates in the most and least deprived tenths of areas has fallen from 51.39 conceptions per 1,000 women aged 15 to 17 in 2008 to 29.48 in 2014.
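Figures like these can be computed in the document rather than typed. Below is a sketch of a hypothetical helper, `decile_gap()`, which returns the difference between the most and least deprived deciles for one year. It assumes a data frame holding a single area and a single classification, with the `Timeperiod`, `Category` and `Value` columns shown earlier. The result can then be placed in the text with an inline expression, shown as a comment; `england` stands for the filtered data and is also hypothetical.

```r
# Hypothetical helper: gap between the most and least deprived deciles
# for one year, in the same units as Value (per 1,000 women aged 15-17).
decile_gap <- function(df, year) {
  yr <- df[df$Timeperiod %in% year, ]
  # Anchored patterns so "Second most/least deprived" deciles are not matched
  most  <- yr$Value[grepl("^Most deprived", yr$Category)]
  least <- yr$Value[grepl("^Least deprived", yr$Category)]
  stopifnot(length(most) == 1, length(least) == 1)
  most - least
}

# In the R Markdown text, the figures could then be written as:
# ...has fallen from `r round(decile_gap(england, 2008), 2)` conceptions
# per 1,000 in 2008 to `r round(decile_gap(england, 2014), 2)` in 2014.
```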
To enhance our report we can add maps.
Let us say we want to create the same plots for every area. This can be achieved with a for loop.
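A sketch of such a loop, wrapped in a hypothetical helper `plot_each_area()` that builds one `ggplot2` trend per area and returns them in a named list. It keeps only rows without a `CategoryType`, i.e. the overall value for each area. Passing `method` and `formula` to `geom_smooth()` explicitly stops it printing its informational message about the smoothing method.

```r
library(ggplot2)

# Hypothetical helper: one trend plot per area, collected in a named list.
plot_each_area <- function(df, areas = unique(df$AreaName)) {
  plots <- list()
  for (area in areas) {
    # Overall rate for this area (rows without a category breakdown)
    df_area <- subset(df, AreaName == area & is.na(CategoryType))
    plots[[area]] <- ggplot(df_area, aes(x = Timeperiod, y = Value)) +
      geom_point() +
      geom_smooth(method = "loess", formula = y ~ x) +
      labs(title = area, x = "Year", y = "Conceptions per 1,000 women aged 15-17")
  }
  plots
}
```

In an R Markdown chunk, `for (p in plot_each_area(teen)) print(p)` would then draw each plot in turn, with `teen` holding the downloaded data.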
Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10): 1–23. doi:10.18637/jss.v059.i10.
¹ A package is a set of functions for a specific purpose.