1 Introduction

The Government Digital Service (GDS) is promoting a new analytical workflow based on R Markdown. R Markdown is a way of writing reports with the R statistical software and RStudio that combines analysis and reporting in a single document. The document can be automated and reproduced, and can be output as HTML, Word or PDF.

The proposal is to simplify the current workflow for reporting and producing outputs from something like this:


to this:


The proposed workflow means that documents can easily be prepared in a format suitable for publication on GOV.UK. The GDS data science team have produced some graphical templates for use on the GOV.UK platform. This approach cuts down the number of steps involved in creating reports, reduces the risk of error and improves quality assurance. It can also be automated to produce multiple reports in one go, or adapted as a template to report on different topics or issues with little extra effort.

The knitr package makes it straightforward to produce high-quality reports in different formats; the schematic below shows the options.
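As a minimal sketch of how one source file can yield several formats: in practice the conversion is usually driven by `rmarkdown::render()`, which runs knitr on the document and then converts the result with pandoc. The file name below is hypothetical, and PDF output additionally needs a LaTeX installation.

```r
# Sketch: render one R Markdown source file to several formats in one call.
# "teenage_conceptions.Rmd" is a hypothetical file name used for illustration.
render_all_formats <- function(input = "teenage_conceptions.Rmd") {
  rmarkdown::render(
    input,
    # a character vector of format names renders each one in turn
    output_format = c("html_document", "word_document", "pdf_document")
  )
}

# Usage (not run here, since it needs the .Rmd file, pandoc and LaTeX):
# render_all_formats()
```

Wrapping the call in a function makes it easy to re-run the whole report, in every format, after the data have been refreshed.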

1.1 Fingertips

Fingertips is a major publication platform for Official Statistics in Public Health England (PHE). It currently supports a range of visualisations and graphical PDF reports, but it does not support publishing commentary, analysis and interpretation alongside the statistical data.

We have produced an R package, fingertipsR, to make it easier to extract data from the Fingertips application programming interface (API).

2 Getting started

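This report shows how to:

  • extract data from Fingertips using the API
  • plot the data using the govstyle theme
  • add commentary that stays consistent with the analysis
  • map local authority data
  • automate the production of plots for multiple areas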

2.1 R and markdown basics

A good starting point for R Markdown is the RStudio R Markdown cheat sheet. Every R Markdown document has three parts:

  1. The header, which contains information such as the title, author and date of the report, and controls the output format and style of the document.
  2. Text written in Markdown, for commentary.
  3. Code chunks, which run the R code that imports and manipulates the data, carries out the analysis and produces visualisations such as charts and maps.

In addition, R code can be run inline, inside the text, so that figures quoted in the commentary are calculated directly from the data.
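Inline code is written as a backtick, the letter r, an R expression and a closing backtick. The sketch below shows what knitr does with it by calling `knitr::knit()` on a short piece of text; the variable name and the rate are made up for illustration.

```r
library(knitr)

# A made-up rate, standing in for a value calculated earlier in the report
rate_2014 <- 22.8

# Markdown text containing an inline R expression; knitr evaluates the
# expression and replaces it with its value
template <- "The rate in 2014 was `r rate_2014` per 1,000."
rendered <- knit(text = template, quiet = TRUE)
rendered
```

Because the number in the sentence is computed rather than typed, it changes automatically if the underlying data change.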

R needs additional packages[1] to perform some functions; these have to be loaded before they can be used. For this analysis we will use:

  • fingertipsR
  • ggplot2
  • dplyr
  • readr
  • govstyle

The last of these, govstyle, is a ggplot2 theme that follows GOV.UK colours and layouts.
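A minimal sketch for checking and loading these packages. It attaches only the packages that are already installed and reports the rest; govstyle may need to be installed from GitHub rather than CRAN.

```r
# Packages used in this analysis
pkgs <- c("fingertipsR", "ggplot2", "dplyr", "readr", "govstyle")

# TRUE for each package whose namespace can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Please install: ", paste(missing_pkgs, collapse = ", "))
}

# Attach the packages that are available
for (pkg in pkgs[installed]) {
  library(pkg, character.only = TRUE)
}
```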

2.2 Extracting data from Fingertips using the application programming interface (API)

To do this we will use the fingertipsR package to extract data on teenage conceptions. There are 3 steps:

  1. Identify the Fingertips ID number for the teenage conceptions indicator using the indicators function
  2. Identify the area type codes using the area_types function; we’ll use data for lower tier LAs, with regions as the ‘parent’ area type
  3. Extract the data using the fingertips_data function

This returns all the relevant data in a ‘tidy’ data format (Wickham 2014).
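The three steps can be sketched as follows. This is not necessarily the code used for this report: find_indicator() and get_teen_conceptions() are hypothetical helpers, the IndicatorID and IndicatorName column names of the indicators() output are assumed, and the area type codes (101 for lower tier LAs, 6 for regions) are assumptions to check against the output of area_types(). Only the offline part is run here.

```r
# Step 1 helper (hypothetical): search an indicator table for matching names.
# Assumes columns IndicatorID and IndicatorName, as expected from
# fingertipsR::indicators().
find_indicator <- function(ind_table, pattern) {
  matches <- grepl(pattern, ind_table$IndicatorName, ignore.case = TRUE)
  unique(ind_table[matches, c("IndicatorID", "IndicatorName")])
}

# Steps 2 and 3 need a live connection to the Fingertips API, so they are
# wrapped in a function rather than run here. AreaTypeID = 101 and
# ParentAreaTypeID = 6 are assumptions: check them with area_types().
get_teen_conceptions <- function(indicator_id = 20401) {
  fingertipsR::fingertips_data(IndicatorID = indicator_id,
                               AreaTypeID = 101,
                               ParentAreaTypeID = 6)
}

# Offline demonstration of step 1 on a made-up indicator table
toy_indicators <- data.frame(
  IndicatorID = c(1L, 2L),
  IndicatorName = c("Example conception indicator", "Example smoking indicator"),
  stringsAsFactors = FALSE
)
find_indicator(toy_indicators, "conception")

# With the real API (not run here):
# inds <- fingertipsR::indicators()
# find_indicator(inds, "conception")
# fingertipsR::area_types()
# teen <- get_teen_conceptions()
```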

We can check that we have the correct indicator:

## Observations: 6,568
## Variables: 21
## $ IndicatorID                                   <int> 20401, 20401, 20...
## $ IndicatorName                                 <chr> "Under 18s conce...
## $ ParentCode                                    <chr> NA, NA, NA, NA, ...
## $ ParentName                                    <chr> NA, NA, NA, NA, ...
## $ AreaCode                                      <chr> "E92000001", "E9...
## $ AreaName                                      <chr> "England", "Engl...
## $ AreaType                                      <chr> "Country", "Coun...
## $ Sex                                           <chr> "Female", "Femal...
## $ Age                                           <chr> "<18 yrs", "<18 ...
## $ CategoryType                                  <chr> NA, "General Pra...
## $ Category                                      <chr> NA, "Most depriv...
## $ Timeperiod                                    <int> 1998, 1998, 1998...
## $ Value                                         <dbl> 46.64402, NA, NA...
## $ LowerCIlimit                                  <dbl> 46.19409, NA, NA...
## $ UpperCIlimit                                  <dbl> 47.09724, NA, NA...
## $ Count                                         <int> 41089, NA, NA, N...
## $ Denominator                                   <int> 880906, NA, NA, ...
## $ Valuenote                                     <chr> NA, NA, NA, NA, ...
## $ RecentTrend                                   <chr> NA, NA, NA, NA, ...
## $ ComparedtoEnglandvalueorpercentiles           <chr> "Not compared", ...
## $ Comparedtosubnationalparentvalueorpercentiles <chr> "Not compared", ...
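The listing above looks like the output of dplyr::glimpse(). A base-R sketch of the same check on a few made-up rows (the column names come from the listing; the values are invented) confirms that only one indicator ID is present and shows the years covered:

```r
# Made-up rows using column names from the listing above (values are not real)
teen <- data.frame(
  IndicatorID = c(20401L, 20401L, 20401L),
  AreaName    = c("England", "England", "England"),
  Timeperiod  = c(1998L, 1999L, 2000L),
  Value       = c(40.1, 39.5, 38.2),
  stringsAsFactors = FALSE
)

str(teen)                  # base-R structure summary, similar to glimpse()
unique(teen$IndicatorID)   # a single ID means a single indicator
range(teen$Timeperiod)     # first and last year in the data
```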

We can also explore and filter the data to understand the dataset and extract exactly what we need, starting with the CategoryType variable. It shows five different ways of assigning areas to deprivation deciles, which differ in the geography used (general practice, county & UA, or district & UA) and in the version of the Index of Multiple Deprivation (IMD2010 or IMD2015):

CategoryType | Category
NA | NA
General Practice deprivation deciles in England (IMD2010) | Most deprived decile
General Practice deprivation deciles in England (IMD2010) | Second most deprived decile
General Practice deprivation deciles in England (IMD2010) | Third more deprived decile
General Practice deprivation deciles in England (IMD2010) | Fourth more deprived decile
General Practice deprivation deciles in England (IMD2010) | Fifth more deprived decile
General Practice deprivation deciles in England (IMD2010) | Fifth less deprived decile
General Practice deprivation deciles in England (IMD2010) | Fourth less deprived decile
General Practice deprivation deciles in England (IMD2010) | Third less deprived decile
General Practice deprivation deciles in England (IMD2010) | Second least deprived decile
General Practice deprivation deciles in England (IMD2010) | Least deprived decile
County & UA deprivation deciles in England (IMD2010) | Most deprived decile (IMD2010)
County & UA deprivation deciles in England (IMD2010) | Second most deprived decile (IMD2010)
County & UA deprivation deciles in England (IMD2010) | Third more deprived decile (IMD2010)
County & UA deprivation deciles in England (IMD2010) | Fourth more deprived decile (IMD2010)
County & UA deprivation deciles in England (IMD2010) | Fifth more deprived decile (IMD2010)
County & UA deprivation deciles in England (IMD2010) | Fifth less deprived decile (IMD2010)
County & UA deprivation deciles in England (IMD2010) | Fourth less deprived decile (IMD2010)
County & UA deprivation deciles in England (IMD2010) | Third less deprived decile (IMD2010)
County & UA deprivation deciles in England (IMD2010) | Second least deprived decile (IMD2010)
County & UA deprivation deciles in England (IMD2010) | Least deprived decile (IMD2010)
District & UA deprivation deciles in England (IMD2010) | Most deprived decile (IMD2010)
District & UA deprivation deciles in England (IMD2010) | Second most deprived decile (IMD2010)
District & UA deprivation deciles in England (IMD2010) | Third more deprived decile (IMD2010)
District & UA deprivation deciles in England (IMD2010) | Fourth more deprived decile (IMD2010)
District & UA deprivation deciles in England (IMD2010) | Fifth more deprived decile (IMD2010)
District & UA deprivation deciles in England (IMD2010) | Fifth less deprived decile (IMD2010)
District & UA deprivation deciles in England (IMD2010) | Fourth less deprived decile (IMD2010)
District & UA deprivation deciles in England (IMD2010) | Third less deprived decile (IMD2010)
District & UA deprivation deciles in England (IMD2010) | Second least deprived decile (IMD2010)
District & UA deprivation deciles in England (IMD2010) | Least deprived decile (IMD2010)
County & UA deprivation deciles in England (IMD2015) | Most deprived decile (IMD2015)
County & UA deprivation deciles in England (IMD2015) | Second most deprived decile (IMD2015)
County & UA deprivation deciles in England (IMD2015) | Third more deprived decile (IMD2015)
County & UA deprivation deciles in England (IMD2015) | Fourth more deprived decile (IMD2015)
County & UA deprivation deciles in England (IMD2015) | Fifth more deprived decile (IMD2015)
County & UA deprivation deciles in England (IMD2015) | Fifth less deprived decile (IMD2015)
County & UA deprivation deciles in England (IMD2015) | Fourth less deprived decile (IMD2015)
County & UA deprivation deciles in England (IMD2015) | Third less deprived decile (IMD2015)
County & UA deprivation deciles in England (IMD2015) | Second least deprived decile (IMD2015)
County & UA deprivation deciles in England (IMD2015) | Least deprived decile (IMD2015)
District & UA deprivation deciles in England (IMD2015) | Most deprived decile (IMD2015)
District & UA deprivation deciles in England (IMD2015) | Second most deprived decile (IMD2015)
District & UA deprivation deciles in England (IMD2015) | Third more deprived decile (IMD2015)
District & UA deprivation deciles in England (IMD2015) | Fourth more deprived decile (IMD2015)
District & UA deprivation deciles in England (IMD2015) | Fifth more deprived decile (IMD2015)
District & UA deprivation deciles in England (IMD2015) | Fifth less deprived decile (IMD2015)
District & UA deprivation deciles in England (IMD2015) | Fourth less deprived decile (IMD2015)
District & UA deprivation deciles in England (IMD2015) | Third less deprived decile (IMD2015)
District & UA deprivation deciles in England (IMD2015) | Second least deprived decile (IMD2015)
District & UA deprivation deciles in England (IMD2015) | Least deprived decile (IMD2015)
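A table of distinct category combinations like the one above can be produced by calling unique() on the two columns (dplyr::distinct() does the same job). The rows below are made up and cover only a few of the categories:

```r
# Made-up rows mimicking a few of the CategoryType groups listed above
teen <- data.frame(
  CategoryType = c(NA,
                   "County & UA deprivation deciles in England (IMD2010)",
                   "County & UA deprivation deciles in England (IMD2010)",
                   "County & UA deprivation deciles in England (IMD2015)"),
  Category = c(NA,
               "Most deprived decile (IMD2010)",
               "Least deprived decile (IMD2010)",
               "Most deprived decile (IMD2015)"),
  stringsAsFactors = FALSE
)

# Distinct CategoryType / Category combinations
unique(teen[, c("CategoryType", "Category")])

# Rows per classification; rows without a category (NA) are left out
table(teen$CategoryType)
```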

To plot trends in under 18 conception rates by deprivation decile we need to decide which deprivation classification to use. Plotting the different options shows that, for national data, the longest time series (1998 to 2014) is only available for deciles based on IMD2010; deciles based on IMD2015 are only available for 2014 and 2015. To plot the time series we therefore need to use the IMD2010 deciles. Rates based on county/UA and district/UA deciles are similar. The sharp reduction in under 18 conception rates in the most deprived decile since 2007 is evident.
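Besides plotting, the coverage of each classification can be checked numerically. This sketch uses made-up rows and base R's tapply() to find the first and last year available for each CategoryType:

```r
# Made-up rows: IMD2010 deciles from 1998, IMD2015 deciles only from 2014
teen <- data.frame(
  CategoryType = rep(c("District & UA deprivation deciles in England (IMD2010)",
                       "District & UA deprivation deciles in England (IMD2015)"),
                     times = c(3, 2)),
  Timeperiod = c(1998L, 2006L, 2014L, 2014L, 2015L),
  Value = c(60, 55, 30, 31, 28),
  stringsAsFactors = FALSE
)

# First and last year available for each classification
first_year <- tapply(teen$Timeperiod, teen$CategoryType, min)
last_year  <- tapply(teen$Timeperiod, teen$CategoryType, max)
coverage <- data.frame(CategoryType = names(first_year),
                       first_year = as.vector(first_year),
                       last_year = as.vector(last_year),
                       stringsAsFactors = FALSE)
coverage
```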

Next we can choose a single area and plot the trend; we’ll use England as an example. We need to filter the data to select the area and, in this case, the deprivation decile classification.
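A base-R sketch of this filter on made-up rows; dplyr::filter(teen, AreaName == "England", CategoryType == chosen_type) would be the dplyr equivalent. The district & UA IMD2010 classification is chosen here for illustration:

```r
# Made-up rows (values are invented)
teen <- data.frame(
  AreaName = c("England", "England", "England", "England"),
  CategoryType = c(NA,
                   "District & UA deprivation deciles in England (IMD2010)",
                   "District & UA deprivation deciles in England (IMD2010)",
                   "District & UA deprivation deciles in England (IMD2015)"),
  Category = c(NA, "Most deprived decile (IMD2010)",
               "Least deprived decile (IMD2010)", "Most deprived decile (IMD2015)"),
  Timeperiod = c(2014L, 2014L, 2014L, 2014L),
  Value = c(20, 30, 10, 29),
  stringsAsFactors = FALSE
)

chosen_type <- "District & UA deprivation deciles in England (IMD2010)"

# subset() treats NA conditions as FALSE, so rows without a category drop out
england_deciles <- subset(teen, AreaName == "England" & CategoryType == chosen_type)
england_deciles
```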

2.3 Plot the data using the govstyle theme

We can now plot the data with ggplot2 and apply the govstyle theme.
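A sketch of such a plot, wrapped in a function so it can be reused for other areas. It assumes the filtered data has Timeperiod, Value and Category columns (as in the Fingertips data listing) and that govstyle exports a theme_gov() function; geom_line() is a choice made here and not necessarily the layer used in the original figure.

```r
# Sketch of a reusable trend plot. Assumes `data` has Timeperiod, Value and
# Category columns, and that govstyle provides theme_gov() (an assumption
# to check against the package documentation).
plot_trend <- function(data,
                       title = "Under 18 conception rate by deprivation decile") {
  ggplot2::ggplot(data, ggplot2::aes(x = Timeperiod, y = Value,
                                     colour = Category)) +
    ggplot2::geom_line() +
    ggplot2::labs(title = title, x = "Year",
                  y = "Rate per 1,000 females aged 15-17") +
    govstyle::theme_gov()
}

# Usage (not run here), with a filtered data frame of decile rates
# (hypothetical name): print(plot_trend(england_deciles))
```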

2.4 Adding commentary

Commentary can be added as ordinary text, and results from the analysis can be inserted into it with inline code, so that the text stays consistent with the analysis and is updated automatically.

For example: under 18 conception rates have fallen substantially since 1998, and the ‘gap’ between rates in the most and least deprived tenths of areas has fallen from 51.39 conceptions per 1,000 females aged 15-17 in 2008 to 29.48 in 2014.
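A sketch of how a figure such as the ‘gap’ can be computed in a code chunk and then quoted in the text with inline code. The decile values below are made up, so the results differ from those quoted above, and decile_gap() is a hypothetical helper:

```r
# Made-up decile rates for two years (not the real figures)
deciles <- data.frame(
  Category = rep(c("Most deprived decile (IMD2010)",
                   "Least deprived decile (IMD2010)"), times = 2),
  Timeperiod = rep(c(2008L, 2014L), each = 2),
  Value = c(70, 20, 45, 15),
  stringsAsFactors = FALSE
)

# Gap between the most and least deprived deciles in a given year
decile_gap <- function(data, year) {
  yr <- data[data$Timeperiod == year, ]
  most  <- yr$Value[yr$Category == "Most deprived decile (IMD2010)"]
  least <- yr$Value[yr$Category == "Least deprived decile (IMD2010)"]
  round(most - least, 2)
}

gap_2008 <- decile_gap(deciles, 2008)
gap_2014 <- decile_gap(deciles, 2014)

# In the commentary the values can then be inserted with inline code, e.g.
#   ...has fallen from `r gap_2008` in 2008 to `r gap_2014` in 2014.
```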

2.5 Simple mapping of LA data

To enhance our report we can add maps.
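Mapping usually involves joining the Fingertips data to a boundary file on the area code (AreaCode in the Fingertips data). The sketch below shows only the join, on a made-up attribute table; the column name lad_code and the codes are hypothetical. With a real boundary file read by sf::st_read(), the joined object could be drawn with ggplot2::geom_sf().

```r
# Hypothetical boundary attribute table; "lad_code" stands in for whatever
# the code column of a real boundary file is called
boundaries <- data.frame(lad_code = c("X01", "X02", "X03"),
                         lad_name = c("Area A", "Area B", "Area C"),
                         stringsAsFactors = FALSE)

# Made-up rates keyed by AreaCode, as in the Fingertips data
rates <- data.frame(AreaCode = c("X01", "X02"),
                    Value = c(30, 25),
                    stringsAsFactors = FALSE)

# Left join: keep every area, so areas without data appear as NA on the map
join_rates <- function(boundaries, rates, code_col = "lad_code") {
  merge(boundaries, rates, by.x = code_col, by.y = "AreaCode", all.x = TRUE)
}

mapped <- join_rates(boundaries, rates)
mapped

# With an sf boundary object, the joined data could then be drawn with:
# ggplot2::ggplot(mapped) + ggplot2::geom_sf(ggplot2::aes(fill = Value))
```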

2.6 Automation

Let us say we want to create the same plots for every area. This can be achieved with a for loop.
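A minimal sketch of such a loop on made-up data. Each iteration subsets one area; here it just records the change between the first and last year, and the comment marks where a plot would be drawn. Inside a loop, ggplot objects must be printed explicitly with print() for them to appear in the knitted output.

```r
# Made-up rates for three areas over two years
teen <- data.frame(
  AreaName = rep(c("England", "North East", "London"), each = 2),
  Timeperiod = rep(c(2013L, 2014L), times = 3),
  Value = c(25, 23, 30, 28, 20, 19),
  stringsAsFactors = FALSE
)

areas <- unique(teen$AreaName)
changes <- setNames(numeric(length(areas)), areas)

for (area in areas) {
  area_data <- teen[teen$AreaName == area, ]
  area_data <- area_data[order(area_data$Timeperiod), ]
  # In the report, the plot for this area would be drawn here, e.g.
  # print(make_plot(area_data)), where make_plot() is a hypothetical function
  changes[[area]] <- tail(area_data$Value, 1) - head(area_data$Value, 1)
}

changes
```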


References

Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (1): 1–23. doi:10.18637/jss.v059.i10.


  1. A package is a set of functions for a specific purpose