1 Introduction

The Government Digital Service (GDS) is promoting a new analytical workflow based on R Markdown. R Markdown is a way of writing reports with the R statistical software and RStudio. It combines analysis and reporting in a single document that can be automated, reproduced, and output as HTML, Word or PDF.

The proposed workflow makes it easy to prepare documents in a format suitable for publication on GOV.UK, and the GDS data science team has produced some graphical templates for use on the GOV.UK platform. The workflow cuts down the number of steps involved in creating reports and reduces the risk of error. It can also be automated to produce multiple reports in one go, or adapted as a template to report on different topics or issues without much effort.

Fingertips is a major publication platform for Official Statistics in Public Health England (PHE). Producing reports, analysis and interpretation alongside the statistical data it publishes is therefore increasingly important.

We have produced an R package, fingertipsR, to facilitate data extraction from the Fingertips Application Programming Interface (API).

This report shows how to:

extract data from the API using the fingertipsR package
report the results using R Markdown

A good starting point for R Markdown is the RStudio R Markdown cheat sheet. There are three parts to any R Markdown document:

The header: this contains information such as the title, author and date of the report, and controls the output format and style of the document.
Text written in Markdown, for commentary.
Code chunks: these run the analytical R code that imports and manipulates the data, carries out the analysis and produces visualisations such as charts and maps.

In addition, short pieces of R code can be run inline, within the text, so that figures quoted in the commentary are calculated directly from the data.
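To see the three parts and an inline expression together, here is a minimal sketch that writes a tiny R Markdown file from R. The file name `minimal_report.Rmd` and its contents are illustrative assumptions. Rendering needs the rmarkdown package and pandoc, so that step is left as a comment.

```r
# Write a minimal R Markdown file: YAML header, Markdown text, one code chunk
# and one inline R expression. File name and contents are illustrative only.
fence <- strrep("`", 3)  # three backticks open and close a code chunk
rmd_lines <- c(
  "---",                                  # header starts
  "title: \"Example report\"",
  "output: html_document",
  "---",                                  # header ends
  "",
  "Commentary is ordinary Markdown text.",
  "",
  paste0(fence, "{r example-chunk}"),     # code chunk starts
  "x <- c(1, 5, 3)",
  fence,                                  # code chunk ends
  "",
  "The largest value of x is `r max(x)`.",  # inline R in the text
  ""
)
rmd_path <- file.path(tempdir(), "minimal_report.Rmd")
writeLines(rmd_lines, rmd_path)

# Render to HTML (needs the rmarkdown package and pandoc):
# rmarkdown::render(rmd_path)
```

When rendered, the inline expression is replaced by its value, so the sentence reads "The largest value of x is 5."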

1.1 Step 1: Load libraries

R uses additional packages¹ to perform some functions - these have to be loaded before they can be used. For this analysis we will use:

fingertipsR
ggplot2
dplyr
readr
govstyle

The last of these, govstyle, is a ggplot2 theme that complies with GOV.UK colours and layouts.

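library(dplyr)
library(ggplot2)
# library(fingertipsR)  # needed only for the API extraction in Step 2
library(readxl)
library(readr)

# install govstyle from GitHub only if it is not already installed,
# so the install does not run every time the document is knitted
if (!requireNamespace("govstyle", quietly = TRUE)) {
  devtools::install_github(repo = "ivyleavedtoadflax/govstyle")
}
library(govstyle)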
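When the report is run on a new machine, a small check can list which of these packages still need installing. This is a sketch only. `missing_packages()` is a hypothetical helper written for this example, not part of any package.

```r
# List which of the packages used in this report are not yet installed.
# missing_packages() is a hypothetical helper written for this example.
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE, rather than an error, if a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("dplyr", "ggplot2", "readr", "stringr", "fingertipsR", "govstyle")
to_install <- missing_packages(needed)
to_install
# CRAN packages can then be installed with install.packages();
# govstyle comes from GitHub, as in the chunk above.
```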

1.2 Step 2: Extract data from Fingertips

To do this we will use the fingertipsR package to extract data for teenage conceptions. We need to identify the Fingertips indicator ID for teenage conceptions and the area type code. We'll use data for lower-tier local authorities (districts), and it's then straightforward to extract the data:

library(stringr)

# which indicator ID is teenage pregnancy?
# ind <- indicators()
# ind <- ind[str_detect(ind$IndicatorName, "Rate of conceptions per"),] ## identify relevant indicator ID
# 
# areas <- area_types("district") ## Identify area type code
# 
# df <- fingertips_data(IndicatorID = 20401,
#                       AreaTypeID = 101,
#                       ParentAreaTypeID = 6) ## download the dataset
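The indicator search can also be done in base R, without stringr. The sketch below is a hypothetical helper, `find_indicators()`. It assumes a data frame with `IndicatorID` and `IndicatorName` columns, as the `indicators()` call above appears to return. Duplicate rows are dropped because the same indicator can be listed more than once, for example under several profiles.

```r
# Case-insensitive search of indicator names (base R, no stringr needed).
# find_indicators() is hypothetical; it assumes IndicatorID and IndicatorName
# columns like those returned by fingertipsR::indicators().
find_indicators <- function(ind, pattern) {
  hits <- grepl(pattern, ind$IndicatorName, ignore.case = TRUE)
  # keep one row per indicator, even if it appears in several profiles
  unique(ind[hits, c("IndicatorID", "IndicatorName")])
}

# usage (requires fingertipsR and an internet connection):
# find_indicators(fingertipsR::indicators(), "conception")
```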

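The extraction code above is commented out. Here the same dataset is read from a file downloaded earlier, and we can then check that we have the correct indicator: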

df <- read_csv("~/Downloads/Teenage_pregnancy.zip")
df %>% 
  glimpse

## Observations: 6,568
## Variables: 21
## $ IndicatorID                                   <int> 20401, 20401, 20...
## $ IndicatorName                                 <chr> "Under 18s conce...
## $ ParentCode                                    <chr> NA, NA, NA, NA, ...
## $ ParentName                                    <chr> NA, NA, NA, NA, ...
## $ AreaCode                                      <chr> "E92000001", "E9...
## $ AreaName                                      <chr> "England", "Engl...
## $ AreaType                                      <chr> "Country", "Coun...
## $ Sex                                           <chr> "Female", "Femal...
## $ Age                                           <chr> "<18 yrs", "<18 ...
## $ CategoryType                                  <chr> NA, "General Pra...
## $ Category                                      <chr> NA, "Most depriv...
## $ Timeperiod                                    <int> 1998, 1998, 1998...
## $ Value                                         <dbl> 46.64402, NA, NA...
## $ LowerCIlimit                                  <dbl> 46.19409, NA, NA...
## $ UpperCIlimit                                  <dbl> 47.09724, NA, NA...
## $ Count                                         <int> 41089, NA, NA, N...
## $ Denominator                                   <int> 880906, NA, NA, ...
## $ Valuenote                                     <chr> NA, NA, NA, NA, ...
## $ RecentTrend                                   <chr> NA, NA, NA, NA, ...
## $ ComparedtoEnglandvalueorpercentiles           <chr> "Not compared", ...
## $ Comparedtosubnationalparentvalueorpercentiles <chr> "Not compared", ...

##unique(df$AreaName)
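Rather than checking the `glimpse()` output by eye, a quick programmatic check can stop the report if the file holds an unexpected indicator. It also counts rows per area type. `check_indicator()` is a hypothetical helper that assumes the `IndicatorID` and `AreaType` columns shown above.

```r
# Stop if the data contain any indicator other than the expected one,
# otherwise return the number of rows for each area type.
# check_indicator() is a hypothetical helper written for this example.
check_indicator <- function(data, expected_id) {
  ids <- unique(data$IndicatorID)
  if (!identical(as.numeric(ids), as.numeric(expected_id))) {
    stop("Unexpected indicator ID(s): ", paste(ids, collapse = ", "))
  }
  # e.g. confirms that district-level rows are present
  table(data$AreaType, useNA = "ifany")
}

# usage on the downloaded data (indicator 20401, as in Step 2):
# check_indicator(df, 20401)
```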

Next we choose a single area and plot the trend, using England as an example. We filter the data to that area and, in this case, to the breakdown by deprivation decile.

1.3 Plot the data

We can plot the data with ggplot2 and apply the govstyle format.

decile_type <- "District & UA deprivation deciles in England (IMD2010)"

# England rows for the deprivation-decile breakdown, dropping missing values
deciles <- df %>%
  filter(AreaName == "England", CategoryType == decile_type, !is.na(Value))

trend_plot <- deciles %>%
  ggplot(aes(Timeperiod, Value, colour = Category)) +
  geom_line(aes(group = Category)) +
  theme_gov() +
  expand_limits(y = c(0, 70), x = c(1990, 2015)) +
  labs(y = "Teenage pregnancy rate",
       x = "Year",
       title = "Trends in teenage pregnancy rate by deprivation decile\n1998-2014")

# label each line with its decile at the first year (1998)
trend_plot +
  geom_text(data = filter(deciles, Timeperiod == 1998),
            aes(label = Category),
            size = 2, hjust = 1, vjust = 0, fontface = "bold")

1.4 Add commentary

Under-18 conception rates have fallen substantially since 1998. The ‘gap’ between rates in the most and least deprived tenths of areas has fallen from 51.39 conceptions per 1,000 women aged 15 to 17 in 2008 to 29.48 in 2014.
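Figures like the gap above can be calculated from the data rather than typed by hand, and then inserted with inline R. The sketch below uses a hypothetical helper, `decile_gap()`. It takes the difference between the highest and lowest rates in a year. This matches the most-minus-least-deprived gap only if those two deciles hold the extreme values, which should be checked against the chart.

```r
# Difference between the highest and lowest rates across categories in one
# year. decile_gap() is a hypothetical helper; it assumes Timeperiod and Value
# columns, and that the most and least deprived deciles hold the extremes.
decile_gap <- function(deciles, year) {
  vals <- deciles$Value[deciles$Timeperiod == year & !is.na(deciles$Value)]
  if (length(vals) == 0) return(NA_real_)  # no data for that year
  max(vals) - min(vals)
}
```

In the commentary the number could then be written inline, e.g. `r round(decile_gap(england_deciles, 2014), 2)`. Here `england_deciles` is a hypothetical name for the England rows of the deprivation-decile breakdown.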

1.5 Automation

Suppose we want to create the same plot for every area. We first build the plot for a single area, then wrap the same code in a for loop.

## Single area

# district-level rows for Cambridge, dropping missing values
cambridge <- df %>%
  filter(AreaName == "Cambridge", AreaType == "District & UA", !is.na(Value))

cambridge %>%
  ggplot(aes(Timeperiod, Value)) +
  geom_line() +
  theme_gov() +
  expand_limits(y = c(0, 70), x = c(1996, 2015)) +
  labs(y = "Teenage pregnancy rate",
       x = "Year",
       title = "Trends in teenage pregnancy rate\n1998-2015: Cambridge") +
  # label the first and last values, using the same filtered rows
  geom_text(data = filter(cambridge, Timeperiod %in% c(1998, 2015)),
            aes(label = round(Value, 2)),
            size = 3, hjust = 0.5, vjust = 0, fontface = "bold")

### Multiple areas

## Example areas
areas <- c("Cambridge", "East Cambridgeshire", "Fenland", "Blackburn with Darwen")

for (area in areas) {
  # district-level rows for this area, dropping missing values
  area_df <- df %>%
    filter(AreaName == area, AreaType == "District & UA", !is.na(Value))

  # print() is needed for a ggplot to be drawn from inside a loop
  print(
    ggplot(area_df, aes(Timeperiod, Value)) +
      geom_line() +
      theme_gov() +
      expand_limits(y = c(0, 70), x = c(1997, 2015)) +
      labs(y = "Teenage pregnancy rate",
           x = "Year",
           title = paste0("Trends in teenage pregnancy rate\n1998-2015: ", area)) +
      geom_text(data = filter(area_df, Timeperiod %in% c(1998, 2015)),
                aes(label = round(Value, 2)),
                size = 3, hjust = 0.5, vjust = 0, fontface = "bold") +
      # dotted smoothed line to show the underlying trend
      geom_smooth(lwd = 0.5, lty = "dotted")
  )
}
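The loop draws every chart into the same document. The introduction also mentions producing multiple reports in one go. One way to do that, sketched here under assumptions, is a parameterised R Markdown template rendered once per area with `rmarkdown::render()` and its `params` argument. The template text, file names and the `area` parameter are illustrative. The render step runs only if rmarkdown and pandoc are available.

```r
# Sketch: one HTML report per area from a parameterised R Markdown template.
# The template text, file names and the `area` parameter are illustrative.
template <- c(
  "---",
  "title: \"Teenage conceptions report\"",
  "output: html_document",
  "params:",
  "  area: \"England\"",               # default value, overridden per report
  "---",
  "",
  "This report covers `r params$area`.",
  ""
)
template_path <- file.path(tempdir(), "area_report.Rmd")
writeLines(template, template_path)

report_areas <- c("Cambridge", "Fenland")
# file-safe output names, e.g. "Blackburn with Darwen" -> "Blackburn_with_Darwen.html"
output_files <- paste0(gsub("[^A-Za-z0-9]+", "_", report_areas), ".html")

# rendering needs the rmarkdown package and pandoc, so check for both first
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  for (i in seq_along(report_areas)) {
    rmarkdown::render(template_path,
                      params = list(area = report_areas[i]),
                      output_file = output_files[i],
                      output_dir = tempdir(),
                      envir = new.env(),  # fresh environment for each report
                      quiet = TRUE)
  }
}
```

Passing a new environment each time keeps objects from one area's report from leaking into the next.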

## `geom_smooth()` using method = 'loess'

## `geom_smooth()` using method = 'loess'

## `geom_smooth()` using method = 'loess'

## `geom_smooth()` using method = 'loess'

¹ A package is a set of functions for a specific purpose.

R markdown report using fingertipsR

Julian Flowers

3 April 2017