Introduction

This page shows how I used data from the Office for National Statistics (ONS) to create a map of depression in the UK. It was produced for the module PSY6422 Data Management and Visualisation as part of an MSc in Cognitive Neuroscience and Human Neuroimaging. Please see all relevant data in my GitHub repository here. My idea for this project was to plot depression data on a map of the United Kingdom and compare regions, using a colour scale to represent higher and lower levels of depression.

Data Origins

The data on depression across the UK can be found on the ONS website. I didn’t want my data to be affected by the SARS-CoV-2 pandemic; I wanted it to represent normal conditions in the UK. Unfortunately, the ONS has no data from after 2016 but before the pandemic; the data is suitable regardless. I did find data from Public Health England, but the only shapefile I could find for it (here) would have required too much work to match to the data.

Both map data files are also produced by the ONS and are available on its Open Geography website. The file containing England’s regions can be found here and the file containing the UK’s countries can be found here. These essentially give R the coordinates it needs to construct a map of the UK.

Research Question

What is the prevalence of depression across the United Kingdom? How does this change between regions?

This visualisation aims to show people how depression varies across the UK. Before I made it, I could only guess at which places had lower or higher levels of depression, and my guesses were wrong. Looking at the ONS data gave some indication; however, mapping the data made it much clearer.

Below are the libraries needed for the project to run correctly.

library(tidyverse)
library(readxl)
library(tinytex)
library(here)
library(scales)
library(maps)
library(mapdata)
library(maptools)
library(rgdal)
library(ggmap)
library(ggplot2)
library(rgeos)
library(broom)
library(RColorBrewer)
library(dplyr)

Data Preparation

#find ONS data file
file <- here(
  "data",
  "domainsandmeasuresautumn2019.xls"
  )

#displays the sheets in the file
excel_sheets(
  file
  )
##  [1] "All domains overview"            "Further information"            
##  [3] "Useful links"                    "Assessment of change"           
##  [5] "1.1 Satisfaction"                "1.2 Worthwhile"                 
##  [7] "1.3 Happy"                       "1.4 Anxious"                    
##  [9] "1.5 Pop'n mental well-being"     "2.1 Prop in unhappy rel"        
## [11] "2.2 Feelings of loneliness"      "2.3 People to rely on "         
## [13] "3.1 Healthy life expectancy"     "3.2 People with a disability"   
## [15] "3.3 Satis with general health"   "3.4 Psychosocial health"        
## [17] "4.1 ILO unemployment"            "4.2 Satisfaction with job"      
## [19] "4.3 Satis amount of leisure"     "4.4 Voluntary work"             
## [21] "4.5 Arts participation"          "4.6 Sports participation"       
## [23] "5.1 Crime rates"                 "5.2 Walking alone after dark"   
## [25] "5.3 Visiting natural envir'ment" "5.4 Belonging to neighbourhood" 
## [27] "5.5 Minimum travel time"         "5.6 Satisf'n with accommodation"
## [29] "6.1 Thresholds of income"        "6.2 Household wealth"           
## [31] "6.3 Median household income"     "6.4 Satisfaction with income"   
## [33] "6.5 Managing financially"        "7.1 Real net disposable income "
## [35] "7.2 Public sector debt"          "7.3 CPIH"                       
## [37] "8.1 Human capital"               "8.2 NEETs"                      
## [39] "8.3 Level of qualifications"     "9.1 Voter turnout"              
## [41] "9.2 Trust in Government"         "10.1 Greenhouse gas emissons"   
## [43] "10.2 Extent of protected areas"  "10.3 Energy consumption"        
## [45] "10.4 Household recycling "

It’s clear a lot of work is needed!

Problems:

#read excel sheet
df_uk <- read_excel(
  file,
  "3.4 Psychosocial health"
  ) %>%

#extracts the data from df of regions in the UK
  slice(
    36:48
    ) %>%

#renames columns to a meaningful value and also to work with the map data (see below)
  rename(
    "id" = "National Well-being Measures, October 2019 release",
    "people" = "...2"
    ) %>%
  
#renames East to East of England to match the shapefile data (see below)
  mutate(
    id=recode(
      id,
      East="East of England"
      ))

#displays the top of the dataframe
head(df_uk)
## # A tibble: 6 x 4
##   id                       people             ...3               ...4           
##   <chr>                    <chr>              <chr>              <chr>          
## 1 England                  19.199999999999999 18.5               19.80000000000~
## 2 North East               18.100000000000001 15.199999999999999 21.5           
## 3 North West               20.699999999999999 18.800000000000001 22.69999999999~
## 4 Yorkshire and The Humber 18.699999999999999 16.899999999999999 20.60000000000~
## 5 East Midlands            18.600000000000001 16.699999999999999 20.60000000000~
## 6 West Midlands            20.800000000000001 19                 22.80000000000~

The ONS depression data is now ready to be used! Next, the map data needs to be manipulated. Note: map data for both the regions of England and the countries of the UK is included, as the ONS depression data file contains both. The ONS website has no single file containing both, so the two shapefiles have to be combined.

#load shapefile of England's regions
shapefile_regions <- readOGR(
  here(
    "data",
    "RGN_DEC_2021_EN_BFC.shp"
    ))

#load shapefile of the rest of the UK
shapefile_uk <- readOGR(
  here(
  "data",
  "CTRY_DEC_2021_UK_BFC.shp"
  ))
#reshape England shapefile so ggplot can interpret it
mapdata_regions <- tidy(
  shapefile_regions,
  region = "RGN21NM"
  )

#displays 10 random rows of the dataframe
sample_n(mapdata_regions, 10)
## # A tibble: 10 x 7
##       long     lat   order hole  piece group             id             
##      <dbl>   <dbl>   <int> <lgl> <fct> <fct>             <chr>          
##  1 630830. 307610.  158247 FALSE 1     East of England.1 East of England
##  2 422559. 198383.  920331 FALSE 1     South East.1      South East     
##  3 549924. 181488.  492661 FALSE 1     London.1          London         
##  4 332205. 481724.  698162 FALSE 1     North West.1      North West     
##  5 341535. 441150.  661286 FALSE 1     North West.1      North West     
##  6 360223. 171731. 1283081 FALSE 1     South West.1      South West     
##  7 473599. 101569.  969189 FALSE 5     South East.5      South East     
##  8 444191. 106253.  894268 FALSE 1     South East.1      South East     
##  9 463385.  85436.  940382 FALSE 2     South East.2      South East     
## 10 155513.  27722. 1181690 FALSE 1     South West.1      South West
#reshape UK shapefile into a tidy data frame so ggplot can interpret it
mapdata_uk <- tidy(
  shapefile_uk,
  region = "CTRY21NM"
  )

#displays 10 random rows of the dataframe
sample_n(mapdata_uk, 10)
## # A tibble: 10 x 7
##       long     lat   order hole  piece group              id              
##      <dbl>   <dbl>   <int> <lgl> <fct> <fct>              <chr>           
##  1 343581. 992646. 2458787 FALSE 7     Scotland.7         Scotland        
##  2 577385. 185993.  393330 FALSE 1     England.1          England         
##  3 510031. 101528.  492153 FALSE 1     England.1          England         
##  4 172135. 532202. 1307923 FALSE 1     Northern Ireland.1 Northern Ireland
##  5 638983. 291630.  248190 FALSE 1     England.1          England         
##  6 116578. 933812. 2202248 FALSE 2     Scotland.2         Scotland        
##  7 591866. 204011.  362574 FALSE 1     England.1          England         
##  8 561552. 320013.  181902 FALSE 1     England.1          England         
##  9 172743. 869657. 2008083 FALSE 1     Scotland.1         Scotland        
## 10 437224. 569499.   18290 FALSE 1     England.1          England
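
The rgdal, rgeos and maptools packages loaded at the top of this page were retired from CRAN in 2023, so the readOGR() and tidy() steps may not run on a fresh R installation. One possible alternative, which is a swapped-in technique and not what this project used, is the sf package together with geom_sf(). A minimal sketch on two made-up squares (toy_regions, toy_stats and their values are hypothetical):

```r
library(sf)
library(dplyr)
library(ggplot2)

# Helper that builds a unit square starting at x0 (made-up geometry)
square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Two toy "regions" standing in for the shapefile polygons
toy_regions <- st_sf(
  id = c("Region A", "Region B"),
  geometry = st_sfc(square(0), square(1))
)

# Made-up prevalence values, joined by the shared id column
toy_stats <- data.frame(
  id = c("Region A", "Region B"),
  people = c(10, 20),
  stringsAsFactors = FALSE
)
toy_map <- left_join(toy_regions, toy_stats, by = "id")

# geom_sf() reads the geometry column directly, so no tidy() step is needed
toy_plot <- ggplot(toy_map) +
  geom_sf(aes(fill = people))
```

With real data, the toy polygons would presumably be replaced by sf::st_read() on the .shp files, with the region name column renamed to id before joining.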

There are again problems:

#removes England from mapdata_uk as we already have the regions of England
#(a logical mask on id drops every England polygon piece, not just England.1)
mapdata_uk_noeng <- mapdata_uk[
  mapdata_uk$id != "England",
  ]
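
Dropping rows by condition is easy to get subtly wrong in R: putting a minus sign in front of a logical vector coerces TRUE to -1 and FALSE to 0, so only the first row is removed. A minimal sketch on a made-up data frame (toy_map and its values are illustrative, not the ONS shapefile) comparing that with a logical mask:

```r
# Toy stand-in for mapdata_uk (made-up values, not the ONS shapefile)
toy_map <- data.frame(
  id    = c("England", "England", "Scotland", "Wales"),
  group = c("England.1", "England.2", "Scotland.1", "Wales.1"),
  stringsAsFactors = FALSE
)

# A minus sign coerces TRUE/FALSE to -1/0, so only row 1 is dropped
minus_index <- toy_map[-c(toy_map$group == "England.1"), ]

# A logical mask on id keeps exactly the rows that are not England
masked <- toy_map[toy_map$id != "England", ]

nrow(minus_index)  # 3 rows: England.2, Scotland.1 and Wales.1 remain
nrow(masked)       # 2 rows: Scotland and Wales
```

Filtering on id rather than group also catches polygons split into several pieces, such as islands.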

#combines both mapdata dfs
combined_mapdata <- bind_rows(
  mapdata_uk_noeng,
  mapdata_regions
  ) %>%

#adds ONS depression data to shapefiles
inner_join(
  df_uk,
  by="id"
  ) %>%

#defines the people column as a numeric vector
mutate(people = as.numeric(
  people
  ))

#displays the top of the dataframe
head(combined_mapdata)
## # A tibble: 6 x 10
##      long     lat order hole  piece group     id      people ...3  ...4         
##     <dbl>   <dbl> <int> <lgl> <fct> <fct>     <chr>    <dbl> <chr> <chr>        
## 1 398446. 652843.     2 FALSE 1     England.1 England   19.2 18.5  19.800000000~
## 2 398457. 652852.     3 FALSE 1     England.1 England   19.2 18.5  19.800000000~
## 3 398464. 652855.     4 FALSE 1     England.1 England   19.2 18.5  19.800000000~
## 4 398471. 652860.     5 FALSE 1     England.1 England   19.2 18.5  19.800000000~
## 5 398479. 652864.     6 FALSE 1     England.1 England   19.2 18.5  19.800000000~
## 6 398487. 652871.     7 FALSE 1     England.1 England   19.2 18.5  19.800000000~
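
inner_join() silently drops any row whose id has no match on the other side, so a region spelled differently in the two sources would simply vanish from the map. A minimal sketch, on made-up data, of checking for that with anti_join() (toy_stats, toy_shapes and the numbers are hypothetical):

```r
library(dplyr)

# Made-up stand-ins for the depression table and the map data
toy_stats <- data.frame(
  id = c("East", "London", "Wales"),
  people = c(10, 20, 30),
  stringsAsFactors = FALSE
)
toy_shapes <- data.frame(
  id = c("East of England", "London", "Wales"),
  stringsAsFactors = FALSE
)

# Rows of toy_stats whose id has no counterpart in toy_shapes
unmatched <- anti_join(toy_stats, toy_shapes, by = "id")

print(unmatched$id)
```

An empty result would suggest every region in the statistics has a polygon to be drawn on.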
map_test <- ggplot()
  
map_test + geom_polygon(
  data = combined_mapdata,
  aes(
    x = long,
    y = lat,
    group = group,
    fill = people
    ))

The problems don’t stop there!

Problems:

Data Visualisation

Below is the code that overcomes the problems above and plots the final graph.

#create plot, removes grid and axis and adds labels
map <- ggplot() + 
  theme(
    panel.grid = element_blank()) + 
  theme(
    axis.ticks = element_blank(),
    axis.text = element_blank()) +
  labs(
    x = NULL,
    y = NULL,
    title = "Depression across the United Kingdom",
    subtitle = "Prevalence per region",
    fill = "People (%)",
    caption = "Source: Office for National Statistics")

#change to more appropriate colours and plots graph
map + geom_polygon(
  data = combined_mapdata,
  aes(
    x = long,
    y = lat,
    group = group,
    fill = people
    ),
  colour = "white",
  size = 0.01
  ) +
  coord_fixed(
    1
              ) +
  scale_fill_gradientn(
    colours = c(
      "#d9f1ff",
      "#59bfff",
      "#009dff",
      "#000137",
      "#000817"
      ))

#saves plot to directory
ggsave(filename = "depression_uk.png")
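
When no plot is named, ggsave() saves the last plot displayed and uses the size of the current graphics device. A minimal sketch of a more explicit call, assuming a toy plot and illustrative dimensions (toy_plot, out_file and the width, height and dpi values are not from this project):

```r
library(ggplot2)

# Toy plot standing in for the finished map
toy_plot <- ggplot(data.frame(x = 1:3, y = 1:3), aes(x, y)) +
  geom_point()

# Passing plot = avoids relying on "the last plot displayed";
# width and height are in inches by default, and all three values
# here are illustrative choices
out_file <- file.path(tempdir(), "toy_plot.png")
ggsave(
  filename = out_file,
  plot = toy_plot,
  width = 6,
  height = 8,
  dpi = 300
)
```

Fixing the size this way should make the saved image reproducible regardless of how large the plotting window happens to be.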

Summary

In summary, I have used publicly available data to create a plot of prevalence of depression across the UK.

From the graph it is clear that:

Caveats

Whilst the map shows that some areas have more or fewer people who are depressed, this does not mean that the people there are more or less severely depressed. For example, it could be assumed that people in the North West are more depressed than people in Northern Ireland; however, the map only shows that the North West has a greater percentage of people who are depressed.

Furthermore, the data does not say anything about severity or about differences by gender, county, age and so on. The data represents a regional average and doesn’t account for these differences at the individual level. For example, across the North West as a whole a person is more likely to be depressed, yet if we account for gender, it may be that women are more likely to be depressed than men. The map therefore implicitly treats these variables as having no effect.

Future Directions

If I had more time on this project, I would map counties rather than regions. This would let readers see the distribution of depression across the UK in more detail. I did find data for Clinical Commissioning Groups (CCGs), which are essentially the areas the NHS uses to cover counties; however, I couldn’t find a shapefile that matched, despite finding a CCG shapefile. To use that shapefile I would have had to amend approximately 100 CCG rows or find depression data for each county, which I couldn’t.

I would also have compared this data across multiple time points, producing a map for each. It may also have been interesting to investigate age, gender and income against depression, to name a few.