Introduction

As part of the course “Developing Data Products” we’ve been asked to produce a small project creating a webpage using R Markdown which features a map created with leaflet. I will host the web page on RPubs.

I have chosen to use Census information from Ireland to plot a colour-scaled map showing the unemployment rate at Small Area level. This will help determine whether there are any areas of widespread unemployment. Small Areas are the most granular level at which census information is available in Ireland, each containing approximately 100 households. Markers will be plotted at the centre of each Small Area and colour scaled using the unemployment rate.

Census Data in Ireland & associated shapefiles are freely available and can be found here: https://www.cso.ie/en/census/

Required Packages

You will need the following packages installed: sp, rgdal, rgeos, raster, dplyr, leaflet and htmltools.

Please refer to session information in the appendix to see packages used at the time of report creation.
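A quick way to check whether the packages are installed before running the report is sketched below. The package list is taken from the attached packages in the session information; the helper name `missing_packages` is made up for this example.

```r
# Packages attached in the session information at the end of this report
required <- c("sp", "rgdal", "rgeos", "raster", "dplyr", "leaflet", "htmltools")

# Hypothetical helper: return the names of packages that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install; note that rgdal and
  # rgeos have since been archived on CRAN, so they may need another source
}
```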

Data Import

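In order to successfully plot the unemployment rate for each small area we need to download three files: the Small Area Population Statistics (SAPS) census data, the generalised Small Area boundary shapefile, and the SAPS glossary.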

When running the code in the R Markdown script, these files will be downloaded to the current working directory, so you may want to set your working directory accordingly. As the census data and boundary file are zipped, I will also unzip them so they can be used easily in the next stages of the report.

# IMPORT CENSUS CSV FILE
#setwd("F:\\Coursera\\C9_Data_Products\\Project_Week2_Leaflet_Markdown")
# CENSUS DATA IS STORED AS A ZIP FILE AT THE FOLLOWING URL
url <- "https://cso.ie/en/media/csoie/census/census2016/census2016boundaryfiles/Saps_2016.zip"
download.file(url, destfile = "./IRL_Census.zip") # DOWNLOAD THE ZIP FILE
unzip("./IRL_Census.zip") # UNZIP INTO THE WORKING DIRECTORY

# ALSO DOWNLOAD SMALL AREA BOUNDARY FILE
# WE CAN USE THE GENERALISED FILE TO SPEED UP RUN TIME
url <- "http://data-osi.opendata.arcgis.com/datasets/4f55f1a4bcd34e5fb5c8e6e20cadb09e_2.zip"
download.file(url, destfile = "./IRL_Shapefile_50m_G.zip")
unzip("./IRL_Shapefile_50m_G.zip", overwrite = TRUE)

# FINALLY LET'S PULL THROUGH THE GLOSSARY TO HELP US IDENTIFY FIELDS OF INTEREST
url <- "https://www.cso.ie/en/media/csoie/census/census2016/census2016boundaryfiles/SAPS_2016_Glossary.xlsx"
# THE GLOSSARY IS AN EXCEL WORKBOOK, NOT A ZIP FILE; mode = "wb" WRITES IT IN BINARY MODE,
# WHICH WINDOWS NEEDS FOR THE WORKBOOK TO OPEN CORRECTLY
download.file(url, destfile = "./SAPS_2016_Glossary.xlsx", mode = "wb")
# IF THE DOWNLOAD STILL FAILS, YOU MAY NEED TO DOWNLOAD THE FILE MANUALLY
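Since the report downloads the same files every time it is knitted, one option is to skip files that are already on disk. The sketch below is only a suggestion: `download_if_missing` is a hypothetical helper, not part of the original script.

```r
# Hypothetical helper: download a file only if it is not already on disk.
# mode = "wb" writes the file in binary mode, which Windows needs for
# non-text files such as .xlsx workbooks.
download_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    message("Already present, skipping: ", destfile)
    return(invisible(FALSE))
  }
  download.file(url, destfile = destfile, mode = "wb")
  invisible(TRUE)
}

# Example call (not run here; URL as used in the report):
# download_if_missing(
#   "https://www.cso.ie/en/media/csoie/census/census2016/census2016boundaryfiles/SAPS_2016_Glossary.xlsx",
#   "./SAPS_2016_Glossary.xlsx"
# )
```

The function returns FALSE when the download was skipped and TRUE when it ran, which makes it easy to log what happened.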

Data Processing

From the glossary we can see that, in order to derive unemployment metrics, the table we are interested in is Table 8: Principal status, and the variable required is T8_1_ULGUPJT (unemployed having lost or given up previous job). I will also use the variable T8_1_TT, the total population aged 15 years and over, to convert T8_1_ULGUPJT to a rate. Standardising in this way enables comparisons across small areas.

Let's read in the .csv SAPS data and, as there are a huge number of columns, only keep the ones we need: the two variables highlighted in the last paragraph plus GUID and GEOGID, which will serve as identifier variables later.

SA_Stats <- read.csv("./Saps_2016/SAPS2016_SA2017.csv")
# KEEP THE COLUMNS REQUIRED, PLUS GUID AND GEOGID IN CASE WE NEED THEM LATER TO APPEND TO THE BOUNDARY FILE
SA_Stats <- SA_Stats[, c("GUID", "GEOGID", "T8_1_ULGUPJT", "T8_1_TT")]
# CONVERT THE UNEMPLOYED COUNT TO A RATE (THE COLUMN NOW HOLDS A PROPORTION, NOT A COUNT)
SA_Stats$T8_1_ULGUPJT <- SA_Stats$T8_1_ULGUPJT / SA_Stats$T8_1_TT
#length(unique(SA_Stats$GUID))
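To make the rate calculation concrete, here is a small sketch on made-up counts. It stores the result in a separate `rate` column (my own naming, the report overwrites T8_1_ULGUPJT instead) and returns NA where the population is zero, which is an assumption about how such areas should be treated.

```r
# Toy data with made-up counts; column names follow the SAPS file
toy <- data.frame(
  GUID         = c("A", "B", "C"),
  T8_1_ULGUPJT = c(12, 0, 5),    # unemployed count
  T8_1_TT      = c(240, 0, 100)  # population aged 15 and over
)

# Divide count by population, returning NA where the population is zero
toy$rate <- ifelse(toy$T8_1_TT > 0, toy$T8_1_ULGUPJT / toy$T8_1_TT, NA_real_)

toy$rate
```

Areas A and C both come out at 0.05 (5%), even though their raw counts differ, which is the point of standardising.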

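Now that we have the data we need from the .csv file, we need to read in our boundary file. The function for doing this is readOGR from the rgdal package, which is used for geo-spatial analysis. The code below reads the shapefile, converts its coordinates to latitude and longitude, calculates the area and centroid of each small area, and merges these with the census data: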

#USE readOGR in package rgdal so we can force merge with .csv data
IRL_SA <-readOGR(dsn=".", layer = "Small_Areas__Generalised_50m__OSi_National_Boundaries")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Data Science Specialization JHU\C9_Developing_Data_Products", layer: "Small_Areas__Generalised_50m__OSi_National_Boundaries"
## with 18641 features
## It has 21 fields
## Integer64 fields read as strings:  OBJECTID AREA CHANGECODE ESRI_OID
IRL_SA@data$id <- rownames(IRL_SA@data)
IRL_SA <- spTransform(IRL_SA, CRS("+proj=longlat +datum=WGS84")) # CONVERT THE COORDINATES TO LATITUDE AND LONGITUDE, WHICH LEAFLET EXPECTS
IRL_SA@data$area_sqkm <- area(IRL_SA) # FIND AREA OF EACH SMALL AREA (raster::area RETURNS SQUARE METRES FOR LONGITUDE/LATITUDE DATA, DESPITE THE COLUMN NAME)
centroids <- as.data.frame(gCentroid(IRL_SA, byid = TRUE, id = IRL_SA$GUID)) # GET CENTROID OF EACH SMALL AREA
centroids$GUID <- row.names(centroids) # ADD GUID COLUMN TO THE CENTROID TABLE
sq_area <- data.frame(GUID = IRL_SA@data$GUID, area_sqkm = IRL_SA@data$area_sqkm, county = IRL_SA@data$COUNTYNAME, small_area = IRL_SA@data$SMALL_AREA, ED_Name = IRL_SA@data$EDNAME) # PULL OUT ADDITIONAL LABELS

#MERGE THE DATASETS TO ONE DATAFRAME TO USE FOR THE PLOT
SA_Stats_centroids <- left_join(SA_Stats, centroids, by = "GUID") #MERGE USING DPLYR
SA_Stats_centroids <- left_join(SA_Stats_centroids, sq_area, by = "GUID") 
SA_Stats_centroids <- SA_Stats_centroids[!is.na(SA_Stats_centroids$x),] # REMOVE RECORDS WITH NO CENTROID (1 OF THESE)
SA_Stats_centroids <- SA_Stats_centroids[!is.na(SA_Stats_centroids$T8_1_ULGUPJT),] # REMOVE RECORDS WITH NO RATE (1 OF THESE)
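Before dropping rows with missing values it can be useful to count how many records failed to match. The sketch below runs on made-up data and uses base R `merge` in place of dplyr's `left_join` so that it works without extra packages; the table and variable names are illustrative only.

```r
# Made-up statistics and centroid tables sharing a GUID key
stats <- data.frame(GUID = c("A", "B", "C"), rate = c(0.05, 0.08, NA))
cents <- data.frame(GUID = c("A", "B"), x = c(-6.26, -8.47), y = c(53.35, 51.90))

# all.x = TRUE keeps every row of stats, like a left join
joined <- merge(stats, cents, by = "GUID", all.x = TRUE)

# Count the rows that would be dropped, and why
n_no_centroid <- sum(is.na(joined$x))
n_no_rate     <- sum(is.na(joined$rate))

# Keep only complete records, as in the report
clean <- joined[!is.na(joined$x) & !is.na(joined$rate), ]
```

Here area C has neither a centroid nor a rate, so both counts are 1 and two rows remain in `clean`.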

summary(SA_Stats_centroids$area_sqkm)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##         6     50315    174686   3769991   5610710 163274677
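A note on units: as far as I can tell from the raster documentation, `area` returns square metres for longitude/latitude data, so the values in the summary above are in m² even though the column is called area_sqkm. The short sketch below converts the largest value to km² and works out the radius of a circle with the same area, which is the quantity `sqrt(area_sqkm/pi)` passed to `addCircles` later (leaflet takes that radius in metres). The variable names are my own.

```r
# Largest area from the summary above, taken to be in square metres
area_m2 <- 163274677

area_km2 <- area_m2 / 1e6        # 1 km^2 = 1,000,000 m^2
radius_m <- sqrt(area_m2 / pi)   # radius of a circle with the same area

round(area_km2, 1)   # roughly 163 square kilometres
round(radius_m)      # roughly 7.2 kilometres, expressed in metres
```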

Creating Our Map

First we need to create the labels for our map. As the user hovers over a small area we want to show the unemployment rate, the Small Area code, and the county and electoral district. This makes the map much easier to read and more useful.

This uses lapply to create a label for each record in the data frame.

labs <- lapply(seq_len(nrow(SA_Stats_centroids)), function(i) {
  paste0( '<p>', "Unemployment Rate: ", round(SA_Stats_centroids[i, "T8_1_ULGUPJT"]*100, 2), "%", '</p><p>', 
          "Small Area: ", SA_Stats_centroids[i, "small_area"], '</p><p>', 
          "County: ", SA_Stats_centroids[i, "county"],'</p><p>', 
          "Electoral District: ", SA_Stats_centroids[i, "ED_Name"], '</p>' ) 
})
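With around 18,000 records, building the labels row by row can be slow. A vectorised alternative with `sprintf` is sketched below on made-up records; this is a different technique from the lapply approach in the report, and `labs_vec` is my own name.

```r
# Made-up records using the column names from the merged data frame
toy <- data.frame(
  T8_1_ULGUPJT = c(0.0513, 0.1),
  small_area   = c("SA_1", "SA_2"),
  county       = c("Dublin", "Cork"),
  ED_Name      = c("ED One", "ED Two"),
  stringsAsFactors = FALSE
)

# sprintf is vectorised, so one call builds a label for every row
labs_vec <- sprintf(
  "<p>Unemployment Rate: %.2f%%</p><p>Small Area: %s</p><p>County: %s</p><p>Electoral District: %s</p>",
  toy$T8_1_ULGUPJT * 100, toy$small_area, toy$county, toy$ED_Name
)

labs_vec[1]
```

The resulting character vector could then be passed through `lapply(labs_vec, HTML)` in the same way as `labs`. If the label columns are stored as factors, wrapping them in `as.character()` first is the safer choice.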

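Finally, it is time to plot the map in leaflet! This next chunk of code initialises the map and adds the base tiles, creates the colour scale, and then adds the circles, hover labels and legend: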

# INITIALISE MAP
m <- leaflet(SA_Stats_centroids, width = "100%", height = 800) %>% addTiles() # ASSIGN DATA TO LEAFLET AND ADD TILES
# CREATE COLOUR SCALE (NOTE: DESPITE THE VARIABLE NAME, THE PALETTE USED IS "Spectral")
RdYlBu <-colorQuantile("Spectral", domain = unique(SA_Stats_centroids$T8_1_ULGUPJT), n=20,
                       na.color = "#808080", alpha = FALSE, reverse = TRUE, right = FALSE) # DEFINE THE COLOUR CODING TO USE
# ADD CIRCLE MARKERS, LABEL AND ADD LEGEND
m %>% addCircles(~x, ~y, radius = ~sqrt(area_sqkm/pi),
                 stroke = FALSE, fillOpacity = 0.60,
                 color = ~RdYlBu(T8_1_ULGUPJT),
                 label = lapply(labs, HTML),
                 labelOptions = labelOptions(direction = 'left', opacity = 0.7)) %>% 
  addLegend(pal = RdYlBu, values = ~T8_1_ULGUPJT, opacity = .5)
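The colour scale above bins the rates by quantile, so each colour covers roughly the same number of small areas rather than an equal range of rates. The base R sketch below illustrates that idea on made-up rates with 5 bins instead of 20; it is an illustration of quantile binning, not leaflet's internal implementation.

```r
# Made-up rates standing in for T8_1_ULGUPJT
rates <- c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.08, 0.10, 0.15, 0.30)

# Break points at the 0%, 20%, ..., 100% quantiles (5 bins here; the map uses 20)
n_bins <- 5
breaks <- quantile(rates, probs = seq(0, 1, length.out = n_bins + 1))

# Assign each rate to its quantile bin (1 = lowest, n_bins = highest)
bin <- cut(rates, breaks = breaks, include.lowest = TRUE, labels = FALSE)

table(bin)
```

Each bin ends up with two of the ten values, even though the top bin spans a much wider range of rates than the bottom one. This is why a quantile scale shows contrast across the whole map even when a few areas have very high rates.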

Now we can see the unemployment rate by Small Area for Ireland in one neat map. The user can hover over each small area to see more detail, and zooming in lets the user focus on specific areas, e.g. Dublin.

Session Information

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: i386-w64-mingw32/i386 (32-bit)
## Running under: Windows 7 (build 7601) Service Pack 1
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Ireland.1252  LC_CTYPE=English_Ireland.1252   
## [3] LC_MONETARY=English_Ireland.1252 LC_NUMERIC=C                    
## [5] LC_TIME=English_Ireland.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] htmltools_0.3.6 leaflet_2.0.2   dplyr_0.7.6     raster_2.8-4   
## [5] rgeos_0.4-2     rgdal_1.3-6     sp_1.3-1       
## 
## loaded via a namespace (and not attached):
##  [1] codetools_0.2-15 bindrcpp_0.2.2   digest_0.6.17    R6_2.2.2        
##  [5] assertthat_0.2.0 rprojroot_1.3-2  grid_3.5.1       stringr_1.3.1   
##  [9] knitr_1.20       tidyselect_0.2.4 pillar_1.3.0     compiler_3.5.1  
## [13] tibble_1.4.2     httpuv_1.4.5     crosstalk_1.0.0  lattice_0.20-35 
## [17] pkgconfig_2.0.2  mime_0.5         later_0.7.3      shiny_1.2.0     
## [21] purrr_0.2.5      glue_1.3.0       stringi_1.1.7    magrittr_1.5    
## [25] rmarkdown_1.10   evaluate_0.11    rlang_0.3.0.1    promises_1.0.1  
## [29] yaml_2.2.0       tools_3.5.1      bindr_0.1.1      htmlwidgets_1.3 
## [33] xtable_1.8-3     crayon_1.3.4     backports_1.1.2  Rcpp_0.12.18
