A First Pass at sf in R

For my work recently, I’ve been doing a lot of GIS analysis, and specifically a lot of point binning on a very local scale. Generally I’ve always done this in the past using various R GDAL libraries, and specifically sp::over.

However, I also went to a recent talk by Hadley Wickham on tidy programming practices wherein the sf library for R, and specifically the relatively new geom_sf() function for plotting sf data.frames in (the dev version of) ggplot2, were mentioned. From thereon I’ve been playing with the sf library and pretty astonished by it’s speed and flexibility, even as a pretty new library.

In terms of learning languages over the last few years, another interest of mine has been the Irish language, and given it’s pretty stark distribution of speakers across the republic, I thought I’d have a go at plotting it using simple features.

First load up some standard data manipulation and GDAL libraries along with sf

#libraries
library(sf)

library(dplyr)
library(ggplot2)
library(data.table)

library(rgdal)
library(rgeos)
library(maptools)

Next I got my data from the 2011 Irish census. There’s been a new one very recently so it will be interesting how the values plotted here change, but unfortunately the Irish language data from that is not available yet as far as I can tell.

#small area census data
ireland_df <- fread("C:/Users/Robert/Desktop/Irish SF/AllThemesTablesSA.csv", stringsAsFactors = FALSE) %>%
  dplyr::select(GEOGID, irish_speakers = T3_1YES, population = T1_1AGETT) %>%
  #work out the percentage of people over 3 years old who speak Irish
  mutate(irish_pc = irish_speakers/population * 100)

ireland_sf <- read_sf(dsn = "C:/Users/Robert/Desktop/Irish SF", layer = "Census2011_Small_Areas_generalised20m") %>%
  dplyr::select(GEOGID, EDNAME, COUNTYNAME, geometry) %>%
  #join in the census data
  left_join(., ireland_df, by = "GEOGID") %>%
  st_buffer(0)

library(ggthemes)

p1 <- ggplot() +
  #add the shapefiles
  geom_sf(data = ireland_sf, aes(fill = irish_pc), colour = NA) +
  #colour them
  #I turned off guides for this post as the maps aren't meant ot be analytic in any capacity
  scale_fill_gradient(low = "#00005C", high = "#F5AD00", guide = FALSE) +
  #title
  ggtitle("Number of Irish Speakers in 2011 Census") +
  #ggtheme to get rid of axis labels/ticks etc.
  theme_map()

p1

Cool! That’s basically a nice plot created very easily and logically. As an aside, I was a little disappointed how many people (and how widespread) speak Irish these days which meant the colour scheme I chose from the feudal coat of arms didn’t come out as well as I had hoped. The arms themselves are absolutely stunning.

However, for my work I’ve also been doing a lot of binning into self-created polygons. For this, I use generated grids of hexagons which I can use to average over underlying shapefiles to bin event point data into equal sized areas. Let’s give it a try with the Irish language data.

I first loaded the data in as a SpatialDataFrame as I wanted a single “template” polygon to bin into. This can be done using sf but I need a GDAL object for later on.

#open as a SpatialPolygonsDataFrame
ireland_gdal <- readOGR(dsn = "C:/Users/Robert/Desktop/Irish SF", layer = "Census2011_Small_Areas_generalised20m") %>%
  gBuffer(byid = TRUE, width = 0) %>%
  #want it to be just 1 polygon of all of the ROI
  unionSpatialPolygons(IDs = .@data$NUTS1)

## OGR data source with driver: ESRI Shapefile 
## Source: "C:/Users/Robert/Desktop/Irish SF", layer: "Census2011_Small_Areas_generalised20m"
## with 18488 features
## It has 22 fields

plot(ireland_gdal)

I then stole some more code from Edzer Pebesma from StackOverflow to create the hexagons (another way I’ve used previously is here). After creating the hexagons, I then work out how much of each of the original shapefiles they cover and linearly estimate the number of people and number of Irish speakers for each hexagon (given the resolution I’m working at here, this is unlikely to be too far from reality)

library(sp)

#buffer the shapefile
ireland_buffer <- gBuffer(ireland_gdal, width = 10)
#sample hexagonal points over the shape
hex_points <- spsample(ireland_buffer, type = "hexagonal", cellsize = 3000)
#convert to a SPDF
hex_polygons <- HexPoints2SpatialPolygons(hex_points) %>%
  #clip the hexagon grid to the edges of Ireland
  gIntersection(ireland_gdal, byid = TRUE) %>%
  st_as_sf(crs = st_crs(ireland_sf)) %>%
  st_transform(crs = st_crs(ireland_sf))

#find which small area shapefile (ireland_sf) fall within each hexagon
bin_list <- st_intersects(hex_polygons, ireland_sf)

what has the bi_list done? a quick plot clears it up

head(bin_list)

## [[1]]
## [1] 740 741
## 
## [[2]]
## [1] 791
## 
## [[3]]
## [1] 791 793
## 
## [[4]]
## [1] 793 805
## 
## [[5]]
## [1] 740
## 
## [[6]]
## [1] 694 695 740 771 772 773

#lets visualise this with the sixth generated hexagon in our grid
example <- bin_list[[6]]

pextra <- ggplot() +
  #the small areas that fall within the sixth hexagon
  geom_sf(data = slice(ireland_sf, example), fill = "blue") +
  #the sixth hexagon's boundaries
  geom_sf(data = slice(hex_polygons, 6), fill = "red", alpha = 0.5)

pextra

for each of those small areas we need the total number of people and speakers of Irish for each and to multiply these by the proportion of the small area with the hexagon. Then we can divide these to get the percentage of people within the hexagon who are estimated to speak Irish

#function to calculate number of speakers per hexagon
hex_subfunc <- function(index){
    speakers <- ireland_sf$irish_speakers[bin_list[[index]]]
    population <- ireland_sf$population[bin_list[[index]]]
    areas <- st_area(st_intersection(hex_polygons[index,], ireland_sf[bin_list[[index]],])) / st_area(ireland_sf[bin_list[[index]],])
    summation <- as.numeric(sum((speakers * areas)) / sum((population * areas)))
    
    return(summation)
}

#run the function to calculate the percentage of Irish speakers per hexagon
hex_polygons <- mutate(hex_polygons, value = unlist(lapply(seq_along(bin_list), hex_subfunc)))

Plot it to see what it looks like

#plot it as before
p2 <- ggplot() +
  geom_sf(data = hex_polygons, aes(fill = value), colour = NA) +
  scale_fill_gradient(low = "#00005C", high = "#F5AD00", guide = FALSE) +
  ggtitle("Number of Irish Speakers in 2011 Census") +
  theme_map()

p2

And that’s a nice map which clearly still shows the Gaeltacht and (in my opinion) is a bit neater than the shapefile one as it smooths the data a bit

BONUS SMOOTHING: I had also come across the Getis-Ord local statistic recently in my work on identifying event hotspots, and especially had come across this fantastic post from the (always excellent) The Pudding. Given that there is clearly a local effect in Irish speaking, I wanted to smooth the data out further to create a visually-pleasing map that retained the Gaeltacht information. Most of the code below is basically a copy from that document, though I had to fiddle with the sf information to get the numeric coordinates out (there is almost surely a better way, but it will suffice for a Sunday morning)

library(spdep)

#get the polygons for each small area shapefile by some dirty munging
hex_coords <- hex_polygons %>%
  select(geometry) %>%
  mutate(centroids = gsub("(POINT\\()(.*)(\\))", "\\2", st_as_text(st_centroid(geometry)))) %>%
  tidyr::separate(centroids, c("grid1", "grid2"), sep = " ")

st_geometry(hex_coords) <- NULL

hex_coords$grid1 <- as.numeric(hex_coords$grid1)
hex_coords$grid2 <- as.numeric(hex_coords$grid2)

hex_coords <- as.matrix.data.frame(hex_coords)
IDS <- rownames(hex_coords)

#copied from the Pudding article, slightly fewer means as otherwises over-smoothes data
knn50 <- knn2nb(knearneigh(hex_coords, k = 20), row.names = IDS) %>%
  include.self()

#compute the local values for percent speaking Irish
localGvalues <- localG(x = as.numeric(hex_polygons$value), listw = nb2listw(knn50, style = "B"), zero.policy = TRUE)
localGvalues <- round(localGvalues,3)

#put these back into the sf data.frame
hex_polygons$smooth_value <- localGvalues

#plot as before
p3 <- ggplot() +
  geom_sf(data = hex_polygons, aes(fill = smooth_value), colour = NA) +
  scale_fill_gradient(low = "#00005C", high = "#F5AD00", guide = FALSE) +
  ggtitle("Number of Irish Speakers in 2011 Census") +
  theme_map()

p3

Thanks for reading! Scríobhann tú R?

A First Pass at sf in R

Robert Hickman

10 August 2017