1. Working With Geographical Data in R

Chris Brunsdon

June 2021

Geographical Data in R

Quick Overview

  • Before modelling it is useful to
    • See some geographical data models
    • Understand about map projections
    • Be familiar with R’s sf (simplefeatures) package
    • See some basic geographical operations
    • Create some maps

So what does geographical data look like?


  • Point data
    • Like a standard data table but each observation is a point in 2D space
    • eg Rain gauge stations
Simple feature collection with 25 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -10.23 ymin: 51.79 xmax: -5.99 ymax: 55.36
Geodetic CRS:  WGS 84
# A tibble: 25 x 4
  Station      Elev      geometry  Rain
  <chr>       <dbl>   <POINT [°]> <dbl>
1 Athboy         87  (-6.93 53.6)  81.0
2 Foulksmills    71  (-6.77 52.3) 103. 
3 Mullingar     112 (-7.37 53.47)  76.7
4 Portlaw         8 (-7.31 52.28) 110. 
5 Rathdrum      131 (-6.22 52.91) 130. 
6 Strokestown    49  (-8.1 53.75)  97.0
# … with 19 more rows

Also, line data


  • Here the geometry column is a list of points
    • Represents paths, or geographical networks
    • eg roads, railways, rivers, footpaths etc.
Simple feature collection with 479 features and 1 field
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -9.87 ymin: 51.85 xmax: -6.04 ymax: 54.27
Geodetic CRS:  WGS 84
First 6 features:
        LEN                       geometry
1 27.832735 LINESTRING (-7.510596 53.46...
2  3.601239 LINESTRING (-8.747836 53.25...
3 11.241874 LINESTRING (-7.996752 51.93...
4 13.993221 LINESTRING (-8.752088 52.57...
5 10.890816 LINESTRING (-8.29744 52.548...
6 14.080600 LINESTRING (-8.467084 51.91...

Finally, area/region (polygon) data


  • Here the geometry column is a list of points but now they are boundaries to an enclosed area
    • Represents zones - eg counties, flood risk zones
    • Sometimes can consists of multiple regions eg a country with islands
Simple feature collection with 26 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -10.66 ymin: 51.42 xmax: -6 ymax: 55.44
Geodetic CRS:  WGS 84
First 6 features:
   COUNTY                           geom
1  CARLOW MULTIPOLYGON (((-6.83286 52...
2   CAVAN MULTIPOLYGON (((-7.344946 5...
3   CLARE MULTIPOLYGON (((-9.916184 5...
4    CORK MULTIPOLYGON (((-10.16165 5...
5 DONEGAL MULTIPOLYGON (((-8.602362 5...
6  DUBLIN MULTIPOLYGON (((-6.00754 53...

Map projections

Map Projections


By Hellerick - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=26737079

  • Lines of Longitude - great circles
  • Lines of Latitude - Vary in length
  • Map projections - try to map parts of globe onto 2D space preserving
    • Length or
    • Areas or
    • Angles
    • or some other criterion
  • Up until now we used the geographical projection
    • x direction is longitude, y is latitude
    • It preserves neither and is particularly bad for regions close to the poles!
    • Usually helpful to have at least a local length preserving projection

A more appropriate projection?

So what does crs=29902 mean?


  • EPSG: Codes introduced by the International Association of Oil and Gas Producers
    • Formerly European Petroleum Survey Group (EPSG)
    • A reference number for map projections
    • ISO 19111:2007 Geographic information—Spatial referencing by coordinates
    • A suggested projection for Ireland is 29902 (Irish Transverse Mercator)
  • See https://epsg.io/29902
  • So here, crs is Coordinate Reference System

A more appropriate graticule?

Some situations where map projections occur


  • In the sf object itself
    • The internal format for spatial data stored in the geometry column
  • The coord_sf in ggplot
    • The projection the map is drawn in.
    • Transforms ‘on the fly’ if necessary.
    • Defaults to the internal format of the sf object
  • The graticule (datum) in ggplot
    • The projection the map grid is shown in
    • Defaults to the internal format of the sf object

Basic Data Operations with sf

Getting Ready…


  • Load up the libraries you will use
library(readr)
library(tidyverse)
library(purrr)
library(sf)
library(tmap)
  • Then, in R:
age_cty <- st_read('ages.json')
age_cty
Simple feature collection with 26 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -10.66265 ymin: 51.4199 xmax: -5.996...
Geodetic CRS:  WGS 84
First 10 features:
     COUNTY Mean_Age                       geometry
1    CARLOW 36.92068 MULTIPOLYGON (((-6.83286 52...
2     CAVAN 37.22736 MULTIPOLYGON (((-7.344946 5...
3     CLARE 38.48726 MULTIPOLYGON (((-9.916184 5...
4      CORK 37.52873 MULTIPOLYGON (((-10.16165 5...
5   DONEGAL 38.47128 MULTIPOLYGON (((-8.602362 5...
6    DUBLIN 36.80492 MULTIPOLYGON (((-6.00754 53...
7    GALWAY 37.46712 MULTIPOLYGON (((-10.18388 5...
8     KERRY 40.17090 MULTIPOLYGON (((-10.37725 5...
9   KILDARE 34.92587 MULTIPOLYGON (((-6.865871 5...
10 KILKENNY 37.91166 MULTIPOLYGON (((-7.068368 5...

Creating Maps - 1


ggplot(age_cty,aes(fill=Mean_Age)) + 
  geom_sf() 

ggplot(age_cty,aes(fill=Mean_Age)) + geom_sf() + 
  scale_fill_viridis_c()

Creating Maps - 2


  • Modify the details of the map
    • Reverse scale direction
    • Add a title
    • Label the colour scale
    • Draw in ITM projection (EPSG 29902)
ggplot(age_cty,aes(fill=Mean_Age)) + 
  geom_sf() +   
  coord_sf(crs=29902) +
  scale_fill_viridis_c(direction=-1,name="Age") + 
  labs(title="Average Age by County") 

ggspatial - A helper library


  • A useful library for nicer looking maps
    • Add OpenStreetMap backdrop tiles
    • Add a scale
library(ggspatial)
ggplot(age_cty,aes(fill=Mean_Age)) + 
  annotation_map_tile() +
  annotation_scale(location="br") +
  geom_sf(alpha=0.5) +   
  coord_sf(crs=29902) +
  scale_fill_viridis_c(direction=-1,
                       name="Age") + 
  labs(title="Average Age by County") 

Some more variations…


  • Change the standard backdrop tile
  • Change the shading palette
  • Add a north arrow
library(ggspatial)
ggplot(age_cty,aes(fill=Mean_Age)) + 
  annotation_map_tile(type="osmgrayscale") +
  annotation_scale(location="br") +
  annotation_north_arrow(location='tl') +
  geom_sf(alpha=0.5) +   
  coord_sf(crs=29902) +
  scale_fill_distiller(palette = 'YlOrRd',
                       direction=1,
                       name="Age") + 
  labs(title="Average Age by County") 

More complex Data Operations

Transform the map projection of age_cty


  • This changes the internal representation
  • store the result into a new variable age_cty_itm
age_cty_itm <- st_transform(age_cty,crs=29902)
age_cty_itm  
Simple feature collection with 26 features and 2 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 17499.98 ymin: 19589.95 xmax: 33455...
epsg (SRID):    29902
proj4string:    +proj=tmerc +lat_0=53.5 +lon_0=-8 +k=1.00...
First 10 features:
     COUNTY Mean_Age                       geometry
1    CARLOW 36.92068 MULTIPOLYGON (((279281.7 14...
2     CAVAN 37.22736 MULTIPOLYGON (((243213 2834...
3     CLARE 38.48726 MULTIPOLYGON (((70129.57 14...
4      CORK 37.52873 MULTIPOLYGON (((50217.46 38...
5   DONEGAL 38.47128 MULTIPOLYGON (((161286 3922...
6    DUBLIN 36.80492 MULTIPOLYGON (((332267.2 25...
7    GALWAY 37.46712 MULTIPOLYGON (((55329.46 25...
8     KERRY 40.17090 MULTIPOLYGON (((37227.54 99...
9   KILDARE 34.92587 MULTIPOLYGON (((276420.7 18...
10 KILKENNY 37.91166 MULTIPOLYGON (((263667.1 11...
ggplot(age_cty_itm,aes(fill=Mean_Age)) + geom_sf() 

Adding other geographical objects


rail <- st_read('rail.json')
  • This can now be added to a plot as an additional layer
ggplot(age_cty_itm) + geom_sf(aes(fill=Mean_Age)) + 
  geom_sf(data=rail,col='yellow')
ggplot(age_cty_itm) + geom_sf(aes(fill=Mean_Age)) + 
  geom_sf(data=rail,col='yellow')

Working with dplyr to manipulate data


  • In short, sf objects work like tibbles
age_cty_gt_38 <- age_cty %>% filter(Mean_Age > 38)
ggplot(age_cty_gt_38) + geom_sf(fill='tomato')

  • Combining last two ideas
ggplot(age_cty) + geom_sf() + 
  geom_sf(data=age_cty_gt_38,color='tomato')

Other simplefeatures functions


st_intersects(age_cty,rail)
Sparse geometry binary predicate list of length 26, where...
first 10 elements:
 1: 85, 98, 102, 252, 253, 270, 274, 275, 276, 279
 2: 66
 3: 97, 254, 255, 256, 257, 271, 272, 273, 283, 289, ...
 4: 3, 6, 7, 53, 54, 55, 56, 110, 113, 114, ...
 5: (empty)
 6: 13, 14, 16, 37, 38, 41, 43, 44, 45, 79, ...
 7: 2, 17, 19, 26, 60, 61, 62, 74, 119, 123, ...
 8: 32, 57, 112, 114, 125, 126, 127, 128, 141, 163, ...
 9: 15, 88, 89, 92, 98, 308, 309, 310, 311, 312, ...
 10: 40, 49, 70, 83, 84, 85, 86, 87, 104, 145, ...
st_touches(age_cty,age_cty)
Sparse geometry binary predicate list of length 26, where...
first 10 elements:
 1: 9, 10, 11, 25, 26
 2: 12, 14, 17, 24
 3: 7, 8, 13, 22
 4: 13, 22
 5: 12
 6: 9
 7: 3, 16, 19, 20, 22
 8: 3
 9: 1, 6, 17, 19, 26
 10: 1, 11, 23, 25
railbuf <- rail %>% st_transform(crs=29902) %>% st_buffer(2000)
ggplot(railbuf) + geom_sf(fill='dodgerblue')

Conclusion

Basic idea

  • At this stage not much spatial statistics
  • But all of the above needed to
    • Read in spatial data
    • Manipulate spatial data
    • Visualise spatial data
  • These will all be needed in practice
  • These and other functions: