The following R mapping packages are used in this tutorial.

In addition the usual packages dplyr, ggplot2, tidr are all used extensively.

A more complete list of packages can be found on the Cran repository. y

Data for the tutorial is loaded from the Robin Lovelace tutorial folder.

The initial examples are all based on the London_sports shapefile

  setwd("~/SpatialAnalysis/IntroToSpatialR/Creating-maps-in-R-master")
  ldn = readOGR(dsn='data', 'london_sport' )

Gographic objects can be subsetted using brackets in the same way other objects can. For example the code to find the high sports-participation areas of london is as follows.

First we create a new filter where the sports participation rate exceeds 25% of the population.

  sel = ldn$Partic_Per > 25

Next we plot a grey filled background map of London and then overlay a turquoise filled plot of London zones where the participation levels exceed 25%.

Selecting quadrants

In this example London is split into quadrants on the basis of latitude and longitude. London Wards are coloured based on which quadrant the centre of each shape falls into.

To produce the graphic above the coordinates of the centre of london were used to generate a dividing north-south line and a dividing east-west line.

Two vectors were created which contained the results of a simple test against each ward, whether the centre of each ward lay to the east of the centre line and whether the ward lay to the north of the centre of london.

The two computed vectors were used to create a new field in the Spatial object @data slot with a description of the ward location; NE, NW, SE, SW.

Finally the wards were plotted on the outline of London. A colour map was used to fill each ward based on whether its centre lies to the nort or east of the centre of london.

Creating and manipulating spatial data

Reprojecting and joining/clipping are fundamental GIS operations. Firstly we join non-spatial data to spatial data so that it can be mapped. Finally, we will cover spatial joins, whre information from two spatial objects is combined based on spatial location.

Spatial data types in R

Spatial data types are based on the sp package which provides a number of classes and methods for dealing with spatial data types. Attributes can be attached to the spatial objects in ___DataFrame variants of each object class.

data type class attributes contains
points SpatialPoints No Spatial
points SpatialPointsDataFrame data.frame SpatialPoints
multipoints SpatialMultiPoints No Spatial
multipoints SpatialMultiPointsDataFrame data.frame SpatialMultiPoints
pixels SpatialPixels No SpatialPoints
pixels SpatialPixelsDataFrame data.frame SpatialPixels SpatialPointsDataFrame
full grid SpatialGrid No SpatialPixels
full grid SpatialGridDataFrame data.frame SpatialGrid
line Line No
lines Lines No Line list
lines SpatialLines No Spatial, Lines list
lines SpatialLinesDataFrame data.frame SpatialLines
polygons Polygon No Line
polygons Polygons No Polygon list
polygons SpatialPolygons No Spatial, Polygons list

Creating new spatial data

A SpatialPoints class can only be created from a matrix, so even if the original data is a data.frame this must be converted as in the example below.

  df = data.frame(x=1:3, y=c(1/2, 2/3, 3/4))
  mat = as.matrix(df)
  sp1 = SpatialPoints(coords = mat)

Attributes can be attached to each of the spatial objects with an associated data frame.

  df2 = data.frame(name=c('one','two','three'))
  spdf = SpatialPointsDataFrame(sp1, df2)
  summary(spdf)

Setting and transforming Coordinate Reference System (CRS)

The proj4string identifies the coordinate reference system used by the dataset. The CRS can be manipulated by manually setting the string value. It should be noted however that simply changing the CRS does not reproject the data.

Attribute joins

Atribute joins link additional pieces of information to the polygons via the spatial object dataframe. In this example recorded crimes are linked to London boroughs.

Clipping and spatial joins

There are three types of spatial joins: many-to-one, one-to-many, one-to-one. This example uses many-to-one spatial joins to link transport infrastructure points to each London borough.

  #Load london stations dataset
  setwd("~/SpatialAnalysis/IntroToSpatialR/Creating-maps-in-R-master")  
  stations = readOGR(dsn = 'data', layer = 'lnd-stns')

Check the coordinate systems of london stations and the london boroughs datasets.

  proj4string(stations)
  proj4string(ldn)

The problem is that both datasets use different CRS. Since OSGB is the official coordinate reference system for the UK the stations dataset will be reprojected into that CRS.

  stations27700 = spTransform(stations, CRSobj = CRS(proj4string(ldn)))
  stations = stations27700
  rm(stations27700)
  #plot london and then overlay station points
  plot(ldn)
  plot(stations, add=TRUE)

The grqph clearly shows that there are far more stations identified than lie within the boroughs. to simplify hte dataset points are trimmed to only those that fall within greater London. There are two functions that can be used to clipp to the greader london boundary: sp::over, rgeos::gIntersects.

  #Clip to greater london using normal square bracket notation
  stations_backup = stations
  stations = stations_backup[ldn,]
  
  #replot to demonstrate that the clipping has worked
  plot(ldn)
  plot(stations, add=TRUE)

Now that the station points have been clipped to the greater london area aggregations can be performed on the datasets. These can either be done using highly summarised notation e.g. aggregate or in a slightly longer, but easier to read form using dplyr and by creating intermediate join tables.

The following code uses the over function to create a join table based on spatial overlap. Dplyr can then be used as normal to handle aggregation functions. Finally the results are joined back onto a master data.frame associated with the spatial points.

In the example below spatial joins using the over function are used to count the number of stations appearing in each london borough.

  #set the borough name for each station point
  stnbor = over(stations,ldn)
  
  #the station order didn't change so re-add the station code to the dataset
  stnbor = mutate(stnbor[1], stations$NUMBER) 
  colnames(stnbor) = c('ons_label', 'StationNumber')
  
  #aggregate counts over borough and join back to borough data frame in london dataset
  stn_agg = stnbor %>% group_by(ons_label) %>% summarise(StationCount = n())
  ldn@data = left_join(ldn@data, stn_agg)
  
  #now plot a density map using the tmap function
  qtm(ldn, 'StationCount', text = 'name')

Mapping using GGPlot/GGMap

ggmap requires spatial data to be supplied as a data.frame using fortify() from the package. In fortifying the data the attribute information is lost. It must therefore be added back by joining per the example below.

  #Load mapTools to use fortify
  ldn_f = fortify(ldn)

  #data lost during the fortification can be joined back onto new dataframe.
  #fortify gives each row of the old object an ID and order is preserved. 
#It therefore works to create a new ID to join on in the original dataset based on row number
  ldn$id = row.names(ldn)
  ldn_f = left_join(ldn_f, ldn@data, by="id")
  
  head(ldn_f)
  
  #Now we can map using ggplot2
  map = ggplot(ldn_f, aes(long, lat, group=group, fill=Partic_Per)) +
      geom_polygon() + 
      coord_equal() +
    labs(x = 'Easting (m)', y='Northin (m)',
          fill='% Sports \nParticipation') + 
    ggtitle("London Sports Participation") +
    scale_fill_gradient(low='grey', high='light blue')+
    theme_minimal()
  map

Interactive mapping using leaflet

The R package leaflet provides R bindings to the Leaflet javascript library. This allows the creation and development of interactive web maps.

library(leaflet)

#leaflet requires an object with WG84 projection

#get EPSG codes
EPSG = make_EPSG()
EPSG[grepl("WGS 84$", EPSG$note), ] #Search for WGS 84 code

#Store proj4 string
WGS84Str = paste0("+init=epsg:" ,
                  EPSG[grepl("WGS 84$", EPSG$note), ])

#reproject london to WGS84
ldn84 = spTransform(ldn, CRS(WGS84Str))

#create and display the leaflet object
pal = colorQuantile(palette ="YlOrRd",
                    domain = ldn84$Partic_Per,
                      n= 3)


m = leaflet(ldn84) %>%
  #addTiles() %>%
  addProviderTiles("Stamen.TonerLite", group = "Toner Lite") %>% 
  addPolygons(
    stroke = FALSE,
    color = ~pal(Partic_Per)
              ) %>%
  addLegend("bottomright", pal = pal, values = ~Partic_Per,
            title = "London Sports participation rates",
            labFormat = labelFormat(prefix = "%"),
            opacity = 1)
m

Exercises to come back to

1: question to colour wards within 10km of centre of london or where part of their mass lies within 10km of London. see page 8 of tutorial.

Load data

Only the london data is required for this exercise

  dsn = '~/SpatialAnalysis/IntroToSpatialR/Creating-maps-in-R-master/data'
  setwd(dsn)  
  ldn = readOGR(dsn='london_sport.shp', layer= 'london_sport')

To get coordinates for the centre of London we first merge (union) all of the shapes together to retrieve the overall shape. Once all shapes have been merged we can get the coordinates of the centre of the shape.

  ldn_bounds = gUnaryUnion(ldn)
  ldn_centre = gCentroid(ldn_bounds)
  plot(ldn_bounds)
  plot(ldn_centre, add=TRUE)

There are several ways to define ‘within’ a 10km radius of the centre of London. Perhaps simplest is to test whether the borough centroid is within a 10km radius. This is done with the funciton rgeos::gWithinDistance.

Note that all distances are relative to the coordinate system in use. In this case the default coordinate system is transverse mercator projection. The important part of the return string below is the +units=m part which shows units in use. In this case meters.

  ldn@proj4string

The distance between the centre of london and Bromley, the first geometry in the dataset is found to be 18.17km, hence the gWithinDistance function returns a false for this test.

p1 = ldn[1,]  #get first polygon from dataset
gDistance(ldn_centre, gCentroid(p1))
gWithinDistance(ldn_centre,gCentroid(p1), dist = 10000)
plot(ldn_bounds)
plot(ldn_centre, add=TRUE)
plot(p1, add=TRUE, col="salmon")
plot(gCentroid(p1), add=TRUE)

This method is applied to all polygons in the dataset. Any boroughs where the centroid is within a 10km radius of the geographic centroid of London are coloured green.

max_dist =10000 #radius in metres
within10k = c(gWithinDistance(ldn_centre, gCentroid(ldn, byid = TRUE), dist=max_dist, byid = TRUE))

plot(ldn_bounds)
plot(ldn, add=TRUE)
plot(ldn[c(within10k), ], add=TRUE, col='lightblue')
plot(gCentroid(ldn, byid = TRUE), add=TRUE)
plot(ldn_centre, col='red', add=TRUE)
All London boroughs within 10km of the geographic centre. Using centroid method.

All London boroughs within 10km of the geographic centre. Using centroid method.

As well as testing based on distance calculations it is also possible to create a buffer and test for inclusion within this geographic shape.

A buffer facilitates alternative approaches including testing for any part of a polygon within the 10km radius or testing for complete inclusion within the radius. Both approaches are shown below for comparison.

ctr_buffer = gBuffer(ldn_centre, width=max_dist)  #create a 10km buffer around the centre
within10k_any = ldn[ctr_buffer, ] #test to see if any points lie within the 10km radius
within10k_whole = as.vector(gCovers(ctr_buffer, ldn, byid = T)) #Use gCovers to avoid problems with boundaries
outside10k = as.vector(gDisjoint(ctr_buffer, ldn, byid=T)) #Test for boroughs where entire shape is outside centre

plot(ldn)
plot(ldn[outside10k,], add=T, col='salmon')
plot(within10k_any,col='lightgrey', add=TRUE)
plot(ldn[c(within10k), ], col='lightblue', add=TRUE)  
plot(ldn[within10k_whole, ],col='darkblue', add=TRUE)

plot(gCentroid(ldn, byid=TRUE), add=T)
plot(ctr_buffer, add=T, border='red', lwd=2)
plot(ldn_centre, add=T, col="red", lwd=3)

Additional dependencies

rgdal also requires installing the following on linux: