An introduction to R for Spatial data

Robin Lovelace

Newcastle, June 2nd 2015

Introduction

This course is brought to you by the Consumer Data Research Centre (CDRC), a project based at the University of Leeds and UCL. It is funded by the ESRC’s Big Data Network.

Packages

pkgs <- c("rgdal", # (can be tricky)
  "rgeos",
  "ggmap",
  "tmap")
install.packages(pkgs)
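
If some of these packages are already on your machine, reinstalling them all is unnecessary. The following is a minimal sketch, not part of the original course code, that installs only what is missing. The name `to_install` is an assumption introduced for illustration, and the `interactive()` guard is a design choice to avoid triggering downloads from scripts.

```r
# Package names taken from the slide above
pkgs <- c("rgdal", "rgeos", "ggmap", "tmap")

# to_install (hypothetical name): packages not yet in the local library
to_install <- setdiff(pkgs, rownames(installed.packages()))

# Only attempt installation from an interactive session
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

An empty `to_install` means everything is already installed and nothing is downloaded.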

Course agenda

During this course we will cover these topics

Day 2

After that: you choose!

A bit about R

Why R?

R is up and coming I

scholar-searches1

Source: r4stats.com

II - Increasing popularity in academia

scholar-searches2

Source: r4stats.com

III - R vs Python

Source: DataCamp

IV - employment market

jobs

Source: Revolution Analytics

Why R for spatial data?

“With the advent of ‘modern’ GIS software, most people want to point and click their way through life. That’s good, but there is a tremendous amount of flexibility and power waiting for you with the command line. Many times you can do something on the command line in a fraction of the time you can do it with a GUI.” (Sherman 2008, p. 283)

Why R for spatial data II

R can take data in a wide range of formats. For example, a MySQL database dump gives you this:

LINESTRING(-1.81 52.55,-1.81 52.55, … ) - solved:

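library(sp) # provides Line()
ps <- as.list(ps) # make a list
for(i in seq_along(ps)){
  ps[[i]] <- gsub("LINESTRING\\(", "", ps[[i]]) # strip the prefix
  ps[[i]] <- gsub("\\)", "", ps[[i]]) # strip the closing bracket
  ps[[i]] <- gsub(" ", ",", ps[[i]]) # use one separator throughout
  ps[[i]] <- as.numeric(strsplit(ps[[i]], ",")[[1]]) # text to numbers
  ps[[i]] <- matrix(ps[[i]], ncol = 2, byrow = TRUE) # one x, y pair per row
  ps[[i]] <- Line(ps[[i]])
}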
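
To see what the loop does to a single record, here is a minimal sketch in base R only. The input string and the helper name `wkt_to_matrix` are made up for illustration and are not part of the course materials. The helper splits on spaces and commas in one step, which is a small variation on the loop above.

```r
# Hypothetical input: one LINESTRING as it might appear in a database dump
wkt <- "LINESTRING(-1.81 52.55,-1.82 52.56,-1.83 52.57)"

# Hypothetical helper: turn a LINESTRING into an n x 2 coordinate matrix
wkt_to_matrix <- function(x) {
  x <- gsub("LINESTRING\\(|\\)", "", x)          # strip prefix and brackets
  nums <- as.numeric(strsplit(x, "[ ,]+")[[1]])  # split on spaces and commas
  matrix(nums, ncol = 2, byrow = TRUE,
         dimnames = list(NULL, c("x", "y")))     # one x, y pair per row
}

coords <- wkt_to_matrix(wkt)
print(coords)
```

The resulting matrix has one row per vertex and could then be passed to `Line()` from the sp package, as in the loop above.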

And if you get stuck? Just ask!

Example: I could not load a MultiFeature GeoJSON. So I asked: http://stackoverflow.com/q/29066198/1694378

Visualisation

Why focus on visualisation?

If you cannot visualise your data, it is very difficult to understand your data. Conversely, visualisation will greatly aid in communicating your results.

“Human beings are remarkably adept at discerning relationships from visual representations. A well-crafted graph can help you make meaningful comparisons among thousands of pieces of information, extracting patterns not easily found through other methods. … Data analysts need to look at their data, and this is one area where R shines.” (Kabacoff, 2009, p. 45)

Maps, the ‘base graphics’ way

base graphics

Source: Cheshire and Lovelace (2014) - available online

The ‘ggplot2’ way

Source: This tutorial!

R in the wild 1: Maps of all census variables for local authorities

census

R in the wild 2: Global shipping routes in the late 1700s

Source: R-Bloggers

R in the wild 3: The national propensity to cycle tool (NPCT)

See https://github.com/npct/pct-shiny

R in the wild 4: Mapping bicycle crashes in West Yorkshire I

Source: Lovelace, Roberts and Kellar (2015)

R in the wild 4: Mapping bicycle crashes in West Yorkshire II

R in the wild 5: Infographic of housing project finances

Flexibility of ggplot2 - see robinlovelace.net

Getting up-and-running for the tutorial

Before progressing further: Any questions?

Course materials are all available online from a GitHub repository. Click “Download ZIP” to download all the test data, ready to proceed.

The main document to accompany this tutorial is a PDF within the main repository. It is to be made freely available worldwide; any comments or corrections are welcome.