Many R packages have been developed to visualize and analyze spatial data. The best comprehensive resource that I’ve found on this topic is the book Applied Spatial Data Analysis with R. This tutorial relies heavily on that book.

The sp package and Spatial* objects

Object oriented programming provides a framework for storing spatial information, along with information associated with spatial data, in self-contained and stable data objects. These objects can then be easily fed into functions that can vizualize and analyze the contents of the object.

The sp package is a foundational package for dealing with spatial data. It provides classes and methods for making S4 objects in R. (For an explanation of S4 objects, see Hadley Wickham’s page on object oriented programming in Advanced R.

Spatial

The most basic object that can be created with the sp package is the Spatial object. This object just contains information for a boundary box and the coordinate system.

Let’s create a boundary box in the Chicago area. We’ll use the constructor function that has the same name as the class, Spatial(). The two arguments are bbox and proj4string. The bbox argument is for the bounding box. It takes a matrix with at least two rows, and the column names must be min and max. The first row contains the eastings (x-axis, or the longitude) and the second row contains the northings (y-axis, or latitude).

# create a boundary box
longitudes <- c(-88.156775, -87.589771)
latitudes <- c(41.652208, 42.154143)
bounding_box <- matrix(c(longitudes, latitudes), nrow = 2, byrow = TRUE,
                       dimnames = list(NULL, c("min", "max")))
bounding_box
##            min       max
## [1,] -88.15677 -87.58977
## [2,]  41.65221  42.15414

The second argument, proj4string, takes a string that specifies which projection the coordinates are in. See the proj4 package for details on which strings to use. The simplest string is “+proj=longlat”.

projection <- "+proj=longlat" 

Now we can create the Spatial object. (The projection string must be wrapped in the CRS() function).

library(sp)
chicago_sp <- Spatial(bbox = bounding_box, proj4string = CRS(projection))
chicago_sp
## An object of class "Spatial"
## Slot "bbox":
##            min       max
## [1,] -88.15677 -87.58977
## [2,]  41.65221  42.15414
## 
## Slot "proj4string":
## CRS arguments: +proj=longlat +ellps=WGS84

SpatialPoints

A Spatial object isn’t much good by itself, but it’s the base object that other sub-classes are built on. To see the list of sub-classes, use the getClass() function.

getClass("Spatial")
## Class "Spatial" [package "sp"]
## 
## Slots:
##                               
## Name:         bbox proj4string
## Class:      matrix         CRS
## 
## Known Subclasses: 
## Class "SpatialPoints", directly
## Class "SpatialMultiPoints", directly
## Class "SpatialGrid", directly
## Class "SpatialLines", directly
## Class "SpatialPolygons", directly
## Class "SpatialPointsDataFrame", by class "SpatialPoints", distance 2
## Class "SpatialPixels", by class "SpatialPoints", distance 2
## Class "SpatialMultiPointsDataFrame", by class "SpatialMultiPoints", distance 2
## Class "SpatialGridDataFrame", by class "SpatialGrid", distance 2
## Class "SpatialLinesDataFrame", by class "SpatialLines", distance 2
## Class "SpatialPixelsDataFrame", by class "SpatialPoints", distance 3
## Class "SpatialPolygonsDataFrame", by class "SpatialPolygons", distance 2

A sub-class is a class whose objects contain all of the information in the parent class, plus additional information. For our purposes, the most useful direct sub-class of Spatial is SpatialPoints. Typically we will want to look at monitor locations (i.e. points), so we need to represent those as points in a SpatialPoints object. We’ll use the airdata dataset from the region5air package.

library(region5air)
library(dplyr)
data(airdata)
as.tbl(airdata)
## Source: local data frame [367,595 x 20]
## 
##            site data_status action_code           datetime parameter
##           (chr)       (int)       (int)              (chr)     (int)
## 1  840170890005           0          10 20141231T0100-0600     44201
## 2  840170311601           0          10 20141231T0100-0600     44201
## 3  840170314002           0          10 20141231T0100-0600     44201
## 4  840170310001           0          10 20141231T0100-0600     44201
## 5  840171110001           0          10 20141231T0100-0600     44201
## 6  840170971007           0          10 20141231T0100-0600     44201
## 7  840170314201           0          10 20141231T0100-0600     44201
## 8  840170310076           0          10 20141231T0100-0600     44201
## 9  840170313103           0          10 20141231T0100-0600     44201
## 10 840171971011           0          10 20141231T0100-0600     44201
## ..          ...         ...         ...                ...       ...
## Variables not shown: duration (int), frequency (int), value (dbl), unit
##   (int), qc (int), poc (int), lat (dbl), lon (dbl), GISDatum (chr), elev
##   (int), method_code (int), mpc (chr), mpc_value (chr), uncertainty (lgl),
##   qualifiers (chr)

Before we can create a SpatialPoints object, we need to know what projection the coordinates are in.

unique(airdata$GISDatum)
## [1] "WGS84" "NAD83"

There are two projections, so we will split up the data, then later we will merge them by transforming one projection into another.

air_wgs84 <- filter(airdata, GISDatum == "WGS84")
air_nad83 <- filter(airdata, GISDatum == "NAD83")

The SpatialPoints() function has the same parameters as the Spatial function, except it also takes a matrix of coorinates for points. The matrix must have unique row names, eastings must be in the first column, and northings must be in the second column. Here we create the coordinate matrices.

air_wgs84_monitors <- unique(select(air_wgs84, site, lat, lon))
air_wgs84_coords <- cbind(air_wgs84_monitors$lon, air_wgs84_monitors$lat)
row.names(air_wgs84_coords) <- air_wgs84_monitors$site
head(air_wgs84_coords, 3)
##                   [,1]     [,2]
## 840170890005 -88.27303 42.04915
## 840170311601 -87.99057 41.66812
## 840170314002 -87.75247 41.85524
air_nad83_monitors <- unique(select(air_nad83, site, lat, lon))
air_nad83_coords <- cbind(air_nad83_monitors$lon, air_nad83_monitors$lat)
row.names(air_nad83_coords) <- air_nad83_monitors$site
air_nad83_coords
##                  [,1]     [,2]
## 840550590025 -87.8860 42.59600
## 840550590019 -87.8093 42.50472

Now we specify the projections.

wgs84 <- CRS("+proj=longlat +ellpsWGS84")
nad83 <- CRS("+proj=longlat +ellpsNAD83")

When we create the SpatialPoints object, if we don’t specify the bounding box it will automatically be created, based on the extreme locations in the coordinate matrix.

air_wgs84_spoints <- SpatialPoints(coords = air_wgs84_coords, proj4string = wgs84)
air_nad83_spoints <- SpatialPoints(coords = air_nad83_coords, proj4string = nad83)

We can look at the bounding boxes by using the bbox() function.

bbox(air_wgs84_spoints)
##                 min       max
## coords.x1 -88.27303 -87.02777
## coords.x2  41.22144  42.46757
bbox(air_nad83_spoints)
##                 min      max
## coords.x1 -87.88600 -87.8093
## coords.x2  42.50472  42.5960

spTransform() will transform a Spatial* object from one coordinate reference system (CRS) to another. Here we change the object with the NAD83 projection to WGS84 and recombine with the other SpatialPoints object.

# you must have the rgdal package installed
air_spoints <- spTransform(air_nad83_spoints, CRSobj = wgs84)
air_spoints <- rbind(air_spoints, air_wgs84_spoints)

Let’s look at the bounding box now.

bbox(air_spoints)
##                 min       max
## coords.x1 -88.27303 -87.02777
## coords.x2  41.22144  42.59600

SpatialPointsDataFrame

In many cases, you will not just want to plot locations of monitors on a map, but you will want to have data associated with each point. The SpatialPointsDataFrame object contains all of the information that a SpatialPoints object has, but it also contains a data frame of values associated with the coordinate points.

The way the spatial points and the data frames are connected is by row names. For example, if the the row name is “A” for monitor “1”, then the row name in the data frame must be “A” for records that contain information about that monitor.

monitor_locations <- cbind(1:5, 1:5)
row.names(monitor_locations) <- LETTERS[1:5]
monitor_locations
##   [,1] [,2]
## A    1    1
## B    2    2
## C    3    3
## D    4    4
## E    5    5
monitor_info <- data.frame(monitor = paste0("monitor", 1:5), 
                           pollutant = c(rep("o3", 3), rep("pm2.5", 2)),
                           row.names = LETTERS[1:5])
monitor_info
##    monitor pollutant
## A monitor1        o3
## B monitor2        o3
## C monitor3        o3
## D monitor4     pm2.5
## E monitor5     pm2.5

When we use SpatialPointsDataFrame() we feed it the points matrix, the data frame, we specify the projection, and we set match.ID = TRUE.

my_spdf <- SpatialPointsDataFrame(monitor_locations, monitor_info, 
                                  proj4string = CRS("+proj=longlat"),
                                  match.ID = TRUE)
my_spdf
##   coordinates  monitor pollutant
## A      (1, 1) monitor1        o3
## B      (2, 2) monitor2        o3
## C      (3, 3) monitor3        o3
## D      (4, 4) monitor4     pm2.5
## E      (5, 5) monitor5     pm2.5

Since we have already created a SpatialPoints object from the monitors in airdata, we can use that as our monitor matrix. The row names of that object are actually the monitor site codes.

head(air_spoints, 3)
## SpatialPoints:
##              coords.x1 coords.x2
## 840550590025 -87.88600  42.59600
## 840550590019 -87.80930  42.50472
## 840170890005 -88.27303  42.04915
## Coordinate Reference System (CRS) arguments: +proj=longlat
## +ellps=WGS84

So we can create a data frame with information about each monitor in airdata, give row names to that data frame that correspond to the SpatialPoints object, then create the SpatialPointsDataFrame object.

# select a few columns with useful information
airdata$parameter <- factor(airdata$parameter, levels = c(44201, 88101, 62101),
                            labels = c("ozone", "pm2.5", "temp"))
air_monitors_df <- unique(airdata[, c("site", "parameter")])
air_monitors_df$value <- "yes"
head(air_monitors_df, 3)
##           site parameter value
## 1 840170890005     ozone   yes
## 2 840170311601     ozone   yes
## 3 840170314002     ozone   yes
library(tidyr)
air_monitors_df <- spread(air_monitors_df, parameter, value, fill = "no")
row.names(air_monitors_df) <- air_monitors_df$site
head(air_monitors_df, 3)
##                      site ozone pm2.5 temp
## 840170310001 840170310001   yes    no   no
## 840170310032 840170310032   yes    no   no
## 840170310064 840170310064   yes    no   no
air_spdf <- SpatialPointsDataFrame(air_spoints, air_monitors_df, 
                                   proj4string = wgs84, match.ID = TRUE)
head(air_spdf, 3)
##                        coordinates         site ozone pm2.5 temp
## 840170310001     (-87.886, 42.596) 840170310001   yes    no   no
## 840170310032  (-87.8093, 42.50472) 840170310032   yes    no   no
## 840170310064 (-88.27303, 42.04915) 840170310064   yes    no   no

Plotting

Plotting Spatial* objects is relatively easy. Methods have been created for most plotting functions in R.

Here, we plot the SpatialPoints object we created, using the monitors in airdata.

library(maps)
m <- map(database = 'county', regions = c('illinois,cook', 'illinois,lake', 'illinois,du page',
                                     'illinois,kane', 'illinois,mchenry', 'indiana,lake',
                                    'indiana,porter',  'wisconsin,kenosha'))
plot(air_spoints, pch = 19, add = TRUE)

We can subset the SpatialPointsDataFrame to just plot the monitors that measure ozone.

map(m)
plot(air_spdf[air_spdf$ozone == "yes", ], pch = 19, col = "blue", add = TRUE)

Then we can add PM2.5 monitors to the map, using a different color to distinguish the two pollutants.

plot(air_spdf[air_spdf$pm2.5 == "yes", ], pch = 19, col = "red", add = TRUE)

Further Reading

This tutorial is a minimal introduction to the spatial data objects in R. Below is a list of resources for learning more about what you can do with R and geographic data.

Exercises

Go to the exercises page for this tutorial and test your comprehension of this material.