Many R packages have been developed to visualize and analyze spatial data. The best comprehensive resource that I’ve found on this topic is the book Applied Spatial Data Analysis with R. This tutorial relies heavily on that book.
sp
package and Spatial*
objectsObject oriented programming provides a framework for storing spatial information, along with information associated with spatial data, in self-contained and stable data objects. These objects can then be easily fed into functions that can vizualize and analyze the contents of the object.
The sp
package is a foundational package for dealing with spatial data. It provides classes and methods for making S4 objects in R. (For an explanation of S4 objects, see Hadley Wickham’s page on object oriented programming in Advanced R.
Spatial
The most basic object that can be created with the sp
package is the Spatial
object. This object just contains information for a boundary box and the coordinate system.
Let’s create a boundary box in the Chicago area. We’ll use the constructor function that has the same name as the class, Spatial()
. The two arguments are bbox
and proj4string
. The bbox
argument is for the bounding box. It takes a matrix with at least two rows, and the column names must be min
and max
. The first row contains the eastings (x-axis, or the longitude) and the second row contains the northings (y-axis, or latitude).
# create a boundary box
longitudes <- c(-88.156775, -87.589771)
latitudes <- c(41.652208, 42.154143)
bounding_box <- matrix(c(longitudes, latitudes), nrow = 2, byrow = TRUE,
dimnames = list(NULL, c("min", "max")))
bounding_box
## min max
## [1,] -88.15677 -87.58977
## [2,] 41.65221 42.15414
The second argument, proj4string
, takes a string that specifies which projection the coordinates are in. See the proj4
package for details on which strings to use. The simplest string is “+proj=longlat”.
projection <- "+proj=longlat"
Now we can create the Spatial
object. (The projection string must be wrapped in the CRS()
function).
library(sp)
chicago_sp <- Spatial(bbox = bounding_box, proj4string = CRS(projection))
chicago_sp
## An object of class "Spatial"
## Slot "bbox":
## min max
## [1,] -88.15677 -87.58977
## [2,] 41.65221 42.15414
##
## Slot "proj4string":
## CRS arguments: +proj=longlat +ellps=WGS84
SpatialPoints
A Spatial
object isn’t much good by itself, but it’s the base object that other sub-classes are built on. To see the list of sub-classes, use the getClass()
function.
getClass("Spatial")
## Class "Spatial" [package "sp"]
##
## Slots:
##
## Name: bbox proj4string
## Class: matrix CRS
##
## Known Subclasses:
## Class "SpatialPoints", directly
## Class "SpatialMultiPoints", directly
## Class "SpatialGrid", directly
## Class "SpatialLines", directly
## Class "SpatialPolygons", directly
## Class "SpatialPointsDataFrame", by class "SpatialPoints", distance 2
## Class "SpatialPixels", by class "SpatialPoints", distance 2
## Class "SpatialMultiPointsDataFrame", by class "SpatialMultiPoints", distance 2
## Class "SpatialGridDataFrame", by class "SpatialGrid", distance 2
## Class "SpatialLinesDataFrame", by class "SpatialLines", distance 2
## Class "SpatialPixelsDataFrame", by class "SpatialPoints", distance 3
## Class "SpatialPolygonsDataFrame", by class "SpatialPolygons", distance 2
A sub-class is a class whose objects contain all of the information in the parent class, plus additional information. For our purposes, the most useful direct sub-class of Spatial
is SpatialPoints
. Typically we will want to look at monitor locations (i.e. points), so we need to represent those as points in a SpatialPoints
object. We’ll use the airdata
dataset from the region5air
package.
library(region5air)
library(dplyr)
data(airdata)
as.tbl(airdata)
## Source: local data frame [367,595 x 20]
##
## site data_status action_code datetime parameter
## (chr) (int) (int) (chr) (int)
## 1 840170890005 0 10 20141231T0100-0600 44201
## 2 840170311601 0 10 20141231T0100-0600 44201
## 3 840170314002 0 10 20141231T0100-0600 44201
## 4 840170310001 0 10 20141231T0100-0600 44201
## 5 840171110001 0 10 20141231T0100-0600 44201
## 6 840170971007 0 10 20141231T0100-0600 44201
## 7 840170314201 0 10 20141231T0100-0600 44201
## 8 840170310076 0 10 20141231T0100-0600 44201
## 9 840170313103 0 10 20141231T0100-0600 44201
## 10 840171971011 0 10 20141231T0100-0600 44201
## .. ... ... ... ... ...
## Variables not shown: duration (int), frequency (int), value (dbl), unit
## (int), qc (int), poc (int), lat (dbl), lon (dbl), GISDatum (chr), elev
## (int), method_code (int), mpc (chr), mpc_value (chr), uncertainty (lgl),
## qualifiers (chr)
Before we can create a SpatialPoints
object, we need to know what projection the coordinates are in.
unique(airdata$GISDatum)
## [1] "WGS84" "NAD83"
There are two projections, so we will split up the data, then later we will merge them by transforming one projection into another.
air_wgs84 <- filter(airdata, GISDatum == "WGS84")
air_nad83 <- filter(airdata, GISDatum == "NAD83")
The SpatialPoints()
function has the same parameters as the Spatial
function, except it also takes a matrix of coorinates for points. The matrix must have unique row names, eastings must be in the first column, and northings must be in the second column. Here we create the coordinate matrices.
air_wgs84_monitors <- unique(select(air_wgs84, site, lat, lon))
air_wgs84_coords <- cbind(air_wgs84_monitors$lon, air_wgs84_monitors$lat)
row.names(air_wgs84_coords) <- air_wgs84_monitors$site
head(air_wgs84_coords, 3)
## [,1] [,2]
## 840170890005 -88.27303 42.04915
## 840170311601 -87.99057 41.66812
## 840170314002 -87.75247 41.85524
air_nad83_monitors <- unique(select(air_nad83, site, lat, lon))
air_nad83_coords <- cbind(air_nad83_monitors$lon, air_nad83_monitors$lat)
row.names(air_nad83_coords) <- air_nad83_monitors$site
air_nad83_coords
## [,1] [,2]
## 840550590025 -87.8860 42.59600
## 840550590019 -87.8093 42.50472
Now we specify the projections.
wgs84 <- CRS("+proj=longlat +ellpsWGS84")
nad83 <- CRS("+proj=longlat +ellpsNAD83")
When we create the SpatialPoints
object, if we don’t specify the bounding box it will automatically be created, based on the extreme locations in the coordinate matrix.
air_wgs84_spoints <- SpatialPoints(coords = air_wgs84_coords, proj4string = wgs84)
air_nad83_spoints <- SpatialPoints(coords = air_nad83_coords, proj4string = nad83)
We can look at the bounding boxes by using the bbox()
function.
bbox(air_wgs84_spoints)
## min max
## coords.x1 -88.27303 -87.02777
## coords.x2 41.22144 42.46757
bbox(air_nad83_spoints)
## min max
## coords.x1 -87.88600 -87.8093
## coords.x2 42.50472 42.5960
spTransform()
will transform a Spatial*
object from one coordinate reference system (CRS) to another. Here we change the object with the NAD83 projection to WGS84 and recombine with the other SpatialPoints
object.
# you must have the rgdal package installed
air_spoints <- spTransform(air_nad83_spoints, CRSobj = wgs84)
air_spoints <- rbind(air_spoints, air_wgs84_spoints)
Let’s look at the bounding box now.
bbox(air_spoints)
## min max
## coords.x1 -88.27303 -87.02777
## coords.x2 41.22144 42.59600
SpatialPointsDataFrame
In many cases, you will not just want to plot locations of monitors on a map, but you will want to have data associated with each point. The SpatialPointsDataFrame
object contains all of the information that a SpatialPoints
object has, but it also contains a data frame of values associated with the coordinate points.
The way the spatial points and the data frames are connected is by row names. For example, if the the row name is “A” for monitor “1”, then the row name in the data frame must be “A” for records that contain information about that monitor.
monitor_locations <- cbind(1:5, 1:5)
row.names(monitor_locations) <- LETTERS[1:5]
monitor_locations
## [,1] [,2]
## A 1 1
## B 2 2
## C 3 3
## D 4 4
## E 5 5
monitor_info <- data.frame(monitor = paste0("monitor", 1:5),
pollutant = c(rep("o3", 3), rep("pm2.5", 2)),
row.names = LETTERS[1:5])
monitor_info
## monitor pollutant
## A monitor1 o3
## B monitor2 o3
## C monitor3 o3
## D monitor4 pm2.5
## E monitor5 pm2.5
When we use SpatialPointsDataFrame()
we feed it the points matrix, the data frame, we specify the projection, and we set match.ID = TRUE
.
my_spdf <- SpatialPointsDataFrame(monitor_locations, monitor_info,
proj4string = CRS("+proj=longlat"),
match.ID = TRUE)
my_spdf
## coordinates monitor pollutant
## A (1, 1) monitor1 o3
## B (2, 2) monitor2 o3
## C (3, 3) monitor3 o3
## D (4, 4) monitor4 pm2.5
## E (5, 5) monitor5 pm2.5
Since we have already created a SpatialPoints
object from the monitors in airdata
, we can use that as our monitor matrix. The row names of that object are actually the monitor site codes.
head(air_spoints, 3)
## SpatialPoints:
## coords.x1 coords.x2
## 840550590025 -87.88600 42.59600
## 840550590019 -87.80930 42.50472
## 840170890005 -88.27303 42.04915
## Coordinate Reference System (CRS) arguments: +proj=longlat
## +ellps=WGS84
So we can create a data frame with information about each monitor in airdata
, give row names to that data frame that correspond to the SpatialPoints
object, then create the SpatialPointsDataFrame
object.
# select a few columns with useful information
airdata$parameter <- factor(airdata$parameter, levels = c(44201, 88101, 62101),
labels = c("ozone", "pm2.5", "temp"))
air_monitors_df <- unique(airdata[, c("site", "parameter")])
air_monitors_df$value <- "yes"
head(air_monitors_df, 3)
## site parameter value
## 1 840170890005 ozone yes
## 2 840170311601 ozone yes
## 3 840170314002 ozone yes
library(tidyr)
air_monitors_df <- spread(air_monitors_df, parameter, value, fill = "no")
row.names(air_monitors_df) <- air_monitors_df$site
head(air_monitors_df, 3)
## site ozone pm2.5 temp
## 840170310001 840170310001 yes no no
## 840170310032 840170310032 yes no no
## 840170310064 840170310064 yes no no
air_spdf <- SpatialPointsDataFrame(air_spoints, air_monitors_df,
proj4string = wgs84, match.ID = TRUE)
head(air_spdf, 3)
## coordinates site ozone pm2.5 temp
## 840170310001 (-87.886, 42.596) 840170310001 yes no no
## 840170310032 (-87.8093, 42.50472) 840170310032 yes no no
## 840170310064 (-88.27303, 42.04915) 840170310064 yes no no
Plotting Spatial*
objects is relatively easy. Methods have been created for most plotting functions in R.
Here, we plot the SpatialPoints
object we created, using the monitors in airdata
.
library(maps)
m <- map(database = 'county', regions = c('illinois,cook', 'illinois,lake', 'illinois,du page',
'illinois,kane', 'illinois,mchenry', 'indiana,lake',
'indiana,porter', 'wisconsin,kenosha'))
plot(air_spoints, pch = 19, add = TRUE)
We can subset the SpatialPointsDataFrame
to just plot the monitors that measure ozone.
map(m)
plot(air_spdf[air_spdf$ozone == "yes", ], pch = 19, col = "blue", add = TRUE)
Then we can add PM2.5 monitors to the map, using a different color to distinguish the two pollutants.
plot(air_spdf[air_spdf$pm2.5 == "yes", ], pch = 19, col = "red", add = TRUE)
This tutorial is a minimal introduction to the spatial data objects in R. Below is a list of resources for learning more about what you can do with R and geographic data.
For a good introduction to using GIS data in R, see this vignette.
For a comprehensive overview of spatial data and analysis in R, see Applied Spatial Data Analysis with R.
The CRAN Task View on Analysis of Spatial Data should be a comprehensive list of R packages on CRAN that can be used for spatial analysis.
Go to the exercises page for this tutorial and test your comprehension of this material.