Geospatial Data Processing and Analysis in R

Andy Lyons
Stanford University

About Myself

  • Cal alumni, defected to Stanford after 2013 football season
  • research areas include:
    • animal movement
    • land use / land cover change
    • environmental impacts of proposed roads
  • animal movement package: T-LoCoH

Resources

Animove Cheatsheet

Uses of Spatial Data in Science

  • spatial analysis:
    using location or spatial relationships as an explanatory or predictive variable
  • examining the effects of scale
  • backdrops to show context

cholera map

Challenges with Spatial Data in R

  • many types and formats of spatial data
  • R is not the easiest program to learn
  • there are a ton of packages from a very diverse user community

Presentation Goals

  • help you get to the point where you can do an analysis or visualization
  • point out some of the commonly-used packages for spatial data manipulation and analysis
  • be a resource you can come back to

Outline

Topics

  • spatial data classes
  • packages for importing data
  • classic spatial data processing
    • projections
    • subsetting
    • buffer / union / intersection
  • spatial selection and joins
  • raster operations
  • basic visualization
  • R vs. GIS

Omitted

  • more advanced visualization
  • spatial statistics
  • geodatabases
  • network analysis
  • many other…

Representing Physical Features

Spatial Data Types: sp package

Vector data classes

SpatialPoints, SpatialPointsDataFrame
SpatialLines, SpatialLinesDataFrame
SpatialPolygons, SpatialPolygonsDataFrame

To see all spatial data classes:

require(sp)
getClass('Spatial')

Raster classes

SpatialPixels
SpatialGrid

Creating SpatialPoints Objects

Three options:

  • from scratch
  • promote a data frame with x and y columns
  • import a GIS file

Create a SpatialPoints Object from Scratch

require(sp)
xy <- matrix(data=runif(n=100), ncol=2)
head(xy)
       [,1]   [,2]
[1,] 0.4893 0.6320
[2,] 0.9973 0.5271
[3,] 0.6833 0.8312
[4,] 0.5483 0.3394
[5,] 0.7157 0.7651
[6,] 0.2262 0.1201
xySp <- SpatialPoints(xy)
summary(xySp)
Object of class SpatialPoints
Coordinates:
              min    max
coords.x1 0.01241 0.9973
coords.x2 0.01451 0.9934
Is projected: NA 
proj4string : [NA]
Number of points: 50

Create a SpatialPoints Object from Scratch

plot(xySp)

plot of chunk plotxy

SpatialPointsDataFrame

Promote a DataFrame to a SpatialPointsDataFrame

## See which datasets come bundled with the maps package
## data(package="maps")

## Load the data frame called 'canada.cities'
require(maps)
data(canada.cities)
head(canada.cities)
           name country.etc    pop   lat    long capital
1 Abbotsford BC          BC 157795 49.06 -122.30       0
2      Acton ON          ON   8308 43.63  -80.03       0
3 Acton Vale QC          QC   5153 45.63  -72.57       0
4    Airdrie AB          AB  25863 51.30 -114.02       0
5    Aklavik NT          NT    643 68.22 -135.00       0
6    Albanel QC          QC   1090 48.87  -72.42       0

Promote a DataFrame to a SpatialPointsDataFrame

## Get just the cities just for Ontario
citiesOnt <- canada.cities[canada.cities$country.etc == "ON",]

## Promote this data frame to a SpatialPointsDataFrame
## by stating which columns contain the x and y coordinates
coordinates(citiesOnt) <- ~long + lat
class(citiesOnt)
[1] "SpatialPointsDataFrame"
attr(,"package")
[1] "sp"
plot(citiesOnt)

plot of chunk promote_data_frame

Working with the Data Frame of a SpatialPointsDataFrame

  • To grab the attribute table of a SpatialPointsDataFrame, use the data slot (with '@' character):
  • You can work with the data frame as you would with any data frame
nrow(citiesOnt@data)
[1] 258
head(citiesOnt@data)
             name country.etc   pop capital
2        Acton ON          ON  8308       0
10  Alexandria ON          ON  3604       0
12    Alliston ON          ON 10353       0
14     Almonte ON          ON  4984       0
17 Amherstburg ON          ON 11605       0
18 Amigo Beach ON          ON  1145       0

Wait a minute - what happened to the coordinates?

head(citiesOnt@data)
             name country.etc   pop capital
2        Acton ON          ON  8308       0
10  Alexandria ON          ON  3604       0
12    Alliston ON          ON 10353       0
14     Almonte ON          ON  4984       0
17 Amherstburg ON          ON 11605       0
18 Amigo Beach ON          ON  1145       0

Fortunately we can get read them with the coordinates() function

head(coordinates(citiesOnt))
     long   lat
2  -80.03 43.63
10 -74.63 45.30
12 -79.87 44.15
14 -76.20 45.22
17 -83.10 42.10
18 -79.34 44.77

Adding a Column to the Data Frame of a SPDF Object

selectForStudy <- sample(c(FALSE,TRUE), size=nrow(citiesOnt@data), replace=TRUE)
citiesOnt@data[["samp"]] <- selectForStudy
head(citiesOnt@data)
             name country.etc   pop capital samp
2        Acton ON          ON  8308       0 TRUE
10  Alexandria ON          ON  3604       0 TRUE
12    Alliston ON          ON 10353       0 TRUE
14     Almonte ON          ON  4984       0 TRUE
17 Amherstburg ON          ON 11605       0 TRUE
18 Amigo Beach ON          ON  1145       0 TRUE

Filter on an Attribute

## Plot the cities
plot(citiesOnt, axes=T, asp=1, pch=16)

plot of chunk plot_cities_ont

plot(citiesOnt, axes=T, asp=1, pch=16)
## Highlight big cities
plot(citiesOnt[citiesOnt@data$pop > 100000, ], pch=1, col="red", cex=3, add=TRUE)

plot of chunk plot_big_cities

Importing Vector Data with rgdal

package rgdal has a large set of import and export filters for both vector and raster GIS data

require(rgdal)
head(ogrDrivers())
        name write
1 AeronavFAA FALSE
2     ARCGEN FALSE
3     AVCBin FALSE
4     AVCE00 FALSE
5        BNA  TRUE
6        CSV  TRUE

Example: Import a ShapeFile

Let's get one of the sample shapefiles that comes with the maptools package

vectorDir <- system.file("vectors", package="rgdal")
vectorDir
[1] "C:/Users/Andy/Documents/R/win-library/3.0/rgdal/vectors"
list.files(vectorDir, pattern="*.shp")
[1] "cities.shp"                   "kiritimati_primary_roads.shp"
[3] "scot_BNG.shp"                 "trin_inca_pl03.shp"          

Examine a Sample Shapefile (rgdal)

orgInfo() returns metadata about GIS data saved on disk. Note how it also reads the projection info

ogrInfo(dsn=vectorDir, layer="scot_BNG")
Source: "C:/Users/Andy/Documents/R/win-library/3.0/rgdal/vectors", layer: "scot_BNG"
Driver: ESRI Shapefile number of rows 56 
Feature type: wkbPolygon with 2 dimensions
Extent: (7095 529495) - (468285 1218342)
CRS: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs  
LDID: 0 
Number of fields: 13 
     name type length typeName
1   SP_ID    4      5   String
2    NAME    4     13   String
3    ID_x    2     19     Real
4   COUNT    2     19     Real
5     SMR    2     19     Real
6    LONG    2     19     Real
7     LAT    2     19     Real
8      PY    2     19     Real
9    EXP_    2     19     Real
10    AFF    2     19     Real
11 X_COOR    2     19     Real
12 Y_COOR    2     19     Real
13   ID_y    2     19     Real

Import a Vector Shapefile (rgdal)

readOGR() is the function you use to import a vector file.

scotland <- readOGR(dsn=vectorDir, layer="scot_BNG")
OGR data source with driver: ESRI Shapefile 
Source: "C:/Users/Andy/Documents/R/win-library/3.0/rgdal/vectors", layer: "scot_BNG"
with 56 features and 13 fields
Feature type: wkbPolygon with 2 dimensions
class(scotland)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"

Plot a SpatialPolygonDataFrame

plot(scotland, axes=T, asp=1, col=palette())

plot of chunk plot_scotland

Inspect Attributes of a SpatialPolygonDataFrame

Inspect attached data frame (attribute table).

head(scotland@data)
  SP_ID         NAME ID_x COUNT   SMR  LONG  LAT     PY EXP_ AFF X_COOR
0     0   Sutherland   12     5 279.3 58.06 4.64  37521  1.8  16 247757
1     1        Nairn   13     3 277.8 57.47 3.98  29374  1.1  10 289592
2     2    Inverness   19     9 162.7 57.24 4.73 162867  5.5   7 249685
3     3 Banff-Buchan    2    39 450.3 57.56 2.36 231337  8.7  16 385776
4     4     Bedenoch   17     2 186.9 57.06 4.09  27075  1.1  10 280492
5     5   Kincardine   16     9 197.8 57.00 3.00 111665  4.6  16 351351
  Y_COOR ID_y
0 924954   12
1 842305   13
2 825855   19
3 852378    2
4 801036   17
5 792162   16

Filter Spatial Features Based on an Attribute Value

## Inspect attached data frame (attribute table)
idx <- which(scotland@data$NAME == "Inverness")
idx
[1] 3
plot(scotland[idx,], col="darkblue")

plot of chunk plot_inverness

Get Vector Data Online (raster)

The raster package has a function getData() to grab both vector and raster data from online repositories.

require(raster)
args(getData)
function (name = "GADM", download = TRUE, path = "", ...) 
NULL
  • countries - polygons for all countries
  • GADM - a database of global administrative boundaries
  • SRTM - hole-filled CGIAR-SRTM (90 m resolution) Projections
  • alt - altitude (elevation) aggregated from SRTM 90 m resolution data between -60 and 60 latitude. '
  • worldclim - a database of global interpolated climate data

Example: Regions Boundaries in Peru

peruRegions <- getData('GADM', country='PER', level=1)
class(peruRegions)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
plot(peruRegions, axes=T, asp=1)

plot of chunk get_peru_regions

Example: Adding Projection Info

First, let's create some fake GPS points in Peru:

ctr <- c(-77, -8)
x <- runif(n=20, min=ctr[1] - 1, max=ctr[1] + 1)
y <- runif(n=20, min=ctr[2] - 1, max=ctr[2] + 1)
mygps_df <- data.frame(long=x, lat=y)
head(mygps_df)
    long    lat
1 -77.63 -7.540
2 -76.50 -7.692
3 -77.32 -7.312
4 -77.58 -7.945
5 -76.87 -7.616
6 -76.89 -7.553
mygps_spdf <- SpatialPoints(mygps_df)
mygps_spdf
class       : SpatialPoints 
features    : 20 
extent      : -77.77, -76.19, -8.785, -7.154  (xmin, xmax, ymin, ymax)
coord. ref. : NA 

Example: Adding Projection Info

projection(mygps_spdf) <- CRS("+proj=longlat +ellps=WGS84")
plot(peruRegions, axes=T, asp=1)
plot(mygps_spdf, pch=16, col="red", add=TRUE)

plot of chunk add_proj_info_gps

Note: we are not projecting coordinates here, only entering known projection info.

Example: Reprojecting

require(rgdal)
mygps_utm <- spTransform(mygps_spdf, CRS("+proj=utm +south +zone=18 +ellps=WGS84")) 
summary(mygps_utm)
Object of class SpatialPoints
Coordinates:
         min     max
long  195110  368690
lat  9028590 9209042
Is projected: TRUE 
proj4string : [+proj=utm +south +zone=18 +ellps=WGS84]
Number of points: 20

Example: Reprojecting

peruRegions_utm <- spTransform(peruRegions, CRS("+proj=utm +south +zone=18 +ellps=WGS84")) 
plot(peruRegions_utm, axes=TRUE)
plot(mygps_utm, add=TRUE, col="red", pch=16)

plot of chunk plot_peru_regions_utm

Geoprocessing: rgeos package

require(rgeos)
  • interface to GEOS (Geometry Engine Open Source) Library
  • functions include:
    • union, distance, intersection, convex hull, envelope, buffer, simplify, polygon assembly, valid, area, length, intersects, touches, disjoint, crosses, within, contains, overlaps, equals, covers, etc.

Buffer

inver <- scotland[scotland@data$NAME == "Inverness", ]
plot(inver, axes=TRUE, asp=1)
inver_buf <- gBuffer(inver, width=2000)
plot(inver_buf, border="blue", lty=2, lwd=2, add=TRUE)

plot of chunk buf_inverness

Delaunay Triangulation

delTri <- gDelaunayTriangulation(mygps_utm)
plot(delTri, axes=TRUE)
plot(mygps_utm, col="red", pch=16, cex=2, add=TRUE)

plot of chunk delaunay_tri

Structure of Spatial* Objects

spatial* data structure

Exporting Spatial* Objects

Writable ogrDrivers

x <- ogrDrivers()
as.character(x[x$write==TRUE,"name"])
 [1] "BNA"            "CSV"            "DGN"            "DXF"           
 [5] "ESRI Shapefile" "Geoconcept"     "GeoJSON"        "GeoRSS"        
 [9] "GML"            "GMT"            "GPSBabel"       "GPSTrackMaker" 
[13] "GPX"            "KML"            "MapInfo File"   "Memory"        
[17] "MSSQLSpatial"   "ODBC"           "ODS"            "PCIDSK"        
[21] "PDF"            "PGDump"         "S57"            "TIGER"         
[25] "XLSX"          

Export To KML

projection(citiesOnt) <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84")
writeOGR(citiesOnt[citiesOnt[["samp"]], ], dsn="OntarioCities.kml", layer="Ontario Cities", driver="KML", dataset_options=c("NameField=name"))

Take Home Points: Spatial* Data Classes

  • the Spatial* data classes from the sp package are the most common data formats for vector data
  • you can create them from scratch within R or import from disk or geodatabases with functions from rgdal
    • with enough gumption you can drill down and grab anything
  • spatial features can have an attached data frame (attribute table), which is similar to a regular data frame
  • rgeos has most of the geoprocessing functions you would find in GIS software

Raster Data

Raster package

require(raster)
  • read, write and manipulate gridded spatial data
  • basic and high-level analysis functions
  • processing of large files

Raster Data Types

  • RasterLayer - a single layer raster
  • RasterStack - multi-layer raster
  • RasterBrick - multi-layer raster (from a single file on disk, faster)

Note that Raster* objects can live in memory or disk (see vignette for details)

Raster Layer: Create from Scratch

myrast <- raster(extent(c(0,20,0,20)), ncol=10, nrow=10)
myrast[] <- 1:100
myrast
class       : RasterLayer 
dimensions  : 10, 10, 100  (nrow, ncol, ncell)
resolution  : 2, 2  (x, y)
extent      : 0, 20, 0, 20  (xmin, xmax, ymin, ymax)
coord. ref. : NA 
data source : in memory
names       : layer 
values      : 1, 100  (min, max)

Raster Layer: Raster Algebra

plot(myrast)

plot of chunk plot_rast01

myrastSquared <- -1 * myrast^2
plot(myrastSquared)

plot of chunk plot_rast_squared

Raster Data: Import a GeoTiff

rasterDir <- system.file("pictures", package="rgdal")
rasterDir
[1] "C:/Users/Andy/Documents/R/win-library/3.0/rgdal/pictures"
x <- list.files(rasterDir, pattern="*.tif", full.names=TRUE)
x
[1] "C:/Users/Andy/Documents/R/win-library/3.0/rgdal/pictures/cea.tif"          
[2] "C:/Users/Andy/Documents/R/win-library/3.0/rgdal/pictures/erdas_spnad83.tif"
spna83.fn <- x[2]

Raster Data: Import a GeoTiff

GDALinfo(spna83.fn)
rows        658 
columns     571 
bands       1 
lower left origin.x        78999 
lower left origin.y        1412948 
res.x       40 
res.y       40 
ysign       -1 
oblique.x   0 
oblique.y   0 
driver      GTiff 
projection  +proj=tmerc +lat_0=30 +lon_0=-82.16666666666667 +k=0.9999
+x_0=656166.6666666665 +y_0=0 +datum=NAD83 +units=us-ft +no_defs 
file        C:/Users/Andy/Documents/R/win-library/3.0/rgdal/pictures/erdas_spnad83.tif 
apparent band summary:
  GDType hasNoDataValue NoDataValue blockSize1 blockSize2
1   Byte          FALSE           0         14        571
apparent band statistics:
  Bmin Bmax Bmean Bsd
1    0  255    NA  NA
Metadata:
AREA_OR_POINT=Area 
TIFFTAG_RESOLUTIONUNIT=1 (unitless) 
TIFFTAG_SOFTWARE=IMAGINE TIFF Support
Copyright 1991 - 1999 by ERDAS, Inc. All Rights Reserved
@(#)$RCSfile: etif.c $ $Revision: 1.4 $ $Date: 1999/08/27 17:48:49 $ 
TIFFTAG_XRESOLUTION=1 
TIFFTAG_YRESOLUTION=1 

Raster Data: Import a GeoTiff

spna83.sgdf <- readGDAL(spna83.fn)
C:/Users/Andy/Documents/R/win-library/3.0/rgdal/pictures/erdas_spnad83.tif has GDAL driver GTiff 
and has 658 rows and 571 columns
image(spna83.sgdf, col=gray(0:255/255))

plot of chunk import_geotiff

Raster Data: Import online data

demPeru <- getData('SRTM', lon=-77, lat=-8, download=TRUE)
demPeru
class       : RasterLayer 
dimensions  : 6000, 6000, 3.6e+07  (nrow, ncol, ncell)
resolution  : 0.0008333, 0.0008333  (x, y)
extent      : -80, -75, -10, -5  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
data source : C:\GitHub\rspatialdata\srtm_21_14.TIF 
names       : srtm_21_14 
values      : -32768, 32767  (min, max)

Raster Data: Crop a Raster

my_ext <- extent(-78,-76,-9,-7)
demCentralPeru <- crop(demPeru, my_ext)
rm(demPeru)  ## we don't need this big raster anymore so lets get the memory back
plot(demCentralPeru)

plot of chunk crop_peru_rast

Raster Data: Slope, Aspect, Hillshade

peruSlope <- terrain(demCentralPeru, opt="slope")
peruAspect <- terrain(demCentralPeru, opt="aspect")
peruHill <- hillShade(peruSlope, peruAspect, 40,270)
plot(peruHill, col=grey(0:100/100), legend=FALSE)

plot of chunk peru_hillshade

Identify Steep Slopes

hist(getValues(peruSlope))

plot of chunk hist_slope_vals

## Create a blank copy of peruSlope
steepSlope <- raster(peruSlope)

## Fill in cells where slope > 0.5
steepSlope[peruSlope > 0.5] <- 1
plot(demCentralPeru)
plot(steepSlope, col="red", add=TRUE, legend=FALSE)

plot of chunk steep_slope_define

Project a Raster

demCentralPeruUTM <- projectRaster(demCentralPeru, crs=CRS("+proj=utm +south +zone=18 +ellps=WGS84")) 
demCentralPeruUTM
class       : RasterLayer 
dimensions  : 2421, 2421, 5861241  (nrow, ncol, ncell)
resolution  : 91.9, 92.2  (x, y)
extent      : 168039, 390529, 9003378, 9226595  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +south +zone=18 +ellps=WGS84 
data source : in memory
names       : srtm_21_14 
values      : 223.7, 6161  (min, max)

Resample a Raster

demCentralPeru1km <- aggregate(demCentralPeruUTM, fact=11, fun=mean)
demCentralPeru1km
class       : RasterLayer 
dimensions  : 221, 221, 48841  (nrow, ncol, ncell)
resolution  : 1011, 1014  (x, y)
extent      : 168039, 391448, 9002456, 9226595  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +south +zone=18 +ellps=WGS84 
data source : in memory
names       : srtm_21_14 
values      : 228.3, 5678  (min, max)

Resample a Raster

plot(demCentralPeruUTM, xlim=c(250000,300000), ylim=c(9100000,9150000))

plot of chunk plot_dem_peru

plot(demCentralPeru1km, xlim=c(250000,300000), ylim=c(9100000,9150000))

plot of chunk plot_dem_resampled_peru

Overlay Points and Extract Values

plot(demCentralPeru)
plot(mygps_spdf, pch=16, col="red", add=TRUE)

plot of chunk extract_vals_at_pts01

extract(demCentralPeru, mygps_spdf)
 [1] 2625  439 1724 3471 1214  754 1400 2542 3120  574 3495 2049 4248 4700
[15]  564 2812 1285 1511  665 1204

Create a Distance Surface

samplePtsRast <- rasterize(mygps_spdf, demCentralPeru, field=1)
dist2SamplePtsRast <- distance(samplePtsRast)
plot(dist2SamplePtsRast)
plot(mygps_spdf, col="red", pch=16, add=TRUE)

plot of chunk distance_surface_create

Display Google Map with ggmap

require(ggmap)
bboxPeru <- c(-78,-9,-76,-7)
ggMapPeru <- get_map(location = bboxPeru, color = "color", source = "google", maptype = "satellite")
ggmap(ggMapPeru, extent = "device", ylab = "Latitude", xlab = "Longitude") + geom_point(aes(x = long, y = lat, colour = "red"), data = as.data.frame(coordinates(mygps_spdf))) + theme(legend.position="none")

plot of chunk ggmap_example

Take Home Points: Raster Data Classes

  • the raster package has a pretty good suite of raster functions
  • there are tricks to handling large data files
  • rasterize and overlay functions allow you come combine vector and raster data

GIS vs. R

GIS

  • Visual interaction
  • Data management
  • Geometric operations
  • Standard workflows
  • Single map production
  • Click, click, click, click
  • Speed of execution

R

  • Data and model focused
  • Analysis
  • Attributes as important
  • Creativity and innovation
  • Many (simpler) maps
  • Repeatability (script)
  • Speed of development

When to Use GIS vs. R?

GIS

  • no mapping experience
  • one-off maps
  • serious cartography
  • your data live in a geodatabase

R

  • no money for commercial software
  • need repeated analysis
  • integration with other analyses
  • development of new methods
  • working with large raster files

Additional Resources

Additional Resources

Discussion