Spatial data manipulation

Jérôme Guélat, Swiss Ornithological Institute (2013)

Introduction

Spatial data types

R is becoming a powerful GIS package, allowing us to use one software to manage and to model our spatial data! The sp package defines the main spatial classes. In order to better understand the subsequent R code, here's a quick reminder of the main spatial data types.

The two main types are vector data and raster data. Vector data is normally used for high precision datasets and can be represented as points, lines or polygons. The best known format for storing vector data is the shapefile, a a rather old format developed by ESRI. The type of vectors you will use depends of course on your own data and on your analyses. For example: points are appropriate for bird nests and sightings, lines for moving animals and linear structures (paths, rivers), and polygons for territories and landcover categories. Of course a river can also be modelled as a polygon if you're interested in its width (or you can also store its width as an attribute of the line). The sp package defines the corresponding classes: SpatialPoints for points, SpatialLines for lines and SpatialPolygons for polygons.

Normally, you also have some attributes linked to each element of the vector dataset (e.g. an identification number, the corresponding landcover category, etc.). In most GIS softwares, this is called the attribute table. The sp package also defines classes for vector datasets linked with attribute data: SpatialPointsDataFrame, SpatialLinesDataFrame and SpatialPolygonsDataFrame. As the names indicate, the attribute data is stored inside the spatial objects as a dataframe.

Before going any further, we are going to download and install the packages needed for this tutorial (just remove the # before the install functions if you need them):

# Install the packages
#install.packages("rgdal")
#install.packages("rgeos")
#install.packages("geosphere")
#install.packages("raster")
#install.packages("spdep")

It is extremely easy to import spatial data into R. We will use the readOGR function from the rgdal package. This function is an interface to the OGR software, a famous GIS library used to import many different vector formats.

library(sp)
library(rgdal)

# The first argument is the name of the directory where the data is
# stored. If your data is already in the R working directory, just use
# '.'. The second argument is the name of the shapefile without the
# extension (.shp)
pts <- readOGR("data", "testpts", verbose = FALSE)
lin <- readOGR("data", "testlines", verbose = FALSE)
pol <- readOGR("data", "testpol", verbose = FALSE)

# Check the structure of the spatial objects
str(pts)

## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
##   ..@ data       :'data.frame':  3 obs. of  1 variable:
##   .. ..$ Id: int [1:3] 1 2 3
##   ..@ coords.nrs : num(0) 
##   ..@ coords     : num [1:3, 1:2] 2657302 2657292 2657294 1219739 1219750 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
##   ..@ bbox       : num [1:2, 1:2] 2657292 1219739 2657302 1219761
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
##   .. .. ..$ : chr [1:2] "min" "max"
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
##   .. .. ..@ projargs: chr "+proj=somerc +lat_0=46.95240555555556 +lon_0=7.439583333333333 +k_0=1 +x_0=2600000 +y_0=1200000 +ellps=bessel +units=m +no_defs"| __truncated__

str(lin)

## Formal class 'SpatialLinesDataFrame' [package "sp"] with 4 slots
##   ..@ data       :'data.frame':  1 obs. of  1 variable:
##   .. ..$ Id: int 1
##   ..@ lines      :List of 1
##   .. ..$ :Formal class 'Lines' [package "sp"] with 2 slots
##   .. .. .. ..@ Lines:List of 1
##   .. .. .. .. ..$ :Formal class 'Line' [package "sp"] with 1 slots
##   .. .. .. .. .. .. ..@ coords: num [1:5, 1:2] 2657253 2657285 2657306 2657326 2657330 ...
##   .. .. .. ..@ ID   : chr "0"
##   ..@ bbox       : num [1:2, 1:2] 2657253 1219721 2657330 1219806
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "x" "y"
##   .. .. ..$ : chr [1:2] "min" "max"
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
##   .. .. ..@ projargs: chr "+proj=somerc +lat_0=46.95240555555556 +lon_0=7.439583333333333 +k_0=1 +x_0=2600000 +y_0=1200000 +ellps=bessel +units=m +no_defs"| __truncated__

str(pol)

## Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
##   ..@ data       :'data.frame':  2 obs. of  1 variable:
##   .. ..$ Id: int [1:2] 1 2
##   ..@ polygons   :List of 2
##   .. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
##   .. .. .. ..@ Polygons :List of 1
##   .. .. .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
##   .. .. .. .. .. .. ..@ labpt  : num [1:2] 2657281 1219744
##   .. .. .. .. .. .. ..@ area   : num 1451
##   .. .. .. .. .. .. ..@ hole   : logi FALSE
##   .. .. .. .. .. .. ..@ ringDir: int 1
##   .. .. .. .. .. .. ..@ coords : num [1:14, 1:2] 2657263 2657271 2657278 2657283 2657307 ...
##   .. .. .. ..@ plotOrder: int 1
##   .. .. .. ..@ labpt    : num [1:2] 2657281 1219744
##   .. .. .. ..@ ID       : chr "0"
##   .. .. .. ..@ area     : num 1451
##   .. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
##   .. .. .. ..@ Polygons :List of 1
##   .. .. .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
##   .. .. .. .. .. .. ..@ labpt  : num [1:2] 2657267 1219720
##   .. .. .. .. .. .. ..@ area   : num 54.8
##   .. .. .. .. .. .. ..@ hole   : logi FALSE
##   .. .. .. .. .. .. ..@ ringDir: int 1
##   .. .. .. .. .. .. ..@ coords : num [1:5, 1:2] 2657264 2657269 2657270 2657265 2657264 ...
##   .. .. .. ..@ plotOrder: int 1
##   .. .. .. ..@ labpt    : num [1:2] 2657267 1219720
##   .. .. .. ..@ ID       : chr "1"
##   .. .. .. ..@ area     : num 54.8
##   ..@ plotOrder  : int [1:2] 1 2
##   ..@ bbox       : num [1:2, 1:2] 2657259 1219714 2657313 1219777
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "x" "y"
##   .. .. ..$ : chr [1:2] "min" "max"
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
##   .. .. ..@ projargs: chr "+proj=somerc +lat_0=46.95240555555556 +lon_0=7.439583333333333 +k_0=1 +x_0=2600000 +y_0=1200000 +ellps=bessel +units=m +no_defs"| __truncated__

# Accessing the data slot
pts@data

##   Id
## 1  1
## 2  2
## 3  3

# Adding a new column with values in the attribute table
pts$new <- c("a", "b", "c")
pts@data

##   Id new
## 1  1   a
## 2  2   b
## 3  3   c

# Accessing the coordinates of the points
coordinates(pts)

##      coords.x1 coords.x2
## [1,]   2657302   1219739
## [2,]   2657292   1219750
## [3,]   2657294   1219761

# Getting the coordinates of the centroids of polygons
coordinates(pol)

##      [,1]    [,2]
## 0 2657281 1219744
## 1 2657267 1219720

# Plotting spatial data
maxXY <- pmax(bbox(pts)[, 2], bbox(lin)[, 2], bbox(pol)[, 2])
minXY <- pmin(bbox(pts)[, 1], bbox(lin)[, 1], bbox(pol)[, 1])

plot(pol, xlim = c(minXY[1], maxXY[1]), ylim = c(minXY[2], maxXY[2]))
plot(lin, add = TRUE)
plot(pts, add = TRUE)

plot of chunk intro2

spplot(pol, sp.layout = list(list("sp.lines", lin), list("sp.points", pts)), 
    xlim = c(minXY[1], maxXY[1]), ylim = c(minXY[2], maxXY[2]))

plot of chunk intro2

However you don't always have a GIS file for your data. Quite often, you'll work with point data stored in a standard dataframe using 2 columns to store the coordinates. If you have such a dataset, it is possible to convert it to a SpatialPointsDataFrame (the result would then be the same as an imported point shapefile).

# Generate a fake dataset
fakedata <- data.frame(x = c(3, 6, 3), y = c(9, 5, 5), id = c(1, 2, 3), elevation = c(550, 
    1200, 400))
# Convert the dataframe to a SpatialPointsDataFrame object
coordinates(fakedata) <- c("x", "y")

class(fakedata)

## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

The second main type of GIS data is raster data. It is basically an image divided into pixels, and each pixel have an associated value. Satellite imagery and digital elevation models are two examples of raster dataset. Note that a raster dataset can have several bands, for example most satellite images have a least three bands: red, green and blue. The sp packages provide two classes to work with raster data: SpatialGrid/SpatialGridDataFrame and SpatialPixels/SpatialPixelsDataFrame. These two classes differ in the way they store the data, however the actual values are identical. The raster package provides a more efficient class which we're going to use later in this tutorial.

# Using the classes provided by sp
ras1 <- readGDAL("data/elevation.tif", silent = TRUE)
str(ras1)

## Formal class 'SpatialGridDataFrame' [package "sp"] with 4 slots
##   ..@ data       :'data.frame':  10000 obs. of  1 variable:
##   .. ..$ band1: num [1:10000] 83.6 96.1 88.4 75.6 84.5 ...
##   ..@ grid       :Formal class 'GridTopology' [package "sp"] with 3 slots
##   .. .. ..@ cellcentre.offset: Named num [1:2] 0.5 0.5
##   .. .. .. ..- attr(*, "names")= chr [1:2] "x" "y"
##   .. .. ..@ cellsize         : num [1:2] 1 1
##   .. .. ..@ cells.dim        : int [1:2] 100 100
##   ..@ bbox       : num [1:2, 1:2] 0 0 100 100
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "x" "y"
##   .. .. ..$ : chr [1:2] "min" "max"
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
##   .. .. ..@ projargs: chr NA

spplot(ras1)

plot of chunk intro4

# Convert the SpatialGridDataFrame to a SpatialPixelsDataFrame
ras2 <- as(ras1, "SpatialPixelsDataFrame")
str(ras2)

## Formal class 'SpatialPixelsDataFrame' [package "sp"] with 7 slots
##   ..@ data       :'data.frame':  10000 obs. of  1 variable:
##   .. ..$ band1: num [1:10000] 83.6 96.1 88.4 75.6 84.5 ...
##   ..@ coords.nrs : num(0) 
##   ..@ grid       :Formal class 'GridTopology' [package "sp"] with 3 slots
##   .. .. ..@ cellcentre.offset: Named num [1:2] 0.5 0.5
##   .. .. .. ..- attr(*, "names")= chr [1:2] "x" "y"
##   .. .. ..@ cellsize         : num [1:2] 1 1
##   .. .. ..@ cells.dim        : int [1:2] 100 100
##   ..@ grid.index : int [1:10000] 1 2 3 4 5 6 7 8 9 10 ...
##   ..@ coords     : num [1:10000, 1:2] 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:2] "x" "y"
##   ..@ bbox       : num [1:2, 1:2] 0 0 100 100
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "x" "y"
##   .. .. ..$ : chr [1:2] "min" "max"
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
##   .. .. ..@ projargs: chr NA

spplot(ras2)

plot of chunk intro4

# Using the class provided by raster
library(raster)
ras3 <- raster("data/elevation.tif")
ras3

## class       : RasterLayer 
## dimensions  : 100, 100, 10000  (nrow, ncol, ncell)
## resolution  : 1, 1  (x, y)
## extent      : 0, 100, 0, 100  (xmin, xmax, ymin, ymax)
## coord. ref. : NA 
## data source : C:\Users\jgu\Dropbox\Movement ecology\data\elevation.tif 
## names       : elevation 
## values      : 0.3017, 341.1  (min, max)

plot(ras3)

plot of chunk intro4

# Convert SpatialGrid/SpatialPixels objects into raster objects
ras4 <- raster(ras1)
ras5 <- raster(ras2)

Useful R packages for GIS analyses

The following R packages are especially useful to perform complex GIS operations (they also use the sp classes). With these packages, you'll be able to perform spatial operations similar to the ones provided by ArcToolbox in ArcGIS.

rgeos: topological operations
geosphere: spherical trigonometry (operations on unprojected data)
raster: management and analysis of raster data
spdep: neighborhood analyses (and much much more)

Of course many more spatial packages are available. The following are especially useful if you need to model your spatial data: gstat (kriging), geoR (kriging and spatial regressions), spatstat (point processes), spBayes (spatial regression).

sp

Besides defining the main spatial classes in R, the sp package also provides some basic GIS analyses. The most interesting functions allow us to generate random or regular sampling points, to compute distances and to spatially join different datasets.

Example

We'd like to study the habitat preferences of some animal in our study area. One possibility would be to generate random points, then for each point we could determine the habitat type using other GIS layers (for example some vegetation maps). Finally we need to go in the field and survey each site to check if the animal is present. This example will show a solution for the first 2 steps.

First we need to define our study area (we could also import a GIS file, e.g. a shapefile)

peri <- SpatialPolygons(list(Polygons(list(Polygon(cbind(c(0, 0, 6, 6, 0), c(6, 
    0, 0, 6, 6)))), "peri")))

Now we can generate the sampling points with the spsample function using several designs (random, regular, stratified, nonaligned, hexagonal, clustered and Fibonacci). We are not restricted to a polygon, we can also sample random points along a transect or on a grid. If you need a random sample with a minimal distance between the points, check the rSSI function from the spatstat package.

ptsrand <- spsample(peri, 100, type = "random")
ptsreg <- spsample(peri, 100, type = "regular")
ptshex <- spsample(peri, 100, type = "hexagonal")

par(mfrow = c(1, 3))
plot(peri)
points(ptsrand)
plot(peri)
points(ptsreg)
plot(peri)
points(ptshex)

plot of chunk sp2

As a small exploratory analysis, we can check the distances between the sampling points. After using the spsample function, all the points should have different coordinates (no duplicate), but to be 100% confident we can also check that.

# See the geosphere section if the points are not projected
ptsdist <- dist(coordinates(ptsrand))

hist(ptsdist, xlab = "Distance between sampling points")

plot of chunk sp3

# Check for duplicates. If any, then use the remove.duplicates function
nrow(zerodist(ptsrand))

## [1] 0

For the sake of reproducibility, let's import a simple vegetation map.

vegmap <- readOGR("data", "vegetation", verbose = FALSE)

spplot(vegmap, col.regions = rainbow(length(vegmap)), sp.layout = list("sp.points", 
    ptsrand, pch = 16, col = "black"))

plot of chunk sp4

Now we'd like to know the vegetation type for each sampling point. In the GIS terminology this is called a “spatial join”, and this can be done with the over function. This function is really powerful, don't hesitate to read the vignette to learn more about it. Just type: vignette("over")

join <- over(ptsrand, vegmap)
# Integrate this vegetation data into the SpatialPoints object
ptsrand <- SpatialPointsDataFrame(ptsrand, join)
head(ptsrand)

##   type
## 1    H
## 2    F
## 3    F
## 4    F
## 5    B
## 6    G

table(ptsrand$type)

## 
##  A  B  C  D  E  F  G  H  I  J 
##  4 17  4  6  7 28  6  2 13 13

If you want to use your generated points in another GIS, you can easily export them in the vector formats provided by your version of OGR (including shapefiles). The export is done with the writeOGR function (rgdal package).

writeOGR(ptsrand, ".", "randompts", driver = "ESRI Shapefile", overwrite_layer = TRUE)

rgeos

The rgeos package is only an interface to the GEOS library (Geometry Engine - Open Source). GEOS is very powerful geometry engine written in C++ (actually it is a C++ port of the Java Topology Suite), used for example in PostGIS databases. It can handle complex geometries with a good speed because all the computations are performed by the C++ code.

The rgeos package can use the spatial vector classes defined in the sp package (SpatialPoints*, SpatialLines* and SpatialPolygons*), but it is also possible to define the geometries using the WKT (Well-known text) format.

library(rgeos)
library(rgdal)

pol1 <- readOGR("data", "pol", verbose = FALSE)

gIsValid(pol1)

## [1] TRUE

pol2 <- readWKT("GEOMETRYCOLLECTION (POLYGON ((683448 250375, 683620 250373, 683618 250170, 683342 250184, 683448 250375)), POLYGON ((683981 250321, 683808 250155, 683632 250294, 683981 250321)))")

gIsValid(pol2)

## [1] TRUE

Let's compare the 2 geometries, first visually and then programmatically:

par(mfrow = c(1, 2))
plot(pol1)
plot(pol2)
box(which = "outer")

plot of chunk rgeos2

# The function is applied on the entire object
gEqualsExact(pol1, pol2, tol = 0.001)

## [1] TRUE

# The function is applied accross IDs (features)
gEqualsExact(pol1, pol2, byid = TRUE, tol = 0.001)

##       0     1
## 1  TRUE FALSE
## 2 FALSE  TRUE

It is always a good idea to check the topological validity of our geometries with the gIsValid() function before going further with other analyses. As you will notice, all interesting rgeos functions start with a “g”.

Example

We have a few sightings of a red kite (Milvus milvus) near a windmill. In order to take appropriate conservation measures we need some information about the spatial arrangement of the main elements. First, we'd like to know the distance between the windmill and the nearest edge of the red kite territory. We'd also like to know the area of the intersection between the red kite territory and a buffer around the windmill.

First we create some points for the sightings and one point for the windmill:

sightings <- SpatialPoints(data.frame(x = c(7, 19, 11, 19, 4, 20, 13, 15, 20), 
    y = c(5, 0, 4, 15, 9, 6, 1, 10, 14)))
windmill <- SpatialPoints(data.frame(x = 32, y = 10))

We can now compute a simple homerange (a minimum convex polygon) using rgeos gConvexHull function. As an exercise, let's also verify that all the sightings are included in the homerange. If you're a bit suprised by the results of gContains, have a look at the examples in the help (?gContains).

homerange <- gConvexHull(sightings)
gContains(homerange, sightings)

## [1] TRUE

gContains(homerange, sightings, byid = TRUE)

##       1
## 1 FALSE
## 2 FALSE
## 3  TRUE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## 7 FALSE
## 8  TRUE
## 9 FALSE

gCovers(homerange, sightings, byid = TRUE)

##      1
## 1 TRUE
## 2 TRUE
## 3 TRUE
## 4 TRUE
## 5 TRUE
## 6 TRUE
## 7 TRUE
## 8 TRUE
## 9 TRUE

It is now easy to compute the distance between the windmill and the nearest edge of the territory.

gDistance(windmill, homerange)

## [1] 12

For the rest of the analysis, we need to create a buffer around the windmill. This can be done with the gBuffer function (the quadsegs argument controls the number of line segments used to approximate a circle).

buf <- gBuffer(windmill, width = 15, quadsegs = 50)

# Let's plot everything
plot(homerange, col = "lightblue", xlim = c(0, 47), ylim = c(0, 20))
plot(buf, add = TRUE)
plot(sightings, pch = 16, add = TRUE)
plot(windmill, pch = 17, cex = 2, col = "blue", add = TRUE)
box()

plot of chunk rgeos6

We see that a small part of the homerange intersects with the windmill buffer. We can also test it programmatically and extract only the intersection of the 2 geometries. Once we have the intersection, it is easy to get its area using the gArea function (have a look at the gLength function for the perimeter).

gIntersects(homerange, buf)

## [1] TRUE

gDisjoint(homerange, buf)

## [1] FALSE

inter <- gIntersection(homerange, buf)

# Getting area using rgeos
gArea(inter)

## [1] 29.48

# Or directly extracting the information from the SpatialPolygons object
sapply(slot(inter, "polygons"), slot, "area")

## [1] 29.48

# Let's plot everything
plot(homerange, col = "lightblue", xlim = c(0, 47), ylim = c(0, 20))
plot(buf, add = TRUE)
plot(inter, col = "red", add = TRUE)
plot(sightings, pch = 16, add = TRUE)
plot(windmill, pch = 17, cex = 2, col = "blue", add = TRUE)
box()

plot of chunk rgeos7

Of course, we can also extract the homerange part which is not affected by the windmill.

dif <- gDifference(homerange, buf)
gArea(dif)

## [1] 120

We can now merge the 2 parts of the homerange (affected and unaffected) to get our initial geometry. The gUnaryUnion function can be used to aggregate a single sp object (see ?gUnaryUnion).

hr_union <- gUnion(inter, dif)
# The 2 geometries should be spatially equivalent, even if hr_union has 2
# more vertices (probably some rounding error)
gEquals(homerange, hr_union)

## [1] FALSE

par(mfrow = c(1, 2))
plot(homerange)
points(homerange@polygons[[1]]@Polygons[[1]]@coords, pch = 16, col = "red")
plot(hr_union)
points(hr_union@polygons[[1]]@Polygons[[1]]@coords, pch = 16, col = "red")
box(which = "outer")

plot of chunk rgeos9

geosphere

Sometimes you need to work on unprojected data, for example bird migration data with a large extent. Some functions of the sp and rgeos packages are able to use such data, but if you need to compute distances or areas, you should use the geosphere package. Attention: if you study area is small enough, you'll get a better precision by using the appropriate projection for your area.

Example

We are studying the migration of the common swift (Apus apus) between Switzerland and Africa. We got a ring recovery in South Africa and we'd like to know the shortest distance (great circle distance) between the ringing site and the site where the ring was found (of course, birds are not always taking the shortest route…).

First let's import/create the data and visualize everything on a map.

library(geosphere)

# Import background map
map <- readOGR("data", "world", verbose = FALSE)
proj4string(map) <- CRS("+proj=longlat +datum=WGS84")

# Create the origin (ringing site: Switzerland) and the destination (ring
# recovery: South Africa)
pts <- SpatialPoints(data.frame(x = c(8.33, 30), y = c(47.22, -23)))
proj4string(pts) <- CRS("+proj=longlat +datum=WGS84")

# Compute the great-circle
gcLine <- greatCircle(pts[1], pts[2], sp = TRUE)
spplot(map, col.regions = "grey80", sp.layout = list(list("sp.lines", gcLine, 
    lwd = 2, col = "blue"), list("sp.points", pts, pch = 16, cex = 1.5, col = "red")), 
    colorkey = FALSE)

plot of chunk geosphere1

The great circles are easy to compute but it's more complicated to compute the distance. Several methods are available in the geosphere package, some of them assume a spherical earth (law of cosines, haversine), others are more precise and use an ellipsoid (Meeus, Vincenty). The sp package also provides another method.

# Great circle distance accordingn to the Meeus method (geosphere package)
distMeeus(pts[1], pts[2])

## [1] 8075075

# Great circle distance according to the Haversine method (geosphere
# package)
distHaversine(pts[1], pts[2])

## [1] 8113961

# Another great circle distance (sp package, output in km)
spDists(pts[1], pts[2], longlat = TRUE)

##      [,1]
## [1,] 8063

We know that the ringing site was in Switzerland… But we can also check it with some R code (just for fun). We use the gIntersects function from the rgeos package to find which polygon intersects with the ringing site (we can use other rgeos functions: see gContains, gWithin, gCovers, etc.). Once we've extracted the desired polygon, it's easy to compute its area and perimeter. But don't forget these results are only approximations because the functions assume a spherical earth.

swiss <- map[which(gIntersects(pts[1], map, byid = TRUE)), ]
proj4string(swiss)

## [1] "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"

# Real area: 41284 km2
round(areaPolygon(swiss)/1e+06)

## [1] 41265

# Real perimeter: 1944 km
round(perimeter(swiss)/1000)

## [1] 1919

If you don't trust my values, let's check the real area and perimeter after an accurate projection in the new swiss coordinate reference system (CH1903+ LV95).

swissproj <- spTransform(swiss, CRS("+init=epsg:2056"))
gArea(swissproj)/1e+06

## [1] 41284

gLength(swissproj)/1000

## [1] 1944

raster

Many interesting datasets are only available in raster format (digital elevation models, landcover, etc.). The sp package also provides 2 useful classes (SpatialGrid* and SpatialPixels*) but the classes defined in the raster package are much more intuitive. Moreover, raster objects can be held on disk rather than in memory, which allows working with bigger datasets (sp objects are always stored in memory). The package also provides many standard raster analyses.

Example

We'd like to study the habitat preferences of some animal. As a starting point we have a GIS layer with sightings (points) and one raster describing the elevation in our study area.

First we need to create/import the data. The GDAL library is used to import the rasters, therefore you can import all the common formats. If your raster data is stored in a 3 columns table (coordinates + values), then check the rasterFromXYZ function.

library(raster)

# Import a GeoTIFF raster
rElev <- raster("data/elevation.tif")
projection(rElev) <- CRS("+init=epsg:2056")

# Create some sightings
pts <- SpatialPoints(data.frame(x = sample(20:80, size = 10), y = sample(20:80, 
    size = 10)))

plot(rElev)
points(pts, pch = 16)

plot of chunk raster1

We can use the digital elevation model to derive other interesting variables. The terrain function allows us to compute many variables: slope, aspect, topographic position index, etc. The aspect variable is a circular variable (0° = 360°) and can therefore cause some troubles in a statistical analysis. One possibility is to decompose the variable into 2 orthogonal variables: northness and eastness (both variables go from -1 to +1).

rSlope <- terrain(rElev, opt = "slope", unit = "degrees")
rAspect <- terrain(rElev, opt = "aspect")

# We can use mathematical functions directly on the raster objects
rEastness <- sin(rAspect)
rNorthness <- cos(rAspect)

# If several rasters have the same extent and resolution, we can use
# raster algebra to combine them and derive new variables
rMix <- (rSlope + (rNorthness + 2)^5)/cellStats(rElev, stat = "mean") + 42

# We can store them in a RasterStack object (see also ?brick)
rTerrain <- stack(list(slope = rSlope, aspect = rAspect, eastness = rEastness, 
    northness = rNorthness))

plot(rTerrain, addfun = function() points(pts, pch = 16))

plot of chunk raster2

Quite often, we need to extract the raster values for our points, for example when we want to use this information as a covariate in a model. We've already seen how to do this with sp objects using the over function. With raster objects, we need to use the extract function, which also offers more options targeted to raster data (see ?extract).

terrainSamples <- extract(rTerrain, pts)
print(terrainSamples, digits = 2)

##       slope aspect eastness northness
##  [1,]    87    5.2    -0.90     0.446
##  [2,]    87    5.3    -0.85     0.525
##  [3,]    84    4.2    -0.86    -0.509
##  [4,]    85    4.9    -0.99     0.158
##  [5,]    86    5.0    -0.97     0.260
##  [6,]    83    6.0    -0.26     0.965
##  [7,]    88    5.6    -0.61     0.791
##  [8,]    86    1.6     1.00    -0.022
##  [9,]    88    4.3    -0.92    -0.389
## [10,]    78    4.2    -0.85    -0.527

As we saw earlier, it is possible to compute distances between spatial objects with several functions (spDists, dist, gDistance). It is also possible to create a distance raster. In such a raster, the value for each pixel reprensents the shortest distance to the next geometry.

rDist <- distanceFromPoints(rElev, pts)

plot(rDist)
points(pts, pch = 16)

plot of chunk raster4

# Let's have a look at the distances distribution. This is of course
# depends strongly on the raster extent
hist(rDist, main = "Histogram of distances", xlab = "Distances")

plot of chunk raster4

Sometimes we're not only interested in the raster values at the points. We'd also like to know the values contained in a buffer around the points. In the GIS world, this is called a zonal analysis. The R implementation is actually, in my opinion, better than the ArcMap one because you can use overlapping polygons without problem.

# Don't forget the byid = TRUE, otherwise we get a merged buffer
ptsBuf <- gBuffer(pts, width = 10, quadsegs = 50, byid = TRUE)
plot(rElev)
plot(ptsBuf, add = TRUE)

plot of chunk raster5

zonalres <- extract(rElev, ptsBuf)
# We get a list, with one element per buffer
str(zonalres)

## List of 10
##  $ : num [1:316] 194 194 200 193 169 ...
##  $ : num [1:316] 171 171 163 173 165 ...
##  $ : num [1:316] 310 311 298 292 292 ...
##  $ : num [1:316] 126 146 181 196 201 ...
##  $ : num [1:316] 210 193 204 167 169 ...
##  $ : num [1:316] 176 171 163 142 133 ...
##  $ : num [1:316] 225 202 196 207 229 ...
##  $ : num [1:316] 218 172 194 183 183 ...
##  $ : num [1:316] 257 277 246 243 269 ...
##  $ : num [1:316] 211 197 198 226 217 ...

# Another possibility: use the points directly with the buffer argument
zonalres2 <- extract(rElev, pts, buffer = 10)

# We see that the extracted values are identical
identical(lapply(zonalres, sort), lapply(zonalres2, sort))

## [1] TRUE

# Let's compute the average value for each buffer
sapply(zonalres, mean)

##  [1] 236.1 172.1 199.5 176.0 174.6 234.8 241.9 185.9 225.7 171.2

For the rest of this analysis, we are going to use a discretized version of the digital elevation model. The low-elevation areas will get a value of 1, the high-elevation ones will get a value of 3, and the ones in the middle will get a value of 2. A similar transformation is also interesting if you need to reclassify a landcover raster. Sometimes one needs to group 2 differents landcover categories.

(reclMat <- matrix(c(0, 100, 1, 100, 200, 2, 200, Inf, 3), ncol = 3, byrow = TRUE))

##      [,1] [,2] [,3]
## [1,]    0  100    1
## [2,]  100  200    2
## [3,]  200  Inf    3

rElevRecl <- reclassify(rElev, rcl = reclMat)
plot(rElevRecl)

plot of chunk raster6

We now have a new raster with 3 categories. In order to perform other analyses, it is sometimes useful to create 3 “dummy” rasters containing boolean values for each category. Once we have boolean rasters, we can compute the proportion of the desired category within a given radius for each pixel. This is called a focal analysis, and, in this example, it is equivalent to a smoothing filter.

# Create the dummy rasters
rElevStack <- layerize(rElevRecl)
plot(rElevStack)

plot of chunk raster7

# We extract the 3rd raster from the RasterBrick object
rElev3 <- rElevStack[[3]]

# Create the filter
(f <- focalWeight(rElev3, d = 3, type = "circle"))

##         [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]
## [1,] 0.00000 0.00000 0.00000 0.03448 0.00000 0.00000 0.00000
## [2,] 0.00000 0.03448 0.03448 0.03448 0.03448 0.03448 0.00000
## [3,] 0.00000 0.03448 0.03448 0.03448 0.03448 0.03448 0.00000
## [4,] 0.03448 0.03448 0.03448 0.03448 0.03448 0.03448 0.03448
## [5,] 0.00000 0.03448 0.03448 0.03448 0.03448 0.03448 0.00000
## [6,] 0.00000 0.03448 0.03448 0.03448 0.03448 0.03448 0.00000
## [7,] 0.00000 0.00000 0.00000 0.03448 0.00000 0.00000 0.00000

# Apply the filter (default function = sum)
rElev3Foc <- focal(rElev3, w = f)

plot(rElev3Foc)

plot of chunk raster7

If you want to use your generated raster in another GIS, you can easily export it in the raster formats provided by your version of GDAL (including GeoTiff, ESRI ASCII, etc.). The export is done with the writeRaster function.

writeRaster(rElev3Foc, "smoothElev.tif", format = "GTiff", overwrite = TRUE)

## class       : RasterLayer 
## dimensions  : 100, 100, 10000  (nrow, ncol, ncell)
## resolution  : 1, 1  (x, y)
## extent      : 0, 100, 0, 100  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=somerc +lat_0=46.95240555555556 +lon_0=7.439583333333333 +k_0=1 +x_0=2600000 +y_0=1200000 +ellps=bessel +towgs84=674.374,15.056,405.346,0,0,0,0 +units=m +no_defs 
## data source : C:\Users\jgu\Dropbox\Movement ecology\smoothElev.tif 
## names       : smoothElev 
## values      : 0, 1  (min, max)

spdep

If you need to analyse and/or model spatial dependences, spdep is one of the most powerful package for this task. However, I'm only going to demonstrate a small but very important part of this package: how to create neibhours list objects. These neighbourhoods provide a lot of information if you need to investigate spatial autocorrelation in your dataset. Actually, they are mandatory for quite a few spatial models.

Example

Let's work once again with the small vegetation map. We'll try different methods to compute neighbours list objects and check the differences visually. The method used to define the neibhourhood can have a big impact on the modelling results.

If we have polygons (a SpatialPolygons* oject), the most intuitive solution is to build a neighbours list based on regions with contiguous boundaries. This is exactly what the function poly2nb does.

library(spdep)

vegmap <- readOGR("data", "vegetation", verbose = FALSE)
# Find the neighbours... If you don't use the row.names part, the elements
# of the list will be numbered from 0
polnb <- poly2nb(vegmap, row.names = 1:length(vegmap))

plot(vegmap, lwd = 2)
# The coordinates function applied to a SpatialPolygons object will return
# the coordinates of the centroids
text(coordinates(vegmap), labels = 1:length(vegmap), pos = 3)
plot(polnb, coordinates(vegmap), pch = 16, col = "blue", add = TRUE)

plot of chunk spdep1

summary(polnb)

## Neighbour list object:
## Number of regions: 10 
## Number of nonzero links: 36 
## Percentage nonzero weights: 36 
## Average number of links: 3.6 
## Link number distribution:
## 
## 2 3 4 5 6 
## 2 3 3 1 1 
## 2 least connected regions:
## 3 8 with 2 links
## 1 most connected region:
## 6 with 6 links

str(polnb)

## List of 10
##  $ : int [1:3] 8 9 10
##  $ : int [1:4] 3 4 6 10
##  $ : int [1:2] 2 4
##  $ : int [1:4] 2 3 5 6
##  $ : int [1:3] 4 6 7
##  $ : int [1:6] 2 4 5 7 9 10
##  $ : int [1:3] 5 6 9
##  $ : int [1:2] 1 9
##  $ : int [1:5] 1 6 7 8 10
##  $ : int [1:4] 1 2 6 9
##  - attr(*, "class")= chr "nb"
##  - attr(*, "region.id")= int [1:10] 1 2 3 4 5 6 7 8 9 10
##  - attr(*, "call")= language poly2nb(pl = vegmap, row.names = 1:length(vegmap))
##  - attr(*, "type")= chr "queen"
##  - attr(*, "sym")= logi TRUE

Another possibility is to define a desired number of neighbours (k) for each geometry. The k nearest neighbours will then be selected. When working with polygons, the distances between centroids are commonly used. This method can of course lead to asymmetric neighbourhoods.

knb <- knn2nb(knearneigh(coordinates(vegmap), k = 2))
is.symmetric.nb(knb)

## [1] FALSE

knb2 <- make.sym.nb(knb)
is.symmetric.nb(knb2)

## [1] TRUE

par(mfrow = c(1, 2))
plot(vegmap, lwd = 2)
plot(knb, coordinates(vegmap), pch = 16, col = "blue", arrows = TRUE, add = TRUE)
plot(vegmap, lwd = 2)
plot(knb2, coordinates(vegmap), pch = 16, col = "blue", arrows = TRUE, add = TRUE)

plot of chunk spdep2

If we set k = 1, we can easily compute the minimum distance needed to have at least one neighbour. This is done with the nbdists function which returns the euclidean distances along the links in a list of the same form as the neighbours list.

knb3 <- knn2nb(knearneigh(coordinates(vegmap), k = 1))
dsts <- unlist(nbdists(knb3, coordinates(vegmap)))
dist1nb <- max(dsts)

It is also possible to choose a specific distance (d) and to look for all the neighbours within this distance. Once again, when working with polygons, the distances between centroids are commonly used. Of course if the distance is too small, some geometries won't have any neighbour.

dnb1 <- dnearneigh(coordinates(vegmap), d1 = 0, d2 = 0.75 * dist1nb)
dnb2 <- dnearneigh(coordinates(vegmap), d1 = 0, d2 = 1 * dist1nb)
dnb3 <- dnearneigh(coordinates(vegmap), d1 = 0, d2 = 1.25 * dist1nb)

par(mfrow = c(1, 3))
plot(vegmap, lwd = 2)
plot(dnb1, coordinates(vegmap), pch = 16, col = "blue", add = TRUE)
plot(vegmap, lwd = 2)
plot(dnb2, coordinates(vegmap), pch = 16, col = "blue", add = TRUE)
plot(vegmap, lwd = 2)
plot(dnb3, coordinates(vegmap), pch = 16, col = "blue", add = TRUE)

plot of chunk spdep4

If you want to analyse your data with a spatial model using another package/software, you will sometimes need to export the neighbours list object in a usable format. The simplest one is a simple neighbourhood matrix.

# Export a neighbourhood matrix
nb2mat(polnb, style = "B")

##    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## 1     0    0    0    0    0    0    0    1    1     1
## 2     0    0    1    1    0    1    0    0    0     1
## 3     0    1    0    1    0    0    0    0    0     0
## 4     0    1    1    0    1    1    0    0    0     0
## 5     0    0    0    1    0    1    1    0    0     0
## 6     0    1    0    1    1    0    1    0    1     1
## 7     0    0    0    0    1    1    0    0    1     0
## 8     1    0    0    0    0    0    0    0    1     0
## 9     1    0    0    0    0    1    1    1    0     1
## 10    1    1    0    0    0    1    0    0    1     0
## attr(,"call")
## nb2mat(neighbours = polnb, style = "B")

# Export a neighbourhood to WinBUGS format
nbwin <- nb2WB(polnb)
str(nbwin)

## List of 3
##  $ adj    : int [1:36] 8 9 10 3 4 6 10 2 4 2 ...
##  $ weights: num [1:36] 1 1 1 1 1 1 1 1 1 1 ...
##  $ num    : int [1:10] 3 4 2 4 3 6 3 2 5 4

# Export a neighbourhood to INLA format
nb2INLA(file = "nbinla.txt", polnb)
# Export a neighbourhood to a SpatialLinesDataFrame object
spLines <- nb2lines(polnb, coords = coordinates(vegmap))