Working with Spatial Data in R

by Mark R Payne
DTU-Aqua, Charlottenlund, Denmark
http://www.staff.dtu.dk/mpay

Sun Apr 13 21:43:43 2014

A quick overview of packages and tools in R that are useful for working with spatial, oceanographic and climate data.

Maps, Mapdata

The maps package is the classic mapping package in R - it simply provides a set of polygons that can be combined with the base graphics, to do what you want. It’s as simple as:

library(maps)
map()

plot of chunk unnamed-2

That gives us a world map. We can add zoom in on a region either by name

map(regions="Sweden")

plot of chunk unnamed-3

or by coordinates

map(xlim=c(5,15),ylim=c(53,58),fill=TRUE,col="grey")

plot of chunk unnamed-4

But notice that its a bit chunky. Use the mapdata package to get higher resolution coastlines

library(mapdata)
map("worldHires",xlim=c(5,15),ylim=c(53,58),fill=TRUE,col="grey")

plot of chunk unnamed-5

Add prettiness

map("worldHires",xlim=c(5,15),ylim=c(53,58),fill=TRUE,col="grey")
map.axes()
map.scale()

plot of chunk unnamed-6

The map function plays nicely with base graphics. You can either add to an existing map

timmervegas <- data.frame(y=-c(44+24/60),x=c(171+15/60))
map("nz") #Yes, there *is* a special high-res map of NZ in "mapdata"
points(timmervegas$x,timmervegas$y,pch=16,col="red")
text(timmervegas$x,timmervegas$y,"Timaru",pos=4,xpd=NA)

plot of chunk unnamed-7

or add a map to an existing plot

grd <- expand.grid(x=0:10,y=seq(53,54,by=0.25))
plot(grd$x,grd$y,xlab="",ylab="",pch=3)
map("worldHires",add=TRUE,fill=TRUE)

plot of chunk unnamed-8

Note the rather ugly aspect ratio here. One of the advantages of using map() directly is that it takes care of the aspect ratio automatically via the asp argument - RTFM for more. Setting up a base map with map() can often be a good idea first, even if you overwrite it later.

map("worldHires",xlim=range(pretty(grd$x)),
                ylim=range(pretty(grd$y)))
map.axes()
points(grd$x,grd$y,pch=3)
map("worldHires",add=TRUE,fill=TRUE)

plot of chunk unnamed-9

Sp

The heart and soul of most spatial endevours in R is the “sp” package. There isn’t so much functionality here - what is useful is the set of classes that it provides, which are best used to store data in various forms, and to process it using the more standard set of R tools. An entire book has been written about this package: http://www.asdar-book.org/ We have access to electronic versions of the book directly via findit.dtu.dk. Generally, I think this package is excellent, but it is perhaps a little clumsy at times - some things are needlessly complex.

I would point you at the book if you’re going to get into this seriously but here are the basics. There are four types of objects that you need to know about - SpatialPoints, SpatialLines, SpatialPolygons, SpatialPixels and SpatialGrids. Ok, so that’s five. Anyway. Each of these then has a *DataFrame version eg. SpatialPixelsDataFrame etc.. which can be thought of as a way to bind metadata to an object, much like you would with a normal data.frame, where each row corresponds to a list of variables. I’ll give a quick example of what can be done here, using the “meuse” dataset that is built into sp.

First, load the package

library(sp) 

Then setup the object

data(meuse)
class(meuse)
## [1] "data.frame"
head(meuse)
##        x      y cadmium copper lead zinc  elev       dist   om ffreq soil lime landuse dist.m
## 1 181072 333611    11.7     85  299 1022 7.909 0.00135803 13.6     1    1    1      Ah     50
## 2 181025 333558     8.6     81  277 1141 6.983 0.01222430 14.0     1    1    1      Ah     30
## 3 181165 333537     6.5     68  199  640 7.800 0.10302900 13.0     1    1    1      Ah    150
## 4 181298 333484     2.6     81  116  257 7.655 0.19009400  8.0     1    2    0      Ga    270
## 5 181307 333330     2.8     48  117  269 7.480 0.27709000  8.7     1    2    0      Ah    380
## 6 181390 333260     3.0     61  137  281 7.791 0.36406700  7.8     1    2    0      Ga    470

Just a normal data frame. However, x and y are the coordinates in UTM, so lets tell sp that

coordinates(meuse) <- ~ x + y
class(meuse)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
meuse[1:6,]
## class       : SpatialPointsDataFrame 
## features    : 6 
## extent      : 181025, 181390, 333260, 333611  (xmin, xmax, ymin, ymax)
## coord. ref. : NA 
## variables   : 12
## names       : cadmium, copper, lead, zinc,  elev,       dist,   om, ffreq, soil, lime, landuse, dist.m 
## min values  :    11.7,     48,  116, 1022, 6.983, 0.00135803, 13.0,     1,    1,    0,      Ah,    150 
## max values  :     8.6,     85,  299,  640, 7.909, 0.36406700,  8.7,     1,    2,    1,      Ga,     50

Suddenly, its a SpatialPointsDataFrame, with a whole lot of built in functionality. Lets look at the spatial distribution

plot(meuse)

plot of chunk unnamed-13

Cool. What about plotting the values though? Spplot is the way

spplot(meuse)
## Warning in stack.data.frame(df[zcol]): non-vector columns will be ignored

plot of chunk unnamed-14

That plots them all. Lets focus on one or two

spplot(meuse,1)

plot of chunk unnamed-15

There are heaps of options here. One of the most useful is to use the built-in colour scheme, and to add a continuous colorkey

sp.theme(TRUE)
spplot(meuse,1,colorkey=TRUE)

plot of chunk unnamed-16

spplot is based on lattice graphics. This is both an advantage and a limitation, and generally increases complexity. However, it is useful to be able to do make panels like this

spplot(meuse,1:2,colorkey=TRUE)

plot of chunk unnamed-17

I’m not going to go through all of the options in spplot, because there are a lot. I suggest you check out this webpage when you start to use spplot in anger: http://rspatial.r-forge.r-project.org/gallery/ The book referenced above is also good. ## Raster Raster is more or less the companion to sp. Whereas sp provides classes for working with all types of spatial data, including lines and polygons as well as gridded data, raster is only for gridded data. However, the two work very well together, and there is lots of functionality to flip and flop back and forth between the classes: raster is in fact built on top of sp, by many of the same people. RTFM for more.

One of the main strengths of raster is that it is designed to work with files on disk, and does not require that you read them into memory - indeed, you can work easily with files that are larger than the memory that you actutally have available in your machine. This is a fantastic capability when dealing with oceanographic data.

First a little demo. Lets fire it up.

library(raster)

We can create a raster directly from a netcdf (provided you have one of the two netcdf packages installed. Use “netcdf4” in all cases - “netcdf” is deprecated). The first example is a global elevation model, ETOPO1, that I have previously reduced in resolution from the native 1 minute version to 15 minutes (0.25 degree)

bath <- raster("data//ETOPO15_mean_bath.nc")

That was easy. Lets check it out some more

bath
## class       : RasterLayer 
## dimensions  : 720, 1440, 1036800  (nrow, ncol, ncell)
## resolution  : 0.25, 0.25  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## data source : /home/mpayne/dtu/Lab/20140416_Spatial_tools_demo/data/ETOPO15_mean_bath.nc 
## names       : mean_bath 
## zvar        : mean_bath
summary(bath)
## Warning in .local(object, ...): summary is an estimate based on a sample of 1e+05 cells (9.65% of all cells)
##         mean_bath
## Min.    -8893.134
## 1st Qu. -4284.633
## Median  -2507.196
## 3rd Qu.   216.280
## Max.     5647.529
## NA's        0.000

Note that the summary hasn’t read the entire thing, but has instead just made a random sample, to avoid having to read iy all into RAM. Cool. Lets plot it

plot(bath)

plot of chunk unnamed-21

The raster plotting function seems quite useful, but I havne’t really got into it much.

What’s cool in raster is some of the functions. For example, lets regrid our 0.25 bathymetry to 1 degree

bath.1deg <- aggregate(bath,fact=4,fun=mean)
bath.1deg
## class       : RasterLayer 
## dimensions  : 180, 360, 64800  (nrow, ncol, ncell)
## resolution  : 1, 1  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## data source : in memory
## names       : mean_bath 
## values      : -7378.206, 5408.73  (min, max)

Notice the change in size and resolution. The fun argument can be used to do all sorts of interesting things. For example, calculate the sd within a 1x1 degree pixel, which is a measure of the roughness and variability of the surface in that region:

bath.sd <- aggregate(bath,fact=4,fun=sd)
plot(bath.sd)

plot of chunk unnamed-23

Unsurprisingly, the shelf edges are the most variable region.

You can also, for example, increase the resolution (but remember of course that you’re never going to recreate information that has been lost).

bath.hires <- disaggregate(bath,fact=2,method="bilinear")
bath.hires
## class       : RasterLayer 
## dimensions  : 1440, 2880, 4147200  (nrow, ncol, ncell)
## resolution  : 0.125, 0.125  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## data source : in memory
## names       : mean_bath 
## values      : -9272.051, 5949.967  (min, max)

Raster can also quickly and easier write its results out to NetCDF, which is a fantastic way to work.

writeRaster(bath.sd,filename="Surface_roughness.nc",format="CDF",overwrite=TRUE)
## class       : RasterLayer 
## dimensions  : 180, 360, 64800  (nrow, ncol, ncell)
## resolution  : 1, 1  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## data source : /home/mpayne/dtu/Lab/20140416_Spatial_tools_demo/Surface_roughness.nc 
## names       : mean_bath 
## zvar        : mean_bath

The raster packaage has three types of objects - rasters, bricks and stacks. The examples above are based on rasters, and I don’t use stacks that much, but bricks are useful when dealing with multi-layered files. e.g.

b.all <- brick("data/ICEAC_OSTIA_2013.nc",varname="analysed_sst")

So, lets check it out

b.all
## class       : RasterBrick 
## dimensions  : 121, 301, 36421, 61  (nrow, ncol, ncell, nlayers)
## resolution  : 0.05, 0.05  (x, y)
## extent      : -20.05, -5, 61.95, 68  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## data source : /home/mpayne/dtu/Lab/20140416_Spatial_tools_demo/data/ICEAC_OSTIA_2013.nc 
## names       : X1025524800, X1025611200, X1025697600, X1025784000, X1025870400, X1025956800, X1026043200, X1026129600, X1026216000, X1026302400, X1026388800, X1026475200, X1026561600, X1026648000, X1026734400, ... 
## z-value     : 1025524800, 1030708800 (min, max)
## varname     : analysed_sst

So, 61 layers, corresponding to the individual dates. Lets get rid of a few of these to make it easier to deal with. Extraction from a brick is done using the [[]] notation that you know from lists.

b <- b.all[[c(1,61)]]
b
## class       : RasterStack 
## dimensions  : 121, 301, 36421, 2  (nrow, ncol, ncell, nlayers)
## resolution  : 0.05, 0.05  (x, y)
## extent      : -20.05, -5, 61.95, 68  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## names       : X1025524800, X1030708800

Plotting then works well

plot(b)

plot of chunk unnamed-29

You can also start to think about working with these layers e.g. calculate differences

temp.diff <- b[[1]]-b[[2]]
plot(temp.diff,col=colorRampPalette(c("red","white","blue"))(100))

plot of chunk unnamed-30

or creating an averaged temperature

temp.mean <- calc(b,fun=mean)
plot(temp.mean)

plot of chunk unnamed-31

and blah, blah, blah…

RTFM. In particular, the “raster-package” help is a great overview

?"raster-package"

but the vignettes and the help directory are also really good. See here

help(package="raster")

for more. ## Polygon funkiness The three packages above form the basis. There are lots of other packages functions and tricks that pop up along the way that are also useful, but I won’t go into in full detail here. Instead I’ll just show you a few of the analysises that are useful at times.

One of the more common things I do is using the maptools package to create spatial objects, and the running with them. Lets start with a map of NZ

nz.map <- map("nz",fill=TRUE,plot=FALSE)
library(maptools)
nz.sp <- map2SpatialPolygons(nz.map,IDs=nz.map$names,
                             proj4string=CRS("+proj=longlat +ellps=WGS84"))
nz.sp
## class       : SpatialPolygons 
## features    : 22 
## extent      : 166.3961, 178.5629, -47.40573, -34.39895  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +ellps=WGS84

Now lets convert it to UTM coordinates - this requires the rgdal package, which can be a bit tricky to install

library(rgdal)
nz.utm <- spTransform(nz.sp,
                CRS("+proj=utm +zone=59 +datum=WGS84 +units=km +ellps=WGS84 +towgs84=0,0,0"))

Now, lets expand the coastline, by, say, 100km. Use the rgeos package

library(rgeos)
nz.expand.utm <- gBuffer(nz.utm,width=100)

Shift back to long-lat and plot

nz.coastal <- spTransform(nz.expand.utm,CRS("+proj=longlat +ellps=WGS84"))
plot(nz.coastal,border="red")
plot(nz.sp,add=TRUE)

plot of chunk unnamed-37

We can now create a sampling grid within the 100km coastal region, but not on land. The spSample() is useful here, as is overlay. First, create a systematic sampling grid within the coastal bounds

pts.all <- spsample(nz.coastal,type="regular")
plot(pts.all,pch=".")

plot of chunk unnamed-38

Then test for points that are onland using the over() function (very handy that!)

onland <- over(pts.all,nz.sp)
pts.wet <- pts.all[is.na(onland),]
plot(nz.coastal,border="red")
plot(nz.sp,add=TRUE)
plot(pts.wet,add=TRUE,pch=".",col="blue")

plot of chunk unnamed-39

And viola! ## Putting it all together… and more…. For my final trick, lets make a time series of temperatures on the eastern half Icelandic of the Icelandic shelf, here defined as the 200m isobath. We have all the elements, so lets roll

# Setup temperature brick
temp.b <- brick("data//ICEAC_OSTIA_2013.nc",varname="analysed_sst")
# Setup bathymetry of the region
bath <- raster("data/ETOPO15_mean_bath.nc")
bath.ice <- crop(bath,extent(-30,0,60,70))  #Crop to the Iceland region
plot(bath.ice)

plot of chunk unnamed-40

#Create a mask
ice.mask <- bath.ice > -200
ice.mask[coordinates(ice.mask)[,1]>-10] <- FALSE  #Removes Faroese plateau
plot(ice.mask)

plot of chunk unnamed-40

#To apply the mask, we first need to change the resolution
mask.hires <- disaggregate(ice.mask,fact=5)
mask.hires <- crop(mask.hires,temp.b)
mask.hires[mask.hires] <- NA   
#Now apply the mask
masked.temp <- mask(temp.b,mask.hires,inverse=TRUE)
plot(masked.temp[[1:2]])

plot of chunk unnamed-40

#Now calcualte the mean temperatures
library(reshape)
temp.ts <- cellStats(masked.temp,stat=mean)
temp.ts <- melt(temp.ts)
temp.ts$time <- gsub("X","",rownames(temp.ts))
temp.ts$POSIX <- as.numeric(temp.ts$time) + ISOdatetime(1981,1,1,0,0,0)
temp.ts$celsius <- temp.ts$value - 273.15
plot(celsius ~ POSIX,temp.ts,type="b",
     xlab="Date",ylab="Temperature (deg C)")

plot of chunk unnamed-40

And that’s about it really.

This work by Mark R Payne is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. For details, see http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US Basically, this means that you are free to “share” and “remix” for non-commerical purposes as you see fit, so long as you “attribute” me for my contribution. Derivatives can be distributed under the same or similar license.
This work comes with ABSOLUTELY NO WARRANTY or support.
This work should also be considered as BEER-WARE. For details, see http://en.wikipedia.org/wiki/Beerware