This lab is designed to give you a brief introduction to using spatial data in R. For students with no prior experience of R, the first section provides a very brief introduction, covering reading and writing files and accessing variables. The main part of the lab covers the creation, import and export of the main spatial data types, along with some examples of producing maps. One example introduces basic geostatistical analysis using the gstat package. You can use R on its own, but for this course we will use R in combination with the RStudio interface (also freeware), which has an organized layout and several extra options. As you work through the lab, you should read the help documents for the functions you use and try different parameters or variables.
In this and subsequent labs, code that can be entered into R will be highlighted, e.g.:
plot(x,y)
And R output will be formatted with ## at the start of the line. File names will be given in italics and will be available through the CMES server.
To get access to the data files, connect to my data directory on the CMES server. To do this under Mac OS X, open the ‘data’ shortcut on your desktop, then go to the Finder menu at the top of the screen. Choose [Go] > [Go to Folder]. When the dialog box asks which folder to go to, type my University ID ‘u0784726’, and this should open my data directory. Copy the file ‘Spatial R Lab.zip’, which contains this document and the data files needed for the lab, to your ‘home’ directory. Unzip the file to create a directory called ‘Spatial R Lab’.
This section is designed for students with no prior experience of R. Others can skip ahead to the section “Spatial Objects”.
The RStudio interface consists of several windows. Start RStudio from the ‘Start’ menu under Windows, and the following window should appear:
As you will be typing a lot of commands in R, sometimes with quite complex syntax, it is highly recommended that you use scripts to record successful commands as you progress (you can copy and paste from the console to the script editor). This allows you to go back and easily re-run or modify them as you go along, or to transfer your analysis to another computer or another user. Note that just typing a command in the editor window is not enough; it has to be sent to the console before R executes it. To run a line from the script window (or the whole script), copy and paste it to the console, or highlight it and press ‘Cmd-Enter’ (Mac) or ‘Ctrl-Enter’ (Windows).
If you are using base R, you will only see the console and the plot window. Other windows (e.g. help) can be opened from the menu or during a working session.
R can use many different file types, but csv files are recommended as the easiest way to transfer data between R and Excel. Start by downloading the file iris.csv to your computer, then change R’s working directory to where this file is located (a setwd() sketch is shown after the file listing below). Then get a list of files as follows (note the use of the pattern parameter to select only certain files):
list.files("./", pattern=".csv")
## [1] "co2_mm_mlo.csv" "iris.csv" "test.csv" "WNAclimate.csv"
Now read in the file:
iris <- read.csv("iris.csv")
And check that the variable iris has been created in the R workspace:
ls()
## [1] "iris"
R stores the data it reads in as a data frame. To print out the contents of any variable, simply type its name at the command prompt. Other useful commands are class(), to see what class a variable is, and names(), to get a list of the column headers. The function str() reports the structure of the data set, describing the column names and the type of data stored in each.
iris
class(iris)
names(iris)
str(iris)
In order to access the values in a dataframe or other R object, there are two main methods: column notation and indexing. Column notation uses the dollar sign ($) after the name of the data frame, with the column name appended (note that R replaces any white space in column names with ‘.’). For example:
iris$Species
iris$Sepal.Length
Indexing uses a coordinate system to access variables. The indices are given in brackets after the name of the object, and take as many indexes as there are dimensions of the data object. So for a 1D vector we require one, for a 2D matrix or data frame we require two, etc. For matrices and data frames, the indices are [row,column]. If either of these is left blank, then all rows (or columns) are selected:
iris[ ,1:4] # Columns 1 to 4
iris[1:10, ] # First 10 rows
iris[1:20,1:2] # Indices can be combined
iris$Sepal.Length[1:25] # First 25 values of a single column
iris$Sepal.Length[3] # 3rd element
iris$Sepal.Length[-3] # All but 3rd element
Variables created in R may also be written out to csv files using the write.csv() function. As an example, we create a variable that is the ratio of petal length to petal width, and then write this to a csv file:
plpw = iris$Petal.Length / iris$Petal.Width
write.csv(plpw, "test.csv")
This will write a csv file called test.csv in your working directory, containing the values in the vector plpw created from the original data frame. Note that R uses the variable name as a column header and adds a column with line numbers. You can remove the line numbers by adding the parameter row.names=FALSE to the write.csv() function.
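For example, to write the same vector without the row numbers:
write.csv(plpw, "test.csv", row.names=FALSE)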
We will now create some new vectors in R containing the sepal lengths, sepal widths and species, in order to use these with functions. We do this here simply to cut down on the amount of typing you will have to do; you could equally use the data frame directly with the functions by using the indexing methods described above.
sl = iris$Sepal.Length
sw = iris$Sepal.Width
sp = iris$Species
R has a large number of inbuilt functions. This section is designed simply to introduce you to some basic functions for describing data. To find out more about the functions, and how you can change parameters, use the help() function or the ? operator, e.g.:
help(mean)
?mean
This will open a window with the help file for that function. If you do not know the name of a function, there is a search function help.search(). Alternatively, use Google or RDocumentation to search more broadly.
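For example, a quick keyword search of the installed help pages (the search term here is arbitrary):
help.search("correlation")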
mean(sl)
median(sl)
sd(sl) ## Standard deviation
var(sl) ## Variance
min(sl) ## Minimum
max(sl) ## Maximum
range(sl) ## Min and Max
quantile(sl) ## Quantiles (0, 25, 50, 75 and 100% by default)
Note that quantile() takes a parameter that allows you to choose the quantiles to be calculated, e.g. quantile(sl, c(0.1,0.9)) will calculate the 10th and 90th percentiles. I highly recommend that you read the help pages for these functions to understand what they do and how they can be modified.
Standard functions include calculation of the covariance and correlation:
cov(sl,sw)
cor(sl,sw)
The correlation function gives Pearson’s coefficient by default, but this can be changed by setting the method parameter (an example follows the cor.test() output below). The correlations obtained can be tested against a null hypothesis of no correlation using the cor.test() function.
cor.test(sl,sw)
##
## Pearson's product-moment correlation
##
## data: sl and sw
## t = -1.4403, df = 148, p-value = 0.1519
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.27269325 0.04351158
## sample estimates:
## cor
## -0.1175698
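As a sketch of the method parameter mentioned above, Spearman’s rank correlation can be obtained with:
cor(sl, sw, method="spearman")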
levels() and table() are very useful for summarizing categorical data (table() can also be used to make cross tables). The summary() function is one of the most important as it is used frequently in R to print output from analysis as well as summary statistics.
levels(sp)
table(sp)
sum(sl)
summary(sl)
Histograms are commonly used to summarize data: values are ‘binned’ into a set of classes, and the histogram represents the frequency of occurrences in each bin. The function hist() in R produces a histogram. Bins are defined with the breaks parameter, which may be set to a single number, in which case the data range is split into that many bins, or to a sequence of numbers defining the breakpoints between bins. In the latter case, we can make use of the seq() function, which generates a regular sequence of values.
hist(sl, breaks=20, main="Histogram of sepal lengths")
hist(sl, breaks=seq(0,10,0.5),
main="Histogram of sepal lengths")
An alternative to histograms is the boxplot, which shows information about the data quartiles. Here the box represents the interquartile range (25-75% of the data), the thick bar is the median, and the ‘whiskers’ show the data range.
boxplot(sl)
To make this more useful, we can produce a boxplot for each of the three species of Iris. We do this using the model formula in R. This is denoted by using the tilde (~), which defines a relationship between the dependent variable (on the left of the tilde) and the independent variable (on the right). Here we want to plot sepal lengths dependent on the species:
boxplot(sl ~ sp)
Bivariate plots are designed to show the relationship between two variables, and how this may vary. The simplest form is the scatter plot. We use the plot() function again, but now give it two variables (x and y). We can use this to visualize the relationship between sepal length and sepal width:
plot(sl,sw)
The plot() function has a large number of parameters to help modify the figure. Here, we will simply add two parameters to change the symbols: pch, which defines the plotting symbol, and col, which defines the symbol color. We also add an x and y label:
plot(sl, sw, pch=16, col=2,
     xlab='Sepal Length', ylab='Sepal Width')
Rather than using the same color for all points, we can use a vector defining a color for each individual point. Here, we use the vector sp to color each point according to its species, which shows a clear separation of morphology for the three species:
plot(sl, sw, pch=16, col=sp,
     xlab='Sepal Length', ylab='Sepal Width')
Line plots are a good way to visualize data that form a natural series, e.g. a time series. We will illustrate this in R using the Mauna Loa monthly CO2 data in the file co2_mm_mlo.csv. Get this file and read it into R using read.csv(), then use the names() function to see the column names (a sketch of this step follows). The column ‘decdate’ contains decimal dates (i.e. a decimal value between two years) and can be used as the x variable; we’ll start by plotting the interpolated CO2 values against these dates.
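A minimal sketch of the read step, assuming the file is in your working directory and storing the result in a variable called co2 (the name used by the plotting commands below):
co2 <- read.csv("co2_mm_mlo.csv")
names(co2)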
plot(co2$decdate, co2$interpolated, type='l')
As before, we can use the ‘xlab’ and ‘ylab’ parameters to improve the axis labels, and we can change the color (‘col’) and thickness (‘lwd’) of the line:
plot(co2$decdate, co2$interpolated, type='l',
xlab="Date", ylab="ppm", main="Mauna Loa CO2 record",
lwd=2, col="red")
We can add a second series to the plot using the lines() function. Here we’ll add the trend line from the file to the current line:
lines(co2$decdate, co2$trend, lwd=2, col=3)
A similar function (points()) can be used to add points to an existing plot.
The package sp defines a set of spatial objects for R, including point, line and polygon vectors and gridded/pixel raster data.
The easiest way to get spatial data into R is to export it from a GIS package in a standard vector or raster format. Shapefiles are one of the most commonly used formats for vector data, and the maptools package includes a function (readShapeSpatial()) for reading these directly into R as spatial objects.
The zip file UT_WI2.zip contains a shapefile with water isotope data recorded at a set of locations in Utah. In order to read this into R, we first load the required packages, then read in the shapefile data:
library(maptools)
## Loading required package: sp
## Checking rgeos availability: TRUE
ut_wi = readShapeSpatial("./UT_WI2/ut_wi2.shp")
class(ut_wi)
str(ut_wi, max.level=2)
The data is read into a SpatialPointsDataFrame, and the str() command shows that this contains a slot called ‘data’, which is the data frame (the attribute table), and other slots with information about the coordinates and projection. There are some standard functions to obtain basic data and metadata about any Spatial* object:
bbox(ut_wi)
coordinates(ut_wi)
proj4string(ut_wi)
The last of these simply returns a missing value ‘NA’ for the projection. As we know this data is in longitude/latitude coordinates based on the WGS84 ellipsoid, we can add this information as a coordinate reference system with the CRS() function. Adding this information will allow you to reproject the data to match other datasets (an example is provided later):
proj4string(ut_wi)
## [1] NA
proj4string(ut_wi) <- CRS("+proj=longlat +ellps=WGS84")
proj4string(ut_wi)
## [1] "+proj=longlat +ellps=WGS84"
We can access the data in the attribute table in the same way that we use a data frame. To find the names of the variables in this table type:
names(ut_wi)
For example, to get the d18O values, type:
ut_wi$d18O
And we can calculate the usual statistics or make descriptive plots:
summary(ut_wi$d18O)
hist(ut_wi$d18O, breaks=20)
Using the plot() function provides a simple location plot of the data:
plot(ut_wi)
R’s conditional selection can be used to extract subsets of the data object. For example, the name of the group that provided the data is held in the column ‘WI_Analy_2’. We can use this to create a new spatial point object containing only the samples analyzed by the SPATIAL group, then make a plot showing where these locations are relative to the full dataset:
ut_wi_spatial = ut_wi[ut_wi$WI_Analy_2 == "SPATIAL", ]
plot(ut_wi, axes=TRUE)
points(ut_wi_spatial, col="red")
The sp package has a standard spatial plotting function spplot(), which can be used to make basic shaded plots. Note that the syntax for selecting which variable to plot is a little different here. This is because we want to use the full spatial object coordinates and select out a single variable:
spplot(ut_wi["d18O"], edge.col="black")
Or a categorical variable
spplot(ut_wi["WI_Analy_2"], edge.col="black")
This is only generally useful for making quick visualizations - we’ll look at alternative ways to plot spatial objects later.
Polygons can be created from scratch using a combination of the Polygon(), Polygons() and SpatialPolygons() functions (a minimal hand-built example is given at the end of this subsection), but the easiest way to get polygon data into R is, again, to export it from a GIS package as a shapefile. The zip file utahcounty.zip contains a shapefile with county boundaries for Utah as polygons. Unzip this in your current working directory, then read in the data as follows:
ut_cnty <- readShapeSpatial("utahcounty/utahcounty.shp")
proj4string(ut_cnty) <- CRS("+proj=longlat +ellps=WGS84")
class(ut_cnty)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
This creates a SpatialPolygonsDataFrame: a polygon object with spatial attributes and an associated data frame. We can see the names of the variables in the attribute table as follows, and extract the list of county names:
names(ut_cnty)
ut_cnty$NAME
We can again make a simple plot, this time of the county boundaries. We can add the locations of the isotope samples by using the add=TRUE parameter:
plot(ut_cnty)
plot(ut_wi, col=2, pch=16, add=TRUE)
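Before moving on, here is the minimal hand-built polygon example mentioned at the start of this subsection (a unit square, purely illustrative):
library(sp)                                        # already loaded above via maptools
sq <- Polygon(cbind(c(0,0,1,1,0), c(0,1,1,0,0)))   # a single closed ring of coordinates
sq_polys <- Polygons(list(sq), ID="square1")       # one or more rings grouped under an ID
sq_sp <- SpatialPolygons(list(sq_polys))           # the SpatialPolygons object
plot(sq_sp)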
As with the other types of objects, SpatialGrid objects exist with and without associated data frames. The first type (SpatialGridDataFrame) is used for storing raster data (remotely sensed images, gridded climate fields, etc.). The second type is less useful, as it contains no data, only the grid topology.
The maptools package includes a function (readAsciiGrid()) for reading raster files in the ESRI ArcInfo format. The file predkrig2.asc contains a raster data set with interpolated values of winter season H isotope ratios of precipitation from IsoMAP. The file can be read in as follows:
win_d2h <- readAsciiGrid("predkrig2.asc")
proj4string(win_d2h) <- CRS("+proj=longlat +ellps=WGS84")
class(win_d2h)
str(win_d2h)
spplot(win_d2h)
Note that the data is given a variable name taken from the file name (predkrig2.asc). We can change this to something more meaningful:
names(win_d2h) <- "d2H"
The basic plotting function provided with the sp package, spplot(), provides a simple way to map data with a simple color scale (e.g. spplot(ut_wi["d18O"])). However, it is difficult to modify these plots. The following examples demonstrate alternative methods for mapping. These require two extra packages to be installed: classInt (provides methods for selecting data intervals for colors) and RColorBrewer (provides better color scales).
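If these packages are not already installed, they can be installed from CRAN (this only needs to be run once):
install.packages(c("classInt", "RColorBrewer"))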
This takes several steps, and it is easier to copy the code to a script and run it from there, rather than trying to type it all into the R console. The steps are as follows:
- Choose the number of classes (nclr) and the variable to be plotted (plotvar)
- Calculate the class intervals with classIntervals(); the style parameter controls how the breaks are chosen (e.g. ‘quantile’, ‘pretty’ or ‘sd’)
- Choose a color palette (plotclr), here using the ‘Yellow/Orange/Red’ color range - use the function display.brewer.all() for other options (see the ColorBrewer website for more examples)
- Assign a color to each observation (colcode) using findColours()
- Plot the data, then add a title and legend
For any of these examples, try changing: the number of classes, the color palette and the method of choosing intervals.
library(classInt)
library(RColorBrewer)
nclr <- 9
plotvar <- ut_wi$d18O
class <- classIntervals(plotvar, nclr, style = "quantile", dataPrecision = 2)
plotclr <- brewer.pal(nclr, "YlOrRd")
colcode <- findColours(class, plotclr, digits = 3)
plot(ut_wi, col = colcode, pch=16, axes = TRUE)
plot(ut_cnty, add=T)
title(main = "Utah d18O Data")
legend("topright", legend = names(attr(colcode,"table")),
fill = attr(colcode, "palette"), cex=0.8, ncol=1)
Now do the same for the county dataset, plotting the current population size. We first log10 transform the data to emphasize some of the variation among low population counties. Note that we can simply add a new vector to the existing Spatial*DataFrame:
ut_cnty$log10POP <- log10(ut_cnty$POP_CURRES)
nclr <- 9
plotvar <- ut_cnty$log10POP
class <- classIntervals(plotvar, nclr, style = "pretty")
plotclr <- brewer.pal(nclr, "Blues")
colcode <- findColours(class, plotclr, digits = 3)
plot(ut_cnty, col = colcode, border = "grey", axes = T)
title(main = "UT County Population")
legend("bottomleft", legend = names(attr(colcode,"table")),
fill = attr(colcode, "palette"), cex=0.8)
And with the gridded dataset:
nclr <- 9
plotvar <- win_d2h$d2H
class <- classIntervals(plotvar, nclr, style = "quantile", dataPrecision = 0)
plotclr <- rev(brewer.pal(nclr, "YlGn"))
colcode <- findColours(class, plotclr, digits = 3)
image(win_d2h, "d2H", col=plotclr, axes=FALSE, breaks=class$brks)
box()
plot(ut_cnty, add=TRUE)
title(main = "Winter d2H isotope ratio")
legend("topleft", legend = names(attr(colcode,"table")),
fill = attr(colcode, "palette"), cex=0.8)
R has several packages for spatial analysis, including standard kriging methods, random fields, spatial regression and point process analysis. The Spatial task view on the CRAN website lists recommended packages for each of these.
We will use one of these, gstat, to carry out a quick variogram analysis and interpolation of the isotope data. This package, developed by Edzer Pebesma, has functions for variogram calculation, visualization and modeling, as well as methods for various types of kriging (simple, ordinary, cokriging) and simulation.
We’ll use this here to analyze part of the Utah Water Isotope dataset loaded earlier. As this contains a large number of samples derived from different sources, we’ll cut out a subset consisting of the tap water isotopes recorded by the SPATIAL group as follows:
ut_wi_spatial = ut_wi[ut_wi$Type == "Tap" & ut_wi$WI_Analy_2 == "SPATIAL", ]
dim(ut_wi_spatial)
The output from the dim() function shows that we have 1023 samples left in this subset. However, some of these represent repeat samples from the same location. This can present a problem for kriging, as these samples will have a distance of zero, which causes matrix singularities. Here, we simply remove duplicate samples (a better method would be to average them; a possible sketch of that alternative is given after the next command).
ut_wi_spatial = remove.duplicates(ut_wi_spatial)
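For reference, a possible sketch of the averaging alternative mentioned above, using base R’s aggregate() to take the mean d18O at each unique location (the object ut_wi_avg is illustrative only; the rest of the lab continues with the de-duplicated ut_wi_spatial):
xy <- coordinates(ut_wi_spatial)
avg <- aggregate(ut_wi_spatial$d18O, by=list(lon=xy[,1], lat=xy[,2]), FUN=mean)
ut_wi_avg <- SpatialPointsDataFrame(cbind(avg$lon, avg$lat), data.frame(d18O=avg$x),
    proj4string=CRS("+proj=longlat +ellps=WGS84"))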
The file predgrid.asc contains a set of grid points from Northern Utah, where most of the samples are located. We will now import this to use it for predictions (note that we need to set the projection information):
grid = read.asciigrid("predgrid.asc")
proj4string(grid) <- CRS("+proj=longlat +ellps=WGS84")
Next, we need to load the gstat package, and use it to calculate and visualize the sample variogram of the d18O data in the subset, using the variogram() function. As we have previously set the projection parameters for this dataset to a geographical projection, this automatically uses great circle distances in the variogram calculations, so the units on the x-axis are in kilometers.
library(gstat)
d18O.var <- variogram(d18O ~ 1, ut_wi_spatial)
plot(d18O.var)
Note that we can also set the cutoff (the maximum range over which distances are calculated) and the width of each distance lag (see help(variogram) for more details). As the variogram appears to plateau at around 40 km, we might want to reduce the cutoff distance and the lag size:
d18O.var <- variogram(d18O ~ 1, ut_wi_spatial, cutoff=50, width=2.5)
plot(d18O.var)
Try different values to see the effect on the variogram.
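For instance (both values chosen arbitrarily):
d18O.var.alt <- variogram(d18O ~ 1, ut_wi_spatial, cutoff=100, width=5)
plot(d18O.var.alt)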
The next step is to fit a variogram model to capture this spatial structure. To start, we fit a model by eye. We need to specify: a) the model form; b) the value of the nugget (the intercept with the y-axis); c) the model range (the distance at which the sample variogram becomes flat); d) the sill, the semivariance value (y-axis) at the range. Here we specify these as separate variables, then use the vgm() function to build our model.
modSill = 0.8
modRange = 40
modNugget = 0.25
d18O.vgm1 = vgm(psill=modSill, "Sph", range=modRange, nugget=modNugget)
plot(d18O.var, d18O.vgm1)
The fit to the data is only approximate, so we now use an iterative weighted least squares method (fit.variogram()) to fit the model variogram to the sample variogram.
d18O.vgm2 = fit.variogram(d18O.var, d18O.vgm1)
plot(d18O.var, d18O.vgm2)
And we can output the parameters from the fitted model:
d18O.vgm2
## model psill range
## 1 Nug 0.2875631 0.00000
## 2 Sph 0.6471206 28.17532
We now use our fitted model to predict d18O values. The krige() function performs spatial prediction, using ordinary kriging as a default. This requires a set of inputs:
- A model formula specifying the variable to be predicted (this can be extended to include covariates; here we add ‘1’ as no covariates are used)
- The Spatial* object with the observed values
- A Spatial* object with the coordinates to be used for prediction
- The fitted variogram model
- Optional parameters that limit the number of points used in predicting any given location
d18O.pred.ok <- krige(d18O ~ 1, ut_wi_spatial, grid, d18O.vgm2,
nmin=4, nmax=40, maxdist=30)
## [using ordinary kriging]
Now we make a plot of the predictions. The output from the krige() function is a SpatialPixelsDataFrame, which has a slot called data, containing predictions in a variable called ‘var1.pred’ and prediction errors in ‘var1.var’.
nclr = 9
plotvar <- d18O.pred.ok@data[, "var1.pred"]
class <- classIntervals(plotvar, nclr, style = "quantile", dataPrecision = 2)
plotclr <- brewer.pal(nclr, "Blues")
colcode <- findColours(class, plotclr, digits = 3)
image(d18O.pred.ok, "var1.pred", col=plotclr, axes=T, breaks=class$brks)
title(main = "Utah Tap Water d18O (Ordinary Kriging)")
plot(ut_cnty, add=TRUE)
plot(ut_wi_spatial, add=TRUE)
legend("bottomleft", legend = names(attr(colcode, "table")),
fill = attr(colcode, "palette"), cex = 0.8, bg="white")
Plot the prediction errors:
nclr = 9
plotvar <- d18O.pred.ok@data[, "var1.var"]
class <- classIntervals(plotvar, nclr, style = "quantile", dataPrecision = 2)
plotclr <- brewer.pal(nclr, "YlOrRd")
colcode <- findColours(class, plotclr, digits = 3)
image(d18O.pred.ok, "var1.var", col=plotclr, axes=T, breaks=class$brks)
title(main = "Utah Tap Water d18O (Ordinary Kriging Error)")
plot(ut_cnty, add=T)
legend("bottomleft", legend = names(attr(colcode, "table")),
fill = attr(colcode, "palette"), cex = 0.8, bg="white")
To assess the performance of our kriging model, we use \(n\)-fold cross-validation. This splits the data into \(n\) subsets, then iteratively predicts each subset from the other \(n-1\) sets. The krige.cv() function performs the cross-validation: this takes the same arguments as the krige() function, but we leave out the object with coordinates for new predictions, and specify nfold, the number of subsets to be used.
d18O.cv.ok <- krige.cv(d18O ~ 1, ut_wi_spatial, d18O.vgm2, nmax=40, nfold=5)
Once this has run, we calculate two statistics: the root mean squared error of prediction (RMSEP) and the \(R^2_P\) of prediction:
## RMSEP
sqrt(mean(d18O.cv.ok$residual^2))
## [1] 0.8145687
##R2P
cor(d18O.cv.ok$observed, d18O.cv.ok$var1.pred)^2
## [1] 0.2800463
R has functions for reprojecting spatial data using the PROJ4 library, via functions from the rgdal package. Install this package now, then try the following code to reproject the isotope location data and Utah county polygons into an equal-area projection.
library(rgdal)
aea.proj <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5
+lon_0=-110 +ellps=GRS80 +datum=NAD83 +units=m"
ut_wi.proj <- spTransform(ut_wi, CRS(aea.proj))
ut_cnty.proj <- spTransform(ut_cnty, CRS(aea.proj))
Earlier, we read in the data from shapefiles (points and polygons) and assigned a geographical projection (longitude/latitude) to both objects using the CRS() function. In the code above, we first create a projection object that contains the parameters for the new projection as key/value pairs. For example, '+proj=aea' means set the map projection to Albers Equal Area. Other parameters include:
- +proj: the map projection to use
- +lat_1, +lat_2: the standard parallels
- +lon_0, +lat_0: the longitude and latitude of the projection origin
- +ellps=GRS80, +datum=NAD83: the ellipsoid and datum
- +units: the units of the projected coordinates (here meters)
Finally, we use the spTransform() function to do the reprojection. This requires a Spatial* object and our projection system parameters. We use the CRS() function to tell R that this is a coordinate reference system.
If you now compare the coordinates of the projected and unprojected objects, you will see the difference:
summary(ut_wi)
summary(ut_wi.proj)
par(mfrow=c(1,2))
plot(ut_cnty)
plot(ut_wi, add=TRUE)
plot(ut_cnty.proj)
plot(ut_wi.proj, add=TRUE)
There are many other projections available in the PROJ4 library; details can be found on the PROJ4 website.
R can write out KML files for use with Google Earth. A few things to note here:
- We use the kmlPoints() function from the maptools package (there are equivalent functions for KML polygons and lines)
- The function takes a name for the whole set of points (kmlname) and a vector of names for each point (name)
- The parameter description associates some descriptive text with each point, and this can include HTML tags. Here we create a series of descriptions that include the d18O and d2H values, together with text labels. Note the use of the paste() function to glue together different text strings (which can include numeric vectors)
description = paste("<b>d18O:</b>", ut_wi$d18O, "<br><b>d2H:</b>", ut_wi$d2H)
kmlPoints(ut_wi,"ut_wi.kml", kmlname="Utah Isotope Data",name=ut_wi$Sample_ID,
description=description)
You should now see a file ut_wi.kml in your working directory. This can be opened with Google Earth to display your data.
Read in the WNA Climate dataset from the csv file WNAclimate.csv, and use the names() function to see the columns in the dataframe:
wnaclim <- read.csv("WNAclimate.csv")
names(wnaclim)
## [1] "WNASEQ" "LONDD" "LATDD" "ELEV" "totsnopt" "annp"
## [7] "Jan_Tmp" "Jul_Tmp"
We now create a simple SpatialPoints object with the locations of the sites in the data frame, using the SpatialPoints() function. Note that we first create a coordinate reference system for the data (here, simple longitude/latitude on the WGS84 ellipsoid).
llCRS <- CRS("+proj=longlat +ellps=WGS84")
wnaclim.sp <- SpatialPoints(cbind(wnaclim$LONDD, wnaclim$LATDD), proj4string=llCRS)
summary(wnaclim.sp)
The original data frame also contains various other parameters (elevation, climate, site name) in columns that we can associate with the SpatialPoints by creating a SpatialPointsDataFrame.
wnaclim.spdf <- SpatialPointsDataFrame(cbind(wnaclim$LONDD, wnaclim$LATDD),
wnaclim, proj4string=llCRS)
summary(wnaclim.spdf)
Now that we have a spatial object with associated values, we can use the standard spatial plotting function spplot() to make some basic plots:
spplot(wnaclim.spdf["Jan_Tmp"])
spplot(wnaclim.spdf["Jul_Tmp"])
This is an optional section. The package ggmap is an extension of the ggplot2 library, which produces more advanced graphics in R. ggmap allows you to produce simple maps using the Google Maps API (and other APIs). Start by loading the library:
library(ggmap)
Now, to make a map of the Salt Lake City area (which includes the University of Utah), run the following commands:
map = get_map(location="Salt Lake City", zoom=10,
source="google", maptype="satellite")
ggmap(map)
In the first function, we used four options:
- location: this should be a location defined by 1) a search engine string (used here); 2) a longitude/latitude pair; or 3) a bounding box (see below)
- zoom: the level of zoom (0 = world). This may need some trial and error.
- source: an optional parameter that defines the image source. By default this is set to Google Maps; other options include Stamen maps (stamen) and OpenStreetMap (osm)
- maptype: the type of image. Options include ‘terrain’, ‘satellite’, ‘roadmap’, etc.
Try changing these to make your own maps of a different location. One neat thing to try is using Stamen maps as a source (stamen) and ‘watercolor’ as a map type, as in the sketch below.
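For example, a sketch of the Stamen watercolor suggestion (the location string and zoom level are just placeholders):
map2 = get_map(location="Salt Lake City", zoom=12,
    source="stamen", maptype="watercolor")
ggmap(map2)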
We can also plot data onto a map. The file WNAclimate.csv that we read in before contains climate information for a set of locations across western North America. We can use this directly to map January temperatures (‘Jan_Tmp’) as follows. We start by getting the limits of a bounding box that surrounds the data, then use get_map() to get the base map, which is stored as an object. We then add points using the geom_point() function. This is a ggplot2 function, and it plots in a slightly different way from the base R functions: it requires an ‘aesthetic’, which defines the x and y coordinates and the variable used for colors. We then add a color scale, and display the resulting map object.
xmin = min(wnaclim$LONDD)
xmax = max(wnaclim$LONDD)
ymin = min(wnaclim$LATDD)
ymax = max(wnaclim$LATDD)
map = get_map(location=c(xmin,ymin,xmax,ymax))
wnamap = ggmap(map, maptype="terrain")
wnamap = wnamap + geom_point(aes(x=LONDD,y=LATDD,color=Jan_Tmp),
size=3, data=wnaclim)
wnamap = wnamap + scale_color_gradient2(midpoint=0,
low="blue", mid="white", high="red" )
wnamap
Finally, we will replot the results of the kriging with gstat, using ggmap to add a background reference. This requires a number of steps. First, we create a data frame with the output from the kriging and the coordinates of each grid cell. We make a second data frame with the sample locations:
crds = coordinates(d18O.pred.ok)
d18O.df <- data.frame(lon=crds[,1], lat=crds[,2])
d18O.df$pred <- slot(d18O.pred.ok, "data")$var1.pred
crds = coordinates(ut_wi_spatial)
d18O.pnts <- data.frame(lon=crds[,1], lat=crds[,2])
d18O.pnts$d18O <- ut_wi_spatial$d18O
Second, we get a base map using the limits of the coordinates:
map.in <- get_map(location = c(min(crds[,1]),
min(crds[,2]),
max(crds[,1]),
max(crds[,2])),
source = "stamen", maptype="toner-lite")
Finally, we plot the results. There is no way to plot the output directly, but we can use a binning function to turn the output into a gridded surface. This uses the function stat_summary_2d(), here taking the median value in 0.025 x 0.025 degree bins (you can change this easily to get higher or lower resolution). We use an alpha of 0.5 for transparency, and a ColorBrewer palette (YlOrRd) for the colors:
x = ggmap(map.in)
x = x + stat_summary_2d(data=d18O.df, aes(x = lon, y = lat, z = pred),
fun = median,
binwidth = c(.025, .025),
alpha = 0.5)
x = x + scale_fill_gradientn(name = "Median",
colours = brewer.pal(5, "YlOrRd"),
space = "Lab")
x
Last, we add the locations of the samples that were used in the interpolation:
x = x + geom_point(data=d18O.pnts, aes(x = lon, y = lat))
x