Geostatistics

The data we will use are a common dataset in R for learning geostatistical analysis. They are a dataset of heavy metal concentrations along a section of the Meuse River. Flooding of the river over time has left deposits of heavy metals along riverside farmland. It would be helpful to map these concentrations out.

library(lattice)
library(gstat)
library(sp)

## Warning: package 'sp' was built under R version 3.0.2

load(system.file("data", "meuse.rda", package = "sp"))


# Find out about these data
names(meuse)

##  [1] "x"       "y"       "cadmium" "copper"  "lead"    "zinc"    "elev"   
##  [8] "dist"    "om"      "ffreq"   "soil"    "lime"    "landuse" "dist.m"

help(meuse)

# The key variable of interest will be zinc.  So plot a histogram
hist(meuse$zinc)

plot of chunk unnamed-chunk-1

hist(log(meuse$zinc))

plot of chunk unnamed-chunk-1

# Plot with ggplot
library(ggplot2)
library(scales)
# Create a categorical variable
meuse$zinc_cat <- cut(meuse$zinc, breaks = c(0, 200, 400, 800, 1200, 2000))
zinc.plot <- ggplot(aes(x = x, y = y), data = meuse)  # Initialize the plot, with data and axes.
zinc.plot <- zinc.plot + geom_point(aes(color = zinc_cat))  # Tell it to plot points, with a colour for zinc_cat
zinc.plot <- zinc.plot + coord_equal()  #Tell it the scaling of the x and y coordinates are equal
zinc.plot <- zinc.plot + scale_color_brewer(palette = "YlGnBu")
zinc.plot  # Plot!

plot of chunk unnamed-chunk-2

There is an obvious trend in the data. These data are in a bend in the Meuse River, and the zinc concentrations are higher where it floods regularly. We will eventually be kriging these data, which assumes that the data Gaussian. The zinc concentrations clearly aren't Gaussian, but the log transformed data are more Gaussian. So we will use a log Gaussian transformation here.

Let's look at a variogram of the data.

vgm <- variogram(log(zinc) ~ 1, locations = ~x + y, data = meuse)
vgm  # Take a look at the data that are returned by the variogram function

##     np    dist  gamma dir.hor dir.ver   id
## 1   57   79.29 0.1234       0       0 var1
## 2  299  163.97 0.2162       0       0 var1
## 3  419  267.36 0.3028       0       0 var1
## 4  457  372.74 0.4121       0       0 var1
## 5  547  478.48 0.4634       0       0 var1
## 6  533  585.34 0.5647       0       0 var1
## 7  574  693.15 0.5690       0       0 var1
## 8  564  796.18 0.6187       0       0 var1
## 9  589  903.15 0.6471       0       0 var1
## 10 543 1011.29 0.6916       0       0 var1
## 11 500 1117.86 0.7034       0       0 var1
## 12 477 1221.33 0.6039       0       0 var1
## 13 452 1329.16 0.6517       0       0 var1
## 14 457 1437.26 0.5665       0       0 var1
## 15 415 1543.20 0.5748       0       0 var1

plot(vgm)

plot of chunk unnamed-chunk-3

Note: One thing I do not like about the gstat is the use of “plot”. In R, “plot” is a regular graphics function, but here, it uses a different graphics package called “lattice”. There are some things you can do with regular plots you can not do with lattice plots, and vice versa. Even though this package calls is “plot,” it is actually a lattice function. Very confusing if you ask me. Later on, we'll use ggplot to plot it.

Now, make a plot of the variogram cloud. These are all the values \( 1/2 (z_i-z_j)^2 \). The empirical variogram is a smooth of these data.

meuse.sp <- meuse
coordinates(meuse.sp) <- ~x + y
# plot the variogram cloud
vgm.cloud <- variogram(log(zinc) ~ 1, meuse.sp, cloud = TRUE)
plot(vgm.cloud)

plot of chunk unnamed-chunk-4

The variogram cloud is a little useful for finding outliers. I honestly don't think there are any outliers in these data, but it doesn't hurt to see how we might do this in R. We can use the basic plot function, and its identify function to label points.

# Make an interactive plot to identify any outliers
# plot(vgm.cloud$dist,vgm.cloud$gamma)
# vgm.out<-identify(vgm.cloud$dist,vgm.cloud$gamma) vgm.cloud[vgm.out,] #
# not that each outlier is not a single observation, but a pair of
# observations.  Here are the points I identified... meuse[c(54,55,59),]

These all happen to be the highest zinc values. I don't think they are outliers in the sense that they don't belong there.

Anisotropy

If you look back at the map, you'll see that there is a general Southeast - Northwest trend to the data. This is a violation of the stationarity assumption, and we might want to account for spatial structure being different in different directions. We will fit a variogram in the four different directions: 0, 45, 90, 135, east of north. First, I will turn the meuse data.frame into a SpatialPointsDataFrame. This will simplify the commands slightly, because it takes care of specifying the coordinates for us.

meuse.sp <- meuse
coordinates(meuse.sp) <- ~x + y
vgm.aniso <- variogram(log(zinc) ~ 1, meuse.sp, alpha = c(0, 45, 90, 135))
plot(vgm.aniso)

plot of chunk unnamed-chunk-6


# Redo with ggplot
zinc.aniso.plot <- ggplot(aes(x = dist, y = gamma), data = vgm.aniso)  #Initialize
zinc.aniso.plot <- zinc.aniso.plot + geom_point()  # Tell it to plot points
zinc.aniso.plot <- zinc.aniso.plot + facet_wrap(~dir.hor)  # tell it to separate by dir.hor
zinc.aniso.plot

plot of chunk unnamed-chunk-6


# clean up the axes a bit.
zinc.aniso.plot <- zinc.aniso.plot + scale_x_continuous(limits = c(0, 1500), 
    expand = c(0, 0)) + scale_y_continuous(limits = c(0, 1.2), expand = c(0, 
    0))
zinc.aniso.plot

## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-6

Spatial correlation seems to be strongest in the 45 degree direction, and weakest in the 135 degress direction (things stay similar for longer if you move in the 45 degree direction). I will fit the model in the 45 degree direction.

#Variogram Model Fitting We can try to fit a variogram model to this. The tried and true method is to just eyeball the nugget, sill and range parameters from the empirical variogram. I have chosen a nugget = .05, partial sill = 0.42 and a range = 1100. The vgm function is used for specifying variogram models.

vgm.45 <- subset(vgm.aniso, vgm.aniso$dir.hor == 45)  # Subset the variogram data for just this direction
plot(vgm.45)

plot of chunk unnamed-chunk-7

# help(vgm)
eye.sph <- vgm(0.42, "Sph", 1100, 0.05)
plot(vgm.45, eye.sph)

plot of chunk unnamed-chunk-7

Not too bad. We might try an automatic regression procedure to fit it. This might be more objective.

fit.sph <- fit.variogram(vgm.45, vgm(0.42, "Sph", 1200, 0.05))
fit.sph

##   model   psill range
## 1   Nug 0.03753     0
## 2   Sph 0.42534  1088

plot(vgm.45, fit.sph)

plot of chunk unnamed-chunk-8

The parameters are very similar. We can try other models as well. I will fit spherical, exponential, matern, and gaussian models to the data. According to the online gstat manual, the effective ranges for these models are a, 3a, sqrt(3a), and 4a for a Matern(kappa=1) model.

# The next lines no longer work.  I wonder what happened to them...  I'll
# leave them in in case they come back in later versions of gstat vgm.show()
# # Show different variogram models vgm.show(kappa.range=c(.25,.5,1,2,10))
# #Show different Matern Models
fit.exp <- fit.variogram(vgm.45, vgm(0.42, "Exp", 1100/3, 0.05))
fit.mat.025 <- fit.variogram(vgm.45, vgm(0.42, "Mat", 1100, 0.05, kappa = 0.25))
fit.mat.100 <- fit.variogram(vgm.45, vgm(0.42, "Mat", 1100/4, 0.05, kappa = 1))
fit.gau <- fit.variogram(vgm.45, vgm(0.42, "Gau", 1100/sqrt(3), 0.05))

plot(vgm.45, fit.exp)

plot of chunk unnamed-chunk-9

plot(vgm.45, fit.mat.025)

plot of chunk unnamed-chunk-9

plot(vgm.45, fit.mat.100)

plot of chunk unnamed-chunk-9

plot(vgm.45, fit.gau)

plot of chunk unnamed-chunk-9

Kriging

We have a variogram model we are happy with, so let's move on to kriging with it…

load(system.file("data", "meuse.grid.rda", package = "sp"))

meuse.spgrid <- meuse.grid
coordinates(meuse.spgrid) <- ~x + y

pred.iso <- krige(log(zinc) ~ 1, ~x + y, meuse, meuse.grid, model = fit.sph)

## [using ordinary kriging]

names(pred.iso)

## [1] "x"         "y"         "var1.pred" "var1.var"

You will see x and y coordinates for the grid, as well as kriging predictions and standard error estimates. We will plot these last two.

pred.plot <- ggplot(aes(x = x, y = y), data = pred.iso)
pred.plot <- pred.plot + geom_tile(aes(fill = var1.pred))
pred.plot <- pred.plot + scale_fill_gradient(low = "yellow", high = "blue")
pred.plot + coord_equal()

plot of chunk unnamed-chunk-11


var.plot <- ggplot(aes(x = x, y = y), data = pred.iso)
var.plot <- var.plot + geom_tile(aes(fill = var1.var))
var.plot <- var.plot + scale_fill_gradient(low = "blue", high = "yellow")
var.plot <- var.plot + geom_point(data = meuse)  #Overlay the data locations
var.plot + coord_equal()

plot of chunk unnamed-chunk-11

We will now repeat with an anisotropic variogram. First, I will fit an anisotropic variogram for you. But after that, you should be able to recalculate the variogram, and recalculate the prediction maps.

plot(vgm.aniso)  # Plot the empirical variogram

plot of chunk unnamed-chunk-12

eye.sph.aniso <- vgm(0.42, "Sph", 1200, 0.05, anis = c(45, 1/3))  #Eyeball a model
# Note the angle and ratio that are needed for an anisotropic model
plot(vgm.aniso, eye.sph.aniso)

plot of chunk unnamed-chunk-12

# Now let R try to fit it through regression
fit.sph.aniso <- fit.variogram(vgm.aniso, model = eye.sph.aniso)
plot(vgm.aniso, fit.sph.aniso)

plot of chunk unnamed-chunk-12

With this fit variogram model, you should be able to recalculate the kriging model and replot the prediction and variance map. In the next homework, I will ask you to explain the differences between the isotropic and anisotropic models for these predictions and variances maps.

#Regression Kriging

We will now fit a regression kriging model to the meuse data. The dist from the river is appropriate as a covariate. Also, we have this as a raster layer for the entire study region.

First, it is good to verify that distance is an appropriate predictor of log zinc.

We'll also check for anisotropy. The zinc is anisotropic, but maybe that's completely explained by elevation differences. We should check.

summary(lm(log(zinc) ~ dist, data = meuse))

## 
## Call:
## lm(formula = log(zinc) ~ dist, data = meuse)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1257 -0.3644 -0.0016  0.3193  1.6777 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   6.5338     0.0617   105.9   <2e-16 ***
## dist         -2.6999     0.1987   -13.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.488 on 153 degrees of freedom
## Multiple R-squared:  0.547,  Adjusted R-squared:  0.544 
## F-statistic:  185 on 1 and 153 DF,  p-value: <2e-16

# Then we should plot the variogram of the residuals, and see if there is
# spatial structure.
vgm <- variogram(log(zinc) ~ dist, locations = ~x + y, data = meuse)
plot(vgm)

plot of chunk unnamed-chunk-13

# Plot anisotropic variogram
vgm.aniso <- variogram(log(zinc) ~ dist, meuse.sp, alpha = c(0, 45, 90, 135))
plot(vgm.aniso)

plot of chunk unnamed-chunk-13

Maybe it's still anisotropic, but it's not clear to me. Any difference could just be noise. When in doubt, keep it simple.

fit.reg.sph <- fit.variogram(vgm, vgm(0.25, "Sph", 1500, 0.02))
# note, it's important that the regressor - dist - is present in both the
# sample data (meuse) and in the prediction data (meuse.grid).
pred.reg <- krige(log(zinc) ~ dist, ~x + y, meuse, meuse.grid, model = fit.reg.sph)

## [using universal kriging]

pred.plot <- ggplot(aes(x = x, y = y), data = pred.reg)
pred.plot <- pred.plot + geom_tile(aes(fill = var1.pred))
pred.plot <- pred.plot + scale_fill_gradient(low = "yellow", high = "blue")
pred.plot + coord_equal()

plot of chunk unnamed-chunk-14

From here, you should be able to replot the variance map on your own, too.

#The Smoothing Effect of Kriging The kriging map is the best prediction (when the data are Gaussian! otherwise, it's just a good map, but not the best). But don't for a second, think the kriging map is a realistic interpretation of the true, underlying process. In particular, kriging is a “smoother,'' it loses texture from the original process. Think of the kriging prediction far from data; this prediction is just the mean. This is a flat map, completely devoid of texture. So kriging does not duplicate texture.

pred.reg$dist <- meuse.grid$dist
plot(variogram(var1.pred ~ dist, ~x + y, data = pred.reg), fit.reg.sph, ylim = c(0, 
    0.3))

plot of chunk unnamed-chunk-15

Compare the empirical variogram of the predictions with the empirical variogram of the data. You should see that they are quite different.

Conditional Simulation

The best way to duplicate structure is through conditional simulation, as discussed in class. Here is how to do condition in R.

sim.reg <- krige(log(zinc) ~ dist, ~x + y, meuse, meuse.grid, model = fit.reg.sph, 
    nsim = 9, nmax = 20)

## drawing 9 GLS realisations of beta...
## [using conditional Gaussian simulation]

# Convert from 'wide' format to 'long' format
library(reshape)

## Loading required package: plyr
## 
## Attaching package: 'reshape'
## 
## The following object is masked from 'package:plyr':
## 
##     rename, round_any

sim.gdf <- melt(sim.reg, id.vars = c("x", "y"), variable_name = "sim")
names(sim.gdf)

## [1] "x"     "y"     "sim"   "value"

sim.plot <- ggplot(aes(x = x, y = y), data = sim.gdf) + facet_wrap(~sim)
sim.plot <- sim.plot + geom_tile(aes(fill = value))
sim.plot <- sim.plot + scale_fill_gradient(low = "blue", high = "yellow")
sim.plot + coord_equal()

plot of chunk unnamed-chunk-16

Each of these simulations respects both (1) the actual data locations, and (2) the variogram we used to simulate from. We can check that…

# Now create the variogram of the first simulation
sim.reg$dist <- meuse.grid$dist
plot(variogram(sim1 ~ dist, ~x + y, data = sim.reg), fit.reg.sph, ylim = c(0, 
    0.3))

plot of chunk unnamed-chunk-17

# You should repeat for a few other simulations as well.

It is not uncommon for the sill (the variance) to be off. This is, after all, a simulation. But you should note that the variogram of the simulations is much closer to the variogram of the actual data than the kriging one. In particular, the variogram will match the spatial structure at close distances much better.

A little bit of extra coolness

It's possible to load these maps into google map with the plotGoogleMaps package. In order to do this, you have to specify the projection of the data, and reproject to Mercator (which is what google used).

The plotGoogleMaps library does the reprojection for you, but you still have to know what projection your data are in. I did that here for you. I did this a few years ago, and I now longer remember where I found this information out.

You can also do this with ggmap. Maybe Elizabeth Schneider can help you do that. She's the ggmap queen. But you'll have to reproject the data to lat/lon on your own. (using spTransform())

library(plotGoogleMaps)  # Load the library
library(RColorBrewer)
# Create a spatialGridDataFrame object...
meuse_sppred <- pred.reg  #First, just copy the data
coordinates(meuse_sppred) <- ~x + y  # Create a SpatialPoints DataFrame
gridded(meuse_sppred) <- TRUE  # Create a SpatialGridDataFrame
# Give a Projection... you would have to look this up for your data.
# spatialreference.org is a good place.
proj4string(meuse_sppred) <- CRS("+init=epsg:28992")  # Specify the projection.
# I no longer remember how I found out the projection of these data.
plotGoogleMaps(meuse_sppred, zcol = "var1.pred", file = "map.htm", colPalette = brewer.pal(5, 
    "Blues"))

This will create a file called map.htm. Open it up in your favorite browser. Voila! (hopefully)