The data we will use are a common dataset in R for learning geostatistical analysis. They are a dataset of heavy metal concentrations along a section of the Meuse River. Flooding of the river over time has left deposits of heavy metals along riverside farmland. It would be helpful to map these concentrations out.
library(lattice) library(gstat) library(sp)
## Warning: package 'sp' was built under R version 3.0.2
load(system.file("data", "meuse.rda", package = "sp")) # Find out about these data names(meuse)
##  "x" "y" "cadmium" "copper" "lead" "zinc" "elev" ##  "dist" "om" "ffreq" "soil" "lime" "landuse" "dist.m"
help(meuse) # The key variable of interest will be zinc. So plot a histogram hist(meuse$zinc)
# Plot with ggplot library(ggplot2) library(scales) # Create a categorical variable meuse$zinc_cat <- cut(meuse$zinc, breaks = c(0, 200, 400, 800, 1200, 2000)) zinc.plot <- ggplot(aes(x = x, y = y), data = meuse) # Initialize the plot, with data and axes. zinc.plot <- zinc.plot + geom_point(aes(color = zinc_cat)) # Tell it to plot points, with a colour for zinc_cat zinc.plot <- zinc.plot + coord_equal() #Tell it the scaling of the x and y coordinates are equal zinc.plot <- zinc.plot + scale_color_brewer(palette = "YlGnBu") zinc.plot # Plot!
There is an obvious trend in the data. These data are in a bend in the Meuse River, and the zinc concentrations are higher where it floods regularly. We will eventually be kriging these data, which assumes that the data Gaussian. The zinc concentrations clearly aren't Gaussian, but the log transformed data are more Gaussian. So we will use a log Gaussian transformation here.
Let's look at a variogram of the data.
vgm <- variogram(log(zinc) ~ 1, locations = ~x + y, data = meuse) vgm # Take a look at the data that are returned by the variogram function
## np dist gamma dir.hor dir.ver id ## 1 57 79.29 0.1234 0 0 var1 ## 2 299 163.97 0.2162 0 0 var1 ## 3 419 267.36 0.3028 0 0 var1 ## 4 457 372.74 0.4121 0 0 var1 ## 5 547 478.48 0.4634 0 0 var1 ## 6 533 585.34 0.5647 0 0 var1 ## 7 574 693.15 0.5690 0 0 var1 ## 8 564 796.18 0.6187 0 0 var1 ## 9 589 903.15 0.6471 0 0 var1 ## 10 543 1011.29 0.6916 0 0 var1 ## 11 500 1117.86 0.7034 0 0 var1 ## 12 477 1221.33 0.6039 0 0 var1 ## 13 452 1329.16 0.6517 0 0 var1 ## 14 457 1437.26 0.5665 0 0 var1 ## 15 415 1543.20 0.5748 0 0 var1
Note: One thing I do not like about the gstat is the use of “plot”. In R, “plot” is a regular graphics function, but here, it uses a different graphics package called “lattice”. There are some things you can do with regular plots you can not do with lattice plots, and vice versa. Even though this package calls is “plot,” it is actually a lattice function. Very confusing if you ask me. Later on, we'll use ggplot to plot it.
Now, make a plot of the variogram cloud. These are all the values \( 1/2 (z_i-z_j)^2 \). The empirical variogram is a smooth of these data.
meuse.sp <- meuse coordinates(meuse.sp) <- ~x + y # plot the variogram cloud vgm.cloud <- variogram(log(zinc) ~ 1, meuse.sp, cloud = TRUE) plot(vgm.cloud)
The variogram cloud is a little useful for finding outliers. I honestly don't think there are any outliers in these data, but it doesn't hurt to see how we might do this in R. We can use the basic plot function, and its identify function to label points.
# Make an interactive plot to identify any outliers # plot(vgm.cloud$dist,vgm.cloud$gamma) # vgm.out<-identify(vgm.cloud$dist,vgm.cloud$gamma) vgm.cloud[vgm.out,] # # not that each outlier is not a single observation, but a pair of # observations. Here are the points I identified... meuse[c(54,55,59),]
These all happen to be the highest zinc values. I don't think they are outliers in the sense that they don't belong there.
If you look back at the map, you'll see that there is a general Southeast - Northwest trend to the data. This is a violation of the stationarity assumption, and we might want to account for spatial structure being different in different directions. We will fit a variogram in the four different directions: 0, 45, 90, 135, east of north. First, I will turn the meuse data.frame into a SpatialPointsDataFrame. This will simplify the commands slightly, because it takes care of specifying the coordinates for us.
meuse.sp <- meuse coordinates(meuse.sp) <- ~x + y vgm.aniso <- variogram(log(zinc) ~ 1, meuse.sp, alpha = c(0, 45, 90, 135)) plot(vgm.aniso)
# Redo with ggplot zinc.aniso.plot <- ggplot(aes(x = dist, y = gamma), data = vgm.aniso) #Initialize zinc.aniso.plot <- zinc.aniso.plot + geom_point() # Tell it to plot points zinc.aniso.plot <- zinc.aniso.plot + facet_wrap(~dir.hor) # tell it to separate by dir.hor zinc.aniso.plot
# clean up the axes a bit. zinc.aniso.plot <- zinc.aniso.plot + scale_x_continuous(limits = c(0, 1500), expand = c(0, 0)) + scale_y_continuous(limits = c(0, 1.2), expand = c(0, 0)) zinc.aniso.plot
## Warning: Removed 1 rows containing missing values (geom_point). ## Warning: Removed 1 rows containing missing values (geom_point). ## Warning: Removed 1 rows containing missing values (geom_point). ## Warning: Removed 1 rows containing missing values (geom_point).
Spatial correlation seems to be strongest in the 45 degree direction, and weakest in the 135 degress direction (things stay similar for longer if you move in the 45 degree direction). I will fit the model in the 45 degree direction.
#Variogram Model Fitting We can try to fit a variogram model to this. The tried and true method is to just eyeball the nugget, sill and range parameters from the empirical variogram. I have chosen a nugget = .05, partial sill = 0.42 and a range = 1100. The vgm function is used for specifying variogram models.
vgm.45 <- subset(vgm.aniso, vgm.aniso$dir.hor == 45) # Subset the variogram data for just this direction plot(vgm.45)
# help(vgm) eye.sph <- vgm(0.42, "Sph", 1100, 0.05) plot(vgm.45, eye.sph)
Not too bad. We might try an automatic regression procedure to fit it. This might be more objective.
fit.sph <- fit.variogram(vgm.45, vgm(0.42, "Sph", 1200, 0.05)) fit.sph
## model psill range ## 1 Nug 0.03753 0 ## 2 Sph 0.42534 1088
The parameters are very similar. We can try other models as well. I will fit spherical, exponential, matern, and gaussian models to the data. According to the online gstat manual, the effective ranges for these models are a, 3a, sqrt(3a), and 4a for a Matern(kappa=1) model.
# The next lines no longer work. I wonder what happened to them... I'll # leave them in in case they come back in later versions of gstat vgm.show() # # Show different variogram models vgm.show(kappa.range=c(.25,.5,1,2,10)) # #Show different Matern Models fit.exp <- fit.variogram(vgm.45, vgm(0.42, "Exp", 1100/3, 0.05)) fit.mat.025 <- fit.variogram(vgm.45, vgm(0.42, "Mat", 1100, 0.05, kappa = 0.25)) fit.mat.100 <- fit.variogram(vgm.45, vgm(0.42, "Mat", 1100/4, 0.05, kappa = 1)) fit.gau <- fit.variogram(vgm.45, vgm(0.42, "Gau", 1100/sqrt(3), 0.05)) plot(vgm.45, fit.exp)
We have a variogram model we are happy with, so let's move on to kriging with it…
load(system.file("data", "meuse.grid.rda", package = "sp")) meuse.spgrid <- meuse.grid coordinates(meuse.spgrid) <- ~x + y pred.iso <- krige(log(zinc) ~ 1, ~x + y, meuse, meuse.grid, model = fit.sph)
## [using ordinary kriging]
##  "x" "y" "var1.pred" "var1.var"
You will see x and y coordinates for the grid, as well as kriging predictions and standard error estimates. We will plot these last two.
pred.plot <- ggplot(aes(x = x, y = y), data = pred.iso) pred.plot <- pred.plot + geom_tile(aes(fill = var1.pred)) pred.plot <- pred.plot + scale_fill_gradient(low = "yellow", high = "blue") pred.plot + coord_equal()
var.plot <- ggplot(aes(x = x, y = y), data = pred.iso) var.plot <- var.plot + geom_tile(aes(fill = var1.var)) var.plot <- var.plot + scale_fill_gradient(low = "blue", high = "yellow") var.plot <- var.plot + geom_point(data = meuse) #Overlay the data locations var.plot + coord_equal()
We will now repeat with an anisotropic variogram. First, I will fit an anisotropic variogram for you. But after that, you should be able to recalculate the variogram, and recalculate the prediction maps.
plot(vgm.aniso) # Plot the empirical variogram
eye.sph.aniso <- vgm(0.42, "Sph", 1200, 0.05, anis = c(45, 1/3)) #Eyeball a model # Note the angle and ratio that are needed for an anisotropic model plot(vgm.aniso, eye.sph.aniso)
# Now let R try to fit it through regression fit.sph.aniso <- fit.variogram(vgm.aniso, model = eye.sph.aniso) plot(vgm.aniso, fit.sph.aniso)
With this fit variogram model, you should be able to recalculate the kriging model and replot the prediction and variance map. In the next homework, I will ask you to explain the differences between the isotropic and anisotropic models for these predictions and variances maps.
We will now fit a regression kriging model to the meuse data. The dist from the river is appropriate as a covariate. Also, we have this as a raster layer for the entire study region.
First, it is good to verify that distance is an appropriate predictor of log zinc.
We'll also check for anisotropy. The zinc is anisotropic, but maybe that's completely explained by elevation differences. We should check.
summary(lm(log(zinc) ~ dist, data = meuse))
## ## Call: ## lm(formula = log(zinc) ~ dist, data = meuse) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1.1257 -0.3644 -0.0016 0.3193 1.6777 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 6.5338 0.0617 105.9 <2e-16 *** ## dist -2.6999 0.1987 -13.6 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.488 on 153 degrees of freedom ## Multiple R-squared: 0.547, Adjusted R-squared: 0.544 ## F-statistic: 185 on 1 and 153 DF, p-value: <2e-16
# Then we should plot the variogram of the residuals, and see if there is # spatial structure. vgm <- variogram(log(zinc) ~ dist, locations = ~x + y, data = meuse) plot(vgm)
# Plot anisotropic variogram vgm.aniso <- variogram(log(zinc) ~ dist, meuse.sp, alpha = c(0, 45, 90, 135)) plot(vgm.aniso)
Maybe it's still anisotropic, but it's not clear to me. Any difference could just be noise. When in doubt, keep it simple.
fit.reg.sph <- fit.variogram(vgm, vgm(0.25, "Sph", 1500, 0.02)) # note, it's important that the regressor - dist - is present in both the # sample data (meuse) and in the prediction data (meuse.grid). pred.reg <- krige(log(zinc) ~ dist, ~x + y, meuse, meuse.grid, model = fit.reg.sph)
## [using universal kriging]
pred.plot <- ggplot(aes(x = x, y = y), data = pred.reg) pred.plot <- pred.plot + geom_tile(aes(fill = var1.pred)) pred.plot <- pred.plot + scale_fill_gradient(low = "yellow", high = "blue") pred.plot + coord_equal()
From here, you should be able to replot the variance map on your own, too.
#The Smoothing Effect of Kriging The kriging map is the best prediction (when the data are Gaussian! otherwise, it's just a good map, but not the best). But don't for a second, think the kriging map is a realistic interpretation of the true, underlying process. In particular, kriging is a “smoother,'' it loses texture from the original process. Think of the kriging prediction far from data; this prediction is just the mean. This is a flat map, completely devoid of texture. So kriging does not duplicate texture.
pred.reg$dist <- meuse.grid$dist plot(variogram(var1.pred ~ dist, ~x + y, data = pred.reg), fit.reg.sph, ylim = c(0, 0.3))
Compare the empirical variogram of the predictions with the empirical variogram of the data. You should see that they are quite different.
The best way to duplicate structure is through conditional simulation, as discussed in class. Here is how to do condition in R.
sim.reg <- krige(log(zinc) ~ dist, ~x + y, meuse, meuse.grid, model = fit.reg.sph, nsim = 9, nmax = 20)
## drawing 9 GLS realisations of beta... ## [using conditional Gaussian simulation]
# Convert from 'wide' format to 'long' format library(reshape)
## Loading required package: plyr ## ## Attaching package: 'reshape' ## ## The following object is masked from 'package:plyr': ## ## rename, round_any
sim.gdf <- melt(sim.reg, id.vars = c("x", "y"), variable_name = "sim") names(sim.gdf)
##  "x" "y" "sim" "value"
sim.plot <- ggplot(aes(x = x, y = y), data = sim.gdf) + facet_wrap(~sim) sim.plot <- sim.plot + geom_tile(aes(fill = value)) sim.plot <- sim.plot + scale_fill_gradient(low = "blue", high = "yellow") sim.plot + coord_equal()