Advanced Kriging

Indicator cokriging

We'll reuse the meuse data here.

The zinc data are quite nonnormal. This is a violation of kriging. There are at least three possible solutions:

This is what we will do now. Essentially, we'll want to krige each bin separately, to calculate the probability that zinc is in that bin. But each bin is not independent. Suppose data are in the 2nd bin. Then nearby data are more likely to be in the 2nd, or 1st or 3rd bin, than they are to be in the 8th bin. So, when we estimate each bin, we need to also look at the other bins as well. This is a co-kriging problem.

library(sp)
## Warning: package 'sp' was built under R version 3.0.2
library(gstat)
library(RColorBrewer)
load(system.file("data", "meuse.rda", package = "sp"))

# Create a SpatialPointsDataFrame Object from the data.frame
meuse.sp <- meuse  #Copy the data.  It's still a data.frame
coordinates(meuse.sp) <- ~x + y  # Now it's SpatialPointsDataFrame, with coordinates x and y
# Create a categorical variable and plot it
q <- quantile(meuse$zinc, seq(0.1, 0.9, 0.1))
# These are the actual values of the quantiles
q
##   10%   20%   30%   40%   50%   60%   70%   80%   90% 
## 152.8 186.8 207.2 246.4 326.0 439.6 589.8 737.2 986.4
# Plot the data in 5 bins
meuse.sp$zncat <- cut(meuse.sp$zinc, c(0, q[c(2, 4, 6, 8)], 2000))
spplot(meuse.sp, "zncat", col.regions = brewer.pal(5, "YlGnBu"))

plot of chunk unnamed-chunk-1

To do this cokriging, we'll create 9 separate datasets, one for each quantile. Example, one dataset will be 1-0 whether a zinc value is below the 10% mark. Another dataset will be 1-0 whether a zinc value is below the 20% mark. Etc.

Note, with ordinary kriging, we have to estimate the mean. But, how much of our map should be below the 10% mark? 10%, right? So we can do simple kriging here, because we know the mean.


meuse.i <- gstat(id = "zinc1", formula = I(zinc < q[1]) ~ 1, data = meuse.sp, 
    nmax = 7, beta = 0.1, set = list(order = 4, zero = 1e-05))
# The order=4 varible is set as per the instructions in the gstat manual.
# this tells gstat that each indicator is cumulative.  You can't be in the
# second category without also being in the first category.
meuse.i <- gstat(meuse.i, "zinc2", formula = I(zinc < q[2]) ~ 1, data = meuse.sp, 
    nmax = 7, beta = 0.2)
meuse.i <- gstat(meuse.i, "zinc3", formula = I(zinc < q[3]) ~ 1, data = meuse.sp, 
    nmax = 7, beta = 0.3)
meuse.i <- gstat(meuse.i, "zinc4", formula = I(zinc < q[4]) ~ 1, data = meuse.sp, 
    nmax = 7, beta = 0.4)
meuse.i <- gstat(meuse.i, "zinc5", formula = I(zinc < q[5]) ~ 1, data = meuse.sp, 
    nmax = 7, beta = 0.5)
meuse.i <- gstat(meuse.i, "zinc6", formula = I(zinc < q[6]) ~ 1, data = meuse.sp, 
    nmax = 7, beta = 0.6)
meuse.i <- gstat(meuse.i, "zinc7", formula = I(zinc < q[7]) ~ 1, data = meuse.sp, 
    nmax = 7, beta = 0.7)
meuse.i <- gstat(meuse.i, "zinc8", formula = I(zinc < q[8]) ~ 1, data = meuse.sp, 
    nmax = 7, beta = 0.8)
meuse.i <- gstat(meuse.i, "zinc9", formula = I(zinc < q[9]) ~ 1, data = meuse.sp, 
    nmax = 7, beta = 0.9)

Variogram fitting

We'll need to fit variograms to all these indicators…

# Create a semivariogram model with range equal 1200, and 'dummy' partial
# sill and nugget of 1.  We will fit these later.  'One size fits all'
meuse.i <- gstat(meuse.i, model = vgm(1, "Sph", 1000, 1), fill.all = T)

# Estimate the empiricalvariogram of each indicator
x <- variogram(meuse.i)
plot(x)

plot of chunk unnamed-chunk-3

x <- variogram(meuse.i, width = 50)
plot(x)

plot of chunk unnamed-chunk-3

The default estimation of bins size was pretty good. I was a little worried about the 9th quantile. So I tried a smaller width. It wasn't much more informative.

Variogram model fitting

We will fit a single semivariogram model to all these empirical varioagrams using a linear model of coregionalization (the fit.lmc() function). Note, this changes the nuggets and sills, but not the ranges. For a linear model of coreginalization, the range is a “one size fits all” with a cokriging model, and the model won't fit it. It will use whatever range you started off with. In this case, I started off with a good range. That wasn't by accident.

There is a line through these points. You may not see it online, because it blends in so well with the points.

meuse.fit = fit.lmc(x, meuse.i)

plot(x, model = meuse.fit)

plot of chunk unnamed-chunk-4

Kriging

We'll load in a layer with the points ot make the kriging predictions at. Normally, you'd probably create a grid of points somewhere, and the clip those points to a boundary layer for your study region. In this case, I didn't have a boundary layer of the study region, so we'll just use a grid of points that were provided as part of the R package for analysis of these data.

# Load in the grid of prediction locations
load(system.file("data", "meuse.grid.rda", package = "sp"))
meuse.spgrid <- meuse.grid
coordinates(meuse.spgrid) <- ~x + y

zk <- predict(meuse.fit, newdata = meuse.spgrid, indicators = TRUE)
## Linear Model of Coregionalization found. Good.
## [using simple cokriging]
library(gridExtra)
## Loading required package: grid
grid.arrange(spplot(zk, "zinc1.pred", at = c(0, 0.25, 0.6, 0.75, 1), col.regions = brewer.pal(4, 
    "YlGnBu")), spplot(zk, "zinc2.pred", at = c(0, 0.25, 0.6, 0.75, 1), col.regions = brewer.pal(4, 
    "YlGnBu")), spplot(zk, "zinc3.pred", at = c(0, 0.25, 0.6, 0.75, 1), col.regions = brewer.pal(4, 
    "YlGnBu")), spplot(zk, "zinc4.pred", at = c(0, 0.25, 0.6, 0.75, 1), col.regions = brewer.pal(4, 
    "YlGnBu")), spplot(zk, "zinc5.pred", at = c(0, 0.25, 0.6, 0.75, 1), col.regions = brewer.pal(4, 
    "YlGnBu")), spplot(zk, "zinc6.pred", at = c(0, 0.25, 0.6, 0.75, 1), col.regions = brewer.pal(4, 
    "YlGnBu")), spplot(zk, "zinc7.pred", at = c(0, 0.25, 0.6, 0.75, 1), col.regions = brewer.pal(4, 
    "YlGnBu")), spplot(zk, "zinc8.pred", at = c(0, 0.25, 0.6, 0.75, 1), col.regions = brewer.pal(4, 
    "YlGnBu")), spplot(zk, "zinc9.pred", at = c(0, 0.25, 0.6, 0.75, 1), col.regions = brewer.pal(4, 
    "YlGnBu")), ncol = 3)

plot of chunk unnamed-chunk-5

Conditional Simulation

# Simulate each indicator
z <- predict(meuse.fit, newdata = meuse.spgrid, nsim = 5, indicators = TRUE)
## Linear Model of Coregionalization found. Good.
## [using conditional indicator cosimulation]
# Look at the first 5 rows to see what it did.
z[1:5, ]
##        coordinates zinc1.sim1 zinc1.sim2 zinc1.sim3 zinc1.sim4 zinc1.sim5
## 1 (181200, 333700)          0          0          0          0          0
## 2 (181100, 333700)          0          0          0          0          0
## 3 (181200, 333700)          0          0          0          0          0
## 4 (181200, 333700)          0          0          0          0          0
## 5 (181100, 333700)          0          0          0          0          0
##   zinc2.sim1 zinc2.sim2 zinc2.sim3 zinc2.sim4 zinc2.sim5 zinc3.sim1
## 1          0          0          0          0          0          0
## 2          0          0          0          0          0          0
## 3          0          0          0          0          0          0
## 4          0          0          0          0          0          0
## 5          0          0          0          0          0          0
##   zinc3.sim2 zinc3.sim3 zinc3.sim4 zinc3.sim5 zinc4.sim1 zinc4.sim2
## 1          0          0          0          0          0          0
## 2          0          1          0          0          0          0
## 3          0          0          1          0          0          0
## 4          0          0          1          1          0          0
## 5          0          1          0          0          0          0
##   zinc4.sim3 zinc4.sim4 zinc4.sim5 zinc5.sim1 zinc5.sim2 zinc5.sim3
## 1          0          0          0          1          1          1
## 2          1          0          0          0          1          1
## 3          0          1          0          0          0          0
## 4          0          1          1          1          1          0
## 5          1          0          0          0          1          1
##   zinc5.sim4 zinc5.sim5 zinc6.sim1 zinc6.sim2 zinc6.sim3 zinc6.sim4
## 1          0          0          1          1          1          1
## 2          1          1          1          1          1          1
## 3          1          0          1          1          1          1
## 4          1          1          1          1          1          1
## 5          0          1          1          1          1          0
##   zinc6.sim5 zinc7.sim1 zinc7.sim2 zinc7.sim3 zinc7.sim4 zinc7.sim5
## 1          0          1          1          1          1          1
## 2          1          1          1          1          1          1
## 3          0          1          1          1          1          1
## 4          1          1          1          1          1          1
## 5          1          1          1          1          1          1
##   zinc8.sim1 zinc8.sim2 zinc8.sim3 zinc8.sim4 zinc8.sim5 zinc9.sim1
## 1          1          1          1          1          1          1
## 2          1          1          1          1          1          1
## 3          1          1          1          1          1          1
## 4          1          1          1          1          1          1
## 5          1          1          1          1          1          1
##   zinc9.sim2 zinc9.sim3 zinc9.sim4 zinc9.sim5
## 1          1          1          1          1
## 2          1          1          1          1
## 3          1          1          1          1
## 4          1          1          1          1
## 5          1          1          1          1

There are predictions for the 9 indictors. We want to turn that back into a categoricial variable. i.e., a variable with 1 if in the first quantile, 2 in the second, 3 in the third, etc.

z$sim1 <- z$zinc1.sim1 + z$zinc2.sim1 + z$zinc3.sim1 + z$zinc4.sim1 + z$zinc5.sim1 + 
    z$zinc6.sim1 + z$zinc7.sim1 + z$zinc8.sim1 + z$zinc9.sim1
z$sim1 <- 9 - z$sim1

# Write a loop to repeat that for simulations 2 through 5...
for (i in 2:5) {
    varname <- paste("sim", i, sep = "")
    sim.cols <- grepl(varname, names(z@data))
    z@data[, varname] <- apply(z@data[, sim.cols], 1, sum)
    z@data[, varname] <- 9 - z@data[, varname]
}
# Turn in to a Spatial Pixels Data Frame
z <- as(z, "SpatialPixelsDataFrame")
spplot(z, "sim1", at = seq(-0.5, 9.5, by = 2), col.regions = brewer.pal(5, "YlGnBu"))

plot of chunk unnamed-chunk-7

library(gridExtra)
grid.arrange(spplot(z, "sim1", at = seq(-0.5, 9.5, by = 2), col.regions = brewer.pal(5, 
    "YlGnBu")), spplot(z, "sim2", at = seq(-0.5, 9.5, by = 2), col.regions = brewer.pal(5, 
    "YlGnBu")), spplot(z, "sim3", at = seq(-0.5, 9.5, by = 2), col.regions = brewer.pal(5, 
    "YlGnBu")), spplot(z, "sim4", at = seq(-0.5, 9.5, by = 2), col.regions = brewer.pal(5, 
    "YlGnBu")), spplot(z, "sim5", at = seq(-0.5, 9.5, by = 2), col.regions = brewer.pal(5, 
    "YlGnBu")), ncol = 3)

plot of chunk unnamed-chunk-8

That's indicator conditional simulation. It doesn't assume normality, but we've lost information within a class. Sometimes it works nicely, though. One of the neat things about indicator simulation, is that the variogram is allowed to have different nugget at the high and low ends. We often see that the pattern of spatial continuity is different across the range of a variable. This is not possible to model with either Gaussian simultion or log Gaussian simulation.