Inverse Distance Weighting

(Let’s begin with the inferior method…)

Here is the interpolation for lead using IDW

#install.packages("automap")
library(sp)
library(gstat)
library(automap)
data(meuse)

coordinates(meuse) <- ~x+y
proj4string(meuse) <- CRS("+init=epsg:28992")
spplot(meuse,"lead",colorkey=TRUE)

PbVar <- autofitVariogram(lead~1,meuse)
summary(PbVar)
## Experimental variogram:
##      np       dist     gamma dir.hor dir.ver   id
## 1    17   59.33470  6408.794       0       0 var1
## 2    36   86.01449  2771.653       0       0 var1
## 3   114  131.02870  5321.110       0       0 var1
## 4   149  176.18845  8594.403       0       0 var1
## 5   184  226.75652  5298.913       0       0 var1
## 6   711  337.60359  9936.921       0       0 var1
## 7   830  502.04773 11819.392       0       0 var1
## 8  1349  713.21485 14507.242       0       0 var1
## 9  1314  961.27179 16061.474       0       0 var1
## 10 1139 1213.41157 15676.574       0       0 var1
## 11 1355 1506.55052 13474.368       0       0 var1
## 
## Fitted variogram model:
##   model     psill   range kappa
## 1   Nug  4581.722   0.000     0
## 2   Ste 10863.619 462.919     5
## Sums of squares betw. var. model and sample var.[1] 82081.25
data(meuse.grid)
coordinates(meuse.grid) <- ~x+y
proj4string(meuse.grid) <- CRS("+init=epsg:28992")
meuse.grid <- as(meuse.grid,"SpatialPixelsDataFrame")
plot(meuse.grid)
points(meuse,pch=20)

Here are the IDWs with powers of 1, 2 and 3:

PbIDW1 <- idw(formula=lead~1,locations=meuse,newdata=meuse.grid,idp=1)
## [inverse distance weighted interpolation]
head(PbIDW1@data)
##   var1.pred var1.var
## 1  162.2972       NA
## 2  166.8671       NA
## 3  163.7769       NA
## 4  161.1196       NA
## 5  177.1940       NA
## 6  170.2910       NA
PbIDW2 <- idw(formula=lead~1,locations=meuse,newdata=meuse.grid,idp=2)
## [inverse distance weighted interpolation]
head(PbIDW2@data)
##   var1.pred var1.var
## 1  192.8410       NA
## 2  212.4564       NA
## 3  198.4250       NA
## 4  186.6732       NA
## 5  250.1697       NA
## 6  224.0157       NA
PbIDW3 <- idw(formula=lead~1,locations=meuse,newdata=meuse.grid,idp=3)
## [inverse distance weighted interpolation]
head(PbIDW3@data)
##   var1.pred var1.var
## 1  225.5130       NA
## 2  252.0060       NA
## 3  230.7777       NA
## 4  211.1666       NA
## 5  285.2467       NA
## 6  261.2095       NA
spplot(PbIDW1,"var1.pred") #(Power = 1)

spplot(PbIDW2,"var1.pred") #(Power = 2)

spplot(PbIDW3,"var1.pred") #(Power = 3)

As the power increases, lead concentrations farther away have less and less weight in the interpolated values. In the above plots, this is evidenced by lower and lower interpolated concentrations (darker colors) for regions farther away from the Meuse River where lead concentrations were measured to be low, and brighter and brighter colors in regions next to the river with high measured lead concentrations (the max value in the color scale on the right even increases). In this way, increasing power effectively accentuates differences in concentrations between the regions.

Kriging

(Let’s find some gold, first by fitting a variogram by hand…if you can call this complicated code “by hand”…)

PbVar <- variogram(lead~1, meuse)
summary(PbVar)
##        np             dist             gamma          dir.hor     dir.ver 
##  Min.   : 57.0   Min.   :  79.29   Min.   : 3881   Min.   :0   Min.   :0  
##  1st Qu.:435.5   1st Qu.: 425.61   1st Qu.:10771   1st Qu.:0   1st Qu.:0  
##  Median :477.0   Median : 796.18   Median :13610   Median :0   Median :0  
##  Mean   :458.9   Mean   : 799.98   Mean   :12676   Mean   :0   Mean   :0  
##  3rd Qu.:545.0   3rd Qu.:1169.60   3rd Qu.:15163   3rd Qu.:0   3rd Qu.:0  
##  Max.   :589.0   Max.   :1543.20   Max.   :18082   Max.   :0   Max.   :0  
##     id    
##  var1:15  
##           
##           
##           
##           
## 
plot(PbVar,pch=20,cex=1.5,col="black",
     ylab=expression("Semivariance ("*gamma*")"),
     xlab="Distance (m)", main = "Lead concentrations (ppm)")

sph.model <- vgm(psill=15000, model="Sph", range=1000, nugget=4000)
  ##(some of the values had to be changed relative to Andy's Cadmium example)
sph.fit <- fit.variogram(object = PbVar, model = sph.model)
plot(PbVar,pch=20,cex=1.5,col="black",
     ylab=expression("Semivariance ("*gamma*")"),
     xlab="Distance (m)", main = "Lead concentrations (ppm)",model=sph.fit)

(Now, let’s krige, first using ordinary kriging…)

PbSph <- krige(lead~1, meuse, meuse.grid, model = sph.fit)
## [using ordinary kriging]
spplot(PbSph,"var1.pred")

(And now using autokriging…)

Let’s use autofitVariogram() to do all the heavy lifting. Oh baby, gold prospecting was never easier. Bring on the nuggets:

PbVar <- autofitVariogram(lead~1,meuse)
summary(PbVar)
## Experimental variogram:
##      np       dist     gamma dir.hor dir.ver   id
## 1    17   59.33470  6408.794       0       0 var1
## 2    36   86.01449  2771.653       0       0 var1
## 3   114  131.02870  5321.110       0       0 var1
## 4   149  176.18845  8594.403       0       0 var1
## 5   184  226.75652  5298.913       0       0 var1
## 6   711  337.60359  9936.921       0       0 var1
## 7   830  502.04773 11819.392       0       0 var1
## 8  1349  713.21485 14507.242       0       0 var1
## 9  1314  961.27179 16061.474       0       0 var1
## 10 1139 1213.41157 15676.574       0       0 var1
## 11 1355 1506.55052 13474.368       0       0 var1
## 
## Fitted variogram model:
##   model     psill   range kappa
## 1   Nug  4581.722   0.000     0
## 2   Ste 10863.619 462.919     5
## Sums of squares betw. var. model and sample var.[1] 82081.25
plot(PbVar)

Handy, huh? It automatically fits and plots the variogram and tells you how many point pairs there are at each distance. Not only can we autofit variograms, but we can autokrige surfaces as well. The following gives us the nice autofitted variogram from (although the plot is super squashed in this layout), as well as interpolated lead concentrations within the Meuse River grid area:

PbKrig <- autoKrige(lead~1, meuse, meuse.grid)
## [using ordinary kriging]
plot(PbKrig)

The standard error of the interpolated values tends to be lower in regions within the grid where there are more data points, for example along the river. This makes good sense, and if we were prospecting for lead (hot commodity that it is) it would probably be smartest to do so near the river, for two reasons:

  • that is where the highest lead concentrations are predicted to be
  • that is where we have the most certainty/confidence in those predicted (interpolated) concentrations

Thinking about the usefulness of the IDW approach, I do believe that it would be possible to generate an interpolated surface of lead concentrations using a subset of the meuse data, using different numbers for the power term in the inverse distance weighting formula, and to then use that surface to predict lead concentrations for the remainder of the dataset. I could then compare those interpolated values to the actual measured values at specific locations in order to assess which power makes the most accurate interpolations.

The code required to do this would definitely take me a long time to figure out, especially beacuse I think that I would need to take a random sample of the datapoints as opposed to just subsetting out the first 2/3 of the data or something like that. However, I can’t actually figure out how the data are organized - not by lat/long, and not by measured values (as is evident below) - so maybe the first 2/3 of the values would indeed be a random enough-ish collection of points.

head(meuse@data)
##   cadmium copper lead zinc  elev       dist   om ffreq soil lime landuse
## 1    11.7     85  299 1022 7.909 0.00135803 13.6     1    1    1      Ah
## 2     8.6     81  277 1141 6.983 0.01222430 14.0     1    1    1      Ah
## 3     6.5     68  199  640 7.800 0.10302900 13.0     1    1    1      Ah
## 4     2.6     81  116  257 7.655 0.19009400  8.0     1    2    0      Ga
## 5     2.8     48  117  269 7.480 0.27709000  8.7     1    2    0      Ah
## 6     3.0     61  137  281 7.791 0.36406700  7.8     1    2    0      Ga
##   dist.m
## 1     50
## 2     30
## 3    150
## 4    270
## 5    380
## 6    470