Here is the interpolation for lead using IDW
#install.packages("automap")
library(sp)
library(gstat)
library(automap)
data(meuse)
coordinates(meuse) <- ~x+y
proj4string(meuse) <- CRS("+init=epsg:28992")
spplot(meuse,"lead",colorkey=TRUE)
PbVar <- autofitVariogram(lead~1,meuse)
summary(PbVar)
## Experimental variogram:
## np dist gamma dir.hor dir.ver id
## 1 17 59.33470 6408.794 0 0 var1
## 2 36 86.01449 2771.653 0 0 var1
## 3 114 131.02870 5321.110 0 0 var1
## 4 149 176.18845 8594.403 0 0 var1
## 5 184 226.75652 5298.913 0 0 var1
## 6 711 337.60359 9936.921 0 0 var1
## 7 830 502.04773 11819.392 0 0 var1
## 8 1349 713.21485 14507.242 0 0 var1
## 9 1314 961.27179 16061.474 0 0 var1
## 10 1139 1213.41157 15676.574 0 0 var1
## 11 1355 1506.55052 13474.368 0 0 var1
##
## Fitted variogram model:
## model psill range kappa
## 1 Nug 4581.722 0.000 0
## 2 Ste 10863.619 462.919 5
## Sums of squares betw. var. model and sample var.[1] 82081.25
data(meuse.grid)
coordinates(meuse.grid) <- ~x+y
proj4string(meuse.grid) <- CRS("+init=epsg:28992")
meuse.grid <- as(meuse.grid,"SpatialPixelsDataFrame")
plot(meuse.grid)
points(meuse,pch=20)
Here are the IDWs with powers of 1, 2 and 3:
PbIDW1 <- idw(formula=lead~1,locations=meuse,newdata=meuse.grid,idp=1)
## [inverse distance weighted interpolation]
head(PbIDW1@data)
## var1.pred var1.var
## 1 162.2972 NA
## 2 166.8671 NA
## 3 163.7769 NA
## 4 161.1196 NA
## 5 177.1940 NA
## 6 170.2910 NA
PbIDW2 <- idw(formula=lead~1,locations=meuse,newdata=meuse.grid,idp=2)
## [inverse distance weighted interpolation]
head(PbIDW2@data)
## var1.pred var1.var
## 1 192.8410 NA
## 2 212.4564 NA
## 3 198.4250 NA
## 4 186.6732 NA
## 5 250.1697 NA
## 6 224.0157 NA
PbIDW3 <- idw(formula=lead~1,locations=meuse,newdata=meuse.grid,idp=3)
## [inverse distance weighted interpolation]
head(PbIDW3@data)
## var1.pred var1.var
## 1 225.5130 NA
## 2 252.0060 NA
## 3 230.7777 NA
## 4 211.1666 NA
## 5 285.2467 NA
## 6 261.2095 NA
spplot(PbIDW1,"var1.pred") #(Power = 1)
spplot(PbIDW2,"var1.pred") #(Power = 2)
spplot(PbIDW3,"var1.pred") #(Power = 3)
As the power increases, lead concentrations farther away have less and less weight in the interpolated values. In the above plots, this is evidenced by lower and lower interpolated concentrations (darker colors) for regions farther away from the Meuse River where lead concentrations were measured to be low, and brighter and brighter colors in regions next to the river with high measured lead concentrations (the max value in the color scale on the right even increases). In this way, increasing power effectively accentuates differences in concentrations between the regions.
PbVar <- variogram(lead~1, meuse)
summary(PbVar)
## np dist gamma dir.hor dir.ver
## Min. : 57.0 Min. : 79.29 Min. : 3881 Min. :0 Min. :0
## 1st Qu.:435.5 1st Qu.: 425.61 1st Qu.:10771 1st Qu.:0 1st Qu.:0
## Median :477.0 Median : 796.18 Median :13610 Median :0 Median :0
## Mean :458.9 Mean : 799.98 Mean :12676 Mean :0 Mean :0
## 3rd Qu.:545.0 3rd Qu.:1169.60 3rd Qu.:15163 3rd Qu.:0 3rd Qu.:0
## Max. :589.0 Max. :1543.20 Max. :18082 Max. :0 Max. :0
## id
## var1:15
##
##
##
##
##
plot(PbVar,pch=20,cex=1.5,col="black",
ylab=expression("Semivariance ("*gamma*")"),
xlab="Distance (m)", main = "Lead concentrations (ppm)")
sph.model <- vgm(psill=15000, model="Sph", range=1000, nugget=4000)
##(some of the values had to be changed relative to Andy's Cadmium example)
sph.fit <- fit.variogram(object = PbVar, model = sph.model)
plot(PbVar,pch=20,cex=1.5,col="black",
ylab=expression("Semivariance ("*gamma*")"),
xlab="Distance (m)", main = "Lead concentrations (ppm)",model=sph.fit)
PbSph <- krige(lead~1, meuse, meuse.grid, model = sph.fit)
## [using ordinary kriging]
spplot(PbSph,"var1.pred")
Let’s use autofitVariogram()
to do all the heavy lifting. Oh baby, gold prospecting was never easier. Bring on the nuggets:
PbVar <- autofitVariogram(lead~1,meuse)
summary(PbVar)
## Experimental variogram:
## np dist gamma dir.hor dir.ver id
## 1 17 59.33470 6408.794 0 0 var1
## 2 36 86.01449 2771.653 0 0 var1
## 3 114 131.02870 5321.110 0 0 var1
## 4 149 176.18845 8594.403 0 0 var1
## 5 184 226.75652 5298.913 0 0 var1
## 6 711 337.60359 9936.921 0 0 var1
## 7 830 502.04773 11819.392 0 0 var1
## 8 1349 713.21485 14507.242 0 0 var1
## 9 1314 961.27179 16061.474 0 0 var1
## 10 1139 1213.41157 15676.574 0 0 var1
## 11 1355 1506.55052 13474.368 0 0 var1
##
## Fitted variogram model:
## model psill range kappa
## 1 Nug 4581.722 0.000 0
## 2 Ste 10863.619 462.919 5
## Sums of squares betw. var. model and sample var.[1] 82081.25
plot(PbVar)
Handy, huh? It automatically fits and plots the variogram and tells you how many point pairs there are at each distance. Not only can we autofit variograms, but we can autokrige surfaces as well. The following gives us the nice autofitted variogram from (although the plot is super squashed in this layout), as well as interpolated lead concentrations within the Meuse River grid area:
PbKrig <- autoKrige(lead~1, meuse, meuse.grid)
## [using ordinary kriging]
plot(PbKrig)
The standard error of the interpolated values tends to be lower in regions within the grid where there are more data points, for example along the river. This makes good sense, and if we were prospecting for lead (hot commodity that it is) it would probably be smartest to do so near the river, for two reasons:
Thinking about the usefulness of the IDW approach, I do believe that it would be possible to generate an interpolated surface of lead concentrations using a subset of the meuse
data, using different numbers for the power term in the inverse distance weighting formula, and to then use that surface to predict lead concentrations for the remainder of the dataset. I could then compare those interpolated values to the actual measured values at specific locations in order to assess which power makes the most accurate interpolations.
The code required to do this would definitely take me a long time to figure out, especially beacuse I think that I would need to take a random sample of the datapoints as opposed to just subsetting out the first 2/3 of the data or something like that. However, I can’t actually figure out how the data are organized - not by lat/long, and not by measured values (as is evident below) - so maybe the first 2/3 of the values would indeed be a random enough-ish collection of points.
head(meuse@data)
## cadmium copper lead zinc elev dist om ffreq soil lime landuse
## 1 11.7 85 299 1022 7.909 0.00135803 13.6 1 1 1 Ah
## 2 8.6 81 277 1141 6.983 0.01222430 14.0 1 1 1 Ah
## 3 6.5 68 199 640 7.800 0.10302900 13.0 1 1 1 Ah
## 4 2.6 81 116 257 7.655 0.19009400 8.0 1 2 0 Ga
## 5 2.8 48 117 269 7.480 0.27709000 8.7 1 2 0 Ah
## 6 3.0 61 137 281 7.791 0.36406700 7.8 1 2 0 Ga
## dist.m
## 1 50
## 2 30
## 3 150
## 4 270
## 5 380
## 6 470