date()
## [1] "Wed Apr 23 16:40:36 2014"
The objective of this problem set is to have you demonstrate skill at using functions in the geoR (or gstat) package for creating an interpolated surface using the methods of kriging.
require(geoR)
## Loading required package: geoR
## Loading required package: sp
## Loading required package: MASS
## --------------------------------------------------------------
## Analysis of geostatistical data
## For an Introduction to geoR go to http://www.leg.ufpr.br/geoR
## geoR version 1.7-4 (built on 2012-06-29) is now loaded
## --------------------------------------------------------------
require(ggplot2)
## Loading required package: ggplot2
summary(parana)
## $n
## [1] 143
##
## $coords.summary
## east north
## min 150.1 70.36
## max 768.5 461.97
##
## $distances.summary
## min max
## 1.0 619.5
##
## $data.summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 163 234 270 274 318 414
##
## $borders.summary
## east north
## min 138.0 46.77
## max 798.6 507.93
##
## $others
## [1] "loci.paper"
##
## attr(,"class")
## [1] "summary.geodata"
plot(parana, bor = borders)
The original plot of the spatial data shows evidence of both a north-south and an east-west trend.
plot(parana, bor = borders, trend = "1st")
Removing the first-order trend still leaves evidence of the east-west trend, and the residual plot is not distributed normally.
plot(parana, bor = borders, trend = "2nd")
Removing the second-order and all lower trends results in a much more normally distributed residuals plot. There is no remaining spatial trend, although some clustering is still noted.
The maximum distance between points is 620 degrees, so the max distance plotted in the variogram is set at half of that value, or 310 degrees. The second-order trend term is included here so the variogram is computed on the residuals.
plot(variog(parana, trend = "2nd", max.dist = 310))
## variog: computing omnidirectional variogram
Check for anisotropy by plotting directional variograms.
plot(variog4(parana, trend = "2nd", max.dist = 310))
## variog: computing variogram for direction = 0 degrees (0 radians)
## tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing variogram for direction = 45 degrees (0.785 radians)
## tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing variogram for direction = 90 degrees (1.571 radians)
## tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing variogram for direction = 135 degrees (2.356 radians)
## tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing omnidirectional variogram
The four directional variograms look similar to one another, especially at closer distances where there are more observations. Directional variograms are not needed in this case because there is not strong evidence for anisotropy.
Fitting the variogram model involves choosing an exponential or spherical model and examining the AIC for final parameter selection. Initial values for the sill and range are estimated to be approximately 650 and 200 from the variogram in Step 2.
iv = c(650, 200)
summary(likfit(parana, ini = iv, cov.model = "exp", trend = "2nd", fix.nug = TRUE,
message = FALSE))$likelihood$AIC
## [1] 1347
summary(likfit(parana, ini = iv, cov.model = "exp", trend = "2nd", fix.nug = FALSE,
message = FALSE))$likelihood$AIC
## [1] 1338
summary(likfit(parana, ini = iv, cov.model = "sph", trend = "2nd", fix.nug = TRUE,
message = FALSE))$likelihood$AIC
## [1] 1350
summary(likfit(parana, ini = iv, cov.model = "sph", trend = "2nd", fix.nug = FALSE,
message = FALSE))$likelihood$AIC
## [1] 1338
It appears that a good variogram model would be on the residuals of a 2nd order trend using a spherical function with a variable nugget. An exponential function with a variable nugget is also a good model. We now get the model parameters for these two competing models, plot them on the empirical variogram, and save the model.
likfit(parana, ini = iv, cov.model = "sph", trend = "2nd", fix.nug = FALSE)
## kappa not used for the spherical correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
## arguments for the maximisation function.
## For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
## times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.
## likfit: estimated model parameters:
## beta0 beta1 beta2 beta3 beta4 beta5
## "435.4929" " 0.0477" " -0.7009" " -0.0004" " 0.0001" " 0.0006"
## tausq sigmasq phi
## "419.9149" "360.8187" "214.5301"
## Practical Range with cor=0.05 for asymptotic range: 214.5
##
## likfit: maximised log-likelihood = -660
likfit(parana, ini = iv, cov.model = "exp", trend = "2nd", fix.nug = FALSE)
## kappa not used for the exponential correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
## arguments for the maximisation function.
## For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
## times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.
## likfit: estimated model parameters:
## beta0 beta1 beta2 beta3 beta4 beta5
## "423.9281" " 0.0620" " -0.6360" " -0.0004" " 0.0000" " 0.0006"
## tausq sigmasq phi
## "381.2269" "372.5977" " 77.5437"
## Practical Range with cor=0.05 for asymptotic range: 232.3
##
## likfit: maximised log-likelihood = -660
plot(variog(parana, trend = "2nd", max.dist = 310))
## variog: computing omnidirectional variogram
lines.variomodel(cov.model = "sph", cov.pars = c(360.81, 214.53), nug = 419.91,
max.dist = 310, col = "red")
lines.variomodel(cov.model = "exp", cov.pars = c(372.59, 77.5437), nug = 381.23,
max.dist = 310, col = "green")
parana.modelE = likfit(parana, ini = iv, cov.model = "exp", trend = "2nd", fix.nug = FALSE)
## kappa not used for the exponential correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
## arguments for the maximisation function.
## For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
## times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.
parana.modelS = likfit(parana, ini = iv, cov.model = "sph", trend = "2nd", fix.nug = FALSE)
## kappa not used for the spherical correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
## arguments for the maximisation function.
## For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
## times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.
Based on the plot, it appears the exponential model is a better fit to the empirical variogram.
pred.grid = expand.grid(seq(0, 800, l = 200), seq(0, 600, l = 200))
kcE = krige.conv(parana, loc = pred.grid, krige = krige.control(trend.d = "2nd",
trend.l = "2nd", obj.m = parana.modelE))
## krige.conv: results will be returned only for prediction locations inside the borders
## krige.conv: model with mean given by a 2nd order polynomial on the coordinates
## krige.conv: Kriging performed using global neighbourhood
kcS = krige.conv(parana, loc = pred.grid, krige = krige.control(trend.d = "2nd",
trend.l = "2nd", obj.m = parana.modelS))
## krige.conv: results will be returned only for prediction locations inside the borders
## krige.conv: model with mean given by a 2nd order polynomial on the coordinates
## krige.conv: Kriging performed using global neighbourhood
Having generated the Kriged surface, we now plot the surface.
image(kcE, col = rev(heat.colors(20)), ylab = "", xlab = "")
title(main = "Exponential Model w/ Non-Zero Nugget")
map("state", add = TRUE, col = "grey")
## Error: could not find function "map"
contour(kcE, add = TRUE, nlevel = 20)
image(kcS, col = rev(heat.colors(20)), ylab = "", xlab = "")
title(main = "Spherical Model w/ Non-Zero Nugget")
map("state", add = TRUE, col = "grey")
## Error: could not find function "map"
contour(kcS, add = TRUE, nlevel = 20)
The spherical model is smoother than the exponential model, which seems to better capture some of the local variability. Overall, the two models behave quite similarly.
xv = xvalid(parana, model = parana.modelE)
## xvalid: number of data locations = 143
## xvalid: number of validation locations = 143
## xvalid: performing cross-validation at location ... 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
## xvalid: end of cross-validation
df = data.frame(x = xv$data, y = xv$predicted)
ggplot(df, aes(x, y)) + geom_point() + geom_abline(intercept = 0, slope = 1) +
geom_smooth(method = lm, color = "red")
mean(xv$error^2)
## [1] 536.8
mean(abs(xv$error))
## [1] 17.9
Validating the exponential model, the absolute mean error in predicted Parana winter rainfall across all reporting stations is about 17mm. There does not appear to be any bias in the residuals, so the model does appear to be well-fitted.