Applied Spatial Statistics: Problem Set # 3

Ryan Truchelut

date()

## [1] "Wed Apr 23 16:40:36 2014"

Due Date: April 23, 2014

Total Points: 50

The objective of this problem set is to have you demonstrate skill at using functions in the geoR (or gstat) package for creating an interpolated surface using the methods of kriging.

require(geoR)

## Loading required package: geoR
## Loading required package: sp
## Loading required package: MASS
## --------------------------------------------------------------
##  Analysis of geostatistical data
##  For an Introduction to geoR go to http://www.leg.ufpr.br/geoR
##  geoR version 1.7-4 (built on 2012-06-29) is now loaded
## --------------------------------------------------------------

require(ggplot2)

## Loading required package: ggplot2

summary(parana)

## $n
## [1] 143
## 
## $coords.summary
##      east  north
## min 150.1  70.36
## max 768.5 461.97
## 
## $distances.summary
##   min   max 
##   1.0 619.5 
## 
## $data.summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     163     234     270     274     318     414 
## 
## $borders.summary
##      east  north
## min 138.0  46.77
## max 798.6 507.93
## 
## $others
## [1] "loci.paper"
## 
## attr(,"class")
## [1] "summary.geodata"

plot(parana, bor = borders)

plot of chunk data

Examine the data for spatial trends and normality.

The original plot of the spatial data shows evidence of both a north-south and an east-west trend.

plot(parana, bor = borders, trend = "1st")

plot of chunk 1stordertrend

Removing the first-order trend still leaves evidence of the east-west trend, and the residual plot is not distributed normally.

plot(parana, bor = borders, trend = "2nd")

plot of chunk 2ndordertrend

Removing the second-order and all lower trends results in a much more normally distributed residuals plot. There is no remaining spatial trend, although some clustering is still noted.

Compute empirical variograms on the residuals.

The maximum distance between points is 620 degrees, so the max distance plotted in the variogram is set at half of that value, or 310 degrees. The second-order trend term is included here so the variogram is computed on the residuals.

plot(variog(parana, trend = "2nd", max.dist = 310))

## variog: computing omnidirectional variogram

plot of chunk plotVariogramWCA

Check for anisotropy by plotting directional variograms.

plot(variog4(parana, trend = "2nd", max.dist = 310))

## variog: computing variogram for direction = 0 degrees (0 radians)
##         tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing variogram for direction = 45 degrees (0.785 radians)
##         tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing variogram for direction = 90 degrees (1.571 radians)
##         tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing variogram for direction = 135 degrees (2.356 radians)
##         tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing omnidirectional variogram

plot of chunk directionalVariogramvariog4FunctionWCA

The four directional variograms look similar to one another, especially at closer distances where there are more observations. Directional variograms are not needed in this case because there is not strong evidence for anisotropy.

Fit a variogram model.

Fitting the variogram model involves choosing an exponential or spherical model and examining the AIC for final parameter selection. Initial values for the sill and range are estimated to be approximately 650 and 200 from the variogram in Step 2.

iv = c(650, 200)
summary(likfit(parana, ini = iv, cov.model = "exp", trend = "2nd", fix.nug = TRUE, 
    message = FALSE))$likelihood$AIC

## [1] 1347

summary(likfit(parana, ini = iv, cov.model = "exp", trend = "2nd", fix.nug = FALSE, 
    message = FALSE))$likelihood$AIC

## [1] 1338

summary(likfit(parana, ini = iv, cov.model = "sph", trend = "2nd", fix.nug = TRUE, 
    message = FALSE))$likelihood$AIC

## [1] 1350

summary(likfit(parana, ini = iv, cov.model = "sph", trend = "2nd", fix.nug = FALSE, 
    message = FALSE))$likelihood$AIC

## [1] 1338

It appears that a good variogram model would be on the residuals of a 2nd order trend using a spherical function with a variable nugget. An exponential function with a variable nugget is also a good model. We now get the model parameters for these two competing models, plot them on the empirical variogram, and save the model.

likfit(parana, ini = iv, cov.model = "sph", trend = "2nd", fix.nug = FALSE)

## kappa not used for the spherical correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

## likfit: estimated model parameters:
##      beta0      beta1      beta2      beta3      beta4      beta5 
## "435.4929" "  0.0477" " -0.7009" " -0.0004" "  0.0001" "  0.0006" 
##      tausq    sigmasq        phi 
## "419.9149" "360.8187" "214.5301" 
## Practical Range with cor=0.05 for asymptotic range: 214.5
## 
## likfit: maximised log-likelihood = -660

likfit(parana, ini = iv, cov.model = "exp", trend = "2nd", fix.nug = FALSE)

## kappa not used for the exponential correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

## likfit: estimated model parameters:
##      beta0      beta1      beta2      beta3      beta4      beta5 
## "423.9281" "  0.0620" " -0.6360" " -0.0004" "  0.0000" "  0.0006" 
##      tausq    sigmasq        phi 
## "381.2269" "372.5977" " 77.5437" 
## Practical Range with cor=0.05 for asymptotic range: 232.3
## 
## likfit: maximised log-likelihood = -660


plot(variog(parana, trend = "2nd", max.dist = 310))

## variog: computing omnidirectional variogram

lines.variomodel(cov.model = "sph", cov.pars = c(360.81, 214.53), nug = 419.91, 
    max.dist = 310, col = "red")
lines.variomodel(cov.model = "exp", cov.pars = c(372.59, 77.5437), nug = 381.23, 
    max.dist = 310, col = "green")

plot of chunk modelParametersandPlot

parana.modelE = likfit(parana, ini = iv, cov.model = "exp", trend = "2nd", fix.nug = FALSE)

## kappa not used for the exponential correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

parana.modelS = likfit(parana, ini = iv, cov.model = "sph", trend = "2nd", fix.nug = FALSE)

## kappa not used for the spherical correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

Based on the plot, it appears the exponential model is a better fit to the empirical variogram.

Construct an interpolated surface using a 200 by 200 grid.

pred.grid = expand.grid(seq(0, 800, l = 200), seq(0, 600, l = 200))
kcE = krige.conv(parana, loc = pred.grid, krige = krige.control(trend.d = "2nd", 
    trend.l = "2nd", obj.m = parana.modelE))

## krige.conv: results will be returned only for prediction locations inside the borders
## krige.conv: model with mean given by a 2nd order polynomial on the coordinates
## krige.conv: Kriging performed using global neighbourhood

kcS = krige.conv(parana, loc = pred.grid, krige = krige.control(trend.d = "2nd", 
    trend.l = "2nd", obj.m = parana.modelS))

## krige.conv: results will be returned only for prediction locations inside the borders
## krige.conv: model with mean given by a 2nd order polynomial on the coordinates
## krige.conv: Kriging performed using global neighbourhood

Having generated the Kriged surface, we now plot the surface.

image(kcE, col = rev(heat.colors(20)), ylab = "", xlab = "")
title(main = "Exponential Model w/ Non-Zero Nugget")
map("state", add = TRUE, col = "grey")

## Error: could not find function "map"

contour(kcE, add = TRUE, nlevel = 20)

plot of chunk plotImage

image(kcS, col = rev(heat.colors(20)), ylab = "", xlab = "")
title(main = "Spherical Model w/ Non-Zero Nugget")
map("state", add = TRUE, col = "grey")

## Error: could not find function "map"

contour(kcS, add = TRUE, nlevel = 20)

plot of chunk plotImage

The spherical model is smoother than the exponential model, which seems to better capture some of the local variability. Overall, the two models behave quite similarly.

Examine the prediction errors.

xv = xvalid(parana, model = parana.modelE)

## xvalid: number of data locations       = 143
## xvalid: number of validation locations = 143
## xvalid: performing cross-validation at location ... 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 
## xvalid: end of cross-validation

df = data.frame(x = xv$data, y = xv$predicted)
ggplot(df, aes(x, y)) + geom_point() + geom_abline(intercept = 0, slope = 1) + 
    geom_smooth(method = lm, color = "red")

plot of chunk xValidate

mean(xv$error^2)

## [1] 536.8

mean(abs(xv$error))

## [1] 17.9

Validating the exponential model, the absolute mean error in predicted Parana winter rainfall across all reporting stations is about 17mm. There does not appear to be any bias in the residuals, so the model does appear to be well-fitted.