Applied Spatial Statistics: Problem Set # 3

Tyler Fricker

date()

## [1] "Tue Apr 22 14:47:13 2014"

Due Date: April 23, 2014

Total Points: 50

The objective of this problem set is to have you demonstrate skill at using functions in the geoR (or gstat) package for creating an interpolated surface using the methods of kriging.

require(ggplot2)

## Loading required package: ggplot2

require(geoR)

## Loading required package: geoR Loading required package: sp Loading
## required package: MASS
## -------------------------------------------------------------- Analysis of
## geostatistical data For an Introduction to geoR go to
## http://www.leg.ufpr.br/geoR geoR version 1.7-4 (built on 2012-06-29) is
## now loaded --------------------------------------------------------------

summary(parana)

## $n
## [1] 143
## 
## $coords.summary
##      east  north
## min 150.1  70.36
## max 768.5 461.97
## 
## $distances.summary
##   min   max 
##   1.0 619.5 
## 
## $data.summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     163     234     270     274     318     414 
## 
## $borders.summary
##      east  north
## min 138.0  46.77
## max 798.6 507.93
## 
## $others
## [1] "loci.paper"
## 
## attr(,"class")
## [1] "summary.geodata"

plot(parana, bor = borders)

plot of chunk data

There are 143 recording stations. The spatial bounding box is given under coordinates summary. The minimum distance between sites is 1.00 degrees and the maximum distance is 619.5 degrees.

Examine the data for spatial trends and normality.

plot(parana, trend = "1st")

plot of chunk partone

plot(parana, trend = "2nd")

plot of chunk partone

There is a clear trend in the original data set. With the 1st trend removed, there still appears to be a trend between the x coordinate and residuals graph. Clustering appears to be visible on both graphs, so the next step is removing the 2nd trend. When this is done, the clusting remains slightly, but the linear relationship leaves.

Compute empirical variograms on the residuals.

plot(variog(parana, trend = "2nd", max.dist = 309.75))

## variog: computing omnidirectional variogram

plot of chunk parttwo

plot(variog4(parana, trend = "2nd", max.dist = 309.75))

## variog: computing variogram for direction = 0 degrees (0 radians)
##         tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing variogram for direction = 45 degrees (0.785 radians)
##         tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing variogram for direction = 90 degrees (1.571 radians)
##         tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing variogram for direction = 135 degrees (2.356 radians)
##         tolerance angle = 22.5 degrees (0.393 radians)
## variog: computing omnidirectional variogram

plot of chunk parttwo

Fit a variogram model.

iv = c(900, 200)
summary(likfit(parana, ini = iv, cov.model = "exp", trend = "2nd", fix.nug = TRUE, 
    message = FALSE))$likelihood$AIC

## [1] 1347

summary(likfit(parana, ini = iv, cov.model = "exp", trend = "2nd", fix.nug = FALSE, 
    message = FALSE))$likelihood$AIC

## [1] 1338

summary(likfit(parana, ini = iv, cov.model = "sph", trend = "2nd", fix.nug = TRUE, 
    message = FALSE))$likelihood$AIC

## [1] 1350

summary(likfit(parana, ini = iv, cov.model = "sph", trend = "2nd", fix.nug = FALSE, 
    message = FALSE))$likelihood$AIC

## [1] 1338

It appears that a good variogram model would be on the residuals of a 2nd order trend using a spherical function without a fixed nugget equal to zero. An exponential function without a fixed nugget is also a reasonable model.

Now obtain the model parameters

summary(likfit(parana, ini = iv, cov.model = "sph", trend = "2nd", fix.nug = FALSE))

## kappa not used for the spherical correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

## Summary of the parameter estimation
## -----------------------------------
## Estimation method: maximum likelihood 
## 
## Parameters of the mean component (trend):
##   beta0   beta1   beta2   beta3   beta4   beta5 
## 435.493   0.048  -0.701   0.000   0.000   0.001 
## 
## Parameters of the spatial component:
##    correlation function: spherical
##       (estimated) variance parameter sigmasq (partial sill) =  361
##       (estimated) cor. fct. parameter phi (range parameter)  =  215
##    anisotropy parameters:
##       (fixed) anisotropy angle = 0  ( 0 degrees )
##       (fixed) anisotropy ratio = 1
## 
## Parameter of the error component:
##       (estimated) nugget =  420
## 
## Transformation parameter:
##       (fixed) Box-Cox parameter = 1 (no transformation)
## 
## Practical Range with cor=0.05 for asymptotic range: 214.5
## 
## Maximised Likelihood:
##    log.L n.params      AIC      BIC 
##   "-660"      "9"   "1338"   "1364" 
## 
## non spatial model:
##    log.L n.params      AIC      BIC 
##   "-671"      "7"   "1356"   "1376" 
## 
## Call:
## likfit(geodata = parana, trend = "2nd", ini.cov.pars = iv, fix.nugget = FALSE, 
##     cov.model = "sph")

summary(likfit(parana, ini = iv, cov.model = "exp", trend = "2nd", fix.nug = FALSE))

## kappa not used for the exponential correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

## Summary of the parameter estimation
## -----------------------------------
## Estimation method: maximum likelihood 
## 
## Parameters of the mean component (trend):
##   beta0   beta1   beta2   beta3   beta4   beta5 
## 423.928   0.062  -0.636   0.000   0.000   0.001 
## 
## Parameters of the spatial component:
##    correlation function: exponential
##       (estimated) variance parameter sigmasq (partial sill) =  373
##       (estimated) cor. fct. parameter phi (range parameter)  =  77.5
##    anisotropy parameters:
##       (fixed) anisotropy angle = 0  ( 0 degrees )
##       (fixed) anisotropy ratio = 1
## 
## Parameter of the error component:
##       (estimated) nugget =  381
## 
## Transformation parameter:
##       (fixed) Box-Cox parameter = 1 (no transformation)
## 
## Practical Range with cor=0.05 for asymptotic range: 232.3
## 
## Maximised Likelihood:
##    log.L n.params      AIC      BIC 
##   "-660"      "9"   "1338"   "1365" 
## 
## non spatial model:
##    log.L n.params      AIC      BIC 
##   "-671"      "7"   "1356"   "1376" 
## 
## Call:
## likfit(geodata = parana, trend = "2nd", ini.cov.pars = iv, fix.nugget = FALSE, 
##     cov.model = "exp")

Next, plot the two models on the empirical variogram.

plot(variog(parana, trend = "2nd", uvec = seq(0, 309.75, l = 29)))

## variog: computing omnidirectional variogram

lines.variomodel(cov.model = "sph", cov.pars = c(360.8, 214.5), nug = 419.9, 
    max.dist = 309.75, col = "green")
lines.variomodel(cov.model = "exp", cov.pars = c(372.6, 77.54), nug = 381.2, 
    max.dist = 309.75, col = "red")

plot of chunk partthree3

Save the models.

t.modelS = likfit(parana, ini = iv, cov.model = "sph", trend = "2nd", fix.nug = FALSE)

## kappa not used for the spherical correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

t.modelE = likfit(parana, ini = iv, cov.model = "exp", trend = "2nd", fix.nug = FALSE)

## kappa not used for the exponential correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

Construct an interpolated surface using a 200 by 200 grid.

pred.grid = expand.grid(seq(50, 800, l = 200), seq(0, 600, l = 200))
kcS = krige.conv(parana, loc = pred.grid, krige = krige.control(trend.d = "2nd", 
    trend.l = "2nd", obj.m = t.modelS))

## krige.conv: results will be returned only for prediction locations inside the borders
## krige.conv: model with mean given by a 2nd order polynomial on the coordinates
## krige.conv: Kriging performed using global neighbourhood

kcE = krige.conv(parana, loc = pred.grid, krige = krige.control(trend.d = "2nd", 
    trend.l = "2nd", obj.m = t.modelE))

## krige.conv: results will be returned only for prediction locations inside the borders
## krige.conv: model with mean given by a 2nd order polynomial on the coordinates
## krige.conv: Kriging performed using global neighbourhood

Plot the surface

image(kcS, col = rev(heat.colors(20)), xlab = "Longitude", ylab = "Latitude")
title(main = "Spherical Model w/ Non-zero Nugget")
contour(kcS, add = TRUE, nlevel = 20)

plot of chunk partfour2

image(kcE, col = rev(heat.colors(20)), xlab = "Longitude", ylab = "Latitude")
title(main = "Exponential Model w/ Non-zero Nugget")
contour(kcE, add = TRUE, nlevel = 20)

plot of chunk partfour2

The model with a non-zero nugget is smoother. The greater the nugget (relative to the sill), the smoother the interpolation. Repeat using a spherical model and fixed nugget equal to 1.

t.modelS2 = likfit(parana, ini = iv, cov.model = "sph", trend = "2nd", fix.nug = TRUE, 
    nugget = 1)

## kappa not used for the spherical correlation function
## ---------------------------------------------------------------
## likfit: likelihood maximisation using the function optim.
## likfit: Use control() to pass additional
##          arguments for the maximisation function.
##         For further details see documentation for optim.
## likfit: It is highly advisable to run this function several
##         times with different initial values for the parameters.
## likfit: WARNING: This step can be time demanding!
## ---------------------------------------------------------------
## likfit: end of numerical maximisation.

kcS2 = krige.conv(parana, loc = pred.grid, krige = krige.control(trend.d = "2nd", 
    trend.l = "2nd", obj.m = t.modelS2))

## krige.conv: results will be returned only for prediction locations inside the borders
## krige.conv: model with mean given by a 2nd order polynomial on the coordinates
## krige.conv: Kriging performed using global neighbourhood

image(kcS2, col = rev(heat.colors(20)), xlab = "Longitude", ylab = "Latitude")
title(main = "Spherical Model w/ Nugget = 1")
contour(kcS2, add = TRUE, nlevel = 20)

plot of chunk partfour3

Examine the prediction errors.

xv = xvalid(parana, model = t.modelS)

## xvalid: number of data locations       = 143
## xvalid: number of validation locations = 143
## xvalid: performing cross-validation at location ... 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 
## xvalid: end of cross-validation

plot(xv$data, xv$predicted, xlab = "Observed Rainfall (inches)", ylab = "Predicted Rainfall (inches) at Observed Location", 
    asp = 1, col = "red", pch = 20)
abline(0, 1)
abline(lm(xv$predicted ~ xv$data), col = "red")

plot of chunk partfive

mean(xv$error^2)

## [1] 536.6

mean(abs(xv$error))

## [1] 17.87

Thus, on average, we can expect to make about 18 inches of error in the prediction of the mean rainfall at an arbitrary location in the domain.