The Data

The ca20 dataset in the geoR package contains the calcium content measured in soil samples taken from the 0-20cm layer at 178 locations within a certain study area divided in three sub-areas. The elevation at each location was also recorded.

The first region is typically flooded during the rain season and not used as an experimental area. The calcium levels would represent the natural content in the region. The second region has received fertilisers a while ago and is typically occupied by rice fields. The third region has received fertilisers recently and is frequently used as an experimental area.

Load geoR package and the calcium content data

library(geoR)
data(ca20)

Part (a)

The following code produces a single informative map of the calcium content values at all soil sample locations. I make the ca20 data into a data frame here with the major features to use ggplot. color is determined by what region the sample is taken from, and size by the calcium content values at each point.

library(ggplot2)
calcium <- as.data.frame(ca20$coords)
calcium$values <- ca20$data
calcium <- cbind(calcium,ca20$covariate)

ggplot(calcium,aes(x=east,y=north,colour=area,size=values)) +
    geom_point() + 
    scale_colour_manual(values=c("firebrick3","navy","gold"),name="Sub-area") +
    scale_size(guide=guide_legend(title="Calcium Content \n (mmol / liter)")) +
    theme(legend.position="right") +
    labs(title="Location and Calcium Content Values of ca20 dataset") +
    xlab("Eastern Soil Sample Location Coordinates") +
    ylab("Northern Soil Sample Location Coordinates")

Part (b)

Here I compute the empirical variogram of the data, and make a plot of the variogram values.

# function variog() is used to easily compute the empirical variogram of a geoR dataset
vg <- variog(ca20)
## variog: computing omnidirectional variogram
plot(vg,main="Empirical Variogram")
lines(vg$u,vg$v)

Next, I fit an exponential variogram model with a nugget term to the empirical variogram, and add the fitted variogram function to the empirical variogram plot

vfit <- variofit(vg, ini.cov.pars = c(100,100), cov.model = "exponential")
## variofit: covariance model used is exponential 
## variofit: weights used: npairs 
## variofit: minimisation function used: optim
covvec <- vfit$nugget*(vg$u==0) + vfit$cov.pars[1]*exp(-vg$u/vfit$cov.pars[2])
plot(vg$u,vg$v,ylim=c(0,160),xlab="Distance",ylab="Semivariance",col="red")
lines(vg$u,vg$v,col="red")
lines(vg$u,vfit$nugget + vfit$cov.pars[1] - covvec,lwd=2)
title(main = "Empirical Variogram with Fitted Variogram Function")
legend(800,30,c("original","fitted"),
       lty=c(1,1),lwd=c(1,2),col=c("red","black"))

Part (c)

Using the simulate_data_exp() function provided on the course webpage, I now simulate 20 datasets from the fitted variogram model.

simulate_data_exp = function( coords, meanvalue, cov.pars, nugget ){
  n = dim(coords)[1]
  distmat = as.matrix(dist(coords,diag=TRUE,upper=TRUE))
  covmat = cov.pars[1]*exp(-distmat/cov.pars[2]) + nugget*diag(n)
  cholmat = t(chol(covmat))
  simdata = (cholmat %*% rnorm(n) ) + meanvalue
  return(simdata)  
}

for(i in 1:20){
  assign(paste("simdata", i, sep = ""), simulate_data_exp(calcium[,1:2], mean(calcium$values),vfit$cov.pars,vfit$nugget))   
}

For each of these 20 simulated datasets, I now compute their empirical variogram.

for(i in 1:20){
     var <- get(paste("simdata",i,sep=""))
  assign(paste("simvg", i, sep = ""), variog(coords=calcium[,1:2],data=var))    
}
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram
## variog: computing omnidirectional variogram

I now plot the original data variogram, the fitted variogram, and the 20 empirical variograms from the simulated data in a single figure.

plot(vg,main="Variogram Plots",ylim=c(0,285),cex=1.5)  # plot original vg
lines(vg$u,vg$v,col="red",lwd=2.5) # connect original vg points with red line
lines(vg$u,vfit$nugget + vfit$cov.pars[1] - covvec,lwd=2.5) # add fitted vg


# add simulated data empirical vg's
for(j in 1:20){
    var <- get(paste("simvg",j,sep=""))
    lines(var$u,var$v,col="blue",lwd=0.75)
}
legend(40,270,c("original","fitted","simulated"),
       lty=c(1,1,1),lwd=c(2.5,2.5,0.75),col=c("red","black","blue"))

Part (d)

Overall, the empirical variogram values of the simulated data are quite variable.

In particular, this variability is quite large past a lag distance of 400.

When the fitted variogram is farthest from the original (lag distance > 800), almost all of the simulated data empirical variogram values plunge downward.

The large degree of variability in the observed values for the simulated data is likely due to the adequacy of the fitted variogram function in simulating the data.