Lab 12 Exercises

Eric Goodwin || November 27, 2021

Exercise 1

library(sf)
library(stars)
library(gstat)
library(RColorBrewer)

The compressed file oregon.zip contains data on average annual temperatures for Oregon from a set of climate stations in the shapefile oregontann.shp in a variable called ‘tann’, and station elevation in a variable called elevation. A second file orgrid.shp contains a set of gridded elevations for the state at 10 minute resolution. Code is given below to read in these data and convert to sf and stars objects for geostatistical analysis.

This is the code to read in and convert all the necessary data to sf and starts objects. Note that we have to assign the CRS to each object. The data are in unprojected longitude/latitude, so we use the WGS84 description (EPSG code 4326). The gridded elevation is in an odd format: point data into a shapefile. Here we use st_rasterize() to force this into a full grid for interpolation.

# Oregon boundaries
orotl <- st_read("../datafiles/oregon/orotl.shp", quiet = TRUE)
st_crs(orotl) <- 4326

# Oregon temperatures
ortann <- st_read("../datafiles/oregon/oregontann.shp", quiet = TRUE)
st_crs(ortann) <- 4326

# Oregon DEM file
orgrid <- st_read("../datafiles/oregon/orgrid.shp", quiet = TRUE) 
st_crs(orgrid) <- 4326
orgrid.dem <- st_rasterize(orgrid, dx = 0.1667, dy = 0.1667)
st_crs(orgrid.dem) <- 4326

# plot
plot(orgrid.dem, reset = FALSE)

plot(ortann["tann"], 
     add = TRUE, 
     pch = 16, 
     cex = 1.5)

Using the gstat library, carry out the following analyses:

Read in the files and produce a sample variogram for average annual temperatures in Oregon. (Use the variogram() function)

First I will plot a histogram of the Oregon temperature variables to see whether we need to log-transform them

hist(ortann$tann)

We can see the data is left-skewed so I will log transform the temperature variable and re-plot it to see whether the log-transformed data follows a normal distribution

ortann$ltann = log(ortann$tann)
hist(ortann$ltann)

There still seems to be a positive skewness, so I tried to apply further transformations…

ortann$sqtann = sqrt(max(ortann$tann +1) - ortann$tann)
hist(ortann$sqtann)

That’s better. Thanks, Google!

ortann.var = variogram(sqtann ~ 1, data = ortann)
plot(ortann.var, plot.numbers = TRUE, pch = '+')

Create a variogram model for this data using the vgm() function. You will need to choose an appropriate model and initial parameters for the nugget, sill and range. Report the values and model you have used

ortann.nugget = 0.03
ortann.range = 150
ortann.sill = 0.24

ortann.vgm1 <- vgm(psill = ortann.sill, 
                  model = "Sph", 
                  range = ortann.range, 
                  nugget = ortann.nugget)

plot(ortann.var, ortann.vgm1, main = "Oregon temperature variogram")

That doesn’t look right. While range parameters might be good, the sill is set too high, the nugget might need adjustment, and the spherical model increases too rapidly. Perhaps this is a result of my log transformation? After trying other models, the best fit I could get came from a circular model, with adjusted nugget and sill.

ortann.nugget2 = 0.0
ortann.sill3 = 0.23

ortann.vgm3 <- vgm(psill = ortann.sill3, 
                   model = "Cir", 
                   range = ortann.range, 
                   nugget = ortann.nugget2)

plot(ortann.var, ortann.vgm3, main = "Oregon temperature variogram")

I am still not completely satisfied as the semivariance before the model flattens looks overpredicted, but this was the best I could get after multiple iterations. I tried Gaussian and exponential models; neither worked better than this. I think this is close enough to fit the variogram.

Use the fit.variogram() function to fit this model to the sample variogram from step a. Produce a plot showing the final variogram

ortann.vgmfit <- fit.variogram(ortann.var, ortann.vgm3)

plot(ortann.var, ortann.vgmfit, main = "Oregon temperature variogram")

Better!

Now use this model to interpolate the annual temperatures using the grid from the DEM, using the krige() function. Produce a map showing the predicted value on the grid and the prediction error.

ortemp.krige = krige(sqtann ~ 1, 
                     locations = ortann, 
                     newdata = orgrid.dem, 
                     model = ortann.vgmfit, 
                     nmax = 100)
## [using ordinary kriging]
BuPu.10breaks = brewer.pal(10, "BuPu")
## Warning in brewer.pal(10, "BuPu"): n too large, allowed maximum for palette BuPu is 9
## Returning the palette you asked for with that many colors
plot(ortemp.krige, col = BuPu.10breaks, main = "Oregon interpolated temp. values")

Map of the prediction error:

names(ortemp.krige)
## [1] "var1.pred" "var1.var"
plot(ortemp.krige["var1.var"], 
     main = "Oregon interpolated temp. values (error)")

Use the krige.cv() function with 5-fold cross-validation to report the root mean squared error and R^2

ortemp.cv.krige <- krige.cv(sqtann ~ 1, 
                      locations = ortann, 
                      model = ortann.vgmfit, 
                      nmax = 100, 
                      nfold = 5)
head(ortemp.cv.krige)  
## Simple feature collection with 6 features and 6 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -123.883 ymin: 42.217 xmax: -117.817 ymax: 46.15
## Geodetic CRS:  WGS 84
##   var1.pred   var1.var observed    residual     zscore fold
## 1  2.265273 0.06676328 2.000000 -0.26527340 -1.0266558    5
## 2  1.646372 0.08653667 1.048809 -0.59756324 -2.0313455    3
## 3  1.501096 0.07976511 1.581139  0.08004252  0.2834094    5
## 4  1.708684 0.07257127 1.816590  0.10790643  0.4005576    1
## 5  2.195598 0.04592583 2.449490  0.25389192  1.1847327    4
## 6  2.362628 0.04849635 2.280351 -0.08227673 -0.3736135    5
##                  geometry
## 1 POINT (-120.717 44.917)
## 2   POINT (-120.2 45.717)
## 3 POINT (-122.717 42.217)
## 4  POINT (-123.883 46.15)
## 5 POINT (-117.817 44.833)
## 6 POINT (-117.833 44.783)
sqrt(mean(ortemp.cv.krige$residual^2))
## [1] 0.2881333
cor(ortemp.cv.krige$observed, ortemp.cv.krige$var1.pred)^2
## [1] 0.6109906

Because I used the adjusted values, these numbers don’t really mean much. So, I decided to compare the RMSEP to the mean sqtann value

rmsepcompare = 100*((sqrt(mean(ortemp.cv.krige$residual^2)))/(mean(ortann$sqtann)))
rmsepcompare
## [1] 15.39712

I may be wrong, but: For a mean temperature value, my model will be, generally, about 15.3971243% off. (this value varies each time I run the code!)