Exercise 1

The compressed file oregon.zip contains data on average annual temperatures for Oregon from a set of climate stations in the shapefile oregontann.shp in a variable called ‘tann’, and station elevation in a variable called elevation. A second file orgrid.shp contains a set of gridded elevations for the state at 10 minute resolution. Code is given below to read in these data and convert to sf and stars objects for geostatistical analysis. Using the gstat library, carry out the following analyses:

Read in the files and produce a sample variogram for average annual temperatures in Oregon. (Use the variogram() function)

# Oregon boundaries
orotl <- st_read("../datafiles/oregon/oregon/orotl.shp", quiet = TRUE)
st_crs(orotl) <- 4326

# Oregon temperatures
ortann <- st_read("../datafiles/oregon/oregon/oregontann.shp", quiet = TRUE)
st_crs(ortann) <- 4326

# Oregon DEM file
orgrid <- st_read("../datafiles/oregon/oregon/orgrid.shp", quiet = TRUE) 
st_crs(orgrid) <- 4326
orgrid.dem <- st_rasterize(orgrid, dx = 0.1667, dy = 0.1667)

# plot
plot(orgrid.dem, reset = FALSE)

plot(ortann["tann"], 
     add = TRUE, 
     pch = 16, 
     cex = 1.5)

ORtemp_var <- variogram(tann ~ 1, ortann)

plot(ORtemp_var, plot.numbers = TRUE)

It looks like the variogram flattens out around 200.

ORtemp_var2 <- variogram(tann ~ 1, ortann,
                        cutoff = 175,
                        width = 15)

plot(ORtemp_var2, plot.numbers = TRUE)

Create a variogram model for this data using the vgm() function. You will need to choose an appropriate model and initial parameters for the nugget, sill and range. Report the values and model you have used

The nugget looks pretty small (close to zero), the sill is around 4, and the range is around 200.

ORTemp_Nugget <- 0.1
ORTemp_Range <- 175
ORTemp_Sill <- 3.7

ORTemp_vgm1 <- vgm(psill = ORTemp_Sill, 
                  model = "Cir", 
                  range = ORTemp_Range, 
                  nugget = ORTemp_Nugget)

plot(ORtemp_var, ORTemp_vgm1, main = "Oregon Temperatures Variogram")

After adjusting the nugget, sill, and range values until I was happy with the fit, I ended with a nugget value of 0.1, sill of 3.7, and range of 175.

Use the fit.variogram() function to fit this model to the sample variogram from step a. Produce a plot showing the final variogram

ORTemp_vgm2 <- fit.variogram(ORtemp_var, ORTemp_vgm1)

plot(ORtemp_var, ORTemp_vgm2, main = "Oregon Temperatures Variogram")

Fitted values are:

ORTemp_vgm2
##   model     psill    range
## 1   Nug 0.2076095   0.0000
## 2   Cir 3.7165710 208.1351

I was pretty close for the nugget and sill, but underestimated the range.

Now use this model to interpolate the annual temperatures using the grid from the DEM, using the krige() function. Produce a map showing the predicted value on the grid and the prediction error.

ORTemp_krige <- krige(tann ~ 1,
                      locations = ortann,
                      newdata = orgrid.dem,
                      model = ORTemp_vgm2, 
                      nmax = 40)
## [using ordinary kriging]
my.pal <- brewer.pal(9, "Blues")

plot(ORTemp_krige, col = my.pal,
     main = "Interpolated precipitation values (OK)")

Predicted error

plot(ORTemp_krige["var1.var"], 
     main = "Interpolated precipitation prediction error (OK)")

Use the krige.cv() function with 5-fold cross-validation to report the root mean squared error and R\(^2\)

ORTemp_Kcv <- krige.cv(tann ~ 1, 
                      locations = ortann, 
                      model = ORTemp_vgm2, 
                      nmax = 40, 
                      nfold = 5)

## RMSEP
rmsep <- sqrt(mean(ORTemp_Kcv$residual^2))
rmsep
## [1] 1.283871
##R2P
R2P <- cor(ORTemp_Kcv$observed, ORTemp_Kcv$var1.pred)^2
R2P
## [1] 0.529979

The average error for making a prediction (RMSEP) is 1.284. The amount of variance in the test data set predicted by the model (R\(^2_P\)) is 0.53. This value seems lower than what would be desired.

Looking at the residuals:

sp::bubble(as_Spatial(ORTemp_Kcv)[,"residual"], pch = 16)

The residual looks like it is higher in the high-desert portions of the state that are near mountains (such as the southern area near Klamath Falls, and the northeast corner near the Wallowa Mountains). These patterns suggest that there are other underlying factors that are missing from the dataset.

plot(ORTemp_Kcv$var1.pred, ORTemp_Kcv$residual, 
     xlab = 'PPT Predicted Values', 
     ylab = 'PPT Residuals')

abline(h = 0, lty = 2)

Plotting the residuals vs. the predicted values, there doesn’t appear to be any trend. This means that the predicted model is fairly unbiased.


fin