Spatial interpolation of climate data

1. Introduction

As the geoR library is not currently available at CRAN, this notebook illustrates how to conduct spatial interpolation of point data using the stars, geostat and automap libraries.

The code written here was adapted from this book.

The processing steps include:

Interpolate using three methods:
- Inverse Distance Weighted (IDW) interpolation
- Ordinary Kriging (OK)
- Universal Kriging (UK)
Calculate an empirical variogram (OK and UK)
Fit a variogram model (OK and UK)
Evaluate interpolation accuracy using Leave-One-Out Cross Validation

2. Preliminaries

Setup

Clean the R memory:

#uncomment the following line
#rm(list=ls())

Make sure to install the following libraries from the console: sf, stars, gstat, automap, leaflet, and leafem.

Then, load the libraries:

library(sf)
library(stars)
library(gstat)
library(automap)
library(leaflet)
library(leafem)

Read study area shapefile

(boyaca <- st_read("./datos/15_BOYACA/ADMINISTRATIVO/MGN_MPIO_POLITICO.shp"))

## Reading layer `MGN_MPIO_POLITICO' from data source 
##   `/Users/ivanlizarazo/Documents/GB/datos/15_BOYACA/ADMINISTRATIVO/MGN_MPIO_POLITICO.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 123 features and 2 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -74.66496 ymin: 4.655196 xmax: -71.94885 ymax: 7.055557
## Geodetic CRS:  WGS 84

Read a raster of precipitation

Previously, we downloaded climate data from TerraClimate and saved a geotif file with the target variable. Now, we read it:

archivo <- ("./datos/precip_boyaca_X2019.07.tif")
(precip <- read_stars(archivo))

## stars object with 2 dimensions and 1 attribute
## attribute(s):
##                             Min. 1st Qu. Median     Mean 3rd Qu. Max.
## precip_boyaca_X2019.07.tif    18      66    102 125.0439     181  399
## dimension(s):
##   from to   offset      delta refsys point values x/y
## x    1 66 -74.6667  0.0416667 WGS 84 FALSE   NULL [x]
## y    1 59  7.08333 -0.0416667 WGS 84 FALSE   NULL [y]

Note the CRS for this raster as well as the cell size.

Let’s crop the precipitation layer to match the boundaries of our study area:

precip.mask <- st_crop(precip, boyaca)

precip

## stars object with 2 dimensions and 1 attribute
## attribute(s):
##                             Min. 1st Qu. Median     Mean 3rd Qu. Max.
## precip_boyaca_X2019.07.tif    18      66    102 125.0439     181  399
## dimension(s):
##   from to   offset      delta refsys point values x/y
## x    1 66 -74.6667  0.0416667 WGS 84 FALSE   NULL [x]
## y    1 59  7.08333 -0.0416667 WGS 84 FALSE   NULL [y]

Visualize the raster:

m <- leaflet() %>%
  addTiles() %>%  
  leafem:::addGeoRaster(
      precip.mask,
      opacity = 0.7,                
      colorOptions = colorOptions(palette = c("orange", "yellow", "cyan", "blue"), 
                                  domain = 18:400)
    ) 
#
m  # Print the map

Vectorize the precipitation layer

As explained in class, we need a sample dataset as input to the interpolation process. Here, we simulate such a dataset by using the available climate variable:

# raster to vector conversion
precip.pts <- st_as_sf(precip.mask, as_points = TRUE, merge = FALSE, long = TRUE)

What we got?

precip.pts

Split the points into train and test data sets

dt = sort(sample(1083, 1083*.7))
train<-precip.pts[dt,]
test<-precip.pts[-dt,]

train

Visualize the sampling sites

longit <- st_coordinates(train)[,1]
latit <- st_coordinates(train)[,2]
rain <- train$precip_boyaca_X2019.07.tif

id <- seq(1,758,1)

(sitios <- data.frame(id, longit, latit, rain))

Let’s visualize the train sites:

m <- leaflet() %>%
  addTiles() %>%  
  leafem:::addGeoRaster(
      precip.mask,
      opacity = 0.7,                
      colorOptions = colorOptions(palette = c("orange", "yellow", "cyan", "blue"), 
                                  domain = 15:400)
    ) %>%
  addMarkers(lng=sitios$longit,lat=sitios$latit, popup=sitios$rain, clusterOptions = markerClusterOptions())
m  # Print the map

Read a digital elevation model

In a previous practical, we downloaded a digital elevation model and saved it to disk. Now, we need it:

(dem = read_stars("./datos/dem_boyaca_1km.tif"))

## stars object with 2 dimensions and 1 attribute
## attribute(s):
##                     Min. 1st Qu. Median     Mean 3rd Qu. Max.  NA's
## dem_boyaca_1km.tif   121    1512   2484 2253.167    3002 5140 66118
## dimension(s):
##   from  to   offset       delta refsys point values x/y
## x    1 325 -74.6583  0.00833333 WGS 84 FALSE   NULL [x]
## y    1 287     7.05 -0.00833333 WGS 84 FALSE   NULL [y]

Note the CRS for this DEM as well as the cell size.

How does the elevation look?

pal.dem <- colorNumeric(palette = c("forestgreen", "green", "yellow", "brown", "lightcyan"), domain = 500:4000, na.color = "transparent")

m <- leaflet() %>%
  addTiles() %>%  
  leafem:::addGeoRaster(
      dem,
      opacity = 0.7,                
      colorOptions = colorOptions(palette = c("forestgreen", "green", "yellow",  "brown",         "lightcyan"), 
                                  domain = 500:4000)
    ) %>%
  addMarkers(lng=sitios$longit,lat=sitios$latit, popup=sitios$rain, clusterOptions = markerClusterOptions()) %>%
  addLegend("bottomright", pal = pal.dem, values= dem$dem_boyaca_1km.tif,
    title = "Elevation")

## Warning in pal(c(r[1], cuts, r[2])): Some values were outside the color scale
## and will be treated as NA

m  # Print the map

Extract elevation values into train sites

We need to “join” elevation value corresponding to each train site:

# make sure the crs are identical
st_crs(dem) <- st_crs(train)

Now, extract the elevation values:

(train = st_join(train, st_as_sf(dem)))

and subset those sites that coincide with the raster:

train = train[!is.na(train$dem_boyaca_1km), ]

names(train) <- c("precip", "elev", "geometry")

train

3. Spatial interpolation

Creating the gstat object

To interpolate, we first need to create an object of class gstat, using a function of the same name: gstat.

A gstat object contains all necessary information to conduct spatial interpolation, namely:

The model definition
The calibration data

Based on its arguments, the gstat function “understands” what type of interpolation model we want to use:

No variogram model → IDW
Variogram model, no covariates → Ordinary Kriging
Variogram model, with covariates → Universal Kriging

We are going to use three parameters of the gstat function:

formula: the prediction “formula” specifying the dependent and the independent variables (a.k.a covariates)
data: the calibration data (a.k.a the train data)
model: the variogram model

IDW

To interpolate the precipitation using the IDW method we create the following gstat object, specifying just the formula and data:

g1 = gstat(formula = precip ~ 1, data = train)

Now that our interpolation model g1 is defined, we can use the predict function to actually interpolate, i.e., to estimate precipitation values.

The predict function accepts:

A raster—stars object, such as dem
A model—gstat object, such as g1

The raster serves for two purposes:

Specifying the locations where we want to make predictions (in all methods)
Specifying covariate values (in Universal Kriging only)

For example, the following expression interpolates precipitation values according to the model defined in g1 and the raster template defined in dem:

## [inverse distance weighted interpolation]
z1 = predict(g1, dem)

## [inverse distance weighted interpolation]

What is z1?

z1

## stars object with 2 dimensions and 2 attributes
## attribute(s):
##            Min.  1st Qu.   Median     Mean  3rd Qu. Max.  NA's
## var1.pred    23 74.90023 100.8848 121.9123 175.9314  335 66118
## var1.var     NA       NA       NA      NaN       NA   NA 93275
## dimension(s):
##   from  to   offset       delta refsys point values x/y
## x    1 325 -74.6583  0.00833333 WGS 84    NA   NULL [x]
## y    1 287     7.05 -0.00833333 WGS 84    NA   NULL [y]

We can subset just the first attribute and rename it to “precipitation”:

z1 = z1["var1.pred",,]
names(z1) = "precipitation"

The interpolated rainfall raster, using IDW, is shown in next figure:

b = seq(15, 400, 15)
plot(z1, breaks = b, col = hcl.colors(length(b)-1, "Spectral"), reset = FALSE)
plot(st_geometry(train), pch = 3, add = TRUE)
contour(z1, breaks = b, add = TRUE)

Ordinary Kriging

Kriging methods require a variogram model. The variogram model is an objective way to quantify the autocorrelation pattern in the data, and assign weights accordingly when making predictions.

As a first step, we can calculate and examine the empirical variogram using the variogram function.

The function requires two arguments:

formula: specifies the dependent variable and the covariates, just like in gstat
data: the point layer with the dependent variable and covariates as point attributes

For example, the following expression calculates the empirical variogram of train, with no covariates:

v_emp_ok = variogram(precip ~ 1, data=train)

Using plot to examine the variogram:

plot(v_emp_ok)

There are several ways to fit a variogram model to an empirical variogram. We will use the simplest one—automatic fitting using function autofitVariogram from package automap:

v_mod_ok = autofitVariogram(precip ~ 1, as(train, "Spatial"))

The function chooses the best fitting type of model, and also fine tunes its parameters. You may use show.vgms() to display variogram model types.

Note that the autofitVariogram function does not work on sf objects, which is why we convert the object to a SpatialPointsDataFrame (package sp).

The fitted model can be plotted with plot:

plot(v_mod_ok)

The resulting object is actually a list with several components, including the empirical variogram and the fitted variogram model. The $var_model component of the resulting object contains the actual model:

v_mod_ok$var_model

The variogram model can then be passed to the gstat function, and we can carry on with the Ordinary Kriging interpolation:

## [using ordinary kriging]
g2 = gstat(formula = precip ~ 1, model = v_mod_ok$var_model, data = train)
z2= predict(g2, dem)

## [using ordinary kriging]

Again, we will subset the predicted values attribute and rename it:

z2 = z2["var1.pred",,]
names(z2) = "precip"

The Ordinary Kriging predictions are shown in next figure:

b = seq(15, 400, 15)
plot(z2, breaks = b, col = hcl.colors(length(b)-1, "Spectral"), reset = FALSE)
plot(st_geometry(train), pch = 3, add = TRUE)
contour(z2, breaks = b, add = TRUE)

Universal Kriging

Universal Kriging interpolation uses a model with one or more independent variables, i.e., covariates. The covariates need to be known for both:

The point layer, as an attribute such as elev in train
The predicted locations, as raster values such as dem values

The formula now specifies the name(s) of the covariate(s) to the right of the ~ symbol, separated by + if there are more than one. Also, we are using a subset of rainfall where elev_1km values were present:

v_emp_uk = variogram(precip ~ elev, train)
train.sp = as(train, "Spatial")
v_mod_uk = autofitVariogram(precip ~ elev, train.sp)

Comparing the Ordinary Kriging and Universal Kriging variogram models:

plot(v_emp_ok, model = v_mod_ok$var_model, ylim = c(0, 7000), main = "OK")

plot(v_emp_uk, model = v_mod_uk$var_model, ylim = c(0, 7000), main = "UK")

Next we create a gstat object, where the formula contains the covariate and the corresponding variogram model:

g3 = gstat(formula = precip ~ elev, model = v_mod_uk$var_model, data = train.sp)

Remember that all of the variables that appear in the formula need to be present in the data. In this case we have two variables: a dependent variable (precip) and an independent variable (elev).

Make sure that the dem object provides the independent variable (i.e. same name as in the fitted model):

names(dem)

## [1] "dem_boyaca_1km.tif"

names(dem) <- "elev"

Now we can make predictions:

## [using universal kriging]
z3 = predict(g3, dem)

## [using universal kriging]

and then subset and rename:

z3 = z3["var1.pred",,]
names(z3) = "precipitation"

Universal Kriging predictions are shown in the following figure:

b = seq(15, 400, 15)
plot(z3, breaks = b, col = hcl.colors(length(b)-1, "Spectral"), reset = FALSE)
plot(st_geometry(train), pch = 3, add = TRUE)
contour(z3, breaks = b, add = TRUE)

4. Qualitative assessment of results

Another view of the three interpolation outputs:

paleta <- colorNumeric(palette = c("orange", "yellow", "lightcyan", "blue", "darkblue"), domain = 10:400, na.color = "transparent")

colores <- colorOptions(palette = c("orange", "yellow", "lightcyan", "blue", "darkblue"), domain = 10:400, na.color = "transparent")

m <- leaflet() %>%
  addTiles() %>%  
  addGeoRaster(z1, opacity = 0.7, colorOptions = colores, group="IDW") %>%
  addGeoRaster(z2, colorOptions = colores, opacity = 0.7, group= "OK")  %>%
  addGeoRaster(z3, colorOptions = colores, opacity = 0.7, group= "UK")  %>%
  # Add layers controls
  addLayersControl(
    overlayGroups = c("UK", "OK", "IDW"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>% 
  
    addLegend("bottomright", pal = paleta, values= z1$precipitation,
    title = "Precipitation (2019)"
  )
m  # Print the map

In your notebook, examine and compare the three results.

5. Cross validation

We have estimated climate surfaces using three different methods: IDW, Ordinary Kriging and Universal Kriging. Although it is useful to examine and compare the results graphically, we also need an objective way to evaluate interpolation accuracy. That way, we can objectively choose the most accurate method among the available interpolation methods.

In Leave-One-Out Cross Validation we:

Take out one point out of the calibration data
Make a prediction for that point
Repeat for all points

In the end, what we get is a table with an observed value and a predicted value for all points.

We can run Leave-One-Out Cross Validation using the gstat.cv function, which accepts a gstat object.

When writing the following chunk, hide the results as explained in this link:

### run the following code from the console -- line-by-line
cv1 = gstat.cv(g1)
#cv2 = gstat.cv(g2)
#cv3 = gstat.cv(g3)

The gstat.cv function returns an object of class SpatialPointsDataFrame (package sp), which we can convert to an sf object with st_as_sf:

### run the following code from the console -- line-by-line
cv1 = st_as_sf(cv1)
#cv2 = st_as_sf(cv2)
#cv3 = st_as_sf(cv3)

The result of gstat.cv has the following attributes:

var1.pred—Predicted value
var1.var—Variance (only for Kriging)
observed—Observed value
residual—Observed-Predicted
zscore—Z-score (only for Kriging)
fold—Cross-validation ID

A bubble plot is convenient to examine the residuals, since it shows positive and negative values in different color. Here, I plot just a residual for one result. You shoud include three plots in your notebook.

bubble(as(cv1[, "residual"], "Spatial"))

Using the “predicted” and “observed” columns we can calculate prediction accuracy indices, such as the Root Mean Square Error (RMSE):

\[RMSE = \sqrt{\frac{\sum_{i=1}^{n}(pred_i-obs_i)^2}{n}}\]

where $pred_i$ and $obs_i$ are predicted and observed values for point $i$, respectively.

For example:

# This is the RMSE value for the IDW interpolation
sqrt(sum((cv1$var1.pred - cv1$observed)^2) / nrow(cv1))

## [1] 36.21553

In your notebook, examine and compare the three RMSE values (i.e. IDW, OK, and UK). In addition, make a clear statement on the quality of the interpolation you conducted. All this need to be included in your Informe No. 3.

6. Reproducibility

Cite as: Lizarazo, I., 2022. Spatial interpolation of climate data. Available at https://rpubs.com/ials2un/clim_interp.

sessionInfo()

## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] leafem_0.2.0   leaflet_2.1.1  automap_1.0-16 sp_1.5-0       gstat_2.0-9   
## [6] stars_0.5-5    abind_1.4-5    sf_1.0-7      
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.8.3       lattice_0.20-45    FNN_1.1.3.1        png_0.1-7         
##  [5] class_7.3-20       zoo_1.8-10         digest_0.6.29      utf8_1.2.2        
##  [9] R6_2.5.1           plyr_1.8.7         evaluate_0.15      e1071_1.7-11      
## [13] highr_0.9          ggplot2_3.3.6      pillar_1.7.0       rlang_1.0.2       
## [17] rstudioapi_0.13    raster_3.5-15      jquerylib_0.1.4    rmarkdown_2.14    
## [21] rgdal_1.5-30       stringr_1.4.0      foreign_0.8-82     htmlwidgets_1.5.4 
## [25] munsell_0.5.0      proxy_0.4-27       compiler_4.2.0     xfun_0.31         
## [29] pkgconfig_2.0.3    base64enc_0.1-3    htmltools_0.5.2    tibble_3.1.7      
## [33] codetools_0.2-18   intervals_0.15.2   reshape_0.8.9      fansi_1.0.3       
## [37] spacetime_1.2-7    crayon_1.5.1       wk_0.6.0           grid_4.2.0        
## [41] jsonlite_1.8.0     lwgeom_0.2-8       gtable_0.3.0       lifecycle_1.0.1   
## [45] DBI_1.1.2          magrittr_2.0.3     units_0.8-0        scales_1.2.0      
## [49] KernSmooth_2.23-20 cli_3.3.0          stringi_1.7.6      farver_2.1.0      
## [53] bslib_0.3.1        ellipsis_0.3.2     xts_0.12.1         vctrs_0.4.1       
## [57] s2_1.0.7           tools_4.2.0        glue_1.6.2         crosstalk_1.2.0   
## [61] parallel_4.2.0     fastmap_1.1.0      yaml_2.3.5         terra_1.5-21      
## [65] colorspace_2.0-3   maptools_1.1-4     classInt_0.4-7     knitr_1.39        
## [69] sass_0.4.1