Spatial Interpolation

1. Introduction
2. Setup
3. Read the input data
4. Explore the input data
5. Interpolation
6. Assessment of results
- 6.1 Qualitative assessment
- 6.2 Cross-validation
7. Saving the interpolation outputs
8. Conclusions
9. Reference

1. Introduction

This notebook illustrates two spatial interpolation techniques: Inverse Distance Weighted (IDW) and Ordinary Kriging (OK). IDW is a deterministic technique. OK is a probabilistic one. Both techniques are used here to get a continuous surface of soil organic carbon (SOC) at 15-30 cm from samples obtained from SoilGrids 250 m.

Note that I provide a basic explanation of the code to help readers to understand and replicate it. However, Geomática Básica students should: (i) expand the current documentation; and (ii) interpret & analyze the code outputs. This will help them to improve their coding skills as well as to provide strong evidence that they are not just copying and pasting.

If you want to learn about writing R notebooks, study this document.

2. Setup

Clean the workspace:

# Uncomment at first-time run
rm(list=ls())

For reading & writing geospatial data, we will use terra (for raster) and sf (for vector). For spatial interpolation, we will use the gstat and automap libraries. For visualization, we will use ggplot, leaflet and tmap.

First, we need to install the required libraries. Do it from the R console, not from this notebook.

# DO NOT INSTALL  PACKAGES THAT YOU ALREADY INSTALLED
#paquetes = c('terra', 'sf', 'sp', 'stars', 'gstat', 'automap', 'leaflet', 'leafem')
#install.packages(paquetes)
#

Load the libraries:

library(terra)
library(sf)
library(sp)
library(stars)
library(gstat)
library(automap)
library(RColorBrewer)
library(leaflet)
library(leafem)

3. Read the input data

As indicated in the introduction, our samples are SOC values at 15-30 cm depth. They were obtained from SoilGrids 250 m using this notebook. We saved such samples in geopackage format, under the data directory, using soc_{depto}.gpkg as filename.

First, let’s check that the data is available for use:

list.files(path="data", pattern = "*.gpkg")

## [1] "COL_cities.gpkg"         "depto_cordoba.gpkg"     
## [3] "mun_cordoba.gpkg"        "mun_stder.gpkg"         
## [5] "new_cordoba_cities.gpkg" "soc_stder.gpkg"         
## [7] "stader_frutales.gpkg"

Then, we read the samples using sf:

samples <- sf::st_read("data/soc_stder.gpkg")

## Reading layer `soc_stder' from data source 
##   `/Users/ials/Documents/unal/GB2024/data/soc_stder.gpkg' using driver `GPKG'
## Simple feature collection with 1850 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -74.55196 ymin: 5.709403 xmax: -72.41505 ymax: 8.144378
## Geodetic CRS:  +proj=longlat +datum=WGS84 +no_defs

As our study area is Santander, it makes sense to read also the corresponding municipalities geopackage:

munic <- sf::st_read("data/mun_stder.gpkg")

## Reading layer `col_adm2' from data source 
##   `/Users/ials/Documents/unal/GB2024/data/mun_stder.gpkg' using driver `GPKG'
## Simple feature collection with 87 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -74.55179 ymin: 5.705701 xmax: -72.5161 ymax: 8.120555
## Geodetic CRS:  WGS 84

4. Explore the input data

Let’s obtain a summary of the samples:

summary(samples)

##       soc                    geom     
##  Min.   : 12.40   POINT        :1850  
##  1st Qu.: 24.89   epsg:NA      :   0  
##  Median : 32.89   +proj=long...:   0  
##  Mean   : 35.65                       
##  3rd Qu.: 43.70                       
##  Max.   :101.29

hist(samples$soc)

Let’s round the SOC values to two digits:

samples$soc = round(samples$soc,2)

Now, let’s visualize the samples to know their spatial distribution. Please visit this site to understand how leaflet works.

pal <- colorNumeric(
  c("#E1F5C4", "#EDE574", "#F9D423", "#FC913A", "#FF4E50"),
  # colors depend on the count variable
  domain = samples$soc,
  )

# This a simple visualization
# Change the code to suit your preferences
leaflet() %>%
  addPolygons(
    data = munic,
    color = "gray",
    # set the opacity of the outline
    opacity = 1,
    # set the stroke width in pixels
    weight = 1,
    # set the fill opacity
    fillOpacity = 0.2) %>%
 addCircleMarkers(
    data = samples,
    radius= 1.5, 
    label = ~soc,
    color = ~pal(soc),
    fillOpacity = 1,
    stroke = F
  ) %>%
  addLegend(
    data = samples,
    pal = pal,
    values = ~soc,
    position = "bottomleft",
    title = "SOC:",
    opacity = 0.9) %>%
  addProviderTiles("OpenStreetMap")

5. Interpolation

5.1 Creation of the gstat object

To interpolate, we first need to create an object of class gstat, using a function of the same name: gstat.

A gstat object contains all necessary information to conduct spatial interpolation, namely:

The model definition
The calibration data

Based on its arguments, the gstat function “understands” what type of interpolation model we want to use:

No variogram model → Inverse Weighting Distance (IDW)
Variogram model, no covariates → Ordinary Kriging (OK)

We are going to use three parameters of the gstat function:

formula: the prediction “formula” specifying the dependent and the independent variables (aka covariates)
data: the calibration data (aka the train data)
model: the variogram model

5.2 IDW interpolation

To interpolate SOC using the IDW method we create the following gstat object, specifying just the formula and data:

g1 = gstat(formula = soc ~ 1, data = samples)

Now that our interpolation model g1 is defined, we can use the predict function to actually interpolate, i.e., to estimate SOC values.

The predict function accepts:

A raster—stars object, such as dem
A model—gstat object, such as g1

The raster serves for two purposes:

Specifying the locations where we want to make predictions (in all methods)
Specifying covariate values (in Universal Kriging only)

Let’s use the terra library to create a raster object with cell values equal to 1. The extent of this raster should cover our area of interest.

Note: You can use the metadata of any raster covering your department (e.g. a digital elevation model) to define the raster creation’s parameters:

(rrr <- terra::rast(xmin=-74.55524, xmax=-72.38985, ymin=5.708308, ymax=8.145474, nrows=370, ncols = 329, vals = 1, crs = "epsg:4326"))

## class       : SpatRaster 
## dimensions  : 370, 329, 1  (nrow, ncol, nlyr)
## resolution  : 0.006581733, 0.006586935  (x, y)
## extent      : -74.55524, -72.38985, 5.708308, 8.145474  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source(s)   : memory
## name        : lyr.1 
## min value   :     1 
## max value   :     1

Note: If your computer has limited resources (e.g. low memory), use a lower spatial resolution (e.g. 0.008 or 0.01 degrees).

Now, we need to convert this SpatRaster into a stars object:

stars.rrr <- stars::st_as_stars(rrr)

The following expression interpolates SOC values according to the model defined in $g1$ and the raster defined in $stars.rrr$:

## [inverse distance weighted interpolation]
## running this code takes several minutes
z1 = predict(g1, stars.rrr)

## [inverse distance weighted interpolation]

What is z1?

z1

## stars object with 2 dimensions and 2 attributes
## attribute(s):
##                Min.  1st Qu.   Median     Mean  3rd Qu.     Max.   NA's
## var1.pred  12.73927 29.28242 34.54546 35.80878 42.00168 98.42292      0
## var1.var         NA       NA       NA      NaN       NA       NA 121730
## dimension(s):
##   from  to offset     delta refsys x/y
## x    1 329 -74.56  0.006582 WGS 84 [x]
## y    1 370  8.145 -0.006587 WGS 84 [y]

Take note of the names of the two attributes included within the z1 object.

We can subset just the first attribute and rename it:

# dealing with NA values
a <- which(is.na(z1[[1]]))
z1[[1]][a] = 0.0

z1

## stars object with 2 dimensions and 2 attributes
## attribute(s):
##                Min.  1st Qu.   Median     Mean  3rd Qu.     Max.   NA's
## var1.pred  12.73927 29.28242 34.54546 35.80878 42.00168 98.42292      0
## var1.var         NA       NA       NA      NaN       NA       NA 121730
## dimension(s):
##   from  to offset     delta refsys x/y
## x    1 329 -74.56  0.006582 WGS 84 [x]
## y    1 370  8.145 -0.006587 WGS 84 [y]

names(z1) = "soc"

To plot the interpolated surface, we need a color palette:

# change this chunk to meet your preferences
paleta <- colorNumeric(palette = c("orange", "yellow", "cyan", "green"), domain = 10:120, na.color = "transparent")

Now, let’s visualize the interpolation output:

m <- leaflet() %>%
  addTiles() %>%  
  leafem:::addGeoRaster(
      z1,
      opacity = 0.7,                
      colorOptions = colorOptions(palette = c("orange", "yellow", "cyan", "green"), 
                                  domain = 10:100)
    ) %>%
   addCircleMarkers(
    data = samples,
    radius= 1.5, 
    label = ~soc,
    color = ~paleta(soc),
    fillOpacity = 1,
    stroke = F
  ) %>%
    addLegend("bottomright", pal=paleta, values= z1$soc,
    title = "IDW SOC interpolation [%]"
    )
m  # Print the map

5.3 OK interpolation

Kriging methods require a variogram model. The variogram model is an objective way to quantify the autocorrelation pattern in the data, and assign weights accordingly when making predictions.

As a first step, we can calculate and examine the empirical variogram using the variogram function.

The function requires two arguments:

formula: specifies the dependent variable and the covariates, just like in gstat
data: the point layer with the dependent variable and covariates as point attributes

The following expression calculates the empirical variogram of samples, with no covariates:

v_emp_ok = variogram(soc ~ 1, data=samples)

Let’s plot the empirical variogram:

plot(v_emp_ok)

There are several ways to fit a variogram model to an empirical variogram. Here, we will use the simplest one—automatic fitting using the function autofitVariogram from the package automap:

#make sure you understand the parameters needed to run this fitting function
v_mod_ok = automap::autofitVariogram(soc ~ 1, as(samples, "Spatial"))

The function chooses the best fitting type of model, and also fine tunes its parameters. You may use $show.vgms()$ to display several variogram model types.

Note that the autofitVariogram function does not work on sf objects, which is why we convert the samples object into a SpatialPointsDataFrame (package sp).

The fitted model can be plotted with plot:

plot(v_mod_ok)

The resulting object is actually a list with several components, including the empirical variogram and the fitted variogram model. The $var_model component of the resulting object contains the actual model:

v_mod_ok$var_model

Now, the variogram model can be passed to the gstat function, and we can carry on with the Ordinary Kriging interpolation:

## [using ordinary kriging]
## This takes several minutes 
## (or even ages depending on your raster cell size and your computer resources)
g2 = gstat(formula = soc ~ 1, model = v_mod_ok$var_model, data = samples)
z2= predict(g2, stars.rrr)

## [using ordinary kriging]

Again, we will subset the predicted values attribute and rename it:

a <- which(is.na(z2[[1]]))
z2[[1]][a] = 0.0
names(z2) = "soc"

The OK predictions are shown in next plot:

m <- leaflet() %>%
  addTiles() %>%  
  leafem:::addGeoRaster(
      z2,
      opacity = 0.7,                
      colorOptions = colorOptions(palette = c("orange", "yellow", "cyan", "green"), 
                                  domain = 10:100)
    ) %>%
   addCircleMarkers(
    data = samples,
    radius= 1.5, 
    label = ~soc,
    color = ~paleta(soc),
    fillOpacity = 1,
    stroke = F
  ) %>%
    addLegend("bottomright", pal = paleta, values= z2$soc,
    title = "OK SOC interpolation [%]"
    )

## Warning in write_stars.stars(x, dsn = fl): all but first attribute are ignored

m  # Print the map

6. Assessment of results

6.1 Qualitative assessment

colores <- colorOptions(palette = c("orange", "yellow", "cyan", "green"), domain = 10:100, na.color = "transparent")

m <- leaflet() %>%
  addTiles() %>%  
  addGeoRaster(z1, colorOptions = colores, opacity = 0.8, group= "IDW")  %>%
  addGeoRaster(z2, colorOptions = colores, opacity = 0.8, group= "OK")  %>%
  # Add layers controls
  addLayersControl(
    overlayGroups = c("IDW", "OK"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>% 
    addLegend("bottomright", pal = paleta, values= z1$soc,
    title = "Soil organic carbon [%]"
)

## Warning in write_stars.stars(x, dsn = fl): all but first attribute are ignored
## Warning in write_stars.stars(x, dsn = fl): all but first attribute are ignored

m  # Print the map

6.2 Cross-validation

We have estimated SOC surfaces using two different methods: IDW and Ordinary Kriging. Although it is useful to examine and compare the results graphically, we also need an objective way to evaluate interpolation accuracy. That way, we can objectively choose the most accurate method among the available interpolation methods.

In Leave-One-Out Cross Validation we:

Take out one point out of the calibration data
Make a prediction for that point
Repeat for all points In the end, what we get is a table with an observed value and a predicted value for all points.

We can run Leave-One-Out Cross Validation using the gstat.cv function, which accepts a gstat object.

When writing the following chunk, hide message and results.

### 
## this code should be run from the console not from here
#cv2 = gstat.cv(g2)

The result of gstat.cv has the following attributes:

var1.pred—Predicted value
var1.var—Variance (only for Kriging)
observed—Observed value
residual—Observed-Predicted
zscore—Z-score (only for Kriging)
fold—Cross-validation ID

Let’s convert the cv2 object:

# run from the console
#cv2 = st_as_sf(cv2)

Let’s plot the residuals:

# run from the console
#sp::bubble(as(cv2[, "OK residuals"], "Spatial"))

# This is the RMSE value for the OK interpolation
# run from the console
#sqrt(sum((cv2$var1.pred - cv2$observed)^2) / nrow(cv2))

Write down the RSME obtained for OK interpolation results.

Then, repeat the process with the IDW interpolation results.

Once you compute both RMSE values, compare them and analyze their differences. What is the most reliable interpolation methods? Explain.

7. Saving the interpolation outputs

write_stars(
  z1, dsn = "data/IDW_soc_stder.tif", layer = 1
)

write_stars(
  z2, dsn = "data/OK_soc_stder.tif", layer = 1
)

8. Conclusions

Then, write a meaningful paragraph summarizing what you learned writing this notebook.

Do not copy your colleagues’ explanations or conclusions. That is plagiarism!

9. Reference

Cite this work as follows: Lizarazo, I. 2025. Spatial interpolation. Available at: https://rpubs.com/ials2un/Spatial_Interpolation

sessionInfo()

## R version 4.3.2 (2023-10-31)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.7.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Bogota
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] leafem_0.2.3       leaflet_2.2.2      RColorBrewer_1.1-3 automap_1.1-9     
##  [5] gstat_2.1-1        stars_0.6-7        abind_1.4-8        sp_2.2-0          
##  [9] sf_1.0-19          terra_1.8-10      
## 
## loaded via a namespace (and not attached):
##  [1] generics_0.1.3          sass_0.4.9              class_7.3-22           
##  [4] KernSmooth_2.23-22      lattice_0.22-6          digest_0.6.37          
##  [7] magrittr_2.0.3          evaluate_1.0.3          grid_4.3.2             
## [10] fastmap_1.2.0           plyr_1.8.9              jsonlite_1.8.9         
## [13] e1071_1.7-16            reshape_0.8.9           DBI_1.2.3              
## [16] crosstalk_1.2.1         scales_1.3.0            codetools_0.2-19       
## [19] jquerylib_0.1.4         cli_3.6.3               rlang_1.1.4            
## [22] units_0.8-5             munsell_0.5.1           intervals_0.15.3       
## [25] base64enc_0.1-3         cachem_1.1.0            yaml_2.3.10            
## [28] FNN_1.1.3.2             raster_3.6-30           tools_4.3.2            
## [31] parallel_4.3.2          dplyr_1.1.4             colorspace_2.1-1       
## [34] ggplot2_3.5.1           spacetime_1.3-0         png_0.1-8              
## [37] vctrs_0.6.5             R6_2.5.1                zoo_1.8-12             
## [40] proxy_0.4-27            lifecycle_1.0.4         classInt_0.4-11        
## [43] leaflet.providers_2.0.0 htmlwidgets_1.6.4       pkgconfig_2.0.3        
## [46] pillar_1.10.1           bslib_0.8.0             gtable_0.3.6           
## [49] glue_1.8.0              Rcpp_1.0.14             tidyselect_1.2.1       
## [52] tibble_3.2.1            xfun_0.50               rstudioapi_0.17.1      
## [55] knitr_1.49              farver_2.1.2            htmltools_0.5.8.1      
## [58] rmarkdown_2.29          xts_0.14.1              compiler_4.3.2