Spatial interpolation of soil organic carbon

1. Introduction
2. Setup
3. Read the input data
4. Sampling the world
5. Interpolation
6. Assessment of results

1. Introduction

This notebook illustrates two spatial interpolation techniques: Inverse Distance Weighted (IDW) and Ordinary Kriging (OK). IDW is a deterministic technique. OK is a probabilistic one. Both techniques are used here to get a continous surface of SOC at 15-30 cm from samples obtained from SoilGrids 250 m.

2. Setup

First, clean the memory:

rm(list=ls())

Then, make sure you have previously installed the required libraries.

Then, load the libraries:

library(sp)
library(terra)
library(sf)
library(stars)
library(gstat)
library(automap)
library(leaflet)
library(leafem)
library(ggplot2)
library(dplyr)

library(curl)

## Using libcurl 7.87.0 with LibreSSL/3.3.6

h <- new_handle()
handle_setopt(h, http_version = 2)

3. Read the input data

We need to read a dataset to mimic real world data. Thus, let’s read the SOC layer we downloaded from ISRIC) using the terra library:

archivo <- ("soc_igh_15_30.tif")
(soc <- rast(archivo))

## class       : SpatRaster 
## dimensions  : 1085, 895, 1  (nrow, ncol, nlyr)
## resolution  : 250, 250  (x, y)
## extent      : -8313500, -8089750, 635500, 906750  (xmin, xmax, ymin, ymax)
## coord. ref. : Interrupted_Goode_Homolosine 
## source      : soc_igh_15_30.tif 
## name        : soc_igh_15_30

Now, convert the SOC data into percentage. Review the scale factor of SOC in the SoilGrids website and write down it here.

soc.perc <-  soc/10

What is the CRS of the real world data?

It seems that we need a transformation from such CRS into the well-known WGS84 CRS:

geog ="+proj=longlat +datum=WGS84"
(geog.soc = project(soc.perc, geog))

## class       : SpatRaster 
## dimensions  : 1112, 987, 1  (nrow, ncol, nlyr)
## resolution  : 0.002191696, 0.002191696  (x, y)
## extent      : -74.55524, -72.39204, 5.708308, 8.145474  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
## source(s)   : memory
## name        : soc_igh_15_30 
## min value   :      8.910829 
## max value   :    128.403015

Let’s convert the SpatRaster layer into a stars object:

stars.soc = st_as_stars(geog.soc)

m <- leaflet() %>%
  addTiles() %>%  
  leafem:::addGeoRaster(
      stars.soc,
      opacity = 0.8,                
      colorOptions = colorOptions(palette = c("orange", "yellow", "cyan", "green"), 
                                  domain = 8:130)
    ) 
#
m  # Print the map

4. Sampling the world

Let’s get a sample of aprox. 500 sites from the real-world data using a randomly located sample:

set.seed(123456)

# Random sampling of 500 points
(samples <- spatSample(geog.soc, 500, "random", as.points=TRUE))

##  class       : SpatVector 
##  geometry    : points 
##  dimensions  : 500, 1  (geometries, attributes)
##  extent      : -74.55415, -72.39313, 5.713787, 8.139995  (xmin, xmax, ymin, ymax)
##  coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
##  names       : soc_igh_15_30
##  type        :         <num>
##  values      :          24.8
##                        45.83
##                        45.85

Please describe main characteristics of the samples object.

Now, we need to convert the spatVector object into a simple feature object:

(muestras <- sf::st_as_sf(samples))

nmuestras <- na.omit(muestras)

Let’s visualize the muestras:

longit <- st_coordinates(muestras)[,1]
latit <- st_coordinates(muestras)[,2]
soc <- muestras$soc_igh_15_30

id <- seq(1,500,1)

(sitios <- data.frame(id, longit, latit, soc))

Let’s remove NA values:

sitios <- na.omit(sitios)

head(sitios)

Let’s visualize the muestras:

m <- leaflet() %>%
  addTiles() %>%  
  leafem:::addGeoRaster(
      stars.soc,
      opacity = 0.7,                
      colorOptions = colorOptions(palette = c("orange", "yellow", "cyan", "green"), 
                                  domain = 8:130)
    ) %>%
  addMarkers(lng=sitios$longit,lat=sitios$latit, popup=sitios$soc, clusterOptions = markerClusterOptions())
m  # Print the map

Now, we are ready to conduct the interpolation tasks.

5. Interpolation

5.1 Creation of the gstat object

To interpolate, we first need to create an object of class gstat, using a function of the same name: gstat.

A gstat object contains all necessary information to conduct spatial interpolation, namely:

The model definition
The calibration data

Based on its arguments, the gstat function “understands” what type of interpolation model we want to use:

No variogram model → IDW
Variogram model, no covariates → Ordinary Kriging

We are going to use three parameters of the gstat function:

formula: the prediction “formula” specifying the dependent and the independent variables (a.k.a covariates)
data: the calibration data (a.k.a the train data)
model: the variogram model

5.2 IDW interpolation

To interpolate SOC using the IDW method we create the following gstat object, specifying just the formula and data:

g1 = gstat(formula = soc_igh_15_30 ~ 1, data = nmuestras)

Now that our interpolation model g1 is defined, we can use the predict function to actually interpolate, i.e., to estimate precipitation values.

The predict function accepts:

A raster—stars object, such as dem
A model—gstat object, such as g1

The raster serves for two purposes:

Specifying the locations where we want to make predictions (in all methods)
Specifying covariate values (in Universal Kriging only)

Let’s create a raster object with cell values equal to 1:

# a simple copy
rrr = aggregate(geog.soc, 4)

What is rrr?

rrr

## class       : SpatRaster 
## dimensions  : 278, 247, 1  (nrow, ncol, nlyr)
## resolution  : 0.008766785, 0.008766785  (x, y)
## extent      : -74.55524, -72.38985, 5.708308, 8.145474  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
## source(s)   : memory
## name        : soc_igh_15_30 
## min value   :      10.24676 
## max value   :     105.02796

Define new values:

values(rrr) <-1

Define new names:

names(rrr) <- "valor"

What is rrr now?

rrr

## class       : SpatRaster 
## dimensions  : 278, 247, 1  (nrow, ncol, nlyr)
## resolution  : 0.008766785, 0.008766785  (x, y)
## extent      : -74.55524, -72.38985, 5.708308, 8.145474  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
## source(s)   : memory
## name        : valor 
## min value   :     1 
## max value   :     1

stars.rrr = st_as_stars(rrr)

For example, the following expression interpolates SOC values according to the model defined in g1 and the raster template defined in stars.rrr:

## [inverse distance weighted interpolation]
z1 = predict(g1, stars.rrr)

## [inverse distance weighted interpolation]

What is z1?

z1

## stars object with 2 dimensions and 2 attributes
## attribute(s):
##                Min.  1st Qu.   Median     Mean  3rd Qu.     Max.  NA's
## var1.pred  11.27134 29.23187 33.37184 34.48736 39.75351 85.46034     0
## var1.var         NA       NA       NA      NaN       NA       NA 68666
## dimension(s):
##   from  to   offset       delta                       refsys x/y
## x    1 247 -74.5552  0.00876679 +proj=longlat +datum=WGS8... [x]
## y    1 278  8.14547 -0.00876679 +proj=longlat +datum=WGS8... [y]

Take note of the names of the two attributes included within the z1 object.

We can subset just the first attribute and rename it to “soc”:

z1 = z1["var1.pred",,]
names(z1) = "soc"

We need a color palette:

paleta <- colorNumeric(palette = c("orange", "yellow", "cyan", "green"), domain = 10:100, na.color = "transparent")

The interpolated SOC raster, using IDW, is shown in next figure:

m <- leaflet() %>%
  addTiles() %>%  
  leafem:::addGeoRaster(
      z1,
      opacity = 0.7,                
      colorOptions = colorOptions(palette = c("orange", "yellow", "cyan", "green"), 
                                  domain = 11:55)
    ) %>%
  addMarkers(lng=sitios$longit,lat=sitios$latit, popup=sitios$soc, clusterOptions = markerClusterOptions()) %>%
    addLegend("bottomright", pal=paleta, values= z1$soc,
    title = "IDW SOC interpolation [%]"
    )
m  # Print the map

5.3 OK interpolation

Kriging methods require a variogram model. The variogram model is an objective way to quantify the autocorrelation pattern in the data, and assign weights accordingly when making predictions.

As a first step, we can calculate and examine the empirical variogram using the variogram function.

The function requires two arguments:

formula: specifies the dependent variable and the covariates, just like in gstat
data: the point layer with the dependent variable and covariates as point attributes

For example, the following expression calculates the empirical variogram of muestras, with no covariates:

v_emp_ok = variogram(soc_igh_15_30 ~ 1, data=nmuestras)

Let’s plot the variogram:

plot(v_emp_ok)

There are several ways to fit a variogram model to an empirical variogram. We will use the simplest one—automatic fitting using function autofitVariogram from package automap:

v_mod_ok = autofitVariogram(soc_igh_15_30 ~ 1, as(nmuestras, "Spatial"))

The function chooses the best fitting type of model, and also fine tunes its parameters. You may use show.vgms() to display variogram model types.

Note that the autofitVariogram function does not work on sf objects, which is why we convert the object to a SpatialPointsDataFrame (package sp).

The fitted model can be plotted with plot:

plot(v_mod_ok)

The resulting object is actually a list with several components, including the empirical variogram and the fitted variogram model. The $var_model component of the resulting object contains the actual model:

v_mod_ok$var_model

In your notebook, explains the meaning of each element in the above model.

Now, the variogram model can be passed to the gstat function, and we can carry on with the Ordinary Kriging interpolation:

## [using ordinary kriging]
g2 = gstat(formula = soc_igh_15_30 ~ 1, model = v_mod_ok$var_model, data = nmuestras)
z2= predict(g2, stars.rrr)

## [using ordinary kriging]

Again, we will subset the predicted values attribute and rename it:

z2 = z2["var1.pred",,]
names(z2) = "soc"

The Ordinary Kriging predictions are shown in next figure:

m <- leaflet() %>%
  addTiles() %>%  
  leafem:::addGeoRaster(
      z2,
      opacity = 0.7,                
      colorOptions = colorOptions(palette = c("orange", "yellow", "cyan", "green"), 
                                  domain = 11:55)
    ) %>%
  addMarkers(lng=sitios$longit,lat=sitios$latit, popup=sitios$soc, clusterOptions = markerClusterOptions()) %>%
    addLegend("bottomright", pal = paleta, values= z2$soc,
    title = "OK SOC interpolation [%]"
    )
m  # Print the map

6. Assessment of results

6.1 Qualitative assessment

Another view of the three interpolation outputs:

colores <- colorOptions(palette = c("orange", "yellow", "cyan", "green"), domain = 10:100, na.color = "transparent")

m <- leaflet() %>%
  addTiles() %>%  
  addGeoRaster(stars.soc, opacity = 0.8, colorOptions = colores, group="RealWorld") %>%
  addGeoRaster(z1, colorOptions = colores, opacity = 0.8, group= "IDW")  %>%
  addGeoRaster(z2, colorOptions = colores, opacity = 0.8, group= "OK")  %>%
  # Add layers controls
  addLayersControl(
    overlayGroups = c("RealWorld", "IDW", "OK"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>% 
    addLegend("bottomright", pal = paleta, values= z1$soc,
    title = "Soil organic carbon [%]"
)
m  # Print the map

6.2 Cross-validation

We have estimated climate surfaces using two different methods: IDW and Ordinary Kriging. Although it is useful to examine and compare the results graphically, we also need an objective way to evaluate interpolation accuracy. That way, we can objectively choose the most accurate method among the available interpolation methods.

In Leave-One-Out Cross Validation we:

Take out one point out of the calibration data
Make a prediction for that point
Repeat for all points In the end, what we get is a table with an observed value and a predicted value for all points.

We can run Leave-One-Out Cross Validation using the gstat.cv function, which accepts a gstat object.

When writing the following chunk, hide message and results.

### run the following code from the console -- line-by-line
cv1 = gstat.cv(g1)
#cv2 = gstat.cv(g2)

The result of gstat.cv has the following attributes:

var1.pred—Predicted value
var1.var—Variance (only for Kriging)
observed—Observed value
residual—Observed-Predicted
zscore—Z-score (only for Kriging)
fold—Cross-validation ID

cv1 = na.omit(cv1)

cv1

## class       : SpatialPointsDataFrame 
## features    : 459 
## extent      : -74.50812, -72.42382, 5.713787, 8.137803  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 6
## names       :        var1.pred, var1.var,         observed,          residual, zscore, fold 
## min values  : 14.0232488207962,       NA, 11.2263841629028, -21.5873360525109,     NA,    1 
## max values  : 76.4997487062349,       NA,  85.560173034668,  42.4954727383233,     NA,  459

Let’s convert the cv1 object:

cv1 = st_as_sf(cv1)

Now, let’s plot the residuals:

sp::bubble(as(cv1[, "residual"], "Spatial"))

Now, we calculate prediction accuracy indices, such as the Root Mean Square Error (RMSE):

# This is the RMSE value for the IDW interpolation
sqrt(sum((cv1$var1.pred - cv1$observed)^2) / nrow(cv1))

## [1] 9.490675

Now, repeat the process with the OK results:

Conversion time:

cv2 = st_as_sf(cv2)

Compute RSME for OK results:

# This is the RMSE value for the OK interpolation
sqrt(sum((cv2$var1.pred - cv2$observed)^2) / nrow(cv2))

## [1] 9.090001

Write a paragraph summarizing your results. According to them, what interpolation technique seems to be more accurate. Explain.

7. Reference

Cite this work as: Lizarazo, I., 2023. Spatial interpolation of soil organic carbon. Available at https://rpubs.com/ials2un/soc_interp.

sessionInfo()

## R version 4.3.0 (2023-04-21)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.3.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Bogota
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] curl_5.0.0    dplyr_1.1.2   ggplot2_3.4.2 leafem_0.2.0  leaflet_2.1.2
##  [6] automap_1.1-9 gstat_2.1-1   stars_0.6-1   abind_1.4-5   sf_1.0-12    
## [11] terra_1.7-29  sp_1.6-0     
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.3       xfun_0.39          bslib_0.4.2        raster_3.6-20     
##  [5] htmlwidgets_1.6.2  lattice_0.21-8     vctrs_0.6.2        tools_4.3.0       
##  [9] crosstalk_1.2.0    generics_0.1.3     parallel_4.3.0     tibble_3.2.1      
## [13] proxy_0.4-27       spacetime_1.3-0    fansi_1.0.4        highr_0.10        
## [17] xts_0.13.1         pkgconfig_2.0.3    KernSmooth_2.23-20 lifecycle_1.0.3   
## [21] farver_2.1.1       compiler_4.3.0     FNN_1.1.3.2        munsell_0.5.0     
## [25] codetools_0.2-19   htmltools_0.5.5    class_7.3-21       sass_0.4.6        
## [29] yaml_2.3.7         pillar_1.9.0       jquerylib_0.1.4    ellipsis_0.3.2    
## [33] classInt_0.4-9     cachem_1.0.8       lwgeom_0.2-11      tidyselect_1.2.0  
## [37] digest_0.6.31      fastmap_1.1.1      grid_4.3.0         colorspace_2.1-0  
## [41] cli_3.6.1          magrittr_2.0.3     base64enc_0.1-3    utf8_1.2.3        
## [45] e1071_1.7-13       withr_2.5.0        scales_1.2.1       rmarkdown_2.21    
## [49] zoo_1.8-12         png_0.1-8          evaluate_0.21      knitr_1.42        
## [53] rgdal_1.6-6        rlang_1.1.1        Rcpp_1.0.10        glue_1.6.2        
## [57] DBI_1.1.3          rstudioapi_0.14    reshape_0.8.9      jsonlite_1.8.4    
## [61] R6_2.5.1           plyr_1.8.8         intervals_0.15.3   units_0.8-2