This notebook illustrates two spatial interpolation techniques: Inverse Distance Weighted (IDW) and Ordinary Kriging (OK). IDW is a deterministic technique. OK is a probabilistic one. Both techniques are used here to get a continuous surface of soil organic carbon (SOC) at 15-30 cm from samples obtained from SoilGrids 250 m.
Note that I provide a basic explanation of the code to help readers to understand and replicate it. However, Geomática Básica students should: (i) expand the current documentation; and (ii) interpret & analyze the code outputs. This will help them to improve their coding skills as well as to provide strong evidence that they are not just copying and pasting.
If you want to learn about writing R notebooks, study this document.
Clean the workspace:
# Uncomment at first-time run
rm(list=ls())
For reading & writing geospatial data, we will use terra (for raster) and sf (for vector). For spatial interpolation, we will use the gstat and automap libraries. For visualization, we will use ggplot, leaflet and tmap.
First, we need to install the required libraries. Do it from the R console, not from this notebook.
# DO NOT INSTALL PACKAGES THAT YOU ALREADY INSTALLED
#paquetes = c('terra', 'sf', 'sp', 'stars', 'gstat', 'automap', 'leaflet', 'leafem')
#install.packages(paquetes)
#
Load the libraries:
library(terra)
library(sf)
library(sp)
library(stars)
library(gstat)
library(automap)
library(RColorBrewer)
library(leaflet)
library(leafem)
As indicated in the introduction, our samples are SOC values at 15-30 cm depth. They were obtained from SoilGrids 250 m using this notebook. We saved such samples in geopackage format, under the data directory, using soc_{depto}.gpkg as filename.
First, let’s check that the data is available for use:
list.files(path="data", pattern = "*.gpkg")
## [1] "COL_cities.gpkg" "depto_cordoba.gpkg"
## [3] "mun_cordoba.gpkg" "mun_stder.gpkg"
## [5] "new_cordoba_cities.gpkg" "soc_stder.gpkg"
## [7] "stader_frutales.gpkg"
Then, we read the samples using sf:
samples <- sf::st_read("data/soc_stder.gpkg")
## Reading layer `soc_stder' from data source
## `/Users/ials/Documents/unal/GB2024/data/soc_stder.gpkg' using driver `GPKG'
## Simple feature collection with 1850 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -74.55196 ymin: 5.709403 xmax: -72.41505 ymax: 8.144378
## Geodetic CRS: +proj=longlat +datum=WGS84 +no_defs
As our study area is Santander, it makes sense to read also the corresponding municipalities geopackage:
munic <- sf::st_read("data/mun_stder.gpkg")
## Reading layer `col_adm2' from data source
## `/Users/ials/Documents/unal/GB2024/data/mun_stder.gpkg' using driver `GPKG'
## Simple feature collection with 87 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -74.55179 ymin: 5.705701 xmax: -72.5161 ymax: 8.120555
## Geodetic CRS: WGS 84
Let’s obtain a summary of the samples:
summary(samples)
## soc geom
## Min. : 12.40 POINT :1850
## 1st Qu.: 24.89 epsg:NA : 0
## Median : 32.89 +proj=long...: 0
## Mean : 35.65
## 3rd Qu.: 43.70
## Max. :101.29
hist(samples$soc)
Let’s round the SOC values to two digits:
samples$soc = round(samples$soc,2)
Now, let’s visualize the samples to know their spatial distribution. Please visit this site to understand how leaflet works.
pal <- colorNumeric(
c("#E1F5C4", "#EDE574", "#F9D423", "#FC913A", "#FF4E50"),
# colors depend on the count variable
domain = samples$soc,
)
# This a simple visualization
# Change the code to suit your preferences
leaflet() %>%
addPolygons(
data = munic,
color = "gray",
# set the opacity of the outline
opacity = 1,
# set the stroke width in pixels
weight = 1,
# set the fill opacity
fillOpacity = 0.2) %>%
addCircleMarkers(
data = samples,
radius= 1.5,
label = ~soc,
color = ~pal(soc),
fillOpacity = 1,
stroke = F
) %>%
addLegend(
data = samples,
pal = pal,
values = ~soc,
position = "bottomleft",
title = "SOC:",
opacity = 0.9) %>%
addProviderTiles("OpenStreetMap")
To interpolate, we first need to create an object of class gstat, using a function of the same name: gstat.
A gstat object contains all necessary information to conduct spatial interpolation, namely:
The model definition
The calibration data
Based on its arguments, the gstat function “understands” what type of interpolation model we want to use:
No variogram model → Inverse Weighting Distance (IDW)
Variogram model, no covariates → Ordinary Kriging (OK)
We are going to use three parameters of the gstat function:
formula: the prediction “formula” specifying the dependent and the independent variables (aka covariates)
data: the calibration data (aka the train data)
model: the variogram model
To interpolate SOC using the IDW method we create the following gstat object, specifying just the formula and data:
g1 = gstat(formula = soc ~ 1, data = samples)
Now that our interpolation model g1 is defined, we can use the predict function to actually interpolate, i.e., to estimate SOC values.
The predict function accepts:
The raster serves for two purposes:
Let’s use the terra library to create a raster object with cell values equal to 1. The extent of this raster should cover our area of interest.
Note: You can use the metadata of any raster covering your department (e.g. a digital elevation model) to define the raster creation’s parameters:
(rrr <- terra::rast(xmin=-74.55524, xmax=-72.38985, ymin=5.708308, ymax=8.145474, nrows=370, ncols = 329, vals = 1, crs = "epsg:4326"))
## class : SpatRaster
## dimensions : 370, 329, 1 (nrow, ncol, nlyr)
## resolution : 0.006581733, 0.006586935 (x, y)
## extent : -74.55524, -72.38985, 5.708308, 8.145474 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source(s) : memory
## name : lyr.1
## min value : 1
## max value : 1
Note: If your computer has limited resources (e.g. low memory), use a lower spatial resolution (e.g. 0.008 or 0.01 degrees).
Now, we need to convert this SpatRaster into a stars object:
stars.rrr <- stars::st_as_stars(rrr)
The following expression interpolates SOC values according to the model defined in \(g1\) and the raster defined in \(stars.rrr\):
## [inverse distance weighted interpolation]
## running this code takes several minutes
z1 = predict(g1, stars.rrr)
## [inverse distance weighted interpolation]
What is z1?
z1
## stars object with 2 dimensions and 2 attributes
## attribute(s):
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## var1.pred 12.73927 29.28242 34.54546 35.80878 42.00168 98.42292 0
## var1.var NA NA NA NaN NA NA 121730
## dimension(s):
## from to offset delta refsys x/y
## x 1 329 -74.56 0.006582 WGS 84 [x]
## y 1 370 8.145 -0.006587 WGS 84 [y]
Take note of the names of the two attributes included within the z1 object.
We can subset just the first attribute and rename it:
# dealing with NA values
a <- which(is.na(z1[[1]]))
z1[[1]][a] = 0.0
z1
## stars object with 2 dimensions and 2 attributes
## attribute(s):
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## var1.pred 12.73927 29.28242 34.54546 35.80878 42.00168 98.42292 0
## var1.var NA NA NA NaN NA NA 121730
## dimension(s):
## from to offset delta refsys x/y
## x 1 329 -74.56 0.006582 WGS 84 [x]
## y 1 370 8.145 -0.006587 WGS 84 [y]
names(z1) = "soc"
To plot the interpolated surface, we need a color palette:
# change this chunk to meet your preferences
paleta <- colorNumeric(palette = c("orange", "yellow", "cyan", "green"), domain = 10:120, na.color = "transparent")
Now, let’s visualize the interpolation output:
m <- leaflet() %>%
addTiles() %>%
leafem:::addGeoRaster(
z1,
opacity = 0.7,
colorOptions = colorOptions(palette = c("orange", "yellow", "cyan", "green"),
domain = 10:100)
) %>%
addCircleMarkers(
data = samples,
radius= 1.5,
label = ~soc,
color = ~paleta(soc),
fillOpacity = 1,
stroke = F
) %>%
addLegend("bottomright", pal=paleta, values= z1$soc,
title = "IDW SOC interpolation [%]"
)
m # Print the map
Kriging methods require a variogram model. The variogram model is an objective way to quantify the autocorrelation pattern in the data, and assign weights accordingly when making predictions.
As a first step, we can calculate and examine the empirical variogram using the variogram function.
The function requires two arguments:
formula: specifies the dependent variable and the covariates, just like in gstat
data: the point layer with the dependent variable and covariates as point attributes
The following expression calculates the empirical variogram of samples, with no covariates:
v_emp_ok = variogram(soc ~ 1, data=samples)
Let’s plot the empirical variogram:
plot(v_emp_ok)
There are several ways to fit a variogram model to an empirical variogram. Here, we will use the simplest one—automatic fitting using the function autofitVariogram from the package automap:
#make sure you understand the parameters needed to run this fitting function
v_mod_ok = automap::autofitVariogram(soc ~ 1, as(samples, "Spatial"))
The function chooses the best fitting type of model, and also fine tunes its parameters. You may use \(show.vgms()\) to display several variogram model types.
Note that the autofitVariogram function does not work on sf objects, which is why we convert the samples object into a SpatialPointsDataFrame (package sp).
The fitted model can be plotted with plot:
plot(v_mod_ok)
The resulting object is actually a list with several components, including the empirical variogram and the fitted variogram model. The $var_model component of the resulting object contains the actual model:
v_mod_ok$var_model
Now, the variogram model can be passed to the gstat function, and we can carry on with the Ordinary Kriging interpolation:
## [using ordinary kriging]
## This takes several minutes
## (or even ages depending on your raster cell size and your computer resources)
g2 = gstat(formula = soc ~ 1, model = v_mod_ok$var_model, data = samples)
z2= predict(g2, stars.rrr)
## [using ordinary kriging]
Again, we will subset the predicted values attribute and rename it:
a <- which(is.na(z2[[1]]))
z2[[1]][a] = 0.0
names(z2) = "soc"
The OK predictions are shown in next plot:
m <- leaflet() %>%
addTiles() %>%
leafem:::addGeoRaster(
z2,
opacity = 0.7,
colorOptions = colorOptions(palette = c("orange", "yellow", "cyan", "green"),
domain = 10:100)
) %>%
addCircleMarkers(
data = samples,
radius= 1.5,
label = ~soc,
color = ~paleta(soc),
fillOpacity = 1,
stroke = F
) %>%
addLegend("bottomright", pal = paleta, values= z2$soc,
title = "OK SOC interpolation [%]"
)
## Warning in write_stars.stars(x, dsn = fl): all but first attribute are ignored
m # Print the map
colores <- colorOptions(palette = c("orange", "yellow", "cyan", "green"), domain = 10:100, na.color = "transparent")
m <- leaflet() %>%
addTiles() %>%
addGeoRaster(z1, colorOptions = colores, opacity = 0.8, group= "IDW") %>%
addGeoRaster(z2, colorOptions = colores, opacity = 0.8, group= "OK") %>%
# Add layers controls
addLayersControl(
overlayGroups = c("IDW", "OK"),
options = layersControlOptions(collapsed = FALSE)
) %>%
addLegend("bottomright", pal = paleta, values= z1$soc,
title = "Soil organic carbon [%]"
)
## Warning in write_stars.stars(x, dsn = fl): all but first attribute are ignored
## Warning in write_stars.stars(x, dsn = fl): all but first attribute are ignored
m # Print the map
We have estimated SOC surfaces using two different methods: IDW and Ordinary Kriging. Although it is useful to examine and compare the results graphically, we also need an objective way to evaluate interpolation accuracy. That way, we can objectively choose the most accurate method among the available interpolation methods.
In Leave-One-Out Cross Validation we:
We can run Leave-One-Out Cross Validation using the gstat.cv function, which accepts a gstat object.
When writing the following chunk, hide message and results.
###
## this code should be run from the console not from here
#cv2 = gstat.cv(g2)
The result of gstat.cv has the following attributes:
var1.pred—Predicted value
var1.var—Variance (only for Kriging)
observed—Observed value
residual—Observed-Predicted
zscore—Z-score (only for Kriging)
fold—Cross-validation ID
Let’s convert the cv2 object:
# run from the console
#cv2 = st_as_sf(cv2)
Let’s plot the residuals:
# run from the console
#sp::bubble(as(cv2[, "OK residuals"], "Spatial"))
# This is the RMSE value for the OK interpolation
# run from the console
#sqrt(sum((cv2$var1.pred - cv2$observed)^2) / nrow(cv2))
Write down the RSME obtained for OK interpolation results.
Then, repeat the process with the IDW interpolation results.
Once you compute both RMSE values, compare them and analyze their differences. What is the most reliable interpolation methods? Explain.
write_stars(
z1, dsn = "data/IDW_soc_stder.tif", layer = 1
)
write_stars(
z2, dsn = "data/OK_soc_stder.tif", layer = 1
)
Then, write a meaningful paragraph summarizing what you learned writing this notebook.
Do not copy your colleagues’ explanations or conclusions. That is plagiarism!
Cite this work as follows: Lizarazo, I. 2025. Spatial interpolation. Available at: https://rpubs.com/ials2un/Spatial_Interpolation
sessionInfo()
## R version 4.3.2 (2023-10-31)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.7.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Bogota
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] leafem_0.2.3 leaflet_2.2.2 RColorBrewer_1.1-3 automap_1.1-9
## [5] gstat_2.1-1 stars_0.6-7 abind_1.4-8 sp_2.2-0
## [9] sf_1.0-19 terra_1.8-10
##
## loaded via a namespace (and not attached):
## [1] generics_0.1.3 sass_0.4.9 class_7.3-22
## [4] KernSmooth_2.23-22 lattice_0.22-6 digest_0.6.37
## [7] magrittr_2.0.3 evaluate_1.0.3 grid_4.3.2
## [10] fastmap_1.2.0 plyr_1.8.9 jsonlite_1.8.9
## [13] e1071_1.7-16 reshape_0.8.9 DBI_1.2.3
## [16] crosstalk_1.2.1 scales_1.3.0 codetools_0.2-19
## [19] jquerylib_0.1.4 cli_3.6.3 rlang_1.1.4
## [22] units_0.8-5 munsell_0.5.1 intervals_0.15.3
## [25] base64enc_0.1-3 cachem_1.1.0 yaml_2.3.10
## [28] FNN_1.1.3.2 raster_3.6-30 tools_4.3.2
## [31] parallel_4.3.2 dplyr_1.1.4 colorspace_2.1-1
## [34] ggplot2_3.5.1 spacetime_1.3-0 png_0.1-8
## [37] vctrs_0.6.5 R6_2.5.1 zoo_1.8-12
## [40] proxy_0.4-27 lifecycle_1.0.4 classInt_0.4-11
## [43] leaflet.providers_2.0.0 htmlwidgets_1.6.4 pkgconfig_2.0.3
## [46] pillar_1.10.1 bslib_0.8.0 gtable_0.3.6
## [49] glue_1.8.0 Rcpp_1.0.14 tidyselect_1.2.1
## [52] tibble_3.2.1 xfun_0.50 rstudioapi_0.17.1
## [55] knitr_1.49 farver_2.1.2 htmltools_0.5.8.1
## [58] rmarkdown_2.29 xts_0.14.1 compiler_4.3.2