Spacial_interpolation

1. Introduction

This notebook briefly explains how to download raster data from SoilGrids using R. This process is carried out with soil organic carbon data as a variable of interest, but the code can clearly be replicated using a different variable.

After downloading the data, we will illustrate two spatial interpolation methods: Inverse Distance Weighted (IDW) and Ordinary Kriging (OK). IDW is a deterministic technique and OK is a probabilistic technique. Both techniques are used here to obtain a continuous surface of soil organic carbon (SOC) at a depth of 15-30 cm from samples obtained from SoilGrids 250 m.

We set the variables of interest

We define the link that corresponds to the ISRIC website.

url = "https://files.isric.org/soilgrids/latest/data/"

So, we create some objects indicating that we need ISRIC

#basic strings
voi = "soc" #Soil organic carbon
depth = "15-30cm"
quantile = "mean"
#concatenar los strings
(variable = paste(url, voi, sep = ""))

## [1] "https://files.isric.org/soilgrids/latest/data/soc"

Now we define the soil property we want to unload

(layer = paste(variable,depth,quantile, sep = "_")) #capa de interes

## [1] "https://files.isric.org/soilgrids/latest/data/soc_15-30cm_mean"

Thus, the VRT verification is complete.

(vrt_layer = paste(layer, '.vrt', sep = "" ))

## [1] "https://files.isric.org/soilgrids/latest/data/soc_15-30cm_mean.vrt"

3. Download TIFF of the region of interest

Read a vector map with the previously downloaded area of interest

(stder = st_read("C:/Users/galle/OneDrive - Universidad Nacional de Colombia/Freelance_2025/0_Portafolio/Geomatics/Spatial_Interpolation/el_carmen.shp"))

## Reading layer `el_carmen' from data source 
##   `C:\Users\galle\OneDrive - Universidad Nacional de Colombia\Freelance_2025\0_Portafolio\Geomatics\Spatial_Interpolation\el_carmen.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 91 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 4936439 ymin: 2486485 xmax: 4999704 ymax: 2584646
## Projected CRS: MAGNA-SIRGAS 2018 / Origen-Nacional

## Simple feature collection with 1 feature and 91 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 4936439 ymin: 2486485 xmax: 4999704 ymax: 2584646
## Projected CRS: MAGNA-SIRGAS 2018 / Origen-Nacional
##   DPTO_CCDGO MPIO_CCDGO MPIO_CNMBR MPIO_CDPMP VERSION       AREA LATITUD
## 1         54        245  EL CARMEN      54245    2018 1727742958 8.86799
##    LONGITUD STCTNENCUE STP3_1_SI STP3_2_NO STP3A_RI STP3B_TCN STP4_1_SI
## 1 -73.34586       5461       424      5037      424         0       424
##   STP4_2_NO STP9_1_USO STP9_2_USO STP9_3_USO STP9_4_USO STP9_2_1_M STP9_2_2_M
## 1      5037       4390        137        931          3          1        126
##   STP9_2_3_M STP9_2_4_M STP9_2_9_M STP9_3_1_N STP9_3_2_N STP9_3_3_N STP9_3_4_N
## 1         10          0          0          2         90         78         20
##   STP9_3_5_N STP9_3_6_N STP9_3_7_N STP9_3_8_N STP9_3_9_N STP9_3_10 STP9_3_99
## 1        139        526         26          0          4        46         0
##   STVIVIENDA STP14_1_TI STP14_2_TI STP14_3_TI STP14_4_TI STP14_5_TI STP14_6_TI
## 1       4527       4152         78        179        104          9          5
##   STP15_1_OC STP15_2_OC STP15_3_OC STP15_4_OC TSP16_HOG STP19_EC_1 STP19_ES_2
## 1       3591          5        385        546      3630       2852        739
##   STP19_EE_1 STP19_EE_2 STP19_EE_3 STP19_EE_4 STP19_EE_5 STP19_EE_6 STP19_EE_9
## 1       2068        764          6          4          0          2          8
##   STP19_ACU1 STP19_ACU2 STP19_ALC1 STP19_ALC2 STP19_GAS1 STP19_GAS2 STP19_GAS9
## 1       1530       2061       1387       2204         21       3545         25
##   STP19_REC1 STP19_REC2 STP19_INT1 STP19_INT2 STP19_INT9 STP27_PERS STPERSON_L
## 1       1505       2086        142       3418         31      12001         42
##   STPERSON_S STP32_1_SE STP32_2_SE STP34_1_ED STP34_2_ED STP34_3_ED STP34_4_ED
## 1      11959       6261       5740       2306       2493       1834       1566
##   STP34_5_ED STP34_6_ED STP34_7_ED STP34_8_ED STP34_9_ED STP51_PRIM STP51_SECU
## 1       1367        999        729        448        259       5552       2615
##   STP51_SUPE STP51_POST STP51_13_E STP51_99_E Shape_Leng Shape_Area Codigo_Mun
## 1        450         82       2030        190   3.214756  0.1419942      54245
##                         geometry
## 1 POLYGON ((4999571 2583470, ...

repro = '+proj=igh +lat_0=0 +lon_0=0 +datum=WGS84 +units=m +no_defs'
stder_reproj = st_transform(stder, repro)

Now we define the boundaries of our area of interest.

(box = st_bbox(stder_reproj))

##       xmin       ymin       xmax       ymax 
## -8222911.2   935342.9 -8166003.6  1034252.3

Now, let’s use the bbox data to define the bounding box limits as used by the GDAL library. By the way, this is one of the most complicated parts of using GDAL.

ulx = box$xmin
uly = box$ymax
lrx = box$xmax
lry = box$ymin
(bbx = c(ulx, uly, lrx,lry))

##       xmin       ymax       xmax       ymin 
## -8222911.2  1034252.3 -8166003.6   935342.9

Now, we can use the gdal_translate function to download the VRT layer. First, let’s define where to save the data:

sg_url="/vsicurl/https://files.isric.org/soilgrids/latest/data/"
datos = 'soc/soc_15-30cm_mean.vrt'
file = "C:/Users/galle/OneDrive - Universidad Nacional de Colombia/Freelance_2025/0_Portafolio/Geomatics/Spatial_Interpolation/rsoc_igh_15_30.tif"

SOC layer units are not the most common, so to make them easier to use, we will use a conversion factor to obtain familiar units.

(stder_soc <- terra::rast(file)/10)

## class       : SpatRaster 
## dimensions  : 396, 228, 1  (nrow, ncol, nlyr)
## resolution  : 250, 250  (x, y)
## extent      : -8223000, -8166000, 935500, 1034500  (xmin, xmax, ymin, ymax)
## coord. ref. : Interrupted_Goode_Homolosine 
## source(s)   : memory
## varname     : rsoc_igh_15_30 
## name        : rsoc_igh_15_30 
## min value   :           10.5 
## max value   :           76.2

4. Data exploration

A histogram can help with this

(media_soc = mean(values(stder_soc), na.rm = T))

## [1] 27.63008

par(bg = "white")
options(scipen = 10)
hist(values(stder_soc), main= "Organic carbon content of soil in the fine soil fraction", 
     xlab= "g/kg", xlim = c(0, 120), col="lightblue", breaks=20)

A summary is also useful..

summary(stder_soc)

##  rsoc_igh_15_30 
##  Min.   :10.50  
##  1st Qu.:23.10  
##  Median :26.90  
##  Mean   :27.63  
##  3rd Qu.:31.10  
##  Max.   :76.20  
##  NA's   :27

Now we change the name of the SOC attribute

(names(stder_soc) <-  "soc")

## [1] "soc"

We put the SOC values in a variable

val = values(stder_soc, na.rm = T)

Let’s create a color palette.

orangecyan = colorNumeric(c("orange","yellow2", "darkseagreen", "cyan" ), val,
                          na.color = "transparent")

Finally, the graph.

#Grafica un mapa interactivo
leaflet::leaflet(stder) %>% 
  addTiles() %>% 
  setView(-74, 6, 9) %>% 
  addPolygons(color = "gray", weight = 1.0, smoothFactor = 0.5,
              opacity = 0.5, fillOpacity = 0.10,
              popup = paste("Municipio: ", stder$MPIO_CNMBR)) %>%
  addRasterImage(stder_soc, colors ="Spectral", opacity = 0.8)  %>%
  addLegend(pal = orangecyan, values = val, title = "Soil Organic Carbon (SOC) [g/kg]")

## Warning: sf layer is not long-lat data

## Warning: sf layer has inconsistent datum (+proj=tmerc +lat_0=4 +lon_0=-73 +k=0.9992 +x_0=5000000 +y_0=2000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs).
## Need '+proj=longlat +datum=WGS84'

5. Sampling the world

Let’s assume that we are also interested in obtaining a sample from the SOC dataset. In this notebook, we will use the terra library for this purpose.

set.seed(123446)
#convertimos al SRC de SOC
geog ="+proj=longlat +datum=WGS84"
(geog.soc = project(stder_soc, geog))

## class       : SpatRaster 
## dimensions  : 407, 267, 1  (nrow, ncol, nlyr)
## resolution  : 0.002184878, 0.002184878  (x, y)
## extent      : -73.58484, -73.00148, 8.403826, 9.293072  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
## source(s)   : memory
## name        :      soc 
## min value   : 10.61527 
## max value   : 73.71278

(samples = terra::spatSample(geog.soc, 2000, "random", as.points=TRUE))

##  class       : SpatVector 
##  geometry    : points 
##  dimensions  : 2000, 1  (geometries, attributes)
##  extent      : -73.58375, -73.00257, 8.404919, 9.291979  (xmin, xmax, ymin, ymax)
##  coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
##  names       :   soc
##  type        : <num>
##  values      : 43.15
##                   NA
##                46.16

Note two things: (i) the class of the object in the output and (ii) the CRS of the sample dataset.

Therefore, we need to perform several processes. Explain each of the code blocks that follow.

(muestras <- sf::st_as_sf(samples))

## Simple feature collection with 2000 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -73.58375 ymin: 8.404919 xmax: -73.00257 ymax: 9.291979
## Geodetic CRS:  GEOGCRS["unknown",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]],
##         ID["EPSG",6326]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433],
##         ID["EPSG",8901]],
##     CS[ellipsoidal,2],
##         AXIS["longitude",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]],
##         AXIS["latitude",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]]]
## First 10 features:
##         soc                   geometry
## 1  43.14563 POINT (-73.09871 9.110634)
## 2        NA POINT (-73.02005 8.968617)
## 3  46.15673 POINT (-73.17736 8.442061)
## 4  26.51857 POINT (-73.30408 8.813491)
## 5        NA POINT (-73.58156 9.097525)
## 6        NA POINT (-73.57282 9.248282)
## 7  27.21408 POINT (-73.42644 8.699877)
## 8        NA POINT (-73.57719 8.763239)
## 9  36.05132 POINT (-73.05501 8.782902)
## 10 31.55754 POINT (-73.39585 8.636516)

muestras %>% as_tibble()

## # A tibble: 2,000 × 2
##      soc             geometry
##    <dbl>          <POINT [°]>
##  1  43.1 (-73.09871 9.110634)
##  2  NA   (-73.02005 8.968617)
##  3  46.2 (-73.17736 8.442061)
##  4  26.5 (-73.30408 8.813491)
##  5  NA   (-73.58156 9.097525)
##  6  NA   (-73.57282 9.248282)
##  7  27.2 (-73.42644 8.699877)
##  8  NA   (-73.57719 8.763239)
##  9  36.1 (-73.05501 8.782902)
## 10  31.6 (-73.39585 8.636516)
## # ℹ 1,990 more rows

We omit NA values.

(nmuestras <- na.omit(muestras))

## Simple feature collection with 1748 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -73.57719 ymin: 8.404919 xmax: -73.00694 ymax: 9.291979
## Geodetic CRS:  GEOGCRS["unknown",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]],
##         ID["EPSG",6326]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433],
##         ID["EPSG",8901]],
##     CS[ellipsoidal,2],
##         AXIS["longitude",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]],
##         AXIS["latitude",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]]]
## First 10 features:
##         soc                   geometry
## 1  43.14563 POINT (-73.09871 9.110634)
## 3  46.15673 POINT (-73.17736 8.442061)
## 4  26.51857 POINT (-73.30408 8.813491)
## 7  27.21408 POINT (-73.42644 8.699877)
## 9  36.05132 POINT (-73.05501 8.782902)
## 10 31.55754 POINT (-73.39585 8.636516)
## 11 26.04888 POINT (-73.15333 8.798197)
## 12 30.82602 POINT (-73.24509 8.485759)
## 13 20.44965  POINT (-73.5619 8.404919)
## 14 23.28090 POINT (-73.28879 8.494499)

We continue with the visualization of the clean samples.

longit <- st_coordinates(nmuestras)[,1]
latit <- st_coordinates(nmuestras)[,2]
soc <- nmuestras$soc

What did we get?

summary(soc)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.53   23.76   27.27   28.03   31.52   63.85

length(soc)

## [1] 1748

How many samples do not contain valid data?

id <- seq(1,1748,1)
(sitios <- data.frame(id, longit, latit, soc))

Now, let’s plot the samples.

m <- leaflet() %>%
  addTiles() %>%  
  addMarkers(lng=sitios$longit,lat=sitios$latit, popup=sitios$soc, clusterOptions = markerClusterOptions())
m

6. We save the samples

#Usar la ruta deseada
sf::st_write(nmuestras, "C:/Users/galle/OneDrive - Universidad Nacional de Colombia/Freelance_2025/0_Portafolio/Geomatics/Spatial_Interpolation/soc_stder.gpkg", driver = "GPKG", append = F)

7. Read the input data

list.files(path="C:/Users/galle/OneDrive - Universidad Nacional de Colombia/Freelance_2025/0_Portafolio/Geomatics/Spatial_Interpolation", pattern = "soc_stder.gpkg")

## [1] "soc_stder.gpkg"

Then, we read the samples using sf:

samples <- sf::st_read("C:/Users/galle/OneDrive - Universidad Nacional de Colombia/Freelance_2025/0_Portafolio/Geomatics/Spatial_Interpolation/soc_stder.gpkg")

## Reading layer `soc_stder' from data source 
##   `C:\Users\galle\OneDrive - Universidad Nacional de Colombia\Freelance_2025\0_Portafolio\Geomatics\Spatial_Interpolation\soc_stder.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 1748 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -73.57719 ymin: 8.404919 xmax: -73.00694 ymax: 9.291979
## Geodetic CRS:  +proj=longlat +datum=WGS84 +no_defs

We also read the geopackage corresponding to our study area.

munic <- sf::st_read("C:/Users/galle/OneDrive - Universidad Nacional de Colombia/Freelance_2025/0_Portafolio/Geomatics/Spatial_Interpolation/el_carmen.gpkg")

## Reading layer `el_carmen' from data source 
##   `C:\Users\galle\OneDrive - Universidad Nacional de Colombia\Freelance_2025\0_Portafolio\Geomatics\Spatial_Interpolation\el_carmen.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 1 feature and 91 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 4936439 ymin: 2486485 xmax: 4999704 ymax: 2584646
## Projected CRS: MAGNA-SIRGAS 2018 / Origen-Nacional

8. Explore the input data

We obtain a summary of the samples

summary(samples)

##       soc                   geom     
##  Min.   :12.53   POINT        :1748  
##  1st Qu.:23.76   epsg:NA      :   0  
##  Median :27.27   +proj=long...:   0  
##  Mean   :28.03                       
##  3rd Qu.:31.52                       
##  Max.   :63.85

A histogram can help again.

hist(samples$soc, main="Histograma SOC", xlab = "g/Kg", ylab = "frecuencia", col= "pink") 
abline(h = 0, v = 28.03, col = "firebrick",lty = 2 )

We round SOC values to two digits.

samples$soc = round(samples$soc,2)

Now we can visualize the spatial distribution of the samples.

pal <- colorNumeric(
  palette = c("#E1F5C4", "#EDE574", "#F9D423", "#FC913A", "#FF4E50"),
  domain = samples$soc,      # debe ser numérico
  na.color = "transparent"   # por si hay NA
)

We reproject the layers to the same SRC.

munic <- st_transform(munic, crs = 4326)
samples <- st_transform(samples, crs = 4326)

leaflet()%>%
  addPolygons(
    data = munic,
    color = "cyan",
    # set the opacity of the outline
    opacity = 0.5,
    # set the stroke width in pixels
    weight = 1,
    # set the fill opacity
    fillOpacity = 0.2) %>%
  addCircleMarkers(
    data = samples,
    radius= 1.5, 
    label = ~soc,
    color = ~pal(soc),
    fillOpacity = 1,
    stroke = F
  ) %>%
  addLegend(
    data = samples,
    pal = pal,
    values = ~soc,
    position = "bottomleft",
    title = "SOC:",
    opacity = 0.9) %>%
  addProviderTiles("OpenStreetMap")

st_geometry_type(munic)

## [1] MULTIPOLYGON
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

9. Interpolation

9.1 Creating a gstat object

To interpolate, we first need to create a gstat class object using a function with the same name: gstat.

A gstat object contains all the information necessary to perform spatial interpolation, including:

The model definition The calibration data Based on its arguments, the gstat function “interprets” what type of interpolation model we want to use:

No variogram model → Inverse Distance Weighting (IDW) Variogram model, no covariates → Ordinary Kriging (OK) We will use three parameters in the gstat function:

formula: the prediction “formula” that specifies the dependent variable and the independent variables (also called covariates). data: the calibration data (also known as training data). model: the variogram model.

9.2 IDW Interpolation

To interpolate soil organic carbon (SOC) using the IDW method, we create the following gstat object, specifying only the formula and data.

g1 = gstat(formula = soc ~ 1, data = samples)

Now that our g1 interpolation model is defined, we can use the predict function to perform the interpolation, i.e., estimate the SOC values.

The predict function accepts:

A raster (stars object), such as dem. A model (gstat object), such as g1. The raster serves two purposes:

To specify the locations where we want to make predictions (in all methods). To specify covariate values (only in Universal Kriging). We will use the terra library to create a raster object with cell values equal to 1. The extent of this raster must cover our area of interest.

Note: You can use the metadata from any raster that covers your department (e.g., a digital elevation model) to define the raster creation parameters.

(rrr <- terra::rast(xmin=-73.577194, xmax=-73.006941, ymin=8.404919, ymax=9.291979, nrows=370, ncols = 329, vals = 1, crs = "epsg:4326"))

## class       : SpatRaster 
## dimensions  : 370, 329, 1  (nrow, ncol, nlyr)
## resolution  : 0.001733292, 0.002397459  (x, y)
## extent      : -73.57719, -73.00694, 8.404919, 9.291979  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source(s)   : memory
## name        : lyr.1 
## min value   :     1 
## max value   :     1

Now, we need to convert this SpatRaster into a stars object:

stars.rrr <- stars::st_as_stars(rrr)

The following expression interpolates the SOC values according to the model defined in g1 and the raster defined in stars.rrr.

z1 = predict(g1, stars.rrr)

## [inverse distance weighted interpolation]

¿What is z1?

z1

## stars object with 2 dimensions and 2 attributes
## attribute(s):
##                Min.  1st Qu.   Median     Mean  3rd Qu.     Max.   NA's
## var1.pred  12.75162 25.79533 28.04482 27.73713 29.53867 62.43863      0
## var1.var         NA       NA       NA      NaN       NA       NA 121730
## dimension(s):
##   from  to offset     delta refsys x/y
## x    1 329 -73.58  0.001733 WGS 84 [x]
## y    1 370  9.292 -0.002397 WGS 84 [y]

Take note of the names of the two attributes included within the z1 object.

We can select only the first attribute and rename it:

a <- which(is.na(z1[[1]]))
z1[[1]][a] = 0.0

z1

## stars object with 2 dimensions and 2 attributes
## attribute(s):
##                Min.  1st Qu.   Median     Mean  3rd Qu.     Max.   NA's
## var1.pred  12.75162 25.79533 28.04482 27.73713 29.53867 62.43863      0
## var1.var         NA       NA       NA      NaN       NA       NA 121730
## dimension(s):
##   from  to offset     delta refsys x/y
## x    1 329 -73.58  0.001733 WGS 84 [x]
## y    1 370  9.292 -0.002397 WGS 84 [y]

names(z1) = "soc"

To print the interpolated surface, we will need a color palette.

paleta <- colorNumeric(
  palette = c("orange", "yellow", "cyan", "green"),
  domain = range(samples$soc, na.rm = TRUE),
  na.color = "transparent"
)

Now let’s visualize the result of the interpolation:

z1 <- z1["soc"]

msoc <- leaflet() %>%
  addTiles() %>%
  addStarsImage(
    z1,
    opacity = 0.7,
    colors = paleta        # ← este es el argumento correcto
  ) %>%
  addCircleMarkers(
    data = samples,
    radius = 1.5,
    label = ~round(soc, 2),
    color = ~paleta(soc),
    fillOpacity = 1,
    stroke = FALSE
  ) %>%
  addLegend(
    position = "bottomright",
    pal = paleta,
    values = samples$soc,
    title = "IDW SOC interpolation [%]"
  )
msoc #imprimir el mapa

9.3 Interpolation OK

Kriging methods require a variogram model. The variogram model is an objective way to quantify the autocorrelation pattern in the data and assign weights accordingly when making predictions.

As a first step, we can calculate and examine the empirical variogram using the variogram function.

This function requires two arguments:

formula: specifies the dependent variable and covariates, as in gstat. data: the point layer with the dependent variable and covariates as point attributes. The following expression calculates the empirical variogram of samples, without covariates:

v_emp_ok = variogram(soc ~ 1, data=samples)

We print the empirical variogram

plot(v_emp_ok)

We fit the variogram to a function

v_mod_ok = automap::autofitVariogram(soc ~ 1, as(samples, "Spatial"), model = c("Sph", "Exp", "Gau"))

print the model

plot(plot(v_mod_ok)
)

The resulting object is actually a list with several components, including the empirical variogram and the fitted variogram model.

The var_model component of the resulting object contains the actual model:

v_mod_ok$var_model

##   model      psill    range
## 1   Nug  0.8028901 0.000000
## 2   Exp 39.6389427 2.559963

Now, the variogram model can be passed to the gstat function, and we can continue with the interpolation using Ordinary Kriging.

g2 = gstat(formula = soc ~ 1, model = v_mod_ok$var_model, data = samples)
z2= predict(g2, stars.rrr)

## [using ordinary kriging]

Once again, we will select the predicted values attribute and rename it.

a <- which(is.na(z2[[1]]))
z2[[1]][a] = 0.0
names(z2) = "soc"

OK’s predictions can be seen in the following printout

mp <- leaflet() %>%
  addTiles() %>%
  addStarsImage(
    z2,
    opacity = 0.7,
    colors = paleta
  ) %>%
  addCircleMarkers(
    data = samples,
    radius = 1.5,
    label = ~round(soc, 2),
    color = ~paleta(soc),
    fillOpacity = 1,
    stroke = FALSE
  ) %>%
  addLegend(
    position = "bottomright",
    pal = paleta,
    values = samples$soc,  # ← aquí usas samples$soc, no z2$soc
    title = "OK SOC interpolation [%]"
  )

mp

10. Evaluation of results

10.1 Qualitative evaluation

We compare the graphs

# Establecemos la paleta de colores
paleta <- colorNumeric(palette = c("orange", "yellow", "cyan", "green"), 
                        domain = range(z1[[1]], z2[[1]], na.rm = TRUE), 
                        na.color = "transparent")

Create interactive map

We convert our data into raster objects

z1_terra <- rast(z1) 
z2_soc <- z2["soc"]
z2_terra <- rast(z2_soc)

we save the new raster layers

writeRaster(z1_terra, "IDW_interpolated.tif", overwrite=TRUE)
writeRaster(z2_terra, "OK_interpolated.tif", overwrite=TRUE)

We add them to the workspace.

z1_terra <- rast("IDW_interpolated.tif")
z2_terra <- rast("OK_interpolated.tif")

# Convertir a RasterLayer (si es necesario para compatibilidad con leaflet)
z1_raster <- raster::raster(z1_terra)
z2_raster <- raster::raster(z2_terra)

# Crear el mapa interactivo con leaflet
mapa_interactivo <- leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%  # Agrega un mapa base ligero
  addRasterImage(z1_raster, opacity = 0.7, group = "IDW") %>%
  addRasterImage(z2_raster, opacity = 0.7, group = "OK") %>%
  addLayersControl(
    overlayGroups = c("IDW", "OK"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>% #añadimos la leyenda
  addLegend("bottomright", pal = paleta, values = range(z1[[1]], na.rm = TRUE),
            title = "Soil organic carbon [%]")
mapa_interactivo

10.2 Cross-validation

We have estimated SOC areas using two different methods: IDW and Ordinary Kriging. Although it is useful to examine and compare the results graphically, we also need an objective way to evaluate the accuracy of the interpolation. This allows us to objectively choose the most accurate method among the available interpolation methods.

In Leave-One-Out Cross-Validation:

One point is removed from the calibration data. A prediction is made for that point. The process is repeated for all points. At the end, we obtain a table with an observed value and a predicted value for all points.

We can run Leave-One-Out Cross Validation using the gstat.cv function, which accepts a gstat object.

cv2 = gstat.cv(g2)

cv2 = st_as_sf(cv2)

sp::bubble(as(cv2[, "residual"], "Spatial"))

This is the RMSE value for OK interpolation.

sqrt(sum((cv2$var1.pred - cv2$observed)^2) / nrow(cv2))

## [1] 3.903031

Create the IDW model

g_idw <- gstat(formula = soc ~ 1, data = samples, nmax = 15, set = list(idp = 2))

cv_idw <- gstat.cv(g_idw)

cv_idw <- st_as_sf(cv_idw)  # Convertir a sf

sp::bubble(as(cv_idw[, "residual"], "Spatial"))

This is the RMSE value for IDW interpolation.

rmse_idw <- sqrt(sum((cv_idw$var1.pred - cv_idw$observed)^2) / nrow(cv_idw))
print(rmse_idw)

## [1] 4.023636

By comparing the RMSE values, we can conclude that the best interpolation method for this project is Ordinary Kriging.

References

Lizarazo, I. 2025. Spatial interpolation. Available at: https://rpubs.com/ials2un/Spatial_Interpolation