1. Introduction

This notebook explains how to download SoilGrids raster data using R. The process is illustrated using soil organic carbon stock as variable of interest. Obviously, the code can be replicated using a different variable.

Note that I provide a basic explanation of the functions used in the notebook to help beginneers to reproduce & replicate the code. However, it is expected that GB students improve the current documentation and add more explanations in order to: (i) improve readability of their work; and (ii) provide evidence they understand what is happening behind the scenes.

SoilGrids is a system for global digital soil mapping that makes use of global soil profile information and covariate data to model the spatial distribution of soil properties across the globe. SoilGrids is a collections of soil property maps for the world produced using machine learning at 250 m resolution.

SoilGrids data can be visualized here.

2x2 degree tiles can be downloaded directly from the website.

Furthermore, SoilGrids can be accessed from R through the following services provided by the International Soil Reference and Information Centre - (ISRIC)[https://www.isric.org/explore/soilgrids]:

We will use the WebDAV service in this notebook. In case of any doubt, please review the SoilGrids WebDAV access from R.

2. Setup

Clean the workspace:

rm(list=ls())

Install the required libraries from the console (not from a code chunk).

##it is assummed that you already installed terra, ggplot2, tidyerra, sf, and leaflet
##paquetes= c('XML','rgdal', 'dplyr','devtools', 'stars', 'mapview', 'RColorBrewer')
##install.packages(paquetes)
## in addition, use either 1 or 2 options below
##option1## 
#devtools::install_github("JoshOBrien/gdalUtilities")
##option2## 
#devtools::install_github('gearslaboratory/gdalUtils')

Load the libraries:

library(XML)
library(sf)
library(gdalUtilities)
library(terra)
library(dplyr)
library(leaflet)
library(RColorBrewer)

SoilGrids background

SoilGrids is designed as a globally consistent, data-driven system that predicts soil properties and classes using global covariates and globally fitted models. If you are looking for soil information on national and/or local levels you should compare SoilGrids predictions with soil maps derived from national soil agencies. In Colombia, it is Instituto Geografico Agustin Codazzi (IGAC).

National soil maps are usually based on more detailed input soil information and therefore are often more accurate than SoilGrids (within the local coverage area). The ‘mean’ and ‘median (0.5 quantile)’ may both be used as predictions of the soil property for a given cell. The mean represents the ‘expected value’ and provides an unbiased prediction of the soil property.

Set variables of interest:

First, we define the url corresponding to the ISRIC webDAD site:

url = "https://files.isric.org/soilgrids/latest/data/" # Path to the webDAV data.

Then, we create some objects indicating what we need from ISRIC:

# basic strings
voi = "soc" # soil organic carbon 
depth = "15-30cm"
quantile = "mean"
# concatenate the strings
(variable = paste(url, voi, sep=""))
## [1] "https://files.isric.org/soilgrids/latest/data/soc"

Now, we define the soil property we want to download>

#
(layer = paste(variable,depth,quantile, sep="_")) # layer of interest 
## [1] "https://files.isric.org/soilgrids/latest/data/soc_15-30cm_mean"

Then, the VRT specification is complete:

(vrt_layer = paste(layer, '.vrt', sep=""))
## [1] "https://files.isric.org/soilgrids/latest/data/soc_15-30cm_mean.vrt"

3. Download a Tiff for a region of interest (ROI)

Let’s read a GeoPackage of a department previously downloaded from DANE. We will use here the sf library:

# change path and filename to match your data
(stder = st_read("68_SANTANDER/stder_mun.gpkg"))
## Reading layer `municipios' from data source 
##   `/Users/ials/Documents/unal2025/GB2024/68_SANTANDER/stder_mun.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 87 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -74.52895 ymin: 5.707536 xmax: -72.47706 ymax: 8.14501
## Geodetic CRS:  WGS 84

Note the CRS of stder. As the ISRIC layers use the Homolosine projection, we need to reproject our layer using the sf library:

igh='+proj=igh +lat_0=0 +lon_0=0 +datum=WGS84 +units=m +no_defs'
stder_igh <- st_transform(stder, igh)

Let’s get the bounding box of our area of interest:

(bbox <- st_bbox(stder_igh))
##       xmin       ymin       xmax       ymax 
## -8313449.6   635360.0 -8089703.8   906698.4

Now, let’s use the bbox data to define the boundary box limits as used by the GDA libraryL. By the way, this is one of the trickiest parts of using GDAL.

## ul means upper left
## lr means lower right
ulx = bbox$xmin
uly = bbox$ymax
lrx= bbox$xmax
lry = bbox$ymin
(bb <- c(ulx, uly, lrx, lry))
##       xmin       ymax       xmax       ymin 
## -8313449.6   906698.4 -8089703.8   635360.0

Now, we can use gdal_translate function to download the vrt layer. First, let’s define where to save the data:

sg_url="/vsicurl/https://files.isric.org/soilgrids/latest/data/"
datos = 'soc/soc_15-30cm_mean.vrt'
file = "soc_igh_15_30.tif"

Note that the following task may be take several minutes. Run it only once!

# UNCOMMENT FOR FIRST TIME USE
# THEN, COMMENT IT BACK
gdal_translate(paste0(sg_url,datos), file ,
               tr=c(250,250),
              projwin=bb,
              projwin_srs =igh)

Let’s remind us what are the SOC layer units:

More information is available here

By dividing the predictions values (in mapped units) by the values in the Conversion factor column, the user can obtain the more familiar units indicated in the Conventional units column.

(stder_soc <- terra::rast(file)/10)
## class       : SpatRaster 
## size        : 1085, 895, 1  (nrow, ncol, nlyr)
## resolution  : 250, 250  (x, y)
## extent      : -8313500, -8089750, 635500, 906750  (xmin, xmax, ymin, ymax)
## coord. ref. : Interrupted_Goode_Homolosine 
## source(s)   : memory
## varname     : soc_igh_15_30 
## name        : soc_igh_15_30 
## min value   :           8.7 
## max value   :         130.3

4. Data exploration

A histogram can help at this point in time:

## add a meaningful title and SOC units
terra::hist(stder_soc)

A summary is also helpful:

summary(stder_soc)
##  soc_igh_15_30   
##  Min.   :  9.50  
##  1st Qu.: 25.10  
##  Median : 33.20  
##  Mean   : 35.81  
##  3rd Qu.: 44.40  
##  Max.   :130.20  
##  NA's   :1588

Now, te’s change the name of the SOC attribute:

(names(stder_soc) <-  "soc")
## [1] "soc"

Then, plotting time.

Let’s put the SOC values in a variable:

valores <- values(stder_soc, na.rm=TRUE)

Let’s create a color palette:

# change as needed
orangecyan <- colorNumeric(c("orange","yellow2", "darkseagreen", "cyan" ), valores,
  na.color = "transparent")

Lastly, the plot:

# plot an interactive map
leaflet::leaflet(stder) %>% 
  addTiles() %>% 
  setView(-74, 6, 9) %>% 
   addPolygons(color = "gray", weight = 1.0, smoothFactor = 0.5,
    opacity = 0.5, fillOpacity = 0.10,
    popup = paste("Municipio: ", stder$MPIO_CNMBR)) %>%
  addRasterImage(stder_soc, colors ="Spectral", opacity = 0.8)  %>%
  addLegend(pal = orangecyan, values = valores, title = "Soil Organic Carbon (SOC) [g/kg]")

You can change the color palette and the map layout to improve the display as well as to better understand the soil layer’s information.

5. Alternative visualization (20%)

In the next chunk, try a different visualization of your SOC data. It is suggested to use tidyterra.

# WRITE YOUR CODE IN THIS CODE CHUNK
#
#
#

6. Sampling the world

Let’s assume that we are also interested in getting a sample of the SOC dataset. In this notebook, we will use the terra library for such a purpose. Please see here how the function works to undertand the next code’s chunk.

# change as needed
set.seed(123456)
# Conversion of SOC coordinate reference system
geog ="+proj=longlat +datum=WGS84"
(geog.soc = project(stder_soc, geog))
## class       : SpatRaster 
## size        : 1112, 987, 1  (nrow, ncol, nlyr)
## resolution  : 0.002191696, 0.002191696  (x, y)
## extent      : -74.55524, -72.39204, 5.708308, 8.145474  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
## source(s)   : memory
## name        :        soc 
## min value   :   8.910829 
## max value   : 128.403015
# Random sampling of 2000 points
(samples <- terra::spatSample(geog.soc, 2000, "random", as.points=TRUE))
##  class       : SpatVector 
##  geometry    : points 
##  dimensions  : 2000, 1  (geometries, attributes)
##  extent      : -74.55415, -72.39313, 5.709403, 8.144378  (xmin, xmax, ymin, ymax)
##  coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
##  names       :   soc
##  type        : <num>
##  values      :  38.9
##                53.71
##                25.91

Note two things: (i) the object’s class of the output; and (ii) the CRS of the sample dataset.

Thus, we need to do several procesing. Explain each chunk that follows:

(muestras <- sf::st_as_sf(samples))

Let’s omit the NA values:

nmuestras <- na.omit(muestras)

Now, let’s visualize the cleaned samples:

longit <- st_coordinates(nmuestras)[,1]
latit <- st_coordinates(nmuestras)[,2]
soc <- nmuestras$soc

What we got?

summary(soc)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.40   24.89   32.89   35.65   43.70  101.29
length(soc)
## [1] 1850

How many samples did not contain valid data?

id <- seq(1,1850,1)
(sitios <- data.frame(id, longit, latit, soc))

Now, let’s plot the samples:

m <- leaflet() %>%
  addTiles() %>%  
  addMarkers(lng=sitios$longit,lat=sitios$latit, popup=sitios$soc, clusterOptions = markerClusterOptions())
m  # Print the map

Please check the leaflet documentation to better understand the code above. Then, add any relevant information here.

7. Saving the samples

Review how the sf::st_write function works and add your explanation here.

# change path and filename to match your directories
# as well as your department name
sf::st_write(nmuestras, "data/soc_stder.gpkg", driver = "GPKG",
             append=FALSE)
## Deleting layer `soc_stder' using driver `GPKG'
## Writing layer `soc_stder' to data source `data/soc_stder.gpkg' using driver `GPKG'
## Writing 1850 features with 1 fields and geometry type Point.

Do you understand what is purpose of setting append to FALSE?

Make sure you bring the *soc_your_depto.gpkg” to the classroom.

8. Reference

Cite this work as follows: Lizarazo, I. 2025. How to download soil data from SoilGrids. Available at: https://rpubs.com/ials2un/downloadSoilGrids

sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.0.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Bogota
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] RColorBrewer_1.1-3  leaflet_2.2.2       dplyr_1.1.4        
## [4] terra_1.8-70        gdalUtilities_1.2.5 sf_1.0-21          
## [7] XML_3.99-0.18      
## 
## loaded via a namespace (and not attached):
##  [1] jsonlite_2.0.0     compiler_4.5.1     tidyselect_1.2.1   Rcpp_1.1.0        
##  [5] jquerylib_0.1.4    png_0.1-8          scales_1.4.0       yaml_2.3.10       
##  [9] fastmap_1.2.0      R6_2.6.1           generics_0.1.4     classInt_0.4-11   
## [13] knitr_1.50         htmlwidgets_1.6.4  tibble_3.3.0       units_1.0-0       
## [17] DBI_1.2.3          bslib_0.9.0        pillar_1.11.1      rlang_1.1.6       
## [21] cachem_1.1.0       xfun_0.52          sass_0.4.10        cli_3.6.5         
## [25] magrittr_2.0.4     class_7.3-23       crosstalk_1.2.1    digest_0.6.37     
## [29] grid_4.5.1         rstudioapi_0.17.1  lifecycle_1.0.4    vctrs_0.6.5       
## [33] KernSmooth_2.23-26 proxy_0.4-27       evaluate_1.0.4     glue_1.8.0        
## [37] farver_2.1.2       codetools_0.2-20   e1071_1.7-16       rmarkdown_2.29    
## [41] tools_4.5.1        pkgconfig_2.0.3    htmltools_0.5.8.1