This notebook explains how to download SoilGrids raster data using R. The process is illustrated using soil organic carbon stock as variable of interest. Obviously, the code can be replicated using a different variable.
Note that I provide a basic explanation of the functions used in the notebook to help beginneers to reproduce & replicate the code. However, it is expected that GB students improve the current documentation and add more explanations in order to: (i) improve readability of their work; and (ii) provide evidence they understand what is happening behind the scenes.
SoilGrids is a system for global digital soil mapping that makes use of global soil profile information and covariate data to model the spatial distribution of soil properties across the globe. SoilGrids is a collections of soil property maps for the world produced using machine learning at 250 m resolution.
SoilGrids data can be visualized here.
2x2 degree tiles can be downloaded directly from the website.
Furthermore, SoilGrids can be accessed from R through the following services provided by the International Soil Reference and Information Centre - (ISRIC)[https://www.isric.org/explore/soilgrids]:
Web Mapping Service (WMS): access for visualisation and data overview.
Web Coverage Service (WCS): best way to obtain a subset of a map and use SoilGrids as input to other modelling pipelines.
Web Distributed Authoring and Versioning (WebDAV): download the complete global map(s) in VRT format.
We will use the WebDAV service in this notebook. In case of any doubt, please review the SoilGrids WebDAV access from R.
Clean the workspace:
rm(list=ls())
Install the required libraries from the console (not from a code chunk).
##it is assummed that you already installed terra, ggplot2, tidyerra, sf, and leaflet
##paquetes= c('XML','rgdal', 'dplyr','devtools', 'stars', 'mapview', 'RColorBrewer')
##install.packages(paquetes)
## in addition, use either 1 or 2 options below
##option1##
#devtools::install_github("JoshOBrien/gdalUtilities")
##option2##
#devtools::install_github('gearslaboratory/gdalUtils')
Load the libraries:
library(XML)
library(sf)
library(gdalUtilities)
library(terra)
library(dplyr)
library(leaflet)
library(RColorBrewer)
SoilGrids is designed as a globally consistent, data-driven system that predicts soil properties and classes using global covariates and globally fitted models. If you are looking for soil information on national and/or local levels you should compare SoilGrids predictions with soil maps derived from national soil agencies. In Colombia, it is Instituto Geografico Agustin Codazzi (IGAC).
National soil maps are usually based on more detailed input soil information and therefore are often more accurate than SoilGrids (within the local coverage area). The ‘mean’ and ‘median (0.5 quantile)’ may both be used as predictions of the soil property for a given cell. The mean represents the ‘expected value’ and provides an unbiased prediction of the soil property.
First, we define the url corresponding to the ISRIC webDAD site:
url = "https://files.isric.org/soilgrids/latest/data/" # Path to the webDAV data.
Then, we create some objects indicating what we need from ISRIC:
# basic strings
voi = "soc" # soil organic carbon
depth = "15-30cm"
quantile = "mean"
# concatenate the strings
(variable = paste(url, voi, sep=""))
## [1] "https://files.isric.org/soilgrids/latest/data/soc"
Now, we define the soil property we want to download>
#
(layer = paste(variable,depth,quantile, sep="_")) # layer of interest
## [1] "https://files.isric.org/soilgrids/latest/data/soc_15-30cm_mean"
Then, the VRT specification is complete:
(vrt_layer = paste(layer, '.vrt', sep=""))
## [1] "https://files.isric.org/soilgrids/latest/data/soc_15-30cm_mean.vrt"
Let’s read a GeoPackage of a department previously downloaded from DANE. We will use here the sf library:
# change path and filename to match your data
(stder = st_read("68_SANTANDER/stder_mun.gpkg"))
## Reading layer `municipios' from data source
## `/Users/ials/Documents/unal2025/GB2024/68_SANTANDER/stder_mun.gpkg'
## using driver `GPKG'
## Simple feature collection with 87 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -74.52895 ymin: 5.707536 xmax: -72.47706 ymax: 8.14501
## Geodetic CRS: WGS 84
Note the CRS of stder. As the ISRIC layers use the Homolosine projection, we need to reproject our layer using the sf library:
igh='+proj=igh +lat_0=0 +lon_0=0 +datum=WGS84 +units=m +no_defs'
stder_igh <- st_transform(stder, igh)
Let’s get the bounding box of our area of interest:
(bbox <- st_bbox(stder_igh))
## xmin ymin xmax ymax
## -8313449.6 635360.0 -8089703.8 906698.4
Now, let’s use the bbox data to define the boundary box limits as used by the GDA libraryL. By the way, this is one of the trickiest parts of using GDAL.
## ul means upper left
## lr means lower right
ulx = bbox$xmin
uly = bbox$ymax
lrx= bbox$xmax
lry = bbox$ymin
(bb <- c(ulx, uly, lrx, lry))
## xmin ymax xmax ymin
## -8313449.6 906698.4 -8089703.8 635360.0
Now, we can use gdal_translate function to download the vrt layer. First, let’s define where to save the data:
sg_url="/vsicurl/https://files.isric.org/soilgrids/latest/data/"
datos = 'soc/soc_15-30cm_mean.vrt'
file = "soc_igh_15_30.tif"
Note that the following task may be take several minutes. Run it only once!
# UNCOMMENT FOR FIRST TIME USE
# THEN, COMMENT IT BACK
gdal_translate(paste0(sg_url,datos), file ,
tr=c(250,250),
projwin=bb,
projwin_srs =igh)
Let’s remind us what are the SOC layer units:
More information is available here
By dividing the predictions values (in mapped units) by the values in the Conversion factor column, the user can obtain the more familiar units indicated in the Conventional units column.
(stder_soc <- terra::rast(file)/10)
## class : SpatRaster
## size : 1085, 895, 1 (nrow, ncol, nlyr)
## resolution : 250, 250 (x, y)
## extent : -8313500, -8089750, 635500, 906750 (xmin, xmax, ymin, ymax)
## coord. ref. : Interrupted_Goode_Homolosine
## source(s) : memory
## varname : soc_igh_15_30
## name : soc_igh_15_30
## min value : 8.7
## max value : 130.3
A histogram can help at this point in time:
## add a meaningful title and SOC units
terra::hist(stder_soc)
A summary is also helpful:
summary(stder_soc)
## soc_igh_15_30
## Min. : 9.50
## 1st Qu.: 25.10
## Median : 33.20
## Mean : 35.81
## 3rd Qu.: 44.40
## Max. :130.20
## NA's :1588
Now, te’s change the name of the SOC attribute:
(names(stder_soc) <- "soc")
## [1] "soc"
Then, plotting time.
Let’s put the SOC values in a variable:
valores <- values(stder_soc, na.rm=TRUE)
Let’s create a color palette:
# change as needed
orangecyan <- colorNumeric(c("orange","yellow2", "darkseagreen", "cyan" ), valores,
na.color = "transparent")
Lastly, the plot:
# plot an interactive map
leaflet::leaflet(stder) %>%
addTiles() %>%
setView(-74, 6, 9) %>%
addPolygons(color = "gray", weight = 1.0, smoothFactor = 0.5,
opacity = 0.5, fillOpacity = 0.10,
popup = paste("Municipio: ", stder$MPIO_CNMBR)) %>%
addRasterImage(stder_soc, colors ="Spectral", opacity = 0.8) %>%
addLegend(pal = orangecyan, values = valores, title = "Soil Organic Carbon (SOC) [g/kg]")