Tor-Gunnar Vågen (ICRAF), Leigh Winowiecki (CIAT), Lulseged T. Desta (CIAT) and Jerome E. Tondoh (CIAT)
Required packages:
- nlme (comes with your R installation)
- lattice
- car
- nnet
- sp
- rgdal
- raster
- MASS
In this document we present examples of R functions and scripts that may be used to generate maps of soil functional properties based on for example remote sensing data, climate surfaces and/or digital elevation models and their derivatives. The data used come from the Africa Soil Information Service (AfSIS) project, which consists of a set of sites stratified across sub-Saharan Africa based on Koeppen-Geiger climate zones[1]. The methodology used was developed for land health assessment as is referred to as the Land Degradation Surveillance Framework (LDSF)[2,3]. A copy of the field guide can be accessed through this link (requires registration).
We demonstrate a prediction model for soil organic carbon (SOC), using linear mixed effects (lme) or multilevel modeling approaches. Such models are well suited for developing local maps of soil properties. We use surface reflectance calibrated Landsat ETM+ reflectance from the GLS2010 library as predictors. The example data are from sites in Tanzania, Mozambique and Mali, respectively, but have been made anonymous in order to make this tutorial publicly available.
The code presented here is intended as a starting point. Other approaches, such as gradient boosting models (gbm) and bayesian additive regression trees (BART) may be interesting to explore as well. There is no such thing as “the right model” as models are essentially wrong and the selection of model depends on the application as well. You are encouraged to use this tutorial as a starting point and we hope it is of help in developing similar models and maps using your own data.
The example dataset used in this exercise can be downloaded through the Dataverse site here as well.
Data citation: Vågen, Tor-G; Winowiecki, Leigh A; Tondoh, Jerome E; Desta, Lulseged T, 2013-01-01, “Africa Soil Information Service (AfSIS) - Soil Health Mapping”, http://hdl.handle.net/1902.1/19793 v2012.
setwd("~/Desktop/afsisData"); #Change this to where you saved the data
dat <- read.table("afsisSoil.csv", sep=",", header=T);
To see the content and the structure of the data, you may issue the following commands:
head(dat)
## Country Site Cluster Plot DepthTop DepthBottom SOC elev map mfi pci
## 1 1 1 2 7 0 20 2937 1099 1299 233 18
## 2 1 1 2 7 0 20 2937 1099 1299 233 18
## 3 1 1 1 2 0 20 3043 1067 1299 233 18
## 4 1 1 1 2 0 20 3043 1067 1299 233 18
## 5 1 1 1 3 0 20 3312 1064 1299 233 18
## 6 1 1 1 3 0 20 3312 1064 1299 233 18
## slope flowaccum SRFIBL SRFIGL SRFIRL SRFINA SRFIMB SRFIMC PBI PVI
## 1 13 27 319 632 927 2300 2487 1497 681 1192
## 2 13 27 319 632 927 2300 2487 1497 681 1192
## 3 2 0 319 661 975 2251 2711 1567 680 1173
## 4 2 0 319 661 975 2251 2711 1567 680 1173
## 5 4 26 232 429 642 1860 1815 1037 540 1168
## 6 4 26 232 429 642 1860 1815 1037 540 1168
str(dat)
## 'data.frame': 6630 obs. of 21 variables:
## $ Country : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Site : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Cluster : int 2 2 1 1 1 1 1 1 1 1 ...
## $ Plot : int 7 7 2 2 3 3 4 4 5 5 ...
## $ DepthTop : int 0 0 0 0 0 0 0 0 0 0 ...
## $ DepthBottom: int 20 20 20 20 20 20 20 20 20 20 ...
## $ SOC : int 2937 2937 3043 3043 3312 3312 2159 2159 3558 3558 ...
## $ elev : int 1099 1099 1067 1067 1064 1064 1094 1094 1087 1087 ...
## $ map : num 1299 1299 1299 1299 1299 ...
## $ mfi : num 233 233 233 233 233 233 233 233 233 233 ...
## $ pci : num 18 18 18 18 18 18 18 18 18 18 ...
## $ slope : int 13 13 2 2 4 4 11 11 2 2 ...
## $ flowaccum : int 27 27 0 0 26 26 0 0 0 0 ...
## $ SRFIBL : int 319 319 319 319 232 232 348 348 261 261 ...
## $ SRFIGL : int 632 632 661 661 429 429 545 545 400 400 ...
## $ SRFIRL : int 927 927 975 975 642 642 666 666 666 666 ...
## $ SRFINA : int 2300 2300 2251 2251 1860 1860 2398 2398 1909 1909 ...
## $ SRFIMB : int 2487 2487 2711 2711 1815 1815 2064 2064 1865 1865 ...
## $ SRFIMC : int 1497 1497 1567 1567 1037 1037 1060 1060 1060 1060 ...
## $ PBI : int 681 681 680 680 540 540 652 652 554 554 ...
## $ PVI : int 1192 1192 1173 1173 1168 1168 1262 1262 1172 1172 ...
The variables are:
- Country, Site, Cluster, Plot are sampling plot locations (see the LDSF field guide[4])
- DepthTop and DepthBottom soil depth in cm (all these data are topsoils)
- elev: Elevation from SRTM (m)
- map: Mean annual precipitation (mm)
- mfi: Modified fournier index (rainfall agressiveness)
- slope: Slope based on SRTM (degrees)
- flowaccum: Flow accumulation from SRTM
- SRFIBL … SRFIMC: Calibrated surface reflectance for Landsat bands 1…5,7
- PBI and PVI: Prependicular background and vegetation indices, respectively.
Let's take a look at the SOC data by Site…
library(lattice)
bwplot(Site ~ SOC, dat, box.ratio = 0.05, fill = "brown")
As is evident from the plot above, there is great variation in SOC between these sites. Note that the SOC values are concentrations multiplied by 100 as we will be producing the final map in 16-bit format.
Before proceeding it is often a good idea to look at the distribution of the variable we are modeling to determine if any transformations are required.
library(lattice)
densityplot(~SOC, dat)
library(nlme)
soc.lme1 <- lme(log(SOC) ~ SRFIGL + SRFIRL + SRFINA + SRFIMB + SRFIMC + PBI +
PVI + map, random = ~1 | Site/Cluster, dat)
summary(soc.lme1)
## Linear mixed-effects model fit by REML
## Data: dat
## AIC BIC logLik
## 4629 4710 -2302
##
## Random effects:
## Formula: ~1 | Site
## (Intercept)
## StdDev: 0.6534
##
## Formula: ~1 | Cluster %in% Site
## (Intercept) Residual
## StdDev: 0.2618 0.301
##
## Fixed effects: log(SOC) ~ SRFIGL + SRFIRL + SRFINA + SRFIMB + SRFIMC + PBI + PVI + map
## Value Std.Error DF t-value p-value
## (Intercept) 11.811 13.021 5960 0.907 0.3644
## SRFIGL 0.000 0.000 5960 6.035 0.0000
## SRFIRL 0.001 0.004 5960 0.265 0.7909
## SRFINA 0.003 0.004 5960 0.786 0.4319
## SRFIMB 0.000 0.000 5960 3.893 0.0001
## SRFIMC 0.000 0.000 5960 -5.652 0.0000
## PBI -0.012 0.014 5960 -0.851 0.3950
## PVI -0.004 0.014 5960 -0.286 0.7749
## map 0.000 0.000 5960 -1.231 0.2182
## Correlation:
## (Intr) SRFIGL SRFIRL SRFINA SRFIMB SRFIMC PBI PVI
## SRFIGL -0.009
## SRFIRL -0.684 -0.005
## SRFINA 0.726 -0.006 0.004
## SRFIMB -0.005 0.056 0.025 0.017
## SRFIMC -0.014 -0.006 -0.015 -0.027 -0.794
## PBI -0.089 -0.003 -0.665 -0.749 -0.033 0.029
## PVI -0.999 0.008 0.721 -0.690 0.006 0.013 0.038
## map -0.012 0.014 -0.006 -0.006 -0.004 -0.006 0.008 0.000
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -4.8010 -0.5373 -0.0103 0.5335 5.2811
##
## Number of Observations: 6630
## Number of Groups:
## Site Cluster %in% Site
## 42 662
The model summary above indicates that we may want to simplify the model. Mean annual rainfall does not contribute much to the model when we include Landsat reflectance. After a few iterations with adding/removing variables from the model and testing each model we end up with:
library(nlme)
soc.lme2 <- lme(log(SOC) ~ SRFIGL + SRFINA + SRFIMB + SRFIMC + PVI, random = ~1 |
Site/Cluster, dat)
summary(soc.lme2)
## Linear mixed-effects model fit by REML
## Data: dat
## AIC BIC logLik
## 4592 4654 -2287
##
## Random effects:
## Formula: ~1 | Site
## (Intercept)
## StdDev: 0.625
##
## Formula: ~1 | Cluster %in% Site
## (Intercept) Residual
## StdDev: 0.2623 0.301
##
## Fixed effects: log(SOC) ~ SRFIGL + SRFINA + SRFIMB + SRFIMC + PVI
## Value Std.Error DF t-value p-value
## (Intercept) 5.410 0.3314 5963 16.322 0e+00
## SRFIGL 0.000 0.0001 5963 6.046 0e+00
## SRFINA 0.000 0.0001 5963 -6.904 0e+00
## SRFIMB 0.000 0.0000 5963 3.842 1e-04
## SRFIMC 0.000 0.0001 5963 -5.613 0e+00
## PVI 0.002 0.0003 5963 6.262 0e+00
## Correlation:
## (Intr) SRFIGL SRFINA SRFIMB SRFIMC
## SRFIGL -0.682
## SRFINA 0.888 -0.777
## SRFIMB -0.158 0.057 -0.283
## SRFIMC -0.252 -0.005 -0.113 -0.794
## PVI -0.950 0.729 -0.952 0.133 0.283
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -4.7859 -0.5358 -0.0112 0.5346 5.2935
##
## Number of Observations: 6630
## Number of Groups:
## Site Cluster %in% Site
## 42 662
The residuals from soc.lme2 look reasonable as well:
plot(soc.lme2)
# Measured versus predicted SOC (back-transforming the predicted values):
library(car)
## Loading required package: MASS
## Loading required package: nnet
scatterplot(exp(predict(soc.lme2)) ~ SOC, xlab = "Measured SOC", ylab = "Predicted SOC",
xlim = c(0, 8000), ylim = c(0, 8000), smooth = F, dat)
The above model has an adjusted R-squared of 0.85. As shown in the summary variance is highest between sites, but also relatively high within many of the sites (between sampling clusters).
For this exercise, we show an example of how to use the model developed earlier (soc.lme2) to predict SOC from satellite imagery. The images used are from the GLS2005 archive from sites in Mali, Tanzania and Mozambique. We have clipped the individual sites to reduce the size of the dataset. The images are in the original zip archive downloaded earlier.
Note that the bands must be named consistent with the covariates in the model and that we create a “dummy” band for each site which contains the site ID so that the model uses site-specific coefficients in the predictions of SOC.
library(raster);
## Loading required package: sp
## raster 2.0-41 (21-December-2012)
## Attaching package: 'raster'
## The following object(s) are masked from 'package:MASS':
##
## area, select
## The following object(s) are masked from 'package:nlme':
##
## getData
setwd("~/Desktop/afsisData"); #Change this to where you saved the data
rast1 <- stack(raster("1/SRFIGL.tif"),raster("1/SRFINA.tif"),raster("1/SRFIMB.tif"),raster("1/SRFIMC.tif"),raster("1/PVI.tif"));
site <- raster(rast1,1);
names(site) <- 'Site';
site[site>0] <- 7;
rast1 <- stack(rast1,site);
rast2<- stack(raster("2/SRFIGL.tif"),raster("2/SRFINA.tif"),raster("2/SRFIMB.tif"),raster("2/SRFIMC.tif"),raster("2/PVI.tif"));
site <- raster(rast2,1);
names(site) <- 'Site';
site[site>0] <- 39;
rast2 <- stack(rast2,site);
rast3<- stack(raster("3/SRFIGL.tif"),raster("3/SRFINA.tif"),raster("3/SRFIMB.tif"),raster("3/SRFIMC.tif"),raster("3/PVI.tif"));
site <- raster(rast3,1);
names(site) <- 'Site';
site[site>0] <- 1;
rast3 <- stack(rast3,site);
Let's apply soc.lme2 to the imagery and display the results.
mapSOC1 <- exp(predict(rast1, soc.lme2, level=1));
mapSOC2 <- exp(predict(rast2, soc.lme2, level=1));
mapSOC3 <- exp(predict(rast3, soc.lme2, level=1));
col=colorRampPalette(c("yellow","white","wheat","black"))(255); #You may want to play with these!
zlim=c(0,3500) #Or an appropriate stretch for the site you are modeling
plot(mapSOC1, col=col, main="Map of SOC (g/kg) for Site 1 (Mali)", xlab="Longitude", ylab="Latitude", zlim=zlim);
plot(mapSOC2, col=col, main="Map of SOC (g/kg) for Site 2 (Mozambique)", xlab="Longitude", ylab="Latitude", zlim=zlim);
plot(mapSOC3, col=col, main="Map of SOC (g/kg) for Site 3 (Tanzania)", xlab="Longitude", ylab="Latitude", zlim=zlim);
Note that the model we use above is the “1-level” model, which means that we are using the “site-specific” model coefficients from soc.lme2. Given the size of the training dataset this model should give us reasonable predictions, but it is useful to compare the distribution of predicted SOC from the map surface and the measured values used to develop the model.
mapSOC.val1 <- getValues(mapSOC1)
mapSOC.val2 <- getValues(mapSOC2)
mapSOC.val3 <- getValues(mapSOC3)
plot(density(na.omit(mapSOC.val1)), col = "orange", lwd = 2, ylim = c(0, 0.008),
xlim = c(0, 3000), main = "")
# lines(density(na.omit(dat[dat$Site==7,]$SOC)), col='orange', lwd=2,
# lty=2);
lines(density(na.omit(mapSOC.val2)), col = "red", lwd = 2, main = "")
lines(density(na.omit(mapSOC.val3)), col = "black", lwd = 2, main = "")
legend("topright", bty = "n", c("Site 1", "Site 2", "Site 3"), lty = c(1, 2,
2), col = c("orange", "red", "black"), lwd = c(2, 2, 2))
Compare the above density plots to the boxplots for the measured values by site. The corresponding sites in the boxplot are:
- Site 1: 7
- Site 2: 39
- Site 3: 1
[1] Kottek, M., Grieser, J., Beck, C., Rudolf, B., & Rubel, F. (2006). World Map of the Köppen-Geiger climate classification updated. Meteorologische Zeitschrift, 15(3), 259–263. doi:10.1127/0941-2948/2006/0130
[2] Vågen, T.-G., Davey, F., & Shepherd, K. D. (2012). Land health surveillance: Mapping soil carbon in Kenyan rangelands. In P. K. R. Nair & D. Garrity (Eds.), Agroforestry - The Future of Global Land Use. Springer.
[3] Vågen, T.-G., Shepherd, K. D., & Walsh, M. G. (2006). Sensing landscape level change in soil fertility following deforestation and conversion in the highlands of Madagascar using Vis-NIR spectroscopy. Geoderma, 133(3-4), 281–294. doi:10.1016/j.geoderma.2005.07.014
[4] Vågen, T.-G., Winowiecki, L. A., Tamene Desta, L., & Tondoh, J. E. (2010). The Land Degradation Surveillance Framework (LDSF) - Field Guide. 14 p. Nairobi, Kenya: World Agroforestry Centre.