Main goal of this study

This script is a proof of concept for soil moisture estimation from remotely sensed data. The study area is located in Llanos Orientales, a vast tropical grassland plain situated to the east of the Colombian Andes.

Three different sources are used to obtain explanatory variables: (i) a downscaled CHIRPS precipitation dataset; (ii) Landsat multispectral images; and (iii) a SRTM digital elevation model.

Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) is a 30+ year quasi-global rainfall dataset. Spanning 50°S-50°N (and all longitudes), starting in 1981 to near-present, CHIRPS incorporates 0.05° resolution satellite imagery with in-situ station data to create gridded rainfall time series for trend analysis and seasonal drought monitoring.

The Shuttle Radar Topographic Mission (SRTM)elevation data, produced by NASA originally, is a major breakthrough in digital mapping of the world, and provides a major advance in the accessibility of high quality elevation data for large portions of the tropics and other areas of the developing world.

The target soil property is surface soil moisture. Such soil moisture data were measured in two different dates, during a rainy season (April 2011) as well as through a dry season (December 2009).

Study Area

The study is comprised by two watersheds, RĂ­o Negro and GuatiquĂ­a, that can be seen in the next figure.

library(raster)
## Loading required package: sp
library(rasterVis)
## Loading required package: lattice
## Loading required package: latticeExtra
## Loading required package: RColorBrewer
library(tmap)
setwd("C:\\Users\\IvĂ¡n Lizarazo\\Documents\\UN\\datos\\soilmoisture\\piedemonte")
hum <- shapefile("humedad.shp")
W <- shapefile("WGuatiquia_Negro.shp")
### soil moisture' spatial reference system is WGS 84
### Let's visualize the study area
library(leaflet)
m <- leaflet(W) %>% 
  addPolygons() %>% 
  addTiles()
  addMiniMap(m, tiles =  providers$Esri.WorldStreetMap,
             toggleDisplay = T, width = 120, height=80)

Methodological approach

Our study comprises two stages. In the first stage, a random forest model is built for each season of interest using the field measured soil moisture as reference data for training. In the second stage, the previously obtained random forest model is applied to 2015 remotely-sensed data in order to estimate soil moisture during a time when the SMAP. Following figures illustrate the two stages.

Fist stage. Creation of two random forest models: one for the rainy season and the other one for the dry season

Fist stage. Creation of two random forest models: one for the rainy season and the other one for the dry season

Second stage. Application of the random forest model for the 2015 rainy season

Second stage. Application of the random forest model for the 2015 rainy season

Soil moisture data

Soil moisture in the top 30 cm was measured in two hundred sampling sites located in different mapping soil units.

Soil moisture sampling points.

Soil moisture sampling points.

Now, let’s review the measured soil moisture data:

# histogram of soil moisture at rainy season
hist(hum$Humed1)

# histogram of soil moisture at dry season
hist(hum$Humed2)

# a plot of data may be useful
tm_shape(W) + tm_fill(col="grey") +
  tm_shape(hum) + 
  tm_dots(col="Humed1", palette="RdBu",
        auto.palette.mapping =FALSE,
             title="Soil Moisture at Rainy Season",
  size=0.1) +
# 
  tm_legend(legend.outside=TRUE)

####
tm_shape(W) + tm_fill(col="grey") +
  tm_shape(hum) + 
  tm_dots(col="Humed2", palette="RdBu",
        auto.palette.mapping =FALSE,
             title="Soil Moisture at Dry Season",
  size=0.1) +
# 
  tm_legend(legend.outside=TRUE)

Digital Elevation

Now, let’s visualize SRTM elevation in the study area:

### plot dem
dem <- raster("dem_30.tif")
dem
## class       : RasterLayer 
## dimensions  : 2545, 3055, 7774975  (nrow, ncol, ncell)
## resolution  : 30, 30  (x, y)
## extent      : 1028016, 1119666, 934053, 1010403  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## data source : C:\Users\IvĂ¡n Lizarazo\Documents\UN\datos\soilmoisture\piedemonte\dem_30.tif 
## names       : dem_30 
## values      : 106, 3978  (min, max)
mapTheme <- rasterTheme(region=rev(brewer.pal(11,"RdYlGn")))
plt <- levelplot(dem, margin=F, par.settings=mapTheme,
                 main="SRTM Digital Elevation Model (30m)")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

As DEM coordinates reference system is projected, we need to reproject soil moisture data into the same CRS system. Then, we compute slope and aspect using the raster library.

library(sp)
proy_dem <- proj4string(dem)
crs_dem <- CRS(proy_dem)
hum_rep <- spTransform(hum, crs_dem)
plot(dem, col=terrain.colors(10), axes = TRUE)
plot(hum_rep, pch=21, cex=0.5, add=TRUE)

Computing topographic variables

Let’s use the dynatopmodel library to compute upslope area and topographic wetness index values. Before such computation, a fix on a function is needed.

###install.packages("dynatopmodel")
library("dynatopmodel")


### the upslope function has been modified
##  based on Robert Hijmans advise

upslope <- function (dem, log = TRUE, atb = FALSE, deg = 0.12, fill.sinks = TRUE) 
{
  if (!all.equal(xres(dem), yres(dem))) {
    stop("Raster has differing x and y cell resolutions. Check that it is in a projected coordinate system (e.g. UTM) and use raster::projectRaster to reproject to one if not. Otherwise consider using raster::resample")
  }
  if (fill.sinks) {
    capture.output(dem <- invisible(raster::setValues(dem, 
                                                      topmodel::sinkfill(raster::as.matrix(dem), res = xres(dem), 
                                                                         degree = deg))))
  }
  topidx <- topmodel::topidx(raster::as.matrix(dem), res = xres(dem))
  a <- raster::setValues(dem, topidx$area)
  if (log) {
    a <- log(a)
  }
  if (atb) {
    atb <- raster::setValues(dem, topidx$atb)
    a <- addLayer(a, atb)
    names(a) <- c("a", "atb")
  }
  return(a)
}

create_layers <- function (dem, fill.sinks = TRUE, deg = 0.1) 
{
  layers <- stack(dem)
  message("Building upslope areas...")
  a.atb <- upslope(dem, atb = TRUE, fill.sinks = fill.sinks, 
                        deg = deg)
  layers <- addLayer(layers, a.atb)
  names(layers) <- c("dem", "a", "atb")
  return(layers)
}

# deg is the threshold intercell slope to determine sinks (degrees).
# deg default's value is 0.10

topo <- create_layers(dem, fill.sinks = TRUE, deg = 0.10)
## Building upslope areas...
# topo is a multi-band raster (stack) comprising, in order, 
#the filled elevations, upslope area and topographic wetness
#index values.

### calculo de variables topograficas
slope <- terrain(dem, "slope", "tangent", 8)

pslope <- slope * 100

mapTheme <- rasterTheme(region=rev(brewer.pal(11,"RdYlBu")))
plt <- levelplot(slope, margin=F, par.settings=mapTheme,
                 main="Slope")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

aspect <- terrain(dem, "aspect", "tangent", 8)
aspect
## class       : RasterLayer 
## dimensions  : 2545, 3055, 7774975  (nrow, ncol, ncell)
## resolution  : 30, 30  (x, y)
## extent      : 1028016, 1119666, 934053, 1010403  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## data source : in memory
## names       : aspect 
## values      : 1.192622e-18, 6.283185  (min, max)
mapTheme <- rasterTheme(region=rev(brewer.pal(11,"Spectral")))
plt <- levelplot(aspect, margin=F, par.settings=mapTheme,
                 main="Aspect")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

topo$slope <- pslope

topo$aspect <- aspect

Landsat-based variables

Several indices can be computed from Landsat 5 TM images, including, the Enhanced Vegetation Index (EVI), the Normalized Difference Water Index (NDWI), as well as the Normalized Burn Ratio Thermal (NBRT).

The EVI is an ‘optimized’ vegetation index designed to enhance the vegetation signal with improved sensitivity in high biomass regions and improved vegetation monitoring through a de-coupling of the canopy background signal and a reduction in atmosphere influences. EVI Index is computed following this equation:

\(\begin{align*}\textrm{EVI}=G * \frac{\rho_{\textrm{nir}}-\rho_{\textrm{red}}}{\rho_{\textrm{nir}}+ C1 * \rho_{\textrm{red} } - C2 * \rho_{\textrm{blue} } + L}\end{align*} \\\)

where NIR/red/blue are atmospherically-corrected surface reflectances, L is the canopy background adjustment that addresses non-linear, differential NIR and red radiant transfer through a canopy, and C1, C2 are the coefficients of the aerosol resistance term, which uses the blue band to correct for aerosol influences in the red band. The coefficients adopted in this stuy were L=1, C1 = 6, C2 = 7.5, and G (gain factor) = 2.5.

The NDWI is an indicator sensitive to the change in the water content of vegetation and bare soil (Gao, 1996). NDWI is computed using the near infrared (NIR) and the short wave infrared (SWIR) reflectance’s. The NDWI Index is computed by:

\(\begin{align*}\textrm{NDWI}=\frac{\rho_{\textrm{nir}}-\rho_{\textrm{swir}}}{\rho_{\textrm{nir}}+\rho_{\textrm{swir}}}\end{align*} \\\)

THE NBRT index uses a near-infrared band, a shortwave-infrared band, and a thermal band. It provides separability between land surfaces due to temperature.The NBRT Index is computed by:

\(\begin{align*}\textrm{NBRT}=\frac{\rho_{\textrm{nir}}-\rho_{\textrm{swir}}\frac{\textrm{Thermal}}{1000}}{\rho_{\textrm{nir}}+\rho_{\textrm{swir}}\frac{\textrm{Thermal}}{1000}}\end{align*} \\\)

### NDWI and EVI were obtained from Landsat images
### The Normalized Difference Water Index (NDWI) is sensitive to changes 
### in liquid water content of vegetation canopies. 
### It is derived from the Near-IR band and the second IR band, 1240 nm, 
### when available and the nearest available IR band otherwise. 
### It ranges in value from -1.0 to 1.0. 


###

ndwi1 = raster("ndwi1_30r.tif")

ndwi1
## class       : RasterLayer 
## dimensions  : 2545, 3055, 7774975  (nrow, ncol, ncell)
## resolution  : 30, 30  (x, y)
## extent      : 1028016, 1119666, 934053, 1010403  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## data source : C:\Users\IvĂ¡n Lizarazo\Documents\UN\datos\soilmoisture\piedemonte\ndwi1_30r.tif 
## names       : ndwi1_30r 
## values      : -1, 1  (min, max)
ndwi2 = raster("ndwi2_30r.tif")

ndwi2
## class       : RasterLayer 
## dimensions  : 2545, 3055, 7774975  (nrow, ncol, ncell)
## resolution  : 30, 30  (x, y)
## extent      : 1028016, 1119666, 934053, 1010403  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## data source : C:\Users\IvĂ¡n Lizarazo\Documents\UN\datos\soilmoisture\piedemonte\ndwi2_30r.tif 
## names       : ndwi2_30r 
## values      : -1, 1  (min, max)
evi1 = raster("evi1_30r.tif")

evi1
## class       : RasterLayer 
## dimensions  : 2545, 3055, 7774975  (nrow, ncol, ncell)
## resolution  : 30, 30  (x, y)
## extent      : 1028016, 1119666, 934053, 1010403  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## data source : C:\Users\IvĂ¡n Lizarazo\Documents\UN\datos\soilmoisture\piedemonte\evi1_30r.tif 
## names       : evi1_30r 
## values      : -1, 1  (min, max)
evi2 = raster("evi2_30r.tif")

evi2
## class       : RasterLayer 
## dimensions  : 2545, 3055, 7774975  (nrow, ncol, ncell)
## resolution  : 30, 30  (x, y)
## extent      : 1028016, 1119666, 934053, 1010403  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## data source : C:\Users\IvĂ¡n Lizarazo\Documents\UN\datos\soilmoisture\piedemonte\evi2_30r.tif 
## names       : evi2_30r 
## values      : -1, 1  (min, max)
nbrt1 = raster("nbrt1_30r.tif")

nbrt1
## class       : RasterLayer 
## dimensions  : 2545, 3055, 7774975  (nrow, ncol, ncell)
## resolution  : 30, 30  (x, y)
## extent      : 1028016, 1119666, 934053, 1010403  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## data source : C:\Users\IvĂ¡n Lizarazo\Documents\UN\datos\soilmoisture\piedemonte\nbrt1_30r.tif 
## names       : nbrt1_30r 
## values      : -1, 1  (min, max)
nbrt2 = raster("nbrt2_30r.tif")

nbrt2
## class       : RasterLayer 
## dimensions  : 2545, 3055, 7774975  (nrow, ncol, ncell)
## resolution  : 30, 30  (x, y)
## extent      : 1028016, 1119666, 934053, 1010403  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## data source : C:\Users\IvĂ¡n Lizarazo\Documents\UN\datos\soilmoisture\piedemonte\nbrt2_30r.tif 
## names       : nbrt2_30r 
## values      : -1, 1  (min, max)

CHIRPS precipitation data

Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) is a 30+ year quasi-global rainfall dataset. Its spatial resolution is aprox. 5 km resolution. CHIRPS data were downscaled to 30 m using the Empirical Bayes Krigging (EBK) technique.

According to ESRI, EBK is a geostatistical interpolation method that automates the most difficult aspects of building a valid kriging model. While common kriging methods require users to manually adjust parameters to receive accurate results, EBK automatically calculates these parameters through a process of subsetting and simulations.

EBK also differs from other kriging methods by accounting for the error introduced by estimating the underlying semivariogram. Other kriging methods calculate the semivariogram from known data locations and use this single semivariogram to make predictions at unknown locations; this process implicitly assumes that the estimated semivariogram is the true semivariogram for the interpolation region. By not taking the uncertainty of semivariogram estimation into account, other kriging methods underestimate the standard errors of prediction.

rchirps1 <- raster("chirps1_5k.tif")

chirps1 <- raster("ebk_ch1.tif")

###ch1 <- resample(chirps1, nbrt1)

rchirps2 <- raster("chirps2_5k.tif")

chirps2 = raster("ebk_ch2.tif")

###ch2 <- resample(chirps2, nbrt1)

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlBu"))
plt <- levelplot(rchirps1, margin=F, par.settings=mapTheme, at = do.breaks(c(0,800),100),
                 main="Raw CHIRPS Precipitation Dry Season (mm)")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlBu"))
plt <- levelplot(chirps1, margin=F, par.settings=mapTheme, at = do.breaks(c(0,800),100),
                 main="EBK CHIRPS Precipitation Dry Season (mm)")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlBu"))
plt <- levelplot(rchirps2, margin=F, par.settings=mapTheme, at = do.breaks(c(0,800),100),
                 main="Raw CHIRPS Precipitation Wet Season (mm)")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlBu"))
plt <- levelplot(chirps2, margin=F, par.settings=mapTheme, at = do.breaks(c(0,800),100),
                 main="EBK CHIRPS Precipitation Wet Season (mm)")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

topo$ndwi1 <- ndwi1

topo$ndwi2 <- ndwi2

topo$evi1 <- evi1

topo$evi2 <- evi2

topo$nbrt1 <- nbrt1

topo$nbrt2 <- nbrt2

topo$chirps1 <- chirps1

topo$chirps2 <- chirps2

Visualization time

Let’s plot several variables included in the raster stack topo.

# devtools::install_github('oscarperpinan/rasterVis') 
#Palettes provided by this package are available through 
#the viridisTheme, infernoTheme, and plasmaTheme functions. 
#Besides, YlOrRdTheme, BuRdTheme, RdBuTheme, GrTheme, and BTCTheme 
#are variations of rasterTheme 
#using palettes of the RColorBrewer and hexbin packages. 

library(rasterVis)
proj <- CRS('+proj=longlat +ellps=WGS84')
levelplot(topo$dem, par.settings = BuRdTheme, xlab.top='DEM')

levelplot(topo$chirps2, par.settings = RdBuTheme, xlab.top='CHIRPS 2 for Rainy Season')

levelplot(topo$a, par.settings = BTCTheme, xlab.top='Upslope Area')

levelplot(topo$atb, par.settings = viridisTheme, xlab.top='TWI')

W <- shapefile("WGuatiquia_Negro.shp")

hist(ndwi2)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"Spectral"))
plt <- levelplot(ndwi2, margin=F, par.settings=mapTheme, at = do.breaks(c(0,1),10),
                 main="NDWI for Rainy Season")
plt + layer(sp.lines(W, col="red", lwd=0.5))

hist(ndwi1)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"Spectral"))
plt <- levelplot(ndwi1, margin=F, par.settings=mapTheme, at = do.breaks(c(0,1),10),
                 main="NDWI for Dry Season")
plt + layer(sp.lines(W, col="red", lwd=1))

hist(evi2)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(8,"RdYlGn"))
plt <- levelplot(evi2, margin=F, par.settings=mapTheme, at = do.breaks(c(0,1),10),
                 main="EVI for Wet Season")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

hist(evi1)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(8,"RdYlGn"))
plt <- levelplot(evi1, margin=F, par.settings=mapTheme, at = do.breaks(c(0,1),10),
                 main="EVI for Dry Season")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

hist(nbrt1)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

#
#
# 
# 
hist(nbrt2)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

#  raw NBRT seems to be useless with no transformation
power2 <- function(x) {x**2}
pnbrt2 <- calc(nbrt2, power2)
hist(pnbrt2, main="Square NBRT for Rainy Season")

#
mapTheme <- rasterTheme(region=brewer.pal(11,"YlOrRd"))
## Warning in brewer.pal(11, "YlOrRd"): n too large, allowed maximum for palette YlOrRd is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(pnbrt2, margin=F, par.settings=mapTheme,
                 at = do.breaks(c(0.8,1),10),
                 main="NBRT  for Wet Season")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

pnbrt1 <- calc(nbrt1, power2)
#
hist(pnbrt1, main = "Square NBRT for Dry Season")

mapTheme <- rasterTheme(region=brewer.pal(11,"YlOrRd"))
## Warning in brewer.pal(11, "YlOrRd"): n too large, allowed maximum for palette YlOrRd is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(pnbrt1, margin=F, par.settings=mapTheme,
                 at = do.breaks(c(0.8,1),10),
                 main="NBRT  for Dry Season")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

Some preprocessing work

topo$nbrt1 <- pnbrt1

topo$nbrt2 <- pnbrt2

hum_rep$x <- hum_rep$ESTE

hum_rep$y <- hum_rep$NORTE

hum_pts <- hum_rep

hum_pts$ALTURA <- NULL

df.hum_pts <- data.frame(hum_pts)

coordinates(df.hum_pts) <- ~ESTE+NORTE

#df.hum_pts@coords

proj4string(df.hum_pts) <- proj4string(hum)

df.hum_rep <- spTransform(df.hum_pts, crs_dem)

### extraction of explanatory variables data
### at points where soil moisture was measured in the field

data <- data.frame(coordinates(df.hum_rep),
                   hum_rep, 
                   extract(topo, df.hum_rep))

coordinates(data) <- ~ESTE+NORTE

proj4string(data) <- proj4string(dem)



####################

#Sample Indexes
indexes = sample(1:nrow(data), size=0.4*nrow(data))

# Split data
test = data[indexes,]
dim(test)  # 80 35
## [1] 80 39
train = data[-indexes,]
dim(train) # 120 35
## [1] 120  39
drops <- c("ID","No","UCS","ESTE.1","No1", "NORTE.1","ALTURA", "Dapar", "Dreal",    "K", "Arena", "Arcilla", "SAT","CC", "pmp", "Agua_aprov", "F19",  "F20")

ntrain <- train[,!(names(train) %in% drops)] #remove columns "AREA" and "PERIMETER"

ntrain[1,8:15]
## class       : SpatialPointsDataFrame 
## features    : 1 
## extent      : 1091983, 1091983, 947800.6, 947800.6  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## variables   : 8
## names       : optional, dem,                a,              atb,            slope,          aspect,         ndwi1,             ndwi2 
## min values  :        1, 227, 7.64396283310526, 10.4950411508114, 1.86338998124982, 2.0344439357957, 0.19260199368, 0.116410337388515 
## max values  :        1, 227, 7.64396283310526, 10.4950411508114, 1.86338998124982, 2.0344439357957, 0.19260199368, 0.116410337388515

Random Forest prediction of Surface Soil Moisture

set.seed(42)
library(randomForest)
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
xx <- data.frame(ntrain$Humed1, ntrain$dem, ntrain$a, ntrain$atb, ntrain$slope, ntrain$aspect, ntrain$ndwi2, ntrain$evi2, ntrain$nbrt2) 

xx <- xx[complete.cases(xx),]

y <- xx[1]

x <- xx[-1]

# another option
# df[,c("A","B","E")]
#

## next code can be used to tune the random forest model
##tt <- tuneRF(x=x, y=y, type="regression", mtryStart=3, ntreeTry=500, stepFactor=2, improve=0.05,
##       trace=TRUE, plot=TRUE, doBest=FALSE, na.action= na.omit)

# Algorithm Tune (tuneRF)
#set.seed(1234)
#bestmtry <- tuneRF(x, y, stepFactor=1.5, improve=1e-5, ntree=500)
#print(bestmtry)


fit1 <- randomForest(Humed1 ~ dem + a + atb + slope + aspect + ndwi2 + evi2 + nbrt2 + chirps2,   data=train, 
                     ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)

print(fit1) # view results 
## 
## Call:
##  randomForest(formula = Humed1 ~ dem + a + atb + slope + aspect +      ndwi2 + evi2 + nbrt2 + chirps2, data = train, ntrees = 5000,      mtry = 5, importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 5
## 
##           Mean of squared residuals: 92.85344
##                     % Var explained: -2.5
importance(fit1) # importance of each predictor
##            %IncMSE IncNodePurity
## dem      7.5618325     1400.9901
## a        1.9128718      560.1969
## atb     -1.1764081      614.3434
## slope    4.0829722     1297.1526
## aspect   0.1134062      676.6828
## ndwi2   -0.1147599      879.3562
## evi2     1.0313908     1434.3180
## nbrt2   -1.2837103      713.5284
## chirps2  7.3824844     2156.9906
par(mar=c(1,1,1,1))

varImpPlot(fit1,type=2)

rf1 <- predict(fit1, test)

summary(rf1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   29.85   34.62   37.29   37.53   40.32   50.72       1
#
#

compare1 <- cbind (actual=test$Humed1, rf1) 

#
mean (apply(compare1, 1, min, na.rm=T)/apply(compare1, 1, max, na.rm=T)) # calculate accuracy
## [1] 0.8460238
# 
# Random Forest prediction of Humedad 2 (dry season)
#library(randomForest)
fit2 <- randomForest(Humed2 ~ dem + a + atb + slope + aspect + ndwi1  + evi1 + nbrt1 + chirps1,   data=train, 
                     ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)
print(fit2) # view results 
## 
## Call:
##  randomForest(formula = Humed2 ~ dem + a + atb + slope + aspect +      ndwi1 + evi1 + nbrt1 + chirps1, data = train, ntrees = 5000,      mtry = 5, importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 5
## 
##           Mean of squared residuals: 90.64886
##                     % Var explained: -8.41
importance(fit2) # importance of each predictor
##            %IncMSE IncNodePurity
## dem      2.9400222     1105.7472
## a        3.3484779      936.0559
## atb      2.9293093     1140.8841
## slope   -1.2557620     1379.9738
## aspect  -1.3126297      518.4703
## ndwi1    0.5277643      958.2932
## evi1    -0.9904028      899.4558
## nbrt1    0.4519406      738.4453
## chirps1  1.6435506     1274.2156
varImpPlot(fit2,type=2)

rf2 <- predict(fit2, test)

summary(rf2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   10.13   14.60   16.90   17.31   18.81   29.79       1
#Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#

compare2 <- cbind (actual=test$Humed2, rf2) 

#> 
mean (apply(compare2, 1, min, na.rm=T)/apply(compare2, 1, max, na.rm=T)) # calculate accuracy
## [1] 0.6691419
# 
############## RF implemented in party library

library(party)
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
cf <- cforest(Humed1 ~ dem + a + atb + slope + aspect + ndwi2 + evi2 + nbrt2 + chirps2
              , data = train) 
pt <- party:::prettytree(cf@ensemble[[1]], names(cf@data@get("input"))) 
pt 
## 1) atb <= 9.373291; criterion = 0.718, statistic = 1.157
##   2) slope <= 1.767767; criterion = 0.968, statistic = 4.585
##     3)*  weights = 0 
##   2) slope > 1.767767
##     4) chirps2 <= 544.2397; criterion = 0.892, statistic = 2.589
##       5)*  weights = 0 
##     4) chirps2 > 544.2397
##       6) slope <= 8.249579; criterion = 0.994, statistic = 7.688
##         7)*  weights = 0 
##       6) slope > 8.249579
##         8)*  weights = 0 
## 1) atb > 9.373291
##   9) a <= 9.540003; criterion = 0.818, statistic = 1.785
##     10)*  weights = 0 
##   9) a > 9.540003
##     11)*  weights = 0
nt <- new("BinaryTree") 
nt@tree <- pt 
nt@data <- cf@data 
nt@responses <- cf@responses 
nt 
## 
##   Conditional inference tree with 0 terminal nodes
## 
## Response:  Humed1 
## Inputs:  dem, a, atb, slope, aspect, ndwi2, evi2, nbrt2, chirps2 
## Number of observations:  120 
## 
## 1) atb <= 9.373291; criterion = 0.718, statistic = 1.157
##   2) slope <= 1.767767; criterion = 0.968, statistic = 4.585
##     3)*  weights = 0 
##   2) slope > 1.767767
##     4) chirps2 <= 544.2397; criterion = 0.892, statistic = 2.589
##       5)*  weights = 0 
##     4) chirps2 > 544.2397
##       6) slope <= 8.249579; criterion = 0.994, statistic = 7.688
##         7)*  weights = 0 
##       6) slope > 8.249579
##         8)*  weights = 0 
## 1) atb > 9.373291
##   9) a <= 9.540003; criterion = 0.818, statistic = 1.785
##     10)*  weights = 0 
##   9) a > 9.540003
##     11)*  weights = 0
plot(nt, type="simple", main = "A Tree from the Rainy Random Forest") 

################another option to visualize

## http://stats.stackexchange.com/questions/41443/how-to-actually-plot-a-sample-tree-from-randomforestgettree

#options(repos='http://cran.rstudio.org')
#have.packages <- installed.packages()
#cran.packages <- c('devtools','plotrix','randomForest','tree')
#to.install <- setdiff(cran.packages, have.packages[,1])
#if(length(to.install)>0) install.packages(to.install)

library(devtools)
#if(!('reprtree' %in% installed.packages())){
#  install_github('araastat/reprtree')
#}
#for(p in c(cran.packages, 'reprtree')) eval(substitute(library(pkg), list(pkg=p)))
#Then go ahead and make your model and tree:
  

library(reprtree)
## Loading required package: tree
## Loading required package: plotrix
rainymodel <- randomForest(Humed1 ~ dem + a + atb + slope + aspect + ndwi2 + evi2 + nbrt2 + chirps2,
                      data=train, na.action=na.omit, 
                      importance=TRUE, ntree=5000, mtry = 2, do.trace=100)
##      |      Out-of-bag   |
## Tree |      MSE  %Var(y) |
##  100 |    91.14   100.60 |
##  200 |    92.35   101.94 |
##  300 |    92.59   102.21 |
##  400 |    93.27   102.96 |
##  500 |    92.44   102.04 |
##  600 |    92.55   102.16 |
##  700 |    91.77   101.30 |
##  800 |    92.12   101.69 |
##  900 |    91.79   101.33 |
## 1000 |    91.63   101.15 |
## 1100 |    91.59   101.10 |
## 1200 |    91.46   100.96 |
## 1300 |    91.24   100.72 |
## 1400 |    91.45   100.95 |
## 1500 |    91.54   101.04 |
## 1600 |    91.46   100.96 |
## 1700 |    91.33   100.82 |
## 1800 |    91.31   100.79 |
## 1900 |    91.24   100.72 |
## 2000 |       91   100.45 |
## 2100 |    90.97   100.42 |
## 2200 |     90.8   100.23 |
## 2300 |     90.6   100.01 |
## 2400 |    90.66   100.08 |
## 2500 |     90.7   100.12 |
## 2600 |    90.64   100.06 |
## 2700 |    90.53    99.93 |
## 2800 |    90.47    99.87 |
## 2900 |    90.52    99.92 |
## 3000 |    90.45    99.85 |
## 3100 |    90.45    99.85 |
## 3200 |     90.4    99.79 |
## 3300 |     90.4    99.78 |
## 3400 |    90.48    99.87 |
## 3500 |    90.51    99.91 |
## 3600 |    90.47    99.87 |
## 3700 |    90.45    99.84 |
## 3800 |    90.37    99.76 |
## 3900 |    90.41    99.79 |
## 4000 |    90.41    99.80 |
## 4100 |    90.38    99.77 |
## 4200 |    90.39    99.77 |
## 4300 |    90.37    99.76 |
## 4400 |    90.34    99.72 |
## 4500 |    90.29    99.67 |
## 4600 |    90.22    99.59 |
## 4700 |    90.19    99.55 |
## 4800 |    90.13    99.49 |
## 4900 |    90.13    99.49 |
## 5000 |    90.13    99.49 |
# reprtree:::plot.getTree(rainymodel, k=2, depth=4)

repres1 <- reprtree:::ReprTree(rainymodel, train, metric='d2')
## [1] "Constructing distance matrix..."
## [1] "Finding representative trees..."
reprtree:::plot.reprtree(repres1, depth=4,  main= "A representative rainy tree")

drymodel <- randomForest(Humed2 ~ dem + a + atb + slope + aspect + ndwi1 + evi1 + nbrt1 + chirps1,
                      data=train, na.action=na.omit, 
                      importance=TRUE, ntree=5000, mtry = 2, do.trace=100)
##      |      Out-of-bag   |
## Tree |      MSE  %Var(y) |
##  100 |    86.26   103.16 |
##  200 |    86.49   103.45 |
##  300 |    86.62   103.59 |
##  400 |    86.73   103.72 |
##  500 |     86.4   103.33 |
##  600 |    86.15   103.04 |
##  700 |    85.69   102.48 |
##  800 |    85.57   102.34 |
##  900 |    85.28   101.99 |
## 1000 |    85.04   101.71 |
## 1100 |    85.49   102.25 |
## 1200 |    85.46   102.21 |
## 1300 |    85.58   102.36 |
## 1400 |    85.39   102.12 |
## 1500 |    85.48   102.24 |
## 1600 |    85.07   101.74 |
## 1700 |    84.98   101.63 |
## 1800 |    85.18   101.87 |
## 1900 |    85.16   101.85 |
## 2000 |    85.26   101.97 |
## 2100 |    85.23   101.94 |
## 2200 |    85.08   101.76 |
## 2300 |    85.28   101.99 |
## 2400 |    85.19   101.88 |
## 2500 |    85.21   101.91 |
## 2600 |    85.25   101.96 |
## 2700 |    85.31   102.02 |
## 2800 |    85.34   102.06 |
## 2900 |    85.26   101.97 |
## 3000 |    85.03   101.70 |
## 3100 |    85.05   101.72 |
## 3200 |    84.99   101.65 |
## 3300 |    85.01   101.66 |
## 3400 |       85   101.66 |
## 3500 |    84.99   101.64 |
## 3600 |    85.08   101.75 |
## 3700 |    85.06   101.73 |
## 3800 |    85.12   101.80 |
## 3900 |    85.05   101.72 |
## 4000 |    85.05   101.72 |
## 4100 |       85   101.66 |
## 4200 |       85   101.65 |
## 4300 |    85.02   101.69 |
## 4400 |    85.06   101.73 |
## 4500 |     85.1   101.78 |
## 4600 |    85.12   101.80 |
## 4700 |    85.13   101.82 |
## 4800 |     85.1   101.77 |
## 4900 |    85.18   101.87 |
## 5000 |    85.25   101.96 |
# reprtree:::plot.getTree(drymodel, k=2, depth=4)

repres2 <- reprtree:::ReprTree(drymodel, train, metric='d2')
## [1] "Constructing distance matrix..."
## [1] "Finding representative trees..."
reprtree:::plot.reprtree(repres2, depth=4, main= "A representative dry tree", digits=5)

#### random forest prediction in the whole study area

r_hum1 <- predict(topo, fit1, filename="rf_sm2011_04_07.tif", overwrite=T)

r_hum2 <- predict(topo, fit2, filename="rf_sm2009_11_17.tif", overwrite=T)

hist(r_hum1)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"YlGnBu"))
## Warning in brewer.pal(11, "YlGnBu"): n too large, allowed maximum for palette YlGnBu is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(r_hum1, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10), main="Soil Moisture for Rainy Season")
plt + layer(sp.lines(W, col="red", lwd=0.5))

hist(r_hum2)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"YlGnBu"))
## Warning in brewer.pal(11, "YlGnBu"): n too large, allowed maximum for palette YlGnBu is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(r_hum2, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10), main="Soil Moisture for Dry Season")
plt + layer(sp.lines(W, col="red", lwd=0.5))

What if we have only precipitation data?

This is the ideal scenario. However, considering that rainfall data are very sparse in the region, it is a highly unlikely scenario. Anyway …

set.seed(42)

fit3 <- randomForest(Humed1 ~ chirps2,   data=train, 
                     ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)
## Warning in randomForest.default(m, y, ...): invalid mtry: reset to within
## valid range
print(fit3) # view results 
## 
## Call:
##  randomForest(formula = Humed1 ~ chirps2, data = train, ntrees = 5000,      mtry = 5, importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 1
## 
##           Mean of squared residuals: 107.3428
##                     % Var explained: -18.49
rf3 <- predict(fit3, test)

summary(rf3)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   22.57   34.02   36.21   37.03   39.68   66.42       1
hist(rf3)

compare5 <- cbind (actual=test$Humed1, rf3) 


mean (apply(compare5, 1, min, na.rm=T)/apply(compare5, 1, max, na.rm=T)) # calculate accuracy
## [1] 0.8389367
#### estimation for the whole area

r_hum3 <- predict(topo, fit3, filename="rf_chrps2011_04_07.tif", overwrite=T)

hist(r_hum3)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"YlGnBu"))
## Warning in brewer.pal(11, "YlGnBu"): n too large, allowed maximum for palette YlGnBu is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(r_hum3, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10), 
                 main="Soil Moisture for Rainy Season only with CHIRPS")
plt + layer(sp.lines(W, col="red", lwd=0.5))

# Random Forest prediction of Humedad 2 (dry season)
#library(randomForest)
fit4 <- randomForest(Humed2 ~ chirps1,   data=train, 
                     ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)
## Warning in randomForest.default(m, y, ...): invalid mtry: reset to within
## valid range
print(fit4) # view results 
## 
## Call:
##  randomForest(formula = Humed2 ~ chirps1, data = train, ntrees = 5000,      mtry = 5, importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 1
## 
##           Mean of squared residuals: 117.4044
##                     % Var explained: -40.41
rf4 <- predict(fit4, test)

summary(rf4)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   8.969  13.500  17.240  17.410  20.700  38.820       1
compare6 <- cbind (actual=test$Humed2, rf4) 

mean (apply(compare6, 1, min, na.rm=T)/apply(compare6, 1, max, na.rm=T)) # calculate accuracy
## [1] 0.6763555
#### random forest prediction in the whole study area


r_hum4 <- predict(topo, fit4, filename="rf_chrps2009_11_17.tif", overwrite=T)

hist(r_hum4)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"YlGnBu"))
## Warning in brewer.pal(11, "YlGnBu"): n too large, allowed maximum for palette YlGnBu is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(r_hum4, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10), main="Soil Moisture for Dry Season only with CHIRPS")
plt + layer(sp.lines(W, col="red", lwd=0.5))

What if we doesn’t have precipitation data?

This is the most likely scenario.

fit5 <- randomForest(Humed1 ~ dem + a + atb + slope + aspect + ndwi2  + evi2 + nbrt2 
                       ,   data=train, 
                     ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)

print(fit5) # view results 
## 
## Call:
##  randomForest(formula = Humed1 ~ dem + a + atb + slope + aspect +      ndwi2 + evi2 + nbrt2, data = train, ntrees = 5000, mtry = 5,      importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 5
## 
##           Mean of squared residuals: 96.18758
##                     % Var explained: -6.18
importance(fit5) # importance of each predictor
##          %IncMSE IncNodePurity
## dem     4.362087     1409.0084
## a       2.622775      729.0493
## atb     1.349211      776.2987
## slope   3.618073     1586.5552
## aspect -2.118544      826.4140
## ndwi2   3.107427     1231.8175
## evi2    2.563807     1870.0942
## nbrt2   1.555589     1182.9750
par(mar=c(1,1,1,1))

varImpPlot(fit5,type=2)

rf5 <- predict(fit5, test)

summary(rf5)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   28.78   34.92   36.87   37.44   39.94   46.02       1
hist(rf5)

compare5a <- cbind (actual=test$Humed1, rf5) 

mean (apply(compare5a, 1, min, na.rm=T)/apply(compare5a, 1, max, na.rm=T)) # calculate accuracy
## [1] 0.8449089
#### random forest prediction in the whole study area

r_hum5 <- predict(topo, fit5, filename="rf_nochrps2011_04_07.tif", overwrite=T)

hist(r_hum5)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"YlGnBu"))
## Warning in brewer.pal(11, "YlGnBu"): n too large, allowed maximum for palette YlGnBu is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(r_hum5, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10), main="Soil Moisture for Rainy Season with no CHIRPS")

plt + layer(sp.lines(W, col="red", lwd=0.5))

# Random Forest prediction of Humedad 2 (dry season)
#library(randomForest)
fit6 <- randomForest(Humed2 ~ dem + a + atb + slope + aspect + ndwi1  + evi1 + nbrt1,   data=train, 
                     ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)
print(fit6) # view results 
## 
## Call:
##  randomForest(formula = Humed2 ~ dem + a + atb + slope + aspect +      ndwi1 + evi1 + nbrt1, data = train, ntrees = 5000, mtry = 5,      importance = TRUE, na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 5
## 
##           Mean of squared residuals: 90.01093
##                     % Var explained: -7.65
importance(fit6) # importance of each predictor
##           %IncMSE IncNodePurity
## dem     4.1051983     1485.1611
## a       5.1329670     1043.5463
## atb     6.4948263     1340.6864
## slope  -0.8781836     1596.6738
## aspect -0.3627364      576.0891
## ndwi1   1.6719758     1142.3154
## evi1   -2.1228218     1056.1900
## nbrt1   1.0683044      911.1272
varImpPlot(fit6,type=2)

rf6 <- predict(fit6, test)

summary(rf6)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   10.25   14.38   16.19   17.21   18.87   29.55       1
hist(rf6)

compare6a <- cbind (actual=test$Humed2, rf6) 

mean (apply(compare6a, 1, min, na.rm=T)/apply(compare6a, 1, max, na.rm=T)) # calculate accuracy
## [1] 0.6687158
#### random forest prediction in the whole study area


r_hum6 <- predict(topo, fit6, filename="rf_nochrps2009_11_17.tif", overwrite=T)

hist(r_hum6)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"YlGnBu"))
## Warning in brewer.pal(11, "YlGnBu"): n too large, allowed maximum for palette YlGnBu is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(r_hum6, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10), main="Soil Moisture for Dry Season with no CHIRPS")

plt + layer(sp.lines(W, col="red", lwd=0.5))

Let’s compare the different soil moisture estimates and see how correlated they are pixel by pixel. We will do it using a spearman correlation.

#### what is the correlation between total estimation vs partial estimation
####

rstack <- stack(r_hum1, r_hum2, r_hum3, r_hum4, r_hum5, r_hum6)

corr <-layerStats(rstack,'pearson', na.rm=T)

corr_matrix <- corr$'pearson correlation coefficient'

corr
## $`pearson correlation coefficient`
##                      rf_sm2011_04_07 rf_sm2009_11_17 rf_chrps2011_04_07
## rf_sm2011_04_07           1.00000000       0.1248800          0.5030803
## rf_sm2009_11_17           0.12487996       1.0000000         -0.4426376
## rf_chrps2011_04_07        0.50308028      -0.4426376          1.0000000
## rf_chrps2009_11_17        0.32634313       0.1471044          0.2334186
## rf_nochrps2011_04_07      0.85383692       0.2029546          0.2130577
## rf_nochrps2009_11_17      0.06665559       0.9694070         -0.5008639
##                      rf_chrps2009_11_17 rf_nochrps2011_04_07
## rf_sm2011_04_07              0.32634313            0.8538369
## rf_sm2009_11_17              0.14710438            0.2029546
## rf_chrps2011_04_07           0.23341864            0.2130577
## rf_chrps2009_11_17           1.00000000            0.1965351
## rf_nochrps2011_04_07         0.19653510            1.0000000
## rf_nochrps2009_11_17         0.04030183            0.1851501
##                      rf_nochrps2009_11_17
## rf_sm2011_04_07                0.06665559
## rf_sm2009_11_17                0.96940702
## rf_chrps2011_04_07            -0.50086393
## rf_chrps2009_11_17             0.04030183
## rf_nochrps2011_04_07           0.18515014
## rf_nochrps2009_11_17           1.00000000
## 
## $mean
##      rf_sm2011_04_07      rf_sm2009_11_17   rf_chrps2011_04_07 
##             35.87180             18.81710             32.84444 
##   rf_chrps2009_11_17 rf_nochrps2011_04_07 rf_nochrps2009_11_17 
##             15.05096             36.30708             19.58022

While precipitation explains most of surface soil variability in the study area for the rainy season, it only partially explains surface soil for the dry season. On a similar way, topographic variables and multispectral indices may predict with good accuracy surface soil moisture for the rainy season but with limited accuracy for the dry season.

Time for surface soil moisture prediction for 2015

Let’s predict soil moisture for a date within the temporal range the NASA Soil Moisture Active Pasive (SMAP) was operating at full capacity. The SMAP L3 Radar worked perfectly from 2015/04/13 to 2015/07/07. We select a Landsat 8 image taken in June 2015 for obtaining explanatories variables for our test. As this period correspond to a rainy season, we can apply the fit5 randomforest model to the new dataset.

#### Let's build the new dataset of predictor variables
#### corresponding to June 2015

### reading time

evi2 = raster("./test/evi0610.tif")

nbrt2 = raster("./test/nbrt0610.tif")

ndwi2 = raster("./test/ndwi0610.tif")

### assembling time

topo$ndwi2 <- ndwi2

topo$evi2 <- evi2

topo$nbrt2 <- nbrt2

#### prediction time


sm10Jun15 <- predict(topo, fit5, filename="rf_sm2015_06_10.tif", overwrite=T)


hist(sm10Jun15)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlBu"))
plt <- levelplot(sm10Jun15, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10), main="Soil Moisture 10 June 2015")
plt + layer(sp.lines(W, col="red", lwd=0.5))

Are there soil moisture datasets available for the study area?

Let’s review what datasets are available from the European Space Agency (ESA) for the period under study. We should remind that they are coarse-resolution datasets.

library(raster)

library(leaflet)

ro <- raster("C:/Users/IvĂ¡n Lizarazo/Documents/UN/datos/soilmoisture/ESACCI_SM/COMBINED/2015/ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-20150630000000-fv03.2.nc", varname = "sm")
## Loading required namespace: ncdf4
miext <- extent(-74, -72, 4, 6)
r <- crop(ro, miext)

r <- r * 100

pal <- colorNumeric(palette="YlGnBu", domain = c(0, 100),
  na.color = "transparent")

leaflet() %>% addTiles() %>%
  addRasterImage(r, colors = pal, opacity = 1.0) %>%
  addLegend(pal = pal, values = values(r),
    title = "ESA CCI Soil Moisture 2015-06-30")
r <- raster("C:/Users/IvĂ¡n Lizarazo/Documents/UN/datos/soilmoisture/ASCAR/2015/EUMETSAT-HSAF-ASCAT__5-day-mean__2015_180.nc4",
            varname = "sm")


pal <- colorNumeric(palette="YlGnBu", domain = c(0, 100),
  na.color = "transparent")

leaflet() %>% addTiles() %>%
  addRasterImage(r, colors = pal, opacity = 1.0) %>%
  addLegend(pal = pal, values = values(r),
    title = "EUMESAT HSAF Soil Moisture (%)- June 2015")
r <- sm10Jun15
ra <- aggregate(r, fact=4, fun=mean)
leaflet() %>% addTiles() %>%
  addRasterImage(ra, colors = pal, opacity = 1.0) %>%
  addLegend(pal = pal, values = values(r),
    title = "Estimated soil moisture (%) - June 2015")

While the soil moisture estimation for June 2015 seems plausible, there is a remaining task to evaluate its validity, that is, accuracy assessment. As there are no field measurements available for that, we will use the NASA 3-km SMAP soil moisture dataset as ground reference.

The Soil Moisture Active Passive Program (SMAP) is microwave remote sensing observatory that measures the amount of water in the top 5 cm of soil everywhere on Earth’s surface. It was launched in 2015 with promising calibration activities. The SMAP active radar instrument, which provided 5-km resolution soil moisture, failed after 3 months of operation. The SMAP passive radiometer is providing 60-km-resolution soil moisture.

Accuracy assessment of the 2015 estimated soil moisture can be seen at this link.