Landsat-based variables
Several indices can be computed from Landsat 5 TM images, including, the Enhanced Vegetation Index (EVI), the Normalized Difference Water Index (NDWI), as well as the Normalized Burn Ratio Thermal (NBRT).
The EVI is an ‘optimized’ vegetation index designed to enhance the vegetation signal with improved sensitivity in high biomass regions and improved vegetation monitoring through a de-coupling of the canopy background signal and a reduction in atmosphere influences. EVI Index is computed following this equation:
\(\begin{align*}\textrm{EVI}=G * \frac{\rho_{\textrm{nir}}-\rho_{\textrm{red}}}{\rho_{\textrm{nir}}+ C1 * \rho_{\textrm{red} } - C2 * \rho_{\textrm{blue} } + L}\end{align*} \\\)
where NIR/red/blue are atmospherically-corrected surface reflectances, L is the canopy background adjustment that addresses non-linear, differential NIR and red radiant transfer through a canopy, and C1, C2 are the coefficients of the aerosol resistance term, which uses the blue band to correct for aerosol influences in the red band. The coefficients adopted in this stuy were L=1, C1 = 6, C2 = 7.5, and G (gain factor) = 2.5.
The NDWI is an indicator sensitive to the change in the water content of vegetation and bare soil (Gao, 1996). NDWI is computed using the near infrared (NIR) and the short wave infrared (SWIR) reflectance’s. The NDWI Index is computed by:
\(\begin{align*}\textrm{NDWI}=\frac{\rho_{\textrm{nir}}-\rho_{\textrm{swir}}}{\rho_{\textrm{nir}}+\rho_{\textrm{swir}}}\end{align*} \\\)
THE NBRT index uses a near-infrared band, a shortwave-infrared band, and a thermal band. It provides separability between land surfaces due to temperature.The NBRT Index is computed by:
\(\begin{align*}\textrm{NBRT}=\frac{\rho_{\textrm{nir}}-\rho_{\textrm{swir}}\frac{\textrm{Thermal}}{1000}}{\rho_{\textrm{nir}}+\rho_{\textrm{swir}}\frac{\textrm{Thermal}}{1000}}\end{align*} \\\)
### NDWI and EVI were obtained from Landsat images
### The Normalized Difference Water Index (NDWI) is sensitive to changes
### in liquid water content of vegetation canopies.
### It is derived from the Near-IR band and the second IR band, 1240 nm,
### when available and the nearest available IR band otherwise.
### It ranges in value from -1.0 to 1.0.
###
ndwi1 = raster("ndwi1_30r.tif")
ndwi1
## class : RasterLayer
## dimensions : 2545, 3055, 7774975 (nrow, ncol, ncell)
## resolution : 30, 30 (x, y)
## extent : 1028016, 1119666, 934053, 1010403 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## data source : C:\Users\IvĂ¡n Lizarazo\Documents\UN\datos\soilmoisture\piedemonte\ndwi1_30r.tif
## names : ndwi1_30r
## values : -1, 1 (min, max)
ndwi2 = raster("ndwi2_30r.tif")
ndwi2
## class : RasterLayer
## dimensions : 2545, 3055, 7774975 (nrow, ncol, ncell)
## resolution : 30, 30 (x, y)
## extent : 1028016, 1119666, 934053, 1010403 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## data source : C:\Users\IvĂ¡n Lizarazo\Documents\UN\datos\soilmoisture\piedemonte\ndwi2_30r.tif
## names : ndwi2_30r
## values : -1, 1 (min, max)
evi1 = raster("evi1_30r.tif")
evi1
## class : RasterLayer
## dimensions : 2545, 3055, 7774975 (nrow, ncol, ncell)
## resolution : 30, 30 (x, y)
## extent : 1028016, 1119666, 934053, 1010403 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## data source : C:\Users\IvĂ¡n Lizarazo\Documents\UN\datos\soilmoisture\piedemonte\evi1_30r.tif
## names : evi1_30r
## values : -1, 1 (min, max)
evi2 = raster("evi2_30r.tif")
evi2
## class : RasterLayer
## dimensions : 2545, 3055, 7774975 (nrow, ncol, ncell)
## resolution : 30, 30 (x, y)
## extent : 1028016, 1119666, 934053, 1010403 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## data source : C:\Users\IvĂ¡n Lizarazo\Documents\UN\datos\soilmoisture\piedemonte\evi2_30r.tif
## names : evi2_30r
## values : -1, 1 (min, max)
nbrt1 = raster("nbrt1_30r.tif")
nbrt1
## class : RasterLayer
## dimensions : 2545, 3055, 7774975 (nrow, ncol, ncell)
## resolution : 30, 30 (x, y)
## extent : 1028016, 1119666, 934053, 1010403 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## data source : C:\Users\IvĂ¡n Lizarazo\Documents\UN\datos\soilmoisture\piedemonte\nbrt1_30r.tif
## names : nbrt1_30r
## values : -1, 1 (min, max)
nbrt2 = raster("nbrt2_30r.tif")
nbrt2
## class : RasterLayer
## dimensions : 2545, 3055, 7774975 (nrow, ncol, ncell)
## resolution : 30, 30 (x, y)
## extent : 1028016, 1119666, 934053, 1010403 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## data source : C:\Users\IvĂ¡n Lizarazo\Documents\UN\datos\soilmoisture\piedemonte\nbrt2_30r.tif
## names : nbrt2_30r
## values : -1, 1 (min, max)
CHIRPS precipitation data
Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) is a 30+ year quasi-global rainfall dataset. Its spatial resolution is aprox. 5 km resolution. CHIRPS data were downscaled to 30 m using the Empirical Bayes Krigging (EBK) technique.
According to ESRI, EBK is a geostatistical interpolation method that automates the most difficult aspects of building a valid kriging model. While common kriging methods require users to manually adjust parameters to receive accurate results, EBK automatically calculates these parameters through a process of subsetting and simulations.
EBK also differs from other kriging methods by accounting for the error introduced by estimating the underlying semivariogram. Other kriging methods calculate the semivariogram from known data locations and use this single semivariogram to make predictions at unknown locations; this process implicitly assumes that the estimated semivariogram is the true semivariogram for the interpolation region. By not taking the uncertainty of semivariogram estimation into account, other kriging methods underestimate the standard errors of prediction.
rchirps1 <- raster("chirps1_5k.tif")
chirps1 <- raster("ebk_ch1.tif")
###ch1 <- resample(chirps1, nbrt1)
rchirps2 <- raster("chirps2_5k.tif")
chirps2 = raster("ebk_ch2.tif")
###ch2 <- resample(chirps2, nbrt1)
mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlBu"))
plt <- levelplot(rchirps1, margin=F, par.settings=mapTheme, at = do.breaks(c(0,800),100),
main="Raw CHIRPS Precipitation Dry Season (mm)")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlBu"))
plt <- levelplot(chirps1, margin=F, par.settings=mapTheme, at = do.breaks(c(0,800),100),
main="EBK CHIRPS Precipitation Dry Season (mm)")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlBu"))
plt <- levelplot(rchirps2, margin=F, par.settings=mapTheme, at = do.breaks(c(0,800),100),
main="Raw CHIRPS Precipitation Wet Season (mm)")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlBu"))
plt <- levelplot(chirps2, margin=F, par.settings=mapTheme, at = do.breaks(c(0,800),100),
main="EBK CHIRPS Precipitation Wet Season (mm)")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

topo$ndwi1 <- ndwi1
topo$ndwi2 <- ndwi2
topo$evi1 <- evi1
topo$evi2 <- evi2
topo$nbrt1 <- nbrt1
topo$nbrt2 <- nbrt2
topo$chirps1 <- chirps1
topo$chirps2 <- chirps2
Visualization time
Let’s plot several variables included in the raster stack topo.
# devtools::install_github('oscarperpinan/rasterVis')
#Palettes provided by this package are available through
#the viridisTheme, infernoTheme, and plasmaTheme functions.
#Besides, YlOrRdTheme, BuRdTheme, RdBuTheme, GrTheme, and BTCTheme
#are variations of rasterTheme
#using palettes of the RColorBrewer and hexbin packages.
library(rasterVis)
proj <- CRS('+proj=longlat +ellps=WGS84')
levelplot(topo$dem, par.settings = BuRdTheme, xlab.top='DEM')

levelplot(topo$chirps2, par.settings = RdBuTheme, xlab.top='CHIRPS 2 for Rainy Season')

levelplot(topo$a, par.settings = BTCTheme, xlab.top='Upslope Area')

levelplot(topo$atb, par.settings = viridisTheme, xlab.top='TWI')

W <- shapefile("WGuatiquia_Negro.shp")
hist(ndwi2)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"Spectral"))
plt <- levelplot(ndwi2, margin=F, par.settings=mapTheme, at = do.breaks(c(0,1),10),
main="NDWI for Rainy Season")
plt + layer(sp.lines(W, col="red", lwd=0.5))

hist(ndwi1)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"Spectral"))
plt <- levelplot(ndwi1, margin=F, par.settings=mapTheme, at = do.breaks(c(0,1),10),
main="NDWI for Dry Season")
plt + layer(sp.lines(W, col="red", lwd=1))

hist(evi2)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(8,"RdYlGn"))
plt <- levelplot(evi2, margin=F, par.settings=mapTheme, at = do.breaks(c(0,1),10),
main="EVI for Wet Season")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

hist(evi1)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(8,"RdYlGn"))
plt <- levelplot(evi1, margin=F, par.settings=mapTheme, at = do.breaks(c(0,1),10),
main="EVI for Dry Season")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

hist(nbrt1)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

#
#
#
#
hist(nbrt2)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

# raw NBRT seems to be useless with no transformation
power2 <- function(x) {x**2}
pnbrt2 <- calc(nbrt2, power2)
hist(pnbrt2, main="Square NBRT for Rainy Season")

#
mapTheme <- rasterTheme(region=brewer.pal(11,"YlOrRd"))
## Warning in brewer.pal(11, "YlOrRd"): n too large, allowed maximum for palette YlOrRd is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(pnbrt2, margin=F, par.settings=mapTheme,
at = do.breaks(c(0.8,1),10),
main="NBRT for Wet Season")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

pnbrt1 <- calc(nbrt1, power2)
#
hist(pnbrt1, main = "Square NBRT for Dry Season")

mapTheme <- rasterTheme(region=brewer.pal(11,"YlOrRd"))
## Warning in brewer.pal(11, "YlOrRd"): n too large, allowed maximum for palette YlOrRd is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(pnbrt1, margin=F, par.settings=mapTheme,
at = do.breaks(c(0.8,1),10),
main="NBRT for Dry Season")
plt + layer(sp.lines(W, col="gray", lwd=0.5))

Some preprocessing work
topo$nbrt1 <- pnbrt1
topo$nbrt2 <- pnbrt2
hum_rep$x <- hum_rep$ESTE
hum_rep$y <- hum_rep$NORTE
hum_pts <- hum_rep
hum_pts$ALTURA <- NULL
df.hum_pts <- data.frame(hum_pts)
coordinates(df.hum_pts) <- ~ESTE+NORTE
#df.hum_pts@coords
proj4string(df.hum_pts) <- proj4string(hum)
df.hum_rep <- spTransform(df.hum_pts, crs_dem)
### extraction of explanatory variables data
### at points where soil moisture was measured in the field
data <- data.frame(coordinates(df.hum_rep),
hum_rep,
extract(topo, df.hum_rep))
coordinates(data) <- ~ESTE+NORTE
proj4string(data) <- proj4string(dem)
####################
#Sample Indexes
indexes = sample(1:nrow(data), size=0.4*nrow(data))
# Split data
test = data[indexes,]
dim(test) # 80 35
## [1] 80 39
train = data[-indexes,]
dim(train) # 120 35
## [1] 120 39
drops <- c("ID","No","UCS","ESTE.1","No1", "NORTE.1","ALTURA", "Dapar", "Dreal", "K", "Arena", "Arcilla", "SAT","CC", "pmp", "Agua_aprov", "F19", "F20")
ntrain <- train[,!(names(train) %in% drops)] #remove columns "AREA" and "PERIMETER"
ntrain[1,8:15]
## class : SpatialPointsDataFrame
## features : 1
## extent : 1091983, 1091983, 947800.6, 947800.6 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=4.596200416666666 +lon_0=-74.07750791666666 +k=1 +x_0=1000000 +y_0=1000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## variables : 8
## names : optional, dem, a, atb, slope, aspect, ndwi1, ndwi2
## min values : 1, 227, 7.64396283310526, 10.4950411508114, 1.86338998124982, 2.0344439357957, 0.19260199368, 0.116410337388515
## max values : 1, 227, 7.64396283310526, 10.4950411508114, 1.86338998124982, 2.0344439357957, 0.19260199368, 0.116410337388515
Random Forest prediction of Surface Soil Moisture
set.seed(42)
library(randomForest)
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
xx <- data.frame(ntrain$Humed1, ntrain$dem, ntrain$a, ntrain$atb, ntrain$slope, ntrain$aspect, ntrain$ndwi2, ntrain$evi2, ntrain$nbrt2)
xx <- xx[complete.cases(xx),]
y <- xx[1]
x <- xx[-1]
# another option
# df[,c("A","B","E")]
#
## next code can be used to tune the random forest model
##tt <- tuneRF(x=x, y=y, type="regression", mtryStart=3, ntreeTry=500, stepFactor=2, improve=0.05,
## trace=TRUE, plot=TRUE, doBest=FALSE, na.action= na.omit)
# Algorithm Tune (tuneRF)
#set.seed(1234)
#bestmtry <- tuneRF(x, y, stepFactor=1.5, improve=1e-5, ntree=500)
#print(bestmtry)
fit1 <- randomForest(Humed1 ~ dem + a + atb + slope + aspect + ndwi2 + evi2 + nbrt2 + chirps2, data=train,
ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)
print(fit1) # view results
##
## Call:
## randomForest(formula = Humed1 ~ dem + a + atb + slope + aspect + ndwi2 + evi2 + nbrt2 + chirps2, data = train, ntrees = 5000, mtry = 5, importance = TRUE, na.action = na.omit)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 5
##
## Mean of squared residuals: 92.85344
## % Var explained: -2.5
importance(fit1) # importance of each predictor
## %IncMSE IncNodePurity
## dem 7.5618325 1400.9901
## a 1.9128718 560.1969
## atb -1.1764081 614.3434
## slope 4.0829722 1297.1526
## aspect 0.1134062 676.6828
## ndwi2 -0.1147599 879.3562
## evi2 1.0313908 1434.3180
## nbrt2 -1.2837103 713.5284
## chirps2 7.3824844 2156.9906
par(mar=c(1,1,1,1))
varImpPlot(fit1,type=2)

rf1 <- predict(fit1, test)
summary(rf1)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 29.85 34.62 37.29 37.53 40.32 50.72 1
#
#
compare1 <- cbind (actual=test$Humed1, rf1)
#
mean (apply(compare1, 1, min, na.rm=T)/apply(compare1, 1, max, na.rm=T)) # calculate accuracy
## [1] 0.8460238
#
# Random Forest prediction of Humedad 2 (dry season)
#library(randomForest)
fit2 <- randomForest(Humed2 ~ dem + a + atb + slope + aspect + ndwi1 + evi1 + nbrt1 + chirps1, data=train,
ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)
print(fit2) # view results
##
## Call:
## randomForest(formula = Humed2 ~ dem + a + atb + slope + aspect + ndwi1 + evi1 + nbrt1 + chirps1, data = train, ntrees = 5000, mtry = 5, importance = TRUE, na.action = na.omit)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 5
##
## Mean of squared residuals: 90.64886
## % Var explained: -8.41
importance(fit2) # importance of each predictor
## %IncMSE IncNodePurity
## dem 2.9400222 1105.7472
## a 3.3484779 936.0559
## atb 2.9293093 1140.8841
## slope -1.2557620 1379.9738
## aspect -1.3126297 518.4703
## ndwi1 0.5277643 958.2932
## evi1 -0.9904028 899.4558
## nbrt1 0.4519406 738.4453
## chirps1 1.6435506 1274.2156
varImpPlot(fit2,type=2)

rf2 <- predict(fit2, test)
summary(rf2)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 10.13 14.60 16.90 17.31 18.81 29.79 1
#Min. 1st Qu. Median Mean 3rd Qu. Max.
#
compare2 <- cbind (actual=test$Humed2, rf2)
#>
mean (apply(compare2, 1, min, na.rm=T)/apply(compare2, 1, max, na.rm=T)) # calculate accuracy
## [1] 0.6691419
#
############## RF implemented in party library
library(party)
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
cf <- cforest(Humed1 ~ dem + a + atb + slope + aspect + ndwi2 + evi2 + nbrt2 + chirps2
, data = train)
pt <- party:::prettytree(cf@ensemble[[1]], names(cf@data@get("input")))
pt
## 1) atb <= 9.373291; criterion = 0.718, statistic = 1.157
## 2) slope <= 1.767767; criterion = 0.968, statistic = 4.585
## 3)* weights = 0
## 2) slope > 1.767767
## 4) chirps2 <= 544.2397; criterion = 0.892, statistic = 2.589
## 5)* weights = 0
## 4) chirps2 > 544.2397
## 6) slope <= 8.249579; criterion = 0.994, statistic = 7.688
## 7)* weights = 0
## 6) slope > 8.249579
## 8)* weights = 0
## 1) atb > 9.373291
## 9) a <= 9.540003; criterion = 0.818, statistic = 1.785
## 10)* weights = 0
## 9) a > 9.540003
## 11)* weights = 0
nt <- new("BinaryTree")
nt@tree <- pt
nt@data <- cf@data
nt@responses <- cf@responses
nt
##
## Conditional inference tree with 0 terminal nodes
##
## Response: Humed1
## Inputs: dem, a, atb, slope, aspect, ndwi2, evi2, nbrt2, chirps2
## Number of observations: 120
##
## 1) atb <= 9.373291; criterion = 0.718, statistic = 1.157
## 2) slope <= 1.767767; criterion = 0.968, statistic = 4.585
## 3)* weights = 0
## 2) slope > 1.767767
## 4) chirps2 <= 544.2397; criterion = 0.892, statistic = 2.589
## 5)* weights = 0
## 4) chirps2 > 544.2397
## 6) slope <= 8.249579; criterion = 0.994, statistic = 7.688
## 7)* weights = 0
## 6) slope > 8.249579
## 8)* weights = 0
## 1) atb > 9.373291
## 9) a <= 9.540003; criterion = 0.818, statistic = 1.785
## 10)* weights = 0
## 9) a > 9.540003
## 11)* weights = 0
plot(nt, type="simple", main = "A Tree from the Rainy Random Forest")

################another option to visualize
## http://stats.stackexchange.com/questions/41443/how-to-actually-plot-a-sample-tree-from-randomforestgettree
#options(repos='http://cran.rstudio.org')
#have.packages <- installed.packages()
#cran.packages <- c('devtools','plotrix','randomForest','tree')
#to.install <- setdiff(cran.packages, have.packages[,1])
#if(length(to.install)>0) install.packages(to.install)
library(devtools)
#if(!('reprtree' %in% installed.packages())){
# install_github('araastat/reprtree')
#}
#for(p in c(cran.packages, 'reprtree')) eval(substitute(library(pkg), list(pkg=p)))
#Then go ahead and make your model and tree:
library(reprtree)
## Loading required package: tree
## Loading required package: plotrix
rainymodel <- randomForest(Humed1 ~ dem + a + atb + slope + aspect + ndwi2 + evi2 + nbrt2 + chirps2,
data=train, na.action=na.omit,
importance=TRUE, ntree=5000, mtry = 2, do.trace=100)
## | Out-of-bag |
## Tree | MSE %Var(y) |
## 100 | 91.14 100.60 |
## 200 | 92.35 101.94 |
## 300 | 92.59 102.21 |
## 400 | 93.27 102.96 |
## 500 | 92.44 102.04 |
## 600 | 92.55 102.16 |
## 700 | 91.77 101.30 |
## 800 | 92.12 101.69 |
## 900 | 91.79 101.33 |
## 1000 | 91.63 101.15 |
## 1100 | 91.59 101.10 |
## 1200 | 91.46 100.96 |
## 1300 | 91.24 100.72 |
## 1400 | 91.45 100.95 |
## 1500 | 91.54 101.04 |
## 1600 | 91.46 100.96 |
## 1700 | 91.33 100.82 |
## 1800 | 91.31 100.79 |
## 1900 | 91.24 100.72 |
## 2000 | 91 100.45 |
## 2100 | 90.97 100.42 |
## 2200 | 90.8 100.23 |
## 2300 | 90.6 100.01 |
## 2400 | 90.66 100.08 |
## 2500 | 90.7 100.12 |
## 2600 | 90.64 100.06 |
## 2700 | 90.53 99.93 |
## 2800 | 90.47 99.87 |
## 2900 | 90.52 99.92 |
## 3000 | 90.45 99.85 |
## 3100 | 90.45 99.85 |
## 3200 | 90.4 99.79 |
## 3300 | 90.4 99.78 |
## 3400 | 90.48 99.87 |
## 3500 | 90.51 99.91 |
## 3600 | 90.47 99.87 |
## 3700 | 90.45 99.84 |
## 3800 | 90.37 99.76 |
## 3900 | 90.41 99.79 |
## 4000 | 90.41 99.80 |
## 4100 | 90.38 99.77 |
## 4200 | 90.39 99.77 |
## 4300 | 90.37 99.76 |
## 4400 | 90.34 99.72 |
## 4500 | 90.29 99.67 |
## 4600 | 90.22 99.59 |
## 4700 | 90.19 99.55 |
## 4800 | 90.13 99.49 |
## 4900 | 90.13 99.49 |
## 5000 | 90.13 99.49 |
# reprtree:::plot.getTree(rainymodel, k=2, depth=4)
repres1 <- reprtree:::ReprTree(rainymodel, train, metric='d2')
## [1] "Constructing distance matrix..."
## [1] "Finding representative trees..."
reprtree:::plot.reprtree(repres1, depth=4, main= "A representative rainy tree")

drymodel <- randomForest(Humed2 ~ dem + a + atb + slope + aspect + ndwi1 + evi1 + nbrt1 + chirps1,
data=train, na.action=na.omit,
importance=TRUE, ntree=5000, mtry = 2, do.trace=100)
## | Out-of-bag |
## Tree | MSE %Var(y) |
## 100 | 86.26 103.16 |
## 200 | 86.49 103.45 |
## 300 | 86.62 103.59 |
## 400 | 86.73 103.72 |
## 500 | 86.4 103.33 |
## 600 | 86.15 103.04 |
## 700 | 85.69 102.48 |
## 800 | 85.57 102.34 |
## 900 | 85.28 101.99 |
## 1000 | 85.04 101.71 |
## 1100 | 85.49 102.25 |
## 1200 | 85.46 102.21 |
## 1300 | 85.58 102.36 |
## 1400 | 85.39 102.12 |
## 1500 | 85.48 102.24 |
## 1600 | 85.07 101.74 |
## 1700 | 84.98 101.63 |
## 1800 | 85.18 101.87 |
## 1900 | 85.16 101.85 |
## 2000 | 85.26 101.97 |
## 2100 | 85.23 101.94 |
## 2200 | 85.08 101.76 |
## 2300 | 85.28 101.99 |
## 2400 | 85.19 101.88 |
## 2500 | 85.21 101.91 |
## 2600 | 85.25 101.96 |
## 2700 | 85.31 102.02 |
## 2800 | 85.34 102.06 |
## 2900 | 85.26 101.97 |
## 3000 | 85.03 101.70 |
## 3100 | 85.05 101.72 |
## 3200 | 84.99 101.65 |
## 3300 | 85.01 101.66 |
## 3400 | 85 101.66 |
## 3500 | 84.99 101.64 |
## 3600 | 85.08 101.75 |
## 3700 | 85.06 101.73 |
## 3800 | 85.12 101.80 |
## 3900 | 85.05 101.72 |
## 4000 | 85.05 101.72 |
## 4100 | 85 101.66 |
## 4200 | 85 101.65 |
## 4300 | 85.02 101.69 |
## 4400 | 85.06 101.73 |
## 4500 | 85.1 101.78 |
## 4600 | 85.12 101.80 |
## 4700 | 85.13 101.82 |
## 4800 | 85.1 101.77 |
## 4900 | 85.18 101.87 |
## 5000 | 85.25 101.96 |
# reprtree:::plot.getTree(drymodel, k=2, depth=4)
repres2 <- reprtree:::ReprTree(drymodel, train, metric='d2')
## [1] "Constructing distance matrix..."
## [1] "Finding representative trees..."
reprtree:::plot.reprtree(repres2, depth=4, main= "A representative dry tree", digits=5)

#### random forest prediction in the whole study area
r_hum1 <- predict(topo, fit1, filename="rf_sm2011_04_07.tif", overwrite=T)
r_hum2 <- predict(topo, fit2, filename="rf_sm2009_11_17.tif", overwrite=T)
hist(r_hum1)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"YlGnBu"))
## Warning in brewer.pal(11, "YlGnBu"): n too large, allowed maximum for palette YlGnBu is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(r_hum1, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10), main="Soil Moisture for Rainy Season")
plt + layer(sp.lines(W, col="red", lwd=0.5))

hist(r_hum2)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"YlGnBu"))
## Warning in brewer.pal(11, "YlGnBu"): n too large, allowed maximum for palette YlGnBu is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(r_hum2, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10), main="Soil Moisture for Dry Season")
plt + layer(sp.lines(W, col="red", lwd=0.5))

What if we have only precipitation data?
This is the ideal scenario. However, considering that rainfall data are very sparse in the region, it is a highly unlikely scenario. Anyway …
set.seed(42)
fit3 <- randomForest(Humed1 ~ chirps2, data=train,
ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)
## Warning in randomForest.default(m, y, ...): invalid mtry: reset to within
## valid range
print(fit3) # view results
##
## Call:
## randomForest(formula = Humed1 ~ chirps2, data = train, ntrees = 5000, mtry = 5, importance = TRUE, na.action = na.omit)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 1
##
## Mean of squared residuals: 107.3428
## % Var explained: -18.49
rf3 <- predict(fit3, test)
summary(rf3)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 22.57 34.02 36.21 37.03 39.68 66.42 1
hist(rf3)

compare5 <- cbind (actual=test$Humed1, rf3)
mean (apply(compare5, 1, min, na.rm=T)/apply(compare5, 1, max, na.rm=T)) # calculate accuracy
## [1] 0.8389367
#### estimation for the whole area
r_hum3 <- predict(topo, fit3, filename="rf_chrps2011_04_07.tif", overwrite=T)
hist(r_hum3)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"YlGnBu"))
## Warning in brewer.pal(11, "YlGnBu"): n too large, allowed maximum for palette YlGnBu is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(r_hum3, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10),
main="Soil Moisture for Rainy Season only with CHIRPS")
plt + layer(sp.lines(W, col="red", lwd=0.5))

# Random Forest prediction of Humedad 2 (dry season)
#library(randomForest)
fit4 <- randomForest(Humed2 ~ chirps1, data=train,
ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)
## Warning in randomForest.default(m, y, ...): invalid mtry: reset to within
## valid range
print(fit4) # view results
##
## Call:
## randomForest(formula = Humed2 ~ chirps1, data = train, ntrees = 5000, mtry = 5, importance = TRUE, na.action = na.omit)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 1
##
## Mean of squared residuals: 117.4044
## % Var explained: -40.41
rf4 <- predict(fit4, test)
summary(rf4)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 8.969 13.500 17.240 17.410 20.700 38.820 1
compare6 <- cbind (actual=test$Humed2, rf4)
mean (apply(compare6, 1, min, na.rm=T)/apply(compare6, 1, max, na.rm=T)) # calculate accuracy
## [1] 0.6763555
#### random forest prediction in the whole study area
r_hum4 <- predict(topo, fit4, filename="rf_chrps2009_11_17.tif", overwrite=T)
hist(r_hum4)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"YlGnBu"))
## Warning in brewer.pal(11, "YlGnBu"): n too large, allowed maximum for palette YlGnBu is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(r_hum4, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10), main="Soil Moisture for Dry Season only with CHIRPS")
plt + layer(sp.lines(W, col="red", lwd=0.5))

What if we doesn’t have precipitation data?
This is the most likely scenario.
fit5 <- randomForest(Humed1 ~ dem + a + atb + slope + aspect + ndwi2 + evi2 + nbrt2
, data=train,
ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)
print(fit5) # view results
##
## Call:
## randomForest(formula = Humed1 ~ dem + a + atb + slope + aspect + ndwi2 + evi2 + nbrt2, data = train, ntrees = 5000, mtry = 5, importance = TRUE, na.action = na.omit)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 5
##
## Mean of squared residuals: 96.18758
## % Var explained: -6.18
importance(fit5) # importance of each predictor
## %IncMSE IncNodePurity
## dem 4.362087 1409.0084
## a 2.622775 729.0493
## atb 1.349211 776.2987
## slope 3.618073 1586.5552
## aspect -2.118544 826.4140
## ndwi2 3.107427 1231.8175
## evi2 2.563807 1870.0942
## nbrt2 1.555589 1182.9750
par(mar=c(1,1,1,1))
varImpPlot(fit5,type=2)

rf5 <- predict(fit5, test)
summary(rf5)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 28.78 34.92 36.87 37.44 39.94 46.02 1
hist(rf5)

compare5a <- cbind (actual=test$Humed1, rf5)
mean (apply(compare5a, 1, min, na.rm=T)/apply(compare5a, 1, max, na.rm=T)) # calculate accuracy
## [1] 0.8449089
#### random forest prediction in the whole study area
r_hum5 <- predict(topo, fit5, filename="rf_nochrps2011_04_07.tif", overwrite=T)
hist(r_hum5)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"YlGnBu"))
## Warning in brewer.pal(11, "YlGnBu"): n too large, allowed maximum for palette YlGnBu is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(r_hum5, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10), main="Soil Moisture for Rainy Season with no CHIRPS")
plt + layer(sp.lines(W, col="red", lwd=0.5))

# Random Forest prediction of Humedad 2 (dry season)
#library(randomForest)
fit6 <- randomForest(Humed2 ~ dem + a + atb + slope + aspect + ndwi1 + evi1 + nbrt1, data=train,
ntrees=5000, mtry=5, na.action=na.omit, importance=TRUE)
print(fit6) # view results
##
## Call:
## randomForest(formula = Humed2 ~ dem + a + atb + slope + aspect + ndwi1 + evi1 + nbrt1, data = train, ntrees = 5000, mtry = 5, importance = TRUE, na.action = na.omit)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 5
##
## Mean of squared residuals: 90.01093
## % Var explained: -7.65
importance(fit6) # importance of each predictor
## %IncMSE IncNodePurity
## dem 4.1051983 1485.1611
## a 5.1329670 1043.5463
## atb 6.4948263 1340.6864
## slope -0.8781836 1596.6738
## aspect -0.3627364 576.0891
## ndwi1 1.6719758 1142.3154
## evi1 -2.1228218 1056.1900
## nbrt1 1.0683044 911.1272
varImpPlot(fit6,type=2)

rf6 <- predict(fit6, test)
summary(rf6)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 10.25 14.38 16.19 17.21 18.87 29.55 1
hist(rf6)

compare6a <- cbind (actual=test$Humed2, rf6)
mean (apply(compare6a, 1, min, na.rm=T)/apply(compare6a, 1, max, na.rm=T)) # calculate accuracy
## [1] 0.6687158
#### random forest prediction in the whole study area
r_hum6 <- predict(topo, fit6, filename="rf_nochrps2009_11_17.tif", overwrite=T)
hist(r_hum6)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"YlGnBu"))
## Warning in brewer.pal(11, "YlGnBu"): n too large, allowed maximum for palette YlGnBu is 9
## Returning the palette you asked for with that many colors
plt <- levelplot(r_hum6, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10), main="Soil Moisture for Dry Season with no CHIRPS")
plt + layer(sp.lines(W, col="red", lwd=0.5))

Let’s compare the different soil moisture estimates and see how correlated they are pixel by pixel. We will do it using a spearman correlation.
#### what is the correlation between total estimation vs partial estimation
####
rstack <- stack(r_hum1, r_hum2, r_hum3, r_hum4, r_hum5, r_hum6)
corr <-layerStats(rstack,'pearson', na.rm=T)
corr_matrix <- corr$'pearson correlation coefficient'
corr
## $`pearson correlation coefficient`
## rf_sm2011_04_07 rf_sm2009_11_17 rf_chrps2011_04_07
## rf_sm2011_04_07 1.00000000 0.1248800 0.5030803
## rf_sm2009_11_17 0.12487996 1.0000000 -0.4426376
## rf_chrps2011_04_07 0.50308028 -0.4426376 1.0000000
## rf_chrps2009_11_17 0.32634313 0.1471044 0.2334186
## rf_nochrps2011_04_07 0.85383692 0.2029546 0.2130577
## rf_nochrps2009_11_17 0.06665559 0.9694070 -0.5008639
## rf_chrps2009_11_17 rf_nochrps2011_04_07
## rf_sm2011_04_07 0.32634313 0.8538369
## rf_sm2009_11_17 0.14710438 0.2029546
## rf_chrps2011_04_07 0.23341864 0.2130577
## rf_chrps2009_11_17 1.00000000 0.1965351
## rf_nochrps2011_04_07 0.19653510 1.0000000
## rf_nochrps2009_11_17 0.04030183 0.1851501
## rf_nochrps2009_11_17
## rf_sm2011_04_07 0.06665559
## rf_sm2009_11_17 0.96940702
## rf_chrps2011_04_07 -0.50086393
## rf_chrps2009_11_17 0.04030183
## rf_nochrps2011_04_07 0.18515014
## rf_nochrps2009_11_17 1.00000000
##
## $mean
## rf_sm2011_04_07 rf_sm2009_11_17 rf_chrps2011_04_07
## 35.87180 18.81710 32.84444
## rf_chrps2009_11_17 rf_nochrps2011_04_07 rf_nochrps2009_11_17
## 15.05096 36.30708 19.58022
While precipitation explains most of surface soil variability in the study area for the rainy season, it only partially explains surface soil for the dry season. On a similar way, topographic variables and multispectral indices may predict with good accuracy surface soil moisture for the rainy season but with limited accuracy for the dry season.
Time for surface soil moisture prediction for 2015
Let’s predict soil moisture for a date within the temporal range the NASA Soil Moisture Active Pasive (SMAP) was operating at full capacity. The SMAP L3 Radar worked perfectly from 2015/04/13 to 2015/07/07. We select a Landsat 8 image taken in June 2015 for obtaining explanatories variables for our test. As this period correspond to a rainy season, we can apply the fit5 randomforest model to the new dataset.
#### Let's build the new dataset of predictor variables
#### corresponding to June 2015
### reading time
evi2 = raster("./test/evi0610.tif")
nbrt2 = raster("./test/nbrt0610.tif")
ndwi2 = raster("./test/ndwi0610.tif")
### assembling time
topo$ndwi2 <- ndwi2
topo$evi2 <- evi2
topo$nbrt2 <- nbrt2
#### prediction time
sm10Jun15 <- predict(topo, fit5, filename="rf_sm2015_06_10.tif", overwrite=T)
hist(sm10Jun15)
## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 1% of the raster cells were used. 100000 values used.

mapTheme <- rasterTheme(region=brewer.pal(11,"RdYlBu"))
plt <- levelplot(sm10Jun15, margin=F, par.settings=mapTheme, at = do.breaks(c(0,80),10), main="Soil Moisture 10 June 2015")
plt + layer(sp.lines(W, col="red", lwd=0.5))

Are there soil moisture datasets available for the study area?
Let’s review what datasets are available from the European Space Agency (ESA) for the period under study. We should remind that they are coarse-resolution datasets.
library(raster)
library(leaflet)
ro <- raster("C:/Users/IvĂ¡n Lizarazo/Documents/UN/datos/soilmoisture/ESACCI_SM/COMBINED/2015/ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-20150630000000-fv03.2.nc", varname = "sm")
## Loading required namespace: ncdf4
miext <- extent(-74, -72, 4, 6)
r <- crop(ro, miext)
r <- r * 100
pal <- colorNumeric(palette="YlGnBu", domain = c(0, 100),
na.color = "transparent")
leaflet() %>% addTiles() %>%
addRasterImage(r, colors = pal, opacity = 1.0) %>%
addLegend(pal = pal, values = values(r),
title = "ESA CCI Soil Moisture 2015-06-30")
r <- raster("C:/Users/IvĂ¡n Lizarazo/Documents/UN/datos/soilmoisture/ASCAR/2015/EUMETSAT-HSAF-ASCAT__5-day-mean__2015_180.nc4",
varname = "sm")
pal <- colorNumeric(palette="YlGnBu", domain = c(0, 100),
na.color = "transparent")
leaflet() %>% addTiles() %>%
addRasterImage(r, colors = pal, opacity = 1.0) %>%
addLegend(pal = pal, values = values(r),
title = "EUMESAT HSAF Soil Moisture (%)- June 2015")
r <- sm10Jun15
ra <- aggregate(r, fact=4, fun=mean)
leaflet() %>% addTiles() %>%
addRasterImage(ra, colors = pal, opacity = 1.0) %>%
addLegend(pal = pal, values = values(r),
title = "Estimated soil moisture (%) - June 2015")
While the soil moisture estimation for June 2015 seems plausible, there is a remaining task to evaluate its validity, that is, accuracy assessment. As there are no field measurements available for that, we will use the NASA 3-km SMAP soil moisture dataset as ground reference.
The Soil Moisture Active Passive Program (SMAP) is microwave remote sensing observatory that measures the amount of water in the top 5 cm of soil everywhere on Earth’s surface. It was launched in 2015 with promising calibration activities. The SMAP active radar instrument, which provided 5-km resolution soil moisture, failed after 3 months of operation. The SMAP passive radiometer is providing 60-km-resolution soil moisture.
Accuracy assessment of the 2015 estimated soil moisture can be seen at this link.