I wanted to play around with the mixed effects random forest (MERF) and do some comparisons against a basic RF, and I thought it would be useful to share what I found, particularly for Jessie. For the sake of simplicity, I’m just going to do this analysis using the field data we collected (omitting the data compiled from a whole bunch of other sources).
I’ll begin with some basic project setup.
# load libraries
library(ranger)
library(SAEforest)
## Warning: package 'SAEforest' was built under R version 4.2.2
library(RColorBrewer)
# set random seed
set.seed(1)
Next, I’ll read in the data. These data are already subset to just
our plots. And, for Jessie’s awareness in particular, I’ve used the
lidRmetrics library to generate lidar structural metrics
rather than the suite of metrics I’ve manually generated in the past.
Performance between mine and theirs appears pretty similar, but mine
takes a lot longer to run, so for quick re-runs, I’ve switched over to
lidRmetrics. We can chat more about this if you’d like.
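For reference, the metric generation looked roughly like the sketch below. This is from memory of the lidRmetrics README rather than the actual processing script, so treat the file name, function name, and arguments as placeholders.
# rough sketch of plot-level metric generation with lidRmetrics (assumed API;
# the file name and metric-set function are placeholders)
library(lidR)
library(lidRmetrics)
las <- readLAS("plot_001.las")                 # hypothetical height-normalized plot cloud
m <- cloud_metrics(las, ~metrics_set1(z = Z))  # zmax, height percentiles, etc.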
Once the data are read in, I’ll simplify it to just the fields needed for this modeling test.
# read in input data
df <- read.csv("S:/ursa/campbell/NIP/Data/modeling/plot_level_structure_vs_lidar_metrics_20221231_nodups_by_date.csv")
# refine table for modeling
site.col <- which(colnames(df) == "site.id")
bio.col <- which(colnames(df) == "bio")
pred.cols <- seq(which(colnames(df) == "zmax"), ncol(df))
df <- df[,c(site.col, bio.col, pred.cols)]
To gain a somewhat unbiased sense of model performance, I’m going to split the data into training and test datasets. As we know, the question of model transferability can really only be answered in an unbiased fashion if the training and test data belong to different geographic strata. Unfortunately (at least as far as I can wrap my head around it…), the mixed effects approach won’t really allow this, because the strata are the “random” effects. So, if you build a model with strata 1, 2, and 3, you can’t test it on stratum 4, because stratum 4 is not present in the training data. As such, this approach shouldn’t really be used for the model transferability portion of Jessie’s thesis. Instead, it can be used to generate a final, comprehensive predictive model that incorporates the strata as a basis for prediction. So, that’s really the focus here, just to be 100% clear.
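For context (this split is not actually used anywhere below), the kind of geographically stratified hold-out that the transferability question calls for would look something like this hypothetical leave-one-site-out split:
# hypothetical leave-one-site-out split -- the kind of stratified test the
# MERF can't support, since the held-out site's random effect is never estimated
holdout.site <- unique(df$site.id)[1]
df.train.loso <- df[df$site.id != holdout.site,]
df.test.loso <- df[df$site.id == holdout.site,]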
Again, in the interest of simplicity, I’m just going to use our site IDs as the random effects (rather than Jessie’s more ecologically useful measures of ecoregion, climate class, etc.). There are 159 usable plots (of 180 total) that overlap currently available airborne lidar. There are 16 sites, so 2 plots from each site (32 total) makes an approximately 20% test dataset, leaving ~80% for training.
# take two plots randomly from each site and use them as test data
df$train.test <- "train"
site.ids <- unique(df$site.id)
for (site.id in site.ids){
  site.rows <- which(df$site.id == site.id)
  test.row <- sample(site.rows, 2)
  df$train.test[test.row] <- "test"
}
# split into training and test data
df.train <- df[df$train.test == "train",]
df.test <- df[df$train.test == "test",]
df.train$train.test <- NULL
df.test$train.test <- NULL
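As a quick sanity check (not part of the original write-up), it’s worth confirming that the split came out as expected: 2 test plots per site should yield 32 test and 127 training plots.
# confirm the split sizes
nrow(df.train)
nrow(df.test)
table(df$train.test)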
I’m going to use two modeling approaches:
1. site.id as a factor variable
2. site.id as a random variable
In all of the following code, .me will be used to distinguish the basic RF from the mixed effects RF.
# define modeling columns
# the basic RF uses site.id (column 1) as a factor predictor alongside the lidar
# metrics; the mixed effects RF uses the lidar metrics only, with site.id
# entering as a random effect
df.train$site.id <- as.factor(df.train$site.id)
df.test$site.id <- as.factor(df.test$site.id)
x.cols <- c(1, seq(3, ncol(df.train)))
x.cols.me <- seq(3, ncol(df.train))
y.col <- 2
# build basic ranger model
mod <- ranger(x = df.train[,x.cols],
              y = df.train[,y.col])
# build mixed effects ranger model (the fixed-effect predictors exclude site.id,
# which enters through the random intercept instead)
mod.me <- MERFranger(Y = df.train[,y.col],
                     X = df.train[,x.cols.me],
                     random = "(1|site.id)",
                     data = df.train)
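It can be helpful to peek at what MERFranger returns before using pieces of it below (the random effects get pulled out of this object later):
# inspect the top-level components of the fitted mixed effects model object
str(mod.me, max.level = 1)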
Given RF’s propensity for producing overly moderate predictions, with overestimation on the low end and underestimation on the high end, I’m going to perform a quantile-based bias correction procedure (thanks, Simon!). It assumes that the over- or underestimation in predictions made on the training data (relative to the observed values) is comparable to that in predictions made on the test data.
# get quantile-based bias correction -- basic ranger model
pred.mod <- predict(mod, df.train)$predictions
obs <- df.train$bio
qs <- seq(0,1,0.001)
q.obs.mod <- quantile(obs, probs = qs)
q.pred.mod <- quantile(pred.mod, probs = qs)
q.diff.mod <- q.obs.mod - q.pred.mod
# get quantile-based bias correction -- mixed effects ranger model
pred.mod.me <- predict(mod.me, df.train)
obs <- df.train$bio
qs <- seq(0,1,0.001)
q.obs.mod.me <- quantile(obs, probs = qs)
q.pred.mod.me <- quantile(pred.mod.me, probs = qs)
q.diff.mod.me <- q.obs.mod.me - q.pred.mod.me
# make predictions on test data
pred.mod <- predict(mod, df.test)$predictions
pred.mod.me <- predict(mod.me, df.test)
obs <- df.test$bio
# perform bias correction
pred.mod.bc <- pred.mod + approx(q.pred.mod, q.diff.mod, pred.mod, rule = 2)$y
pred.mod.me.bc <- pred.mod.me + approx(q.pred.mod.me, q.diff.mod.me, pred.mod.me, rule = 2)$y
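For what it’s worth, the same quantile mapping can be wrapped up in a small helper function. This is just a sketch for readability; the code above (and the bootstrap below) keeps the explicit version.
# reusable version of the quantile-based bias correction: learn the observed-
# minus-predicted offset at each quantile of the training predictions, then
# interpolate that offset for any new predictions
bias.correct <- function(pred.train, obs.train, pred.new, qs = seq(0, 1, 0.001)){
  q.pred <- quantile(pred.train, probs = qs)
  q.diff <- quantile(obs.train, probs = qs) - q.pred
  pred.new + approx(q.pred, q.diff, xout = pred.new, rule = 2)$y
}
# e.g., bias.correct(predict(mod, df.train)$predictions, df.train$bio, pred.mod)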
I’ll now plot observed vs. predicted AGB for the four sets of predictions (RF and MERF, with and without bias correction):
# define plotting function
plot.fun <- function(x,y,main,xlab="Observed AGB",ylab="Predicted AGB"){
  plot(y ~ x, xlim = c(ax.min, ax.max), ylim = c(ax.min, ax.max), pch = 16,
       main = main, xlab = xlab, ylab = ylab)
  grid()
  # 1:1 line
  lines(x = c(-1000,1000), y = c(-1000,1000), lty = 2)
  # best-fit line and summary stats
  mod <- lm(y ~ x)
  abline(mod, lwd = 2, col = "red")
  a <- coef(mod)[2]
  b <- coef(mod)[1]
  if (b < 0) eq <- paste0("y=", round(a,2), "x-", abs(round(b,2)))
  if (b >= 0) eq <- paste0("y=", round(a,2), "x+", abs(round(b,2)))
  r2 <- paste0("R2=", round(summary(mod)$adj.r.squared,2))
  rmse <- paste0("RMSE=", round(sqrt(mean((y-x)^2)),2))
  legend("topleft", legend = c(eq,r2,rmse), text.col = "red", bty = "n", x.intersp = 0)
}
# plot out the results
par(mfrow = c(2,2),
    las = 1,
    mar = c(5,5,2,1))
# axis limits shared across all four panels (including the bias-corrected predictions)
ax.min <- min(c(pred.mod, pred.mod.me, pred.mod.bc, pred.mod.me.bc, obs))
ax.max <- max(c(pred.mod, pred.mod.me, pred.mod.bc, pred.mod.me.bc, obs))
plot.fun(obs, pred.mod, "RF")
plot.fun(obs, pred.mod.me, "MERF")
plot.fun(obs, pred.mod.bc, "RF Bias Corr")
plot.fun(obs, pred.mod.me.bc, "MERF Bias Corr")
Quick takes:
There is value in understanding what the random effects are, as they provide insight into overall stratum-level deviations from the overarching trend in lidar structure vs. biomass.
# get data frame of random effects
df.re <- as.data.frame(mod.me$RandomEffects$site.id)
colnames(df.re) <- "Intercept"
# get bar colors
cols <- colorRampPalette(brewer.pal(11, "Spectral"))(nrow(df.re))
# plot the random effects
par(mar = c(8,5,1,1),
las = 1)
b <- barplot(df.re$Intercept, space = 0, col = cols, ylim = c(min(df.re - 1), max(df.re + 1)), ylab = "Intercept")
box()
mtext(text = row.names(df.re), side = 1, line = 1, at = b, las = 2)
abline(h = 0)
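To dig into those stratum-level deviations a bit further (this table isn’t in the original), the intercepts can be lined up with each site’s training plot count and mean observed AGB:
# pair each site's random intercept with its training plot count and mean AGB
site.summary <- data.frame(site.id = row.names(df.re),
                           intercept = df.re$Intercept,
                           n.plots = as.vector(table(df.train$site.id)[row.names(df.re)]),
                           mean.bio = tapply(df.train$bio, df.train$site.id, mean)[row.names(df.re)])
site.summary[order(site.summary$intercept),]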
So, I know that once you set a random seed, you’re not supposed to mess with it, particularly if you’re randomly sampling your test data, because then you can hack the seed to make it appear as if your results are better than they really are (e.g., you happen to select a seed that results in an easier-to-predict test dataset…). Nevertheless, I did exactly that, and I noticed that the results changed pretty dramatically. In some cases the RFs were better; in other cases, the MERFs were better. In some cases R2 values were in the 0.5 range, in others the 0.8 range. So, clearly the test results are not very robust.
One possible way to account for this would be a k-fold cross-validation approach, rather than a straight-up training/test split. But, given that the sampling is happening within sites, that each site contains only a relatively small number of plots, and that that number varies (e.g., some sites have 12 plots, others 8, etc.), it would be difficult to define a useful value for k. So, I went with a bootstrapping approach (strictly speaking, repeated random train/test splits rather than a classic bootstrap), which basically does everything above but repeats it 1000 times, each time selecting a new set of training/test data. By compiling the model performance metrics (R2 and RMSE) on each iteration, I think we’ll get a better, more robust comparison between the models.
# bootstrap the whole thing: repeat the train/test split, model fitting, bias
# correction, and accuracy assessment 1000 times
for (i in seq(1,1000)){
  # re-split the data, again withholding 2 plots per site as test data
  df$train.test <- "train"
  site.ids <- unique(df$site.id)
  for (site.id in site.ids){
    site.rows <- which(df$site.id == site.id)
    test.row <- sample(site.rows, 2)
    df$train.test[test.row] <- "test"
  }
  df.train <- df[df$train.test == "train",]
  df.test <- df[df$train.test == "test",]
  df.train$train.test <- NULL
  df.test$train.test <- NULL
  # define modeling columns, as before (site.id as a factor for the basic RF,
  # lidar metrics only for the mixed effects RF)
  df.train$site.id <- as.factor(df.train$site.id)
  df.test$site.id <- as.factor(df.test$site.id)
  x.cols <- c(1, seq(3, ncol(df.train)))
  x.cols.me <- seq(3, ncol(df.train))
  y.col <- 2
  # fit the basic and mixed effects models
  mod <- ranger(x = df.train[,x.cols],
                y = df.train[,y.col])
  mod.me <- MERFranger(Y = df.train[,y.col],
                       X = df.train[,x.cols.me],
                       random = "(1|site.id)",
                       data = df.train)
  # quantile-based bias correction, learned from the training data
  pred.mod <- predict(mod, df.train)$predictions
  obs <- df.train$bio
  qs <- seq(0,1,0.001)
  q.obs.mod <- quantile(obs, probs = qs)
  q.pred.mod <- quantile(pred.mod, probs = qs)
  q.diff.mod <- q.obs.mod - q.pred.mod
  pred.mod.me <- predict(mod.me, df.train)
  obs <- df.train$bio
  qs <- seq(0,1,0.001)
  q.obs.mod.me <- quantile(obs, probs = qs)
  q.pred.mod.me <- quantile(pred.mod.me, probs = qs)
  q.diff.mod.me <- q.obs.mod.me - q.pred.mod.me
  # predict on the test data and apply the bias correction
  pred.mod <- predict(mod, df.test)$predictions
  pred.mod.me <- predict(mod.me, df.test)
  obs <- df.test$bio
  pred.mod.bc <- pred.mod + approx(q.pred.mod, q.diff.mod, pred.mod, rule = 2)$y
  pred.mod.me.bc <- pred.mod.me + approx(q.pred.mod.me, q.diff.mod.me, pred.mod.me, rule = 2)$y
  # compute test-set performance metrics for each of the four model variants
  r2.mod <- summary(lm(pred.mod ~ obs))$adj.r.squared
  r2.mod.me <- summary(lm(pred.mod.me ~ obs))$adj.r.squared
  r2.mod.bc <- summary(lm(pred.mod.bc ~ obs))$adj.r.squared
  r2.mod.me.bc <- summary(lm(pred.mod.me.bc ~ obs))$adj.r.squared
  rmse.mod <- sqrt(mean((pred.mod - obs)^2))
  rmse.mod.me <- sqrt(mean((pred.mod.me - obs)^2))
  rmse.mod.bc <- sqrt(mean((pred.mod.bc - obs)^2))
  rmse.mod.me.bc <- sqrt(mean((pred.mod.me.bc - obs)^2))
  # initialize the results table and random effects matrix on the first
  # iteration; append to them on subsequent iterations
  if (i == 1){
    results.df <- data.frame(r2.mod = r2.mod,
                             r2.mod.me = r2.mod.me,
                             r2.mod.bc = r2.mod.bc,
                             r2.mod.me.bc = r2.mod.me.bc,
                             rmse.mod = rmse.mod,
                             rmse.mod.me = rmse.mod.me,
                             rmse.mod.bc = rmse.mod.bc,
                             rmse.mod.me.bc = rmse.mod.me.bc)
    df.re <- t(as.data.frame(mod.me$RandomEffects$site.id))
  } else {
    results.df <- rbind(results.df, data.frame(r2.mod = r2.mod,
                                               r2.mod.me = r2.mod.me,
                                               r2.mod.bc = r2.mod.bc,
                                               r2.mod.me.bc = r2.mod.me.bc,
                                               rmse.mod = rmse.mod,
                                               rmse.mod.me = rmse.mod.me,
                                               rmse.mod.bc = rmse.mod.bc,
                                               rmse.mod.me.bc = rmse.mod.me.bc))
    df.re <- rbind(df.re, t(as.data.frame(mod.me$RandomEffects$site.id)))
  }
}
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0024653 (tol = 0.002, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00217975 (tol = 0.002, component 1)
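Before plotting, here’s a quick look (not in the original output) at the median of each performance metric across the 1000 iterations:
# median R2 and RMSE for each model variant across all bootstrap iterations
round(apply(results.df, 2, median), 2)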
Since we’ve now got distributions of R2 and RMSE values, I’ll use kernel density plots as the basis of visualization, and I’ll highlight the median value among all of the bootstrap iterations to summarize comparative performance.
# get kernel densities
d.r2.mod <- density(results.df$r2.mod)
d.r2.mod.me <- density(results.df$r2.mod.me)
d.r2.mod.bc <- density(results.df$r2.mod.bc)
d.r2.mod.me.bc <- density(results.df$r2.mod.me.bc)
d.rmse.mod <- density(results.df$rmse.mod)
d.rmse.mod.me <- density(results.df$rmse.mod.me)
d.rmse.mod.bc <- density(results.df$rmse.mod.bc)
d.rmse.mod.me.bc <- density(results.df$rmse.mod.me.bc)
# get axis limits
xlim.r2 <- c(min(c(d.r2.mod$x, d.r2.mod.me$x, d.r2.mod.bc$x, d.r2.mod.me.bc$x)),
max(c(d.r2.mod$x, d.r2.mod.me$x, d.r2.mod.bc$x, d.r2.mod.me.bc$x)))
xlim.rmse <- c(min(c(d.rmse.mod$x, d.rmse.mod.me$x, d.rmse.mod.bc$x, d.rmse.mod.me.bc$x)),
max(c(d.rmse.mod$x, d.rmse.mod.me$x, d.rmse.mod.bc$x, d.rmse.mod.me.bc$x)))
ylim.r2 <- c(min(c(d.r2.mod$y, d.r2.mod.me$y, d.r2.mod.bc$y, d.r2.mod.me.bc$y)),
max(c(d.r2.mod$y, d.r2.mod.me$y, d.r2.mod.bc$y, d.r2.mod.me.bc$y)))
ylim.rmse <- c(min(c(d.rmse.mod$y, d.rmse.mod.me$y, d.rmse.mod.bc$y, d.rmse.mod.me.bc$y)),
max(c(d.rmse.mod$y, d.rmse.mod.me$y, d.rmse.mod.bc$y, d.rmse.mod.me.bc$y)))
# set up the plot
par(mfrow = c(2,2),
las = 1,
mar = c(5,1,2,1))
# plot 1 -- RF vs. MERF, R2
plot(xlim.r2, ylim.r2, type = "n", yaxt = "n", ylab = NA, xlab = "R2", main = "R2")
polygon(d.r2.mod$x, d.r2.mod$y, col = rgb(1,0,0,0.25))
polygon(d.r2.mod.me$x, d.r2.mod.me$y, col = rgb(0,0,1,0.25))
abline(v = median(results.df$r2.mod), lwd = 2, col = "red")
abline(v = median(results.df$r2.mod.me), lwd = 2, col = "blue")
r2.med.mod <- paste0("Med Basic = ", round(median(results.df$r2.mod),2))
r2.med.mod.me <- paste0("Med ME = ", round(median(results.df$r2.mod.me),2))
legend("topleft", legend = c(r2.med.mod, r2.med.mod.me), text.col = c("red", "blue"), bty = "n", x.intersp = 0)
# plot 2 -- RF vs. MERF, RMSE
plot(xlim.rmse, ylim.rmse, type = "n", yaxt = "n", ylab = NA, xlab = "RMSE", main = "RMSE")
polygon(d.rmse.mod$x, d.rmse.mod$y, col = rgb(1,0,0,0.25))
polygon(d.rmse.mod.me$x, d.rmse.mod.me$y, col = rgb(0,0,1,0.25))
abline(v = median(results.df$rmse.mod), lwd = 2, col = "red")
abline(v = median(results.df$rmse.mod.me), lwd = 2, col = "blue")
rmse.med.mod <- paste0("Med Basic = ", round(median(results.df$rmse.mod),2))
rmse.med.mod.me <- paste0("Med ME = ", round(median(results.df$rmse.mod.me),2))
legend("topleft", legend = c(rmse.med.mod, rmse.med.mod.me), text.col = c("red", "blue"), bty = "n", x.intersp = 0)
# plot 3 -- RF vs. MERF Bias Corr, R2
plot(xlim.r2, ylim.r2, type = "n", yaxt = "n", ylab = NA, xlab = "R2", main = "R2 Bias Corr")
polygon(d.r2.mod.bc$x, d.r2.mod.bc$y, col = rgb(1,0,0,0.25))
polygon(d.r2.mod.me.bc$x, d.r2.mod.me.bc$y, col = rgb(0,0,1,0.25))
abline(v = median(results.df$r2.mod.bc), lwd = 2, col = "red")
abline(v = median(results.df$r2.mod.me.bc), lwd = 2, col = "blue")
r2.med.mod.bc <- paste0("Med Basic = ", round(median(results.df$r2.mod.bc),2))
r2.med.mod.me.bc <- paste0("Med ME = ", round(median(results.df$r2.mod.me.bc),2))
legend("topleft", legend = c(r2.med.mod.bc, r2.med.mod.me.bc), text.col = c("red", "blue"), bty = "n", x.intersp = 0)
# plot 4 -- RF vs. MERF Bias Corr, RMSE
plot(xlim.rmse, ylim.rmse, type = "n", yaxt = "n", ylab = NA, xlab = "RMSE", main = "RMSE Bias Corr")
polygon(d.rmse.mod.bc$x, d.rmse.mod.bc$y, col = rgb(1,0,0,0.25))
polygon(d.rmse.mod.me.bc$x, d.rmse.mod.me.bc$y, col = rgb(0,0,1,0.25))
abline(v = median(results.df$rmse.mod.bc), lwd = 2, col = "red")
abline(v = median(results.df$rmse.mod.me.bc), lwd = 2, col = "blue")
rmse.med.mod.bc <- paste0("Med Basic = ", round(median(results.df$rmse.mod.bc),2))
rmse.med.mod.me.bc <- paste0("Med ME = ", round(median(results.df$rmse.mod.me.bc),2))
legend("topleft", legend = c(rmse.med.mod.bc, rmse.med.mod.me.bc), text.col = c("red", "blue"), bty = "n", x.intersp = 0)
The results differ somewhat from those we saw with the single train/test split.
Once again, each bootstrapped model iteration will have its own set
of random effects. To view the distribution of random effects per
stratum (site.id), I’ll use violin plots (i.e., fancy
density plots). And since I’m a stubborn base R plotter, I’m going to do
it the manual way.
# get axis limits by scanning the density of each site's intercepts
ymin <- Inf
ymax <- -Inf
site.ids <- colnames(df.re)
for (site.id in site.ids){
  d <- density(df.re[,site.id])
  y <- d$x
  if (min(y) < ymin) ymin <- min(y)
  if (max(y) > ymax) ymax <- max(y)
}
# set up plot
par(mar = c(8,5,1,1),
las = 1)
# create empty plot
plot(x = c(0.5, ncol(df.re) + 0.5),
y = c(ymin, ymax),
type = "n",
ylab = "Intercept",
xaxt = "n",
xlab = NA)
abline(h = 0)
# loop through sites and plot violins
site.ids <- colnames(df.re)
for (i in seq(1,length(site.ids))){
  abline(v = i, lty = 3, col = "lightgray")
  site.id <- site.ids[i]
  # mirror the kernel density curve around the site's x position to form the violin
  d <- density(df.re[,site.id])
  y <- d$x
  x <- d$y + i
  y <- c(y, rev(y))
  x <- c(x, rev(d$y * -1 + i))
  col <- cols[i]
  polygon(x, y, col = col)
  # mark the median intercept and label the site
  points(i, median(df.re[,site.id]), pch = 16)
  mtext(site.id, 1, 1, at = i, las = 2)
}
It’s interesting that some sites have considerably wider distributions than others. It suggests to me that perhaps these are sites with a wide range of AGB among their plots. But, generally, the trend is quite consistent with the single run shown earlier.
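A quick way to check that hunch (just a sketch, not something from the original analysis) would be to compare the spread of each site’s bootstrapped intercepts to the range of observed AGB among that site’s plots:
# compare the spread of each site's bootstrapped intercepts to the range of
# observed AGB among that site's plots
re.spread <- apply(df.re, 2, sd)
agb.range <- tapply(df$bio, df$site.id, function(x) diff(range(x)))[names(re.spread)]
plot(agb.range, re.spread, pch = 16,
     xlab = "Range of observed AGB within site", ylab = "SD of bootstrapped intercept")
cor(agb.range, re.spread)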