I wanted to play around with the mixed effects random forest (MERF) and do some comparisons against a basic RF, and I thought it would be useful to share what I found, particularly for Jessie. For the sake of simplicity, I’m just going to do this analysis using the field data we collected (omitting the data compiled from a whole bunch of other sources).
I’ll begin with some basic project setup.
# load libraries
library(ranger)
library(SAEforest)
## Warning: package 'SAEforest' was built under R version 4.2.2
library(RColorBrewer)
# set random seed
set.seed(1)
Next, I’ll read in the data. These data are already subset to just
our plots. And, for Jessie’s awareness in particular, I’ve used the
lidRmetrics library to generate lidar structural metrics
rather than the suite of metrics I’ve manually generated in the past.
Performance between mine and theirs appears pretty similar, but mine
takes a lot longer to run, so for quick re-runs, I’ve switched over to
lidRmetrics. We can chat more about this if you’d like.
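For reference, the metric generation looked roughly like the sketch below. This is from memory of the lidRmetrics README rather than the actual processing script, so treat the file name, function name, and arguments as placeholders.
# rough sketch of plot-level metric generation with lidRmetrics (assumed API;
# the file name and metric-set function are placeholders)
library(lidR)
library(lidRmetrics)
las <- readLAS("plot_001.las")                 # hypothetical height-normalized plot cloud
m <- cloud_metrics(las, ~metrics_set1(z = Z))  # zmax, height percentiles, etc.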
Once the data are read in, I’ll simplify it to just the fields needed for this modeling test.
# read in input data
df <- read.csv("S:/ursa/campbell/NIP/Data/modeling/plot_level_structure_vs_lidar_metrics_20221231_nodups_by_date.csv")
# refine table for modeling
site.col <- which(colnames(df) == "site.id")
bio.col <- which(colnames(df) == "bio")
pred.cols <- seq(which(colnames(df) == "zmax"), ncol(df))
df <- df[,c(site.col, bio.col, pred.cols)]
To gain a somewhat unbiased sense of model performance, I’m going to split the data into training and test datasets. As we know, the question of model transferability can really only be answered in an unbiased fashion if the training and test data belong to different geographic strata. Unfortunately (at least as far as I can wrap my head around it…), the mixed effects approach won’t really allow this, because the strata are the “random” effects. So, if you build a model with strata 1, 2, and 3, you can’t test it on stratum 4, because stratum 4 is not present in the training data. As such, this approach shouldn’t really be used for the model transferability portion of Jessie’s thesis. Instead, it can be used to generate a final, comprehensive predictive model that incorporates the strata as a basis for prediction. So, that’s really the focus here, just to be 100% clear.
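For context (this split is not actually used anywhere below), the kind of geographically stratified hold-out that the transferability question calls for would look something like this hypothetical leave-one-site-out split:
# hypothetical leave-one-site-out split -- the kind of stratified test the
# MERF can't support, since the held-out site's random effect is never estimated
holdout.site <- unique(df$site.id)[1]
df.train.loso <- df[df$site.id != holdout.site,]
df.test.loso <- df[df$site.id == holdout.site,]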
Again, in the interest of simplicity, I’m just going to use our site IDs as the random effects (rather than Jessie’s more ecologically useful measures of ecoregion, climate class, etc.). There are 159 usable plots (of 180 total) that overlap currently available airborne lidar. There are 16 sites, so 2 plots from each site (32 total) makes an approximately 20% test dataset, leaving ~80% for training.
# take two plots randomly from each site and use them as test data
df$train.test <- "train"
site.ids <- unique(df$site.id)
for (site.id in site.ids){
  site.rows <- which(df$site.id == site.id)
  test.row <- sample(site.rows, 2)
  df$train.test[test.row] <- "test"
}
# split into training and test data
df.train <- df[df$train.test == "train",]
df.test <- df[df$train.test == "test",]
df.train$train.test <- NULL
df.test$train.test <- NULL
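As a quick sanity check (not part of the original write-up), it’s worth confirming that the split came out as expected: 2 test plots per site should yield 32 test and 127 training plots.
# confirm the split sizes
nrow(df.train)
nrow(df.test)
table(df$train.test)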
I’m going to use two modeling approaches:
1. site.id as a factor variable
2. site.id as a random variable
In all of the following code, .me will be used to distinguish the basic RF from the mixed effects RF.
# define modeling columns
# the basic RF uses site.id (column 1) as a factor predictor alongside the lidar
# metrics; the mixed effects RF uses the lidar metrics only, with site.id
# entering as a random effect
df.train$site.id <- as.factor(df.train$site.id)
df.test$site.id <- as.factor(df.test$site.id)
x.cols <- c(1, seq(3, ncol(df.train)))
x.cols.me <- seq(3, ncol(df.train))
y.col <- 2
# build basic ranger model
mod <- ranger(x = df.train[,x.cols],
              y = df.train[,y.col])
# build mixed effects ranger model (the fixed-effect predictors exclude site.id,
# which enters through the random intercept instead)
mod.me <- MERFranger(Y = df.train[,y.col],
                     X = df.train[,x.cols.me],
                     random = "(1|site.id)",
                     data = df.train)
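It can be helpful to peek at what MERFranger returns before using pieces of it below (the random effects get pulled out of this object later):
# inspect the top-level components of the fitted mixed effects model object
str(mod.me, max.level = 1)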
Given RF’s propensity for producing overly moderate predictions, with overestimation on the low end and underestimation on the high end, I’m going to perform a quantile-based bias correction procedure (thanks, Simon!). It assumes that the over- or underestimation in predictions made on the training data (relative to the observed values) is comparable to that in predictions made on the test data.
# get quantile-based bias correction -- basic ranger model
pred.mod <- predict(mod, df.train)$predictions
obs <- df.train$bio
qs <- seq(0,1,0.001)
q.obs.mod <- quantile(obs, probs = qs)
q.pred.mod <- quantile(pred.mod, probs = qs)
q.diff.mod <- q.obs.mod - q.pred.mod
# get quantile-based bias correction -- mixed effects ranger model
pred.mod.me <- predict(mod.me, df.train)
obs <- df.train$bio
qs <- seq(0,1,0.001)
q.obs.mod.me <- quantile(obs, probs = qs)
q.pred.mod.me <- quantile(pred.mod.me, probs = qs)
q.diff.mod.me <- q.obs.mod.me - q.pred.mod.me
# make predictions on test data
pred.mod <- predict(mod, df.test)$predictions
pred.mod.me <- predict(mod.me, df.test)
obs <- df.test$bio
# perform bias correction
pred.mod.bc <- pred.mod + approx(q.pred.mod, q.diff.mod, pred.mod, rule = 2)$y
pred.mod.me.bc <- pred.mod.me + approx(q.pred.mod.me, q.diff.mod.me, pred.mod.me, rule = 2)$y
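For what it’s worth, the same quantile mapping can be wrapped up in a small helper function. This is just a sketch for readability; the code above (and the bootstrap below) keeps the explicit version.
# reusable version of the quantile-based bias correction: learn the observed-
# minus-predicted offset at each quantile of the training predictions, then
# interpolate that offset for any new predictions
bias.correct <- function(pred.train, obs.train, pred.new, qs = seq(0, 1, 0.001)){
  q.pred <- quantile(pred.train, probs = qs)
  q.diff <- quantile(obs.train, probs = qs) - q.pred
  pred.new + approx(q.pred, q.diff, xout = pred.new, rule = 2)$y
}
# e.g., bias.correct(predict(mod, df.train)$predictions, df.train$bio, pred.mod)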
I’ll now plot observed vs. predicted AGB for the four sets of predictions (RF and MERF, with and without bias correction):
# define plotting function
plot.fun <- function(x,y,main,xlab="Observed AGB",ylab="Predicted AGB"){
  plot(y ~ x, xlim = c(ax.min, ax.max), ylim = c(ax.min, ax.max), pch = 16,
       main = main, xlab = xlab, ylab = ylab)
  grid()
  # 1:1 line
  lines(x = c(-1000,1000), y = c(-1000,1000), lty = 2)
  # best-fit line and summary stats
  mod <- lm(y ~ x)
  abline(mod, lwd = 2, col = "red")
  a <- coef(mod)[2]
  b <- coef(mod)[1]
  if (b < 0) eq <- paste0("y=", round(a,2), "x-", abs(round(b,2)))
  if (b >= 0) eq <- paste0("y=", round(a,2), "x+", abs(round(b,2)))
  r2 <- paste0("R2=", round(summary(mod)$adj.r.squared,2))
  rmse <- paste0("RMSE=", round(sqrt(mean((y-x)^2)),2))
  legend("topleft", legend = c(eq,r2,rmse), text.col = "red", bty = "n", x.intersp = 0)
}
# plot out the results
par(mfrow = c(2,2),
    las = 1,
    mar = c(5,5,2,1))
# axis limits shared across all four panels (including the bias-corrected predictions)
ax.min <- min(c(pred.mod, pred.mod.me, pred.mod.bc, pred.mod.me.bc, obs))
ax.max <- max(c(pred.mod, pred.mod.me, pred.mod.bc, pred.mod.me.bc, obs))
plot.fun(obs, pred.mod, "RF")
plot.fun(obs, pred.mod.me, "MERF")
plot.fun(obs, pred.mod.bc, "RF Bias Corr")
plot.fun(obs, pred.mod.me.bc, "MERF Bias Corr")
Quick takes:
There is value in understanding what the random effects are, as they provide insight into overall stratum-level deviations from the overarching trend in lidar structure vs. biomass.
# get data frame of random effects
df.re <- as.data.frame(mod.me$RandomEffects$site.id)
colnames(df.re) <- "Intercept"
# get bar colors
cols <- colorRampPalette(brewer.pal(11, "Spectral"))(nrow(df.re))
# plot the random effects
par(mar = c(8,5,1,1),
las = 1)
b <- barplot(df.re$Intercept, space = 0, col = cols, ylim = c(min(df.re - 1), max(df.re + 1)), ylab = "Intercept")
box()
mtext(text = row.names(df.re), side = 1, line = 1, at = b, las = 2)
abline(h = 0)
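To dig into those stratum-level deviations a bit further (this table isn’t in the original), the intercepts can be lined up with each site’s training plot count and mean observed AGB:
# pair each site's random intercept with its training plot count and mean AGB
site.summary <- data.frame(site.id = row.names(df.re),
                           intercept = df.re$Intercept,
                           n.plots = as.vector(table(df.train$site.id)[row.names(df.re)]),
                           mean.bio = tapply(df.train$bio, df.train$site.id, mean)[row.names(df.re)])
site.summary[order(site.summary$intercept),]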
So, I know that once you set a random seed, you’re not supposed to mess with it, particularly if you’re randomly sampling your test data, because then you can hack the seed to make it appear as if your results are better than they really are (e.g., you happen to select a seed that results in an easier-to-predict test dataset…). Nevertheless, I did exactly that, and I noticed that the results changed pretty dramatically. In some cases the RFs were better; in other cases, the MERFs were better. In some cases R2 values were in the 0.5 range, in others the 0.8 range. So, clearly the test results are not very robust.
One possible way to account for this would be a k-fold cross-validation approach, rather than a straight-up training/test split. But, given that the sampling is happening within sites, that each site contains only a relatively small number of plots, and that that number varies (e.g., some sites have 12 plots, others 8, etc.), it would be difficult to define a useful value for k. So, I went with a bootstrapping approach (strictly speaking, repeated random train/test splits rather than a classic bootstrap), which basically does everything above but repeats it 1000 times, each time selecting a new set of training/test data. By compiling the model performance metrics (R2 and RMSE) on each iteration, I think we’ll get a better, more robust comparison between the models.
# bootstrap the whole thing: repeat the train/test split, model fitting, bias
# correction, and accuracy assessment 1000 times
for (i in seq(1,1000)){
  # re-split the data, again withholding 2 plots per site as test data
  df$train.test <- "train"
  site.ids <- unique(df$site.id)
  for (site.id in site.ids){
    site.rows <- which(df$site.id == site.id)
    test.row <- sample(site.rows, 2)
    df$train.test[test.row] <- "test"
  }
  df.train <- df[df$train.test == "train",]
  df.test <- df[df$train.test == "test",]
  df.train$train.test <- NULL
  df.test$train.test <- NULL
  # define modeling columns, as before (site.id as a factor for the basic RF,
  # lidar metrics only for the mixed effects RF)
  df.train$site.id <- as.factor(df.train$site.id)
  df.test$site.id <- as.factor(df.test$site.id)
  x.cols <- c(1, seq(3, ncol(df.train)))
  x.cols.me <- seq(3, ncol(df.train))
  y.col <- 2
  # fit the basic and mixed effects models
  mod <- ranger(x = df.train[,x.cols],
                y = df.train[,y.col])
  mod.me <- MERFranger(Y = df.train[,y.col],
                       X = df.train[,x.cols.me],
                       random = "(1|site.id)",
                       data = df.train)
  # quantile-based bias correction, learned from the training data
  pred.mod <- predict(mod, df.train)$predictions
  obs <- df.train$bio
  qs <- seq(0,1,0.001)
  q.obs.mod <- quantile(obs, probs = qs)
  q.pred.mod <- quantile(pred.mod, probs = qs)
  q.diff.mod <- q.obs.mod - q.pred.mod
  pred.mod.me <- predict(mod.me, df.train)
  obs <- df.train$bio
  qs <- seq(0,1,0.001)
  q.obs.mod.me <- quantile(obs, probs = qs)
  q.pred.mod.me <- quantile(pred.mod.me, probs = qs)
  q.diff.mod.me <- q.obs.mod.me - q.pred.mod.me
  # predict on the test data and apply the bias correction
  pred.mod <- predict(mod, df.test)$predictions
  pred.mod.me <- predict(mod.me, df.test)
  obs <- df.test$bio
  pred.mod.bc <- pred.mod + approx(q.pred.mod, q.diff.mod, pred.mod, rule = 2)$y
  pred.mod.me.bc <- pred.mod.me + approx(q.pred.mod.me, q.diff.mod.me, pred.mod.me, rule = 2)$y
  # compute test-set performance metrics for each of the four model variants
  r2.mod <- summary(lm(pred.mod ~ obs))$adj.r.squared
  r2.mod.me <- summary(lm(pred.mod.me ~ obs))$adj.r.squared
  r2.mod.bc <- summary(lm(pred.mod.bc ~ obs))$adj.r.squared
  r2.mod.me.bc <- summary(lm(pred.mod.me.bc ~ obs))$adj.r.squared
  rmse.mod <- sqrt(mean((pred.mod - obs)^2))
  rmse.mod.me <- sqrt(mean((pred.mod.me - obs)^2))
  rmse.mod.bc <- sqrt(mean((pred.mod.bc - obs)^2))
  rmse.mod.me.bc <- sqrt(mean((pred.mod.me.bc - obs)^2))
  # initialize the results table and random effects matrix on the first
  # iteration; append to them on subsequent iterations
  if (i == 1){
    results.df <- data.frame(r2.mod = r2.mod,
                             r2.mod.me = r2.mod.me,
                             r2.mod.bc = r2.mod.bc,
                             r2.mod.me.bc = r2.mod.me.bc,
                             rmse.mod = rmse.mod,
                             rmse.mod.me = rmse.mod.me,
                             rmse.mod.bc = rmse.mod.bc,
                             rmse.mod.me.bc = rmse.mod.me.bc)
    df.re <- t(as.data.frame(mod.me$RandomEffects$site.id))
  } else {
    results.df <- rbind(results.df, data.frame(r2.mod = r2.mod,
                                               r2.mod.me = r2.mod.me,
                                               r2.mod.bc = r2.mod.bc,
                                               r2.mod.me.bc = r2.mod.me.bc,
                                               rmse.mod = rmse.mod,
                                               rmse.mod.me = rmse.mod.me,
                                               rmse.mod.bc = rmse.mod.bc,
                                               rmse.mod.me.bc = rmse.mod.me.bc))
    df.re <- rbind(df.re, t(as.data.frame(mod.me$RandomEffects$site.id)))
  }
}
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0024653 (tol = 0.002, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00217975 (tol = 0.002, component 1)
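Before plotting, here’s a quick look (not in the original output) at the median of each performance metric across the 1000 iterations:
# median R2 and RMSE for each model variant across all bootstrap iterations
round(apply(results.df, 2, median), 2)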
Since we’ve now got distributions of R2 and RMSE values, I’ll use kernel density plots as the basis of visualization, and I’ll highlight the median value among all of the bootstrap iterations to summarize comparative performance.
# get kernel densities
d.r2.mod <- density(results.df$r2.mod)
d.r2.mod.me <- density(results.df$r2.mod.me)
d.r2.mod.bc <- density(results.df$r2.mod.bc)
d.r2.mod.me.bc <- density(results.df$r2.mod.me.bc)
d.rmse.mod <- density(results.df$rmse.mod)
d.rmse.mod.me <- density(results.df$rmse.mod.me)
d.rmse.mod.bc <- density(results.df$rmse.mod.bc)
d.rmse.mod.me.bc <- density(results.df$rmse.mod.me.bc)
# get axis limits
xlim.r2 <- c(min(c(d.r2.mod$x, d.r2.mod.me$x, d.r2.mod.bc$x, d.r2.mod.me.bc$x)),
max(c(d.r2.mod$x, d.r2.mod.me$x, d.r2.mod.bc$x, d.r2.mod.me.bc$x)))
xlim.rmse <- c(min(c(d.rmse.mod$x, d.rmse.mod.me$x, d.rmse.mod.bc$x, d.rmse.mod.me.bc$x)),
max(c(d.rmse.mod$x, d.rmse.mod.me$x, d.rmse.mod.bc$x, d.rmse.mod.me.bc$x)))
ylim.r2 <- c(min(c(d.r2.mod$y, d.r2.mod.me$y, d.r2.mod.bc$y, d.r2.mod.me.bc$y)),
max(c(d.r2.mod$y, d.r2.mod.me$y, d.r2.mod.bc$y, d.r2.mod.me.bc$y)))
ylim.rmse <- c(min(c(d.rmse.mod$y, d.rmse.mod.me$y, d.rmse.mod.bc$y, d.rmse.mod.me.bc$y)),
max(c(d.rmse.mod$y, d.rmse.mod.me$y, d.rmse.mod.bc$y, d.rmse.mod.me.bc$y)))
# set up the plot
par(mfrow = c(2,2),
las = 1,
mar = c(5,1,2,1))
# plot 1 -- RF vs. MERF, R2
plot(xlim.r2, ylim.r2, type = "n", yaxt = "n", ylab = NA, xlab = "R2", main = "R2")
polygon(d.r2.mod$x, d.r2.mod$y, col = rgb(1,0,0,0.25))
polygon(d.r2.mod.me$x, d.r2.mod.me$y, col = rgb(0,0,1,0.25))
abline(v = median(results.df$r2.mod), lwd = 2, col = "red")
abline(v = median(results.df$r2.mod.me), lwd = 2, col = "blue")
r2.med.mod <- paste0("Med Basic = ", round(median(results.df$r2.mod),2))
r2.med.mod.me <- paste0("Med ME = ", round(median(results.df$r2.mod.me),2))
legend("topleft", legend = c(r2.med.mod, r2.med.mod.me), text.col = c("red", "blue"), bty = "n", x.intersp = 0)
# plot 2 -- RF vs. MERF, RMSE
plot(xlim.rmse, ylim.rmse, type = "n", yaxt = "n", ylab = NA, xlab = "RMSE", main = "RMSE")
polygon(d.rmse.mod$x, d.rmse.mod$y, col = rgb(1,0,0,0.25))
polygon(d.rmse.mod.me$x, d.rmse.mod.me$y, col = rgb(0,0,1,0.25))
abline(v = median(results.df$rmse.mod), lwd = 2, col = "red")
abline(v = median(results.df$rmse.mod.me), lwd = 2, col = "blue")
rmse.med.mod <- paste0("Med Basic = ", round(median(results.df$rmse.mod),2))
rmse.med.mod.me <- paste0("Med ME = ", round(median(results.df$rmse.mod.me),2))
legend("topleft", legend = c(rmse.med.mod, rmse.med.mod.me), text.col = c("red", "blue"), bty = "n", x.intersp = 0)
# plot 3 -- RF vs. MERF Bias Corr, R2
plot(xlim.r2, ylim.r2, type = "n", yaxt = "n", ylab = NA, xlab = "R2", main = "R2 Bias Corr")
polygon(d.r2.mod.bc$x, d.r2.mod.bc$y, col = rgb(1,0,0,0.25))
polygon(d.r2.mod.me.bc$x, d.r2.mod.me.bc$y, col = rgb(0,0,1,0.25))
abline(v = median(results.df$r2.mod.bc), lwd = 2, col = "red")
abline(v = median(results.df$r2.mod.me.bc), lwd = 2, col = "blue")
r2.med.mod.bc <- paste0("Med Basic = ", round(median(results.df$r2.mod.bc),2))
r2.med.mod.me.bc <- paste0("Med ME = ", round(median(results.df$r2.mod.me.bc),2))
legend("topleft", legend = c(r2.med.mod.bc, r2.med.mod.me.bc), text.col = c("red", "blue"), bty = "n", x.intersp = 0)
# plot 4 -- RF vs. MERF Bias Corr, RMSE
plot(xlim.rmse, ylim.rmse, type = "n", yaxt = "n", ylab = NA, xlab = "RMSE", main = "RMSE Bias Corr")
polygon(d.rmse.mod.bc$x, d.rmse.mod.bc$y, col = rgb(1,0,0,0.25))
polygon(d.rmse.mod.me.bc$x, d.rmse.mod.me.bc$y, col = rgb(0,0,1,0.25))
abline(v = median(results.df$rmse.mod.bc), lwd = 2, col = "red")
abline(v = median(results.df$rmse.mod.me.bc), lwd = 2, col = "blue")
rmse.med.mod.bc <- paste0("Med Basic = ", round(median(results.df$rmse.mod.bc),2))
rmse.med.mod.me.bc <- paste0("Med ME = ", round(median(results.df$rmse.mod.me.bc),2))
legend("topleft", legend = c(rmse.med.mod.bc, rmse.med.mod.me.bc), text.col = c("red", "blue"), bty = "n", x.intersp = 0)
The results differ somewhat from those we saw with the single train/test split.
Once again, each bootstrapped model iteration will have its own set
of random effects. To view the distribution of random effects per
stratum (site.id), I’ll use violin plots (i.e., fancy
density plots). And since I’m a stubborn base R plotter, I’m going to do
it the manual way.
# get axis limits by scanning the density of each site's intercepts
ymin <- Inf
ymax <- -Inf
site.ids <- colnames(df.re)
for (site.id in site.ids){
  d <- density(df.re[,site.id])
  y <- d$x
  if (min(y) < ymin) ymin <- min(y)
  if (max(y) > ymax) ymax <- max(y)
}
# set up plot
par(mar = c(8,5,1,1),
las = 1)
# create empty plot
plot(x = c(0.5, ncol(df.re) + 0.5),
y = c(ymin, ymax),
type = "n",
ylab = "Intercept",
xaxt = "n",
xlab = NA)
abline(h = 0)
# loop through sites and plot violins
site.ids <- colnames(df.re)
for (i in seq(1,length(site.ids))){
  abline(v = i, lty = 3, col = "lightgray")
  site.id <- site.ids[i]
  # mirror the kernel density curve around the site's x position to form the violin
  d <- density(df.re[,site.id])
  y <- d$x
  x <- d$y + i
  y <- c(y, rev(y))
  x <- c(x, rev(d$y * -1 + i))
  col <- cols[i]
  polygon(x, y, col = col)
  # mark the median intercept and label the site
  points(i, median(df.re[,site.id]), pch = 16)
  mtext(site.id, 1, 1, at = i, las = 2)
}
It’s interesting that some sites have considerably wider distributions than others. It suggests to me that perhaps these are sites with a wide range of AGB among their plots. But, generally, the trend is quite consistent with the single run shown earlier.
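A quick way to check that hunch (just a sketch, not something from the original analysis) would be to compare the spread of each site’s bootstrapped intercepts to the range of observed AGB among that site’s plots:
# compare the spread of each site's bootstrapped intercepts to the range of
# observed AGB among that site's plots
re.spread <- apply(df.re, 2, sd)
agb.range <- tapply(df$bio, df$site.id, function(x) diff(range(x)))[names(re.spread)]
plot(agb.range, re.spread, pch = 16,
     xlab = "Range of observed AGB within site", ylab = "SD of bootstrapped intercept")
cor(agb.range, re.spread)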