Monroe Mountain Fuel Consumption Modeling Part 3

Optimizing the profile density change approach

Author

Mickey Campbell

Introduction

The primary goal of this document is to optimize the profile density change (PDC) fuel consumption analysis approach. There are relatively few parameters to perform a sensitivity test, given the simplicity of the kernel density estimation algorithm. One that is worth evaluating is the bandwidth of the kernel. The bandwidth represents the standard deviation of the kernel, which by default is defined by a Gaussian function. In the context of lidar point heights, this means that individual vertical layers of the fuel profile are being evaluated for the proportion of points that fall within a vertical slice defined by the mean (center point) and standard deviation (spread) of the Guassian kernel. The means will capture different fuel components, with lower values being more representative of fuels closer to the ground, and higher values being more representative of canopy fuels. The bandwidths define how wide of a vertical slice you are willing to consider when building a set of predictor variables. Lower bandwidths will more precisely capture narrow fuel profile slices, whereas higher bandwidths will contain a more vertically integrated measure of point density/fuel loads. Variable selection will take care of the question of which mean values are best for modeling fuel consumption of different fuelbeds. However, there remains uncertainty surrounding what the best bandwith is to conduct this analysis. Indeed, different fuelbeds may benefit from different bandwidths, but ideally we would identify a singular bandwidth that is broadly applicable for accurate fuel consumption modeling.

Another sensitivity test that requires further investigation is whether the difference in pre- vs. post-fire density alone is sufficient for modeling consumption, or if models perform substantially better with the inclusion of pre-fire density as well. Theoretically, change in density should be the most directly correlated to consumption. However, informing the model what the pre-fire density looked like may add value by enabling it to distinguish between different fuel types. For example, if one looked at density difference alone, and a particular plot or pixel featured no change in canopy fuels, that could be due to the fact that no canopy fuels were consumed by fire, but it could also be that there simply were no canopy fuels to burn (i.e., it was a shrub-dominated setting). Only by providing the model with pre-fire structural information could it distinguish between these two very different scenarios.

Accordingly, the objectives of this document are to:

  1. Test a range of Gaussian kernel bandwidths for their respective ability to accurately predict fuel consumption of several different fuelbeds; and
  2. Compare the performance between models built only using density differences and those built using a combination of density differences and pre-fire densities.

Field Data

Nothing new here – just repeating the same field plot wrangling and filtering that was done in the previous documents.

Show Code
library(terra)
library(rmarkdown)
library(sf)
library(lidR)
library(lidRmetrics)
library(VSURF)
library(tuneRanger)
library(ranger)
library(mlr)
library(dplyr)
library(colorspace)
library(RColorBrewer)
library(stringr)
library(dplyr)

# set random seed
set.seed(5757)

# read in the data
fuel.plots <- vect("S:/ursa/campbell/fasmee/data/from_ryan/plots_wFuels.shp")

# remove empty plots
fuel.plots <- fuel.plots[!is.na(fuel.plots$FID),]

# separate pre-fire and post-fire plots
fuel.plots.pre <- fuel.plots[grep("pre", fuel.plots$FID),]
fuel.plots.pst <- fuel.plots[grep("post", fuel.plots$FID),]

# remove unnecessary columns
del.cols <- c("FID", "X", "Y")
keep.cols <- names(fuel.plots)[!names(fuel.plots) %in% del.cols]
fuel.plots.pre <- fuel.plots.pre[,keep.cols]
fuel.plots.pst <- fuel.plots.pst[,keep.cols]

# move plot ID column from last to first
n <- ncol(fuel.plots.pre)
fuel.plots.pre <- fuel.plots.pre[,c(n,1:(n-1))]
fuel.plots.pst <- fuel.plots.pst[,c(n,1:(n-1))]

# add "pre" and "pst" to column names for all attributes
dif.names <- names(fuel.plots.pre)[2:ncol(fuel.plots.pre)]
names(fuel.plots.pre)[2:ncol(fuel.plots.pre)] <- 
  paste0(names(fuel.plots.pre)[2:ncol(fuel.plots.pre)], "_pre")
names(fuel.plots.pst)[2:ncol(fuel.plots.pst)] <- 
  paste0(names(fuel.plots.pst)[2:ncol(fuel.plots.pst)], "_pst")

# merge them
fuel.plots <- merge(fuel.plots.pre, as.data.frame(fuel.plots.pst))

# pad single-digit plot ids with 0's
fuel.plots$Plot[as.numeric(fuel.plots$Plot) < 100] <- paste0("00", fuel.plots$Plot[as.numeric(fuel.plots$Plot) < 100])

# sort of plot id
fuel.plots <- fuel.plots[order(fuel.plots$Plot),]

# get differences between pre- and post-fire fuel loads (i.e., calc consumption)
for (dif.name in dif.names){
  pre.fld <- paste0(dif.name, "_pre")
  pst.fld <- paste0(dif.name, "_pst")
  dif.fld <- paste0(dif.name, "_dif")
  fuel.plots[,dif.fld] <- as.data.frame(fuel.plots[,pre.fld]) - as.data.frame(fuel.plots[,pst.fld])
}

# remove the plot that had an extreme increase in ground fuels
fuel.plots <- fuel.plots[fuel.plots$hr1000_dif > -300,]

Definition of Field Names

For clarity, here are the field names from the fuel plot data again:

  • yr: year of measurment
  • hr1: 1 hour biomass
  • hr10: 10 hours biomass
  • hr100: 100 hour biomass
  • hr1000: 1000 hour biomass
  • nonwoody: herb biomass
  • shrubs: shrub biomass
  • trees: seedling biomass
  • llm: litter, lichen, and moss biomass
  • ground: duff biomass
  • shrb_sd: combined biomass of shrubs and seedlings
  • woody: combined biomass of 1hr, 10hr, 100hr, and 1000hr
  • uf: combined biomass of nonwoody, llm, ground, shrubs, trees, and woody
  • cf: tree biomass, including snags as well as live boles, branches, and foliage
  • cbd: canopy bulk density
  • cbh: canopy base height
  • acf: available canopy fuel (foliage and fine branches)
  • tf: total fuel (all biomass combined)
  • af: available fuel (combined biomass of uf and acf)

And here are the self-explanatory suffixes I’ve added:

  • pre: pre-fire measurement
  • pst: post-fire measurement
  • dif: difference between pre- and post-fire measurements

As I understand it, all of the fields are measured in Mg/ha, with the exception of cbd and cbh, which are probably measured in kg/m3 and m, respectively.

PDC Sensitivity Testing

As in the previous documents, I will define two core functions that will form the basis of this analysis. The first, tuned.model() does the following:

  1. Performs variable selection using VSURF() on the full dataset.
  2. Conducts a 100-iteration random grid search to tune random forest hyperparameters (mtry, min.node.size, and sample.fraction), again using the full dataset and out-of-bag prediction error.
  3. Uses the selected variables from (1) and the best set of hyperparameters from (2) to conduct a 10-fold spatial cross-validation based on clusters of fuel plots, to quantify model performance in a robust manner that accounts for spatial autocorrelation.

The second function, prd.obs.plot() compiles model predictions and observations, calculates model performance metrics (R2 and %RMSE), and optionally plots predictions vs. observations.

Show Code
# define modeling function
tuned.model <- function(mod.df){
  mod.df <- na.omit(mod.df)
  plot.ids <- mod.df$Plot
  folds <- substr(plot.ids, 1, 1)
  mod.df$Plot <- NULL
  x.cols <- 2:ncol(mod.df)
  y.col <- 1
  v <- VSURF(
    x = mod.df[,x.cols],
    y = mod.df[,y.col],
    RFimplem = "ranger",
    parallel = T,
    ncores = 24,
    verbose = F
  )
  x.cols <- x.cols[v$varselect.pred]
  mod.df <- mod.df[,c(y.col, x.cols)]
  cv.iters <- 100
  hyp.grid <- data.frame(
    mtry = sample(1:length(x.cols), cv.iters, T),
    min.node.size = sample(1:10, cv.iters, T),
    sample.fraction = runif(cv.iters, 0.1, 0.9),
    mse = rep(NA, cv.iters)
  )
  for (i in seq(1,cv.iters)){
    form <- as.formula(paste0(colnames(mod.df)[1], " ~ ."))
    mod <- ranger(
      formula = form,
      data = mod.df,
      mtry = hyp.grid$mtry[i],
      sample.fraction = hyp.grid$sample.fraction[i],
      min.node.size = hyp.grid$min.node.size[i]
    )
    hyp.grid$mse[i] <- mod$prediction.error
  }
  mtry <- hyp.grid$mtry[which.min(hyp.grid$mse)]
  min.node.size <- hyp.grid$min.node.size[which.min(hyp.grid$mse)]
  sample.fraction <- hyp.grid$sample.fraction[which.min(hyp.grid$mse)]
  form <- as.formula(paste0(colnames(mod.df)[1], " ~ ."))
  for (fold in unique(folds)){
    mod.df.valid <- mod.df[folds == fold,]
    mod.df.train <- mod.df[folds != fold,]
    mod <- ranger(
      formula = form,
      data = mod.df.train,
      mtry = mtry,
      sample.fraction = sample.fraction,
      min.node.size = min.node.size
    )
    pred <- predict(mod, mod.df.valid)$prediction
    if (fold == unique(folds)[1]){
      df.pred.obs <- data.frame(
        Plot = plot.ids[folds == fold],
        obs = mod.df.valid[,y.col],
        prd = pred
      )
    } else {
      df.pred.obs <- rbind(
        df.pred.obs,
        data.frame(
          Plot = plot.ids[folds == fold],
          obs = mod.df.valid[,y.col],
          prd = pred
        )
      )
    }
  }
  return(df.pred.obs)
}

# define predicted vs. observed plotting function
prd.obs.plot <- function(df.pred.obs, name, plot = T){
  df.pred.obs <- na.omit(df.pred.obs)
  x <- df.pred.obs$obs
  y <- df.pred.obs$prd
  mod <- lm(y ~ x)
  r2 <- summary(mod)$adj.r.squared
  rrmse <- (100*sqrt(mean((x-y)^2))/mean(x))
  if (plot == T){
    rg <- range(c(x,y), na.rm = T)
    par(mar = c(5,5,2,1), las = 1)
    plot(x, y, type = "n", xlab = "Observed", ylab = "Predicted", main = name,
         ylim = rg, xlim = rg)
    grid()
    lines(x = c(-1e10, 1e10), y = c(-1e10, 1e10))
    points(x, y, pch = 16, col = rgb(0,0,0,0.25), cex = 1.25)
    new.x <- range(x)
    new.y <- predict(mod, newdata = list(x = new.x))
    lines(new.x, new.y, lwd = 2, col = 2)
    r2.lab <- bquote(R^2 == .(round(r2,2)))
    rrmse.lab <- bquote(symbol("%")*RMSE == .(round(rrmse)))
    legend(
      "topleft",
      legend = c(r2.lab, rrmse.lab),
      text.col = 2,
      bty = "n",
      x.intersp = 0
    )
  }
  return(data.frame(var = name, r2 = r2, rrmse = rrmse))
}

For this analysis, I will be using the 1-year post-fire lidar data. In a future document, I will be exploring the differences in model performance between the 1-year and 3-4 year post-fire lidar datasets. In the code chunk below, I will iterate through a series of bandwidths:

  • 1 to 25 cm, at an interval of 1 cm

For each bandwidth, I will calculate the densities at each kernel center point (mean) between 0 and 30 m, at an interval equal to the bandwidth. In other words, for 1 cm bandwidth, I will test mean = [0.00 m, 0.01 m, …, 29.9 cm, 30.0 cm]. For 2 cm bandwidth, mean = [0.00 m, 0.02 m, …, 29.8 m, 30.0 m], etc. I will calculate these for both pre-fire and post-fire data, calculating the difference between the two at each height interval. Then, I will plot out a series of four example fuel profile comparisons calculated at different density bandwidths, to provide context for what these predictor variables look like.

Show Code
# suppress progress bar
options(lidR.progress = F)

# directory structure
als.dir.pre <- "S:/ursa/campbell/fasmee/data/lidar_data/monroe_mtn_lidar/a04_height"
als.dir.pst.ann <- "S:/ursa/campbell/fasmee/data/lidar_data/monroe_mtn_postfire_lidar/Annabella2021/LAZ/a03_hgt"
als.dir.pst.mon <- "S:/ursa/campbell/fasmee/data/lidar_data/monroe_mtn_postfire_lidar/Monroe2020/LAZ_1.4/a03_hgt"

# read in lascatalogs
ctg.pre <- readLAScatalog(als.dir.pre, filter = "-drop_withheld -drop_class 7 18", progress = F)
ctg.pst.ann <- readLAScatalog(als.dir.pst.ann, filter = "-drop_withheld -drop_class 7 18", progress = F)
ctg.pst.mon <- readLAScatalog(als.dir.pst.mon, filter = "-drop_withheld -drop_class 7 18", progress = F)

# define percentile metrics function
dens.fun <- function(z, bw = 0.05){
  z[z < 0] <- 0
  d <- density(z, bw, from = 0, to = 30, n = length(seq(0,30,bw)))
  dy <- as.list(d$y)
  names(dy) <- paste0("d.", d$x)
  return(dy)
}

# reproject plots to match crs of pre and pst ctgs
fuel.plots.pre <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pre))
fuel.plots.pst.ann <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pst.ann)) |>
  subset(substr(Plot, 1, 1) %in% c("5", "7"))
fuel.plots.pst.mon <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pst.mon)) |>
  subset(!substr(Plot, 1, 1) %in% c("5", "7"))

# define csv directory
csv.dir <- "S:/ursa/campbell/fasmee/modeling"

# begin bandwidth loop
bws <- seq(0.01,0.25,0.01)
i <- 0
n <- length(bws)
for (bw in bws){
  
  # print status
  i <- i + 1
  print(paste0(date(), " ", bw, " (", i, "/", n, ")"))
  
  # define output csv file
  out.csv.pre <- paste0(csv.dir, "/fuel_cons_vs_pdc_bw_", str_pad(bw * 100, 2, "left", "0"), "cm_pre.csv")
  out.csv.pst <- paste0(csv.dir, "/fuel_cons_vs_pdc_bw_", str_pad(bw * 100, 2, "left", "0"), "cm_pst.csv")
  out.csv.dif <- paste0(csv.dir, "/fuel_cons_vs_pdc_bw_", str_pad(bw * 100, 2, "left", "0"), "cm_dif.csv")
  
  # check to see if it's already been run
  if (!all(file.exists(out.csv.pre), file.exists(out.csv.pst), file.exists(out.csv.dif))){
    
    # run plot metrics
    dens.pre <- plot_metrics(
      las = ctg.pre,
      func = ~dens.fun(Z, bw),
      geometry = fuel.plots.pre,
      radius = 8
    ) |>
      st_drop_geometry()
    dens.pst.ann <- plot_metrics(
      las = ctg.pst.ann,
      func = ~dens.fun(Z, bw),
      geometry = fuel.plots.pst.ann,
      radius = 8
    ) |>
      st_drop_geometry()
    dens.pst.mon <- plot_metrics(
      las = ctg.pst.mon,
      func = ~dens.fun(Z, bw),
      geometry = fuel.plots.pst.mon,
      radius = 8
    ) |>
      st_drop_geometry()
    
    # merge the ann and mon datasets
    dens.pst <- rbind(dens.pst.ann, dens.pst.mon)
    
    # sort the data
    dens.pre <- dens.pre[order(dens.pre$Plot),]
    dens.pst <- dens.pst[order(dens.pst$Plot),]
    
    # separate out predictor and response data frames
    dens.pre <- dens.pre[,!colnames(dens.pre) %in% colnames(fuel.plots.pre)]
    dens.pst <- dens.pst[,!colnames(dens.pst) %in% colnames(fuel.plots.pre)]
    
    # calculate differences
    dens.dif <- dens.pre - dens.pst
    
    # change column names
    colnames(dens.pre) <- paste0(colnames(dens.pre), ".pre")
    colnames(dens.pst) <- paste0(colnames(dens.pst), ".pst")
    colnames(dens.dif) <- paste0(colnames(dens.dif), ".dif")
    
    # write to csv
    write.csv(dens.pre, out.csv.pre, row.names = F)
    write.csv(dens.pst, out.csv.pst, row.names = F)
    write.csv(dens.dif, out.csv.dif, row.names = F)

  }

}
[1] "Thu Jun 20 10:52:36 2024 0.01 (1/25)"
[1] "Thu Jun 20 10:52:36 2024 0.02 (2/25)"
[1] "Thu Jun 20 10:52:36 2024 0.03 (3/25)"
[1] "Thu Jun 20 10:52:36 2024 0.04 (4/25)"
[1] "Thu Jun 20 10:52:36 2024 0.05 (5/25)"
[1] "Thu Jun 20 10:52:36 2024 0.06 (6/25)"
[1] "Thu Jun 20 10:52:36 2024 0.07 (7/25)"
[1] "Thu Jun 20 10:52:36 2024 0.08 (8/25)"
[1] "Thu Jun 20 10:52:36 2024 0.09 (9/25)"
[1] "Thu Jun 20 10:52:36 2024 0.1 (10/25)"
[1] "Thu Jun 20 10:52:36 2024 0.11 (11/25)"
[1] "Thu Jun 20 10:52:36 2024 0.12 (12/25)"
[1] "Thu Jun 20 10:52:36 2024 0.13 (13/25)"
[1] "Thu Jun 20 10:52:36 2024 0.14 (14/25)"
[1] "Thu Jun 20 10:52:36 2024 0.15 (15/25)"
[1] "Thu Jun 20 10:52:36 2024 0.16 (16/25)"
[1] "Thu Jun 20 10:52:36 2024 0.17 (17/25)"
[1] "Thu Jun 20 10:52:36 2024 0.18 (18/25)"
[1] "Thu Jun 20 10:52:36 2024 0.19 (19/25)"
[1] "Thu Jun 20 10:52:36 2024 0.2 (20/25)"
[1] "Thu Jun 20 10:52:36 2024 0.21 (21/25)"
[1] "Thu Jun 20 10:52:36 2024 0.22 (22/25)"
[1] "Thu Jun 20 10:52:36 2024 0.23 (23/25)"
[1] "Thu Jun 20 10:52:36 2024 0.24 (24/25)"
[1] "Thu Jun 20 10:52:36 2024 0.25 (25/25)"
Show Code
# define example bws
ex.bws <- c(0.02,0.05,0.10,0.20)

# define example row
ex.row <- 1

# set up plot
par(mfrow = c(2,2), mar = c(5,5,2,1), las = 1)

# loop through bws
for (ex.bw in ex.bws){
  
  # read in the data
  dens.pre <- read.csv(paste0(csv.dir, "/fuel_cons_vs_pdc_bw_", str_pad(ex.bw * 100, 2, "left", "0"), "cm_pre.csv"))
  dens.pst <- read.csv(paste0(csv.dir, "/fuel_cons_vs_pdc_bw_", str_pad(ex.bw * 100, 2, "left", "0"), "cm_pst.csv"))

  # get example row data
  dens.pre.ex <- dens.pre[ex.row,]
  dens.pst.ex <- dens.pst[ex.row,]

  # create empty plot
  plot(x = c(0,30), #y = range(c(dens.pre.ex, dens.pst.ex)),
       y = c(0,0.1),
      type = "n", xlab = "Height (m)", ylab = "Density",
      main = paste0("Bandwidth: ", ex.bw*100, "cm"))
  grid()
  
  # add the data
  polygon(x = c(0, seq(0,30,ex.bw), 30),
          y = c(0, dens.pre.ex, 0),
          col = adjust_transparency(2, 0.5),
          border = NA)
  lines(x = seq(0,30,ex.bw), y = dens.pre.ex, col = 2, lwd = 2)
  polygon(x = c(0, seq(0,30,ex.bw), 30),
          y = c(0, dens.pst.ex, 0),
          col = adjust_transparency(4, 0.5),
          border = NA)
  lines(x = seq(0,30,ex.bw), y = dens.pst.ex, col = 4, lwd = 2)
  
  # add legend
  legend("topright", legend = c("Pre-fire", "Post-fire"), col = c(2, 4),
         fill = c(adjust_transparency(2, 0.5), adjust_transparency(4, 0.5)))

}

I will build two sets of models, one with just the density differences as predictors, and one with both the density differences and the pre-fire densities as predictors. For each set of predictors, I will apply the tuned.model() function to build a model and make predictions, and the prd.obs.plot() function to compile the results. I will use %RMSE as the basis of comparison to determine what bandwidths perform best for each fuelbed in the plot data.

Show Code
# define output performance metrics csv and see if it's already been run
out.perf.mets.csv <- paste0(csv.dir, "/fuel_cons_vs_pdc_bw_test_perf_mets.csv")
if (!file.exists(out.perf.mets.csv)){
  
  # define response variables of interest
  vars <- c(
    "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
    "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
  )
  vars <- paste0(vars, "_dif")
  
  # define bandwidths
  bws <- seq(0.01,0.25,0.01)
  
  # get plot ids
  plot.ids <- fuel.plots[,"Plot"] |> as.data.frame()
  
  # loop through response variables
  i <- 0
  n <- length(vars)
  for (var in vars){
    
    # print status
    i <- i + 1
    print(paste0(date(), " ", var, "(", i, "/", n, ")"))
    
    # loop through bandwidths
    j <- 0
    m <- length(bws)
    for (bw in bws){
      
      # print status
      j <- j + 1
      print(paste0(date(), "   ", bw, "(", j, "/", m, ")"))
      
      # read in the data
      dens.pre <- read.csv(paste0(csv.dir, "/fuel_cons_vs_pdc_bw_", str_pad(bw * 100, 2, "left", "0"), "cm_pre.csv"))
      dens.dif <- read.csv(paste0(csv.dir, "/fuel_cons_vs_pdc_bw_", str_pad(bw * 100, 2, "left", "0"), "cm_dif.csv"))
      
      #-----------just dif
      
      # create modeling data frame
      x <- dens.dif
      y <- fuel.plots[,var] |> as.data.frame()
      mod.df <- cbind(plot.ids, y,x)
      
      # run the model
      out.csv <- paste0(csv.dir, "/fuel_cons_vs_pdc_bw_", str_pad(bw * 100, 2, "left", "0"), "cm_pre_pred_obs_dif.csv")
      if (!file.exists(out.csv)){
        df.pred.obs <- tuned.model(mod.df)
      } else {
        df.pred.obs <- read.csv(out.csv)
      }
      perf.mets <- prd.obs.plot(df.pred.obs, var, plot = F)
      
      # compile results
      perf.mets$type <- "dif"
      perf.mets$bw <- bw
      if (var == vars[1] & bw == bws[1]){
        perf.mets.pdc <- perf.mets
      } else {
        perf.mets.pdc <- rbind(perf.mets.pdc, perf.mets)
      }
    
      #-----------pre & dif
          
      # create modeling data frame
      x <- cbind(dens.pre, dens.dif)
      y <- fuel.plots[,var] |> as.data.frame()
      mod.df <- cbind(plot.ids, y,x)
      
      # run the model
      out.csv <- paste0(csv.dir, "/fuel_cons_vs_pdc_bw_", str_pad(bw * 100, 2, "left", "0"), "cm_pre_pred_obs_pre&dif.csv")
      if (!file.exists(out.csv)){
        df.pred.obs <- tuned.model(mod.df)
      } else {
        df.pred.obs <- read.csv(out.csv)
      }
      perf.mets <- prd.obs.plot(df.pred.obs, var, plot = F)
      
      # compile results
      perf.mets$type <- "pre&dif"
      perf.mets$bw <- bw
      perf.mets.pdc <- rbind(perf.mets.pdc, perf.mets)
      
    }
    
  }
  
  # write to csv
  write.csv(perf.mets.pdc, out.perf.mets.csv, row.names = F)
  
} else {
  
  # read the data in
  perf.mets.pdc <- read.csv(out.perf.mets.csv)
  
}

In the plot below, I will look at each fuelbed, and assess model predictive performance by bandwidth for both the density difference-only and the pre-fire density + density difference models. The lines on the left figures will represent the %RMSE of each model type, with vertical bars indicating the best-performing bandwidth for each model type. On the right, a boxplot will show the overall comparison between the difference-only and pre-fire + difference models across all bandwidths to reveal a general sense of which model type performed better. To compare performance across all fuelbeds, the last row of the figure will contain the median %RMSE values at each bandwidth for all fuelbeds.

Show Code
# set up plot
layout(matrix(1:36, byrow = T, nrow = 18), widths = c(4,1))
par(oma = c(5,5,3,5), las = 1)

# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af", "median"
)
vars <- paste0(vars, "_dif")

# see if median has already been calculated
if (!"median_dif" %in% perf.mets.pdc$var){
    
  # calculate median performance across all variables
  perf.mets.pdc.mean <- perf.mets.pdc |>
    group_by(type, bw) |>
    summarize(r2 = median(r2),
              rrmse = median(rrmse),
              .groups = "keep") |>
    mutate(var = "median_dif") |>
    as.data.frame()
  
  # merge
  perf.mets.pdc <- bind_rows(perf.mets.pdc, perf.mets.pdc.mean)
  
}


# loop through variables
i <- 0
for (var in vars){
  
  # add one to counter
  i <- i + 1
  
  # get rrmse vals
  rrmse.dif <- perf.mets.pdc$rrmse[perf.mets.pdc$var == var & perf.mets.pdc$type == "dif"]
  rrmse.predif <- perf.mets.pdc$rrmse[perf.mets.pdc$var == var & perf.mets.pdc$type == "pre&dif"]
  
  # get range
  rrmse.rng <- range(c(rrmse.dif, rrmse.predif))
  
  # set up plot
  par(mar = c(0,0,0,4))
  
  # create empty plot
  plot(x = c(1,25), y = rrmse.rng, type = "n", xlab = NA, ylab = NA, xaxt = "n",
       yaxt = "n")
  
  # add vertical grid
  grid()
  
  # add lines
  lines(rrmse.dif ~ seq(1,25), lwd = 2, col = 2)
  lines(rrmse.predif ~ seq(1,25), lwd = 2, col = 4)
  
  # add vertical lines
  abline(v = seq(1,25)[which.min(rrmse.dif)], lwd = 7, col = adjust_transparency(2, 0.33))
  abline(v = seq(1,25)[which.min(rrmse.predif)], lwd = 7, col = adjust_transparency(4, 0.33))
  
  # add axes
  if (i %% 2 == 0){
    axis(2)
  } else {
    axis(4)
  }
  if (var == vars[length(vars)]){
    axis(1, at = seq(1,25), labels = F, col.ticks = "lightgray")
    axis(1)
  }
  
  # add box
  if (var == vars[length(vars)]){
    box(lwd = 2.5)
  } else {
    box()
  }

  # add legends
  if (var == vars[1]){
    legend(
      x = par("usr")[1],
      y = par("usr")[4] + 0.05 * diff(par("usr")[3:4]),
      legend = c("Density Difference Only", "Lowest Error Bandwidth"),
      col = c(2,adjust_transparency(2, 0.33)), lwd = c(2,7), xpd = NA,
      xjust = 0, yjust = 0, bty = "n"
    )
    legend(
      x = par("usr")[2],
      y = par("usr")[4] + 0.05 * diff(par("usr")[3:4]),
      legend = c("Density Difference & Pre-Fire Density", "Lowest Error Bandwidth"),
      col = c(4,adjust_transparency(4, 0.33)), lwd = c(2,7), xpd = NA,
      xjust = 1, yjust = 0, bty = "n"
    )
  }
  
  # add axis labels
  if (var == vars[length(vars)]){
    mtext("Bandwidth (cm)", 1, 2.5, outer = F)
    mtext(expression(symbol("%")*RMSE), 2, 3, outer = T, las = 0)
  }
  
  #--------------------boxplot
  
  # set up plot
  par(rep(0,2.5))
  
  # get sub df
  perf.mets.pdc.sub <- perf.mets.pdc[perf.mets.pdc$var == var,]
  
  # boxplot
  boxplot(rrmse ~ type, data = perf.mets.pdc.sub, yaxt = "n", ylab = NA,
          xaxt = "n", yaxt = "n",
          col = c(adjust_transparency(2, 0.33), adjust_transparency(4, 0.33)), 
          border = c(2,4), pch = 16)
  
  # add x axis
  if (var == vars[length(vars)]){
    axis(1, at = c(1,2), labels = F)
    par(xpd = NA)
    text(x = c(1,2), y = par("usr")[3] - 0.1 * diff(par("usr")[3:4]),
         labels = c("Dens\nDiff\nOnly", "Dens\nDiff\n& Pre\nDens"),
         adj = c(0.5,1))
    par(xpd = F)
  }
  
   # add box
  if (var == vars[length(vars)]){
    box(lwd = 3)
  } else {
    box()
  }

  # add variable label
  mtext(
    text = sub("_dif", "", var),
    side = 4,
    line = 0.5,
    font = 2
  )

}

Results and Discussion

Although some fuelbeds benefitted from somewhat wider bandwidths, by and large the trend was that smaller bandwidths tended to yield better model performance. In fact, many of the models performed best at the smallest bandwidth tested (1 cm). This was really quite surprising to me, considering how seemingly little information is contained within such narrow slices of the vertical fuel profile. But, I suppose that narrowness allowed the models to leverage multiple different precise height ranges to accurately quantify fuel consumption. When taken together (i.e., when looking at the median results in the last row of the figure), 1 cm is the best bandwidth for both the density difference-only and the pre-fire + density difference models. Furthermore, the pre-fire + density difference models outperformed the density difference-only models, both at the individual fuelbed level and in aggregate.