Monroe Mountain Fuel Consumption Modeling Part 2

A focus on model comparisons, with spatial cross-validation and a new consumption modeling approach

Author

Mickey Campbell

Introduction

This document serves as an addendum to Monroe Mountain Fuel Consumption Modeling. It will be comparably sparse in explanatory text, since most details are contained within the previously linked document. The primary objectives of this document are:

1. To update the results using the smaller, shorter-term post-fire lidar datasets.

The previous document focused on results driven by a comparison between the full Monroe Mountain 2016 and 2023 lidar datasets. While this serves as a nice, spatially comprehensive comparison, the fires burned in 2019-2020. Given the time lag between fire and lidar data (3-4 years), a lot of regeneration could have (and likely has) occurred, likely adding uncertainty into the fuel consumption modeling procedure. By focusing on two post-fire lidar dataset that were collected 1 year post-fire, we can hopefully more accurately capture consumption before the burn areas had a chance to regrow.

2. To compare results between different fuel consumption modeling approaches, but using a more robust validation technique.

In a meeting with Andy on Friday, 6/7/2024, it was revealed that spatial cross-validation (as opposed to simple k-fold or out-of-bag model assessment) demonstrated relatively poor performance in the fuel consumption models. That is, when you test the models on a spatially distinct cluster of fuel plots that are separated from the others on which the model is trained, the predictive power decreases substantially. However, this is the best way to assess how generalizable these models are, and how well they would perform when applied to truly novel fuel and fire conditions. Thus, they represent a considerably more robust performance assessment than the simple out-of-bag estimates that were contained within the previous document.

3. To add a new fuel consumption modeling technique into the mix: profile density change (PDC).

In the same meeting, Andy raised the question of using densities, rather than height quantiles (as in the PAC method), as the basis of fuel consumption modeling. So, I implemented that method, which I am calling profile density change. Pretty simply, the technique just involves using a Gaussian kernel density estimator to quantify the point density at a series of vertical heights above the ground. This is done for both pre- and post-fire lidar data, and the differences between height-specific densities are used as candidate predictor variables for modeling fuel consumption.

4. For both PAC and PDC, adding pre-fire structure into the set of candidate predictor variables.

Pperhaps structural differences (height quantiles for PAC and point densities for PDC) alone are insufficient for quantifying consumption. Perhaps by adding the pre-fire height quantiles/point densities into the mix of candidate predictor variables, the models will more accurately be able to suss out how different pre-fire conditions are linked with consumption of different fuel components. For example, if the model knows that a subset of plots comprised tall forests before the fire, then it may be able to leverage that knowledge to understand which difference variables should be more indicative of canopy fuel consumption.

5. Test a combined PAC + PDC model.

Preliminary results were demonstrating that PAC was good at capturing some variables and PDC was good at capturing others. Given the variable selection approach baked into my modeling procedure, perhaps we can benefit from the best of both worlds while modeling fuel consumption by throwing all of the above into the soup of candidate predictors.

Field Data

Nothing new here – just repeating the same field plot wrangling and filtering that was done in the previous document.

Show Code
library(terra)
library(rmarkdown)
library(sf)
library(lidR)
library(lidRmetrics)
library(VSURF)
library(tuneRanger)
library(ranger)
library(mlr)
library(dplyr)
library(colorspace)
library(RColorBrewer)
library(stringr)

# set random seed
set.seed(5757)

# read in the data
fuel.plots <- vect("S:/ursa/campbell/fasmee/data/from_ryan/plots_wFuels.shp")

# remove empty plots
fuel.plots <- fuel.plots[!is.na(fuel.plots$FID),]

# separate pre-fire and post-fire plots
fuel.plots.pre <- fuel.plots[grep("pre", fuel.plots$FID),]
fuel.plots.pst <- fuel.plots[grep("post", fuel.plots$FID),]

# remove unnecessary columns
del.cols <- c("FID", "X", "Y")
keep.cols <- names(fuel.plots)[!names(fuel.plots) %in% del.cols]
fuel.plots.pre <- fuel.plots.pre[,keep.cols]
fuel.plots.pst <- fuel.plots.pst[,keep.cols]

# move plot ID column from last to first
n <- ncol(fuel.plots.pre)
fuel.plots.pre <- fuel.plots.pre[,c(n,1:(n-1))]
fuel.plots.pst <- fuel.plots.pst[,c(n,1:(n-1))]

# add "pre" and "pst" to column names for all attributes
dif.names <- names(fuel.plots.pre)[2:ncol(fuel.plots.pre)]
names(fuel.plots.pre)[2:ncol(fuel.plots.pre)] <- 
  paste0(names(fuel.plots.pre)[2:ncol(fuel.plots.pre)], "_pre")
names(fuel.plots.pst)[2:ncol(fuel.plots.pst)] <- 
  paste0(names(fuel.plots.pst)[2:ncol(fuel.plots.pst)], "_pst")

# merge them
fuel.plots <- merge(fuel.plots.pre, as.data.frame(fuel.plots.pst))

# pad single-digit plot ids with 0's
fuel.plots$Plot[as.numeric(fuel.plots$Plot) < 100] <- paste0("00", fuel.plots$Plot[as.numeric(fuel.plots$Plot) < 100])

# sort of plot id
fuel.plots <- fuel.plots[order(fuel.plots$Plot),]

# get differences between pre- and post-fire fuel loads (i.e., calc consumption)
for (dif.name in dif.names){
  pre.fld <- paste0(dif.name, "_pre")
  pst.fld <- paste0(dif.name, "_pst")
  dif.fld <- paste0(dif.name, "_dif")
  fuel.plots[,dif.fld] <- as.data.frame(fuel.plots[,pre.fld]) - as.data.frame(fuel.plots[,pst.fld])
}

# remove the plot that had an extreme increase in ground fuels
fuel.plots <- fuel.plots[fuel.plots$hr1000_dif > -300,]

Definition of Field Names

For clarity, here are the field names from the fuel plot data again:

  • yr: year of measurment
  • hr1: 1 hour biomass
  • hr10: 10 hours biomass
  • hr100: 100 hour biomass
  • hr1000: 1000 hour biomass
  • nonwoody: herb biomass
  • shrubs: shrub biomass
  • trees: seedling biomass
  • llm: litter, lichen, and moss biomass
  • ground: duff biomass
  • shrb_sd: combined biomass of shrubs and seedlings
  • woody: combined biomass of 1hr, 10hr, 100hr, and 1000hr
  • uf: combined biomass of nonwoody, llm, ground, shrubs, trees, and woody
  • cf: tree biomass, including snags as well as live boles, branches, and foliage
  • cbd: canopy bulk density
  • cbh: canopy base height
  • acf: available canopy fuel (foliage and fine branches)
  • tf: total fuel (all biomass combined)
  • af: available fuel (combined biomass of uf and acf)

And here are the self-explanatory suffixes I’ve added:

  • pre: pre-fire measurement
  • pst: post-fire measurement
  • dif: difference between pre- and post-fire measurements

As I understand it, all of the fields are measured in Mg/ha, with the exception of cbd and cbh, which are probably measured in kg/m3 and m, respectively.

Modeling Fuel Consumption

Approach #1: Modeled Fuel Load Subtraction

Again, for more details on this, please refer back to the previous document. But, briefly, I will test two modeling approaches:

1a. Modeling pre- and post-fire fuel load separately (PrePost model). 1b. Building a single model that incorporates both pre- and post-fire fuel load data (Pooled model).

Both approaches predict fuel loads before and after the fire separately, the subtraction of which yields a fuel consumption estimate, which is assessed against field-measured consumption.

Show Code
# define modeling function
tuned.model <- function(mod.df){
  mod.df <- na.omit(mod.df)
  plot.ids <- mod.df$Plot
  folds <- substr(plot.ids, 1, 1)
  mod.df$Plot <- NULL
  x.cols <- 2:ncol(mod.df)
  y.col <- 1
  v <- VSURF(
    x = mod.df[,x.cols],
    y = mod.df[,y.col],
    RFimplem = "randomForest",
    parallel = T,
    ncores = 18,
    verbose = F
  )
  x.cols <- x.cols[v$varselect.pred]
  mod.df <- mod.df[,c(y.col, x.cols)]
  cv.iters <- 100
  hyp.grid <- data.frame(
    mtry = sample(1:length(x.cols), cv.iters, T),
    min.node.size = sample(1:10, cv.iters, T),
    sample.fraction = runif(cv.iters, 0.1, 0.9),
    mse = rep(NA, cv.iters)
  )
  for (i in seq(1,cv.iters)){
    form <- as.formula(paste0(colnames(mod.df)[1], " ~ ."))
    mod <- ranger(
      formula = form,
      data = mod.df,
      mtry = hyp.grid$mtry[i],
      sample.fraction = hyp.grid$sample.fraction[i],
      min.node.size = hyp.grid$min.node.size[i]
    )
    hyp.grid$mse[i] <- mod$prediction.error
  }
  mtry <- hyp.grid$mtry[which.min(hyp.grid$mse)]
  min.node.size <- hyp.grid$min.node.size[which.min(hyp.grid$mse)]
  sample.fraction <- hyp.grid$sample.fraction[which.min(hyp.grid$mse)]
  form <- as.formula(paste0(colnames(mod.df)[1], " ~ ."))
  for (fold in unique(folds)){
    mod.df.valid <- mod.df[folds == fold,]
    mod.df.train <- mod.df[folds != fold,]
    mod <- ranger(
      formula = form,
      data = mod.df.train,
      mtry = mtry,
      sample.fraction = sample.fraction,
      min.node.size = min.node.size
    )
    pred <- predict(mod, mod.df.valid)$prediction
    if (fold == unique(folds)[1]){
      df.pred.obs <- data.frame(
        Plot = plot.ids[folds == fold],
        obs = mod.df.valid[,y.col],
        prd = pred
      )
    } else {
      df.pred.obs <- rbind(
        df.pred.obs,
        data.frame(
          Plot = plot.ids[folds == fold],
          obs = mod.df.valid[,y.col],
          prd = pred
        )
      )
    }
  }
  return(df.pred.obs)
}

# define predicted vs. observed plotting function
prd.obs.plot <- function(df.pred.obs, name){
  df.pred.obs <- na.omit(df.pred.obs)
  x <- df.pred.obs$obs
  y <- df.pred.obs$prd
  rg <- range(c(x,y), na.rm = T)
  par(mar = c(5,5,2,1), las = 1)
  plot(x, y, type = "n", xlab = "Observed", ylab = "Predicted", main = name,
       ylim = rg, xlim = rg)
  grid()
  lines(x = c(-1e10, 1e10), y = c(-1e10, 1e10))
  points(x, y, pch = 16, col = rgb(0,0,0,0.25), cex = 1.25)
  mod <- lm(y ~ x)
  new.x <- range(x)
  new.y <- predict(mod, newdata = list(x = new.x))
  lines(new.x, new.y, lwd = 2, col = 2)
  r2 <- summary(mod)$adj.r.squared
  rrmse <- (100*sqrt(mean((x-y)^2))/mean(x))
  r2.lab <- bquote(R^2 == .(round(r2,2)))
  rrmse.lab <- bquote(symbol("%")*RMSE == .(round(rrmse)))
  legend(
    "topleft",
    legend = c(r2.lab, rrmse.lab),
    text.col = 2,
    bty = "n",
    x.intersp = 0
  )
  return(data.frame(var = name, r2 = r2, rrmse = rrmse))
}

1a. PrePost model

Show Code
# suppress progress bar
options(lidR.progress = F)

# directory structure
als.dir.pre <- "S:/ursa/campbell/fasmee/data/lidar_data/monroe_mtn_lidar/a04_height"
# als.dir.pst <- "S:/ursa2/campbell/ems4d/data/als/als_2023/LAZ/Eastburn_processed"
als.dir.pst.ann <- "S:/ursa/campbell/fasmee/data/lidar_data/monroe_mtn_postfire_lidar/Annabella2021/LAZ/a03_hgt"
als.dir.pst.mon <- "S:/ursa/campbell/fasmee/data/lidar_data/monroe_mtn_postfire_lidar/Monroe2020/LAZ_1.4/a03_hgt"

# read in lascatalogs
ctg.pre <- readLAScatalog(als.dir.pre, filter = "-drop_withheld -drop_class 7 18", progress = F)
# ctg.pst <- readLAScatalog(als.dir.pst, filter = "-drop_withheld -drop_class 7 18", progress = F)
ctg.pst.ann <- readLAScatalog(als.dir.pst.ann, filter = "-drop_withheld -drop_class 7 18", progress = F)
ctg.pst.mon <- readLAScatalog(als.dir.pst.mon, filter = "-drop_withheld -drop_class 7 18", progress = F)

# reproject plots to match crs of pre and pst ctgs
fuel.plots.pre <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pre))
# fuel.plots.pst <- fuel.plots |>
#   st_as_sf() |>
#   st_transform(st_crs(ctg.pst))
fuel.plots.pst.ann <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pst.ann)) |>
  subset(substr(Plot, 1, 1) %in% c("5", "7"))
fuel.plots.pst.mon <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pst.mon)) |>
  subset(!substr(Plot, 1, 1) %in% c("5", "7"))

# run plot metrics
mets.pre <- plot_metrics(
  las = ctg.pre,
  func = ~lidRmetrics::metrics_set3(
    x = X,
    y = Y,
    z = Z,
    i = Intensity,
    ReturnNumber = ReturnNumber,
    NumberOfReturns = NumberOfReturns
  ),
  geometry = fuel.plots.pre,
  radius = 8
) |>
  as.data.frame()
# mets.pst <- plot_metrics(
#   las = ctg.pst,
#   func = ~lidRmetrics::metrics_set3(
#     x = X,
#     y = Y,
#     z = Z,
#     i = Intensity,
#     ReturnNumber = ReturnNumber,
#     NumberOfReturns = NumberOfReturns
#   ),
#   geometry = fuel.plots.pst,
#   radius = 8
# ) |>
#   as.data.frame()
mets.pst.ann <- plot_metrics(
  las = ctg.pst.ann,
  func = ~lidRmetrics::metrics_set3(
    x = X,
    y = Y,
    z = Z,
    i = Intensity,
    ReturnNumber = ReturnNumber,
    NumberOfReturns = NumberOfReturns
  ),
  geometry = fuel.plots.pst.ann,
  radius = 8
) |>
  as.data.frame()
mets.pst.mon <- plot_metrics(
  las = ctg.pst.mon,
  func = ~lidRmetrics::metrics_set3(
    x = X,
    y = Y,
    z = Z,
    i = Intensity,
    ReturnNumber = ReturnNumber,
    NumberOfReturns = NumberOfReturns
  ),
  geometry = fuel.plots.pst.mon,
  radius = 8
) |>
  as.data.frame()

# merge the ann and mon datasets
mets.pst <- rbind(mets.pst.ann, mets.pst.mon)

# sort the data
mets.pre <- mets.pre[order(mets.pre$Plot),]
mets.pst <- mets.pst[order(mets.pst$Plot),]

# separate out predictor and response data frames
mets.pre <- mets.pre[,!colnames(mets.pre) %in% colnames(fuel.plots.pre)]
mets.pst <- mets.pst[,!colnames(mets.pst) %in% colnames(fuel.plots.pre)]

# remove unscalable variables
del.cols <- c("n", "n_first", "n_intermediate", "n_last", "n_single",
              "n_multiple", "n_return_1", "n_return_2", "n_return_3",
              "n_return_4", "vn")

# remove predictors with no variability
sds.pre <- apply(mets.pre, 2, sd)
sds.pst <- apply(mets.pst, 2, sd)
del.cols.pre <- c(del.cols, names(sds.pre)[sds.pre == 0 & !is.na(sds.pre)])
del.cols.pst <- c(del.cols, names(sds.pst)[sds.pst == 0 & !is.na(sds.pst)])

# remove predictors with na values
nas.pre <- apply(mets.pre, 2, function(x) sum(is.na(x)))
nas.pst <- apply(mets.pst, 2, function(x) sum(is.na(x)))
del.cols.pre <- c(del.cols.pre, names(nas.pre)[nas.pre > 0])
del.cols.pst <- c(del.cols.pst, names(nas.pst)[nas.pst > 0])

# keep columns
keep.cols.pre <- colnames(mets.pre)[!colnames(mets.pre) %in% del.cols.pre]
keep.cols.pst <- colnames(mets.pst)[!colnames(mets.pst) %in% del.cols.pst]
mets.pre <- mets.pre[,keep.cols.pre]
mets.pst <- mets.pst[,keep.cols.pst]

# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)
vars.pre <- paste0(vars, "_pre")

# loop through variables and model, plot, and compile results of each
successes <- 0
for (var.pre in vars.pre){
  resp.var <- fuel.plots[,var.pre] |> as.data.frame()
  plot.id <- fuel.plots[,"Plot"] |> as.data.frame()
  if (length(unique(resp.var[,1])) < 5){
    next
  } else {
    successes <- successes + 1
  }
  mod.df <- cbind(plot.id, resp.var, mets.pre)
  df.pred.obs <- tuned.model(mod.df)
  df.pred.pre <- data.frame(Plot = df.pred.obs$Plot,
                            x = df.pred.obs$prd)
  colnames(df.pred.pre)[2] <- paste0(var.pre, "_prd")
  if (successes == 1){
    df.pred.pre.merge <- df.pred.pre
  } else {
    df.pred.pre.merge <- merge(df.pred.pre.merge, df.pred.pre, all = T)
  }
}

# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)
vars.pst <- paste0(vars, "_pst")

# loop through variables and model, plot, and compile results of each
successes <- 0
for (var.pst in vars.pst){
  resp.var <- fuel.plots[,var.pst] |> as.data.frame()
  plot.id <- fuel.plots[,"Plot"] |> as.data.frame()
  if (length(unique(resp.var[,1])) < 5){
    next
  } else {
    successes <- successes + 1
  }
  mod.df <- cbind(plot.id, resp.var, mets.pst)
  df.pred.obs <- tuned.model(mod.df)
  df.pred.pst <- data.frame(Plot = df.pred.obs$Plot,
                            x = df.pred.obs$prd)
  colnames(df.pred.pst)[2] <- paste0(var.pst, "_prd")
  if (successes == 1){
    df.pred.pst.merge <- df.pred.pst
  } else {
    df.pred.pst.merge <- merge(df.pred.pst.merge, df.pred.pst, all = T)
  }
}

# merge pre and post fire predictions
df.pred.merge <- merge(df.pred.pre.merge, df.pred.pst.merge, all = T)

# get fuel metric names
vars <- colnames(df.pred.merge) |>
  sub("_pre_prd", "", x = _) |>
  sub("_pst_prd", "", x = _) |>
  unique()
vars <- vars[vars != "Plot"]

# loop through them and calculate differences
for (var in vars){
  var.pre <- paste0(var, "_pre_prd")
  var.pst <- paste0(var, "_pst_prd")
  var.dif <- paste0(var, "_dif_prd")
  if (!all(c(var.pre, var.pst) %in% colnames(df.pred.merge))) next
  df.pred.merge[,var.dif] <- df.pred.merge[,var.pre] - df.pred.merge[,var.pst]
}

# join back to fuel plot data
df.pre.post <- fuel.plots |>
  as.data.frame() |>
  merge(df.pred.merge, all = T)

# loop through the variables and compare consumption predictions to observations
par(mfrow = c(4,4))
for (var in vars){
  var.dif.prd <- paste0(var, "_dif_prd")
  var.dif.obs <- paste0(var, "_dif")
  if (!all(c(var.dif.prd, var.dif.obs) %in% colnames(df.pre.post))) next
  df.pred.obs <- data.frame(obs = df.pre.post[,var.dif.obs],
                            prd = df.pre.post[,var.dif.prd])
  perf.mets <- prd.obs.plot(df.pred.obs, var)
  if (var == vars[1]){
    perf.mets.prepost <- perf.mets
  } else {
    perf.mets.prepost <- rbind(perf.mets.prepost, perf.mets)
  }
}

Show Code
perf.mets.prepost$method <- "prepost"

1b. Pooled model

Show Code
# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)
vars.pre <- paste0(vars, "_pre")
vars.pst <- paste0(vars, "_pst")

# subset the fuels data
fuel.plots.pooled.pre <- fuel.plots[,c("Plot", vars.pre)] |> as.data.frame()
fuel.plots.pooled.pst <- fuel.plots[,c("Plot", vars.pst)] |> as.data.frame()

# remove "_pre" and "_pst" from column names to enable row merging
names(fuel.plots.pooled.pre) <- sub("_pre", "", names(fuel.plots.pooled.pre))
names(fuel.plots.pooled.pst) <- sub("_pst", "", names(fuel.plots.pooled.pst))

# add "_pre" and "_pst" to plot ids
fuel.plots.pooled.pre$Plot <- paste0(fuel.plots.pooled.pre$Plot, "_pre")
fuel.plots.pooled.pst$Plot <- paste0(fuel.plots.pooled.pst$Plot, "_pst")

# merge rowwise
fuel.plots.pooled <- bind_rows(fuel.plots.pooled.pre, fuel.plots.pooled.pst)

# merge mets rowwise too
mets.pooled <- bind_rows(mets.pre, mets.pst)

# remove columns with nas
mets.pooled <- mets.pooled[,apply(mets.pooled, 2, function(x) sum(is.na(x)) == 0)]

# loop through variables and model, plot, and compile results of each
successes <- 0
for (var in vars){
  resp.var <- fuel.plots.pooled[var]
  plot.id <- fuel.plots.pooled["Plot"]
  if (length(unique(resp.var[,1])) < 5){
    next
  } else {
    successes <- successes + 1
  }
  mod.df <- cbind(plot.id, resp.var, mets.pooled)
  df.pred.obs <- tuned.model(mod.df)
  df.pred.pooled <- data.frame(Plot = df.pred.obs$Plot,
                            x = df.pred.obs$prd)
  colnames(df.pred.pooled)[2] <- paste0(var, "_prd")
  if (successes == 1){
    df.pred.pooled.merge <- df.pred.pooled
  } else {
    df.pred.pooled.merge <- merge(df.pred.pooled.merge, df.pred.pooled, all = T)
  }
}

# separate back out the pre- and post-fire plots
df.pred.pooled.pre <- df.pred.pooled.merge[grep("_pre", df.pred.pooled.merge$Plot),]
df.pred.pooled.pst <- df.pred.pooled.merge[grep("_pst", df.pred.pooled.merge$Plot),]

# add "_pre" and "_pst" to prediction column names
colnames(df.pred.pooled.pre) <- sub("_prd", "_pre_prd", colnames(df.pred.pooled.pre))
colnames(df.pred.pooled.pst) <- sub("_prd", "_pst_prd", colnames(df.pred.pooled.pst))

# remove "_pre" and "_pst" from plot names
df.pred.pooled.pre$Plot <- sub("_pre", "", df.pred.pooled.pre$Plot)
df.pred.pooled.pst$Plot <- sub("_pst", "", df.pred.pooled.pst$Plot)

# merge pre and post fire predictions
df.pred.pooled.merge <- merge(df.pred.pooled.pre, df.pred.pooled.pst, all = T)

# get fuel metric names
vars <- colnames(df.pred.pooled.merge) |>
  sub("_pre_prd", "", x = _) |>
  sub("_pst_prd", "", x = _) |>
  unique()
vars <- vars[vars != "Plot"]

# loop through them and calculate differences
for (var in vars){
  var.pre <- paste0(var, "_pre_prd")
  var.pst <- paste0(var, "_pst_prd")
  var.dif <- paste0(var, "_dif_prd")
  if (!all(c(var.pre, var.pst) %in% colnames(df.pred.pooled.merge))) next
  df.pred.pooled.merge[,var.dif] <- df.pred.pooled.merge[,var.pre] - df.pred.pooled.merge[,var.pst]
}

# join back to fuel plot data
df.pooled <- fuel.plots |>
  as.data.frame() |>
  merge(df.pred.pooled.merge, all = T)

# loop through the variables and compare consumption predictions to observations
par(mfrow = c(5,4))
for (var in vars){
  var.dif.prd <- paste0(var, "_dif_prd")
  var.dif.obs <- paste0(var, "_dif")
  if (!all(c(var.dif.prd, var.dif.obs) %in% colnames(df.pooled))) next
  df.pred.obs <- data.frame(obs = df.pooled[,var.dif.obs],
                            prd = df.pooled[,var.dif.prd])
  perf.mets <- prd.obs.plot(df.pred.obs, var)
  if (var == vars[1]){
    perf.mets.pooled <- perf.mets
  } else {
    perf.mets.pooled <- rbind(perf.mets.pooled, perf.mets)
  }
}
perf.mets.pooled$method <- "pooled"

Approach #2: Profile Area Change (PAC)

Briefly, the idea here is that the area under the curve (AUC) representing the quantile distribution of heights approximately represents the volume of fuels in a given plot/pixel. Thus, the difference between AUC before and after a fire shoud capture fuel volumetric (or loading) change, as in the visual example shown below.

Show Code
# define percentile metrics function
perc.fun <- function(z){
  q <- quantile(z, seq(0,1,0.01)) |>
    as.list()
  names(q) <- paste0("p_", stringr::str_pad(seq(0,100,1), 3, "left", "0"))
  return(q)
}

# reproject plots to match crs of pre and pst ctgs
fuel.plots.pre <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pre))
# fuel.plots.pst <- fuel.plots |>
#   st_as_sf() |>
#   st_transform(st_crs(ctg.pst))
fuel.plots.pst.ann <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pst.ann)) |>
  subset(substr(Plot, 1, 1) %in% c("5", "7"))
fuel.plots.pst.mon <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pst.mon)) |>
  subset(!substr(Plot, 1, 1) %in% c("5", "7"))

# run plot metrics
percs.pre <- plot_metrics(
  las = ctg.pre,
  func = ~perc.fun(Z),
  geometry = fuel.plots.pre,
  radius = 8
) |>
  st_drop_geometry()

# percs.pst <- plot_metrics(
#   las = ctg.pst,
#   func = ~perc.fun(Z),
#   geometry = fuel.plots.pst,
#   radius = 8
# ) |>
#   st_drop_geometry()
percs.pst.ann <- plot_metrics(
  las = ctg.pst.ann,
  func = ~perc.fun(Z),
  geometry = fuel.plots.pst.ann,
  radius = 8
) |>
  st_drop_geometry()
percs.pst.mon <- plot_metrics(
  las = ctg.pst.mon,
  func = ~perc.fun(Z),
  geometry = fuel.plots.pst.mon,
  radius = 8
) |>
  st_drop_geometry()

# merge the ann and mon datasets
percs.pst <- rbind(percs.pst.ann, percs.pst.mon)

# sort the data
percs.pre <- percs.pre[order(percs.pre$Plot),]
percs.pst <- percs.pst[order(percs.pst$Plot),]

# separate out predictor and response data frames
percs.pre <- percs.pre[,!colnames(percs.pre) %in% colnames(fuel.plots.pre)]
percs.pst <- percs.pst[,!colnames(percs.pst) %in% colnames(fuel.plots.pre)]

# set negative values to zero
percs.pre[percs.pre < 0] <- 0
percs.pst[percs.pst < 0] <- 0

# plot an example out
ex.row <- 1
percs.pre.ex <- percs.pre[ex.row,]
percs.pst.ex <- percs.pst[ex.row,]
par(mar = c(5,5,1,1), las = 1)
plot(x = c(0,100), y = range(c(percs.pre.ex, percs.pst.ex)),
     type = "n", xlab = "Percentile", ylab = "Height (m)")
grid()
polygon(x = c(0, seq(0,100), 100),
        y = c(0, percs.pre.ex, 0),
        col = adjust_transparency(2, 0.5),
        border = NA)
lines(x = seq(0,100), y = percs.pre.ex, col = 2, lwd = 2)
polygon(x = c(0, seq(0,100), 100),
        y = c(0, percs.pst.ex, 0),
        col = adjust_transparency(4, 0.5),
        border = NA)
lines(x = seq(0,100), y = percs.pst.ex, col = 4, lwd = 2)
legend("topleft", legend = c("Pre-fire", "Post-fire"), col = c(2, 4), 
       fill = c(adjust_transparency(2, 0.5), adjust_transparency(4, 0.5)))

2a. Raw height PAC for profile slices

We found out in the previous document that a singular measure of PAC was insufficient for capturing nuance of consumption in different fuel components. To address this, I took “slices” out of the height quantile profile at a series of different quantile ranges and fed those into an RF to model consumption directly. The first way was using the raw height quantiles (i.e., in meters). In addition to the PAC values as predictors, I also threw in the pre-fire height quantiles as candidates. As mentioned in the Introduction, this may help the model distinguish forests of different pre-fire structure, which could help provide context for the PAC values.

Show Code
# create auc function
auc.slice.fun <- function(y, slice.min, slice.max){
  x <- seq(0,100)
  auc <- integrate(splinefun(x, y), slice.min, slice.max)$value
  return(auc)
}

# begin slice.min loop
counter <- 0
for (i in seq(0,100,10)){
  for (j in seq(0,100,10)){
    if (j > i){
      counter <- counter + 1
      auc.raw.slice.pre <- apply(percs.pre, 1, function(x) auc.slice.fun(x, i, j))
      auc.raw.slice.pst <- apply(percs.pst, 1, function(x) auc.slice.fun(x, i, j))
      pac.raw.slice <- auc.raw.slice.pre - auc.raw.slice.pst
      if (counter == 1){
        pac.raw.slice.df <- data.frame(x = pac.raw.slice)
        colnames(pac.raw.slice.df) <- paste0("pac_", i, "_", j)
      } else {
        pac.raw.slice.df <- cbind(pac.raw.slice.df, data.frame(x = pac.raw.slice))
        colnames(pac.raw.slice.df)[ncol(pac.raw.slice.df)] <- paste0("pac_", i, "_", j)
      }
    }
  }
}

# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)
vars <- paste0(vars, "_dif")

# get plot ids
plot.ids <- fuel.plots[,"Plot"] |> as.data.frame()

# loop through them
par(mfrow = c(5,4), las = 1, mar = c(5,5,2,1))
for (var in vars){
  y <- fuel.plots[,var] |> as.data.frame()
  x <- cbind(percs.pre, pac.raw.slice.df)
  mod.df <- cbind(plot.ids, y, x)
  df.pred.obs <- tuned.model(mod.df)
  perf.mets <- prd.obs.plot(df.pred.obs, var)
  if (var == vars[1]){
    perf.mets.pacraw <- perf.mets
  } else {
    perf.mets.pacraw <- rbind(perf.mets.pacraw, perf.mets)
  }
}
perf.mets.pacraw$method <- "pacraw"

2b. Normalized height PAC for profile slices

The second way (more akin to how Hu et al. (2019) did it) was using normalized height quantiles (i.e., percent relative to the maximum per-plot height between pre- and post-fire data):

Show Code
# get plot-level maximum heights
max.hgt.pre <- apply(percs.pre, 1, max)
max.hgt.pst <- apply(percs.pst, 1, max)
max.hgt <- apply(cbind(max.hgt.pre, max.hgt.pst), 1, max)

# normalize heights to max
percs.pre.norm <- percs.pre
for (i in seq(1,nrow(percs.pre.norm))){
  percs.pre.norm[i,] <- percs.pre.norm[i,] / max.hgt[i]
}
percs.pst.norm <- percs.pst
for (i in seq(1,nrow(percs.pst.norm))){
  percs.pst.norm[i,] <- percs.pst.norm[i,] / max.hgt[i]
}

# begin slice.min loop
counter <- 0
for (i in seq(0,100,10)){
  for (j in seq(0,100,10)){
    if (j > i){
      counter <- counter + 1
      auc.norm.slice.pre <- apply(percs.pre.norm, 1, function(x) auc.slice.fun(x, i, j))
      auc.norm.slice.pst <- apply(percs.pst.norm, 1, function(x) auc.slice.fun(x, i, j))
      pac.norm.slice <- auc.norm.slice.pre - auc.norm.slice.pst
      if (counter == 1){
        pac.norm.slice.df <- data.frame(x = pac.norm.slice)
        colnames(pac.norm.slice.df) <- paste0("pac_", i, "_", j)
      } else {
        pac.norm.slice.df <- cbind(pac.norm.slice.df, data.frame(x = pac.norm.slice))
        colnames(pac.norm.slice.df)[ncol(pac.norm.slice.df)] <- paste0("pac_", i, "_", j)
      }
    }
  }
}

# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)
vars <- paste0(vars, "_dif")

# get plot ids
plot.ids <- fuel.plots[,"Plot"] |> as.data.frame()

# loop through them
par(mfrow = c(5,4), las = 1, mar = c(5,5,2,1))
for (var in vars){
  y <- fuel.plots[,var] |> as.data.frame()
  x <- cbind(percs.pre.norm, pac.norm.slice.df)
  mod.df <- cbind(plot.ids, y, x)
  df.pred.obs <- tuned.model(mod.df)
  perf.mets <- prd.obs.plot(df.pred.obs, var)
  if (var == vars[1]){
    perf.mets.pacnorm <- perf.mets
  } else {
    perf.mets.pacnorm <- rbind(perf.mets.pacnorm, perf.mets)
  }
}
perf.mets.pacnorm$method <- "pacnorm"

Approach #3: Profile Density Change (PDC)

Here’s where things start to diverge from the previous document. As mentioned in the Introduction, Andy suggested exploring densities at different height ranges, rather than height quantiles, and differences therein, as potential predictors of fuel consumption. To do this, I used a Gaussian kernel density estimator with a 5cm bandwidth, calculated at every 5cm vertical interval from 0 to 30m in height. I calculated the difference between pre-fire and post-fire densities, and use those as predictors of fuel consumption. As with PAC, I also added in the pre-fire densities to give the model a sense for pre-fire structure, hopefully allowing it to make more forest-structure-specific predictions. First, I’ll plot out two density profiles for an example plot, one representing pre-fire densities, and one representing post-fire densities. The difference between the two at different heights should capture fuelbed-specific consumption (at least that’s the theory!).

Show Code
# define percentile metrics function
dens.fun <- function(z, bw = 0.05){
  z[z < 0] <- 0
  d <- density(z, bw, from = 0, to = 30, n = length(seq(0,30,bw)))
  dy <- as.list(d$y)
  names(dy) <- paste0("d.", d$x)
  return(dy)
}

# reproject plots to match crs of pre and pst ctgs
fuel.plots.pre <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pre))
# fuel.plots.pst <- fuel.plots |>
#   st_as_sf() |>
#   st_transform(st_crs(ctg.pst))
fuel.plots.pst.ann <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pst.ann)) |>
  subset(substr(Plot, 1, 1) %in% c("5", "7"))
fuel.plots.pst.mon <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pst.ann)) |>
  subset(!substr(Plot, 1, 1) %in% c("5", "7"))

# run plot metrics
dens.pre <- plot_metrics(
  las = ctg.pre,
  func = ~dens.fun(Z),
  geometry = fuel.plots.pre,
  radius = 8
) |>
  st_drop_geometry()

# dens.pst <- plot_metrics(
#   las = ctg.pst,
#   func = ~dens.fun(Z),
#   geometry = fuel.plots.pst,
#   radius = 8
# ) |>
#   st_drop_geometry()
dens.pst.ann <- plot_metrics(
  las = ctg.pst.ann,
  func = ~dens.fun(Z),
  geometry = fuel.plots.pst.ann,
  radius = 8
) |>
  st_drop_geometry()
dens.pst.mon <- plot_metrics(
  las = ctg.pst.mon,
  func = ~dens.fun(Z),
  geometry = fuel.plots.pst.mon,
  radius = 8
) |>
  st_drop_geometry()

# merge the ann and mon datasets
dens.pst <- rbind(dens.pst.ann, dens.pst.mon)

# sort the data
dens.pre <- dens.pre[order(dens.pre$Plot),]
dens.pst <- dens.pst[order(dens.pst$Plot),]

# separate out predictor and response data frames
dens.pre <- dens.pre[,!colnames(dens.pre) %in% colnames(fuel.plots.pre)]
dens.pst <- dens.pst[,!colnames(dens.pst) %in% colnames(fuel.plots.pre)]

# calculate differences
dens.dif <- dens.pre - dens.pst

# change column names
colnames(dens.pre) <- paste0(colnames(dens.pre), ".pre")
colnames(dens.pst) <- paste0(colnames(dens.pst), ".pst")
colnames(dens.dif) <- paste0(colnames(dens.dif), ".dif")

# plot an example out
ex.row <- 1
dens.pre.ex <- dens.pre[ex.row,]
dens.pst.ex <- dens.pst[ex.row,]
par(mar = c(5,5,1,1), las = 1)
plot(x = c(0,30), y = range(c(dens.pre.ex, dens.pst.ex)),
     type = "n", xlab = "Height (m)", ylab = "Density")
grid()
polygon(x = c(0, seq(0,30,0.05), 30),
        y = c(0, dens.pre.ex, 0),
        col = adjust_transparency(2, 0.5),
        border = NA)
lines(x = seq(0,30,0.05), y = dens.pre.ex, col = 2, lwd = 2)
polygon(x = c(0, seq(0,30,0.05), 30),
        y = c(0, dens.pst.ex, 0),
        col = adjust_transparency(4, 0.5),
        border = NA)
lines(x = seq(0,30,0.05), y = dens.pst.ex, col = 4, lwd = 2)
legend("topright", legend = c("Pre-fire", "Post-fire"), col = c(2, 4), 
       fill = c(adjust_transparency(2, 0.5), adjust_transparency(4, 0.5)))

Show Code
# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)
vars <- paste0(vars, "_dif")

# get plot ids
plot.ids <- fuel.plots[,"Plot"] |> as.data.frame()

# loop through them
par(mfrow = c(5,4), las = 1, mar = c(5,5,2,1))
for (var in vars){
  x <- cbind(dens.pre, dens.dif)
  y <- fuel.plots[,var] |> as.data.frame()
  mod.df <- cbind(plot.ids, y,x)
  df.pred.obs <- tuned.model(mod.df)
  perf.mets <- prd.obs.plot(df.pred.obs, var)
  if (var == vars[1]){
    perf.mets.pdc <- perf.mets
  } else {
    perf.mets.pdc <- rbind(perf.mets.pdc, perf.mets)
  }
}
perf.mets.pdc$method <- "pdc"

Approach #4: PAC+PDC

PAC seems to do pretty well at predicting consumption in certain fuel beds, and PDC in others. Perhaps through their combined use, we can predict fuel consumption well across a greater diversity of fuel beds.

Show Code
# loop through them
par(mfrow = c(5,4), las = 1, mar = c(5,5,2,1))
for (var in vars){
  x <- cbind(dens.pre, dens.dif, percs.pre.norm, pac.norm.slice.df)
  y <- fuel.plots[,var] |> as.data.frame()
  mod.df <- cbind(plot.ids, y,x)
  df.pred.obs <- tuned.model(mod.df)
  perf.mets <- prd.obs.plot(df.pred.obs, var)
  if (var == vars[1]){
    perf.mets.pacpdc <- perf.mets
  } else {
    perf.mets.pacpdc <- rbind(perf.mets.pacpdc, perf.mets)
  }
}
perf.mets.pacpdc$method <- "pacpdc"

Results Comparison

Lastly, let’s take a look at all of the results together to compare methods.

Show Code
# define the variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)

# remove "_dif" from variable names in performance metrics tables
perf.mets.pacraw$var <- sub("_dif", "", perf.mets.pacraw$var)
perf.mets.pacnorm$var <- sub("_dif", "", perf.mets.pacnorm$var)
perf.mets.pdc$var <- sub("_dif", "", perf.mets.pdc$var)
perf.mets.pacpdc$var <- sub("_dif", "", perf.mets.pacpdc$var)

# create "long" table
perf.mets.long <- rbind(
  perf.mets.prepost,
  perf.mets.pooled,
  perf.mets.pacraw,
  perf.mets.pacnorm,
  perf.mets.pdc,
  perf.mets.pacpdc
)

# add method to r2 and rrmse colnames
perf.mets.prepost$method <- NULL
colnames(perf.mets.prepost)[2:3] <- paste0(colnames(perf.mets.prepost)[2:3], ".prepost")
perf.mets.pooled$method <- NULL
colnames(perf.mets.pooled)[2:3] <- paste0(colnames(perf.mets.pooled)[2:3], ".pooled")
perf.mets.pacraw$method <- NULL
colnames(perf.mets.pacraw)[2:3] <- paste0(colnames(perf.mets.pacraw)[2:3], ".pacraw")
perf.mets.pacnorm$method <- NULL
colnames(perf.mets.pacnorm)[2:3] <- paste0(colnames(perf.mets.pacnorm)[2:3], ".pacnorm")
perf.mets.pdc$method <- NULL
colnames(perf.mets.pdc)[2:3] <- paste0(colnames(perf.mets.pdc)[2:3], ".pdc")
perf.mets.pacpdc$method <- NULL
colnames(perf.mets.pacpdc)[2:3] <- paste0(colnames(perf.mets.pacpdc)[2:3], ".pacpdc")

# merge them to create "wide" table
perf.mets.wide <- merge(perf.mets.prepost, perf.mets.pooled) |>
  merge(perf.mets.pacraw) |>
  merge(perf.mets.pacnorm) |>
  merge(perf.mets.pdc) |>
  merge(perf.mets.pacpdc)

# define colors
cols <- brewer.pal(6, "Dark2")

# set up the plot
par(mfrow = c(1,2), las = 1)

# create empty plot
x <- rep(100,length(vars))
names(x) <- vars
r2.cols <- perf.mets.wide[,grep("r2", colnames(perf.mets.wide))]
dotchart(x, pch = 16, xlab = expression(R^2),
         xlim = range(r2.cols))

# loop though variables
for (i in seq(1,length(vars))){
  var <- vars[i]
  points(x = perf.mets.wide$r2.prepost[perf.mets.wide$var == var],
         y = i, pch = 16, col = cols[1])
  points(x = perf.mets.wide$r2.pooled[perf.mets.wide$var == var],
         y = i, pch = 16, col = cols[2])
  points(x = perf.mets.wide$r2.pacraw[perf.mets.wide$var == var],
         y = i, pch = 16, col = cols[3])
  points(x = perf.mets.wide$r2.pacnorm[perf.mets.wide$var == var],
         y = i, pch = 16, col = cols[4])
  points(x = perf.mets.wide$r2.pdc[perf.mets.wide$var == var],
         y = i, pch = 16, col = cols[5])
  points(x = perf.mets.wide$r2.pacpdc[perf.mets.wide$var == var],
         y = i, pch = 16, col = cols[6])
}

# create empty plot
y <- rep(10000, length(vars))
names(y) <- vars
rrmse.cols <- perf.mets.wide[perf.mets.wide$var != "nonwody",grep("rrmse", colnames(perf.mets.wide))]
dotchart(y, pch = 16, xlab = expression(symbol("%")*RMSE),
         xlim = range(rrmse.cols))

# loop though variables
for (i in seq(1,length(vars))){
  var <- vars[i]
  points(x = perf.mets.wide$rrmse.prepost[perf.mets.wide$var == var],
         y = i, pch = 16, col = cols[1])
  points(x = perf.mets.wide$rrmse.pooled[perf.mets.wide$var == var],
         y = i, pch = 16, col = cols[2])
  points(x = perf.mets.wide$rrmse.pacraw[perf.mets.wide$var == var],
         y = i, pch = 16, col = cols[3])
  points(x = perf.mets.wide$rrmse.pacnorm[perf.mets.wide$var == var],
         y = i, pch = 16, col = cols[4])
  points(x = perf.mets.wide$rrmse.pdc[perf.mets.wide$var == var],
         y = i, pch = 16, col = cols[5])
  points(x = perf.mets.wide$rrmse.pacpdc[perf.mets.wide$var == var],
         y = i, pch = 16, col = cols[6])
}

# add legend
legend("topright", legend = c("PrePost", "Pooled", "Raw PAC", "Norm PAC", "PDC", "PAC + PDC"),
       pch = 16, col = cols)

Show Code
# figure out which approach performed best for each fuelbed
fn <- function(x){
  m <- which.max(x)
  cn <- colnames(r2.cols)[m]
  var <- sub("r2\\.", "", cn)
  return(var)
}
r2.best <- apply(r2.cols, 1, fn)

fn <- function(x){
  m <- which.max(x)
  cn <- colnames(rrmse.cols)[m]
  var <- sub("rrmse\\.", "", cn)
  return(var)
}
rrmse.best <- apply(r2.cols, 1, fn)
df.best <- data.frame(var = perf.mets.wide$var,
                      r2.best = r2.best,
                      rrmse.best = rrmse.best)
paged_table(df.best, options = list(rows.print = nrow(df.best)))
Show Code
# plot out best performer counts
par(mfrow = c(1,2), mar = c(5,5,1,1), las = 1)
r2.best.n <- table(factor(df.best$r2.best, levels =  c(
  "prepost", "pooled", "pacraw", "pacnorm", "pdc", "pacpdc"
)))
barplot(r2.best.n, col = cols, ylab = expression("Best Performer by "*R^2))
rrmse.best.n <- table(factor(df.best$rrmse.best, levels =  c(
  "prepost", "pooled", "pacraw", "pacnorm", "pdc", "pacpdc"
)))
barplot(rrmse.best.n, col = cols, ylab = expression("Best Performer by "*symbol("%")*RMSE))

Show Code
# plot out their performance as boxplots
par(mfrow = c(1,2), mar = c(5,3,1,1), las = 1)
perf.mets.long$method <- factor(perf.mets.long$method, levels = c(
  "prepost", "pooled", "pacraw", "pacnorm", "pdc", "pacpdc"
))
bp <- boxplot(r2 ~ method, data = perf.mets.long, col = cols,
        xaxt = "n", ylim = range(r2.cols), ylab = expression(R^2),
        xlab = NA)
tick <- seq_along(bp$names)
axis(1, at = tick, labels = F)
text(x = tick, y = par("usr")[3] - 0.05 * diff(par("usr")[3:4]),
     labels = c("PrePost", "Pooled", "PAC Raw", "PAC Norm", "PDC", "PAC + PDC"),
     srt = 45, xpd = T)

bp <- boxplot(rrmse ~ method, data = perf.mets.long, col = cols,
        xaxt = "n", ylim = range(rrmse.cols), ylab = expression(symbol("%")*RMSE), xlab = NA)
tick <- seq_along(bp$names)
axis(1, at = tick, labels = F)
text(x = tick, y = par("usr")[3] - 0.05 * diff(par("usr")[3:4]),
     labels = c("PrePost", "Pooled", "PAC Raw", "PAC Norm", "PDC", "PAC + PDC"),
     srt = 45, xpd = T)

PDC and PAC + PDC tend to yield the best performance, especially with respect to R2, and to a somewhat lesser extent, %RMSE. One of the two was best in terms of both R2 and %RMSE for nearly all of the fuel consumption models.

The big question is is PDC good/novel enough to put out into the world as a methods-focused publication?