Monroe Mountain Fuel Consumption Modeling

Author

Mickey Campbell

Introduction

The overarching goal of this project is to develop and test a new approach for mapping fuel consumption using only post-fire lidar data. The motivation is that lidar data can be hard to come by, and multitemporal lidar data are rarer still. So, although multitemporal lidar data are well-suited to the task of mapping fuel consumption, their rarity makes it hard to quantify fuel consumption at broad scales. However, if lidar data were collected after a fire (even within a few years of the fire), we could model what pre-fire lidar/vegetation structure might have looked like (using spectral, topographic, climatic, soils, and similar data, along with post-fire lidar data from undisturbed areas), effectively creating pseudo-pre-fire lidar data that could be used to map fuel consumption.

Monroe Mountain represents an ideal area to test this workflow out, because it has pre-fire and post-fire lidar data and fuel measurement data. However, before jumping right into the deep end by modeling pre-fire lidar structure, I thought it would be wise to see how well we can map fuel consumption with all real lidar data. If we can’t do it accurately with real data, then the odds of success with modeled data are low.

Accordingly, the primary objective of this document is to demonstrate methods and results for mapping fuel consumption using real lidar data. I will use FERA fuel plot data acquired from Ryan McCarley to build several different model types using 2016 (pre-fire) and 2023 (post-fire) airborne lidar data.

Field Data

Overview

First things first, let’s check out the field data. These data were provided to us by Ryan McCarley (via Andy Hudak) on 3/5/2024. They contain point locations of FERA plots on Monroe Mountain collected over the course of several years with pre- and post-fire fuel loading measurements. Here’s a quick glimpse:

Show Code
library(terra)
library(leaflet)
library(rmarkdown)
library(sf)
library(lidR)
library(lidRmetrics)
library(VSURF)
library(tuneRanger)
library(ranger)
library(mlr)
library(dplyr)
library(colorspace)

# read in the data
fuel.plots <- vect("S:/ursa/campbell/fasmee/data/from_ryan/plots_wFuels.shp")

# plot them out
fuel.plots.ll <- project(fuel.plots, "epsg:4326")
m <- leaflet(data = fuel.plots.ll) |>
  addProviderTiles("Esri.WorldImagery") |>
  addMarkers()
m
Show Code
# take a look at the table
fuel.plots |>
  as.data.frame() |>
  paged_table()

There are 262 plots in this database. But, that number is a little deceptive, for two main reasons:

  1. Each plot was measured twice, and each measurement is recorded as its own point, effectively doubling the count.
  2. There are several points at the end of the table that have no data associated with them.

So, let’s do some data cleaning to reformat the data in a way that will make it easier to use for modeling.

Field Data Cleaning

First, I’ll remove the plots without any data. Incidentally, these are plots 601 - 620.

Show Code
# remove empty plots
fuel.plots <- fuel.plots[!is.na(fuel.plots$FID),]

Next, I’ll separate out pre-fire and post-fire plots.

Show Code
# separate pre-fire and post-fire plots
fuel.plots.pre <- fuel.plots[grep("pre", fuel.plots$FID),]
fuel.plots.pst <- fuel.plots[grep("post", fuel.plots$FID),]

# remove unnecessary columns
del.cols <- c("FID", "X", "Y")
keep.cols <- names(fuel.plots)[!names(fuel.plots) %in% del.cols]
fuel.plots.pre <- fuel.plots.pre[,keep.cols]
fuel.plots.pst <- fuel.plots.pst[,keep.cols]

# move plot ID column from last to first
n <- ncol(fuel.plots.pre)
fuel.plots.pre <- fuel.plots.pre[,c(n,1:(n-1))]
fuel.plots.pst <- fuel.plots.pst[,c(n,1:(n-1))]

# add "pre" and "pst" to column names for all attributes
dif.names <- names(fuel.plots.pre)[2:ncol(fuel.plots.pre)]
names(fuel.plots.pre)[2:ncol(fuel.plots.pre)] <- 
  paste0(names(fuel.plots.pre)[2:ncol(fuel.plots.pre)], "_pre")
names(fuel.plots.pst)[2:ncol(fuel.plots.pst)] <- 
  paste0(names(fuel.plots.pst)[2:ncol(fuel.plots.pst)], "_pst")

# merge them
fuel.plots <- merge(fuel.plots.pre, as.data.frame(fuel.plots.pst))

# get differences between pre- and post-fire fuel loads (i.e., calc consumption)
for (dif.name in dif.names){
  pre.fld <- paste0(dif.name, "_pre")
  pst.fld <- paste0(dif.name, "_pst")
  dif.fld <- paste0(dif.name, "_dif")
  fuel.plots[,dif.fld] <- as.data.frame(fuel.plots[,pre.fld]) - as.data.frame(fuel.plots[,pst.fld])
}

# plot them out
fuel.plots.ll <- project(fuel.plots, "epsg:4326")
m <- leaflet(data = fuel.plots.ll) |>
  addProviderTiles("Esri.WorldImagery") |>
  addMarkers()
m
Show Code
# take a look at the table
fuel.plots |>
  as.data.frame() |>
  paged_table()

After cleaning, we have 121 plots.

Definition of Field Names

It is worth providing a key to what each abbreviation in the field names means:

  • yr: year of measurement
  • hr1: 1 hour biomass
  • hr10: 10 hour biomass
  • hr100: 100 hour biomass
  • hr1000: 1000 hour biomass
  • nonwoody: herb biomass
  • shrubs: shrub biomass
  • trees: seedling biomass
  • llm: litter, lichen, and moss biomass
  • ground: duff biomass
  • shrb_sd: combined biomass of shrubs and seedlings
  • woody: combined biomass of 1hr, 10hr, 100hr, and 1000hr
  • uf: combined biomass of nonwoody, llm, ground, shrubs, trees, and woody
  • cf: tree biomass, including snags as well as live boles, branches, and foliage
  • cbd: canopy bulk density
  • cbh: canopy base height
  • acf: available canopy fuel (foliage and fine branches)
  • tf: total fuel (all biomass combined)
  • af: available fuel (combined biomass of uf and acf)

And here are the self-explanatory suffixes I’ve added:

  • pre: pre-fire measurement
  • pst: post-fire measurement
  • dif: difference between pre- and post-fire measurements

As I understand it, all of the fields are measured in Mg/ha, with the exception of cbd and cbh, which are probably measured in kg/m3 and m, respectively.

Modeling Fuel Consumption

I will test two different approaches for mapping fuel consumption. The first will be referred to as the modeled fuel load subtraction approach. In this approach, pre-fire and post-fire fuel loads for each component will be modeled individually, and the differences between the modeled predictions will serve as the fuel consumption predictions. This is comparable to, for example, McCarley et al. (2020). The second approach will be referred to as the profile area change approach. In this approach, described in Hu et al. (2019), cumulative height profiles (akin to cumulative distribution functions) are created from lidar height percentiles for both pre- and post-fire point cloud data. The area under the curve (AUC) of each resulting profile represents an approximation of the fuel volume. Theoretically, if fire consumes fuel, then the post-fire AUC should be lower than the pre-fire AUC. The difference between these AUCs is the profile area change, or PAC.
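As a toy illustration of the PAC idea (not the workflow used below, which operates on the actual plot-clipped point clouds), here is a minimal sketch with made-up height-percentile profiles; the trapezoidal AUC is just for demonstration.

Show Code
# toy example: hypothetical pre- and post-fire height percentile profiles (m)
percs <- seq(0, 100, 1)
z.pre <- 12 * (percs / 100)^2   # made-up pre-fire cumulative height profile
z.pst <- 12 * (percs / 100)^3   # made-up post-fire profile (less vegetation aloft)

# trapezoidal area under a cumulative height profile
auc <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# profile area change: pre-fire AUC minus post-fire AUC (positive = consumption)
pac <- auc(percs, z.pre) - auc(percs, z.pst)
pac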

We have two airborne lidar datasets, one from 2016 and one from 2023. The former approximates pre-fire conditions, and the latter post-fire conditions, though the timing differences between when the fuels were measured and when the lidar data were flown are well worth noting, and may be especially problematic for post-fire structural modeling, given the regeneration that has likely occurred in the several years since these fires burned.

Approach #1: Modeled Fuel Load Subtraction

Following McCarley et al. (2020), I will try two different approaches here, described in the paper as the PrePost and Pooled models:

  • PrePost: Model pre- and post-fire fuel loads separately. Use pre-fire lidar and pre-fire fuel loads to train and validate pre-fire model. Use post-fire lidar and post-fire fuel loads to train and validate post-fire model.
  • Pooled: Combine pre- and post-fire data to train and validate singular pre-/post-fire model.

There are several different techniques one could employ to link plot data to lidar data. These include:

  1. Generating a set of rasterized lidar metrics and extracting pixel values at each plot.
  • May not capture true within-plot conditions as well, since it requires interpolating gridded values to a plot’s geometry that doesn’t align with that grid
  • But, easier to translate to prediction, since pixels are used both to train the model and to predict with it
  2. Clipping raw point cloud data to the plots, and calculating lidar metrics from the clipped clouds.
  • Faster, since you’re processing small, clipped point clouds
  • More accurately captures the true dimensions of the plot
  • May not translate as directly to pixel-based prediction

Taking those factors into consideration, for the sake of this test, I opted to go with the second technique. Doing so requires supplying the plot dimensions. FERA plots had different variables collected within different plot subgeometries, but the largest plot extent, within which tree data were collected, was an 8 m-radius plot. So I will extract pre- and post-fire point cloud data within an 8 m-radius circle around each plot center point, and use those clouds to derive the lidar structural metrics. I will be using Tompalski et al.’s lidRmetrics library to generate a suite of plot-level predictor variables. Specifically, I use their metrics_set3() function, which encompasses the most comprehensive set of structural predictor variables.
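For completeness, here is a rough sketch of what the eventual translation to pixel-based prediction (technique 1) might look like if we go that route later. The 20 m resolution is an arbitrary placeholder, and ctg.pre and mod stand in for a lidar catalog and a fitted ranger model like the ones created below; this is not part of the current workflow.

Show Code
# Sketch only: rasterize the same structural metrics used for training, then
# predict from a fitted ranger model. Resolution and object names are placeholders.
mets.rast <- pixel_metrics(
  ctg.pre,
  ~lidRmetrics::metrics_set3(
    x = X, y = Y, z = Z, i = Intensity,
    ReturnNumber = ReturnNumber, NumberOfReturns = NumberOfReturns
  ),
  res = 20
)

# terra::predict calls fun(model, data); ranger's predict() returns a list, so
# the $predictions element is extracted (commented out since no model exists yet)
# fuel.map <- terra::predict(
#   mets.rast, mod,
#   fun = function(model, data, ...) predict(model, data = data, ...)$predictions,
#   na.rm = TRUE
# )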

1a. PrePost model

Here, I’ll read in the lidar data, calculate lidar metrics within plot-clipped point clouds, and remove lidar predictor variables that, for a variety of reasons, will not be useful for modeling purposes.

Show Code
# suppress progress bar
options(lidR.progress = F)

# directory structure
als.dir.pre <- "S:/ursa/campbell/fasmee/data/lidar_data/monroe_mtn_lidar/a04_height"
als.dir.pst <- "S:/ursa2/campbell/ems4d/data/als/als_2023/LAZ/Eastburn_processed"

# read in lascatalogs
ctg.pre <- readLAScatalog(als.dir.pre, filter = "-drop_withheld -drop_class 7 18", progress = F)
ctg.pst <- readLAScatalog(als.dir.pst, filter = "-drop_withheld -drop_class 7 18", progress = F)

# reproject plots to match crs of pre and pst ctgs
fuel.plots.pre <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pre))
fuel.plots.pst <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pst))

# run plot metrics
mets.pre <- plot_metrics(
  las = ctg.pre,
  func = ~lidRmetrics::metrics_set3(
    x = X,
    y = Y,
    z = Z,
    i = Intensity,
    ReturnNumber = ReturnNumber,
    NumberOfReturns = NumberOfReturns
  ),
  geometry = fuel.plots.pre,
  radius = 8
) |>
  as.data.frame()
mets.pst <- plot_metrics(
  las = ctg.pst,
  func = ~lidRmetrics::metrics_set3(
    x = X,
    y = Y,
    z = Z,
    i = Intensity,
    ReturnNumber = ReturnNumber,
    NumberOfReturns = NumberOfReturns
  ),
  geometry = fuel.plots.pst,
  radius = 8
) |>
  as.data.frame()

# separate out predictor and response data frames
resp.pre <- fuel.plots.pre |> st_drop_geometry()
resp.pst <- fuel.plots.pst |> st_drop_geometry()
mets.pre <- mets.pre[,!colnames(mets.pre) %in% colnames(fuel.plots.pre)]
mets.pst <- mets.pst[,!colnames(mets.pst) %in% colnames(fuel.plots.pst)]

# remove unscalable variables
del.cols <- c("n", "n_first", "n_intermediate", "n_last", "n_single",
              "n_multiple", "n_return_1", "n_return_2", "n_return_3",
              "n_return_4", "vn")

# remove predictors with no variability
sds.pre <- apply(mets.pre, 2, sd)
sds.pst <- apply(mets.pst, 2, sd)
del.cols.pre <- c(del.cols, names(sds.pre)[sds.pre == 0 & !is.na(sds.pre)])
del.cols.pst <- c(del.cols, names(sds.pst)[sds.pst == 0 & !is.na(sds.pst)])

# remove predictors with na values
nas.pre <- apply(mets.pre, 2, function(x) sum(is.na(x)))
nas.pst <- apply(mets.pst, 2, function(x) sum(is.na(x)))
del.cols.pre <- c(del.cols.pre, names(nas.pre)[nas.pre > 0])
del.cols.pst <- c(del.cols.pst, names(nas.pst)[nas.pst > 0])

# keep columns
keep.cols.pre <- colnames(mets.pre)[!colnames(mets.pre) %in% del.cols.pre]
keep.cols.pst <- colnames(mets.pst)[!colnames(mets.pst) %in% del.cols.pst]
mets.pre <- mets.pre[,keep.cols.pre]
mets.pst <- mets.pst[,keep.cols.pst]

Next, I’ll define a couple of quick functions to avoid redundant code: one for modeling and one for plotting.

Show Code
# define modeling function: VSURF-based variable selection followed by a
# random forest (ranger) fit; returns out-of-bag predictions vs. observations
tuned.model <- function(mod.df){
  # expects plot ID in column 1, response in column 2, predictors thereafter
  mod.df <- na.omit(mod.df)
  plot.ids <- mod.df[,1]
  mod.df$Plot <- NULL
  x.cols <- 2:ncol(mod.df)
  y.col <- 1
  # select predictors using VSURF's prediction-oriented variable set
  v <- VSURF(
    x = mod.df[,x.cols],
    y = mod.df[,y.col],
    RFimplem = "ranger",
    parallel = T,
    ncores = 18,
    verbose = F
  )
  x.cols <- x.cols[v$varselect.pred]
  mod.df <- mod.df[,c(y.col, x.cols)]
  # hyperparameter tuning (tuneRanger) disabled for now; ranger defaults are used
  # task <- makeRegrTask(data = mod.df, target = colnames(mod.df)[1])
  # tune <- tuneRanger(task, show.info = F)
  form <- as.formula(paste0(colnames(mod.df)[1], " ~ ."))
  mod <- ranger(
    formula = form,
    data = mod.df,
    # mtry = tune$recommended.pars$mtry,
    # sample.fraction = tune$recommended.pars$sample.fraction,
    # min.node.size = tune$recommended.pars$min.node.size,
    importance = "permutation"
  )
  # mod$predictions holds ranger's out-of-bag predictions, so the predicted vs.
  # observed comparisons below are not based on training-set fits
  df.pred.obs <- data.frame(Plot = plot.ids,
                            obs = mod.df[,y.col],
                            prd = mod$predictions)
  return(df.pred.obs)
}

# define predicted vs. observed plotting function
prd.obs.plot <- function(df.pred.obs, name){
  df.pred.obs <- na.omit(df.pred.obs)
  x <- df.pred.obs$obs
  y <- df.pred.obs$prd
  rg <- range(c(x,y), na.rm = T)
  par(mar = c(5,5,2,1), las = 1)
  plot(x, y, type = "n", xlab = "Observed", ylab = "Predicted", main = name,
       ylim = rg, xlim = rg)
  grid()
  # 1:1 line
  lines(x = c(-1e10, 1e10), y = c(-1e10, 1e10))
  points(x, y, pch = 16, col = rgb(0,0,0,0.25), cex = 1.25)
  # OLS fit of predicted on observed, drawn across the observed range
  mod <- lm(y ~ x)
  new.x <- range(x)
  new.y <- predict(mod, newdata = list(x = new.x))
  lines(new.x, new.y, lwd = 2, col = 2)
  # adjusted R-squared and relative (%) RMSE of the predictions
  r2 <- summary(mod)$adj.r.squared
  rrmse <- (100*sqrt(mean((x-y)^2))/mean(x))
  r2.lab <- bquote(R^2 == .(round(r2,2)))
  rrmse.lab <- bquote(symbol("%")*RMSE == .(round(rrmse)))
  legend(
    "topleft",
    legend = c(r2.lab, rrmse.lab),
    text.col = 2,
    bty = "n",
    x.intersp = 0
  )
  return(data.frame(var = name, r2 = r2, rrmse = rrmse))
}

Next, I’ll run the models! First, for pre-fire…

Show Code
# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)
vars.pre <- paste0(vars, "_pre")

# loop through variables and model, plot, and compile results of each
par(mfrow = c(5,4))
successes <- 0
for (var.pre in vars.pre){
  resp.var <- fuel.plots.pre[,var.pre] |> st_drop_geometry()
  plot.id <- fuel.plots.pre[,"Plot"] |> st_drop_geometry()
  if (length(unique(resp.var[,1])) < 5){
    next
  } else {
    successes <- successes + 1
  }
  mod.df <- cbind(plot.id, resp.var, mets.pre)
  df.pred.obs <- tuned.model(mod.df)
  prd.obs.plot(df.pred.obs, var.pre)
  df.pred.pre <- data.frame(Plot = df.pred.obs$Plot,
                            x = df.pred.obs$prd)
  colnames(df.pred.pre)[2] <- paste0(var.pre, "_prd")
  if (successes == 1){
    df.pred.pre.merge <- df.pred.pre
  } else {
    df.pred.pre.merge <- merge(df.pred.pre.merge, df.pred.pre, all = T)
  }
}

And now, for post-fire…

Show Code
# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)
vars.pst <- paste0(vars, "_pst")

# loop through variables and model, plot, and compile results of each
par(mfrow = c(4,4))
successes <- 0
for (var.pst in vars.pst){
  resp.var <- fuel.plots.pst[,var.pst] |> st_drop_geometry()
  plot.id <- fuel.plots.pst[,"Plot"] |> st_drop_geometry()
  if (length(unique(resp.var[,1])) < 5){
    next
  } else {
    successes <- successes + 1
  }
  mod.df <- cbind(plot.id, resp.var, mets.pst)
  df.pred.obs <- tuned.model(mod.df)
  prd.obs.plot(df.pred.obs, var.pst)
  df.pred.pst <- data.frame(Plot = df.pred.obs$Plot,
                            x = df.pred.obs$prd)
  colnames(df.pred.pst)[2] <- paste0(var.pst, "_prd")
  if (successes == 1){
    df.pred.pst.merge <- df.pred.pst
  } else {
    df.pred.pst.merge <- merge(df.pred.pst.merge, df.pred.pst, all = T)
  }
}

Note that we lost shrubs_pst because it had too few unique values to be modeled using RF.

Obviously a pretty mixed bag in terms of model performance by fuel metric. But, there are definitely some promising results peppered throughout. Mapping fuel structure, however, may be easier than mapping changes in fuel structure (i.e., fuel consumption). So, that will be the true test of model performance. To test this, I will take all of the predicted values and subtract post-fire predictions from pre-fire predictions, comparing the results to differences between pre- and post-fire fuel metrics from the field data.

Show Code
# merge pre and post fire predictions
df.pred.merge <- merge(df.pred.pre.merge, df.pred.pst.merge, all = T)

# get fuel metric names
vars <- colnames(df.pred.merge) |>
  sub("_pre_prd", "", x = _) |>
  sub("_pst_prd", "", x = _) |>
  unique()
vars <- vars[vars != "Plot"]

# loop through them and calculate differences
for (var in vars){
  var.pre <- paste0(var, "_pre_prd")
  var.pst <- paste0(var, "_pst_prd")
  var.dif <- paste0(var, "_dif_prd")
  if (!all(c(var.pre, var.pst) %in% colnames(df.pred.merge))) next
  df.pred.merge[,var.dif] <- df.pred.merge[,var.pre] - df.pred.merge[,var.pst]
}

# join back to fuel plot data
df.pre.post <- fuel.plots |>
  as.data.frame() |>
  merge(df.pred.merge, all = T)

# loop through the variables and compare consumption predictions to observations
par(mfrow = c(4,4))
for (var in vars){
  var.dif.prd <- paste0(var, "_dif_prd")
  var.dif.obs <- paste0(var, "_dif")
  if (!all(c(var.dif.prd, var.dif.obs) %in% colnames(df.pre.post))) next
  df.pred.obs <- data.frame(obs = df.pre.post[,var.dif.obs],
                            prd = df.pre.post[,var.dif.prd])
  prd.obs.plot(df.pred.obs, var)
}

Some pretty impressive results in there! There’s clearly an outlier plot that shows up in the ground fuels – I’m guessing this is a case of a new snag that fell in the middle of a plot, resulting in a massive increase in ground fuels. I think we can safely remove that plot, because it’s not an accurate reflection of fuel consumption. Here are the results with that plot omitted:

Show Code
# remove the plot that had an extreme increase in ground fuels
df.pre.post <- df.pre.post[df.pre.post$hr1000_dif > -300,]

# loop through the variables and compare consumption predictions to observations
par(mfrow = c(4,4))
for (var in vars){
  var.dif.prd <- paste0(var, "_dif_prd")
  var.dif.obs <- paste0(var, "_dif")
  if (!all(c(var.dif.prd, var.dif.obs) %in% colnames(df.pre.post))) next
  df.pred.obs <- data.frame(obs = df.pre.post[,var.dif.obs],
                            prd = df.pre.post[,var.dif.prd])
  perf.mets <- prd.obs.plot(df.pred.obs, var)
  if (var == vars[1]){
    perf.mets.prepost <- perf.mets
  } else {
    perf.mets.prepost <- rbind(perf.mets.prepost, perf.mets)
  }
}

Show Code
perf.mets.prepost$method <- "prepost"

Those results look better.

1b. Pooled model

Again, the idea here is to create a single structural predictive model that incorporates both pre- and post-fire fuel plot and lidar metrics data. This increases the sample size, and potentially captures more structural nuance within a single model. It also introduces potential autocorrelation issues, since each plot contributes both a pre- and a post-fire observation, but we can address that later if needed through a more strategic evaluation procedure (one option is sketched below).
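For reference, one such evaluation would be a leave-plot-out cross-validation, in which a plot's pre- and post-fire rows always fall in the same fold so the two measurements of a plot never straddle the train/test split. Here is a minimal sketch, assuming a mod.df shaped like the pooled one built in the next chunk (Plot ID in column 1, response in column 2, predictors after); it is not run here.

Show Code
# sketch: leave-plot-out (grouped) CV for the pooled model
grouped.cv <- function(mod.df, k = 5){
  mod.df <- na.omit(mod.df)
  # strip the "_pre"/"_pst" suffix to recover the base plot ID
  base.id <- sub("_(pre|pst)$", "", mod.df$Plot)
  # assign each base plot (not each row) to a fold
  folds <- sample(rep(1:k, length.out = length(unique(base.id))))
  names(folds) <- unique(base.id)
  preds <- rep(NA_real_, nrow(mod.df))
  form <- as.formula(paste0(colnames(mod.df)[2], " ~ ."))
  for (f in 1:k){
    test <- folds[base.id] == f
    mod <- ranger(form, data = mod.df[!test, -1])
    preds[test] <- predict(mod, data = mod.df[test, -1])$predictions
  }
  data.frame(Plot = mod.df$Plot, obs = mod.df[, 2], prd = preds)
}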

Show Code
# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)
vars.pre <- paste0(vars, "_pre")
vars.pst <- paste0(vars, "_pst")

# subset the fuels data
fuel.plots.pooled.pre <- fuel.plots.pre[,c("Plot", vars.pre)] |> st_drop_geometry()
fuel.plots.pooled.pst <- fuel.plots.pst[,c("Plot", vars.pst)] |> st_drop_geometry()

# remove "_pre" and "_pst" from column names to enable row merging
names(fuel.plots.pooled.pre) <- sub("_pre", "", names(fuel.plots.pooled.pre))
names(fuel.plots.pooled.pst) <- sub("_pst", "", names(fuel.plots.pooled.pst))

# add "_pre" and "_pst" to plot ids
fuel.plots.pooled.pre$Plot <- paste0(fuel.plots.pooled.pre$Plot, "_pre")
fuel.plots.pooled.pst$Plot <- paste0(fuel.plots.pooled.pst$Plot, "_pst")

# merge rowwise
fuel.plots.pooled <- bind_rows(fuel.plots.pooled.pre, fuel.plots.pooled.pst)

# merge mets rowwise too
mets.pooled <- bind_rows(mets.pre, mets.pst)

# remove columns with nas
mets.pooled <- mets.pooled[,apply(mets.pooled, 2, function(x) sum(is.na(x)) == 0)]

# loop through variables and model, plot, and compile results of each
par(mfrow = c(5,4))
successes <- 0
for (var in vars){
  resp.var <- fuel.plots.pooled[var]
  plot.id <- fuel.plots.pooled["Plot"]
  if (length(unique(resp.var[,1])) < 5){
    next
  } else {
    successes <- successes + 1
  }
  mod.df <- cbind(plot.id, resp.var, mets.pooled)
  df.pred.obs <- tuned.model(mod.df)
  prd.obs.plot(df.pred.obs, var)
  df.pred.pooled <- data.frame(Plot = df.pred.obs$Plot,
                            x = df.pred.obs$prd)
  colnames(df.pred.pooled)[2] <- paste0(var, "_prd")
  if (successes == 1){
    df.pred.pooled.merge <- df.pred.pooled
  } else {
    df.pred.pooled.merge <- merge(df.pred.pooled.merge, df.pred.pooled, all = T)
  }
}

Pretty impressive! Let’s see how it performs in the context of fuel consumption.

Show Code
# separate back out the pre- and post-fire plots
df.pred.pooled.pre <- df.pred.pooled.merge[grep("_pre", df.pred.pooled.merge$Plot),]
df.pred.pooled.pst <- df.pred.pooled.merge[grep("_pst", df.pred.pooled.merge$Plot),]

# add "_pre" and "_pst" to prediction column names
colnames(df.pred.pooled.pre) <- sub("_prd", "_pre_prd", colnames(df.pred.pooled.pre))
colnames(df.pred.pooled.pst) <- sub("_prd", "_pst_prd", colnames(df.pred.pooled.pst))

# remove "_pre" and "_pst" from plot names
df.pred.pooled.pre$Plot <- sub("_pre", "", df.pred.pooled.pre$Plot)
df.pred.pooled.pst$Plot <- sub("_pst", "", df.pred.pooled.pst$Plot)

# merge pre and post fire predictions
df.pred.pooled.merge <- merge(df.pred.pooled.pre, df.pred.pooled.pst, all = T)

# get fuel metric names
vars <- colnames(df.pred.pooled.merge) |>
  sub("_pre_prd", "", x = _) |>
  sub("_pst_prd", "", x = _) |>
  unique()
vars <- vars[vars != "Plot"]

# loop through them and calculate differences
for (var in vars){
  var.pre <- paste0(var, "_pre_prd")
  var.pst <- paste0(var, "_pst_prd")
  var.dif <- paste0(var, "_dif_prd")
  if (!all(c(var.pre, var.pst) %in% colnames(df.pred.pooled.merge))) next
  df.pred.pooled.merge[,var.dif] <- df.pred.pooled.merge[,var.pre] - df.pred.pooled.merge[,var.pst]
}

# join back to fuel plot data
df.pooled <- fuel.plots |>
  as.data.frame() |>
  merge(df.pred.pooled.merge, all = T)

# loop through the variables and compare consumption predictions to observations
par(mfrow = c(5,4))
for (var in vars){
  var.dif.prd <- paste0(var, "_dif_prd")
  var.dif.obs <- paste0(var, "_dif")
  if (!all(c(var.dif.prd, var.dif.obs) %in% colnames(df.pooled))) next
  df.pred.obs <- data.frame(obs = df.pooled[,var.dif.obs],
                            prd = df.pooled[,var.dif.prd])
  perf.mets <- prd.obs.plot(df.pred.obs, var)
  if (var == vars[1]){
    perf.mets.pooled <- perf.mets
  } else {
    perf.mets.pooled <- rbind(perf.mets.pooled, perf.mets)
  }
}
perf.mets.pooled$method <- "pooled"

Interesting. I would have expected it to be better than the PrePost modeling approach, but it actually seemed to do somewhat worse. I would guess that the different lidar pulse densities (and perhaps other factors, like time of year) of the pre- and post-fire acquisitions yielded enough deviation in the structural metrics to negatively impact pre-/post-fire comparisons when modeled together.
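As a quick sanity check on that pulse-density hunch, one could clip a single plot footprint from each acquisition and compare returns per square meter (assuming the catalog CRS units are meters). This is purely diagnostic and uses the same 8 m radius as the metric extraction above.

Show Code
# rough per-plot density check: returns per m^2 within one 8 m plot footprint
xy.pre <- st_coordinates(fuel.plots.pre)[1, ]
xy.pst <- st_coordinates(fuel.plots.pst)[1, ]
las.pre.ex <- clip_circle(ctg.pre, xy.pre[1], xy.pre[2], radius = 8)
las.pst.ex <- clip_circle(ctg.pst, xy.pst[1], xy.pst[2], radius = 8)
c(pre = npoints(las.pre.ex), pst = npoints(las.pst.ex)) / (pi * 8^2)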

Approach #2: Profile Area Change (PAC)

One could argue that PAC is conceptually simpler than the Modeled Fuel Load Subtraction approach. It basically assumes that changes in the lidar-derived vegetation profile before and after a fire should correspond to changes in pre- and post-fire fuel loads. Very logically consistent. Indeed, its sole reliance on vegetation height quantiles makes it very appealing considering the broader objective of this study – to use post-fire-only lidar to predict pre-fire lidar structure for fuel consumption mapping purposes. Height quantiles are likely easier to predict than some of the more structurally complex lidar metrics used in the Modeled Fuel Load Subtraction portion of this analysis.

Below, I’ll calculate height percentiles on all of the plots and then plot out one example to show how PAC is calculated.

Show Code
# define percentile metrics function
perc.fun <- function(z){
  q <- quantile(z, seq(0,1,0.01)) |>
    as.list()
  names(q) <- paste0("p_", stringr::str_pad(seq(0,100,1), 3, "left", "0"))
  return(q)
}

# reproject plots to match crs of pre and pst ctgs
fuel.plots.pre <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pre))
fuel.plots.pst <- fuel.plots |>
  st_as_sf() |>
  st_transform(st_crs(ctg.pst))

# run plot metrics
percs.pre <- plot_metrics(
  las = ctg.pre,
  func = ~perc.fun(Z),
  geometry = fuel.plots.pre,
  radius = 8
) |>
  st_drop_geometry()

percs.pst <- plot_metrics(
  las = ctg.pst,
  func = ~perc.fun(Z),
  geometry = fuel.plots.pst,
  radius = 8
) |>
  st_drop_geometry()

# separate out predictor and response data frames
resp.pre <- fuel.plots.pre |> st_drop_geometry()
resp.pst <- fuel.plots.pst |> st_drop_geometry()
percs.pre <- percs.pre[,!colnames(percs.pre) %in% colnames(fuel.plots.pre)]
percs.pst <- percs.pst[,!colnames(percs.pst) %in% colnames(fuel.plots.pst)]

# set negative values to zero
percs.pre[percs.pre < 0] <- 0
percs.pst[percs.pst < 0] <- 0

# plot an example out
ex.row <- 1
percs.pre.ex <- percs.pre[ex.row,]
percs.pst.ex <- percs.pst[ex.row,]
par(mar = c(5,5,1,1), las = 1)
plot(x = c(0,100), y = range(c(percs.pre.ex, percs.pst.ex)),
     type = "n", xlab = "Percentile", ylab = "Height (m)")
grid()
polygon(x = c(0, seq(0,100), 100),
        y = c(0, percs.pre.ex, 0),
        col = adjust_transparency(2, 0.5),
        border = NA)
lines(x = seq(0,100), y = percs.pre.ex, col = 2, lwd = 2)
polygon(x = c(0, seq(0,100), 100),
        y = c(0, percs.pst.ex, 0),
        col = adjust_transparency(4, 0.5),
        border = NA)
lines(x = seq(0,100), y = percs.pst.ex, col = 4, lwd = 2)
legend("topleft", legend = c("Pre-fire", "Post-fire"), col = c(2, 4), 
       fill = c(adjust_transparency(2, 0.5), adjust_transparency(4, 0.5)))

In the figure above, PAC would be the difference in area under the curve between pre- and post-fire height distributions. Clearly there was much more fuel volume in this one example plot before the fire than after.

As with anything, there are several different ways PAC could be calculated. The simplest would be to calculate the AUC of both of these curves, as you see them, and subtract the two. In the Hu et al. (2019) paper, they first normalize the heights to the maximum of the pre- and post-fire heights for each pixel/plot, effectively coercing the y-axis to a range of 0-1. This levels the playing field for different fuel types, which could be advantageous, but it could also inflate the PAC-based fuel consumption in shorter-stature fuel types. In either case (the raw height-based or the normalized height-based PAC), you could calculate PAC across the full height range (as they did in the paper), but you could also calculate PAC within vertical slices of the profile. This may be advantageous for separating out different portions of the profile for different fuel components.

So, in summary, I’ll try four different things:

  • 2a. Raw height PAC for full profile
  • 2b. Normalized height PAC for full profile
  • 2c. Raw height PAC for profile slices
  • 2d. Normalized height PAC for profile slices

2a. Raw height PAC for full profile

This approach will yield a single PAC value for each plot. So, using a simple OLS regression, this approach will seek to understand the degree to which a single measure of PAC can explain consumption across several different fuel components. My assumption is that it won’t do that well, but let’s see…

Show Code
# create auc function
auc.fun <- function(y){
  x <- seq(0,100)
  auc <- integrate(approxfun(x, y), 0, 100)$value
  return(auc)
}

# calculate raw auc on full profiles for pre- and post-fire data
auc.raw.pre <- apply(percs.pre, 1, auc.fun)
auc.raw.pst <- apply(percs.pst, 1, auc.fun)

# calculate pac
pac.raw <- auc.raw.pre - auc.raw.pst

# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)
vars <- paste0(vars, "_dif")

# loop through them
par(mfrow = c(5,4), las = 1, mar = c(5,5,2,1))
for (var in vars){
  x <- pac.raw
  y <- fuel.plots[,var] |> as.data.frame()
  y <- y[,1]
  x <- x[!is.na(y)]
  y <- y[!is.na(y)]
  mod <- lm(y ~ x)
  prd <- predict(mod)
  df.pred.obs <- data.frame(prd = prd, obs = y)
  prd.obs.plot(df.pred.obs, var)
}

OK, so not great. Definitely some positive trends, suggesting it’s not totally useless, but it’s too simple of a measure to be able to capture the nuance of fuel consumption – especially of different fuel components.

2b. Normalized height PAC for full profile

This next approach mirrors that which Hu et al. (2019) did, apparently with great success! Instead of using raw vegetation heights, the pre-/post-fire height values are first normalized for each plot to the maximum height between the two measurements. This may be particularly useful in this case, given the much higher point density of the post-fire lidar (and the associated greater likelihood of capturing tree tops).

Show Code
# create auc function
auc.fun <- function(y){
  x <- seq(0,100)
  auc <- integrate(approxfun(x, y), 0, 100)$value
  return(auc)
}

# get plot-level maximum heights
max.hgt.pre <- apply(percs.pre, 1, max)
max.hgt.pst <- apply(percs.pst, 1, max)
max.hgt <- apply(cbind(max.hgt.pre, max.hgt.pst), 1, max)

# normalize heights to max
percs.pre.norm <- percs.pre
for (i in seq(1,nrow(percs.pre.norm))){
  percs.pre.norm[i,] <- percs.pre.norm[i,] / max.hgt[i]
}
percs.pst.norm <- percs.pst
for (i in seq(1,nrow(percs.pst.norm))){
  percs.pst.norm[i,] <- percs.pst.norm[i,] / max.hgt[i]
}

# calculate raw auc on full profiles for pre- and post-fire data
auc.norm.pre <- apply(percs.pre.norm, 1, auc.fun)
auc.norm.pst <- apply(percs.pst.norm, 1, auc.fun)

# calculate pac
pac.norm <- auc.norm.pre - auc.norm.pst

# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)
vars <- paste0(vars, "_dif")

# loop through them
par(mfrow = c(5,4), las = 1, mar = c(5,5,2,1))
for (var in vars){
  x <- pac.norm
  y <- fuel.plots[,var] |> as.data.frame()
  y <- y[,1]
  x <- x[!is.na(y)]
  y <- y[!is.na(y)]
  mod <- lm(y ~ x)
  prd <- predict(mod)
  df.pred.obs <- data.frame(prd = prd, obs = y)
  prd.obs.plot(df.pred.obs, var)
}

…Essentially the same. A few better results, a few worse. Moral of the story: a singular measure of PAC is not good enough to quantify fuel consumption.

2c. Raw height PAC for profile slices

If a singular measure is insufficient, perhaps evaluating profile changes within vertical subsets will be more closely linked to field-measured fuel consumption. Let’s first try it out on the raw heights. I will create a series of slices defined by pairs of percentile breakpoints in increments of 10 (e.g., 0-10th, 0-20th, 10-20th, etc.). Since I will wind up with a large list of potential predictors, I will fall back on the variable selection-aided random forest modeling approach used in the Modeled Fuel Load Subtraction analysis.

Show Code
# create auc function
auc.slice.fun <- function(y, slice.min, slice.max){
  x <- seq(0,100)
  auc <- integrate(splinefun(x, y), slice.min, slice.max)$value
  return(auc)
}

# begin slice.min loop
counter <- 0
for (i in seq(0,100,10)){
  for (j in seq(0,100,10)){
    if (j > i){
      counter <- counter + 1
      auc.raw.slice.pre <- apply(percs.pre, 1, function(x) auc.slice.fun(x, i, j))
      auc.raw.slice.pst <- apply(percs.pst, 1, function(x) auc.slice.fun(x, i, j))
      pac.raw.slice <- auc.raw.slice.pre - auc.raw.slice.pst
      if (counter == 1){
        pac.raw.slice.df <- data.frame(x = pac.raw.slice)
        colnames(pac.raw.slice.df) <- paste0("pac_", i, "_", j)
      } else {
        pac.raw.slice.df <- cbind(pac.raw.slice.df, data.frame(x = pac.raw.slice))
        colnames(pac.raw.slice.df)[ncol(pac.raw.slice.df)] <- paste0("pac_", i, "_", j)
      }
    }
  }
}

# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)
vars <- paste0(vars, "_dif")

# get plot ids
plot.ids <- fuel.plots[,"Plot"] |> as.data.frame()

# loop through them
par(mfrow = c(5,4), las = 1, mar = c(5,5,2,1))
for (var in vars){
  y <- fuel.plots[,var] |> as.data.frame()
  x <- pac.raw.slice.df
  mod.df <- cbind(plot.ids, y, x)
  df.pred.obs <- tuned.model(mod.df)
  perf.mets <- prd.obs.plot(df.pred.obs, var)
  if (var == vars[1]){
    perf.mets.rawpacslice <- perf.mets
  } else {
    perf.mets.rawpacslice <- rbind(perf.mets.rawpacslice, perf.mets)
  }
}
perf.mets.rawpacslice$method <- "rawpacslice"

Much better than the singular PAC measure! And in some cases, nearly as good as the Modeled Fuel Load Subtraction approach. Interesting…

2d. Normalized height PAC for profile slices

The last thing to try is the same multi-slice approach, but using normalized instead of raw heights.

Show Code
# create auc function
auc.slice.fun <- function(y, slice.min, slice.max){
  x <- seq(0,100)
  auc <- integrate(splinefun(x, y), slice.min, slice.max)$value
  return(auc)
}

# begin slice.min loop
counter <- 0
for (i in seq(0,100,10)){
  for (j in seq(0,100,10)){
    if (j > i){
      counter <- counter + 1
      auc.norm.slice.pre <- apply(percs.pre.norm, 1, function(x) auc.slice.fun(x, i, j))
      auc.norm.slice.pst <- apply(percs.pst.norm, 1, function(x) auc.slice.fun(x, i, j))
      pac.norm.slice <- auc.norm.slice.pre - auc.norm.slice.pst
      if (counter == 1){
        pac.norm.slice.df <- data.frame(x = pac.norm.slice)
        colnames(pac.norm.slice.df) <- paste0("pac_", i, "_", j)
      } else {
        pac.norm.slice.df <- cbind(pac.norm.slice.df, data.frame(x = pac.norm.slice))
        colnames(pac.norm.slice.df)[ncol(pac.norm.slice.df)] <- paste0("pac_", i, "_", j)
      }
    }
  }
}

# define response variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)
vars <- paste0(vars, "_dif")

# get plot ids
plot.ids <- fuel.plots[,"Plot"] |> as.data.frame()

# loop through them
par(mfrow = c(5,4), las = 1, mar = c(5,5,2,1))
for (var in vars){
  y <- fuel.plots[,var] |> as.data.frame()
  x <- pac.norm.slice.df
  mod.df <- cbind(plot.ids, y, x)
  df.pred.obs <- tuned.model(mod.df)
  perf.mets <- prd.obs.plot(df.pred.obs, var)
  if (var == vars[1]){
    perf.mets.normpacslice <- perf.mets
  } else {
    perf.mets.normpacslice <- rbind(perf.mets.normpacslice, perf.mets)
  }
}
perf.mets.normpacslice$method <- "normpacslice"

Results Comparison

Lastly, let’s take a look at all of the results together to compare methods. Given the poor performance of the singular PAC measures, I’ll compare four methods for fuel consumption mapping:

  1. Modeled Fuel Load Subtraction
    1. PrePost model
    2. Pooled model
  2. Profile Area Change
    1. Raw height PAC for profile slices
    2. Normalized height PAC for profile slices
Show Code
# define the variables of interest
vars <- c(
  "hr1", "hr10", "hr100", "hr1000", "nonwody", "trees", "llm", 
  "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af"
)

# remove "_dif" from variable names in performance metrics tables
perf.mets.rawpacslice$var <- sub("_dif", "", perf.mets.rawpacslice$var)
perf.mets.normpacslice$var <- sub("_dif", "", perf.mets.normpacslice$var)

# set up the plot
par(mfrow = c(1,2), las = 1)

# get ranges
r2.range <- range(c(
  perf.mets.prepost$r2,
  perf.mets.pooled$r2,
  perf.mets.rawpacslice$r2,
  perf.mets.normpacslice$r2
))
rrmse.range <- range(c(
  perf.mets.prepost$rrmse,
  perf.mets.pooled$rrmse,
  perf.mets.rawpacslice$rrmse,
  perf.mets.normpacslice$rrmse
))

# create empty plot
x <- rep(100,length(vars))
names(x) <- vars
dotchart(x, pch = 16, xlab = expression(R^2),
         xlim = c(0,1))

# loop though variables
for (i in seq(1,length(vars))){
  var <- vars[i]
  points(x = perf.mets.prepost$r2[perf.mets.prepost$var == var],
         y = i, pch = 16, col = 1)
  points(x = perf.mets.pooled$r2[perf.mets.pooled$var == var],
         y = i, pch = 16, col = 2)
  points(x = perf.mets.rawpacslice$r2[perf.mets.rawpacslice$var == var],
         y = i, pch = 16, col = 3)
  points(x = perf.mets.normpacslice$r2[perf.mets.normpacslice$var == var],
         y = i, pch = 16, col = 4)
}

# create empty plot
y <- rep(10000, length(vars))
names(y) <- vars
dotchart(y, pch = 16, xlab = expression(symbol("%")*RMSE),
         xlim = c(0,200))

# loop though variables
for (i in seq(1,length(vars))){
  var <- vars[i]
  points(x = perf.mets.prepost$rrmse[perf.mets.prepost$var == var],
         y = i, pch = 16, col = 1)
  points(x = perf.mets.pooled$rrmse[perf.mets.pooled$var == var],
         y = i, pch = 16, col = 2)
  points(x = perf.mets.rawpacslice$rrmse[perf.mets.rawpacslice$var == var],
         y = i, pch = 16, col = 3)
  points(x = perf.mets.normpacslice$rrmse[perf.mets.normpacslice$var == var],
         y = i, pch = 16, col = 4)
}

# add legend
legend("topright", legend = c("PrePost", "Pooled", "Raw PAC Slice", "Norm PAC Slice"),
       pch = 16, col = c(1,2,3,4))

So, clearly the PrePost approach does the best nearly across the board. This raises the question: can the lidar predictor variables that were used in the PrePost models be estimated? It also raises the question: which among these fuel variables are good enough to even move forward with modeling? Obviously some of them (e.g., trees, nonwoody, shrb_sd, 1-100 hr fuels) are pretty garbage-y. But several are quite good. So, maybe the focus should be on modeling the variables that we can more accurately predict.

I propose the following… We focus on the integrated fuel metrics: af, tf, acf, cf, and uf. We take a look at which lidar predictor variables were selected for those models in the PrePost modeling framework (a quick sketch of how to pull those out follows). If the list of variables looks reasonable, then we explore the process for modeling them.
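Here is a sketch of how the VSURF-selected predictors could be pulled out for one of those metrics (af_pre, as an example); tuned.model() above discards this information, so this simply repeats its selection step and reports the chosen column names.

Show Code
# re-run the VSURF selection step for one integrated fuel metric and report the
# predictor names it retains (tuned.model() above does not return these)
resp.var <- fuel.plots.pre[, "af_pre"] |> st_drop_geometry()
ok <- !is.na(resp.var[, 1])
v <- VSURF(
  x = mets.pre[ok, ],
  y = resp.var[ok, 1],
  RFimplem = "ranger",
  parallel = T,
  ncores = 18,
  verbose = F
)
colnames(mets.pre)[v$varselect.pred]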

I think a bigger question is: is any of what is already done here sufficiently novel to consider publishing? I don’t want to step on anyone’s toes, and I don’t know if you, Ben Bright, Ryan McCarley, or anyone else is already working towards Monroe Mountian fuel consumption mapping stuff. If so, I’ll happily yield the floor. But if no one is, then maybe these results in and of themselves warrant consideration for publication?