A focus on model comparisons, with spatial cross-validation and a new consumption modeling approach
Author
Mickey Campbell
Introduction
This document serves as an addendum to Monroe Mountain Fuel Consumption Modeling. It will be comparably sparse in explanatory text, since most details are contained within the previously linked document. The primary objectives of this document are:
1. To update the results using the smaller, shorter-term post-fire lidar datasets.
The previous document focused on results driven by a comparison between the full Monroe Mountain 2016 and 2023 lidar datasets. While this serves as a nice, spatially comprehensive comparison, the fires burned in 2019-2020. Given the time lag between fire and lidar data (3-4 years), a lot of regeneration could have (and likely has) occurred, likely adding uncertainty into the fuel consumption modeling procedure. By focusing on two post-fire lidar dataset that were collected 1 year post-fire, we can hopefully more accurately capture consumption before the burn areas had a chance to regrow.
2. To compare results between different fuel consumption modeling approaches, but using a more robust validation technique.
In a meeting with Andy on Friday, 6/7/2024, it was revealed that spatial cross-validation (as opposed to simple k-fold or out-of-bag model assessment) demonstrated relatively poor performance in the fuel consumption models. That is, when you test the models on a spatially distinct cluster of fuel plots that are separated from the others on which the model is trained, the predictive power decreases substantially. However, this is the best way to assess how generalizable these models are, and how well they would perform when applied to truly novel fuel and fire conditions. Thus, they represent a considerably more robust performance assessment than the simple out-of-bag estimates that were contained within the previous document.
3. To add a new fuel consumption modeling technique into the mix: profile density change (PDC).
In the same meeting, Andy raised the question of using densities, rather than height quantiles (as in the PAC method), as the basis of fuel consumption modeling. So, I implemented that method, which I am calling profile density change. Pretty simply, the technique just involves using a Gaussian kernel density estimator to quantify the point density at a series of vertical heights above the ground. This is done for both pre- and post-fire lidar data, and the differences between height-specific densities are used as candidate predictor variables for modeling fuel consumption.
4. For both PAC and PDC, adding pre-fire structure into the set of candidate predictor variables.
Pperhaps structural differences (height quantiles for PAC and point densities for PDC) alone are insufficient for quantifying consumption. Perhaps by adding the pre-fire height quantiles/point densities into the mix of candidate predictor variables, the models will more accurately be able to suss out how different pre-fire conditions are linked with consumption of different fuel components. For example, if the model knows that a subset of plots comprised tall forests before the fire, then it may be able to leverage that knowledge to understand which difference variables should be more indicative of canopy fuel consumption.
5. Test a combined PAC + PDC model.
Preliminary results were demonstrating that PAC was good at capturing some variables and PDC was good at capturing others. Given the variable selection approach baked into my modeling procedure, perhaps we can benefit from the best of both worlds while modeling fuel consumption by throwing all of the above into the soup of candidate predictors.
Field Data
Nothing new here – just repeating the same field plot wrangling and filtering that was done in the previous document.
Show Code
library(terra)library(rmarkdown)library(sf)library(lidR)library(lidRmetrics)library(VSURF)library(tuneRanger)library(ranger)library(mlr)library(dplyr)library(colorspace)library(RColorBrewer)library(stringr)# set random seedset.seed(5757)# read in the datafuel.plots <-vect("S:/ursa/campbell/fasmee/data/from_ryan/plots_wFuels.shp")# remove empty plotsfuel.plots <- fuel.plots[!is.na(fuel.plots$FID),]# separate pre-fire and post-fire plotsfuel.plots.pre <- fuel.plots[grep("pre", fuel.plots$FID),]fuel.plots.pst <- fuel.plots[grep("post", fuel.plots$FID),]# remove unnecessary columnsdel.cols <-c("FID", "X", "Y")keep.cols <-names(fuel.plots)[!names(fuel.plots) %in% del.cols]fuel.plots.pre <- fuel.plots.pre[,keep.cols]fuel.plots.pst <- fuel.plots.pst[,keep.cols]# move plot ID column from last to firstn <-ncol(fuel.plots.pre)fuel.plots.pre <- fuel.plots.pre[,c(n,1:(n-1))]fuel.plots.pst <- fuel.plots.pst[,c(n,1:(n-1))]# add "pre" and "pst" to column names for all attributesdif.names <-names(fuel.plots.pre)[2:ncol(fuel.plots.pre)]names(fuel.plots.pre)[2:ncol(fuel.plots.pre)] <-paste0(names(fuel.plots.pre)[2:ncol(fuel.plots.pre)], "_pre")names(fuel.plots.pst)[2:ncol(fuel.plots.pst)] <-paste0(names(fuel.plots.pst)[2:ncol(fuel.plots.pst)], "_pst")# merge themfuel.plots <-merge(fuel.plots.pre, as.data.frame(fuel.plots.pst))# pad single-digit plot ids with 0'sfuel.plots$Plot[as.numeric(fuel.plots$Plot) <100] <-paste0("00", fuel.plots$Plot[as.numeric(fuel.plots$Plot) <100])# sort of plot idfuel.plots <- fuel.plots[order(fuel.plots$Plot),]# get differences between pre- and post-fire fuel loads (i.e., calc consumption)for (dif.name in dif.names){ pre.fld <-paste0(dif.name, "_pre") pst.fld <-paste0(dif.name, "_pst") dif.fld <-paste0(dif.name, "_dif") fuel.plots[,dif.fld] <-as.data.frame(fuel.plots[,pre.fld]) -as.data.frame(fuel.plots[,pst.fld])}# remove the plot that had an extreme increase in ground fuelsfuel.plots <- fuel.plots[fuel.plots$hr1000_dif >-300,]
Definition of Field Names
For clarity, here are the field names from the fuel plot data again:
yr: year of measurment
hr1: 1 hour biomass
hr10: 10 hours biomass
hr100: 100 hour biomass
hr1000: 1000 hour biomass
nonwoody: herb biomass
shrubs: shrub biomass
trees: seedling biomass
llm: litter, lichen, and moss biomass
ground: duff biomass
shrb_sd: combined biomass of shrubs and seedlings
woody: combined biomass of 1hr, 10hr, 100hr, and 1000hr
uf: combined biomass of nonwoody, llm, ground, shrubs, trees, and woody
cf: tree biomass, including snags as well as live boles, branches, and foliage
cbd: canopy bulk density
cbh: canopy base height
acf: available canopy fuel (foliage and fine branches)
tf: total fuel (all biomass combined)
af: available fuel (combined biomass of uf and acf)
And here are the self-explanatory suffixes I’ve added:
pre: pre-fire measurement
pst: post-fire measurement
dif: difference between pre- and post-fire measurements
As I understand it, all of the fields are measured in Mg/ha, with the exception of cbd and cbh, which are probably measured in kg/m3 and m, respectively.
Modeling Fuel Consumption
Approach #1: Modeled Fuel Load Subtraction
Again, for more details on this, please refer back to the previous document. But, briefly, I will test two modeling approaches:
1a. Modeling pre- and post-fire fuel load separately (PrePost model). 1b. Building a single model that incorporates both pre- and post-fire fuel load data (Pooled model).
Both approaches predict fuel loads before and after the fire separately, the subtraction of which yields a fuel consumption estimate, which is assessed against field-measured consumption.
# suppress progress baroptions(lidR.progress = F)# directory structureals.dir.pre <-"S:/ursa/campbell/fasmee/data/lidar_data/monroe_mtn_lidar/a04_height"# als.dir.pst <- "S:/ursa2/campbell/ems4d/data/als/als_2023/LAZ/Eastburn_processed"als.dir.pst.ann <-"S:/ursa/campbell/fasmee/data/lidar_data/monroe_mtn_postfire_lidar/Annabella2021/LAZ/a03_hgt"als.dir.pst.mon <-"S:/ursa/campbell/fasmee/data/lidar_data/monroe_mtn_postfire_lidar/Monroe2020/LAZ_1.4/a03_hgt"# read in lascatalogsctg.pre <-readLAScatalog(als.dir.pre, filter ="-drop_withheld -drop_class 7 18", progress = F)# ctg.pst <- readLAScatalog(als.dir.pst, filter = "-drop_withheld -drop_class 7 18", progress = F)ctg.pst.ann <-readLAScatalog(als.dir.pst.ann, filter ="-drop_withheld -drop_class 7 18", progress = F)ctg.pst.mon <-readLAScatalog(als.dir.pst.mon, filter ="-drop_withheld -drop_class 7 18", progress = F)# reproject plots to match crs of pre and pst ctgsfuel.plots.pre <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pre))# fuel.plots.pst <- fuel.plots |># st_as_sf() |># st_transform(st_crs(ctg.pst))fuel.plots.pst.ann <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pst.ann)) |>subset(substr(Plot, 1, 1) %in%c("5", "7"))fuel.plots.pst.mon <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pst.mon)) |>subset(!substr(Plot, 1, 1) %in%c("5", "7"))# run plot metricsmets.pre <-plot_metrics(las = ctg.pre,func =~lidRmetrics::metrics_set3(x = X,y = Y,z = Z,i = Intensity,ReturnNumber = ReturnNumber,NumberOfReturns = NumberOfReturns ),geometry = fuel.plots.pre,radius =8) |>as.data.frame()# mets.pst <- plot_metrics(# las = ctg.pst,# func = ~lidRmetrics::metrics_set3(# x = X,# y = Y,# z = Z,# i = Intensity,# ReturnNumber = ReturnNumber,# NumberOfReturns = NumberOfReturns# ),# geometry = fuel.plots.pst,# radius = 8# ) |># as.data.frame()mets.pst.ann <-plot_metrics(las = ctg.pst.ann,func =~lidRmetrics::metrics_set3(x = X,y = Y,z = Z,i = Intensity,ReturnNumber = ReturnNumber,NumberOfReturns = NumberOfReturns ),geometry = fuel.plots.pst.ann,radius =8) |>as.data.frame()mets.pst.mon <-plot_metrics(las = ctg.pst.mon,func =~lidRmetrics::metrics_set3(x = X,y = Y,z = Z,i = Intensity,ReturnNumber = ReturnNumber,NumberOfReturns = NumberOfReturns ),geometry = fuel.plots.pst.mon,radius =8) |>as.data.frame()# merge the ann and mon datasetsmets.pst <-rbind(mets.pst.ann, mets.pst.mon)# sort the datamets.pre <- mets.pre[order(mets.pre$Plot),]mets.pst <- mets.pst[order(mets.pst$Plot),]# separate out predictor and response data framesmets.pre <- mets.pre[,!colnames(mets.pre) %in%colnames(fuel.plots.pre)]mets.pst <- mets.pst[,!colnames(mets.pst) %in%colnames(fuel.plots.pre)]# remove unscalable variablesdel.cols <-c("n", "n_first", "n_intermediate", "n_last", "n_single","n_multiple", "n_return_1", "n_return_2", "n_return_3","n_return_4", "vn")# remove predictors with no variabilitysds.pre <-apply(mets.pre, 2, sd)sds.pst <-apply(mets.pst, 2, sd)del.cols.pre <-c(del.cols, names(sds.pre)[sds.pre ==0&!is.na(sds.pre)])del.cols.pst <-c(del.cols, names(sds.pst)[sds.pst ==0&!is.na(sds.pst)])# remove predictors with na valuesnas.pre <-apply(mets.pre, 2, function(x) sum(is.na(x)))nas.pst <-apply(mets.pst, 2, function(x) sum(is.na(x)))del.cols.pre <-c(del.cols.pre, names(nas.pre)[nas.pre >0])del.cols.pst <-c(del.cols.pst, names(nas.pst)[nas.pst >0])# keep columnskeep.cols.pre <-colnames(mets.pre)[!colnames(mets.pre) %in% del.cols.pre]keep.cols.pst <-colnames(mets.pst)[!colnames(mets.pst) %in% del.cols.pst]mets.pre <- mets.pre[,keep.cols.pre]mets.pst <- mets.pst[,keep.cols.pst]# define response variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")vars.pre <-paste0(vars, "_pre")# loop through variables and model, plot, and compile results of eachsuccesses <-0for (var.pre in vars.pre){ resp.var <- fuel.plots[,var.pre] |>as.data.frame() plot.id <- fuel.plots[,"Plot"] |>as.data.frame()if (length(unique(resp.var[,1])) <5){next } else { successes <- successes +1 } mod.df <-cbind(plot.id, resp.var, mets.pre) df.pred.obs <-tuned.model(mod.df) df.pred.pre <-data.frame(Plot = df.pred.obs$Plot,x = df.pred.obs$prd)colnames(df.pred.pre)[2] <-paste0(var.pre, "_prd")if (successes ==1){ df.pred.pre.merge <- df.pred.pre } else { df.pred.pre.merge <-merge(df.pred.pre.merge, df.pred.pre, all = T) }}# define response variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")vars.pst <-paste0(vars, "_pst")# loop through variables and model, plot, and compile results of eachsuccesses <-0for (var.pst in vars.pst){ resp.var <- fuel.plots[,var.pst] |>as.data.frame() plot.id <- fuel.plots[,"Plot"] |>as.data.frame()if (length(unique(resp.var[,1])) <5){next } else { successes <- successes +1 } mod.df <-cbind(plot.id, resp.var, mets.pst) df.pred.obs <-tuned.model(mod.df) df.pred.pst <-data.frame(Plot = df.pred.obs$Plot,x = df.pred.obs$prd)colnames(df.pred.pst)[2] <-paste0(var.pst, "_prd")if (successes ==1){ df.pred.pst.merge <- df.pred.pst } else { df.pred.pst.merge <-merge(df.pred.pst.merge, df.pred.pst, all = T) }}# merge pre and post fire predictionsdf.pred.merge <-merge(df.pred.pre.merge, df.pred.pst.merge, all = T)# get fuel metric namesvars <-colnames(df.pred.merge) |>sub("_pre_prd", "", x = _) |>sub("_pst_prd", "", x = _) |>unique()vars <- vars[vars !="Plot"]# loop through them and calculate differencesfor (var in vars){ var.pre <-paste0(var, "_pre_prd") var.pst <-paste0(var, "_pst_prd") var.dif <-paste0(var, "_dif_prd")if (!all(c(var.pre, var.pst) %in%colnames(df.pred.merge))) next df.pred.merge[,var.dif] <- df.pred.merge[,var.pre] - df.pred.merge[,var.pst]}# join back to fuel plot datadf.pre.post <- fuel.plots |>as.data.frame() |>merge(df.pred.merge, all = T)# loop through the variables and compare consumption predictions to observationspar(mfrow =c(4,4))for (var in vars){ var.dif.prd <-paste0(var, "_dif_prd") var.dif.obs <-paste0(var, "_dif")if (!all(c(var.dif.prd, var.dif.obs) %in%colnames(df.pre.post))) next df.pred.obs <-data.frame(obs = df.pre.post[,var.dif.obs],prd = df.pre.post[,var.dif.prd]) perf.mets <-prd.obs.plot(df.pred.obs, var)if (var == vars[1]){ perf.mets.prepost <- perf.mets } else { perf.mets.prepost <-rbind(perf.mets.prepost, perf.mets) }}
Show Code
perf.mets.prepost$method <-"prepost"
1b. Pooled model
Show Code
# define response variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")vars.pre <-paste0(vars, "_pre")vars.pst <-paste0(vars, "_pst")# subset the fuels datafuel.plots.pooled.pre <- fuel.plots[,c("Plot", vars.pre)] |>as.data.frame()fuel.plots.pooled.pst <- fuel.plots[,c("Plot", vars.pst)] |>as.data.frame()# remove "_pre" and "_pst" from column names to enable row mergingnames(fuel.plots.pooled.pre) <-sub("_pre", "", names(fuel.plots.pooled.pre))names(fuel.plots.pooled.pst) <-sub("_pst", "", names(fuel.plots.pooled.pst))# add "_pre" and "_pst" to plot idsfuel.plots.pooled.pre$Plot <-paste0(fuel.plots.pooled.pre$Plot, "_pre")fuel.plots.pooled.pst$Plot <-paste0(fuel.plots.pooled.pst$Plot, "_pst")# merge rowwisefuel.plots.pooled <-bind_rows(fuel.plots.pooled.pre, fuel.plots.pooled.pst)# merge mets rowwise toomets.pooled <-bind_rows(mets.pre, mets.pst)# remove columns with nasmets.pooled <- mets.pooled[,apply(mets.pooled, 2, function(x) sum(is.na(x)) ==0)]# loop through variables and model, plot, and compile results of eachsuccesses <-0for (var in vars){ resp.var <- fuel.plots.pooled[var] plot.id <- fuel.plots.pooled["Plot"]if (length(unique(resp.var[,1])) <5){next } else { successes <- successes +1 } mod.df <-cbind(plot.id, resp.var, mets.pooled) df.pred.obs <-tuned.model(mod.df) df.pred.pooled <-data.frame(Plot = df.pred.obs$Plot,x = df.pred.obs$prd)colnames(df.pred.pooled)[2] <-paste0(var, "_prd")if (successes ==1){ df.pred.pooled.merge <- df.pred.pooled } else { df.pred.pooled.merge <-merge(df.pred.pooled.merge, df.pred.pooled, all = T) }}# separate back out the pre- and post-fire plotsdf.pred.pooled.pre <- df.pred.pooled.merge[grep("_pre", df.pred.pooled.merge$Plot),]df.pred.pooled.pst <- df.pred.pooled.merge[grep("_pst", df.pred.pooled.merge$Plot),]# add "_pre" and "_pst" to prediction column namescolnames(df.pred.pooled.pre) <-sub("_prd", "_pre_prd", colnames(df.pred.pooled.pre))colnames(df.pred.pooled.pst) <-sub("_prd", "_pst_prd", colnames(df.pred.pooled.pst))# remove "_pre" and "_pst" from plot namesdf.pred.pooled.pre$Plot <-sub("_pre", "", df.pred.pooled.pre$Plot)df.pred.pooled.pst$Plot <-sub("_pst", "", df.pred.pooled.pst$Plot)# merge pre and post fire predictionsdf.pred.pooled.merge <-merge(df.pred.pooled.pre, df.pred.pooled.pst, all = T)# get fuel metric namesvars <-colnames(df.pred.pooled.merge) |>sub("_pre_prd", "", x = _) |>sub("_pst_prd", "", x = _) |>unique()vars <- vars[vars !="Plot"]# loop through them and calculate differencesfor (var in vars){ var.pre <-paste0(var, "_pre_prd") var.pst <-paste0(var, "_pst_prd") var.dif <-paste0(var, "_dif_prd")if (!all(c(var.pre, var.pst) %in%colnames(df.pred.pooled.merge))) next df.pred.pooled.merge[,var.dif] <- df.pred.pooled.merge[,var.pre] - df.pred.pooled.merge[,var.pst]}# join back to fuel plot datadf.pooled <- fuel.plots |>as.data.frame() |>merge(df.pred.pooled.merge, all = T)# loop through the variables and compare consumption predictions to observationspar(mfrow =c(5,4))for (var in vars){ var.dif.prd <-paste0(var, "_dif_prd") var.dif.obs <-paste0(var, "_dif")if (!all(c(var.dif.prd, var.dif.obs) %in%colnames(df.pooled))) next df.pred.obs <-data.frame(obs = df.pooled[,var.dif.obs],prd = df.pooled[,var.dif.prd]) perf.mets <-prd.obs.plot(df.pred.obs, var)if (var == vars[1]){ perf.mets.pooled <- perf.mets } else { perf.mets.pooled <-rbind(perf.mets.pooled, perf.mets) }}perf.mets.pooled$method <-"pooled"
Approach #2: Profile Area Change (PAC)
Briefly, the idea here is that the area under the curve (AUC) representing the quantile distribution of heights approximately represents the volume of fuels in a given plot/pixel. Thus, the difference between AUC before and after a fire shoud capture fuel volumetric (or loading) change, as in the visual example shown below.
Show Code
# define percentile metrics functionperc.fun <-function(z){ q <-quantile(z, seq(0,1,0.01)) |>as.list()names(q) <-paste0("p_", stringr::str_pad(seq(0,100,1), 3, "left", "0"))return(q)}# reproject plots to match crs of pre and pst ctgsfuel.plots.pre <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pre))# fuel.plots.pst <- fuel.plots |># st_as_sf() |># st_transform(st_crs(ctg.pst))fuel.plots.pst.ann <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pst.ann)) |>subset(substr(Plot, 1, 1) %in%c("5", "7"))fuel.plots.pst.mon <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pst.mon)) |>subset(!substr(Plot, 1, 1) %in%c("5", "7"))# run plot metricspercs.pre <-plot_metrics(las = ctg.pre,func =~perc.fun(Z),geometry = fuel.plots.pre,radius =8) |>st_drop_geometry()# percs.pst <- plot_metrics(# las = ctg.pst,# func = ~perc.fun(Z),# geometry = fuel.plots.pst,# radius = 8# ) |># st_drop_geometry()percs.pst.ann <-plot_metrics(las = ctg.pst.ann,func =~perc.fun(Z),geometry = fuel.plots.pst.ann,radius =8) |>st_drop_geometry()percs.pst.mon <-plot_metrics(las = ctg.pst.mon,func =~perc.fun(Z),geometry = fuel.plots.pst.mon,radius =8) |>st_drop_geometry()# merge the ann and mon datasetspercs.pst <-rbind(percs.pst.ann, percs.pst.mon)# sort the datapercs.pre <- percs.pre[order(percs.pre$Plot),]percs.pst <- percs.pst[order(percs.pst$Plot),]# separate out predictor and response data framespercs.pre <- percs.pre[,!colnames(percs.pre) %in%colnames(fuel.plots.pre)]percs.pst <- percs.pst[,!colnames(percs.pst) %in%colnames(fuel.plots.pre)]# set negative values to zeropercs.pre[percs.pre <0] <-0percs.pst[percs.pst <0] <-0# plot an example outex.row <-1percs.pre.ex <- percs.pre[ex.row,]percs.pst.ex <- percs.pst[ex.row,]par(mar =c(5,5,1,1), las =1)plot(x =c(0,100), y =range(c(percs.pre.ex, percs.pst.ex)),type ="n", xlab ="Percentile", ylab ="Height (m)")grid()polygon(x =c(0, seq(0,100), 100),y =c(0, percs.pre.ex, 0),col =adjust_transparency(2, 0.5),border =NA)lines(x =seq(0,100), y = percs.pre.ex, col =2, lwd =2)polygon(x =c(0, seq(0,100), 100),y =c(0, percs.pst.ex, 0),col =adjust_transparency(4, 0.5),border =NA)lines(x =seq(0,100), y = percs.pst.ex, col =4, lwd =2)legend("topleft", legend =c("Pre-fire", "Post-fire"), col =c(2, 4), fill =c(adjust_transparency(2, 0.5), adjust_transparency(4, 0.5)))
2a. Raw height PAC for profile slices
We found out in the previous document that a singular measure of PAC was insufficient for capturing nuance of consumption in different fuel components. To address this, I took “slices” out of the height quantile profile at a series of different quantile ranges and fed those into an RF to model consumption directly. The first way was using the raw height quantiles (i.e., in meters). In addition to the PAC values as predictors, I also threw in the pre-fire height quantiles as candidates. As mentioned in the Introduction, this may help the model distinguish forests of different pre-fire structure, which could help provide context for the PAC values.
Show Code
# create auc functionauc.slice.fun <-function(y, slice.min, slice.max){ x <-seq(0,100) auc <-integrate(splinefun(x, y), slice.min, slice.max)$valuereturn(auc)}# begin slice.min loopcounter <-0for (i inseq(0,100,10)){for (j inseq(0,100,10)){if (j > i){ counter <- counter +1 auc.raw.slice.pre <-apply(percs.pre, 1, function(x) auc.slice.fun(x, i, j)) auc.raw.slice.pst <-apply(percs.pst, 1, function(x) auc.slice.fun(x, i, j)) pac.raw.slice <- auc.raw.slice.pre - auc.raw.slice.pstif (counter ==1){ pac.raw.slice.df <-data.frame(x = pac.raw.slice)colnames(pac.raw.slice.df) <-paste0("pac_", i, "_", j) } else { pac.raw.slice.df <-cbind(pac.raw.slice.df, data.frame(x = pac.raw.slice))colnames(pac.raw.slice.df)[ncol(pac.raw.slice.df)] <-paste0("pac_", i, "_", j) } } }}# define response variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")vars <-paste0(vars, "_dif")# get plot idsplot.ids <- fuel.plots[,"Plot"] |>as.data.frame()# loop through thempar(mfrow =c(5,4), las =1, mar =c(5,5,2,1))for (var in vars){ y <- fuel.plots[,var] |>as.data.frame() x <-cbind(percs.pre, pac.raw.slice.df) mod.df <-cbind(plot.ids, y, x) df.pred.obs <-tuned.model(mod.df) perf.mets <-prd.obs.plot(df.pred.obs, var)if (var == vars[1]){ perf.mets.pacraw <- perf.mets } else { perf.mets.pacraw <-rbind(perf.mets.pacraw, perf.mets) }}perf.mets.pacraw$method <-"pacraw"
2b. Normalized height PAC for profile slices
The second way (more akin to how Hu et al. (2019) did it) was using normalized height quantiles (i.e., percent relative to the maximum per-plot height between pre- and post-fire data):
Show Code
# get plot-level maximum heightsmax.hgt.pre <-apply(percs.pre, 1, max)max.hgt.pst <-apply(percs.pst, 1, max)max.hgt <-apply(cbind(max.hgt.pre, max.hgt.pst), 1, max)# normalize heights to maxpercs.pre.norm <- percs.prefor (i inseq(1,nrow(percs.pre.norm))){ percs.pre.norm[i,] <- percs.pre.norm[i,] / max.hgt[i]}percs.pst.norm <- percs.pstfor (i inseq(1,nrow(percs.pst.norm))){ percs.pst.norm[i,] <- percs.pst.norm[i,] / max.hgt[i]}# begin slice.min loopcounter <-0for (i inseq(0,100,10)){for (j inseq(0,100,10)){if (j > i){ counter <- counter +1 auc.norm.slice.pre <-apply(percs.pre.norm, 1, function(x) auc.slice.fun(x, i, j)) auc.norm.slice.pst <-apply(percs.pst.norm, 1, function(x) auc.slice.fun(x, i, j)) pac.norm.slice <- auc.norm.slice.pre - auc.norm.slice.pstif (counter ==1){ pac.norm.slice.df <-data.frame(x = pac.norm.slice)colnames(pac.norm.slice.df) <-paste0("pac_", i, "_", j) } else { pac.norm.slice.df <-cbind(pac.norm.slice.df, data.frame(x = pac.norm.slice))colnames(pac.norm.slice.df)[ncol(pac.norm.slice.df)] <-paste0("pac_", i, "_", j) } } }}# define response variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")vars <-paste0(vars, "_dif")# get plot idsplot.ids <- fuel.plots[,"Plot"] |>as.data.frame()# loop through thempar(mfrow =c(5,4), las =1, mar =c(5,5,2,1))for (var in vars){ y <- fuel.plots[,var] |>as.data.frame() x <-cbind(percs.pre.norm, pac.norm.slice.df) mod.df <-cbind(plot.ids, y, x) df.pred.obs <-tuned.model(mod.df) perf.mets <-prd.obs.plot(df.pred.obs, var)if (var == vars[1]){ perf.mets.pacnorm <- perf.mets } else { perf.mets.pacnorm <-rbind(perf.mets.pacnorm, perf.mets) }}perf.mets.pacnorm$method <-"pacnorm"
Approach #3: Profile Density Change (PDC)
Here’s where things start to diverge from the previous document. As mentioned in the Introduction, Andy suggested exploring densities at different height ranges, rather than height quantiles, and differences therein, as potential predictors of fuel consumption. To do this, I used a Gaussian kernel density estimator with a 5cm bandwidth, calculated at every 5cm vertical interval from 0 to 30m in height. I calculated the difference between pre-fire and post-fire densities, and use those as predictors of fuel consumption. As with PAC, I also added in the pre-fire densities to give the model a sense for pre-fire structure, hopefully allowing it to make more forest-structure-specific predictions. First, I’ll plot out two density profiles for an example plot, one representing pre-fire densities, and one representing post-fire densities. The difference between the two at different heights should capture fuelbed-specific consumption (at least that’s the theory!).
Show Code
# define percentile metrics functiondens.fun <-function(z, bw =0.05){ z[z <0] <-0 d <-density(z, bw, from =0, to =30, n =length(seq(0,30,bw))) dy <-as.list(d$y)names(dy) <-paste0("d.", d$x)return(dy)}# reproject plots to match crs of pre and pst ctgsfuel.plots.pre <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pre))# fuel.plots.pst <- fuel.plots |># st_as_sf() |># st_transform(st_crs(ctg.pst))fuel.plots.pst.ann <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pst.ann)) |>subset(substr(Plot, 1, 1) %in%c("5", "7"))fuel.plots.pst.mon <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pst.ann)) |>subset(!substr(Plot, 1, 1) %in%c("5", "7"))# run plot metricsdens.pre <-plot_metrics(las = ctg.pre,func =~dens.fun(Z),geometry = fuel.plots.pre,radius =8) |>st_drop_geometry()# dens.pst <- plot_metrics(# las = ctg.pst,# func = ~dens.fun(Z),# geometry = fuel.plots.pst,# radius = 8# ) |># st_drop_geometry()dens.pst.ann <-plot_metrics(las = ctg.pst.ann,func =~dens.fun(Z),geometry = fuel.plots.pst.ann,radius =8) |>st_drop_geometry()dens.pst.mon <-plot_metrics(las = ctg.pst.mon,func =~dens.fun(Z),geometry = fuel.plots.pst.mon,radius =8) |>st_drop_geometry()# merge the ann and mon datasetsdens.pst <-rbind(dens.pst.ann, dens.pst.mon)# sort the datadens.pre <- dens.pre[order(dens.pre$Plot),]dens.pst <- dens.pst[order(dens.pst$Plot),]# separate out predictor and response data framesdens.pre <- dens.pre[,!colnames(dens.pre) %in%colnames(fuel.plots.pre)]dens.pst <- dens.pst[,!colnames(dens.pst) %in%colnames(fuel.plots.pre)]# calculate differencesdens.dif <- dens.pre - dens.pst# change column namescolnames(dens.pre) <-paste0(colnames(dens.pre), ".pre")colnames(dens.pst) <-paste0(colnames(dens.pst), ".pst")colnames(dens.dif) <-paste0(colnames(dens.dif), ".dif")# plot an example outex.row <-1dens.pre.ex <- dens.pre[ex.row,]dens.pst.ex <- dens.pst[ex.row,]par(mar =c(5,5,1,1), las =1)plot(x =c(0,30), y =range(c(dens.pre.ex, dens.pst.ex)),type ="n", xlab ="Height (m)", ylab ="Density")grid()polygon(x =c(0, seq(0,30,0.05), 30),y =c(0, dens.pre.ex, 0),col =adjust_transparency(2, 0.5),border =NA)lines(x =seq(0,30,0.05), y = dens.pre.ex, col =2, lwd =2)polygon(x =c(0, seq(0,30,0.05), 30),y =c(0, dens.pst.ex, 0),col =adjust_transparency(4, 0.5),border =NA)lines(x =seq(0,30,0.05), y = dens.pst.ex, col =4, lwd =2)legend("topright", legend =c("Pre-fire", "Post-fire"), col =c(2, 4), fill =c(adjust_transparency(2, 0.5), adjust_transparency(4, 0.5)))
Show Code
# define response variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")vars <-paste0(vars, "_dif")# get plot idsplot.ids <- fuel.plots[,"Plot"] |>as.data.frame()# loop through thempar(mfrow =c(5,4), las =1, mar =c(5,5,2,1))for (var in vars){ x <-cbind(dens.pre, dens.dif) y <- fuel.plots[,var] |>as.data.frame() mod.df <-cbind(plot.ids, y,x) df.pred.obs <-tuned.model(mod.df) perf.mets <-prd.obs.plot(df.pred.obs, var)if (var == vars[1]){ perf.mets.pdc <- perf.mets } else { perf.mets.pdc <-rbind(perf.mets.pdc, perf.mets) }}perf.mets.pdc$method <-"pdc"
Approach #4: PAC+PDC
PAC seems to do pretty well at predicting consumption in certain fuel beds, and PDC in others. Perhaps through their combined use, we can predict fuel consumption well across a greater diversity of fuel beds.
Show Code
# loop through thempar(mfrow =c(5,4), las =1, mar =c(5,5,2,1))for (var in vars){ x <-cbind(dens.pre, dens.dif, percs.pre.norm, pac.norm.slice.df) y <- fuel.plots[,var] |>as.data.frame() mod.df <-cbind(plot.ids, y,x) df.pred.obs <-tuned.model(mod.df) perf.mets <-prd.obs.plot(df.pred.obs, var)if (var == vars[1]){ perf.mets.pacpdc <- perf.mets } else { perf.mets.pacpdc <-rbind(perf.mets.pacpdc, perf.mets) }}perf.mets.pacpdc$method <-"pacpdc"
Results Comparison
Lastly, let’s take a look at all of the results together to compare methods.
Show Code
# define the variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")# remove "_dif" from variable names in performance metrics tablesperf.mets.pacraw$var <-sub("_dif", "", perf.mets.pacraw$var)perf.mets.pacnorm$var <-sub("_dif", "", perf.mets.pacnorm$var)perf.mets.pdc$var <-sub("_dif", "", perf.mets.pdc$var)perf.mets.pacpdc$var <-sub("_dif", "", perf.mets.pacpdc$var)# create "long" tableperf.mets.long <-rbind( perf.mets.prepost, perf.mets.pooled, perf.mets.pacraw, perf.mets.pacnorm, perf.mets.pdc, perf.mets.pacpdc)# add method to r2 and rrmse colnamesperf.mets.prepost$method <-NULLcolnames(perf.mets.prepost)[2:3] <-paste0(colnames(perf.mets.prepost)[2:3], ".prepost")perf.mets.pooled$method <-NULLcolnames(perf.mets.pooled)[2:3] <-paste0(colnames(perf.mets.pooled)[2:3], ".pooled")perf.mets.pacraw$method <-NULLcolnames(perf.mets.pacraw)[2:3] <-paste0(colnames(perf.mets.pacraw)[2:3], ".pacraw")perf.mets.pacnorm$method <-NULLcolnames(perf.mets.pacnorm)[2:3] <-paste0(colnames(perf.mets.pacnorm)[2:3], ".pacnorm")perf.mets.pdc$method <-NULLcolnames(perf.mets.pdc)[2:3] <-paste0(colnames(perf.mets.pdc)[2:3], ".pdc")perf.mets.pacpdc$method <-NULLcolnames(perf.mets.pacpdc)[2:3] <-paste0(colnames(perf.mets.pacpdc)[2:3], ".pacpdc")# merge them to create "wide" tableperf.mets.wide <-merge(perf.mets.prepost, perf.mets.pooled) |>merge(perf.mets.pacraw) |>merge(perf.mets.pacnorm) |>merge(perf.mets.pdc) |>merge(perf.mets.pacpdc)# define colorscols <-brewer.pal(6, "Dark2")# set up the plotpar(mfrow =c(1,2), las =1)# create empty plotx <-rep(100,length(vars))names(x) <- varsr2.cols <- perf.mets.wide[,grep("r2", colnames(perf.mets.wide))]dotchart(x, pch =16, xlab =expression(R^2),xlim =range(r2.cols))# loop though variablesfor (i inseq(1,length(vars))){ var <- vars[i]points(x = perf.mets.wide$r2.prepost[perf.mets.wide$var == var],y = i, pch =16, col = cols[1])points(x = perf.mets.wide$r2.pooled[perf.mets.wide$var == var],y = i, pch =16, col = cols[2])points(x = perf.mets.wide$r2.pacraw[perf.mets.wide$var == var],y = i, pch =16, col = cols[3])points(x = perf.mets.wide$r2.pacnorm[perf.mets.wide$var == var],y = i, pch =16, col = cols[4])points(x = perf.mets.wide$r2.pdc[perf.mets.wide$var == var],y = i, pch =16, col = cols[5])points(x = perf.mets.wide$r2.pacpdc[perf.mets.wide$var == var],y = i, pch =16, col = cols[6])}# create empty ploty <-rep(10000, length(vars))names(y) <- varsrrmse.cols <- perf.mets.wide[perf.mets.wide$var !="nonwody",grep("rrmse", colnames(perf.mets.wide))]dotchart(y, pch =16, xlab =expression(symbol("%")*RMSE),xlim =range(rrmse.cols))# loop though variablesfor (i inseq(1,length(vars))){ var <- vars[i]points(x = perf.mets.wide$rrmse.prepost[perf.mets.wide$var == var],y = i, pch =16, col = cols[1])points(x = perf.mets.wide$rrmse.pooled[perf.mets.wide$var == var],y = i, pch =16, col = cols[2])points(x = perf.mets.wide$rrmse.pacraw[perf.mets.wide$var == var],y = i, pch =16, col = cols[3])points(x = perf.mets.wide$rrmse.pacnorm[perf.mets.wide$var == var],y = i, pch =16, col = cols[4])points(x = perf.mets.wide$rrmse.pdc[perf.mets.wide$var == var],y = i, pch =16, col = cols[5])points(x = perf.mets.wide$rrmse.pacpdc[perf.mets.wide$var == var],y = i, pch =16, col = cols[6])}# add legendlegend("topright", legend =c("PrePost", "Pooled", "Raw PAC", "Norm PAC", "PDC", "PAC + PDC"),pch =16, col = cols)
Show Code
# figure out which approach performed best for each fuelbedfn <-function(x){ m <-which.max(x) cn <-colnames(r2.cols)[m] var <-sub("r2\\.", "", cn)return(var)}r2.best <-apply(r2.cols, 1, fn)fn <-function(x){ m <-which.max(x) cn <-colnames(rrmse.cols)[m] var <-sub("rrmse\\.", "", cn)return(var)}rrmse.best <-apply(r2.cols, 1, fn)df.best <-data.frame(var = perf.mets.wide$var,r2.best = r2.best,rrmse.best = rrmse.best)paged_table(df.best, options =list(rows.print =nrow(df.best)))
Show Code
# plot out best performer countspar(mfrow =c(1,2), mar =c(5,5,1,1), las =1)r2.best.n <-table(factor(df.best$r2.best, levels =c("prepost", "pooled", "pacraw", "pacnorm", "pdc", "pacpdc")))barplot(r2.best.n, col = cols, ylab =expression("Best Performer by "*R^2))rrmse.best.n <-table(factor(df.best$rrmse.best, levels =c("prepost", "pooled", "pacraw", "pacnorm", "pdc", "pacpdc")))barplot(rrmse.best.n, col = cols, ylab =expression("Best Performer by "*symbol("%")*RMSE))
Show Code
# plot out their performance as boxplotspar(mfrow =c(1,2), mar =c(5,3,1,1), las =1)perf.mets.long$method <-factor(perf.mets.long$method, levels =c("prepost", "pooled", "pacraw", "pacnorm", "pdc", "pacpdc"))bp <-boxplot(r2 ~ method, data = perf.mets.long, col = cols,xaxt ="n", ylim =range(r2.cols), ylab =expression(R^2),xlab =NA)tick <-seq_along(bp$names)axis(1, at = tick, labels = F)text(x = tick, y =par("usr")[3] -0.05*diff(par("usr")[3:4]),labels =c("PrePost", "Pooled", "PAC Raw", "PAC Norm", "PDC", "PAC + PDC"),srt =45, xpd = T)bp <-boxplot(rrmse ~ method, data = perf.mets.long, col = cols,xaxt ="n", ylim =range(rrmse.cols), ylab =expression(symbol("%")*RMSE), xlab =NA)tick <-seq_along(bp$names)axis(1, at = tick, labels = F)text(x = tick, y =par("usr")[3] -0.05*diff(par("usr")[3:4]),labels =c("PrePost", "Pooled", "PAC Raw", "PAC Norm", "PDC", "PAC + PDC"),srt =45, xpd = T)
PDC and PAC + PDC tend to yield the best performance, especially with respect to R2, and to a somewhat lesser extent, %RMSE. One of the two was best in terms of both R2 and %RMSE for nearly all of the fuel consumption models.
The big question is is PDC good/novel enough to put out into the world as a methods-focused publication?
Source Code
---title: "Monroe Mountain Fuel Consumption Modeling Part 2"subtitle: "A focus on model comparisons, with spatial cross-validation and a new consumption modeling approach"author: "Mickey Campbell"format: html: toc: true toc-expand: true toc-depth: 4 toc-title: Contents toc_location: left code-fold: true code-summary: "Show Code" code-tools: true---## IntroductionThis document serves as an addendum to [Monroe Mountain Fuel Consumption Modeling](https://rpubs.com/mickeycampbell/mmfcm]). It will be comparably sparse in explanatory text, since most details are contained within the previously linked document. The primary objectives of this document are:**1. To update the results using the smaller, shorter-term post-fire lidar datasets.**The previous document focused on results driven by a comparison between the full Monroe Mountain 2016 and 2023 lidar datasets. While this serves as a nice, spatially comprehensive comparison, the fires burned in 2019-2020. Given the time lag between fire and lidar data (3-4 years), a lot of regeneration could have (and likely has) occurred, likely adding uncertainty into the fuel consumption modeling procedure. By focusing on two post-fire lidar dataset that were collected 1 year post-fire, we can hopefully more accurately capture consumption before the burn areas had a chance to regrow.**2. To compare results between different fuel consumption modeling approaches, but using a more robust validation technique.**In a meeting with Andy on Friday, 6/7/2024, it was revealed that spatial cross-validation (as opposed to simple k-fold or out-of-bag model assessment) demonstrated relatively poor performance in the fuel consumption models. That is, when you test the models on a spatially distinct cluster of fuel plots that are separated from the others on which the model is trained, the predictive power decreases substantially. However, this is the best way to assess how generalizable these models are, and how well they would perform when applied to truly novel fuel and fire conditions. Thus, they represent a considerably more robust performance assessment than the simple out-of-bag estimates that were contained within the previous document.**3. To add a new fuel consumption modeling technique into the mix: profile density change (PDC).**In the same meeting, Andy raised the question of using densities, rather than height quantiles (as in the PAC method), as the basis of fuel consumption modeling. So, I implemented that method, which I am calling profile density change. Pretty simply, the technique just involves using a Gaussian kernel density estimator to quantify the point density at a series of vertical heights above the ground. This is done for both pre- and post-fire lidar data, and the differences between height-specific densities are used as candidate predictor variables for modeling fuel consumption.**4. For both PAC and PDC, adding pre-fire structure into the set of candidate predictor variables.**Pperhaps structural differences (height quantiles for PAC and point densities for PDC) alone are insufficient for quantifying consumption. Perhaps by adding the pre-fire height quantiles/point densities into the mix of candidate predictor variables, the models will more accurately be able to suss out how different pre-fire conditions are linked with consumption of different fuel components. For example, if the model knows that a subset of plots comprised tall forests before the fire, then it may be able to leverage that knowledge to understand which difference variables should be more indicative of canopy fuel consumption.**5. Test a combined PAC + PDC model.**Preliminary results were demonstrating that PAC was good at capturing some variables and PDC was good at capturing others. Given the variable selection approach baked into my modeling procedure, perhaps we can benefit from the best of both worlds while modeling fuel consumption by throwing all of the above into the soup of candidate predictors.## Field DataNothing new here -- just repeating the same field plot wrangling and filtering that was done in the previous document.```{r}#| warning: false#| message: falselibrary(terra)library(rmarkdown)library(sf)library(lidR)library(lidRmetrics)library(VSURF)library(tuneRanger)library(ranger)library(mlr)library(dplyr)library(colorspace)library(RColorBrewer)library(stringr)# set random seedset.seed(5757)# read in the datafuel.plots <-vect("S:/ursa/campbell/fasmee/data/from_ryan/plots_wFuels.shp")# remove empty plotsfuel.plots <- fuel.plots[!is.na(fuel.plots$FID),]# separate pre-fire and post-fire plotsfuel.plots.pre <- fuel.plots[grep("pre", fuel.plots$FID),]fuel.plots.pst <- fuel.plots[grep("post", fuel.plots$FID),]# remove unnecessary columnsdel.cols <-c("FID", "X", "Y")keep.cols <-names(fuel.plots)[!names(fuel.plots) %in% del.cols]fuel.plots.pre <- fuel.plots.pre[,keep.cols]fuel.plots.pst <- fuel.plots.pst[,keep.cols]# move plot ID column from last to firstn <-ncol(fuel.plots.pre)fuel.plots.pre <- fuel.plots.pre[,c(n,1:(n-1))]fuel.plots.pst <- fuel.plots.pst[,c(n,1:(n-1))]# add "pre" and "pst" to column names for all attributesdif.names <-names(fuel.plots.pre)[2:ncol(fuel.plots.pre)]names(fuel.plots.pre)[2:ncol(fuel.plots.pre)] <-paste0(names(fuel.plots.pre)[2:ncol(fuel.plots.pre)], "_pre")names(fuel.plots.pst)[2:ncol(fuel.plots.pst)] <-paste0(names(fuel.plots.pst)[2:ncol(fuel.plots.pst)], "_pst")# merge themfuel.plots <-merge(fuel.plots.pre, as.data.frame(fuel.plots.pst))# pad single-digit plot ids with 0'sfuel.plots$Plot[as.numeric(fuel.plots$Plot) <100] <-paste0("00", fuel.plots$Plot[as.numeric(fuel.plots$Plot) <100])# sort of plot idfuel.plots <- fuel.plots[order(fuel.plots$Plot),]# get differences between pre- and post-fire fuel loads (i.e., calc consumption)for (dif.name in dif.names){ pre.fld <-paste0(dif.name, "_pre") pst.fld <-paste0(dif.name, "_pst") dif.fld <-paste0(dif.name, "_dif") fuel.plots[,dif.fld] <-as.data.frame(fuel.plots[,pre.fld]) -as.data.frame(fuel.plots[,pst.fld])}# remove the plot that had an extreme increase in ground fuelsfuel.plots <- fuel.plots[fuel.plots$hr1000_dif >-300,]```### Definition of Field NamesFor clarity, here are the field names from the fuel plot data again:- **yr**: year of measurment- **hr1**: 1 hour biomass- **hr10**: 10 hours biomass- **hr100**: 100 hour biomass- **hr1000**: 1000 hour biomass- **nonwoody**: herb biomass- **shrubs**: shrub biomass- **trees**: seedling biomass- **llm**: litter, lichen, and moss biomass- **ground**: duff biomass- **shrb_sd**: combined biomass of shrubs and seedlings- **woody**: combined biomass of 1hr, 10hr, 100hr, and 1000hr- **uf**: combined biomass of nonwoody, llm, ground, shrubs, trees, and woody- **cf**: tree biomass, including snags as well as live boles, branches, and foliage- **cbd**: canopy bulk density- **cbh**: canopy base height- **acf**: available canopy fuel (foliage and fine branches)- **tf**: total fuel (all biomass combined)- **af**: available fuel (combined biomass of uf and acf)And here are the self-explanatory suffixes I've added:- **pre**: pre-fire measurement- **pst**: post-fire measurement- **dif**: difference between pre- and post-fire measurementsAs I understand it, all of the fields are measured in Mg/ha, with the exception of **cbd** and **cbh**, which are probably measured in kg/m^3^ and m, respectively.## Modeling Fuel Consumption### Approach #1: Modeled Fuel Load SubtractionAgain, for more details on this, please refer back to the [previous document](https://rpubs.com/mickeycampbell/mmfcm). But, briefly, I will test two modeling approaches:1a. Modeling pre- and post-fire fuel load separately (**PrePost model**).1b. Building a single model that incorporates both pre- and post-fire fuel load data (**Pooled model**).Both approaches predict fuel loads before and after the fire separately, the subtraction of which yields a fuel consumption estimate, which is assessed against field-measured consumption.```{r}#| warning: false#| message: false# define modeling functiontuned.model <-function(mod.df){ mod.df <-na.omit(mod.df) plot.ids <- mod.df$Plot folds <-substr(plot.ids, 1, 1) mod.df$Plot <-NULL x.cols <-2:ncol(mod.df) y.col <-1 v <-VSURF(x = mod.df[,x.cols],y = mod.df[,y.col],RFimplem ="randomForest",parallel = T,ncores =18,verbose = F ) x.cols <- x.cols[v$varselect.pred] mod.df <- mod.df[,c(y.col, x.cols)] cv.iters <-100 hyp.grid <-data.frame(mtry =sample(1:length(x.cols), cv.iters, T),min.node.size =sample(1:10, cv.iters, T),sample.fraction =runif(cv.iters, 0.1, 0.9),mse =rep(NA, cv.iters) )for (i inseq(1,cv.iters)){ form <-as.formula(paste0(colnames(mod.df)[1], " ~ .")) mod <-ranger(formula = form,data = mod.df,mtry = hyp.grid$mtry[i],sample.fraction = hyp.grid$sample.fraction[i],min.node.size = hyp.grid$min.node.size[i] ) hyp.grid$mse[i] <- mod$prediction.error } mtry <- hyp.grid$mtry[which.min(hyp.grid$mse)] min.node.size <- hyp.grid$min.node.size[which.min(hyp.grid$mse)] sample.fraction <- hyp.grid$sample.fraction[which.min(hyp.grid$mse)] form <-as.formula(paste0(colnames(mod.df)[1], " ~ ."))for (fold inunique(folds)){ mod.df.valid <- mod.df[folds == fold,] mod.df.train <- mod.df[folds != fold,] mod <-ranger(formula = form,data = mod.df.train,mtry = mtry,sample.fraction = sample.fraction,min.node.size = min.node.size ) pred <-predict(mod, mod.df.valid)$predictionif (fold ==unique(folds)[1]){ df.pred.obs <-data.frame(Plot = plot.ids[folds == fold],obs = mod.df.valid[,y.col],prd = pred ) } else { df.pred.obs <-rbind( df.pred.obs,data.frame(Plot = plot.ids[folds == fold],obs = mod.df.valid[,y.col],prd = pred ) ) } }return(df.pred.obs)}# define predicted vs. observed plotting functionprd.obs.plot <-function(df.pred.obs, name){ df.pred.obs <-na.omit(df.pred.obs) x <- df.pred.obs$obs y <- df.pred.obs$prd rg <-range(c(x,y), na.rm = T)par(mar =c(5,5,2,1), las =1)plot(x, y, type ="n", xlab ="Observed", ylab ="Predicted", main = name,ylim = rg, xlim = rg)grid()lines(x =c(-1e10, 1e10), y =c(-1e10, 1e10))points(x, y, pch =16, col =rgb(0,0,0,0.25), cex =1.25) mod <-lm(y ~ x) new.x <-range(x) new.y <-predict(mod, newdata =list(x = new.x))lines(new.x, new.y, lwd =2, col =2) r2 <-summary(mod)$adj.r.squared rrmse <- (100*sqrt(mean((x-y)^2))/mean(x)) r2.lab <-bquote(R^2== .(round(r2,2))) rrmse.lab <-bquote(symbol("%")*RMSE == .(round(rrmse)))legend("topleft",legend =c(r2.lab, rrmse.lab),text.col =2,bty ="n",x.intersp =0 )return(data.frame(var = name, r2 = r2, rrmse = rrmse))}```#### 1a. PrePost model```{r}#| warning: false#| message: false#| results: hide #| fig-height: 8#| fig-width: 8# suppress progress baroptions(lidR.progress = F)# directory structureals.dir.pre <-"S:/ursa/campbell/fasmee/data/lidar_data/monroe_mtn_lidar/a04_height"# als.dir.pst <- "S:/ursa2/campbell/ems4d/data/als/als_2023/LAZ/Eastburn_processed"als.dir.pst.ann <-"S:/ursa/campbell/fasmee/data/lidar_data/monroe_mtn_postfire_lidar/Annabella2021/LAZ/a03_hgt"als.dir.pst.mon <-"S:/ursa/campbell/fasmee/data/lidar_data/monroe_mtn_postfire_lidar/Monroe2020/LAZ_1.4/a03_hgt"# read in lascatalogsctg.pre <-readLAScatalog(als.dir.pre, filter ="-drop_withheld -drop_class 7 18", progress = F)# ctg.pst <- readLAScatalog(als.dir.pst, filter = "-drop_withheld -drop_class 7 18", progress = F)ctg.pst.ann <-readLAScatalog(als.dir.pst.ann, filter ="-drop_withheld -drop_class 7 18", progress = F)ctg.pst.mon <-readLAScatalog(als.dir.pst.mon, filter ="-drop_withheld -drop_class 7 18", progress = F)# reproject plots to match crs of pre and pst ctgsfuel.plots.pre <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pre))# fuel.plots.pst <- fuel.plots |># st_as_sf() |># st_transform(st_crs(ctg.pst))fuel.plots.pst.ann <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pst.ann)) |>subset(substr(Plot, 1, 1) %in%c("5", "7"))fuel.plots.pst.mon <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pst.mon)) |>subset(!substr(Plot, 1, 1) %in%c("5", "7"))# run plot metricsmets.pre <-plot_metrics(las = ctg.pre,func =~lidRmetrics::metrics_set3(x = X,y = Y,z = Z,i = Intensity,ReturnNumber = ReturnNumber,NumberOfReturns = NumberOfReturns ),geometry = fuel.plots.pre,radius =8) |>as.data.frame()# mets.pst <- plot_metrics(# las = ctg.pst,# func = ~lidRmetrics::metrics_set3(# x = X,# y = Y,# z = Z,# i = Intensity,# ReturnNumber = ReturnNumber,# NumberOfReturns = NumberOfReturns# ),# geometry = fuel.plots.pst,# radius = 8# ) |># as.data.frame()mets.pst.ann <-plot_metrics(las = ctg.pst.ann,func =~lidRmetrics::metrics_set3(x = X,y = Y,z = Z,i = Intensity,ReturnNumber = ReturnNumber,NumberOfReturns = NumberOfReturns ),geometry = fuel.plots.pst.ann,radius =8) |>as.data.frame()mets.pst.mon <-plot_metrics(las = ctg.pst.mon,func =~lidRmetrics::metrics_set3(x = X,y = Y,z = Z,i = Intensity,ReturnNumber = ReturnNumber,NumberOfReturns = NumberOfReturns ),geometry = fuel.plots.pst.mon,radius =8) |>as.data.frame()# merge the ann and mon datasetsmets.pst <-rbind(mets.pst.ann, mets.pst.mon)# sort the datamets.pre <- mets.pre[order(mets.pre$Plot),]mets.pst <- mets.pst[order(mets.pst$Plot),]# separate out predictor and response data framesmets.pre <- mets.pre[,!colnames(mets.pre) %in%colnames(fuel.plots.pre)]mets.pst <- mets.pst[,!colnames(mets.pst) %in%colnames(fuel.plots.pre)]# remove unscalable variablesdel.cols <-c("n", "n_first", "n_intermediate", "n_last", "n_single","n_multiple", "n_return_1", "n_return_2", "n_return_3","n_return_4", "vn")# remove predictors with no variabilitysds.pre <-apply(mets.pre, 2, sd)sds.pst <-apply(mets.pst, 2, sd)del.cols.pre <-c(del.cols, names(sds.pre)[sds.pre ==0&!is.na(sds.pre)])del.cols.pst <-c(del.cols, names(sds.pst)[sds.pst ==0&!is.na(sds.pst)])# remove predictors with na valuesnas.pre <-apply(mets.pre, 2, function(x) sum(is.na(x)))nas.pst <-apply(mets.pst, 2, function(x) sum(is.na(x)))del.cols.pre <-c(del.cols.pre, names(nas.pre)[nas.pre >0])del.cols.pst <-c(del.cols.pst, names(nas.pst)[nas.pst >0])# keep columnskeep.cols.pre <-colnames(mets.pre)[!colnames(mets.pre) %in% del.cols.pre]keep.cols.pst <-colnames(mets.pst)[!colnames(mets.pst) %in% del.cols.pst]mets.pre <- mets.pre[,keep.cols.pre]mets.pst <- mets.pst[,keep.cols.pst]# define response variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")vars.pre <-paste0(vars, "_pre")# loop through variables and model, plot, and compile results of eachsuccesses <-0for (var.pre in vars.pre){ resp.var <- fuel.plots[,var.pre] |>as.data.frame() plot.id <- fuel.plots[,"Plot"] |>as.data.frame()if (length(unique(resp.var[,1])) <5){next } else { successes <- successes +1 } mod.df <-cbind(plot.id, resp.var, mets.pre) df.pred.obs <-tuned.model(mod.df) df.pred.pre <-data.frame(Plot = df.pred.obs$Plot,x = df.pred.obs$prd)colnames(df.pred.pre)[2] <-paste0(var.pre, "_prd")if (successes ==1){ df.pred.pre.merge <- df.pred.pre } else { df.pred.pre.merge <-merge(df.pred.pre.merge, df.pred.pre, all = T) }}# define response variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")vars.pst <-paste0(vars, "_pst")# loop through variables and model, plot, and compile results of eachsuccesses <-0for (var.pst in vars.pst){ resp.var <- fuel.plots[,var.pst] |>as.data.frame() plot.id <- fuel.plots[,"Plot"] |>as.data.frame()if (length(unique(resp.var[,1])) <5){next } else { successes <- successes +1 } mod.df <-cbind(plot.id, resp.var, mets.pst) df.pred.obs <-tuned.model(mod.df) df.pred.pst <-data.frame(Plot = df.pred.obs$Plot,x = df.pred.obs$prd)colnames(df.pred.pst)[2] <-paste0(var.pst, "_prd")if (successes ==1){ df.pred.pst.merge <- df.pred.pst } else { df.pred.pst.merge <-merge(df.pred.pst.merge, df.pred.pst, all = T) }}# merge pre and post fire predictionsdf.pred.merge <-merge(df.pred.pre.merge, df.pred.pst.merge, all = T)# get fuel metric namesvars <-colnames(df.pred.merge) |>sub("_pre_prd", "", x = _) |>sub("_pst_prd", "", x = _) |>unique()vars <- vars[vars !="Plot"]# loop through them and calculate differencesfor (var in vars){ var.pre <-paste0(var, "_pre_prd") var.pst <-paste0(var, "_pst_prd") var.dif <-paste0(var, "_dif_prd")if (!all(c(var.pre, var.pst) %in%colnames(df.pred.merge))) next df.pred.merge[,var.dif] <- df.pred.merge[,var.pre] - df.pred.merge[,var.pst]}# join back to fuel plot datadf.pre.post <- fuel.plots |>as.data.frame() |>merge(df.pred.merge, all = T)# loop through the variables and compare consumption predictions to observationspar(mfrow =c(4,4))for (var in vars){ var.dif.prd <-paste0(var, "_dif_prd") var.dif.obs <-paste0(var, "_dif")if (!all(c(var.dif.prd, var.dif.obs) %in%colnames(df.pre.post))) next df.pred.obs <-data.frame(obs = df.pre.post[,var.dif.obs],prd = df.pre.post[,var.dif.prd]) perf.mets <-prd.obs.plot(df.pred.obs, var)if (var == vars[1]){ perf.mets.prepost <- perf.mets } else { perf.mets.prepost <-rbind(perf.mets.prepost, perf.mets) }}perf.mets.prepost$method <-"prepost"```#### 1b. Pooled model```{r}#| warning: false#| message: false#| fig-height: 10#| fig-width: 8# define response variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")vars.pre <-paste0(vars, "_pre")vars.pst <-paste0(vars, "_pst")# subset the fuels datafuel.plots.pooled.pre <- fuel.plots[,c("Plot", vars.pre)] |>as.data.frame()fuel.plots.pooled.pst <- fuel.plots[,c("Plot", vars.pst)] |>as.data.frame()# remove "_pre" and "_pst" from column names to enable row mergingnames(fuel.plots.pooled.pre) <-sub("_pre", "", names(fuel.plots.pooled.pre))names(fuel.plots.pooled.pst) <-sub("_pst", "", names(fuel.plots.pooled.pst))# add "_pre" and "_pst" to plot idsfuel.plots.pooled.pre$Plot <-paste0(fuel.plots.pooled.pre$Plot, "_pre")fuel.plots.pooled.pst$Plot <-paste0(fuel.plots.pooled.pst$Plot, "_pst")# merge rowwisefuel.plots.pooled <-bind_rows(fuel.plots.pooled.pre, fuel.plots.pooled.pst)# merge mets rowwise toomets.pooled <-bind_rows(mets.pre, mets.pst)# remove columns with nasmets.pooled <- mets.pooled[,apply(mets.pooled, 2, function(x) sum(is.na(x)) ==0)]# loop through variables and model, plot, and compile results of eachsuccesses <-0for (var in vars){ resp.var <- fuel.plots.pooled[var] plot.id <- fuel.plots.pooled["Plot"]if (length(unique(resp.var[,1])) <5){next } else { successes <- successes +1 } mod.df <-cbind(plot.id, resp.var, mets.pooled) df.pred.obs <-tuned.model(mod.df) df.pred.pooled <-data.frame(Plot = df.pred.obs$Plot,x = df.pred.obs$prd)colnames(df.pred.pooled)[2] <-paste0(var, "_prd")if (successes ==1){ df.pred.pooled.merge <- df.pred.pooled } else { df.pred.pooled.merge <-merge(df.pred.pooled.merge, df.pred.pooled, all = T) }}# separate back out the pre- and post-fire plotsdf.pred.pooled.pre <- df.pred.pooled.merge[grep("_pre", df.pred.pooled.merge$Plot),]df.pred.pooled.pst <- df.pred.pooled.merge[grep("_pst", df.pred.pooled.merge$Plot),]# add "_pre" and "_pst" to prediction column namescolnames(df.pred.pooled.pre) <-sub("_prd", "_pre_prd", colnames(df.pred.pooled.pre))colnames(df.pred.pooled.pst) <-sub("_prd", "_pst_prd", colnames(df.pred.pooled.pst))# remove "_pre" and "_pst" from plot namesdf.pred.pooled.pre$Plot <-sub("_pre", "", df.pred.pooled.pre$Plot)df.pred.pooled.pst$Plot <-sub("_pst", "", df.pred.pooled.pst$Plot)# merge pre and post fire predictionsdf.pred.pooled.merge <-merge(df.pred.pooled.pre, df.pred.pooled.pst, all = T)# get fuel metric namesvars <-colnames(df.pred.pooled.merge) |>sub("_pre_prd", "", x = _) |>sub("_pst_prd", "", x = _) |>unique()vars <- vars[vars !="Plot"]# loop through them and calculate differencesfor (var in vars){ var.pre <-paste0(var, "_pre_prd") var.pst <-paste0(var, "_pst_prd") var.dif <-paste0(var, "_dif_prd")if (!all(c(var.pre, var.pst) %in%colnames(df.pred.pooled.merge))) next df.pred.pooled.merge[,var.dif] <- df.pred.pooled.merge[,var.pre] - df.pred.pooled.merge[,var.pst]}# join back to fuel plot datadf.pooled <- fuel.plots |>as.data.frame() |>merge(df.pred.pooled.merge, all = T)# loop through the variables and compare consumption predictions to observationspar(mfrow =c(5,4))for (var in vars){ var.dif.prd <-paste0(var, "_dif_prd") var.dif.obs <-paste0(var, "_dif")if (!all(c(var.dif.prd, var.dif.obs) %in%colnames(df.pooled))) next df.pred.obs <-data.frame(obs = df.pooled[,var.dif.obs],prd = df.pooled[,var.dif.prd]) perf.mets <-prd.obs.plot(df.pred.obs, var)if (var == vars[1]){ perf.mets.pooled <- perf.mets } else { perf.mets.pooled <-rbind(perf.mets.pooled, perf.mets) }}perf.mets.pooled$method <-"pooled"```### Approach #2: Profile Area Change (PAC)Briefly, the idea here is that the area under the curve (AUC) representing the quantile distribution of heights approximately represents the volume of fuels in a given plot/pixel. Thus, the difference between AUC before and after a fire shoud capture fuel volumetric (or loading) change, as in the visual example shown below.```{r}#| warning: false#| message: false#| fig-width: 5#| fig-height: 5#| results: hide # define percentile metrics functionperc.fun <-function(z){ q <-quantile(z, seq(0,1,0.01)) |>as.list()names(q) <-paste0("p_", stringr::str_pad(seq(0,100,1), 3, "left", "0"))return(q)}# reproject plots to match crs of pre and pst ctgsfuel.plots.pre <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pre))# fuel.plots.pst <- fuel.plots |># st_as_sf() |># st_transform(st_crs(ctg.pst))fuel.plots.pst.ann <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pst.ann)) |>subset(substr(Plot, 1, 1) %in%c("5", "7"))fuel.plots.pst.mon <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pst.mon)) |>subset(!substr(Plot, 1, 1) %in%c("5", "7"))# run plot metricspercs.pre <-plot_metrics(las = ctg.pre,func =~perc.fun(Z),geometry = fuel.plots.pre,radius =8) |>st_drop_geometry()# percs.pst <- plot_metrics(# las = ctg.pst,# func = ~perc.fun(Z),# geometry = fuel.plots.pst,# radius = 8# ) |># st_drop_geometry()percs.pst.ann <-plot_metrics(las = ctg.pst.ann,func =~perc.fun(Z),geometry = fuel.plots.pst.ann,radius =8) |>st_drop_geometry()percs.pst.mon <-plot_metrics(las = ctg.pst.mon,func =~perc.fun(Z),geometry = fuel.plots.pst.mon,radius =8) |>st_drop_geometry()# merge the ann and mon datasetspercs.pst <-rbind(percs.pst.ann, percs.pst.mon)# sort the datapercs.pre <- percs.pre[order(percs.pre$Plot),]percs.pst <- percs.pst[order(percs.pst$Plot),]# separate out predictor and response data framespercs.pre <- percs.pre[,!colnames(percs.pre) %in%colnames(fuel.plots.pre)]percs.pst <- percs.pst[,!colnames(percs.pst) %in%colnames(fuel.plots.pre)]# set negative values to zeropercs.pre[percs.pre <0] <-0percs.pst[percs.pst <0] <-0# plot an example outex.row <-1percs.pre.ex <- percs.pre[ex.row,]percs.pst.ex <- percs.pst[ex.row,]par(mar =c(5,5,1,1), las =1)plot(x =c(0,100), y =range(c(percs.pre.ex, percs.pst.ex)),type ="n", xlab ="Percentile", ylab ="Height (m)")grid()polygon(x =c(0, seq(0,100), 100),y =c(0, percs.pre.ex, 0),col =adjust_transparency(2, 0.5),border =NA)lines(x =seq(0,100), y = percs.pre.ex, col =2, lwd =2)polygon(x =c(0, seq(0,100), 100),y =c(0, percs.pst.ex, 0),col =adjust_transparency(4, 0.5),border =NA)lines(x =seq(0,100), y = percs.pst.ex, col =4, lwd =2)legend("topleft", legend =c("Pre-fire", "Post-fire"), col =c(2, 4), fill =c(adjust_transparency(2, 0.5), adjust_transparency(4, 0.5)))```#### 2a. Raw height PAC for profile slicesWe found out in the previous document that a singular measure of PAC was insufficient for capturing nuance of consumption in different fuel components. To address this, I took "slices" out of the height quantile profile at a series of different quantile ranges and fed those into an RF to model consumption directly. The first way was using the raw height quantiles (i.e., in meters). **In addition to the PAC values as predictors, I also threw in the pre-fire height quantiles as candidates**. As mentioned in the *Introduction*, this may help the model distinguish forests of different pre-fire structure, which could help provide context for the PAC values.```{r}#| warning: false#| message: false#| fig-height: 10#| fig-width: 8# create auc functionauc.slice.fun <-function(y, slice.min, slice.max){ x <-seq(0,100) auc <-integrate(splinefun(x, y), slice.min, slice.max)$valuereturn(auc)}# begin slice.min loopcounter <-0for (i inseq(0,100,10)){for (j inseq(0,100,10)){if (j > i){ counter <- counter +1 auc.raw.slice.pre <-apply(percs.pre, 1, function(x) auc.slice.fun(x, i, j)) auc.raw.slice.pst <-apply(percs.pst, 1, function(x) auc.slice.fun(x, i, j)) pac.raw.slice <- auc.raw.slice.pre - auc.raw.slice.pstif (counter ==1){ pac.raw.slice.df <-data.frame(x = pac.raw.slice)colnames(pac.raw.slice.df) <-paste0("pac_", i, "_", j) } else { pac.raw.slice.df <-cbind(pac.raw.slice.df, data.frame(x = pac.raw.slice))colnames(pac.raw.slice.df)[ncol(pac.raw.slice.df)] <-paste0("pac_", i, "_", j) } } }}# define response variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")vars <-paste0(vars, "_dif")# get plot idsplot.ids <- fuel.plots[,"Plot"] |>as.data.frame()# loop through thempar(mfrow =c(5,4), las =1, mar =c(5,5,2,1))for (var in vars){ y <- fuel.plots[,var] |>as.data.frame() x <-cbind(percs.pre, pac.raw.slice.df) mod.df <-cbind(plot.ids, y, x) df.pred.obs <-tuned.model(mod.df) perf.mets <-prd.obs.plot(df.pred.obs, var)if (var == vars[1]){ perf.mets.pacraw <- perf.mets } else { perf.mets.pacraw <-rbind(perf.mets.pacraw, perf.mets) }}perf.mets.pacraw$method <-"pacraw"```#### 2b. Normalized height PAC for profile slicesThe second way (more akin to how Hu et al. (2019) did it) was using normalized height quantiles (i.e., percent relative to the maximum per-plot height between pre- and post-fire data):```{r}#| warning: false#| message: false#| fig-height: 10#| fig-width: 8# get plot-level maximum heightsmax.hgt.pre <-apply(percs.pre, 1, max)max.hgt.pst <-apply(percs.pst, 1, max)max.hgt <-apply(cbind(max.hgt.pre, max.hgt.pst), 1, max)# normalize heights to maxpercs.pre.norm <- percs.prefor (i inseq(1,nrow(percs.pre.norm))){ percs.pre.norm[i,] <- percs.pre.norm[i,] / max.hgt[i]}percs.pst.norm <- percs.pstfor (i inseq(1,nrow(percs.pst.norm))){ percs.pst.norm[i,] <- percs.pst.norm[i,] / max.hgt[i]}# begin slice.min loopcounter <-0for (i inseq(0,100,10)){for (j inseq(0,100,10)){if (j > i){ counter <- counter +1 auc.norm.slice.pre <-apply(percs.pre.norm, 1, function(x) auc.slice.fun(x, i, j)) auc.norm.slice.pst <-apply(percs.pst.norm, 1, function(x) auc.slice.fun(x, i, j)) pac.norm.slice <- auc.norm.slice.pre - auc.norm.slice.pstif (counter ==1){ pac.norm.slice.df <-data.frame(x = pac.norm.slice)colnames(pac.norm.slice.df) <-paste0("pac_", i, "_", j) } else { pac.norm.slice.df <-cbind(pac.norm.slice.df, data.frame(x = pac.norm.slice))colnames(pac.norm.slice.df)[ncol(pac.norm.slice.df)] <-paste0("pac_", i, "_", j) } } }}# define response variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")vars <-paste0(vars, "_dif")# get plot idsplot.ids <- fuel.plots[,"Plot"] |>as.data.frame()# loop through thempar(mfrow =c(5,4), las =1, mar =c(5,5,2,1))for (var in vars){ y <- fuel.plots[,var] |>as.data.frame() x <-cbind(percs.pre.norm, pac.norm.slice.df) mod.df <-cbind(plot.ids, y, x) df.pred.obs <-tuned.model(mod.df) perf.mets <-prd.obs.plot(df.pred.obs, var)if (var == vars[1]){ perf.mets.pacnorm <- perf.mets } else { perf.mets.pacnorm <-rbind(perf.mets.pacnorm, perf.mets) }}perf.mets.pacnorm$method <-"pacnorm"```### Approach #3: Profile Density Change (PDC)Here's where things start to diverge from the previous document. As mentioned in the *Introduction*, Andy suggested exploring densities at different height ranges, rather than height quantiles, and differences therein, as potential predictors of fuel consumption. To do this, I used a Gaussian kernel density estimator with a 5cm bandwidth, calculated at every 5cm vertical interval from 0 to 30m in height. I calculated the difference between pre-fire and post-fire densities, and use those as predictors of fuel consumption. As with PAC, I also added in the pre-fire densities to give the model a sense for pre-fire structure, hopefully allowing it to make more forest-structure-specific predictions. First, I'll plot out two density profiles for an example plot, one representing pre-fire densities, and one representing post-fire densities. The difference between the two at different heights should capture fuelbed-specific consumption (at least that's the theory!).```{r}#| warning: false#| message: false#| fig-height: 5#| fig-width: 8# define percentile metrics functiondens.fun <-function(z, bw =0.05){ z[z <0] <-0 d <-density(z, bw, from =0, to =30, n =length(seq(0,30,bw))) dy <-as.list(d$y)names(dy) <-paste0("d.", d$x)return(dy)}# reproject plots to match crs of pre and pst ctgsfuel.plots.pre <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pre))# fuel.plots.pst <- fuel.plots |># st_as_sf() |># st_transform(st_crs(ctg.pst))fuel.plots.pst.ann <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pst.ann)) |>subset(substr(Plot, 1, 1) %in%c("5", "7"))fuel.plots.pst.mon <- fuel.plots |>st_as_sf() |>st_transform(st_crs(ctg.pst.ann)) |>subset(!substr(Plot, 1, 1) %in%c("5", "7"))# run plot metricsdens.pre <-plot_metrics(las = ctg.pre,func =~dens.fun(Z),geometry = fuel.plots.pre,radius =8) |>st_drop_geometry()# dens.pst <- plot_metrics(# las = ctg.pst,# func = ~dens.fun(Z),# geometry = fuel.plots.pst,# radius = 8# ) |># st_drop_geometry()dens.pst.ann <-plot_metrics(las = ctg.pst.ann,func =~dens.fun(Z),geometry = fuel.plots.pst.ann,radius =8) |>st_drop_geometry()dens.pst.mon <-plot_metrics(las = ctg.pst.mon,func =~dens.fun(Z),geometry = fuel.plots.pst.mon,radius =8) |>st_drop_geometry()# merge the ann and mon datasetsdens.pst <-rbind(dens.pst.ann, dens.pst.mon)# sort the datadens.pre <- dens.pre[order(dens.pre$Plot),]dens.pst <- dens.pst[order(dens.pst$Plot),]# separate out predictor and response data framesdens.pre <- dens.pre[,!colnames(dens.pre) %in%colnames(fuel.plots.pre)]dens.pst <- dens.pst[,!colnames(dens.pst) %in%colnames(fuel.plots.pre)]# calculate differencesdens.dif <- dens.pre - dens.pst# change column namescolnames(dens.pre) <-paste0(colnames(dens.pre), ".pre")colnames(dens.pst) <-paste0(colnames(dens.pst), ".pst")colnames(dens.dif) <-paste0(colnames(dens.dif), ".dif")# plot an example outex.row <-1dens.pre.ex <- dens.pre[ex.row,]dens.pst.ex <- dens.pst[ex.row,]par(mar =c(5,5,1,1), las =1)plot(x =c(0,30), y =range(c(dens.pre.ex, dens.pst.ex)),type ="n", xlab ="Height (m)", ylab ="Density")grid()polygon(x =c(0, seq(0,30,0.05), 30),y =c(0, dens.pre.ex, 0),col =adjust_transparency(2, 0.5),border =NA)lines(x =seq(0,30,0.05), y = dens.pre.ex, col =2, lwd =2)polygon(x =c(0, seq(0,30,0.05), 30),y =c(0, dens.pst.ex, 0),col =adjust_transparency(4, 0.5),border =NA)lines(x =seq(0,30,0.05), y = dens.pst.ex, col =4, lwd =2)legend("topright", legend =c("Pre-fire", "Post-fire"), col =c(2, 4), fill =c(adjust_transparency(2, 0.5), adjust_transparency(4, 0.5)))``````{r}#| warning: false#| message: false#| fig-height: 10#| fig-width: 8# define response variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "shrubs", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")vars <-paste0(vars, "_dif")# get plot idsplot.ids <- fuel.plots[,"Plot"] |>as.data.frame()# loop through thempar(mfrow =c(5,4), las =1, mar =c(5,5,2,1))for (var in vars){ x <-cbind(dens.pre, dens.dif) y <- fuel.plots[,var] |>as.data.frame() mod.df <-cbind(plot.ids, y,x) df.pred.obs <-tuned.model(mod.df) perf.mets <-prd.obs.plot(df.pred.obs, var)if (var == vars[1]){ perf.mets.pdc <- perf.mets } else { perf.mets.pdc <-rbind(perf.mets.pdc, perf.mets) }}perf.mets.pdc$method <-"pdc"```### Approach #4: PAC+PDCPAC seems to do pretty well at predicting consumption in certain fuel beds, and PDC in others. Perhaps through their combined use, we can predict fuel consumption well across a greater diversity of fuel beds.```{r}#| warning: false#| message: false#| fig-height: 10#| fig-width: 8# loop through thempar(mfrow =c(5,4), las =1, mar =c(5,5,2,1))for (var in vars){ x <-cbind(dens.pre, dens.dif, percs.pre.norm, pac.norm.slice.df) y <- fuel.plots[,var] |>as.data.frame() mod.df <-cbind(plot.ids, y,x) df.pred.obs <-tuned.model(mod.df) perf.mets <-prd.obs.plot(df.pred.obs, var)if (var == vars[1]){ perf.mets.pacpdc <- perf.mets } else { perf.mets.pacpdc <-rbind(perf.mets.pacpdc, perf.mets) }}perf.mets.pacpdc$method <-"pacpdc"```## Results ComparisonLastly, let's take a look at all of the results together to compare methods.```{r}#| warning: false#| message: false#| fig-height: 10#| fig-width: 10# define the variables of interestvars <-c("hr1", "hr10", "hr100", "hr1000", "nonwody", "trees", "llm", "ground", "shrb_sd", "woody", "uf", "cf", "cbd", "acf", "tf", "af")# remove "_dif" from variable names in performance metrics tablesperf.mets.pacraw$var <-sub("_dif", "", perf.mets.pacraw$var)perf.mets.pacnorm$var <-sub("_dif", "", perf.mets.pacnorm$var)perf.mets.pdc$var <-sub("_dif", "", perf.mets.pdc$var)perf.mets.pacpdc$var <-sub("_dif", "", perf.mets.pacpdc$var)# create "long" tableperf.mets.long <-rbind( perf.mets.prepost, perf.mets.pooled, perf.mets.pacraw, perf.mets.pacnorm, perf.mets.pdc, perf.mets.pacpdc)# add method to r2 and rrmse colnamesperf.mets.prepost$method <-NULLcolnames(perf.mets.prepost)[2:3] <-paste0(colnames(perf.mets.prepost)[2:3], ".prepost")perf.mets.pooled$method <-NULLcolnames(perf.mets.pooled)[2:3] <-paste0(colnames(perf.mets.pooled)[2:3], ".pooled")perf.mets.pacraw$method <-NULLcolnames(perf.mets.pacraw)[2:3] <-paste0(colnames(perf.mets.pacraw)[2:3], ".pacraw")perf.mets.pacnorm$method <-NULLcolnames(perf.mets.pacnorm)[2:3] <-paste0(colnames(perf.mets.pacnorm)[2:3], ".pacnorm")perf.mets.pdc$method <-NULLcolnames(perf.mets.pdc)[2:3] <-paste0(colnames(perf.mets.pdc)[2:3], ".pdc")perf.mets.pacpdc$method <-NULLcolnames(perf.mets.pacpdc)[2:3] <-paste0(colnames(perf.mets.pacpdc)[2:3], ".pacpdc")# merge them to create "wide" tableperf.mets.wide <-merge(perf.mets.prepost, perf.mets.pooled) |>merge(perf.mets.pacraw) |>merge(perf.mets.pacnorm) |>merge(perf.mets.pdc) |>merge(perf.mets.pacpdc)# define colorscols <-brewer.pal(6, "Dark2")# set up the plotpar(mfrow =c(1,2), las =1)# create empty plotx <-rep(100,length(vars))names(x) <- varsr2.cols <- perf.mets.wide[,grep("r2", colnames(perf.mets.wide))]dotchart(x, pch =16, xlab =expression(R^2),xlim =range(r2.cols))# loop though variablesfor (i inseq(1,length(vars))){ var <- vars[i]points(x = perf.mets.wide$r2.prepost[perf.mets.wide$var == var],y = i, pch =16, col = cols[1])points(x = perf.mets.wide$r2.pooled[perf.mets.wide$var == var],y = i, pch =16, col = cols[2])points(x = perf.mets.wide$r2.pacraw[perf.mets.wide$var == var],y = i, pch =16, col = cols[3])points(x = perf.mets.wide$r2.pacnorm[perf.mets.wide$var == var],y = i, pch =16, col = cols[4])points(x = perf.mets.wide$r2.pdc[perf.mets.wide$var == var],y = i, pch =16, col = cols[5])points(x = perf.mets.wide$r2.pacpdc[perf.mets.wide$var == var],y = i, pch =16, col = cols[6])}# create empty ploty <-rep(10000, length(vars))names(y) <- varsrrmse.cols <- perf.mets.wide[perf.mets.wide$var !="nonwody",grep("rrmse", colnames(perf.mets.wide))]dotchart(y, pch =16, xlab =expression(symbol("%")*RMSE),xlim =range(rrmse.cols))# loop though variablesfor (i inseq(1,length(vars))){ var <- vars[i]points(x = perf.mets.wide$rrmse.prepost[perf.mets.wide$var == var],y = i, pch =16, col = cols[1])points(x = perf.mets.wide$rrmse.pooled[perf.mets.wide$var == var],y = i, pch =16, col = cols[2])points(x = perf.mets.wide$rrmse.pacraw[perf.mets.wide$var == var],y = i, pch =16, col = cols[3])points(x = perf.mets.wide$rrmse.pacnorm[perf.mets.wide$var == var],y = i, pch =16, col = cols[4])points(x = perf.mets.wide$rrmse.pdc[perf.mets.wide$var == var],y = i, pch =16, col = cols[5])points(x = perf.mets.wide$rrmse.pacpdc[perf.mets.wide$var == var],y = i, pch =16, col = cols[6])}# add legendlegend("topright", legend =c("PrePost", "Pooled", "Raw PAC", "Norm PAC", "PDC", "PAC + PDC"),pch =16, col = cols)# figure out which approach performed best for each fuelbedfn <-function(x){ m <-which.max(x) cn <-colnames(r2.cols)[m] var <-sub("r2\\.", "", cn)return(var)}r2.best <-apply(r2.cols, 1, fn)fn <-function(x){ m <-which.max(x) cn <-colnames(rrmse.cols)[m] var <-sub("rrmse\\.", "", cn)return(var)}rrmse.best <-apply(r2.cols, 1, fn)df.best <-data.frame(var = perf.mets.wide$var,r2.best = r2.best,rrmse.best = rrmse.best)paged_table(df.best, options =list(rows.print =nrow(df.best)))# plot out best performer countspar(mfrow =c(1,2), mar =c(5,5,1,1), las =1)r2.best.n <-table(factor(df.best$r2.best, levels =c("prepost", "pooled", "pacraw", "pacnorm", "pdc", "pacpdc")))barplot(r2.best.n, col = cols, ylab =expression("Best Performer by "*R^2))rrmse.best.n <-table(factor(df.best$rrmse.best, levels =c("prepost", "pooled", "pacraw", "pacnorm", "pdc", "pacpdc")))barplot(rrmse.best.n, col = cols, ylab =expression("Best Performer by "*symbol("%")*RMSE))# plot out their performance as boxplotspar(mfrow =c(1,2), mar =c(5,3,1,1), las =1)perf.mets.long$method <-factor(perf.mets.long$method, levels =c("prepost", "pooled", "pacraw", "pacnorm", "pdc", "pacpdc"))bp <-boxplot(r2 ~ method, data = perf.mets.long, col = cols,xaxt ="n", ylim =range(r2.cols), ylab =expression(R^2),xlab =NA)tick <-seq_along(bp$names)axis(1, at = tick, labels = F)text(x = tick, y =par("usr")[3] -0.05*diff(par("usr")[3:4]),labels =c("PrePost", "Pooled", "PAC Raw", "PAC Norm", "PDC", "PAC + PDC"),srt =45, xpd = T)bp <-boxplot(rrmse ~ method, data = perf.mets.long, col = cols,xaxt ="n", ylim =range(rrmse.cols), ylab =expression(symbol("%")*RMSE), xlab =NA)tick <-seq_along(bp$names)axis(1, at = tick, labels = F)text(x = tick, y =par("usr")[3] -0.05*diff(par("usr")[3:4]),labels =c("PrePost", "Pooled", "PAC Raw", "PAC Norm", "PDC", "PAC + PDC"),srt =45, xpd = T)```PDC and PAC + PDC tend to yield the best performance, especially with respect to R^2^, and to a somewhat lesser extent, %RMSE. One of the two was best in terms of both R^2^ and %RMSE for nearly all of the fuel consumption models.The big question is **is PDC good/novel enough to put out into the world as a methods-focused publication?**