Issues with PJ Biomass Modeling

Background

As we all know, I am in the midst of modeling PJ biomass for the CNH project, and have been for a number of months at this point. As we also know, about a month ago, we met and decided that, to improve our modeling results, it would be advantageous to collect a few more high-biomass sites, given our limited sample on the high end of the biomass spectrum. To that end, Kelly went and collected 10 more plots and delivered the data to me recently. I was able to incorporate the plots into our modeling procedure, expecting that model results would drastically improve and, well… Things not only didn’t improve but they got significantly worse. Something weird is going on.

On top of that, I’ve been doing a lot more reviewing of the biomass literature lately, and I’ve realize that the most commonly-reported unit for aboveground biomass is megagrams per hectare (Mg ha^-1). In all of the past results I’ve shared with this group, the units I’ve been using are kilograms per square meter (kg m^-2). I did this because the allometric equations I’ve been using produce per-tree biomass in kg, and the plot radii used to collect the data were measured in meters (15 m). In keeping with the industry standard, I decided to convert kg m^-2 to Mg ha^-1 (multiply by 10). As a result, the biomass estimates that emerged appear to be WAY too high in comparison to what others who have studied PJ biomass have produced.

So, I think it’s worth meeting to discuss how and why these problems are arising. And I thought it would be helpful to put together an R markdown document to try to explain what might be happening.

Maximum reported PJ biomass in the literature

After reviewing the PJ biomass literature, it appears that PJ biomass rarely exceeds 70 Mg ha^-1. Here is a selection of maximum reported PJ biomass totals:

DeBano et al. (1987)
- 69 Mg ha^-1
Grier et al. (1992)
- 54 Mg ha^-1
Huang et al. (2009)
- 38 Mg ha^-1
Huffman et al. (2012)
- 65 Mg ha^-1
Filipelli et al. (2020)
- 60 Mg ha^-1

Estimated PJ biomass in our data

For comparison purposes, let’s take a look at our data. To do this, I’ll load a summary data frame that is used for modeling that contains plot-level biomass estimates (along with a suite of lidar metrics tha we won’t worry about for now).

library(knitr)
library(stringr)

# set working directory and read in data
df.1 <- read.csv("Modeling/biomass_plots_with_lidar_metrics.csv")

# replace underscores with periods in column names
colnames(df.1) <- str_replace_all(colnames(df.1), "_", ".")

# convert biomass from kg/m2 to Mg/ha
df.1$biomass <- df.1$biomass * 10

# print the first few rows of the data frame
kable(df.1[1:8])

plot	lat	long	n.trees	mean.drc	date	biomass	mortality
1	37.35223	-109.9600	31	17.500605	2018-04-07	43.9803661	0.3777196
10	37.43988	-109.9196	71	23.752216	2018-04-20	131.4698865	0.1945273
100	37.58169	-109.8869	77	10.923300	2018-05-04	40.7244259	0.0164353
101	37.60295	-109.9033	109	20.284246	2018-05-05	212.7102934	0.3546020
102	37.61189	-109.8735	211	9.027764	2018-05-05	88.6524352	0.0781815
103	37.58645	-109.9213	35	5.789401	2018-05-05	4.1070113	0.0464187
104A	37.59344	-109.9224	44	18.720322	2018-05-06	103.9831421	0.3537040
104B	37.59464	-109.9223	139	15.284312	2018-05-06	141.9481435	0.4513033
105	37.56707	-109.8962	91	16.721960	2018-05-06	101.8238454	0.4423604
106	37.54565	-109.9401	40	22.783494	2018-11-03	46.8608143	0.2404879
107	37.60271	-109.8955	112	24.104814	2018-11-03	217.1068131	0.4502042
108	37.55420	-109.9386	48	25.769101	2019-04-22	89.4048964	0.2718539
109A	37.54067	-109.9425	26	19.532921	2019-04-22	36.2508830	0.0579860
109B	37.53984	-109.9425	24	15.075848	2019-04-22	18.0505481	0.1212702
11	37.37145	-109.9585	64	21.366451	2018-04-20	98.1975928	0.2348037
110	37.56520	-109.9258	47	25.426520	2019-04-22	79.3616768	0.2800627
111	37.58792	-109.9104	49	22.966573	2019-04-22	80.4883206	0.4563638
1114	37.61010	-109.8823	62	7.483871	2020-06-15	10.2010074	0.2711873
1115	37.58688	-109.8836	103	5.635534	2020-06-15	6.8789137	0.1224911
1116	37.61420	-109.8761	55	9.265455	2020-06-15	23.7499889	0.1804297
1117	37.60565	-109.8685	88	6.802273	2020-06-15	11.5153462	0.0770625
1118	37.60851	-109.8685	107	5.208411	2020-06-15	7.6498479	0.0688484
1119	37.58373	-109.9039	51	9.423529	2020-06-16	11.7222654	0.2471085
112	37.60026	-109.9120	137	23.866901	2019-04-23	306.4341535	0.2130539
1120	37.61559	-109.8733	77	6.051948	2020-06-16	9.0262551	0.1083247
1121	37.60533	-109.9107	72	10.326389	2020-06-16	23.3475251	0.4814150
113A	37.60144	-109.8968	79	25.876154	2019-04-24	172.3583399	0.2917850
113B	37.60024	-109.9001	60	23.029991	2019-04-24	135.8345509	0.2146876
114	37.58327	-109.9011	67	19.943040	2019-04-24	84.1151732	0.1959126
11B	37.37005	-109.9566	66	23.029218	2019-04-20	103.5950332	0.2311981
12	37.45719	-109.9090	89	17.413216	2018-04-21	102.2702630	0.3428152
13	37.38676	-109.9421	57	28.486975	2019-04-20	119.7649108	0.2752291
14	37.43282	-109.9286	32	28.724216	2019-04-21	82.2964753	0.3451241
1403	37.76833	-110.0592	36	11.575000	2020-06-17	12.1600277	0.3150780
1404	37.76903	-110.0645	102	9.952941	2020-06-17	20.2298588	0.3044603
15A	37.48187	-109.9038	57	24.301565	2019-04-21	93.8015586	0.2235890
15B	37.48089	-109.9061	54	29.854789	2019-04-21	120.4782041	0.2635713
16	37.50892	-109.8772	134	17.765550	2018-05-07	152.6605367	0.4275046
17A	37.50439	-109.8858	135	19.348022	2018-05-07	155.3071458	0.3364023
17B	37.50511	-109.8854	122	18.136020	2018-05-07	142.9515288	0.3279061
19	37.51656	-109.8767	24	5.260922	2018-04-08	5.8394272	0.0000000
2	37.43807	-109.9100	70	15.511645	2018-11-03	45.8093373	0.2735036
200	37.52385	-110.0696	13	18.492380	2018-11-03	10.1026665	0.3360591
201	37.54160	-110.0231	54	21.500645	2018-11-03	58.5323700	0.2794712
202	37.53050	-110.0617	20	18.833064	2019-04-23	26.2246752	0.4019056
203	37.55587	-110.0195	34	19.364103	2018-04-28	52.9514127	0.0927232
204A	37.55809	-110.0014	27	35.054525	2019-04-23	89.1930037	0.3979245
204B	37.55888	-110.0010	30	35.546730	2019-04-23	158.3045793	0.4451332
205	37.51403	-110.0917	13	47.516482	2019-04-25	105.6664943	0.2825175
208A	37.53958	-110.0478	29	31.740388	2018-05-07	136.1453661	0.1906853
208B	37.53860	-110.0477	30	19.316090	2018-11-03	29.3816658	0.4060838
3	37.36986	-109.9440	102	21.545867	2018-04-07	139.5917758	0.2129052
300	37.75340	-110.0898	33	31.177316	2019-04-26	133.0137676	0.3336208
301	37.74060	-110.0250	5	6.681661	2018-04-27	0.4209419	0.0000000
302	37.76788	-110.0842	37	23.317676	2019-04-26	63.4409242	0.4974081
303	37.74370	-110.0103	31	12.922065	2019-04-26	21.5616588	0.0000000
4	37.51177	-109.8815	52	26.332381	2019-04-18	105.5235005	0.2445582
400	37.80253	-110.1699	21	38.460677	2019-04-25	80.7302024	0.3091873
401	37.79959	-110.1637	18	34.807375	2019-04-26	73.8836559	0.2443716
402	37.79940	-110.1568	23	16.633322	2018-04-27	27.3407906	0.3527163
5A	37.49047	-109.9040	49	26.439261	2019-04-19	112.0458976	0.1880696
5B	37.49011	-109.9038	39	31.059682	2019-04-19	96.4524915	0.2982917
7	37.44877	-109.9257	46	27.939301	2019-04-19	89.3782193	0.1477916
8	37.42201	-109.9434	40	20.279588	2019-04-19	42.2887748	0.2248098
9	37.35397	-109.9692	36	29.132126	2019-04-20	85.3381482	0.2425134
CM4	37.58076	-109.8852	70	15.425057	2019-05-03	64.6955800	0.0138164
MD1	37.49013	-109.8975	97	22.574577	2019-05-04	142.6904922	0.0696933

# plot histogram of biomass
par(mar = c(5,5,1,1))
hist(df.1$biomass,
     main = NA,
     xlab = "Biomass (Mg/ha)",
     breaks = seq(0,325,25),
     las = 1)

The median biomass among our plots is 82.3 Mg ha^-1. The maximum biomass among our plots is 306.43 Mh ha^-1. So… Clearly something’s not right.

PJ allometry

For this study I’ve been relying on the allometric equations provided by Chonacky et al. (2014). It’s widely-cited, widely-used, and has allometries for all species found in our study area.

The allometric equation for pinyon pines is:

\[ Biomass_{kg} = e^{-3.2007 + 2.5339 \cdot ln(DRC_{cm})} \]

The allometric equation for juniper is:

\[ Biomass_{kg} = e^{-2.7096 + 2.1942 \cdot ln(DRC_{cm})} \]

This results in the following DRC-biomass relationships:

# create vector of drcs and calculate allometric biomass for p and j
drc <- seq(0,150)
bio.pied <- exp(-3.2007 + 2.5339 * log(drc))
bio.juos <- exp(-2.7096 + 2.1942 * log(drc))

# plot the data
par(mar = c(5,5,1,1))
plot(bio.pied ~ drc,
     type = "l",
     lty = 1,
     lwd = 2,
     las = 1,
     col = "blue",
     xlab = "DRC (cm)",
     ylab = "Biomass (kg)",
     ylim = c(0,max(bio.pied, bio.juos)))

lines(bio.juos ~ drc,
      lwd = 2,
      col = "red")

grid()

# add legend
legend("topleft",
       legend = c("PIED", "JUOS"),
       lwd = 2,
       col = c("blue", "red"))

As you can see, DRC being equal, a pinyon will have a higher biomass than a juniper.

A deeper dive into the plot data

Let’s take a closer look at what may be driving our anomalously-high biomass plot estimates. To do this, I’ll load a more detailed dataset that contains all tree-level information in all plots collected for the study. Note that biomass is shown in kg.

# read in the data
df.2 <- read.csv("Field_Data/all_plot_data_20200626.csv")

# print the first few rows
cols <- c("date", "plot", "lat", "long", "species", "drc.cm", "biomass")
kable(head(df.2[,cols]))

date	plot	lat	long	species	drc.cm	biomass
2018-04-07	3	37.36986	-109.944	JUOS	28.011270	99.7638729
2018-04-07	3	37.36986	-109.944	PIED	6.167465	4.0926530
2018-04-07	3	37.36986	-109.944	JUOS	32.685385	139.9685115
2018-04-07	3	37.36986	-109.944	JUOS	15.392776	26.8192440
2018-04-07	3	37.36986	-109.944	JUOS	44.767928	279.1177435
2018-04-07	3	37.36986	-109.944	PIED	2.668771	0.4899863

Note that this dataset contains all of Kelly’s and Steve’s data. However, one of Kelly’s plots (plot #6) was incomplete, and many of Steve’s plots were outside of the lidar boxes. So, let’s narrow it down to just the plots that will ultimately be used in the study:

# create a list of plots from the plot-level summary data frame
plots <- df.1$plot

# subset the detailed, tree-level data frame to just data collected within those plots
df.2 <- df.2[df.2$plot %in% plots,]

As a result, there are 67 plots, containing a total of 4103 trees, 2140 of which are pinyons and 1928 of which are junipers – a fairly even split! Let’s take a look at how the DRCs and resultant biomass estimates compare between these two species in our tree data:

# create normalized kernel density estimates of drc for each species
d.drc.juos <- density(df.2$drc.cm[df.2$species == "JUOS"])
d.drc.juos.x <- d.drc.juos$x
d.drc.juos.y <- d.drc.juos$y / max(d.drc.juos$y)

d.drc.pied <- density(df.2$drc.cm[df.2$species == "PIED"])
d.drc.pied.x <- d.drc.pied$x
d.drc.pied.y <- d.drc.pied$y / max(d.drc.pied$y)

# plot the drc data
par(mar = c(5,5,1,1))
plot(d.drc.juos.y ~ d.drc.juos.x,
     type = "l",
     lwd = 2,
     col = "red",
     xlab = "DRC (cm)",
     ylab = "Relative Abundance",
     las = 1)

lines(d.drc.pied.y ~ d.drc.pied.x,
      lwd = 2,
      col = "blue")

grid()

legend("topright",
       legend = c("PIED", "JUOS"),
       lwd = 2,
       col = c("blue", "red"))

# create normalized kernel density estimates of biomass for each species
d.bio.juos <- density(df.2$biomass[df.2$species == "JUOS"])
d.bio.juos.x <- d.bio.juos$x
d.bio.juos.y <- d.bio.juos$y / max(d.bio.juos$y)

d.bio.pied <- density(df.2$biomass[df.2$species == "PIED"])
d.bio.pied.x <- d.bio.pied$x
d.bio.pied.y <- d.bio.pied$y / max(d.bio.pied$y)

# plot the biomass data
par(mar = c(5,5,1,1))
plot(d.bio.juos.y ~ d.bio.juos.x,
     type = "l",
     lwd = 2,
     col = "red",
     xlab = "Biomass (kg)",
     ylab = "Relative Abundance",
     las = 1)

lines(d.bio.pied.y ~ d.bio.pied.x,
      lwd = 2,
      col = "blue")

grid()

legend("topright",
       legend = c("PIED", "JUOS"),
       lwd = 2,
       col = c("blue", "red"))

Notice how much larger the JUOS DRCs are and, as a result, how much higher the tree-level biomass estimates are for JUOS as compared to PIED. This is where our anomalously-high plot-level biomass estimates are likely stemming from (pardon the pun). Let’s take a look at plot-level biomass estimates vs. relative abundance of pinyon vs. juniper trees:

# loop through plots
plots <- unique(df.2$plot)
for (plot in plots){
  
  # create a subset data frame for just that plot
  df.sub <- df.2[df.2$plot == plot,]
  
  # calculate summary statistics
  n.juos <- nrow(df.sub[df.sub$species == "JUOS",])
  n.pied <- nrow(df.sub[df.sub$species == "PIED",])
  biomass.juos <- sum(df.sub$biomass[df.sub$species == "JUOS"])
  biomass.pied <- sum(df.sub$biomass[df.sub$species == "PIED"])
  biomass <- biomass.juos + biomass.pied
  
  # create a new summary data frame
  if (plot == plots[1]){
    df.3 <- data.frame(plot = plot,
                       n.juos = n.juos,
                       n.pied = n.pied,
                       biomass.juos = biomass.juos,
                       biomass.pied = biomass.pied,
                       biomass = biomass)
  } else {
    df.3.temp <- data.frame(plot = plot,
                       n.juos = n.juos,
                       n.pied = n.pied,
                       biomass.juos = biomass.juos,
                       biomass.pied = biomass.pied,
                       biomass = biomass)
    df.3 <- rbind(df.3, df.3.temp)
  }
  
}

# calculate per-plot relative percent of juniper trees by count
df.3$rel.perc.juos <- df.3$n.juos / (df.3$n.juos + df.3$n.pied)

# convert biomass to Mg/ha (from kg), given a plot radius of 15 m
df.3$biomass.kg.m2 <- df.3$biomass / (pi * 15 ^ 2)
df.3$biomass.mg.ha <- df.3$biomass.kg.m2 * 10

# plot total plot biomass vs relatively percent juniper trees
par(mar = c(5,5,1,1))
plot(biomass.mg.ha ~ rel.perc.juos,
     data = df.3,
     pch = 16,
     xlab = "Relative Percent Juniper Trees",
     ylab = "Biomass (Mg/ha)",
     las = 1)

# add regression line
abline(lm(biomass.mg.ha ~ rel.perc.juos, data = df.3),
       lwd = 2,
       col = "red")

grid()

So, the greater the relative abundance of junipers there are, the higher the biomass within a plot. While that’s not an inherently problematic notion, let’s take a closer look at some of the lidar data. In PJ, it has been demonstrated that aboveground biomas is closely related to canopy cover. However, let’s see how canopy cover and AGB compare in our data:

# join the lidar metrics data frame (df.1) to the new summary data frame (df.3)
df.1.temp <- df.1[,c("plot", "cc")]
df.3 <- merge(df.3, df.1.temp)

# plot cc vs. agb
par(mar = c(5,5,1,1))
plot(biomass.mg.ha ~ cc,
     data = df.3,
     pch = 16,
     xlab = "Canopy Cover (%)",
     ylab = "Biomass (Mg/ha)",
     las = 1)

grid()

Notice how there’s definitely a relationship, but it’s not a clean one. Now look at that same plot, except with points colored by the relative percent of juniper (more red = more juniper; more blue = more pinyon):

# add color information to points
pal <- colorRampPalette(c("blue","red"))
df.3$col <- pal(10)[as.numeric(cut(df.3$rel.perc.juos, breaks = 10))]

# plot cc vs. agb
par(mar = c(5,5,1,1))
plot(biomass.mg.ha ~ cc,
     data = df.3,
     pch = 16,
     col = df.3$col,
     xlab = "Canopy Cover (%)",
     ylab = "Biomass (Mg/ha)",
     las = 1)

grid()

Most of those points in the bottom-right portion of the graph are the points that Kelly just went out and collected recently when she was targeting high-biomass areas. Most of these high-biomass areas were in the higher elevations, where pinyon was the dominant species and canopy covers were high. However, due to their lack of junipers, they are demonstrating relatively low biomass. Incidentally, these points tend to have biomass levels that one might actually expect in PJ woodlands, indicating that the pinyon-based biomass totals are probably pretty accurate. Sites dominated by juniper, however, are likely vastly over-estimating biomass. So, why is that the case? It may have to do with how DRCs were measured for juniper trees.

Taking a look back at the tree-level data, Kelly noted when trees were multi-stemmed (indicated by “ms” in the notes field). Let’s see the relative proportions of multi-stemmed pinyons and junipers:

# derive count and percent of multi-stemmed pinyons and junipers
n.juos <- nrow(df.2[df.2$species == "JUOS",])
n.juos.ms <- nrow(df.2[df.2$species == "JUOS" & grepl("ms", df.2$notes),])
perc.juos.ms <- n.juos.ms / n.juos

n.pied <- nrow(df.2[df.2$species == "PIED",])
n.pied.ms <- nrow(df.2[df.2$species == "PIED" & grepl("ms", df.2$notes),])
perc.pied.ms <- n.pied.ms / n.pied

There are 1928 juniper trees, 392 (20.33%) of which are multi-stemmed. There are 2140 pinyon trees, 132 (6.17%) of which are multi-stemmed. Here is a comparison between the DRCs of multi-stemmed vs single-stemmed juniper trees:

# add a ms factor column to the data frame
df.2$ms <- "Single-Stemmed"
df.2$ms[grepl("ms", df.2$notes)] <- "Multi-Stemmed"
df.2$ms <- as.factor(df.2$ms)

# create a boxplot
par(mar = c(2,5,1,1))
boxplot(drc.cm ~ ms,
        data = df.2,
        ylab = "DRC (cm)",
        las = 1,
        xlab = NA)

# compare means
t.test(drc.cm ~ ms, data = df.2)

## 
##  Welch Two Sample t-test
## 
## data:  drc.cm by ms
## t = 11.03, df = 618.73, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   7.937756 11.376385
## sample estimates:
##  mean in group Multi-Stemmed mean in group Single-Stemmed 
##                     26.78908                     17.13201

While there are still many large single-stemmed junipers, on average, multi-stemmed DRCs are significantly larger than single-stemmed DRCs. The question then becomes, how were DRCs computed on multi-stemmed trees. In conversations with Kelly from several months ago, it seems that the approach she took to characterizing a single DRC measure for a multi-stemmed tree evolved over time. In some cases, she took a single stem (e.g. the largest of the multiple stems) as the representative for the tree, in other cases, she summed the DRCs of the stems, and later on she began taking a single DRC beneath the point of stem splitting (as I understand it). According to this paper, the best approach is to take a root sum of squared diameters:

\[ diameter = \sqrt{\sum_{i=1}^{n} d_i^2} \]

This would result in a smaller diameter than the sum of stem diameters. In some cases, she provided several stem measurements, allowing me to apply this equation (which I have not yet done), but in most, I have no way of determining which approach she took to characterizing multi-stem DRC. That said, clearly there are still a lot of large single-stemmed trees, so even if we were to be able to go back and perfectly characterize multi-stem DRC, I suspect our allometric biomass estimates would still be anomalously high.