Background

As we all know, I am in the midst of modeling PJ biomass for the CNH project, and have been for a number of months at this point. As we also know, about a month ago, we met and decided that, to improve our modeling results, it would be advantageous to collect a few more high-biomass sites, given our limited sample on the high end of the biomass spectrum. To that end, Kelly went and collected 10 more plots and delivered the data to me recently. I was able to incorporate the plots into our modeling procedure, expecting that model results would drastically improve and, well… Things not only didn’t improve but they got significantly worse. Something weird is going on.

On top of that, I’ve been doing a lot more reviewing of the biomass literature lately, and I’ve realize that the most commonly-reported unit for aboveground biomass is megagrams per hectare (Mg ha-1). In all of the past results I’ve shared with this group, the units I’ve been using are kilograms per square meter (kg m-2). I did this because the allometric equations I’ve been using produce per-tree biomass in kg, and the plot radii used to collect the data were measured in meters (15 m). In keeping with the industry standard, I decided to convert kg m-2 to Mg ha-1 (multiply by 10). As a result, the biomass estimates that emerged appear to be WAY too high in comparison to what others who have studied PJ biomass have produced.

So, I think it’s worth meeting to discuss how and why these problems are arising. And I thought it would be helpful to put together an R markdown document to try to explain what might be happening.

Maximum reported PJ biomass in the literature

After reviewing the PJ biomass literature, it appears that PJ biomass rarely exceeds 70 Mg ha-1. Here is a selection of maximum reported PJ biomass totals:

Estimated PJ biomass in our data

For comparison purposes, let’s take a look at our data. To do this, I’ll load a summary data frame that is used for modeling that contains plot-level biomass estimates (along with a suite of lidar metrics tha we won’t worry about for now).

library(knitr)
library(stringr)

# set working directory and read in data
df.1 <- read.csv("Modeling/biomass_plots_with_lidar_metrics.csv")

# replace underscores with periods in column names
colnames(df.1) <- str_replace_all(colnames(df.1), "_", ".")

# convert biomass from kg/m2 to Mg/ha
df.1$biomass <- df.1$biomass * 10

# print the first few rows of the data frame
kable(df.1[1:8])
plot lat long n.trees mean.drc date biomass mortality
1 37.35223 -109.9600 31 17.500605 2018-04-07 43.9803661 0.3777196
10 37.43988 -109.9196 71 23.752216 2018-04-20 131.4698865 0.1945273
100 37.58169 -109.8869 77 10.923300 2018-05-04 40.7244259 0.0164353
101 37.60295 -109.9033 109 20.284246 2018-05-05 212.7102934 0.3546020
102 37.61189 -109.8735 211 9.027764 2018-05-05 88.6524352 0.0781815
103 37.58645 -109.9213 35 5.789401 2018-05-05 4.1070113 0.0464187
104A 37.59344 -109.9224 44 18.720322 2018-05-06 103.9831421 0.3537040
104B 37.59464 -109.9223 139 15.284312 2018-05-06 141.9481435 0.4513033
105 37.56707 -109.8962 91 16.721960 2018-05-06 101.8238454 0.4423604
106 37.54565 -109.9401 40 22.783494 2018-11-03 46.8608143 0.2404879
107 37.60271 -109.8955 112 24.104814 2018-11-03 217.1068131 0.4502042
108 37.55420 -109.9386 48 25.769101 2019-04-22 89.4048964 0.2718539
109A 37.54067 -109.9425 26 19.532921 2019-04-22 36.2508830 0.0579860
109B 37.53984 -109.9425 24 15.075848 2019-04-22 18.0505481 0.1212702
11 37.37145 -109.9585 64 21.366451 2018-04-20 98.1975928 0.2348037
110 37.56520 -109.9258 47 25.426520 2019-04-22 79.3616768 0.2800627
111 37.58792 -109.9104 49 22.966573 2019-04-22 80.4883206 0.4563638
1114 37.61010 -109.8823 62 7.483871 2020-06-15 10.2010074 0.2711873
1115 37.58688 -109.8836 103 5.635534 2020-06-15 6.8789137 0.1224911
1116 37.61420 -109.8761 55 9.265455 2020-06-15 23.7499889 0.1804297
1117 37.60565 -109.8685 88 6.802273 2020-06-15 11.5153462 0.0770625
1118 37.60851 -109.8685 107 5.208411 2020-06-15 7.6498479 0.0688484
1119 37.58373 -109.9039 51 9.423529 2020-06-16 11.7222654 0.2471085
112 37.60026 -109.9120 137 23.866901 2019-04-23 306.4341535 0.2130539
1120 37.61559 -109.8733 77 6.051948 2020-06-16 9.0262551 0.1083247
1121 37.60533 -109.9107 72 10.326389 2020-06-16 23.3475251 0.4814150
113A 37.60144 -109.8968 79 25.876154 2019-04-24 172.3583399 0.2917850
113B 37.60024 -109.9001 60 23.029991 2019-04-24 135.8345509 0.2146876
114 37.58327 -109.9011 67 19.943040 2019-04-24 84.1151732 0.1959126
11B 37.37005 -109.9566 66 23.029218 2019-04-20 103.5950332 0.2311981
12 37.45719 -109.9090 89 17.413216 2018-04-21 102.2702630 0.3428152
13 37.38676 -109.9421 57 28.486975 2019-04-20 119.7649108 0.2752291
14 37.43282 -109.9286 32 28.724216 2019-04-21 82.2964753 0.3451241
1403 37.76833 -110.0592 36 11.575000 2020-06-17 12.1600277 0.3150780
1404 37.76903 -110.0645 102 9.952941 2020-06-17 20.2298588 0.3044603
15A 37.48187 -109.9038 57 24.301565 2019-04-21 93.8015586 0.2235890
15B 37.48089 -109.9061 54 29.854789 2019-04-21 120.4782041 0.2635713
16 37.50892 -109.8772 134 17.765550 2018-05-07 152.6605367 0.4275046
17A 37.50439 -109.8858 135 19.348022 2018-05-07 155.3071458 0.3364023
17B 37.50511 -109.8854 122 18.136020 2018-05-07 142.9515288 0.3279061
19 37.51656 -109.8767 24 5.260922 2018-04-08 5.8394272 0.0000000
2 37.43807 -109.9100 70 15.511645 2018-11-03 45.8093373 0.2735036
200 37.52385 -110.0696 13 18.492380 2018-11-03 10.1026665 0.3360591
201 37.54160 -110.0231 54 21.500645 2018-11-03 58.5323700 0.2794712
202 37.53050 -110.0617 20 18.833064 2019-04-23 26.2246752 0.4019056
203 37.55587 -110.0195 34 19.364103 2018-04-28 52.9514127 0.0927232
204A 37.55809 -110.0014 27 35.054525 2019-04-23 89.1930037 0.3979245
204B 37.55888 -110.0010 30 35.546730 2019-04-23 158.3045793 0.4451332
205 37.51403 -110.0917 13 47.516482 2019-04-25 105.6664943 0.2825175
208A 37.53958 -110.0478 29 31.740388 2018-05-07 136.1453661 0.1906853
208B 37.53860 -110.0477 30 19.316090 2018-11-03 29.3816658 0.4060838
3 37.36986 -109.9440 102 21.545867 2018-04-07 139.5917758 0.2129052
300 37.75340 -110.0898 33 31.177316 2019-04-26 133.0137676 0.3336208
301 37.74060 -110.0250 5 6.681661 2018-04-27 0.4209419 0.0000000
302 37.76788 -110.0842 37 23.317676 2019-04-26 63.4409242 0.4974081
303 37.74370 -110.0103 31 12.922065 2019-04-26 21.5616588 0.0000000
4 37.51177 -109.8815 52 26.332381 2019-04-18 105.5235005 0.2445582
400 37.80253 -110.1699 21 38.460677 2019-04-25 80.7302024 0.3091873
401 37.79959 -110.1637 18 34.807375 2019-04-26 73.8836559 0.2443716
402 37.79940 -110.1568 23 16.633322 2018-04-27 27.3407906 0.3527163
5A 37.49047 -109.9040 49 26.439261 2019-04-19 112.0458976 0.1880696
5B 37.49011 -109.9038 39 31.059682 2019-04-19 96.4524915 0.2982917
7 37.44877 -109.9257 46 27.939301 2019-04-19 89.3782193 0.1477916
8 37.42201 -109.9434 40 20.279588 2019-04-19 42.2887748 0.2248098
9 37.35397 -109.9692 36 29.132126 2019-04-20 85.3381482 0.2425134
CM4 37.58076 -109.8852 70 15.425057 2019-05-03 64.6955800 0.0138164
MD1 37.49013 -109.8975 97 22.574577 2019-05-04 142.6904922 0.0696933
# plot histogram of biomass
par(mar = c(5,5,1,1))
hist(df.1$biomass,
     main = NA,
     xlab = "Biomass (Mg/ha)",
     breaks = seq(0,325,25),
     las = 1)

The median biomass among our plots is 82.3 Mg ha-1. The maximum biomass among our plots is 306.43 Mh ha-1. So… Clearly something’s not right.

PJ allometry

For this study I’ve been relying on the allometric equations provided by Chonacky et al. (2014). It’s widely-cited, widely-used, and has allometries for all species found in our study area.

The allometric equation for pinyon pines is:

\[ Biomass_{kg} = e^{-3.2007 + 2.5339 \cdot ln(DRC_{cm})} \]

The allometric equation for juniper is:

\[ Biomass_{kg} = e^{-2.7096 + 2.1942 \cdot ln(DRC_{cm})} \]

This results in the following DRC-biomass relationships:

# create vector of drcs and calculate allometric biomass for p and j
drc <- seq(0,150)
bio.pied <- exp(-3.2007 + 2.5339 * log(drc))
bio.juos <- exp(-2.7096 + 2.1942 * log(drc))

# plot the data
par(mar = c(5,5,1,1))
plot(bio.pied ~ drc,
     type = "l",
     lty = 1,
     lwd = 2,
     las = 1,
     col = "blue",
     xlab = "DRC (cm)",
     ylab = "Biomass (kg)",
     ylim = c(0,max(bio.pied, bio.juos)))

lines(bio.juos ~ drc,
      lwd = 2,
      col = "red")

grid()

# add legend
legend("topleft",
       legend = c("PIED", "JUOS"),
       lwd = 2,
       col = c("blue", "red"))

As you can see, DRC being equal, a pinyon will have a higher biomass than a juniper.

A deeper dive into the plot data

Let’s take a closer look at what may be driving our anomalously-high biomass plot estimates. To do this, I’ll load a more detailed dataset that contains all tree-level information in all plots collected for the study. Note that biomass is shown in kg.

# read in the data
df.2 <- read.csv("Field_Data/all_plot_data_20200626.csv")

# print the first few rows
cols <- c("date", "plot", "lat", "long", "species", "drc.cm", "biomass")
kable(head(df.2[,cols]))
date plot lat long species drc.cm biomass
2018-04-07 3 37.36986 -109.944 JUOS 28.011270 99.7638729
2018-04-07 3 37.36986 -109.944 PIED 6.167465 4.0926530
2018-04-07 3 37.36986 -109.944 JUOS 32.685385 139.9685115
2018-04-07 3 37.36986 -109.944 JUOS 15.392776 26.8192440
2018-04-07 3 37.36986 -109.944 JUOS 44.767928 279.1177435
2018-04-07 3 37.36986 -109.944 PIED 2.668771 0.4899863

Note that this dataset contains all of Kelly’s and Steve’s data. However, one of Kelly’s plots (plot #6) was incomplete, and many of Steve’s plots were outside of the lidar boxes. So, let’s narrow it down to just the plots that will ultimately be used in the study:

# create a list of plots from the plot-level summary data frame
plots <- df.1$plot

# subset the detailed, tree-level data frame to just data collected within those plots
df.2 <- df.2[df.2$plot %in% plots,]

As a result, there are 67 plots, containing a total of 4103 trees, 2140 of which are pinyons and 1928 of which are junipers – a fairly even split! Let’s take a look at how the DRCs and resultant biomass estimates compare between these two species in our tree data:

# create normalized kernel density estimates of drc for each species
d.drc.juos <- density(df.2$drc.cm[df.2$species == "JUOS"])
d.drc.juos.x <- d.drc.juos$x
d.drc.juos.y <- d.drc.juos$y / max(d.drc.juos$y)

d.drc.pied <- density(df.2$drc.cm[df.2$species == "PIED"])
d.drc.pied.x <- d.drc.pied$x
d.drc.pied.y <- d.drc.pied$y / max(d.drc.pied$y)

# plot the drc data
par(mar = c(5,5,1,1))
plot(d.drc.juos.y ~ d.drc.juos.x,
     type = "l",
     lwd = 2,
     col = "red",
     xlab = "DRC (cm)",
     ylab = "Relative Abundance",
     las = 1)

lines(d.drc.pied.y ~ d.drc.pied.x,
      lwd = 2,
      col = "blue")

grid()

legend("topright",
       legend = c("PIED", "JUOS"),
       lwd = 2,
       col = c("blue", "red"))

# create normalized kernel density estimates of biomass for each species
d.bio.juos <- density(df.2$biomass[df.2$species == "JUOS"])
d.bio.juos.x <- d.bio.juos$x
d.bio.juos.y <- d.bio.juos$y / max(d.bio.juos$y)

d.bio.pied <- density(df.2$biomass[df.2$species == "PIED"])
d.bio.pied.x <- d.bio.pied$x
d.bio.pied.y <- d.bio.pied$y / max(d.bio.pied$y)

# plot the biomass data
par(mar = c(5,5,1,1))
plot(d.bio.juos.y ~ d.bio.juos.x,
     type = "l",
     lwd = 2,
     col = "red",
     xlab = "Biomass (kg)",
     ylab = "Relative Abundance",
     las = 1)

lines(d.bio.pied.y ~ d.bio.pied.x,
      lwd = 2,
      col = "blue")

grid()

legend("topright",
       legend = c("PIED", "JUOS"),
       lwd = 2,
       col = c("blue", "red"))

Notice how much larger the JUOS DRCs are and, as a result, how much higher the tree-level biomass estimates are for JUOS as compared to PIED. This is where our anomalously-high plot-level biomass estimates are likely stemming from (pardon the pun). Let’s take a look at plot-level biomass estimates vs. relative abundance of pinyon vs. juniper trees:

# loop through plots
plots <- unique(df.2$plot)
for (plot in plots){
  
  # create a subset data frame for just that plot
  df.sub <- df.2[df.2$plot == plot,]
  
  # calculate summary statistics
  n.juos <- nrow(df.sub[df.sub$species == "JUOS",])
  n.pied <- nrow(df.sub[df.sub$species == "PIED",])
  biomass.juos <- sum(df.sub$biomass[df.sub$species == "JUOS"])
  biomass.pied <- sum(df.sub$biomass[df.sub$species == "PIED"])
  biomass <- biomass.juos + biomass.pied
  
  # create a new summary data frame
  if (plot == plots[1]){
    df.3 <- data.frame(plot = plot,
                       n.juos = n.juos,
                       n.pied = n.pied,
                       biomass.juos = biomass.juos,
                       biomass.pied = biomass.pied,
                       biomass = biomass)
  } else {
    df.3.temp <- data.frame(plot = plot,
                       n.juos = n.juos,
                       n.pied = n.pied,
                       biomass.juos = biomass.juos,
                       biomass.pied = biomass.pied,
                       biomass = biomass)
    df.3 <- rbind(df.3, df.3.temp)
  }
  
}

# calculate per-plot relative percent of juniper trees by count
df.3$rel.perc.juos <- df.3$n.juos / (df.3$n.juos + df.3$n.pied)

# convert biomass to Mg/ha (from kg), given a plot radius of 15 m
df.3$biomass.kg.m2 <- df.3$biomass / (pi * 15 ^ 2)
df.3$biomass.mg.ha <- df.3$biomass.kg.m2 * 10

# plot total plot biomass vs relatively percent juniper trees
par(mar = c(5,5,1,1))
plot(biomass.mg.ha ~ rel.perc.juos,
     data = df.3,
     pch = 16,
     xlab = "Relative Percent Juniper Trees",
     ylab = "Biomass (Mg/ha)",
     las = 1)

# add regression line
abline(lm(biomass.mg.ha ~ rel.perc.juos, data = df.3),
       lwd = 2,
       col = "red")

grid()

So, the greater the relative abundance of junipers there are, the higher the biomass within a plot. While that’s not an inherently problematic notion, let’s take a closer look at some of the lidar data. In PJ, it has been demonstrated that aboveground biomas is closely related to canopy cover. However, let’s see how canopy cover and AGB compare in our data:

# join the lidar metrics data frame (df.1) to the new summary data frame (df.3)
df.1.temp <- df.1[,c("plot", "cc")]
df.3 <- merge(df.3, df.1.temp)

# plot cc vs. agb
par(mar = c(5,5,1,1))
plot(biomass.mg.ha ~ cc,
     data = df.3,
     pch = 16,
     xlab = "Canopy Cover (%)",
     ylab = "Biomass (Mg/ha)",
     las = 1)

grid()

Notice how there’s definitely a relationship, but it’s not a clean one. Now look at that same plot, except with points colored by the relative percent of juniper (more red = more juniper; more blue = more pinyon):

# add color information to points
pal <- colorRampPalette(c("blue","red"))
df.3$col <- pal(10)[as.numeric(cut(df.3$rel.perc.juos, breaks = 10))]

# plot cc vs. agb
par(mar = c(5,5,1,1))
plot(biomass.mg.ha ~ cc,
     data = df.3,
     pch = 16,
     col = df.3$col,
     xlab = "Canopy Cover (%)",
     ylab = "Biomass (Mg/ha)",
     las = 1)

grid()

Most of those points in the bottom-right portion of the graph are the points that Kelly just went out and collected recently when she was targeting high-biomass areas. Most of these high-biomass areas were in the higher elevations, where pinyon was the dominant species and canopy covers were high. However, due to their lack of junipers, they are demonstrating relatively low biomass. Incidentally, these points tend to have biomass levels that one might actually expect in PJ woodlands, indicating that the pinyon-based biomass totals are probably pretty accurate. Sites dominated by juniper, however, are likely vastly over-estimating biomass. So, why is that the case? It may have to do with how DRCs were measured for juniper trees.

Taking a look back at the tree-level data, Kelly noted when trees were multi-stemmed (indicated by “ms” in the notes field). Let’s see the relative proportions of multi-stemmed pinyons and junipers:

# derive count and percent of multi-stemmed pinyons and junipers
n.juos <- nrow(df.2[df.2$species == "JUOS",])
n.juos.ms <- nrow(df.2[df.2$species == "JUOS" & grepl("ms", df.2$notes),])
perc.juos.ms <- n.juos.ms / n.juos

n.pied <- nrow(df.2[df.2$species == "PIED",])
n.pied.ms <- nrow(df.2[df.2$species == "PIED" & grepl("ms", df.2$notes),])
perc.pied.ms <- n.pied.ms / n.pied

There are 1928 juniper trees, 392 (20.33%) of which are multi-stemmed. There are 2140 pinyon trees, 132 (6.17%) of which are multi-stemmed. Here is a comparison between the DRCs of multi-stemmed vs single-stemmed juniper trees:

# add a ms factor column to the data frame
df.2$ms <- "Single-Stemmed"
df.2$ms[grepl("ms", df.2$notes)] <- "Multi-Stemmed"
df.2$ms <- as.factor(df.2$ms)

# create a boxplot
par(mar = c(2,5,1,1))
boxplot(drc.cm ~ ms,
        data = df.2,
        ylab = "DRC (cm)",
        las = 1,
        xlab = NA)

# compare means
t.test(drc.cm ~ ms, data = df.2)
## 
##  Welch Two Sample t-test
## 
## data:  drc.cm by ms
## t = 11.03, df = 618.73, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   7.937756 11.376385
## sample estimates:
##  mean in group Multi-Stemmed mean in group Single-Stemmed 
##                     26.78908                     17.13201

While there are still many large single-stemmed junipers, on average, multi-stemmed DRCs are significantly larger than single-stemmed DRCs. The question then becomes, how were DRCs computed on multi-stemmed trees. In conversations with Kelly from several months ago, it seems that the approach she took to characterizing a single DRC measure for a multi-stemmed tree evolved over time. In some cases, she took a single stem (e.g. the largest of the multiple stems) as the representative for the tree, in other cases, she summed the DRCs of the stems, and later on she began taking a single DRC beneath the point of stem splitting (as I understand it). According to this paper, the best approach is to take a root sum of squared diameters:

\[ diameter = \sqrt{\sum_{i=1}^{n} d_i^2} \]

This would result in a smaller diameter than the sum of stem diameters. In some cases, she provided several stem measurements, allowing me to apply this equation (which I have not yet done), but in most, I have no way of determining which approach she took to characterizing multi-stem DRC. That said, clearly there are still a lot of large single-stemmed trees, so even if we were to be able to go back and perfectly characterize multi-stem DRC, I suspect our allometric biomass estimates would still be anomalously high.