Introduction

Since the Industrial Revolution, the concentration of greenhouse gases in the atmosphere has increased (Filonchyk et al., 2024). This trend is alarming and underscores the urgency of safeguarding forests. Reducing Emissions from Deforestation and Forest Degradation (REDD+) is a policy framework developed under the United Nations Framework Convention on Climate Change (UNFCCC) (Den Besten et al., 2014) with the aim of financing forest conservation in developing countries through the creation and trading of forest carbon credits.

To participate in this program, countries must establish Forest Reference Emission Levels (FRELs) and/or Forest Reference Levels (FRLs), which are used to assess their progress in implementing REDD+ activities (Programa ONU-REDD, 2021). The data to be reported include uncertainty values, which can arise from several sources of error, such as activity data, emission factors (including biomass allometric models), and data integration (Espejo & Becerra, 2020).

The conversion of forest inventory data to biomass involves uncertainty that is rarely propagated. Typically, the development of allometric equations does not include many trees with large diameters, which leads to considerable variation in biomass estimates as trunk diameter increases. As a result, the estimates are less consistent for large trees, which can affect the accuracy of total biomass estimation (Wayson et al., 2015).

The percentage of uncertainty is directly related to the discounts applicable to carbon credits, which makes it an aspect of great economic relevance. Identifying and quantifying the main sources of uncertainty enables the implementation of strategies to reduce its impact (Yanai et al., 2023). Espejo and Becerra (2020) indicate that participating countries should discuss sources of error and demonstrate that their contribution to the total uncertainty is low.

There is an existing study conducted in the Huntington Forest that provides estimates of forest biomass and carbon (Patton et al., 2022). However, it does not account for the uncertainty introduced by the use of allometric equations. This highlights the importance of the present research, which aims to estimate biomass while explicitly incorporating this source of uncertainty.

The central question of this paper is: how large is the uncertainty in the estimation of biomass in the Huntington Wildlife Forest?

Looking ahead, the approach developed for Huntington Forest could provide a methodological basis for countries, especially developing nations, to integrate allometric uncertainty into their REDD+ carbon estimation frameworks. This methodology could be adapted for use in other regions, improving the accuracy of carbon and biomass estimates, as well as uncertainty assessments, worldwide.

Experimental Design

This dataset consists of a series of uniformly spaced permanent sample plots, located every 0.5 km, which are periodically remeasured to assess forest conditions and monitor changes over time. Measurements are typically conducted every 10 years. There are 288 plots in the Continuous Forest Inventory (CFI) for the Huntington Wildlife Forest (HWF). The study site was divided into 20 forest compartments ranging in size from 64 to 550 ha. The compartments were grouped according to the silvicultural treatment to which they had been exposed, with approximately 3 to 4 compartments in each silvicultural category.

The silvicultural categories are: Low Management (Compartments 1, 18, 19, 20), Partial + Shelterwood (Compartments 2, 8, 14), Burn (Compartments 3, 4, 16), Partial (Compartments 5, 10, 13), Salvage + Shelterwood (Compartments 6, 7, 11, 15), and Overstory Removal (Compartments 9, 12, 17) (Patton et al., 2022).

The data includes two plot types based on tree size classes. The first is the Sawtimber Plot, which is a fixed circular plot of 1/5 acre. In these plots, diameter at breast height (DBH) is recorded for hardwood trees with a DBH of 11.0 inches or greater, and for softwood trees with a DBH of 9.0 inches or greater.

The second type is the Pole Timber Plot, a 1/10 acre fixed circular plot. In these plots, DBH is measured for hardwood trees ranging from 5.0 to 10.9 inches, and for softwood trees ranging from 5.0 to 8.9 inches.
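Because the two plot types have different areas, tree-level values from each must be scaled by a different factor before they can be expressed per hectare. The following is a minimal sketch of that conversion, assuming only the nominal plot sizes stated above (1/5 acre and 1/10 acre); the variable names are illustrative and are not taken from the analysis code.

```r
# Nominal plot sizes as stated in the text (in acres)
plot_acres <- c(SAW = 1 / 5, POLE = 1 / 10)

# Square meters in one acre
m2_per_acre <- 4046.856

# Plot areas in square meters
plot_m2 <- plot_acres * m2_per_acre

# Per-hectare expansion factor: number of plots of this size in 1 ha (10,000 m^2)
expansion <- 10000 / plot_m2

round(plot_m2, 3)
round(expansion, 3)
```

Multiplying a plot-level sum by the matching expansion factor is equivalent to dividing it by the plot area in square meters and multiplying by 10,000.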

As background, I used the study by Patton et al. (2022). For this project, I referenced the summary of regional biomass allometric equation parameters and the summary statistics for nine common tree species in the Huntington Wildlife Forest, as provided in that study.

When propagating errors in biomass estimates using tree allometric equations, detailed information from the original publication is required, such as the sum of squared errors (SSE), R², and sample size (n), among other metrics (Wayson et al., 2015). Since this information was not available, I used pseudo-populations to estimate the uncertainty of the allometric model. To support this approach, I followed the methodology described in Wayson et al. (2015).

This part of the study is not included in this paper, but it is important to recognize the role of pseudo-data in the success of this project.
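The general idea of recovering coefficient uncertainty from pseudo-data can be illustrated with a simplified sketch. This is not the procedure of Wayson et al. (2015) and not the code used in this project: the coefficients, R², sample size, and diameter range below are all hypothetical, and the noise level is chosen only so that the simulated data roughly match the assumed R² on the log scale.

```r
set.seed(1)

# Hypothetical published equation: ln(Y_kg) = b0 + b1 * ln(DBH_cm)
b0 <- -2.0
b1 <- 2.4
r2_target <- 0.95   # hypothetical reported R^2
n <- 50             # hypothetical sample size

# Pseudo-population of diameters over an assumed range (cm)
dbh <- runif(n, min = 5, max = 60)
ln_pred <- b0 + b1 * log(dbh)

# Residual SD chosen so that var(pred) / (var(pred) + sigma^2) equals r2_target
sigma <- sqrt(var(ln_pred) * (1 - r2_target) / r2_target)
ln_obs <- ln_pred + rnorm(n, mean = 0, sd = sigma)

# Refit the log-log model to the pseudo-data and read off the standard errors
fit <- lm(ln_obs ~ log(dbh))
coef_table <- summary(fit)$coefficients
coef_table[, c("Estimate", "Std. Error")]
```

The standard errors in `coef_table` play the role of the standard deviations of the intercept and slope that are later used to draw coefficients in the Monte Carlo simulation.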

Data Description

The Continuous Forest Inventory (CFI) system at the Huntington Wildlife Forest (HWF) for the year 2011 contains 5719 trees in 288 plots. The work began with the data frame containing the coefficient values, which were used to run the Monte Carlo simulations. These data were generated using the pseudo-data method.

This data frame is called “frameCoeff”, and it includes 9 tree species and 7 columns: especie (species), ecuacion (equation), intercepto (intercept), sd_intercepto (standard deviation of the intercept), pendiente (slope), sd_pendiente (standard deviation of the slope), and CF (correction factor).

The especie value corresponds to the tree species being analyzed, with the following equivalencies: AbiesBal (Abies balsamea), PiceaRubens (Picea rubens), PinusStrobus (Pinus strobus), TsugaCana (Tsuga canadensis), AcerRu (Acer rubrum), AcerSacch (Acer saccharum), BetulaAlle (Betula alleghaniensis), FagusGran (Fagus grandifolia), FraxisAme (Fraxinus americana).

The values for each equation are later merged with the inventory dataset (Plot1HWF). The goal is to match each species with its corresponding slope and intercept values and then estimate the biomass using the specific allometric equation associated with that species.

The inventory data contain 5719 trees, and the columns that were used are: PLOTTREE (plot number + tree number), Plot (plot number), PlotSize (plot type, pole or saw), Tree Num (tree number), DBH (diameter at breast height in inches), DBHcm (diameter at breast height in centimeters), and Equation (equation number).
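The matching step can be sketched with a small toy example. The column names follow frameCoeff and Plot1HWF as described here, but every value below is made up for illustration, and the data type of the equation key in the real files is an assumption.

```r
# Hypothetical coefficient table (values are illustrative only)
coeffs <- data.frame(
  especie    = c("AcerSacch", "FagusGran"),
  ecuacion   = c(1, 2),
  intercepto = c(-2.0, -2.1),
  pendiente  = c(2.4, 2.5),
  CF         = c(1.01, 1.02)
)

# Hypothetical inventory rows (values are illustrative only)
trees <- data.frame(
  Plot     = c(1, 1, 2),
  DBHcm    = c(30.0, 15.0, 42.0),
  Equation = c(1, 2, 1)
)

# Match each tree to its equation; the key has a different name in each table
merged <- merge(coeffs, trees, by.x = "ecuacion", by.y = "Equation")

# Above-ground biomass in kg from the log-log allometric form
merged$Y_kg <- exp(merged$intercepto + merged$pendiente * log(merged$DBHcm)) * merged$CF

merged[, c("Plot", "especie", "DBHcm", "Y_kg")]
```

Each tree inherits the coefficients of its equation, so the biomass calculation can then be applied to the whole merged table at once.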
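The inventory stores diameter in both inches and centimeters, and the allometric equations use centimeters. As a small worked example, the conversion below applies the standard factor of 2.54 cm per inch to the diameter thresholds mentioned in the plot definitions; the vector names are illustrative.

```r
# Diameter thresholds from the plot definitions (inches)
dbh_in <- c(5.0, 8.9, 9.0, 10.9, 11.0)

# 1 inch = 2.54 cm
dbh_cm <- dbh_in * 2.54

data.frame(DBH = dbh_in, DBHcm = dbh_cm)
```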

Analysis Methods

The procedure used for the creation of the results is presented below.

-Packages loading

#Packages 
library(readxl)
library(plyr)
library(doBy)
library(dplyr)
library(knitr)
library(ggplot2)
library(RColorBrewer)

-Import data frame

# Import data frame of coefficients

frameCoeff <- read_excel("C:/Users/vanco/Desktop/ResearchR/Research/Process_data/DataFrame.xlsx")
#View(Coefficientes)

# Import data of the inventory 

Plot1HWF <- read_excel("C:/Users/vanco/Desktop/ResearchR/Research/Raw_data/Plot1HWF.xlsx")

class(frameCoeff)

[1] "tbl_df"     "tbl"        "data.frame"

dim(Plot1HWF)

[1] 5719    9

-Creation of the data frame

# Data frame to store the plot-averaged biomass (kg/ha) of each iteration

n_iter <- 10000
resultados_df <- data.frame(PloKg_Ha = numeric(n_iter))

-Creation of the loop

# 1. Repeat the process 10,000 times to generate the simulation.
#    In each iteration, draw one intercept and one slope per equation from a
#    normal distribution defined by its mean and standard deviation.

for (j in seq_len(n_iter)) {
  frameCoeff$MCintercept <- rnorm(nrow(frameCoeff), mean = frameCoeff$intercepto, sd = frameCoeff$sd_intercepto)
  frameCoeff$MCslope <- rnorm(nrow(frameCoeff), mean = frameCoeff$pendiente, sd = frameCoeff$sd_pendiente)

  # 2. Keep only the columns needed for the biomass calculation.

  MatrixResult <- frameCoeff[, c("especie", "ecuacion", "MCintercept", "MCslope", "CF")]

  # 3. Merge the equation-specific coefficients (slope and intercept) with the
  #    inventory data. The key has a different name in each table
  #    (ecuacion vs. Equation), so by.x and by.y are both needed.

  CombinationSI <- merge(MatrixResult, Plot1HWF, by.x = "ecuacion", by.y = "Equation")

  # 4. Biomass estimation using the general allometric equation
  #      Y_kg = exp(b0 + b1 * ln(DBH)) * CF

  # 5. where Y_kg is the estimated above-ground biomass in kilograms, b0 and b1
  #    are the intercept and slope, DBH is tree diameter in centimeters, and CF
  #    is the biomass correction factor.

  CombinationSI$Y_kg <- exp(CombinationSI$MCintercept + CombinationSI$MCslope * log(CombinationSI$DBHcm)) * CombinationSI$CF

  # 6. Sum tree biomass (Y_kg) by plot and plot type (pole or saw).

  SumBioPlots <- summaryBy(Y_kg ~ Plot + PlotSize, data = CombinationSI, FUN = sum)

  # 7. Convert to kg/ha: divide by the area in m^2 used for each plot type
  #    (202.343 for POLE, 809.372 for SAW) and multiply by 10,000 m^2 per ha.

  SumBioPlots <- SumBioPlots %>%
    mutate(PloKg_Ha = ifelse(PlotSize == "POLE", (Y_kg.sum * 10000) / 202.343,
                             ifelse(PlotSize == "SAW", (Y_kg.sum * 10000) / 809.372, NA)))

  # 8. Sum pole and saw biomass within each plot.

  Sum_A_Plots <- summaryBy(PloKg_Ha ~ Plot, data = SumBioPlots, FUN = sum)

  # 9. Average across plots (mean of the plot sums). summaryBy names the
  #    summed column PloKg_Ha.sum, so the full name is used here.

  Average_A_Plots <- mean(Sum_A_Plots$PloKg_Ha.sum, na.rm = TRUE)

  # 10. Store the result of this iteration.

  resultados_df$PloKg_Ha[j] <- Average_A_Plots
}


# 11. Name the results column and compute the mg_ha column
colnames(resultados_df) <- "PloKg_Ha"
resultados_df$mg_ha <- ((resultados_df$PloKg_Ha/1000)/2) # kg/ha divided by 1000 (kg to Mg) and then by 2



# 12. Average the results of all iterations 
total_mean_PloHa <- mean(resultados_df$PloKg_Ha, na.rm = TRUE)
total_mean_mg_ha <- mean(resultados_df$mg_ha, na.rm = TRUE)

# 13. Standard deviation
total_sd_PloHa <- sd(resultados_df$PloKg_Ha, na.rm = TRUE)
total_sd_mgha <- sd(resultados_df$mg_ha, na.rm = TRUE)



# 14. Totals

#Total SD for mgHa
print(total_sd_mgha)

[1] 19.44279

#Total mean for mgHa
print(total_mean_mg_ha)

[1] 114.1975

# 15. Calculation of the coefficient of variation (CV) 
# Formula = deviation / mean *100

CV<-((total_sd_mgha/total_mean_mg_ha))*100
print(CV)

[1] 17.02558

# round to two decimal places

total_mean_mg_ha <- round(total_mean_mg_ha, 2)
total_sd_mgha <- round(total_sd_mgha, 2)
CV<-((total_sd_mgha/total_mean_mg_ha))*100


# 16. Graphs 

# Histogram of the simulated values, built with ggplot() because qplot() is
# deprecated as of ggplot2 3.4.0

histmg_ha <- ggplot(resultados_df, aes(x = mg_ha)) +
  geom_histogram(fill = "forestgreen", color = "black", bins = 20) +
  ggtitle("Biomass (Mg per hectare)") +
  xlab("Biomass in Mg/ha") +
  ylab("Frequency") +
  theme(plot.title = element_text(hjust = 0.5))

Results

The results are based on 10,000 simulations estimating the biomass of the Huntington forest. Each simulation applies the allometric equations to the forest inventory data. The form of the equations remains the same, but the intercept and slope of each of the 9 allometric equations are redrawn in every simulation. The variation in the results therefore comes from these inputs: in each simulation, the intercept and slope are sampled from distributions defined by their mean values and standard deviations, which were derived with the pseudo-data method.

Results for the Huntington Wildlife Forest:
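How redrawing the coefficients translates into variation in biomass can be seen for a single tree. The sketch below uses hypothetical coefficient means and standard deviations (they are not the values in frameCoeff) and, like the simulation loop, draws the intercept and slope independently from normal distributions.

```r
set.seed(42)
n_draws <- 10000

# Hypothetical coefficient means and standard deviations (illustrative only)
b0_mean <- -2.0; b0_sd <- 0.10
b1_mean <-  2.4; b1_sd <- 0.03
cf <- 1.01

# One intercept and one slope per simulation, drawn independently
b0 <- rnorm(n_draws, mean = b0_mean, sd = b0_sd)
b1 <- rnorm(n_draws, mean = b1_mean, sd = b1_sd)

# Biomass (kg) of one tree of a given diameter under every draw
biomass_kg <- function(dbh_cm) exp(b0 + b1 * log(dbh_cm)) * cf

small <- biomass_kg(15)   # a 15 cm tree
large <- biomass_kg(60)   # a 60 cm tree

# Coefficient of variation in percent
cv <- function(x) sd(x) / mean(x) * 100
c(cv_small = cv(small), cv_large = cv(large))
```

Under these assumptions the spread is wider for the larger tree, because the uncertainty in the slope is multiplied by ln(DBH).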

[1] "Results for wildlife forest of Huntington: mean = 114.2 mg_ha , sd 19.44 mg_ha , min = 94.76 mg_ha , max = 133.64 mg_ha , CV = 17.02 %"
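The figures in the summary line can be checked by hand from the reported mean and standard deviation. The short calculation below uses only the rounded values printed above; it shows that the values labeled min and max coincide with the mean minus and plus one standard deviation, and that the CV follows from the same two numbers.

```r
# Rounded values taken from the summary line above (Mg/ha)
mean_mg_ha <- 114.2
sd_mg_ha <- 19.44

lower <- mean_mg_ha - sd_mg_ha          # matches the reported "min"
upper <- mean_mg_ha + sd_mg_ha          # matches the reported "max"
cv <- sd_mg_ha / mean_mg_ha * 100       # coefficient of variation in percent

round(c(lower = lower, upper = upper, CV = cv), 2)
# lower = 94.76, upper = 133.64, CV = 17.02
```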

The following image shows the distribution of the 10,000 simulations. For the year 2011, a biomass value of approximately 95 Mg/ha was originally reported without accounting for uncertainty (Patton et al., 2022). When uncertainty was included, the estimated average increased to approximately 114 Mg/ha.

Biomass by species according to the simulations

The following chart shows the total biomass distribution by species within the forest inventory of Huntington. The species contributing the most to total biomass were Acer saccharum, Betula alleghaniensis, and Fagus grandifolia.

Although fewer in number, some trees likely contribute more to the total biomass because of their larger diameters.
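The difference between abundance and biomass contribution can be illustrated with a toy summary by species. The tree-level values below are hypothetical and are not drawn from the Huntington inventory; only the species codes follow the ones used in this report.

```r
# Hypothetical tree-level biomass (kg); values are illustrative only
trees <- data.frame(
  especie = c("AcerSacch", "AcerSacch", "AcerSacch", "FagusGran", "PinusStrobus"),
  Y_kg    = c(120, 95, 150, 210, 800)
)

# Total biomass and number of trees per species
total_kg <- tapply(trees$Y_kg, trees$especie, sum)
n_trees <- table(trees$especie)

summary_df <- data.frame(
  especie  = names(total_kg),
  n_trees  = as.integer(n_trees[names(total_kg)]),
  total_kg = as.numeric(total_kg)
)

# Mean biomass per tree shows which species hold the largest individuals
summary_df$mean_kg <- summary_df$total_kg / summary_df$n_trees

# Sort from the largest to the smallest total biomass
summary_df[order(-summary_df$total_kg), ]
```

In this toy data the species with a single large tree holds more biomass than the species with three smaller trees, which is the pattern described above.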

Abundance per Species

The chart shows the total number of individuals by species in the Huntington Forest inventory, with Acer saccharum standing out as the most abundant species.

The abundance of different species may be related to the dominance of some of them depending on the region type. It is also important to consider that this forest has experienced various disturbances, and some species take advantage of these events to develop more successfully than others.

Discussion

The results are broadly consistent with the value calculated by Patton et al. (2022) for the year 2011. That study reported approximately 95 Mg/ha without accounting for uncertainty. When uncertainty was included, an average of 113.74 Mg/ha was estimated, with a minimum value of 94.66 Mg/ha and a maximum of 132.82 Mg/ha, so the previously reported value falls within the presented range. It is useful to have a range that allows for predicting the mass of an individual tree within the inventory. To obtain these values, 10,000 simulations were performed in a mixed forest using 9 allometric equations.

The resulting coefficient of variation was 17%, which is relatively high. This can be explained by the use of several allometric equations in the mixed forest. These values can be compared with the data reported by Lin et al. (2023), where the coefficient of variation obtained using the Slope-Intercept Sampling method and 10,000 simulations ranges from 3 to 11%, depending on the forest type, with more diverse forests showing higher variation.

This forest has undergone various silvicultural management categories, including Low Management, Partial + Shelterwood, Burn, Partial, Salvage + Shelterwood, and Overstory Removal (Patton et al., 2022). As a result, considerable variation in tree diameters is expected when estimating the total biomass in this inventory. This heterogeneity in silvicultural treatments may be one of the main causes of the high coefficient of variation.

In future stages, the code and the values used in its development are expected to be refined to apply the procedure to the rest of the data previously collected in Huntington. This will enable a temporal analysis to assess how uncertainty has evolved over time.

Conclusions

• Providing code for uncertainty propagation using allometric equations (even when full information from the original equations is not available) can offer a valuable tool for developing countries seeking to apply this method in their monitoring reports.

• The coefficient of variation may have been relatively high compared to other studies due to the heterogeneous silvicultural management practices in the Huntington Forest, as well as the use of multiple allometric equations throughout the biomass estimation process.

References

Espejo, A., & Becerra, C. (2020). Updates on FCPF Requirements: Uncertainty Analysis, Technical Corrections, Monitoring Report. https://www.forestcarbonpartnership.org/system/files/documents/Uncertainty_Technical%20Corrections_MR_1.pdf

Den Besten, J. W., Arts, B., & Verkooijen, P. (2014). The evolution of REDD+: An analysis of discursive-institutional dynamics. Environmental Science & Policy, 35, 40-48. https://doi.org/10.1016/j.envsci.2013.03.009

Filonchyk, M., Peterson, M. P., Zhang, L., Hurynovich, V., & He, Y. (2024). Greenhouse gases emissions and global climate change: Examining the influence of CO2, CH4, and N2O. Science of The Total Environment, 935, 173359. https://doi.org/10.1016/j.scitotenv.2024.173359

Patton, R. M., Kiernan, D. H., Burton, J. I., & Drake, J. E. (2022). Management trade-offs between forest carbon stocks, sequestration rates and structural complexity in the central Adirondacks. Forest Ecology and Management, 525, 120539. https://doi.org/10.1016/j.foreco.2022.120539

Programa ONU-REDD. (2021). Niveles de referencia de emisiones forestales y niveles de referencia forestales (Módulo 6). Programa ONU-REDD. https://www.un-redd.org/sites/default/files/2021-10/112415_module6_ES.pdf

Wayson, C. A., Johnson, K. D., Cole, J. A., Olguín, M. I., Carrillo, O. I., & Birdsey, R. A. (2015). Estimating uncertainty of allometric biomass equations with incomplete fit error information using a pseudo-data approach: Methods. Annals of Forest Science, 72(6), 825-834. https://doi.org/10.1007/s13595-014-0436-7

Yanai, R. D., Young, A. R., Campbell, J. L., Westfall, J. A., Barnett, C. J., Dillon, G. A., Green, M. B., & Woodall, C. W. (2023). Measurement uncertainty in a national forest inventory: Results from the northern region of the USA. Canadian Journal of Forest Research, 53(3), 163-177. https://doi.org/10.1139/cjfr-2022-0062

Allometric uncertainty of forest biomass

Sofia Arreaga

2025-04-28