Forest Inventory Analysis: Automating Diameter Distributions and Taxonomic Diversity in R

Code Language Note All narrative and analysis in this document is written in English. However, variable names, column headers, and some code comments are in Spanish, as the original dataset and field inventory were conducted in Spanish. This is intentional and does not affect the results or interpretation of the analysis. :::

1. Reproducible Sampling and Data Cleaning A subset of four forest plots was generated through randomized sampling (Seed: 344v). To ensure a robust analysis, data cleaning protocols were implemented: a 10 cm minimum DBH ( Diameter at Breast Height ) floor was applied to sub-threshold individuals, and missing dendrometric records were handled via group-mean imputation (per-plot basis) to maintain the statistical distribution.

bd <- read.csv2("Datos_Taller1.csv")
set.seed(344)

bd1 <- bd[bd$Parcela %in% sample(unique(bd$Parcela), 4),]

table(bd1$Parcela)

 PP_1 PP_12 PP_19  PP_3 
  269   184   248   306 

A new variable, DBH_cm, was derived from the raw circumference data (CAP_cm) using the geometric constant pi. This ensures that subsequent structural analyses, such as basal area calculations, are based on diameter units

bd1$DAP_cm <- bd1$CAP_cm/pi
summary(bd1$DAP_cm)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  9.358  12.064  15.374  17.410  20.563  67.609      10 

A minimum threshold of 10 cm was assigned to all individuals with a Diameter at Breast Height (DBH) below this value. Furthermore, missing DBH data were addressed through mean imputation; specifically, null values were replaced with the calculated average DBH of their respective sampling plot to maintain the structural integrity of the dataset for subsequent dendrometric analysis.

bd1$DAP_cm [bd1$DAP_cm < 10 & !is.na(bd1$DAP_cm)] <- 10
prom <- tapply(bd1$DAP_cm, na.rm= T, bd1$Parcela, mean) 
bd1$DAP_cm [is.na(bd1$DAP_cm)] <- prom[bd1$Parcela[is.na(bd1$DAP_cm)]]
summary(bd1)
       ID           Parcela              Placa         Familia         
 Min.   :   1.0   Length:1007        Min.   :  1.0   Length:1007       
 1st Qu.: 252.5   Class :character   1st Qu.: 63.5   Class :character  
 Median : 749.0   Mode  :character   Median :126.0   Mode  :character  
 Mean   : 944.2                      Mean   :130.3                     
 3rd Qu.:1441.5                      3rd Qu.:191.0                     
 Max.   :1983.0                      Max.   :306.0                     
                                                                       
   Especie              CAP_cm          POM_cm         Altura_m    
 Length:1007        Min.   : 29.4   Min.   :120.0   Min.   : 2.47  
 Class :character   1st Qu.: 37.9   1st Qu.:130.0   1st Qu.:11.80  
 Mode  :character   Median : 48.3   Median :130.0   Median :13.93  
                    Mean   : 54.7   Mean   :134.8   Mean   :14.57  
                    3rd Qu.: 64.6   3rd Qu.:130.0   3rd Qu.:16.70  
                    Max.   :212.4   Max.   :350.0   Max.   :40.38  
                    NA's   :10      NA's   :7       NA's   :539    
     DAP_cm     
 Min.   :10.00  
 1st Qu.:12.10  
 Median :15.41  
 Mean   :17.42  
 3rd Qu.:20.47  
 Max.   :67.61  
                

2. Structural Characterization and Floristic Diversity Assessment

Following the data cleaning phase, a comprehensive structural and taxonomic analysis was conducted at the plot level. Descriptive statistics for Diameter at Breast Height (DBH), including mean and standard deviation, were calculated alongside stand density normalized to a per-hectare basis (ind ha⁻¹).Floristic richness was quantified by determining the number of species, genera, and families per sampling unit. To evaluate taxonomic uncertainty, both the absolute frequency and the percentage of unidentified individuals were recorded for each plot. Furthermore, the dominant height (\(H_d\)) was estimated as the arithmetic mean of the top 20% tallest trees with available height records per plot, providing an indicator of upper canopy development and site potential.

unicos <- function (x) length(unique(x))

bd1$Genero <-  substr(bd1$Especie, start = 1, stop = (regexpr(" ", bd1$Especie)-1))

df<- data.frame(
  promedio_DAP = round(tapply(bd1$DAP_cm, bd1$Parcela, mean), 3),
  sd_DAP = round(tapply(bd1$DAP_cm, bd1$Parcela, sd), 3),
  num_ind = tapply(bd1$Placa, bd1$Parcela, max),
  num_esp = tapply(bd1$Especie, bd1$Parcela,  function(x) length(unique(x)) ),
  num_gen = tapply(bd1$Genero, bd1$Parcela,  unicos ),
  num_fam = sapply(tapply(bd1$Familia, bd1$Parcela,  unique ), length),
  num_sin_info = tapply(bd1$Especie == "", bd1$Parcela,sum),
  porcentaje = round(((tapply(bd1$Especie == "", bd1$Parcela,sum)*100)
                /(tapply(bd1$Especie, bd1$Parcela,length))),3)
)
summary(df)
  promedio_DAP       sd_DAP         num_ind         num_esp     
 Min.   :15.95   Min.   :5.668   Min.   :184.0   Min.   :42.00  
 1st Qu.:16.81   1st Qu.:6.625   1st Qu.:232.0   1st Qu.:46.50  
 Median :17.61   Median :6.961   Median :258.5   Median :51.50  
 Mean   :17.64   Mean   :7.161   Mean   :251.8   Mean   :52.25  
 3rd Qu.:18.43   3rd Qu.:7.497   3rd Qu.:278.2   3rd Qu.:57.25  
 Max.   :19.38   Max.   :9.053   Max.   :306.0   Max.   :64.00  
    num_gen         num_fam       num_sin_info      porcentaje   
 Min.   :37.00   Min.   :30.00   Min.   : 26.00   Min.   :14.11  
 1st Qu.:37.75   1st Qu.:30.00   1st Qu.: 32.75   1st Qu.:14.13  
 Median :44.00   Median :35.00   Median : 38.50   Median :14.87  
 Mean   :44.75   Mean   :35.25   Mean   : 52.00   Mean   :19.54  
 3rd Qu.:51.00   3rd Qu.:40.25   3rd Qu.: 57.75   3rd Qu.:20.29  
 Max.   :54.00   Max.   :41.00   Max.   :105.00   Max.   :34.31  
df
      promedio_DAP sd_DAP num_ind num_esp num_gen num_fam num_sin_info
PP_1        17.101  6.978     269      64      54      41           42
PP_12       19.375  9.053     184      42      37      30           26
PP_19       18.117  6.944     248      48      38      30           35
PP_3        15.951  5.668     306      55      50      40          105
      porcentaje
PP_1      15.613
PP_12     14.130
PP_19     14.113
PP_3      34.314

The vertical structure of the forest stands was characterized by estimating the dominant height (H_d) for each sampling unit. This parameter was defined as the arithmetic mean of the upper stratum, specifically the top 20% of individuals with available height records within each plot (n = 20%). By filtering for the tallest individuals, this metric provides a robust proxy for site productivity and avoids the high variability often associated with suppressed or understory trees.

#### (12) Filtrar registros con altura válida
alturas <- bd1[!is.na(bd1$Altura_m), ]

#### (13) Ordenar alturas de mayor a menor por parcela
# Usamos split para separar por parcela y sort con decreasing=TRUE
lista_alturas <- split(alturas$Altura_m, alturas$Parcela)
lista_ordenada <- lapply(lista_alturas, sort, decreasing = TRUE)

#### (14) Calcular el promedio del 20% superior de forma dinámica
# Creamos una función que calcula el n (20%) y saca el promedio
calc_hdom <- function(x) {
  n <- ceiling(length(x) * 0.20) # Calcula el 20% real de esta parcela
  mean(x[1:n])                   # Promedia solo los n más altos
}

# Aplicamos la función a cada elemento de la lista ordenada
hdom_valores <- sapply(lista_ordenada, calc_hdom)

#### (15) Asignar al dataframe (Ya no necesitas cambiar números manuales)
df$Hdom <- round(hdom_valores, 3)

# Verificar y Exportar
print(df)
      promedio_DAP sd_DAP num_ind num_esp num_gen num_fam num_sin_info
PP_1        17.101  6.978     269      64      54      41           42
PP_12       19.375  9.053     184      42      37      30           26
PP_19       18.117  6.944     248      48      38      30           35
PP_3        15.951  5.668     306      55      50      40          105
      porcentaje   Hdom
PP_1      15.613 20.247
PP_12     14.130 21.920
PP_19     14.113 21.620
PP_3      34.314 20.408
write.csv2(df, "Tabla_datos_parcela_punto2.csv", row.names = FALSE)

3. Allometric Relationship: DBH vs. Total Height

The allometric relationship between Diameter at Breast Height (DBH) and total height was examined for the four selected forest plots. To facilitate a comparative structural analysis, a unified scatter plot was generated. Individual trees were represented by distinct color codes corresponding to their respective sampling units. This visualization allowed for the identification of vertical growth patterns and potential stratification differences among the studied plots.

bd1$DAP_m<-bd1$DAP_cm/100
summary(bd1$DAP_m)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1000  0.1210  0.1541  0.1742  0.2047  0.6761 
#bd1$DAP_m
#bd1
colores_parcelas <- c("#00c5cd", "#473c8b", "lightgoldenrod3", "#CD3278")
plot(bd1$DAP_m, bd1$Altura_m,
     type = "p",
     col = colores_parcelas, 
     ylim = c(5,50),
     xlim = c(0, 1),
     pch=18, 
     las=1,
     cex.main= 1, 
     xlab = "DBH (m)", 
     ylab = " Height (m)",
     cex.lab= 1,
     mar = c(5,5,5,5),
     family= "serif",
     font.lab= 6)

legend(x = 0.7, y = 50, legend = c(" Plot 1", " Plot 12", " Plot 19", " Plot 3"), 
       fill = colores_parcelas, title = " Plot", text.font=6 )

4. Identification of Extreme Dendrometric Values

To characterize the structural boundaries of each sampling unit, the individuals with the minimum and maximum values for Diameter at Breast Height (DBH) and total height were identified. This selection process was performed independently for each plot to capture the full range of variability within the stands. The resulting data were consolidated into a summary table, providing a clear overview of the extreme architectural limits observed during the forest inventory.

# max dap
maxdap_1  <- bd1[bd1$DAP_cm == tapply(bd1$DAP_cm, bd1$Parcela, max)["PP_1"]  & bd1$Parcela == "PP_1",  ]
maxdap_12 <- bd1[bd1$DAP_cm == tapply(bd1$DAP_cm, bd1$Parcela, max)["PP_12"] & bd1$Parcela == "PP_12", ]
maxdap_19 <- bd1[bd1$DAP_cm == tapply(bd1$DAP_cm, bd1$Parcela, max)["PP_19"] & bd1$Parcela == "PP_19", ]
maxdap_3  <- bd1[bd1$DAP_cm == tapply(bd1$DAP_cm, bd1$Parcela, max)["PP_3"]  & bd1$Parcela == "PP_3",  ]

# min dap
mindap_1  <- bd1[bd1$DAP_cm == tapply(bd1$DAP_cm, bd1$Parcela, min)["PP_1"]  & bd1$Parcela == "PP_1",  ]
mindap_12 <- bd1[bd1$DAP_cm == tapply(bd1$DAP_cm, bd1$Parcela, min)["PP_12"] & bd1$Parcela == "PP_12", ]
mindap_19 <- bd1[bd1$DAP_cm == tapply(bd1$DAP_cm, bd1$Parcela, min)["PP_19"] & bd1$Parcela == "PP_19", ]
mindap_3  <- bd1[bd1$DAP_cm == tapply(bd1$DAP_cm, bd1$Parcela, min)["PP_3"]  & bd1$Parcela == "PP_3",  ]

# max alt
maxalt_1  <- bd1[bd1$Altura_m == tapply(bd1$Altura_m, bd1$Parcela, max, na.rm = TRUE)["PP_1"]  & bd1$Parcela == "PP_1"  & !is.na(bd1$Altura_m), ]
maxalt_12 <- bd1[bd1$Altura_m == tapply(bd1$Altura_m, bd1$Parcela, max, na.rm = TRUE)["PP_12"] & bd1$Parcela == "PP_12" & !is.na(bd1$Altura_m), ]
maxalt_19 <- bd1[bd1$Altura_m == tapply(bd1$Altura_m, bd1$Parcela, max, na.rm = TRUE)["PP_19"] & bd1$Parcela == "PP_19" & !is.na(bd1$Altura_m), ]
maxalt_3  <- bd1[bd1$Altura_m == tapply(bd1$Altura_m, bd1$Parcela, max, na.rm = TRUE)["PP_3"]  & bd1$Parcela == "PP_3"  & !is.na(bd1$Altura_m), ]

# min alt
minalt_1  <- bd1[bd1$Altura_m == tapply(bd1$Altura_m, bd1$Parcela, min, na.rm = TRUE)["PP_1"]  & bd1$Parcela == "PP_1"  & !is.na(bd1$Altura_m), ]
minalt_12 <- bd1[bd1$Altura_m == tapply(bd1$Altura_m, bd1$Parcela, min, na.rm = TRUE)["PP_12"] & bd1$Parcela == "PP_12" & !is.na(bd1$Altura_m), ]
minalt_19 <- bd1[bd1$Altura_m == tapply(bd1$Altura_m, bd1$Parcela, min, na.rm = TRUE)["PP_19"] & bd1$Parcela == "PP_19" & !is.na(bd1$Altura_m), ]
minalt_3  <- bd1[bd1$Altura_m == tapply(bd1$Altura_m, bd1$Parcela, min, na.rm = TRUE)["PP_3"]  & bd1$Parcela == "PP_3"  & !is.na(bd1$Altura_m), ]

# tablas
max_dap <- rbind(maxdap_1, maxdap_12, maxdap_19, maxdap_3)
max_dap$Observación <- "DAP maximo"
min_dap <- rbind(mindap_1, mindap_12, mindap_19, mindap_3)
min_dap$Observación <- "DAP minimo"
max_alt <- rbind(maxalt_1, maxalt_12, maxalt_19, maxalt_3)
max_alt$Observación <- "Alt maxima"
min_alt <- rbind(minalt_1, minalt_12, minalt_19, minalt_3)
min_alt$Observación <- "Alt minima"

tabla_final <- rbind(max_alt, min_alt, max_dap, min_dap)
tabla_final
       ID Parcela Placa         Familia                       Especie CAP_cm
76     76    PP_1    76           Indet                                 99.6
1325 1325   PP_12    64        Peraceae                  Pera arborea   81.7
1886 1886   PP_19   151 Melastomataceae            Conostegia montana   70.1
603   603    PP_3    89        Fagaceae            Quercus humboldtii  133.9
268   268    PP_1   268       Rubiaceae         Guettarda crispiflora   41.1
1390 1390   PP_12   129        Fabaceae               Abarema jupunba   40.8
1830 1830   PP_19    95  Calophyllaceae       Calophyllum brasiliense   37.9
689   689    PP_3   175     Burseraceae          Dacryodes belemensis   34.7
233   233    PP_1   233           Indet                                160.2
1316 1316   PP_12    55        Fabaceae Enterolobium contortisiliquum  212.4
1812 1812   PP_19    77    Symplocaceae                                166.6
730   730    PP_3   216        Fagaceae            Quercus humboldtii  141.3
42     42    PP_1    42   Euphorbiaceae          Alchornea glandulosa   30.7
105   105    PP_1   105   Anacardiaceae             Mauria ferruginea   30.2
152   152    PP_1   152        Fabaceae                                 29.9
166   166    PP_1   166   Actinidiaceae            Saurauia laevigata   30.4
183   183    PP_1   183        Fabaceae          Macrolobium pittieri   29.4
202   202    PP_1   202     Cyatheaceae           Cyathea corallifera   31.4
235   235    PP_1   235      Salicaceae         Hasseltia lateriflora   30.6
244   244    PP_1   244        Fabaceae                                 30.7
269   269    PP_1   269   Actinidiaceae                                 31.3
1286 1286   PP_12    25      Sapotaceae       Chrysophyllum argenteum   31.1
1288 1288   PP_12    27           Indet                                 30.9
1313 1313   PP_12    52           Indet                                 30.8
1353 1353   PP_12    92       Arecaceae              Oenocarpus minor   31.1
1371 1371   PP_12   110        Moraceae        Pseudolmedia laevigata   31.1
1972 1972   PP_19   237  Chloranthaceae       Hedyosmum cuatrecazanum   30.9
690   690    PP_3   176   Myristicaceae                                 31.4
786   786    PP_3   272        Fagaceae            Quercus humboldtii   31.1
787   787    PP_3   273    Hypericaceae              Vismia baccifera   30.5
791   791    PP_3   277     Burseraceae                                 30.9
801   801    PP_3   287       Myrtaceae                                 30.2
     POM_cm Altura_m   DAP_cm        Genero     DAP_m Observación
76      230    25.97 31.70366               0.3170366  Alt maxima
1325    130    30.30 26.00592          Pera 0.2600592  Alt maxima
1886    130    40.38 22.31352    Conostegia 0.2231352  Alt maxima
603     130    26.40 42.62169       Quercus 0.4262169  Alt maxima
268     130     2.47 13.08254     Guettarda 0.1308254  Alt minima
1390    130     8.10 12.98704       Abarema 0.1298704  Alt minima
1830    130     6.22 12.06394   Calophyllum 0.1206394  Alt minima
689     130     7.60 11.04535     Dacryodes 0.1104535  Alt minima
233     130       NA 50.99324               0.5099324  DAP maximo
1316    206       NA 67.60902  Enterolobium 0.6760902  DAP maximo
1812    130       NA 53.03043               0.5303043  DAP maximo
730     130       NA 44.97719       Quercus 0.4497719  DAP maximo
42      130     8.37 10.00000     Alchornea 0.1000000  DAP minimo
105     130    11.20 10.00000        Mauria 0.1000000  DAP minimo
152     130       NA 10.00000               0.1000000  DAP minimo
166     130       NA 10.00000      Saurauia 0.1000000  DAP minimo
183     130       NA 10.00000   Macrolobium 0.1000000  DAP minimo
202     136    10.50 10.00000       Cyathea 0.1000000  DAP minimo
235     130       NA 10.00000     Hasseltia 0.1000000  DAP minimo
244     130       NA 10.00000               0.1000000  DAP minimo
269     150       NA 10.00000               0.1000000  DAP minimo
1286    130       NA 10.00000 Chrysophyllum 0.1000000  DAP minimo
1288    130       NA 10.00000               0.1000000  DAP minimo
1313    130       NA 10.00000               0.1000000  DAP minimo
1353    130       NA 10.00000    Oenocarpus 0.1000000  DAP minimo
1371    130       NA 10.00000  Pseudolmedia 0.1000000  DAP minimo
1972    130       NA 10.00000     Hedyosmum 0.1000000  DAP minimo
690     130    11.80 10.00000               0.1000000  DAP minimo
786     130       NA 10.00000       Quercus 0.1000000  DAP minimo
787     130    14.05 10.00000        Vismia 0.1000000  DAP minimo
791     130       NA 10.00000               0.1000000  DAP minimo
801     130       NA 10.00000               0.1000000  DAP minimo
write.csv2(tabla_final, "maximos y minimos, punto 4.csv")

5. Diameter Class Distribution and Structural Patterns

The horizontal structure of the selected forest plots was analyzed by generating diameter distributions. Trees were categorized into 5 cm class intervals, with a specific accumulated category for individuals with a DBH \(\ge\) 60 cm to account for large-diameter specimens. To allow for a synchronous structural comparison, a four-panel graphical layout was implemented. This visualization facilitated the identification of recruitment patterns and the overall successional stage of each stand based on the frequency of individuals across diameter classes.

dap_pp1  <- c(bd1$DAP_cm[bd1$Parcela == "PP_1"])
dap_pp12 <- c(bd1$DAP_cm[bd1$Parcela == "PP_12"])
dap_pp19 <- c(bd1$DAP_cm[bd1$Parcela == "PP_19"])
dap_pp3  <- c(bd1$DAP_cm[bd1$Parcela == "PP_3"])

grafica <- function(datos, nombre) {
  l  <- c(10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 90)
  mc <- c(12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, "60 <")
  x_mc   <- cut(datos, breaks = l, labels = mc, right = FALSE)
  marcas <- table(x_mc)
  barplot(marcas, ylim = c(0, 160),
          xlab = "DBH (m)", ylab = "Frecuency",
          family = "serif", font = 1, font.lab = 6,
          col = "#CAE1FF", las = 1, cex.names = 1)
  mtext(nombre, side = 3, adj = 0, line = 1.5, font = 7)
}

par(mfrow = c(2, 2))
grafica(dap_pp1,  "Plot 1")
grafica(dap_pp12, "Plot 12")
grafica(dap_pp19, "Plot 19")
grafica(dap_pp3,  "Plot 3")

6. Floristic Composition and Taxonomic Dominance

The floristic composition was evaluated by quantifying family-level abundance within each sampling unit. To identify the primary taxonomic drivers of each stand, the ten most abundant families per plot were isolated and ranked. These results were visualized using a multi-panel bar chart, providing a comparative overview of family dominance. This analysis allowed for the characterization of the botanical identity of the plots and the identification of common patterns in family distribution across the study area.

grafpp1  <- (sort(table(bd1$Familia[bd1$Parcela == "PP_1"  & !is.na(bd1$Familia)]), decreasing = T)[1:10])
grafpp12 <- (sort(table(bd1$Familia[bd1$Parcela == "PP_12" & !is.na(bd1$Familia)]), decreasing = T)[1:10])
grafpp19 <- (sort(table(bd1$Familia[bd1$Parcela == "PP_19" & !is.na(bd1$Familia)]), decreasing = T)[1:10])
grafpp3  <- (sort(table(bd1$Familia[bd1$Parcela == "PP_3"  & !is.na(bd1$Familia)]), decreasing = T)[1:10])

#| fig-width: 12
#| fig-height: 9

par(mfrow = c(2, 2))
par(mar = c(9, 4, 3, 1))

barplot(grafpp1,  col = "#CAE1FF", las = 2, family = "serif", font = 1, font.lab = 6, ylim = c(0, 50), ylab = "Frequency")
mtext("Plot 1",  side = 3, adj = 0, font = 7, line = 1.5)

barplot(grafpp12, col = "#CAE1FF", las = 2, family = "serif", font = 1, font.lab = 6, ylim = c(0, 50), ylab = "Frequency")
mtext("Plot 12", side = 3, adj = 0, font = 7, line = 1.5)

barplot(grafpp19, col = "#CAE1FF", las = 2, family = "serif", font = 1, font.lab = 6, ylim = c(0, 50), ylab = "Frequency")
mtext("Plot 19", side = 3, adj = 0, font = 7, line = 1.5)

barplot(grafpp3,  col = "#CAE1FF", las = 2, family = "serif", font = 1, font.lab = 6, ylim = c(0, 50), ylab = "Frequency")
mtext("Plot 3",  side = 3, adj = 0, font = 7, line = 1.5)