Introduction

This report seeks to answer the following question:

Does the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species?

We will use a data set called maple, collected from maple trees at the Harvard Forest from 2011 to 2022 and stored in the Environmental Data Initiative’s portal. More information can be found here: https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hfr&identifier=285&revision=6. The data set contains 14 data tables covering many variables, giving us in-depth information on each tree. It originally covered only 20 masting sugar maples; the non-masting red maples were added in 2015 for comparison. Both species appear in the maple-tap and maple-sap tables. There seems to be no seed monitoring data on the red maples, despite the website’s claim, so we will stick to comparing the tap and sap data. Each table can be viewed below, maple-tap first, then maple-sap.

The variables most relevant to this study are the following: date and tree identify each observation, and species tells us whether the tree is a sugar maple (coded ACSA) or a red maple (coded ACRU). dbh is the tree’s diameter at 1.4 meters above the ground, sugar is the sugar concentration in degrees Brix (weight percent) of sap collected directly from the tap, and sap.wt is the weight of sap collected.

Throughout, we will need the functionality of the tidyverse package to assist in data cleaning and to create visualizations.

library(tidyverse)
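The code in this report assumes that two data frames, maple_tap and maple_sap, already exist in the session. A minimal sketch of one way to load them follows. The helper name read_maple and the file names are hypothetical, and base R's read.csv is used only to keep the sketch free of extra dependencies (readr::read_csv would be the tidyverse counterpart).

```r
# Hypothetical helper (not part of the original analysis): read one maple CSV
# and confirm that the identifying columns used in this report are present.
read_maple <- function(path) {
  dat <- read.csv(path, stringsAsFactors = FALSE)
  required <- c("date", "tree", "species")
  missing_cols <- setdiff(required, names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing expected columns: ", paste(missing_cols, collapse = ", "))
  }
  dat
}

# Assumed local file names; replace with the paths of your downloaded copies.
# maple_tap <- read_maple("maple-tap.csv")
# maple_sap <- read_maple("maple-sap.csv")
```

Failing early on a missing column makes it easier to notice if a different table from the 14 available was downloaded by mistake.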

Tree Diameter

Let’s first assess the tree diameter. Tree diameter was measured in 2012, 2016, and 2017. Because the red maples were only added to the study in 2015, we will only assess tree diameter in 2016 and 2017 to make the comparison as fair as possible.

maple_tap_diameters <- maple_tap %>% 
  filter(date == "2016-02-01" | date == "2017-02-18", tap == "A")
ggplot(maple_tap_diameters) +
  geom_boxplot(aes(x = species, y = dbh)) +
  coord_flip() +
  labs(x = "Maple Tree Species",
       y = "Tree Diameter at 1.4m (cm)",
       title = "Comparison of Tree Diameter by Species",
       caption = "data obtained from portal.edirepository.org")

It seems clear that the masting sugar maples are wider than the non-masting red maples. However, tree age is probably a better predictor of tree diameter than masting status, and we are not given any information on the age of these trees. A better comparison would be the change in each tree’s diameter between the two years. Let’s compare.

maple_diameter_diff <- maple_tap_diameters %>% 
  group_by(tree, species) %>% 
  summarize(`2016 measurement` = min(dbh, na.rm = TRUE),
            `2017 measurement` = max(dbh, na.rm = TRUE),
            difference = `2017 measurement` - `2016 measurement`)
ggplot(maple_diameter_diff) +
  geom_boxplot(aes(x = species, y = difference)) +
  coord_flip() +
  labs(x = "Maple Tree Species",
       y = "One-Year Difference in Tree Diameter at 1.4m (cm)",
       title = "Tree Diameter Growth over 1 Year",
       caption = "data obtained from portal.edirepository.org")
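Note that the summary above treats the smaller value as the 2016 measurement and the larger as the 2017 measurement, which assumes that no tree's recorded diameter decreased. As a hedged alternative, the sketch below pairs the two measurements explicitly by year. The toy values are invented for illustration and are not taken from the data set, and base R is used so the example runs on its own.

```r
# Toy measurements (invented for illustration, not taken from the data set).
toy_tap <- data.frame(
  tree    = c("T1", "T1", "T2", "T2"),
  species = c("ACSA", "ACSA", "ACRU", "ACRU"),
  date    = c("2016-02-01", "2017-02-18", "2016-02-01", "2017-02-18"),
  dbh     = c(40.0, 40.8, 30.2, 30.0)   # T2's recorded diameter shrinks slightly
)

# Extract the year, then split into one table per year.
toy_tap$year <- substr(toy_tap$date, 1, 4)
d2016 <- toy_tap[toy_tap$year == "2016", c("tree", "species", "dbh")]
d2017 <- toy_tap[toy_tap$year == "2017", c("tree", "dbh")]

# Join the two years by tree; the suffixes label each measurement's year.
paired <- merge(d2016, d2017, by = "tree", suffixes = c("_2016", "_2017"))
paired$difference <- paired$dbh_2017 - paired$dbh_2016
paired
```

With this pairing, a tree whose 2017 reading is lower than its 2016 reading gets a negative difference, whereas the min/max approach would report the same change as positive.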

Based on the boxplots alone, it appears that sugar maples indeed exhibited a higher growth rate than red maples. However, note the high outlier for the red maples. We can run a simple linear regression to see if the difference in growth rates is significant, and will exclude the outlier to do so.

# Drop row 24, which holds the high red maple outlier noted above
diffs_wo_outlier <- maple_diameter_diff[-24,]
growth_model <- lm(difference ~ species, data = diffs_wo_outlier)
summary(growth_model)
## 
## Call:
## lm(formula = difference ~ species, data = diffs_wo_outlier)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.71579 -0.34079 -0.01579  0.15439  1.48421 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.3556     0.1731   2.055   0.0501 .
## speciesACSA   0.3602     0.2101   1.715   0.0983 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5192 on 26 degrees of freedom
## Multiple R-squared:  0.1016, Adjusted R-squared:  0.06705 
## F-statistic:  2.94 on 1 and 26 DF,  p-value: 0.09828

The adjusted \(R^2\) of the model is about \(0.07\) (the multiple \(R^2\) is about \(0.10\)), meaning that at most about \(10\%\) of the variation in diameter growth is explained by species. In addition, the p-value for the species coefficient is \(0.098\), which is above our \(0.05\) cutoff. Therefore, there is not sufficient evidence to conclude that masting sugar maples experience more diameter growth than non-masting red maples.
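To make the interpretation of this kind of model concrete, here is a small sketch on invented toy data (not the maple measurements). It illustrates three general facts about a regression on a two-level factor: the species coefficient equals the difference between the two group means, its p-value matches a pooled-variance two-sample t-test, and both versions of \(R^2\) can be read off the model summary.

```r
# Toy growth values (invented for illustration only).
toy_growth <- data.frame(
  species    = rep(c("ACRU", "ACSA"), each = 5),
  difference = c(0.1, 0.3, 0.4, 0.2, 0.5, 0.6, 0.9, 0.4, 1.0, 0.7)
)

fit <- lm(difference ~ species, data = toy_growth)

# The species coefficient is the difference between the two group means.
group_means <- tapply(toy_growth$difference, toy_growth$species, mean)
coef(fit)[["speciesACSA"]]
group_means[["ACSA"]] - group_means[["ACRU"]]

# A pooled-variance two-sample t-test gives the same p-value as the lm.
p_lm <- summary(fit)$coefficients["speciesACSA", "Pr(>|t|)"]
p_t  <- t.test(difference ~ species, data = toy_growth, var.equal = TRUE)$p.value

# Multiple and adjusted R-squared can be read off the summary directly.
r2     <- summary(fit)$r.squared
adj_r2 <- summary(fit)$adj.r.squared
```

Reading the values from the summary object avoids confusing the multiple and adjusted \(R^2\) when reporting them.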

Sap Data

We can also compare sap data between sugar maples and red maples. Displayed below is a comparison of sugar concentration between species, followed by a comparison of sap weight.

ggplot(maple_sap) +
  geom_boxplot(aes(x = species, y = sugar)) +
  coord_flip() +
  labs(x = "Maple Tree Species",
       y = "Sugar Concentration (Brix)",
       title = "Comparison of Sugar Concentration by Species",
       caption = "data obtained from portal.edirepository.org")

ggplot(maple_sap) +
  geom_boxplot(aes(x = species, y = sap.wt)) +
  coord_flip() +
  labs(x = "Maple Tree Species",
       y = "Sap Weight (kg)",
       title = "Comparison of Sap Production by Species",
       caption = "data obtained from portal.edirepository.org")

It certainly appears that sugar maples have both a higher sugar concentration and more sap production than their non-masting congener. We can fit a simple regression model for each of these two variables to test for significance. Let’s check the sugar concentration first.

Sugar Concentration Model

sap_sugar_model <- lm(sugar ~ species, data = maple_sap)
summary(sap_sugar_model)
## 
## Call:
## lm(formula = sugar ~ species, data = maple_sap)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7419 -0.4419 -0.0419  0.3581 19.4581 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.83537    0.02222   82.62   <2e-16 ***
## speciesACSA  0.70651    0.02339   30.20   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6228 on 8011 degrees of freedom
##   (1009 observations deleted due to missingness)
## Multiple R-squared:  0.1022, Adjusted R-squared:  0.1021 
## F-statistic: 912.2 on 1 and 8011 DF,  p-value: < 2.2e-16

Our \(R^2\) value this time tells us that about \(10\%\) of the variation in sugar concentration is explained by species. The model therefore leaves most of the variation unexplained, although the species difference itself is highly significant: the estimated mean sugar concentration is about \(0.71\) Brix higher for sugar maples. Let’s check sap weight next.

Sap Weight Model

sap_weight_model <- lm(sap.wt ~ species, data = maple_sap)
summary(sap_weight_model)
## 
## Call:
## lm(formula = sap.wt ~ species, data = maple_sap)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1475 -2.2575 -0.5975  1.6225 19.8825 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.89603    0.09755   19.44   <2e-16 ***
## speciesACSA  2.26149    0.10379   21.79   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.054 on 8389 degrees of freedom
##   (631 observations deleted due to missingness)
## Multiple R-squared:  0.05356,    Adjusted R-squared:  0.05345 
## F-statistic: 474.7 on 1 and 8389 DF,  p-value: < 2.2e-16

The \(R^2\) value indicates that only about \(5\%\) of the variation in sap weight is explained by species, although the difference is again significant. It is likely that larger trees produce more sap than smaller trees, so it would be a good idea to control for tree diameter rather than compare sap production between species directly.

Controlling for Tree Size

We will add a column to the maple_sap data table using tree diameters gathered from maple_tap. We perform a long series of transformations to make the tree ID numbers consistent across both data sets.

maple_sap_2 <- maple_sap %>% 
  separate(date, into = c("year", "month", "date"), sep = "-") %>% 
  separate(tree, into = c("speciesID", "treeID"), sep = 2)
# Recode the "AR" prefix to "HFR" so tree IDs are consistent with maple_tap
maple_sap_2$speciesID[which(maple_sap_2$speciesID == "AR")] <- "HFR"
maple_sap_clean <- maple_sap_2 %>% 
  unite("tree", speciesID, treeID, sep = "")
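The same recoding can also be sketched without splitting and reuniting the ID column, by replacing a leading "AR" with a regular expression. The toy IDs below are invented for illustration, and the real ID format in the data set may differ.

```r
# Toy tree IDs (invented): some carry the "AR" prefix, others do not.
toy_ids <- c("AR1", "AR12", "HF3", "HFR7")

# Replace a leading "AR" with "HFR"; IDs that do not start with "AR" are untouched.
clean_ids <- sub("^AR", "HFR", toy_ids)
clean_ids
# → "HFR1" "HFR12" "HF3" "HFR7"
```

Anchoring the pattern with ^ ensures that only a prefix is rewritten, which mirrors splitting the ID after its first two characters.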

Now we can add the column.

avg_diameters <- maple_tap %>% 
  group_by(tree) %>% 
  separate(date, into = c("year", "month", "date"), sep = "-") %>% 
  filter(year == "2016" | year == "2017", tap == "A") %>% 
  summarize(mean_diameter = mean(dbh))
maple_full_data <- left_join(maple_sap_clean, avg_diameters, by = "tree")
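After a left join it is worth checking how many sap rows found no matching diameter, since those rows are silently dropped by lm. The sketch below shows one way to do that on invented toy tables that stand in for maple_sap_clean and avg_diameters; base R's merge with all.x = TRUE plays the role of left_join here.

```r
# Toy tables (invented) standing in for maple_sap_clean and avg_diameters.
toy_sap <- data.frame(tree = c("HFR1", "HFR1", "HFR2", "HF9"),
                      sap.wt = c(2.1, 3.4, 1.8, 4.0))
toy_diam <- data.frame(tree = c("HFR1", "HFR2"),
                       mean_diameter = c(35.2, 28.7))

# Left join: every sap row is kept; trees without a diameter get NA.
toy_full <- merge(toy_sap, toy_diam, by = "tree", all.x = TRUE)

# Trees that appear in the sap table but have no matching diameter.
unmatched <- setdiff(unique(toy_sap$tree), toy_diam$tree)
unmatched

# Number of sap rows left without a diameter after the join.
sum(is.na(toy_full$mean_diameter))
```

In the regression output that follows, the number of observations deleted due to missingness is slightly higher than in the single-predictor models, which is consistent with a few sap rows having no usable diameter.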

We will perform a multiple regression analysis for both sugar concentration and sap weight, controlling for tree diameter in both.

Sugar Concentration Model

multiple_sugar_model <- lm(sugar ~ species + mean_diameter, data = maple_full_data)
summary(multiple_sugar_model)
## 
## Call:
## lm(formula = sugar ~ species + mean_diameter, data = maple_full_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.5992 -0.3874 -0.0665  0.3006 19.6622 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.0995644  0.0347247   31.66   <2e-16 ***
## speciesACSA   0.3932534  0.0252928   15.55   <2e-16 ***
## mean_diameter 0.0156632  0.0005839   26.83   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.597 on 7992 degrees of freedom
##   (1027 observations deleted due to missingness)
## Multiple R-squared:  0.1766, Adjusted R-squared:  0.1763 
## F-statistic: 856.8 on 2 and 7992 DF,  p-value: < 2.2e-16

Our \(R^2\) value improves from \(0.10\) to \(0.18\) when controlling for diameter. This still isn’t great, but the model does indicate that sugar concentration is significantly associated with both species and tree diameter.

Sap Weight Model

multiple_weight_model <- lm(sap.wt ~ species + mean_diameter, data = maple_full_data)
summary(multiple_weight_model)
## 
## Call:
## lm(formula = sap.wt ~ species + mean_diameter, data = maple_full_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5383 -2.2602 -0.5788  1.6210 19.6610 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.901149   0.166912   5.399 6.89e-08 ***
## speciesACSA   1.828384   0.119207  15.338  < 2e-16 ***
## mean_diameter 0.021339   0.002909   7.335 2.42e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 8371 degrees of freedom
##   (648 observations deleted due to missingness)
## Multiple R-squared:  0.05965,    Adjusted R-squared:  0.05943 
## F-statistic: 265.5 on 2 and 8371 DF,  p-value: < 2.2e-16

Our \(R^2\) value barely improved (from about \(0.05\) to \(0.06\)), meaning the model still does not explain the data well, even though both species and tree diameter are again significantly associated with sap weight.
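One way to judge formally whether adding tree diameter improves a model is a partial F-test between the nested models. The sketch below uses simulated data (the coefficients and noise level are arbitrary assumptions, not estimates from the maple data) and is meant only to show the mechanics.

```r
set.seed(1)  # simulated data, for illustration only

n <- 200
toy <- data.frame(
  species       = rep(c("ACRU", "ACSA"), each = n / 2),
  mean_diameter = runif(n, min = 20, max = 80)
)
# Simulated response with a species effect, a diameter effect, and noise.
toy$sap.wt <- 1 + 2 * (toy$species == "ACSA") +
  0.02 * toy$mean_diameter + rnorm(n, sd = 3)

reduced <- lm(sap.wt ~ species, data = toy)
full    <- lm(sap.wt ~ species + mean_diameter, data = toy)

# Partial F-test: does adding mean_diameter reduce the residual variance?
comparison <- anova(reduced, full)
comparison

# Change in adjusted R-squared between the two models.
summary(full)$adj.r.squared - summary(reduced)$adj.r.squared
```

On the real data, both models would need to be fit to the same rows (for example, only rows with a non-missing mean_diameter), because anova refuses to compare models fit to data sets of different sizes.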

Conclusion

While the sap data do show that non-masting red maples exhibit muted dynamics in the Harvard Forest compared to the masting sugar maples, we did not find a model that fit the data well, and the difference in diameter growth was not statistically significant. The data set contained only limited information about the red maples, so it was difficult to draw a confident conclusion. We cannot conclude that the difference in dynamics was due to the difference in species, because there may have been confounding variables that we do not have access to.