Introduction

This report explores the question: Does the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species?

For context, this report uses data published in 2021 by a group of researchers who collected measurements from maple trees at the Harvard Forest in central Massachusetts over several years, starting in 2011. The goal of their study was to investigate reproduction and sap flow in maple trees, and their data sets record a range of related measurements.

Of the 14 data sets from the group's research (available through the Environmental Data Initiative; see the citation below), this report uses two: hf285-01-maple-tap.csv (maple tap) and hf285-02-maple-sap.csv (maple sap). The maple tap data set contains 382 observations of 7 variables, of which we use only date, tree, species, and dbh (trunk diameter at breast height, in cm). The maple sap data set contains 9,022 observations of 8 variables, of which we use only date, tree, sugar, species, and sap.wt.

To carry out the analysis in RStudio, we use the tidyverse package for data manipulation and plotting, along with the DT package to display our data sets as interactive data tables. We also load the readr package, which provides read_csv() for importing the researchers' .csv files. Because readr is already attached by the tidyverse, loading it separately is optional.

Citation: Rapp, J., E. Crone, and K. Stinson. 2023. Maple Reproduction and Sap Flow at Harvard Forest since 2011 ver 6. Environmental Data Initiative. https://doi.org/10.6073/pasta/7c2ddd7b75680980d84478011c5fbba9 (Accessed 2025-12-08).

library(tidyverse)
library(DT)
library(readr)

Maple Tap Data

To start this report, we must import the maple tap data set and examine it to see what needs to be cleaned up.

maple_tap <- read_csv("hf285-01-maple-tap.csv")

datatable(maple_tap, options = list(scrollX = TRUE))

Cleaning the Maple Tap Data Set

As stated in the introduction, we use only the date, tree, species, and dbh variables for this analysis. The location at which each tree was tapped is not relevant to the research question. Instead, we focus on trunk diameter and how sugar maple and red maple trees compare over time. To prepare the data set, the date variable is first separated into year, month, and day columns, and the species codes are replaced with the species' common names. The dbh variable is renamed to diameter, and tree is renamed to tree_id for clarity. Finally, only the needed columns are selected, and any rows with NA in the diameter variable are removed.

Note: The species ACSA refers to Sugar Maple trees and ACRU refers to Red Maple trees.

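maple_tap_clean <- maple_tap %>% 
  # Split the date into its year, month, and day parts
  separate(date, into = c("year", "month", "day"), sep = "-") %>% 
  # Convert the date parts to numbers and replace species codes with common names
  mutate(year = as.numeric(year), 
         month = as.numeric(month), 
         day = as.numeric(day),
         species = case_when(
           species == "ACSA" ~ "Sugar Maple",
           species == "ACRU" ~ "Red Maple",
           TRUE ~ species)
         ) %>% 
  # Use clearer column names
  rename("diameter" = "dbh", "tree_id" = "tree") %>% 
  # Keep only the columns used in the analysis and drop rows with no diameter
  select(year, tree_id, species, diameter) %>% 
  drop_na(diameter)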

datatable(maple_tap_clean, options = list(scrollX = TRUE))

Diameter Analysis

Diameter Over Time Boxplot

Now that we have a clean maple tap data set, we can make a boxplot comparing trunk diameter over time between sugar maple and red maple trees. To clearly distinguish the two species within each year, we wrap the year variable in factor(). This converts year from a continuous variable to a categorical one, which places the boxplots for each species side by side within each year and makes comparisons easier. We also use geom_jitter() to show approximately where the individual points fall within each year. Unlike geom_point(), it spreads the points horizontally so that they overlap less.

ggplot(maple_tap_clean, aes(x = factor(year), y = diameter, color = species)) +
  geom_boxplot() +
  geom_jitter(width = .2, alpha = .35) +
  labs(
    title = "Comparison of Trunk Diameter Over Time Between Species",
    x = "Year",
    y = "Trunk Diameter (in cm)",
    color = "Species",
    caption = "Data Adapted from hf285-01-maple-tap.csv,\n obtained from [https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hfr&identifier=285&revision=6]"
  )
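As a small illustration of what wrapping year in factor() changes, the sketch below uses a made-up vector of years (placeholder values, not taken from the maple tap data) and shows that factor() turns numeric years into a fixed set of discrete levels, one per distinct year.

```r
# Hypothetical years for illustration only; not values from the maple tap data
year <- c(2012, 2016, 2016, 2017, 2017, 2017)

class(year)               # numeric, which ggplot2 treats as a continuous axis

year_f <- factor(year)    # one discrete level per distinct year
class(year_f)
levels(year_f)            # levels are sorted, so the boxes appear in year order
table(year_f)             # number of observations that fall in each level
```

Each level becomes one position on the x-axis, which is what allows a separate box per species to be drawn within every year.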

In 2016 and 2017, the sugar maple trees consistently exhibit higher median diameters (approximately 65 cm) than the red maple trees (just above 40 cm). The sugar maple trees also show more variability in diameter, as seen in their longer whiskers. We disregarded 2012 in this comparison because there are no diameter data for red maple trees in that year, so the two species cannot be compared. However, this boxplot alone is not sufficient evidence to answer whether red maple trees exhibit muted dynamics compared to sugar maple trees.
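The medians read off the boxplot can also be tabulated directly. The following is a minimal sketch that assumes a data frame shaped like maple_tap_clean. The tap_example data frame and its diameters are made up for illustration, so the numbers it prints are not results from the study.

```r
library(dplyr)

# Hypothetical stand-in for maple_tap_clean; the diameters are made-up values
tap_example <- data.frame(
  year     = c(2012, 2012, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017),
  species  = c("Sugar Maple", "Sugar Maple",
               "Red Maple", "Red Maple", "Sugar Maple", "Sugar Maple",
               "Red Maple", "Red Maple", "Sugar Maple", "Sugar Maple"),
  diameter = c(60, 70, 40, 44, 62, 68, 41, 45, 63, 69)
)

# One row per year and species: sample size, median, and interquartile range
diameter_summary <- tap_example %>%
  group_by(year, species) %>%
  summarise(
    n_obs           = n(),
    median_diameter = median(diameter),
    iqr_diameter    = IQR(diameter),
    .groups = "drop"
  )

diameter_summary
```

A year in which one species was not measured simply has no row for that species, which is how a gap like the one in 2012 would show up in such a table.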

Diameter Linear Model

Another way to assess whether red maple trees exhibit muted dynamics compared to sugar maple trees is a linear model. In this model, species is the predictor variable and diameter is the response variable.

diameter_model <- lm(diameter ~ species, data = maple_tap_clean)

Before the linear model is interpreted, we must evaluate whether it is appropriate for the data. To test this, a new data set was created containing the model's residuals, which were then plotted to check that they are scattered around the zero reference line and do not display any patterns. The modelr package was used to extract the predicted values and residuals from the model in order to create this new data set.

library(modelr)
maple_tap_w_resids <- maple_tap_clean %>% 
  add_predictions(diameter_model) %>% 
  add_residuals(diameter_model)

ggplot(maple_tap_w_resids, aes(x = species, y = resid, color = species)) +
  geom_jitter(width = .25, height = 0) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals by Species",
    x = "Species",
    y = "Residuals"
  )

Once again, geom_jitter() was used instead of geom_point() because it reduces overplotting when a categorical variable fixes the x-axis values. This residual plot shows no visible patterns for either species, which suggests that a linear model is appropriate for this data.
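One detail worth keeping in mind when reading a residual plot for a model with a single categorical predictor: the fitted value for every tree is simply its species mean, so the residuals within each species average to zero by construction. The informative part of the plot is therefore the spread and any outliers within each species. The sketch below illustrates this with made-up diameters (the toy data frame is hypothetical and is not the study data).

```r
# Hypothetical diameters (cm) for illustration only
toy <- data.frame(
  species  = c("Red Maple", "Red Maple", "Red Maple",
               "Sugar Maple", "Sugar Maple", "Sugar Maple", "Sugar Maple"),
  diameter = c(38, 45, 52, 55, 63, 70, 80)
)

fit <- lm(diameter ~ species, data = toy)
toy$resid <- resid(fit)

# Residual means are zero within each species by construction...
resid_mean <- tapply(toy$resid, toy$species, mean)

# ...so the spread within each species is what the plot can reveal
resid_sd <- tapply(toy$resid, toy$species, sd)

round(resid_mean, 10)
resid_sd
```

Comparing the two standard deviations gives a rough numeric counterpart to judging by eye whether one species is scattered more widely around the zero line than the other.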

summary(diameter_model)
## 
## Call:
## lm(formula = diameter ~ species, data = maple_tap_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.403  -8.648  -2.103  10.697  21.017 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          46.583      2.405  19.373  < 2e-16 ***
## speciesSugar Maple   20.120      2.673   7.528 9.19e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.78 on 124 degrees of freedom
## Multiple R-squared:  0.3137, Adjusted R-squared:  0.3081 
## F-statistic: 56.67 on 1 and 124 DF,  p-value: 9.191e-12

According to this summary, the average trunk diameter of red maple trees is 46.583 cm, while sugar maple trees have trunk diameters that are 20.12 cm larger on average. The p-value of the F-statistic is 9.19e-12, which is far below the 0.05 significance level. This means that species is a statistically significant predictor of trunk diameter, and the model equation is diameter = 46.583 + 20.12(Sugar Maple), where Sugar Maple equals 1 for sugar maple trees and 0 for red maple trees. We can conclude that sugar maple trees have significantly larger trunks than red maple trees, and that species explains 30.81% of the variability in trunk diameter (adjusted R-squared). However, this result is based on only one response variable from the many data sets in the researchers' original study, and therefore we cannot yet conclude that red maple trees exhibit muted dynamics compared to sugar maple trees.
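The model equation can be worked through by hand. The sketch below plugs the coefficients reported in the summary output into the equation, using a 0/1 indicator for species. Red Maple acts as the reference level because R orders the levels of a character predictor alphabetically. The variable names are illustrative only.

```r
# Coefficients as reported in the summary output
intercept   <- 46.583   # mean diameter of the reference level, Red Maple (cm)
sugar_shift <- 20.120   # estimated difference for Sugar Maple (cm)

# Indicator: 0 for Red Maple, 1 for Sugar Maple
species   <- c("Red Maple", "Sugar Maple")
indicator <- as.numeric(species == "Sugar Maple")

# diameter = intercept + shift * indicator
predicted <- intercept + sugar_shift * indicator
names(predicted) <- species
predicted
```

The prediction for red maple is the intercept itself, and the prediction for sugar maple is the intercept plus the species coefficient, 66.703 cm.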

Maple Sap Data

The second data set used in this report is the maple sap data set. It records sap production in the sap.wt variable and sugar content in the sugar variable.

maple_sap <- read_csv("hf285-02-maple-sap.csv")

datatable(maple_sap, options = list(scrollX = TRUE))

Cleaning the Maple Sap Data Set

The cleaning process is very similar to that of the maple tap data set. For this analysis, only the date, tree, species, sugar, and sap.wt variables are retained. The remaining variables, including time, datetime, and tap, are excluded because they are not relevant to evaluating whether red maple trees exhibit muted dynamics compared to sugar maple trees. The focus for this data set is to determine whether species is a significant predictor of sugar content and of sap weight, each in its own model.

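maple_sap_clean <- maple_sap %>% 
  # Split the date into its year, month, and day parts
  separate(date, into = c("year", "month", "day"), sep = "-") %>% 
  # Keep year as text (it is only used as a category) and recode species
  mutate(year = as.character(year),
         month = as.numeric(month), 
         day = as.numeric(day),
         species = case_when(
           species == "ACSA" ~ "Sugar Maple",
           species == "ACRU" ~ "Red Maple",
           TRUE ~ species)
         ) %>% 
  # Use clearer column names
  rename("tree_id" = "tree", "sap_wt" = "sap.wt") %>% 
  # Keep only the analysis columns and drop rows missing either response
  select(year, tree_id, species, sugar, sap_wt) %>% 
  drop_na(sugar, sap_wt)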

datatable(maple_sap_clean, options = list(scrollX = TRUE))

Sugar Analysis

Sugar Contents Over Time Boxplot

Similar to the maple tap data set, a boxplot will be used to compare sugar content over time between red maple and sugar maple trees.

ggplot(maple_sap_clean, aes(x = factor(year), y = sugar, color = species)) +
  geom_boxplot() +
  labs(
    title = "Comparison of Sugar Content Over Time Between Species",
    x = "Year",
    y = "Sugar Content",
    color = "Species",
    caption = "Data Adapted from hf285-02-maple-sap.csv,\n obtained from [https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hfr&identifier=285&revision=6]"
  )

Predictably, the sugar maple trees consistently exhibit much higher median sugar content than the red maple trees across the observed years. From 2015 to 2018, both species show similar fluctuations in their medians, though the gap between the medians stays roughly constant. Overall, the red maple trees display lower median sugar content and less variability each year, whereas the sugar maple trees show a much greater spread and higher upper limits, including many high-end outliers.

Sugar Linear Model

sugar_model <- lm(sugar ~ species, data = maple_sap_clean)

As with the previous model, the residuals must be tested for randomness to deem it appropriate to use a linear model for sugar contents.

maple_sap_w_resids <- maple_sap_clean %>% 
  add_predictions(sugar_model) %>% 
  add_residuals(sugar_model)

ggplot(maple_sap_w_resids, aes(x = species, y = resid, color = species)) +
  geom_jitter(width = .45, height = 0) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals by Species",
    x = "Species",
    y = "Residuals"
  )

From this residual plot, neither species displays any visible patterns and all the residuals are clustered near the zero reference line. This means a linear model would be appropriate to use for sugar contents.

summary(sugar_model)
## 
## Call:
## lm(formula = sugar ~ species, data = maple_sap_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7366 -0.4366 -0.0366  0.3634  4.7634 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.84293    0.02140   86.13   <2e-16 ***
## speciesSugar Maple  0.69367    0.02253   30.79   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5832 on 7573 degrees of freedom
## Multiple R-squared:  0.1113, Adjusted R-squared:  0.1111 
## F-statistic:   948 on 1 and 7573 DF,  p-value: < 2.2e-16

This summary shows that the average sugar content of red maple trees is 1.843, while the sugar content of sugar maple trees is 0.694 higher on average. The p-value of the F-statistic is less than 2.2e-16, which is far below the 0.05 significance level. This means that species is a statistically significant predictor of sugar content, and the model equation is sugar = 1.843 + 0.694(Sugar Maple). We can conclude that sugar maple trees have significantly higher sugar content than red maple trees, and that species explains 11.11% of the variability in sugar content (adjusted R-squared).
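The share of variability attributed to species comes from the R-squared lines of the summary output. As a minimal sketch of how those two figures are computed, the code below fits the same kind of model to made-up sugar readings (the toy data frame is hypothetical) and reproduces the multiple and adjusted R-squared by hand.

```r
# Hypothetical sugar readings for illustration only
toy <- data.frame(
  species = rep(c("Red Maple", "Sugar Maple"), each = 4),
  sugar   = c(1.6, 1.8, 1.9, 2.1, 2.2, 2.5, 2.6, 2.9)
)
fit <- lm(sugar ~ species, data = toy)

ss_resid <- sum(resid(fit)^2)                      # variation left within species
ss_total <- sum((toy$sugar - mean(toy$sugar))^2)   # total variation in sugar
n <- nrow(toy)                                     # number of observations
p <- 1                                             # one predictor: species

r2     <- 1 - ss_resid / ss_total                  # multiple R-squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)     # adjusted R-squared

c(r2 = r2, adj_r2 = adj_r2)

# The same two quantities as stored by summary()
c(summary(fit)$r.squared, summary(fit)$adj.r.squared)
```

The adjusted value is always slightly smaller than the multiple R-squared, and with thousands of observations and a single predictor, as in the sap data, the two are nearly identical.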

Sap Weight Analysis

Sap Weight Over Time Boxplot

For the final analysis, a boxplot will be used to compare sap weight over time between red maple and sugar maple trees.

ggplot(maple_sap_clean, aes(x = factor(year), y = sap_wt, color = species)) +
  geom_boxplot() +
  labs(
    title = "Comparison of Sap Weight Over Time Between Species",
    x = "Year",
    y = "Sap Weight (in kg)",
    color = "Species",
    caption = "Data Adapted from hf285-02-maple-sap.csv,\n obtained from [https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hfr&identifier=285&revision=6]"
  )

Consistent with the previous results, this boxplot shows that red maple trees have lower median sap weights and less variability than sugar maple trees. Although the two species share a similar pattern of sap weight fluctuations, the difference between them remains consistent over time, as in the previous analyses. These results suggest that red maple trees exhibit reduced sap production compared to sugar maple trees. However, a linear model is needed to determine whether this difference is statistically significant.

Sap Weight Linear Model

sap_wt_model <- lm(sap_wt ~ species, data = maple_sap_clean)

As with the previous models, the residuals must be checked for randomness before a linear model is deemed appropriate for sap weight.

maple_sap_w_resids <- maple_sap_clean %>% 
  add_predictions(sap_wt_model) %>% 
  add_residuals(sap_wt_model)

ggplot(maple_sap_w_resids, aes(x = species, y = resid, color = species)) +
  geom_jitter(width = .45, height = 0) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals by Species",
    x = "Species",
    y = "Residuals"
  )

Consistent with the other residual plots, this one does not show any noticeable patterns and the residuals are clustered near the zero reference line.

summary(sap_wt_model)
## 
## Call:
## lm(formula = sap_wt ~ species, data = maple_sap_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4227 -2.2895 -0.5727  1.6073 19.6073 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          2.3095     0.1111   20.78   <2e-16 ***
## speciesSugar Maple   2.1232     0.1170   18.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.029 on 7573 degrees of freedom
## Multiple R-squared:  0.04166,    Adjusted R-squared:  0.04154 
## F-statistic: 329.2 on 1 and 7573 DF,  p-value: < 2.2e-16

This summary shows that the average sap weight of red maple trees is 2.31 kg, while the sap weight of sugar maple trees is 2.12 kg higher on average. The p-value of the F-statistic is less than 2.2e-16, which is far below the 0.05 significance level. This means that species is a statistically significant predictor of sap weight, and the model equation is sap weight = 2.31 + 2.12(Sugar Maple). We can conclude that sugar maple trees produce significantly more sap than red maple trees, although species explains only 4.15% of the variability in sap weight (adjusted R-squared).
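The "< 2.2e-16" shown for the F-statistic in the summary output is an upper bound rather than an exact p-value. R prints any p-value smaller than the machine epsilon of double-precision numbers as "less than" that threshold. The short sketch below shows the threshold and how a hypothetical, very small p-value (tiny_p, chosen only for illustration) is formatted.

```r
# Smallest relative spacing of double-precision numbers on this machine
eps <- .Machine$double.eps
eps                      # about 2.22e-16

# A hypothetical p-value far below that threshold
tiny_p <- 1e-20
tiny_p < eps             # TRUE, so R reports a bound instead of the value

format.pval(tiny_p)      # formatted with a leading "<"
```

For this reason, the p-values of the sugar and sap weight models are best reported as "less than 2.2e-16" rather than as equal to it.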

Conclusion

In summary, the non-masting red maple trees exhibit muted dynamics compared to the masting sugar maple trees. Across the three analyses, red maple trees consistently showed significantly lower trunk diameter, sugar content, and sap weight. These findings support a pattern of reduced productive and structural dynamics in red maple trees compared to sugar maple trees.