library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(modelr)
library(DT)

In this report we will be trying to figure out “does the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species?.” In order for us to figure out this question. We’ll have to analyse data such as if a tree is masting, or its growth, and how much it reproduces. Masting is when a tree produces a lot of seed. In order to do this we first have to look through 14 spreadsheets of data and figure out what we want to use. What I have come down to was 3 spreadsheets of data that will be useful to use for this experiment. The 3 sheets are named hf285-01-maple-tap.csv, hf285-02-maple-sap.csv and hf285-09-maple-seed-count.csv. The reason for narrowing down to these three data sets is that they are the only two that have both species of trees in them and the 9th one has the seed count for each year of the sugar maple. We got all this information from “https://doi.org/10.6073/pasta/7c2ddd7b75680980d84478011c5fbba9”. ACSA: “Acer saccharum” which is the Sugar Maple. ACRU: “Acer rubrum” which is the Red Maple. The ACRU does not Mast compare to its counterpart ACSA.

Data import

library(readr)
hf285_01_maple_tap <- read_csv("C:/Users/Jack/Downloads/hf285-01-maple-tap.csv")
## Rows: 382 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): tree, tap, species
## dbl  (3): dbh, tap.bearing, tap.height
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hf285_02_maple_sap <- read_csv("C:/Users/Jack/Downloads/hf285-02-maple-sap.csv")
## Rows: 9022 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): tree, tap, species
## dbl  (2): sugar, sap.wt
## dttm (1): datetime
## date (1): date
## time (1): time
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hf285_09_maple_seed_count <- read_csv("C:/Users/Jack/Downloads/hf285-09-maple-seed-count.csv")
## Rows: 220 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): tree, note
## dbl  (3): EC.count, JR.count, total.count
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(hf285_01_maple_tap)
View(hf285_02_maple_sap)
View(hf285_09_maple_seed_count)

Data cleaning

We first will clean out the data set hf285-09-maple-seed-count to Seed_Clean. After that we next take the data of the seeds produced each year and graph them in order to determine which years the ACSA trees were masting. For the years that the ACSA trees were masting were 2011, 2013, 2017, and 2019. The years that were Nonmasting were 2012, 2014, 2015, 2016, 2018, 2020, 2021.

Seed_Clean <- hf285_09_maple_seed_count %>%
  select(-note, -EC.count, -JR.count)%>%
  separate(date, into = c("Year", "Month", "Day"))%>%
  mutate(Masting_year = case_when(Year == 2011 | Year == 2013 | Year == 2017 | Year == 2019 ~ "Masting", Year == 2012 | Year == 2014 | Year == 2015 | Year == 2016 | Year == 2018 | Year == 2020 | Year == 2021 ~ "Nonmasting"))%>%
  mutate(Year = as.numeric(Year))
view(Seed_Clean)
summary(Seed_Clean)
##      tree                Year         Month               Day           
##  Length:220         Min.   :2011   Length:220         Length:220        
##  Class :character   1st Qu.:2013   Class :character   Class :character  
##  Mode  :character   Median :2016   Mode  :character   Mode  :character  
##                     Mean   :2016                                        
##                     3rd Qu.:2019                                        
##                     Max.   :2021                                        
##   total.count    Masting_year      
##  Min.   : 0.00   Length:220        
##  1st Qu.: 0.00   Class :character  
##  Median : 0.00   Mode  :character  
##  Mean   :16.74                     
##  3rd Qu.:35.25                     
##  Max.   :77.00
ggplot(data = Seed_Clean, mapping = aes(x = Year, y = total.count))+
  geom_boxplot()+
  labs( title = "Seed production", y = "Seed count")
## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?

Next we will clean out hf285-01-maple-tap then call it Maple_Clean, and hf285-02-maple-sap then call it Sugar_Clean. We will then remove all rows with “na” in them to clear out information that can skew the data. We will also separate the dates into Year, Month, and Day. Then convert Year to a numeric value so we can use it in our graphs. Then we’ll tag on the years that were and weren’t masting. After that we’ll get the average dbh(Diameter at breast height) per year from the Tap_Clean data set.

Tap_Clean <- hf285_01_maple_tap%>%
  drop_na()%>%
  mutate(tree = sub("^HFR", "AR", tree))%>%
  separate(date, into = c("Year", "Month", "Day"))%>%
    mutate(Masting_year = case_when(Year == 2011 | Year == 2013 | Year == 2017 | Year == 2019 ~ "Masting", Year == 2012 | Year == 2014 | Year == 2015 | Year == 2016 | Year == 2018 | Year == 2020 | Year == 2021 ~ "Nonmasting"))%>%
  mutate(Year = as.numeric(Year))

view(Tap_Clean)
summary(Tap_Clean)
##       Year         Month               Day                tree          
##  Min.   :2012   Length:119         Length:119         Length:119        
##  1st Qu.:2016   Class :character   Class :character   Class :character  
##  Median :2016   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :2016                                                           
##  3rd Qu.:2017                                                           
##  Max.   :2017                                                           
##      tap              species               dbh         tap.bearing   
##  Length:119         Length:119         Min.   :31.00   Min.   :  1.0  
##  Class :character   Class :character   1st Qu.:54.50   1st Qu.: 74.0  
##  Mode  :character   Mode  :character   Median :63.70   Median :182.0  
##                                        Mean   :62.68   Mean   :171.8  
##                                        3rd Qu.:75.45   3rd Qu.:253.0  
##                                        Max.   :86.20   Max.   :359.0  
##    tap.height  Masting_year      
##  Min.   : 56   Length:119        
##  1st Qu.: 96   Class :character  
##  Median :120   Mode  :character  
##  Mean   :120                     
##  3rd Qu.:146                     
##  Max.   :185
Sap_Clean <- hf285_02_maple_sap%>%
  select( -time, -datetime)%>%
  separate(date, into = c("Year", "Month", "Day"))%>%
    mutate(Masting_year = case_when(Year == 2011 | Year == 2013 | Year == 2017 | Year == 2019 ~ "Masting", Year == 2012 | Year == 2014 | Year == 2015 | Year == 2016 | Year == 2018 | Year == 2020 | Year == 2021 ~ "Nonmasting"))%>%
  drop_na()%>%
  mutate(Year = as.numeric(Year))
view(Sap_Clean)
summary(Sap_Clean)
##       Year         Month               Day                tree          
##  Min.   :2012   Length:7145        Length:7145        Length:7145       
##  1st Qu.:2014   Class :character   Class :character   Class :character  
##  Median :2016   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :2016                                                           
##  3rd Qu.:2018                                                           
##  Max.   :2021                                                           
##      tap                sugar        species              sap.wt      
##  Length:7145        Min.   :0.80   Length:7145        Min.   : 0.010  
##  Class :character   1st Qu.:2.00   Class :character   1st Qu.: 1.780  
##  Mode  :character   Median :2.40   Mode  :character   Median : 3.420  
##                     Mean   :2.48                      Mean   : 4.146  
##                     3rd Qu.:2.90                      3rd Qu.: 5.800  
##                     Max.   :7.30                      Max.   :24.040  
##  Masting_year      
##  Length:7145       
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

Exploratory plots

Tap_Whole_date <- Tap_Clean %>%
  mutate(
    Year = as.numeric(Year),
    Month = as.numeric(Month),
    Day = as.numeric(Day),
    Date = as.Date(sprintf("%04d-%02d-%02d", Year, Month, Day))
  )
Avg_dbh_year <- Tap_Whole_date %>%
  group_by(Year, species) %>%
  summarise(mean_dbh = mean(dbh, na.rm = TRUE))
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
view(Avg_dbh_year)
summary(Avg_dbh_year)
##       Year        species             mean_dbh    
##  Min.   :2012   Length:5           Min.   :46.17  
##  1st Qu.:2016   Class :character   1st Qu.:46.99  
##  Median :2016   Mode  :character   Median :65.27  
##  Mean   :2016                      Mean   :58.58  
##  3rd Qu.:2017                      3rd Qu.:66.98  
##  Max.   :2017                      Max.   :67.47

We will make a bar graph comparing the sugar production for both species and compare when it is a masting year or not.

ggplot(data = Sap_Clean, mapping = aes(x= species, y = sugar, fill = Masting_year))+
  geom_boxplot()+
  labs(title = "Sugar production")

For the result, the ACRU produces more sugar during non masting years, while the ACSA produces more sugar during the masting years.

We’ll then do the same thing for sap.wt (Sap weight) as well. Comparing sap weight to the species masting and non masting years.

ggplot(data = Sap_Clean, mapping = aes(x= species, y = sap.wt, fill = Masting_year))+
  geom_boxplot()+
  labs(title = "Sap production")

The results show that ACRU has more sap production in non masting years, while ACSA has more sap production in masting years.

This is a common trend for both. That during non masting years the ACRU produces both more sap and sugar, while the ACSA produces more sugar and sap during masting years.

Statistical tests — sap and sugar

We will then use linear regression to check these trends to see if they have any meaning.

Sugar_check <- lm(sugar ~ species, data = Sap_Clean)
summary(Sugar_check)
## 
## Call:
## lm(formula = sugar ~ species, data = Sap_Clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7539 -0.3539 -0.0539  0.3461  4.7461 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.84293    0.02136   86.29   <2e-16 ***
## speciesACSA  0.71099    0.02256   31.51   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5821 on 7143 degrees of freedom
## Multiple R-squared:  0.1221, Adjusted R-squared:  0.1219 
## F-statistic: 993.1 on 1 and 7143 DF,  p-value: < 2.2e-16
Sap_check <- lm(sap.wt ~ species, data = Sap_Clean)
summary(Sap_check)
## 
## Call:
## lm(formula = sap.wt ~ species, data = Sap_Clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3497 -2.2597 -0.5997  1.5803 19.6803 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.3095     0.1100   21.00   <2e-16 ***
## speciesACSA   2.0502     0.1162   17.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.997 on 7143 degrees of freedom
## Multiple R-squared:  0.04178,    Adjusted R-squared:  0.04164 
## F-statistic: 311.4 on 1 and 7143 DF,  p-value: < 2.2e-16

From the sugar we get a small p-value(2e-16) which tells us that the species plays a strong effect on the sugar production. The r^2 (0.1219) value is small which tells us that the model does not explain much variance in it. The same thing happens as well with the sap weight. Small p-value (2e-16) and r^2 (0.04164).

We then add the years to both models since the years may be able to tell us if there is more growth throughout the years that the tree lives.

Sugar_Check_m <- lm(sugar ~ species + Masting_year, data = Sap_Clean)
summary(Sugar_Check_m)
## 
## Call:
## lm(formula = sugar ~ species + Masting_year, data = Sap_Clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7390 -0.3798 -0.0798  0.3610  4.7202 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             1.87299    0.02385  78.539  < 2e-16 ***
## speciesACSA             0.70679    0.02260  31.274  < 2e-16 ***
## Masting_yearNonmasting -0.04075    0.01442  -2.827  0.00471 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5819 on 7142 degrees of freedom
## Multiple R-squared:  0.123,  Adjusted R-squared:  0.1228 
## F-statistic:   501 on 2 and 7142 DF,  p-value: < 2.2e-16
Sap_Check_m <- lm(sap.wt ~ species + Masting_year, data = Sap_Clean)
summary(Sap_Check_m)
## 
## Call:
## lm(formula = sap.wt ~ species + Masting_year, data = Sap_Clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3862 -2.2662 -0.5929  1.5738 19.7071 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             2.36357    0.12286  19.239   <2e-16 ***
## speciesACSA             2.04260    0.11643  17.544   <2e-16 ***
## Masting_yearNonmasting -0.07328    0.07427  -0.987    0.324    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.997 on 7142 degrees of freedom
## Multiple R-squared:  0.04191,    Adjusted R-squared:  0.04164 
## F-statistic: 156.2 on 2 and 7142 DF,  p-value: < 2.2e-16

In the sugar model the p-value stays the same for species and the p-value for years is low(2e-16) while the r^2 value hardly improves (0.1228) at all. In the Sap model the p-value stays the same, but the p-value for years is high at (0.324) which is above our 0.05 threshold which tells us that we should not use it in our model. the r^2 value barely improves as well(0.04164). The small r^2 values indicate these models explain only a small proportion of variance

DBH (diameter at breast height) analyses

We’ll next graph a box plot comparing the year to dbh, organising it by the species and if it is a masting year. Then we will make another graph that will compare the average dbh over the years. Separated by species. These graphs will help us see if dbh plays an effect on how many seeds are produced.

ggplot(data = Tap_Clean, mapping = aes(x= Year, y = dbh, fill = species, color = Masting_year))+
  geom_boxplot()+
  scale_color_manual(values = c(
    "Masting" = "yellow",
    "Nonmasting" = "purple"))+
  labs( title = "dbh in each year")

For the second graph the slopes of both lines tells us how much each tree grows on average. ACRU grows about 0.8166667 cm per year, while ACSA grows about 0.4353520 cm per year. This shows that ACRU grows much quicker than ACSA.

ggplot(Avg_dbh_year, aes(x = Year, y = mean_dbh, color = species)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Average dbh", title = "Avg dbh over the years")

slopes_dbh <- Avg_dbh_year %>%
  group_by(species) %>%
  do(model = lm(mean_dbh ~ Year, data = .)) %>%
  mutate(slope = coef(model)[["Year"]])

slopes_dbh %>% select(species, slope)
## # A tibble: 2 × 2
## # Rowwise: 
##   species slope
##   <chr>   <dbl>
## 1 ACRU    0.817
## 2 ACSA    0.435

We’ll then use linear regression to see if there are any effects that species has on dbh

dbh_sp <- lm(dbh ~ species, data = Tap_Clean)
summary(dbh_sp)
## 
## Call:
## lm(formula = dbh ~ species, data = Tap_Clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.453  -8.618  -2.353  10.682  21.017 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   46.583      2.432   19.15  < 2e-16 ***
## speciesACSA   20.169      2.722    7.41 2.15e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.91 on 117 degrees of freedom
## Multiple R-squared:  0.3194, Adjusted R-squared:  0.3136 
## F-statistic: 54.91 on 1 and 117 DF,  p-value: 2.146e-11

We see that the p-value is small and the r^2 value is relative close to the middle. So the model fits the variance well.

We’ll now add the masting_year variable to the model to see how it affects dbh.

The p-value for the masting year is high(0.632) so the variable does not show if there is significance that the masting year has an effect on the tree’s seed production.

dbh_sp_Myear <- lm(dbh ~ species + Masting_year, data = Tap_Clean)
summary(dbh_sp_Myear)
## 
## Call:
## lm(formula = dbh ~ species + Masting_year, data = Tap_Clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.044  -8.623  -2.244  10.377  21.156 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              47.123      2.686  17.545  < 2e-16 ***
## speciesACSA              20.300      2.744   7.397 2.37e-11 ***
## Masting_yearNonmasting   -1.079      2.245  -0.481    0.632    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.95 on 116 degrees of freedom
## Multiple R-squared:  0.3208, Adjusted R-squared:  0.309 
## F-statistic: 27.39 on 2 and 116 DF,  p-value: 1.81e-10

Conclusion:

In conclusion we can say that there is enough evidence that the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species. The growth of the dbh for ACRU is greater than ACSU. The evidence in the p-value for the models for sap and sugar growth during the masting years for the ACSA compared to the growth of the non masting seasons for the ACRU with all the graphs shows that the red maple ACRU exhibits mutated dynamics compared to its masting sugar counterpart.

Citations

Citation Rapp, J., E. Crone, and K. Stinson. 2023. Maple Reproduction and Sap Flow at Harvard Forest since 2011 ver 6. Environmental Data Initiative. https://doi.org/10.6073/pasta/7c2ddd7b75680980d84478011c5fbba9 (Accessed 2025-12-05).