Introduction:

This report will answer the following question:

Does the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species?

We will use several data tables from a 13-year study at Harvard Forest on the reproduction and sap flow of maple trees. The study began in 2011 and originally focused only on the masting sugar maple; observation of the non-masting red maple did not begin until 2015, with the aim of answering the same question this report addresses. We use the most recent version of the data and restrict the analysis to 2015 onward, when both species were being observed.

A non-masting species, such as the red maple, produces and releases seeds continuously, whereas a masting species, such as the sugar maple, produces one large seed crop every few years. In this report, the dynamics of a tree are interpreted as its signs of life, which include, but are not limited to, sap production, seed production, flower production, and trunk diameter growth. We investigate sap production and trunk diameter using the data tables hf285_01_maple_tap and hf285_02_maple_sap. The data tables on seed and flower production are not used because they do not cover both maple species.

The variables relevant to this report are dbh (trunk diameter at breast height, 1.4 m), species (species of tree), date (date of data collection), and sap_wt (weight of sap collected, in kg). The full data sets can be viewed below:

Tap Data Set:

Sap Data Set:

Throughout, the tidyverse package is used to clean, analyze, and visualize the data, and the DT package is used to display the resulting data tables.

library(tidyverse)
library(DT)

Citation:

Rapp, J., E. Crone, and K. Stinson. 2025. Maple Reproduction and Sap Flow at Harvard Forest since 2011 ver 7. Environmental Data Initiative. https://doi.org/10.6073/pasta/2ad66d124f043c7326ca2b607447c518 (Accessed 2025-12-09).

Trunk Diameter Over Time

We will start by analyzing the difference in trunk diameter between the species in each year. Before the analysis can begin, we must first check the original data set for entry errors or outliers.

summary(hf285_01_maple_tap)
##       date                tree               tap              species         
##  Min.   :2012-02-14   Length:382         Length:382         Length:382        
##  1st Qu.:2014-02-20   Class :character   Class :character   Class :character  
##  Median :2017-02-18   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :2017-04-19                                                           
##  3rd Qu.:2020-02-02                                                           
##  Max.   :2022-02-12                                                           
##                                                                               
##       dbh         tap_bearing      tap_height   
##  Min.   :31.00   Min.   :  0.0   Min.   : 50.0  
##  1st Qu.:54.70   1st Qu.: 76.0   1st Qu.: 99.0  
##  Median :63.85   Median :180.0   Median :123.0  
##  Mean   :62.87   Mean   :173.8   Mean   :120.6  
##  3rd Qu.:74.88   3rd Qu.:261.0   3rd Qu.:142.0  
##  Max.   :86.20   Max.   :359.0   Max.   :188.0  
##  NA's   :256     NA's   :7       NA's   :7

From the summary of the original data set, there are no obvious entry errors in the numerical variables, although dbh is missing for 256 of the 382 rows. We can now focus on cleaning up the data set.
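Since the summary reports 256 missing dbh values, it can help to see which years and species those gaps fall in before cleaning. The sketch below is only an illustration: `tap_example` is a small made-up stand-in for hf285_01_maple_tap that reuses the column names date, species, and dbh, and all of its values are invented.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for hf285_01_maple_tap (values are invented)
tap_example <- tibble(
  date    = as.Date(c("2015-02-10", "2015-02-10", "2016-02-12",
                      "2018-02-15", "2018-02-15")),
  species = c("ACSA", "ACRU", "ACSA", "ACSA", "ACRU"),
  dbh     = c(60.2, 41.5, 61.0, NA, NA)
)

# Count rows and missing diameters for each year and species
dbh_missing <- tap_example %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%
  group_by(year, species) %>%
  summarise(n_rows    = n(),
            n_missing = sum(is.na(dbh)),
            .groups   = "drop")

print(dbh_missing)
```

Applied to the real table, a count like this would show whether the missing diameters are spread evenly or concentrated in particular years.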

maple_tap <- hf285_01_maple_tap %>% 
  # Give the relevant columns readable names
  rename("Species" = `species`, "Date" = `date`, "Tree Number" = `tree`,
         "Tap" = `tap`, "Diameter" = `dbh`) %>% 
  select(-`tap_bearing`, -`tap_height`) %>% 
  # Replace the species codes with common names
  mutate(`Species` = str_replace(`Species`, "ACSA", "Sugar Maple")) %>% 
  mutate(`Species` = str_replace(`Species`, "ACRU", "Red Maple")) %>% 
  # Fill missing taps with "Unknown" and missing diameters with the overall mean
  mutate(`Tap`= replace_na(`Tap`, "Unknown")) %>% 
  mutate(`Diameter` = replace_na(`Diameter`, mean(`Diameter`, na.rm = TRUE))) %>% 
  # Keep only the year of each date (Year is stored as text)
  separate(`Date`, into = c("Year", "Month", "Day")) %>% 
  select(-`Month`, -`Day`) %>% 
  filter(`Year` >= 2015)
# The DataTables option for horizontal scrolling is spelled scrollX
datatable(maple_tap, options=list(scrollX=TRUE))

The data table above gives a more focused view of the relevant variables, which allows us to analyze the data more effectively. Although we could manipulate this table further to determine which species has the larger trunk diameter, and therefore exhibits stronger dynamics, a visualization is more effective and displays the data more thoroughly.

ggplot(maple_tap)+
  geom_boxplot(mapping = aes(x = Year, y = Diameter, color = Species))+
  labs(y = "Diameter (cm)",
       title = "Trunk Diameter at 1.4 Meters (cm) Over Time")

No trunk diameters appear to have been collected from 2018 to 2022, so the boxes for those years show only the average diameter that was used to replace the missing values. We can now refine the visualization to present only the years in which diameters were actually collected.
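The effect of mean replacement on years with no measurements can be illustrated with a small sketch. The tibble `diam_example` and its values are invented for illustration; only the column names Year and Diameter follow the cleaned table above. The `drop_na()` step at the end is shown as one possible alternative, not as the method used in this report.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical diameters: 2018 has no measurements at all (values are invented)
diam_example <- tibble(
  Year     = c("2016", "2016", "2017", "2017", "2018", "2018"),
  Diameter = c(50, 70, 52, 72, NA, NA)
)

# Mean replacement, as in the cleaning step: every NA becomes the overall mean
imputed <- diam_example %>%
  mutate(Diameter = replace_na(Diameter, mean(Diameter, na.rm = TRUE)))

# Spread of diameters per year; a fully imputed year has zero spread
spread_by_year <- imputed %>%
  group_by(Year) %>%
  summarise(sd_diameter = sd(Diameter), .groups = "drop")

# Possible alternative: keep only the rows that were actually measured
measured_only <- diam_example %>%
  drop_na(Diameter)

print(spread_by_year)
```

A year whose values are all imputed collapses to a single flat line in a boxplot, which is why such years carry no information about either species.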

# Keep only the years in which diameters were actually measured
maple_tap_2 <- maple_tap %>% 
  filter(`Year` <=2017)
# The DataTables option for horizontal scrolling is spelled scrollX
datatable(maple_tap_2, options=list(scrollX=TRUE))
ggplot(maple_tap_2)+
  geom_boxplot(mapping = aes(x = Year, y = Diameter, color = Species))+
  labs(y = "Diameter (cm)",
       title = "Trunk Diameter at 1.4 Meters (cm) Over Time")

The plot above now contains only the relevant data for both tree species and suggests that the sugar maple trees show greater diameter growth over time than the red maples, meaning that, in this area of observation, the red maples exhibit muted dynamics. However, this claim should be treated with caution, because it is difficult to base a conclusion on only two data points. The result would be more reliable if more data points were available for the red maple in later years.
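One way to make the caveat about sample size concrete is to tabulate how many distinct trees and what median diameter stand behind each box in the plot. The sketch below assumes a table shaped like maple_tap_2; `diam_example`, its tree labels, and its values are all invented for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for maple_tap_2 (tree labels and values are invented)
diam_example <- tibble(
  Species       = c(rep("Sugar Maple", 6), rep("Red Maple", 4)),
  `Tree Number` = c("S1", "S2", "S3", "S1", "S2", "S3",
                    "R1", "R2", "R1", "R2"),
  Year          = c(rep("2015", 3), rep("2017", 3),
                    rep("2015", 2), rep("2017", 2)),
  Diameter      = c(60, 64, 70, 62, 66, 72, 40, 44, 41, 45)
)

# Number of distinct trees and median diameter per species and year
diam_summary <- diam_example %>%
  group_by(Species, Year) %>%
  summarise(n_trees         = n_distinct(`Tree Number`),
            median_diameter = median(Diameter),
            .groups         = "drop")

print(diam_summary)
```

A small `n_trees` for one species is a signal that differences between the boxes should be read cautiously.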

Collected Sap Weight Over Time

To draw a more stable and trustworthy conclusion, we will analyze the sap weight collected each year to further determine whether the red maple exhibits muted dynamics. Before we can begin analyzing the data, we must first check the original data set for entry errors.

summary(hf285_02_maple_sap)
##       date                tree               tap           
##  Min.   :2012-02-14   Length:9022        Length:9022       
##  1st Qu.:2014-04-06   Class :character   Class :character  
##  Median :2017-02-21   Mode  :character   Mode  :character  
##  Mean   :2016-11-12                                        
##  3rd Qu.:2019-03-14                                        
##  Max.   :2022-04-03                                        
##                                                            
##       time                    datetime                       sugar       
##  Min.   :02:53:00.000000   Min.   :2012-02-14 16:25:00   Min.   : 0.800  
##  1st Qu.:14:30:00.000000   1st Qu.:2012-03-12 13:37:00   1st Qu.: 2.000  
##  Median :16:01:00.000000   Median :2013-03-26 16:06:00   Median : 2.400  
##  Mean   :15:11:00.093385   Mean   :2016-01-17 07:37:08   Mean   : 2.473  
##  3rd Qu.:16:35:00.000000   3rd Qu.:2020-03-24 16:51:00   3rd Qu.: 2.900  
##  Max.   :19:58:00.000000   Max.   :2022-04-03 15:50:00   Max.   :22.000  
##  NA's   :7737              NA's   :7737                  NA's   :1009    
##    species              sap_wt      
##  Length:9022        Min.   : 0.010  
##  Class :character   1st Qu.: 1.460  
##  Mode  :character   Median : 3.180  
##                     Mean   : 3.893  
##                     3rd Qu.: 5.580  
##                     Max.   :24.040  
##                     NA's   :631

In the summary of the data set, a few variables’ maximum values raise concern. The variables in question are sugar and sap_wt. We will test if these values are due to an entry error using two different methods below.

We will first test how skewed the data for sugar is, to see if it is heavily skewed right or if it is more equally distributed.

ggplot(hf285_02_maple_sap)+
  geom_histogram(mapping = aes(x = sugar), bins = 100)

The visualization above shows that the data are heavily right-skewed, with a long tail that may indicate an entry error. To better determine whether this is an entry error, we will rearrange the data set to present only the sugar values in descending order.

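# Show the ten largest sugar values
maple_sap_sugar <- hf285_02_maple_sap %>% 
  select(sugar) %>% 
  arrange(desc(sugar)) %>% 
  head(10)
# The DataTables option for horizontal scrolling is spelled scrollX
datatable(maple_sap_sugar, options=list(scrollX=TRUE))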

In the table above, there is a large gap between the two largest values for sugar, suggesting that 22 is a data entry error; we assume it was meant to be entered as 2.2.
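The gap between the two largest values can also be quantified. The sketch below computes that gap and, as an additional check not used in this report, applies Tukey's 1.5 × IQR rule to flag unusually high values. The vector `sugar_example` is invented for illustration and is not taken from hf285_02_maple_sap.

```r
# Hypothetical sugar readings with one suspicious value (values are invented)
sugar_example <- c(1.8, 2.0, 2.2, 2.4, 2.6, 2.9, 3.1, 4.0, 22.0)

# Gap between the largest and second-largest values
top_two <- sort(sugar_example, decreasing = TRUE)[1:2]
gap <- top_two[1] - top_two[2]

# Tukey's rule: flag values above Q3 + 1.5 * IQR
q <- quantile(sugar_example, c(0.25, 0.75), names = FALSE)
upper_fence <- q[2] + 1.5 * (q[2] - q[1])
flagged <- sugar_example[sugar_example > upper_fence]

print(gap)
print(flagged)
```

A flagged value is only a candidate for an entry error; deciding what it was meant to be, such as 2.2 instead of 22, remains a judgment call.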

We will now test if the maximum sap_wt is an entry error through the same methods as the sugar data.

ggplot(hf285_02_maple_sap)+
  geom_histogram(mapping = aes(x = sap_wt), bins = 100)

The visualization above reveals that there is not a large gap between the large chunk of data and the outlying values. There are also multiple values that are out towards the right hand side, which could indicate the maximum value is not necessarily a data entry error. To conclude whether or not the values were incorrectly entered, we will rearrange the data similarly to the previous test.

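# Show the ten largest sap weights
maple_sap_weight <- hf285_02_maple_sap %>% 
  select(sap_wt) %>% 
  arrange(desc(sap_wt)) %>% 
  head(10)
# The DataTables option for horizontal scrolling is spelled scrollX
datatable(maple_sap_weight, options=list(scrollX=TRUE))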

Based on the data and visualization above, it can be determined that the maximum sap weight was not an entry error and was entered correctly.

We can now move on to cleaning up the sap data set. The variable sugar is not relevant to the question we are trying to answer, so the data entry error identified earlier does not need to be corrected for this new data set.

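maple_sap <- hf285_02_maple_sap %>% 
  # Drop the variables that are not needed for the comparison
  select(-time, -`datetime`, -sugar) %>% 
  # Replace the species codes with common names
  mutate(`species` = str_replace(`species`, "ACSA", "Sugar Maple")) %>% 
  mutate(`species` = str_replace(`species`, "ACRU", "Red Maple")) %>% 
  rename("sap weight" = `sap_wt`) %>% 
  # Fill missing sap weights with the overall mean
  mutate(`sap weight` = replace_na(`sap weight`, mean(`sap weight`, na.rm = TRUE))) %>% 
  # Keep only the year of each date (year is stored as text)
  separate(`date`, into = c("year", "Month", "Day")) %>% 
  select(-Month, -Day) %>% 
  filter(`year` >= 2015)
# The DataTables option for horizontal scrolling is spelled scrollX
datatable(maple_sap, options=list(scrollX=TRUE))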

The data table above now focuses on the relevant data and allows a visualization to be created to display the data so that it may be analysed.

ggplot(maple_sap)+
  geom_boxplot(mapping = aes(x = year, y = `sap weight`, color = species))+
  labs(x = "Year",
       y = "Sap Weight (kg)",
       title = "Sap Collected (kg) Over Time")

From the graph above, we can see that the study stopped collecting data for the red maple trees in 2019. To keep the data and visualization focused on comparing the two tree species, we will create another version of the data that focuses on the years where both tree species have data.

# Keep only the years in which both species have sap data
maple_sap_2 <- maple_sap %>% 
  filter(`year` <= 2018)
# The DataTables option for horizontal scrolling is spelled scrollX
datatable(maple_sap_2, options=list(scrollX=TRUE))
ggplot(maple_sap_2)+
  geom_boxplot(mapping = aes(x = year, y = `sap weight`, color = species))+
  labs(x = "Year",
       y = "Sap Weight (kg)",
       title = "Sap Collected (kg) Over Time")

The graph above displays the comparison between the two tree species more accurately and shows that the sugar maples have a higher average sap weight over time than the red maples. Given the tree diameters in the earlier graph, it makes sense that sap production would follow a similar trend. There is also a noticeable dip in sap production in 2016. No measured variable in the data sets explains this dip, so we assume it was caused by an external factor, such as a significant change in weather (for example, a drought). Overall, this visualization supports the earlier, less reliable claim that the red maples show muted dynamics.
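The comparison read from the boxplot can also be put into numbers by computing the mean sap weight per year and species and then the difference between the species. The sketch below is an illustration under stated assumptions: `sap_example` is a made-up stand-in for maple_sap_2, the column name `sap_weight` is hypothetical, and every value is invented.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for maple_sap_2 (values are invented)
sap_example <- tibble(
  year       = rep(c("2015", "2016"), each = 4),
  species    = rep(c("Sugar Maple", "Sugar Maple",
                     "Red Maple", "Red Maple"), times = 2),
  sap_weight = c(5.0, 6.0, 3.0, 4.0, 3.0, 4.0, 2.0, 2.5)
)

# Mean sap weight per year and species, then the species difference per year
sap_means <- sap_example %>%
  group_by(year, species) %>%
  summarise(mean_sap = mean(sap_weight), .groups = "drop") %>%
  pivot_wider(names_from = species, values_from = mean_sap) %>%
  mutate(difference = `Sugar Maple` - `Red Maple`)

print(sap_means)
```

A consistently positive `difference` across years would agree with the pattern seen in the plot, while a year-to-year drop in both columns would correspond to a dip like the one in 2016.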

Conclusion

From the relevant data extracted, visualized, and analysed, we conclude that the non-masting red maple exhibits muted dynamics compared to the masting sugar maple. Working from the original data, we identified which variables from which data tables could be cleaned and compared, namely trunk diameter and sap weight, to display the difference in dynamics between the two maple species over time, and both comparisons support this claim.