This report will answer the following question:
Does the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species?
We will use multiple data tables from a 13-year study at Harvard Forest
on the reproduction of maple trees and the collection of their sap.
The study began in 2011 and originally focused only on the masting
sugar maple. Observation of the non-masting red maple did not begin
until 2015, with the aim of answering the same question this report
addresses. We use the most recent data provided by the study and
reference only the data collected from 2015 onward, when both species
were being observed. A non-masting species, such as the red maple,
produces and releases seeds continuously, whereas a masting species,
such as the sugar maple, has one large seed production event every few
years. The dynamics of a tree can be interpreted as its signs of life,
which include, but are not limited to, sap production, seed production,
flower production, and trunk diameter growth. The signs of life we
investigate are sap production and trunk diameter, using the data
tables hf285_01_maple_tap and hf285_02_maple_sap. Data tables on seed
or flower production are not used because those data do not cover both
species of maple tree.
The relevant variables to this report are dbh (trunk
diameter at breast height (1.4 m)), species (species of
tree), date (date of data collection), and
sap_wt (weight of sap collected (kg)). The full data sets
can be viewed below:
Tap Data Set:
Sap Data Set:
Throughout, the tidyverse package is used to create visualizations and analyze the data, and the DT package is used to display the data tables created.
library(tidyverse)
library(DT)
Citation:
Rapp, J., E. Crone, and K. Stinson. 2025. Maple Reproduction and Sap Flow at Harvard Forest since 2011 ver 7. Environmental Data Initiative. https://doi.org/10.6073/pasta/2ad66d124f043c7326ca2b607447c518 (Accessed 2025-12-09).
We will start by analyzing the difference in trunk diameter between the two species in each year. Before the analysis can begin, we must first check the original data set for entry errors or outliers.
summary(hf285_01_maple_tap)
## date tree tap species
## Min. :2012-02-14 Length:382 Length:382 Length:382
## 1st Qu.:2014-02-20 Class :character Class :character Class :character
## Median :2017-02-18 Mode :character Mode :character Mode :character
## Mean :2017-04-19
## 3rd Qu.:2020-02-02
## Max. :2022-02-12
##
## dbh tap_bearing tap_height
## Min. :31.00 Min. : 0.0 Min. : 50.0
## 1st Qu.:54.70 1st Qu.: 76.0 1st Qu.: 99.0
## Median :63.85 Median :180.0 Median :123.0
## Mean :62.87 Mean :173.8 Mean :120.6
## 3rd Qu.:74.88 3rd Qu.:261.0 3rd Qu.:142.0
## Max. :86.20 Max. :359.0 Max. :188.0
## NA's :256 NA's :7 NA's :7
From the summary of the original data set, we can determine that there are no obvious entry errors in the numerical variables and we can now focus on cleaning up the data set.
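The summary also reports 256 missing dbh values, so it can help to see which years they fall in before cleaning. The following is a minimal sketch of that check, not part of the original analysis: tap_toy is a hypothetical stand-in for hf285_01_maple_tap, and every value in it is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for hf285_01_maple_tap (values are made up).
tap_toy <- data.frame(
  date    = as.Date(c("2015-02-10", "2016-02-12", "2017-02-15",
                      "2018-02-11", "2019-02-09")),
  species = c("ACSA", "ACRU", "ACSA", "ACRU", "ACSA"),
  dbh     = c(60.2, 41.5, 61.0, NA, NA)
)

# Count total rows and missing dbh values for each year.
dbh_missing <- tap_toy %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%
  group_by(year) %>%
  summarise(n_rows = n(), n_missing = sum(is.na(dbh)), .groups = "drop")

dbh_missing
```

Running the same pipeline on the real table would show whether the missing diameters are spread across the study or concentrated in particular years.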
# Rename columns, relabel species codes, fill missing values, and keep 2015 onward
maple_tap <- hf285_01_maple_tap %>%
  rename("Species" = `species`) %>%
  rename("Date" = `date`) %>%
  rename("Tree Number" = `tree`) %>%
  rename("Tap" = `tap`) %>%
  rename("Diameter" = `dbh`) %>%
  select(-`tap_bearing`, -`tap_height`) %>%
  mutate(`Species` = str_replace(`Species`, "ACSA", "Sugar Maple")) %>%
  mutate(`Species` = str_replace(`Species`, "ACRU", "Red Maple")) %>%
  mutate(`Tap` = replace_na(`Tap`, "Unknown")) %>%
  # Missing diameters are replaced with the overall mean diameter
  mutate(`Diameter` = replace_na(`Diameter`, mean(`Diameter`, na.rm = TRUE))) %>%
  separate(`Date`, into = c("Year", "Month", "Day")) %>%
  select(-`Month`, -`Day`) %>%
  # Year is character after separate(); the comparison coerces 2015 to "2015"
  filter(`Year` >= 2015)
# DataTables option names are case-sensitive: scrollX, not scrollx
datatable(maple_tap, options = list(scrollX = TRUE))
The data table above provides a more focused view of the relevant variables, which will allow us to analyze the data more effectively. Although we could manipulate this data table further to determine which species has the larger trunk diameter, and therefore exhibits more dynamics, a visualization is more effective and displays the data more thoroughly.
ggplot(maple_tap) +
  geom_boxplot(mapping = aes(x = Year, y = Diameter, color = Species)) +
  labs(y = "Diameter (cm)",
       title = "Trunk Diameter at 1.4 Meters (cm) Over Time")
It appears that no trunk diameters were collected from 2018 to 2022, so the diameters shown for those years are the overall mean diameter that was used to replace the missing values during cleaning. We can now refine this visualization to present only the years that have recorded diameters.
# Keep only the years with recorded diameters
maple_tap_2 <- maple_tap %>%
  filter(`Year` <= 2017)
datatable(maple_tap_2, options = list(scrollX = TRUE))
ggplot(maple_tap_2) +
  geom_boxplot(mapping = aes(x = Year, y = Diameter, color = Species)) +
  labs(y = "Diameter (cm)",
       title = "Trunk Diameter at 1.4 Meters (cm) Over Time")
The plot above now contains only the relevant data for both tree species. It suggests that the sugar maples have a greater diameter growth over time than the red maples, meaning that, by this measure, the red maples exhibit muted dynamics. However, this claim is only weakly supported, because it is difficult to base a conclusion on only two data points. It would be more reliable if more red maple diameters were recorded in future years.
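How much data sits behind each box can be checked by counting records per species and year. The sketch below illustrates one way to do this under stated assumptions: diameter_toy is a hypothetical stand-in for maple_tap_2, and its values are made up and do not come from the study.

```r
library(dplyr)

# Hypothetical toy data standing in for maple_tap_2 (values are made up).
diameter_toy <- data.frame(
  Year     = c("2015", "2015", "2015", "2016", "2016", "2017"),
  Species  = c("Sugar Maple", "Sugar Maple", "Red Maple",
               "Sugar Maple", "Red Maple", "Sugar Maple"),
  Diameter = c(60.0, 64.0, 42.0, 61.0, 43.0, 62.0)
)

# Number of records and median diameter behind each box in the boxplot.
diameter_summary <- diameter_toy %>%
  group_by(Species, Year) %>%
  summarise(n_records = n(),
            median_diameter = median(Diameter),
            .groups = "drop")

diameter_summary
```

A small n_records value for a species and year signals that the corresponding box rests on very few measurements.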
To draw a more stable and trustworthy conclusion, we will analyze the sap weight collected each year as a second measure of whether the red maple exhibits muted dynamics. Before we can begin analyzing the data, we must first check the original data set for entry errors.
summary(hf285_02_maple_sap)
## date tree tap
## Min. :2012-02-14 Length:9022 Length:9022
## 1st Qu.:2014-04-06 Class :character Class :character
## Median :2017-02-21 Mode :character Mode :character
## Mean :2016-11-12
## 3rd Qu.:2019-03-14
## Max. :2022-04-03
##
## time datetime sugar
## Min. :02:53:00.000000 Min. :2012-02-14 16:25:00 Min. : 0.800
## 1st Qu.:14:30:00.000000 1st Qu.:2012-03-12 13:37:00 1st Qu.: 2.000
## Median :16:01:00.000000 Median :2013-03-26 16:06:00 Median : 2.400
## Mean :15:11:00.093385 Mean :2016-01-17 07:37:08 Mean : 2.473
## 3rd Qu.:16:35:00.000000 3rd Qu.:2020-03-24 16:51:00 3rd Qu.: 2.900
## Max. :19:58:00.000000 Max. :2022-04-03 15:50:00 Max. :22.000
## NA's :7737 NA's :7737 NA's :1009
## species sap_wt
## Length:9022 Min. : 0.010
## Class :character 1st Qu.: 1.460
## Mode :character Median : 3.180
## Mean : 3.893
## 3rd Qu.: 5.580
## Max. :24.040
## NA's :631
In the summary of the data set, a few variables’ maximum values raise
concern. The variables in question are sugar and
sap_wt. We will test if these values are due to an entry
error using two different methods below.
We will first test how skewed the data for sugar is, to
see if it is heavily skewed right or if it is more equally
distributed.
ggplot(hf285_02_maple_sap) +
  geom_histogram(mapping = aes(x = sugar), bins = 100)
The visualization above shows that the data are heavily right-skewed,
with a long tail that indicates a possible entry error. To better
determine whether this is an entry error, we will rearrange the data set
to present only the sugar data in descending order.
# Ten largest sugar values
maple_sap_sugar <- hf285_02_maple_sap %>%
  select(sugar) %>%
  arrange(desc(sugar)) %>%
  head(10)
datatable(maple_sap_sugar, options = list(scrollX = TRUE))
In the table above, there is a large gap between the two largest
values for sugar, which suggests that 22 is a data entry error
that we can assume was meant to be entered as 2.2.
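The gap between the largest values can also be checked numerically. The sketch below is an illustration rather than part of the report's method: flag_high_outliers() is a hypothetical helper, the multiplier k = 3 is an assumed rule of thumb (Tukey's "far out" fence), and sugar_toy holds made-up values. Flagged values are candidates for inspection, not proof of an entry error.

```r
# Hypothetical helper: flag values above Q3 + k * IQR.
# k = 3 is an assumed threshold, not one taken from the study.
flag_high_outliers <- function(x, k = 3) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  upper <- q[[2]] + k * (q[[2]] - q[[1]])
  # Missing values are never flagged.
  !is.na(x) & x > upper
}

# Made-up sugar readings with one suspiciously large value and one NA.
sugar_toy <- c(1.8, 2.0, 2.2, 2.4, 2.5, 2.9, 3.1, 22.0, NA)

flags <- flag_high_outliers(sugar_toy)
sugar_toy[flags]
```

On these toy values only the reading of 22 lies above the fence, which mirrors the gap seen in the sorted table.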
We will now test if the maximum sap_wt is an entry error
through the same methods as the sugar data.
ggplot(hf285_02_maple_sap) +
  geom_histogram(mapping = aes(x = sap_wt), bins = 100)
The visualization above reveals that there is no large gap between the bulk of the data and the outlying values. There are also multiple values toward the right-hand side, which suggests that the maximum value is not necessarily a data entry error. To conclude whether or not the values were incorrectly entered, we will rearrange the data as in the previous test.
# Ten largest sap weights
maple_sap_weight <- hf285_02_maple_sap %>%
  select(sap_wt) %>%
  arrange(desc(sap_wt)) %>%
  head(10)
datatable(maple_sap_weight, options = list(scrollX = TRUE))
Based on the data and visualization above, it can be determined that the maximum sap weight was not an entry error and was entered correctly.
We can now move on to cleaning up the sap data set. The variable
sugar is not relevant to the question we are trying to
answer, so the data entry error identified earlier does not
need to be corrected for this new data set.
# Drop unused columns, relabel species codes, fill missing weights, and keep 2015 onward
maple_sap <- hf285_02_maple_sap %>%
  select(-time, -`datetime`, -sugar) %>%
  mutate(`species` = str_replace(`species`, "ACSA", "Sugar Maple")) %>%
  mutate(`species` = str_replace(`species`, "ACRU", "Red Maple")) %>%
  rename("sap weight" = `sap_wt`) %>%
  # Missing sap weights are replaced with the overall mean sap weight
  mutate(`sap weight` = replace_na(`sap weight`, mean(`sap weight`, na.rm = TRUE))) %>%
  separate(`date`, into = c("year", "Month", "Day")) %>%
  select(-Month, -Day) %>%
  filter(`year` >= 2015)
datatable(maple_sap, options = list(scrollX = TRUE))
The data table above now focuses on the relevant data, so a visualization can be created to display the data for analysis.
ggplot(maple_sap) +
  geom_boxplot(mapping = aes(x = year, y = `sap weight`, color = species)) +
  labs(x = "Year",
       y = "Sap Weight (kg)",
       title = "Sap Collected (kg) Over Time")
From the graph above, we can see that the study stopped collecting data for the red maple trees in 2019. To keep the data and visualization focused on comparing the two tree species, we will create another version of the data that focuses on the years where both tree species have data.
# Keep only the years in which both species have sap data
maple_sap_2 <- maple_sap %>%
  filter(`year` <= 2018)
datatable(maple_sap_2, options = list(scrollX = TRUE))
ggplot(maple_sap_2) +
  geom_boxplot(mapping = aes(x = year, y = `sap weight`, color = species)) +
  labs(x = "Year",
       y = "Sap Weight (kg)",
       title = "Sap Collected (kg) Over Time")
The graph above more accurately displays the comparison between the two tree species and shows that the sugar maples have a higher average sap weight over time than the red maples. Given the tree diameters in the earlier graph, it makes sense that the sap would follow a similar trend. There is a noticeable dip in sap production in 2016. No measured variable in the data sets explains this dip, so it may have been caused by an external factor, such as a significant change in weather (for example, a drought). Overall, this visualization supports the earlier, less reliable claim that the red maples show muted dynamics.
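The comparison of average sap weight can also be put in numbers by computing the mean per species and year and taking the difference. The sketch below assumes a hypothetical data frame, sap_toy, shaped like maple_sap_2 but with a column named sap_weight. All of its values are made up and do not reflect the study's measurements.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for maple_sap_2 (values are made up).
sap_toy <- data.frame(
  year       = c("2015", "2015", "2015", "2015",
                 "2016", "2016", "2016", "2016"),
  species    = c("Sugar Maple", "Sugar Maple", "Red Maple", "Red Maple",
                 "Sugar Maple", "Sugar Maple", "Red Maple", "Red Maple"),
  sap_weight = c(4.0, 6.0, 2.0, 3.0, 3.0, 4.0, 1.0, 2.0)
)

# Mean sap weight per species and year, one column per species,
# plus the sugar maple minus red maple difference.
sap_wide <- sap_toy %>%
  group_by(year, species) %>%
  summarise(mean_sap = mean(sap_weight), .groups = "drop") %>%
  pivot_wider(names_from = species, values_from = mean_sap) %>%
  mutate(difference = `Sugar Maple` - `Red Maple`)

sap_wide
```

A positive difference in every year would be consistent with the pattern read from the boxplot.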
From the relevant data extracted, visualized, and analyzed, we can conclude that the non-masting red maple exhibits muted dynamics compared to the masting sugar maple. We were able to determine which variables from which of the original data tables could be manipulated to display the difference in dynamics between the two maple species over time. The sap weight data, and to a lesser extent the trunk diameter data, support this conclusion.