knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning=FALSE)
This report seeks to answer the following question:
Does the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species?
I will be using two data sets, maple_sap and maple_tap, obtained from https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hfr&identifier=285&revision=6. They contain measurements on red maple and sugar maple trees. The maple_sap data set contains 9 variables and 9,022 entries. Of these variables, the relevant ones in this report are year (year of data collection, derived from the date column), species (type of tree: red maple or sugar maple), and sap.wt (weight of sap collected, in kilograms). The maple_tap data set contains 8 variables and 382 entries. Of these variables, the relevant ones are year (year of data collection, also derived from the date column), species (type of tree: red maple or sugar maple), and dbh (diameter of the tree, in centimeters, at 1.4 meters above ground).
The full citation for this data is: Rapp, J., E. Crone, and K. Stinson. 2023. Maple Reproduction and Sap Flow at Harvard Forest since 2011 ver 6. Environmental Data Initiative. https://doi.org/10.6073/pasta/7c2ddd7b75680980d84478011c5fbba9 (Accessed 2024-12-10).
Throughout this report, I will use the tidyverse and readr packages (readr is also attached as part of the tidyverse, so the second call is only for clarity).
library(tidyverse)
library(readr)
Before I start analyzing the data, I will load the two data files into R.
maple_sap <- read_csv("hf285-02-maple-sap.csv")
maple_tap <- read_csv("hf285-01-maple-tap.csv")
Printing each data set shows its first rows and column types:
maple_sap
maple_tap
The first objective I have is to visually compare the weights of the sap collected from the red maple trees and the sugar maple trees. To do so, I will create a box plot from the maple_sap data set. This will give me an idea of whether the data support or contradict the hypothesis that the non-masting red maple species exhibits muted dynamics compared to the masting sugar maple species.
Since I only need the year in which each observation was collected, and grouping by year makes the visualization easier to read, I first separate the original date column into a year column and a month/day column.
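# Split each date six characters from the end: year, then a "-MM-DD" remainder (assumes YYYY-MM-DD dates)
maple_sap <- maple_sap %>%
  separate(date, into = c("year", "month_day"), sep = -6)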
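Splitting on a fixed character offset works only while every date has the same width. As a hedged alternative, the year can be read from a parsed date directly. The sketch below uses a small made-up data frame; `toy_sap` and its values are illustrative assumptions, not rows from the Harvard Forest files.

```r
# Hypothetical stand-in for maple_sap; the values are invented.
toy_sap <- data.frame(
  date   = as.Date(c("2015-03-01", "2016-03-15", "2018-04-02")),
  sap.wt = c(1.2, 3.4, 2.8)
)

# format() with "%Y" returns the four-digit year as a character string,
# the same type that separate() would produce for the year column.
toy_sap$year <- format(toy_sap$date, "%Y")

print(toy_sap)
```

Unlike separate(), this keeps the original date column, which can be useful if month or day is needed later.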
Now I can create a box plot.
ggplot(data = maple_sap) +
  geom_boxplot(mapping = aes(x = year, y = sap.wt, color = species)) +
  labs(title = "Sap Weight Over Time", x = "Year", y = "Sap Weight (kg)")
In this plot, ACRU represents the red maple and ACSA represents the sugar maple. Visually, we can see that sap weight data for red maples exist only for 2015 through 2018; in those years, however, the red maple appears to yield less sap than the sugar maple.
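The observation about which years have red maple data can be checked with a count rather than read off the plot. This is a minimal sketch on invented data (`toy_sap` is hypothetical); on the real data the same two steps would be applied to maple_sap after the year column has been created.

```r
# Hypothetical data: one ACRU row has a missing sap weight.
toy_sap <- data.frame(
  year    = c("2014", "2015", "2015", "2016", "2016", "2019"),
  species = c("ACSA", "ACRU", "ACSA", "ACRU", "ACSA", "ACSA"),
  sap.wt  = c(4.1, 1.5, 3.9, NA, 4.4, 5.0)
)

# Keep only rows with a recorded sap weight, then cross-tabulate
# species (rows) against year (columns).
recorded <- toy_sap[!is.na(toy_sap$sap.wt), ]
counts <- table(recorded$species, recorded$year)
print(counts)

# Years in which red maple (ACRU) has at least one recorded weight.
acru_years <- colnames(counts)[counts["ACRU", ] > 0]
print(acru_years)
```

Filtering out missing weights first matters because a year can contain red maple rows that carry no usable measurement.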
The box plot alone cannot answer the original question of whether the non-masting red maple species exhibits muted dynamics compared to the masting sugar maple species. To test the significance of what I am seeing, I will fit a linear model.
sap_model <- lm(sap.wt ~ species, data = maple_sap)
summary(sap_model)
##
## Call:
## lm(formula = sap.wt ~ species, data = maple_sap)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.1475 -2.2575 -0.5975  1.6225 19.8825
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.89603    0.09755   19.44   <2e-16 ***
## speciesACSA  2.26149    0.10379   21.79   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.054 on 8389 degrees of freedom
## (631 observations deleted due to missingness)
## Multiple R-squared: 0.05356, Adjusted R-squared: 0.05345
## F-statistic: 474.7 on 1 and 8389 DF, p-value: < 2.2e-16
From this linear model, we learn several things. The intercept, about 1.896, is the estimated mean sap weight in kilograms for the red maple (ACRU), which is the baseline level of species. Its p-value is below 2e-16, far under the 5% cutoff, so it is significantly different from zero. The speciesACSA coefficient shows that the sugar maple is on average about 2.261 kg greater than the red maple in sap weight. Its p-value is also below 2e-16, so this difference is significant as well.
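To make the reading of the two coefficients concrete: with a single two-level predictor, the intercept is the mean of the baseline group and the second coefficient is the difference between the group means. The sketch below shows this on invented numbers (`toy` is a hypothetical data frame, not the real sap data).

```r
# Hypothetical data: three observations per species.
toy <- data.frame(
  species = c("ACRU", "ACRU", "ACRU", "ACSA", "ACSA", "ACSA"),
  sap.wt  = c(1.0, 2.0, 3.0, 3.0, 4.0, 5.0)
)

fit <- lm(sap.wt ~ species, data = toy)
group_means <- tapply(toy$sap.wt, toy$species, mean)

# Intercept: mean of the baseline level (ACRU comes first alphabetically).
intercept <- unname(coef(fit)["(Intercept)"])

# speciesACSA: ACSA mean minus ACRU mean.
difference <- unname(coef(fit)["speciesACSA"])

print(group_means)
print(c(intercept = intercept, difference = difference))
```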
The residual standard error for this model is 3.054 kg on 8,389 degrees of freedom, which is not too alarming in the context of this data. The R-squared value, however, is only about 0.054, meaning that species explains roughly 5% of the variation in sap weight. Species alone is therefore not a reliable predictor of the sap weight collected from an individual tree. If this value were closer to one, the model would explain more of the variation and could be seen as more reliable.
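For reference, R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. The following sketch recomputes it by hand on invented data (`toy` is hypothetical) and compares it with the value that summary() reports.

```r
# Hypothetical data: three observations per species.
toy <- data.frame(
  species = c("ACRU", "ACRU", "ACRU", "ACSA", "ACSA", "ACSA"),
  sap.wt  = c(1.0, 2.0, 3.0, 3.0, 4.0, 5.0)
)

fit <- lm(sap.wt ~ species, data = toy)

# Variation left over after fitting the model.
ss_res <- sum(residuals(fit)^2)

# Total variation around the overall mean.
ss_tot <- sum((toy$sap.wt - mean(toy$sap.wt))^2)

r2_manual  <- 1 - ss_res / ss_tot
r2_summary <- summary(fit)$r.squared

print(c(manual = r2_manual, summary = r2_summary))
```

A small R-squared and a small p-value can occur together: the p-value concerns whether the group means differ, while R-squared concerns how much of the tree-to-tree spread that difference accounts for.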
Overall, from this linear model, I can conclude that although the sap weight of the sugar maples is significantly greater on average than that of the red maples, species explains so little of the variation that this model alone is not strong enough to confirm the hypothesis.
DBH refers to the diameter (in centimeters) at breast height for a tree. I would like to compare the DBH of the red maple and the sugar maple to see whether there is a significant difference in size between the two species. To do so, I will create a box plot from the maple_tap data set.
As with the sap data, I only need the year in which each observation was collected, so I first separate the original date column into a year column and a month/day column.
maple_tap <- maple_tap %>%
  separate(date, into = c("year", "month_day"), sep = -6)
Now I can create a box plot.
ggplot(maple_tap, aes(x = species, y = dbh, fill = species)) +
  geom_boxplot() +
  labs(title = "DBH Comparison by Species", x = "Species", y = "DBH (cm)")
In this plot, ACRU represents the red maple and ACSA represents the sugar maple. Visually, there seems to be a large difference in DBH between the red maple and the sugar maple. This seems to support the idea that the non-masting red maple species exhibits muted dynamics compared to the masting sugar maple species.
As with the sap weight, I cannot draw a conclusion from this box plot alone. I will fit another linear model to test whether the difference seen in the plot is significant.
model_dbh <- lm(dbh ~ species, data = maple_tap)
summary(model_dbh)
##
## Call:
## lm(formula = dbh ~ species, data = maple_tap)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.403  -8.648  -2.103  10.697  21.017
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   46.583      2.405  19.373  < 2e-16 ***
## speciesACSA   20.120      2.673   7.528 9.19e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.78 on 124 degrees of freedom
## (256 observations deleted due to missingness)
## Multiple R-squared: 0.3137, Adjusted R-squared: 0.3081
## F-statistic: 56.67 on 1 and 124 DF, p-value: 9.191e-12
From this linear model, we see that the intercept, about 46.583, is the estimated mean DBH in centimeters for the red maple (ACRU), the baseline level of species. Its p-value is below 2e-16, far under the 5% cutoff, so it is significantly different from zero. The speciesACSA coefficient shows that the sugar maple is on average about 20.12 cm greater than the red maple in DBH. Its p-value is 9.19e-12, which also meets the 5% cutoff, so this difference is significant as well.
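As a cross-check on the species coefficient: for a two-level predictor, the t statistic from lm() matches a two-sample t-test that assumes equal variances, up to sign. The sketch below demonstrates this on invented DBH values (`toy_tap` is hypothetical, not the real tap data).

```r
# Hypothetical data: four trees per species, DBH in cm.
toy_tap <- data.frame(
  species = c("ACRU", "ACRU", "ACRU", "ACRU", "ACSA", "ACSA", "ACSA", "ACSA"),
  dbh     = c(40, 45, 50, 47, 60, 66, 70, 64)
)

fit <- lm(dbh ~ species, data = toy_tap)
coefs <- summary(fit)$coefficients
t_lm <- coefs["speciesACSA", "t value"]
p_lm <- coefs["speciesACSA", "Pr(>|t|)"]

# Pooled-variance t-test; it reports ACRU minus ACSA, so the sign is flipped.
tt <- t.test(dbh ~ species, data = toy_tap, var.equal = TRUE)
t_pooled <- unname(tt$statistic)
p_pooled <- tt$p.value

print(c(lm = t_lm, t_test = t_pooled))
print(c(lm = p_lm, t_test = p_pooled))
```

Dropping var.equal = TRUE gives Welch's test, which does not assume equal variances and will generally differ slightly from the lm() result.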
The residual standard error for this model is 11.78 cm on 124 degrees of freedom, which is not terrible in the context of this data. Note that 256 of the 382 rows were dropped because of missing values, so the model is fit to only 126 observations. As with the sap weight model, the R-squared value is fairly small: 0.3137 (0.3081 adjusted), meaning species explains about 31% of the variation in DBH. If this value were closer to one, the model could be seen as more reliable, but because it is on the lower end of the scale, species alone is not a reliable enough predictor of the DBH of each tree.
As with the last model, this linear model allows me to conclude that although the DBH of the sugar maples is significantly greater on average than that of the red maples, species explains too little of the variation for this model alone to confirm the hypothesis.
The answer to our original question, whether the non-masting red maple species exhibits muted dynamics compared to the masting sugar maple species, is no for now. Although the mean sap weight and DBH of the sugar maple were significantly greater than those of the red maple, the linear models explained too little of the variation to show that species itself is the direct cause of the underperformance of the red maple compared to the sugar maple.
With more research and testing, however, I could identify other variables that, along with species, contribute to the sap weight, DBH, and other measures of performance of the red maples and sugar maples. Examining how variables such as sugar or tap bearing act alongside species could yield a stronger linear model, one that might change the answer to the question to yes.
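A minimal sketch of what adding a second predictor could look like is given below. The data are invented, and the column name `sugar` is an assumption about how a sugar measurement might be stored; the real column names and units should be checked against the data set before adapting this.

```r
# Hypothetical data: sugar values and sap weights are invented.
toy <- data.frame(
  species = c("ACRU", "ACRU", "ACRU", "ACRU", "ACSA", "ACSA", "ACSA", "ACSA"),
  sugar   = c(1.8, 2.0, 2.1, 2.4, 2.2, 2.6, 2.9, 3.1),
  sap.wt  = c(1.1, 1.9, 2.2, 2.5, 3.0, 4.2, 4.4, 5.1)
)

# Species-only model, as in the report, and a model that adds sugar.
fit_species <- lm(sap.wt ~ species, data = toy)
fit_both    <- lm(sap.wt ~ species + sugar, data = toy)

r2_species <- summary(fit_species)$r.squared
r2_both    <- summary(fit_both)$r.squared
print(c(species_only = r2_species, species_and_sugar = r2_both))

# F-test for whether the added predictor improves the fit.
comparison <- anova(fit_species, fit_both)
print(comparison)
```

Because R-squared never decreases when a predictor is added, the adjusted R-squared or the anova() comparison is the fairer basis for judging whether the larger model is actually better.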
Further factors worth examining, to strengthen the argument that the sugar maple does or does not outperform the red maple, are the flower production and the branch data for each species. The more variables tested and found significant, the stronger the results and the argument. Until the tests are more conclusive and I gather more evidence, I have to answer no to the research question.