In this analysis, I explore data collected by Harvard students and professors on the maple tree population on campus. The research project “Maple Reproduction and Sap Flow” began at Harvard Forest in 2011 and continued for several years. The goal is to determine whether the non-masting red maple (ACRU) exhibits muted dynamics compared to the masting sugar maple (ACSA). In this report I load, clean, and visualize the data, and then perform statistical analyses to answer the research question.
The following libraries are loaded to help with data manipulation, visualization, and analysis.
library(dplyr)
library(ggplot2)
library(readr)
library(tidyr)
I read in the following data sets to answer the research question.
library(readxl)
maple_tap <- read_excel("C:/Users/maize/Downloads/hf285-01-maple-tap.xlsx")
maple_sap <- read_excel("C:/Users/maize/Downloads/hf285-02-maple-sap.xlsx")
maple_flower_qual <- read_excel("C:/Users/maize/Downloads/hf285-03-maple-flower-qual.xlsx")
In the following step I remove every row that contains a missing (NA) value from the three data sets, so that later plots and models are not interrupted by missing entries.
maple_sap <- maple_sap %>% drop_na()
maple_tap <- maple_tap %>% drop_na()
maple_flower_qual <- maple_flower_qual %>% drop_na()
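As a small illustration of what `drop_na()` does, the sketch below uses a made-up data frame (`toy_sap` and its columns are hypothetical, not the Harvard Forest data). It assumes only the `tidyr` package already loaded above. Called with no column names, `drop_na()` removes a row if any column is missing, which can discard more data than intended. Naming a column restricts the check to that column.

```r
library(tidyr)

# Hypothetical toy data, not the Harvard Forest measurements
toy_sap <- data.frame(
  species = c("ACSA", "ACSA", "ACRU", "ACRU"),
  sugar   = c(2.9, NA, 2.1, 2.3),
  notes   = c(NA, "cloudy", NA, "clear")
)

# Drops every row with an NA in any column, including `notes`
all_cols <- drop_na(toy_sap)

# Drops only the rows where `sugar` itself is missing
sugar_only <- drop_na(toy_sap, sugar)

nrow(all_cols)    # rows with no NA anywhere
nrow(sugar_only)  # rows with a recorded sugar value
```

Comparing the two row counts on the real data sets would show how many observations are lost only because an unrelated column is empty.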
In this visualization I compare the sap yield of ACSA and ACRU over time. Only one clear data point is visible for ACRU, so this comparison alone does not support a statistically sound conclusion about the research question.
ggplot(maple_tap, aes(x = date, y = bearing, color = species)) +
  geom_line() +
  labs(title = "Sap Yield Over Time",
       x = "Date",
       y = "Sap Yield",
       color = "Species") +
  theme_minimal()
Switching to the sap data set and plotting individual measurements shows that, on average, ACRU trees produce less sap by weight and sap with lower sugar content. Taken together, the last two graphs show that the ACRU population is much smaller than the ACSA population and that ACRU also scores lower in most categories of this data set.
ggplot(data = maple_sap) +
  geom_point(mapping = aes(x = sugar, y = tap, color = species)) +
  labs(title = "Sugar Production by Tap and Species",
       x = "Sugar Yield",
       y = "Tap",
       color = "Species") +
  theme_minimal()
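The "on average" comparison between the two species could also be checked numerically rather than read off a scatter plot. The sketch below is one way to do that with `dplyr`, using a made-up data frame. The name `toy_sap`, the column `sap_weight`, and all values are hypothetical stand-ins, not the actual columns or measurements in `maple_sap`.

```r
library(dplyr)

# Hypothetical toy data; `sap_weight` is an assumed column name
toy_sap <- data.frame(
  species    = c("ACSA", "ACSA", "ACSA", "ACRU", "ACRU"),
  sugar      = c(3.0, 2.8, 3.2, 2.0, 2.2),
  sap_weight = c(4.0, 5.0, 6.0, 1.0, 2.0)
)

# One row per species: observation count and group means
species_summary <- toy_sap %>%
  group_by(species) %>%
  summarise(
    n_obs           = n(),
    mean_sugar      = mean(sugar, na.rm = TRUE),
    mean_sap_weight = mean(sap_weight, na.rm = TRUE)
  )

print(species_summary)
```

The `n_obs` column also makes the difference in sample size between the species explicit, which matters when judging how much weight the ACRU averages can bear.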
Before drawing a conclusion from graphics alone, it is important to perform statistical testing to check that the data is a fair representation of the variables it covers. I must first ensure that the sugar variable I am modeling is stored as a numeric value instead of a character.
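As a brief aside on this conversion step, the sketch below shows how `as.numeric()` treats text that is not a number. The vector `sugar_text` is hypothetical. Whether the real `sugar` column contains entries like these is an assumption, although it would be one possible explanation for rows that `lm()` reports as deleted due to missingness even after `drop_na()` has been applied.

```r
# Hypothetical character vector standing in for a text-typed sugar column
sugar_text <- c("2.5", "3.1", "NA", "trace", "2.8")

# Strings that are not numbers become NA during coercion
# (R warns about this; the warning is suppressed here for a clean run)
sugar_num <- suppressWarnings(as.numeric(sugar_text))

# Count how many values could not be converted
n_failed <- sum(is.na(sugar_num))
n_failed
```

Checking this count right after the conversion shows how many observations the model will silently leave out.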
maple_sap <- maple_sap %>% mutate(sugar = as.numeric(sugar))
sap_model <- lm(sugar ~ species + date, data = maple_sap)
summary(sap_model)
##
## Call:
## lm(formula = sugar ~ species + date, data = maple_sap)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7866 -0.4028 -0.0699 0.3466 19.4806
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.614e+00 1.184e-01 22.074 < 2e-16 ***
## speciesACSA 7.058e-01 2.333e-02 30.254 < 2e-16 ***
## date -5.263e-10 7.859e-11 -6.696 2.28e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6211 on 8010 degrees of freedom
## (1009 observations deleted due to missingness)
## Multiple R-squared: 0.1072, Adjusted R-squared: 0.107
## F-statistic: 481 on 2 and 8010 DF, p-value: < 2.2e-16
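In this model the adjusted R-squared is low (0.107), meaning that species and date together explain only about 11% of the variation in sugar. The overall p-value (< 2.2e-16) nevertheless indicates that the predictors are statistically significant: ACSA sap is estimated to have a sugar value about 0.71 higher than ACRU sap, holding date constant. We should run the same check on the other data set.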
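The quantities discussed above can be pulled out of a fitted model directly instead of being read from the printed summary. The sketch below fits a model to simulated data. `toy`, the group means, and the noise level are all made up for illustration and are not estimates from the Harvard Forest data.

```r
# Simulated data: two species with different mean sugar values (hypothetical)
set.seed(1)
toy <- data.frame(species = rep(c("ACRU", "ACSA"), each = 20))
toy$sugar <- ifelse(toy$species == "ACSA", 3.0, 2.3) + rnorm(40, sd = 0.5)

toy_model   <- lm(sugar ~ species, data = toy)
toy_summary <- summary(toy_model)

# Estimated ACSA minus ACRU difference and its p-value
species_effect <- coef(toy_summary)["speciesACSA", "Estimate"]
species_p      <- coef(toy_summary)["speciesACSA", "Pr(>|t|)"]

# Share of variance explained, adjusted for the number of predictors
adj_r2 <- toy_summary$adj.r.squared

c(effect = species_effect, p_value = species_p, adj_r2 = adj_r2)
```

A small p-value and a low adjusted R-squared can occur together: the first says the species difference is unlikely to be zero, while the second says most of the tree-to-tree variation is still unexplained.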
maple_tap <- maple_tap %>% mutate(bearing = as.numeric(bearing))
tap_model <- lm(bearing ~ species, data = maple_tap)
summary(tap_model)
##
## Call:
## lm(formula = bearing ~ species, data = maple_tap)
##
## Residuals:
## Min 1Q Median 3Q Max
## -173.215 -97.215 6.444 87.785 184.785
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 179.56 17.64 10.177 <2e-16 ***
## speciesACSA -6.34 18.56 -0.342 0.733
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 105.9 on 373 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.0003128, Adjusted R-squared: -0.002367
## F-statistic: 0.1167 on 1 and 373 DF, p-value: 0.7328
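In this model species explains essentially none of the variation in bearing (R-squared of 0.0003, p-value of 0.733), so it is a much poorer fit than the sugar model. The sap data set therefore better represents the relationship between species and the sap produced by the tree. The residuals also imply that bearing ranges from roughly 0 to 360, which suggests it may record the compass direction of the tap rather than a yield, so this result should be read with caution.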
Based on the data analyzed, the difference in dynamics between the red maple and sugar maple species appears to explain why there is a smaller population of red maples in Harvard Forest. This analysis supports the hypothesis that the non-masting red maple shows muted dynamics compared to the masting sugar maple on Harvard's campus.