Introduction

This report seeks to answer the following question:

Does the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species?

Before we look at our data sets it is important to understand the key terms that will be used throughout the report. The first key term is muted dynamics. A tree species exhibits “muted dynamics” when its usual signs of health and vitality are somewhat weak. This could mean the tree doesn’t grow to its full height or doesn’t produce much sap. The second key term is masting. Masting in this context refers to a species of plant (e.g. sugar maple) that produces large quantities of seeds sporadically and synchronously. In contrast, non-masting refers to a species of plant (e.g. red maple) that produces seeds or reproductive output relatively consistently.

In this project we will be using a data set called maple_tap_and_sap_data which I created from two data sets called maple_tap and maple_sap. These two data sets can be obtained from [https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hfr&identifier=285&revision=6] which is a website that contains 14 data sets focused on the monitoring of tree characteristics for the red maple and sugar maple tree species.

I selected the maple_tap and the maple_sap data sets because they clearly stated the tree species, which I did not find in any of the other sets. The maple_tap_and_sap_data contains data about the red and sugar maple trees that were in the Harvard Forest from 2011 to 2022. The data set contains 7 variables in total, 5 of which will be the most important for our research. Four of these are tree_species, which is the species of the tree in question; trunk.diameter, which is the diameter of the trunk at 1.4 m above ground; sugar_concentration, which is the sugar concentration measured from the sap that was collected from the tap; and sap_weight, which is the weight of the sap collected. The fifth variable that may be important is tree_identity, which is the identification number of the tree. That leaves us with date and tap (A or B for trees with 2 taps). The full data set can be viewed below:

datatable(maple_tap_and_sap_data, options = list(scrollX = TRUE))

Throughout, we will need the functionality of the tidyverse package, mainly to create visualizations, as well as the DT package to help display our data tables. Finally, we will need the modelr package to help with our regression models.

library(tidyverse)
library(DT)
library(modelr)

Cleaning of the Data

When you look at the original data sets you can see that there is a lot more data than what we ended up with in the maple_tap_and_sap_data. I made a few changes to organize the data so that it would be useful in answering our question. First, I selected the columns that are important for answering the question at hand. Second, I renamed these columns so that each name gives a clearer meaning of what the column represents. Additionally, I filtered out some of the NAs where a lot of data was missing. Next, I fixed what I found to be an entry error in sugar_concentration, where there was a value of 22.0 which I decided to change to 2.2. Finally, I combined the two data sets through the columns date, tap, tree_species, and tree_identity.
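
The cleaning steps described above could be sketched roughly as follows. This is only an illustration on two tiny made-up tables: the raw column names (tree, species, dbh, sugar, sap_wt), the rule used to drop NAs, and the choice of full_join are all assumptions, not the exact code used to build maple_tap_and_sap_data.

```r
library(dplyr)

# Hypothetical stand-ins for the raw tap and sap tables; the real files
# have more columns and may use different names.
tap_raw <- data.frame(
  date = as.Date(c("2015-03-01", "2015-03-01")),
  tree = c("HF1", "HF2"),
  tap = c("A", "A"),
  species = c("ACSA", "ACRU"),
  dbh = c(65.2, 41.0)
)
sap_raw <- data.frame(
  date = as.Date(c("2015-03-01", "2015-03-01")),
  tree = c("HF1", "HF2"),
  tap = c("A", "A"),
  species = c("ACSA", "ACRU"),
  sugar = c(22.0, 1.8),
  sap_wt = c(4.1, 2.0)
)

# Select and rename the columns of interest in one step.
tap_clean <- tap_raw %>%
  select(date, tree_identity = tree, tap, tree_species = species,
         trunk.diameter = dbh)

sap_clean <- sap_raw %>%
  select(date, tree_identity = tree, tap, tree_species = species,
         sugar_concentration = sugar, sap_weight = sap_wt) %>%
  # Assumed NA rule: drop rows where both measurements are missing.
  filter(!(is.na(sugar_concentration) & is.na(sap_weight))) %>%
  # Treat the 22.0 reading as a misplaced decimal point.
  mutate(sugar_concentration = if_else(sugar_concentration == 22.0, 2.2,
                                       sugar_concentration))

# Combine on the four shared key columns (the join type is an assumption).
maple_demo <- full_join(sap_clean, tap_clean,
                        by = c("date", "tap", "tree_species", "tree_identity"))
maple_demo
```

Doing the renaming inside select() keeps the column choice and the new names in one place, which makes the key columns used in the join easy to check.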

Visual Approach to Comparing the Tree Species

When trying to answer the question: “Does the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species?” it is important to look at the differences between the two species. In this section we will be comparing the trunk.diameter, sap_weight, and the sugar_concentration of the two species from our data set maple_tap_and_sap_data. It is important to note that in this section the tree species are identified with a set of letters rather than by their actual names. Therefore, ACRU means the Red Maple and ACSA means the Sugar Maple.

Tree Species vs. Trunk Diameter

The first comparison we are going to look at is trunk.diameter in relation to tree_species. With some prior knowledge of trees we can suggest that the trunk diameters of the two species could be similar because they are both maples. To better determine this we can create a visual from our data set that shows whether the diameters are similar or different. I hypothesize that we will see a similarity between the diameters of the two species’ trunks. We can test this hypothesis by building a box plot that compares the trunk diameter and the tree species:

ggplot(data = maple_tap_and_sap_data) +
  geom_boxplot(mapping = aes(x = tree_species, y = trunk.diameter, color = tree_species)) +
  labs(x = "Tree Species (ACRU = Red Maple; ACSA = Sugar Maple)",
       y = "Trunk Diameter (cm)",
       color = "Tree Species",
       title = "Tree Species vs. Trunk Diameter",
       caption = "Data obtained from https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hfr&identifier=285&revision=6")

After analyzing the box plot we can determine that my hypothesis was incorrect. We can see that the sugar maple typically has a thicker trunk than the red maple. The center line of each box marks the median, so the graph shows a median trunk diameter of around 42.5 cm for the red maple and around 66 cm for the sugar maple. This could be explained by a few different factors, such as growth rate differences or the lifespan of the tree, but it could also indicate that the sugar maple is healthier than the red maple. To better determine whether this is true we can look at the relationship between the tree species and a few other properties such as sap weight.

Tree Species vs. Sap Weight

The next comparison that we are going to look at is sap_weight in relation to tree_species. With some prior knowledge, we can expect a difference in sap weight between the two species, as different species generally produce different amounts of sap. To determine if this is true we can create a visualization that shows whether sap weight differs between the tree species. I hypothesize that the sugar maple will have a greater sap weight than the red maple because it produces sugar, which directly impacts the sap weight. We can test this hypothesis by building a box plot that compares the sap weight and the tree species:

ggplot(data = maple_tap_and_sap_data) +
  geom_boxplot(mapping = aes(x = tree_species, y = sap_weight, color = tree_species)) +
  labs(x = "Tree Species (ACRU = Red Maple; ACSA = Sugar Maple)",
       y = "Sap Weight (kg)",
       color = "Tree Species",
       title = "Tree Species vs. Sap Weight",
       caption = "Data obtained from https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hfr&identifier=285&revision=6")

From the box plot we can see that my hypothesis was in fact correct. The box plot shows that the sugar maple species has a greater sap weight than the red maple species. We can see from the graph that the sugar maple has a median sap weight of approximately 4 kg compared to the red maple, which has a median sap weight of approximately 2 kg. This could be explained by a few different characteristics, including the sugar concentration, which we will see in our next comparison, and the tree size, where we saw in the last comparison that the sugar maple was larger. To better determine if the sugar maple species is healthier than the red maple species we can look at the comparison between the tree species and the sugar concentration.

Tree Species vs. Sugar Concentration

The final comparison we will be looking at is the tree_species compared to the sugar_concentration. Based off the information we have discovered already we can guess that the sugar maple will have a higher sugar concentration simply because the sugar maple already has a greater average in diameter and sap weight. This isn’t something we should automatically assume but rather we should build a visualization for. Therefore, I hypothesize that the sugar maple’s sugar concentration will have a higher average than the red maple species because we have seen the sugar maple have higher averages in our last two comparisons. We can test this hypothesis by building a box plot that compares the sugar concentration and the tree species:

ggplot(data = maple_tap_and_sap_data) +
  geom_boxplot(mapping = aes(x = tree_species, y = sugar_concentration, color = tree_species)) +
  labs(x = "Tree Species (ACRU = Red Maple; ACSA = Sugar Maple)",
       y = "Sugar Concentration (Brix)",
       color = "Tree Species",
       title = "Tree Species vs. Sugar Concentration",
       caption = "Data obtained from https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hfr&identifier=285&revision=6")

From the box plot we can see that my hypothesis was correct. The box plot shows that the sugar maple species has a higher sugar concentration than the red maple species. We can see from the graph that the median sugar concentration for the sugar maple species is around 2.5 whereas the median sugar concentration for the red maple species is approximately 1.7. This could be explained by a few different factors, including the sap weight, which we saw was higher in the sugar maple species, and genetic differences that provide these species with different traits. After all of our comparisons we can infer that the sugar maple species is healthier than the red maple species. We can better see this by creating a data table that shows the comparisons of the averages for each species.

Averages Combined

Here we can see a data table that shows the averages of each variable we looked at for both tree species. This makes it easier for us to analyze the data, as the numbers are more precise than values read off our visuals and all the information is in one spot.

maple_tap_and_sap_avg <- maple_tap_and_sap_data %>%
  select(tree_species, trunk.diameter, sugar_concentration, sap_weight) %>%
  group_by(tree_species) %>%
  summarize("avg_trunk_diameter" = mean(trunk.diameter, na.rm = TRUE), "avg_sap_weight" = mean(sap_weight, na.rm = TRUE), "avg_sugar_concentration" = mean(sugar_concentration, na.rm = TRUE)) %>%
  mutate(across(c(avg_trunk_diameter, avg_sap_weight, avg_sugar_concentration), ~ round(., 2))) %>%
  mutate(tree_species = case_when(
    tree_species == "ACRU" ~ "Red Maple",
    tree_species == "ACSA" ~ "Sugar Maple",
    TRUE ~ tree_species
  ))

datatable(maple_tap_and_sap_avg)

This data table provides us with a simplified version of what was discussed and interpreted through our visualizations above. This data table also tells us that the sugar maple species is superior in all of the categories like we concluded from our visualizations. We could go from here and simply conclude that the sugar maple species is healthier and that the red maple species does in fact exhibit muted dynamics. But, there could be underlying factors that can’t be found simply by looking at the averages. Therefore, we can further examine the overarching question by looking at some of the regressions between the variables.

Regressional Approach of Comparing the Tree Species

While the visual approach gives us the basic conclusion that the sugar maple species is healthier than the red maple species, there could be some underlying factors that the visual approach doesn’t take into account. This is why we will also look at these variables through a regressional approach. I decided to look at each of the variables individually for a better comparison with the visual approach we used before. After we look at each of the variables individually we will look at a multiple linear model that combines all of these variables with the tree species.

It should be noted that when we are referring to the R^2 value we are talking about the adjusted R^2 value rather than the multiple R^2 value.
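
As a small worked example of how the two R^2 values are related, the sketch below applies the standard adjustment formula to the multiple R^2 printed for the trunk diameter model (0.3137 with 124 residual degrees of freedom and one predictor, so n = 126 rows were used). The helper adjusted_r2 is a made-up name for illustration.

```r
# Adjusted R^2 from the multiple R^2, the number of rows used (n),
# and the number of predictors (p).
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Values taken from the trunk diameter summary output:
# 124 residual df + 1 predictor + 1 intercept = 126 rows.
adj <- adjusted_r2(r2 = 0.3137, n = 126, p = 1)
round(adj, 3)
```

The result agrees with the printed adjusted value up to rounding. With only one predictor the adjustment is tiny, which is why the two R^2 values are so close in every simple model in this report.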

Trunk Diameter Regression

First we will look at a simple linear regression model for trunk diameter and how it relates to tree species.

trunk_diameter_model <- lm(trunk.diameter ~ tree_species, data = maple_tap_and_sap_data)

summary(trunk_diameter_model)
## 
## Call:
## lm(formula = trunk.diameter ~ tree_species, data = maple_tap_and_sap_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.403  -8.648  -2.103  10.697  21.017 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        46.583      2.405  19.373  < 2e-16 ***
## tree_speciesACSA   20.120      2.673   7.528 9.19e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.78 on 124 degrees of freedom
##   (7994 observations deleted due to missingness)
## Multiple R-squared:  0.3137, Adjusted R-squared:  0.3081 
## F-statistic: 56.67 on 1 and 124 DF,  p-value: 9.191e-12

Trunk Diameter Assessment

What is this model telling us? We can determine this by looking at each of the variables individually.

We can start by interpreting the coefficients and the p-value. The intercept is 46.583, which is the average trunk diameter (in cm) of the red maple, the reference species, and provides us with a baseline that we can use to interpret the coefficient for tree_species. The tree_species coefficient describes the sugar maple species. We can see that the coefficient is 20.120, which means that the sugar maple species has on average a 20 cm thicker trunk than the red maple species. Our null hypothesis is that tree_species has no effect on trunk.diameter. The p-value is 9.191e-12. Since the p-value is below the 0.05 cutoff, tree species is a statistically significant predictor of trunk diameter (tree species has a significant effect on trunk diameter). Therefore, we can reject the null hypothesis.
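
To make the reading of these two coefficients concrete, here is a minimal sketch on made-up numbers (not the Harvard Forest measurements). With a single two-level categorical predictor, lm() reports the mean of the reference level as the intercept and the difference between the two group means as the slope.

```r
# Made-up trunk diameters (cm), for illustration only.
toy <- data.frame(
  tree_species = c("ACRU", "ACRU", "ACRU", "ACSA", "ACSA", "ACSA"),
  trunk.diameter = c(40, 45, 50, 60, 65, 70)
)

toy_model <- lm(trunk.diameter ~ tree_species, data = toy)

# Per-species means computed directly from the data.
group_means <- tapply(toy$trunk.diameter, toy$tree_species, mean)

coef(toy_model)   # intercept (ACRU mean) and tree_speciesACSA (difference)
group_means       # ACRU and ACSA means
```

ACRU becomes the reference level simply because it comes first alphabetically, which is why the summary output shows a row for tree_speciesACSA and none for ACRU.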

We can also evaluate the model by interpreting the RSE and the R^2 values. We can see that the RSE value is 11.78 on 124 degrees of freedom. This means that observed trunk diameters typically differ from the model’s predictions by about 12 cm, so there is still some variability that our model doesn’t explain. Additionally, our R^2 value is 0.3081, which means that tree_species explains about 31% of the variation in trunk.diameter and the remaining 69% comes from other factors. This means that tree_species alone is not a strong predictor of trunk.diameter and there is more that needs to be considered.

Trunk Diameter Residuals

trunk_diameter_model_resids <- maple_tap_and_sap_data %>%
  add_residuals(trunk_diameter_model)

ggplot(trunk_diameter_model_resids) +
  geom_histogram(aes(resid)) +
  labs(x = "Residuals",
       y = "Count",
       title = "Trunk Diameter Residual Regression",
       caption = "Data obtained from https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hfr&identifier=285&revision=6")

Now that we have interpreted the information provided in our regression model it is important to look at the residuals of the model.

The residuals are quite widely spread, with the left side reaching about -30 and the right side going past 20. The histogram appears to be multimodal, which suggests that the model is probably missing important features or other variables. This also means that the model for trunk.diameter doesn’t capture the underlying pattern effectively and there could be a better predictive model.

Overall Results of Trunk Diameter Regression

Now that we have looked at all of the variables and the residuals we can make some formal conclusions.

Based on the information above we can conclude that tree_species is a meaningful predictor of trunk.diameter. However, while there is a relationship between these two variables, tree_species is not the sole or dominant predictor, so other variables besides tree_species should be considered. Finally, we can conclude that the relationship is statistically significant, but weak, mainly because there is a large amount of variability that the model does not explain.

Sap Weight Regression

Next, we will look at the simple linear regression model for sap weight in relation to tree species.

sap_weight_model <- lm(sap_weight ~ tree_species, data = maple_tap_and_sap_data)

summary(sap_weight_model)
## 
## Call:
## lm(formula = sap_weight ~ tree_species, data = maple_tap_and_sap_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4227 -2.2895 -0.5727  1.6073 19.6073 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        2.3095     0.1111   20.78   <2e-16 ***
## tree_speciesACSA   2.1232     0.1170   18.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.029 on 7573 degrees of freedom
##   (545 observations deleted due to missingness)
## Multiple R-squared:  0.04166,    Adjusted R-squared:  0.04154 
## F-statistic: 329.2 on 1 and 7573 DF,  p-value: < 2.2e-16

Sap Weight Assessment

What is this model telling us? We can determine this by looking at each of the variables individually.

Let’s start by looking at the coefficients and the p-value of this model. The intercept is 2.3095, which is the average sap weight (in kg) of the red maple, the reference species, and provides the baseline for this model. The coefficient of tree_species is 2.1232, which means that the sugar maple tree species has, on average, a sap weight that is 2.12 kg greater than that of the red maple tree species. Our null hypothesis is that tree_species has no effect on sap_weight. For sap weight our p-value is < 2.2e-16, which is far below the 0.05 cutoff. Therefore, we can determine that tree species is a statistically significant predictor of sap weight, which also allows us to reject our null hypothesis.
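
The p-value in the output is printed as a bound rather than an exact number. A rough sketch of why, assuming the usual two-sided t-test that summary() reports and using the t value and residual degrees of freedom from the output above:

```r
# t value and residual degrees of freedom for tree_speciesACSA,
# copied from the sap weight summary output.
t_value <- 18.14
df_resid <- 7573

# Two-sided p-value: probability of a t statistic at least this extreme
# if the true coefficient were zero.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# R's printing routines show any p-value below machine epsilon
# (about 2.2e-16) as "< 2.2e-16" instead of the exact value.
p_value < .Machine$double.eps
```

So "< 2.2e-16" should be read as "too small to print", not as a p-value that happens to equal 2.2e-16.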

We can further evaluate this model by looking at the RSE and R^2 values. Our RSE value is 3.029 on 7573 degrees of freedom. This means that the actual sap weights typically differ from the predicted values by about 3 kg. Next, we should look at the R^2 value, which is 0.04154. This means that tree species explains only about 4% of the variation in sap weight and the majority of the variation is due to other factors. Together, the low R^2 value and the high RSE value tell us that the model doesn’t fit the data well for predicting sap weight. Therefore, we should look at some of our other variables because tree_species alone is not a strong predictor of sap_weight.

Sap Weight Residuals

sap_weight_model_resids <- maple_tap_and_sap_data %>%
  add_residuals(sap_weight_model)

ggplot(sap_weight_model_resids) +
  geom_histogram(aes(resid)) +
  labs(x = "Residuals",
       y = "Count",
       title = "Sap Weight Residual Regression",
       caption = "Data obtained from https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hfr&identifier=285&revision=6")

Now that we have interpreted the information provided in our regression model it is important to look at the residuals of the model.

We can see that the residuals are positively skewed and trail off to the right. We can also see that a few residuals fall well beyond the main cluster of the graph. These could be potential outliers. The peak sits around 0, which is a good sign; it is the outliers that are causing the skew. Additionally, the graph suggests that the model predicts sap_weight reasonably well for most of the data, and it is just those outliers that are throwing it off.

Overall Results of Sap Weight Regression

Now that we have looked at all of the variables and the residuals we can make some formal conclusions.

Based on the information we considered above we can conclude that there is a statistically significant, but weak, relationship between tree_species and sap_weight. While the relationship is statistically significant, tree_species is not a strong predictor of sap_weight. Another conclusion is that the model at times underestimates sap_weight, mainly at the higher values. Overall, this model doesn’t fit the data well and other variables should be taken into consideration.

Sugar Concentration Regression

For the final of our simple linear regression models we will look at sugar concentration in relation to tree species.

sugar_concentration_model <- lm(sugar_concentration ~ tree_species, data = maple_tap_and_sap_data)

summary(sugar_concentration_model)
## 
## Call:
## lm(formula = sugar_concentration ~ tree_species, data = maple_tap_and_sap_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7391 -0.4391 -0.0391  0.3609  4.7609 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       1.83537    0.02082   88.16   <2e-16 ***
## tree_speciesACSA  0.70377    0.02192   32.10   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5837 on 8011 degrees of freedom
##   (107 observations deleted due to missingness)
## Multiple R-squared:  0.114,  Adjusted R-squared:  0.1139 
## F-statistic:  1031 on 1 and 8011 DF,  p-value: < 2.2e-16

Sugar Concentration Assessment

What is this model telling us? We can determine this by looking at each of the variables individually.

First, we can start by looking at the coefficients and the p-value. The intercept is 1.83537, which provides us with the average sugar concentration of the red maple, the reference species. This can be compared to the coefficient for tree_species, which is 0.70377 and refers to the sugar maple tree. This tells us that the sugar maple tree has, on average, a sugar concentration about 0.70 units higher than the red maple tree species. It should be noted here that our null hypothesis is that tree_species has no effect on sugar_concentration. The p-value for this model is < 2.2e-16, which is far below our 0.05 cutoff. This allows us to determine that tree species is a statistically significant predictor of sugar concentration. Additionally, it allows us to reject our null hypothesis because the tree species does, in fact, affect the sugar concentration.

We can further analyze this model by looking at the RSE and the R^2 values. For this model our RSE is 0.5837 on 8011 degrees of freedom. This means that the model’s predictions are typically off by about 0.58 units, which is a reasonable fit in absolute terms, although it should be judged against average sugar concentrations of only about 2. Additionally, we can look at the R^2 value, which is 0.1139 for this model. This tells us that the tree species has some predictive power for sugar concentration but, yet again, most of the variation comes from factors other than tree_species.
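
Since the RSE is only meaningful relative to the scale of the response, the sketch below shows on made-up numbers how the RSE is computed and how it can be compared with the average of the response. The toy values are illustrative and are not taken from the data set.

```r
# Made-up sugar concentrations, for illustration only.
toy <- data.frame(
  tree_species = rep(c("ACRU", "ACSA"), each = 4),
  sugar_concentration = c(1.6, 1.8, 1.9, 2.1, 2.2, 2.5, 2.6, 2.9)
)

toy_model <- lm(sugar_concentration ~ tree_species, data = toy)

# RSE by hand: square root of the residual sum of squares divided by
# the residual degrees of freedom.
rse_by_hand <- sqrt(sum(residuals(toy_model)^2) / df.residual(toy_model))

# The same quantity as extracted by R.
rse_from_r <- sigma(toy_model)

all.equal(rse_by_hand, rse_from_r)

# RSE as a fraction of the mean response gives a sense of scale.
rse_from_r / mean(toy$sugar_concentration)
```

Applied to the model above, an RSE of 0.5837 against species averages of roughly 1.8 and 2.5 corresponds to a typical error of about a quarter of the response, which helps explain why the R^2 is low even though the RSE looks small.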

Sugar Concentration Residuals

sugar_concentration_model_resids <- maple_tap_and_sap_data %>%
  add_residuals(sugar_concentration_model)

ggplot(sugar_concentration_model_resids) +
  geom_histogram(aes(resid)) +
  labs(x = "Residuals",
       y = "Count",
       title = "Sugar Concentration Residual Regression",
       caption = "Data obtained from https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hfr&identifier=285&revision=6")

Now that we have interpreted the information provided in our regression model it is important to look at the residuals of the model.

We can see that the residuals are concentrated between -2 and 3 and are close to a normal distribution. We can also see that there is minimal skewness in this graph, which is different from the other variables we looked at. This allows us to determine that the model for sugar_concentration is well calibrated and fits the data effectively, especially in comparison to the other variables we observed.

Overall Results of Sugar Concentration Regression

Now that we have looked at all of the variables and the residuals we can make some formal conclusions.

Based on the information that we gathered in the sugar concentration regression we can make a few conclusions. First, we can conclude that there is a statistically significant, yet moderate, relationship between tree_species and sugar_concentration. Additionally, we can determine that tree_species has some predictive power but there is still variation that comes from other factors. Finally, we can conclude that tree_species alone is not sufficient to explain most of the variation in sugar_concentration. Therefore, we should consider other variables in addition to tree_species.

Multiple Regression of Variables

As we can see in the data analyzed above, even though we have formed some conclusions it is still important to analyze all of these variables together. For this model I chose to focus on the sugar concentration as our response variable, mainly because we needed a numerical response, so we couldn’t use the tree species. I chose the sugar concentration over the other numerical variables because it was the most recent variable we analyzed and its residuals were the closest to a normal distribution.

Multiple Regression Model

# Keep only the variables used in the model
maple_tap_and_sap <- maple_tap_and_sap_data %>%
  select(-date, -tree_identity, -tap)

# Row positions of the missing predictor values
NA_1 <- which(is.na(maple_tap_and_sap$trunk.diameter))

NA_2 <- which(is.na(maple_tap_and_sap$sap_weight))

# Replace missing predictor values with the mean of the observed values
maple_tap_and_sap$trunk.diameter[NA_1] <- mean(maple_tap_and_sap$trunk.diameter, na.rm = TRUE)

maple_tap_and_sap$sap_weight[NA_2] <- mean(maple_tap_and_sap$sap_weight, na.rm = TRUE)

# Sugar concentration modeled on species, trunk diameter, and sap weight
maple_tap_and_sap_mult_model <- lm(sugar_concentration ~ tree_species + trunk.diameter + sap_weight, data = maple_tap_and_sap)

summary(maple_tap_and_sap_mult_model)
## 
## Call:
## lm(formula = sugar_concentration ~ tree_species + trunk.diameter + 
##     sap_weight, data = maple_tap_and_sap)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8017 -0.3914 -0.0627  0.3525  4.6963 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       1.498198   0.655975   2.284   0.0224 *  
## tree_speciesACSA  0.736951   0.022290  33.061  < 2e-16 ***
## trunk.diameter    0.005998   0.010428   0.575   0.5652    
## sap_weight       -0.016543   0.002204  -7.506 6.78e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5817 on 8009 degrees of freedom
##   (107 observations deleted due to missingness)
## Multiple R-squared:  0.1202, Adjusted R-squared:  0.1199 
## F-statistic: 364.8 on 3 and 8009 DF,  p-value: < 2.2e-16

Multiple Regression Model Assessment

What is this model telling us? We can determine this by looking at each of the variables individually.

We will start out by looking at the coefficients and the p-values of this model. The intercept is 1.498198, which is the predicted sugar concentration for the baseline case: a red maple with trunk.diameter and sap_weight both equal to zero. The p-value for the intercept is 0.0224, which is below the 0.05 cutoff, so the intercept is statistically different from zero. Our next coefficient is for tree_species, which is 0.736951. This means that, holding trunk.diameter and sap_weight constant, the sugar concentration is about 0.74 units higher in the sugar maple tree species than in the red maple tree species. The p-value for tree_species is < 2e-16, which is below the 0.05 cutoff. This allows us to conclude that tree_species is statistically significant and to reject the null hypothesis that it has no effect on sugar_concentration. Next, the coefficient for trunk.diameter is 0.005998. This means that for every 1 cm increase in trunk diameter the predicted sugar concentration increases by about 0.006 units. However, its p-value is 0.5652, which is well above our 0.05 cutoff. This allows us to conclude that trunk.diameter is not statistically significant, so we fail to reject the null hypothesis for this predictor. Finally, we will look at the coefficient for sap_weight. The coefficient is -0.016543, which means that for every additional kilogram of sap the predicted sugar concentration decreases by about 0.017 units. Its p-value is 6.78e-14, which means that sap_weight is statistically significant and we can reject the null hypothesis.
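
As a worked example of how these coefficients combine, the sketch below plugs the estimates printed in the summary output into the model equation for two hypothetical trees that differ only in species. The trunk diameter of 60 cm and sap weight of 4 kg are made-up inputs, and predict_sugar is an illustrative helper rather than part of the analysis.

```r
# Coefficient estimates copied from the multiple regression summary output.
coefs <- c(intercept = 1.498198,
           tree_speciesACSA = 0.736951,
           trunk.diameter = 0.005998,
           sap_weight = -0.016543)

# Predicted sugar concentration for one tree.
# is_sugar_maple: TRUE for ACSA, FALSE for ACRU (the reference species).
predict_sugar <- function(is_sugar_maple, trunk_cm, sap_kg) {
  unname(coefs["intercept"] +
           coefs["tree_speciesACSA"] * as.numeric(is_sugar_maple) +
           coefs["trunk.diameter"] * trunk_cm +
           coefs["sap_weight"] * sap_kg)
}

# Two hypothetical trees with identical trunk diameter and sap weight.
sugar_pred <- predict_sugar(TRUE, trunk_cm = 60, sap_kg = 4)
red_pred <- predict_sugar(FALSE, trunk_cm = 60, sap_kg = 4)

sugar_pred              # about 2.53
red_pred                # about 1.79
sugar_pred - red_pred   # the tree_speciesACSA coefficient
```

The gap between the two predictions is exactly the tree_species coefficient, which is what "holding the other predictors constant" means in practice.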

We can also further analyze this model by looking at the RSE and R^2 values. For this model our RSE value is 0.5817 on 8009 degrees of freedom. This means that our model’s predictions are relatively close to the actual values in the data set. Also, our R^2 value is 0.1199. This means that only about 12% of the variability in sugar_concentration is explained by sap_weight, trunk.diameter, and tree_species, and the rest is due to other factors.

Multiple Regression Model Residuals

maple_tap_and_sap_mult_model_resids <- maple_tap_and_sap %>%
  add_residuals(maple_tap_and_sap_mult_model)

ggplot(maple_tap_and_sap_mult_model_resids) +
  geom_histogram(aes(resid)) +
  labs(x = "Residuals",
       y = "Count",
       title = "Multiple Regression Model",
       caption = "Data obtained from https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-hfr&identifier=285&revision=6")

Now that we have interpreted the information provided in our regression model it is important to look at the residuals of the model.

We can see that this residual histogram is similar to the one for the sugar_concentration model we recently interpreted. The graph is well centered around 0, with most residuals ranging from -2 to 3. Additionally, we can see that the graph has a roughly symmetric, bell-shaped distribution, which is what we are looking for. This allows us to infer that the model’s predictions are, on average, close to the values observed in the data. Overall, we can conclude that this graph indicates that the regression model might be appropriate for the data.

Overall Results of Multiple Regression Model

Now that we have looked at all of the variables and the residuals we can make some formal conclusions.

Based on the information gathered in the multiple regression model we can begin to generate some conclusions. First, we can determine that the intercept is statistically significant, which gives the model a meaningful baseline. Additionally, tree_species and sap_weight are statistically significant predictors of sugar_concentration, but their explanatory power is still limited. On the other hand, trunk.diameter is not statistically significant, so this model gives no evidence of a relationship between it and sugar_concentration. Finally, we can conclude that even though we combined all of the variables there is still a lot of unexplained variation, which suggests that there are other factors we didn’t take into account that may play a greater role.

How Do the Visual and Regressional Approaches Answer Our Question?

After analyzing all the visuals and regressions we can come back to our overarching question: Does the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species?

Based on the information and data we analyzed I feel confident in saying that the non-masting red maple species does, in fact, exhibit muted dynamics compared to the masting sugar maple species. This is mainly because the sugar maple tree species has a higher sugar concentration and higher averages in the other variables we measured. This reflects the more variable and resource-intensive reproductive strategy that is common with masting species such as the sugar maple species. Some might say this is to be expected because non-masting species are expected to have more consistent dynamics, like we saw with the red maple tree species.

We can see this answer to our question throughout our analysis.

From the visuals we saw that the sugar maple tree species had higher averages in trunk.diameter, sap_weight, and sugar_concentration than the red maple tree species. Much of the analysis focused on sugar_concentration since it had the best regression model. This showed us that the sugar maple tree species averages a higher sugar concentration, which further supports our conclusion because there is a statistically significant relationship between tree_species and sugar_concentration. Additionally, our answer is supported because tree_species was also a statistically significant predictor of sap_weight, with the sugar maple tree species averaging a higher sap weight than the red maple tree species. Overall, the sugar maple tree species shows stronger, more variable responses than the red maple tree species.

Conclusion

In conclusion, based on the analysis done above, I feel confident in saying that the non-masting red maple species does exhibit muted dynamics compared to the masting sugar maple species. The variables that best demonstrated this were trunk.diameter, sap_weight, sugar_concentration and, of course, tree_species. This goes to show that the characteristics of a tree do indicate whether or not it has muted dynamics. The visualizations and the regressional analysis support my conclusion that the red maple species exhibits muted dynamics.

Works Cited

Rapp, J., E. Crone, and K. Stinson. 2023. Maple Reproduction and Sap Flow at Harvard Forest since 2011 ver 6. Environmental Data Initiative. https://doi.org/10.6073/pasta/7c2ddd7b75680980d84478011c5fbba9 (Accessed 2024-12-12).