library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(modelr)
library(DT)
In this report we will be trying to figure out “does the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species?.” In order for us to figure out this question. We’ll have to analyse data such as if a tree is masting, or its growth, and how much it reproduces. Masting is when a tree produces a lot of seed. In order to do this we first have to look through 14 spreadsheets of data and figure out what we want to use. What I have come down to was 3 spreadsheets of data that will be useful to use for this experiment. The 3 sheets are named hf285-01-maple-tap.csv, hf285-02-maple-sap.csv and hf285-09-maple-seed-count.csv. The reason for narrowing down to these three data sets is that they are the only two that have both species of trees in them and the 9th one has the seed count for each year of the sugar maple. We got all this information from “https://doi.org/10.6073/pasta/7c2ddd7b75680980d84478011c5fbba9”. ACSA: “Acer saccharum” which is the Sugar Maple. ACRU: “Acer rubrum” which is the Red Maple. The ACRU does not Mast compare to its counterpart ACSA.
library(readr)
hf285_01_maple_tap <- read_csv("C:/Users/Jack/Downloads/hf285-01-maple-tap.csv")
## Rows: 382 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): tree, tap, species
## dbl (3): dbh, tap.bearing, tap.height
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hf285_02_maple_sap <- read_csv("C:/Users/Jack/Downloads/hf285-02-maple-sap.csv")
## Rows: 9022 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): tree, tap, species
## dbl (2): sugar, sap.wt
## dttm (1): datetime
## date (1): date
## time (1): time
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hf285_09_maple_seed_count <- read_csv("C:/Users/Jack/Downloads/hf285-09-maple-seed-count.csv")
## Rows: 220 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): tree, note
## dbl (3): EC.count, JR.count, total.count
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(hf285_01_maple_tap)
View(hf285_02_maple_sap)
View(hf285_09_maple_seed_count)
We first will clean out the data set hf285-09-maple-seed-count to Seed_Clean. After that we next take the data of the seeds produced each year and graph them in order to determine which years the ACSA trees were masting. For the years that the ACSA trees were masting were 2011, 2013, 2017, and 2019. The years that were Nonmasting were 2012, 2014, 2015, 2016, 2018, 2020, 2021.
Seed_Clean <- hf285_09_maple_seed_count %>%
select(-note, -EC.count, -JR.count)%>%
separate(date, into = c("Year", "Month", "Day"))%>%
mutate(Masting_year = case_when(Year == 2011 | Year == 2013 | Year == 2017 | Year == 2019 ~ "Masting", Year == 2012 | Year == 2014 | Year == 2015 | Year == 2016 | Year == 2018 | Year == 2020 | Year == 2021 ~ "Nonmasting"))%>%
mutate(Year = as.numeric(Year))
view(Seed_Clean)
summary(Seed_Clean)
## tree Year Month Day
## Length:220 Min. :2011 Length:220 Length:220
## Class :character 1st Qu.:2013 Class :character Class :character
## Mode :character Median :2016 Mode :character Mode :character
## Mean :2016
## 3rd Qu.:2019
## Max. :2021
## total.count Masting_year
## Min. : 0.00 Length:220
## 1st Qu.: 0.00 Class :character
## Median : 0.00 Mode :character
## Mean :16.74
## 3rd Qu.:35.25
## Max. :77.00
ggplot(data = Seed_Clean, mapping = aes(x = Year, y = total.count))+
geom_boxplot()+
labs( title = "Seed production", y = "Seed count")
## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?
Next we will clean out hf285-01-maple-tap then call it Maple_Clean, and hf285-02-maple-sap then call it Sugar_Clean. We will then remove all rows with “na” in them to clear out information that can skew the data. We will also separate the dates into Year, Month, and Day. Then convert Year to a numeric value so we can use it in our graphs. Then we’ll tag on the years that were and weren’t masting. After that we’ll get the average dbh(Diameter at breast height) per year from the Tap_Clean data set.
Tap_Clean <- hf285_01_maple_tap%>%
drop_na()%>%
mutate(tree = sub("^HFR", "AR", tree))%>%
separate(date, into = c("Year", "Month", "Day"))%>%
mutate(Masting_year = case_when(Year == 2011 | Year == 2013 | Year == 2017 | Year == 2019 ~ "Masting", Year == 2012 | Year == 2014 | Year == 2015 | Year == 2016 | Year == 2018 | Year == 2020 | Year == 2021 ~ "Nonmasting"))%>%
mutate(Year = as.numeric(Year))
view(Tap_Clean)
summary(Tap_Clean)
## Year Month Day tree
## Min. :2012 Length:119 Length:119 Length:119
## 1st Qu.:2016 Class :character Class :character Class :character
## Median :2016 Mode :character Mode :character Mode :character
## Mean :2016
## 3rd Qu.:2017
## Max. :2017
## tap species dbh tap.bearing
## Length:119 Length:119 Min. :31.00 Min. : 1.0
## Class :character Class :character 1st Qu.:54.50 1st Qu.: 74.0
## Mode :character Mode :character Median :63.70 Median :182.0
## Mean :62.68 Mean :171.8
## 3rd Qu.:75.45 3rd Qu.:253.0
## Max. :86.20 Max. :359.0
## tap.height Masting_year
## Min. : 56 Length:119
## 1st Qu.: 96 Class :character
## Median :120 Mode :character
## Mean :120
## 3rd Qu.:146
## Max. :185
Sap_Clean <- hf285_02_maple_sap%>%
select( -time, -datetime)%>%
separate(date, into = c("Year", "Month", "Day"))%>%
mutate(Masting_year = case_when(Year == 2011 | Year == 2013 | Year == 2017 | Year == 2019 ~ "Masting", Year == 2012 | Year == 2014 | Year == 2015 | Year == 2016 | Year == 2018 | Year == 2020 | Year == 2021 ~ "Nonmasting"))%>%
drop_na()%>%
mutate(Year = as.numeric(Year))
view(Sap_Clean)
summary(Sap_Clean)
## Year Month Day tree
## Min. :2012 Length:7145 Length:7145 Length:7145
## 1st Qu.:2014 Class :character Class :character Class :character
## Median :2016 Mode :character Mode :character Mode :character
## Mean :2016
## 3rd Qu.:2018
## Max. :2021
## tap sugar species sap.wt
## Length:7145 Min. :0.80 Length:7145 Min. : 0.010
## Class :character 1st Qu.:2.00 Class :character 1st Qu.: 1.780
## Mode :character Median :2.40 Mode :character Median : 3.420
## Mean :2.48 Mean : 4.146
## 3rd Qu.:2.90 3rd Qu.: 5.800
## Max. :7.30 Max. :24.040
## Masting_year
## Length:7145
## Class :character
## Mode :character
##
##
##
Tap_Whole_date <- Tap_Clean %>%
mutate(
Year = as.numeric(Year),
Month = as.numeric(Month),
Day = as.numeric(Day),
Date = as.Date(sprintf("%04d-%02d-%02d", Year, Month, Day))
)
Avg_dbh_year <- Tap_Whole_date %>%
group_by(Year, species) %>%
summarise(mean_dbh = mean(dbh, na.rm = TRUE))
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
view(Avg_dbh_year)
summary(Avg_dbh_year)
## Year species mean_dbh
## Min. :2012 Length:5 Min. :46.17
## 1st Qu.:2016 Class :character 1st Qu.:46.99
## Median :2016 Mode :character Median :65.27
## Mean :2016 Mean :58.58
## 3rd Qu.:2017 3rd Qu.:66.98
## Max. :2017 Max. :67.47
We will make a bar graph comparing the sugar production for both species and compare when it is a masting year or not.
ggplot(data = Sap_Clean, mapping = aes(x= species, y = sugar, fill = Masting_year))+
geom_boxplot()+
labs(title = "Sugar production")
For the result, the ACRU produces more sugar during non masting years,
while the ACSA produces more sugar during the masting years.
We’ll then do the same thing for sap.wt (Sap weight) as well. Comparing sap weight to the species masting and non masting years.
ggplot(data = Sap_Clean, mapping = aes(x= species, y = sap.wt, fill = Masting_year))+
geom_boxplot()+
labs(title = "Sap production")
The results show that ACRU has more sap production in non masting years,
while ACSA has more sap production in masting years.
This is a common trend for both. That during non masting years the ACRU produces both more sap and sugar, while the ACSA produces more sugar and sap during masting years.
We will then use linear regression to check these trends to see if they have any meaning.
Sugar_check <- lm(sugar ~ species, data = Sap_Clean)
summary(Sugar_check)
##
## Call:
## lm(formula = sugar ~ species, data = Sap_Clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7539 -0.3539 -0.0539 0.3461 4.7461
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.84293 0.02136 86.29 <2e-16 ***
## speciesACSA 0.71099 0.02256 31.51 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5821 on 7143 degrees of freedom
## Multiple R-squared: 0.1221, Adjusted R-squared: 0.1219
## F-statistic: 993.1 on 1 and 7143 DF, p-value: < 2.2e-16
Sap_check <- lm(sap.wt ~ species, data = Sap_Clean)
summary(Sap_check)
##
## Call:
## lm(formula = sap.wt ~ species, data = Sap_Clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3497 -2.2597 -0.5997 1.5803 19.6803
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3095 0.1100 21.00 <2e-16 ***
## speciesACSA 2.0502 0.1162 17.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.997 on 7143 degrees of freedom
## Multiple R-squared: 0.04178, Adjusted R-squared: 0.04164
## F-statistic: 311.4 on 1 and 7143 DF, p-value: < 2.2e-16
From the sugar we get a small p-value(2e-16) which tells us that the species plays a strong effect on the sugar production. The r^2 (0.1219) value is small which tells us that the model does not explain much variance in it. The same thing happens as well with the sap weight. Small p-value (2e-16) and r^2 (0.04164).
We then add the years to both models since the years may be able to tell us if there is more growth throughout the years that the tree lives.
Sugar_Check_m <- lm(sugar ~ species + Masting_year, data = Sap_Clean)
summary(Sugar_Check_m)
##
## Call:
## lm(formula = sugar ~ species + Masting_year, data = Sap_Clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7390 -0.3798 -0.0798 0.3610 4.7202
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.87299 0.02385 78.539 < 2e-16 ***
## speciesACSA 0.70679 0.02260 31.274 < 2e-16 ***
## Masting_yearNonmasting -0.04075 0.01442 -2.827 0.00471 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5819 on 7142 degrees of freedom
## Multiple R-squared: 0.123, Adjusted R-squared: 0.1228
## F-statistic: 501 on 2 and 7142 DF, p-value: < 2.2e-16
Sap_Check_m <- lm(sap.wt ~ species + Masting_year, data = Sap_Clean)
summary(Sap_Check_m)
##
## Call:
## lm(formula = sap.wt ~ species + Masting_year, data = Sap_Clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3862 -2.2662 -0.5929 1.5738 19.7071
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.36357 0.12286 19.239 <2e-16 ***
## speciesACSA 2.04260 0.11643 17.544 <2e-16 ***
## Masting_yearNonmasting -0.07328 0.07427 -0.987 0.324
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.997 on 7142 degrees of freedom
## Multiple R-squared: 0.04191, Adjusted R-squared: 0.04164
## F-statistic: 156.2 on 2 and 7142 DF, p-value: < 2.2e-16
In the sugar model the p-value stays the same for species and the p-value for years is low(2e-16) while the r^2 value hardly improves (0.1228) at all. In the Sap model the p-value stays the same, but the p-value for years is high at (0.324) which is above our 0.05 threshold which tells us that we should not use it in our model. the r^2 value barely improves as well(0.04164). The small r^2 values indicate these models explain only a small proportion of variance
We’ll next graph a box plot comparing the year to dbh, organising it by the species and if it is a masting year. Then we will make another graph that will compare the average dbh over the years. Separated by species. These graphs will help us see if dbh plays an effect on how many seeds are produced.
ggplot(data = Tap_Clean, mapping = aes(x= Year, y = dbh, fill = species, color = Masting_year))+
geom_boxplot()+
scale_color_manual(values = c(
"Masting" = "yellow",
"Nonmasting" = "purple"))+
labs( title = "dbh in each year")
For the second graph the slopes of both lines tells us how much each
tree grows on average. ACRU grows about 0.8166667 cm per year, while
ACSA grows about 0.4353520 cm per year. This shows that ACRU grows much
quicker than ACSA.
ggplot(Avg_dbh_year, aes(x = Year, y = mean_dbh, color = species)) +
geom_line() +
geom_point() +
labs(x = "Year", y = "Average dbh", title = "Avg dbh over the years")
slopes_dbh <- Avg_dbh_year %>%
group_by(species) %>%
do(model = lm(mean_dbh ~ Year, data = .)) %>%
mutate(slope = coef(model)[["Year"]])
slopes_dbh %>% select(species, slope)
## # A tibble: 2 × 2
## # Rowwise:
## species slope
## <chr> <dbl>
## 1 ACRU 0.817
## 2 ACSA 0.435
We’ll then use linear regression to see if there are any effects that species has on dbh
dbh_sp <- lm(dbh ~ species, data = Tap_Clean)
summary(dbh_sp)
##
## Call:
## lm(formula = dbh ~ species, data = Tap_Clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.453 -8.618 -2.353 10.682 21.017
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.583 2.432 19.15 < 2e-16 ***
## speciesACSA 20.169 2.722 7.41 2.15e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.91 on 117 degrees of freedom
## Multiple R-squared: 0.3194, Adjusted R-squared: 0.3136
## F-statistic: 54.91 on 1 and 117 DF, p-value: 2.146e-11
We see that the p-value is small and the r^2 value is relative close to the middle. So the model fits the variance well.
We’ll now add the masting_year variable to the model to see how it affects dbh.
The p-value for the masting year is high(0.632) so the variable does not show if there is significance that the masting year has an effect on the tree’s seed production.
dbh_sp_Myear <- lm(dbh ~ species + Masting_year, data = Tap_Clean)
summary(dbh_sp_Myear)
##
## Call:
## lm(formula = dbh ~ species + Masting_year, data = Tap_Clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.044 -8.623 -2.244 10.377 21.156
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.123 2.686 17.545 < 2e-16 ***
## speciesACSA 20.300 2.744 7.397 2.37e-11 ***
## Masting_yearNonmasting -1.079 2.245 -0.481 0.632
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.95 on 116 degrees of freedom
## Multiple R-squared: 0.3208, Adjusted R-squared: 0.309
## F-statistic: 27.39 on 2 and 116 DF, p-value: 1.81e-10
In conclusion we can say that there is enough evidence that the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species. The growth of the dbh for ACRU is greater than ACSU. The evidence in the p-value for the models for sap and sugar growth during the masting years for the ACSA compared to the growth of the non masting seasons for the ACRU with all the graphs shows that the red maple ACRU exhibits mutated dynamics compared to its masting sugar counterpart.
Citation Rapp, J., E. Crone, and K. Stinson. 2023. Maple Reproduction and Sap Flow at Harvard Forest since 2011 ver 6. Environmental Data Initiative. https://doi.org/10.6073/pasta/7c2ddd7b75680980d84478011c5fbba9 (Accessed 2025-12-05).