wrangled.switchgrass.biomass <-
read.csv(here::here("data", "tidy.data", "Tidy.Switchgrass.Biomass.csv")) %>%
dplyr::mutate(
Mg.ha = (Dry.Weight.Grams / 1000000) * 40000,
percent.moisture = ((Wet.Weight.Grams - Dry.Weight.Grams) / Wet.Weight.Grams) * 100,
Treatment = factor(Treatment, levels = c("PV3", "PV2", "PV1")),
Date = factor(Date, levels = c("June 2024", "Aug 2024", "Oct 2024"))
)What is the ACRE Project?
As part of the Alternative Crops and Renewable Energy (ACRE) project funded by the Climate Smart Commodities Grant from the USDA, we are investigating what harvest frequencies optimize biomass characteristics and abundance of Switchgrass (Panicum virgatum) for anaerobic digestion. Here, we delve into the differences in biomass production and moisture content based on three different harvest frequencies.
Explanation of Data Collection
In order to determine annual differences in total biomass and moisture content of switchgrass based on harvest frequency, we created four plots of each treatment type (harvesting once, harvesting twice, harvesting three times). An hour before each harvest, we collected biomass using a 0.25 m^2 quadrat in four locations in each plot. Therefore, each treatment had 16 biomass data points each date that we harvested a treatment.
Once we got back to the lab, we weighed each sample for its wet weight and then dried the biomass at 60 degrees Celsius until the weight of the biomass no longer changed. The weight of the biomass from each 0.25 m^2 quadrat was recorded in grams.
Harvest Treatments
| Treatment Name | Harvest Times |
|---|---|
| PV3 | June, August, and October |
| PV2 | August, October |
| PV1 | October |
What format was the data originally in
Tidying Data
Once we got back to the lab after each harvest (June, Aug, Oct) we inputted the data into a separate excel file and even did some calculations in the excel file. It was not necessarily organized and each excel sheet from each harvest was a bit different. Outside of R, I pulled only the necessary data (Date,Plot, Replicate, Treatment, Wet.Weight.Grams, and Dry.Weight.Grams) into a new excel file and decided that I would complete the remaining calculations in R.
What did I want to figure out?
- What is the total amount of biomass harvested per treatment.
- Generally, the more biomass that can be produced, it means the system has been optimized the most. However, we also need to consider the quality of the biomass.
- How much biomass can be expected for harvest at each harvest date?
- This will help with storage planning and transportation costs.
Reading in & Wrangling the Data
I needed to scale the grams/0.25 m2 to a more digestible and relevant number. I chose Megagrams/ha because this is a common way to express biomass in the literature that I have been reading. I also wanted to calculate percent moisture from the wet and dry biomass weights.
Preparing Biomass Data
avg.biomass.by.date <- wrangled.switchgrass.biomass %>%
group_by(Treatment, Date) %>%
summarise(mean_biomass_ha = mean(Mg.ha), .groups = "drop") %>%
pivot_wider(
names_from = Date, # Make date the column names
values_from = mean_biomass_ha, # Values will be from 'mean_biomass_ha'
values_fill = list(mean_biomass_ha = NA) # Fill missing values with NA
) %>%
mutate(
# Round the date columns to 1 decimal place
across(starts_with("June 2024"):starts_with("Oct 2024"), ~ round(., 1)),
# Round the total biomass column to 1 decimal place
Total_biomass_ha = round(rowSums(select(., starts_with("June 2024"):starts_with("Oct 2024")), na.rm = TRUE), 1)
) %>%
arrange(Treatment) %>%
rename(June = "June 2024", Aug = "Aug 2024", Oct = "Oct 2024", Total = "Total_biomass_ha") # Rename the Total columnTreatment | June | Aug | Oct | Total |
|---|---|---|---|---|
PV3 | 3.8 | 5.6 | 1.5 | 10.8 |
PV2 | 10.8 | 2.3 | 13.1 | |
PV1 | 12.8 | 12.8 |
Biomass Violin Plots
First I decided to make boxplots inside of violin plots. I faced many challenges trying to accomplish this.
Version 1
# Shading the violins by date and keeping the boxplots unshaded because I like the way that looks
violin.box.biomass <- ggviolin(wrangled.switchgrass.biomass,
x = "Treatment", y = "Mg.ha",
fill = "Date",
palette = c("darkseagreen", "lightblue", "lightcoral"), # Choose custom colors for treatments
add = "boxplot", # Adds boxplot inside violin
add.params = list(fill = "white", size = 0.5), # Customize the boxplot (unshaded)
ylab = "Mg/ha of Switchgrass", #making the labels better than the variable names
xlab = "Treatment") +
# Add significance with separate comparisons and specified where I want the error asterisks to fall in the ggplot. When I looked through the ggplot2 extension gallery, I saw ggpubr and liked the significance bars and thought they would add value to my summary so I copied some code to add some simple statistical tests and hopefully bring more meaning to my violin and box plots.
stat_compare_means(aes(group = Treatment),
comparisons = list(c("PV1", "PV2"), c("PV2", "PV3"), c("PV1", "PV3")),
method = "t.test",
label = "p.signif",
label.y = c(25, 23, 27), vjust = 0.5) + # Move asterisks up one by one so they could be differentiated among each other.
facet_wrap(~ Date) + # Create separate plots for each date
theme_classic2() + #this theme is part of ggpubr
theme(legend.position = "none") # Removed legend because it was redundant and took up a lot of space.
violin.box.biomass #printing the graphVersion 2
Then, Dr. Neel helped point out that I was mixing apples with oranges and it might be wise to just add the mean and standard error bars.
violin.plot.biomass <- ggplot(wrangled.switchgrass.biomass, aes(x = Date, y = Mg.ha, fill = Treatment)) +
geom_violin(trim = TRUE) + # Create the violin plot
# Plot means with error bars (standard deviation or standard error)
stat_summary(fun.data = "mean_sdl",
color = "black",
position = position_dodge(width = 0.9)) + # Align the crossbars directly with each violin - this took a lot of playing around to find the right number....
# Add mean points
facet_wrap(~ Date, scales = "free_x") + # Facet the plots based on 'Date'
scale_x_discrete(labels = c("PV1" = "1 harvest", "PV2" = "2 harvests", "PV3" = "3 harvests")) + # Customize x-axis labels
scale_fill_manual(
values = c("darkseagreen", "lightblue", "lightcoral"), # Change the colors of the boxplots
labels = c("3 Harvests", "2 Harvests", "1 Harvest") # Modify the legend labels to exclude the year
) + # Change the colors of the boxplots
theme_classic() + # Use this theme that I've used before and like
ylab("Biomass (Mg/ha)") + # Label for y-axis
xlab("Treatment") + # Label for x-axis
theme(strip.text = element_blank())
violin.plot.biomassVersion 3
Adding a column for seasonal total.
This should have happened at the start.
wrangled.switchgrass.biomass.season.included <- wrangled.switchgrass.biomass%>%
group_by(Plot, Replicate, Treatment) %>%
summarise(Mg.ha = sum(Mg.ha, na.rm = TRUE),
percent.moisture = mean(percent.moisture, na.rm = TRUE),
Wet.Weight.Grams = mean(Wet.Weight.Grams, na.rm = TRUE),
Dry.Weight.Grams = mean(Dry.Weight.Grams, na.rm = TRUE),
.groups = "drop") %>%
# Add a new column indicating this is the seasonal total
mutate(Date = "Season Total") %>%
# Bind the seasonal totals back to the original dataframe
bind_rows(wrangled.switchgrass.biomass) %>%
mutate(Date = factor(Date, levels = c("June 2024", "Aug 2024", "Oct 2024", "Season Total")))Final Plot
#| label: fig-violin-plot-biomass
#| warning: false
#| echo: true
#| fig-cap: "Now, I finally was able to add in the seasonal total which visualizes the answer to the question"
violin.plot.biomass.seasonal.included <- ggplot(wrangled.switchgrass.biomass.season.included, aes(x = Date, y = Mg.ha, fill = Treatment)) +
geom_violin(trim = TRUE) + # Create the violin plot
# Plot means with error bars (standard deviation or standard error)
stat_summary(fun.data = "mean_sdl",
color = "black",
position = position_dodge(width = 0.9)) + # Align the crossbars directly with each violin - this took a lot of playing around to find the right number....
# Facet the plots based on 'Date'
# Customize x-axis labels
scale_fill_manual(
values = c("darkseagreen", "lightblue", "lightcoral")) + # Change the colors of the boxplots
# Change the colors of the boxplots
theme_classic() + # Use this theme that I've used before and like
ylab("Biomass (Mg/ha)") + # Label for y-axis
xlab("Treatment") + # Label for x-axis
facet_grid(~ Date, scales = "free_x", space = "free_x") +
theme(strip.text = element_blank())
violin.plot.biomass.seasonal.includedWere my questions answered?
Yes - but it took many attempts to visualize it properly!
- PV2 generates the most biomass.
- There are pretty large fluctuations between each harvest for both PV3 and PV2.
What lessons did I learn?
Adding more data to one data.frame in one pipe and then sorting it out for each plot is better than making multiple objects.
How you color needs to be thought about carefully.
There are ways to make it so that your plots are not distorted the way the ones that I made. The code: facet_grid(~ Date, scales = “free_x”, space = “free_x”) should work but it is not working for me and I need to figure that out.
I did the calculations for season total biomass in excel and got different numbers than I did in R. It was due to missing a few cells that should have been included in a sum while in excel. R is really powerful and should be used to help avoid manual errors such as that one.
I love quarto!