Final R Presentation - Switchgrass Biomass

Author

Shaylan Kolodney

Published

December 9, 2024

What is the ACRE Project?

As part of the Alternative Crops and Renewable Energy (ACRE) project funded by the Climate Smart Commodities Grant from the USDA, we are investigating what harvest frequencies optimize biomass characteristics and abundance of Switchgrass (Panicum virgatum) for anaerobic digestion. Here, we delve into the differences in biomass production and moisture content based on three different harvest frequencies.

Explanation of Data Collection

In order to determine annual differences in total biomass and moisture content of switchgrass based on harvest frequency, we created four plots of each treatment type (harvesting once, harvesting twice, harvesting three times). An hour before each harvest, we collected biomass using a 0.25 m^2 quadrat in four locations in each plot. Therefore, each treatment had 16 biomass data points each date that we harvested a treatment.

Once we got back to the lab, we weighed each sample for its wet weight and then dried the biomass at 60 degrees Celsius until the weight of the biomass no longer changed. The weight of the biomass from each 0.25 m^2 quadrat was recorded in grams.

Photo of the study site at Polar Hill during the August harvest.

Harvest Treatments

Treatment Name Harvest Times
PV3 June, August, and October
PV2 August, October
PV1 October

What format was the data originally in

Photo of the study site at Polar Hill during the August harvest.

Photo of the study site at Polar Hill during the August harvest.

Photo of the study site at Polar Hill during the August harvest.

Tidying Data

Once we got back to the lab after each harvest (June, Aug, Oct) we inputted the data into a separate excel file and even did some calculations in the excel file. It was not necessarily organized and each excel sheet from each harvest was a bit different. Outside of R, I pulled only the necessary data (Date,Plot, Replicate, Treatment, Wet.Weight.Grams, and Dry.Weight.Grams) into a new excel file and decided that I would complete the remaining calculations in R.

Photo of the study site at Polar Hill during the August harvest.

What did I want to figure out?

  1. What is the total amount of biomass harvested per treatment.
    1. Generally, the more biomass that can be produced, it means the system has been optimized the most. However, we also need to consider the quality of the biomass.
  2. How much biomass can be expected for harvest at each harvest date?
    1. This will help with storage planning and transportation costs.

Reading in & Wrangling the Data

I needed to scale the grams/0.25 m2 to a more digestible and relevant number. I chose Megagrams/ha because this is a common way to express biomass in the literature that I have been reading. I also wanted to calculate percent moisture from the wet and dry biomass weights.

wrangled.switchgrass.biomass <- 
  read.csv(here::here("data", "tidy.data", "Tidy.Switchgrass.Biomass.csv")) %>%
  dplyr::mutate(
    Mg.ha = (Dry.Weight.Grams / 1000000) * 40000,
    percent.moisture = ((Wet.Weight.Grams - Dry.Weight.Grams) / Wet.Weight.Grams) * 100,
    Treatment = factor(Treatment, levels = c("PV3", "PV2", "PV1")),
    Date = factor(Date, levels = c("June 2024", "Aug 2024", "Oct 2024")) 
  )

Preparing Biomass Data

avg.biomass.by.date <- wrangled.switchgrass.biomass %>%
  group_by(Treatment, Date) %>%
  summarise(mean_biomass_ha = mean(Mg.ha), .groups = "drop") %>%
  pivot_wider(
    names_from = Date,  # Make date the column names
    values_from = mean_biomass_ha,  # Values will be from 'mean_biomass_ha'
    values_fill = list(mean_biomass_ha = NA)  # Fill missing values with NA
  ) %>%
  mutate(
    # Round the date columns to 1 decimal place
    across(starts_with("June 2024"):starts_with("Oct 2024"), ~ round(., 1)),
    # Round the total biomass column to 1 decimal place
    Total_biomass_ha = round(rowSums(select(., starts_with("June 2024"):starts_with("Oct 2024")), na.rm = TRUE), 1)
  ) %>%
  arrange(Treatment) %>%
  rename(June = "June 2024", Aug = "Aug 2024", Oct = "Oct 2024", Total = "Total_biomass_ha")  # Rename the Total column

Average biomass of switchgrass in 2024 by treatment and date (Megagrams per hectare).

Treatment

June

Aug

Oct

Total

PV3

3.8

5.6

1.5

10.8

PV2

10.8

2.3

13.1

PV1

12.8

12.8

Biomass Violin Plots

First I decided to make boxplots inside of violin plots. I faced many challenges trying to accomplish this.

Version 1

#  Shading the violins by date and keeping the boxplots unshaded because I like the way that looks
violin.box.biomass <- ggviolin(wrangled.switchgrass.biomass, 
         x = "Treatment", y = "Mg.ha", 
         fill = "Date", 
         palette = c("darkseagreen", "lightblue", "lightcoral"),  # Choose custom colors for treatments
         add = "boxplot",   # Adds boxplot inside violin
         add.params = list(fill = "white", size = 0.5),  # Customize the boxplot (unshaded)
         ylab = "Mg/ha of Switchgrass", #making the labels better than the variable names
         xlab = "Treatment") +
  
  # Add significance with separate comparisons and specified where I want the error asterisks to fall in the ggplot. When I looked through the ggplot2 extension gallery, I saw ggpubr and liked the significance bars and thought they would add value to my summary so I copied some code to add some simple statistical tests and hopefully bring more meaning to my violin and box plots.
  stat_compare_means(aes(group = Treatment), 
                     comparisons = list(c("PV1", "PV2"), c("PV2", "PV3"), c("PV1", "PV3")),
                     method = "t.test", 
                     label = "p.signif", 
                      label.y = c(25, 23, 27), vjust = 0.5) +  # Move asterisks up one by one so they could be differentiated among each other. 

  facet_wrap(~ Date) +  # Create separate plots for each date
  theme_classic2() + #this theme is part of ggpubr
  theme(legend.position = "none")  # Removed legend because it was redundant and took up a lot of space.

violin.box.biomass #printing the graph
Figure 1: This was my first figure. It wasn’t great because the color didn’t mean anything, the spacing was weird, and the test I ran was not correct for my data.

Version 2

Then, Dr. Neel helped point out that I was mixing apples with oranges and it might be wise to just add the mean and standard error bars.

violin.plot.biomass <- ggplot(wrangled.switchgrass.biomass, aes(x = Date, y = Mg.ha, fill = Treatment)) + 
  geom_violin(trim = TRUE) +  # Create the violin plot
  
  # Plot means with error bars (standard deviation or standard error)
  stat_summary(fun.data = "mean_sdl", 
               color = "black", 
                position = position_dodge(width = 0.9)) + # Align the crossbars directly with each violin - this took a lot of playing around to find the right number....
  
  # Add mean points
  facet_wrap(~ Date, scales = "free_x") +  # Facet the plots based on 'Date'
  scale_x_discrete(labels = c("PV1" = "1 harvest", "PV2" = "2 harvests", "PV3" = "3 harvests")) +  # Customize x-axis labels
   scale_fill_manual(
    values = c("darkseagreen", "lightblue", "lightcoral"),  # Change the colors of the boxplots
    labels = c("3 Harvests", "2 Harvests", "1 Harvest")  # Modify the legend labels to exclude the year
  ) +  # Change the colors of the boxplots
  theme_classic() +  # Use this theme that I've used before and like
  ylab("Biomass (Mg/ha)") +  # Label for y-axis
  xlab("Treatment") +  # Label for x-axis
  theme(strip.text = element_blank())

violin.plot.biomass
Figure 2: This second attempt cleaned things up a bit, but still wasn’t showing what I needed to show to answer my questions

Version 3

Adding a column for seasonal total.

This should have happened at the start.

wrangled.switchgrass.biomass.season.included <- wrangled.switchgrass.biomass%>%
  group_by(Plot, Replicate, Treatment) %>%
  summarise(Mg.ha = sum(Mg.ha, na.rm = TRUE),
            percent.moisture = mean(percent.moisture, na.rm = TRUE),
            Wet.Weight.Grams = mean(Wet.Weight.Grams, na.rm = TRUE),
            Dry.Weight.Grams = mean(Dry.Weight.Grams, na.rm = TRUE),
            .groups = "drop") %>%
  # Add a new column indicating this is the seasonal total
  mutate(Date = "Season Total") %>%
  # Bind the seasonal totals back to the original dataframe
  bind_rows(wrangled.switchgrass.biomass) %>%
  mutate(Date = factor(Date, levels = c("June 2024", "Aug 2024", "Oct 2024", "Season Total")))

Final Plot

#| label: fig-violin-plot-biomass
#| warning: false
#| echo: true
#| fig-cap: "Now, I finally was able to add in the seasonal total which visualizes the answer to the question"

violin.plot.biomass.seasonal.included <- ggplot(wrangled.switchgrass.biomass.season.included, aes(x = Date, y = Mg.ha, fill = Treatment)) + 
  geom_violin(trim = TRUE) +  # Create the violin plot
  
  # Plot means with error bars (standard deviation or standard error)
  stat_summary(fun.data = "mean_sdl", 
               color = "black", 
                position = position_dodge(width = 0.9)) + # Align the crossbars directly with each violin - this took a lot of playing around to find the right number....
   # Facet the plots based on 'Date'
  # Customize x-axis labels
   scale_fill_manual(
    values = c("darkseagreen", "lightblue", "lightcoral")) +  # Change the colors of the boxplots
 # Change the colors of the boxplots
  theme_classic() +  # Use this theme that I've used before and like
  ylab("Biomass (Mg/ha)") +  # Label for y-axis
  xlab("Treatment") +  # Label for x-axis
facet_grid(~ Date, scales = "free_x", space = "free_x") +
    theme(strip.text = element_blank())

violin.plot.biomass.seasonal.included

Were my questions answered?

Yes - but it took many attempts to visualize it properly!

  1. PV2 generates the most biomass.
  2. There are pretty large fluctuations between each harvest for both PV3 and PV2.

What lessons did I learn?

  1. Adding more data to one data.frame in one pipe and then sorting it out for each plot is better than making multiple objects.

  2. How you color needs to be thought about carefully.

  3. There are ways to make it so that your plots are not distorted the way the ones that I made. The code: facet_grid(~ Date, scales = “free_x”, space = “free_x”) should work but it is not working for me and I need to figure that out.

  4. I did the calculations for season total biomass in excel and got different numbers than I did in R. It was due to missing a few cells that should have been included in a sum while in excel. R is really powerful and should be used to help avoid manual errors such as that one.

  5. I love quarto!