As with any R script, start by loading the packages you know you will need. For this example we need dplyr and ggplot2. dplyr is a data manipulation package that makes it easy to organize data and calculate summary statistics in a repeatable way. ggplot2 is the go-to graphics package for R and is useful both for exploring data visually and for making publication-quality figures.
If you have not installed these packages, you will have to run the installation step before loading them from your library. The installation lines are commented out below.
#install.packages("dplyr")
library("dplyr")
#install.packages("ggplot2")
library("ggplot2")
I am going to bring in some data that I made up for this tutorial. Your own data will need columns identifying the following: treatment, sample ID (replicate), time, and the amount of CO2 produced.
samp_dat <- read.csv("fermentation_example.csv")
head(samp_dat, 20)
##    treatment sample_ID time CO2
## 1   molasses         1    0   0
## 2   molasses         2    0   0
## 3   molasses         3    0   0
## 4    glucose         1    0   0
## 5    glucose         2    0   0
## 6    glucose         3    0   0
## 7   molasses         1   10  13
## 8   molasses         2   10  14
## 9   molasses         3   10  15
## 10   glucose         1   10   2
## 11   glucose         2   10   3
## 12   glucose         3   10   4
## 13  molasses         1   20  24
## 14  molasses         2   20  25
## 15  molasses         3   20  26
## 16   glucose         1   20   6
## 17   glucose         2   20   7
## 18   glucose         3   20   8
## 19  molasses         1   30  25
## 20  molasses         2   30  26
Consider reorganizing your actual data to match the format of the samp_dat data set. It includes the replicate and time interval information that we need to summarize the data and create a figure of fermentation rate over the course of 40 minutes.
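If you do not have the CSV file to hand, the same layout can be typed directly into R. The sketch below rebuilds only the first 18 rows printed above (time 0, 10, and 20); the name example_dat is a placeholder chosen for this illustration, not part of the original data set.

```r
# Rebuild the first 18 printed rows in "long" format:
# one row per treatment x replicate x time point.
example_dat <- data.frame(
  treatment = rep(rep(c("molasses", "glucose"), each = 3), times = 3),
  sample_ID = rep(1:3, times = 6),
  time      = rep(c(0, 10, 20), each = 6),
  CO2       = c(0, 0, 0, 0, 0, 0,        # time 0
                13, 14, 15, 2, 3, 4,     # time 10
                24, 25, 26, 6, 7, 8)     # time 20
)

head(example_dat)
```

Each replicate gets its own row rather than its own column, which is the shape group_by() and ggplot() expect.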
Before we build our figure, we have to calculate some summary statistics. These summary stats will be used to build our rates plot. It is best to store this step in a dataframe so that you can refer back to it while you write about your results. Furthermore, storing it in a dataframe makes writing later commands much easier in R. The code below will calculate the mean, sample size (n), and standard error (SE) of CO2, the amount of CO2 produced by each sample at each time interval, separately for each treatment. We will store this new summary data in a dataframe called samp_summary, which will appear in your Global Environment.
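To see what those three statistics mean for a single group, here is a small worked example in base R using the three molasses replicates at time 10 from the printed data. The variable names are placeholders for this illustration.

```r
# Molasses replicates at time 10 (rows 7-9 of the printed data)
co2 <- c(13, 14, 15)

n_co2    <- length(co2)            # sample size: 3
mean_co2 <- mean(co2)              # mean: 14
se_co2   <- sd(co2) / sqrt(n_co2)  # standard error: 1 / sqrt(3), about 0.577

c(mean = mean_co2, n = n_co2, SE = se_co2)
```

The standard error is the sample standard deviation divided by the square root of the sample size, which is the same formula the summary code applies to every group.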
In this case we want to plot the average CO2 produced in each treatment for each time interval. Therefore when we use the group_by function, we need to specify treatment and time as our grouping variables.
Extra Note: In the code chunk below you will notice an operator that may be new to you (%>%). This is called a pipe. Piping takes the output of one command and makes it the input of the next command. It is commonly used with dplyr and is great for chaining multiple actions on one dataframe.
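As a minimal illustration of the pipe, the sketch below computes the same value twice, once with nested function calls and once with %>%. It assumes dplyr is installed (dplyr makes %>% available when loaded); the variable names are placeholders.

```r
library(dplyr) # loading dplyr makes the %>% pipe available

co2 <- c(13, 14, 15) # molasses replicates at time 10

# Nested calls: read from the inside out
nested <- round(mean(co2), 1)

# Piped calls: read from left to right
piped <- co2 %>% mean() %>% round(1)

identical(nested, piped)
```

Both versions do the same work; the piped form simply lists the steps in the order they happen.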
# Summary Stats
samp_summary <- samp_dat %>% # Specifying the name of the new dataframe `samp_summary`
  group_by(treatment, time) %>% # Groups our dataframe by treatment and time so R knows how to group mean calculations
  summarise( # Collapses each group into one row of summary stats
    mean_CO2 = mean(CO2), # Creating column to store means
    N = n(), # Creating column to store sample size
    SE_CO2 = sd(CO2) / sqrt(n()) # Creating column to store standard error
  )
View(samp_summary)
All the SE values are the same in this sample data because the replicates going into the mean for each time interval have exactly equal variance. Your data should give more realistic error terms.
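A quick base R check shows why the error terms match. The sketch uses only the time-10 replicates from the printed rows, and the names demo and se_by_treatment are placeholders for this illustration.

```r
# Time-10 replicates for both treatments (rows 7-12 of the printed data)
demo <- data.frame(
  treatment = rep(c("molasses", "glucose"), each = 3),
  CO2       = c(13, 14, 15, 2, 3, 4)
)

# Standard error per treatment: sd / sqrt(n)
se_by_treatment <- tapply(
  demo$CO2, demo$treatment,
  function(x) sd(x) / sqrt(length(x))
)

se_by_treatment
```

In both groups the replicates are spaced exactly one unit apart, so the standard deviation is 1 and the standard error is 1 / sqrt(3) even though the means differ.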
We will create a basic line plot using the summary statistics we just calculated. Below we specify the data we want to use (samp_summary) and the variables that define the aesthetics of the figure (time and mean_CO2). We will distinguish the treatments by color: in the aesthetics we use fill = treatment and colour = treatment to symbolize the treatments in the experiment.
Fermentation_Plot <-
  ggplot(samp_summary, aes(x = time,
                           y = mean_CO2, fill = treatment, colour = treatment)) + # Specifies the data we want plotted - aes() tells R how we want the data plotted and symbolized
  geom_line() + # Tells R what type of plot we want - we want a line to show our rate
  geom_point() + # Places a point at each time interval
  geom_errorbar( # Gives each point an SE bar
    aes(
      ymin = mean_CO2 - SE_CO2, # Lower end of the error bar
      ymax = mean_CO2 + SE_CO2 # Upper end of the error bar
    ),
    width = 0.5 # Width of the error bar caps - a fixed setting, so it sits outside aes()
  )
Fermentation_Plot # Shows you the plot you just made
ggsave("fermentation_plot.png") # Saves the last plot shown - ggsave() needs a file name; this one is only an example
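ggsave() needs at least a file name, and it picks the file type from the extension. The sketch below is one way to save a figure at a fixed size, assuming ggplot2 is installed. The stand-in data, the object names, and the output path are all placeholders for this illustration; the means are those of the molasses replicates at time 0, 10, and 20 in the printed data.

```r
library(ggplot2)

# Small stand-in summary table (molasses means at time 0, 10, and 20)
demo_summary <- data.frame(
  time     = c(0, 10, 20),
  mean_CO2 = c(0, 14, 25)
)

demo_plot <- ggplot(demo_summary, aes(x = time, y = mean_CO2)) +
  geom_line() +
  geom_point()

# Hypothetical output location; replace with a path in your project folder
out_file <- file.path(tempdir(), "fermentation_plot.pdf")

# Passing the plot explicitly avoids relying on "the last plot displayed"
ggsave(filename = out_file, plot = demo_plot, width = 6, height = 4, units = "in")

file.exists(out_file)
```

Setting width and height explicitly keeps the saved figure the same size no matter how the plot window happens to be sized.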