As with any R script, start by identifying the packages you know you will need and loading them. For this example we need dplyr and ggplot2. dplyr is a data manipulation package that makes it easy to automate the steps of organizing data and calculating summary statistics. ggplot2 is the go-to graphics package for R and is a very useful tool for visualizing data and making publication-quality figures.
If you have not installed these packages, you will have to run the installation step before loading them from your library. This step has been commented out below.
#install.packages("dplyr")
library("dplyr")
#install.packages("ggplot2")
library("ggplot2")
For this tutorial I will be using the iris data set, an example data set that ships with base R. You can view the first six rows of the data using the code below:
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
Consider reorganizing your own data to match the format of the iris data set: one row per observation and one column per variable. This layout is a good example of data organization best practices.
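To see how iris is organized, a quick check with base R along these lines may help. This is only a sketch, and species_counts is an illustrative name rather than something used later in the tutorial.

```r
# Show the type of each column and the overall dimensions of the data set
str(iris)

# Count how many observations (rows) belong to each species
species_counts <- table(iris$Species)
species_counts
```

Each row is one measured flower, and the grouping variable (Species) sits in its own column, which is what lets dplyr and ggplot2 group and plot by species later on.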
Before we build our figure, we have to calculate some summary statistics. These summary stats will be used to build the barplot. It is best to store them in a dataframe so that you can refer back to them while you write about your results. Storing them in a dataframe also makes later commands much easier to write in R. The code below calculates the mean, sample size (N), and standard error (SE) of Sepal.Width for each species, and then stores the result in a dataframe called iris_summary, which will appear in your Global Environment.
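As a rough illustration of what those three statistics are, here is the same calculation done by hand for a single species in base R. The standard error is taken as the standard deviation divided by the square root of the sample size. The variable names (setosa_width, n_setosa, and so on) are made up for this sketch.

```r
# Sepal widths for one species only
setosa_width <- iris$Sepal.Width[iris$Species == "setosa"]

n_setosa    <- length(setosa_width)               # sample size
mean_setosa <- mean(setosa_width)                 # mean
se_setosa   <- sd(setosa_width) / sqrt(n_setosa)  # standard error = sd / sqrt(n)

# Collect the three values in one named vector
c(mean = mean_setosa, N = n_setosa, SE = se_setosa)
```

Repeating this for every species by hand would be tedious, which is why the grouped dplyr version is the more practical choice.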
Extra Note: In the code chunk below you will notice an operator that may be new to you (%>%). This is called a pipe. The pipe takes the output of one command and passes it as the first argument of the next command. It is a commonly used operator in dplyr and is great for chaining multiple actions on one dataframe.
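A small sketch of what the pipe does, assuming dplyr is installed: the two versions below perform the same steps, once as nested function calls and once as a pipeline. The object names nested and piped are only for illustration.

```r
library("dplyr")

# Without the pipe: calls are nested and have to be read from the inside out
nested <- summarise(group_by(iris, Species),
                    mean_Sepal.Width = mean(Sepal.Width))

# With the pipe: the same steps read from top to bottom
piped <- iris %>%
  group_by(Species) %>%
  summarise(mean_Sepal.Width = mean(Sepal.Width))

# Compare the two results as plain data frames
all.equal(as.data.frame(nested), as.data.frame(piped))
```

The piped version tends to be easier to read and extend as more steps are added.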
# Summary Stats
iris_summary <- iris %>% # Stores the result in a new dataframe called `iris_summary`
  group_by(Species) %>% # Groups the dataframe by species so each statistic is calculated per species
  summarise( # Collapses each group into one row of summary statistics
    mean_Sepal.Width = mean(Sepal.Width), # Creating column to store means
    N = n(), # Creating column to store sample size
    SE_Sepal.Width = sd(Sepal.Width) / sqrt(n()) # Creating column to store standard error
  )
View(iris_summary)
We will create a basic barplot using the summary statistics we just calculated. Below we specify the data we want to use (iris_summary) and the variables that define the aesthetics of the figure (Species and mean_Sepal.Width).
Sepal.Width_Plot <-
  ggplot(iris_summary, aes(x = Species,
                           y = mean_Sepal.Width)) + # Specifies the data we want plotted
  geom_col() + # Tells R what type of plot we want
  geom_errorbar(
    aes(
      ymin = mean_Sepal.Width - SE_Sepal.Width, # Lower end of the error bar: mean - SE
      ymax = mean_Sepal.Width + SE_Sepal.Width  # Upper end of the error bar: mean + SE
    ),
    width = 0.5 # Width of the error bar caps; a constant, so it goes outside aes()
  )
Sepal.Width_Plot
You can do a lot in ggplot2 to make figures look polished, far more than this tutorial can cover. For now, here is some code showing how I like to format my figures:
Sepal.Width_Plot +
  labs(
    y = "Mean Sepal Width",
    x = "Species") +
  theme_classic(base_size = 16) +
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, 5)) +
  theme(legend.position = "none") +
  theme(axis.ticks.x = element_blank())
ggsave("Sepal.Width_Plot.png") # Saves the most recently displayed plot; the file name is only an example
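If you want more control over the saved file, one possible approach is to store the formatted figure in an object and pass it to ggsave() explicitly. The sketch below rebuilds the summary and plot so it runs on its own. The object names final_plot and out_file, the file name, and the chosen size and resolution are all assumptions for illustration, not settings from this tutorial.

```r
library("dplyr")
library("ggplot2")

# Rebuild the summary table so this example is self-contained
iris_summary <- iris %>%
  group_by(Species) %>%
  summarise(
    mean_Sepal.Width = mean(Sepal.Width),
    SE_Sepal.Width = sd(Sepal.Width) / sqrt(n())
  )

# Store the formatted figure in an object (hypothetical name) instead of only printing it
final_plot <- ggplot(iris_summary, aes(x = Species, y = mean_Sepal.Width)) +
  geom_col() +
  geom_errorbar(
    aes(ymin = mean_Sepal.Width - SE_Sepal.Width,
        ymax = mean_Sepal.Width + SE_Sepal.Width),
    width = 0.5
  ) +
  labs(y = "Mean Sepal Width", x = "Species") +
  theme_classic(base_size = 16)

# Example output location: a temporary file, so nothing in your project is overwritten
out_file <- file.path(tempdir(), "sepal_width_barplot.png")

# Save the stored plot; size and resolution here are example values
ggsave(
  filename = out_file,
  plot = final_plot,
  width = 6, height = 4, units = "in",
  dpi = 300
)

# Confirm that the file was written
file.exists(out_file)
```

Passing the plot object explicitly avoids relying on whichever figure happened to be displayed last.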