I wanted to create an R script tutorial aimed at students who are just learning how to graph, perhaps after already learning basic graphing with lattice. For help in creating this work, I used Getting Started with R: An Introduction for Biologists (2nd ed.) from Oxford University Press.
First, let’s open up a dataset. I chose the iris dataset.
library(ggplot2)
data(iris)
View(iris)
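Before plotting, it can help to check what the dataset contains. The lines below are a minimal sketch using only base R functions; the object names dims and species_counts are just illustrative.

```r
# Load the built-in iris dataset
data(iris)

# Column names, column types, and the first few values of each column
str(iris)

# Number of rows and columns
dims <- dim(iris)
print(dims)

# How many flowers were measured for each species
species_counts <- table(iris$Species)
print(species_counts)
```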
SCATTER PLOTS
The first graph I made is an xy scatter plot of iris petal length against sepal length. The first line calls ggplot() with the dataset, followed by aes(), which assigns the x and y variables. Each additional line is an add-on layer, and a “+” at the end of the preceding line tells R that another layer follows. In this case, the add-on layer, geom_point(), puts individual points on the graph.
The aes() function is important because it maps variables in the dataset to features of the graph, such as the x and y positions, color, and shape. In these examples it sits in the first line, inside ggplot(), where it applies to every layer. It can also be placed inside an individual geom layer, where it applies to that layer only.
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length))+
geom_point()
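As a small illustration of where aes() can go, the sketch below draws the same scatter plot twice: once with the mapping given to ggplot(), and once with the mapping given to the geom_point() layer. The object names p_global and p_layer are made up for this example.

```r
library(ggplot2)

# Mapping given in ggplot(): inherited by every layer added afterwards
p_global <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point()

# Mapping given inside the layer: used by geom_point() only
p_layer <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Petal.Length))

# Printing either object draws the scatter plot
print(p_global)
print(p_layer)
```

With a single layer the two forms look the same. The difference only matters once a plot has several layers that need different mappings.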
To replace the default gray background, which many people want to do, add the layer: theme_bw()
You can also add x and y axis labels with units to your graph:
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length))+
geom_point()+
theme_bw()+
xlab("Sepal Length (cm)")+
ylab("Petal Length (cm)")
The iris dataset contains measurements from 3 different iris species. The graph we just drew does not distinguish those species, and we may wish to look for differences between them. Adding “color=Species” inside aes() in the first line colors the data points by species. One can also replace “color” with “shape” (lowercase, since R is case sensitive) to get a different point shape for each species.
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Species))+
geom_point()+
theme_bw()+
xlab("Sepal Length (cm)")+
ylab("Petal Length (cm)")
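The shape option mentioned above can be sketched as follows. Here Species is mapped to both color and shape, which is one possible choice rather than a requirement. The axis labels use centimeters, the unit given in the R documentation for iris, and p_shape is an illustrative object name.

```r
library(ggplot2)

# Map Species to both the color and the shape of the points
p_shape <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length,
                            color = Species, shape = Species)) +
  geom_point() +
  theme_bw() +
  xlab("Sepal Length (cm)") +
  ylab("Petal Length (cm)")

print(p_shape)
```

Using shape along with color keeps the species distinguishable when a graph is printed in black and white.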
BOX AND WHISKER PLOTS
Another graph type that we use a lot in the Introductory Biology labs is the box and whisker plot. These are also simple to make in ggplot2, using many of the same commands that we saw in the scatter plots. Here we graph the petal length of the 3 iris species. The geom_boxplot() layer designates the plot type.
ggplot(iris, aes(x=Species, y=Petal.Length))+
geom_boxplot()+
xlab("Iris species")+
ylab("Petal length (cm)")
To enhance this graph beyond the box and whiskers, one can also plot the individual data points. The geom_point() layer lets you set the size, the color, and the transparency (alpha) of the points. With a low alpha, places where many points overlap appear darker.
ggplot(iris, aes(x=Species, y=Petal.Length))+
geom_boxplot()+
geom_point(size=2, color='blue', alpha=0.1)+
xlab("Iris species")+
ylab("Petal length (cm)")
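Because many flowers share the same petal length, points drawn with geom_point() stack on top of one another. One possible variation, not part of the example above, swaps geom_point() for geom_jitter(), which moves each point sideways by a small random amount. The width and alpha values below are arbitrary choices, and p_jitter is an illustrative name.

```r
library(ggplot2)

p_jitter <- ggplot(iris, aes(x = Species, y = Petal.Length)) +
  geom_boxplot() +
  # width spreads points left and right; height = 0 leaves the
  # measured petal lengths unchanged
  geom_jitter(width = 0.15, height = 0,
              size = 2, color = "blue", alpha = 0.4) +
  xlab("Iris species") +
  ylab("Petal length (cm)")

print(p_jitter)
```

The sideways positions are random, so the points land in slightly different places each time the plot is drawn.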
HISTOGRAMS
Lastly, I used ggplot2 to learn how to make histograms. Unlike the 2 previous plot types, a histogram takes only a single variable. ggplot2 itself sorts the values into bins and counts the observations in each bin to get the frequencies.
ggplot(iris, aes(x=Sepal.Length))+
geom_histogram()+
xlab("Sepal Length (cm)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This creates an unattractive histogram, and, without prompting, ggplot2 prints a message advising you to pick a better bin width. You can do this by adding an argument inside the parentheses of geom_histogram(): either “bins”, the number of bins, as used here, or “binwidth”, the width of each bin in the units of the x axis.
ggplot(iris, aes(x=Sepal.Length))+
geom_histogram(bins=10)+
xlab("Sepal Length (cm)")
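The message from ggplot2 mentions binwidth rather than bins, so here is a sketch of that alternative. The value 0.25 is an arbitrary choice for illustration, and p_hist is a made-up object name.

```r
library(ggplot2)

# binwidth is given in the units of the x variable, so each bar
# here spans 0.25 on the sepal length axis
p_hist <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.25) +
  xlab("Sepal Length (cm)")

print(p_hist)
```

Setting bins fixes how many bars there are, while setting binwidth fixes how wide each bar is. Either one stops the message from appearing.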
This histogram lumps together the sepals of all three species. We can also draw 3 separate distributions of sepal length, one per species. Each of these panels is called a “facet” in ggplot2.
ggplot(iris, aes(x=Sepal.Length))+
geom_histogram(bins=10)+
xlab("Sepal Length (cm)")+
facet_wrap(~Species)
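As a small variation, offered only as a sketch, facet_wrap() accepts an ncol argument. Setting ncol = 1 stacks the three panels vertically, which lines the distributions up on a shared x axis. The name p_facet is illustrative.

```r
library(ggplot2)

p_facet <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 10) +
  xlab("Sepal Length (cm)") +
  # One column of panels, so the three species sit above one another
  facet_wrap(~Species, ncol = 1)

print(p_facet)
```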
SAVING PLOTS
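ggplot2 also has a handy function, ggsave(), for saving plots to a file. By default it saves the most recently displayed plot to the current working directory, which is typically the project folder if you are working within an RStudio project. You give it a file name that you create, and the extension in that name (e.g. .png) sets the file type.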
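A minimal sketch of saving a plot is shown below. The file name iris_scatter.png, the object names, and the size and resolution values are all assumptions chosen for illustration. The file is written to R's temporary folder so that the example does not clutter your project. A bare file name such as "iris_scatter.png" would save to the working directory instead.

```r
library(ggplot2)

# Store the plot in an object so it can be passed to ggsave() by name
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  theme_bw()

# Hypothetical output file; the .png extension selects the file type
out_file <- file.path(tempdir(), "iris_scatter.png")

# width and height are in inches here; dpi sets the resolution
ggsave(filename = out_file, plot = p,
       width = 6, height = 4, units = "in", dpi = 300)

# Confirm that the file was written
print(file.exists(out_file))
```

Passing the plot explicitly with the plot argument avoids relying on whichever graph happened to be displayed last.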
FUTURE DIRECTIONS
These three plot types are the ones that we generally stress in the Introductory Biology Laboratory. Next in my own learning of ggplot2, I plan to work out how to customize plots and how to coordinate plots with the basic statistics that we teach.