Plots in R

R can be used to make very pretty, easily interpretable graphs - but I’m not going to lie to you, it takes some real effort. So why is it worth it? Because with R, you have the flexibility to create the graphs you want, especially with the package ggplot2. Once your plot is created, R makes it very convenient to tweak later when you’re given feedback.

I’m going to present the plots that I most commonly create in R. I use both base R and the package ggplot2 to create plots; I will try to be explicit on when I’m using ggplot2 or not. If I don’t present a graph that you need, google it and there is likely a tutorial there for you. Check out this ggplot2 tutorial for more information.

Learning the basics of R plots making a boxplot

A boxplot shows the median and quantiles of a continuous variable among groups. It is well-suited for visualizing any nonparametric analysis like Kruskal-Wallis or Wilcoxon test. Showing means (like in a standard barplot) is actually inappropriate for any nonparametric analysis.

We will once again use the iris dataset. This boxplot will display the sepal lengths among the iris species.

Basic boxplot

Calling a basic boxplot in R is actually very easy and does not require any extra packages. The response variable goes before the squiggly, the predictor variable goes after. Our data is the iris dataset.

boxplot(Sepal.Length~Species,data=iris)

So this boxplot looks good, but VERY basic. Let’s spruce it up in a few useful ways.

Add x and y labels using the functions xlab() and ylab(). You can also add a main title if you wish.

boxplot(Sepal.Length~Species,data=iris,ylab="Sepal Length (mm)",xlab="Species",main="")

Point the ticks inward and no ticks on x axis. Highlight all 3 lines at once and run it together.

boxplot(Sepal.Length~Species,data=iris,ylab="Sepal Length (mm)",xlab="Species",main="",yaxt="n",xaxt="n")
axis(side=1,tcl=0,at=c(1,2,3),labels=c("I. setosa","I. versicolor","I. virginica"))
axis(side=2,tcl=.5,at=NULL,labels=TRUE)

The plot above is how many folks prefer to present graphs. But although this code appears simple here, adapting it to your own data will take a deeper understanding of what is going on. Let’s break it down.

Looking at the boxplot() line, we know what the first half is doing from what we did above. yaxt=“n” tells R to ‘remove’ the y-axis for the moment. xaxt=“n” does the same thing but for the x-axis. This gets the plot prepared for the next 2 lines of code.

Now we tell R what we want to put back on those axes using the axis() function. Within axis(), we first see side.
Side 1 = bottom
Side 2 = left
Side 3 = top
Side 4 = right

Then we see tcl, the tick length function. By setting it to 0, there are no ticks on the x axis.

Using the at function, we tell R where to put the labels along the x-axis. It so happens that with categorical variables, the first box is centered at 1, the second at 2, etc. So we tell R that the labels go at positions 1, 2, and 3.

Then the labels function puts the labels at the specified positions. They need to be in the same order as you want them on the graph, left to right.

Looking at the second axis(), it is a little different than the first. We see side=2, meaning that it refers to the left axis. We see the tcl (tick length) is set to 0.5. Positive values extend into the plot, negative values extend out of the plot. at= is set to NULL, this way we are not specifying the locations of the ticks and labels, we are letting R choose them (often a good choice for numeric axes). Finally labels= is set to TRUE, meaning R will provide the tick labels, in this case the sepal lengths in mm.

Simple Scatterplot with Trend Line

Making a scatterplot with a trendline is rather simple, especially compared with the boxplot above. We produce a quick graph here with sepal length on the y-axis and sepal width on the x-axis.

plot(iris$Sepal.Width,iris$Sepal.Length, main="", xlab="Sepal Width (mm)",ylab="Sepal Length (mm)")
abline(lm(iris$Sepal.Length~iris$Sepal.Width), col="red") # regression line (y~x)

#if you are getting a warning similar to "new plot has not been called", try highlighting both lines (plot and abline) and run both lines at the same time))

This is a very simple, kind of ugly scatterplot. Let’s use the package ggplot2 to class it up.

Note: Also please see the scatterplot matrix code in the Multiple Linear Regression tutorial.

Classed-up Scatterplot using ggplot2

A quick ggplot2 intro

This package is coded a little differently than basic R, so there is a bit of learning to do here. Overall though the code is intuitive. The code below produces a scatterplot using ggplot2.

First in ggplot2, you create the ‘base’ of the plot. We do this with the ggplot() function. Within that function, we have several arguments, the first is our data (iris), then aes. Within aes you can set the x data, y data, and grouping data (using the colour function).

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4
baseGraph<-ggplot(iris, aes(x=Sepal.Width, y=Sepal.Length,colour=Species))
print(baseGraph)

And nothing shows up in the plot! Pretty anticlimatic. The code above makes the base of the plot, but doesn’t put the points on. In ggplot2, you add components manually, which gives you a ton of control. Below, we add the points using the geom_point function. So run the code below and take a gander.

basicScatter<-ggplot(iris, aes(x=Sepal.Width, y=Sepal.Length,colour=Species)) + geom_point()
print(basicScatter)

We are seeing a classier graph coming together. It shows the 3 species as 3 colors, useful for interpretation. Note: if you don’t want them to be separated by color, just delete the colour argument. In the graph above has some downsides though - the labels are meh, the background is weird, the ticks point out (lots of people want them pointed in), and the colors are odd. And that legend out on the side is off-putting.

Add the labels

In the code below, we add better labels using the labs function where we can define them. Note: above, we created an object basicScatter which is the basic scatterplot. Then we can add components to it, like we add labels below.

basicScatter+labs(title="", x="Sepal Width (mm)", y = "Sepal Length (mm)")

Blank background and place legend within plot

Using the function theme_classic we make the plot a more traditional black and white graph. Then we set the legend position using legend.position within the theme function. The legend position is set by x and y coordinates. These coordinates are proportions of the width and height of the graphs, so our code below puts the legend 85% over to the right and 70% up, if that makes sense. Play around with the coordinates to move the legend around.

Note: once again we made a new graph object, this time called halfwayDone. See how everything in basicScatter, the labels, new theme, and legend position are saved to this? This makes the code easier for our next steps: we just print halfwayDone to see it.

halfwayDone<-basicScatter+labs(title="", x="Sepal Width (mm)", y = "Sepal Length (mm)") + theme_classic() + theme(legend.position = c(.85,.7))
print(halfwayDone)

Ticks Pointing In

It should be simple to make the ticks point in, right? Think again - this is some complicated stuff.

First we call the graph object, halfwayDone, then we change the axis tick length using axis.ticks.length, in this case 0.1 cm inward. Doing this alone messes up the axes though, so we need to use axis.text.x and axis.text.y to fix them. I honestly don’t understand this code that well, so I won’t try to explain it. It appears to work for all examples I’ve tried.

Next we change the text color to black using element_text,axis.text.x, and axis.text.y with the code below. If you prefer a different color, look up R colors here.

almostDone <-
  halfwayDone + theme(
  axis.ticks.length = unit(-0.1, "cm"),
  axis.text.x = element_text(margin = margin(10, 5, 5, 5, "pt")),
  axis.text.y = element_text(margin = margin(5, 10, 10, 5, "pt"))
  ) + theme(axis.text.x = element_text(colour = "black"),
  axis.text.y = element_text(colour = "black"))
print(almostDone)

Change Color of Points

Below we change the points to gray scale using the scale_color_manual function. Again, check out the R colors here.

almostDone + scale_color_manual(values=c("gray10", "gray35", "gray70"))

Adding a trendline

The geom_smooth function adds trendlines, in this case a straight line using the method lm - linear model. If you want confidence regions, delete se=FALSE.

completeScatter<-almostDone + scale_color_manual(values=c("gray10", "gray40", "gray70")) + geom_smooth(method = "lm",se=FALSE)
print(completeScatter)

We did it! A decent looking scatterplot ready for publication. If you want more on ggplot2 check it out at this link.

Want only one trendline for the scatterplot? Delete the species grouping variable from your plot formula.

For more customization of your colors, symbols, etc see this link.

Publication-ready scatterplot: all the code

ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, colour = Species)) + geom_point() +
  labs(title = "", x = "Sepal Width (mm)", y = "Sepal Length (mm)") + theme_classic() + theme(legend.position = c(.85, .7)) + theme(
  axis.ticks.length = unit(-0.1, "cm"),
  axis.text.x = element_text(margin = margin(10, 5, 5, 5, "pt")),
  axis.text.y = element_text(margin = margin(5, 10, 10, 5, "pt"))
  ) + theme(axis.text.x = element_text(colour = "black"),
  axis.text.y = element_text(colour = "black")) + scale_color_manual(values =
  c("gray10", "gray40", "gray70")) + geom_smooth(method = "lm", se = FALSE)

Barplot - means with SE or SD error bars

One of the most commonly performed tests in ecology is ANOVA. This is often visualized using a barplot - the bars showing means and the error bars either showing SE or SD.

This is surprisingly not an easily made graph in R, though. Below is the best, most flexible way that I have found. It uses two packages that I have presented in other tutorials: plyr and ggplot2.

Check out the linear regression example above for more detailed information about graphing with ggplot2.

Create summary to prep data for graph

First we will use the function ddply in the package plyr to get the mean, SD, and SE sepal length for each species. Once we have this summary, then we can make our barplot.

Let’s chat about the code below, so that you fully understand it and then can change it as you need. The function ddply creates a new dataframe composed of the new variables we create. In this case, the variables we create are N (number of observations), mean (mean Sepal Length for each group), min (minimum Sepal Length for each group), max (maximum Sepal Length for each group), var (variance Sepal Length for each group), sd (standard deviation Sepal Length for each group), and se (standard error Sepal Length for each group).

The group that we are summarizing by is species - you can summarize by multiple groups also (see the Data Management tutorial).

library (plyr)
## Warning: package 'plyr' was built under R version 3.4.4
summbySpp<- ddply(iris, c("Species"), summarise, N    = length(Sepal.Length),
               mean = mean(Sepal.Length), min=min(Sepal.Length),max=max(Sepal.Length), var=var(Sepal.Length),
               sd   = sd(Sepal.Length), se   = sd / sqrt(N))
summbySpp
##      Species  N  mean min max       var        sd         se
## 1     setosa 50 5.006 4.3 5.8 0.1242490 0.3524897 0.04984957
## 2 versicolor 50 5.936 4.9 7.0 0.2664327 0.5161711 0.07299762
## 3  virginica 50 6.588 4.9 7.9 0.4043429 0.6358796 0.08992695

For each species, we have produced a bunch of summary stats. Now we can use them to make our graph!

Create graph from summary stats

We call the package ggplot2 in order to make this graph. First, let’s make our basic ggplot. We specify that data is summbySpp and then use the function aes. In this function, we tell R that we want the x-axis to be Species and the y-axis to be mean (of Sepal Length). Then we add the bars using geom_bar. In geom_bar, we specify a few things that allow us then to add error bars using geom_errorbar. Within geom_errorbar, we use ymin to specify that the bottom of the error bar should be (mean - standard error) and ymax to make the top (mean + standard error). You can change the width of the error bars - the range of width is 0 to 1 as a proportion of the whole bar.

#Makes basic plot
library(ggplot2)
p<- ggplot(summbySpp, aes(x=Species, y=mean)) + 
  geom_bar(stat="identity", color="gray83", 
           position=position_dodge()) +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2,
                position=position_dodge(.9)) 
p

Create finished barplot

Using the function labs we make the title blank, the x label Species, and the y label Sepal Length (mm). The function theme_classic makes the background white and the axes black.

# Fix labels, make plot black and white, set bar color
p+labs(title="", x="Species", y = "Sepal Length (mm)")+
  theme_classic()

Publication ready (or almost so) barplot - all of the code

#data summary
library (plyr)
summbySpp<- ddply(iris, c("Species"), summarise, N    = length(Sepal.Length),
               mean = mean(Sepal.Length), min=min(Sepal.Length),max=max(Sepal.Length), var=var(Sepal.Length),
               sd   = sd(Sepal.Length), se   = sd / sqrt(N))
summbySpp

#Makes basic plot
library(ggplot2)
p<- ggplot(summbySpp, aes(x=Species, y=mean)) + 
  geom_bar(stat="identity", color="gray83", 
           position=position_dodge()) +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2,
                position=position_dodge(.9)) 
 
# Fix labels, make plot black and white, set bar color
p+labs(title="", x="Species", y = "Sepal Length (mm)")+theme_classic()

Interaction plot

Interaction plots are a useful way to investigate 2-way ANOVA relationships. They are rarely used in publications, but I find them delightful.

In order to do this tutorial, we need to make another categorical variable in the iris dataset. The next code makes the community variable in the iris dataset. This isn’t something you will need to do with your own data, it is just needed for this example.

v<-c("high","middle","low")
iris$Community<-v

The function interaction.plot makes an interaction plot, as its name suggests. The first argument tells R the grouping variable that we want on the x-axis. The second tells R which we want to be the “trace” factor. The 3rd is the variable that we want shown for each group. The fourth function, fun, is the function to compute the summary. Mean is the usual, but other options are possible.

interaction.plot(iris$Species,iris$Community,iris$Sepal.Length,fun=mean, xlab="Species",ylab="Sepal Length (mm)")

The resulting graph shows us that for Iris setosa, all of the communities have fairly similar sepal lengths. In Iris versicolor the community high appears to have lower sepal lengths than the other communities. With Iris virginica, all communities diverge somewhat.

Obviously in this example, community is a made-up variable so we aren’t seeing large differences. In real data, look for where lines cross among groups - a crossed line suggests an interaction of your two variables.