Some exercises adapted from those found at http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html
Figures are important communication tools. Not only do they communicate your data, but sometimes they communicate more about you as a researcher than you may think.
Imagine that you are handed a figure that was obviously created in Excel, along with results that indicate statistics were done on the dataset. Excel is not a statistics software package - how careful could the researchers have been? How much control did they have over their statistical analyses? Can you trust their analysis to be correct?
Creating a figure that is uniquely your own - and looks nice - gives you an advantage.
We’ve seen plots from ggplot2 several times during this workshop. It is a plotting package that has some really nice features. That said, it has limitations, and there are lots of options out there for plotting in R.
Advantages of ggplot2:
Some limitations of ggplot2:
ggplot2 breaks down a graphic, or figure, into building blocks. If you understand these building blocks - or syntax - you can build nearly any graphic you like. The building blocks are:
The syntax of ggplot2 can take some getting used to. But, once you’ve figured it out, it becomes remarkably easy to control even the most minute details of your figures.
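As a rough illustration of how these pieces fit together, the sketch below builds one plot step by step from the built-in mtcars data. The object names (p_base, p_points, p_final) are made up for this example, and the choice of scale and theme is arbitrary.

```r
library(ggplot2)

# Data and aesthetic mapping: which columns go on which axes
p_base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Geometry: draw each row of the data as a point
p_points <- p_base + geom_point()

# Scale and theme: relabel the x axis and switch to a canned theme
p_final <- p_points +
  scale_x_continuous(name = "Weight (1000 lbs)") +
  theme_bw()

p_final
```

Each `+` adds one block to the plot, so you can stop at any step and look at what you have so far.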
By the end of this exercise, everyone should be able to recreate this figure using ggplot. The data is from the mtcars dataset, baked into R.
For something a bit more advanced, try to recreate this figure from The Economist:
From The Economist
Below is my attempt:
Note: The author of The Economist figure has done some extra data manipulations, and we don’t know their statistical model, so you won’t get it exactly. I’ve labeled a random subsample of points. More resources for this challenge are located throughout, and towards the end of, this RMarkdown document. Data available here.
It’s important to know that the base graphics functions are always available to you - but they are often not as customizable as ggplot2. Let’s look at some mtcars data using base plotting functions.
hist(mtcars$mpg)
Now for the ggplot2 histogram:
library(ggplot2)
ggplot(mtcars, aes(x = mpg)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The base graphics version looks a bit better, doesn’t it?
We can try to make our ggplot look a bit better by changing the binwidth.
library(ggplot2)
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(binwidth = 5)
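Part of the reason the two histograms differ is that hist() and geom_histogram() choose their bins differently. As a sketch, one way to make them directly comparable is to reuse the break points that hist() computes. The names base_hist and p_hist are just for illustration, and the fill and outline colours are arbitrary.

```r
library(ggplot2)

# Compute the base histogram without drawing it, to get its break points
base_hist <- hist(mtcars$mpg, plot = FALSE)
base_hist$breaks

# Hand the same breaks to ggplot2 so both plots use identical bins
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(breaks = base_hist$breaks, colour = "black", fill = "grey")

p_hist
```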
Let’s look at a more complex example. Let’s plot mpg by weight (wt) by transmission type using base and then ggplot:
plot(mpg ~ wt,
data = subset(mtcars, am == 1)) # Plot mpg by weight, use a subset of the data, where transmission == "manual" (==1)
points(mpg ~ wt,
data = subset(mtcars, am == 0), col = "red") # Ditto, but for automatic transmissions
legend(3,34, # Where to put the legend
c("Manual", "Automatic"), # What to call the data keys
col = c("black", "red"), # What to colour them
pch = (c(1,1))) # What symbol to use
Some of the data is cut off. Why do you think?
plot(mpg ~ wt,
data = subset(mtcars, am == 1),
xlim = c(1,4.5),
ylim = c(10,36)) # Set the x and y axis limits
points(mpg ~ wt,
data = subset(mtcars, am == 0),
col = "red")
legend(3.5,34,
c("Manual", "Automatic"),
col = c("black", "red"),
pch = (c(1,1)))
Or:
plot(mpg ~ wt,
data = mtcars,
type = "n") # Set up a plot using ALL the data, but don't actually plot anything (type = "none")
points(mpg ~ wt,
data = subset(mtcars, am == 1))
points(mpg ~ wt,
data = subset(mtcars, am == 0),
col = "red")
legend(4.2,34,
c("Manual", "Automatic"),
col = c("black", "red"),
pch = (c(1,1)))
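To see why the first attempt cut off some points, it may help to compare the weight ranges of the two groups: the axis limits were fixed by the first plot() call, which only saw the manual cars. The sketch below checks the ranges and then shows one more way to draw the figure, in a single plot() call with a colour chosen per row. The names wt_ranges and point_cols are made up for this example.

```r
# Weight range for each transmission type (am: 0 = automatic, 1 = manual)
wt_ranges <- tapply(mtcars$wt, mtcars$am, range)
wt_ranges

# One call: axis limits come from ALL the data, colour is picked row by row
point_cols <- ifelse(mtcars$am == 1, "black", "red")
plot(mpg ~ wt, data = mtcars, col = point_cols)
legend("topright",
       legend = c("Manual", "Automatic"),
       col = c("black", "red"),
       pch = 1)
```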
Now the ggplot2:
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg,
colour = as.factor(am))) + # Set the aesthetic mapping so that x = wt, y = mpg, and colour the points by am
geom_point() # Make it a scatter plot
We need to do some fine-tuning, but I’d argue that the ggplot2 syntax is easier to use. In fact, we can quickly switch between plot types:
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg,
colour = as.factor(am))) + # Set the aesthetic mapping so that x = wt, y = mpg, and colour the points by am
geom_smooth() # Make it a loess fit
## `geom_smooth()` using method = 'loess'
Or
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg,
colour = as.factor(am))) + # Set the aesthetic mapping so that x = wt, y = mpg, and colour the points by am
geom_density2d() # Make it a 2D density plot
Just as examples…
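One piece of fine-tuning for the plots above is the legend, which shows the raw codes 0 and 1. A possible fix, sketched here, is to turn am into a labelled factor before plotting. The data frame cars and the column transmission are names invented for this example.

```r
library(ggplot2)

cars <- mtcars
# In mtcars, am is coded 0 = automatic, 1 = manual
cars$transmission <- factor(cars$am,
                            levels = c(0, 1),
                            labels = c("Automatic", "Manual"))

p_trans <- ggplot(cars, aes(x = wt, y = mpg, colour = transmission)) +
  geom_point()

p_trans
```

The legend title and keys now come from the column name and the factor labels, so no extra legend code is needed.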
What is ggplot2 all about?
Included in the aesthetic mapping are all the things you can see, and that change according to your variables. These might include:
These are usually set within the aes() function.
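A small sketch of the difference between mapping and setting: anything that should change with a variable goes inside aes(), while a fixed value for every point goes outside it. The object names p_mapped and p_set are just for illustration.

```r
library(ggplot2)

# Mapped: colour depends on a variable, so it goes inside aes()
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = hp))

# Set: colour is the same fixed value for every point, so it goes outside aes()
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "red")

p_mapped
p_set
```

Only the mapped version gets a legend, because only there does colour carry information about the data.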
These are the types of geometries, or plots, that we want to make out of our data. There are many, but some are:
geom_point for scatter and dot plots
geom_line for line plots, and also for functions or regressions
geom_bar for bar graphs
geom_boxplot for boxplots
geom_violin for violin plots (these are fancy boxplots!)
geom_smooth for complex trends and confidence intervals
Let’s start with the plot we started above, and change the colours. (Note: ggplot2 accepts both spellings, colour and color; its author is from New Zealand, hence “colours” here.)
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg,
colour = as.factor(am))) + # Set the aesthetic mapping so that x = wt, y = mpg, and colour the points by am
geom_point() +
scale_colour_manual(values = c("black","red")) # Notice that colour order should be the same as the order of your factors
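If keeping the colours in the same order as the factor levels feels fragile, one option is to name the values. In this sketch the names are the factor levels of as.factor(am), so the order in which the colours are written no longer matters. The object name p_named is made up for the example.

```r
library(ggplot2)

p_named <- ggplot(mtcars,
                  aes(x = wt,
                      y = mpg,
                      colour = as.factor(am))) +
  geom_point() +
  # Names match the factor levels: "0" = automatic, "1" = manual
  scale_colour_manual(values = c("1" = "red", "0" = "black"))

p_named
```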
Maybe we don’t want to colour the points by transmission type at all. Maybe instead we’re interested in the horsepower of the car.
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp))
You see that since hp is a continuous variable, the points are now coloured using a gradient. We can change that gradient.
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp)) +
scale_colour_gradient(low = "yellow", high = "red")
ggplot2 also has some built-in gradients to use.
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp)) +
scale_colour_distiller(palette = "Spectral")
While we’re at it, we can change the legend title.
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp)) +
scale_colour_distiller(palette = "Spectral", name = "Horsepower")
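Another way to set the legend title, sketched here, is labs(), which takes the aesthetic name as its argument. This keeps titles in one place when a plot has several legends. The object name p_labs is just for illustration.

```r
library(ggplot2)

p_labs <- ggplot(mtcars,
                 aes(x = wt,
                     y = mpg)) +
  geom_point(aes(colour = hp)) +
  scale_colour_distiller(palette = "Spectral") +
  labs(colour = "Horsepower")  # title for the colour legend

p_labs
```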
You can add as many geoms to a plot as you like. For example, maybe we’d like to add a regression line to the current plot we’re working with.
First, let’s see what happens when we just add a geom_line.
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp)) +
scale_colour_distiller(palette = "Spectral", "Horsepower") +
geom_line()
geom_line simply connects the points in order of their x values. That doesn’t really serve our current purpose, but it would be useful, for example, in a time-series.
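For a quick look at the time-series case, here is a sketch using the economics data set that ships with ggplot2 (monthly US economic data). The choice of the unemploy column is arbitrary, and p_ts is a name made up for this example.

```r
library(ggplot2)

# One observation per date, so connecting the points in x order makes sense
p_ts <- ggplot(economics,
               aes(x = date,
                   y = unemploy)) +
  geom_line()

p_ts
```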
What about adding that regression line? Well, we could do the regression, and add the data to the graph:
mydata <- mtcars
mydata$pred <- predict(lm(mpg~wt,mtcars)) # Do the regression and save the predictions
ggplot(mydata, # Make a ggplot using "mydata"
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp)) +
scale_colour_distiller(palette = "Spectral", "Horsepower") +
geom_line(aes(y = pred)) # Plot the predictions as a line - the results of the linear regression.
Or, we could use geom_smooth and do the regression within the ggplot…
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp)) +
scale_colour_distiller(palette = "Spectral", "Horsepower") +
geom_smooth(method = "lm")
Take away the 95% confidence interval and change the regression line’s colour…
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp)) +
scale_colour_distiller(palette = "Spectral", "Horsepower") +
geom_smooth(method = "lm", se = FALSE, colour = "black")
Voila! The same plot.
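To check that claim numerically rather than by eye, the sketch below pulls the values that geom_smooth computed out of the plot and compares them with predictions from an explicit lm() at the same x positions. The names p_lm, smooth_df, fit and manual_pred are invented for this example.

```r
library(ggplot2)

p_lm <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black")

# Values ggplot2 computed for the second layer (the smooth)
smooth_df <- layer_data(p_lm, 2)

# Predictions from an explicit regression at the same x positions
fit <- lm(mpg ~ wt, data = mtcars)
manual_pred <- unname(predict(fit, newdata = data.frame(wt = smooth_df$x)))

# Compare the two sets of fitted values
all.equal(smooth_df$y, manual_pred)
```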
You can scale all sorts of things - like the size or alpha (transparency) of points…
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp, size = qsec), alpha = 0.8) + # qsec = 1/4 mile time;
scale_colour_distiller(palette = "Spectral", "Horsepower") +
geom_smooth(method = "lm", se = FALSE, colour = "black")
Can you add a scale for alpha (transparency) by rear axle ratio (drat)?
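If you get stuck, here is one possible approach, not the only one. The fixed alpha = 0.8 has to go, because a value set outside aes() overrides a mapping. The range passed to scale_alpha_continuous and the legend title are arbitrary choices, and p_alpha is a made-up name.

```r
library(ggplot2)

p_alpha <- ggplot(mtcars,
                  aes(x = wt,
                      y = mpg)) +
  # alpha is now mapped to drat inside aes(), not fixed at 0.8
  geom_point(aes(colour = hp, size = qsec, alpha = drat)) +
  scale_colour_distiller(palette = "Spectral", name = "Horsepower") +
  scale_alpha_continuous(range = c(0.3, 1), name = "Rear axle ratio") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black")

p_alpha
```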
Within a ggplot, you can control virtually any part of your plot with options from the theme() command. You can get very specific, and some folks have written canned “themes” that you can use. Here are some examples:
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp, size = qsec), alpha = 0.8) + # qsec = 1/4 mile time;
scale_colour_distiller(palette = "Spectral", "Horsepower") +
geom_smooth(method = "lm", se = FALSE, colour = "black") +
theme_bw() # <------- this is canned "black and white" theme
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp, size = qsec), alpha = 0.8) + # qsec = 1/4 mile time;
scale_colour_distiller(palette = "Spectral", "Horsepower") +
geom_smooth(method = "lm", se = FALSE, colour = "black") +
theme_linedraw() # <------- another canned theme
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp, size = qsec), alpha = 0.8) + # qsec = 1/4 mile time;
scale_colour_distiller(palette = "Spectral", "Horsepower") +
geom_smooth(method = "lm", se = FALSE, colour = "black") +
theme_light() # <------- another canned theme
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp, size = qsec), alpha = 0.8) + # qsec = 1/4 mile time;
scale_colour_distiller(palette = "Spectral", "Horsepower") +
geom_smooth(method = "lm", se = FALSE, colour = "black") +
theme_classic() # <------- another canned theme
ggplot(mtcars, # Make a ggplot using mtcars data
aes(x = wt,
y = mpg)) +
geom_point(aes(colour = hp, size = qsec), alpha = 0.8) + # qsec = 1/4 mile time;
scale_colour_distiller(palette = "Spectral", "Horsepower") +
geom_smooth(method = "lm", se = FALSE, colour = "black") +
theme_minimal() # <------- another canned theme
The important thing to understand about these themes is that you could build them on your own. Check out this and this for some additional resources, but we will be breaking some of this down below.
Let’s begin with the plot below, which we’ve been working towards today. We’ve plotted mpg by wt, grouped the results by the number of cylinders (cyl), and scaled the points by horsepower (hp).
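As a first taste of building your own, a canned theme can serve as a starting point that you then override piece by piece. This is only a sketch: theme_mine and p_themed are names made up here, and the particular overrides are arbitrary.

```r
library(ggplot2)

# Start from a canned theme and change only the pieces you care about
theme_mine <- theme_bw() +
  theme(panel.grid = element_blank(),          # no grid lines at all
        axis.title = element_text(size = 15))  # larger axis titles

p_themed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_mine

p_themed
```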
ggplot(aes(x = wt, y = mpg, colour = as.factor(cyl), fill = as.factor(cyl)), data = mtcars) +
geom_point(aes(size = hp), shape = 21) +
geom_smooth(aes(linetype = as.factor(cyl)), fill = "grey", method = "lm", formula = y~x, alpha = 0.2)
The first thing I’d like to do is change the colours that I’m using, and fix my legend titles.
ggplot(aes(x = wt, y = mpg, colour = as.factor(cyl), fill = as.factor(cyl)), data = mtcars) +
geom_point(aes(size = hp), shape = 21) +
geom_smooth(aes(linetype = as.factor(cyl)), fill = "grey", method = "lm", formula = y~x, alpha = 0.2) +
scale_colour_manual(values = c("black", "grey", "black"), name = "Number of \ncylinders") + # Here you can specify EDGE colour, and the title of the legend.
scale_fill_manual(values = c("black", "grey", NA), name = "Number of \ncylinders") # Ditto for point fill. Notice that I've used NO fill (NA) for 8 cylinder cars: the values follow the order of the factor levels (4, 6, 8). Also, setting the name of the legend to be identical to the one above COMBINES them. What happens if you use a different name?
Next, I’d like to specify what kind of line to use for each regression, and control the range of the point size:
ggplot(aes(x = wt, y = mpg, colour = as.factor(cyl), fill = as.factor(cyl)), data = mtcars) +
geom_point(aes(size = hp), shape = 21) +
geom_smooth(aes(linetype = as.factor(cyl)), fill = "grey", method = "lm", formula = y~x, alpha = 0.2) +
scale_colour_manual(values = c("black", "grey", "black"), name = "Number of \ncylinders") +
scale_fill_manual(values = c("black", "grey", NA), name = "Number of \ncylinders") +
scale_linetype_manual(values = c("solid", "solid", "twodash"), name = "Number of \ncylinders") + # I'm doing the same again for linetype.
scale_size_continuous(breaks = c(100,200,300), range = c(.5,8),name = "Horsepower") # and for the size of the points.
Add x and y labels.
ggplot(aes(x = wt, y = mpg, colour = as.factor(cyl), fill = as.factor(cyl)), data = mtcars) +
geom_point(aes(size = hp), shape = 21) +
geom_smooth(aes(linetype = as.factor(cyl)), fill = "grey", method = "lm", formula = y~x, alpha = 0.2) +
scale_colour_manual(values = c("black", "grey", "black"), name = "Number of \ncylinders") +
scale_fill_manual(values = c("black", "grey", NA), name = "Number of \ncylinders") +
scale_linetype_manual(values = c("solid", "solid", "twodash"), name = "Number of \ncylinders") + # I'm doing the same again for linetype.
scale_size_continuous(breaks = c(100,200,300), range = c(.5,8),name = "Horsepower") + # and for the size of the points.
xlab("Weight (1000 lbs)") +
ylab("MPG")
Here is where we will start manipulating the theme(). The first thing I’d like to do is get rid of those minor and major gridlines.
ggplot(aes(x = wt, y = mpg, colour = as.factor(cyl), fill = as.factor(cyl)), data = mtcars) +
geom_point(aes(size = hp), shape = 21) +
geom_smooth(aes(linetype = as.factor(cyl)), fill = "grey", method = "lm", formula = y~x, alpha = 0.2) +
scale_colour_manual(values = c("black", "grey", "black"), name = "Number of \ncylinders") +
scale_fill_manual(values = c("black", "grey", NA), name = "Number of \ncylinders") +
scale_linetype_manual(values = c("solid", "solid", "twodash"), name = "Number of \ncylinders") + # I'm doing the same again for linetype.
scale_size_continuous(breaks = c(100,200,300), range = c(.5,8),name = "Horsepower") + # and for the size of the points.
xlab("Weight (1000 lbs)") +
ylab("MPG") +
theme(panel.grid.major = element_blank(), # Gets rid of major grid lines
panel.grid.minor = element_blank()) # Gets rid of minor grid lines
Next, let’s get rid of the background entirely. Or, whatever, try changing its colour…
ggplot(aes(x = wt, y = mpg, colour = as.factor(cyl), fill = as.factor(cyl)), data = mtcars) +
geom_point(aes(size = hp), shape = 21) +
geom_smooth(aes(linetype = as.factor(cyl)), fill = "grey", method = "lm", formula = y~x, alpha = 0.2) +
scale_colour_manual(values = c("black", "grey", "black"), name = "Number of \ncylinders") +
scale_fill_manual(values = c("black", "grey", NA), name = "Number of \ncylinders") +
scale_linetype_manual(values = c("solid", "solid", "twodash"), name = "Number of \ncylinders") + # I'm doing the same again for linetype.
scale_size_continuous(breaks = c(100,200,300), range = c(.5,8),name = "Horsepower") + # and for the size of the points.
xlab("Weight (1000 lbs)") +
ylab("MPG") +
theme(panel.grid.major = element_blank(), # Gets rid of major grid lines
panel.grid.minor = element_blank(), # Gets rid of minor grid lines
panel.background = element_blank()) # Use element_rect() to alter the colour of the background.
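For the second option, changing the background colour instead of removing it, a minimal sketch with element_rect() might look like this. The plot is simplified to keep the example short, "lightyellow" is an arbitrary colour, and p_bg is a made-up name.

```r
library(ggplot2)

p_bg <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        # fill = panel colour; colour = outline of the panel (NA means none)
        panel.background = element_rect(fill = "lightyellow", colour = NA))

p_bg
```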
Let’s move that legend. I’d prefer it to be inside the plot…
ggplot(aes(x = wt, y = mpg, colour = as.factor(cyl), fill = as.factor(cyl)), data = mtcars) +
geom_point(aes(size = hp), shape = 21) +
geom_smooth(aes(linetype = as.factor(cyl)), fill = "grey", method = "lm", formula = y~x, alpha = 0.2) +
scale_colour_manual(values = c("black", "grey", "black"), name = "Number of \ncylinders") +
scale_fill_manual(values = c("black", "grey", NA), name = "Number of \ncylinders") +
scale_linetype_manual(values = c("solid", "solid", "twodash"), name = "Number of \ncylinders") + # I'm doing the same again for linetype.
scale_size_continuous(breaks = c(100,200,300), range = c(.5,8),name = "Horsepower") + # and for the size of the points.
xlab("Weight (1000 lbs)") +
ylab("MPG") +
theme(panel.grid.major = element_blank(), # Gets rid of major grid lines
panel.grid.minor = element_blank(), # Gets rid of minor grid lines
panel.background = element_blank(),
legend.position = c(.99,.99), # Sets the position of the legend to the upper right (within plot) (x,y)
legend.justification = c(.99,.99)) # Sets the justification of the legend to upper right (x,y)
Try using different x,y values for the legend position and justification, so that you understand what’s happening…
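As a starting point for experimenting, the sketch below pins the legend to the bottom left instead. The position says where on the plot (from 0 to 1 in x and y) the legend goes, and the justification says which point of the legend box is placed there. As far as I know, ggplot2 3.5.0 and later prefer legend.position = "inside" together with legend.position.inside, so the code checks the installed version. The plot is simplified, and corner and p_leg are made-up names.

```r
library(ggplot2)

p_leg <- ggplot(mtcars, aes(x = wt, y = mpg, colour = as.factor(cyl))) +
  geom_point()

# Target spot on the plot, in 0-1 coordinates: near the bottom-left corner
corner <- c(0.01, 0.01)

if (packageVersion("ggplot2") >= "3.5.0") {
  # Newer releases separate "inside" placement from the coordinates
  p_leg <- p_leg +
    theme(legend.position = "inside",
          legend.position.inside = corner,
          legend.justification = c(0, 0))  # anchor the legend's bottom-left corner
} else {
  p_leg <- p_leg +
    theme(legend.position = corner,
          legend.justification = c(0, 0))  # anchor the legend's bottom-left corner
}

p_leg
```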
We deleted the background, but lost the border when we did so. Let’s put it back.
ggplot(aes(x = wt, y = mpg, colour = as.factor(cyl), fill = as.factor(cyl)), data = mtcars) +
geom_point(aes(size = hp), shape = 21) +
geom_smooth(aes(linetype = as.factor(cyl)), fill = "grey", method = "lm", formula = y~x, alpha = 0.2) +
scale_colour_manual(values = c("black", "grey", "black"), name = "Number of \ncylinders") +
scale_fill_manual(values = c("black", "grey", NA), name = "Number of \ncylinders") +
scale_linetype_manual(values = c("solid", "solid", "twodash"), name = "Number of \ncylinders") + # I'm doing the same again for linetype.
scale_size_continuous(breaks = c(100,200,300), range = c(.5,8),name = "Horsepower") + # and for the size of the points.
xlab("Weight (1000 lbs)") +
ylab("MPG") +
theme(panel.grid.major = element_blank(), # Gets rid of major grid lines
panel.grid.minor = element_blank(), # Gets rid of minor grid lines
panel.background = element_blank(),
legend.position = c(.99,.99), # Sets the position of the legend to the upper right (within plot) (x,y)
legend.justification = c(.99,.99), # Sets the justification of the legend to upper right (x,y)
panel.border = element_rect(colour = "black", fill = NA)) # Makes the bounding box of the plot black
Now I’d like to change the sizes of legend elements and text. This is easy to do within theme:
ggplot(aes(x = wt, y = mpg, colour = as.factor(cyl), fill = as.factor(cyl)), data = mtcars) +
geom_point(aes(size = hp), shape = 21) +
geom_smooth(aes(linetype = as.factor(cyl)), fill = "grey", method = "lm", formula = y~x, alpha = 0.2) +
scale_colour_manual(values = c("black", "grey", "black"), name = "Number of \ncylinders") +
scale_fill_manual(values = c("black", "grey", NA), name = "Number of \ncylinders") +
scale_linetype_manual(values = c("solid", "solid", "twodash"), name = "Number of \ncylinders") + # I'm doing the same again for linetype.
scale_size_continuous(breaks = c(100,200,300), range = c(.5,8),name = "Horsepower") + # and for the size of the points.
xlab("Weight (1000 lbs)") +
ylab("MPG") +
theme(panel.grid.major = element_blank(), # Gets rid of major grid lines
panel.grid.minor = element_blank(), # Gets rid of minor grid lines
panel.background = element_blank(),
legend.position = c(.99,.99), # Sets the position of the legend to the upper right (within plot) (x,y)
legend.justification = c(.99,.99), # Sets the justification of the legend to upper right (x,y)
panel.border = element_rect(colour = "black", fill = NA), # Makes the bounding box of the plot black
legend.key.size = unit(.8,"cm"),
axis.text.x = element_text(size = 13),
axis.text.y = element_text(size = 13),
legend.text = element_text(size = 10),
axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
legend.title = element_text(size = 12))
And finally, I like my plots to be perfect squares. Easy to do by setting the aspect.ratio.
ggplot(aes(x = wt, y = mpg, colour = as.factor(cyl), fill = as.factor(cyl)), data = mtcars) +
geom_point(aes(size = hp), shape = 21) +
geom_smooth(aes(linetype = as.factor(cyl)), fill = "grey", method = "lm", formula = y~x, alpha = 0.2) +
scale_colour_manual(values = c("black", "grey", "black"), name = "Number of \ncylinders") +
scale_fill_manual(values = c("black", "grey", NA), name = "Number of \ncylinders") +
scale_linetype_manual(values = c("solid", "solid", "twodash"), name = "Number of \ncylinders") + # I'm doing the same again for linetype.
scale_size_continuous(breaks = c(100,200,300), range = c(.5,8),name = "Horsepower") + # and for the size of the points.
xlab("Weight (1000 lbs)") +
ylab("MPG") +
theme(panel.grid.major = element_blank(), # Gets rid of major grid lines
panel.grid.minor = element_blank(), # Gets rid of minor grid lines
panel.background = element_blank(),
legend.position = c(.99,.99), # Sets the position of the legend to the upper right (within plot) (x,y)
legend.justification = c(.99,.99), # Sets the justification of the legend to upper right (x,y)
panel.border = element_rect(colour = "black", fill = NA), # Makes the bounding box of the plot black
legend.key.size = unit(.8,"cm"),
axis.text.x = element_text(size = 13),
axis.text.y = element_text(size = 13),
legend.text = element_text(size = 10),
axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
legend.title = element_text(size = 12),
aspect.ratio = 1) # Make plot square
Well that’s a lot of code! But, I’ve gotten my plot exactly how I want it, all within R.
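One way to cut down on the repetition, sketched here, is to store the theme() settings in an object once and add that object to each plot. The names theme_square, p1 and p2 are invented for this example, and only a few of the settings from above are included.

```r
library(ggplot2)

# Store the theme tweaks once...
theme_square <- theme(panel.grid.major = element_blank(),
                      panel.grid.minor = element_blank(),
                      panel.background = element_blank(),
                      panel.border = element_rect(colour = "black", fill = NA),
                      aspect.ratio = 1)

# ...then reuse them on any plot
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() + theme_square
p2 <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point() + theme_square

p1
p2
```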
Play around with these settings, or others, to get a sense of how theme() controls how your figure looks. If you feel up to it, try the more advanced challenge of trying to emulate the Economist figure (at the beginning of this session) using ggplot2 and theme(). You may need to use ggrepel (?ggrepel). If you’d like to see how I did it, let me know.
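For the labelling part of that challenge, here is a minimal sketch on mtcars rather than the Economist data. It labels a random subsample of points with geom_text_repel from ggrepel if that package is installed, and falls back to plain geom_text otherwise. The sample size of 8, the seed, and the names cars, model, labelled and p_lab are all arbitrary choices for this example.

```r
library(ggplot2)

cars <- mtcars
cars$model <- rownames(mtcars)  # car names are stored as row names

# Pick a random subsample of cars to label
set.seed(1)
labelled <- cars[sample(nrow(cars), 8), ]

p_lab <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point()

if (requireNamespace("ggrepel", quietly = TRUE)) {
  # Labels are nudged apart so they overlap less
  p_lab <- p_lab +
    ggrepel::geom_text_repel(data = labelled, aes(label = model))
} else {
  # Fallback without ggrepel: labels may overlap
  p_lab <- p_lab +
    geom_text(data = labelled, aes(label = model), vjust = -0.8)
}

p_lab
```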
That’s it!