This tutorial serves as an introduction to ggplot2. Base R’s plot() function covers basic plotting, but ggplot2 lets you map more variables onto a single plot (color, fill, line type, and so on), which makes it more versatile and, once you get the hang of the syntax, more user-friendly.
Let’s start by pulling in some data that ships with R itself (it lives in the built-in datasets package, so it is available with or without RStudio). When searching the web for R help, you’ll often come across datasets such as cars and mtcars. These are often used in tutorials like this one because everyone already has them, which makes examples easy to share and reproduce. For a complete list of base R datasets, type library(help="datasets").
For this tutorial, we’re going to look at the ChickWeight and InsectSprays datasets, which record the effect of diet on chick growth and the effectiveness of different bug spray types, respectively.
library(datasets)
View(ChickWeight)
View(InsectSprays)
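View() opens an interactive spreadsheet viewer, which is handy in RStudio but prints nothing to the console. As a rough alternative, the sketch below uses base R’s str() and head() to inspect the same two datasets; the variable name chicks_per_diet is just an illustrative choice.

```r
library(datasets)

# Compact summary of each dataset: column names, types, first few values
str(ChickWeight)
str(InsectSprays)

# First six rows of each dataset
head(ChickWeight)
head(InsectSprays)

# How many distinct chicks were assigned to each diet
chicks_per_diet <- table(unique(ChickWeight[, c("Chick", "Diet")])$Diet)
chicks_per_diet
```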
Let’s start with the ChickWeight data. For simplicity, we’ll only analyze three chicks, each on a different diet. We’ll start by pulling out one chick from each of three diets:
chick1 <- subset(ChickWeight, Chick==1) #pull out data for a chick on diet 1
chick2 <- subset(ChickWeight, Chick==21) #pull out data for a chick on diet 2
chick3 <- subset(ChickWeight, Chick==31) #pull out data for a chick on diet 3
#combine chickens into a dataframe based on time (days)
chicken <- data.frame(chick1$Time, chick1$weight, chick2$weight, chick3$weight)
colnames(chicken) <- c('days', 'chick1', 'chick2', 'chick3')
chicken #view dataframe with newly stored results
## days chick1 chick2 chick3
## 1 0 42 40 42
## 2 2 51 50 53
## 3 4 59 62 62
## 4 6 64 86 73
## 5 8 76 125 85
## 6 10 93 163 102
## 7 12 106 217 123
## 8 14 125 240 138
## 9 16 149 275 170
## 10 18 171 307 204
## 11 20 199 318 235
## 12 21 205 331 256
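Combining the columns with data.frame() as above only makes sense if all three chicks were weighed on the same days, and not every chick in ChickWeight has a full set of measurements. A minimal sketch of how one might check this before combining (the name same_days is an illustrative choice):

```r
library(datasets)

chick1 <- subset(ChickWeight, Chick == 1)
chick2 <- subset(ChickWeight, Chick == 21)
chick3 <- subset(ChickWeight, Chick == 31)

# TRUE only if all three chicks were weighed on exactly the same days
same_days <- identical(chick1$Time, chick2$Time) &&
  identical(chick1$Time, chick3$Time)
same_days

# Diet recorded for each of the three chicks
diets <- sapply(list(chick1, chick2, chick3),
                function(d) as.character(unique(d$Diet)))
diets
```

If same_days were FALSE, the weight columns would not line up row by row, and a merge on Time would be the safer route.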
Let’s get to some plotting. With ggplot, you can make your inputs as simple or as complicated as you want; it’s simply a matter of searching online for what you want to add to the command list. We’ll start simple and build up.
library(ggplot2)
df <- chicken
colnames(df) <- c('Time', 'C1', 'C2', 'C3')
#note: in ggplot2 3.4.0 and later, linewidth= replaces size= for lines (size= still works, with a warning)
ggplot(df, aes(x=Time)) + #set dataframe and x variable
  geom_line(aes(y=C1, color='chick 1'), size=1) + #plot for chick 1
  geom_line(aes(y=C2, color='chick 2'), size=1) + #plot for chick 2
  geom_line(aes(y=C3, color='chick 3'), size=1) + #plot for chick 3
  labs(y = "Weight (g)", x="Days") #change axis labels
We see in the lines and plot above how easy it is to specify the basic inputs of ggplot, but what if we want to change line color, type, thickness, legend position, or even the background color?
df <- chicken
colnames(df) <- c('Time', 'C1', 'C2', 'C3')
ggplot(df, aes(x=Time)) + #set dataframe and x variable
  geom_line(aes(y=C1, color='chick 1'), size=1) + #plot for chick 1
  geom_line(aes(y=C2, color='chick 2'), size=1) + #plot for chick 2
  geom_line(aes(y=C3, color='chick 3'), size=1) + #plot for chick 3
  labs(y = "Weight (g)", x="Days") + #change axis labels
  scale_colour_manual(values=c("green","red", "blue")) + #override line colors
  guides(colour = guide_legend(override.aes = list(size=5))) + #legend symbol thickness
  #modify theme of plot
  theme_bw() + #or try theme_light, theme_grey, theme_dark, theme_classic, etc.
  #modify major components:
  theme(
    plot.background = element_rect(fill = "dark grey"), #background
    axis.text=element_text(size=14, colour="black"), #axis text
    axis.title=element_text(size=16, colour="brown"), #axis title
    axis.line = element_line(colour = "black", #axis lines
                             size = 1, linetype = "solid"),
    axis.ticks = element_line(colour="black"), #axis ticks
    panel.grid.major=element_line(colour = "light grey"), #major grid lines
    panel.grid.minor=element_blank(), #minor grid lines (leave blank)
    #legend info
    legend.position="top",
    legend.title=element_blank(),
    legend.box = "horizontal",
    legend.background = element_rect(fill="white", size=0.5,
                                     linetype="solid", colour ="white"),
    legend.text=element_text(size=14)
  )
Granted, this isn’t the prettiest of plots, but it showcases how ggplot lets you change many different aspects of a plot display. Let’s break down what the heck some of these lines actually do within a plot. Just to note, this is in no way a comprehensive list of everything you can do in ggplot. A quick round of online searching will show you the many wonders (and annoyances) that ggplot has in store for you :)
| Plot Command | What it does | Ways to change it |
|---|---|---|
| ggplot(df, aes(x=Time)) | Specifies the data you’re using; aes() specifies the x component of the data (you can also specify y here for some types of plots). | change the dataframe, change the x and y columns being used |
| geom_line(aes(y=C1, color='chick 1'), size=1) | geom_line adds a line to the plot. aes() inside geom_line specifies the y values and the name that will appear on the legend. color='chick 1' gives this set of data its own color under the legend label "chick 1". You can also set the line size here within geom_line. Each additional geom_line adds another line to the plot. | change y=C1 to another column to plot a different chick; change color= to change the name used on the legend; change size= to change line thickness (in ggplot2 3.4.0 and later, linewidth= is the preferred name for lines) |
| labs(y = "Weight (g)", x="Days") | specifies the x and y label names | change the text within the quotes |
| scale_colour_manual(values=c("green","red", "blue")) | manually override the colors of the lines you have plotted. The values are matched to the legend labels in sorted order, which here happens to be the same as the order of the geom_line() calls. | change the color values |
| guides(colour = guide_legend(override.aes = list(size=5))) | override the size of the symbols in the legend. Besides colour, you can also list options for size and shape within guides(). | change the size number, add size=guide_legend(), add shape=guide_legend() |
| theme_bw() | changes the overall theme of the plot. theme_bw gives a white panel background with a dark border; the colors of the data are unchanged. | Some cool options include theme_gray, theme_dark, and theme_classic, but there are more online |
| theme() | there are many ways to modify the theme of a plot. See the example above for a general description of the elements. | add or remove elements to make the plot as simple or complicated as you need |
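As the table notes, each geom_line() call draws one line, so the wide dataframe above needs one call per chick. A common alternative, sketched here under the assumption that you keep ChickWeight in its original long format (one row per chick per day), is to map color to a grouping column and let a single geom_line() draw every line. The names three_chicks, label, and p_long are illustrative, and the legend labels use the original chick numbers (1, 21, 31) rather than the 1, 2, 3 used above.

```r
library(datasets)
library(ggplot2)

# Keep the long format: one row per chick per day
three_chicks <- subset(ChickWeight, Chick %in% c(1, 21, 31))

# Illustrative label column used for the legend
three_chicks$label <- paste("chick", three_chicks$Chick)

# One geom_line() call; ggplot draws a separate line for each label
p_long <- ggplot(three_chicks, aes(x = Time, y = weight, color = label)) +
  geom_line() +
  labs(y = "Weight (g)", x = "Days")
p_long
```

With this layout, adding a fourth chick only means adding its number to the subset() call; the plotting code stays the same.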
Let’s look at other ways we can use ggplot.
We can make different types of plots by looking at the geom_ options. For example, you can generate boxplots using geom_boxplot(), histograms using geom_histogram(), horizontal lines with geom_hline(), and much more.
View(InsectSprays)
df2 <- InsectSprays
ggplot(df2, aes(x=spray, y=count, fill=spray)) + #fill=spray colors each box by spray type
  geom_boxplot() + #draw one box plot per spray
  ggtitle("Effectiveness of Different Types of Bug Sprays") +
  labs(y="Bug Count", x = "Spray Type")
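The other geoms mentioned above plug in the same way. As a rough sketch, the first plot below adds geom_hline() to the boxplot to mark the overall mean count, and the second uses geom_histogram() to show the distribution of counts across all sprays. The names overall_mean, p_box, and p_hist, and the binwidth of 2, are illustrative choices.

```r
library(datasets)
library(ggplot2)

# Mean bug count across all sprays, used as a reference line
overall_mean <- mean(InsectSprays$count)

# Boxplot with a dashed horizontal line at the overall mean
p_box <- ggplot(InsectSprays, aes(x = spray, y = count, fill = spray)) +
  geom_boxplot() +
  geom_hline(yintercept = overall_mean, linetype = "dashed") +
  labs(y = "Bug Count", x = "Spray Type")
p_box

# Histogram of counts, ignoring spray type
p_hist <- ggplot(InsectSprays, aes(x = count)) +
  geom_histogram(binwidth = 2) +
  labs(y = "Number of Observations", x = "Bug Count")
p_hist
```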