ggplot2 is an R package (used here through RStudio) that builds plots from three ingredients: data, aesthetics, and geometry. Its plots are more polished than the ones base R produces by default. However, the commands are quite different, as you may remember from using the package in the plot styles module to create stacked and grouped bar plots. This information-dense PDF breaks down ggplot2 for future reference, but in this module I will unpack the tool in more depth, so that a beginner can use it without already having a sound understanding of the inner workings of R code.


ggplot2 has two main plotting functions: ggplot and qplot. In this module, we will look at qplot, which is short for quickplot and works much like the base R plot function. We will use this data set throughout.


Creating a scatterplot

Just as the plot function in base R draws a scatterplot by default, qplot draws a scatterplot when you give it two numeric variables and no geom.


qplot(xcoord, ycoord, data = dataframe, color = subgroup)   # or color = I("blue") for one fixed color


Here is an example!

## Read the data and load the ggplot2 library

voc <- read.csv(url("https://raw.githubusercontent.com/nmccurtin/CSVfilesbiostats/master/vc%20(1).csv"))

library(ggplot2)

## Generate your plot

qplot(height, vc, data = voc, color = sex)

## You can also pass geom = c("point", "smooth") to add a smoothed trend line.

qplot(height, vc, data = voc, color = sex, geom = c("point", "smooth"))
## `geom_smooth()` using method = 'loess'

## I think it's cool how you can label each point with its name using the label argument and the "text" geom!

qplot(height, vc, data = voc, color = sex, label = name, geom = c("point", "text"))


Including a geom such as “smooth” adds a statistical summary of your data to the plot. Here it is a smoothed trend line (loess by default, as the message above shows) surrounded by a shaded confidence band. The line on its own is good enough, although a lot of R programmers like to show the band as well. (I personally think it’s )
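
For comparison, the same kind of plot can be built with ggplot, the other function mentioned at the start of this module. This is only a sketch: it swaps in R’s built-in iris data (an assumption, so that it runs without downloading anything), asks for a straight regression line with method = "lm" instead of the default loess curve, and hides the shaded region around the line with se = FALSE.

```r
library(ggplot2)

# Built-in iris data stands in for voc here (assumption, so no download is needed)
p_scatter <- ggplot(iris, aes(x = Petal.Length, y = Sepal.Length, color = Species)) +
  geom_point() +                                # the scatterplot layer
  geom_smooth(method = "lm", formula = y ~ x,   # straight-line fit for each group
              se = FALSE)                       # hide the shaded region

p_scatter
```

Each + adds one layer, which is what the geom = c("point", "smooth") shorthand in qplot does behind the scenes.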


Creating a histogram

This is simple, as in base R: give qplot a single numeric variable and set the geom to histogram.


qplot(value, data = dataframe, geom = "histogram")


## Read the data and load the ggplot2 library

voc <- read.csv(url("https://raw.githubusercontent.com/nmccurtin/CSVfilesbiostats/master/vc%20(1).csv"))

library(ggplot2)

## Generate your plot, using the "geom" histogram

qplot(height, data = voc, geom = "histogram")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
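
The message above suggests choosing a bin width yourself. Here is a minimal sketch of how that might look, using R’s built-in iris data as a stand-in (an assumption, so the code runs without a download) and an arbitrary bin width of 0.25.

```r
library(ggplot2)

# binwidth is passed through qplot to the histogram geom;
# 0.25 is an arbitrary choice for this stand-in data
p_hist <- qplot(Sepal.Length, data = iris, geom = "histogram", binwidth = 0.25)

p_hist
```

For the voc data you would pick a width that suits the units of height.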


Boxplots, stripcharts, and violin plots

These three plots are closely related: each puts a categorical variable on one axis and a numerical variable on the other, so their qplot code differs only in the geom.


## Read the data and load the ggplot2 library
voc <- read.csv(url("https://raw.githubusercontent.com/nmccurtin/CSVfilesbiostats/master/vc%20(1).csv"))

library(ggplot2)

## Generate boxplot by setting your geom to boxplot

qplot(sex, height, data = voc, geom = "boxplot")

## Generate a strip chart by setting your geom to jitter (stackdir and binaxis belong to the dotplot geom and are ignored by jitter, so they are left out)

qplot(sex, height, data = voc, geom = "jitter")

## Generate a violin plot by setting your geom to violin

qplot(sex, height, data = voc, geom = "violin")
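
The stackdir and binaxis arguments belong to a fourth relative of these plots, the dot plot. As a hedged sketch, here it is written with ggplot and geom_dotplot on the built-in iris data (a stand-in chosen so the code runs without a download). The bin width of 0.1 is an arbitrary assumption.

```r
library(ggplot2)

# iris stands in for voc: Species plays the role of sex, Sepal.Length of height
p_dot <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_dotplot(binaxis = "y",        # bin along the numeric axis
               stackdir = "center",  # stack dots outward from the center
               binwidth = 0.1)       # arbitrary width for this stand-in data

p_dot
```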


Adding labels and colors

This step is quite simple. The arguments are:

xlab = "label", ylab = "label", main = "maintitle"


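Colors are also simple, as touched on earlier in this module. With qplot you can either map color to the categorical variable that separates your values, which assigns a color to each group automatically, or set one fixed color for everything. A fixed color has to be wrapped in I(), otherwise qplot treats the string as a category to map rather than as a color name.


col = variable
col = I("blue")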


Let’s consolidate these features into one of our previous plots!

## Read the data and load the ggplot2 library
voc <- read.csv(url("https://raw.githubusercontent.com/nmccurtin/CSVfilesbiostats/master/vc%20(1).csv"))

library(ggplot2)

## Generate a boxplot of vital capacity (vc) by sex, adding colors and labels

qplot(sex, vc, data = voc, geom = "boxplot", col = sex, fill = sex, xlab = "Sex of Participant", ylab = "Vital Capacity (mL)", main = "Vital Capacity and Sex of BIO-203 Students")


The col argument sets the outline color for histograms, boxplots, and violin plots. The fill argument sets the interior color of the bars, boxes, or violins.
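
To see col and fill used as fixed colors rather than groupings, here is a small sketch on the built-in iris data (an assumption, used so the code runs on its own). Wrapping a color name in I() tells qplot to use it literally. The particular colors and the bin width are arbitrary choices.

```r
library(ggplot2)

p_fill <- qplot(Sepal.Length, data = iris, geom = "histogram", binwidth = 0.25,
                fill = I("steelblue"),  # interior of the bars
                col = I("black"))       # outline of the bars

p_fill
```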


That’s all for now!