My personal notes for future reference, written after completing the JHU Data Science Specialization online via Coursera. The notes are paraphrased from Roger D. Peng’s book Mastering Software Development in R.

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

x <- 5
y <- 2
x * y
## [1] 10

Load Libraries and Data

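library(tidyverse)  # ggplot2, dplyr, and the %>% pipe
library(titanic)    # Titanic passenger data
library(faraway)    # provides the worldcup data set
data(worldcup)      # load the World Cup player statistics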

Geom Functions

Geom functions add the graphical elements of ggplot objects; if you don’t include at least one geom, you’ll get a blank plot space.
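
A minimal sketch of the blank-plot behavior, using a small made-up data frame (toy and its columns are hypothetical, not part of the course data). The first object has no geom layers, so printing it draws only an empty panel; the second has one layer.

```r
library(ggplot2)

# Hypothetical toy data, for illustration only
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1))

# No geom: printing this draws an empty panel
p_blank <- ggplot(toy, aes(x = x, y = y))

# One geom: the points are now drawn
p_points <- p_blank + geom_point()

length(p_blank$layers)   # number of layers before adding a geom
length(p_points$layers)  # number of layers after adding a geom
```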

Each geom function has its own arguments to adjust how the graph is created; they differ in the aesthetic inputs they require.

All geom functions have both required aesthetics and optional ones they accept, and you can map several aesthetics in the same call. For example, to use color to show player position and size to show shots on goal for the World Cup data, you could call:

ggplot(worldcup, aes(x = Time, y = Passes, color = Position,
   size = Shots)) +
   geom_point()
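
One way to check which aesthetics a geom requires and which it merely accepts is to inspect the underlying ggproto object. The sketch below assumes ggplot2's exported GeomPoint object; note that ggplot2 reports color under the British spelling, colour.

```r
library(ggplot2)

# Aesthetics geom_point() cannot work without
GeomPoint$required_aes

# Every aesthetic geom_point() understands (required plus optional)
GeomPoint$aesthetics()
```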

Using Multiple Geoms

Several geoms can be added to the same ggplot object, which allows you to build up layers to create interesting graphs.

Let’s label the points for noteworthy players with their team names and positions.

  1. Create a subset of data with the info for the notable players and add a column with the text to include on the plot.
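  2. Add a text geom to the previous ggplot object:

# Step 1: keep the players with the most shots or the most passes,
# and build the label text from team and position
noteworthy_players <- worldcup %>%
        filter(Shots == max(Shots) |
               Passes == max(Passes)) %>%
        mutate(point_label = paste(Team, Position, sep = ", "))

# Step 2: plot all players, then label only the noteworthy ones
ggplot(worldcup, aes(x = Passes, y = Shots)) +
        geom_point() +
        geom_text(data = noteworthy_players,
                  aes(label = point_label),
                  vjust = "inward", hjust = "inward")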

Add Reference Lines

In our earlier World Cup scatterplot, there seemed to be some horizontal clustering. Soccer games last 90 minutes each, so to check whether the clustering falls at 90-minute intervals, plot a histogram of player time (Time) with a reference line every 90 minutes.

ggplot(worldcup, aes(x = Time)) +
        geom_histogram(binwidth = 10) +
        geom_vline(xintercept = 90 * 0:6, color = "magenta", alpha = 0.5)
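
The xintercept expression relies on R's operator precedence: the : operator binds more tightly than *, so 90 * 0:6 multiplies the whole sequence 0 through 6 by 90. A quick check in base R (ref_lines is just an illustrative name):

```r
# ':' is evaluated before '*', so this is 90 * (0:6)
ref_lines <- 90 * 0:6
ref_lines
## [1]   0  90 180 270 360 450 540

# An equivalent, more explicit spelling
seq(from = 0, to = 540, by = 90)
```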

Constant Aesthetics

Instead of mapping an aesthetic to an element of your data, you can use a constant value for it. To do that, specify the aesthetic outside of the aes call when adding the geom. The plot below sets the color of the points geom this way.

You can do this with any of the aesthetics for a geom, including color, fill, shape, and size.

ggplot(worldcup, aes(x = Time, y = Passes)) +
        geom_point(color = "darkgreen")
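
A common slip is to put the constant inside aes(). The sketch below, on a hypothetical toy data frame, contrasts the two: inside aes() the string is treated as a data value (the points get a default palette color and a legend), while outside it is used as the literal color.

```r
library(ggplot2)

# Hypothetical toy data, for illustration only
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1))

# Constant aesthetic: every point is drawn in dark green
p_constant <- ggplot(toy, aes(x = x, y = y)) +
        geom_point(color = "darkgreen")

# Mapped aesthetic: "darkgreen" is treated as a data value, so the
# points get a default palette color and a legend appears
p_mapped <- ggplot(toy, aes(x = x, y = y)) +
        geom_point(aes(color = "darkgreen"))
```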

Specify Shape of Points

To change the shape of points, set the shape aesthetic to the number of the desired shape. Here’s a reference for shapes: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

And a reference for R colors: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
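
As a minimal sketch of setting the shape by number (again on hypothetical toy data): shape 17 is a filled triangle, and shapes 21 through 25 take both a color for the outline and a fill for the interior.

```r
library(ggplot2)

# Hypothetical toy data, for illustration only
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1))

# Shape 17: filled triangle, set as a constant aesthetic
p_triangle <- ggplot(toy, aes(x = x, y = y)) +
        geom_point(shape = 17, size = 3)

# Shape 21: circle with a separate outline color and fill
p_filled <- ggplot(toy, aes(x = x, y = y)) +
        geom_point(shape = 21, size = 3,
                   color = "black", fill = "darkgreen")
```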

Useful Plot Additions

Elements besides geoms can be added to a ggplot object using +.

Element       Description
ggtitle       Plot title
xlab, ylab    x- and y-axis labels
xlim, ylim    Limits of x- and y-axis
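
These elements can be combined in a single call. A sketch with hypothetical toy data and made-up labels:

```r
library(ggplot2)

# Hypothetical toy data, for illustration only
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1))

p_labeled <- ggplot(toy, aes(x = x, y = y)) +
        geom_point() +
        ggtitle("Toy scatterplot") +  # plot title
        xlab("Predictor") +           # x-axis label
        ylab("Response") +            # y-axis label
        xlim(0, 6) +                  # x-axis limits
        ylim(0, 6)                    # y-axis limits
```

Note that xlim and ylim remove observations that fall outside the limits rather than zooming in; coord_cartesian() zooms without dropping data.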