Personal notes for future reference, written after completing the JHU Data Science Specialization online via Coursera. Notes paraphrased from Roger D. Peng’s book Mastering Software Development in R.

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

x <- 3
y <- 4
x + y
## [1] 7

ggplot2 Basic Steps

  1. Create an object of the ggplot class, typically specifying the data and some or all of the aesthetics;
  2. Add on geoms and other elements to create and customize the plot, using +.

Load Libraries and Data

library(titanic)
library(ggplot2)
data("titanic_train", package = "titanic")
titanic <- titanic_train

Initialize ggplot Object

The first step in creating a plot using ggplot2 is to create a ggplot object. This object will not, by itself, create a plot with anything in it. Instead, it typically specifies the data frame to use and which aesthetics will be mapped to certain columns of that data frame.

Generic ggplot Object Template:

object <- ggplot(dataframe, aes(x = column_1, y = column_2))

The dataframe is the first argument to ggplot. Aesthetics are defined in an aes call nested inside the ggplot call.
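As a minimal sketch of the template, assuming the ggplot2 and titanic packages are installed, the following fills in the placeholders with the titanic data and the Fare column. The object names fare_plot and fare_hist are illustrative and not part of the original notes.

```r
library(ggplot2)
library(titanic)

# Same data frame used throughout these notes
titanic <- titanic_train

# Step 1: initialize the ggplot object; data first, aesthetics inside aes()
fare_plot <- ggplot(titanic, aes(x = Fare))

# The object holds the data and the mapping, but no geom has been added yet
inherits(fare_plot, "ggplot")

# Step 2: add a geom with + so that something is actually drawn
fare_hist <- fare_plot + geom_histogram(binwidth = 15)
```

Printing fare_plot on its own should give an empty panel with a Fare axis, while printing fare_hist draws the histogram.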

Including Plots in R Markdown Document

You can also embed code and resulting plots, for example:

ggplot(data = titanic, aes(x = Fare)) +
   geom_histogram(binwidth = 15)

Note that adding echo = FALSE to the code chunk's options would prevent the R code that generated the plot from being printed; only the plot itself would appear.
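In the .Rmd source, the echo option normally goes in the chunk header, for example {r, echo = FALSE}. As a hedged sketch, assuming the knitr package is installed, the same option can also be set as the default for every chunk from a setup chunk:

```r
# Typically placed in a setup chunk near the top of the .Rmd file.
# Makes echo = FALSE the default for all later chunks, so only the
# output (such as plots) appears in the knitted document.
knitr::opts_chunk$set(echo = FALSE)

# Check the current default
knitr::opts_chunk$get("echo")
```

An individual chunk can still override the default by setting echo = TRUE in its own header.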

ggplot2 Flexible Syntax

You could specify the aesthetic for the histogram in an aes statement when adding the geom (geom_histogram) rather than in the ggplot call:

ggplot(data = titanic) +
   geom_histogram(aes(x = Fare))

Similarly, you could specify the dataframe when adding the geom rather than in the ggplot call:

ggplot() +
   geom_histogram(data = titanic, aes(x = Fare))

Finally, you can pipe the titanic dataframe into a ggplot call, since the ggplot function takes a dataframe as its first argument:

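library(magrittr)  # provides the %>% pipe (dplyr also re-exports it)

# Aesthetics specified in the geom
titanic %>%
   ggplot() +
   geom_histogram(aes(x = Fare))

# Aesthetics specified in the ggplot call
titanic %>%
   ggplot(aes(x = Fare)) +
   geom_histogram()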

While all of these work, for simplicity we’ll use the syntax of specifying the data and aesthetics in the ggplot call.

aes Call is Not Flexible!

If you want to show values from a column of the data using aesthetics like color, size, shape, or position, you must make that specification within aes. Outside aes, the same argument is treated as a constant applied to the whole layer.

Also, be sure that you specify the dataframe before you specify aesthetics, and if you specify a dataframe within a geom, use data = syntax.
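The difference can be sketched as follows, assuming the ggplot2 and titanic packages are installed. The Age and Sex columns of titanic_train and the object names are choices made for this illustration and do not come from the original notes.

```r
library(ggplot2)
library(titanic)

titanic <- titanic_train

# Inside aes(): color is mapped to the values in the Sex column,
# so points are colored by group and a legend is added
mapped <- ggplot(titanic, aes(x = Age, y = Fare)) +
   geom_point(aes(color = Sex), na.rm = TRUE)

# Outside aes(): color is a fixed constant applied to every point
constant <- ggplot(titanic, aes(x = Age, y = Fare)) +
   geom_point(color = "darkgreen", na.rm = TRUE)

# In a geom the first argument is mapping, not data,
# so the data frame has to be passed by name with data =
by_name <- ggplot() +
   geom_histogram(data = titanic, aes(x = Fare), binwidth = 15)
```

Here na.rm = TRUE only silences the warning about rows with a missing Age; it does not change which points are drawn.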

To Save a Plot Using Code in an R Script

  1. Open a graphics device (e.g., using the function pdf or png)
  2. Run the code to draw the plot
  3. Close the graphics device using the dev.off function
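The three steps above can be sketched as follows, assuming the ggplot2 and titanic packages are installed. The file name and the plot dimensions are placeholders chosen for this example, and the file is written to a temporary directory.

```r
library(ggplot2)
library(titanic)

# Build the plot object first
fare_hist <- ggplot(titanic_train, aes(x = Fare)) +
   geom_histogram(binwidth = 15)

# Hypothetical output path inside the session's temporary directory
out_file <- file.path(tempdir(), "fare_histogram.pdf")

# 1. Open a graphics device (size in inches)
pdf(out_file, width = 7, height = 5)

# 2. Draw the plot; in a script a ggplot object must be printed explicitly
print(fare_hist)

# 3. Close the device so the file is written to disk
dev.off()

file.exists(out_file)
```

Swapping pdf for png follows the same pattern, except that png takes its width and height in pixels by default.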