Section 2: Introduction to ggplot2


2.1 Basics of ggplot2

Overview of Components

ggplot2 is designed to work exclusively with tidy data, in which each row is an observation and each column is a variable.

Plots in ggplot2 consist of 3 main components:
* Data: The dataset being summarized
* Geometry: The type of plot (scatterplot, boxplot, barplot, histogram, qqplot, smooth density, etc.)
* Aesthetic mapping: Variables mapped to visual cues, such as x-axis and y-axis values and color

There are additional components. One controls scale:
* Scale

Others control style:
* Labels, Title, Legend
* Theme/Style
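
As a rough sketch of how the components listed above fit together in one plot, the example below annotates each line with the component it supplies. It assumes the murders data comes from the dslabs package and that the pipe comes from dplyr. The axis labels, title, log scales and theme are illustrative choices, not part of the notes.

```r
library(dplyr)    # provides %>%
library(ggplot2)
library(dslabs)   # assumed source of the murders data
data(murders)

p_components <- murders %>%                          # data
  ggplot(aes(x = population / 10^6, y = total)) +    # aesthetic mapping
  geom_point() +                                     # geometry
  scale_x_log10() +                                  # scale (illustrative choice)
  scale_y_log10() +
  labs(x = "Population (millions)",                  # labels and title (illustrative)
       y = "Total murders",
       title = "Murders by state") +
  theme_minimal()                                    # theme (illustrative choice)

p_components   # printing the object renders the plot
```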

You can create a plot object by calling ggplot() and passing in data, but until you specify a geometry and aesthetic mappings, the rendered plot is a blank canvas.

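library(tidyverse)  # loads ggplot2 and the %>% pipe
library(dslabs)     # assumed source of the murders dataset
# Equivalent to: ggplot(data = murders)
murders %>% ggplot()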
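
Because ggplot() returns an object, the plot can also be stored in a variable and rendered later. The sketch below illustrates this, again assuming the murders data is loaded from the dslabs package. The variable name p is arbitrary.

```r
library(ggplot2)
library(dslabs)   # assumed source of the murders data
data(murders)

p <- ggplot(data = murders)   # nothing is drawn on assignment
inherits(p, "ggplot")         # check that p is a ggplot object
print(p)                      # renders the blank canvas
```

Typing p on its own at the console has the same effect as print(p).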

Customizing Plots

Layers

In ggplot2, graphs are created by adding layers to the ggplot object, using the + operator:

DATA %>% ggplot() + LAYER_1 + LAYER_2 + … + LAYER_N
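
A minimal sketch of this pattern, assuming the murders data from the dslabs package: each + adds one layer to an existing plot object and returns a new object. The second layer, geom_text with the abb column as labels, is only an illustration of stacking layers.

```r
library(dplyr)    # provides %>%
library(ggplot2)
library(dslabs)   # assumed source of the murders data
data(murders)

# Start from the empty plot object
p <- murders %>% ggplot()

# LAYER_1: points
p_points <- p +
  geom_point(aes(x = population / 10^6, y = total))

# LAYER_2: state abbreviations drawn at the same positions
p_labeled <- p_points +
  geom_text(aes(x = population / 10^6, y = total, label = abb))

p_labeled
```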

The geometry layer defines the plot type. Geometry functions are named geom_X, where X is the plot type. For a single variable, the most common options are densities (geom_density), histograms (geom_histogram) and barplots (geom_bar).
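
The single-variable geometries mentioned above might be used as follows. This is a sketch assuming the murders data from the dslabs package. The choice of variables and the bins value are illustrative.

```r
library(dplyr)    # provides %>%
library(ggplot2)
library(dslabs)   # assumed source of the murders data
data(murders)

# Histogram of a numeric variable (bins = 20 is an arbitrary choice)
p_hist <- murders %>% ggplot() +
  geom_histogram(aes(x = population / 10^6), bins = 20)

# Smooth density of the same variable
p_dens <- murders %>% ggplot() +
  geom_density(aes(x = population / 10^6))

# Barplot counting the rows in each category
p_bar <- murders %>% ggplot() +
  geom_bar(aes(x = region))

p_hist
p_dens
p_bar
```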

For two variables, the most common options are lines (geom_line), points (geom_point) and boxplots (geom_boxplot). For example, a scatterplot of total murders against population in millions:

murders %>% ggplot() +
    geom_point(aes(x = population/10^6, y = total))

aes() is the function you will likely use most often. It connects, or maps, the data to the elements of the graph.

An aesthetic mapping describes how properties of the data connect with features of the graph (axis position, color, size, etc.). Define aesthetic mappings with the aes() function. Inside aes(), refer to variables by their names in the data component (for example, total rather than murders$total).
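
To illustrate the difference between mapping a variable and setting a fixed value, the sketch below colors points in two ways. It assumes the murders data from the dslabs package. The use of region for color and the color "blue" are illustrative choices.

```r
library(dplyr)    # provides %>%
library(ggplot2)
library(dslabs)   # assumed source of the murders data
data(murders)

# Mapped: color is inside aes(), so it varies with the region column.
# Variables are referenced by name (region, not murders$region).
p_mapped <- murders %>%
  ggplot() +
  geom_point(aes(x = population / 10^6, y = total, color = region))

# Constant: color is outside aes(), so every point gets the same value.
p_constant <- murders %>%
  ggplot() +
  geom_point(aes(x = population / 10^6, y = total), color = "blue")

p_mapped
p_constant
```

In the first plot ggplot2 adds a legend for region automatically. In the second there is no legend, because no variable is mapped to color.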