Here is an R Markdown with all the information from the datacamps

How to read this document

Everything is organized like it is in datacamp. The large headings are the lessons (I skipped some tasks in the “Introduction to Data Visualization with ggplot2”). Medium headings show what section of the lesson is documented (there are usually 4 for each datacamp). Small headings are the individual lessons; some are combined.

For each individual lesson there will be the heading (e.g. “Flipping Axes I”) and the instructions. For multi-step instructions, each task is a new paragraph. The code is included in a chunk below the instructions and is separated by ### into tasks like so…

Title of individual lesson

-> Notes (if there are any)

instructions and tasks for 1/3

instructions and tasks for 2/3

instructions and tasks for 3/3

#code 1

###

#code 2

###

#code 3

Enjoy! Let me know if you see anything I need to change!

Introduction to Data Visualization with ggplot2

1-Introduction

Here are bits and pieces of this assignment; I skipped over some of the basics. Some lessons have -> Notes for context.

Drawing your first plot

Load the ggplot2 package using library(). Use str() to explore the structure of the mtcars dataset. Hit Submit Answer. This will execute the example code on the right. See if you can understand what ggplot does with the data.

# Load the ggplot2 package
library(ggplot2)

# Explore the mtcars data frame with str()
str(mtcars)

# Execute the following command
ggplot(mtcars, aes(cyl, mpg)) +
  geom_point()

Data columns types affect plot types

->Although cyl (the number of cylinders) is categorical, you probably noticed that it is classified as numeric in mtcars. This is really misleading because the representation in the plot doesn’t match the actual data type. You’ll have to explicitly tell ggplot2 that cyl is a categorical variable.

Change the ggplot() command by wrapping factor() around cyl.

# Load the ggplot2 package
library(ggplot2)

# Change the command below so that cyl is treated as factor
ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_point()
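-> A quick way to check this before plotting (a minimal sketch using only base R and the built-in mtcars data; cyl_levels is a name made up here): cyl is stored as a number, and factor() turns its distinct values into levels.

```r
# cyl is stored as a plain numeric column
class(mtcars$cyl)

# Wrapping it in factor() produces one level per distinct value
cyl_levels <- levels(factor(mtcars$cyl))
cyl_levels
```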

Changing one geom or every geom

Edit the plot code to map the color aesthetic to the clarity data variable.

Make the points translucent by setting the alpha argument to 0.4.

# Map the color aesthetic to clarity
ggplot(diamonds, aes(carat, price, color=clarity)) +
  geom_point() +
  geom_smooth()

###

# Make the points 40% opaque
ggplot(diamonds, aes(carat, price, color = clarity)) +
  geom_point(alpha=.4) +
  geom_smooth()

Saving plots as variables

Using the diamonds dataset, plot the price (y-axis) versus the carat (x-axis), assigning to plt_price_vs_carat. Using geom_point(), add a point layer to plt_price_vs_carat.

Add an alpha argument to the point layer to make the points 20% opaque, assigning to plt_price_vs_carat_transparent. Type the plot’s variable name (plt_price_vs_carat_transparent) to display it.

Inside geom_point(), call aes() and map color to clarity, assigning to plt_price_vs_carat_by_clarity. Type the plot’s variable name (plt_price_vs_carat_by_clarity) to display it.

# Draw a ggplot
plt_price_vs_carat <- ggplot(
  # Use the diamonds dataset
  data=diamonds,
  # For the aesthetics, map x to carat and y to price
  aes(x=carat, y=price)
)

# Add a point layer to plt_price_vs_carat
plt_price_vs_carat + geom_point()

###


# From previous step
plt_price_vs_carat <- ggplot(diamonds, aes(carat, price))

# Edit this to make points 20% opaque: plt_price_vs_carat_transparent
plt_price_vs_carat_transparent <- plt_price_vs_carat + geom_point(alpha=.2)

# See the plot
plt_price_vs_carat_transparent

###

# From previous step
plt_price_vs_carat <- ggplot(diamonds, aes(carat, price))

# Edit this to map color to clarity,
# Assign the updated plot to a new object
plt_price_vs_carat_by_clarity <- plt_price_vs_carat + geom_point(aes(color=clarity))

# See the plot
plt_price_vs_carat_by_clarity
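-> A sketch of why it matters where aes() goes, assuming ggplot2 is loaded. A mapping inside geom_point() applies to that layer only, while a mapping inside ggplot() is inherited by every layer. The names p_layer_only and p_every_layer are made up for this illustration, and method = "lm" is used only to keep the smooth fast.

```r
library(ggplot2)

# color mapped inside geom_point(): only the points are colored,
# so geom_smooth() fits one line through all the data
p_layer_only <- ggplot(diamonds, aes(carat, price)) +
  geom_point(aes(color = clarity), alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE)

# color mapped inside ggplot(): both layers inherit it,
# so geom_smooth() fits one line per clarity level
p_every_layer <- ggplot(diamonds, aes(carat, price, color = clarity)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE)

# Count the groups the smooth layer (layer 2) was computed for
n_groups_layer_only <- length(unique(layer_data(p_layer_only, 2)$group))
n_groups_every_layer <- length(unique(layer_data(p_every_layer, 2)$group))
c(layer_only = n_groups_layer_only, every_layer = n_groups_every_layer)
```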

2-Aesthetics

These are the aesthetics you can consider within aes() in this chapter: x, y, color, fill, size, alpha, label and shape.

All about aesthetics: color vs. fill

->Typically, the color aesthetic changes the outline of a geom and the fill aesthetic changes the inside. geom_point() is an exception: you use color (not fill) for the point color. However, some shapes have special behavior.

->The default geom_point() uses shape = 19: a solid circle. An alternative is shape = 21: a circle that allows you to use both fill for the inside and color for the outline. This lets you map two aesthetics to each point.

->All shape values are described on the points() help page.

->fcyl and fam are the cyl and am columns converted to factors, respectively.
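-> The course environment already contains fcyl and fam. To run the code from this chapter elsewhere, they can be recreated like this (a sketch: the labels "automatic" and "manual" are an assumption, chosen to match the palette names used later in this chapter; in mtcars, am is 0 for automatic and 1 for manual). The plot then uses shape = 21 to show fill and color on the same points.

```r
library(ggplot2)

# Recreate the factor columns used throughout this chapter
mtcars$fcyl <- factor(mtcars$cyl)
mtcars$fam <- factor(mtcars$am, labels = c("automatic", "manual"))

# shape = 21 has an inside (fill) and an outline (color)
ggplot(mtcars, aes(wt, mpg, fill = fcyl, color = fam)) +
  geom_point(shape = 21, size = 4, alpha = 0.6)
```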

All about aesthetics: comparing aesthetics

Using mtcars, create a plot base layer, plt_mpg_vs_wt. Map mpg onto y and wt onto x. Add a point layer, mapping the categorical no. of cylinders, fcyl, onto size.

Change the mapping. This time fcyl should be mapped onto alpha.

Change the mapping again. This time fcyl should be mapped onto shape.

Swap the geom layer: change points to text. Change the mapping again. This time fcyl should be mapped onto label.

# Establish the base layer
plt_mpg_vs_wt <- ggplot(data=mtcars, aes(x=wt, y=mpg))

# Map fcyl to size
plt_mpg_vs_wt +
  geom_point(aes(size=fcyl))

###

# Base layer
plt_mpg_vs_wt <- ggplot(mtcars, aes(wt, mpg))

# Map fcyl to alpha, not size
plt_mpg_vs_wt +
  geom_point(aes(alpha = fcyl))

###

# Base layer
plt_mpg_vs_wt <- ggplot(mtcars, aes(wt, mpg))

# Map fcyl to shape, not alpha
plt_mpg_vs_wt +
  geom_point(aes(shape = fcyl))

###

# Base layer
plt_mpg_vs_wt <- ggplot(mtcars, aes(wt, mpg))

# Use text layer and map fcyl to label
plt_mpg_vs_wt +
  geom_text(aes(label = fcyl))

All about attributes: color, shape, size and alpha

->You can specify colors in R using hex codes: a hash followed by two hexadecimal digits each for red, green, and blue (“#RRGGBB”).

Set the point color to my_blue and the alpha to 0.6.

# A hexadecimal color
my_blue <- "#4ABEFF"

ggplot(mtcars, aes(wt, mpg)) +
  # Set the point color and alpha
  geom_point(color=my_blue,alpha=.6)
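-> To see how a hex code breaks down into its three channels, base R can convert in both directions. A small sketch (my_blue_rgb is a name made up here):

```r
my_blue <- "#4ABEFF"

# col2rgb() splits the hex code into red, green, and blue values (0 to 255)
my_blue_rgb <- col2rgb(my_blue)
my_blue_rgb

# rgb() goes the other way and rebuilds the hex code
rgb(74, 190, 255, maxColorValue = 255)
```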

All about attributes: conflicts with aesthetics

Add a point layer, setting alpha, the transparency, to 0.5.

Add a text layer, setting the label to the rownames of the dataset mtcars, and the color to “red”.

Add a point layer, setting the shape to 24 and the color to “yellow”.

ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  # Add point layer with alpha 0.5
  geom_point(alpha=.5)

###

ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  # Add text layer with label rownames(mtcars) and color red
  geom_text(label=rownames(mtcars), color="red")

###

ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  # Add points layer with shape 24 and color yellow
  geom_point(shape=24,color="yellow")

Going all out

->Here is the last task on adding aes()

Add one more aesthetic: map hp divided by wt onto size.

# 5 aesthetics: add a mapping of size to hp / wt
ggplot(mtcars, aes(mpg, qsec, color = fcyl, shape = fam,size= hp / wt)) +
  geom_point()

Updating aesthetic labels

->Use labs() to set the x- and y-axis labels. It takes strings for each argument. scale_color_manual() defines properties of the color scale (i.e. axis). The first argument sets the legend title. values is a named vector of colors to use.

Set the x-axis label to “Number of Cylinders”, and the y-axis label to “Count” using the x and y arguments of labs(), respectively.

Implement a custom fill color scale using scale_fill_manual(). Set the first argument to “Transmission”, and values to palette.

Modify the code to set the position to dodge so that the bars for transmissions are displayed side by side.

ggplot(mtcars, aes(fcyl, fill = fam)) +
  geom_bar() +
  # Set the axis labels
  labs(x="Number of Cylinders",
      y="Count")

###

palette <- c(automatic = "#377EB8", manual = "#E41A1C")

ggplot(mtcars, aes(fcyl, fill = fam)) +
  geom_bar() +
  labs(x = "Number of Cylinders", y = "Count") +
  # Set the fill color scale
  scale_fill_manual("Transmission", values = palette)

###

palette <- c(automatic = "#377EB8", manual = "#E41A1C")

# Set the position
ggplot(mtcars, aes(fcyl, fill = fam)) +
  geom_bar(position="dodge") +
  labs(x = "Number of Cylinders", y = "Count") +
  scale_fill_manual("Transmission", values = palette)

Setting a dummy aesthetic

->a y-axis will always be provided, even if you didn’t ask for it. You can make univariate plots in ggplot2, but you will need to add a fake y axis by mapping y to zero.

Using mtcars, plot 0 vs. mpg. Make a scatter plot and add “jitter” to it.

Use ylim() to set the limits on the y-axis from -2 to 2.

# Plot 0 vs. mpg
ggplot(mtcars, aes(mpg, 0)) +
  
  # Add jitter 
  geom_point(position="jitter")

###

ggplot(mtcars, aes(mpg, 0)) +
  geom_jitter() +
  # Set the y-axis limits
  ylim(c(-2, 2))

3-Geometries

->Overplotting: Scatter plots (using geom_point()) are intuitive, easily understood, and very common, but we must always consider overplotting, particularly in the following four situations:

Large datasets
Aligned values on a single axis
Low-precision data
Integer data
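-> One rough way to check for overplotting before drawing anything is to count rows that land on exactly the same (x, y) position. A minimal base R sketch, using iris as the example (xy and n_overlapping are names made up here):

```r
# Rows whose (Sepal.Length, Sepal.Width) pair already appeared earlier in the data
xy <- iris[c("Sepal.Length", "Sepal.Width")]
n_overlapping <- sum(duplicated(xy))
n_overlapping

# Few distinct values on an axis is a sign of low-precision data
length(unique(iris$Sepal.Length))
```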

Overplotting 1: large datasets

Add a points layer to the base plot. Set the point transparency to 0.5. Set shape = “.”, which gives points a size of 1 pixel.

Update the point shape to remove the line outlines by setting shape to 16.

# Plot price vs. carat, colored by clarity
plt_price_vs_carat_by_clarity <- ggplot(diamonds, aes(carat, price, color = clarity))

# Add a point layer with tiny points
plt_price_vs_carat_by_clarity + geom_point(alpha=.5, shape=".")

###

# Plot price vs. carat, colored by clarity
plt_price_vs_carat_by_clarity <- ggplot(diamonds, aes(carat, price, color = clarity))

# Keep the transparency at 0.5 and set the shape to 16
plt_price_vs_carat_by_clarity + geom_point(alpha = 0.5, shape = 16)

Overplotting 2: Aligned values

Create a base plot plt_mpg_vs_fcyl_by_fam of fcyl by mpg, colored by fam. Add a points layer to the base plot.

Add some jittering by using position_jitter(), setting the width to 0.3.

Alternatively, use position_jitterdodge(). Set jitter.width and dodge.width to 0.3 to separate subgroups further.

# Plot base
plt_mpg_vs_fcyl_by_fam <- ggplot(mtcars, aes(fcyl, mpg, color = fam))

# Default points are shown for comparison
plt_mpg_vs_fcyl_by_fam + geom_point()

###

# Plot base
plt_mpg_vs_fcyl_by_fam <- ggplot(mtcars, aes(fcyl, mpg, color = fam))

# Default points are shown for comparison
plt_mpg_vs_fcyl_by_fam + geom_point()

# Alter the point positions by jittering, width 0.3
plt_mpg_vs_fcyl_by_fam + geom_point(position = position_jitter(width = 0.3))


###

# Plot base
plt_mpg_vs_fcyl_by_fam <- ggplot(mtcars, aes(fcyl, mpg, color = fam))

# Default points are shown for comparison
plt_mpg_vs_fcyl_by_fam + geom_point()

# Now jitter and dodge the point positions
plt_mpg_vs_fcyl_by_fam + geom_point(position = position_jitterdodge(jitter.width=0.3, dodge.width=0.3))

Overplotting 3: Low-precision data

Change the points layer into a jitter layer. Reduce the jitter layer’s width by setting the width argument to 0.1.

Let’s use a different approach: Within geom_point(), set position to “jitter”.

Provide an alternative specification: Have the position argument call position_jitter() with a width of 0.1.

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  # Swap for jitter layer with width 0.1
  geom_jitter(alpha = 0.5, width=0.1)

###

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  # Set the position to jitter
  geom_point(position="jitter",alpha = 0.5)

###

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  # Use a jitter position function with width 0.1
  geom_point(alpha = 0.5, position=position_jitter(width=0.1))
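-> Jitter is random, so the plot changes slightly on every draw. position_jitter() also takes a seed argument to make it repeatable, and a height argument (when height is left out, the points still get a default amount of vertical jitter). A sketch, with p_jitter as a made-up name and 123 as an arbitrary seed:

```r
library(ggplot2)

p_jitter <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  # Jitter horizontally only, and fix the seed so every draw is identical
  geom_point(
    alpha = 0.5,
    position = position_jitter(width = 0.1, height = 0, seed = 123)
  )

p_jitter
```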

Overplotting 4: Integer data

-> This can be type integer (e.g. 1, 2, 3…) or categorical (i.e. class factor) variables. factor is just a special class of type integer.

-> You’ll typically have a small, defined number of intersections between two variables, which is similar to case 3, but you may miss it if you don’t realize that integer and factor data are the same as low precision data.

Examine the Vocab dataset using str(). Using Vocab, draw a plot of vocabulary vs education. Add a point layer.

Replace the point layer with a jitter layer.

Set the jitter transparency to 0.2.

Set the shape of the jittered points to hollow circles, (shape 1).

# Examine the structure of Vocab
str(Vocab)

# Plot vocabulary vs. education
ggplot(Vocab, aes(education, vocabulary)) +
  # Add a point layer
  geom_point()

###

ggplot(Vocab, aes(education, vocabulary)) +
  # Change to a jitter layer
  geom_jitter()

###

ggplot(Vocab, aes(education, vocabulary)) +
  # Set the transparency to 0.2
  geom_jitter(alpha=0.2)

###

ggplot(Vocab, aes(education, vocabulary)) +
  # Set the shape to 1
  geom_jitter(alpha = 0.2, shape=1)

Drawing histograms

Using mtcars, map mpg onto the x aesthetic. Add a histogram layer using geom_histogram().

Set the histogram binwidth to 1.

Map y to the internal variable ..density.. to show frequency densities.

Set the fill color of the histogram bars to datacamp_light_blue.

# Plot mpg
ggplot(mtcars, aes(mpg)) +
 
  # Add a histogram layer
  geom_histogram()

###

ggplot(mtcars, aes(mpg)) +
  # Set the binwidth to 1
  geom_histogram(binwidth=1)


###

# Map y to ..density..
ggplot(mtcars, aes(mpg, ..density..)) +
  geom_histogram(binwidth = 1)

###

datacamp_light_blue <- "#51A8C9"

ggplot(mtcars, aes(mpg, ..density..)) +
  # Set the fill color to datacamp_light_blue
  geom_histogram(binwidth = 1, fill=datacamp_light_blue)
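-> Newer ggplot2 versions (3.4.0 onwards, as far as I know) print a deprecation warning for the ..density.. notation and suggest after_stat(density) instead. A sketch of the same histogram in that form, assuming a ggplot2 version that has after_stat(); p_density and bars are made-up names:

```r
library(ggplot2)

datacamp_light_blue <- "#51A8C9"

p_density <- ggplot(mtcars, aes(mpg, after_stat(density))) +
  geom_histogram(binwidth = 1, fill = datacamp_light_blue)

p_density

# With densities, bar height times bar width adds up to 1 over all bars
bars <- layer_data(p_density)
sum(bars$density * (bars$xmax - bars$xmin))
```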

Positions in histograms

-> It’s easier to just have notes here

-> geom_histogram(), a special case of geom_bar(), has a position argument that can take on the following values:

stack (the default): Bars for different groups are stacked on top of each other.
dodge: Bars for different groups are placed side by side.
fill: Bars for different groups are shown as proportions.
identity: Plot the values as they appear in the dataset.

For example: Update the histogram layer so bars are on top of each other, using the “identity” position. So that each bar can be seen, set alpha to 0.4.

ggplot(mtcars, aes(mpg, fill = fam)) +
  # Change the position to identity, with transparency 0.4
  geom_histogram(binwidth = 1, position = "identity", alpha=0.4)

Position in bar and col plots

-> Again, notes are easier

-> The position argument changes how geom_bar() arranges the bars for different groups.

-> We have three position options:

stack: The default
dodge: Preferred
fill: To show proportions

-> While we will be using geom_bar() here, note that the function geom_col() is just geom_bar() where the stat argument is set to “identity” (the position still defaults to “stack”). It is used when we want the heights of the bars to represent the exact values in the data.

For example: Change the bar position argument to “dodge”.

ggplot(mtcars, aes(fcyl, fill = fam)) +
  # Change the position to "dodge"
  geom_bar(position = "dodge")
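-> To see the difference between geom_bar() and geom_col(): geom_bar() counts the rows itself, while geom_col() expects the heights to already be in the data. A sketch with the counts made by table() (cyl_counts, p_bar and p_col are made-up names):

```r
library(ggplot2)

# geom_bar() counts the cars in each cylinder group for us
p_bar <- ggplot(mtcars, aes(factor(cyl))) +
  geom_bar()

# Count first, then let geom_col() draw the counts as they are
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))
p_col <- ggplot(cyl_counts, aes(cyl, Freq)) +
  geom_col()

# Both plots end up with the same bar heights
layer_data(p_bar)$y
layer_data(p_col)$y
```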

Overlapping bar plots

-> Save position_dodge() as an object, posn_d, so that you can easily reuse it.

Use the functional form of the bar position: replace “dodge” with a call to position_dodge(). Set its width to 0.2.

Set the bar transparency level of the bars to 0.6.

ggplot(mtcars, aes(cyl, fill = fam)) +
  # Change position to use the functional form, with width 0.2
  geom_bar(position = position_dodge(width=0.2))

###

ggplot(mtcars, aes(cyl, fill = fam)) +
  # Set the transparency to 0.6
  geom_bar(position = position_dodge(width = 0.2), alpha=0.6)

Bar plots: sequential color palette

Plot the Vocab dataset, mapping education onto x and vocabulary onto fill.

Add a bar layer, setting position to “fill”.

Add a brewer fill scale, using the default palette (don’t pass any arguments). Notice how this generates a warning message and an incomplete plot.

# Plot education, filled by vocabulary
ggplot(Vocab, aes(education, fill = vocabulary)) 

###

# Plot education, filled by vocabulary
ggplot(Vocab, aes(education, fill = vocabulary)) +
  # Add a bar layer with position "fill"
  geom_bar(position="fill") 

###

# Plot education, filled by vocabulary
ggplot(Vocab, aes(education, fill = vocabulary)) +
  # Add a bar layer with position "fill"
  geom_bar(position = "fill") +
  # Add a brewer fill scale with default palette
  scale_fill_brewer()
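-> The warning appears because a brewer palette only has a limited number of colors (9 for the sequential "Blues" palette), which is fewer than the number of vocabulary levels. This assumes vocabulary is a factor with 11 levels (scores 0 to 10) in the course's copy of Vocab. One possible workaround, sketched here with RColorBrewer and colorRampPalette(), is to interpolate extra colors; blues and blue_range are made-up names:

```r
library(RColorBrewer)

# The 9 colors of the sequential "Blues" palette
blues <- brewer.pal(9, "Blues")

# colorRampPalette() returns a function that interpolates between them
blue_range <- colorRampPalette(blues)

# Ask for 11 colors, one per (assumed) vocabulary level
blue_range(11)

# These could then be passed on with scale_fill_manual(values = blue_range(11))
```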

Basic line plots

-> Skipped some steps here

Now try to plot all time series in a single plot. Plot the fish.tidy dataset, mapping Year to x and Capture to y. Group by fish species within the aesthetics of geom_line().

Let’s add color to the previous plot to distinguish between the different time series. Plot the fish.tidy dataset again, this time making sure to color by Species.

# Plot the Rainbow Salmon time series
ggplot(fish.species, aes(x = Year, y = Rainbow)) +
  geom_line()

# Plot the Pink Salmon time series
ggplot(fish.species, aes(x = Year, y = Pink)) +
  geom_line()

# Plot multiple time-series by grouping by species

ggplot(fish.tidy, aes(x = Year, y = Capture)) +
  geom_line(aes(group=Species))

###

# Plot the Rainbow Salmon time series
ggplot(fish.species, aes(x = Year, y = Rainbow)) +
  geom_line()

# Plot the Pink Salmon time series
ggplot(fish.species, aes(x = Year, y = Pink)) +
  geom_line()

# Plot multiple time-series by grouping by species
ggplot(fish.tidy, aes(Year, Capture)) +
  geom_line(aes(group = Species))

# Plot multiple time-series by coloring by species


ggplot(fish.tidy, aes(Year, Capture, color=Species)) +
  geom_line(aes(group = Species))

4-Themes

Everything you can do with theme is listed on the ggplot2 theme() reference page.

Moving the legend

-> p + theme(legend.position = new_value) Here, the new value can be

“top”, “bottom”, “left”, or “right”: place it at that side of the plot.
“none”: don’t draw it.
c(x, y): c(0, 0) means the bottom-left and c(1, 1) means the top-right.

For example: Position the legend inside the plot, with x-position 0.6 and y-position 0.1.

# Position the legend inside the plot at (0.6, 0.1)

plt_prop_unemployed_over_time +
  theme(legend.position = c(0.6 , 0.1))
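-> In recent ggplot2 releases (3.5.0 and later, as far as I know) passing a numeric vector to legend.position is deprecated in favour of legend.position = "inside" together with legend.position.inside. A sketch that picks the form matching the installed version; plt_prop_unemployed_over_time is not available outside the course, so a stand-in mtcars plot is used (legend_inside and p_legend are made-up names):

```r
library(ggplot2)

# Pick the form that matches the installed ggplot2 version
legend_inside <- if (packageVersion("ggplot2") >= "3.5.0") {
  theme(legend.position = "inside", legend.position.inside = c(0.6, 0.1))
} else {
  theme(legend.position = c(0.6, 0.1))
}

# Stand-in plot with a legend to move
p_legend <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point() +
  legend_inside

p_legend
```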

Modifying theme elements

-> p + theme(axis.line = element_line(color = “red”, linetype = “dashed”))

Give all rectangles in the plot (the rect element) a fill color of “grey92” (very pale grey). Remove the legend.key’s outline by setting its color to be missing.

Remove the axis ticks, axis.ticks by making them a blank element. Remove the panel gridlines, panel.grid in the same way.

Add the major horizontal grid lines back to the plot using panel.grid.major.y. Set the line color to “white”, size to 0.5, and linetype to “dotted”.

Make the axis tick labels’ text, axis.text, less prominent by changing the color to “grey25”. Increase the plot.title’s size to 16 and change its font face to “italic”.

plt_prop_unemployed_over_time +
  theme(
    # For all rectangles, set the fill color to grey92
      rect = element_rect(fill = "grey92"),
    # For the legend key, turn off the outline
    legend.key=element_rect(color=NA)
  )


###



plt_prop_unemployed_over_time +
  theme(
    rect = element_rect(fill = "grey92"),
    legend.key = element_rect(color = NA),
    # Turn off axis ticks
    axis.ticks=element_blank(),
    # Turn off the panel grid
    panel.grid=element_blank()
  )


###


plt_prop_unemployed_over_time +
  theme(
    rect = element_rect(fill = "grey92"),
    legend.key = element_rect(color = NA),
    axis.ticks = element_blank(),
    panel.grid = element_blank(),
    # Add major y-axis panel grid lines back
    panel.grid.major.y= element_line(
      # Set the color to white
      color="white",
      # Set the size to 0.5
      size=0.5,
      # Set the line type to dotted
      linetype="dotted"
    )
  )


###


plt_prop_unemployed_over_time +
  theme(
    rect = element_rect(fill = "grey92"),
    legend.key = element_rect(color = NA),
    axis.ticks = element_blank(),
    panel.grid = element_blank(),
    panel.grid.major.y = element_line(
      color = "white",
      size = 0.5,
      linetype = "dotted"
    ),
    # Set the axis text color to grey25
    axis.text=element_text(color="grey25"),
    # Set the plot title font face to italic and font size to 16
   plot.title=element_text(size=16, face="italic")
  )

Modifying whitespace

->Whitespace means all the non-visible margins and spacing in the plot.

-> To set a single whitespace value, use unit(x, unit), where x is the amount and unit is the unit of measure.

-> Borders require you to set 4 positions, so use margin(top, right, bottom, left, unit).

Give the axis tick length, axis.ticks.length, a unit of 2 “lines”.

Give the legend key size, legend.key.size, a unit of 3 centimeters (“cm”).

Set the legend.margin to 20 points (“pt”) on the top, 30 pts on the right, 40 pts on the bottom, and 50 pts on the left.

Set the plot margin, plot.margin, to 10, 30, 50, and 70 millimeters (“mm”).

# View the original plot
plt_mpg_vs_wt_by_cyl

plt_mpg_vs_wt_by_cyl +
  theme(
    # Set the axis tick length to 2 lines
    axis.ticks.length=unit(2, "lines")
  )

###

plt_mpg_vs_wt_by_cyl +
  theme(
    # Set the legend key size to 3 centimeters
    legend.key.size=unit(3,"cm")
  )


###

plt_mpg_vs_wt_by_cyl +
  theme(
    # Set the legend margin to (20, 30, 40, 50) points
    legend.margin=margin(20,30,40,50,"pt")
  )


###

plt_mpg_vs_wt_by_cyl +
  theme(
    # Set the plot margin to (10, 30, 50, 70) millimeters
    plot.margin=margin(10, 30, 50,  70, "mm")
  )
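-> A self-contained sketch of unit() and margin(). plt_mpg_vs_wt_by_cyl only exists in the course, so a stand-in plot is built first (p_whitespace, tick_length and legend_pad are made-up names). Naming the margin() arguments t, r, b, l makes the order harder to mix up.

```r
library(ggplot2)

# Stand-in for the course's plt_mpg_vs_wt_by_cyl
p_whitespace <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point()

# A single whitespace value: amount plus unit of measure
tick_length <- unit(2, "lines")

# Four positions at once: top, right, bottom, left
legend_pad <- margin(t = 20, r = 30, b = 40, l = 50, unit = "pt")

p_whitespace +
  theme(
    axis.ticks.length = tick_length,
    legend.margin = legend_pad
  )
```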

Built-in themes and Exploring ggthemes

-> In addition to making your own themes, there are several out-of-the-box solutions that may save you lots of time.

theme_gray() is the default.
theme_bw() is useful when you use transparency.
theme_classic() is more traditional.
theme_void() removes everything but the data.

-> Outside of ggplot2, another source of built-in themes is the ggthemes package.

theme_fivethirtyeight()
theme_tufte()
theme_wsj()

Setting themes

Assign the theme to theme_recession. Add the Tufte theme and theme_recession together. Use the Tufte recession theme by adding it to the plot.

Use theme_set() to set theme_tufte_recession as the default theme. Draw the plot, plt_prop_unemployed_over_time, without explicitly adding a theme.

# Save the theme as theme_recession
theme_recession <- theme(
  rect = element_rect(fill = "grey92"),
  legend.key = element_rect(color = NA),
  axis.ticks = element_blank(),
  panel.grid = element_blank(),
  panel.grid.major.y = element_line(color = "white", size = 0.5, linetype = "dotted"),
  axis.text = element_text(color = "grey25"),
  plot.title = element_text(face = "italic", size = 16),
  legend.position = c(0.6, 0.1)
)

# Combine the Tufte theme with theme_recession
theme_tufte_recession <- theme_tufte() + theme_recession

# Add the Tufte recession theme to the plot
plt_prop_unemployed_over_time +theme_tufte_recession


###


theme_recession <- theme(
  rect = element_rect(fill = "grey92"),
  legend.key = element_rect(color = NA),
  axis.ticks = element_blank(),
  panel.grid = element_blank(),
  panel.grid.major.y = element_line(color = "white", size = 0.5, linetype = "dotted"),
  axis.text = element_text(color = "grey25"),
  plot.title = element_text(face = "italic", size = 16),
  legend.position = c(0.6, 0.1)
)
theme_tufte_recession <- theme_tufte() + theme_recession

# Set theme_tufte_recession as the default theme
theme_set(theme_tufte_recession)

# Draw the plot (without explicitly adding a theme)
plt_prop_unemployed_over_time
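-> theme_set() changes the default for every plot drawn afterwards in the session. It returns the previous theme invisibly, so one way to undo the change later is to keep that return value (old_theme is a made-up name; theme_classic() is used here only so the sketch runs without ggthemes):

```r
library(ggplot2)

# Set a new default and remember the old one
old_theme <- theme_set(theme_classic())

# Any plot drawn now uses theme_classic() without it being added explicitly
ggplot(mtcars, aes(wt, mpg)) +
  geom_point()

# Put the previous default back
theme_set(old_theme)
```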

Using geoms for explanatory plots

geom_segment() adds line segments and requires two additional aesthetics: xend and yend. To draw a horizontal line for each point, map 30 onto xend and country onto yend.

geom_text also needs an additional aesthetic: label. Map lifeExp onto label, and set the attributes color to “white” and size to 1.5.

The color scale has been set for you, but you need to clean up the scales. For the x scale: Set expand to c(0, 0) and limits to c(30, 90). Place the axis on the top of the plot with the position argument.

Make sure to label the plot appropriately using labs(): Make the title “Highest and lowest life expectancies, 2007”. Add a reference by setting caption to “Source: gapminder”

# Add a geom_segment() layer
ggplot(gm2007, aes(x = lifeExp, y = country, color = lifeExp)) +
  geom_point(size = 4) +
  geom_segment(aes(xend = 30, yend = country), size = 2)

###

# Add a geom_text() layer
ggplot(gm2007, aes(x = lifeExp, y = country, color = lifeExp)) +
  geom_point(size = 4) +
  geom_segment(aes(xend = 30, yend = country), size = 2) +
  geom_text(aes(label = lifeExp), color = "white", size = 1.5)


###

# Set the color scale
palette <- brewer.pal(5, "RdYlBu")[-(2:4)]

# Modify the scales
ggplot(gm2007, aes(x = lifeExp, y = country, color = lifeExp)) +
  geom_point(size = 4) +
  geom_segment(aes(xend = 30, yend = country), size = 2) +
  geom_text(aes(label = round(lifeExp,1)), color = "white", size = 1.5) +
  scale_x_continuous("", expand = c(0,0), limits = c(30,90), position = "top") +
  scale_color_gradientn(colors = palette)

###

# Set the color scale
palette <- brewer.pal(5, "RdYlBu")[-(2:4)]

# Add a title and caption
ggplot(gm2007, aes(x = lifeExp, y = country, color = lifeExp)) +
  geom_point(size = 4) +
  geom_segment(aes(xend = 30, yend = country), size = 2) +
  geom_text(aes(label = round(lifeExp,1)), color = "white", size = 1.5) +
  scale_x_continuous("", expand = c(0,0), limits = c(30,90), position = "top") +
  scale_color_gradientn(colors = palette) +
  labs(title="Highest and lowest life expectancies, 2007", caption="Source: gapminder")

Using annotate() for embellishments

Clean up the theme: Add a classic theme to the plot with theme_classic(). Set axis.line.y, axis.ticks.y, and axis.title to element_blank(). Set the axis.text color to “black”. Remove the legend by setting legend.position to “none”

Use geom_vline() to add a vertical line. Set xintercept to global_mean, specify the color to be “grey40”, and set linetype to 3.

x_start and y_start will be used as positions to place text and have been calculated for you. Add a “text” geom as an annotation. For the annotation, set x to x_start, y to y_start, and label to “The\nglobal\naverage”.

Annotate the plot with an arrow connecting your text to the line. Use a “curve” geom. Set the arrow ends xend to x_end and yend to y_end. Set the length of the arrowhead to 0.2 cm and the type to “closed”

# Define the theme and save it as step_1_themes for the later steps
step_1_themes <- theme_classic() +
  theme(axis.line.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text = element_text(color="black"),
        axis.title = element_blank(),
        legend.position = "none")

# Use the theme on the plot
plt_country_vs_lifeExp +
  step_1_themes

###

# Add a vertical line
plt_country_vs_lifeExp +
  step_1_themes +
  geom_vline(xintercept=global_mean, color="grey40", linetype=3)

###

# Add text
plt_country_vs_lifeExp +
  step_1_themes +
  geom_vline(xintercept = global_mean, color = "grey40", linetype = 3) +
  annotate(
    "text",
    x = x_start, y = y_start,
    label = "The\nglobal\naverage",
    vjust = 1, size = 3, color = "grey40"
  )


###

# Add a curve
# (step_3_annotation is assumed to hold the text annotation from the previous step)
plt_country_vs_lifeExp +  
  step_1_themes +
  geom_vline(xintercept = global_mean, color = "grey40", linetype = 3) +
  step_3_annotation +
  annotate(
    "curve",
    x = x_start, y = y_start,
    xend = x_end, yend = y_end,
    arrow = arrow(length = unit( 0.2, "cm"), type = "closed"),
    color = "grey40"
  )

Intermediate Data Visualization with ggplot2

1-Statistics

Smoothing

Look at the structure of mtcars. Using mtcars, draw a scatter plot of mpg vs. wt.

Update the plot to add a smooth trend line. Use the default method, which uses the LOESS model to fit the curve.

Update the smooth layer. Apply a linear model by setting method to “lm”, and turn off the model’s 95% confidence interval (the ribbon) by setting se to FALSE.

Draw the same plot again, swapping geom_smooth() for stat_smooth().

# View the structure of mtcars
str(mtcars)

# Using mtcars, draw a scatter plot of mpg vs. wt
ggplot(mtcars, aes(wt,mpg))+
    geom_point()

###

# Amend the plot to add a smooth layer
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()+
  geom_smooth()

###

# Amend the plot. Use lin. reg. smoothing; turn off std err ribbon
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)

###

# Amend the plot. Swap geom_smooth() for stat_smooth().
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)

Grouping variables

Using mtcars, plot mpg vs. wt, colored by fcyl. Add a point layer. Add a smooth stat using a linear model, and don’t show the se ribbon.

Update the plot to add a second smooth stat. Add a dummy group aesthetic to this layer, setting the value to 1. Use the same method and se values as the first stat smooth layer.

# Using mtcars, plot mpg vs. wt, colored by fcyl
ggplot(mtcars, aes(wt, mpg, color=fcyl)) +
  # Add a point layer
  geom_point() +
  # Add a smooth lin reg stat, no ribbon
  stat_smooth(method="lm", se=FALSE)

###

# Amend the plot to add another smooth layer with dummy grouping
ggplot(mtcars, aes(x = wt, y = mpg, color = fcyl)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)+
  stat_smooth(aes(group=1), method = "lm", se = FALSE)
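-> Why the dummy group works: with color = fcyl in the plot's aesthetics, every layer is split into one group per cylinder level, and aes(group = 1) overrides that for a single layer so it sees all the data as one group. A sketch that counts the groups each smooth layer was fitted on (fcyl is recreated here; p_groups and the n_groups_* names are made up):

```r
library(ggplot2)

mtcars$fcyl <- factor(mtcars$cyl)

p_groups <- ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  stat_smooth(aes(group = 1), method = "lm", se = FALSE)

# Layer 2: one linear fit per cylinder level
n_groups_by_cyl <- length(unique(layer_data(p_groups, 2)$group))

# Layer 3: a single fit through all the points
n_groups_overall <- length(unique(layer_data(p_groups, 3)$group))

c(by_cyl = n_groups_by_cyl, overall = n_groups_overall)
```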

Modifying stat_smooth

->Standard error ribbons show the 95% confidence interval of smoothing models.

Explore the effect of the span argument on LOESS curves. Add three smooth LOESS stats, each without the standard error ribbon. Color the 1st one “red”; set its span to 0.9. Color the 2nd one “green”; set its span to 0.6. Color the 3rd one “blue”; set its span to 0.3.

Compare LOESS and linear regression smoothing on small regions of data. Add a smooth LOESS stat, without the standard error ribbon. Add a smooth linear regression stat, again without the standard error ribbon.

LOESS isn’t great on very short sections of data; compare the pieces of linear regression to LOESS over the whole thing. Amend the smooth LOESS stat to map color to a dummy variable, “All”.

Using Vocab, plot vocabulary vs. education, colored by year_group. Use geom_jitter() to add jittered points with transparency 0.25. Add a smooth linear regression stat (with the standard error ribbon).

It’s easier to read the plot if the standard error ribbons match the lines, and the lines have more emphasis. Update the smooth stat. Map the fill color to year_group. Set the line size to 2.

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Add 3 smooth LOESS stats, varying span & color
  stat_smooth(color="red", span=0.9, se=FALSE)+
  stat_smooth(color="green", span=0.6,se=FALSE)+
  stat_smooth(color="blue", span=0.3,se=FALSE)

###

# Amend the plot to color by fcyl
ggplot(mtcars, aes(x = wt, y = mpg, color = fcyl)) +
  geom_point() +
  # Add a smooth LOESS stat, no ribbon
   stat_smooth(se=FALSE)+
  # Add a smooth lin. reg. stat, no ribbon
  stat_smooth(method="lm",se=FALSE)

###

# Amend the plot
ggplot(mtcars, aes(x = wt, y = mpg, color = fcyl)) +
  geom_point() +
  # Map color to dummy variable "All"
  stat_smooth(se = FALSE, aes(color="All")) +
  stat_smooth(method = "lm", se = FALSE)

###

# Using Vocab, plot vocabulary vs. education, colored by year group
ggplot(Vocab, aes(x = education, y = vocabulary, color = year_group)) +
  # Add jittered points with transparency 0.25
  geom_jitter(alpha=0.25) +
  # Add a smooth lin. reg. line (with ribbon)
  stat_smooth(method="lm")

###

# Amend the plot
ggplot(Vocab, aes(x = education, y = vocabulary, color = year_group)) +
  geom_jitter(alpha = 0.25) +
  # Map the fill color to year_group, set the line size to 2
  stat_smooth(method = "lm", aes(fill=year_group), size=2)

Quantiles

Update the plot to add a quantile regression stat, at quantiles 0.05, 0.5, and 0.95.

Amend the plot to color according to year_group.

ggplot(Vocab, aes(x = education, y = vocabulary)) +
  geom_jitter(alpha = 0.25) +
  # Add a quantile stat, at 0.05, 0.5, and 0.95
  stat_quantile(quantiles=c(0.05, 0.5,  0.95))

###

# Amend the plot to color by year_group
ggplot(Vocab, aes(x = education, y = vocabulary, color=year_group)) +
  geom_jitter(alpha = 0.25) +
  stat_quantile(quantiles = c(0.05, 0.5, 0.95))

Using stat_sum

Run the code to see how jittering and transparency solve overplotting. Replace the jittered points with a sum stat, using stat_sum().

Modify the size aesthetic with the appropriate scale function. Add a scale_size() function to set the range from 1 to 10.

Inside stat_sum(), set size to ..prop.. so circle size represents the proportion of the whole dataset.

Update the plot to group by education, so that circle size represents the proportion of the group.

# Run this, look at the plot, then update it
ggplot(Vocab, aes(x = education, y = vocabulary)) +
  # Replace this with a sum stat
  stat_sum(alpha = 0.25)

###

ggplot(Vocab, aes(x = education, y = vocabulary)) +
  stat_sum() +
  # Add a size scale, from 1 to 10
  scale_size(range=c(1,10))

###

# Amend the stat to use proportion sizes
ggplot(Vocab, aes(x = education, y = vocabulary)) +
  stat_sum(aes(size = ..prop..))

###

# Amend the plot to group by education
ggplot(Vocab, aes(x = education, y = vocabulary, group = education)) +
  stat_sum(aes(size = ..prop..))
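
-> Note (my own sketch, not part of the DataCamp exercise): the two versions of prop can be worked out by hand in base R. The toy data frame below is made up and only stands in for Vocab. The first proportion divides each count by the total number of rows; the second divides by the number of rows in the same education group, which is what adding group = education is meant to do.

```r
# Hypothetical toy data standing in for Vocab (not the real dataset)
toy <- data.frame(
  education  = c(10, 10, 10, 12, 12, 12, 12, 12),
  vocabulary = c(5, 5, 6, 6, 6, 6, 7, 7)
)

# Count observations at each (education, vocabulary) location
counts <- aggregate(n ~ education + vocabulary,
                    data = cbind(toy, n = 1), FUN = sum)

# Proportion of the whole dataset (no grouping)
counts$prop_all <- counts$n / sum(counts$n)

# Proportion within each education group (like group = education)
counts$prop_group <- counts$n / ave(counts$n, counts$education, FUN = sum)

counts
```

The prop_all column sums to 1 over the whole table, while prop_group sums to 1 within each education value.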

Preparations

Using position_jitter(), position_dodge(), and position_jitterdodge(), define these position objects: posn_j will jitter with a width of 0.2; posn_d will dodge with a width of 0.1; posn_jd will jitter and dodge with a jitter.width of 0.2 and a dodge.width of 0.1.

Plot wt vs. fcyl, colored by fam. Assign this base layer to p_wt_vs_fcyl_by_fam. Plot the data using geom_point().

# Define position objects
# 1. Jitter with width 0.2
posn_j <- position_jitter(width=0.2)

# 2. Dodge with width 0.1
posn_d <- position_dodge(width=0.1)

# 3. Jitter-dodge with jitter.width 0.2 and dodge.width 0.1
posn_jd <- position_jitterdodge(jitter.width = 0.2, dodge.width=0.1)


###

# From previous step
posn_j <- position_jitter(width = 0.2)
posn_d <- position_dodge(width = 0.1)
posn_jd <- position_jitterdodge(jitter.width = 0.2, dodge.width = 0.1)

# Create the plot base: wt vs. fcyl, colored by fam
p_wt_vs_fcyl_by_fam <- ggplot(mtcars, aes(fcyl, wt, color=fam))

# Add a point layer
p_wt_vs_fcyl_by_fam +
  geom_point()

Using position objects

Apply the jitter position, posn_j, to the base plot.

Apply the dodge position, posn_d, to the base plot.

Apply the jitter-dodge position, posn_jd, to the base plot.

# Add jittering only
p_wt_vs_fcyl_by_fam +
  geom_point(position=posn_j)

###

# Add dodging only
p_wt_vs_fcyl_by_fam +
  geom_point(position=posn_d)

###

# Add jittering and dodging
p_wt_vs_fcyl_by_fam +
  geom_point(position=posn_jd)

Plotting variations

Add error bars representing the standard deviation. Set the data function to mean_sdl (without parentheses). To draw 1 standard deviation on each side of the mean, pass arguments to mean_sdl() by assigning them to fun.args as a list. Use posn_d to set the position.

The default geom for stat_summary() is “pointrange” which is already great. Update the summary stat to use an “errorbar” geom by assigning it to the geom argument.

Update the plot to add a summary stat of 95% confidence limits. Set the data function to mean_cl_normal (without parentheses). Again, use the dodge position.

p_wt_vs_fcyl_by_fam_jit +
  # Add a summary stat of std deviation limits
  stat_summary(position= posn_d, fun.data=mean_sdl,fun.args=list(mult=1))

###

p_wt_vs_fcyl_by_fam_jit +
  # Change the geom to be an errorbar
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), position = posn_d, geom="errorbar")


###

p_wt_vs_fcyl_by_fam_jit +
  # Add a summary stat of normal confidence limits
    stat_summary(position= posn_d, fun.data=mean_cl_normal)
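
-> Note (my own sketch, not part of the DataCamp exercise): as far as I understand, ggplot2 hands mean_sdl and mean_cl_normal over to the Hmisc package, and what they return is a mean plus lower and upper limits. The two helpers below are hypothetical base R stand-ins (the names mean_sd_limits and mean_ci_limits are mine) that show the arithmetic behind the two kinds of error bars.

```r
# Hypothetical helper: mean plus/minus mult standard deviations
mean_sd_limits <- function(x, mult = 1) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - mult * s, ymax = m + mult * s)
}

# Hypothetical helper: mean with a t-based confidence interval
mean_ci_limits <- function(x, conf = 0.95) {
  n <- length(x)
  m <- mean(x)
  half <- qt((1 + conf) / 2, df = n - 1) * sd(x) / sqrt(n)
  data.frame(y = m, ymin = m - half, ymax = m + half)
}

# Weights of the 4-cylinder cars as an example group
wt4 <- mtcars$wt[mtcars$cyl == 4]
sd_lim <- mean_sd_limits(wt4)
ci_lim <- mean_ci_limits(wt4)

sd_lim
ci_lim
```

Both share the same center; the confidence limits shrink as the group gets larger, while the standard deviation limits do not.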

2-Coordinates

Zooming In

Update the plot by adding (+) a continuous x scale with limits from 3 to 6. Spoiler: this will cause a problem!

Update the plot by adding a Cartesian coordinate system with x limits, xlim, from 3 to 6.

# Run the code, view the plot, then update it
ggplot(mtcars, aes(x = wt, y = hp, color = fam)) +
  geom_point() +
  geom_smooth() +
  # Add a continuous x scale from 3 to 6
  scale_x_continuous(limits=c(3,6))

### 

ggplot(mtcars, aes(x = wt, y = hp, color = fam)) +
  geom_point() +
  geom_smooth() +
  # Add Cartesian coordinates with x limits from 3 to 6
  coord_cartesian(xlim=c(3,6))
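
-> Note (my own sketch, not part of the DataCamp exercise): the "problem" with scale limits is that points outside the limits are thrown away before the smooth is computed, while coord_cartesian() only changes the viewing window. The base R code below emulates that difference with a plain linear model instead of the default smoother, and without the split by fam, just to keep it short.

```r
# Emulate scale_x_continuous(limits = c(3, 6)): rows outside the limits
# are dropped before the model is fitted
inside <- mtcars[mtcars$wt >= 3 & mtcars$wt <= 6, ]
fit_scale <- lm(hp ~ wt, data = inside)

# Emulate coord_cartesian(xlim = c(3, 6)): the model sees every row and
# only the visible window changes
fit_coord <- lm(hp ~ wt, data = mtcars)

# Rows the scale limits would discard
nrow(mtcars) - nrow(inside)

# The fitted lines differ because they are based on different data
coef(fit_scale)
coef(fit_coord)
```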

Aspect ratio I: 1:1 ratios

Add a fixed coordinate layer to force a 1:1 aspect ratio.

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE) +
  # Fix the coordinate ratio
  coord_fixed(1)

Aspect ratio II: setting ratios

Fix the coordinates to a 1:1 aspect ratio.

The y axis is now unreadably small. Make it bigger! Change the aspect ratio to 20:1. This is the aspect ratio recommended by Cleveland to help make the trend among oscillations easiest to see.

# Fix the aspect ratio to 1:1
sun_plot +
  coord_fixed(1)

###

# Change the aspect ratio to 20:1
sun_plot +
  coord_fixed(ratio=20)

Expand and clip

Add Cartesian coordinates with zero expansion, to remove all buffer margins on both the x and y axes.

Setting expand to 0 caused points at the edge of the plot panel to be cut off. Set the clip argument to “off” to prevent this. Remove the axis lines by setting the axis.line argument to element_blank() in the theme() layer function.

ggplot(mtcars, aes(wt, mpg)) +
  geom_point(size = 2) +
  # Add Cartesian coordinates with zero expansion
  coord_cartesian(expand=0) +
  theme_classic()

###

ggplot(mtcars, aes(wt, mpg)) +
  geom_point(size = 2) +
  # Turn clipping off
  coord_cartesian(expand = 0, clip="off") +
  theme_classic() +
  # Remove axis lines
  theme(
    axis.line=element_blank()
  )

Log-transforming scales

Using the msleep dataset, plot the raw values of brainwt against bodywt values as a scatter plot.

Add the scale_x_log10() and scale_y_log10() layers with default values to transform the data before plotting.

Use coord_trans() to apply a “log10” transformation to both the x and y scales.

# Produce a scatter plot of brainwt vs. bodywt
ggplot(msleep, aes(bodywt, brainwt)) +
  geom_point() +
  ggtitle("Raw Values")

###

# Add scale_*_*() functions
ggplot(msleep, aes(bodywt, brainwt)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  ggtitle("Scale_ functions")

###

# Perform a log10 coordinate system transformation
ggplot(msleep, aes(bodywt, brainwt)) +
  geom_point() +
  coord_trans(x="log10", y="log10")

Adding stats to transformed scales

Add log10 transformed scales to the x and y axes.

Add a log10 coordinate transformation for both the x and y axes. Do you notice the difference between the two plots?

# Plot with a scale_*_*() function:
ggplot(msleep, aes(bodywt, brainwt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  # Add a log10 x scale
  scale_x_log10() +
  # Add a log10 y scale
  scale_y_log10() +
  ggtitle("Scale functions")

###

# Plot with transformed coordinates
ggplot(msleep, aes(bodywt, brainwt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  # Add a log10 coordinate transformation for x and y axes
  coord_trans(x="log10", y="log10")
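
-> Note (my own sketch, not part of the DataCamp exercise): the difference between the two plots comes from when the transformation happens. With the scale functions the data are log-transformed before the linear model is fitted; with coord_trans() the model is fitted to the raw values and the finished line is transformed afterwards, so it shows up as a curve. The two fits below approximate what geom_smooth(method = "lm") works with in each case. It assumes ggplot2 is installed so that msleep is available.

```r
# Assumption: msleep comes from the ggplot2 package; rows with missing
# brainwt are dropped here by hand
d <- as.data.frame(ggplot2::msleep)
d <- d[!is.na(d$brainwt), c("bodywt", "brainwt")]

# Scale functions: transform first, then fit (linear in log-log space)
fit_scale <- lm(log10(brainwt) ~ log10(bodywt), data = d)

# coord_trans(): fit on the raw values, transform the drawn line afterwards
fit_coord <- lm(brainwt ~ bodywt, data = d)

coef(fit_scale)
coef(fit_coord)
```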

Useful double axes

Begin with a standard line plot, of Temp described by Date in the airquality dataset.

Convert y_breaks from Fahrenheit to Celsius (subtract 32, then multiply by 5, then divide by 9). Define the secondary y-axis using sec_axis(). Use the identity transformation. Set the breaks and labels to the defined objects y_breaks and y_labels, respectively.

# Using airquality, plot Temp vs. Date
ggplot(airquality, aes(Date, Temp)) +
  # Add a line layer
  geom_line() +
  labs(x = "Date (1973)", y = "Fahrenheit")

###

# Define breaks (Fahrenheit)
y_breaks <- c(59, 68, 77, 86, 95, 104)

# Convert y_breaks from Fahrenheit to Celsius
y_labels <- ((y_breaks-32)*5)/9

# Create a secondary y-axis
secondary_y_axis <- sec_axis(
  # Use identity transformation
  trans = identity,
  name = "Celsius",
  # Define breaks and labels as above
  breaks = y_breaks,
  labels = y_labels
)

# Examine the object
secondary_y_axis
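
-> Note (my own sketch, not part of the DataCamp exercise): the code above only builds the secondary axis object. One way to attach it to the line plot is through the sec.axis argument of scale_y_continuous(). The Date column is not in base R's airquality, so I rebuild it from Month and Day here (an assumption about how DataCamp prepared the data). The y limits of 50 to 104 are my own choice so that every break falls inside the axis.

```r
library(ggplot2)

# Assumption: rebuild a Date column from Month and Day, using the year 1973
aq <- airquality
aq$Date <- as.Date(paste(1973, aq$Month, aq$Day, sep = "-"))

# Breaks in Fahrenheit and their Celsius labels
y_breaks <- c(59, 68, 77, 86, 95, 104)
y_labels <- (y_breaks - 32) * 5 / 9

# Identity transformation, passed positionally as a formula
secondary_y_axis <- sec_axis(~ ., name = "Celsius",
                             breaks = y_breaks, labels = y_labels)

temp_plot <- ggplot(aq, aes(Date, Temp)) +
  geom_line() +
  # Attach the secondary axis to the primary y scale
  scale_y_continuous(limits = c(50, 104), sec.axis = secondary_y_axis) +
  labs(x = "Date (1973)", y = "Fahrenheit")

temp_plot
```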

Flipping axes I and II

Create a side-by-side (“dodged”) bar chart of fcyl, filled according to fam.

To get horizontal bars, add a coord_flip() function.

Partially overlapping bars are popular with “infoviz” in magazines. Update the position argument to use position_dodge() with a width of 0.5.

Create a scatter plot of wt versus car using the mtcars dataset. We’ll flip the axes in the next step.

It would be easier to read if car was mapped to the y axis. Flip the coordinates. Notice that the labels also get flipped!

# Plot fcyl bars, filled by fam
ggplot(mtcars, aes(fcyl, fill = fam)) +
  # Place bars side by side
  geom_bar(position = "dodge")

###

ggplot(mtcars, aes(fcyl, fill = fam)) +
  geom_bar(position = "dodge") +
  # Flip the x and y coordinates
  coord_flip()

###

ggplot(mtcars, aes(fcyl, fill = fam)) +
  # Set a dodge width of 0.5 for partially overlapping bars
  geom_bar(position = position_dodge(width=0.5)) +
  coord_flip()

###

# Plot of wt vs. car
ggplot(mtcars, aes(car,wt)) +
  # Add a point layer
  geom_point() +
  labs(x = "car", y = "weight")

###

# Flip the axes to set car to the y axis
ggplot(mtcars, aes(car, wt)) +
  geom_point() +
  labs(x = "car", y = "weight") +
  coord_flip()

Pie charts

Run the code to see the stacked bar plot. Add (+) a polar coordinate system, mapping the angle to the y variable by setting theta to “y”.

Reduce the width of the bars to 0.1. Make it a ring plot by adding a continuous x scale with limits from 0.5 to 1.5.

# Run the code, view the plot, then update it
ggplot(mtcars, aes(x = 1, fill = fcyl)) +
  geom_bar()+
  # Add a polar coordinate system
  coord_polar(theta="y")

###

ggplot(mtcars, aes(x = 1, fill = fcyl)) +
  # Reduce the bar width to 0.1
  geom_bar(width=0.1) +
  coord_polar(theta = "y") +
  # Add a continuous x scale from 0.5 to 1.5
  scale_x_continuous(limits=c(0.5,1.5))

Wind rose plots

Make a classic bar plot mapping wd onto the x aesthetic and ws onto fill. Use a geom_bar() layer, since we want to aggregate over all date values, and set the width argument to 1, to eliminate any spaces between the bars.

Convert the Cartesian coordinate space into a polar coordinate space with coord_polar().

Set the start argument to -pi/16 to position North at the top of the plot.

# Using wind, plot wd filled by ws
ggplot(wind, aes(x=wd, fill=ws)) +
  # Add a bar layer with width 1
  geom_bar(width=1)

###

# Convert to polar coordinates:
ggplot(wind, aes(wd, fill = ws)) +
  geom_bar(width = 1) +
  coord_polar()

###

# Rotate the start so North is at the top:
ggplot(wind, aes(wd, fill = ws)) +
  geom_bar(width = 1) +
  coord_polar(start = -pi/16)

3-Facets

Facet layer basics

Facet the plot in a grid, with each am value in its own row.

Facet the plot in a grid, with each cyl value in its own column.

Facet the plot in a grid, with each am value in its own row and each cyl value in its own column.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point() +
  # Facet rows by am
  facet_grid(rows=vars(am))

###

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point() +
  # Facet columns by cyl
  facet_grid(cols = vars(cyl))

###

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point() +
  # Facet rows by am and columns by cyl
  facet_grid(rows = vars(am), cols = vars(cyl))

Many variables

Map fcyl_fam onto the color aesthetic. Add a scale_color_brewer() layer and set “Paired” as the palette.

Map disp, the displacement volume from each cylinder, onto the size aesthetic.

Add a facet_grid() layer, faceting the plot according to gear on rows and vs on columns.

# See the interaction column
mtcars$fcyl_fam

# Color the points by fcyl_fam
ggplot(mtcars, aes(x = wt, y = mpg, color = fcyl_fam)) +
  geom_point() +
  # Use a paired color palette
  scale_color_brewer(palette = "Paired")

###

# Update the plot to map disp to size
ggplot(mtcars, aes(x = wt, y = mpg, color = fcyl_fam, size=disp)) +
  geom_point() +
  scale_color_brewer(palette = "Paired")

###

# Update the plot
ggplot(mtcars, aes(x = wt, y = mpg, color = fcyl_fam, size = disp)) +
  geom_point() +
  scale_color_brewer(palette = "Paired") +
  # Grid facet on gear and vs
  facet_grid(rows = vars(gear), cols = vars(vs))

Formula notation

I found the table they included more helpful than the tasks. I’ll include the tasks below, just in case.

Modern notation	Formula notation
facet_grid(rows = vars(A))	facet_grid(A ~ .)
facet_grid(cols = vars(B))	facet_grid(. ~ B)
facet_grid(rows = vars(A), cols = vars(B))	facet_grid(A ~ B)

Facet the plot in a grid, with each am value in its own row.

Facet the plot in a grid, with each cyl value in its own column.

Facet the plot in a grid, with each am value in its own row and each cyl value in its own column.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point() +
  # Facet rows by am using formula notation
  facet_grid(am~.)

###

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point() +
  # Facet columns by cyl using formula notation
  facet_grid(.~cyl)

###

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point() +
  # Facet rows by am and columns by cyl using formula notation
  facet_grid(am~cyl)

Labeling facets

Add a facet_grid() layer and facet cols according to the cyl using vars(). There is no labeling.

Apply label_both to the labeller argument and check the output.

Apply label_context to the labeller argument and check the output.

In addition to label_context, let’s facet by one more variable: vs.

# Plot wt by mpg
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # The default is label_value
  facet_grid(cols = vars(cyl))

###

# Plot wt by mpg
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # Displaying both the values and the variables
  facet_grid(cols = vars(cyl), labeller = label_both)

###

# Plot wt by mpg
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # Label context
  facet_grid(cols = vars(cyl), labeller = label_context)

###

# Plot wt by mpg
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # Two variables
  facet_grid(cols = vars(vs, cyl), labeller = label_context)

Setting order

Explicitly label the 0 and 1 values of the am column as “automatic” and “manual”, respectively.

Define a specific order using separate levels and labels arguments. Recall that 1 is “manual” and 0 is “automatic”.

# Make factor, set proper labels explicitly
mtcars$fam <- factor(mtcars$am, labels = c(`0` = "automatic",
                                           `1` = "manual"))

# Default order follows the factor levels (0, then 1), which here is also alphabetical
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_grid(cols = vars(fam))

###

# Make factor, set proper labels explicitly, and
# manually set the label order
mtcars$fam <- factor(mtcars$am,
                     levels = c(1, 0),
                     labels = c("manual", "automatic"))

# View again
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_grid(cols = vars(fam))
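
-> Note (my own sketch, not part of the DataCamp exercise): the facet order comes straight from the factor levels, so it can be checked without plotting. The object names fam_default and fam_ordered are mine.

```r
am <- mtcars$am

# Labels only: levels keep the sorted order of the underlying values (0, 1)
fam_default <- factor(am, labels = c("automatic", "manual"))

# Levels and labels together: the order is whatever levels says (1, 0)
fam_ordered <- factor(am, levels = c(1, 0), labels = c("manual", "automatic"))

levels(fam_default)
levels(fam_ordered)

# Each car keeps the same label either way; only the level order changes
table(fam_default, fam_ordered)
```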

Variable plotting spaces I: continuous variables

Update the plot to facet columns by cyl.

Update the faceting to free the x-axis scales.

Facet rows by cyl (rather than columns), and free the y-axis scales instead of the x-axis scales.

ggplot(mtcars, aes(wt, mpg)) +
  geom_point() + 
  # Facet columns by cyl 
  facet_grid(cols=vars(cyl))

###

ggplot(mtcars, aes(wt, mpg)) +
  geom_point() + 
  # Update the faceting to free the x-axis scales
  facet_grid(cols = vars(cyl), scales="free_x")

###

ggplot(mtcars, aes(wt, mpg)) +
  geom_point() + 
  # Swap cols for rows; free the y-axis scales
  facet_grid(rows = vars(cyl), scales = "free_y")

Variable plotting spaces II: categorical variables

Facet the plot by rows according to gear using vars(). Notice that every car is listed in every facet, resulting in many lines without data.

To remove blank lines, set the scales and space arguments in facet_grid() to free_y.

ggplot(mtcars, aes(x = mpg, y = car, color = fam)) +
  geom_point() +
  # Facet rows by gear
  facet_grid(rows=vars(gear))

###

ggplot(mtcars, aes(x = mpg, y = car, color = fam)) +
  geom_point() +
  # Free the y scales and space
  facet_grid(rows = vars(gear), scales="free_y", space= "free_y")

Wrapping for many levels

Add a facet_wrap() layer and specify the year variable as an argument using the vars() function.

Add a facet_wrap() layer and specify the year variable with a formula notation (~).

Add a facet_wrap() layer using formula notation as before, with ncol set to 11.

ggplot(Vocab, aes(x = education, y = vocabulary)) +
  stat_smooth(method = "lm", se = FALSE) +
  # Create facets, wrapping by year, using vars()
  facet_wrap(vars(year))

###

ggplot(Vocab, aes(x = education, y = vocabulary)) +
  stat_smooth(method = "lm", se = FALSE) +
  # Create facets, wrapping by year, using a formula
  facet_wrap(~ year)

###

ggplot(Vocab, aes(x = education, y = vocabulary)) +
  stat_smooth(method = "lm", se = FALSE) +
  # Update the facet layout, using 11 columns
  facet_wrap(~ year, ncol=11)

Margin plots

Update the plot to facet the rows by fvs and fam, and columns by gear.

Add all possible margins to the plot.

Update the facets to only show margins on “fam”.

Update the facets to only show margins on “gear” and “fvs”.

ggplot(mtcars, aes(x = wt, y = mpg)) + 
  geom_point() +
  # Facet rows by fvs and fam, and cols by gear
  facet_grid(rows = vars(fvs, fam), cols = vars(gear))

###

ggplot(mtcars, aes(x = wt, y = mpg)) + 
  geom_point() +
  # Update the facets to add margins
  facet_grid(rows = vars(fvs, fam), cols = vars(gear), margins = TRUE)

###

ggplot(mtcars, aes(x = wt, y = mpg)) + 
  geom_point() +
  # Update the facets to only show margins on fam
  facet_grid(rows = vars(fvs, fam), cols = vars(gear), margins = "fam")

###

ggplot(mtcars, aes(x = wt, y = mpg)) + 
  geom_point() +
  # Update the facets to only show margins on gear and fvs
  facet_grid(rows = vars(fvs, fam), cols = vars(gear), margins = c("gear" , "fvs"))

4-Best Practices

Bar plots: dynamite plots

Using mtcars, plot wt versus fcyl. Add a bar summary stat, aggregating the wts by their mean, filling the bars in a skyblue color. Add an errorbar summary stat, aggregating the wts by mean_sdl.

# Plot wt vs. fcyl
ggplot(mtcars, aes(x = fcyl, y = wt)) +
  # Add a bar summary stat of means, colored skyblue
  stat_summary(fun.y = mean, geom = "bar", fill = "skyblue") +
  # Add an errorbar summary stat std deviation limits
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.1)

Bar plots: position dodging

Add two more aesthetics so the bars are colored and filled by fam.

The stacked bars are tricky to interpret. Make them transparent and side-by-side. Make the bar summary statistic transparent by setting alpha to 0.5. For each of the summary statistics, set the bars’ position to “dodge”.

The error bars are incorrectly positioned. Use a position object. Define a dodge position object with width 0.9, assigned to posn_d. For each of the summary statistics, set the bars’ position to posn_d.

# Update the aesthetics to color and fill by fam
ggplot(mtcars, aes(x = fcyl, y = wt, color=fam, fill=fam)) +
  stat_summary(fun.y = mean, geom = "bar") +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.1)

###

# Set alpha for the first and set position for each stat summary function
ggplot(mtcars, aes(x = fcyl, y = wt, color = fam, fill = fam)) +
  stat_summary(fun.y = mean, geom = "bar", alpha = 0.5, position= "dodge") +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", position= "dodge", width = 0.1)

###

# Define a dodge position object with width 0.9
posn_d <- position_dodge(width=0.9)

# For each summary stat, update the position to posn_d
ggplot(mtcars, aes(x = fcyl, y = wt, color = fam, fill = fam)) +
  stat_summary(fun.y = mean, geom = "bar", position = posn_d, alpha = 0.5) +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), width = 0.1, position = posn_d, geom = "errorbar")

Bar plots: Using aggregated data

Draw a bar plot with geom_bar(). Using mtcars_by_cyl, plot mean_wt versus cyl. Add a bar layer, with stat set to “identity” and fill color “skyblue”.

Draw the same plot with geom_col(). Replace geom_bar() with geom_col(). Remove the stat argument.

Change the bar widths to reflect the proportion of data they contain. Add a width aesthetic to geom_col(), set to prop. (Ignore the warning from ggplot2.)

Add geom_errorbar(). Set the ymin aesthetic to mean_wt minus sd_wt. Set the ymax aesthetic to the mean weight plus the standard deviation of the weight. Set the width to 0.1.

# Using mtcars_by_cyl, plot mean_wt vs. cyl
ggplot(mtcars_by_cyl, aes(cyl, mean_wt)) +
  # Add a bar layer with identity stat, filled skyblue
  geom_bar(stat="identity", fill="skyblue")

###

ggplot(mtcars_by_cyl, aes(x = cyl, y = mean_wt)) +
  # Swap geom_bar() for geom_col()
  geom_col(fill = "skyblue")

###

ggplot(mtcars_by_cyl, aes(x = cyl, y = mean_wt)) +
  # Set the width aesthetic to prop
  geom_col(aes(width = prop), fill = "skyblue")

###

ggplot(mtcars_by_cyl, aes(x = cyl, y = mean_wt)) +
  geom_col(aes(width = prop), fill = "skyblue") +
  # Add an errorbar layer
  geom_errorbar(
    # ... at mean weight plus or minus 1 std dev
    aes(ymin=mean_wt - sd_wt, ymax=mean_wt+sd_wt),
    # with width 0.1
    width=0.1
   )
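
-> Note (my own sketch, not part of the DataCamp exercise): mtcars_by_cyl was already loaded in the exercise, so it isn't defined anywhere in these notes. Below is one way it could be rebuilt in base R. The column names mean_wt, sd_wt and prop are taken from the plotting code above; n_wt is a helper column name I picked for the group sizes.

```r
# Split the weights by number of cylinders
by_cyl <- split(mtcars$wt, mtcars$cyl)

# One row per cylinder group: mean, standard deviation, and count of wt
mtcars_by_cyl <- data.frame(
  cyl     = as.numeric(names(by_cyl)),
  mean_wt = sapply(by_cyl, mean),
  sd_wt   = sapply(by_cyl, sd),
  n_wt    = sapply(by_cyl, length)   # hypothetical helper column
)

# Share of all cars in each cylinder group, used for the bar widths
mtcars_by_cyl$prop <- mtcars_by_cyl$n_wt / sum(mtcars_by_cyl$n_wt)

mtcars_by_cyl
```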

Heat maps

Using barley, plot variety versus year, filled by yield. Add a geom_tile() layer.

Add a facet_wrap() function with facets as vars(site) and ncol = 1. Strip names will be above the panels, not to the side (as with facet_grid()). Give the heat maps a 2-color palette using scale_fill_gradient(). Set low and high to “white” and “red”, respectively.

A color palette of 9 reds, made with brewer.pal(), is provided as red_brewer_palette. Update the fill scale to use an n-color gradient with scale_fill_gradientn() (note the n). Set the scale colors to the red brewer palette.

# Using barley, plot variety vs. year, filled by yield
ggplot(barley, aes(year, variety, fill=yield)) +
  # Add a tile geom
  geom_tile()

###

# Previously defined
ggplot(barley, aes(x = year, y = variety, fill = yield)) +
  geom_tile() + 
  # Facet, wrapping by site, with 1 column
  facet_wrap(facets = vars(site), ncol = 1) +
  # Add a fill scale using a 2-color gradient
  scale_fill_gradient(low = "white", high = "red")

###

# A palette of 9 reds (brewer.pal() comes from RColorBrewer)
library(RColorBrewer)
red_brewer_palette <- brewer.pal(9, "Reds")

# Update the plot
ggplot(barley, aes(x = year, y = variety, fill = yield)) +
  geom_tile() + 
  facet_wrap(facets = vars(site), ncol = 1) +
  # Update scale to use n-colors from red_brewer_palette
  scale_fill_gradientn(colors=red_brewer_palette)

Heat map alternatives

Using barley, plot yield versus year, colored and grouped by variety. Add a line layer. Facet, wrapping by site, with 1 row.

Display only means and ribbons for spread. Map site onto color, group and fill. Add a stat_summary() layer. Set fun.y = mean and geom = “line”. In the second stat_summary(), set geom = “ribbon”, color = NA and alpha = 0.1.

# The heat map we want to replace
# Don't remove, it's here to help you!
ggplot(barley, aes(x = year, y = variety, fill = yield)) +
  geom_tile() +
  facet_wrap( ~ site, ncol = 1) +
  scale_fill_gradientn(colors = brewer.pal(9, "Reds"))

# Using barley, plot yield vs. year, colored and grouped by variety
ggplot(barley, aes(x = year, y = yield, color=variety, group = variety))  +
  # Add a line layer
  geom_line() +
  # Facet, wrapping by site, with 1 row
  facet_wrap( ~ site, nrow = 1)

###


# Using barley, plot yield vs. year, colored, grouped, and filled by site
ggplot(barley, aes(x = year, y = yield, color = site, group = site, fill = site)) +
  # Add a line summary stat aggregated by mean
  stat_summary(fun.y = mean, geom = "line") +
  # Add a ribbon summary stat with 10% opacity, no color
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "ribbon", alpha = 0.1, color = NA)

Typical problems

The first plot contains purposely illegible labels. It’s a common problem that can occur when resizing plots. There is also too much non-data ink. Change theme_gray(3) to theme_classic().

Our previous plot still has a major problem: dose is stored as a factor variable. That’s why the spacing is off between the levels. Use as.character() wrapped in as.numeric() to convert the factor variable to real (continuous) numbers.

Use the appropriate geometry for the data: In the new stat_summary() function, set fun.y to mean to calculate the mean and the geom to “line” to connect the points at their mean values.

Make sure the labels are informative: Add the units “(mg/day)” and “(mean, standard deviation)” to the x and y labels, respectively. Use the “Set1” palette. Set the legend labels to “Orange juice” and “Ascorbic acid”.

# Initial plot
growth_by_dose <- ggplot(TG, aes(dose, len, color = supp)) +
  stat_summary(fun.data = mean_sdl,
               fun.args = list(mult = 1),
               position = position_dodge(0.1)) +
  theme_classic()

# View plot
growth_by_dose


###


# Change type
TG$dose <- as.numeric(as.character(TG$dose))

# Plot
growth_by_dose <- ggplot(TG, aes(dose, len, color = supp)) +
  stat_summary(fun.data = mean_sdl,
               fun.args = list(mult = 1),
               position = position_dodge(0.2)) +
  theme_classic()

# View plot
growth_by_dose


###


# Change type
TG$dose <- as.numeric(as.character(TG$dose))

# Plot
growth_by_dose <- ggplot(TG, aes(dose, len, color = supp)) +
  stat_summary(fun.data = mean_sdl,
               fun.args = list(mult = 1),
               position = position_dodge(0.2)) +
  # Use the right geometry
  stat_summary(fun.y = mean,
               geom = "line",
               position = position_dodge(0.1)) +
  theme_classic()

# View plot
growth_by_dose


###


# Change type
TG$dose <- as.numeric(as.character(TG$dose))

# Plot
growth_by_dose <- ggplot(TG, aes(dose, len, color = supp)) +
  stat_summary(fun.data = mean_sdl,
               fun.args = list(mult = 1),
               position = position_dodge(0.2)) +
  stat_summary(fun.y = mean,
               geom = "line",
               position = position_dodge(0.1)) +
  theme_classic() +
  # Adjust labels and colors:
  labs(x = "Dose (mg/day)", y = "Odontoblasts length (mean, standard deviation)", color = "Supplement") +
  scale_color_brewer(palette = "Set1", labels = c("Orange juice" , "Ascorbic acid")) +
  scale_y_continuous(limits = c(0,35), breaks = seq(0, 35, 5), expand = c(0,0))

# View plot
growth_by_dose
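
-> Note (my own sketch, not part of the DataCamp exercise): the reason for wrapping as.character() inside as.numeric() is that as.numeric() on a factor returns the internal level codes, not the values printed as labels. I'm assuming here that TG is a copy of ToothGrowth with dose turned into a factor.

```r
# Assumption: TG is a copy of ToothGrowth with dose stored as a factor
TG <- ToothGrowth
TG$dose <- factor(TG$dose)

# Pitfall: as.numeric() on a factor returns the internal level codes
codes <- as.numeric(TG$dose)

# Going through as.character() first recovers the real doses
doses <- as.numeric(as.character(TG$dose))

unique(codes)
unique(doses)
```

The codes are evenly spaced, which is why the factor version of the plot spaces the doses evenly; the recovered doses are not.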

Datacamp for 502

Elana Greenberg

10/4/2021