Introduction to ggplot2: Part 2 (Bars)

Content Reference:

This lab reference practice problems from “R for Data Science” - Chapter 3: Data Visualisation https://r4ds.had.co.nz/data-visualisation.html

In this lab we will discuss and apply:

Position Adjustments (for bars)
Geometric Objects

Example 1: Diamonds

First, call the tidyverse package

library(tidyverse)

The diamonds dataset is built into the ggplot2 package.

Prices of over 50,000 round cut diamonds

Description: A dataset containing the prices and other attributes of almost 54,000 diamonds. The variables are as follows:

?diamonds

A. Learn About the Data

What kinds of variables are we working with?

# look at the structure of the data

B. Basic Bar Graph

Let’s start with a basic bar graph. This is a univariate graphical tool for the relative frequency of categorical data.

ggplot(data=diamonds)+
  geom_bar(aes(x=cut))

1. Playing with Color vs Fill

Color

# Apply cut to the color aesthetic

Does the output of the graphic above match what you imagined? Why or why not?

Fill

Now try changing the aesthetic to fill.

# Apply cut to the fill aesthetic

Does this match what you were hoping for?

2. Color Palette

What do you notice about this color palette?

Hint: How is it different than the following example using the mpg dataset:

ggplot(data=mpg)+
  geom_bar(aes(x=class, fill=class))

C. Position Adjustment

1. Stacked Bar Graphs

By default, when we use a different variable to add color to a bar graph than the frequency variable, R creates a stacked bar graph

# Look at the frequency of cut 
# Apply clarity to the fill aesthetic

When would it be useful to use a graphic like this?
When might it not be a good choice? Why?

2. Filled Bar Graphs

When we are interested in comparing proportions across, we can use the `position=“fill”’ argument.

# add position="fill" to your graph above

3. Side-by-side Bar Graphs

When we are interested in comparing counts across groups, we can use the `position=“dodge”’ argument.

# add position="dodge" to your graph above

When would it be useful to use a graphic like this?
When might it not be a good choice? Why?

Example 2: FiveThiryEight

A. Read the Article

“Voter Registrations Are Way, Way Down During the Pandemic” (Jun 26, 2020) by Kaleigh Rogers and Nathaniel Rakich

https://fivethirtyeight.com/features/voter-registrations-are-way-way-down-during-the-pandemic/

B. Discuss in Small Groups

How are graphics used to tell the author’s story?
What geometries are used?

C. The Data

What does the raw data look like?

# Import data
vreg<-read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/voter-registration/new-voter-registrations.csv",
               header=TRUE)


head(vreg)

##   Jurisdiction Year Month New.registered.voters
## 1      Arizona 2016   Jan                 25852
## 2      Arizona 2016   Feb                 51155
## 3      Arizona 2016   Mar                 48614
## 4      Arizona 2016   Apr                 30668
## 5      Arizona 2020   Jan                 33229
## 6      Arizona 2020   Feb                 50853

D. Processing the Data

Relevel

Relevel the data so that its in the right order:

# Level the Month variable so that its in the right order (ie not alphabetical)
vreg$Month<-factor(vreg$Month,
                   levels=c("Jan", "Feb", "Mar", "Apr", "May"))

Tidy

### USE spread() FROM tidyr
vregYear<-vreg%>%
  spread(Year, New.registered.voters)

### RENAME THE COLUMNS
colnames(vregYear)<-c("Jurisdiction", "Month", "Y2016", "Y2020")

Mutate

Add change in registration.

### mutate() FROM dplyr()
vregChange<-vregYear%>%
  mutate(change=Y2020-Y2016)

E. Recreate the graphic

# type code/answer here

Other hints

You can add another column to define color
Pay careful attention to the axes, you might want to read the help file for facet_wrap