0. Publishing your project to RPubs.

  1. After you have created your project handout, knitted it as a PDF, and submitted it to Moodle, re-knit the file to HTML format.

  2. In the upper right corner of the html output file you will see a blue publish icon. Click on the icon to publish your results at rpubs.com.

  3. You will need to set up a personal account name and password at RPubs.

  4. Give your document a name and a brief description.

1. Use U.S. 2016 general election data (county_data) to illustrate graphical layering.

dim(county_data)
## [1] 3195   32
str(county_data[1:9])
## 'data.frame':    3195 obs. of  9 variables:
##  $ id           : chr  "0" "01000" "01001" "01003" ...
##  $ name         : chr  NA "1" "Autauga County" "Baldwin County" ...
##  $ state        : Factor w/ 51 levels "AK","AL","AR",..: NA 2 2 2 2 2 2 2 2 2 ...
##  $ census_region: Factor w/ 4 levels "Midwest","Northeast",..: NA 3 3 3 3 3 3 3 3 3 ...
##  $ pop_dens     : Factor w/ 7 levels "[    0,   10)",..: 3 3 3 4 2 2 3 2 2 4 ...
##  $ pop_dens4    : Factor w/ 4 levels "[  0,   17)",..: 3 3 3 4 2 2 3 2 2 4 ...
##  $ pop_dens6    : Factor w/ 6 levels "[  0,    9)",..: 5 5 5 5 3 3 5 2 3 5 ...
##  $ pct_black    : Factor w/ 7 levels "[ 0.0, 2.0)",..: 4 6 5 3 6 5 1 7 6 5 ...
##  $ pop          : int  318857056 4849377 55395 200111 26887 22506 57719 10764 20296 115916 ...
county_data %>%
  select(id, name, state, pop, black, partywinner16) %>% 
  sample_n(5)
##      id              name state   pop black partywinner16
## 1 21231      Wayne County    KY 20486   1.8    Republican
## 2 16059      Lemhi County    ID  7726   0.4    Republican
## 3 37023      Burke County    NC 89486   6.8    Republican
## 4 13265 Taliaferro County    GA  1693  58.6      Democrat
## 5 38019   Cavalier County    ND  3855   0.4    Republican

2. Create a scatterplot with the percent of black residents by population for US counties in 2016 and highlight whether the county flipped or not.

Layer 1. Create a base layer of points with the percent black (y = black/100) by population (x = pop) for counties that did not flip in 2016.

p0 <- ggplot(data = subset(county_data, flipped == "No"),
             aes(x = pop, y = black/100))

p1 <- p0 + geom_point(alpha = 0.15, color = "gray50")

p1
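
Because each point layer depends on how subset() splits the rows, a small check can make that concrete. The sketch below uses a made-up data frame (toy and its values are hypothetical, not taken from county_data). It shows that subset() keeps only rows where the condition is TRUE, so a row with a missing flipped value would appear in neither the "No" nor the "Yes" subset.

```r
# Hypothetical toy data standing in for county_data (values are made up).
toy <- data.frame(
  name    = c("A County", "B County", "C County", "D County"),
  flipped = c("No", "Yes", NA, "No"),
  black   = c(12.5, 30.2, 4.0, 58.6)
)

# Rows for the gray base layer: only counties that did not flip.
not_flipped <- subset(toy, flipped == "No")

# Rows for a highlighted layer: only counties that did flip.
did_flip <- subset(toy, flipped == "Yes")

# subset() treats an NA condition as FALSE, so the NA row is dropped from both.
nrow(not_flipped)  # 2
nrow(did_flip)     # 1

# Proportion scale used for the y aesthetic (black / 100).
not_flipped$black / 100
```

Running the same kind of row count on the real data is a quick way to confirm how many counties each layer will draw.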

Layer 2. Add a second layer of points for counties that did flip in 2016.

*Add commas to the large population numbers to aid readability.

party_colors <- c("blue", "red")

p2 <- p1 + geom_point(data = subset(county_data, flipped == "Yes"),
                      aes(x = pop, y = black/100, color = partywinner16)) +
  scale_color_manual(values = party_colors) + 
  scale_x_log10(labels=scales::comma)

p2
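
To see what the log scale and the comma labels each contribute to the x-axis, the following sketch applies them outside the plot. The population values are made up for illustration. scales::comma() is the same labeling function passed to scale_x_log10() above, and log10() gives the positions the points take on the transformed axis.

```r
# Hypothetical population values spanning several orders of magnitude.
pops <- c(1000, 100000, 1000000)

# scales::comma() formats numbers with thousands separators.
pop_labels <- scales::comma(pops)
pop_labels  # "1,000" "100,000" "1,000,000"

# On a log10 axis, every tenfold increase in population is one equal step.
pop_positions <- log10(pops)
pop_positions  # 3 5 6
```

The log transformation spreads out the many small counties that would otherwise be crowded against the left edge of a linear axis.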

Layer 3. Transform the y-axis scale to reflect percentages. Add axes labels, title, and caption.

p3 <- p2 + scale_y_continuous(labels = scales::percent) +
  labs(color = "County flipped to ... ",
       x = "County Population (log scale)",
       y = "Percent Black Population",
       title = "Flipped counties, 2016",
       caption = "Counties in gray did not flip.")

p3
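
The y aesthetic uses black/100, which fits with how scales::percent() works: it multiplies its input by 100 before appending the percent sign. A minimal sketch with made-up values follows. The accuracy argument is set here only to fix the number of decimals in this illustration; the plot above leaves it at the default.

```r
# Hypothetical values of `black`, recorded on a 0-100 scale.
black_pct <- c(1.8, 25, 58.6)

# Convert to proportions, as the y aesthetic does with black / 100.
black_prop <- black_pct / 100

# scales::percent() multiplies by 100 and adds "%".
pct_labels <- scales::percent(black_prop, accuracy = 0.1)
pct_labels  # "1.8%" "25.0%" "58.6%"
```

Passing the 0-100 values directly would instead yield labels such as 5,860%, which is why the division by 100 matters.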

Layer 4. If a flipped county’s black population exceeds 25%, add a state annotation.

library(ggrepel)  # provides geom_text_repel()

p4 <- p3 + geom_text_repel(data = subset(county_data, flipped == "Yes" & black > 25),
                           aes(label = state), size = 3)

p4 + theme(legend.position = "top")

What theory do these data patterns support, if any?