After you have created your project handout, knitted it as a PDF, and submitted it to Moodle, re-knit the file to HTML format.
In the upper right corner of the html output file you will see a blue publish icon. Click on the icon to publish your results at rpubs.com.
You will need to set up a personal account name and password at RPubs.
Give your document a name and a brief description.
The county-level dataset (county_data) is used to illustrate graphical layering.
Dimensions and structure of the county_data dataset:
dim(county_data)
## [1] 3195 32
str(county_data[1:9])
## 'data.frame': 3195 obs. of 9 variables:
## $ id : chr "0" "01000" "01001" "01003" ...
## $ name : chr NA "1" "Autauga County" "Baldwin County" ...
## $ state : Factor w/ 51 levels "AK","AL","AR",..: NA 2 2 2 2 2 2 2 2 2 ...
## $ census_region: Factor w/ 4 levels "Midwest","Northeast",..: NA 3 3 3 3 3 3 3 3 3 ...
## $ pop_dens : Factor w/ 7 levels "[ 0, 10)",..: 3 3 3 4 2 2 3 2 2 4 ...
## $ pop_dens4 : Factor w/ 4 levels "[ 0, 17)",..: 3 3 3 4 2 2 3 2 2 4 ...
## $ pop_dens6 : Factor w/ 6 levels "[ 0, 9)",..: 5 5 5 5 3 3 5 2 3 5 ...
## $ pct_black : Factor w/ 7 levels "[ 0.0, 2.0)",..: 4 6 5 3 6 5 1 7 6 5 ...
## $ pop : int 318857056 4849377 55395 200111 26887 22506 57719 10764 20296 115916 ...
A random sample of five rows from the county_data dataset:
library(dplyr)  # provides %>%, select(), and sample_n()
county_data %>%
  select(id, name, state, pop, black, partywinner16) %>%
  sample_n(5)
## id name state pop black partywinner16
## 1 21231 Wayne County KY 20486 1.8 Republican
## 2 16059 Lemhi County ID 7726 0.4 Republican
## 3 37023 Burke County NC 89486 6.8 Republican
## 4 13265 Taliaferro County GA 1693 58.6 Democrat
## 5 38019 Cavalier County ND 3855 0.4 Republican
Layer 1. Create a base layer of points with the percent black (y = black/100) by population (x = pop) for counties that did not flip in 2016.
Subset the data to include only the counties that did not flip from “red to blue” or “blue to red” (i.e., rows where flipped has the level “No”).
Set the color of these points to be a light gray (say, “gray50”) using geom_point(), as these points will be in the background layer of the plot.
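library(ggplot2)
# Base plot: counties that did not flip; x and y are mapped once here
p0 <- ggplot(data = subset(county_data, flipped == "No"),
             aes(x = pop, y = black/100))
# Background layer: faint gray points
p1 <- p0 + geom_point(alpha = 0.15, color = "gray50")
p1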
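To see which rows the subset() call in Layer 1 keeps, here is a minimal sketch on a made-up data frame. The object `toy` and all of its values are hypothetical and are not taken from county_data. The point it illustrates is that subset() keeps only rows where the condition is TRUE, so any row with a missing flipped value is dropped along with the "Yes" rows.

```r
# Hypothetical toy data; values are invented for illustration only.
toy <- data.frame(
  pop     = c(1500, 20000, 90000, 250000),
  black   = c(58.6, 1.8, 6.8, 30.2),
  flipped = c("Yes", "No", NA, "No")
)

# subset() keeps rows where the condition is TRUE
# and drops rows where it is FALSE or NA.
not_flipped <- subset(toy, flipped == "No")
nrow(not_flipped)
```

Only the rows with flipped equal to "No" remain, which is the behavior the background layer relies on.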
Layer 2. Add a second layer of points for counties that did flip in 2016.
Choose counties where flipped was “Yes”. The x and y mappings are the same, inherited from the first layer.
Map the 2016 winning party variable (partywinner16) to the color aesthetic. Assign blue and red party_colors, for the Democrats and Republicans, respectively, using scale_color_manual().
The population (pop) is right-skewed. To spread out the distribution along the x axis apply a logarithmic scale transformation.
Add commas to the large population numbers to aid readability.
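# Named vector ties each color to a party regardless of level order
party_colors <- c(Democrat = "blue", Republican = "red")
# x and y are restated here but match the mappings set in p0
p2 <- p1 + geom_point(data = subset(county_data, flipped == "Yes"),
                      aes(x = pop, y = black/100, color = partywinner16)) +
  scale_color_manual(values = party_colors) +
  scale_x_log10(labels = scales::comma)
p2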
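The effect of the log scale and the comma labels can be checked outside the plot. The sketch below reuses three population values from the sample rows printed earlier; the vector name `pops` is only for illustration. It assumes the scales package is installed, which is normally the case because ggplot2 depends on it.

```r
# Three county populations taken from the sample rows printed earlier.
pops <- c(1693, 20486, 89486)

# On the raw scale the largest value is roughly 53 times the smallest;
# on the log10 scale the three values are spaced far more evenly.
log_pops <- round(log10(pops), 2)
log_pops

# scales::comma() inserts thousands separators. Passing it as
# labels = scales::comma applies the same formatting to the axis breaks.
comma_labels <- scales::comma(c(1000, 100000, 1000000))
comma_labels
```

The log transformation changes only where points are drawn along the axis; the labels still show populations in their original units.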
Layer 3. Transform the y-axis scale to reflect percentages. Add axis labels, a title, and a caption.
p3 <- p2 + scale_y_continuous(labels = scales::percent) +
  labs(color = "County flipped to ... ",
       x = "County Population (log scale)",
       y = "Percent Black Population",
       title = "Flipped counties, 2016",
       caption = "Counties in gray did not flip.")
p3
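The reason the plots map y to black/100 rather than black is that percent labels expect proportions. As a small sketch, the values below are the black column from four of the sample rows printed earlier; the names `black_pct` and `prop` are illustrative. The argument accuracy = 0.1 is set here only to make the printed strings predictable, whereas the plot code uses the default behavior of scales::percent.

```r
# black is stored in percentage points (e.g. 58.6 means 58.6%).
black_pct <- c(1.8, 0.4, 6.8, 58.6)

# Dividing by 100 gives proportions between 0 and 1.
prop <- black_pct / 100

# scales::percent() multiplies by 100 and appends a percent sign.
pct_labels <- scales::percent(prop, accuracy = 0.1)
pct_labels
```

Mapping the raw black column with percent labels would instead multiply values that are already percentages by 100.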
Layer 4. If a flipped county’s black population exceeds 25%, add a state annotation.
For any county that flipped in 2016 and whose black population percentage exceeds 25%, label the point with its state abbreviation using geom_text_repel().
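library(ggrepel)  # provides geom_text_repel()
# Label only flipped counties whose black population exceeds 25%
p4 <- p3 + geom_text_repel(data = subset(county_data,
                                         flipped == "Yes" & black > 25),
                           aes(label = state), size = 3)
p4 + theme(legend.position = "top")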
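The layering idea can also be tried end to end on a small made-up data frame, which may help when county_data is not at hand. This is a sketch under stated assumptions: `toy` and every value in it are hypothetical, and the ggplot2 and ggrepel packages are assumed to be installed. Each geom receives its own subset of the data, while the x and y mappings are set once in ggplot() and inherited by every layer.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical toy data; the values are invented, not drawn from county_data.
toy <- data.frame(
  pop           = c(1500, 20000, 90000, 250000, 40000),
  black         = c(58.6, 1.8, 6.8, 30.2, 12.0),
  flipped       = c("Yes", "No", "No", "Yes", "Yes"),
  partywinner16 = c("Democrat", "Republican", "Republican",
                    "Republican", "Democrat"),
  state         = c("GA", "KY", "NC", "SC", "OH")
)

# Rows that satisfy both conditions and will therefore be labeled.
to_label <- subset(toy, flipped == "Yes" & black > 25)

p <- ggplot(subset(toy, flipped == "No"), aes(x = pop, y = black / 100)) +
  geom_point(color = "gray50") +                      # layer 1: background
  geom_point(data = subset(toy, flipped == "Yes"),
             aes(color = partywinner16)) +            # layer 2: flipped
  geom_text_repel(data = to_label,
                  aes(label = state), size = 3) +     # layer 3: labels
  scale_color_manual(values = c(Democrat = "blue", Republican = "red")) +
  scale_x_log10(labels = scales::comma) +
  scale_y_continuous(labels = scales::percent)

# Number of geom layers stored in the plot object.
n_layers <- length(p$layers)
n_layers
```

Printing `p` draws the plot. Scales and labels are not counted as layers, so only the three geoms contribute to `n_layers`.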
What theory do these data patterns support, if any?