Your challenge is to become the expert on one House district. You may want to predict who will win your district and you will definitely want to explore the polling data in order to try to determine who is likely to vote for whom.
You will write a report on your district, due the morning of Tuesday, November 5th (Election Day!). I’ll show you how to write the report from within RStudio (using R Markdown) so that it includes your summary tables and graphs, and you will have class time to work on this project.
We can get individual call-level polling data on some House districts here and read about those same House districts here.
You may want to think about how accurate polls have been historically (see our previous lab).
Here’s some code for reading the House polling data into R and looking at the first six responses. For the purposes of this example, I’ll use data from a poll of Arizona’s 2nd Congressional District (a swing district that includes much of Tucson).
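# RCurl downloads the file; dplyr and ggplot2 are for summaries and graphs
library(RCurl); library(dplyr); library(ggplot2)
# File name of the poll (change this to the file for your own district)
poll <- "az02-1.csv"
# Build the full URL of the raw CSV file on GitHub
url <- paste0("https://raw.githubusercontent.com/TheUpshot/2018-live-poll-results/master/data/elections-poll-", poll)
# Download the file contents as one long string
x <- getURL(url)
# Parse that CSV text into a data frame
poll_results <- read.csv(text = x)
# Look at the first six responses
head(poll_results)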
## response ager educ file_race gender
## 1 Rep 65 and older Graduate or Professional Degree White Female
## 2 Rep 50 to 64 Bachelors' degree White Female
## 3 Dem 18 to 34 Some college or trade school Unknown Female
## 4 Dem 35 to 49 Bachelors' degree White Female
## 5 Dem 65 and older Graduate or Professional Degree White Male
## 6 Rep 65 and older Bachelors' degree White Male
## race_eth APPKAV approve CHECK FEMINISM genballot hdem_fav
## 1 White support Approve Support oppose Reps. keep House Unfavorable
## 2 White support Approve Check support Reps. keep House Unfavorable
## 3 White oppose Disapp. Check support Dems. take House Don't know
## 4 Hispanic oppose Disapp. Check support Dems. take House Favorable
## 5 White oppose Disapp. Check support Dems. take House Favorable
## 6 Other support Approve Support support Reps. keep House Unfavorable
## hrep_fav WOMEN age_combined educ4 file_party
## 1 Don't know disagree 65 and older Postgraduate Degree Other
## 2 Favorable agree 45 to 64 4-year College Grad. Republican
## 3 Don't know agree 18 to 29 Some College Educ. Democratic
## 4 Unfavorable agree 30 to 44 4-year College Grad. Democratic
## 5 Don't know agree 65 and older Postgraduate Degree Democratic
## 6 Favorable agree 65 and older 4-year College Grad. Republican
## gender_combined likely partyid
## 1 Female Almost certain Independent (No party)
## 2 Female Almost certain Independent (No party)
## 3 Female Almost certain Democrat
## 4 Female Almost certain Democrat
## 5 Male Almost certain Democrat
## 6 Male Almost certain Independent (No party)
## race_edu age_combinedd educ3
## 1 White, 4-Year College Grads 65 and older Postgraduate Degree
## 2 White, 4-Year College Grads 45 to 64 4-year College Grad.
## 3 White, No 4-Year College Degree 18 to 44 No 4-year College Degree
## 4 Nonwhite 18 to 44 4-year College Grad.
## 5 White, 4-Year College Grads 65 and older Postgraduate Degree
## 6 Nonwhite 65 and older 4-year College Grad.
## file_race_white region turnout_class phone_type turnout_scale
## 1 White Pima 85 percent or higher Cell 1
## 2 White Pima 85 percent or higher Landline 1
## 3 Unknown Pima 50 to 85 percent Landline 1
## 4 White Pima 85 percent or higher Landline 1
## 5 White Pima 85 percent or higher Landline 1
## 6 White Pima 85 percent or higher Landline 1
## turnout_score w_LV w_RV final_weight
## 1 0.9146 0.7363 0.5470 0.7499
## 2 0.9280 1.0059 0.6411 1.0230
## 3 0.8100 1.1542 0.9188 1.1949
## 4 0.9758 0.8772 0.4805 0.8886
## 5 0.9964 0.6954 0.4226 0.7050
## 6 0.9971 1.1027 0.6743 1.1182
We can now deploy our entire toolbox of dplyr operations and ggplot graphs on this data set!!!
Here are three examples (one data summary and two graphs):
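library(knitr)
# Summarize the poll by age group (ager)
poll_summary <- poll_results %>% group_by(ager) %>%
  summarize(n=n(),                                # respondents in the group
            expected_turnout=sum(turnout_score),  # sum of turnout scores
            mean_turnout=mean(turnout_score),     # average turnout score
            R_vote=100*mean(response=="Rep"),     # percent answering "Rep"
            D_vote=100*mean(response=="Dem"),     # percent answering "Dem"
            D_edge = D_vote-R_vote                # Democratic margin in points
  )
# Print the summary as a table, rounded to one decimal place
kable(poll_summary, digits=1)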
ager | n | expected_turnout | mean_turnout | R_vote | D_vote | D_edge |
---|---|---|---|---|---|---|
[DO NOT READ] Refused | 18 | 16.5 | 0.9 | 16.7 | 55.6 | 38.9 |
18 to 34 | 52 | 30.5 | 0.6 | 21.2 | 55.8 | 34.6 |
35 to 49 | 83 | 64.0 | 0.8 | 33.7 | 55.4 | 21.7 |
50 to 64 | 151 | 126.5 | 0.8 | 37.7 | 54.3 | 16.6 |
65 and older | 198 | 184.3 | 0.9 | 44.9 | 46.5 | 1.5 |
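The summary above counts every respondent equally. The data set also has a `final_weight` column, and a weighted version of the same summary is sketched below, assuming `final_weight` is the pollster's final survey weight (the column name suggests this, but check the poll's documentation). The `toy_poll` data frame and its values are made up so the example runs on its own; swap in `poll_results` to use the real poll.

```r
library(dplyr)

# Made-up respondents for illustration only; replace with poll_results.
toy_poll <- data.frame(
  ager         = c("18 to 34", "18 to 34", "65 and older", "65 and older"),
  response     = c("Dem", "Rep", "Dem", "Rep"),
  final_weight = c(1.5, 0.5, 1.0, 1.0)
)

# Weighted share = (total weight of matching respondents) / (total weight)
weighted_summary <- toy_poll %>%
  group_by(ager) %>%
  summarize(
    n      = n(),
    R_vote = 100 * sum(final_weight * (response == "Rep")) / sum(final_weight),
    D_vote = 100 * sum(final_weight * (response == "Dem")) / sum(final_weight),
    D_edge = D_vote - R_vote
  )

weighted_summary
```

Comparing the weighted and unweighted `D_edge` columns for your district is a quick way to see how much the weighting matters.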
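# Histogram of turnout scores, one panel per age group
poll_results %>%
  ggplot(aes(turnout_score)) + geom_histogram() + facet_wrap(~ager)
# Democratic edge by age group, one panel per race/ethnicity;
# point size shows each group's expected turnout
poll_results %>% group_by(ager, race_eth) %>%
  summarize(n=n(),
            expected_turnout=sum(turnout_score),
            R_vote=mean(response=="Rep"),   # a proportion here, not a percent
            D_vote=mean(response=="Dem"),
            D_edge = D_vote-R_vote
  ) %>% ggplot(aes(ager, D_edge, size=expected_turnout)) +
  facet_wrap(~race_eth) + geom_point() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))  # rotate x labels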
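The D_edge values come from groups of very different sizes, so some are much noisier than others. Here is a rough sketch of a 95% margin of error for D_edge, using the counts and percentages from the age-group table above. The helper `edge_moe` is a hypothetical name, not part of any package, and the formula assumes a simple random sample and ignores the survey weights, so the true uncertainty is likely larger.

```r
# Hypothetical helper: approximate 95% margin of error (in points) for
# D_edge = D_vote - R_vote, treating the group as a simple random sample.
# For two shares from the same sample,
#   Var(p_dem - p_rep) = (p_dem + p_rep - (p_dem - p_rep)^2) / n
edge_moe <- function(n, p_dem, p_rep, z = 1.96) {
  100 * z * sqrt((p_dem + p_rep - (p_dem - p_rep)^2) / n)
}

# Values copied from the age-group summary table above
age_groups <- c("Refused", "18 to 34", "35 to 49", "50 to 64", "65 and older")
n          <- c(18, 52, 83, 151, 198)
R_vote     <- c(16.7, 21.2, 33.7, 37.7, 44.9) / 100
D_vote     <- c(55.6, 55.8, 55.4, 54.3, 46.5) / 100

moe <- edge_moe(n, D_vote, R_vote)
data.frame(age_groups, n, D_edge = 100 * (D_vote - R_vote), moe = round(moe, 1))
```

For the 65 and older group, the margin works out to roughly 13 points, far larger than that group's 1.5-point edge, so a small edge in a single subgroup should not be read as a real lead.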