2018 House Election Prediction/Exploration Project

Your challenge is to become the expert on one House district. You may want to predict who will win your district, and you will definitely want to explore the polling data to determine who is likely to vote for whom.

You will write a report on your district, which is due on the morning of Tuesday, November 5th (Election Day!). I’ll show you how to write a report from within RStudio (using R Markdown) that includes your summary tables and graphs, and you will have class time to work on this project.

We can get individual call-level polling data on some House districts here and read about those same House districts here.

You may want to think about how accurate polls have been historically (see our previous lab).

Here’s some code for reading the House polling data into R and looking at the first six responses. For the purposes of this example, I’ll use data from a poll of Arizona’s 2nd Congressional District (a swing district that includes much of Tucson).

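# RCurl downloads the file; dplyr and ggplot2 are used for the summaries and graphs below
library(RCurl); library(dplyr); library(ggplot2)

# File name of the poll to load (here, Arizona's 2nd Congressional District)
poll <- "az02-1.csv"
url <- paste0("https://raw.githubusercontent.com/TheUpshot/2018-live-poll-results/master/data/elections-poll-", poll)

# Download the CSV as one text string, parse it into a data frame, and show the first six rows
x <- getURL(url)
poll_results <- read.csv(text = x)
head(poll_results)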

##   response         ager                            educ file_race gender
## 1      Rep 65 and older Graduate or Professional Degree     White Female
## 2      Rep     50 to 64               Bachelors' degree     White Female
## 3      Dem     18 to 34    Some college or trade school   Unknown Female
## 4      Dem     35 to 49               Bachelors' degree     White Female
## 5      Dem 65 and older Graduate or Professional Degree     White   Male
## 6      Rep 65 and older               Bachelors' degree     White   Male
##   race_eth  APPKAV approve   CHECK FEMINISM        genballot    hdem_fav
## 1    White support Approve Support   oppose Reps. keep House Unfavorable
## 2    White support Approve   Check  support Reps. keep House Unfavorable
## 3    White  oppose Disapp.   Check  support Dems. take House  Don't know
## 4 Hispanic  oppose Disapp.   Check  support Dems. take House   Favorable
## 5    White  oppose Disapp.   Check  support Dems. take House   Favorable
## 6    Other support Approve Support  support Reps. keep House Unfavorable
##      hrep_fav    WOMEN age_combined                educ4 file_party
## 1  Don't know disagree 65 and older  Postgraduate Degree      Other
## 2   Favorable    agree     45 to 64 4-year College Grad. Republican
## 3  Don't know    agree     18 to 29   Some College Educ. Democratic
## 4 Unfavorable    agree     30 to 44 4-year College Grad. Democratic
## 5  Don't know    agree 65 and older  Postgraduate Degree Democratic
## 6   Favorable    agree 65 and older 4-year College Grad. Republican
##   gender_combined         likely                partyid
## 1          Female Almost certain Independent (No party)
## 2          Female Almost certain Independent (No party)
## 3          Female Almost certain               Democrat
## 4          Female Almost certain               Democrat
## 5            Male Almost certain               Democrat
## 6            Male Almost certain Independent (No party)
##                          race_edu age_combinedd                    educ3
## 1     White, 4-Year College Grads  65 and older      Postgraduate Degree
## 2     White, 4-Year College Grads      45 to 64     4-year College Grad.
## 3 White, No 4-Year College Degree      18 to 44 No 4-year College Degree
## 4                        Nonwhite      18 to 44     4-year College Grad.
## 5     White, 4-Year College Grads  65 and older      Postgraduate Degree
## 6                        Nonwhite  65 and older     4-year College Grad.
##   file_race_white region        turnout_class phone_type turnout_scale
## 1           White   Pima 85 percent or higher       Cell             1
## 2           White   Pima 85 percent or higher   Landline             1
## 3         Unknown   Pima     50 to 85 percent   Landline             1
## 4           White   Pima 85 percent or higher   Landline             1
## 5           White   Pima 85 percent or higher   Landline             1
## 6           White   Pima 85 percent or higher   Landline             1
##   turnout_score   w_LV   w_RV final_weight
## 1        0.9146 0.7363 0.5470       0.7499
## 2        0.9280 1.0059 0.6411       1.0230
## 3        0.8100 1.1542 0.9188       1.1949
## 4        0.9758 0.8772 0.4805       0.8886
## 5        0.9964 0.6954 0.4226       0.7050
## 6        0.9971 1.1027 0.6743       1.1182
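
To load a different district, only the file name at the end of the URL needs to change. Here’s a minimal sketch of a helper that wraps that step. The names poll_url and read_district_poll are my own (hypothetical, not part of the lab code), and I’m assuming that other polls in the repository follow the same elections-poll-<file> naming pattern as az02-1.csv. It also assumes your version of R can read an https URL directly with read.csv, which would skip the RCurl step.

```r
# Hypothetical helpers for illustration; not part of the original lab code.

# Build the raw GitHub URL for one poll file, e.g. "az02-1.csv".
# Assumption: every poll file uses the same "elections-poll-" prefix.
poll_url <- function(poll_file) {
  paste0(
    "https://raw.githubusercontent.com/TheUpshot/2018-live-poll-results/",
    "master/data/elections-poll-",
    poll_file
  )
}

# Read one poll straight from its URL into a data frame.
# Assumption: read.csv can open https URLs in your R installation.
read_district_poll <- function(poll_file) {
  read.csv(poll_url(poll_file), stringsAsFactors = FALSE)
}

# Example call (needs an internet connection, so it is left commented out):
# poll_results <- read_district_poll("az02-1.csv")
```

If the direct read fails on your machine, the getURL approach shown above still works.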

We can now deploy our entire toolbox of dplyr operations and ggplot graphs on this data set!

Here are three examples (one data summary and two graphs):

library(knitr)
poll_summary <- poll_results %>% group_by(ager) %>%
  summarize(n = n(),
            expected_turnout = sum(turnout_score),
            mean_turnout = mean(turnout_score),
            R_vote = 100 * mean(response == "Rep"),
            D_vote = 100 * mean(response == "Dem"),
            D_edge = D_vote - R_vote
            )

kable(poll_summary, digits=1)

ager	n	expected_turnout	mean_turnout	R_vote	D_vote	D_edge
[DO NOT READ] Refused	18	16.5	0.9	16.7	55.6	38.9
18 to 34	52	30.5	0.6	21.2	55.8	34.6
35 to 49	83	64.0	0.8	33.7	55.4	21.7
50 to 64	151	126.5	0.8	37.7	54.3	16.6
65 and older	198	184.3	0.9	44.9	46.5	1.5
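
Notice how few respondents some of these age groups contain. As a rough guide to how much the D_vote percentages could move by chance alone, here’s a sketch that computes an approximate 95% margin of error for each group. It copies n and D_vote from the table above and assumes a simple random sample (it ignores the survey weights), so the real uncertainty is likely somewhat larger than what this shows.

```r
# n and D_vote are copied from the summary table above (refusals left out).
age_summary <- data.frame(
  ager   = c("18 to 34", "35 to 49", "50 to 64", "65 and older"),
  n      = c(52, 83, 151, 198),
  D_vote = c(55.8, 55.4, 54.3, 46.5)
)

# Approximate 95% margin of error for a proportion, in percentage points.
# Assumption: simple random sampling, no adjustment for weighting.
p <- age_summary$D_vote / 100
age_summary$moe <- 100 * 1.96 * sqrt(p * (1 - p) / age_summary$n)

age_summary
```

With only 52 respondents, the margin of error for the 18 to 34 group comes out well above 10 percentage points, so be careful about reading too much into differences between small groups.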

poll_results %>% 
  ggplot(aes(turnout_score)) + geom_histogram()+facet_wrap(.~ager)

poll_results %>% group_by(ager, race_eth) %>%
  summarize(n = n(),
            expected_turnout = sum(turnout_score),
            R_vote = mean(response == "Rep"),
            D_vote = mean(response == "Dem"),
            D_edge = D_vote - R_vote
            ) %>%
  ggplot(aes(ager, D_edge, size = expected_turnout)) +
  facet_wrap(~race_eth) + geom_point() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
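
The summaries above count every respondent equally, but the data also includes a final_weight column. Assuming that column is the survey weight intended for the overall estimate (check the poll’s documentation to confirm), here’s a sketch of a weighted vote share that you could use as a starting point for a prediction. To keep it self-contained, it uses only the six respondents printed by head(poll_results) above; weighted_topline is a hypothetical helper name, not something from the lab.

```r
library(dplyr)

# First six respondents, copied from the head(poll_results) output above.
first_six <- data.frame(
  response     = c("Rep", "Rep", "Dem", "Dem", "Dem", "Rep"),
  final_weight = c(0.7499, 1.0230, 1.1949, 0.8886, 0.7050, 1.1182)
)

# Hypothetical helper: raw and weighted share (in percent) for each response.
# Assumption: final_weight is the weight to use for the overall estimate.
weighted_topline <- function(df) {
  df %>%
    group_by(response) %>%
    summarize(n = n(), weight = sum(final_weight)) %>%
    mutate(raw_share      = 100 * n / sum(n),
           weighted_share = 100 * weight / sum(weight))
}

weighted_topline(first_six)
```

To use the whole poll, call weighted_topline(poll_results) after loading the data. Comparing raw_share with weighted_share shows how much the weights shift the estimate.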
