1. Goal: Increase Voter Turnout

The most recent election illustrated how much voter turnout matters. This year in Pennsylvania, Joe Biden won by only 47,578 votes. By understanding which factors affected voter turnout in previous years, campaigns can target people who are actually likely to vote rather than wasting time and money. For our project, we therefore focused on identifying which factors best predict whether someone will vote.

2. The Data

We used the “IPUMS-ASA U.S. Voting Behaviors” dataset, which is built from Census Bureau and Bureau of Labor Statistics data, provided by the IPUMS organization, and curated by ASA.

3. Exploratory Data Analysis

3.1 What percent of people vote?

Interactive map made with plotly

Note that this is weighted with the survey weights provided.
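A minimal sketch of what weighting turnout by survey weights means, using a small made-up data frame. The column names (`state`, `voted`, `weight`) and all values are assumptions for illustration only; they are not the dataset's actual variable names or figures.

```r
# Toy respondent-level data (hypothetical names and values).
survey <- data.frame(
  state  = c("PA", "PA", "PA", "OH", "OH"),
  voted  = c(1, 0, 1, 1, 0),              # 1 = voted, 0 = did not vote
  weight = c(1500, 2500, 1000, 3000, 1000) # survey weight per respondent
)

# Weighted turnout per state: each respondent counts in proportion
# to the number of people they represent.
turnout <- sapply(
  split(survey, survey$state),
  function(d) weighted.mean(d$voted, d$weight)
)

# Unweighted turnout for comparison.
unweighted <- tapply(survey$voted, survey$state, mean)

print(turnout)
print(unweighted)
```

In this toy data the unweighted share for PA is 2/3, while the weighted share is 0.5, because the non-voter carries a larger weight.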

3.3 What variables are associated with voter turnout?

3.3.1 Education Level

Hypothesis: People with higher education levels are more likely to vote

3.3.2 Age

Hypothesis: Older people are more likely to vote

3.3.3 Marital Status

Hypothesis: People who are or have been married are more likely to vote

3.3.4 Veteran Status

Hypothesis: Veterans are more likely to vote

3.3.5 Labor Force

Hypothesis: People who are employed are more likely to vote

3.3.6 Race

3.4 Why don’t people vote?

4. Analyses

4.1 Regression

Do states with higher rates of mail-in voting have higher rates of voter turnout?

Interactive graphic made with plotly

4.2 Weighted Chi-Squared Tests

4.2.1 Education

##              No School Some school but no diploma High school graduate or GED
## Did not vote         0                   17324670                    44499002
## Voted                0                   11995158                    61551938
##              Some college but no degree Associate degree Bachelors degree
## Did not vote                   22710480         10336346         14258097
## Voted                          50096529         28466024         67231104
##              Masters degree Professional or Doctoral degree
## Did not vote        4034243                         1281277
## Voted              29884543                        10435339
## 
##  Pearson's Chi-squared test
## 
## data:  tbl
## X-squared = NaN, df = 7, p-value = NA
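The `NaN` statistic above most likely comes from the empty "No School" column: its expected counts are zero, so the test divides zero by zero. A sketch, assuming the table is rebuilt from the printed counts, that drops empty columns before testing (the object name `tbl_nonempty` is an assumption):

```r
# Weighted counts copied from the printed education table.
# matrix() fills column by column: (Did not vote, Voted) per education level.
tbl <- matrix(
  c(0, 0,
    17324670, 11995158,
    44499002, 61551938,
    22710480, 50096529,
    10336346, 28466024,
    14258097, 67231104,
    4034243, 29884543,
    1281277, 10435339),
  nrow = 2,
  dimnames = list(
    c("Did not vote", "Voted"),
    c("No School", "Some school but no diploma",
      "High school graduate or GED", "Some college but no degree",
      "Associate degree", "Bachelors degree", "Masters degree",
      "Professional or Doctoral degree")
  )
)

# With the all-zero column the statistic is NaN (warning suppressed here).
res_with_empty <- suppressWarnings(chisq.test(tbl))

# Drop columns with no observations, then rerun the test.
tbl_nonempty <- tbl[, colSums(tbl) > 0]
res <- chisq.test(tbl_nonempty)

print(res_with_empty$statistic)  # NaN
print(res$parameter)             # degrees of freedom after dropping the column
```

Because the cells are sums of survey weights, the test treats hundreds of millions of weighted people as the sample size, which inflates the statistic. A design-based test such as `svychisq` from the `survey` package may be more appropriate, though it is not used in this report.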

4.2.2 Race

##                  White     Black More than one race Asian or Pacific Islander
## Did not vote  89352859  13803392            2624594                   7073032
## Voted        211146197  32313137            4130562                  10128785
##              American Indian or Aleut or Eskimo
## Did not vote                            1999820
## Voted                                   2098503
##                  White     Black More than one race Asian or Pacific Islander
## Did not vote 0.2973482 0.2993155          0.3885320                 0.4111793
## Voted        0.7026518 0.7006845          0.6114680                 0.5888207
##              American Indian or Aleut or Eskimo
## Did not vote                          0.4879606
## Voted                                 0.5120394
## 
##  Pearson's Chi-squared test
## 
## data:  tbl2
## X-squared = 1864975, df = 4, p-value < 2.2e-16
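The second table above holds column proportions of the first. A short sketch, assuming the counts are re-entered by hand from the printed output, that reproduces them with `prop.table`:

```r
# Weighted counts copied from the printed race table.
tbl2 <- matrix(
  c(89352859, 211146197,
    13803392, 32313137,
    2624594, 4130562,
    7073032, 10128785,
    1999820, 2098503),
  nrow = 2,
  dimnames = list(
    c("Did not vote", "Voted"),
    c("White", "Black", "More than one race",
      "Asian or Pacific Islander", "American Indian or Aleut or Eskimo")
  )
)

# margin = 2 normalizes within each column, so every column sums to 1
# and the "Voted" row is the turnout rate for that group.
props <- prop.table(tbl2, margin = 2)

print(round(props, 4))
```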

4.2.3 Marital Status

## Did not vote        Voted 
##    114853697    259817184
## 
##  Chi-squared test for given probabilities
## 
## data:  tbl4
## X-squared = 56087659, df = 1, p-value < 2.2e-16
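The output above is a one-way table of vote totals, so `chisq.test` ran a goodness-of-fit test against equal proportions ("for given probabilities") rather than a test of association with marital status. The sketch below reproduces the printed statistic from the two totals alone, which suggests marital status never entered the test. The commented line shows the shape a two-way table would need; its variable names are hypothetical, since the marital status breakdown is not shown in this report.

```r
# Totals copied from the printed one-way table.
votes <- c("Did not vote" = 114853697, "Voted" = 259817184)

# On a vector, chisq.test tests H0: both categories are equally likely.
gof <- chisq.test(votes)
print(gof$statistic)

# The same statistic by hand: sum of (observed - expected)^2 / expected.
expected <- rep(sum(votes) / 2, 2)
by_hand <- sum((votes - expected)^2 / expected)
print(by_hand)

# A test of association needs a two-way table instead, for example
# (hypothetical column names, not taken from the report):
# tbl4 <- xtabs(weight ~ Voted + Married, data = trim1618)
# chisq.test(tbl4)
```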

4.2.4 Veteran Status

##              No service       Yes
## Did not vote  107821916   7031781
## Voted         235033022  24784162
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tbl5
## X-squared = 1196521, df = 1, p-value < 2.2e-16

4.2.5 Citizenship

##              Born abroad of American parents Born in U.S Born in U.S. outlying
## Did not vote                         1064127   100237770               1229508
## Voted                                2307519   235363239               1394888
##              Naturalized citizen
## Did not vote            12322292
## Voted                   20751538
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tbl5
## X-squared = 1196521, df = 1, p-value < 2.2e-16
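The test output printed for citizenship is identical to the veteran status result (same `tbl5`, same statistic, df = 1), while the citizenship table has four columns, so a test on it would have df = 3 and no Yates correction. A sketch of the test rerun on the printed citizenship counts; the object name `tbl6` is an assumption:

```r
# Weighted counts copied from the printed citizenship table.
tbl6 <- matrix(
  c(1064127, 2307519,
    100237770, 235363239,
    1229508, 1394888,
    12322292, 20751538),
  nrow = 2,
  dimnames = list(
    c("Did not vote", "Voted"),
    c("Born abroad of American parents", "Born in U.S",
      "Born in U.S. outlying", "Naturalized citizen")
  )
)

res_citizen <- chisq.test(tbl6)

# Degrees of freedom are (rows - 1) * (columns - 1) = 1 * 3.
print(res_citizen$parameter)
print(res_citizen$method)
```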

4.2.6 Labor Force

##              No, not in the labor force Yes, in the labor force
## Did not vote                   42247712                72605984
## Voted                          89934150               169883034
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tbl7
## X-squared = 164185, df = 1, p-value < 2.2e-16

4.3 Logistic Generalized Linear Models

## 
## Call:
## glm(formula = as.factor(Voted) ~ AGE, family = "binomial", data = trim1618)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9493  -1.2968   0.7173   0.8747   1.1328  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.3325070  0.0161454  -20.59   <2e-16 ***
## AGE          0.0243576  0.0003231   75.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 186890  on 152823  degrees of freedom
## Residual deviance: 180929  on 152822  degrees of freedom
## AIC: 180933
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = as.factor(Voted) ~ AGE + as.factor(SEX) + Metro + 
##     RaceSimp + Vet + Citizen + Labor + EduSimp, family = "binomial", 
##     data = trim1618)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6036  -1.0220   0.5683   0.8141   2.2380  
## 
## Coefficients:
##                                              Estimate Std. Error z value
## (Intercept)                                -2.2848579  0.0721985 -31.647
## AGE                                         0.0316859  0.0003883  81.593
## as.factor(SEX)2                             0.1178478  0.0127306   9.257
## MetroCentral city status unknown           -0.1309099  0.0197097  -6.642
## MetroNot identifiable                      -0.1067393  0.0593822  -1.797
## MetroNot in metro area                     -0.0895915  0.0190150  -4.712
## MetroOutside central city                   0.0032961  0.0164771   0.200
## RaceSimpBlack                               0.2999303  0.0208877  14.359
## RaceSimpMore than one race                 -0.0983910  0.0459707  -2.140
## RaceSimpAsian or Pacific Islander          -0.5467592  0.0317662 -17.212
## RaceSimpAmerican Indian or Aleut or Eskimo -0.4507596  0.0502083  -8.978
## VetYes                                      0.0881883  0.0238815   3.693
## CitizenBorn in U.S                          0.0986694  0.0642475   1.536
## CitizenBorn in U.S. outlying               -0.3652759  0.1009645  -3.618
## CitizenNaturalized citizen                 -0.2755766  0.0676390  -4.074
## LaborYes, in the labor force                0.3067889  0.0143658  21.356
## EduSimpHigh school graduate or GED          0.7680608  0.0223958  34.295
## EduSimpSome college but no degree           1.4008940  0.0244355  57.330
## EduSimpAssociate degree                     1.4976338  0.0275499  54.361
## EduSimpBachelors degree                     2.1132725  0.0257172  82.174
## EduSimpMasters degree                       2.4268736  0.0339748  71.432
## EduSimpProfessional or Doctoral degree      2.5381708  0.0513136  49.464
##                                            Pr(>|z|)    
## (Intercept)                                 < 2e-16 ***
## AGE                                         < 2e-16 ***
## as.factor(SEX)2                             < 2e-16 ***
## MetroCentral city status unknown           3.10e-11 ***
## MetroNot identifiable                      0.072257 .  
## MetroNot in metro area                     2.46e-06 ***
## MetroOutside central city                  0.841450    
## RaceSimpBlack                               < 2e-16 ***
## RaceSimpMore than one race                 0.032331 *  
## RaceSimpAsian or Pacific Islander           < 2e-16 ***
## RaceSimpAmerican Indian or Aleut or Eskimo  < 2e-16 ***
## VetYes                                     0.000222 ***
## CitizenBorn in U.S                         0.124595    
## CitizenBorn in U.S. outlying               0.000297 ***
## CitizenNaturalized citizen                 4.62e-05 ***
## LaborYes, in the labor force                < 2e-16 ***
## EduSimpHigh school graduate or GED          < 2e-16 ***
## EduSimpSome college but no degree           < 2e-16 ***
## EduSimpAssociate degree                     < 2e-16 ***
## EduSimpBachelors degree                     < 2e-16 ***
## EduSimpMasters degree                       < 2e-16 ***
## EduSimpProfessional or Doctoral degree      < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 186494  on 152617  degrees of freedom
## Residual deviance: 163922  on 152596  degrees of freedom
##   (206 observations deleted due to missingness)
## AIC: 163966
## 
## Number of Fisher Scoring iterations: 4
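Logistic regression coefficients are on the log-odds scale, so they are easier to read after exponentiating. A sketch using coefficients copied from the two summaries above; the ages 18 and 65 are arbitrary illustration points, and the short names in `full_coefs` are labels chosen here, not the model's term names.

```r
# Single-predictor model: Voted ~ AGE.
b0    <- -0.3325070
b_age <-  0.0243576

# Odds ratio for one additional year of age.
or_age <- exp(b_age)

# Fitted probability of voting at two illustrative ages.
p_age <- plogis(b0 + b_age * c(18, 65))

# Selected coefficients from the full model, as odds ratios relative to
# each variable's omitted baseline level (holding the other terms fixed).
full_coefs <- c(AGE = 0.0316859, Vet = 0.0881883,
                Labor = 0.3067889, Bachelors = 2.1132725)
or_full <- exp(full_coefs)

# McFadden-style pseudo R-squared from the full model's printed deviances.
pseudo_r2 <- 1 - 163922 / 186494

print(or_age)
print(p_age)
print(round(or_full, 3))
print(pseudo_r2)
```

The two models are fit to slightly different samples (206 observations are dropped for missingness in the full model), so their deviances and AIC values are not strictly comparable unless both are refit on the same rows.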

4.4 Tree Models

4.4.1 Boosted Trees for Variable Importance

##               var    rel.inf
## EduSimp   EduSimp 51.8737011
## AGE           AGE 34.4071612
## RaceSimp RaceSimp  5.9799034
## Citizen   Citizen  3.6491491
## Labor       Labor  2.3461848
## SEX           SEX  1.2491621
## Vet           Vet  0.4947382

4.4.2 Classification Tree

## 
## Classification tree:
## tree(formula = as.factor(Voted) ~ AGE + EduSimp, data = trim1618)
## Number of terminal nodes:  3 
## Residual mean deviance:  1.137 = 173500 / 152600 
## Misclassification error rate: 0.2898 = 44222 / 152618

## node), split, n, deviance, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 152618 186500 Voted ( 0.3001 0.6999 )  
##   2) EduSimp: Some school but no diploma,High school graduate or GED,Some college but no degree 84939 114300 Voted ( 0.3992 0.6008 )  
##     4) AGE < 45.5 34903  48310 Did not vote ( 0.5227 0.4773 ) *
##     5) AGE > 45.5 50036  62190 Voted ( 0.3130 0.6870 ) *
##   3) EduSimp: Associate degree,Bachelors degree,Masters degree,Professional or Doctoral degree 67679  62950 Voted ( 0.1758 0.8242 ) *
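The three terminal nodes above amount to a simple decision rule. The sketch below writes that rule as a function and recomputes the misclassification rate from the printed node sizes and probabilities. The function name `predict_vote` and the `higher_degree` flag (TRUE for an associate degree or above) are assumptions made for this illustration.

```r
# Decision rule read off the printed tree.
predict_vote <- function(age, higher_degree) {
  ifelse(higher_degree, "Voted",
         ifelse(age < 45.5, "Did not vote", "Voted"))
}

# Terminal nodes copied from the printed tree (nodes 4, 5, and 3).
leaves <- data.frame(
  n         = c(34903, 50036, 67679),
  predicted = c("Did not vote", "Voted", "Voted"),
  p_voted   = c(0.4773, 0.6870, 0.8242)
)

# Within each leaf, the error share is the probability of the other class.
p_wrong <- ifelse(leaves$predicted == "Voted",
                  1 - leaves$p_voted, leaves$p_voted)
error_rate <- sum(leaves$n * p_wrong) / sum(leaves$n)

# Baseline: always predicting "Voted" is wrong for the root's non-voter share.
baseline_error <- 0.3001

print(predict_vote(c(30, 60, 30), c(FALSE, FALSE, TRUE)))
print(error_rate)
print(baseline_error - error_rate)
```

The recomputed rate agrees with the printed 0.2898 up to rounding of the node probabilities, and it is only about one percentage point below the baseline of always predicting "Voted".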