MATH 239 Statistical Report

The Relationship Between Key Identity Demographics and Voter Turnout

Introduction

After the historic November 2020 election just a few weeks ago, it is more apparent than ever how valuable voter data can be for understanding the beliefs and actions of our country’s people. Maps of the U.S. circulated widely on social media depicting voter turnout by key demographic identities such as race, gender, and age. These three identities, and their intersections with one another and with other confounding factors, bring to light systemic issues that in turn contribute to voter suppression. This study aims to uncover insights into the relationships between communities of people, grouped by identity, and their voter status.

Statistical analysis was conducted to understand the relationship between a person’s voter status (VOTED; yes or no) and the variables age, sex, and race. Because data from the November 2020 election were not yet available, we focus on the most recent prior presidential election (November 2016) by filtering the data on the YEAR variable. While variables such as education and employment status also strongly influence voter status, this paper focuses on the three most basic demographics that society uses to discuss trends in voter data, which makes it convenient to compare these results with graphics made for past and future presidential election years. The overarching research question is: what are the relationships between the demographic variables age, race, and sex and voter turnout in the November 2016 election? Understanding these relationships will provide valuable insight into where changes must be made to increase voter turnout and decrease voter suppression in future elections.

This report first outlines the variables used and the data cleaning required for the analysis. It then describes the methods of analysis and how they relate to the variables, the data structure, and the overarching research question. Generalized linear models (logistic regressions) were fitted to identify the variables that have a statistically significant relationship with voter status (VOTED). Finally, the report outlines potential correlations and trends between voter status and key demographics, acknowledges the limitations of this study, and suggests where and how further research can be conducted.

Data/Application

The analysis was performed in RStudio using data sourced from IPUMS-CPS, an integrated set of data from the monthly Current Population Survey (CPS) conducted by the U.S. Census Bureau and the Bureau of Labor Statistics. The dataset contained voter data from the past 14 years; however, this report uses only data from 2016. The original dataset included 28 variables on more than 640,000 cases, but this report analyzes only 4 of those variables:

VOTED [1 = did not vote; 2 = voted]

SEX [1 = male; 2 = female]

AGE [in years]

RACE [1= White; 2= Black; 3= American Indian/Aleut/Eskimo; 4= Asian/Pacific Islander; 5= More than one race]

Since elections in the United States have become so polarized, it is often easy to associate each state, and even most counties, with a political party. The following map shows the percentage of voter turnout by state in the 2016 presidential election:

Fig. 1: Map of 2016 voter turnout by state.
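The appendix map code shades each state by `wgtPropVote = nWgtVote / nWgt`, that is, the total CPS voter-supplement weight (`VOSUPPWT`) of respondents who voted divided by the total weight of all respondents in that state, rather than by the raw sample share `sampPropVote`. The base-R sketch below reproduces both calculations on a small hypothetical data frame (the states, votes, and weights are invented for illustration) to show how the two shares can differ.

```r
# Hypothetical toy data (not from the report): two states, VOTED coded 1 = no / 2 = yes,
# and made-up values standing in for the CPS voter-supplement weight VOSUPPWT
toy <- data.frame(
  STATEFIP = c("01", "01", "01", "02", "02"),
  VOTED    = c(2, 1, 2, 2, 1),
  VOSUPPWT = c(100, 300, 100, 50, 50)
)

# Unweighted share of sampled respondents who voted, by state (like sampPropVote)
sampPropVote <- tapply(toy$VOTED == 2, toy$STATEFIP, mean)

# Weighted share: total weight of voters / total weight of all respondents (like wgtPropVote)
wgtPropVote <- tapply(toy$VOSUPPWT * (toy$VOTED == 2), toy$STATEFIP, sum) /
  tapply(toy$VOSUPPWT, toy$STATEFIP, sum)

round(rbind(sampPropVote, wgtPropVote), 3)
```

In state "01" the single heavily weighted non-voter pulls the weighted share down to 0.4, while the raw share is about 0.667; in state "02" the equal weights make the two shares identical.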

This map raises the question: what factors, possibly particular to each state, affect voter turnout? This exploration focused on the response variable VOTED, with 1 representing those who did not vote and 2 representing those who did. The other variables considered are as follows: age (each person’s age in years at their last birthday), sex (1 represents male and 2 represents female), STATEFIP (the state of the household’s residence, identified by the FIPS coding scheme, which orders states alphabetically), and race (where 1 denotes White, 2 denotes Black, 3 denotes American Indian/Aleut/Eskimo, 4 denotes Asian/Pacific Islander, and 5 denotes more than one race). For race, the project used a new variable, RACEHISP, intended to combine the data collected on racial identity with whether the individual was Hispanic. Age is the only continuous numeric explanatory variable. While the sex data are roughly balanced between males and females, the racial data are disproportionately White, which has broader social implications and also means that the smaller racial groups are estimated less precisely.

Exploratory data analysis played a key role in identifying potential relationships among our variables. Consider the following proportion bar graphs of whether individuals voted as a function of age, sex, and racial identity:

The first graph, of VOTED by age, is one of the most informative visuals. It shows a clear upward trend: the older a person is, the more likely that person is to vote. The logistic regression in the Analysis section confirms that this relationship is statistically significant.

The second graph shows VOTED by sex, a categorical predictor. The relationship between these two variables is also statistically significant, but the two levels are quantitatively similar, meaning that people vote at similar rates regardless of sex. Roughly three quarters of both males and females in this dataset voted in 2016; however, the proportion is slightly higher for females than for males.

The racial data collected are disproportionately White, which this graphic accounts for by filling the bars to show proportions rather than counts. Turnout is comparable for the White and Black populations, with a clear decrease for the Asian/Pacific Islander, mixed-race, and especially Indigenous populations.

These simple graphics reveal voter turnout gaps between ages, the two recorded sexes, and racial identities, prompting further investigation with more formal statistical methods.

Methods

The primary method in this project was logistic regression. This method was chosen because the response variable of interest is binary, and R’s glm() function provides a detailed summary of each fitted model. The generalized linear models showed that age, sex, and race are all statistically significant predictors of VOTED. The logistic regressions considered were age alone, age and sex, and age and race. Because age is the only numeric variable under consideration, it was included as the baseline explanatory variable in every model.
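For reference, these logistic regressions follow the standard logit-link form, which is linear on the log-odds scale and gives each coefficient a direct odds-ratio reading (this is the general model, not an additional result of this report):

\[ \log\left(\frac{p}{1-p}\right) = \beta_{0} + \beta_{1}x_{1} \quad\Longleftrightarrow\quad p = \frac{1}{1+e^{-(\beta_{0} + \beta_{1}x_{1})}} \]

\[ \frac{\text{odds}(x_{1}+1)}{\text{odds}(x_{1})} = \frac{e^{\beta_{0} + \beta_{1}(x_{1}+1)}}{e^{\beta_{0} + \beta_{1}x_{1}}} = e^{\beta_{1}} \]

Thus a one-year increase in age multiplies the odds of having voted by \(e^{\beta_{1}}\), and a category indicator such as SEX2 multiplies the odds by \(e^{\beta}\) for its coefficient \(\beta\), relative to the reference category.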

Several combinations of predictors were explored to confirm these relationships and their statistical significance. Confusion matrices were then used to evaluate how well each model predicts voter status. While this project focused on regression models, the binary response variable would also lend itself to classification trees in future statistical projects.
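As a minimal sketch on simulated data (the seed, sample size, and true coefficients are arbitrary assumptions, not CPS values), the following checks how glm() treats a response recoded with as.factor(VOTED), as in the models of the Analysis section: the first level "1" (did not vote) is the failure category and "2" (voted) is the success, so fitted probabilities are probabilities of having voted.

```r
# Simulated data only (not the CPS extract): VOTED coded 1 = did not vote, 2 = voted,
# generated so that older respondents are more likely to vote
set.seed(239)
n     <- 5000
AGE   <- sample(18:85, n, replace = TRUE)
VOTED <- ifelse(runif(n) < plogis(-0.5 + 0.03 * AGE), 2, 1)
sim   <- data.frame(VOTED, AGE)

fit <- glm(as.factor(VOTED) ~ AGE, data = sim, family = "binomial")

# For a factor response, glm() treats the first level ("1") as failure and the
# other level ("2") as success, so fitted values are probabilities of having voted
levels(as.factor(sim$VOTED))
coef(fit)

# With an intercept, the average fitted probability matches the observed share who voted
c(meanFitted = mean(fitted(fit)), observed = mean(sim$VOTED == 2))
```

The positive AGE coefficient recovered here mirrors the direction built into the simulation; the matching averages in the last line are a property of any logistic regression with an intercept.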

Analysis

The following models were fitted with R’s glm() function for generalized linear models.

modAge<- glm(as.factor(VOTED) ~ as.numeric(AGE), data=data, family="binomial")
summary(modAge)

## 
## Call:
## glm(formula = as.factor(VOTED) ~ as.numeric(AGE), family = "binomial", 
##     data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0019  -1.3424   0.6834   0.8249   1.0391  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -0.0752300  0.0228422  -3.293  0.00099 ***
## as.numeric(AGE)  0.0227560  0.0004632  49.124  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 92581  on 79819  degrees of freedom
## Residual deviance: 90061  on 79818  degrees of freedom
## AIC: 90065
## 
## Number of Fisher Scoring iterations: 4

For voting as a function of age, we derived: \[p = \frac{1}{1+e^{-(\beta_{0} + \beta_{1}x_{1})}} = \frac{1}{1+e^{-(-0.075 + 0.023x_{1})}} \]

where \(p\) is the probability of having voted and \(x_{1}\) represents age. The age coefficient is statistically significant (p-value < 2e-16). Because it is positive, the predicted probability of voting increases, if only modestly, with each additional year of age.
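To make the age model concrete, the sketch below plugs the coefficients printed by summary(modAge) into the logistic function at a few illustrative ages (the ages are chosen arbitrarily) and computes the odds ratio for one additional year.

```r
# Coefficients copied from summary(modAge) above
b0 <- -0.0752300   # intercept
b1 <-  0.0227560   # as.numeric(AGE)

ages   <- c(18, 30, 50, 70)
pVoted <- plogis(b0 + b1 * ages)   # plogis(z) = 1 / (1 + exp(-z))
round(setNames(pVoted, ages), 3)

# Odds ratio for one additional year of age
exp(b1)
```

The predicted probability rises from roughly 0.58 at age 18 to about 0.82 at age 70, and the odds of having voted grow by a factor of about 1.023 (roughly 2.3%) per year of age.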

modSex <- glm(as.factor(VOTED) ~ as.numeric(AGE)+SEX, data=data, family="binomial")
summary(modSex)

## 
## Call:
## glm(formula = as.factor(VOTED) ~ as.numeric(AGE) + SEX, family = "binomial", 
##     data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0352  -1.3253   0.6868   0.8234   1.0731  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -0.1574058  0.0242170   -6.50 8.04e-11 ***
## as.numeric(AGE)  0.0226607  0.0004638   48.86  < 2e-16 ***
## SEX2             0.1674585  0.0162791   10.29  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 92581  on 79819  degrees of freedom
## Residual deviance: 89955  on 79817  degrees of freedom
## AIC: 89961
## 
## Number of Fisher Scoring iterations: 4

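For voting as a function of age \(x_{1}\) and sex, the female intercept is the baseline intercept plus the SEX2 coefficient, \(-0.157 + 0.167 = 0.010\), so for the female population we derive: \[ p_{\text{female}} = \frac{1}{1+e^{-(0.010 + 0.023x_{1})}} = \frac{1}{1+e^{(-0.010 - 0.023x_{1})}}\] For the male population (the reference level), we derive: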

\[ p_{\text{male}} = \frac{1}{1+e^{-(-0.157 + 0.023x_{1})}} = \frac{1}{1+e^{(0.157 - 0.023x_{1})}}\]

Both the age and sex coefficients are statistically significant, with p-values below 2e-16. The equations imply higher voter turnout for females at every age: the exponent of \(e\) in the denominator is smaller for females, so the denominator is smaller and the predicted probability is higher.
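The sex difference can be quantified directly from summary(modSex). This sketch evaluates both equations at an illustrative age of 40 (an arbitrary choice) and computes the female-versus-male odds ratio, which under this model is the same at every age.

```r
# Coefficients copied from summary(modSex) above
b0   <- -0.1574058   # intercept (male is the reference level)
bAge <-  0.0226607   # as.numeric(AGE)
bF   <-  0.1674585   # SEX2 (female)

age     <- 40
pMale   <- plogis(b0 + bAge * age)
pFemale <- plogis(b0 + bF + bAge * age)
round(c(male = pMale, female = pFemale), 3)

# Odds ratio, female vs. male, at any fixed age
exp(bF)
```

At age 40 this gives about 0.68 for males and 0.71 for females; the odds ratio of about 1.18 means a female respondent’s odds of having voted are roughly 18% higher than those of a male respondent of the same age.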

modRace <- glm(as.factor(VOTED) ~ as.numeric(AGE)+as.factor(RACEHISP), data=data, family="binomial")
summary(modRace)

## 
## Call:
## glm(formula = as.factor(VOTED) ~ as.numeric(AGE) + as.factor(RACEHISP), 
##     family = "binomial", data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0420  -1.3423   0.6809   0.8198   1.3816  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -0.0238360  0.0234769  -1.015   0.3100    
## as.numeric(AGE)       0.0224207  0.0004654  48.173  < 2e-16 ***
## as.factor(RACEHISP)2  0.0701887  0.0274248   2.559   0.0105 *  
## as.factor(RACEHISP)3 -0.8478431  0.0641369 -13.219  < 2e-16 ***
## as.factor(RACEHISP)4 -0.5339882  0.0379479 -14.072  < 2e-16 ***
## as.factor(RACEHISP)5 -0.3232741  0.0604164  -5.351 8.76e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 92581  on 79819  degrees of freedom
## Residual deviance: 89673  on 79814  degrees of freedom
## AIC: 89685
## 
## Number of Fisher Scoring iterations: 4

For voting as a function of age and race, there are five equations, one for each racial category (White, Black, Indigenous, Asian/Pacific Islander, and more than one race). Each gives the probability \(p\) that a member of that category voted, as a function of age \(x_{1}\). White is the reference level, so each other category adds its own coefficient to the intercept.

\[ p_{\text{white}} = \frac{1}{1+e^{-(-0.024 + 0.022x_{1})}} = \frac{1}{1+e^{(0.024 - 0.022x_{1})}}\]

\[ p_{\text{black}} = \frac{1}{1+e^{-(-0.024 + 0.070 + 0.022x_{1})}} = \frac{1}{1+e^{(-0.046 - 0.022x_{1})}}\]

\[ p_{\text{indigenous}} = \frac{1}{1+e^{-(-0.024 + (-0.848) + 0.022x_{1})}} = \frac{1}{1+e^{(0.872 - 0.022x_{1})}}\]

\[ p_{\text{asian/pacific islander}} = \frac{1}{1+e^{-(-0.024 + (-0.534) + 0.022x_{1})}} = \frac{1}{1+e^{(0.558 - 0.022x_{1})}}\]

\[ p_{\text{more than one race}} = \frac{1}{1+e^{-(-0.024 + (-0.323) + 0.022x_{1})}} = \frac{1}{1+e^{(0.347 - 0.022x_{1})}}\]

All race coefficients are statistically significant at the 0.05 level, although the Black coefficient (p = 0.0105) is far less significant than the others relative to the White reference category, as hinted by the similar White and Black turnout in the Data/Application visual. These models imply that voter turnout varies by racial category at every age, with White and Black populations having the highest turnout (increasing with age) and Indigenous populations of American Indian/Aleut/Eskimo people having the lowest.
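The five race equations can be evaluated together in a few lines. The sketch below uses the coefficients from summary(modRace); the group labels (White, Black, Indigenous, AsianPI, Multiracial) are shorthand introduced here for illustration, and the ages 30 and 60 are arbitrary.

```r
# Coefficients copied from summary(modRace) above (White = reference level)
b0    <- -0.0238360   # intercept
bAge  <-  0.0224207   # as.numeric(AGE)
bRace <- c(White = 0, Black = 0.0701887, Indigenous = -0.8478431,
           AsianPI = -0.5339882, Multiracial = -0.3232741)

# Predicted probability of having voted for each racial category at a given age
pByRace <- function(age) setNames(plogis(b0 + bRace + bAge * age), names(bRace))

round(rbind(age30 = pByRace(30), age60 = pByRace(60)), 3)
```

Because every group shares the same age slope in this model, the ranking of the groups (Black, White, more than one race, Asian/Pacific Islander, Indigenous) is the same at every age; only the gaps in probability change.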

Confusion matrices, which cross-tabulate each model’s predictions against the observed outcomes, show how these regression models perform when applied to the data and thus demonstrate their predictive ability. Each individual is predicted to have voted (2) when the model’s predicted probability exceeds 0.5.
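The mechanics are the same as in the appendix code. This minimal sketch applies the 0.5 cutoff to a handful of hypothetical predicted probabilities and outcomes (invented for illustration) and computes the error rate as the share of off-diagonal cells.

```r
# Hypothetical predicted probabilities and observed outcomes (1 = did not vote, 2 = voted)
prob   <- c(0.62, 0.48, 0.81, 0.55, 0.30, 0.71)
actual <- c("2",  "1",  "2",  "1",  "2",  "2")

# Classify as "2" (voted) when the predicted probability exceeds 0.5,
# mirroring the rep("1", ...) / [prob > .5] <- "2" steps in the appendix
pred <- ifelse(prob > 0.5, "2", "1")

cm <- table(pred, actual)   # rows = predicted, columns = observed
cm

errorRate <- mean(pred != actual)   # share of misclassified individuals
errorRate
```

Here one non-voter is predicted to have voted and one voter is predicted not to have voted, so the error rate is 2/6, or about 0.33.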

First, consider the confusion matrix for the general model of VOTED as a function of age:

##        
## predAge     1     2
##       2 21287 58533

This confusion matrix shows that the model predicts that every individual, regardless of age, voted in the 2016 election. This prediction is incorrect for 21,287 of the 79,820 individuals, for an error rate of \(\frac{21287}{79820} = 0.2667\), or about 27%. This is exactly the error rate of simply guessing that everyone voted, so at the 0.5 cutoff age alone adds no predictive power.
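Why does the age-only model predict "voted" for everyone? The predicted probability exceeds 0.5 exactly when \(\beta_{0} + \beta_{1}x_{1} > 0\). A short check using the coefficients from summary(modAge) and the counts in the confusion matrix above:

```r
# Age-only model coefficients from summary(modAge)
b0 <- -0.0752300
b1 <-  0.0227560

# Predicted probability exceeds 0.5 exactly when b0 + b1 * age > 0
thresholdAge <- -b0 / b1
thresholdAge   # about 3.3 years, far below voting age

# Error rate of always predicting "voted" = share of non-voters
nonvoters <- 21287
total     <- 79820
nonvoters / total
```

Since every respondent is well above an age of about 3.3, every predicted probability is above 0.5, and the error rate reduces to the share of non-voters in the data.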

Now, consider the confusion matrix for VOTED as a function of age and sex:

##        
## predSex     1     2
##       2 21287 58533

This model makes the same predictions as the age-only model (every individual is predicted to have voted) and therefore has the same error rate of about 27%.

Finally, consider the confusion matrix for VOTED as a function of age and race:

##         
## predRace     1     2
##        1   445   350
##        2 20842 58183

This model’s confusion matrix is different. Of the 79,820 observations, the model predicts 58,628 correctly. However, it predicts that 20,842 individuals voted who actually did not and that 350 individuals did not vote who actually did. Thus, the error rate is \(\frac{20842+350}{79820} = 0.2655\). Although the counts differ, the error rate still rounds to 27%, as in the previous models, suggesting that the added complexity did not meaningfully improve predictive ability.
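Solving \(\beta_{0} + \beta_{\text{race}} + \beta_{1}x_{1} = 0\) for \(x_{1}\) gives, for each racial category, the age below which the age-and-race model predicts "did not vote". This sketch uses the coefficients from summary(modRace); the group labels are shorthand introduced here for illustration.

```r
# Coefficients copied from summary(modRace) above (White = reference level)
b0    <- -0.0238360
bAge  <-  0.0224207
bRace <- c(White = 0, Black = 0.0701887, Indigenous = -0.8478431,
           AsianPI = -0.5339882, Multiracial = -0.3232741)

# Age below which the predicted probability of voting is under 0.5
thresholdAge <- -(b0 + bRace) / bAge
round(thresholdAge, 1)

# Overall error rate from the age-and-race confusion matrix
raceError <- (20842 + 350) / 79820
raceError
```

Assuming all respondents are adults, only Indigenous respondents younger than about 39 and Asian/Pacific Islander respondents younger than about 25 fall below their thresholds, which is why just 795 individuals land in the predicted "1" row.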

Results

Despite analyzing only four variables in a large dataset, this report produced significant findings and led to interesting conclusions. Every model produced in this report was statistically significant; note, however, that with nearly 80,000 observations even small effects reach significance, and the confusion matrices show that none of the models predicts individual voter status much better than guessing that everyone voted. Age modeled against voter status showed that, typically, the older a person is, the more likely that person is to vote. The age graph showed a fairly steady, strong trend, with the exception of the largest drop, at age 85, which may indicate when old age begins to interfere with someone’s ability or desire to vote.
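The gap between significance and explanatory power can be checked using only the deviances printed in the three summary() outputs. This sketch runs likelihood-ratio tests of each extended model against the age-only model and computes McFadden’s pseudo-\(R^2\); it treats the rounded deviances as exact.

```r
# Deviances reported in the three summary() outputs
nullDev <- 92581
devAge  <- 90061   # age only
devSex  <- 89955   # age + sex (1 extra coefficient)
devRace <- 89673   # age + race (4 extra coefficients)

# Likelihood-ratio tests against the age-only model
pSex  <- pchisq(devAge - devSex,  df = 1, lower.tail = FALSE)
pRace <- pchisq(devAge - devRace, df = 4, lower.tail = FALSE)
c(pSex = pSex, pRace = pRace)

# McFadden's pseudo-R^2 = 1 - residual deviance / null deviance (binary data)
pseudoR2 <- 1 - c(age = devAge, ageSex = devSex, ageRace = devRace) / nullDev
round(pseudoR2, 3)
```

Both added predictors are overwhelmingly significant, yet every model explains only about 3% of the null deviance, consistent with the confusion matrices.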

Sex modeled against voter status, on the other hand, showed how limiting a binary variable can be. Regardless of sex, roughly three out of four people in this dataset voted (58,533 of 79,820 overall, about 73%). Other sources report that just under 60% of Americans voted in the 2016 presidential election, which suggests that this dataset overrepresents voters. Additionally, since the ratification of the 19th Amendment in 1920, women have been generally accepted as making up about half of the voting population. What would be more interesting, but what society and official government documents fail to capture, is the breakdown of voter status across all genders, beyond the outdated gender binary. Because society tends to conflate a person’s sex assigned at birth with their gender identity, records of the two can disagree, which causes documentation issues and possible rejection of ballots (i.e., voter suppression).

Finally, race was modeled against voter status. The visual shows that White and Black people vote at comparable rates, with a noticeable decrease in turnout for Asian/Pacific Islander, mixed-race, and especially Indigenous populations. Despite the ratification of the 15th Amendment in 1870, only recently has voter data become more representative of the nation’s population. Black voters have been turning out in record numbers, especially in 2020, and, along with Indigenous voters, are widely credited with swinging the 2020 presidential election. However, extreme voter suppression persists, particularly for Indigenous communities in the United States, as a result of inaccessibility and racially targeted policies (rules that exclude nontraditional addresses, voter registration barriers, and voter ID laws), even decades after the Voting Rights Act of 1965.

Conclusion

While the November 2016 election is one Americans will long remember, analyzing the November 2020 election data would have been even more timely. In an election held during a pandemic, voters cast mail-in ballots in record numbers, and more people voted than ever before in the country’s history. Understanding those data and their relationships would provide valuable evidence for identifying where changes must be made to increase voter turnout and decrease voter suppression in future elections. Analyzing social determinants of voting such as education, employment status, and geographic location (urban/rural), particularly in models that also include race, would further highlight systemic, racialized voter suppression and tell a fuller, more detailed story. Unfortunately, voting in this country functions as a privilege rather than the right it was intended to be in a democratic system. For future datasets, it would be very interesting to include party affiliation and analyze how someone’s voter status and method of voting corresponded to whom they voted for. An analysis of the 2020 election would likely find large differences in party affiliation between in-person and mail-in voters, as well as correlations between geographic location (rural/urban) and party affiliation. To some extent, whether we vote, how we vote, and whom we vote for can be predicted from the most recent data on voting demographics. With all that said, there is much more to explore within this existing dataset, and especially in data from future elections.

Appendix

This appendix consists of all R code used in the statistical investigation, including unused interesting models and graphics.

##### Packages & Setup
## Packages
library(tidyverse)
library("readxl") 
#install.packages("usmap")
library(usmap)
#install.packages("plotly")
library(plotly)
#install.packages("viridis")
library(viridis)
#install.packages("colorspace")
library(colorspace)

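## Data
# `data` is assumed to already hold the IPUMS-CPS extract filtered to YEAR == 2016
# (read in earlier, e.g. with readxl::read_excel(); the file path is not shown here)
data<- data%>%
  mutate(RACEHISP = case_when(
    # NOTE: case_when() returns the value of the FIRST matching condition. For the
    # RACESIMPLE codes 1-5 and 9, the lines below match before the HISPSIMPLE lines,
    # so as written RACEHISP simply copies RACESIMPLE; the HISPSIMPLE conditions
    # would have to be listed first to actually fold in Hispanic identity.
    RACESIMPLE == 1 ~ 1, 
    RACESIMPLE == 2 ~ 2, 
    RACESIMPLE == 3 ~ 3, 
    RACESIMPLE == 4 ~ 4, 
    RACESIMPLE == 5 ~ 5,
    RACESIMPLE == 9 ~ 9,
    HISPSIMPLE == 1 & RACESIMPLE <=5 ~ 5,
    HISPSIMPLE == 1 & RACESIMPLE == 9 ~ 6
  )) #This step was intended to combine the Hispanic variable and the Race variable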
data<- data%>%
  # Zero-pad the state FIPS codes to two characters (1 -> "01", 10 -> "10") so they
  # match the `fips` column of usmap::us_map(); str_pad() comes from stringr (tidyverse)
  mutate(STATEFIP = str_pad(STATEFIP, width = 2, pad = "0")) #This step cleaned the STATEFIP variable observations in order to create map visuals

##### Code for bar graphs and maps
## Map of population percentage who voted by state
states <- usmap::us_map()
state18<-data%>%
  group_by(STATEFIP, VOTED)%>%
  summarise(nVote=n(), 
            nWgtVote=sum(VOSUPPWT, na.rm=TRUE))

## `summarise()` regrouping output by 'STATEFIP' (override with `.groups` argument)

state18T<-data%>%
  group_by(STATEFIP)%>%
  summarise(n=n(), 
            nWgt=sum(VOSUPPWT, na.rm=TRUE))

## `summarise()` ungrouping output (override with `.groups` argument)

statePropVote<-state18%>%
  filter(VOTED=="2")%>%
  left_join(state18T)%>%
  mutate(sampPropVote=nVote/n, 
         wgtPropVote=nWgtVote/nWgt)

## Joining, by = "STATEFIP"

mapPropVote<-left_join(x = statePropVote, y = states, by = c("STATEFIP"="fips"))
map<-mapPropVote%>%
  ggplot(aes(x, y, group = group)) +
  geom_polygon(aes(text=STATEFIP, fill = wgtPropVote),color="black")+
  theme_bw()+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())+
  labs(x="", y = "", 
       caption = "(Based on data from IPUMS)", 
       fill = 'Percent',
       title=paste("2016 Voter Turn-out"))+
  scale_fill_gradient(low="white", high="indianred2")

## Warning: Ignoring unknown aesthetics: text

#Then the function ggplotly(map) would output the map

## Voted by age
graphAge<-ggplot(data=data)+
  geom_bar(aes(x=AGE, fill=as.factor(VOTED)), position="fill")+
  theme_classic()+
  ggtitle("Votes by Age")+
  xlab("Age (years)")+
  labs(fill="Voted (1= no, 2 = yes)")+
  scale_fill_manual(values=c("indianred2", "skyblue2"))
#Then 'graphAge' creates  visual 

## Voted by sex
graphSex<-ggplot(data=data)+
  geom_bar(aes(x=SEX, fill=as.factor(VOTED)), position="fill")+
  theme_classic()+
  ggtitle("Votes by Sex")+
  xlab("Sex (1 = male, 2 = female)")+
  labs(fill="Voted (1= no, 2 = yes)")+
  scale_fill_manual(values=c("indianred2", "skyblue2"))
#Then 'graphSex' creates visual

## Voted by race
graphRace<-ggplot(data=data)+
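  # as.factor() keeps RACEHISP and VOTED discrete so scale_x_discrete() and scale_fill_manual() apply
  geom_bar(aes(x=as.factor(RACEHISP), fill=as.factor(VOTED)), position="fill")+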
  theme_classic()+
  ggtitle("Votes by Racial Identity")+
  xlab("Race")+
  labs(fill="Voted (1= no, 2 = yes)")+
  scale_x_discrete(limits = c("1", "2", "3", 
                              "4", "5"),
                   labels = c("White", "Black", 
                              "American Indian/\nAleut/Eskimo", "Asian/\n Pacific Islander", "More than\n one race"))+
  scale_fill_manual(values=c("indianred2", "skyblue2"))
#Then 'graphRace' creates visual 


##### Logistic regression models
modAge<- glm(as.factor(VOTED) ~ as.numeric(AGE), data=data, family="binomial")
#Then 'summary(modAge)' creates regression table

modSex <- glm(as.factor(VOTED) ~ as.numeric(AGE)+SEX, data=data, family="binomial")
#Then 'summary(modSex)' creates regression table

modRace <- glm(as.factor(VOTED) ~ as.numeric(AGE)+as.factor(RACEHISP), data=data, family="binomial")
#Then 'summary(modRace)' creates regression table 


##### Confusion matrices
## Confusion matrix for VOTED as function of age
probAge<-predict(modAge, newdata = data, type = "response")
predAge<-rep("1", nrow(data)) # start every row at "1" (did not vote)
predAge[probAge>.5]<-"2"
#Then 'table(predAge,data$VOTED)' prints table

## Confusion matrix for VOTED as function of age and sex
probSex<-predict(modSex, newdata = data, type = "response")
predSex<-rep("1", nrow(data)) # start every row at "1" (did not vote)
predSex[probSex>.5]<-"2"
#Then 'table(predSex,data$VOTED)' prints table

##Confusion matrix for VOTED as function of age and race
probRace<-predict(modRace, newdata = data, type = "response")
predRace<-rep("1", nrow(data)) # start every row at "1" (did not vote)
predRace[probRace>.5]<-"2"
#Then 'table(predRace,data$VOTED)' prints table 


##### Unused interesting visuals
## Looking at maps of the population percentage of various racial categories within this dataset highlights some interesting points.
## Consider the following maps focusing on White, Black, and Indigenous observations in this dataset.
## NOTE: some states are not displayed when this code is knit to R Markdown, although they appear when it is run in RStudio.
## Since these visuals are not included in the report, no troubleshooting was done, but it would be necessary in future work with this data.

# White
states <- usmap::us_map()
state4<-data%>%
  group_by(STATEFIP, RACEHISP)%>%
  summarise(nVote4=n(), 
            nWgtVote4=sum(VOSUPPWT, na.rm=TRUE))

## `summarise()` regrouping output by 'STATEFIP' (override with `.groups` argument)

state4T<-data%>%
  group_by(STATEFIP)%>%
  summarise(n4=n(), 
            nWgt4=sum(VOSUPPWT, na.rm=TRUE))

## `summarise()` ungrouping output (override with `.groups` argument)

statePropVote4<-state4%>%
  filter(RACEHISP=="1")%>%
  left_join(state4T)%>%
  mutate(sampPropVote4=nVote4/n4, 
         wgtPropVote4=nWgtVote4/nWgt4)

## Joining, by = "STATEFIP"

mapPropVote4<-left_join(x = statePropVote4, y = states, by = c("STATEFIP"="fips"))
map4<-mapPropVote4%>%
  ggplot(aes(x, y, group = group)) +
  geom_polygon(aes(text=STATEFIP, fill = wgtPropVote4),color="black")+
  theme_bw()+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())+
  labs(x="", y = "", 
       caption = "(Based on data from IPUMS)", 
       fill = 'Percent',
       title=paste("2016 White Response Percentage"))+
  scale_fill_gradient(low="white", high="indianred2")

## Warning: Ignoring unknown aesthetics: text

ggplotly(map4)

# Black
state3<-data%>%
  group_by(STATEFIP, RACEHISP)%>%
  summarise(nVote3=n(), 
            nWgtVote3=sum(VOSUPPWT, na.rm=TRUE))

## `summarise()` regrouping output by 'STATEFIP' (override with `.groups` argument)

state3T<-data%>%
  group_by(STATEFIP)%>%
  summarise(n3=n(), 
            nWgt3=sum(VOSUPPWT, na.rm=TRUE))

## `summarise()` ungrouping output (override with `.groups` argument)

statePropVote3<-state3%>%
  filter(RACEHISP=="2")%>%
  left_join(state3T)%>%
  mutate(sampPropVote3=nVote3/n3, 
         wgtPropVote3=nWgtVote3/nWgt3)

## Joining, by = "STATEFIP"

mapPropVote3<-left_join(x = statePropVote3, y = states, by = c("STATEFIP"="fips"))
map3<-mapPropVote3%>%
  ggplot(aes(x, y, group = group)) +
  geom_polygon(aes(text=STATEFIP, fill = wgtPropVote3),color="black")+
  theme_bw()+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())+
  labs(x="", y = "", 
       caption = "(Based on data from IPUMS)", 
       fill = 'Percent',
       title=paste("2016 Black Response Percentage"))+
  scale_fill_gradient(low="white", high="indianred2")

## Warning: Ignoring unknown aesthetics: text

ggplotly(map3)

# American Indian/Aleut/Eskimo 
state2<-data%>%
  group_by(STATEFIP, RACEHISP)%>%
  summarise(nVote2=n(), 
            nWgtVote2=sum(VOSUPPWT, na.rm=TRUE))

## `summarise()` regrouping output by 'STATEFIP' (override with `.groups` argument)

state2T<-data%>%
  group_by(STATEFIP)%>%
  summarise(n2=n(), 
            nWgt2=sum(VOSUPPWT, na.rm=TRUE))

## `summarise()` ungrouping output (override with `.groups` argument)

statePropVote2<-state2%>%
  filter(RACEHISP=="3")%>%
  left_join(state2T)%>%
  mutate(sampPropVote2=nVote2/n2, 
         wgtPropVote2=nWgtVote2/nWgt2)

## Joining, by = "STATEFIP"

mapPropVote2<-left_join(x = statePropVote2, y = states, by = c("STATEFIP"="fips"))
map2<-mapPropVote2%>%
  ggplot(aes(x, y, group = group)) +
  geom_polygon(aes(text=STATEFIP, fill = wgtPropVote2),color="black")+
  theme_bw()+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())+
  labs(x="", y = "", 
       caption = "(Based on data from IPUMS)", 
       fill = 'Percent',
       title=paste("2016 Indigenous Response Percentage "))+
  scale_fill_gradient(low="white", high="indianred2")

## Warning: Ignoring unknown aesthetics: text

ggplotly(map2)