1 2016 Primary Election Results
This project uses a data set that contains demographic data on US counties related to the 2016 US Primary Presidential Election.
The data set contains several main files and was retrieved from the URL https://www.kaggle.com/benhamner/2016-us-election.
1.1 Executive Summary
1.1.1 Group Members:
Bianca Sosnovski, Elina Azrilyan, Robert Mercier, Asher Dvir-Djerassi and Charls Joseph.
1.1.2 Project Dataset:
Results from the 2016 Presidential Primaries by county.
1.1.3 Files:
- “Primary_Results2016”: Results of the Democratic and Republican primaries, by US county and presidential candidate.
- “County_Facts”: The demographic breakdown of the US counties that voted in the primaries.
- “Headers”: Labels of the columns in the county_facts spreadsheet.
1.1.4 Data Source:
From the data science community at Kaggle. URL: https://kaggle.com/benhamner/2016-us-election.
1.1.5 Project Description:
Check for the differences between the counties that voted for the top presidential candidates.
1.1.6 Project Summary:
1.1.6.1 Primary Question:
“Which are the most valued data science skills?”
1.1.6.2 Secondary Question:
“What are the key elements that explain primary election results of a candidate by state?”
1.1.7 Team Tools:
R - for Analysis; Tableau - for some visualizations; Skype - for group meetings; Doodle - for synchronizing meetup times; GitHub for coding and making changes; plain old email.
1.1.8 Loading process:
The two datasets we worked with were downloaded as CSV files. We read the individual files into R as data frames using the function “read.csv”.
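As a minimal sketch of the loading step, the snippet below reads a small inline CSV with `read.csv`. The inline text and the choice to read `fips` as character (so that a leading zero survives) are illustrative assumptions, not the settings used in this project.

```r
# Illustrative inline CSV; the real files are read from disk or a URL.
csv_text <- "state,county,fips,votes
Alabama,Autauga,01001,544
Alabama,Baldwin,01003,2694"

# Default behaviour: fips is parsed as an integer, so the leading zero is lost.
as_int <- read.csv(text = csv_text, header = TRUE)

# Assumption: reading fips as character keeps the code exactly as written.
as_chr <- read.csv(text = csv_text, header = TRUE,
                   colClasses = c(fips = "character"),
                   stringsAsFactors = FALSE)

print(as_int$fips)  # 1001 1003
print(as_chr$fips)  # "01001" "01003"
```

Either representation can serve as a join key, as long as both files use the same one.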
1.1.9 Transforming process:
Since the original datasets were separated, the files had to be merged. The most efficient way of merging them was using the unique identifier, FIPS code, in each of the two main files.
1.1.10 Key Data Challenges:
- While we wanted to view the results for all 50 states, one state did not have primary results: Minnesota has county profile data but no primary results, so it was not included.
- The most significant challenge, however, was matching all the counties in the Primary Results and County Facts files, through no fault of our own. The FIPS code was the unique identifier needed, but because it is defined strictly on an individual county basis, some of the polling data did not match. The county demographics file had the correct 4-digit and 5-digit codes (the 4-digit codes lack a leading zero), but 11 states broke out their primary results by city, town or district. These results appeared with 8-digit codes that were not FIPS codes. This was also the reason for the fourth dataset, the ANSI county codes for subdivisions. As an example, Illinois has 102 counties, but Cook County is divided into Chicago and the outlying suburbs, for a total of 103 reporting units. Instead of inferring too much and trying to decode the multitude of rules for each state, we simply struck out any state without all the matches. There were ten (10) such states in total, plus one other state (New Hampshire) that was missing the FIPS codes entirely.
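The code-length issue described above can be sketched as follows. This is only an illustration in base R: the vector `codes` is a hand-picked sample, and the rule "at most 5 digits means county-level" is our working assumption rather than an official definition.

```r
# Sample codes: two 4-digit county codes (leading zero dropped), one 5-digit
# code, and one 8-digit sub-county code like those in the results file.
codes <- c(1001, 1003, 56039, 95600028)

# Number of digits in each code.
n_digits <- nchar(as.character(codes))

# Assumption: codes with at most 5 digits are county-level FIPS codes,
# which can be zero-padded to the standard 5-character form.
is_county <- n_digits <= 5
fips5 <- ifelse(is_county, sprintf("%05d", as.integer(codes)), NA_character_)

print(data.frame(code = codes, n_digits, is_county, fips5))
```

Rows with `is_county == FALSE` are the ones that cannot be matched to the county demographics by code alone.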
1.1.11 Thoughts on Data Challenges:
- One thing that became clear is that the final data frame can only be as good as its individual parts. We obtained a trove of useful data in the two sets, and having the FIPS code helped greatly, but if the FIPS codes are missing for an entire state, as is the case for New Hampshire, there is nothing we can do. The alternative is to look up the information manually, which a) defeats the purpose of using an organized dataset, b) is inefficient, and c) could lead to mistakes on our end.
- Having a standardized way of receiving or reordering the primary data would have been helpful. While most of the results were reported by county, the ten states that reported differently affected the merging of the data and therefore the final data frame. Though there was enough data that these omissions should not systematically affect the results, it is interesting to note that the eleven omitted states listed here (Alaska, Connecticut, Illinois, Kansas, Maine, Massachusetts, North Dakota, New Hampshire, Rhode Island, Vermont and Wyoming) include all six (6) New England states. The other states are concentrated in the Midwest and West, plus Alaska.
- Using the County_Facts dataset as the primary table could have alleviated some of the issues, but could also have caused a lot more headaches. Since the FIPS code was used as the primary identifier and FIPS codes are based on individual counties, this dataset might have been the better table to base everything else on. However, whatever data set is related to that table would still need to be reported by county (and therefore by FIPS code). Unlike the example of Kansas, where the data received was by congressional district rather than by county, the data would have to be reported in the same fashion.
1.1.12 Recommendations for Future Analysis:
1.1.12.1 As a next level analysis from the current data it would be interesting to:
- Add other candidates to the mix, especially on the Republican side, since there were more.
- Cluster the counties into similar types and analyze those clusters’ propensity to vote for each of the candidates.
1.1.12.2 Or with additional data it would be interesting to:
- Compare the primary results to the general election results.
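To make the clustering idea above concrete, here is a rough sketch using `kmeans` from base R. Nothing here was run on the project data: the data frame `counties`, its two features and the choice of two clusters are all hypothetical.

```r
set.seed(1)  # k-means starts from random centers

# Hypothetical county-level features (values invented for illustration).
counties <- data.frame(
  median_income = c(32000, 33000, 36000, 53000, 50000, 54000),
  pct_bachelors = c(13, 12, 12, 21, 28, 27)
)

# Scale the features so that income (large numbers) does not dominate.
scaled <- scale(counties)

# Assumption: two clusters; in practice k would have to be chosen with care.
fit <- kmeans(scaled, centers = 2, nstart = 10)
counties$cluster <- fit$cluster

print(counties)
```

The cluster label could then be used as a grouping variable when comparing each candidate's vote fractions.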
1.1.13 Conclusion:
The primary question we were asked to answer is “Which are the most valued data science skills?” Although we had our own ideas about the most valued data science skills before the assignment, it was important for us to go through the entire process to make sure we were not relying on preconceived notions. While other vital skills were required, the most valuable data science skill is data wrangling, not because it was the “hardest” but because it was the most unpredictable. It is exactly the unpredictability of working with large datasets that makes wrangling so valuable: in huge data sets it is often hard to predict how the data will be organized throughout. Every issue must be dealt with before getting to the fun part: analysis of the data. The secondary question, “What are the key elements that explain which candidate wins?”, is integrated in the code that follows.
1.2 Load packages
library(knitr)
library(kableExtra) # manipulate table styles
suppressMessages(library(tidyverse))
1.3 2016 primary election results data
1.3.1 Read the data
pr_df <- read.csv(file="https://raw.githubusercontent.com/bsosnovski/Project3/master/Primary_Results2016.csv", header=TRUE, sep=",")
kable(head(pr_df))%>% kable_styling(bootstrap_options = c("striped", "condensed"))
| state | state_abbreviation | county | fips | party | candidate | votes | fraction_votes |
|---|---|---|---|---|---|---|---|
| Alabama | AL | Autauga | 1001 | Democrat | Bernie Sanders | 544 | 0.182 |
| Alabama | AL | Autauga | 1001 | Democrat | Hillary Clinton | 2387 | 0.800 |
| Alabama | AL | Baldwin | 1003 | Democrat | Bernie Sanders | 2694 | 0.329 |
| Alabama | AL | Baldwin | 1003 | Democrat | Hillary Clinton | 5290 | 0.647 |
| Alabama | AL | Barbour | 1005 | Democrat | Bernie Sanders | 222 | 0.078 |
| Alabama | AL | Barbour | 1005 | Democrat | Hillary Clinton | 2567 | 0.906 |
kable(tail(pr_df))%>% kable_styling(bootstrap_options = c("striped", "condensed"))
| | state | state_abbreviation | county | fips | party | candidate | votes | fraction_votes |
|---|---|---|---|---|---|---|---|---|
| 24606 | Wyoming | WY | Teton-Sublette | 95600028 | Republican | Marco Rubio | 19 | 0.475 |
| 24607 | Wyoming | WY | Teton-Sublette | 95600028 | Republican | Ted Cruz | 0 | 0.000 |
| 24608 | Wyoming | WY | Uinta-Lincoln | 95600027 | Republican | Donald Trump | 0 | 0.000 |
| 24609 | Wyoming | WY | Uinta-Lincoln | 95600027 | Republican | John Kasich | 0 | 0.000 |
| 24610 | Wyoming | WY | Uinta-Lincoln | 95600027 | Republican | Marco Rubio | 0 | 0.000 |
| 24611 | Wyoming | WY | Uinta-Lincoln | 95600027 | Republican | Ted Cruz | 53 | 1.000 |
dim(pr_df)
## [1] 24611 8
1.3.2 Tidying the data
We spread the data set to move each candidate’s votes and vote fraction from rows to columns.
# The code was adapted from the following help page: https://community.rstudio.com/t/spread-with-multiple-value-columns/5378
pr_df$party <- NULL
pr_df_wide <- pr_df %>%
  gather(variable, value, -(state:candidate)) %>%
  unite(temp, candidate, variable) %>%
  spread(temp, value)
kable(head(pr_df_wide))%>% kable_styling(bootstrap_options = c("striped", "condensed"))
| state | state_abbreviation | county | fips | No Preference_fraction_votes | No Preference_votes | Uncommitted_fraction_votes | Uncommitted_votes | Ben Carson_fraction_votes | Ben Carson_votes | Bernie Sanders_fraction_votes | Bernie Sanders_votes | Carly Fiorina_fraction_votes | Carly Fiorina_votes | Chris Christie_fraction_votes | Chris Christie_votes | Donald Trump_fraction_votes | Donald Trump_votes | Hillary Clinton_fraction_votes | Hillary Clinton_votes | Jeb Bush_fraction_votes | Jeb Bush_votes | John Kasich_fraction_votes | John Kasich_votes | Marco Rubio_fraction_votes | Marco Rubio_votes | Martin O’Malley_fraction_votes | Martin O’Malley_votes | Mike Huckabee_fraction_votes | Mike Huckabee_votes | Rand Paul_fraction_votes | Rand Paul_votes | Rick Santorum_fraction_votes | Rick Santorum_votes | Ted Cruz_fraction_votes | Ted Cruz_votes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alabama | AL | Autauga | 1001 | NA | NA | NA | NA | 0.146 | 1764 | 0.182 | 544 | NA | NA | NA | NA | 0.445 | 5387 | 0.800 | 2387 | NA | NA | 0.035 | 421 | 0.148 | 1785 | NA | NA | NA | NA | NA | NA | NA | NA | 0.205 | 2482 |
| Alabama | AL | Baldwin | 1003 | NA | NA | NA | NA | 0.084 | 4221 | 0.329 | 2694 | NA | NA | NA | NA | 0.469 | 23618 | 0.647 | 5290 | NA | NA | 0.059 | 2987 | 0.193 | 9703 | NA | NA | NA | NA | NA | NA | NA | NA | 0.170 | 8571 |
| Alabama | AL | Barbour | 1005 | NA | NA | NA | NA | 0.122 | 417 | 0.078 | 222 | NA | NA | NA | NA | 0.501 | 1710 | 0.906 | 2567 | NA | NA | 0.036 | 123 | 0.146 | 498 | NA | NA | NA | NA | NA | NA | NA | NA | 0.179 | 609 |
| Alabama | AL | Bibb | 1007 | NA | NA | NA | NA | 0.099 | 393 | 0.197 | 246 | NA | NA | NA | NA | 0.494 | 1959 | 0.755 | 942 | NA | NA | 0.021 | 84 | 0.112 | 444 | NA | NA | NA | NA | NA | NA | NA | NA | 0.255 | 1011 |
| Alabama | AL | Blount | 1009 | NA | NA | NA | NA | 0.100 | 1523 | 0.386 | 395 | NA | NA | NA | NA | 0.487 | 7390 | 0.551 | 564 | NA | NA | 0.022 | 337 | 0.122 | 1843 | NA | NA | NA | NA | NA | NA | NA | NA | 0.244 | 3698 |
| Alabama | AL | Bullock | 1011 | NA | NA | NA | NA | 0.085 | 47 | 0.066 | 178 | NA | NA | NA | NA | 0.565 | 313 | 0.913 | 2451 | NA | NA | 0.042 | 23 | 0.116 | 64 | NA | NA | NA | NA | NA | NA | NA | NA | 0.170 | 94 |
dim(pr_df_wide)
## [1] 4217 36
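The same long-to-wide reshaping can also be written with `pivot_wider`, which supersedes `gather`/`spread` in tidyr 1.0.0 and later. The sketch below is an alternative under that assumption, not the code used in this report; `long_df` is a toy data frame built from the first rows shown above.

```r
library(tidyr)

# Toy long-format data shaped like pr_df (party column already dropped).
long_df <- data.frame(
  county = c("Autauga", "Autauga", "Baldwin", "Baldwin"),
  fips = c(1001, 1001, 1003, 1003),
  candidate = c("Bernie Sanders", "Hillary Clinton",
                "Bernie Sanders", "Hillary Clinton"),
  votes = c(544, 2387, 2694, 5290),
  fraction_votes = c(0.182, 0.800, 0.329, 0.647)
)

# One row per county; one column per candidate and measure, named
# "<candidate>_<measure>" to mirror the unite() step above.
wide_df <- pivot_wider(long_df,
                       names_from = candidate,
                       values_from = c(votes, fraction_votes),
                       names_glue = "{candidate}_{.value}")

print(names(wide_df))
```

The column order may differ from the `spread` result, but the column names follow the same pattern.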
1.3.3 Create data frame
Now let’s create a new data frame with only the data for the 4 candidates we are interested in: Bernie Sanders, Hillary Clinton, Ted Cruz, and Donald Trump.
new_pr_df <- data.frame(pr_df_wide$state, pr_df_wide$state_abbreviation, pr_df_wide$county, pr_df_wide$fips, pr_df_wide$`Bernie Sanders_fraction_votes`, pr_df_wide$`Bernie Sanders_votes`, pr_df_wide$`Hillary Clinton_fraction_votes`, pr_df_wide$`Hillary Clinton_votes`, pr_df_wide$`Donald Trump_fraction_votes`, pr_df_wide$`Donald Trump_votes`, pr_df_wide$`Ted Cruz_fraction_votes`, pr_df_wide$`Ted Cruz_votes`)
names(new_pr_df) <- c("state", "state_abbr", "county", "fips", "sanders fraction votes", "sanders votes", "clinton fraction votes", "clinton votes","trump fraction votes", "trump votes", "cruz fraction votes", "cruz votes")
kable(head(new_pr_df))%>% kable_styling(bootstrap_options = c("striped", "condensed"))
| state | state_abbr | county | fips | sanders fraction votes | sanders votes | clinton fraction votes | clinton votes | trump fraction votes | trump votes | cruz fraction votes | cruz votes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Alabama | AL | Autauga | 1001 | 0.182 | 544 | 0.800 | 2387 | 0.445 | 5387 | 0.205 | 2482 |
| Alabama | AL | Baldwin | 1003 | 0.329 | 2694 | 0.647 | 5290 | 0.469 | 23618 | 0.170 | 8571 |
| Alabama | AL | Barbour | 1005 | 0.078 | 222 | 0.906 | 2567 | 0.501 | 1710 | 0.179 | 609 |
| Alabama | AL | Bibb | 1007 | 0.197 | 246 | 0.755 | 942 | 0.494 | 1959 | 0.255 | 1011 |
| Alabama | AL | Blount | 1009 | 0.386 | 395 | 0.551 | 564 | 0.487 | 7390 | 0.244 | 3698 |
| Alabama | AL | Bullock | 1011 | 0.066 | 178 | 0.913 | 2451 | 0.565 | 313 | 0.170 | 94 |
dim(new_pr_df)
## [1] 4217 12
write.csv(new_pr_df,'new_pr_df.csv')
1.4 Demographic data
1.4.1 Read the data
facts <- read.csv(file="https://raw.githubusercontent.com/bsosnovski/Project3/master/County_Facts.csv", header=TRUE, sep=",")
glimpse(facts, width = getOption("width"))
## Observations: 3,195
## Variables: 54
## $ fips <int> 0, 1000, 1001, 1003, 1005, 1007, 1009, 1011...
## $ area_name <fct> United States, Alabama, Autauga County, Bal...
## $ state_abbreviation <fct> , , AL, AL, AL, AL, AL, AL, AL, AL, AL, AL,...
## $ PST045214 <int> 318857056, 4849377, 55395, 200111, 26887, 2...
## $ PST040210 <int> 308758105, 4780127, 54571, 182265, 27457, 2...
## $ PST120214 <dbl> 3.3, 1.4, 1.5, 9.8, -2.1, -1.8, 0.7, -1.4, ...
## $ POP010210 <int> 308745538, 4779736, 54571, 182265, 27457, 2...
## $ AGE135214 <dbl> 6.2, 6.1, 6.0, 5.6, 5.7, 5.3, 6.1, 6.3, 6.1...
## $ AGE295214 <dbl> 23.1, 22.8, 25.2, 22.2, 21.2, 21.0, 23.6, 2...
## $ AGE775214 <dbl> 14.5, 15.3, 13.8, 18.7, 16.5, 14.8, 17.0, 1...
## $ SEX255214 <dbl> 50.8, 51.5, 51.4, 51.2, 46.6, 45.9, 50.5, 4...
## $ RHI125214 <dbl> 77.4, 69.7, 77.9, 87.1, 50.2, 76.3, 96.0, 2...
## $ RHI225214 <dbl> 13.2, 26.7, 18.7, 9.6, 47.6, 22.1, 1.8, 70....
## $ RHI325214 <dbl> 1.2, 0.7, 0.5, 0.7, 0.6, 0.4, 0.6, 0.8, 0.4...
## $ RHI425214 <dbl> 5.4, 1.3, 1.1, 0.9, 0.5, 0.2, 0.3, 0.3, 0.9...
## $ RHI525214 <dbl> 0.2, 0.1, 0.1, 0.1, 0.2, 0.1, 0.1, 0.7, 0.0...
## $ RHI625214 <dbl> 2.5, 1.5, 1.8, 1.6, 0.9, 0.9, 1.2, 1.1, 0.8...
## $ RHI725214 <dbl> 17.4, 4.1, 2.7, 4.6, 4.5, 2.1, 8.7, 7.5, 1....
## $ RHI825214 <dbl> 62.1, 66.2, 75.6, 83.0, 46.6, 74.5, 87.8, 2...
## $ POP715213 <dbl> 84.9, 85.0, 85.0, 82.1, 84.8, 86.6, 88.7, 8...
## $ POP645213 <dbl> 12.9, 3.5, 1.6, 3.6, 2.9, 1.2, 4.3, 5.4, 0....
## $ POP815213 <dbl> 20.7, 5.2, 3.5, 5.5, 5.0, 2.1, 7.3, 5.2, 1....
## $ EDU635213 <dbl> 86.0, 83.1, 85.6, 89.1, 73.7, 77.5, 77.0, 6...
## $ EDU685213 <dbl> 28.8, 22.6, 20.9, 27.7, 13.4, 12.1, 12.1, 1...
## $ VET605213 <int> 21263779, 388865, 5922, 19346, 2120, 1327, ...
## $ LFE305213 <dbl> 25.5, 24.2, 26.2, 25.9, 24.6, 27.6, 33.9, 2...
## $ HSG010214 <int> 133957180, 2207912, 22751, 107374, 11799, 8...
## $ HSG445213 <dbl> 64.9, 69.7, 76.8, 72.6, 67.7, 79.0, 81.0, 7...
## $ HSG096213 <dbl> 26.0, 15.9, 8.3, 24.4, 10.6, 7.3, 4.5, 8.7,...
## $ HSG495213 <int> 176700, 122500, 136200, 168600, 89200, 9050...
## $ HSD410213 <int> 115610216, 1838683, 20071, 73283, 9200, 709...
## $ HSD310213 <dbl> 2.63, 2.55, 2.71, 2.52, 2.66, 3.03, 2.70, 2...
## $ INC910213 <int> 28155, 23680, 24571, 26766, 16829, 17427, 2...
## $ INC110213 <int> 53046, 43253, 53682, 50221, 32911, 36447, 4...
## $ PVY020213 <dbl> 15.4, 18.6, 12.1, 13.9, 26.7, 18.1, 15.8, 2...
## $ BZA010213 <int> 7488353, 97578, 817, 4871, 464, 275, 660, 1...
## $ BZA110213 <int> 118266253, 1603100, 10120, 54988, 6611, 314...
## $ BZA115213 <dbl> 2.0, 1.1, 2.1, 3.7, -5.6, 7.5, 3.4, 0.0, 2....
## $ NES010213 <int> 23005620, 311578, 2947, 16508, 1546, 1126, ...
## $ SBO001207 <int> 27092908, 382350, 4067, 19035, 1667, 1385, ...
## $ SBO315207 <dbl> 7.1, 14.8, 15.2, 2.7, 0.0, 14.9, 0.0, 0.0, ...
## $ SBO115207 <dbl> 0.9, 0.8, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0...
## $ SBO215207 <dbl> 5.7, 1.8, 1.3, 1.0, 0.0, 0.0, 0.0, 0.0, 3.3...
## $ SBO515207 <dbl> 0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
## $ SBO415207 <dbl> 8.3, 1.2, 0.7, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0...
## $ SBO015207 <dbl> 28.8, 28.1, 31.7, 27.3, 27.0, 0.0, 23.2, 38...
## $ MAN450207 <dbl> 5319456312, 112858843, 0, 1410273, 0, 0, 34...
## $ WTN220207 <dbl> 4174286516, 52252752, 0, 0, 0, 0, 0, 0, 567...
## $ RTN130207 <dbl> 3917663456, 57344851, 598175, 2966489, 1883...
## $ RTN131207 <int> 12990, 12364, 12003, 17166, 6334, 5804, 562...
## $ AFN120207 <int> 613795732, 6426342, 88157, 436955, 0, 10757...
## $ BPS030214 <int> 1046363, 13369, 131, 1384, 8, 19, 3, 1, 2, ...
## $ LND110210 <dbl> 3531905.43, 50645.33, 594.44, 1589.78, 884....
## $ POP060210 <dbl> 87.4, 94.4, 91.8, 114.6, 31.0, 36.8, 88.9, ...
headers <- read.csv(file="https://raw.githubusercontent.com/bsosnovski/Project3/master/Headers.csv", header=TRUE, sep=",", stringsAsFactors = F)
glimpse(headers, width = getOption("width"))
## Observations: 51
## Variables: 2
## $ column_name <chr> "PST045214", "PST040210", "PST120214", "POP010210"...
## $ description <chr> "Population, 2014 estimate", "Population, 2010 (Ap...
1.4.2 Tidying the data
The data set contains rows with total figures for each state and for the country. Because we can use the summary function to obtain these figures, and to make the data easier to read, we filter these rows out. To do so, we note that such rows have a blank state_abbreviation value.
facts <- facts %>% filter(state_abbreviation!="")
kable(head(facts))%>% kable_styling(bootstrap_options = c("striped", "condensed"))
| fips | area_name | state_abbreviation | PST045214 | PST040210 | PST120214 | POP010210 | AGE135214 | AGE295214 | AGE775214 | SEX255214 | RHI125214 | RHI225214 | RHI325214 | RHI425214 | RHI525214 | RHI625214 | RHI725214 | RHI825214 | POP715213 | POP645213 | POP815213 | EDU635213 | EDU685213 | VET605213 | LFE305213 | HSG010214 | HSG445213 | HSG096213 | HSG495213 | HSD410213 | HSD310213 | INC910213 | INC110213 | PVY020213 | BZA010213 | BZA110213 | BZA115213 | NES010213 | SBO001207 | SBO315207 | SBO115207 | SBO215207 | SBO515207 | SBO415207 | SBO015207 | MAN450207 | WTN220207 | RTN130207 | RTN131207 | AFN120207 | BPS030214 | LND110210 | POP060210 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1001 | Autauga County | AL | 55395 | 54571 | 1.5 | 54571 | 6.0 | 25.2 | 13.8 | 51.4 | 77.9 | 18.7 | 0.5 | 1.1 | 0.1 | 1.8 | 2.7 | 75.6 | 85.0 | 1.6 | 3.5 | 85.6 | 20.9 | 5922 | 26.2 | 22751 | 76.8 | 8.3 | 136200 | 20071 | 2.71 | 24571 | 53682 | 12.1 | 817 | 10120 | 2.1 | 2947 | 4067 | 15.2 | 0.0 | 1.3 | 0 | 0.7 | 31.7 | 0 | 0 | 598175 | 12003 | 88157 | 131 | 594.44 | 91.8 |
| 1003 | Baldwin County | AL | 200111 | 182265 | 9.8 | 182265 | 5.6 | 22.2 | 18.7 | 51.2 | 87.1 | 9.6 | 0.7 | 0.9 | 0.1 | 1.6 | 4.6 | 83.0 | 82.1 | 3.6 | 5.5 | 89.1 | 27.7 | 19346 | 25.9 | 107374 | 72.6 | 24.4 | 168600 | 73283 | 2.52 | 26766 | 50221 | 13.9 | 4871 | 54988 | 3.7 | 16508 | 19035 | 2.7 | 0.4 | 1.0 | 0 | 1.3 | 27.3 | 1410273 | 0 | 2966489 | 17166 | 436955 | 1384 | 1589.78 | 114.6 |
| 1005 | Barbour County | AL | 26887 | 27457 | -2.1 | 27457 | 5.7 | 21.2 | 16.5 | 46.6 | 50.2 | 47.6 | 0.6 | 0.5 | 0.2 | 0.9 | 4.5 | 46.6 | 84.8 | 2.9 | 5.0 | 73.7 | 13.4 | 2120 | 24.6 | 11799 | 67.7 | 10.6 | 89200 | 9200 | 2.66 | 16829 | 32911 | 26.7 | 464 | 6611 | -5.6 | 1546 | 1667 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 27.0 | 0 | 0 | 188337 | 6334 | 0 | 8 | 884.88 | 31.0 |
| 1007 | Bibb County | AL | 22506 | 22919 | -1.8 | 22915 | 5.3 | 21.0 | 14.8 | 45.9 | 76.3 | 22.1 | 0.4 | 0.2 | 0.1 | 0.9 | 2.1 | 74.5 | 86.6 | 1.2 | 2.1 | 77.5 | 12.1 | 1327 | 27.6 | 8978 | 79.0 | 7.3 | 90500 | 7091 | 3.03 | 17427 | 36447 | 18.1 | 275 | 3145 | 7.5 | 1126 | 1385 | 14.9 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0 | 0 | 124707 | 5804 | 10757 | 19 | 622.58 | 36.8 |
| 1009 | Blount County | AL | 57719 | 57322 | 0.7 | 57322 | 6.1 | 23.6 | 17.0 | 50.5 | 96.0 | 1.8 | 0.6 | 0.3 | 0.1 | 1.2 | 8.7 | 87.8 | 88.7 | 4.3 | 7.3 | 77.0 | 12.1 | 4540 | 33.9 | 23826 | 81.0 | 4.5 | 117100 | 21108 | 2.70 | 20730 | 44145 | 15.8 | 660 | 6798 | 3.4 | 3563 | 4458 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 23.2 | 341544 | 0 | 319700 | 5622 | 20941 | 3 | 644.78 | 88.9 |
| 1011 | Bullock County | AL | 10764 | 10915 | -1.4 | 10914 | 6.3 | 21.4 | 14.9 | 45.3 | 26.9 | 70.1 | 0.8 | 0.3 | 0.7 | 1.1 | 7.5 | 22.1 | 84.7 | 5.4 | 5.2 | 67.8 | 12.5 | 636 | 26.9 | 4461 | 74.3 | 8.7 | 70600 | 3741 | 2.73 | 18628 | 32033 | 21.6 | 112 | 0 | 0.0 | 470 | 417 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 38.8 | 0 | 0 | 43810 | 3995 | 3670 | 1 | 622.81 | 17.5 |
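Before dropping the aggregate rows, the filter rule can be cross-checked on a small sample. In the sketch below, `toy` mirrors the first rows of the demographic file; the second rule (aggregate rows have a fips value that is a multiple of 1000) is an assumption suggested by the values 0 and 1000 seen above, not something we verified for every state.

```r
# Toy rows mirroring the first rows of the County_Facts file.
toy <- data.frame(
  fips = c(0, 1000, 1001, 1003),
  area_name = c("United States", "Alabama", "Autauga County", "Baldwin County"),
  state_abbreviation = c("", "", "AL", "AL"),
  stringsAsFactors = FALSE
)

# Rule used in this report: aggregate rows have a blank state abbreviation.
blank_abbr <- toy$state_abbreviation == ""

# Assumed cross-check: aggregate rows have a fips that is a multiple of 1000.
round_fips <- toy$fips %% 1000 == 0

# Both rules should flag the same rows before they are dropped.
stopifnot(identical(blank_abbr, round_fips))
counties_only <- toy[!blank_abbr, ]
print(counties_only$area_name)
```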
1.4.3 Adjust table headers
The file Headers.csv contains descriptions of some of the variables. We replace the codes in the table headers with those descriptions accordingly.
# Function that matches each code in the data frame's column names
# and replaces it with the dictionary value
new.headers <- function(headers, facts){
  n <- nrow(headers)
  for (i in seq_len(n)){
    col.Ind <- which(colnames(facts) == headers[i, 1])
    colnames(facts)[col.Ind] <- headers[i, 2]
  }
  return(facts)
}
facts2 <- new.headers(headers,facts)
kable(head(facts2))%>% kable_styling(bootstrap_options = c("striped", "condensed"))
| fips | area_name | state_abbreviation | Population, 2014 estimate | Population, 2010 (April 1) estimates base | Population, percent change - April 1, 2010 to July 1, 2014 | Population, 2010 | Persons under 5 years, percent, 2014 | Persons under 18 years, percent, 2014 | Persons 65 years and over, percent, 2014 | Female persons, percent, 2014 | White alone, percent, 2014 | Black or African American alone, percent, 2014 | American Indian and Alaska Native alone, percent, 2014 | Asian alone, percent, 2014 | Native Hawaiian and Other Pacific Islander alone, percent, 2014 | Two or More Races, percent, 2014 | Hispanic or Latino, percent, 2014 | White alone, not Hispanic or Latino, percent, 2014 | Living in same house 1 year & over, percent, 2009-2013 | Foreign born persons, percent, 2009-2013 | Language other than English spoken at home, pct age 5+, 2009-2013 | High school graduate or higher, percent of persons age 25+, 2009-2013 | Bachelor’s degree or higher, percent of persons age 25+, 2009-2013 | Veterans, 2009-2013 | Mean travel time to work (minutes), workers age 16+, 2009-2013 | Housing units, 2014 | Homeownership rate, 2009-2013 | Housing units in multi-unit structures, percent, 2009-2013 | Median value of owner-occupied housing units, 2009-2013 | Households, 2009-2013 | Persons per household, 2009-2013 | Per capita money income in past 12 months (2013 dollars), 2009-2013 | Median household income, 2009-2013 | Persons below poverty level, percent, 2009-2013 | Private nonfarm establishments, 2013 | Private nonfarm employment, 2013 | Private nonfarm employment, percent change, 2012-2013 | Nonemployer establishments, 2013 | Total number of firms, 2007 | Black-owned firms, percent, 2007 | American Indian- and Alaska Native-owned firms, percent, 2007 | Asian-owned firms, percent, 2007 | Native Hawaiian- and Other Pacific Islander-owned firms, percent, 2007 | Hispanic-owned firms, percent, 2007 | Women-owned firms, percent, 2007 | Manufacturers shipments, 2007 ($1,000) | Merchant wholesaler sales, 2007 ($1,000) | Retail sales, 2007 ($1,000) | Retail sales per capita, 2007 | Accommodation and food services sales, 2007 ($1,000) | Building permits, 2014 | Land area in square miles, 2010 | Population per square mile, 2010 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1001 | Autauga County | AL | 55395 | 54571 | 1.5 | 54571 | 6.0 | 25.2 | 13.8 | 51.4 | 77.9 | 18.7 | 0.5 | 1.1 | 0.1 | 1.8 | 2.7 | 75.6 | 85.0 | 1.6 | 3.5 | 85.6 | 20.9 | 5922 | 26.2 | 22751 | 76.8 | 8.3 | 136200 | 20071 | 2.71 | 24571 | 53682 | 12.1 | 817 | 10120 | 2.1 | 2947 | 4067 | 15.2 | 0.0 | 1.3 | 0 | 0.7 | 31.7 | 0 | 0 | 598175 | 12003 | 88157 | 131 | 594.44 | 91.8 |
| 1003 | Baldwin County | AL | 200111 | 182265 | 9.8 | 182265 | 5.6 | 22.2 | 18.7 | 51.2 | 87.1 | 9.6 | 0.7 | 0.9 | 0.1 | 1.6 | 4.6 | 83.0 | 82.1 | 3.6 | 5.5 | 89.1 | 27.7 | 19346 | 25.9 | 107374 | 72.6 | 24.4 | 168600 | 73283 | 2.52 | 26766 | 50221 | 13.9 | 4871 | 54988 | 3.7 | 16508 | 19035 | 2.7 | 0.4 | 1.0 | 0 | 1.3 | 27.3 | 1410273 | 0 | 2966489 | 17166 | 436955 | 1384 | 1589.78 | 114.6 |
| 1005 | Barbour County | AL | 26887 | 27457 | -2.1 | 27457 | 5.7 | 21.2 | 16.5 | 46.6 | 50.2 | 47.6 | 0.6 | 0.5 | 0.2 | 0.9 | 4.5 | 46.6 | 84.8 | 2.9 | 5.0 | 73.7 | 13.4 | 2120 | 24.6 | 11799 | 67.7 | 10.6 | 89200 | 9200 | 2.66 | 16829 | 32911 | 26.7 | 464 | 6611 | -5.6 | 1546 | 1667 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 27.0 | 0 | 0 | 188337 | 6334 | 0 | 8 | 884.88 | 31.0 |
| 1007 | Bibb County | AL | 22506 | 22919 | -1.8 | 22915 | 5.3 | 21.0 | 14.8 | 45.9 | 76.3 | 22.1 | 0.4 | 0.2 | 0.1 | 0.9 | 2.1 | 74.5 | 86.6 | 1.2 | 2.1 | 77.5 | 12.1 | 1327 | 27.6 | 8978 | 79.0 | 7.3 | 90500 | 7091 | 3.03 | 17427 | 36447 | 18.1 | 275 | 3145 | 7.5 | 1126 | 1385 | 14.9 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0 | 0 | 124707 | 5804 | 10757 | 19 | 622.58 | 36.8 |
| 1009 | Blount County | AL | 57719 | 57322 | 0.7 | 57322 | 6.1 | 23.6 | 17.0 | 50.5 | 96.0 | 1.8 | 0.6 | 0.3 | 0.1 | 1.2 | 8.7 | 87.8 | 88.7 | 4.3 | 7.3 | 77.0 | 12.1 | 4540 | 33.9 | 23826 | 81.0 | 4.5 | 117100 | 21108 | 2.70 | 20730 | 44145 | 15.8 | 660 | 6798 | 3.4 | 3563 | 4458 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 23.2 | 341544 | 0 | 319700 | 5622 | 20941 | 3 | 644.78 | 88.9 |
| 1011 | Bullock County | AL | 10764 | 10915 | -1.4 | 10914 | 6.3 | 21.4 | 14.9 | 45.3 | 26.9 | 70.1 | 0.8 | 0.3 | 0.7 | 1.1 | 7.5 | 22.1 | 84.7 | 5.4 | 5.2 | 67.8 | 12.5 | 636 | 26.9 | 4461 | 74.3 | 8.7 | 70600 | 3741 | 2.73 | 18628 | 32033 | 21.6 | 112 | 0 | 0.0 | 470 | 417 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 38.8 | 0 | 0 | 43810 | 3995 | 3670 | 1 | 622.81 | 17.5 |
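The header replacement above can also be done without a loop by looking up every column name in the dictionary at once with `match`. This is a sketch of an equivalent approach on toy data, not the function used in this report; `headers_toy` and `facts_toy` are small stand-ins for `headers` and `facts`.

```r
# Toy dictionary and data frame (a small subset of the real columns).
headers_toy <- data.frame(
  column_name = c("PST045214", "POP010210"),
  description = c("Population, 2014 estimate", "Population, 2010"),
  stringsAsFactors = FALSE
)
facts_toy <- data.frame(fips = c(1001, 1003),
                        PST045214 = c(55395, 200111),
                        POP010210 = c(54571, 182265))

# Position of each column name in the dictionary (NA when there is no entry).
idx <- match(colnames(facts_toy), headers_toy$column_name)

# Replace only the names that have a description; leave the others untouched.
has_desc <- !is.na(idx)
colnames(facts_toy)[has_desc] <- headers_toy$description[idx[has_desc]]

print(colnames(facts_toy))
```

Columns without a dictionary entry, such as `fips`, keep their original names.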
1.5 Merging the demographic data and the primary election results data.
We determine what keys will foster the optimal join. Before merging the data sets, we prepare them by matching data types of columns.
1.5.1 Prepare data sets for merging
facts2$county <- sapply(facts2$area_name , function(x) {
str_replace_all(x, " County" , "")
})
typeof(facts2$county) #Data type of county in facts2
## [1] "character"
typeof(new_pr_df$county) #Data type of county in new_pr_df
## [1] "integer"
new_pr_df$county <- as.character(new_pr_df$county) # Convert county in new_pr_df to character
typeof(facts2$state_abbreviation) #Data type of the state abbreviation in facts2
## [1] "integer"
typeof(new_pr_df$state_abbr) #Data type of the state abbreviation in new_pr_df
## [1] "integer"
facts2$state_abbreviation <- as.character(facts2$state_abbreviation) # Convert state abbreviation in facts2 to character
new_pr_df$state_abbr <- as.character(new_pr_df$state_abbr) # Convert state abbreviation in new_pr_df to character
#Trim white space around character data
facts2$state_abbreviation <- trimws(facts2$state_abbreviation)
new_pr_df$state_abbr <- trimws(new_pr_df$state_abbr)
facts2$county <- trimws(facts2$county)
new_pr_df$county <- trimws(new_pr_df$county)
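The "integer" results from `typeof` above arise because those columns are factors. The short base R sketch below, on a made-up vector, shows why `as.character` is the conversion needed before joining on names.

```r
# A factor stores integer codes plus a table of labels (its levels).
county_factor <- factor(c("Autauga", "Baldwin", "Autauga"))

print(typeof(county_factor))  # "integer": the storage type of the codes
print(class(county_factor))   # "factor"

# as.character() returns the labels, which is what a join on names needs.
county_chr <- as.character(county_factor)
print(typeof(county_chr))     # "character"

# as.integer() would return the internal codes instead, not the names.
print(as.integer(county_factor))  # 1 2 1
```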
1.5.2 Left join by county and state
#Left join by county and state abbreviation
complete_data <- left_join(new_pr_df, facts2, by = c("county", "state_abbr" = "state_abbreviation" ))
1.5.2.1 Dimensions of merged data and original data
dim(new_pr_df) # Dimension of primary election data
## [1] 4217 12
dim(facts2) # Dimensions of the demographic data, which is organized on the county level.
## [1] 3143 55
dim(complete_data) # Dimensions of the two datasets merged by a left join, which keeps every row of new_pr_df and fills the demographic columns with NA where there is no match.
## [1] 4217 65
unique(complete_data$state_abbr) # States present in the merged data; a left join keeps all of them, whether or not their rows matched.
## [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "ID" "IL" "IN"
## [15] "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MS" "MO" "MT" "NE" "NV" "NH"
## [29] "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX"
## [43] "UT" "VT" "VA" "WA" "WV" "WI" "WY"
1.5.2.2 Where did the two data frames not match (i.e. the rows with NA values)
anyNA(complete_data) #There are missing values
## [1] TRUE
#Create a subset of the merged data that only contains those rows where there are NA values.
df_did_not_match <- subset(complete_data, is.na(complete_data$area_name))
dim(df_did_not_match) #There are 1491 rows where the primary election data did not match the demographic data
## [1] 1491 65
df_did_not_match$fips_length <- sapply(df_did_not_match$fips.x, nchar)
unique(df_did_not_match$fips_length) #In the df_did_not_match subset, there are codes of 4, 5, and 8 digits in length.
## [1] 8 4 5
count((subset(df_did_not_match, fips_length==4))) #Number of fips codes that are 4 digits long
## # A tibble: 1 x 1
## n
## <int>
## 1 1
count((subset(df_did_not_match, fips_length==5))) #Number of fips codes that are 5 digits long
## # A tibble: 1 x 1
## n
## <int>
## 1 121
count((subset(df_did_not_match, fips_length==8))) #Number of fips codes that are 8 digits long
## # A tibble: 1 x 1
## n
## <int>
## 1 1369
unique(df_did_not_match$state) #States that did not properly match.
## [1] Alaska Arkansas Connecticut Idaho Illinois
## [6] Kansas Kentucky Louisiana Maine Maryland
## [11] Massachusetts Mississippi Missouri New Mexico New York
## [16] North Dakota Oklahoma Rhode Island South Dakota Texas
## [21] Vermont Virginia Wyoming
## 49 Levels: Alabama Alaska Arizona Arkansas California ... Wyoming
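The unmatched rows can also be isolated directly with `anti_join` from dplyr, instead of a left join followed by a filter on NA values. The sketch below uses invented toy data frames (`results`, `demographics`) and a placeholder town name.

```r
library(dplyr)

# Toy stand-ins for new_pr_df and facts2 (the town name is a placeholder).
results <- data.frame(county = c("Autauga", "Baldwin", "Sample Town"),
                      state_abbr = c("AL", "AL", "CT"),
                      stringsAsFactors = FALSE)
demographics <- data.frame(county = c("Autauga", "Baldwin"),
                           state_abbreviation = c("AL", "AL"),
                           stringsAsFactors = FALSE)

# anti_join() keeps the rows of `results` with no match in `demographics`.
unmatched <- anti_join(results, demographics,
                       by = c("county", "state_abbr" = "state_abbreviation"))

print(unmatched)
```

This returns only the columns of the left table, which makes the unmatched keys easier to inspect.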
1.5.3 Left join by fips code
As shown below, this merge leads to better results. Whereas 1491 rows failed to match when joining by state and county, only 1419 rows fail to match when the FIPS code is used as the join key. On top of this, when joining by FIPS code, the primary election results of only 11 states fail to match completely.
complete_data <- left_join(new_pr_df, facts2, by = c("fips"))
1.5.3.1 Dimensions of merged data and original data
dim(new_pr_df) # Dimension of primary election data
## [1] 4217 12
dim(facts2) # Dimensions of the demographic data, which is organized on the county level.
## [1] 3143 55
dim(complete_data) # Dimensions of the two datasets merged by a left join, which keeps every row of new_pr_df and fills the demographic columns with NA where there is no match.
## [1] 4217 66
unique(complete_data$state_abbr) # States present in the merged data; a left join keeps all of them, whether or not their rows matched.
## [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "ID" "IL" "IN"
## [15] "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MS" "MO" "MT" "NE" "NV" "NH"
## [29] "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX"
## [43] "UT" "VT" "VA" "WA" "WV" "WI" "WY"
1.5.3.2 Where did the two data frames not match (i.e. the rows with NA values)?
anyNA(complete_data) #There are missing values
## [1] TRUE
We create a subset of the merged data that only contains those rows where there are NA values.
df_did_not_match <- subset(complete_data, is.na(complete_data$area_name))
dim(df_did_not_match) #There are 1419 rows where the primary election data did not match the demographic data. This is compared to the 1491 rows that did not match when using state and county names.
## [1] 1419 66
df_did_not_match$fips_length <- sapply(df_did_not_match$fips, nchar)
unique(df_did_not_match$fips_length) #In the df_did_not_match subset, there are only 8-digit codes and observations without a FIPS code.
## [1] 8 NA
count((subset(df_did_not_match, fips_length==4))) #Number of fips codes that are 4 digits long
## # A tibble: 1 x 1
## n
## <int>
## 1 0
count((subset(df_did_not_match, fips_length==5))) #Number of fips codes that are 5 digits long
## # A tibble: 1 x 1
## n
## <int>
## 1 0
count((subset(df_did_not_match, fips_length==8))) #Number of fips codes that are 8 digits long
## # A tibble: 1 x 1
## n
## <int>
## 1 1409
unique(df_did_not_match$state)
## [1] Alaska Connecticut Illinois Kansas Maine
## [6] Massachusetts New Hampshire North Dakota Rhode Island Vermont
## [11] Wyoming
## 49 Levels: Alabama Alaska Arizona Arkansas California ... Wyoming
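The same diagnosis can be run more compactly by tabulating the code lengths in a single call. The sketch below works on a made-up data frame named merged (its values are illustrative only). nchar() is already vectorised, so sapply() is not required, and it returns NA for a missing code.

```r
# Toy merged data; the 8-digit codes are invented for illustration.
merged <- data.frame(
  fips      = c(1001, 90900001, 90900002, NA),
  state     = c("Alabama", "Connecticut", "Connecticut", "Alaska"),
  area_name = c("Autauga County", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Rows whose demographic columns are missing did not match.
unmatched <- merged[is.na(merged$area_name), ]

# Length of each code; a missing code stays NA.
unmatched$fips_length <- nchar(as.character(unmatched$fips))

# Tabulate the lengths, counting missing codes as their own group.
table(unmatched$fips_length, useNA = "ifany")

# States that contribute unmatched rows.
unique(unmatched$state)
```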
1.6 Data limitations
The election data (new_pr_df) is organized by town and city for many states, while the demographic data we are using is organized solely by county. A FIPS code identifies a county, so for states that do not report by county, the value in the FIPS code column is not actually a FIPS code.
Take the case of Connecticut (CT), for instance. Connecticut has only 8 counties, yet the primary election data, new_pr_df, contains 169 observations for the state. The names listed in the county column for Connecticut are town and city names, not county names. Although each of these towns and cities lies in a county and could be assigned to one, neither the new_pr_df nor the facts2 data frame contains the information needed to do this.
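To make the missing piece concrete: assigning towns to counties would require a town-to-county lookup table. The sketch below is hypothetical. The crosswalk data frame is not part of the Kaggle files, and the vote counts are invented. It only illustrates what such a step could look like; this report does not perform it.

```r
# Town-level results, shaped like the Connecticut rows of new_pr_df.
# Vote counts are made up for illustration.
town_votes <- data.frame(
  town  = c("Hartford", "West Hartford", "New Haven"),
  votes = c(100, 50, 80),
  stringsAsFactors = FALSE
)

# Hypothetical crosswalk from town to county FIPS code (NOT in the data set).
crosswalk <- data.frame(
  town        = c("Hartford", "West Hartford", "New Haven"),
  county_fips = c(9003, 9003, 9009),
  stringsAsFactors = FALSE
)

# Attach a county FIPS code to each town, then sum the votes per county.
with_county  <- merge(town_votes, crosswalk, by = "town")
county_votes <- aggregate(votes ~ county_fips, data = with_county, FUN = sum)
county_votes
```

The aggregated county_votes table would then have one row per county and could be joined to county-level demographics on the FIPS code.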
For the analysis we would like to conduct, the rows from these 11 states that are not organized on a county basis can be ignored. Moving forward, we will use the data below.
1.7 Data analysis
For the analysis of this data, we use the following data frame, created by an inner join on FIPS codes, which keeps only the rows that match in both data frames.
complete_data <- inner_join(new_pr_df, facts2, by = c("fips"))
dim(complete_data) # 2798 counties
## [1] 2798 66
unique(complete_data$state) #40 states
## [1] Alabama Arizona Arkansas California
## [5] Colorado Delaware Florida Georgia
## [9] Hawaii Idaho Illinois Indiana
## [13] Iowa Kentucky Louisiana Maryland
## [17] Michigan Mississippi Missouri Montana
## [21] Nebraska Nevada New Jersey New Mexico
## [25] New York North Carolina Ohio Oklahoma
## [29] Oregon Pennsylvania South Carolina South Dakota
## [33] Tennessee Texas Utah Virginia
## [37] Washington West Virginia Wisconsin Wyoming
## 49 Levels: Alabama Alaska Arizona Arkansas California ... Wyoming
write.csv(complete_data, "complete_data.csv")
trump_data <- complete_data %>% filter(`trump votes`> 0)
write.csv(trump_data, "trump_data.csv")
1.7.1 Analysis on votes gained by Trump and Clinton
We use backward elimination to examine how the demographic metrics relate to the votes gained by Trump and Clinton. Backward elimination starts from a model that contains every candidate metric and removes, one step at a time, the metric that contributes least to explaining the target metric. R's step() function, used below, drops the term whose removal lowers the AIC the most and stops when no removal lowers it further; the F test reported at each step gives a p-value for every remaining metric. A small p-value indicates a statistically significant association with the target metric. Once elimination is done, we look at the metrics with small p-values (high significance).
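A minimal sketch of backward elimination with step(), run on R's built-in mtcars data rather than the election data so that it is self-contained. The names full and reduced are illustrative. Note that step() selects terms by AIC; passing test = "F" only adds F statistics and p-values to the printed log.

```r
# Built-in mtcars data stands in for the election data.
full <- lm(mpg ~ ., data = mtcars)

# trace = 0 suppresses the step-by-step log.
reduced <- step(full, direction = "backward", trace = 0)

formula(reduced)        # predictors that survived elimination
extractAIC(full)[2]     # AIC of the full model
extractAIC(reduced)[2]  # AIC after elimination (never higher than the full model)
```

Comparing the two AIC values shows how much the criterion improved; summary(reduced) gives the p-values of the metrics that remain.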
First, we will do it for Trump.
Target metric: Votes gained by Trump
Metrics used: The data set contains many demographic metrics, so we use the following subset, chosen by judgment. All of them except persons under 5 years enter the model formula below.
- Persons under 5 years, percent, 2014
- Persons under 18 years, percent, 2014
- Persons 65 years and over, percent, 2014
- Female persons, percent, 2014
- White alone, percent, 2014
- Black or African American alone, percent, 2014
- American Indian and Alaska Native alone, percent, 2014
- Asian alone, percent, 2014
- Native Hawaiian and Other Pacific Islander alone, percent, 2014
- Two or More Races, percent, 2014
- Hispanic or Latino, percent, 2014
- White alone, not Hispanic or Latino, percent, 2014
- Households, 2009-2013
- Per capita money income in past 12 months (2013 dollars), 2009-2013
- Retail sales per capita, 2007
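Because these column names contain spaces and commas, each one has to be wrapped in backticks inside a model formula. When the list of metrics is long, the formula can be assembled from a character vector instead of typed by hand. The sketch below assumes a made-up data frame toy and a made-up response column `candidate votes`; it is one possible approach, not the one used in this report.

```r
# Toy data with non-syntactic column names; all values are made up.
toy <- data.frame(
  `candidate votes`               = c(10, 14, 21, 25, 33),
  `Households, 2009-2013`         = c(1, 2, 3, 4, 5),
  `Female persons, percent, 2014` = c(2.0, 1.5, 3.1, 2.7, 4.0),
  check.names = FALSE  # keep the names exactly as written
)

predictors <- c("Households, 2009-2013", "Female persons, percent, 2014")

# Wrap every name in backticks so R parses it as a single variable.
rhs <- paste0("`", predictors, "`", collapse = " + ")
model_formula <- as.formula(paste0("`candidate votes` ~ ", rhs))

fit <- lm(model_formula, data = toy)
names(coef(fit))  # the intercept plus one coefficient per predictor
```

Keeping the metric names in one vector also makes it easy to reuse the same list for several candidates.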
1.7.1.1 Trump analysis
trump_data <- complete_data %>% filter(`trump votes`> 0 ) %>% select("sanders fraction votes","sanders votes","clinton fraction votes","clinton votes","trump fraction votes","trump votes","cruz fraction votes","cruz votes", "Persons under 5 years, percent, 2014","Persons under 18 years, percent, 2014","Persons 65 years and over, percent, 2014","Female persons, percent, 2014","White alone, percent, 2014","Black or African American alone, percent, 2014","American Indian and Alaska Native alone, percent, 2014","Asian alone, percent, 2014","Native Hawaiian and Other Pacific Islander alone, percent, 2014","Two or More Races, percent, 2014","Hispanic or Latino, percent, 2014","White alone, not Hispanic or Latino, percent, 2014","Households, 2009-2013" , "Per capita money income in past 12 months (2013 dollars), 2009-2013" , "Retail sales per capita, 2007")
full_model <- lm(`trump votes` ~ `Persons under 18 years, percent, 2014`+`Persons 65 years and over, percent, 2014`+`Female persons, percent, 2014`+`White alone, percent, 2014`+`Black or African American alone, percent, 2014`+`American Indian and Alaska Native alone, percent, 2014`+`Asian alone, percent, 2014`+`Native Hawaiian and Other Pacific Islander alone, percent, 2014`+`Two or More Races, percent, 2014`+ `Hispanic or Latino, percent, 2014` + `White alone, not Hispanic or Latino, percent, 2014` + `Households, 2009-2013` + `Per capita money income in past 12 months (2013 dollars), 2009-2013` + `Retail sales per capita, 2007` , data= trump_data)
summary(full_model)
##
## Call:
## lm(formula = `trump votes` ~ `Persons under 18 years, percent, 2014` +
## `Persons 65 years and over, percent, 2014` + `Female persons, percent, 2014` +
## `White alone, percent, 2014` + `Black or African American alone, percent, 2014` +
## `American Indian and Alaska Native alone, percent, 2014` +
## `Asian alone, percent, 2014` + `Native Hawaiian and Other Pacific Islander alone, percent, 2014` +
## `Two or More Races, percent, 2014` + `Hispanic or Latino, percent, 2014` +
## `White alone, not Hispanic or Latino, percent, 2014` + `Households, 2009-2013` +
## `Per capita money income in past 12 months (2013 dollars), 2009-2013` +
## `Retail sales per capita, 2007`, data = trump_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -65821 -1580 -446 637 70362
##
## Coefficients:
##                                                                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                                           -5.682e+04  1.353e+05  -0.420 0.674625
## `Persons under 18 years, percent, 2014`                               -9.272e+00  4.537e+01  -0.204 0.838097
## `Persons 65 years and over, percent, 2014`                            -3.741e+01  3.358e+01  -1.114 0.265377
## `Female persons, percent, 2014`                                        6.134e+01  5.133e+01   1.195 0.232147
## `White alone, percent, 2014`                                           1.071e+03  1.359e+03   0.788 0.430603
## `Black or African American alone, percent, 2014`                       5.048e+02  1.353e+03   0.373 0.709101
## `American Indian and Alaska Native alone, percent, 2014`               4.964e+02  1.353e+03   0.367 0.713708
## `Asian alone, percent, 2014`                                           3.362e+02  1.354e+03   0.248 0.803979
## `Native Hawaiian and Other Pacific Islander alone, percent, 2014`     -1.291e+02  1.395e+03  -0.093 0.926265
## `Two or More Races, percent, 2014`                                     8.235e+02  1.355e+03   0.608 0.543274
## `Hispanic or Latino, percent, 2014`                                   -5.522e+02  1.463e+02  -3.774 0.000164 ***
## `White alone, not Hispanic or Latino, percent, 2014`                  -5.584e+02  1.536e+02  -3.636 0.000282 ***
## `Households, 2009-2013`                                                7.594e-02  1.050e-03  72.315  < 2e-16 ***
## `Per capita money income in past 12 months (2013 dollars), 2009-2013`  1.923e-01  2.324e-02   8.274  < 2e-16 ***
## `Retail sales per capita, 2007`                                        1.044e-01  1.959e-02   5.329 1.07e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4992 on 2694 degrees of freedom
## Multiple R-squared: 0.7506, Adjusted R-squared: 0.7493
## F-statistic: 579.2 on 14 and 2694 DF, p-value: < 2.2e-16
step(full_model, data = trump_data, direction = "backward", test = "F")
## Start: AIC=46152.09
## `trump votes` ~ `Persons under 18 years, percent, 2014` + `Persons 65 years and over, percent, 2014` +
## `Female persons, percent, 2014` + `White alone, percent, 2014` +
## `Black or African American alone, percent, 2014` + `American Indian and Alaska Native alone, percent, 2014` +
## `Asian alone, percent, 2014` + `Native Hawaiian and Other Pacific Islander alone, percent, 2014` +
## `Two or More Races, percent, 2014` + `Hispanic or Latino, percent, 2014` +
## `White alone, not Hispanic or Latino, percent, 2014` + `Households, 2009-2013` +
## `Per capita money income in past 12 months (2013 dollars), 2009-2013` +
## `Retail sales per capita, 2007`
##
##                                                                         Df  Sum of Sq        RSS   AIC   F value    Pr(>F)
## - `Native Hawaiian and Other Pacific Islander alone, percent, 2014`      1 2.1344e+05 6.7126e+10 46150    0.0086 0.9262653
## - `Persons under 18 years, percent, 2014`                                1 1.0405e+06 6.7127e+10 46150    0.0418 0.8380972
## - `Asian alone, percent, 2014`                                           1 1.5353e+06 6.7128e+10 46150    0.0616 0.8039785
## - `American Indian and Alaska Native alone, percent, 2014`               1 3.3546e+06 6.7129e+10 46150    0.1346 0.7137075
## - `Black or African American alone, percent, 2014`                       1 3.4686e+06 6.7129e+10 46150    0.1392 0.7091012
## - `Two or More Races, percent, 2014`                                     1 9.2092e+06 6.7135e+10 46150    0.3696 0.5432743
## - `White alone, percent, 2014`                                           1 1.5483e+07 6.7142e+10 46151    0.6214 0.4306025
## - `Persons 65 years and over, percent, 2014`                             1 3.0922e+07 6.7157e+10 46151    1.2410 0.2653769
## - `Female persons, percent, 2014`                                        1 3.5589e+07 6.7162e+10 46152    1.4283 0.2321470
## <none>                                                                                6.7126e+10 46152
## - `White alone, not Hispanic or Latino, percent, 2014`                   1 3.2942e+08 6.7455e+10 46163   13.2207 0.0002821 ***
## - `Hispanic or Latino, percent, 2014`                                    1 3.5496e+08 6.7481e+10 46164   14.2459 0.0001639 ***
## - `Retail sales per capita, 2007`                                        1 7.0755e+08 6.7834e+10 46178   28.3966  1.07e-07 ***
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013`  1 1.7057e+09 6.8832e+10 46218   68.4561 < 2.2e-16 ***
## - `Households, 2009-2013`                                                1 1.3030e+11 1.9743e+11 49073 5229.4221 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=46150.1
## `trump votes` ~ `Persons under 18 years, percent, 2014` + `Persons 65 years and over, percent, 2014` +
## `Female persons, percent, 2014` + `White alone, percent, 2014` +
## `Black or African American alone, percent, 2014` + `American Indian and Alaska Native alone, percent, 2014` +
## `Asian alone, percent, 2014` + `Two or More Races, percent, 2014` +
## `Hispanic or Latino, percent, 2014` + `White alone, not Hispanic or Latino, percent, 2014` +
## `Households, 2009-2013` + `Per capita money income in past 12 months (2013 dollars), 2009-2013` +
## `Retail sales per capita, 2007`
##
##                                                                         Df  Sum of Sq        RSS   AIC   F value    Pr(>F)
## - `Persons under 18 years, percent, 2014`                                1 1.0754e+06 6.7127e+10 46148    0.0432 0.8354108
## - `Persons 65 years and over, percent, 2014`                             1 3.0980e+07 6.7157e+10 46149    1.2438 0.2648416
## - `Female persons, percent, 2014`                                        1 3.5718e+07 6.7162e+10 46150    1.4340 0.2312138
## - `Asian alone, percent, 2014`                                           1 4.1625e+07 6.7168e+10 46150    1.6712 0.1962120
## <none>                                                                                6.7126e+10 46150
## - `Black or African American alone, percent, 2014`                       1 8.9653e+07 6.7216e+10 46152    3.5994 0.0579071 .
## - `American Indian and Alaska Native alone, percent, 2014`               1 9.0093e+07 6.7216e+10 46152    3.6171 0.0572954 .
## - `Two or More Races, percent, 2014`                                     1 1.3861e+08 6.7265e+10 46154    5.5648 0.0183956 *
## - `White alone, percent, 2014`                                           1 2.8500e+08 6.7411e+10 46160   11.4424 0.0007281 ***
## - `White alone, not Hispanic or Latino, percent, 2014`                   1 3.2944e+08 6.7456e+10 46161   13.2265 0.0002812 ***
## - `Hispanic or Latino, percent, 2014`                                    1 3.5502e+08 6.7481e+10 46162   14.2535 0.0001632 ***
## - `Retail sales per capita, 2007`                                        1 7.0763e+08 6.7834e+10 46177   28.4102 1.063e-07 ***
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013`  1 1.7100e+09 6.8836e+10 46216   68.6529 < 2.2e-16 ***
## - `Households, 2009-2013`                                                1 1.3040e+11 1.9753e+11 49072 5235.3434 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=46148.14
## `trump votes` ~ `Persons 65 years and over, percent, 2014` +
## `Female persons, percent, 2014` + `White alone, percent, 2014` +
## `Black or African American alone, percent, 2014` + `American Indian and Alaska Native alone, percent, 2014` +
## `Asian alone, percent, 2014` + `Two or More Races, percent, 2014` +
## `Hispanic or Latino, percent, 2014` + `White alone, not Hispanic or Latino, percent, 2014` +
## `Households, 2009-2013` + `Per capita money income in past 12 months (2013 dollars), 2009-2013` +
## `Retail sales per capita, 2007`
##
##                                                                         Df  Sum of Sq        RSS   AIC   F value    Pr(>F)
## - `Female persons, percent, 2014`                                        1 3.7903e+07 6.7165e+10 46148    1.5223 0.2173808
## - `Persons 65 years and over, percent, 2014`                             1 4.2856e+07 6.7170e+10 46148    1.7212 0.1896517
## - `Asian alone, percent, 2014`                                           1 4.5993e+07 6.7173e+10 46148    1.8472 0.1742260
## <none>                                                                                6.7127e+10 46148
## - `American Indian and Alaska Native alone, percent, 2014`               1 9.5863e+07 6.7223e+10 46150    3.8501 0.0498464 *
## - `Black or African American alone, percent, 2014`                       1 9.6385e+07 6.7224e+10 46150    3.8711 0.0492279 *
## - `Two or More Races, percent, 2014`                                     1 1.4802e+08 6.7275e+10 46152    5.9450 0.0148232 *
## - `White alone, percent, 2014`                                           1 2.9863e+08 6.7426e+10 46158   11.9939 0.0005421 ***
## - `White alone, not Hispanic or Latino, percent, 2014`                   1 3.2921e+08 6.7457e+10 46159   13.2218 0.0002819 ***
## - `Hispanic or Latino, percent, 2014`                                    1 3.5556e+08 6.7483e+10 46160   14.2801 0.0001609 ***
## - `Retail sales per capita, 2007`                                        1 7.2012e+08 6.7847e+10 46175   28.9219 8.182e-08 ***
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013`  1 1.7118e+09 6.8839e+10 46214   68.7516 < 2.2e-16 ***
## - `Households, 2009-2013`                                                1 1.3107e+11 1.9819e+11 49079 5263.8982 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=46147.67
## `trump votes` ~ `Persons 65 years and over, percent, 2014` +
## `White alone, percent, 2014` + `Black or African American alone, percent, 2014` +
## `American Indian and Alaska Native alone, percent, 2014` +
## `Asian alone, percent, 2014` + `Two or More Races, percent, 2014` +
## `Hispanic or Latino, percent, 2014` + `White alone, not Hispanic or Latino, percent, 2014` +
## `Households, 2009-2013` + `Per capita money income in past 12 months (2013 dollars), 2009-2013` +
## `Retail sales per capita, 2007`
##
##                                                                         Df  Sum of Sq        RSS   AIC   F value    Pr(>F)
## - `Persons 65 years and over, percent, 2014`                             1 3.4398e+07 6.7200e+10 46147    1.3812 0.2399922
## - `Asian alone, percent, 2014`                                           1 4.5628e+07 6.7211e+10 46148    1.8322 0.1759858
## <none>                                                                                6.7165e+10 46148
## - `American Indian and Alaska Native alone, percent, 2014`               1 9.6192e+07 6.7261e+10 46150    3.8626 0.0494774 *
## - `Black or African American alone, percent, 2014`                       1 9.6775e+07 6.7262e+10 46150    3.8860 0.0487928 *
## - `Two or More Races, percent, 2014`                                     1 1.4852e+08 6.7314e+10 46152    5.9636 0.0146680 *
## - `White alone, percent, 2014`                                           1 3.0215e+08 6.7467e+10 46158   12.1329 0.0005033 ***
## - `White alone, not Hispanic or Latino, percent, 2014`                   1 3.3814e+08 6.7503e+10 46159   13.5781 0.0002333 ***
## - `Hispanic or Latino, percent, 2014`                                    1 3.6656e+08 6.7532e+10 46160   14.7192 0.0001276 ***
## - `Retail sales per capita, 2007`                                        1 8.1231e+08 6.7978e+10 46178   32.6180 1.244e-08 ***
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013`  1 1.8054e+09 6.8971e+10 46218   72.4963 < 2.2e-16 ***
## - `Households, 2009-2013`                                                1 1.3268e+11 1.9984e+11 49100 5327.7099 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=46147.06
## `trump votes` ~ `White alone, percent, 2014` + `Black or African American alone, percent, 2014` +
## `American Indian and Alaska Native alone, percent, 2014` +
## `Asian alone, percent, 2014` + `Two or More Races, percent, 2014` +
## `Hispanic or Latino, percent, 2014` + `White alone, not Hispanic or Latino, percent, 2014` +
## `Households, 2009-2013` + `Per capita money income in past 12 months (2013 dollars), 2009-2013` +
## `Retail sales per capita, 2007`
##
## Df
## <none>
## - `Asian alone, percent, 2014` 1
## - `American Indian and Alaska Native alone, percent, 2014` 1
## - `Black or African American alone, percent, 2014` 1
## - `Two or More Races, percent, 2014` 1
## - `White alone, percent, 2014` 1
## - `White alone, not Hispanic or Latino, percent, 2014` 1
## - `Hispanic or Latino, percent, 2014` 1
## - `Retail sales per capita, 2007` 1
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 1
## - `Households, 2009-2013` 1
## Sum of Sq
## <none>
## - `Asian alone, percent, 2014` 5.1833e+07
## - `American Indian and Alaska Native alone, percent, 2014` 1.0263e+08
## - `Black or African American alone, percent, 2014` 1.0295e+08
## - `Two or More Races, percent, 2014` 1.5391e+08
## - `White alone, percent, 2014` 3.1164e+08
## - `White alone, not Hispanic or Latino, percent, 2014` 3.3874e+08
## - `Hispanic or Latino, percent, 2014` 3.6493e+08
## - `Retail sales per capita, 2007` 8.6681e+08
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 1.7865e+09
## - `Households, 2009-2013` 1.3303e+11
## RSS
## <none> 6.7200e+10
## - `Asian alone, percent, 2014` 6.7251e+10
## - `American Indian and Alaska Native alone, percent, 2014` 6.7302e+10
## - `Black or African American alone, percent, 2014` 6.7303e+10
## - `Two or More Races, percent, 2014` 6.7354e+10
## - `White alone, percent, 2014` 6.7511e+10
## - `White alone, not Hispanic or Latino, percent, 2014` 6.7538e+10
## - `Hispanic or Latino, percent, 2014` 6.7565e+10
## - `Retail sales per capita, 2007` 6.8066e+10
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 6.8986e+10
## - `Households, 2009-2013` 2.0023e+11
## AIC
## <none> 46147
## - `Asian alone, percent, 2014` 46147
## - `American Indian and Alaska Native alone, percent, 2014` 46149
## - `Black or African American alone, percent, 2014` 46149
## - `Two or More Races, percent, 2014` 46151
## - `White alone, percent, 2014` 46158
## - `White alone, not Hispanic or Latino, percent, 2014` 46159
## - `Hispanic or Latino, percent, 2014` 46160
## - `Retail sales per capita, 2007` 46180
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 46216
## - `Households, 2009-2013` 49103
## F value
## <none>
## - `Asian alone, percent, 2014` 2.0810
## - `American Indian and Alaska Native alone, percent, 2014` 4.1203
## - `Black or African American alone, percent, 2014` 4.1332
## - `Two or More Races, percent, 2014` 6.1792
## - `White alone, percent, 2014` 12.5119
## - `White alone, not Hispanic or Latino, percent, 2014` 13.6002
## - `Hispanic or Latino, percent, 2014` 14.6517
## - `Retail sales per capita, 2007` 34.8016
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 71.7245
## - `Households, 2009-2013` 5340.9454
## Pr(>F)
## <none>
## - `Asian alone, percent, 2014` 0.1492547
## - `American Indian and Alaska Native alone, percent, 2014` 0.0424690
## - `Black or African American alone, percent, 2014` 0.0421483
## - `Two or More Races, percent, 2014` 0.0129865
## - `White alone, percent, 2014` 0.0004112
## - `White alone, not Hispanic or Latino, percent, 2014` 0.0002306
## - `Hispanic or Latino, percent, 2014` 0.0001323
## - `Retail sales per capita, 2007` 4.106e-09
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` < 2.2e-16
## - `Households, 2009-2013` < 2.2e-16
##
## <none>
## - `Asian alone, percent, 2014`
## - `American Indian and Alaska Native alone, percent, 2014` *
## - `Black or African American alone, percent, 2014` *
## - `Two or More Races, percent, 2014` *
## - `White alone, percent, 2014` ***
## - `White alone, not Hispanic or Latino, percent, 2014` ***
## - `Hispanic or Latino, percent, 2014` ***
## - `Retail sales per capita, 2007` ***
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` ***
## - `Households, 2009-2013` ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call:
## lm(formula = `trump votes` ~ `White alone, percent, 2014` + `Black or African American alone, percent, 2014` +
## `American Indian and Alaska Native alone, percent, 2014` +
## `Asian alone, percent, 2014` + `Two or More Races, percent, 2014` +
## `Hispanic or Latino, percent, 2014` + `White alone, not Hispanic or Latino, percent, 2014` +
## `Households, 2009-2013` + `Per capita money income in past 12 months (2013 dollars), 2009-2013` +
## `Retail sales per capita, 2007`, data = trump_data)
##
## Coefficients:
## (Intercept)
## -6.988e+04
## `White alone, percent, 2014`
## 1.229e+03
## `Black or African American alone, percent, 2014`
## 6.592e+02
## `American Indian and Alaska Native alone, percent, 2014`
## 6.498e+02
## `Asian alone, percent, 2014`
## 4.991e+02
## `Two or More Races, percent, 2014`
## 9.766e+02
## `Hispanic or Latino, percent, 2014`
## -5.593e+02
## `White alone, not Hispanic or Latino, percent, 2014`
## -5.658e+02
## `Households, 2009-2013`
## 7.613e-02
## `Per capita money income in past 12 months (2013 dollars), 2009-2013`
## 1.948e-01
## `Retail sales per capita, 2007`
## 1.121e-01
Inference from backward elimination process [Trump]
The final step of the backward elimination shows that the votes gained by Trump are most strongly associated with the metrics below, all of which are significant at the 0.001 level.
Metric                                                                  F value      Pr(>F)       Signif.
White alone, percent, 2014                                              12.5119      0.0004112    ***
White alone, not Hispanic or Latino, percent, 2014                      13.6002      0.0002306    ***
Hispanic or Latino, percent, 2014                                       14.6517      0.0001323    ***
Retail sales per capita, 2007                                           34.8016      4.106e-09    ***
Per capita money income in past 12 months (2013 dollars), 2009-2013     71.7245      < 2.2e-16    ***
Households, 2009-2013                                                   5340.9454    < 2.2e-16    ***
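The F values and p-values listed here come from the last step of the `step()` printout. As a rough sketch, the same columns can be pulled out programmatically with `drop1()` from base R. The example uses the built-in `mtcars` data as a stand-in (an assumption, because the project's `trump_data` is not reproduced here), and the helper name `significant_terms` is hypothetical.

```r
# Stand-in data: mtcars ships with R; swap in the project's own data frame.
full_model <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step printout.
final_model <- step(full_model, direction = "backward", trace = 0)

# Hypothetical helper: F test for dropping each remaining term,
# sorted by F value and filtered by a significance threshold.
significant_terms <- function(model, alpha = 0.001) {
  tab <- as.data.frame(drop1(model, test = "F"))
  # Drop the "<none>" reference row and keep only the test columns.
  tab <- tab[rownames(tab) != "<none>", c("F value", "Pr(>F)")]
  tab <- tab[order(tab[["F value"]]), ]
  tab[tab[["Pr(>F)"]] < alpha, , drop = FALSE]
}

print(significant_terms(final_model))
```

Calling `significant_terms(final_model, alpha = 0.05)` would also include the terms marked with one or two stars.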
1.7.1.2 Clinton analysis
# Keep counties where Clinton received votes, then select the vote and demographic columns
clinton_data <- complete_data %>%
  filter(`clinton votes` > 0) %>%
  select(
    "sanders fraction votes", "sanders votes",
    "clinton fraction votes", "clinton votes",
    "trump fraction votes", "trump votes",
    "cruz fraction votes", "cruz votes",
    "Persons under 5 years, percent, 2014",
    "Persons under 18 years, percent, 2014",
    "Persons 65 years and over, percent, 2014",
    "Female persons, percent, 2014",
    "White alone, percent, 2014",
    "Black or African American alone, percent, 2014",
    "American Indian and Alaska Native alone, percent, 2014",
    "Asian alone, percent, 2014",
    "Native Hawaiian and Other Pacific Islander alone, percent, 2014",
    "Two or More Races, percent, 2014",
    "Hispanic or Latino, percent, 2014",
    "White alone, not Hispanic or Latino, percent, 2014",
    "Households, 2009-2013",
    "Per capita money income in past 12 months (2013 dollars), 2009-2013",
    "Retail sales per capita, 2007"
  )
# Note: the model is fit on trump_data (see the Call in the output below);
# pass data = clinton_data to fit on the Clinton subset instead.
full_model <- lm(
  `clinton votes` ~ `Persons under 18 years, percent, 2014` +
    `Persons 65 years and over, percent, 2014` +
    `Female persons, percent, 2014` +
    `White alone, percent, 2014` +
    `Black or African American alone, percent, 2014` +
    `American Indian and Alaska Native alone, percent, 2014` +
    `Asian alone, percent, 2014` +
    `Native Hawaiian and Other Pacific Islander alone, percent, 2014` +
    `Two or More Races, percent, 2014` +
    `Hispanic or Latino, percent, 2014` +
    `White alone, not Hispanic or Latino, percent, 2014` +
    `Households, 2009-2013` +
    `Per capita money income in past 12 months (2013 dollars), 2009-2013` +
    `Retail sales per capita, 2007`,
  data = trump_data
)
summary(full_model)
##
## Call:
## lm(formula = `clinton votes` ~ `Persons under 18 years, percent, 2014` +
## `Persons 65 years and over, percent, 2014` + `Female persons, percent, 2014` +
## `White alone, percent, 2014` + `Black or African American alone, percent, 2014` +
## `American Indian and Alaska Native alone, percent, 2014` +
## `Asian alone, percent, 2014` + `Native Hawaiian and Other Pacific Islander alone, percent, 2014` +
## `Two or More Races, percent, 2014` + `Hispanic or Latino, percent, 2014` +
## `White alone, not Hispanic or Latino, percent, 2014` + `Households, 2009-2013` +
## `Per capita money income in past 12 months (2013 dollars), 2009-2013` +
## `Retail sales per capita, 2007`, data = trump_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -122141 -1131 226 1223 116968
##
## Coefficients:
## Estimate
## (Intercept) -1.835e+05
## `Persons under 18 years, percent, 2014` -1.965e+02
## `Persons 65 years and over, percent, 2014` 2.608e+01
## `Female persons, percent, 2014` 1.574e+02
## `White alone, percent, 2014` 9.920e+02
## `Black or African American alone, percent, 2014` 1.846e+03
## `American Indian and Alaska Native alone, percent, 2014` 1.818e+03
## `Asian alone, percent, 2014` 1.910e+03
## `Native Hawaiian and Other Pacific Islander alone, percent, 2014` -2.834e+02
## `Two or More Races, percent, 2014` 1.371e+03
## `Hispanic or Latino, percent, 2014` 7.517e+02
## `White alone, not Hispanic or Latino, percent, 2014` 7.685e+02
## `Households, 2009-2013` 1.555e-01
## `Per capita money income in past 12 months (2013 dollars), 2009-2013` 1.281e-01
## `Retail sales per capita, 2007` -8.968e-02
## Std. Error
## (Intercept) 1.828e+05
## `Persons under 18 years, percent, 2014` 6.130e+01
## `Persons 65 years and over, percent, 2014` 4.537e+01
## `Female persons, percent, 2014` 6.935e+01
## `White alone, percent, 2014` 1.836e+03
## `Black or African American alone, percent, 2014` 1.828e+03
## `American Indian and Alaska Native alone, percent, 2014` 1.828e+03
## `Asian alone, percent, 2014` 1.830e+03
## `Native Hawaiian and Other Pacific Islander alone, percent, 2014` 1.885e+03
## `Two or More Races, percent, 2014` 1.830e+03
## `Hispanic or Latino, percent, 2014` 1.977e+02
## `White alone, not Hispanic or Latino, percent, 2014` 2.075e+02
## `Households, 2009-2013` 1.419e-03
## `Per capita money income in past 12 months (2013 dollars), 2009-2013` 3.140e-02
## `Retail sales per capita, 2007` 2.646e-02
## t value
## (Intercept) -1.004
## `Persons under 18 years, percent, 2014` -3.206
## `Persons 65 years and over, percent, 2014` 0.575
## `Female persons, percent, 2014` 2.270
## `White alone, percent, 2014` 0.540
## `Black or African American alone, percent, 2014` 1.010
## `American Indian and Alaska Native alone, percent, 2014` 0.995
## `Asian alone, percent, 2014` 1.044
## `Native Hawaiian and Other Pacific Islander alone, percent, 2014` -0.150
## `Two or More Races, percent, 2014` 0.749
## `Hispanic or Latino, percent, 2014` 3.803
## `White alone, not Hispanic or Latino, percent, 2014` 3.704
## `Households, 2009-2013` 109.572
## `Per capita money income in past 12 months (2013 dollars), 2009-2013` 4.079
## `Retail sales per capita, 2007` -3.389
## Pr(>|t|)
## (Intercept) 0.315623
## `Persons under 18 years, percent, 2014` 0.001363
## `Persons 65 years and over, percent, 2014` 0.565423
## `Female persons, percent, 2014` 0.023271
## `White alone, percent, 2014` 0.589038
## `Black or African American alone, percent, 2014` 0.312537
## `American Indian and Alaska Native alone, percent, 2014` 0.319869
## `Asian alone, percent, 2014` 0.296606
## `Native Hawaiian and Other Pacific Islander alone, percent, 2014` 0.880499
## `Two or More Races, percent, 2014` 0.453887
## `Hispanic or Latino, percent, 2014` 0.000146
## `White alone, not Hispanic or Latino, percent, 2014` 0.000217
## `Households, 2009-2013` < 2e-16
## `Per capita money income in past 12 months (2013 dollars), 2009-2013` 4.65e-05
## `Retail sales per capita, 2007` 0.000712
##
## (Intercept)
## `Persons under 18 years, percent, 2014` **
## `Persons 65 years and over, percent, 2014`
## `Female persons, percent, 2014` *
## `White alone, percent, 2014`
## `Black or African American alone, percent, 2014`
## `American Indian and Alaska Native alone, percent, 2014`
## `Asian alone, percent, 2014`
## `Native Hawaiian and Other Pacific Islander alone, percent, 2014`
## `Two or More Races, percent, 2014`
## `Hispanic or Latino, percent, 2014` ***
## `White alone, not Hispanic or Latino, percent, 2014` ***
## `Households, 2009-2013` ***
## `Per capita money income in past 12 months (2013 dollars), 2009-2013` ***
## `Retail sales per capita, 2007` ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6744 on 2694 degrees of freedom
## Multiple R-squared: 0.8751, Adjusted R-squared: 0.8745
## F-statistic: 1348 on 14 and 2694 DF, p-value: < 2.2e-16
step(full_model ,data= clinton_data , direction = "backward" ,test = "F")
## Start: AIC=47782.23
## `clinton votes` ~ `Persons under 18 years, percent, 2014` + `Persons 65 years and over, percent, 2014` +
## `Female persons, percent, 2014` + `White alone, percent, 2014` +
## `Black or African American alone, percent, 2014` + `American Indian and Alaska Native alone, percent, 2014` +
## `Asian alone, percent, 2014` + `Native Hawaiian and Other Pacific Islander alone, percent, 2014` +
## `Two or More Races, percent, 2014` + `Hispanic or Latino, percent, 2014` +
## `White alone, not Hispanic or Latino, percent, 2014` + `Households, 2009-2013` +
## `Per capita money income in past 12 months (2013 dollars), 2009-2013` +
## `Retail sales per capita, 2007`
##
## Df
## - `Native Hawaiian and Other Pacific Islander alone, percent, 2014` 1
## - `White alone, percent, 2014` 1
## - `Persons 65 years and over, percent, 2014` 1
## - `Two or More Races, percent, 2014` 1
## - `American Indian and Alaska Native alone, percent, 2014` 1
## - `Black or African American alone, percent, 2014` 1
## - `Asian alone, percent, 2014` 1
## <none>
## - `Female persons, percent, 2014` 1
## - `Persons under 18 years, percent, 2014` 1
## - `Retail sales per capita, 2007` 1
## - `White alone, not Hispanic or Latino, percent, 2014` 1
## - `Hispanic or Latino, percent, 2014` 1
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 1
## - `Households, 2009-2013` 1
## Sum of Sq
## - `Native Hawaiian and Other Pacific Islander alone, percent, 2014` 1.0281e+06
## - `White alone, percent, 2014` 1.3277e+07
## - `Persons 65 years and over, percent, 2014` 1.5031e+07
## - `Two or More Races, percent, 2014` 2.5519e+07
## - `American Indian and Alaska Native alone, percent, 2014` 4.5019e+07
## - `Black or African American alone, percent, 2014` 4.6404e+07
## - `Asian alone, percent, 2014` 4.9566e+07
## <none>
## - `Female persons, percent, 2014` 2.3441e+08
## - `Persons under 18 years, percent, 2014` 4.6742e+08
## - `Retail sales per capita, 2007` 5.2236e+08
## - `White alone, not Hispanic or Latino, percent, 2014` 6.2388e+08
## - `Hispanic or Latino, percent, 2014` 6.5773e+08
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 7.5673e+08
## - `Households, 2009-2013` 5.4605e+11
## RSS
## - `Native Hawaiian and Other Pacific Islander alone, percent, 2014` 1.2253e+11
## - `White alone, percent, 2014` 1.2254e+11
## - `Persons 65 years and over, percent, 2014` 1.2254e+11
## - `Two or More Races, percent, 2014` 1.2255e+11
## - `American Indian and Alaska Native alone, percent, 2014` 1.2257e+11
## - `Black or African American alone, percent, 2014` 1.2257e+11
## - `Asian alone, percent, 2014` 1.2258e+11
## <none> 1.2253e+11
## - `Female persons, percent, 2014` 1.2276e+11
## - `Persons under 18 years, percent, 2014` 1.2299e+11
## - `Retail sales per capita, 2007` 1.2305e+11
## - `White alone, not Hispanic or Latino, percent, 2014` 1.2315e+11
## - `Hispanic or Latino, percent, 2014` 1.2318e+11
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 1.2328e+11
## - `Households, 2009-2013` 6.6858e+11
## AIC
## - `Native Hawaiian and Other Pacific Islander alone, percent, 2014` 47780
## - `White alone, percent, 2014` 47781
## - `Persons 65 years and over, percent, 2014` 47781
## - `Two or More Races, percent, 2014` 47781
## - `American Indian and Alaska Native alone, percent, 2014` 47781
## - `Black or African American alone, percent, 2014` 47781
## - `Asian alone, percent, 2014` 47781
## <none> 47782
## - `Female persons, percent, 2014` 47785
## - `Persons under 18 years, percent, 2014` 47791
## - `Retail sales per capita, 2007` 47792
## - `White alone, not Hispanic or Latino, percent, 2014` 47794
## - `Hispanic or Latino, percent, 2014` 47795
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 47797
## - `Households, 2009-2013` 52377
## F value
## - `Native Hawaiian and Other Pacific Islander alone, percent, 2014` 0.0226
## - `White alone, percent, 2014` 0.2919
## - `Persons 65 years and over, percent, 2014` 0.3305
## - `Two or More Races, percent, 2014` 0.5611
## - `American Indian and Alaska Native alone, percent, 2014` 0.9898
## - `Black or African American alone, percent, 2014` 1.0203
## - `Asian alone, percent, 2014` 1.0898
## <none>
## - `Female persons, percent, 2014` 5.1540
## - `Persons under 18 years, percent, 2014` 10.2773
## - `Retail sales per capita, 2007` 11.4853
## - `White alone, not Hispanic or Latino, percent, 2014` 13.7174
## - `Hispanic or Latino, percent, 2014` 14.4617
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 16.6384
## - `Households, 2009-2013` 12006.1001
## Pr(>F)
## - `Native Hawaiian and Other Pacific Islander alone, percent, 2014` 0.8804987
## - `White alone, percent, 2014` 0.5890381
## - `Persons 65 years and over, percent, 2014` 0.5654232
## - `Two or More Races, percent, 2014` 0.4538869
## - `American Indian and Alaska Native alone, percent, 2014` 0.3198689
## - `Black or African American alone, percent, 2014` 0.3125374
## - `Asian alone, percent, 2014` 0.2966056
## <none>
## - `Female persons, percent, 2014` 0.0232711
## - `Persons under 18 years, percent, 2014` 0.0013625
## - `Retail sales per capita, 2007` 0.0007116
## - `White alone, not Hispanic or Latino, percent, 2014` 0.0002167
## - `Hispanic or Latino, percent, 2014` 0.0001462
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 4.654e-05
## - `Households, 2009-2013` < 2.2e-16
##
## - `Native Hawaiian and Other Pacific Islander alone, percent, 2014`
## - `White alone, percent, 2014`
## - `Persons 65 years and over, percent, 2014`
## - `Two or More Races, percent, 2014`
## - `American Indian and Alaska Native alone, percent, 2014`
## - `Black or African American alone, percent, 2014`
## - `Asian alone, percent, 2014`
## <none>
## - `Female persons, percent, 2014` *
## - `Persons under 18 years, percent, 2014` **
## - `Retail sales per capita, 2007` ***
## - `White alone, not Hispanic or Latino, percent, 2014` ***
## - `Hispanic or Latino, percent, 2014` ***
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` ***
## - `Households, 2009-2013` ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=47780.25
## `clinton votes` ~ `Persons under 18 years, percent, 2014` + `Persons 65 years and over, percent, 2014` +
## `Female persons, percent, 2014` + `White alone, percent, 2014` +
## `Black or African American alone, percent, 2014` + `American Indian and Alaska Native alone, percent, 2014` +
## `Asian alone, percent, 2014` + `Two or More Races, percent, 2014` +
## `Hispanic or Latino, percent, 2014` + `White alone, not Hispanic or Latino, percent, 2014` +
## `Households, 2009-2013` + `Per capita money income in past 12 months (2013 dollars), 2009-2013` +
## `Retail sales per capita, 2007`
##
## Df
## - `Persons 65 years and over, percent, 2014` 1
## <none>
## - `Female persons, percent, 2014` 1
## - `White alone, percent, 2014` 1
## - `Two or More Races, percent, 2014` 1
## - `Persons under 18 years, percent, 2014` 1
## - `Retail sales per capita, 2007` 1
## - `White alone, not Hispanic or Latino, percent, 2014` 1
## - `Hispanic or Latino, percent, 2014` 1
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 1
## - `Asian alone, percent, 2014` 1
## - `Black or African American alone, percent, 2014` 1
## - `American Indian and Alaska Native alone, percent, 2014` 1
## - `Households, 2009-2013` 1
## Sum of Sq
## - `Persons 65 years and over, percent, 2014` 1.4949e+07
## <none>
## - `Female persons, percent, 2014` 2.3515e+08
## - `White alone, percent, 2014` 3.1735e+08
## - `Two or More Races, percent, 2014` 4.1577e+08
## - `Persons under 18 years, percent, 2014` 4.6956e+08
## - `Retail sales per capita, 2007` 5.2223e+08
## - `White alone, not Hispanic or Latino, percent, 2014` 6.2381e+08
## - `Hispanic or Latino, percent, 2014` 6.5757e+08
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 7.6019e+08
## - `Asian alone, percent, 2014` 9.4274e+08
## - `Black or African American alone, percent, 2014` 1.0206e+09
## - `American Indian and Alaska Native alone, percent, 2014` 1.0259e+09
## - `Households, 2009-2013` 5.4647e+11
## RSS
## - `Persons 65 years and over, percent, 2014` 1.2254e+11
## <none> 1.2253e+11
## - `Female persons, percent, 2014` 1.2276e+11
## - `White alone, percent, 2014` 1.2284e+11
## - `Two or More Races, percent, 2014` 1.2294e+11
## - `Persons under 18 years, percent, 2014` 1.2300e+11
## - `Retail sales per capita, 2007` 1.2305e+11
## - `White alone, not Hispanic or Latino, percent, 2014` 1.2315e+11
## - `Hispanic or Latino, percent, 2014` 1.2318e+11
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 1.2329e+11
## - `Asian alone, percent, 2014` 1.2347e+11
## - `Black or African American alone, percent, 2014` 1.2355e+11
## - `American Indian and Alaska Native alone, percent, 2014` 1.2355e+11
## - `Households, 2009-2013` 6.6900e+11
## AIC
## - `Persons 65 years and over, percent, 2014` 47779
## <none> 47780
## - `Female persons, percent, 2014` 47783
## - `White alone, percent, 2014` 47785
## - `Two or More Races, percent, 2014` 47787
## - `Persons under 18 years, percent, 2014` 47789
## - `Retail sales per capita, 2007` 47790
## - `White alone, not Hispanic or Latino, percent, 2014` 47792
## - `Hispanic or Latino, percent, 2014` 47793
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 47795
## - `Asian alone, percent, 2014` 47799
## - `Black or African American alone, percent, 2014` 47801
## - `American Indian and Alaska Native alone, percent, 2014` 47801
## - `Households, 2009-2013` 52377
## F value
## - `Persons 65 years and over, percent, 2014` 0.3288
## <none>
## - `Female persons, percent, 2014` 5.1723
## - `White alone, percent, 2014` 6.9803
## - `Two or More Races, percent, 2014` 9.1450
## - `Persons under 18 years, percent, 2014` 10.3281
## - `Retail sales per capita, 2007` 11.4866
## - `White alone, not Hispanic or Latino, percent, 2014` 13.7208
## - `Hispanic or Latino, percent, 2014` 14.4635
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 16.7205
## - `Asian alone, percent, 2014` 20.7357
## - `Black or African American alone, percent, 2014` 22.4480
## - `American Indian and Alaska Native alone, percent, 2014` 22.5652
## - `Households, 2009-2013` 12019.6905
## Pr(>F)
## - `Persons 65 years and over, percent, 2014` 0.5664092
## <none>
## - `Female persons, percent, 2014` 0.0230285
## - `White alone, percent, 2014` 0.0082891
## - `Two or More Races, percent, 2014` 0.0025175
## - `Persons under 18 years, percent, 2014` 0.0013257
## - `Retail sales per capita, 2007` 0.0007111
## - `White alone, not Hispanic or Latino, percent, 2014` 0.0002164
## - `Hispanic or Latino, percent, 2014` 0.0001461
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 4.458e-05
## - `Asian alone, percent, 2014` 5.506e-06
## - `Black or African American alone, percent, 2014` 2.271e-06
## - `American Indian and Alaska Native alone, percent, 2014` 2.138e-06
## - `Households, 2009-2013` < 2.2e-16
##
## - `Persons 65 years and over, percent, 2014`
## <none>
## - `Female persons, percent, 2014` *
## - `White alone, percent, 2014` **
## - `Two or More Races, percent, 2014` **
## - `Persons under 18 years, percent, 2014` **
## - `Retail sales per capita, 2007` ***
## - `White alone, not Hispanic or Latino, percent, 2014` ***
## - `Hispanic or Latino, percent, 2014` ***
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` ***
## - `Asian alone, percent, 2014` ***
## - `Black or African American alone, percent, 2014` ***
## - `American Indian and Alaska Native alone, percent, 2014` ***
## - `Households, 2009-2013` ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=47778.58
## `clinton votes` ~ `Persons under 18 years, percent, 2014` + `Female persons, percent, 2014` +
## `White alone, percent, 2014` + `Black or African American alone, percent, 2014` +
## `American Indian and Alaska Native alone, percent, 2014` +
## `Asian alone, percent, 2014` + `Two or More Races, percent, 2014` +
## `Hispanic or Latino, percent, 2014` + `White alone, not Hispanic or Latino, percent, 2014` +
## `Households, 2009-2013` + `Per capita money income in past 12 months (2013 dollars), 2009-2013` +
## `Retail sales per capita, 2007`
##
## Df
## <none>
## - `White alone, percent, 2014` 1
## - `Female persons, percent, 2014` 1
## - `Two or More Races, percent, 2014` 1
## - `Retail sales per capita, 2007` 1
## - `White alone, not Hispanic or Latino, percent, 2014` 1
## - `Hispanic or Latino, percent, 2014` 1
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 1
## - `Asian alone, percent, 2014` 1
## - `Black or African American alone, percent, 2014` 1
## - `American Indian and Alaska Native alone, percent, 2014` 1
## - `Persons under 18 years, percent, 2014` 1
## - `Households, 2009-2013` 1
## Sum of Sq
## <none>
## - `White alone, percent, 2014` 3.0425e+08
## - `Female persons, percent, 2014` 3.2773e+08
## - `Two or More Races, percent, 2014` 4.0155e+08
## - `Retail sales per capita, 2007` 5.6948e+08
## - `White alone, not Hispanic or Latino, percent, 2014` 6.2399e+08
## - `Hispanic or Latino, percent, 2014` 6.5893e+08
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 7.6277e+08
## - `Asian alone, percent, 2014` 9.3083e+08
## - `Black or African American alone, percent, 2014` 1.0070e+09
## - `American Indian and Alaska Native alone, percent, 2014` 1.0117e+09
## - `Persons under 18 years, percent, 2014` 1.0611e+09
## - `Households, 2009-2013` 5.4966e+11
## RSS
## <none> 1.2254e+11
## - `White alone, percent, 2014` 1.2285e+11
## - `Female persons, percent, 2014` 1.2287e+11
## - `Two or More Races, percent, 2014` 1.2294e+11
## - `Retail sales per capita, 2007` 1.2311e+11
## - `White alone, not Hispanic or Latino, percent, 2014` 1.2317e+11
## - `Hispanic or Latino, percent, 2014` 1.2320e+11
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 1.2330e+11
## - `Asian alone, percent, 2014` 1.2347e+11
## - `Black or African American alone, percent, 2014` 1.2355e+11
## - `American Indian and Alaska Native alone, percent, 2014` 1.2355e+11
## - `Persons under 18 years, percent, 2014` 1.2360e+11
## - `Households, 2009-2013` 6.7220e+11
## AIC
## <none> 47779
## - `White alone, percent, 2014` 47783
## - `Female persons, percent, 2014` 47784
## - `Two or More Races, percent, 2014` 47785
## - `Retail sales per capita, 2007` 47789
## - `White alone, not Hispanic or Latino, percent, 2014` 47790
## - `Hispanic or Latino, percent, 2014` 47791
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 47793
## - `Asian alone, percent, 2014` 47797
## - `Black or African American alone, percent, 2014` 47799
## - `American Indian and Alaska Native alone, percent, 2014` 47799
## - `Persons under 18 years, percent, 2014` 47800
## - `Households, 2009-2013` 52388
## F value
## <none>
## - `White alone, percent, 2014` 6.6936
## - `Female persons, percent, 2014` 7.2103
## - `Two or More Races, percent, 2014` 8.8344
## - `Retail sales per capita, 2007` 12.5289
## - `White alone, not Hispanic or Latino, percent, 2014` 13.7282
## - `Hispanic or Latino, percent, 2014` 14.4968
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 16.7814
## - `Asian alone, percent, 2014` 20.4789
## - `Black or African American alone, percent, 2014` 22.1556
## - `American Indian and Alaska Native alone, percent, 2014` 22.2582
## - `Persons under 18 years, percent, 2014` 23.3452
## - `Households, 2009-2013` 12092.7989
## Pr(>F)
## <none>
## - `White alone, percent, 2014` 0.0097277
## - `Female persons, percent, 2014` 0.0072931
## - `Two or More Races, percent, 2014` 0.0029822
## - `Retail sales per capita, 2007` 0.0004075
## - `White alone, not Hispanic or Latino, percent, 2014` 0.0002155
## - `Hispanic or Latino, percent, 2014` 0.0001435
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` 4.318e-05
## - `Asian alone, percent, 2014` 6.290e-06
## - `Black or African American alone, percent, 2014` 2.641e-06
## - `American Indian and Alaska Native alone, percent, 2014` 2.505e-06
## - `Persons under 18 years, percent, 2014` 1.430e-06
## - `Households, 2009-2013` < 2.2e-16
##
## <none>
## - `White alone, percent, 2014` **
## - `Female persons, percent, 2014` **
## - `Two or More Races, percent, 2014` **
## - `Retail sales per capita, 2007` ***
## - `White alone, not Hispanic or Latino, percent, 2014` ***
## - `Hispanic or Latino, percent, 2014` ***
## - `Per capita money income in past 12 months (2013 dollars), 2009-2013` ***
## - `Asian alone, percent, 2014` ***
## - `Black or African American alone, percent, 2014` ***
## - `American Indian and Alaska Native alone, percent, 2014` ***
## - `Persons under 18 years, percent, 2014` ***
## - `Households, 2009-2013` ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call:
## lm(formula = `clinton votes` ~ `Persons under 18 years, percent, 2014` +
## `Female persons, percent, 2014` + `White alone, percent, 2014` +
## `Black or African American alone, percent, 2014` + `American Indian and Alaska Native alone, percent, 2014` +
## `Asian alone, percent, 2014` + `Two or More Races, percent, 2014` +
## `Hispanic or Latino, percent, 2014` + `White alone, not Hispanic or Latino, percent, 2014` +
## `Households, 2009-2013` + `Per capita money income in past 12 months (2013 dollars), 2009-2013` +
## `Retail sales per capita, 2007`, data = trump_data)
##
## Coefficients:
## (Intercept)
## -2.060e+05
## `Persons under 18 years, percent, 2014`
## -2.203e+02
## `Female persons, percent, 2014`
## 1.726e+02
## `White alone, percent, 2014`
## 1.220e+03
## `Black or African American alone, percent, 2014`
## 2.073e+03
## `American Indian and Alaska Native alone, percent, 2014`
## 2.048e+03
## `Asian alone, percent, 2014`
## 2.126e+03
## `Two or More Races, percent, 2014`
## 1.588e+03
## `Hispanic or Latino, percent, 2014`
## 7.523e+02
## `White alone, not Hispanic or Latino, percent, 2014`
## 7.685e+02
## `Households, 2009-2013`
## 1.554e-01
## `Per capita money income in past 12 months (2013 dollars), 2009-2013`
## 1.285e-01
## `Retail sales per capita, 2007`
## -9.226e-02
Inference from backward elimination process [Clinton]
The final step of the backward elimination shows that the votes gained by Clinton are most strongly associated with the metrics below, all of which are significant at the 0.001 level.
Metric                                                                  F value       Pr(>F)       Signif.
Retail sales per capita, 2007                                           12.5289       0.0004075    ***
White alone, not Hispanic or Latino, percent, 2014                      13.7282       0.0002155    ***
Hispanic or Latino, percent, 2014                                       14.4968       0.0001435    ***
Per capita money income in past 12 months (2013 dollars), 2009-2013     16.7814       4.318e-05    ***
Asian alone, percent, 2014                                              20.4789       6.290e-06    ***
Black or African American alone, percent, 2014                          22.1556       2.641e-06    ***
American Indian and Alaska Native alone, percent, 2014                  22.2582       2.505e-06    ***
Persons under 18 years, percent, 2014                                   23.3452       1.430e-06    ***
Households, 2009-2013                                                   12092.7989    < 2.2e-16    ***
1.7.2 Relationship between Households and votes gained by Trump and Clinton
Based on the above results, the metric “Households, 2009-2013” is highly significant for votes gained, regardless of candidate. A couple of other metrics (“Per capita money income in past 12 months (2013 dollars), 2009-2013” and “Retail sales per capita, 2007”) are also common to both candidates and highly significant for the target metric.
Let’s create a scatterplot for Trump-favored states (counties in Indiana, Florida, and Pennsylvania) that shows mean households against total votes gained by Trump and Clinton for each state. The relationship between households and votes gained is linear for each candidate, but the slopes differ. Clinton's line has the higher slope, which suggests that counties with fewer households favor Trump and counties with more households favor Clinton.
For background, see this article, which comments on the claim that “Richer people vote more”: https://www.weforum.org/agenda/2018/07/low-voter-turnout-increasing-household-income-may-help/
States where Trump won.
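The slope comparison described above can be made explicit by fitting one simple regression per candidate. The sketch below is only an illustration on simulated data: the data frame `toy`, its column names, and the slopes built into it are assumptions, not values from the report.

```r
# Hypothetical county-level data; the numbers are invented for illustration.
set.seed(2)
households <- runif(100, 2000, 150000)
toy <- data.frame(
  households    = households,
  trump_votes   = 0.10 * households + rnorm(100, sd = 800),
  clinton_votes = 0.14 * households + rnorm(100, sd = 800)
)

# Fit one simple regression per candidate and extract the slope on households.
fit_trump     <- lm(trump_votes ~ households, data = toy)
fit_clinton   <- lm(clinton_votes ~ households, data = toy)
slope_trump   <- coef(fit_trump)[["households"]]
slope_clinton <- coef(fit_clinton)[["households"]]

# Overlay both point clouds and their fitted lines on a single scatterplot.
plot(toy$households, toy$clinton_votes, col = "blue",
     xlab = "Households", ylab = "Votes gained")
points(toy$households, toy$trump_votes, col = "red")
abline(fit_clinton, col = "blue")
abline(fit_trump, col = "red")
```

Comparing `slope_clinton` with `slope_trump` gives a number to go with the visual impression of which line is steeper.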
Now let’s create a scatterplot for Clinton-favored states (counties in California, New York, and New Jersey) that shows mean households against total votes gained by Trump and Clinton for each state. Here as well the relationship is linear, but the two slopes are roughly the same, which means there is not much favoritism towards Trump in these states.
States where Clinton won.
1.7.3 More analysis
1.7.3.1 Number and percentage of votes
First let’s take a look at how close the election was - let’s look at the average percentage and number of votes for all 4 candidates.
perc_votes<-complete_data %>%
select ("sanders fraction votes", "clinton fraction votes", "trump fraction votes", "cruz fraction votes")
count_votes<-complete_data %>%
select ("sanders votes", "clinton votes", "trump votes", "cruz votes")
# Average
sort(apply(perc_votes, 2, mean, na.rm=TRUE), decreasing = TRUE)
## clinton fraction votes trump fraction votes sanders fraction votes
## 0.5360086 0.4697300 0.4297484
## cruz fraction votes
## 0.2814360
# Sum of votes
sort(apply(count_votes, 2, sum, na.rm=TRUE), decreasing = TRUE)
## clinton votes trump votes sanders votes cruz votes
## 14122335 12559572 10332812 7359825
We can see that Clinton was in the lead by both average percentage and total count of votes in the states we have the data for.
1.7.3.2 Votes of female population
Let’s take a look at how gender correlates with the results by examining the relationship of female population with results for each candidate.
#Democrats
plot(complete_data$"Female persons, percent, 2014", complete_data$`clinton fraction votes`, main = "Clinton Data", xlab = "% of female population", ylab = "Percentage of votes for Candidate")
plot(complete_data$"Female persons, percent, 2014", complete_data$`sanders fraction votes`, main = "Sanders Data", xlab = "% of female population", ylab = "Percentage of votes for Candidate")
#Republicans
plot(complete_data$"Female persons, percent, 2014", complete_data$`trump fraction votes`, main = "Trump Data", xlab = "% of female population", ylab = "Percentage of votes for Candidate")
plot(complete_data$"Female persons, percent, 2014", complete_data$`cruz fraction votes`, main = "Cruz Data", xlab = "% of female population", ylab = "Percentage of votes for Candidate")
The Clinton data appears to show a correlation: counties with a higher female population seem to show higher Clinton vote shares in the primary election. The Sanders data shows the opposite effect.
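A visual impression like this can be backed by a number. The sketch below computes a Pearson correlation and its test with base R's `cor()` and `cor.test()`. It runs on simulated data: the data frame `toy`, the columns `female_pct` and `clinton_frac`, and the positive relationship built into them are assumptions for illustration only.

```r
# Simulated counties; the positive relationship is built in on purpose.
set.seed(3)
female_pct <- rnorm(150, mean = 50, sd = 2)
raw_frac   <- 0.5 + 0.03 * (female_pct - 50) + rnorm(150, sd = 0.1)
toy <- data.frame(
  female_pct   = female_pct,
  clinton_frac = pmin(pmax(raw_frac, 0), 1)   # keep fractions inside [0, 1]
)

# Pearson correlation, ignoring rows with missing values.
r <- cor(toy$female_pct, toy$clinton_frac, use = "complete.obs")

# Test of the null hypothesis that the true correlation is zero.
ct <- cor.test(toy$female_pct, toy$clinton_frac)

print(r)
print(ct$p.value)
```

On the real data the same two calls could be applied to `complete_data$"Female persons, percent, 2014"` and each candidate's fraction column.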
1.7.3.3 Votes of white population
Let’s take a look at how race correlates with the results by examining the relationship of white population with results for each candidate.
#Democrats
plot(complete_data$"White alone, percent, 2014", complete_data$`clinton fraction votes`, main = "Clinton Data", xlab = "% of white population", ylab = "Percentage of votes for Candidate")
plot(complete_data$"White alone, percent, 2014", complete_data$`sanders fraction votes`, main = "Sanders Data", xlab = "% of white population", ylab = "Percentage of votes for Candidate")
#Republicans
plot(complete_data$"White alone, percent, 2014", complete_data$`trump fraction votes`, main = "Trump Data", xlab = "% of white population", ylab = "Percentage of votes for Candidate")
plot(complete_data$"White alone, percent, 2014", complete_data$`cruz fraction votes`, main = "Cruz Data", xlab = "% of white population", ylab = "Percentage of votes for Candidate")
From visual inspection, it looks like Sanders and Trump were slightly more popular in counties with a higher white-alone population.
1.7.3.4 Votes of foreign born population
Let’s see if percentage of foreign born population impacts results.
#Democrats
boxplot(complete_data[complete_data$'Foreign born persons, percent, 2009-2013' >=15,]$'clinton fraction votes', complete_data$'clinton fraction votes', names = c("in counties with over 15% foreign born", "overall votes"), ylab = "Clinton")
boxplot(complete_data[complete_data$'Foreign born persons, percent, 2009-2013' >=15,]$'sanders fraction votes', complete_data$'sanders fraction votes', names = c("in counties with over 15% foreign born", "overall votes"), ylab = "Sanders")
#Republicans
boxplot(complete_data[complete_data$'Foreign born persons, percent, 2009-2013' >=15,]$'trump fraction votes', complete_data$'trump fraction votes', names = c("in counties with over 15% foreign born", "overall votes"), ylab = "Trump")
boxplot(complete_data[complete_data$'Foreign born persons, percent, 2009-2013' >=15,]$'cruz fraction votes', complete_data$'cruz fraction votes', names = c("in counties with over 15% foreign born", "overall votes"), ylab = "Cruz")
It looks like Clinton and Trump were more popular in counties with a higher percentage of foreign-born population, but the difference doesn’t seem very drastic.
1.7.3.5 Median household income effect on votes
Let’s see if median household income impacts results. We compare overall results with results in counties where median household income is over 80K.
#Democrats
boxplot(complete_data[complete_data$'Median household income, 2009-2013' >=80000,]$'clinton fraction votes', complete_data$'clinton fraction votes', names = c("median income over 80K", "overall votes"), ylab = "Clinton")
# Compare each candidate's high-income subset with that same candidate's overall votes
boxplot(complete_data[complete_data$'Median household income, 2009-2013' >=80000,]$'sanders fraction votes', complete_data$'sanders fraction votes', names = c("median income over 80K", "overall votes"), ylab = "Sanders")
#Republicans
boxplot(complete_data[complete_data$'Median household income, 2009-2013' >=80000,]$'trump fraction votes', complete_data$'trump fraction votes', names = c("median income over 80K", "overall votes"), ylab = "Trump")
boxplot(complete_data[complete_data$'Median household income, 2009-2013' >=80000,]$'cruz fraction votes', complete_data$'cruz fraction votes', names = c("median income over 80K", "overall votes"), ylab = "Cruz")
Judging from the boxplots, Clinton and Trump appear to be favored in counties with a median household income over 80K.
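The boxplot calls above repeat the same pattern for each candidate. One way to keep the subset and the overall column tied to the same candidate is a small helper. The sketch below is a hypothetical illustration: the function `compare_high_income`, the data frame `toy`, and the column `median_income` are invented names, and the vote fractions are random.

```r
# Simulated counties with random vote fractions; not real election data.
set.seed(4)
toy <- data.frame(median_income = runif(300, 30000, 120000))
toy[["clinton fraction votes"]] <- runif(300)
toy[["trump fraction votes"]]   <- runif(300)

# Draw one boxplot pair for a candidate: high-income counties vs all counties.
compare_high_income <- function(data, candidate, threshold = 80000) {
  col     <- paste(candidate, "fraction votes")   # same column for both boxes
  high    <- data[data$median_income >= threshold, col]
  overall <- data[[col]]
  boxplot(high, overall,
          names = c("median income over 80K", "overall votes"),
          ylab = candidate)
  invisible(list(high = high, overall = overall))
}

# Apply the helper to each candidate in turn.
results <- lapply(c("clinton", "trump"),
                  function(cand) compare_high_income(toy, cand))
```

Because the column name is built once from `candidate`, both boxes in each plot always come from the same candidate's data.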