I have lived in Illinois for a while and find it fascinating. It’s been a blue state for decades. Chicago is the liberal stronghold of the Midwest, but downstate is completely different. I wanted to see how someone like Trump (who’s a conservative, kind of) does in an odd state like Illinois.

library(ggplot2)
library(readr)
library(dplyr) 
library(extrafontdb)
library(extrafont)
library(jsonlite)
library(purrr)
library(choroplethr)
library(RColorBrewer)
library(gridExtra)
library(coefplot)
library(DT)

It’s surprisingly difficult to find primary results broken down by county. I couldn’t find any resource that offers this data in a convenient downloadable format, so I will extract it from the CNN website and convert it into a format I can use.

ildata <- fromJSON("http://data.cnn.com/ELECTION/2016primary/IL/county/R.json", flatten = TRUE)
# Pull Trump's row out of each county's candidate table, then attach the county FIPS codes
trump <- ildata$counties$race.candidates %>%
  map_df(function(x) filter(x, lname == "Trump")) %>%
  mutate(FIPS = ildata$counties$countycode)
glimpse(trump)
## Observations: 103
## Variables: 16
## $ id         (int) 8639, 8639, 8639, 8639, 8639, 8639, 8639, 8639, 863...
## $ fname      (chr) "Donald", "Donald", "Donald", "Donald", "Donald", "...
## $ mname      (chr) "", "", "", "", "", "", "", "", "", "", "", "", "",...
## $ lname      (chr) "Trump", "Trump", "Trump", "Trump", "Trump", "Trump...
## $ suffix     (chr) "", "", "", "", "", "", "", "", "", "", "", "", "",...
## $ usesuffix  (lgl) FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA...
## $ party      (chr) "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "...
## $ inrace     (lgl) TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU...
## $ nominee    (lgl) FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA...
## $ winner     (lgl) TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU...
## $ vpct       (int) 41, 52, 41, 39, 51, 40, 48, 36, 43, 29, 39, 44, 41,...
## $ pctDecimal (chr) "40.7", "52.4", "41.2", "39.4", "50.8", "39.5", "48...
## $ inc        (lgl) FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA...
## $ votes      (int) 5233, 282, 1102, 3816, 502, 2185, 329, 1268, 680, 7...
## $ cvotes     (chr) "5,233", "282", "1,102", "3,816", "502", "2,185", "...
## $ FIPS       (int) 17001, 17003, 17005, 17007, 17009, 17011, 17013, 17...
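
The nested shape of the feed is the reason for the map-then-filter step: each county carries its own table of candidates. Here is a minimal sketch of that step on a toy stand-in for `ildata$counties`. The structure is assumed from the fields shown above, the `counties` and `trump_toy` names are hypothetical, and the Cruz rows are made up.

```r
library(dplyr)
library(purrr)

# Toy stand-in for ildata$counties: one candidates table per county.
# The Cruz rows are made up for illustration.
counties <- list(
  countycode = c(17001L, 17003L),
  race.candidates = list(
    data.frame(lname = c("Trump", "Cruz"), votes = c(5233L, 3000L),
               stringsAsFactors = FALSE),
    data.frame(lname = c("Trump", "Cruz"), votes = c(282L, 150L),
               stringsAsFactors = FALSE)
  )
)

# Keep only Trump's row from each county's table and stack the results,
# then attach the county code so every row can be mapped later.
trump_toy <- counties$race.candidates %>%
  map_df(function(x) filter(x, lname == "Trump")) %>%
  mutate(FIPS = counties$countycode)

print(trump_toy)
```

This only lines up because every county contributes exactly one Trump row, so the stacked rows are in the same order as the county codes.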

This data is relatively clean, but there are some columns I really don’t need, and I am going to be adding a lot of additional columns, so I want to trim it down now.

trump <- select(trump, lname, winner, pctDecimal, votes, FIPS)
head(trump)
## Source: local data frame [6 x 5]
## 
##   lname winner pctDecimal votes  FIPS
##   (chr)  (lgl)      (chr) (int) (int)
## 1 Trump   TRUE       40.7  5233 17001
## 2 Trump   TRUE       52.4   282 17003
## 3 Trump   TRUE       41.2  1102 17005
## 4 Trump   TRUE       39.4  3816 17007
## 5 Trump   TRUE       50.8   502 17009
## 6 Trump   TRUE       39.5  2185 17011

I have the FIPS code for each county, which is awesome because it lets me make maps that show how well Trump did across the entire state. Before I do that, I need to take a look at the two columns I really need: FIPS and pctDecimal.

The vote share needs to be numeric, but pctDecimal is stored as character. The FIPS column also has two zero values. One of them is an error in the data: the FIPS codes jump by twos, and this row doesn’t fit the pattern, so I drop it. The second zero should be 17031 (Cook County). I will make those changes now.

trump$vote_share <- as.numeric(trump$pctDecimal)
# Drop the spurious row (row 11)
trump <- trump[-11, ]
# The remaining zero is now in row 16; column 5 is FIPS
trump[16, 5] <- 17031
trump$FIPS
##   [1] 17001 17003 17005 17007 17009 17011 17013 17015 17017 17019 17021
##  [12] 17023 17025 17027 17029 17031 17033 17035 17037 17039 17041 17043
##  [23] 17045 17047 17049 17051 17053 17055 17057 17059 17061 17063 17065
##  [34] 17067 17069 17071 17073 17075 17077 17079 17081 17083 17085 17087
##  [45] 17089 17091 17093 17095 17097 17099 17101 17103 17105 17107 17115
##  [56] 17117 17119 17121 17123 17125 17127 17109 17111 17113 17129 17131
##  [67] 17133 17135 17137 17139 17141 17143 17145 17147 17149 17151 17153
##  [78] 17155 17157 17159 17161 17165 17167 17169 17171 17173 17163 17175
##  [89] 17177 17179 17181 17183 17185 17187 17189 17191 17193 17195 17197
## [100] 17199 17201 17203
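
The hard-coded row numbers work for this particular download, but the same rule can be applied without them. This is a minimal sketch on a toy vector, assuming the zeros are never first, last, or next to each other, and that the neighbouring codes are in ascending order. The `fips` vector and the index names are hypothetical.

```r
# Toy FIPS column with the same two problems (values chosen for illustration).
fips <- c(17019, 0, 17021, 17029, 0, 17033)

zero_idx <- which(fips == 0)
# Gap between the codes on either side of each zero.
gap <- fips[zero_idx + 1] - fips[zero_idx - 1]

# A gap of 4 means exactly one odd code is missing: fill it in.
fill_idx <- zero_idx[gap == 4]
fips[fill_idx] <- fips[fill_idx - 1] + 2

# A gap of 2 means no county is missing: the entry is spurious, so drop it.
drop_idx <- zero_idx[gap == 2]
if (length(drop_idx) > 0) fips <- fips[-drop_idx]

print(fips)
```

On a data frame, the drop step would remove whole rows rather than vector elements.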

Now that the cleaning is done, we can map.

trump$region <- trump$FIPS
trump$value <- trump$vote_share
choro = CountyChoropleth$new(trump)
choro$title = "Trump Vote Percent by County"
choro$set_num_colors(1)
choro$set_zoom("illinois")
choro$ggplot_polygon = geom_polygon(aes(fill = value), color = NA)
choro$ggplot_scale = scale_fill_gradientn(name = "Percent", colours = brewer.pal(8, "Reds"))
choro$render()

It’s apparent that Trump does very well in southern Illinois, specifically in the southeastern counties. Those counties are very rural and economically depressed. I will pull in some data and map the unemployment rates. Unemployment data is available on the Bureau of Labor Statistics website. I just copied and pasted the rates into a spreadsheet and then added them to my dataset.

ilun <- read.csv("D:/ilun.csv", stringsAsFactors = FALSE)
# Columns are assigned by position, so ilun.csv must list the counties in the same order as trump
trump$unem <- ilun$rate
trump$county <- ilun$county

Now to map the unemployment by county.
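
The code for this map isn’t shown. Here is a minimal sketch that reuses the same choroplethr calls as the vote-share map and points `value` at the unemployment rate instead. The toy rates are made up, the `unem_toy` and `unem_map` names are hypothetical, and the "Blues" palette is an arbitrary choice.

```r
# Toy data: three Illinois counties with made-up unemployment rates.
unem_toy <- data.frame(
  FIPS = c(17001, 17003, 17031),
  unem = c(5.1, 9.8, 6.2)
)

# choroplethr expects one row per county with `region` and `value` columns.
unem_map <- data.frame(region = unem_toy$FIPS, value = unem_toy$unem)

# Rendering needs choroplethr and its map data, so it is guarded here.
if (requireNamespace("choroplethr", quietly = TRUE) &&
    requireNamespace("choroplethrMaps", quietly = TRUE)) {
  library(choroplethr)
  library(ggplot2)
  library(RColorBrewer)

  choro <- CountyChoropleth$new(unem_map)
  choro$title <- "Unemployment Rate by County"
  choro$set_num_colors(1)
  choro$set_zoom("illinois")
  choro$ggplot_polygon <- geom_polygon(aes(fill = value), color = NA)
  choro$ggplot_scale <- scale_fill_gradientn(name = "Percent",
                                             colours = brewer.pal(8, "Blues"))
  print(choro$render())
}
```

With the real data, `unem_map` would be built from `trump$FIPS` and `trump$unem`; counties missing from the input are drawn as NA.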

It’s obvious that there’s some connection between the two. Unemployment is very high in deep southern Illinois and so is support for Trump. I’m going to pull in some other variables, including poverty rates, median income, and levels of college education. The college education data is available through the USDA here: http://goo.gl/mC2PZp. The economic data is available through the Census Bureau here: http://goo.gl/FNrXtL.

ilpov <- read.csv("D:/ilpov.csv", stringsAsFactors = FALSE)
iled <- read.csv("D:/iled.csv", stringsAsFactors = FALSE)
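
The merge step itself isn’t shown. One way to do it, sketched here on toy tables, is to join on the county name so that row order in the CSV files doesn’t matter. This assumes each file has a `county` column; the `*_toy` names are hypothetical, and the poverty, income, and education values are made up.

```r
library(dplyr)

# Toy stand-ins for the real tables.
trump_toy <- data.frame(county = c("Adams", "Alexander"),
                        vote_share = c(40.7, 52.4),
                        stringsAsFactors = FALSE)
ilpov_toy <- data.frame(county = c("Alexander", "Adams"),
                        pov_rate = c(30.1, 13.2),
                        income = c("$27,000", "$46,000"),
                        stringsAsFactors = FALSE)
iled_toy <- data.frame(county = c("Adams", "Alexander"),
                       educ = c("22.5%", "10.9%"),
                       stringsAsFactors = FALSE)

# Join on the county name so rows line up even if the files are sorted differently.
merged <- trump_toy %>%
  left_join(ilpov_toy, by = "county") %>%
  left_join(iled_toy, by = "county")

print(merged)
```

A left join keeps every row of the election data and leaves NA where a county has no match, which makes spelling mismatches in county names easy to spot.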

Now everything is merged. A few things still need cleaning up: the education rate has a % sign, and income has a dollar sign and a comma.

# Strip the symbols, then convert to numeric so the regression treats these as continuous
trump$income <- as.numeric(gsub('[$,]', '', trump$income))
trump$educ <- as.numeric(gsub('%', '', trump$educ))
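
Since readr is already loaded, the symbol stripping and the conversion to numeric can also be done in one step with `parse_number()`. Here is a small sketch on made-up values in the same formats as the raw columns (the `income_raw` and `educ_raw` names are hypothetical).

```r
library(readr)

# Made-up examples in the same formats as the raw columns.
income_raw <- c("$46,000", "$27,500")
educ_raw <- c("22.5%", "10.9%")

# parse_number() drops the characters around the number (currency symbol,
# grouping comma, percent sign) and returns a double.
income <- parse_number(income_raw)
educ <- parse_number(educ_raw)

print(income)
print(educ)
```

Either way, the columns need to end up numeric before they go into a regression.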

For me, the best way to get a sense of what’s going on is to run a quick regression and see which variables are statistically significant.

summary(lm(vote_share ~ unem + pov_rate + income + educ + whtvoteage + totalpop, data = trump))
## 
## Call:
## lm(formula = vote_share ~ unem + pov_rate + income + educ + whtvoteage + 
##     totalpop, data = trump)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.9759 -3.3832 -0.0304  3.9868 10.1601 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.912e+01  1.078e+01   1.773 0.079409 .  
## unem         1.832e-01  3.942e-01   0.465 0.643162    
## pov_rate     7.911e-01  2.164e-01   3.656 0.000421 ***
## income       2.944e-04  1.187e-04   2.481 0.014866 *  
## educ        -6.632e-01  9.843e-02  -6.738 1.23e-09 ***
## whtvoteage   7.489e+00  5.113e+00   1.465 0.146295    
## totalpop     1.074e-05  4.720e-06   2.275 0.025156 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.701 on 95 degrees of freedom
## Multiple R-squared:  0.4515, Adjusted R-squared:  0.4169 
## F-statistic: 13.04 on 6 and 95 DF,  p-value: 1.046e-10

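So, some interesting things here. Higher poverty is associated with a higher Trump vote share, while a larger share of college degrees is associated with a lower one. A larger population and a higher median income are also associated with more support for Trump. Unemployment and whtvoteage are not statistically significant once the other variables are in the model. I am going to display the coefficient results graphically, but first I need to rescale all my independent variables so they can be interpreted more easily.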
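
The rescaling code isn’t shown. One common choice is to standardize each predictor to mean 0 and standard deviation 1, so that every coefficient reads as the change in vote share per one standard deviation of that predictor. Here is a minimal sketch on simulated data; the `toy`, `toy_scaled`, and `fit` names are hypothetical, and the numbers are not the Illinois results.

```r
# Simulated data with predictors on very different scales.
set.seed(1)
n <- 50
toy <- data.frame(
  educ = runif(n, 10, 40),          # percent
  income = runif(n, 30000, 70000)   # dollars
)
toy$vote_share <- 60 - 0.6 * toy$educ + 0.0002 * toy$income + rnorm(n, sd = 3)

# Standardize each predictor to mean 0 and standard deviation 1.
toy_scaled <- toy
toy_scaled$educ <- as.numeric(scale(toy$educ))
toy_scaled$income <- as.numeric(scale(toy$income))

fit <- lm(vote_share ~ educ + income, data = toy_scaled)
print(coef(fit))

# Plot the estimates with their confidence intervals, if coefplot is installed.
if (requireNamespace("coefplot", quietly = TRUE)) {
  print(coefplot::coefplot(fit))
}
```

After standardizing, coefficients for income (dollars) and education (percent) sit on a comparable scale, which is what makes a single coefficient plot readable.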

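I think the college education finding is especially interesting, and the magnitude is staggering: each additional percentage point of residents with a college degree is associated with about 0.66 points less vote share for Trump, holding the other variables constant. I want to drill down on that and plot it so I can pick out individual counties.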

# my_theme() is a custom theme that is not defined in this post
ggplot(trump, aes(x = vote_share, y = educ)) +
  my_theme() +
  geom_point(shape = 1) +
  geom_smooth(method = "lm") +
  labs(x = "Trump's Vote Share", y = "% with College Degrees") +
  ggtitle(expression(atop(bold("Trump in Illinois"), atop(italic("Association between Trump's Vote Share and College Education"), "")))) +
  # Label each point with its county name
  geom_text(aes(label = county), vjust = -1, hjust = 0.5, size = 2) +
  theme(plot.title = element_text(size = 16, face = "bold", colour = "black", vjust = 0.5, hjust = 0.5))

With a little background knowledge of Illinois, it’s apparent that the very rural counties in southern Illinois are the Trump strongholds. Places like Gallatin and Alexander counties are struggling with massive unemployment and poverty coupled with low levels of college education. I’ve included a sortable table below.

trump_small <- trump %>% select(county, vote_share, votes, unem, pov_rate, income, educ, whtvoteage, totalpop)
datatable(trump_small, class = 'compact')