• Looking at demographic data and Election results
  • Limitations of Analysis
  • Under the Hood
    • Big picture racial demographics
  • Things to be explored
  • Relationship between whiteness and economics
    • Summarize economic/racial relationship
  • Map Tossup, Heavily democratic, and Heavily republican counties
  • Create bins for Social and economic stats
    • Bin county population size
    • Bin Income levels
      • Explore interaction of Binned income and population
  • Mapping wealthy neighborhoods
  • Graph Hispanic Communities in range (20-60% Hispanic)
    • Let’s build a voter turnout column
  • Try to find relationship between turnout, Hispanic population, and vote
  • Conclusion

Looking at demographic data and Election results

  • The data set and the idea for this project stem from a DataCamp course
  • Throughout this report, “Republican” and “Democratic” refer to the 2016 presidential candidates, Donald Trump and Hillary Clinton respectively
  • We explore relationships between racial, social, and economic demographics and county-level election results
  • The topic is enormous, and there are many more questions one could ask about it than are covered here
  • This notebook is mostly data exploration and experimentation with the choroplethr mapping package
    • The package includes county-level demographic data, which was merged with the DataCamp 2016 county election results dataset

Limitations of Analysis


This is an observational study, so we can’t draw any causal conclusions from anything found here. The entire analysis below is based on statistics aggregated to the county level. I don’t have data on how individual members of each racial or economic group voted, so what the dataset can tell us is limited; individual-level data would be much more valuable.

  • County populations range from a minimum of 87 people to a maximum of almost 10 million. Treating a county of about 90 people the same as one of millions is not ideal: it likely loses information and may lead to spurious correlations. However, I still believe there is plenty to learn from analyzing the dataset.
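
To make the county-size concern concrete, here is a minimal sketch of how an unweighted average across counties can differ from a population-weighted one. The notebook itself was built in R; this sketch uses plain Python, and the three toy counties and their vote shares are invented for illustration, not taken from the dataset.

```python
# Toy county records: (population, Democratic vote share in percent).
# These values are made up purely to illustrate the weighting issue.
counties = [
    (90, 10.0),
    (5_000, 20.0),
    (10_000_000, 70.0),
]


def unweighted_mean(rows):
    """Average the county shares, counting every county equally."""
    return sum(share for _, share in rows) / len(rows)


def weighted_mean(rows):
    """Average the county shares, weighting each county by its population."""
    total_pop = sum(pop for pop, _ in rows)
    return sum(pop * share for pop, share in rows) / total_pop


print("county-level average:      ", round(unweighted_mean(counties), 1))
print("population-weighted average:", round(weighted_mean(counties), 1))
```

In this toy case the county-level average sits near the two small counties, while the weighted average is dominated by the large one. Every county-level summary in this report is of the first kind.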

Under the Hood

  • I have hidden the code behind all of this, so if you want to try some of this mapping on your own, feel free to download the data sets and my code from my GitHub repository, mapping_2016_electiondata_with_choroplethr
  • Initial steps to clean the data
    • Merge the CSVs
      • Create a master data frame built from Trump’s vote share and the choroplethr package’s county data set
    • Create new columns
      • Spread party affiliations into 3 new columns (Republican, Democrat, Independent)
      • Democratic Party vote pct.
      • Republican Party vote pct.
      • Voter turnout
      • Categorical strata for race and income
        • The cutoffs for these strata are somewhat arbitrary. Because each group is distributed differently, a single standard set of cutoffs wasn’t possible, so I chose them by looking at each group’s mean and deciding what I wanted to examine
        • The average county is only about 2% Black; however, minority groups tend to be concentrated in a relatively small number of counties
    • Creating the new columns exposed data-entry errors
      • Richmond, Virginia showed a 1.7% voter turnout, while other counties showed turnout as high as 2700%
      • Roanoke was removed because it was duplicated
        • Overall, 7 observations out of our 3,000-plus were removed
        • These outliers cast some doubt on the validity of the dataset as a whole.
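
The derived columns and the sanity check that exposed the bad rows can be sketched roughly as follows. This is an illustration in plain Python, not the notebook’s hidden R code: the field names, the toy records, the choice of total population as the turnout denominator, and the 5%/100% plausibility bounds are all assumptions.

```python
# Hypothetical per-county records; field names and values are assumptions.
rows = [
    {"county": "A", "votes_dem": 400, "votes_rep": 500, "votes_other": 100,
     "total_population": 2_500},
    {"county": "B", "votes_dem": 30, "votes_rep": 20, "votes_other": 0,
     "total_population": 3_000},
    {"county": "C", "votes_dem": 9_000, "votes_rep": 18_000, "votes_other": 0,
     "total_population": 1_000},
]


def add_derived_columns(row):
    """Return a copy of the row with vote-share and turnout percentages."""
    total_votes = row["votes_dem"] + row["votes_rep"] + row["votes_other"]
    out = dict(row)
    out["pct_dem"] = 100 * row["votes_dem"] / total_votes
    out["pct_rep"] = 100 * row["votes_rep"] / total_votes
    # Assumption: turnout is measured against total population.
    out["voter_turnout"] = 100 * total_votes / row["total_population"]
    return out


def is_implausible(row, low=5.0, high=100.0):
    """Flag turnout outside an (assumed) plausible range, in percent."""
    return not (low <= row["voter_turnout"] <= high)


derived = [add_derived_columns(r) for r in rows]
flagged = [r for r in derived if is_implausible(r)]
for r in flagged:
    print(r["county"], round(r["voter_turnout"], 1))
```

County B (about 1.7%) and county C (2700%) mimic the kinds of values described above; a check like this only flags rows for inspection, and deciding whether to drop them remains a judgment call.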


With that said, let’s get started!

Big picture racial demographics

  • Below are 2 different grids
  • Grid one
    • Scatter plot matrix with regression lines for Democratic vote share against each ethnic group’s percentage of a county’s population.
      • It’s very clear from the regression lines in the first grid that whiter counties are less likely to support the Democrats
        • This seems like an important relationship, so I will focus on how whiteness affects voting throughout the report
    • The non-white demographics all show a positive correlation between their share of the population and Democratic vote share
      • The regression lines themselves don’t necessarily explain these relationships well
      • Eyeballing the plots for the Asian and Hispanic communities, the line clearly doesn’t fit the data well
  • Grid two
    • A deeper look at these demographics via a boxplot matrix broken down by hand-selected strata
      • Note that the scales for these graphs are custom. Racial demographics vary: smaller groups such as the Asian community will never make up 90% or more of a county, so the bins had to be adjusted manually
    • The boxplots tell the same story as the scatterplots; however, they let us notice deeper trends
      • We can see that Democratic support in heavily Hispanic counties follows a very strange pattern
        • Party support doesn’t increase, and in fact appears to decrease, as the Hispanic share of a county rises through the 30-60% range
        • Nationwide, Hispanic support for Democrats is around 65%, yet in some counties that are over 50% Hispanic we see very low Democratic support, around 25%
        • I attempt to explore and explain this later in the report
    • There also appear to be a significant number of outliers among counties with a high white population. This means many mostly white counties still vote Democratic.
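
One way to put a number on “the line doesn’t fit well” is to report the correlation coefficient alongside the fitted line. The sketch below does an ordinary least-squares fit by hand in plain Python; the two small data sets are invented stand-ins for the real columns, chosen only to contrast a tight fit with a loose one.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x, plus Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Invented toy data, not values from the election dataset.
pct_white = [30.0, 50.0, 70.0, 90.0]
dem_share_vs_white = [70.0, 55.0, 40.0, 25.0]      # falls steadily
pct_hispanic = [10.0, 30.0, 50.0, 70.0]
dem_share_vs_hispanic = [30.0, 45.0, 35.0, 60.0]   # rises, but unevenly

for label, xs, ys in [
    ("white", pct_white, dem_share_vs_white),
    ("hispanic", pct_hispanic, dem_share_vs_hispanic),
]:
    slope, intercept, r = fit_line(xs, ys)
    print(label, round(slope, 2), round(intercept, 1), round(r, 2))
```

A slope with |r| close to 1 describes the points well; the same positive slope with a noticeably smaller |r| is the situation described for the Asian and Hispanic panels, where the trend exists but the line is a poor summary.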

Things to be explored

  • How does whiteness affect county-wide voting behavior?
  • What is the effect of population size on voting behavior?
  • How does economics affect voting behavior?
  • How do economics and population size interact in their effect on voting behavior?
  • Why are counties with large Hispanic populations voting Republican, given that the Hispanic community generally votes Democratic?

Relationship between whiteness and economics

  • The first question that comes to mind is
    • What do the income distributions look like in these predominantly white counties?
      • My assumption is that they will be bimodal (rural “fly-over” country, and suburban communities in coastal America)
  • To explore this, let’s start by finding where these areas are on the map
    • Overall we have 3099 counties; let’s break down some maps and display areas that are more than 50% white
      • About 2400 counties in our data set are more than 50% white
    • Let’s then look at voting support levels for these groups

Summarize economic/racial relationship

  • It must be stressed that these stats are median income for entire counties
  • Heavily white areas include Democratic strongholds such as parts of the Northeast, as well as tossup states in the Midwest
  • Surprisingly, the South has very few counties that are majority white.
  • Per capita income seems to increase as areas become more white
    • However, this relationship doesn’t seem linear. It clearly tails off and decreases in areas that are 80%+ white
    • Democratic counties tend to be richer
    • Heavily Republican counties make up some of the poorest and most white counties in America
    • Poor
  • Our scatter plot reveals a relationship between population size and per capita income.
    • Heavily Democratic areas tend to be cities with populations over 1 million
    • These areas tend to have majority-minority populations

Map Tossup, Heavily democratic, and Heavily republican counties


  • The maps reveal that most counties in America are in fact heavily Republican
  • Tossup counties appear to be scattered throughout many states.
    • The Mexican border and coastal areas tend to have many tossup counties
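
The party-support levels used in these maps (and in the tables later on) can be sketched as a simple rule on the vote margin. The category labels below are the ones that appear in this report’s output; the exact cutoffs, the treatment of boundary values, and the reading of “tossup” as a margin under 5 points are assumptions, and the sketch is in plain Python rather than the notebook’s R.

```python
def party_support_level(pct_dem, pct_rep):
    """Label a county by its margin in percentage points.

    Assumed rule: margin under 5 points is a near tossup, 5 to 15 points
    is a lean, and anything above 15 points is "heavily" one party.
    """
    margin = pct_dem - pct_rep
    side = "Democratic" if margin >= 0 else "republican"
    size = abs(margin)
    if size < 5:
        return f"Less than 5% {side}"
    if size <= 15:
        return f"5-15% {side}"
    return f"heavily {side}"


# Hypothetical (Democratic %, Republican %) pairs.
examples = [(48.0, 47.0), (40.0, 55.0), (20.0, 75.0), (60.0, 35.0)]
for dem, rep in examples:
    print(dem, rep, "->", party_support_level(dem, rep))
```

Because the labels are derived from two-party margins, a county where third parties took a large share can land in a “tossup” bin even if neither major party came close to a majority.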

Create bins for Social and economic stats

  • Below I map categorical levels of party support and income

Bin county population size


  • The population maps above are fairly self-explanatory
  • Counties with larger populations tend to favor Democrats
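
Binning a continuous column such as population and then summarizing vote share per bin can be sketched as below. This is a plain-Python illustration, not the notebook’s R code; the bin edges, labels, and toy counties are assumptions (only the 1 million cutoff echoes a threshold mentioned earlier in this report).

```python
from bisect import bisect_right
from collections import defaultdict

# Assumed bin edges; a value equal to an edge falls into the higher bin.
POP_EDGES = [10_000, 100_000, 1_000_000]
POP_LABELS = ["under 10k", "10k-100k", "100k-1M", "over 1M"]


def population_bin(population):
    """Map a county population to its bin label."""
    return POP_LABELS[bisect_right(POP_EDGES, population)]


def mean_share_by_bin(rows):
    """Average Democratic share (percent) per population bin."""
    shares = defaultdict(list)
    for population, pct_dem in rows:
        shares[population_bin(population)].append(pct_dem)
    return {label: sum(v) / len(v) for label, v in shares.items()}


# Invented (population, Democratic %) pairs for illustration.
toy_counties = [
    (87, 15.0), (4_000, 25.0), (60_000, 35.0),
    (90_000, 45.0), (400_000, 50.0), (9_800_000, 70.0),
]
print(mean_share_by_bin(toy_counties))
```

The same pattern applies to income: swap in income edges and labels, and the cross of the two bin columns gives the interaction explored in the following sections.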

Bin Income levels

  • Richer counties tend to lean Democratic

Explore interaction of Binned income and population

Mapping wealthy neighborhoods

  • Display counties that are in the top 5% by income
    • Visually, the map below is deceiving. Eyeballing it, I would assume that most counties in the top 5% of income are Republican. But that is because the counties in the Midwest and North Dakota cover much more land area. Major cities, which are in the top 5% and are Democratic strongholds, are blips on the map.
    • The table below the map shows how the counties in the top 5% vote
      • 62 counties are heavily Republican while 59 are heavily Democratic. Overall it is nearly even

## # A tibble: 6 x 2
##   party_supprt_levels     total_wealthy_counties
##   <fct>                                    <int>
## 1 heavily republican                          62
## 2 5-15% republican                            14
## 3 Less than 5% republican                      7
## 4 Less than 5% Democratic                      9
## 5 5-15% Democratic                            20
## 6 heavily Democratic                          59
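
A table like the one above can be produced by finding the 95th-percentile income and counting the counties above it by party level. The following is a minimal plain-Python sketch under stated assumptions: the 40 toy counties, their incomes, and their alternating labels are invented, and Python’s default quantile interpolation may differ slightly from whatever the original R code used.

```python
from collections import Counter
from statistics import quantiles

# Invented counties: incomes from 20,000 to 59,000 with alternating labels.
toy = [
    {"income": 20_000 + 1_000 * i,
     "level": "heavily Democratic" if i % 2 else "heavily republican"}
    for i in range(40)
]

# quantiles(..., n=20) returns 19 cut points; the last is the 95th percentile.
threshold = quantiles([c["income"] for c in toy], n=20)[-1]

wealthy = [c for c in toy if c["income"] > threshold]
by_level = Counter(c["level"] for c in wealthy)

print("counties above the 95th percentile:", len(wealthy))
for level, count in by_level.items():
    print(level, count)
```

Counting rows this way sidesteps the land-area illusion of the map, since every county contributes exactly one unit regardless of how large it is drawn.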

Graph Hispanic Communities in range (20-60% Hispanic)

  • Earlier I noticed a strange trend in the scatterplot of party support among counties with relatively large Hispanic populations.
    • Typically we would expect these areas to be very favorable to Democrats, but the data showed that was not the case. Below I check whether the map offers any insight
  • The two maps below are mirror images of each other: one shades higher Democratic support in darker blue, the other shades higher Republican support in darker red
    • Coastal and border areas with large Hispanic populations are more Democratic
    • Most of Texas and the more inland areas with Hispanic populations are more Republican

Let’s build a voter turnout column

  • Behind the scenes I created a voter turnout column for our data frame
  • Below you can see the histogram and box plots for that column
    • Summarizing these statistics, 90% of counties fall between 31.3% and 57.6% turnout (the 5th and 95th percentiles), and the average county turnout is about 44.5%

## voter_turnout 
## 
##  1  Variables      3092  Observations
## ---------------------------------------------------------------------------
## value 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     3092        0     3092        1    44.54    9.057    31.32    34.45 
##      .25      .50      .75      .90      .95 
##    39.35    44.49    49.93    54.21    57.62 
## 
## lowest : 11.39998 19.13579 20.03485 20.49166 20.65982
## highest: 76.71033 76.78300 78.48672 81.00649 85.67674
## ---------------------------------------------------------------------------
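
When reading the summary above, note that the .05 and .95 columns bracket the middle 90% of counties. A small plain-Python check of that reading, using 100 evenly spaced made-up turnout values rather than the real column:

```python
from statistics import mean, quantiles

# Made-up turnout percentages: 20.0, 20.5, ..., 69.5 (100 values).
turnout = [20 + 0.5 * i for i in range(100)]

cuts = quantiles(turnout, n=20)   # 19 cut points at 5%, 10%, ..., 95%
p05, p95 = cuts[0], cuts[-1]

inside = [t for t in turnout if p05 <= t <= p95]
share_inside = len(inside) / len(turnout)

print("mean turnout:", mean(turnout))
print("5th and 95th percentiles:", p05, p95)
print("share of counties inside:", share_inside)
```

An interval holding 95% of counties would instead run from the 2.5th to the 97.5th percentile, which the summary output does not report.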

Try to find relationship between turnout, Hispanic population, and vote

  • I think this may be the smoking gun that explains why areas with large Hispanic populations still tend to vote Republican
  • Below you can see a map where darker blue marks areas of high voter turnout and lighter shades mark areas of low voter turnout.
  • It’s hard to get information from the map, so I made one table for high voter turnout (above the mean) and one for low voter turnout (below the mean)
    • In the low-turnout table, 177 of 242 counties (about 73%) voted heavily Republican, compared with 24 of 49 (about 49%) in the high-turnout table
    • This could be an interesting area to explore further, as voter turnout should be largely independent of county size.
      • So why are these heavily Hispanic areas experiencing low voter turnout? Individual-level voting statistics would help shed light on this. My assumption is that Hispanic residents are either discouraged from voting by intimidation and social factors, or that voter ID laws make it more difficult for them to vote in these areas.
Turnout below average & 20-60% Hispanic counties
Party support level        Counties
heavily republican              177
5-15% republican                 10
Less than 5% republican           3
Less than 5% Democratic           3
5-15% Democratic                 18
heavily Democratic               31

Turnout above average & 20-60% Hispanic counties
Party support level        Counties
heavily republican               24
5-15% republican                  6
Less than 5% republican           3
Less than 5% Democratic           2
5-15% Democratic                  3
heavily Democratic               11
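
Because the two tables have very different totals, raw counts are hard to compare directly. The short plain-Python sketch below converts the counts from the two tables above into within-table percentages; only the counts come from this report, and the helper names are illustrative.

```python
LEVELS = [
    "heavily republican", "5-15% republican", "Less than 5% republican",
    "Less than 5% Democratic", "5-15% Democratic", "heavily Democratic",
]

# Counts copied from the two tables for 20-60% Hispanic counties.
below_average = dict(zip(LEVELS, [177, 10, 3, 3, 18, 31]))
above_average = dict(zip(LEVELS, [24, 6, 3, 2, 3, 11]))


def share(table, level):
    """Percentage of the table's counties that fall in the given level."""
    return 100 * table[level] / sum(table.values())


for name, table in [("below", below_average), ("above", above_average)]:
    total = sum(table.values())
    pct = round(share(table, "heavily republican"), 1)
    print(f"turnout {name} average: {total} counties, {pct}% heavily republican")
```

On these counts, about 73% of the below-average-turnout counties are heavily Republican versus about 49% of the above-average ones. With only 49 counties in the second group, that gap is suggestive rather than conclusive.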

Conclusion

  • I think the most interesting takeaway from this exploration is how many heavily Hispanic counties (in Texas in particular) are voting Republican.
    • It appears that low voter turnout might be a significant explanation for this
  • Our exploratory analysis gives us useful insight for future model building and a better understanding of the social dynamics involved
  • Overall, the choroplethr package is very fun to work with