In this work we analyse how some socioeconomic characteristics relate to the results of the 2008, 2012 and 2016 US presidential elections at county level. The socioeconomic variables that we take into account are associated with the income, the ethnicity, the level of education, and the access to health insurance of the population. We study three hypotheses, for which we propose two regression models. The hypotheses are that (i) on average, counties that share similar socioeconomic characteristics have a similar voting behaviour; (ii) for a given county, deviations of its socioeconomic variables from their mean strengthen (or weaken) the correlation with the election results; and (iii) the way in which the socioeconomic variables relate to the voting behaviour has changed from election to election.
The most relevant findings are the following. The variables associated with a larger GOP vote share are the percentage of white population, the median household income, and the percentage of uninsured population; the variables associated with a smaller GOP vote share are the population density, the unemployment rate, the Gini index and the percentage of population with a college degree or more. For a given county, a positive change in the population density or in the median household income (the rest held constant) was associated with a proportionally smaller share of the vote for the GOP. Also, compared to the 2008 elections, in the 2016 elections the population density and the percentage of population with a college degree weighed more against the GOP, whereas the percentage of white people, the median household income and the percentage of uninsured people weighed more in its favour, which can be interpreted as an increasingly polarized population.
The unit of analysis is the county. For each county, we have measures of seven socioeconomic variables for the years 2007, 2011 and 2015. These are the years preceding the presidential elections of 2008 and 2012, won by Obama, and of 2016, won by Trump. Also, for each county, we have the results of the three presidential elections, which we relate to the socioeconomic variables measured in the preceding year. For simplicity of exposition, we refer only to the years 2008, 2012 and 2016 in this work, with the understanding that the socioeconomic variables were measured in 2007, 2011 and 2015, respectively. The variables that we consider are listed below.
| Variable | Range | Description |
|---|---|---|
| density | \(\mathbb{R}_{\geq 0}\) | Population density of the county |
| perc_gop | [0,1] | Percentage of the votes cast that went to the Republican Party (GOP) |
| pop_perc_white_nh | [0,1] | Percentage of the population that belongs to the white (non-Hispanic) ethnic group |
| eco_unemp_rate | [0,1] | Unemployment rate |
| eco_med_income | \(\mathbb{R}_{\geq 0}\) | Median household income |
| eco_gini | [0,1] | Gini index, a measure of income inequality, where 0 is no inequality and 1 is extreme inequality |
| edu_perc_college_and_more | [0,1] | Percentage of the population aged 25 and over that has a college degree or higher |
| hc_perc_unins | [0,1] | Percentage of the population without health insurance |
The socioeconomic data used in this work come from the United States Census Bureau’s American Community Survey (ACS) for the years 2007, 2011 and 2015; we use the ACS 1-year estimates.[1] The number of counties in the data is 788, 811 and 819, respectively, for the three years; these are the counties with more than 65,000 inhabitants, the population threshold for the publication of ACS 1-year estimates.[2] The county-level election results come from the GitHub page of tonmcg.
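A minimal sketch of how each election result can be paired with the ACS estimates of the preceding year, as described above. It assumes hypothetical in-memory dictionaries keyed by a county identifier and a year (the FIPS codes below are made up); the authors' own data preparation, done in R, is not shown in this document. Counties without ACS estimates for the preceding year are dropped, i.e. the pairing is an inner join.

```python
# Hypothetical ACS records keyed by (county FIPS code, ACS year); the codes are made up.
acs = {
    ("00001", 2007): {"eco_unemp_rate": 0.05},
    ("00001", 2011): {"eco_unemp_rate": 0.09},
    ("00001", 2015): {"eco_unemp_rate": 0.06},
    ("00002", 2011): {"eco_unemp_rate": 0.08},  # not in the 2007 ACS sample
}
# Hypothetical GOP vote shares keyed by (county FIPS code, election year).
votes = {
    ("00001", 2008): 0.45,
    ("00001", 2012): 0.47,
    ("00001", 2016): 0.52,
    ("00002", 2008): 0.40,
    ("00002", 2012): 0.38,
}


def pair_with_previous_year(acs, votes):
    """Attach each election result to the ACS estimates of the preceding year.

    Rows are labelled with the election year (2008, 2012, 2016), as in the text.
    Counties without ACS estimates for the preceding year are dropped (inner join).
    """
    rows = []
    for (fips, election_year), perc_gop in sorted(votes.items()):
        covariates = acs.get((fips, election_year - 1))
        if covariates is None:
            continue
        rows.append({"fips": fips, "year": election_year, "perc_gop": perc_gop, **covariates})
    return rows


for row in pair_with_previous_year(acs, votes):
    print(row)
```

Labelling the rows with the election year mirrors the convention adopted in the text, where 2008, 2012 and 2016 stand for the ACS years 2007, 2011 and 2015.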
Comparing the population density with the percentage of votes that the GOP received in the three elections, we observe a negative correlation: the denser the county, the smaller the proportion of the vote the GOP received. If we regard population density as an indicator of the level of urbanization, we can assert that more urbanized counties voted proportionally less for the GOP.
We compare the percentage of white (non-Hispanic) people with the proportion of votes received by the GOP in the three years. In the three elections we observe a general trend: the larger the share of white population in a county, the larger the proportion of votes the GOP received. For counties with less than 50% of white population the trends are similar for the three years. For counties where more than 50% of the population is white, the percentage of votes for the GOP is greater in 2012 and 2016 than in 2008. Interestingly, in the 2016 elections the trend changes for counties with a highly predominant white population (\(>85\%\)): they voted proportionally more for the GOP than in previous years. Another interesting fact is that, in general, the counties with high population densities are more ethnically diverse.
As a consequence of the economic crisis of 2008, the unemployment rate rose considerably from 2008 to 2012 in practically all counties. Thereafter, from 2012 to 2016, we observe a recovery in employment, back to the pre-crisis levels of 2008. Regarding the elections, unemployment is negatively correlated with the proportion of votes for the GOP, i.e., counties with higher rates of unemployment voted proportionally less for the GOP. The trend for the 2012 elections stands out from the rest, which is likely attributable to the change in the unemployment rate rather than to a change in the voting proportion: the 2012 trend curve is a translated (shifted along the unemployment axis) version of the trends observed in 2008 and 2016.
First, we note that household income was also affected by the 2008 economic crisis. However, unlike the unemployment rate, which recovered from 2012 to 2016, household income has not yet returned to its 2008 levels.
Regarding the election results, for the three years we observe that counties with higher income voted proportionally less for the GOP. However, comparing the trends of the three elections, we observe that in 2016 the lower-income counties voted more for the GOP than in previous elections, while the higher-income counties voted less for this party. This means that in the last elections the results are more polarized across the income levels of the counties. We also see that the incomes of highly densely populated counties span all income levels.
The Gini index, which measures inequality in the income distribution, has increased (i.e., inequality has worsened) since 2008. The correlation of the Gini index with the proportion of votes for the GOP is negative: counties with a more unequal income distribution voted proportionally less for the GOP. This may be because the more densely populated counties show a large Gini index[3] and a low proportion of votes for the GOP. The shape of the trend line is similar across the three years analyzed here.
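For reference, the Gini index of a set of incomes \(x_1, \dots, x_n\) can be written as half the relative mean absolute difference, \(G = \sum_i \sum_j |x_i - x_j| / (2 n^2 \bar{x})\). The sketch below computes this textbook formula for hypothetical lists of household incomes; it is not the Census Bureau's estimation procedure for the ACS, which works from survey data and may differ. For a finite list, this version reaches at most \((n-1)/n\) rather than exactly 1.

```python
def gini(incomes):
    """Gini index: sum over all ordered pairs of |x_i - x_j|, divided by 2 * n^2 * mean.

    Returns 0 when all incomes are equal; with n incomes the maximum is (n - 1) / n,
    reached when a single household holds all the income.
    """
    n = len(incomes)
    if n == 0:
        raise ValueError("need at least one income")
    mean = sum(incomes) / n
    if mean <= 0:
        raise ValueError("the Gini index requires a positive mean income")
    total_abs_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_abs_diff / (2 * n * n * mean)


# Hypothetical household incomes (in thousands of dollars).
print(gini([30, 30, 30, 30]))  # perfect equality
print(gini([10, 20, 30, 40]))
print(gini([0, 0, 0, 120]))    # one household holds all income
```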
Here, we compare the percentage of votes for the GOP with the percentage of the population aged 25 and over that holds a college degree or higher. For the years 2012 and 2016 the slope is clearly negative, i.e., the higher the share of college-educated people in a county, the less the county supported the GOP candidates. Furthermore, since 2008 the slope of the trend line has become steeper, which means that the voting results are increasingly polarized across levels of education. It is interesting to see that in 2008 there are counties with a very high education level, but no counties with such education levels are observed in 2012 and 2016. Since a person's educational attainment cannot decrease, this behaviour, at least in the short term, is most plausibly explained by migration (emigration of highly educated people, immigration of less educated people, or both), although the change in the set of counties covered by the ACS 1-year estimates from one year to the next, and sampling error in those estimates, may also play a role.
We observe a clear decrease in the proportion of uninsured people since 2012, and the decrease has taken place in practically all counties. Regarding the election results, the trend curves are similar for the three years. However, the trend curve for the 2016 elections appears to be a translated version of the others. From this last remark, we conclude that the decrease in the uninsured population seems to have had no effect on the proportion of votes that the GOP obtained in 2016.
We hypothesize that (i) on average, counties that share similar socioeconomic characteristics have a similar voting behaviour. If this is true, we would expect the per-county mean of the socioeconomic variables to be correlated with the election results. Moreover, we also hypothesize that (ii) for a given county, deviations of its socioeconomic variables from their mean strengthen (or weaken) the correlation with the election results. If we regard the per-county average of the socioeconomic variables as the county's status quo, then these hypotheses state that the voting behaviour of a county is related both to its status quo and to the deviations from it in a given year.
To investigate hypotheses (i) and (ii) we propose the regression model \[ \operatorname{logit}(E[y_{it}]) = x_{i.}^T\alpha + (x_{it} - x_{i.})^T\beta, \qquad \operatorname{logit}(p) = \log\frac{p}{1-p}, \] where \(y_{it}\) is the proportion of votes for the GOP in county \(i\) in year \(t\) and \(E[y_{it}]\) is its expected value; \(x_{it}\) is the vector of independent variables measured for county \(i\) at time \(t\); and \(x_{i.}\) is the vector of time means of the independent variables for county \(i\), i.e., its status quo. The coefficient vectors \(\alpha\) and \(\beta\) are estimated from the data (the fitted model in appendix A also includes an intercept).
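The regressors of this model can be built from a county panel by splitting each variable into its county mean \(x_{i.}\) (the status quo) and the yearly deviation \(x_{it} - x_{i.}\). A minimal sketch, assuming a hypothetical panel stored as a dictionary and assuming that the mean is taken over the years in which the county is observed (the text does not say how counties missing in some years are handled). The `_m` and `chg_` names in appendix A (e.g. `density_m`, `chg_density`) presumably correspond to these two components.

```python
from statistics import mean

# Hypothetical panel for one variable (population density), by county and election year.
panel = {
    "county_a": {2008: 120.0, 2012: 126.0, 2016: 135.0},
    "county_b": {2008: 40.0, 2012: 38.0},  # assumed missing in 2016
}


def between_within(panel):
    """Split x_it into the county mean x_i. and the deviation x_it - x_i. for every year.

    Assumption: x_i. is the mean over the years in which the county is observed.
    """
    out = {}
    for county, series in panel.items():
        status_quo = mean(series.values())  # x_{i.}, the county status quo
        out[county] = {
            year: {"mean": status_quo, "chg": value - status_quo}  # (x_{i.}, x_{it} - x_{i.})
            for year, value in series.items()
        }
    return out


for county, by_year in between_within(panel).items():
    print(county, by_year)
```

By construction the deviations of each county sum to zero over its observed years, which is what lets \(\alpha\) pick up differences between counties and \(\beta\) differences within a county over time.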
The coefficients \(\alpha\) relate the proportion of votes that the GOP received to the county's mean level of the socioeconomic variables, i.e., to its status quo. The coefficients \(\beta\) relate the proportion of votes that the GOP received in the three elections to the deviations of these variables from their mean value across time. In this sense, the coefficients \(\alpha\) capture differences between counties, whereas the \(\beta\)’s capture differences within a county over time. Since our dependent variable, the proportion of votes for the GOP, lies in the interval \((0,1)\), we model \(y_{it}\) as a beta-distributed random variable.[4] The results of this model are shown in appendix A.
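To make the distributional assumption concrete: in the mean-precision parametrization described in the betareg vignette referenced for this model, \(y_{it} \sim \mathrm{Beta}(\mu_{it}\phi, (1-\mu_{it})\phi)\), so that \(E[y_{it}] = \mu_{it}\) and \(\mathrm{Var}[y_{it}] = \mu_{it}(1-\mu_{it})/(1+\phi)\), with \(\operatorname{logit}(\mu_{it})\) equal to the linear predictor above. The sketch below evaluates the corresponding log-likelihood in plain Python; it illustrates what maximum-likelihood estimation maximizes and is not the betareg implementation. The data and coefficients in the usage lines are hypothetical; \(\phi = 26.39\) is the `(phi)` estimate reported in appendix A.

```python
import math


def logistic(eta):
    """Inverse of the logit link: maps the linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_logpdf(y, mu, phi):
    """Log-density of y ~ Beta(mu * phi, (1 - mu) * phi) for 0 < y < 1.

    In this parametrization E[y] = mu and Var[y] = mu * (1 - mu) / (1 + phi).
    """
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y))


def beta_reg_loglik(y, X, coef, phi):
    """Log-likelihood of a beta regression with logit link, logit(mu_i) = x_i . coef."""
    total = 0.0
    for y_i, x_i in zip(y, X):
        mu_i = logistic(sum(c * x for c, x in zip(coef, x_i)))
        total += beta_logpdf(y_i, mu_i, phi)
    return total


# Hypothetical data: an intercept column plus one regressor.
y = [0.42, 0.55, 0.61]
X = [[1.0, 0.3], [1.0, 0.6], [1.0, 0.8]]
print(beta_reg_loglik(y, X, coef=[-1.0, 2.0], phi=26.39))
```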
The results show that all the estimated coefficients \(\alpha\) are statistically significant, which supports hypothesis (i): on average, counties that share similar socioeconomic characteristics have a similar voting behaviour. Moreover, the coefficients \(\alpha\) help us explain (in part) the relation of the socioeconomic characteristics with the voting behaviour. The coefficients are positive for the percentage of white population, the median household income, and the percentage of uninsured population. This means that of two counties that differ only in one of these variables (all other variables being equal), the one with the larger value voted proportionally more for the GOP. The converse holds for the variables with negative coefficients: the population density, the unemployment rate, the Gini index and the percentage of population with a college degree or more.
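Because of the logit link, these coefficients act multiplicatively on the odds \(\mu/(1-\mu)\) of the expected GOP vote share: a difference \(\Delta\) in one status-quo variable, all else equal, multiplies those odds by \(e^{\alpha\Delta}\). A small sketch using three estimates from appendix A; the 10-percentage-point difference (\(\Delta = 0.10\), since these variables lie in \([0,1]\)) is an illustrative choice, not a figure from the analysis.

```python
import math

# Status-quo (alpha) estimates from appendix A, on the log-odds scale.
alpha = {
    "pop_perc_white_nh_m": 1.951,
    "hc_perc_unins_m": 7.559,
    "edu_perc_college_and_more_m": -1.603,
}


def odds_multiplier(coef, delta):
    """Factor applied to the odds mu / (1 - mu) of the expected GOP share
    when one regressor differs by `delta`, all other regressors being equal."""
    return math.exp(coef * delta)


for name, coef in alpha.items():
    # delta = 0.10 is a 10-percentage-point difference for variables in [0, 1].
    print(f"{name}: odds x {odds_multiplier(coef, 0.10):.3f}")
```

For example, of two counties identical except for a 10-point higher share of white population, the expected GOP share of the first has odds about 1.22 times as large.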
The coefficients \(\beta\), except those for the change in population density and the change in median income, are not statistically significant at the 5% level. This means that the relation between the voting behaviour and the deviations from the status quo of these variables is not clear: we find no evidence that changes in the percentage of white population, the unemployment rate, the Gini index, the percentage of uninsured population and the percentage of population with a college degree or more are associated with the voting behaviour. Hence, hypothesis (ii) is supported only for the change in population density and the change in median household income. The coefficients of these two variables are negative, which can be interpreted as follows: for a given county, a positive change in these variables (the rest held constant) was associated with a proportionally smaller share of the vote for the GOP. Equivalently, of two counties with the same status quo, the one where the change in population density and (or) median income is positive voted proportionally less for the GOP. It is interesting to note that the estimated coefficients for the change in density and the change in household income are larger in absolute value than the coefficients for the means of these variables. Hence, per unit, deviations from the status quo weigh more than the status quo itself in explaining the election results; note, however, that this is a per-unit comparison, which does not account for the different ranges over which the deviations and the means vary.
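The coefficients are on the log-odds scale and per unit of each regressor, so their size depends on the units of the variable. A hedged sketch of translating a coefficient into a change in the expected GOP share: starting from a hypothetical baseline share of 50% and a hypothetical change of 100 density units, it compares the `chg_density` and `density_m` estimates from appendix A, and also gives the local marginal effect \(\beta\mu(1-\mu)\).

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def shifted_share(baseline_share, coef, delta):
    """Expected GOP share after one regressor changes by `delta`, everything else fixed."""
    return logistic(logit(baseline_share) + coef * delta)


def marginal_effect(share, coef):
    """Slope of the expected share with respect to the regressor at `share`: coef * mu * (1 - mu)."""
    return coef * share * (1.0 - share)


# Estimates from appendix A; the 50% baseline and the change of 100 density units are hypothetical.
baseline, delta = 0.5, 100.0
print("deviation (chg_density):", round(shifted_share(baseline, -1.048e-03, delta), 4))
print("status quo (density_m): ", round(shifted_share(baseline, -9.937e-05, delta), 4))
print("marginal effect of chg_density at 50%:", marginal_effect(baseline, -1.048e-03))
```

Per unit, the deviation moves the expected share roughly ten times as much as the status quo does; whether deviations matter more overall also depends on how much each variable actually varies.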
We hypothesize that (iii) the way in which the socioeconomic variables introduced in this work relate to the voting behaviour has changed from election to election. If this hypothesis is true, then we would expect the same county to vote differently in two elections even when its socioeconomic variables did not change. This phenomenon could be due to unobserved variables, such as the perception of, or empathy towards, a particular candidate and her or his proposals.
To study hypothesis (iii) we propose a regression model of the form \[ \operatorname{logit}(E[y_{it}]) = x_{it}^T\beta_{t}, \] where \(y_{it}\) is the proportion of votes for the GOP in county \(i\) in year \(t\), \(E[y_{it}]\) is its expected value, \(x_{it}\) is the vector of independent variables measured for county \(i\) at time \(t\), and \(\beta_{t}\) is the vector of coefficients for year \(t\).
This model allows us to test the hypothesis \(H_0: \beta_{t} = \beta_{s}\). If the hypothesis is rejected, then we can state that the explanatory variables relate to the election results in a different manner in the elections of years \(t\) and \(s\). As in our first model, we assume \(y_{it}\) to be a beta-distributed random variable. The results of this model are shown in appendix B.[5]
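With the `* year` formula shown in appendix B, each `variable:year2016` coefficient estimates \(\beta_{2016} - \beta_{2008}\) for that variable, so the test of \(H_0: \beta_{2008} = \beta_{2016}\) is the Wald z test on that interaction. The sketch below recomputes the z statistic and the two-sided p-value from the printed estimate and standard error, using the normal approximation behind the `z value` and `Pr(>|z|)` columns; small discrepancies with the printed values come from the rounding of the displayed estimates.

```python
import math


def wald_test(estimate, std_error):
    """Two-sided Wald z test of H0: coefficient = 0 under the normal approximation."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# (estimate, std. error) of selected year2016 interactions, copied from appendix B.
interactions_2016 = {
    "density:year2016": (-6.082e-05, 2.824e-05),
    "eco_unemp_rate:year2016": (2.912e+00, 2.064e+00),
    "edu_perc_college:year2016": (-1.739e+00, 2.825e-01),
}

for name, (est, se) in interactions_2016.items():
    z, p = wald_test(est, se)
    verdict = "reject" if p < 0.05 else "do not reject"
    print(f"{name}: z = {z:.3f}, p = {p:.2e} -> {verdict} beta_2008 = beta_2016 at 5%")
```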
The results show that the coefficients \(\beta_{2008}\) and \(\beta_{2016}\) are statistically different (at the 5% significance level) for the population density, the percentage of white people, the median household income, the percentage of uninsured people, and the percentage of the population with a college degree or more. Compared to 2008, in the 2016 elections the coefficients of the population density and of the percentage of population with a college degree are smaller, which means that these variables weighed more against the GOP in 2016. Similarly, the coefficients of the percentage of white people, the median household income and the percentage of uninsured people are larger, weighing more in favour of the GOP in the 2016 elections. It is interesting to note that, for these five variables, the 2016 coefficients are larger in absolute value than the 2008 ones. This can be interpreted as an increasingly polarized population: counties that favoured the GOP favoured it even more, and vice versa. This behaviour can also be seen in Figure 4.1.
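The comparison in absolute value can be reproduced from appendix B: with the interaction parametrization, \(\beta_{2016}\) for a variable is its baseline (2008) coefficient plus its `:year2016` interaction. A short sketch using the printed, rounded estimates:

```python
# (2008 coefficient, year2016 interaction) per variable, copied from appendix B (Model II).
coefs = {
    "density": (-1.010e-04, -6.082e-05),
    "pop_perc_white_nh": (1.386e+00, 7.194e-01),
    "eco_med_income": (1.111e-03, 4.657e-03),
    "eco_unemp_rate": (-6.752e+00, 2.912e+00),
    "eco_gini": (-3.173e+00, 1.011e+00),
    "hc_perc_unins": (4.892e+00, 1.675e+00),
    "edu_perc_college": (-8.434e-01, -1.739e+00),
}


def slopes_2008_2016(coefs):
    """Return (beta_2008, beta_2016) per variable, where beta_2016 = beta_2008 + interaction."""
    return {name: (b08, b08 + inter) for name, (b08, inter) in coefs.items()}


slopes = slopes_2008_2016(coefs)
stronger_in_2016 = sorted(
    name for name, (b08, b16) in slopes.items() if abs(b16) > abs(b08)
)
print(stronger_in_2016)
# prints ['density', 'eco_med_income', 'edu_perc_college', 'hc_perc_unins', 'pop_perc_white_nh']
```

The unemployment rate and the Gini index are the two variables whose 2016 coefficients are smaller in absolute value, and their interactions are not significant, which is why the polarization statement is restricted to the other five.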
Figure 4.1: County proportion of votes for the GOP in the 2008 and 2016 elections. The black line is the identity: no change in the proportion of votes from the 2008 to the 2016 elections. The counties in the red area are those that gave the GOP a majority (>50%) of their votes in both elections; the counties in the blue area are those that gave the GOP less than 50% of their votes in both elections. Polarization: in general, highly predominant GOP counties (dark red area) lie above the identity line, while highly predominant non-GOP counties (dark blue area) lie below it. Counties in the gray area are swing counties.
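The regions described in the caption can be expressed as a simple rule on the pair of 2008 and 2016 shares. A sketch under the caption's 50% thresholds only: the boundaries of the darker "highly predominant" areas are not given in the text, so the hypothetical helper `moved_outward` below checks the polarization pattern for every non-swing county, and the example shares are hypothetical.

```python
def region(gop_2008, gop_2016):
    """Area of the 2008-vs-2016 scatter plot a county falls in, per the caption."""
    if gop_2008 > 0.5 and gop_2016 > 0.5:
        return "gop_both"       # red area
    if gop_2008 < 0.5 and gop_2016 < 0.5:
        return "non_gop_both"   # blue area
    return "swing"              # gray area


def moved_outward(gop_2008, gop_2016):
    """True if a non-swing county moved further towards its own side from 2008 to 2016:
    above the identity line for GOP counties, below it for non-GOP counties."""
    kind = region(gop_2008, gop_2016)
    if kind == "gop_both":
        return gop_2016 > gop_2008
    if kind == "non_gop_both":
        return gop_2016 < gop_2008
    return False


# Hypothetical (2008, 2016) GOP shares.
for shares in [(0.70, 0.78), (0.30, 0.24), (0.45, 0.55), (0.62, 0.60)]:
    print(shares, region(*shares), moved_outward(*shares))
```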
In this work, two regression models were used to explain the election results through their relation with the socioeconomic variables. The first model captures how the level of the socioeconomic variables, i.e., their per-county average, and the deviations from this average relate to the election results. The second model studies whether the way in which the socioeconomic variables relate to the election results has changed over time.
With Model I we found that the per-county averages of all the socioeconomic variables are significantly associated with the proportion of votes for the GOP. This supports hypothesis (i), so we can characterize the average voting behaviour of the counties by the average level of their socioeconomic variables, i.e., by their status quo. The variables associated with a larger GOP vote share are the percentage of white population, the median household income, and the percentage of uninsured population; the variables associated with a smaller GOP vote share are the population density, the unemployment rate, the Gini index and the percentage of population with a college degree or more. Regarding the relation between the voting behaviour and the deviations of the socioeconomic variables from their average, i.e., from their status quo, only the change in population density and the change in median household income were significant (at the 5% level). Hence, hypothesis (ii) is supported for these variables. In particular, we found that a positive change in the population density or in the median household income (all the other variables remaining constant) was associated with a proportionally smaller share of the vote for the GOP. We also found that, per unit, deviations from the status quo in these two variables are associated with larger changes in the GOP vote share than differences in the status quo itself.
With Model II we found that, compared to 2008, in the 2016 elections the population density and the percentage of population with a college degree weighed more against the GOP, whereas the percentage of white people, the median household income and the percentage of uninsured people weighed more in its favour. These results support hypothesis (iii), although not for all variables: the differences between the 2008 and 2016 coefficients of the unemployment rate and the Gini index are not statistically significant. The change in the weight that the variables have in explaining the proportion of votes for the GOP means that counties with similar characteristics voted differently in 2016 compared to 2008. This suggests that other variables also explain the election results. Some of these unobserved variables can be attributed to demographic phenomena, such as internal migration; to the effect of the candidates; or to individual variables, such as perceptions, the latter being difficult, if not impossible, to measure. Finally, it is interesting to note that, for the variables whose coefficients changed significantly, the 2016 coefficients are larger in absolute value than the 2008 ones. This can be interpreted as an increasingly polarized population: counties that favoured the GOP favoured it even more, and vice versa.
Appendix A: results of Model I.

```
## Call:
## betareg(formula = as.formula(mod_2_d_3), data = df_acs_votes__,
## link = "logit")
##
## Standardized weighted residuals 2:
## Min 1Q Median 3Q Max
## -3.9002 -0.6277 0.0519 0.6704 4.4229
##
## Coefficients (mean model with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.506e-02 2.039e-01 0.368 0.71278
## density_m -9.937e-05 1.116e-05 -8.908 < 2e-16 ***
## pop_perc_white_nh_m 1.951e+00 6.999e-02 27.882 < 2e-16 ***
## eco_med_income_m 6.806e-03 9.495e-04 7.168 7.62e-13 ***
## eco_unemp_rate_m -6.658e+00 8.667e-01 -7.682 1.57e-14 ***
## eco_gini_m -3.498e+00 3.225e-01 -10.848 < 2e-16 ***
## hc_perc_unins_m 7.559e+00 2.622e-01 28.829 < 2e-16 ***
## edu_perc_college_and_more_m -1.603e+00 1.201e-01 -13.344 < 2e-16 ***
## chg_density -1.048e-03 3.604e-04 -2.909 0.00363 **
## chg_pop_perc_white 3.194e-01 7.052e-01 0.453 0.65061
## chg_eco_med_income -1.001e-02 3.775e-03 -2.651 0.00803 **
## chg_eco_unemp_rate 8.246e-02 8.512e-01 0.097 0.92283
## chg_eco_gini 3.009e-01 6.127e-01 0.491 0.62332
## chg_hc_perc_unins -5.750e-01 3.329e-01 -1.728 0.08407 .
## chg_edu_perc_college -6.202e-02 3.401e-01 -0.182 0.85532
##
## Phi coefficients (precision model with identity link):
## Estimate Std. Error z value Pr(>|z|)
## (phi) 26.3900 0.7767 33.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 2134 on 16 Df
## Pseudo R-squared: 0.5677
## Number of iterations: 22 (BFGS) + 2 (Fisher scoring)
```
Appendix B: results of Model II.

```
## Call:
## betareg(formula = perc_gop ~ (density + pop_perc_white_nh + eco_med_income +
## eco_unemp_rate + eco_gini + hc_perc_unins + edu_perc_college) *
## year, data = df_acs_votes, link = "logit")
##
## Standardized weighted residuals 2:
## Min 1Q Median 3Q Max
## -3.7786 -0.6280 0.0282 0.6382 4.0477
##
## Coefficients (mean model with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.004e-01 3.191e-01 0.941 0.346457
## density -1.010e-04 1.918e-05 -5.268 1.38e-07 ***
## pop_perc_white_nh 1.386e+00 1.151e-01 12.041 < 2e-16 ***
## eco_med_income 1.111e-03 1.537e-03 0.723 0.469945
## eco_unemp_rate -6.752e+00 1.476e+00 -4.575 4.76e-06 ***
## eco_gini -3.173e+00 5.073e-01 -6.256 3.95e-10 ***
## hc_perc_unins 4.892e+00 3.677e-01 13.306 < 2e-16 ***
## edu_perc_college -8.434e-01 1.766e-01 -4.775 1.80e-06 ***
## year2012 -1.297e+00 4.562e-01 -2.843 0.004470 **
## year2016 1.122e-01 4.437e-01 0.253 0.800309
## density:year2012 -6.861e-06 2.757e-05 -0.249 0.803469
## density:year2016 -6.082e-05 2.824e-05 -2.154 0.031260 *
## pop_perc_white_nh:year2012 5.994e-01 1.626e-01 3.687 0.000227 ***
## pop_perc_white_nh:year2016 7.194e-01 1.619e-01 4.443 8.88e-06 ***
## eco_med_income:year2012 9.865e-03 2.257e-03 4.372 1.23e-05 ***
## eco_med_income:year2016 4.657e-03 2.154e-03 2.162 0.030601 *
## eco_unemp_rate:year2012 1.687e+00 1.742e+00 0.968 0.332830
## eco_unemp_rate:year2016 2.912e+00 2.064e+00 1.411 0.158301
## eco_gini:year2012 5.575e-01 7.136e-01 0.781 0.434631
## eco_gini:year2016 1.011e+00 7.098e-01 1.424 0.154541
## hc_perc_unins:year2012 2.547e+00 5.509e-01 4.623 3.79e-06 ***
## hc_perc_unins:year2016 1.675e+00 6.153e-01 2.722 0.006480 **
## edu_perc_college:year2012 -2.558e-01 2.745e-01 -0.932 0.351312
## edu_perc_college:year2016 -1.739e+00 2.825e-01 -6.154 7.57e-10 ***
##
## Phi coefficients (precision model with identity link):
## Estimate Std. Error z value Pr(>|z|)
## (phi) 25.5655 0.7521 33.99 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 2097 on 25 Df
## Pseudo R-squared: 0.5446
## Number of iterations: 33 (BFGS) + 2 (Fisher scoring)
```
[1] The 1-year estimates are used in this work because we are interested in the currency of the data.

[2] https://www.census.gov/programs-surveys/acs/guidance/estimates.html

[3] Counties with high population density are urban areas. As such, they offer a great variety of economic activities, ranging from Walmart cashiers to Wall Street bankers.

[4] See https://cran.r-project.org/web/packages/betareg/vignettes/betareg.pdf

[5] In these results, \(\beta_{2008}\) is given by the coefficient labelled `variable_name`, and \(\beta_{2016}\) by the sum of the coefficients labelled `variable_name` and `variable_name:year2016`; hence the z test reported for `variable_name:year2016` is a test of \(\beta_{2008} = \beta_{2016}\).