Descriptive Statistics of All Variables

Variable descriptions:

BA: percent with a bachelor's degree or higher (2017 estimate)
county: Ohio county name
rural: rural = 1, non-rural = 0
region: region name
market: media market
rep_gov: Mike DeWine and Jon Husted (R) votes
dem_gov: Richard Cordray and Betty Sutton (D) votes
dem_senate: Sherrod Brown (D) votes
rep_senate: Jim Renacci (R) votes
for_count: Issue 1 "for" votes
agnst_count: Issue 1 "against" votes
OD_12_17: number of unintentional drug overdose deaths and average crude and age-adjusted annual death rates per 100,000 population, by county, 2005-2017
mrp_ideology_mean: mean ideology of each district
registered: number of registered persons
white: proportion one race, White (2017 estimate)
poverty: proportion below poverty level (2017 estimate; original dataset variable name: HC03_EST_VC01)
hs: proportion high school graduate or higher (2017 estimate)
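The summary below was presumably produced with a call along these lines (assuming the merged data frame is the same Election_Data_Final object used in the models later):

summary(Election_Data_Final)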
##     county          rural        region          market      rep_gov      
##  Length:88          0:39   Central  :20   Columbus  :19   Min.   :  2561  
##  Class :character   1:49   Northeast:20   Cleveland :17   1st Qu.:  8210  
##  Mode  :character          Northwest:12   Toledo    :12   Median : 13604  
##                            Southeast:14   Dayton    :11   Mean   : 25363  
##                            Southwest: 8   Cincinnati: 8   3rd Qu.: 26093  
##                            West     :14   Charleston: 7   Max.   :166057  
##                                           (Other)   :14                   
##     dem_gov         dem_senate       rep_senate       for_count     
##  Min.   :  1338   Min.   :  1622   Min.   :  2432   Min.   :   681  
##  1st Qu.:  3603   1st Qu.:  4438   1st Qu.:  7564   1st Qu.:  2376  
##  Median :  6842   Median :  8238   Median : 12221   Median :  4423  
##  Mean   : 23498   Mean   : 25986   Mean   : 22862   Mean   : 18454  
##  3rd Qu.: 16550   3rd Qu.: 19706   3rd Qu.: 23908   3rd Qu.: 14116  
##  Max.   :323276   Max.   :338519   Max.   :148064   Max.   :251827  
##                                                                     
##   agnst_count        OD_12_17      mrp_ideology_mean   registered    
##  Min.   :  3312   Min.   :   7.0   Min.   :-0.2976   Min.   :  4171  
##  1st Qu.: 10170   1st Qu.:  35.5   1st Qu.: 0.1705   1st Qu.: 13020  
##  Median : 15838   Median :  69.5   Median : 0.2937   Median : 20034  
##  Mean   : 31468   Mean   : 210.3   Mean   : 0.2801   Mean   : 49800  
##  3rd Qu.: 30870   3rd Qu.: 155.0   3rd Qu.: 0.4046   3rd Qu.: 45293  
##  Max.   :228899   Max.   :2160.0   Max.   : 0.8261   Max.   :477651  
##                                                                      
##      white           poverty             hs               BA        
##  Min.   :0.6300   Min.   :0.0510   Min.   :0.5820   Min.   :0.0850  
##  1st Qu.:0.9045   1st Qu.:0.1153   1st Qu.:0.8685   1st Qu.:0.1417  
##  Median :0.9435   Median :0.1415   Median :0.8950   Median :0.1650  
##  Mean   :0.9192   Mean   :0.1451   Mean   :0.8851   Mean   :0.1976  
##  3rd Qu.:0.9645   3rd Qu.:0.1760   3rd Qu.:0.9062   3rd Qu.:0.2333  
##  Max.   :0.9930   Max.   :0.3020   Max.   :0.9670   Max.   :0.5380  
## 

The dependent variables we are most interested in are "for_count" and "agnst_count", the for and against vote counts for Issue 1. Both show a very large range, as does our control variable "registered", the number of registered voters. That means we have sufficient variance to analyze, which is a good thing.

Let's start by analyzing one independent variable (IV) at a time against one dependent variable, "for_count". We are going to use a step-wise approach to building the regression: we add one IV to the equation at a time and then check whether the model performs better or worse with that IV. If the model performs better, we keep the IV and add another; if it performs worse, we remove the IV and move on to the next one.
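As a preview, one of these steps looks roughly like the sketch below: fit the current model, fit it again with one extra IV, and compare AICs. The object names step_a and step_b are placeholders; the actual models are fit and reported further down.

step_a <- glm(for_count ~ rural + offset(log(registered)), family = poisson, data = Election_Data_Final)
step_b <- glm(for_count ~ rural + region + offset(log(registered)), family = poisson, data = Election_Data_Final)
AIC(step_a, step_b)  # keep the added IV only if the AIC drops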

Because we have some theoretical reasons why "rural" might explain differences in voting patterns, we will use it as the first variable. Rural is a dichotomous variable where 1 = rural and 0 = non-rural. I'm using the term non-rural here because I simply checked a list of rural counties and do not know the designation of the other counties, aside from the fact that they were not on the rural list.

Variable = Rural

First, let's take a look at some descriptive statistics to get a feel for what to expect. The breakdown of rural against some of our other variables is presented first.

First, we can see that there are more rural (1) than non-rural counties (0) in our dataset, but the difference isn’t staggering. This bar chart also gives us a comparison of the composition of the regions. For example, we can see that the Western region has more rural than non-rural counties, while the Southwest region has more non-rural counties than rural counties. Each region does have a mix of both rural and non-rural counties. This also tells us that the Region and Rural variables are not equivalent and that Region may also help explain some of the variance.

In this graph we can see the overlap between rural areas and the various markets. Both of these graphs show us, at the very least, that the market and region variables add different information to the analysis and are not all measuring the exact same construct.
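(The bar charts themselves are not reproduced here, but the same rural-by-region and rural-by-market breakdowns can be checked with simple cross-tabs; a minimal sketch, assuming the same data frame used in the models below.)

table(Election_Data_Final$rural, Election_Data_Final$region)
table(Election_Data_Final$rural, Election_Data_Final$market)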

Now on to some descriptive statistics that will tell us what to expect when we look at the relationships between rural and our dependent variables.

This graph tells us that the vast majority of the yes votes came from non-rural counties. But, we aren’t yet accounting for population differences here.

But a similarly large number of no votes came from non-rural counties as well, suggesting that the rural predictor may not tell us much, especially after accounting for differences in population. So let's control for population by using the variable "registered".
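(A quick way to see the raw totals behind these two graphs, plus a per-registered-voter rate, is a sketch like the following; the totals object and yes_rate column are my additions, not part of the original output.)

# Total yes votes, no votes, and registered voters by rural status,
# plus the yes-vote rate per registered voter
totals <- aggregate(cbind(for_count, agnst_count, registered) ~ rural,
                    data = Election_Data_Final, FUN = sum)
totals$yes_rate <- totals$for_count / totals$registered
totals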

Model 1

I'm leaving some of the code visible in this section so that I do not have to write out the equation, or the type of model used, for each subsequent model, since we will be running quite a few. Here is the model reflected in the code below.

model 1: for_count ~ rural + offset(log(registered))

for_count is our Y, or dependent variable. rural is our independent variable (X), our predictor. registered enters as an offset, which is essentially a control for county size: with a log link, including log(registered) as an offset means we are effectively modeling the yes-vote rate per registered voter.

We will try a Poisson model first.

m1 <- glm(for_count~rural+offset(log(registered)), family=poisson, data=Election_Data_Final)
summary(m1)
## 
## Call:
## glm(formula = for_count ~ rural + offset(log(registered)), family = poisson, 
##     data = Election_Data_Final)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -63.989  -32.060  -10.742   -1.477  128.463  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.9053923  0.0008313 -1089.1   <2e-16 ***
## rural1      -0.6086443  0.0025187  -241.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 170982  on 87  degrees of freedom
## Residual deviance: 102618  on 86  degrees of freedom
## AIC: 103552
## 
## Number of Fisher Scoring iterations: 4

The Poisson model suggests that rural is a statistically significant predictor (Pr(>|z|) is the p-value). Let's transform the coefficient into an incidence rate ratio (IRR) so that it is more easily interpreted.
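(The transformation code is hidden in the original output, but it presumably follows the same pattern shown explicitly for Model 8 and later, something like:)

(coefm1 <- cbind(Estimate = coef(m1)))
exp(coefm1)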

##               Estimate
## (Intercept) -0.9053923
## rural1      -0.6086443
##              Estimate
## (Intercept) 0.4043832
## rural1      0.5440880

This tells us that, after controlling for the number of registered voters, rural counties vote yes at 0.54 times the rate of non-rural counties (p < 0.001).

Now we need to check for something called overdispersion, because the deviance residuals are far from what a well-fitting Poisson model would produce (their median is nowhere near 0). One way to address it is to fit a negative binomial model with the same variables, which adds a dispersion parameter, and compare the two fits.
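(A quick informal check, not part of the original output: under a well-fitting Poisson model the residual deviance should be roughly equal to its degrees of freedom, so a ratio far above 1 signals overdispersion.)

deviance(m1) / df.residual(m1)  # about 102618 / 86, i.e. roughly 1193 here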

Let’s call it Model 1.5.

library(MASS)
m1.5 <- glm.nb(for_count~rural+offset(log(registered)), data=Election_Data_Final)
summary(m1.5)
## 
## Call:
## glm.nb(formula = for_count ~ rural + offset(log(registered)), 
##     data = Election_Data_Final, init.theta = 15.75630759, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9702  -0.7115  -0.1512   0.3946   3.6699  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.15991    0.04037 -28.732  < 2e-16 ***
## rural1      -0.41036    0.05416  -7.577 3.53e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(15.7563) family taken to be 1)
## 
##     Null deviance: 146.782  on 87  degrees of freedom
## Residual deviance:  88.858  on 86  degrees of freedom
## AIC: 1551.4
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  15.76 
##           Std. Err.:  2.36 
## 
##  2 x log-likelihood:  -1545.409

Now for some stats speak. The AIC of Model 1 is 103,552, which is dramatically higher than the AIC of Model 1.5 (1,551.4). Further, Model 1.5's median deviance residual is nearly 0. Both of these factors suggest that Model 1.5 better fits the data. And, lucky for us, the coefficient for rural is still significant. Now let's transform the coefficients into IRRs again so that we can better interpret them.
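(Again, a sketch of the hidden transformation step, with an optional extra line that also exponentiates profile-likelihood confidence intervals, which the original output does not show:)

(coefm1.5 <- cbind(Estimate = coef(m1.5)))
exp(coefm1.5)
exp(confint(m1.5))  # optional: IRRs with 95% confidence intervals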

##               Estimate
## (Intercept) -1.1599098
## rural1      -0.4103574
##              Estimate
## (Intercept) 0.3135145
## rural1      0.6634131

This tells us that, after controlling for the number of registered voters, rural counties vote yes at 0.66 times the rate of non-rural counties (p < 0.001).

Let’s keep going with model 1.5 and see if we can improve the model fit by adding in additional variables and even interactions.

Model 2: Negative Binomial Rural + Region

m2 <- glm.nb(for_count~rural+region+offset(log(registered)), data=Election_Data_Final)
summary(m2)
## 
## Call:
## glm.nb(formula = for_count ~ rural + region + offset(log(registered)), 
##     data = Election_Data_Final, init.theta = 17.09488586, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3193  -0.5581  -0.1877   0.2853   4.5028  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -1.21644    0.05956 -20.424  < 2e-16 ***
## rural1          -0.39246    0.05495  -7.142 9.18e-13 ***
## regionNortheast  0.10323    0.07666   1.347    0.178    
## regionNorthwest  0.14492    0.09004   1.610    0.107    
## regionSoutheast -0.08264    0.08658  -0.954    0.340    
## regionSouthwest  0.06678    0.10145   0.658    0.510    
## regionWest       0.04913    0.08513   0.577    0.564    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(17.0949) family taken to be 1)
## 
##     Null deviance: 159.200  on 87  degrees of freedom
## Residual deviance:  88.752  on 81  degrees of freedom
## AIC: 1554.1
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  17.09 
##           Std. Err.:  2.56 
## 
##  2 x log-likelihood:  -1538.09

I added "region" to the model and it appears that our model fits slightly worse (AIC 1554.1 vs. 1551.4 for Model 1.5; we want the AIC as low as we can get it). Also, none of the region coefficients are significant, so we can eliminate region from our model. But what about an interaction between rural and region?

m2.5 <- glm.nb(for_count~rural*region+offset(log(registered)), data=Election_Data_Final)
summary(m2.5)
## 
## Call:
## glm.nb(formula = for_count ~ rural * region + offset(log(registered)), 
##     data = Election_Data_Final, init.theta = 18.40948297, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2623  -0.5697  -0.1948   0.2777   4.3184  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -1.23939    0.07038 -17.611  < 2e-16 ***
## rural1                 -0.34196    0.10506  -3.255  0.00113 ** 
## regionNortheast         0.12029    0.09739   1.235  0.21677    
## regionNorthwest         0.28042    0.15194   1.846  0.06495 .  
## regionSoutheast        -0.31456    0.15208  -2.068  0.03860 *  
## regionSouthwest         0.11825    0.12581   0.940  0.34729    
## regionWest              0.16601    0.12581   1.320  0.18698    
## rural1:regionNortheast -0.03576    0.14961  -0.239  0.81111    
## rural1:regionNorthwest -0.20369    0.18772  -1.085  0.27789    
## rural1:regionSoutheast  0.26443    0.18495   1.430  0.15278    
## rural1:regionSouthwest -0.12869    0.20040  -0.642  0.52077    
## rural1:regionWest      -0.20097    0.16730  -1.201  0.22965    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(18.4095) family taken to be 1)
## 
##     Null deviance: 171.388  on 87  degrees of freedom
## Residual deviance:  88.713  on 76  degrees of freedom
## AIC: 1557.5
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  18.41 
##           Std. Err.:  2.76 
## 
##  2 x log-likelihood:  -1531.504

Again, the AIC here (1557.5) is larger than Model 1.5's (1551.4), suggesting a worse fit. So both region and the rural x region interaction can be ruled out, leaving us with Model 1.5. Having eliminated region, we will now add market to Model 1.5.

Model 3: rural & market

m3 <- glm.nb(for_count~rural+market+offset(log(registered)), data=Election_Data_Final)
summary(m3)
## 
## Call:
## glm.nb(formula = for_count ~ rural + market + offset(log(registered)), 
##     data = Election_Data_Final, init.theta = 18.77863195, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4136  -0.5930  -0.0885   0.3420   3.9436  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -1.150288   0.099095 -11.608  < 2e-16 ***
## rural1            -0.409195   0.054096  -7.564  3.9e-14 ***
## marketCincinnati   0.006637   0.122622   0.054   0.9568    
## marketCleveland    0.053208   0.106742   0.498   0.6182    
## marketColumbus    -0.062132   0.105122  -0.591   0.5545    
## marketDayton       0.039114   0.112583   0.347   0.7283    
## marketFt. Wayne   -0.166553   0.186258  -0.894   0.3712    
## marketLima        -0.259761   0.251369  -1.033   0.3014    
## marketParkersburg -0.118982   0.247419  -0.481   0.6306    
## marketToledo       0.090870   0.110297   0.824   0.4100    
## marketWheeling    -0.322822   0.129556  -2.492   0.0127 *  
## marketYoungstown  -0.010655   0.162018  -0.066   0.9476    
## marketZanesville   0.010186   0.247282   0.041   0.9671    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(18.7786) family taken to be 1)
## 
##     Null deviance: 174.809  on 87  degrees of freedom
## Residual deviance:  88.645  on 75  degrees of freedom
## AIC: 1557.7
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  18.78 
##           Std. Err.:  2.82 
## 
##  2 x log-likelihood:  -1529.683

Based on the AIC (1557.7 vs. 1551.4), it appears that "market" isn't adding much to our equation and can be eliminated. Is there an interaction between rural and market?

m3.5 <- glm.nb(for_count~rural*market+offset(log(registered)), data=Election_Data_Final)
summary(m3.5)
## 
## Call:
## glm.nb(formula = for_count ~ rural * market + offset(log(registered)), 
##     data = Election_Data_Final, init.theta = 20.67489761, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3965  -0.5572  -0.0888   0.3443   3.7295  
## 
## Coefficients: (4 not defined because of singularities)
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              -1.70182    0.22053  -7.717 1.19e-14 ***
## rural1                    0.21110    0.23828   0.886  0.37565    
## marketCincinnati          0.58071    0.24149   2.405  0.01619 *  
## marketCleveland           0.59094    0.23125   2.555  0.01061 *  
## marketColumbus            0.46249    0.23032   2.008  0.04464 *  
## marketDayton              0.69743    0.24645   2.830  0.00466 ** 
## marketFt. Wayne          -0.23532    0.18071  -1.302  0.19285    
## marketLima                0.29177    0.31163   0.936  0.34913    
## marketParkersburg        -0.18774    0.23820  -0.788  0.43061    
## marketToledo              0.74288    0.25452   2.919  0.00351 ** 
## marketWheeling            0.21429    0.27001   0.794  0.42740    
## marketYoungstown          0.54075    0.26988   2.004  0.04511 *  
## marketZanesville         -0.05857    0.23806  -0.246  0.80566    
## rural1:marketCincinnati  -0.68178    0.28761  -2.370  0.01777 *  
## rural1:marketCleveland   -0.58696    0.26184  -2.242  0.02498 *  
## rural1:marketColumbus    -0.55720    0.25941  -2.148  0.03172 *  
## rural1:marketDayton      -0.79393    0.27537  -2.883  0.00394 ** 
## rural1:marketFt. Wayne         NA         NA      NA       NA    
## rural1:marketLima              NA         NA      NA       NA    
## rural1:marketParkersburg       NA         NA      NA       NA    
## rural1:marketToledo      -0.75675    0.27988  -2.704  0.00685 ** 
## rural1:marketWheeling    -0.59859    0.30562  -1.959  0.05016 .  
## rural1:marketYoungstown  -0.61990    0.35983  -1.723  0.08493 .  
## rural1:marketZanesville        NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(20.6749) family taken to be 1)
## 
##     Null deviance: 192.373  on 87  degrees of freedom
## Residual deviance:  88.585  on 68  degrees of freedom
## AIC: 1563.1
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  20.67 
##           Std. Err.:  3.11 
## 
##  2 x log-likelihood:  -1521.136

It appears that the rural x market interaction does not perform better than Model 1.5 either (AIC 1563.1 vs. 1551.4), so we can eliminate it as well, keep Model 1.5, and move on to the next variable.

Model 4: rural & OD

m4 <- glm.nb(for_count~rural+OD_12_17+offset(log(registered)), data=Election_Data_Final)
summary(m4)
## 
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + offset(log(registered)), 
##     data = Election_Data_Final, init.theta = 23.62461016, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2188  -0.6348  -0.1015   0.3952   4.5322  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.342e+00  4.122e-02 -32.551  < 2e-16 ***
## rural1      -2.534e-01  4.902e-02  -5.168 2.36e-07 ***
## OD_12_17     3.979e-04  6.166e-05   6.454 1.09e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(23.6246) family taken to be 1)
## 
##     Null deviance: 219.66  on 87  degrees of freedom
## Residual deviance:  88.62  on 85  degrees of freedom
## AIC: 1517.4
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  23.62 
##           Std. Err.:  3.56 
## 
##  2 x log-likelihood:  -1509.423

The number of overdoses from 2012-2017 does improve model fit over Model 1.5, based on its AIC of 1517.4 vs. 1551.4.

##                  Estimate
## (Intercept) -1.3416407868
## rural1      -0.2533713769
## OD_12_17     0.0003979476
##              Estimate
## (Intercept) 0.2614164
## rural1      0.7761796
## OD_12_17    1.0003980

Unfortunately, the effect size for OD is tiny: the coefficient is 0.0004, so the IRR is essentially 1 (1.0004) per additional overdose death. It really isn't worth commenting on. However, it does change the coefficient for rural. Rural counties vote yes at 0.78 times the rate of non-rural counties (p < 0.001), after controlling for the number of overdoses and the number of registered voters. So far, Model 4 is our leader, but let's check for an interaction between rural and overdoses.

m4.5 <- glm.nb(for_count~rural*OD_12_17+offset(log(registered)), data=Election_Data_Final)
summary(m4.5)
## 
## Call:
## glm.nb(formula = for_count ~ rural * OD_12_17 + offset(log(registered)), 
##     data = Election_Data_Final, init.theta = 24.99759174, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0066  -0.6803  -0.0676   0.3697   4.8059  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -1.336e+00  4.014e-02 -33.288  < 2e-16 ***
## rural1          -3.479e-01  6.190e-02  -5.620 1.91e-08 ***
## OD_12_17         3.841e-04  6.022e-05   6.379 1.78e-10 ***
## rural1:OD_12_17  1.478e-03  6.318e-04   2.339   0.0194 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(24.9976) family taken to be 1)
## 
##     Null deviance: 232.35  on 87  degrees of freedom
## Residual deviance:  88.53  on 84  degrees of freedom
## AIC: 1514.4
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  25.00 
##           Std. Err.:  3.77 
## 
##  2 x log-likelihood:  -1504.364

This model has a slightly better AIC than Model 4 (1514.4 vs. 1517.4).

##                      Estimate
## (Intercept)     -1.3361258101
## rural1          -0.3478615913
## OD_12_17         0.0003841533
## rural1:OD_12_17  0.0014775053
##                  Estimate
## (Intercept)     0.2628621
## rural1          0.7061966
## OD_12_17        1.0003842
## rural1:OD_12_17 1.0014786

But again, the effect size is minimal. When possible it is better to stick with simpler models, so I vote we stay with Model 4 and NOT include the interaction term. So let's continue with Model 4 and add in the next variable. (Also, in case anyone is wondering, collinearity is less of a headline concern in these count models than in traditional linear regression, though strongly correlated predictors can still inflate standard errors.)
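(If we wanted to formalize the simpler-vs.-richer choice rather than eyeballing the AIC, a likelihood-ratio test of the nested negative binomial models is one option; a sketch, not part of the original analysis:)

anova(m4, m4.5)  # likelihood-ratio test of Model 4 vs. Model 4.5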

Model 5: Rural, OD, Ideology

m5 <- glm.nb(for_count~rural+OD_12_17+mrp_ideology_mean+offset(log(registered)), data=Election_Data_Final)
summary(m5)
## 
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + mrp_ideology_mean + 
##     offset(log(registered)), data = Election_Data_Final, init.theta = 34.25353992, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3431  -0.5946  -0.1648   0.5536   3.4890  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -1.1288637  0.0494595 -22.824  < 2e-16 ***
## rural1            -0.2323035  0.0410146  -5.664 1.48e-08 ***
## OD_12_17           0.0002213  0.0000591   3.745  0.00018 ***
## mrp_ideology_mean -0.6923904  0.1143334  -6.056 1.40e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(34.2535) family taken to be 1)
## 
##     Null deviance: 317.678  on 87  degrees of freedom
## Residual deviance:  88.459  on 84  degrees of freedom
## AIC: 1486.7
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  34.25 
##           Std. Err.:  5.19 
## 
##  2 x log-likelihood:  -1476.652

Ideology does contribute to our model, dropping the AIC well below Model 4's (1486.7 vs. 1517.4). Let's check for interactions before we interpret the coefficients.

First, let's check the potential interaction between ideology and OD.

m5.2 <- glm.nb(for_count~rural+OD_12_17*mrp_ideology_mean+offset(log(registered)), data=Election_Data_Final)
summary(m5.2)
## 
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 * mrp_ideology_mean + 
##     offset(log(registered)), data = Election_Data_Final, init.theta = 36.14154168, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3552  -0.6116  -0.1997   0.4982   3.4440  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                -1.138e+00  4.827e-02 -23.577  < 2e-16 ***
## rural1                     -2.089e-01  4.132e-02  -5.054 4.32e-07 ***
## OD_12_17                    2.663e-04  6.109e-05   4.360 1.30e-05 ***
## mrp_ideology_mean          -7.708e-01  1.175e-01  -6.559 5.41e-11 ***
## OD_12_17:mrp_ideology_mean  5.147e-04  2.308e-04   2.230   0.0257 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(36.1415) family taken to be 1)
## 
##     Null deviance: 335.036  on 87  degrees of freedom
## Residual deviance:  88.446  on 83  degrees of freedom
## AIC: 1483.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  36.14 
##           Std. Err.:  5.48 
## 
##  2 x log-likelihood:  -1471.945

Overdoses and ideology do interact, but the effect size is very small and, theoretically, it doesn't add much to the story. Also, the decrease in AIC is minimal (1483.9 vs. 1486.7). So my vote is that it stays out of the model.

Now we need to check for the interaction between ideology and rural.

m5.3 <- glm.nb(for_count~OD_12_17+rural*mrp_ideology_mean+offset(log(registered)), data=Election_Data_Final)
summary(m5.3)
## 
## Call:
## glm.nb(formula = for_count ~ OD_12_17 + rural * mrp_ideology_mean + 
##     offset(log(registered)), data = Election_Data_Final, init.theta = 35.01273488, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3282  -0.5587  -0.1530   0.4746   3.2328  
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              -1.190e+00  6.557e-02 -18.147  < 2e-16 ***
## OD_12_17                  2.690e-04  6.794e-05   3.959 7.53e-05 ***
## rural1                   -1.380e-01  8.113e-02  -1.701  0.08901 .  
## mrp_ideology_mean        -4.928e-01  1.821e-01  -2.706  0.00681 ** 
## rural1:mrp_ideology_mean -3.077e-01  2.280e-01  -1.350  0.17715    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(35.0127) family taken to be 1)
## 
##     Null deviance: 324.660  on 87  degrees of freedom
## Residual deviance:  88.468  on 83  degrees of freedom
## AIC: 1486.7
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  35.01 
##           Std. Err.:  5.31 
## 
##  2 x log-likelihood:  -1474.743

The rural x ideology interaction term doesn't decrease our AIC (1486.7 in both models), so we can eliminate it, bringing us back to Model 5. So let's interpret Model 5: what does it tell us?

##                       Estimate
## (Intercept)       -1.128863744
## rural1            -0.232303471
## OD_12_17           0.000221332
## mrp_ideology_mean -0.692390364
##                    Estimate
## (Intercept)       0.3234005
## rural1            0.7927055
## OD_12_17          1.0002214
## mrp_ideology_mean 0.5003786

It tells us that, controlling for the number of registered voters, the number of overdoses, and mean ideology, rural counties voted yes at 0.79 times the rate of non-rural counties (p < 0.001). So essentially, ideology is pulling out some of the variance previously accounted for by rural. For every 1-unit increase in ideology (i.e., when a county becomes 1 unit more conservative), the rate of yes votes decreases by 50%, holding rural, OD, and registered voters constant.

Moving on, let's build on Model 5 and see if we can get that AIC even lower! (woohoo!)

Model 6: rural, OD, ideology, whiteness

m6 <- glm.nb(for_count~rural+OD_12_17+mrp_ideology_mean+white+offset(log(registered)), data=Election_Data_Final)
summary(m6)
## 
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + mrp_ideology_mean + 
##     white + offset(log(registered)), data = Election_Data_Final, 
##     init.theta = 40.55178286, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1557  -0.5828  -0.1363   0.5494   3.5219  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        8.848e-01  5.033e-01   1.758   0.0787 .  
## rural1            -2.037e-01  3.842e-02  -5.301 1.15e-07 ***
## OD_12_17          -6.784e-05  9.081e-05  -0.747   0.4550    
## mrp_ideology_mean -5.607e-01  1.094e-01  -5.125 2.98e-07 ***
## white             -2.184e+00  5.423e-01  -4.028 5.63e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(40.5518) family taken to be 1)
## 
##     Null deviance: 375.524  on 87  degrees of freedom
## Residual deviance:  88.405  on 83  degrees of freedom
## AIC: 1473.8
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  40.55 
##           Std. Err.:  6.16 
## 
##  2 x log-likelihood:  -1461.842
##                        Estimate
## (Intercept)        8.848433e-01
## rural1            -2.036767e-01
## OD_12_17          -6.783859e-05
## mrp_ideology_mean -5.607365e-01
## white             -2.184499e+00
##                    Estimate
## (Intercept)       2.4226048
## rural1            0.8157260
## OD_12_17          0.9999322
## mrp_ideology_mean 0.5707885
## white             0.1125341

Model 5 had an AIC of 1486.7, so this model does improve on the AIC (1473.8), and the proportion of white residents does seem to influence the number of yes votes in a statistically significant manner. So we are keeping it in and moving on with Model 6.

Interpretation of coefficients: rural counties voted yes at 0.82 times the rate of non-rural counties, holding OD, ideology, the proportion of white residents, and registered voters constant. For every 1 unit more conservative a county becomes, the rate of yes votes decreases by about 43%, holding OD, rural, proportion white, and registered voters constant. For every 1-unit increase in the proportion of white residents in a county, the rate of yes votes decreases by about 89%, holding rural, OD, ideology, and registered voters constant.

Let's see if there is an interaction effect.

Model 6.1 = ideology*white

library(MASS)
m6.1 <- glm.nb(for_count~rural+OD_12_17+mrp_ideology_mean*white+offset(log(registered)), data=Election_Data_Final)
summary(m6.1)
## 
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + mrp_ideology_mean * 
##     white + offset(log(registered)), data = Election_Data_Final, 
##     init.theta = 42.57485336, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1521  -0.6295  -0.1476   0.4075   3.5837  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              4.260e-01  5.459e-01   0.780  0.43518    
## rural1                  -1.787e-01  3.932e-02  -4.544 5.51e-06 ***
## OD_12_17                 4.219e-05  1.053e-04   0.401  0.68860    
## mrp_ideology_mean        1.947e+00  1.207e+00   1.613  0.10671    
## white                   -1.707e+00  5.850e-01  -2.917  0.00353 ** 
## mrp_ideology_mean:white -2.711e+00  1.302e+00  -2.082  0.03738 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(42.5749) family taken to be 1)
## 
##     Null deviance: 394.069  on 87  degrees of freedom
## Residual deviance:  88.432  on 82  degrees of freedom
## AIC: 1471.6
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  42.57 
##           Std. Err.:  6.48 
## 
##  2 x log-likelihood:  -1457.619

Minimal difference in AIC (about a 2-point decrease, 1471.6 vs. 1473.8), so we can exclude this interaction.

Next, Model 6.2 = white*OD

m6.2 <- glm.nb(for_count~rural+OD_12_17*white+mrp_ideology_mean+offset(log(registered)), data=Election_Data_Final)
summary(m6.2)
## 
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 * white + mrp_ideology_mean + 
##     offset(log(registered)), data = Election_Data_Final, init.theta = 43.33934695, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1962  -0.6775  -0.0674   0.3910   3.6153  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        1.1197630  0.4936908   2.268  0.02332 *  
## rural1            -0.1683326  0.0397981  -4.230 2.34e-05 ***
## OD_12_17          -0.0009903  0.0003808  -2.600  0.00931 ** 
## white             -2.4905609  0.5360639  -4.646 3.38e-06 ***
## mrp_ideology_mean -0.5540262  0.1059649  -5.228 1.71e-07 ***
## OD_12_17:white     0.0012905  0.0005234   2.466  0.01368 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(43.3393) family taken to be 1)
## 
##     Null deviance: 401.072  on 87  degrees of freedom
## Residual deviance:  88.367  on 82  degrees of freedom
## AIC: 1470
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  43.34 
##           Std. Err.:  6.59 
## 
##  2 x log-likelihood:  -1456.002

Again, a minimal decrease in AIC (about 4 points; 1470 vs. 1473.8). What an interaction term tells us in a negative binomial model is the degree to which one variable's effect is influenced by the other variable. I have to do a little more research here to figure out the correct interpretation of these interaction terms, if we decide we want to keep them in.
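(One hypothetical way to make an interaction like white x OD concrete is to compare predicted yes-vote rates per registered voter at a few chosen values. The values below are illustrative picks within the ranges shown in the descriptive statistics, not part of the original analysis.)

# Predicted yes votes per registered voter (registered = 1 makes the offset zero)
# for a non-rural county at low vs. high overdose counts and two levels of white,
# holding ideology near its mean.
newdat <- expand.grid(rural = factor(0, levels = c(0, 1)),
                      OD_12_17 = c(50, 500),
                      white = c(0.80, 0.95),
                      mrp_ideology_mean = 0.28,
                      registered = 1)
cbind(newdat, rate = predict(m6.2, newdata = newdat, type = "response"))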

Next, Model 6.3 = white*rural

m6.3 <- glm.nb(for_count~rural*white+OD_12_17+mrp_ideology_mean+offset(log(registered)), data=Election_Data_Final)
summary(m6.3)
## 
## Call:
## glm.nb(formula = for_count ~ rural * white + OD_12_17 + mrp_ideology_mean + 
##     offset(log(registered)), data = Election_Data_Final, init.theta = 43.4272963, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3426  -0.5652  -0.0924   0.5678   3.2594  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        2.093e-01  5.559e-01   0.376   0.7066    
## rural1             2.204e+00  9.619e-01   2.292   0.0219 *  
## white             -1.482e+00  5.943e-01  -2.494   0.0126 *  
## OD_12_17           4.301e-05  9.834e-05   0.437   0.6618    
## mrp_ideology_mean -5.070e-01  1.077e-01  -4.710 2.48e-06 ***
## rural1:white      -2.556e+00  1.020e+00  -2.506   0.0122 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(43.4273) family taken to be 1)
## 
##     Null deviance: 401.878  on 87  degrees of freedom
## Residual deviance:  88.409  on 82  degrees of freedom
## AIC: 1469.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  43.43 
##           Std. Err.:  6.61 
## 
##  2 x log-likelihood:  -1455.867

Again, the rural x white interaction has only a small effect on the AIC (about a 4-point decrease; 1469.9 vs. 1473.8).

If we were to keep both significant interaction terms….

m6.4 <- glm.nb(for_count~rural*white+OD_12_17*white+mrp_ideology_mean+offset(log(registered)), data=Election_Data_Final)
summary(m6.4)
## 
## Call:
## glm.nb(formula = for_count ~ rural * white + OD_12_17 * white + 
##     mrp_ideology_mean + offset(log(registered)), data = Election_Data_Final, 
##     init.theta = 45.33544475, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2656  -0.6112  -0.1008   0.4977   3.3843  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        0.5199423  0.5621593   0.925  0.35502    
## rural1             1.7830899  0.9626806   1.852  0.06400 .  
## white             -1.8556836  0.6067012  -3.059  0.00222 ** 
## OD_12_17          -0.0007233  0.0003934  -1.839  0.06598 .  
## mrp_ideology_mean -0.5112310  0.1053964  -4.851 1.23e-06 ***
## rural1:white      -2.0779211  1.0238619  -2.029  0.04241 *  
## white:OD_12_17     0.0010448  0.0005249   1.991  0.04651 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(45.3354) family taken to be 1)
## 
##     Null deviance: 419.345  on 87  degrees of freedom
## Residual deviance:  88.381  on 81  degrees of freedom
## AIC: 1468.1
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  45.34 
##           Std. Err.:  6.90 
## 
##  2 x log-likelihood:  -1452.09

Our AIC is about 6 points lower (1468.1 vs. 1473.8), but the OD and rural main effects are no longer significant….

I'm going to need to check whether it is worth keeping these interaction terms. For now, I am going to stick with Model 6 and add in poverty. But we can talk about whether or not we think it is a good idea to keep the white x rural and white x OD interactions.

MODEL 7: rural, ideology, OD, white, poverty

m7 <- glm.nb(for_count~rural+OD_12_17+mrp_ideology_mean+white+poverty+offset(log(registered)), data=Election_Data_Final)
summary(m7)
## 
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + mrp_ideology_mean + 
##     white + poverty + offset(log(registered)), data = Election_Data_Final, 
##     init.theta = 41.29685166, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0349  -0.6324  -0.1262   0.5032   4.0022  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        9.848e-01  5.032e-01   1.957   0.0503 .  
## rural1            -1.842e-01  4.046e-02  -4.553 5.29e-06 ***
## OD_12_17          -6.463e-05  9.012e-05  -0.717   0.4732    
## mrp_ideology_mean -5.907e-01  1.104e-01  -5.348 8.87e-08 ***
## white             -2.220e+00  5.376e-01  -4.129 3.64e-05 ***
## poverty           -4.869e-01  3.938e-01  -1.236   0.2164    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(41.2969) family taken to be 1)
## 
##     Null deviance: 382.356  on 87  degrees of freedom
## Residual deviance:  88.404  on 82  degrees of freedom
## AIC: 1474.3
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  41.30 
##           Std. Err.:  6.28 
## 
##  2 x log-likelihood:  -1460.251

The AICs of Models 6 and 7 are essentially the same (1473.8 vs. 1474.3), meaning that poverty is not adding anything to our model, so we can omit it.

MODEL 8: rural, ideology, OD, white, hs

m8 <- glm.nb(for_count~rural+OD_12_17+mrp_ideology_mean+white+hs+offset(log(registered)), data=Election_Data_Final)
summary(m8)
## 
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + mrp_ideology_mean + 
##     white + hs + offset(log(registered)), data = Election_Data_Final, 
##     init.theta = 45.99793179, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3380  -0.5704  -0.1879   0.5494   3.6157  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -5.667e-01  6.366e-01  -0.890 0.373322    
## rural1            -1.732e-01  3.707e-02  -4.671 2.99e-06 ***
## OD_12_17          -1.977e-05  8.631e-05  -0.229 0.818810    
## mrp_ideology_mean -5.675e-01  1.029e-01  -5.518 3.44e-08 ***
## white             -1.848e+00  5.189e-01  -3.561 0.000369 ***
## hs                 1.260e+00  3.720e-01   3.388 0.000705 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(45.9979) family taken to be 1)
## 
##     Null deviance: 425.406  on 87  degrees of freedom
## Residual deviance:  88.473  on 82  degrees of freedom
## AIC: 1464.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  46.00 
##           Std. Err.:  7.01 
## 
##  2 x log-likelihood:  -1450.918
(coefm8 <-cbind(Estimate = coef(m8)))
##                        Estimate
## (Intercept)       -5.667478e-01
## rural1            -1.731521e-01
## OD_12_17          -1.977169e-05
## mrp_ideology_mean -5.675469e-01
## white             -1.848059e+00
## hs                 1.260393e+00
exp(coefm8)
##                    Estimate
## (Intercept)       0.5673676
## rural1            0.8410097
## OD_12_17          0.9999802
## mrp_ideology_mean 0.5669144
## white             0.1575426
## hs                3.5268077

This gives us a roughly 9-point decrease in AIC relative to Model 6 (1464.9 vs. 1473.8), suggesting this model fits a bit better.

Interpretation of coefficients: rural counties voted yes at 0.84 times the rate of non-rural counties, holding OD, ideology, proportion white, proportion of high school graduates, and registered voters constant. For every 1 unit more conservative a county becomes, the rate of yes votes decreases by about 43%, holding OD, rural, proportion white, proportion of high school graduates, and registered voters constant. For every 1-unit increase in the proportion of white residents in a county, the rate of yes votes decreases by about 84%, holding rural, OD, ideology, proportion of high school graduates, and registered voters constant. For every 1-unit increase in the proportion of high school graduates in a county, the rate of yes votes is about 3.5 times as high (an increase of roughly 250%), holding rural, OD, ideology, white, and registered voters constant. All of these findings are statistically significant.

I am going to skip over the interactions and add in the proportion of county residents with a bachelor's degree.

MODEL 9: rural, ideology, OD, white, hs, ba

m9 <- glm.nb(for_count~rural+OD_12_17+mrp_ideology_mean+white+hs+BA+offset(log(registered)), data=Election_Data_Final)
summary(m9)
## 
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + mrp_ideology_mean + 
##     white + hs + BA + offset(log(registered)), data = Election_Data_Final, 
##     init.theta = 55.87353615, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7706  -0.7047  -0.1453   0.6477   3.1945  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -0.3176545  0.5812118  -0.547 0.584696    
## rural1            -0.1223276  0.0356256  -3.434 0.000595 ***
## OD_12_17          -0.0000391  0.0000785  -0.498 0.618440    
## mrp_ideology_mean -0.4640974  0.0958987  -4.839 1.30e-06 ***
## white             -1.6690249  0.4732715  -3.527 0.000421 ***
## hs                 0.4758415  0.3847196   1.237 0.216142    
## BA                 1.1414951  0.2605932   4.380 1.18e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(55.8735) family taken to be 1)
## 
##     Null deviance: 515.535  on 87  degrees of freedom
## Residual deviance:  88.402  on 81  degrees of freedom
## AIC: 1449.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  55.87 
##           Std. Err.:  8.55 
## 
##  2 x log-likelihood:  -1433.928
(coefm9 <-cbind(Estimate = coef(m9)))
##                        Estimate
## (Intercept)       -3.176545e-01
## rural1            -1.223276e-01
## OD_12_17          -3.909565e-05
## mrp_ideology_mean -4.640974e-01
## white             -1.669025e+00
## hs                 4.758415e-01
## BA                 1.141495e+00
exp(coefm9)
##                    Estimate
## (Intercept)       0.7278542
## rural1            0.8848584
## OD_12_17          0.9999609
## mrp_ideology_mean 0.6287023
## white             0.1884307
## hs                1.6093679
## BA                3.1314466

When we add in BA, hs is no longer significant, which isn't really surprising considering that what they measure overlaps. Adding BA decreases the AIC by 15 points relative to Model 8 (1449.9 vs. 1464.9). Setting interaction terms aside, Model 9 is the best so far, but OD is not significant and its effect was only marginal to begin with, so I am going to try one more model that removes OD to see if our AIC improves or stays the same.

MODEL 10: rural, ideology, white, hs, ba

m10 <- glm.nb(for_count~rural+mrp_ideology_mean+white+hs+BA+offset(log(registered)), data=Election_Data_Final)
summary(m10)
## 
## Call:
## glm.nb(formula = for_count ~ rural + mrp_ideology_mean + white + 
##     hs + BA + offset(log(registered)), data = Election_Data_Final, 
##     init.theta = 55.71489903, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7659  -0.7325  -0.1318   0.5938   3.2633  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -0.52528    0.40519  -1.296 0.194840    
## rural1            -0.12118    0.03564  -3.400 0.000674 ***
## mrp_ideology_mean -0.46347    0.09594  -4.831 1.36e-06 ***
## white             -1.48217    0.28676  -5.169 2.36e-07 ***
## hs                 0.50831    0.38013   1.337 0.181159    
## BA                 1.13199    0.26050   4.345 1.39e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(55.7149) family taken to be 1)
## 
##     Null deviance: 514.090  on 87  degrees of freedom
## Residual deviance:  88.404  on 82  degrees of freedom
## AIC: 1448.2
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  55.71 
##           Std. Err.:  8.52 
## 
##  2 x log-likelihood:  -1434.176
(coefm10 <-cbind(Estimate = coef(m10)))
##                     Estimate
## (Intercept)       -0.5252807
## rural1            -0.1211783
## mrp_ideology_mean -0.4634722
## white             -1.4821669
## hs                 0.5083088
## BA                 1.1319943
exp(coefm10)
##                    Estimate
## (Intercept)       0.5913893
## rural1            0.8858760
## mrp_ideology_mean 0.6290955
## white             0.2271450
## hs                1.6624772
## BA                3.1018364

Interpretation of coefficients: rural counties voted yes at 0.89 times the rate of non-rural counties, holding ideology, proportion white, proportion of high school graduates, BA, and registered voters constant. For every 1 unit more conservative a county becomes, the rate of yes votes decreases by about 37%, holding rural, proportion white, high school graduates, BA, and registered voters constant. For every 1-unit increase in the proportion of white residents in a county, the rate of yes votes decreases by about 77%, holding rural, ideology, high school graduates, BA, and registered voters constant. For every 1-unit increase in the proportion of bachelor's degree holders in a county, the rate of yes votes is about 3.1 times as high (an increase of roughly 210%), holding rural, ideology, high school graduates, and registered voters constant. All of these findings are statistically significant (note that hs itself is not significant in this model).

MODEL 11: rural, ideology, white, hs, ba, dem_gov

Just to check how good our measure of ideology is, let's add in the number of votes for the Democratic candidate for governor.

m11 <- glm.nb(for_count~rural+mrp_ideology_mean+white+hs+BA+dem_gov+offset(log(registered)), data=Election_Data_Final)
summary(m11)
## 
## Call:
## glm.nb(formula = for_count ~ rural + mrp_ideology_mean + white + 
##     hs + BA + +dem_gov + offset(log(registered)), data = Election_Data_Final, 
##     init.theta = 57.68288618, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7705  -0.6283  -0.1572   0.5581   3.0783  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        7.085e-02  5.257e-01   0.135 0.892793    
## rural1            -1.178e-01  3.514e-02  -3.353 0.000799 ***
## mrp_ideology_mean -4.816e-01  9.514e-02  -5.062 4.15e-07 ***
## white             -2.026e+00  4.221e-01  -4.801 1.58e-06 ***
## hs                 4.116e-01  3.776e-01   1.090 0.275812    
## BA                 1.207e+00  2.593e-01   4.656 3.22e-06 ***
## dem_gov           -9.440e-07  5.368e-07  -1.758 0.078686 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(57.6829) family taken to be 1)
## 
##     Null deviance: 532.003  on 87  degrees of freedom
## Residual deviance:  88.424  on 81  degrees of freedom
## AIC: 1447.2
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  57.68 
##           Std. Err.:  8.83 
## 
##  2 x log-likelihood:  -1431.183
(coefm11 <-cbind(Estimate = coef(m11)))
##                        Estimate
## (Intercept)        7.085178e-02
## rural1            -1.178195e-01
## mrp_ideology_mean -4.816229e-01
## white             -2.026489e+00
## hs                 4.115509e-01
## BA                 1.207428e+00
## dem_gov           -9.439691e-07
exp(coefm11)
##                    Estimate
## (Intercept)       1.0734221
## rural1            0.8888565
## mrp_ideology_mean 0.6177800
## white             0.1317975
## hs                1.5091566
## BA                3.3448715
## dem_gov           0.9999991

The AIC is essentially unchanged (1447.2 vs. 1448.2) and dem_gov is not significant, so it looks like our ideology measure accounts for roughly the same thing that votes for the Democratic governor candidate do. But, since Democratic governor vote counts might be easier to interpret than the mean ideology score, let's try substituting dem_gov in for ideology.

MODEL 12: rural, dem_gov, white, hs, ba

m12 <- glm.nb(for_count~rural+dem_gov+white+hs+BA+offset(log(registered)), data=Election_Data_Final)
summary(m12)
## 
## Call:
## glm.nb(formula = for_count ~ rural + dem_gov + white + hs + BA + 
##     offset(log(registered)), data = Election_Data_Final, init.theta = 44.47616583, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.4288  -0.5746  -0.0302   0.5219   3.6666  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  5.230e-01  5.896e-01   0.887  0.37503    
## rural1      -1.079e-01  3.995e-02  -2.701  0.00692 ** 
## dem_gov     -6.571e-07  6.057e-07  -1.085  0.27800    
## white       -2.526e+00  4.692e-01  -5.384 7.27e-08 ***
## hs           1.853e-01  4.250e-01   0.436  0.66283    
## BA           1.527e+00  2.889e-01   5.284 1.26e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(44.4762) family taken to be 1)
## 
##     Null deviance: 411.481  on 87  degrees of freedom
## Residual deviance:  88.384  on 82  degrees of freedom
## AIC: 1467.8
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  44.48 
##           Std. Err.:  6.77 
## 
##  2 x log-likelihood:  -1453.761
(coefm12 <-cbind(Estimate = coef(m12)))
##                  Estimate
## (Intercept)  5.230107e-01
## rural1      -1.078817e-01
## dem_gov     -6.570572e-07
## white       -2.526497e+00
## hs           1.853126e-01
## BA           1.526511e+00
exp(coefm12)
##               Estimate
## (Intercept) 1.68709932
## rural1      0.89773376
## dem_gov     0.99999934
## white       0.07993857
## hs          1.20359458
## BA          4.60208998

Well, this is confirmation of the superiority of the ideology mean score over raw governor vote counts (AIC 1467.8 vs. 1448.2), so Model 10 is our winner!!!
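(For reference, a one-line way to line up the AICs of the numbered models fit above; a convenience check, not part of the original output:)

AIC(m1, m1.5, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12)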

Next Steps

Let's discuss what we think is interesting about these findings and whether or not you would like me to check the remaining interaction terms. I suggest that we include interactions only when we have a theoretical reason for doing so, because there are so many of them.