Variables | Description
---|---
BA | Percent bachelor's degree or higher, 2017 estimate
county | Ohio county name
rural | Rural indicator: 1 = rural, 0 = non-rural
region | Region name
market | Media market
rep_gov | Mike DeWine and Jon Husted (R) votes
dem_gov | Richard Cordray and Betty Sutton (D) votes
dem_senate | Sherrod Brown (D) votes
rep_senate | Jim Renacci (R) votes
for_count | Issue 1 "for" votes
agnst_count | Issue 1 "against" votes
OD_12_17 | Number of unintentional drug overdose deaths and average crude and age-adjusted annual death rates per 100,000 population by county, 2005-2017
mrp_ideology_mean | Mean ideology of each district
registered | Number of registered voters
white | Proportion RACE - One race - White; 2017 estimate
poverty | Proportion below poverty level; 2017 estimate; variable name in original dataset: HC03_EST_VC01
hs | Proportion high school graduate or higher, 2017 estimate
## county rural region market rep_gov
## Length:88 0:39 Central :20 Columbus :19 Min. : 2561
## Class :character 1:49 Northeast:20 Cleveland :17 1st Qu.: 8210
## Mode :character Northwest:12 Toledo :12 Median : 13604
## Southeast:14 Dayton :11 Mean : 25363
## Southwest: 8 Cincinnati: 8 3rd Qu.: 26093
## West :14 Charleston: 7 Max. :166057
## (Other) :14
## dem_gov dem_senate rep_senate for_count
## Min. : 1338 Min. : 1622 Min. : 2432 Min. : 681
## 1st Qu.: 3603 1st Qu.: 4438 1st Qu.: 7564 1st Qu.: 2376
## Median : 6842 Median : 8238 Median : 12221 Median : 4423
## Mean : 23498 Mean : 25986 Mean : 22862 Mean : 18454
## 3rd Qu.: 16550 3rd Qu.: 19706 3rd Qu.: 23908 3rd Qu.: 14116
## Max. :323276 Max. :338519 Max. :148064 Max. :251827
##
## agnst_count OD_12_17 mrp_ideology_mean registered
## Min. : 3312 Min. : 7.0 Min. :-0.2976 Min. : 4171
## 1st Qu.: 10170 1st Qu.: 35.5 1st Qu.: 0.1705 1st Qu.: 13020
## Median : 15838 Median : 69.5 Median : 0.2937 Median : 20034
## Mean : 31468 Mean : 210.3 Mean : 0.2801 Mean : 49800
## 3rd Qu.: 30870 3rd Qu.: 155.0 3rd Qu.: 0.4046 3rd Qu.: 45293
## Max. :228899 Max. :2160.0 Max. : 0.8261 Max. :477651
##
## white poverty hs BA
## Min. :0.6300 Min. :0.0510 Min. :0.5820 Min. :0.0850
## 1st Qu.:0.9045 1st Qu.:0.1153 1st Qu.:0.8685 1st Qu.:0.1417
## Median :0.9435 Median :0.1415 Median :0.8950 Median :0.1650
## Mean :0.9192 Mean :0.1451 Mean :0.8851 Mean :0.1976
## 3rd Qu.:0.9645 3rd Qu.:0.1760 3rd Qu.:0.9062 3rd Qu.:0.2333
## Max. :0.9930 Max. :0.3020 Max. :0.9670 Max. :0.5380
##
The dependent variables we are most interested in are “for_count” and “agnst_count”, the for and against counts for Issue 1. Both show a fairly large range, as does our control variable “registered”, the number of registered voters. That means we have sufficient variance to analyze, which is a good thing.
Let’s start by analyzing one independent variable (IV) at a time with one dependent variable, “for_count”. We are going to use a step-wise approach to building the regression, meaning that we drop one IV into the equation at a time and then check whether the model performs better or worse with that IV. If the model performs better, we keep the IV and add another; if it performs worse, we remove the IV and move on to the next.
Because we have some theoretical reasons why “rural” might explain differences in voting patterns, we will use it as the first variable. Rural is a dichotomous variable where 1 = rural and 0 = non-rural. I’m using the term non-rural here because I simply checked a list of rural counties and do not know the designation of the other counties, aside from the fact that they were not on the rural list.
First, let’s take a look at some descriptive statistics to get a feel for what to expect. A comparison of rural status against some of our other variables is presented first.
We can see that there are more rural (1) than non-rural (0) counties in our dataset, but the difference isn’t staggering. This bar chart also gives us a comparison of the composition of the regions. For example, the Western region has more rural than non-rural counties, while the Southwest region has more non-rural than rural counties. Every region has a mix of both, which tells us that the region and rural variables are not equivalent and that region may help explain some additional variance.
In the next graph we can see the overlap between rural status and the various media markets. Taken together, these graphs show at minimum that market and region add different information to the analysis and are not measuring exactly the same construct.
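The plotting code is hidden in this section; here is a minimal sketch of how a chart like the first one could be produced, assuming ggplot2 and that rural is stored as a factor (which the summary above suggests):
library(ggplot2)
# Counties per region, split by rural vs. non-rural status
ggplot(Election_Data_Final, aes(x = region, fill = rural)) +
  geom_bar(position = "dodge") +
  labs(x = "Region", y = "Number of counties", fill = "Rural (1 = rural, 0 = non-rural)")
Swapping region for market would give something like the second chart.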
Now let’s look at some descriptive statistics that will tell us what to expect when we examine the relationships between rural status and our dependent variables.
This graph tells us that the vast majority of the yes votes came from non-rural counties. But we aren’t yet accounting for population differences here.
A similarly large number of no votes came from non-rural counties as well, suggesting that the rural predictor may not tell us much, especially after accounting for differences in population. So let’s control for population by using the variable “registered”.
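As a rough descriptive check of my own (not part of the original output), we can compare the average yes-vote share of registered voters across rural and non-rural counties before fitting anything:
# Mean of for_count / registered by rural status (0 = non-rural, 1 = rural)
with(Election_Data_Final, tapply(for_count / registered, rural, mean))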
I’m leaving some of the code visible in this section so that I do not have to retype the equation, or the type of equation used, for subsequent models, since we will be running quite a few. Here is the model reflected in the code:
Model 1: for_count ~ rural + offset(log(registered))
for_count is our Y, or dependent variable. rural is our independent variable (X), our predictor. registered is our offset variable, which essentially acts as a control for county size.
We will try a Poisson model first.
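# Model 1: Poisson regression of yes-vote counts, with log(registered) as an exposure offset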
m1 <- glm(for_count~rural+offset(log(registered)), family=poisson, data=Election_Data_Final)
summary(m1)
##
## Call:
## glm(formula = for_count ~ rural + offset(log(registered)), family = poisson,
## data = Election_Data_Final)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -63.989 -32.060 -10.742 -1.477 128.463
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.9053923 0.0008313 -1089.1 <2e-16 ***
## rural1 -0.6086443 0.0025187 -241.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 170982 on 87 degrees of freedom
## Residual deviance: 102618 on 86 degrees of freedom
## AIC: 103552
##
## Number of Fisher Scoring iterations: 4
A Poisson model suggests that rural is a statistically significant predictor (Pr(>|z|) is the p-value). Let’s transform the coefficient into an incidence rate ratio (IRR) so that it is more easily interpreted.
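The transformation code is hidden here; presumably it looks something like the coef/exp code shown explicitly for Model 8 later on:
(coefm1 <- cbind(Estimate = coef(m1)))
exp(coefm1)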
## Estimate
## (Intercept) -0.9053923
## rural1 -0.6086443
## Estimate
## (Intercept) 0.4043832
## rural1 0.5440880
This tells us that after controlling for the number of registered voters, rural counties vote yes at 0.54 times the rate of non-rural counties (p < 0.001).
Now we need to test for something called overdispersion, because our deviance residuals are far larger than a well-fitting Poisson model should produce (their median should be near 0). To do that, we can run a negative binomial model using the same variables.
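One quick informal check (my addition, not part of the original output) is the ratio of residual deviance to residual degrees of freedom for the Poisson fit; a ratio far above 1 points to overdispersion.
# 102618 / 86 from the summary above is enormous, so overdispersion looks very likely
deviance(m1) / df.residual(m1)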
Let’s call it Model 1.5.
library(MASS)
m1.5 <- glm.nb(for_count~rural+offset(log(registered)), data=Election_Data_Final)
summary(m1.5)
##
## Call:
## glm.nb(formula = for_count ~ rural + offset(log(registered)),
## data = Election_Data_Final, init.theta = 15.75630759, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9702 -0.7115 -0.1512 0.3946 3.6699
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.15991 0.04037 -28.732 < 2e-16 ***
## rural1 -0.41036 0.05416 -7.577 3.53e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(15.7563) family taken to be 1)
##
## Null deviance: 146.782 on 87 degrees of freedom
## Residual deviance: 88.858 on 86 degrees of freedom
## AIC: 1551.4
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 15.76
## Std. Err.: 2.36
##
## 2 x log-likelihood: -1545.409
Now for some stats speak. The AIC of Model 1 is 103,552, which is dramatically higher than the AIC of Model 1.5 (1,551.4). Further, Model 1.5’s median deviance residual is nearly 0. Both of these factors suggest that Model 1.5 better fits the data. And, lucky for us, the coefficient for rural is still significant. Now let’s transform the coefficients into IRRs again so that we can better interpret them.
## Estimate
## (Intercept) -1.1599098
## rural1 -0.4103574
## Estimate
## (Intercept) 0.3135145
## rural1 0.6634131
This tells us that after controlling for the number of registered voters, rural counties vote yes at 0.66 times the rate of non-rural counties (p < 0.001).
Let’s keep going with model 1.5 and see if we can improve the model fit by adding in additional variables and even interactions.
m2 <- glm.nb(for_count~rural+region+offset(log(registered)), data=Election_Data_Final)
summary(m2)
##
## Call:
## glm.nb(formula = for_count ~ rural + region + offset(log(registered)),
## data = Election_Data_Final, init.theta = 17.09488586, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3193 -0.5581 -0.1877 0.2853 4.5028
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.21644 0.05956 -20.424 < 2e-16 ***
## rural1 -0.39246 0.05495 -7.142 9.18e-13 ***
## regionNortheast 0.10323 0.07666 1.347 0.178
## regionNorthwest 0.14492 0.09004 1.610 0.107
## regionSoutheast -0.08264 0.08658 -0.954 0.340
## regionSouthwest 0.06678 0.10145 0.658 0.510
## regionWest 0.04913 0.08513 0.577 0.564
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(17.0949) family taken to be 1)
##
## Null deviance: 159.200 on 87 degrees of freedom
## Residual deviance: 88.752 on 81 degrees of freedom
## AIC: 1554.1
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 17.09
## Std. Err.: 2.56
##
## 2 x log-likelihood: -1538.09
I added “region” to the model and it appears that our model fits slightly worse (AIC 1554.1, up from 1551.4; we want the AIC as low as we can get it). Also, none of the region coefficients are significant, so we can eliminate region from the model. But what about an interaction between rural and region?
m2.5 <- glm.nb(for_count~rural*region+offset(log(registered)), data=Election_Data_Final)
summary(m2.5)
##
## Call:
## glm.nb(formula = for_count ~ rural * region + offset(log(registered)),
## data = Election_Data_Final, init.theta = 18.40948297, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2623 -0.5697 -0.1948 0.2777 4.3184
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.23939 0.07038 -17.611 < 2e-16 ***
## rural1 -0.34196 0.10506 -3.255 0.00113 **
## regionNortheast 0.12029 0.09739 1.235 0.21677
## regionNorthwest 0.28042 0.15194 1.846 0.06495 .
## regionSoutheast -0.31456 0.15208 -2.068 0.03860 *
## regionSouthwest 0.11825 0.12581 0.940 0.34729
## regionWest 0.16601 0.12581 1.320 0.18698
## rural1:regionNortheast -0.03576 0.14961 -0.239 0.81111
## rural1:regionNorthwest -0.20369 0.18772 -1.085 0.27789
## rural1:regionSoutheast 0.26443 0.18495 1.430 0.15278
## rural1:regionSouthwest -0.12869 0.20040 -0.642 0.52077
## rural1:regionWest -0.20097 0.16730 -1.201 0.22965
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(18.4095) family taken to be 1)
##
## Null deviance: 171.388 on 87 degrees of freedom
## Residual deviance: 88.713 on 76 degrees of freedom
## AIC: 1557.5
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 18.41
## Std. Err.: 2.76
##
## 2 x log-likelihood: -1531.504
Again, our AIC of 1557.5 is larger than Model 1.5’s, suggesting a worse fit. So both region and the interaction term can be ruled out, leaving us with Model 1.5 (a side-by-side AIC check is sketched below). Having eliminated region, we will now add market to Model 1.5.
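For the side-by-side check, AIC() accepts several fitted models at once; a quick sketch of my own, not shown in the original output:
AIC(m1.5, m2, m2.5)  # lower is better; Model 1.5 wins so far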
m3 <- glm.nb(for_count~rural+market+offset(log(registered)), data=Election_Data_Final)
summary(m3)
##
## Call:
## glm.nb(formula = for_count ~ rural + market + offset(log(registered)),
## data = Election_Data_Final, init.theta = 18.77863195, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.4136 -0.5930 -0.0885 0.3420 3.9436
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.150288 0.099095 -11.608 < 2e-16 ***
## rural1 -0.409195 0.054096 -7.564 3.9e-14 ***
## marketCincinnati 0.006637 0.122622 0.054 0.9568
## marketCleveland 0.053208 0.106742 0.498 0.6182
## marketColumbus -0.062132 0.105122 -0.591 0.5545
## marketDayton 0.039114 0.112583 0.347 0.7283
## marketFt. Wayne -0.166553 0.186258 -0.894 0.3712
## marketLima -0.259761 0.251369 -1.033 0.3014
## marketParkersburg -0.118982 0.247419 -0.481 0.6306
## marketToledo 0.090870 0.110297 0.824 0.4100
## marketWheeling -0.322822 0.129556 -2.492 0.0127 *
## marketYoungstown -0.010655 0.162018 -0.066 0.9476
## marketZanesville 0.010186 0.247282 0.041 0.9671
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(18.7786) family taken to be 1)
##
## Null deviance: 174.809 on 87 degrees of freedom
## Residual deviance: 88.645 on 75 degrees of freedom
## AIC: 1557.7
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 18.78
## Std. Err.: 2.82
##
## 2 x log-likelihood: -1529.683
Based on the AIC (1557.7 vs. 1551.4), it appears that “market” isn’t adding much to our equation and can be eliminated. Is there an interaction between rural and market?
m3.5 <- glm.nb(for_count~rural*market+offset(log(registered)), data=Election_Data_Final)
summary(m3.5)
##
## Call:
## glm.nb(formula = for_count ~ rural * market + offset(log(registered)),
## data = Election_Data_Final, init.theta = 20.67489761, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3965 -0.5572 -0.0888 0.3443 3.7295
##
## Coefficients: (4 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.70182 0.22053 -7.717 1.19e-14 ***
## rural1 0.21110 0.23828 0.886 0.37565
## marketCincinnati 0.58071 0.24149 2.405 0.01619 *
## marketCleveland 0.59094 0.23125 2.555 0.01061 *
## marketColumbus 0.46249 0.23032 2.008 0.04464 *
## marketDayton 0.69743 0.24645 2.830 0.00466 **
## marketFt. Wayne -0.23532 0.18071 -1.302 0.19285
## marketLima 0.29177 0.31163 0.936 0.34913
## marketParkersburg -0.18774 0.23820 -0.788 0.43061
## marketToledo 0.74288 0.25452 2.919 0.00351 **
## marketWheeling 0.21429 0.27001 0.794 0.42740
## marketYoungstown 0.54075 0.26988 2.004 0.04511 *
## marketZanesville -0.05857 0.23806 -0.246 0.80566
## rural1:marketCincinnati -0.68178 0.28761 -2.370 0.01777 *
## rural1:marketCleveland -0.58696 0.26184 -2.242 0.02498 *
## rural1:marketColumbus -0.55720 0.25941 -2.148 0.03172 *
## rural1:marketDayton -0.79393 0.27537 -2.883 0.00394 **
## rural1:marketFt. Wayne NA NA NA NA
## rural1:marketLima NA NA NA NA
## rural1:marketParkersburg NA NA NA NA
## rural1:marketToledo -0.75675 0.27988 -2.704 0.00685 **
## rural1:marketWheeling -0.59859 0.30562 -1.959 0.05016 .
## rural1:marketYoungstown -0.61990 0.35983 -1.723 0.08493 .
## rural1:marketZanesville NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(20.6749) family taken to be 1)
##
## Null deviance: 192.373 on 87 degrees of freedom
## Residual deviance: 88.585 on 68 degrees of freedom
## AIC: 1563.1
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 20.67
## Std. Err.: 3.11
##
## 2 x log-likelihood: -1521.136
It appears that the rural x market interaction does not perform better than Model 1.5 (AIC 1563.1 vs. 1551.4), so we can eliminate it as well, keep Model 1.5, and move on to the next variable.
m4 <- glm.nb(for_count~rural+OD_12_17+offset(log(registered)), data=Election_Data_Final)
summary(m4)
##
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + offset(log(registered)),
## data = Election_Data_Final, init.theta = 23.62461016, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2188 -0.6348 -0.1015 0.3952 4.5322
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.342e+00 4.122e-02 -32.551 < 2e-16 ***
## rural1 -2.534e-01 4.902e-02 -5.168 2.36e-07 ***
## OD_12_17 3.979e-04 6.166e-05 6.454 1.09e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(23.6246) family taken to be 1)
##
## Null deviance: 219.66 on 87 degrees of freedom
## Residual deviance: 88.62 on 85 degrees of freedom
## AIC: 1517.4
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 23.62
## Std. Err.: 3.56
##
## 2 x log-likelihood: -1509.423
The number of overdoses from 2012-2017 does improve model fit over Model 1.5, based on its AIC of 1517.4 vs. 1551.4.
## Estimate
## (Intercept) -1.3416407868
## rural1 -0.2533713769
## OD_12_17 0.0003979476
## Estimate
## (Intercept) 0.2614164
## rural1 0.7761796
## OD_12_17 1.0003980
Unfortunately, the effect size is very small: the IRR for OD is 1.0004, meaning each additional overdose death is associated with only a 0.04% higher rate of yes votes, which on its own isn’t worth much comment. However, it does change the coefficient for rural. Rural counties vote yes at 0.78 times the rate of non-rural counties (p < 0.001), after controlling for the number of overdoses and the number of registered voters. So far, Model 4 is our leader, but let’s check for an interaction between rural and overdoses.
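Before that, one aside on scale (my own rescaling, not part of the original analysis): exponentiating the OD coefficient times 100 gives the rate ratio associated with 100 additional overdose deaths.
exp(coef(m4)["OD_12_17"] * 100)  # roughly 1.04, i.e. about a 4% higher yes-vote rate per 100 additional deaths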
m4.5 <- glm.nb(for_count~rural*OD_12_17+offset(log(registered)), data=Election_Data_Final)
summary(m4.5)
##
## Call:
## glm.nb(formula = for_count ~ rural * OD_12_17 + offset(log(registered)),
## data = Election_Data_Final, init.theta = 24.99759174, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0066 -0.6803 -0.0676 0.3697 4.8059
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.336e+00 4.014e-02 -33.288 < 2e-16 ***
## rural1 -3.479e-01 6.190e-02 -5.620 1.91e-08 ***
## OD_12_17 3.841e-04 6.022e-05 6.379 1.78e-10 ***
## rural1:OD_12_17 1.478e-03 6.318e-04 2.339 0.0194 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(24.9976) family taken to be 1)
##
## Null deviance: 232.35 on 87 degrees of freedom
## Residual deviance: 88.53 on 84 degrees of freedom
## AIC: 1514.4
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 25.00
## Std. Err.: 3.77
##
## 2 x log-likelihood: -1504.364
This model has a slightly better AIC (1514.4 vs. 1517.4).
## Estimate
## (Intercept) -1.3361258101
## rural1 -0.3478615913
## OD_12_17 0.0003841533
## rural1:OD_12_17 0.0014775053
## Estimate
## (Intercept) 0.2628621
## rural1 0.7061966
## OD_12_17 1.0003842
## rural1:OD_12_17 1.0014786
But again, the effect size is minimal. When possible, it is better to stick with simpler models, so I vote we keep Model 4 and NOT include the interaction term. So let’s continue with Model 4 and add in the next variable. (Also, in case anyone is wondering, collinearity is less of a pressing concern for these count models than it would be for a traditional linear model, though it is still worth keeping an eye on.)
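If we wanted a formal test to back up the AIC comparison, a likelihood-ratio test between the nested models is one option (my addition; as I recall, the test here conditions on the estimated dispersion parameter theta):
anova(m4, m4.5)  # likelihood ratio test of the nested negative binomial models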
m5 <- glm.nb(for_count~rural+OD_12_17+mrp_ideology_mean+offset(log(registered)), data=Election_Data_Final)
summary(m5)
##
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + mrp_ideology_mean +
## offset(log(registered)), data = Election_Data_Final, init.theta = 34.25353992,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3431 -0.5946 -0.1648 0.5536 3.4890
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1288637 0.0494595 -22.824 < 2e-16 ***
## rural1 -0.2323035 0.0410146 -5.664 1.48e-08 ***
## OD_12_17 0.0002213 0.0000591 3.745 0.00018 ***
## mrp_ideology_mean -0.6923904 0.1143334 -6.056 1.40e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(34.2535) family taken to be 1)
##
## Null deviance: 317.678 on 87 degrees of freedom
## Residual deviance: 88.459 on 84 degrees of freedom
## AIC: 1486.7
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 34.25
## Std. Err.: 5.19
##
## 2 x log-likelihood: -1476.652
Ideology does contribute to our model, decreasing our AIC further than Model 4 did. Let’s see if there is an interaction before we interpret the coefficients.
First, let’s check the potential interaction between ideology and OD.
m5.2 <- glm.nb(for_count~rural+OD_12_17*mrp_ideology_mean+offset(log(registered)), data=Election_Data_Final)
summary(m5.2)
##
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 * mrp_ideology_mean +
## offset(log(registered)), data = Election_Data_Final, init.theta = 36.14154168,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3552 -0.6116 -0.1997 0.4982 3.4440
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.138e+00 4.827e-02 -23.577 < 2e-16 ***
## rural1 -2.089e-01 4.132e-02 -5.054 4.32e-07 ***
## OD_12_17 2.663e-04 6.109e-05 4.360 1.30e-05 ***
## mrp_ideology_mean -7.708e-01 1.175e-01 -6.559 5.41e-11 ***
## OD_12_17:mrp_ideology_mean 5.147e-04 2.308e-04 2.230 0.0257 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(36.1415) family taken to be 1)
##
## Null deviance: 335.036 on 87 degrees of freedom
## Residual deviance: 88.446 on 83 degrees of freedom
## AIC: 1483.9
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 36.14
## Std. Err.: 5.48
##
## 2 x log-likelihood: -1471.945
Overdoses and ideology do interact, but the effect size is very small, and theoretically it doesn’t add much to the story. Also, the decrease in AIC is minimal (1483.9 vs. 1486.7). So my vote is that it stays out of the model.
Now we need to check for the interaction between ideology and rural.
m5.3 <- glm.nb(for_count~OD_12_17+rural*mrp_ideology_mean+offset(log(registered)), data=Election_Data_Final)
summary(m5.3)
##
## Call:
## glm.nb(formula = for_count ~ OD_12_17 + rural * mrp_ideology_mean +
## offset(log(registered)), data = Election_Data_Final, init.theta = 35.01273488,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3282 -0.5587 -0.1530 0.4746 3.2328
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.190e+00 6.557e-02 -18.147 < 2e-16 ***
## OD_12_17 2.690e-04 6.794e-05 3.959 7.53e-05 ***
## rural1 -1.380e-01 8.113e-02 -1.701 0.08901 .
## mrp_ideology_mean -4.928e-01 1.821e-01 -2.706 0.00681 **
## rural1:mrp_ideology_mean -3.077e-01 2.280e-01 -1.350 0.17715
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(35.0127) family taken to be 1)
##
## Null deviance: 324.660 on 87 degrees of freedom
## Residual deviance: 88.468 on 83 degrees of freedom
## AIC: 1486.7
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 35.01
## Std. Err.: 5.31
##
## 2 x log-likelihood: -1474.743
The rural x ideology interaction term doesn’t decrease our AIC, so we can eliminate it, bringing us back to Model 5. So let’s interpret Model 5: what does it tell us?
## Estimate
## (Intercept) -1.128863744
## rural1 -0.232303471
## OD_12_17 0.000221332
## mrp_ideology_mean -0.692390364
## Estimate
## (Intercept) 0.3234005
## rural1 0.7927055
## OD_12_17 1.0002214
## mrp_ideology_mean 0.5003786
It tells us that, controlling for the number of registered voters, the number of overdoses, and mean ideology, rural counties voted yes at 0.79 times the rate of non-rural counties (p < 0.001). So essentially, ideology is pulling out some of the variance previously accounted for by rural. For every 1-unit increase in ideology (i.e., when a county becomes 1 unit more conservative), the rate of yes votes decreases by about 50%, holding rural, OD, and registered voters constant.
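One caveat on that last number: mrp_ideology_mean only spans roughly -0.30 to 0.83 in these data (see the summary above), so a full 1-unit shift is larger than anything we actually observe. A sketch of my own rescaling to a 0.1-unit shift:
exp(coef(m5)["mrp_ideology_mean"] * 0.1)  # about 0.93, i.e. roughly a 7% lower yes-vote rate per 0.1-unit conservative shift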
Moving on, let’s build on Model 5 and see if we can get that AIC even lower! (Woohoo!)
m6 <- glm.nb(for_count~rural+OD_12_17+mrp_ideology_mean+white+offset(log(registered)), data=Election_Data_Final)
summary(m6)
##
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + mrp_ideology_mean +
## white + offset(log(registered)), data = Election_Data_Final,
## init.theta = 40.55178286, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1557 -0.5828 -0.1363 0.5494 3.5219
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 8.848e-01 5.033e-01 1.758 0.0787 .
## rural1 -2.037e-01 3.842e-02 -5.301 1.15e-07 ***
## OD_12_17 -6.784e-05 9.081e-05 -0.747 0.4550
## mrp_ideology_mean -5.607e-01 1.094e-01 -5.125 2.98e-07 ***
## white -2.184e+00 5.423e-01 -4.028 5.63e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(40.5518) family taken to be 1)
##
## Null deviance: 375.524 on 87 degrees of freedom
## Residual deviance: 88.405 on 83 degrees of freedom
## AIC: 1473.8
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 40.55
## Std. Err.: 6.16
##
## 2 x log-likelihood: -1461.842
## Estimate
## (Intercept) 8.848433e-01
## rural1 -2.036767e-01
## OD_12_17 -6.783859e-05
## mrp_ideology_mean -5.607365e-01
## white -2.184499e+00
## Estimate
## (Intercept) 2.4226048
## rural1 0.8157260
## OD_12_17 0.9999322
## mrp_ideology_mean 0.5707885
## white 0.1125341
Model 5 had an AIC of 1486.7, so this model improves on it (AIC 1473.8), and the proportion of white residents does seem to influence the number of yes votes in a statistically significant manner. So we are keeping it and moving on with Model 6.
Interpretation of Coefficients: Rural counties voted yes at 0.82 times the rate of non-rural counties, holding OD, ideology, the proportion of white residents, and registered voters constant. For every 1 unit more conservative a county becomes, the rate of yes votes decreases by about 43%, holding OD, rural, the proportion of white residents, and registered voters constant. For every 1-unit increase in the proportion of white residents in a county, the rate of yes votes decreases by about 89%, holding rural, OD, ideology, and registered voters constant.
Let’s see if there is an interaction effect. Model 6.1 = ideology*white.
library(MASS)
m6.1 <- glm.nb(for_count~rural+OD_12_17+mrp_ideology_mean*white+offset(log(registered)), data=Election_Data_Final)
summary(m6.1)
##
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + mrp_ideology_mean *
## white + offset(log(registered)), data = Election_Data_Final,
## init.theta = 42.57485336, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1521 -0.6295 -0.1476 0.4075 3.5837
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.260e-01 5.459e-01 0.780 0.43518
## rural1 -1.787e-01 3.932e-02 -4.544 5.51e-06 ***
## OD_12_17 4.219e-05 1.053e-04 0.401 0.68860
## mrp_ideology_mean 1.947e+00 1.207e+00 1.613 0.10671
## white -1.707e+00 5.850e-01 -2.917 0.00353 **
## mrp_ideology_mean:white -2.711e+00 1.302e+00 -2.082 0.03738 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(42.5749) family taken to be 1)
##
## Null deviance: 394.069 on 87 degrees of freedom
## Residual deviance: 88.432 on 82 degrees of freedom
## AIC: 1471.6
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 42.57
## Std. Err.: 6.48
##
## 2 x log-likelihood: -1457.619
Minimal difference in AIC (about a 2-point decrease), so we can exclude this interaction.
Next, Model 6.2 = white*OD
m6.2 <- glm.nb(for_count~rural+OD_12_17*white+mrp_ideology_mean+offset(log(registered)), data=Election_Data_Final)
summary(m6.2)
##
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 * white + mrp_ideology_mean +
## offset(log(registered)), data = Election_Data_Final, init.theta = 43.33934695,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1962 -0.6775 -0.0674 0.3910 3.6153
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1197630 0.4936908 2.268 0.02332 *
## rural1 -0.1683326 0.0397981 -4.230 2.34e-05 ***
## OD_12_17 -0.0009903 0.0003808 -2.600 0.00931 **
## white -2.4905609 0.5360639 -4.646 3.38e-06 ***
## mrp_ideology_mean -0.5540262 0.1059649 -5.228 1.71e-07 ***
## OD_12_17:white 0.0012905 0.0005234 2.466 0.01368 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(43.3393) family taken to be 1)
##
## Null deviance: 401.072 on 87 degrees of freedom
## Residual deviance: 88.367 on 82 degrees of freedom
## AIC: 1470
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 43.34
## Std. Err.: 6.59
##
## 2 x log-likelihood: -1456.002
Again, a minimal decrease in AIC (about 4 points). What the interaction term tells us in a negative binomial model is the degree to which one variable’s effect is influenced by the other variable. I have to do a little more research here to figure out the correct interpretation of the interaction terms, if we decide we want to keep them in.
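One common way to unpack an interaction like this (offered as a sketch, not a definitive interpretation) is to compute the per-overdose rate ratio at a few values of white, since the OD effect now depends on white:
# Per-overdose IRR at the minimum, mean, and maximum white proportions from the summary above
exp(coef(m6.2)["OD_12_17"] + coef(m6.2)["OD_12_17:white"] * c(0.63, 0.92, 0.99))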
Next, Model 6.3 = white*rural
m6.3 <- glm.nb(for_count~rural*white+OD_12_17+mrp_ideology_mean+offset(log(registered)), data=Election_Data_Final)
summary(m6.3)
##
## Call:
## glm.nb(formula = for_count ~ rural * white + OD_12_17 + mrp_ideology_mean +
## offset(log(registered)), data = Election_Data_Final, init.theta = 43.4272963,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3426 -0.5652 -0.0924 0.5678 3.2594
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.093e-01 5.559e-01 0.376 0.7066
## rural1 2.204e+00 9.619e-01 2.292 0.0219 *
## white -1.482e+00 5.943e-01 -2.494 0.0126 *
## OD_12_17 4.301e-05 9.834e-05 0.437 0.6618
## mrp_ideology_mean -5.070e-01 1.077e-01 -4.710 2.48e-06 ***
## rural1:white -2.556e+00 1.020e+00 -2.506 0.0122 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(43.4273) family taken to be 1)
##
## Null deviance: 401.878 on 87 degrees of freedom
## Residual deviance: 88.409 on 82 degrees of freedom
## AIC: 1469.9
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 43.43
## Std. Err.: 6.61
##
## 2 x log-likelihood: -1455.867
Again, the rural x white interaction has only a small effect on the AIC (about a 4-point decrease).
If we were to keep both significant interaction terms….
m6.4 <- glm.nb(for_count~rural*white+OD_12_17*white+mrp_ideology_mean+offset(log(registered)), data=Election_Data_Final)
summary(m6.4)
##
## Call:
## glm.nb(formula = for_count ~ rural * white + OD_12_17 * white +
## mrp_ideology_mean + offset(log(registered)), data = Election_Data_Final,
## init.theta = 45.33544475, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2656 -0.6112 -0.1008 0.4977 3.3843
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.5199423 0.5621593 0.925 0.35502
## rural1 1.7830899 0.9626806 1.852 0.06400 .
## white -1.8556836 0.6067012 -3.059 0.00222 **
## OD_12_17 -0.0007233 0.0003934 -1.839 0.06598 .
## mrp_ideology_mean -0.5112310 0.1053964 -4.851 1.23e-06 ***
## rural1:white -2.0779211 1.0238619 -2.029 0.04241 *
## white:OD_12_17 0.0010448 0.0005249 1.991 0.04651 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(45.3354) family taken to be 1)
##
## Null deviance: 419.345 on 87 degrees of freedom
## Residual deviance: 88.381 on 81 degrees of freedom
## AIC: 1468.1
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 45.34
## Std. Err.: 6.90
##
## 2 x log-likelihood: -1452.09
Our AIC is about 6 points lower, but both OD and rural become non-significant.
I’m going to need to check whether it is worth keeping these interaction terms. For now, I am going to stick with Model 6 and add in poverty. But we can talk about whether or not we think it is a good idea to keep the white x rural and white x OD interactions in.
m7 <- glm.nb(for_count~rural+OD_12_17+mrp_ideology_mean+white+poverty+offset(log(registered)), data=Election_Data_Final)
summary(m7)
##
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + mrp_ideology_mean +
## white + poverty + offset(log(registered)), data = Election_Data_Final,
## init.theta = 41.29685166, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0349 -0.6324 -0.1262 0.5032 4.0022
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.848e-01 5.032e-01 1.957 0.0503 .
## rural1 -1.842e-01 4.046e-02 -4.553 5.29e-06 ***
## OD_12_17 -6.463e-05 9.012e-05 -0.717 0.4732
## mrp_ideology_mean -5.907e-01 1.104e-01 -5.348 8.87e-08 ***
## white -2.220e+00 5.376e-01 -4.129 3.64e-05 ***
## poverty -4.869e-01 3.938e-01 -1.236 0.2164
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(41.2969) family taken to be 1)
##
## Null deviance: 382.356 on 87 degrees of freedom
## Residual deviance: 88.404 on 82 degrees of freedom
## AIC: 1474.3
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 41.30
## Std. Err.: 6.28
##
## 2 x log-likelihood: -1460.251
The AIC scores for Models 6 and 7 are essentially the same (1473.8 vs. 1474.3), meaning that poverty is not adding anything to our model, so we can omit it.
m8 <- glm.nb(for_count~rural+OD_12_17+mrp_ideology_mean+white+hs+offset(log(registered)), data=Election_Data_Final)
summary(m8)
##
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + mrp_ideology_mean +
## white + hs + offset(log(registered)), data = Election_Data_Final,
## init.theta = 45.99793179, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3380 -0.5704 -0.1879 0.5494 3.6157
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.667e-01 6.366e-01 -0.890 0.373322
## rural1 -1.732e-01 3.707e-02 -4.671 2.99e-06 ***
## OD_12_17 -1.977e-05 8.631e-05 -0.229 0.818810
## mrp_ideology_mean -5.675e-01 1.029e-01 -5.518 3.44e-08 ***
## white -1.848e+00 5.189e-01 -3.561 0.000369 ***
## hs 1.260e+00 3.720e-01 3.388 0.000705 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(45.9979) family taken to be 1)
##
## Null deviance: 425.406 on 87 degrees of freedom
## Residual deviance: 88.473 on 82 degrees of freedom
## AIC: 1464.9
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 46.00
## Std. Err.: 7.01
##
## 2 x log-likelihood: -1450.918
(coefm8 <-cbind(Estimate = coef(m8)))
## Estimate
## (Intercept) -5.667478e-01
## rural1 -1.731521e-01
## OD_12_17 -1.977169e-05
## mrp_ideology_mean -5.675469e-01
## white -1.848059e+00
## hs 1.260393e+00
exp(coefm8)
## Estimate
## (Intercept) 0.5673676
## rural1 0.8410097
## OD_12_17 0.9999802
## mrp_ideology_mean 0.5669144
## white 0.1575426
## hs 3.5268077
This gives us roughly a 9-point decrease in AIC (1464.9 vs. 1473.8), suggesting this model is a bit better.
Interpretation of Coefficients: Rural counties voted yes at 0.84 times the rate of non-rural counties, holding OD, ideology, the proportion of white residents, the proportion of high school graduates, and registered voters constant. For every 1 unit more conservative a county becomes, the rate of yes votes decreases by about 43%, holding OD, rural, the proportions of white residents and high school graduates, and registered voters constant. For every 1-unit increase in the proportion of white residents in a county, the rate of yes votes decreases by about 84%, holding rural, OD, ideology, high school graduates, and registered voters constant. For every 1-unit increase in the proportion of high school graduates in a county, the rate of yes votes is about 3.5 times higher (an increase of roughly 253%), holding rural, OD, ideology, white, and registered voters constant. All of these findings are statistically significant.
I am going to skip over the interactions and add in the proportion of county residents with a bachelor’s degree.
m9 <- glm.nb(for_count~rural+OD_12_17+mrp_ideology_mean+white+hs+BA+offset(log(registered)), data=Election_Data_Final)
summary(m9)
##
## Call:
## glm.nb(formula = for_count ~ rural + OD_12_17 + mrp_ideology_mean +
## white + hs + BA + offset(log(registered)), data = Election_Data_Final,
## init.theta = 55.87353615, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7706 -0.7047 -0.1453 0.6477 3.1945
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3176545 0.5812118 -0.547 0.584696
## rural1 -0.1223276 0.0356256 -3.434 0.000595 ***
## OD_12_17 -0.0000391 0.0000785 -0.498 0.618440
## mrp_ideology_mean -0.4640974 0.0958987 -4.839 1.30e-06 ***
## white -1.6690249 0.4732715 -3.527 0.000421 ***
## hs 0.4758415 0.3847196 1.237 0.216142
## BA 1.1414951 0.2605932 4.380 1.18e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(55.8735) family taken to be 1)
##
## Null deviance: 515.535 on 87 degrees of freedom
## Residual deviance: 88.402 on 81 degrees of freedom
## AIC: 1449.9
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 55.87
## Std. Err.: 8.55
##
## 2 x log-likelihood: -1433.928
(coefm9 <-cbind(Estimate = coef(m9)))
## Estimate
## (Intercept) -3.176545e-01
## rural1 -1.223276e-01
## OD_12_17 -3.909565e-05
## mrp_ideology_mean -4.640974e-01
## white -1.669025e+00
## hs 4.758415e-01
## BA 1.141495e+00
exp(coefm9)
## Estimate
## (Intercept) 0.7278542
## rural1 0.8848584
## OD_12_17 0.9999609
## mrp_ideology_mean 0.6287023
## white 0.1884307
## hs 1.6093679
## BA 3.1314466
When we add in BA, HS becomes non-significant, which isn’t really surprising considering how much the two measures overlap (a quick check is sketched below). Adding BA decreases the AIC from Model 8 by about 15 points. Notwithstanding interaction terms, it appears Model 9 is the best so far; but since OD is not significant and its effect was only marginal to begin with, I am going to try one more model that removes OD to see if our AIC improves or stays the same.
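The HS/BA overlap is easy to check directly (my addition; the value is not shown here):
with(Election_Data_Final, cor(hs, BA))  # correlation between the two education measures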
m10 <- glm.nb(for_count~rural+mrp_ideology_mean+white+hs+BA+offset(log(registered)), data=Election_Data_Final)
summary(m10)
##
## Call:
## glm.nb(formula = for_count ~ rural + mrp_ideology_mean + white +
## hs + BA + offset(log(registered)), data = Election_Data_Final,
## init.theta = 55.71489903, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7659 -0.7325 -0.1318 0.5938 3.2633
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.52528 0.40519 -1.296 0.194840
## rural1 -0.12118 0.03564 -3.400 0.000674 ***
## mrp_ideology_mean -0.46347 0.09594 -4.831 1.36e-06 ***
## white -1.48217 0.28676 -5.169 2.36e-07 ***
## hs 0.50831 0.38013 1.337 0.181159
## BA 1.13199 0.26050 4.345 1.39e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(55.7149) family taken to be 1)
##
## Null deviance: 514.090 on 87 degrees of freedom
## Residual deviance: 88.404 on 82 degrees of freedom
## AIC: 1448.2
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 55.71
## Std. Err.: 8.52
##
## 2 x log-likelihood: -1434.176
(coefm10 <-cbind(Estimate = coef(m10)))
## Estimate
## (Intercept) -0.5252807
## rural1 -0.1211783
## mrp_ideology_mean -0.4634722
## white -1.4821669
## hs 0.5083088
## BA 1.1319943
exp(coefm10)
## Estimate
## (Intercept) 0.5913893
## rural1 0.8858760
## mrp_ideology_mean 0.6290955
## white 0.2271450
## hs 1.6624772
## BA 3.1018364
Interpretation of Coefficients: Rural counties voted yes at 0.89 times the rate of non-rural counties, holding ideology, the proportion of white residents, the proportion of high school graduates, BA, and registered voters constant. For every 1 unit more conservative a county becomes, the rate of yes votes decreases by about 37%, holding rural, the proportions of white residents and high school graduates, BA, and registered voters constant. For every 1-unit increase in the proportion of white residents in a county, the rate of yes votes decreases by about 77%, holding rural, ideology, high school graduates, BA, and registered voters constant. For every 1-unit increase in the proportion of bachelor’s degree holders in a county, the rate of yes votes is about 3.1 times higher (an increase of roughly 210%), holding rural, ideology, the proportions of white residents and high school graduates, and registered voters constant. All of these findings are statistically significant.
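For reporting Model 10, we might also want confidence intervals on the IRR scale. A sketch of my own; confint() profiles the likelihood here, so it can take a moment:
exp(cbind(IRR = coef(m10), confint(m10)))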
Just to check how good our measure of ideology is, let’s add in the number of votes for the Democratic candidate for governor.
m11 <- glm.nb(for_count~rural+mrp_ideology_mean+white+hs+BA+dem_gov+offset(log(registered)), data=Election_Data_Final)
summary(m11)
##
## Call:
## glm.nb(formula = for_count ~ rural + mrp_ideology_mean + white +
##     hs + BA + dem_gov + offset(log(registered)), data = Election_Data_Final,
## init.theta = 57.68288618, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7705 -0.6283 -0.1572 0.5581 3.0783
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.085e-02 5.257e-01 0.135 0.892793
## rural1 -1.178e-01 3.514e-02 -3.353 0.000799 ***
## mrp_ideology_mean -4.816e-01 9.514e-02 -5.062 4.15e-07 ***
## white -2.026e+00 4.221e-01 -4.801 1.58e-06 ***
## hs 4.116e-01 3.776e-01 1.090 0.275812
## BA 1.207e+00 2.593e-01 4.656 3.22e-06 ***
## dem_gov -9.440e-07 5.368e-07 -1.758 0.078686 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(57.6829) family taken to be 1)
##
## Null deviance: 532.003 on 87 degrees of freedom
## Residual deviance: 88.424 on 81 degrees of freedom
## AIC: 1447.2
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 57.68
## Std. Err.: 8.83
##
## 2 x log-likelihood: -1431.183
(coefm11 <-cbind(Estimate = coef(m11)))
## Estimate
## (Intercept) 7.085178e-02
## rural1 -1.178195e-01
## mrp_ideology_mean -4.816229e-01
## white -2.026489e+00
## hs 4.115509e-01
## BA 1.207428e+00
## dem_gov -9.439691e-07
exp(coefm11)
## Estimate
## (Intercept) 1.0734221
## rural1 0.8888565
## mrp_ideology_mean 0.6177800
## white 0.1317975
## hs 1.5091566
## BA 3.3448715
## dem_gov 0.9999991
The AIC barely changes (1447.2 vs. 1448.2), so it looks like our ideology measure accounts for much the same thing as the votes for the Democratic governor. But since it might be easier to interpret Democratic governor vote counts than the mean ideology score, let’s try substituting dem_gov in for ideology.
m12 <- glm.nb(for_count~rural+dem_gov+white+hs+BA+offset(log(registered)), data=Election_Data_Final)
summary(m12)
##
## Call:
## glm.nb(formula = for_count ~ rural + dem_gov + white + hs + BA +
## offset(log(registered)), data = Election_Data_Final, init.theta = 44.47616583,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.4288 -0.5746 -0.0302 0.5219 3.6666
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.230e-01 5.896e-01 0.887 0.37503
## rural1 -1.079e-01 3.995e-02 -2.701 0.00692 **
## dem_gov -6.571e-07 6.057e-07 -1.085 0.27800
## white -2.526e+00 4.692e-01 -5.384 7.27e-08 ***
## hs 1.853e-01 4.250e-01 0.436 0.66283
## BA 1.527e+00 2.889e-01 5.284 1.26e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(44.4762) family taken to be 1)
##
## Null deviance: 411.481 on 87 degrees of freedom
## Residual deviance: 88.384 on 82 degrees of freedom
## AIC: 1467.8
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 44.48
## Std. Err.: 6.77
##
## 2 x log-likelihood: -1453.761
(coefm12 <-cbind(Estimate = coef(m12)))
## Estimate
## (Intercept) 5.230107e-01
## rural1 -1.078817e-01
## dem_gov -6.570572e-07
## white -2.526497e+00
## hs 1.853126e-01
## BA 1.526511e+00
exp(coefm12)
## Estimate
## (Intercept) 1.68709932
## rural1 0.89773376
## dem_gov 0.99999934
## white 0.07993857
## hs 1.20359458
## BA 4.60208998
Well, this is confirmation of the superiority of the mean ideology score over just using governor vote counts (AIC 1467.8 vs. 1448.2), so Model 10 is our winner!
Next steps: Let’s discuss what we think is interesting about these findings and whether or not you would like me to check the remaining interaction terms. I suggest that we include interactions only when we have a theoretical reason for doing so, because there are so many of them.