Quiz instructions: https://www.dropbox.com/s/fnzft6vyjlrictu/Quiz2.pdf?dl=0
dp<-function(modelname){
value<-sum(residuals(modelname, type="pearson")^2 / modelname$df.res)
return(value)
}
DropInDev<-function(bigMod, smallMod,Df1,Df2){
DropinD<-deviance(smallMod)-deviance(bigMod)
disp<- dp(bigMod)
teststat<- DropinD/(disp*Df1)
return(pf(teststat,df1=Df1,df2=Df2,lower.tail=FALSE))
}
library(faraway)
attach(africa)
*** A note from me after I had done all of this quiz:
sum(is.na(africa))
## [1] 11
getOption("na.action")
## [1] "na.omit"
There are missing values in the dataset, but my default na.action is na.omit. The problem with this is that the number of observations in in each model may be different. This also throws off the 2nd degrees of freedom in the drop in deviance test. I do not have time to fix this as I realized it after I finished everything. If I were to do this again, I would have made sure to omit the rows with missing values. My bad. # Data Exploration
hist(miltcoup)
The response variable appears to be strictly non-negative and heavily skewed to the right. This is one sign that we should use Poisson Regression.
Unless noted, all variables are quantitative Side by side plots are not meant to be compared. They are also all commented out for brevity.
# par(mfrow=c(1,2))
# plot(miltcoup~pctvote)
# plot(miltcoup~oligarchy)
# boxplot(miltcoup~pollib)
# # Pollib boxplot shows that there is a downward trend in the median number of successful coups as more civil rights are granted.
# #This variable is a ordinal w/ 3 levels
# par(mfrow=c(1,2))
# plot(miltcoup~parties)
# plot(miltcoup~popn)
# par(mfrow=c(1,2))
# plot(miltcoup~size)
# plot(miltcoup~numelec)
# plot(miltcoup~numregim)
Lets check to see if the log(miltcoup)~ any of these is better:
# par(mfrow=c(1,2))
# plot(log(miltcoup)~parties)
# plot(miltcoup~parties)
# par(mfrow=c(1,2))
# plot(log(miltcoup)~popn)
# plot(miltcoup~popn)
# par(mfrow=c(1,2))
# plot(log(miltcoup)~size)
# plot(miltcoup~size)
# par(mfrow=c(1,2))
# plot(log(miltcoup)~pctvote)
# plot(miltcoup~pctvote)
Some of these plots look better with the log() because there are fewer observations sitting at y=0, so a line could fit the data better.
Because of this, the skewness of the coups histogram, as well as Number of Coups being a count variable, we could use Poisson Regression
Because none of the plots showed
I have commented out the drop in deviance results for models that do not have the significant drop in deviance selected in order to make the document easier to read.
mod0<-glm(miltcoup~1, data= africa, family = poisson)
modvote<-glm(miltcoup~pctvote, data= africa, family = poisson)
modoli<-glm(miltcoup~oligarchy, data= africa, family = poisson)
modlib<-glm(miltcoup~pollib, data= africa, family = poisson)
modparties<-glm(miltcoup~parties, data= africa, family = poisson)
modpopn<-glm(miltcoup~popn, data= africa, family = poisson)
modsize<-glm(miltcoup~size, data= africa, family = poisson)
modelec<-glm(miltcoup~numelec, data= africa, family = poisson)
modregim<-glm(miltcoup~numregim, data= africa, family = poisson)
df1= 1 because we are considering adding 1 beta in each model. df2= 44 because n-(k+1) = 47 - (2+1) Although pollib is ordinal, I think it makes sense to keep it as one beta rather than treat it as more of a categorical variable and have multiple betas. This assumption changes the interpretation of the variable: it now means that the political liberalization is on a continuous scale from [0,2]. All of our observations happen to be on integers representing levels.
DropInDev(modvote,mod0,1,44) #0.0189
## [1] 0.01892002
DropInDev(modoli,mod0,1,44) #2.462496e-06
## [1] 2.462496e-06
DropInDev(modlib,mod0,1,44) #0.0007202124
## [1] 0.0007202124
# DropInDev(modparties,mod0,1,44)
# DropInDev(modpopn,mod0,1,44)
# DropInDev(modsize,mod0,1,44)
# DropInDev(modelec,mod0,1,44)
DropInDev(modregim,mod0,1,44) #0.01602584
## [1] 0.01602584
The variable "oligarchy" has the largest drop in deviance from predicting with the grand mean model.
summary(modoli)
##
## Call:
## glm(formula = miltcoup ~ oligarchy, family = poisson, data = africa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6894 -1.1334 -0.5946 0.4124 2.2838
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.44280 0.21321 -2.077 0.0378 *
## oligarchy 0.11761 0.01928 6.100 1.06e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 93.971 on 46 degrees of freedom
## Residual deviance: 57.048 on 45 degrees of freedom
## AIC: 134.73
##
## Number of Fisher Scoring iterations: 5
exp(coef(modoli)[2])
## oligarchy
## 1.124802
Every 1 year increase in number years country ruled by military oligarchy is associated w/ a multiplicative change in the mean number of successful military coups by a factor of 1.124802. We will now see if adding any more variables to modoli will significantly drop the deviance:
oli1<-glm(miltcoup~oligarchy+pctvote, data= africa, family = poisson)
oli2<-glm(miltcoup~oligarchy+pollib, data= africa, family = poisson)
oli3<-glm(miltcoup~oligarchy+parties, data= africa, family = poisson)
oli4<-glm(miltcoup~oligarchy+popn, data= africa, family = poisson)
oli5<-glm(miltcoup~oligarchy+size, data= africa, family = poisson)
oli6<-glm(miltcoup~oligarchy+numelec, data= africa, family = poisson)
oli7<-glm(miltcoup~oligarchy+numregim, data= africa, family = poisson)
DropInDev(oli1,modoli,1,44) # 0.02636369
## [1] 0.02636369
DropInDev(oli2,modoli,1,44) # 0.002467215
## [1] 0.002467215
# DropInDev(oli3,modoli,1,44)
# DropInDev(oli4,modoli,1,44)
# DropInDev(oli5,modoli,1,44)
DropInDev(oli6,modoli,1,44) # 0.02606258
## [1] 0.02606258
# DropInDev(oli7,modoli,1,44)
This time, "pollib" causes the most significant drop in deviance from the model with just "oligarchy".
summary(oli2)
##
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib, family = poisson,
## data = africa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4078 -1.1368 -0.3286 0.4070 1.9516
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.41840 0.36544 1.145 0.2522
## oligarchy 0.10079 0.01974 5.107 3.28e-07 ***
## pollib -0.42752 0.19134 -2.234 0.0255 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 79.124 on 41 degrees of freedom
## Residual deviance: 45.498 on 39 degrees of freedom
## (5 observations deleted due to missingness)
## AIC: 125.18
##
## Number of Fisher Scoring iterations: 5
We will now see if adding any more variables to model oli2 will significantly drop the deviance:
modA<-glm(miltcoup~oligarchy+pollib+pctvote, data= africa, family = poisson)
modB<-glm(miltcoup~oligarchy+numelec+pollib, data= africa, family = poisson)
modC<-glm(miltcoup~oligarchy+pollib+parties, data= africa, family = poisson)
modD<-glm(miltcoup~oligarchy+pollib+popn, data= africa, family = poisson)
modE<-glm(miltcoup~oligarchy+pollib+size, data= africa, family = poisson)
modF<-glm(miltcoup~oligarchy+pollib+numregim, data= africa, family = poisson)
modG<-glm(miltcoup~oligarchy+pollib+numelec, data= africa, family = poisson)
DropInDev(modA,oli2,1,44) #0.0121943
## [1] 0.0121943
# DropInDev(modB,oli2,1,44)
# DropInDev(modC,oli2,1,44)
# DropInDev(modD,oli2,1,44)
# DropInDev(modE,oli2,1,44)
# DropInDev(modF,oli2,1,44)
# DropInDev(modG,oli2,1,44)
"pctvote" causes the only significant drop in deviance from the model with "oligarchy" and "pollib."
summary(modA)
##
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + pctvote, family = poisson,
## data = africa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4749 -1.0883 -0.2364 0.4694 2.2862
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.305860 0.428281 0.714 0.4751
## oligarchy 0.097116 0.021094 4.604 4.14e-06 ***
## pollib -0.500663 0.210233 -2.381 0.0172 *
## pctvote 0.006897 0.008837 0.780 0.4351
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 65.945 on 35 degrees of freedom
## Residual deviance: 37.599 on 32 degrees of freedom
## (11 observations deleted due to missingness)
## AIC: 110.41
##
## Number of Fisher Scoring iterations: 5
I will note here, that adding "pctvote" produces a non-significant Wald Test for the variable. Since this forward selection is based on a drop in deviance test, I will continue, but this is something to keep in mind. We will now see if adding any more variables to model modA will significantly drop the deviance:
modH<- glm(miltcoup~oligarchy+pollib+pctvote+parties, data= africa, family = poisson)
modI<- glm(miltcoup~oligarchy+pollib+pctvote+popn, data= africa, family = poisson)
modJ<- glm(miltcoup~oligarchy+pollib+pctvote+size, data= africa, family = poisson)
modK<- glm(miltcoup~oligarchy+pollib+pctvote+numregim, data= africa, family = poisson)
modL<- glm(miltcoup~oligarchy+pollib+pctvote+numelec, data= africa, family = poisson)
DropInDev(modH,modA,1,44) #0.01119189
## [1] 0.01119189
# DropInDev(modI,modA,1,44)
# DropInDev(modJ,modA,1,44)
# DropInDev(modK,modA,1,44)
# DropInDev(modL,modA,1,44)
"parties" causes the only significant drop in deviance from the model with "oligarchy", "pollib", and "pctvote".
summary(modH)
##
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + pctvote + parties,
## family = poisson, data = africa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5456 -0.9841 -0.1881 0.5948 1.6705
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.093657 0.463279 -0.202 0.83979
## oligarchy 0.095358 0.022421 4.253 2.11e-05 ***
## pollib -0.666615 0.217564 -3.064 0.00218 **
## pctvote 0.012134 0.009056 1.340 0.18031
## parties 0.025630 0.009502 2.697 0.00699 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 65.945 on 35 degrees of freedom
## Residual deviance: 31.081 on 31 degrees of freedom
## (11 observations deleted due to missingness)
## AIC: 105.89
##
## Number of Fisher Scoring iterations: 5
It is worth noting again that "pctvote" still does not have a significant Wald Test result.
We will now see if adding any more variables to model modA will significantly drop the deviance:
modM<- glm(miltcoup~oligarchy+pollib+pctvote+parties+popn, data= africa, family = poisson)
modN<-glm(miltcoup~oligarchy+pollib+pctvote+parties+size, data= africa, family = poisson)
modO<- glm(miltcoup~oligarchy+pollib+pctvote+parties+numregim, data= africa, family = poisson)
modP<- glm(miltcoup~oligarchy+pollib+pctvote+parties+numelec, data= africa, family = poisson)
# DropInDev(modM,modH,1,44)
# DropInDev(modN,modH,1,44)
# DropInDev(modO,modH,1,44)
# DropInDev(modP,modH,1,44)
None of these are significant at the 0.05 level so the final model we will go with is:
bestmod<-glm(miltcoup~oligarchy+pollib+pctvote+parties, data= africa, family = poisson)
summary(bestmod)
##
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + pctvote + parties,
## family = poisson, data = africa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5456 -0.9841 -0.1881 0.5948 1.6705
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.093657 0.463279 -0.202 0.83979
## oligarchy 0.095358 0.022421 4.253 2.11e-05 ***
## pollib -0.666615 0.217564 -3.064 0.00218 **
## pctvote 0.012134 0.009056 1.340 0.18031
## parties 0.025630 0.009502 2.697 0.00699 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 65.945 on 35 degrees of freedom
## Residual deviance: 31.081 on 31 degrees of freedom
## (11 observations deleted due to missingness)
## AIC: 105.89
##
## Number of Fisher Scoring iterations: 5
expcoef<-c(exp(coef(bestmod)[2]), exp(coef(bestmod)[3]),exp(coef(bestmod)[4]),exp(coef(bestmod)[5]))
expcoef
## oligarchy pollib pctvote parties
## 1.1000523 0.5134437 1.0122077 1.0259612
According to the Wald Test, "pctvote" is not significant at the alpha = 0.05 level. Looking at the models produced from forward selection with the Drop in Deviance Tests, the AICs continue to decrease through each of our selected models (134.73 > 125.18 > 110.41 > 105.89) Adding "pctvote" to the model with "oligarchy" and "pollib" decreases the AIC by 14.77. Considering the Wald Test is the only test that indicates that "pctvote" shouldn't be added, I will leave it in the model. The other two tests indicate that it is beneficial and worth the extra complexity to keep it in.
Every 1 year increase in number years country ruled by military oligarchy is associated w/ a multiplicative change in the mean number of successful military coups by a factor of 1.1.
Every 1 level increase in the level of political liberalization is associated w/ a multiplicative change in the mean number of successful military coups by a factor of 0.51, holding all else constant.
Every 1% increase in the percent voting in the last election is associated w/ a multiplicative change in the mean number of successful military coups by a factor of 1.01, holding all else constant.
Every increase of one legal political party in 1993 is associated w/ a multiplicative change in the mean number of successful military coups by a factor of 1.03, holding all else constant.
modDP<- dp(bestmod)
modDP
## [1] 0.9294864
Since the dispersion parameter does not equal 1, our mean and variance are not equal. We must take this into account when looking at the standard errors for our betas in any confidence intervals.
confint(bestmod, level =0.95)
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) -1.047913173 0.77344524
## oligarchy 0.051553704 0.13992677
## pollib -1.087836683 -0.23003301
## pctvote -0.005783981 0.02986276
## parties 0.006234406 0.04372873
exp(confint(bestmod, level =0.95)[2,])
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## 1.052906 1.150190
We are 95% certain that the multiplicative effect of 1 additional year that a country is ruled by a military oligarchy from independence to 1989 is between 1.05 and 1.15.
But this is without accounting for the dispersion parameter, which is not 1.
myz<- qnorm(0.975)
betterCI<- bestmod$coefficients[2]+c(-1,1)*myz*coef(summary(bestmod))[2,2]*sqrt(modDP)
exp(betterCI)
## [1] 1.054421 1.147658
Since the dispersion parameter is less than 1, applying it to the confidence interval for our model results in the CI becoming slightly wider. We are 95% certain that the multiplicative effect of 1 additional year that a country is ruled by a military oligarchy from independence to 1989 is between 1.0544 and 1.1477.
Other forward selection methods could produce a different final model.
plot(residuals(bestmod)~predict(bestmod,type="link"),xlab="Predicted log mean # of Successful Coups", ylab="Deviance Residuals", main = "Deviance Residual Plot")
sd<-sd(residuals(bestmod))
abline(h=0, col="red")
#sum(residuals(bestmod)<0)
#sum(residuals(bestmod)>0)
This residual plot is not glaringly poor, but there appears to be some sort of trend. There are 16 observations with a residual greater 0, while there are 20 with a residual less than 0. There does appear to be some sort of trend within the residuals that are less than 0. I am unsure what this indicates and what I can do about it.