Quiz instructions: https://www.dropbox.com/s/fnzft6vyjlrictu/Quiz2.pdf?dl=0

dp<-function(modelname){
  value<-sum(residuals(modelname, type="pearson")^2 / modelname$df.res) 
  return(value)
}

DropInDev<-function(bigMod, smallMod,Df1,Df2){
  DropinD<-deviance(smallMod)-deviance(bigMod)
  disp<- dp(bigMod)
  teststat<- DropinD/(disp*Df1)
  return(pf(teststat,df1=Df1,df2=Df2,lower.tail=FALSE))
}

library(faraway)
attach(africa)

*** A note from me after I had done all of this quiz:

sum(is.na(africa))

## [1] 11

getOption("na.action")

## [1] "na.omit"

There are missing values in the dataset, but my default na.action is na.omit. The problem with this is that the number of observations in in each model may be different. This also throws off the 2nd degrees of freedom in the drop in deviance test. I do not have time to fix this as I realized it after I finished everything. If I were to do this again, I would have made sure to omit the rows with missing values. My bad. # Data Exploration

hist(miltcoup)

The response variable appears to be strictly non-negative and heavily skewed to the right. This is one sign that we should use Poisson Regression.

Unless noted, all variables are quantitative Side by side plots are not meant to be compared. They are also all commented out for brevity.

# par(mfrow=c(1,2)) 
# plot(miltcoup~pctvote)
# plot(miltcoup~oligarchy)
# boxplot(miltcoup~pollib)
# # Pollib boxplot shows that there is a downward trend in the median number of successful coups as more civil rights are granted.
# #This variable is a ordinal w/ 3 levels
# par(mfrow=c(1,2)) 
# plot(miltcoup~parties)
# plot(miltcoup~popn)
# par(mfrow=c(1,2)) 
# plot(miltcoup~size)
# plot(miltcoup~numelec)
# plot(miltcoup~numregim)

Lets check to see if the log(miltcoup)~ any of these is better:

# par(mfrow=c(1,2)) 
# plot(log(miltcoup)~parties)
# plot(miltcoup~parties)
# par(mfrow=c(1,2)) 
# plot(log(miltcoup)~popn)
# plot(miltcoup~popn)
# par(mfrow=c(1,2)) 
# plot(log(miltcoup)~size)
# plot(miltcoup~size)
# par(mfrow=c(1,2)) 
# plot(log(miltcoup)~pctvote)
# plot(miltcoup~pctvote)

Some of these plots look better with the log() because there are fewer observations sitting at y=0, so a line could fit the data better.

Because of this, the skewness of the coups histogram, as well as Number of Coups being a count variable, we could use Poisson Regression

Because none of the plots showed

Forward Selection using Drop in Deviance with at least 0.05 significance

I have commented out the drop in deviance results for models that do not have the significant drop in deviance selected in order to make the document easier to read.

mod0<-glm(miltcoup~1, data= africa, family = poisson)

modvote<-glm(miltcoup~pctvote, data= africa, family = poisson) 
modoli<-glm(miltcoup~oligarchy, data= africa, family = poisson) 
modlib<-glm(miltcoup~pollib, data= africa, family = poisson) 
modparties<-glm(miltcoup~parties, data= africa, family = poisson) 
modpopn<-glm(miltcoup~popn, data= africa, family = poisson) 
modsize<-glm(miltcoup~size, data= africa, family = poisson) 
modelec<-glm(miltcoup~numelec, data= africa, family = poisson)
modregim<-glm(miltcoup~numregim, data= africa, family = poisson)

df1= 1 because we are considering adding 1 beta in each model. df2= 44 because n-(k+1) = 47 - (2+1) Although pollib is ordinal, I think it makes sense to keep it as one beta rather than treat it as more of a categorical variable and have multiple betas. This assumption changes the interpretation of the variable: it now means that the political liberalization is on a continuous scale from [0,2]. All of our observations happen to be on integers representing levels.

DropInDev(modvote,mod0,1,44) #0.0189

## [1] 0.01892002

DropInDev(modoli,mod0,1,44) #2.462496e-06

## [1] 2.462496e-06

DropInDev(modlib,mod0,1,44) #0.0007202124

## [1] 0.0007202124

# DropInDev(modparties,mod0,1,44)
# DropInDev(modpopn,mod0,1,44)
# DropInDev(modsize,mod0,1,44)
# DropInDev(modelec,mod0,1,44)
DropInDev(modregim,mod0,1,44) #0.01602584

## [1] 0.01602584

The variable "oligarchy" has the largest drop in deviance from predicting with the grand mean model.

summary(modoli)

## 
## Call:
## glm(formula = miltcoup ~ oligarchy, family = poisson, data = africa)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6894  -1.1334  -0.5946   0.4124   2.2838  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.44280    0.21321  -2.077   0.0378 *  
## oligarchy    0.11761    0.01928   6.100 1.06e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 93.971  on 46  degrees of freedom
## Residual deviance: 57.048  on 45  degrees of freedom
## AIC: 134.73
## 
## Number of Fisher Scoring iterations: 5

exp(coef(modoli)[2])

## oligarchy 
##  1.124802

Every 1 year increase in number years country ruled by military oligarchy is associated w/ a multiplicative change in the mean number of successful military coups by a factor of 1.124802. We will now see if adding any more variables to modoli will significantly drop the deviance:

oli1<-glm(miltcoup~oligarchy+pctvote, data= africa, family = poisson) 
oli2<-glm(miltcoup~oligarchy+pollib, data= africa, family = poisson) 
oli3<-glm(miltcoup~oligarchy+parties, data= africa, family = poisson) 
oli4<-glm(miltcoup~oligarchy+popn, data= africa, family = poisson) 
oli5<-glm(miltcoup~oligarchy+size, data= africa, family = poisson) 
oli6<-glm(miltcoup~oligarchy+numelec, data= africa, family = poisson) 
oli7<-glm(miltcoup~oligarchy+numregim, data= africa, family = poisson) 

DropInDev(oli1,modoli,1,44) # 0.02636369

## [1] 0.02636369

DropInDev(oli2,modoli,1,44) # 0.002467215

## [1] 0.002467215

# DropInDev(oli3,modoli,1,44)
# DropInDev(oli4,modoli,1,44)
# DropInDev(oli5,modoli,1,44)
DropInDev(oli6,modoli,1,44) # 0.02606258

## [1] 0.02606258

# DropInDev(oli7,modoli,1,44)

This time, "pollib" causes the most significant drop in deviance from the model with just "oligarchy".

summary(oli2)

## 
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib, family = poisson, 
##     data = africa)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4078  -1.1368  -0.3286   0.4070   1.9516  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.41840    0.36544   1.145   0.2522    
## oligarchy    0.10079    0.01974   5.107 3.28e-07 ***
## pollib      -0.42752    0.19134  -2.234   0.0255 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 79.124  on 41  degrees of freedom
## Residual deviance: 45.498  on 39  degrees of freedom
##   (5 observations deleted due to missingness)
## AIC: 125.18
## 
## Number of Fisher Scoring iterations: 5

We will now see if adding any more variables to model oli2 will significantly drop the deviance:

modA<-glm(miltcoup~oligarchy+pollib+pctvote, data= africa, family = poisson) 
modB<-glm(miltcoup~oligarchy+numelec+pollib, data= africa, family = poisson) 
modC<-glm(miltcoup~oligarchy+pollib+parties, data= africa, family = poisson) 
modD<-glm(miltcoup~oligarchy+pollib+popn, data= africa, family = poisson) 
modE<-glm(miltcoup~oligarchy+pollib+size, data= africa, family = poisson) 
modF<-glm(miltcoup~oligarchy+pollib+numregim, data= africa, family = poisson) 
modG<-glm(miltcoup~oligarchy+pollib+numelec, data= africa, family = poisson) 

DropInDev(modA,oli2,1,44) #0.0121943

## [1] 0.0121943

# DropInDev(modB,oli2,1,44)
# DropInDev(modC,oli2,1,44)
# DropInDev(modD,oli2,1,44)
# DropInDev(modE,oli2,1,44)
# DropInDev(modF,oli2,1,44)
# DropInDev(modG,oli2,1,44)

"pctvote" causes the only significant drop in deviance from the model with "oligarchy" and "pollib."

summary(modA)

## 
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + pctvote, family = poisson, 
##     data = africa)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4749  -1.0883  -0.2364   0.4694   2.2862  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.305860   0.428281   0.714   0.4751    
## oligarchy    0.097116   0.021094   4.604 4.14e-06 ***
## pollib      -0.500663   0.210233  -2.381   0.0172 *  
## pctvote      0.006897   0.008837   0.780   0.4351    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 65.945  on 35  degrees of freedom
## Residual deviance: 37.599  on 32  degrees of freedom
##   (11 observations deleted due to missingness)
## AIC: 110.41
## 
## Number of Fisher Scoring iterations: 5

I will note here, that adding "pctvote" produces a non-significant Wald Test for the variable. Since this forward selection is based on a drop in deviance test, I will continue, but this is something to keep in mind. We will now see if adding any more variables to model modA will significantly drop the deviance:

modH<- glm(miltcoup~oligarchy+pollib+pctvote+parties, data= africa, family = poisson) 
modI<- glm(miltcoup~oligarchy+pollib+pctvote+popn, data= africa, family = poisson) 
modJ<- glm(miltcoup~oligarchy+pollib+pctvote+size, data= africa, family = poisson) 
modK<- glm(miltcoup~oligarchy+pollib+pctvote+numregim, data= africa, family = poisson) 
modL<- glm(miltcoup~oligarchy+pollib+pctvote+numelec, data= africa, family = poisson) 

DropInDev(modH,modA,1,44) #0.01119189

## [1] 0.01119189

# DropInDev(modI,modA,1,44)
# DropInDev(modJ,modA,1,44)
# DropInDev(modK,modA,1,44)
# DropInDev(modL,modA,1,44)

"parties" causes the only significant drop in deviance from the model with "oligarchy", "pollib", and "pctvote".

summary(modH)

## 
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + pctvote + parties, 
##     family = poisson, data = africa)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5456  -0.9841  -0.1881   0.5948   1.6705  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.093657   0.463279  -0.202  0.83979    
## oligarchy    0.095358   0.022421   4.253 2.11e-05 ***
## pollib      -0.666615   0.217564  -3.064  0.00218 ** 
## pctvote      0.012134   0.009056   1.340  0.18031    
## parties      0.025630   0.009502   2.697  0.00699 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 65.945  on 35  degrees of freedom
## Residual deviance: 31.081  on 31  degrees of freedom
##   (11 observations deleted due to missingness)
## AIC: 105.89
## 
## Number of Fisher Scoring iterations: 5

It is worth noting again that "pctvote" still does not have a significant Wald Test result.

We will now see if adding any more variables to model modA will significantly drop the deviance:

modM<- glm(miltcoup~oligarchy+pollib+pctvote+parties+popn, data= africa, family = poisson) 
modN<-glm(miltcoup~oligarchy+pollib+pctvote+parties+size, data= africa, family = poisson) 
modO<- glm(miltcoup~oligarchy+pollib+pctvote+parties+numregim, data= africa, family = poisson) 
modP<- glm(miltcoup~oligarchy+pollib+pctvote+parties+numelec, data= africa, family = poisson) 

# DropInDev(modM,modH,1,44)
# DropInDev(modN,modH,1,44)
# DropInDev(modO,modH,1,44)
# DropInDev(modP,modH,1,44)

None of these are significant at the 0.05 level so the final model we will go with is:

bestmod<-glm(miltcoup~oligarchy+pollib+pctvote+parties, data= africa, family = poisson) 
summary(bestmod)

## 
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + pctvote + parties, 
##     family = poisson, data = africa)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5456  -0.9841  -0.1881   0.5948   1.6705  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.093657   0.463279  -0.202  0.83979    
## oligarchy    0.095358   0.022421   4.253 2.11e-05 ***
## pollib      -0.666615   0.217564  -3.064  0.00218 ** 
## pctvote      0.012134   0.009056   1.340  0.18031    
## parties      0.025630   0.009502   2.697  0.00699 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 65.945  on 35  degrees of freedom
## Residual deviance: 31.081  on 31  degrees of freedom
##   (11 observations deleted due to missingness)
## AIC: 105.89
## 
## Number of Fisher Scoring iterations: 5

expcoef<-c(exp(coef(bestmod)[2]), exp(coef(bestmod)[3]),exp(coef(bestmod)[4]),exp(coef(bestmod)[5]))
expcoef

## oligarchy    pollib   pctvote   parties 
## 1.1000523 0.5134437 1.0122077 1.0259612

According to the Wald Test, "pctvote" is not significant at the alpha = 0.05 level. Looking at the models produced from forward selection with the Drop in Deviance Tests, the AICs continue to decrease through each of our selected models (134.73 > 125.18 > 110.41 > 105.89) Adding "pctvote" to the model with "oligarchy" and "pollib" decreases the AIC by 14.77. Considering the Wald Test is the only test that indicates that "pctvote" shouldn't be added, I will leave it in the model. The other two tests indicate that it is beneficial and worth the extra complexity to keep it in.

Interpreting the Coefficients

Every 1 year increase in number years country ruled by military oligarchy is associated w/ a multiplicative change in the mean number of successful military coups by a factor of 1.1.

Every 1 level increase in the level of political liberalization is associated w/ a multiplicative change in the mean number of successful military coups by a factor of 0.51, holding all else constant.

Every 1% increase in the percent voting in the last election is associated w/ a multiplicative change in the mean number of successful military coups by a factor of 1.01, holding all else constant.

Every increase of one legal political party in 1993 is associated w/ a multiplicative change in the mean number of successful military coups by a factor of 1.03, holding all else constant.

Checking Dispersion

modDP<- dp(bestmod)
modDP

## [1] 0.9294864

Since the dispersion parameter does not equal 1, our mean and variance are not equal. We must take this into account when looking at the standard errors for our betas in any confidence intervals.

Confidence Interval for Oligarchy

confint(bestmod, level =0.95)

## Waiting for profiling to be done...

##                    2.5 %      97.5 %
## (Intercept) -1.047913173  0.77344524
## oligarchy    0.051553704  0.13992677
## pollib      -1.087836683 -0.23003301
## pctvote     -0.005783981  0.02986276
## parties      0.006234406  0.04372873

exp(confint(bestmod, level =0.95)[2,])

## Waiting for profiling to be done...

##    2.5 %   97.5 % 
## 1.052906 1.150190

We are 95% certain that the multiplicative effect of 1 additional year that a country is ruled by a military oligarchy from independence to 1989 is between 1.05 and 1.15.

But this is without accounting for the dispersion parameter, which is not 1.

myz<- qnorm(0.975)
betterCI<- bestmod$coefficients[2]+c(-1,1)*myz*coef(summary(bestmod))[2,2]*sqrt(modDP)
exp(betterCI)

## [1] 1.054421 1.147658

Since the dispersion parameter is less than 1, applying it to the confidence interval for our model results in the CI becoming slightly wider. We are 95% certain that the multiplicative effect of 1 additional year that a country is ruled by a military oligarchy from independence to 1989 is between 1.0544 and 1.1477.

Other forward selection methods could produce a different final model.

Residuals

plot(residuals(bestmod)~predict(bestmod,type="link"),xlab="Predicted log mean # of Successful Coups", ylab="Deviance Residuals", main = "Deviance Residual Plot")
sd<-sd(residuals(bestmod))
abline(h=0, col="red")

#sum(residuals(bestmod)<0)
#sum(residuals(bestmod)>0)

This residual plot is not glaringly poor, but there appears to be some sort of trend. There are 16 observations with a residual greater 0, while there are 20 with a residual less than 0. There does appear to be some sort of trend within the residuals that are less than 0. I am unsure what this indicates and what I can do about it.

Quiz 2: Poisson Regression

Cheyanne Simpson

3/10/2021

Forward Selection using Drop in Deviance with at least 0.05 significance

Interpreting the Coefficients

Checking Dispersion

Confidence Interval for Oligarchy

Residuals