X Wang - Assignment 7

PSCI 7108: Advanced Data Analysis III

1: Count Model

Load the African coup data in the faraway library. Use the command data(africa) to get the data. Type ?africa to learn about the variables.

## Load Data and Libraries ##
library(faraway)
library(arm)
## Loading required package: MASS
## Loading required package: Matrix
## Loading required package: lattice
## 
## Attaching package: 'lattice'
## 
## The following object is masked from 'package:faraway':
## 
##     melanoma
## 
## Loading required package: lme4
## 
## arm (Version 1.6-09, built: 2013-9-23)
## 
## Working directory is /Users/squishy/Documents
## 
## 
## Attaching package: 'arm'
## 
## The following object is masked from 'package:faraway':
## 
##     logit
library(apsrtable)
library(MASS)
data(africa)

## Examine Dataset ##
`?`(africa)
str(africa)  # There are some NAs in the data
## 'data.frame':    47 obs. of  9 variables:
##  $ miltcoup : int  0 5 0 6 2 0 1 3 1 2 ...
##  $ oligarchy: int  0 7 0 13 13 0 0 14 15 0 ...
##  $ pollib   : int  2 1 NA 2 2 2 2 2 2 2 ...
##  $ parties  : int  38 34 7 62 10 34 5 14 27 4 ...
##  $ pctvote  : num  NA 45.7 20.3 17.5 34.4 ...
##  $ popn     : num  9.7 4.6 1.2 8.8 5.3 11.6 0.361 3 5.5 0.458 ...
##  $ size     : num  1247 113 582 274 28 ...
##  $ numelec  : int  0 8 5 5 3 14 2 6 4 6 ...
##  $ numregim : int  1 3 1 3 3 3 1 4 3 2 ...
sum(!complete.cases(africa))  # See how many observations are incomplete
## [1] 11
ok <- complete.cases(africa)  # Index complete cases
africa2 <- africa[ok, ]  # Cleaned up dataset 
## Variables in African coup dataset ##
# miltcoup == number of successful military coups from independence to 1989
# oligarchy == number years country ruled by military oligarchy from independence to 1989
# pollib == political liberalization: 0 = no civil rights for political expression, 1 = limited civil rights for expression but right to form political parties, 2 = full civil rights
# parties == number of legal political parties in 1993
# pctvote == percent voting in last election
# popn == population in millions in 1989
# size == area in 1000 square km
# numelec == total number of legislative and presidential elections
# numregim == number of regime types

1. Develop a Poisson model of the number of coups (ignoring overdispersion for now). Use AIC or BIC to come up with and defend the independent variables chosen.

## Model Building ##

# Create matrix for AIC and BIC comparison between models
compare <- matrix(data = NA, nrow = 2, ncol = 8)
colnames(compare) <- c("m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8")
rownames(compare) <- c("AIC", "BIC")

# m1: add oligarchy - # of years ruled by military oligarchy
m1 <- glm(miltcoup ~ oligarchy, family = poisson, data = africa2)
summary(m1)
## 
## Call:
## glm(formula = miltcoup ~ oligarchy, family = poisson, data = africa2)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.563  -1.245  -0.310   0.274   2.184  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.2545     0.2327   -1.09     0.27    
## oligarchy     0.0998     0.0208    4.79  1.7e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 65.945  on 35  degrees of freedom
## Residual deviance: 42.947  on 34  degrees of freedom
## AIC: 111.8
## 
## Number of Fisher Scoring iterations: 5
compare[1, 1] <- AIC(m1)
compare[2, 1] <- BIC(m1)
compare
##        m1 m2 m3 m4 m5 m6 m7 m8
## AIC 111.8 NA NA NA NA NA NA NA
## BIC 114.9 NA NA NA NA NA NA NA

# m2: add to m1 pollib, political liberalization
m2 <- glm(miltcoup ~ oligarchy + pollib, family = poisson, data = africa2)
summary(m2)
## 
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib, family = poisson, 
##     data = africa2)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.423  -1.135  -0.279   0.410   2.071  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   0.4640     0.3742    1.24    0.215    
## oligarchy     0.0961     0.0209    4.60  4.1e-06 ***
## pollib       -0.4521     0.1992   -2.27    0.023 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 65.945  on 35  degrees of freedom
## Residual deviance: 38.204  on 33  degrees of freedom
## AIC: 109
## 
## Number of Fisher Scoring iterations: 5
compare[1, 2] <- AIC(m2)
compare[2, 2] <- BIC(m2)
compare  # There is at least a difference of 2 comparing the AIC of the m1 and m2, but not comparing the BIC of the two models b/c BIC is more stringent about parsimony. This is a hard to call to make, but keeping m2 since its AIC and BIC are lower (though the BIC casts some doubt whether the unrestricted model is significantly better)
##        m1    m2 m3 m4 m5 m6 m7 m8
## AIC 111.8 109.0 NA NA NA NA NA NA
## BIC 114.9 113.8 NA NA NA NA NA NA

# m3: add to m2 parties, # of political parties in 1993
m3 <- glm(miltcoup ~ oligarchy + pollib + parties, family = poisson, data = africa2)
summary(m3)
## 
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + parties, family = poisson, 
##     data = africa2)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.358  -1.042  -0.286   0.628   1.752  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.25138    0.37269    0.67    0.500    
## oligarchy    0.09262    0.02178    4.25  2.1e-05 ***
## pollib      -0.57410    0.20438   -2.81    0.005 ** 
## parties      0.02206    0.00896    2.46    0.014 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 65.945  on 35  degrees of freedom
## Residual deviance: 32.856  on 32  degrees of freedom
## AIC: 105.7
## 
## Number of Fisher Scoring iterations: 5
compare[1, 3] <- AIC(m3)
compare[2, 3] <- BIC(m3)
compare  # There is at least a difference of 2 comparing the AIC of the m2 and m3, but not comparing the BIC of the two models. This is again a hard to call to make, but keeping m3 since its AIC and BIC are lower (though the BIC casts some doubt whether the unrestricted model is significantly better)
##        m1    m2    m3 m4 m5 m6 m7 m8
## AIC 111.8 109.0 105.7 NA NA NA NA NA
## BIC 114.9 113.8 112.0 NA NA NA NA NA

# m4: add to m3 pctvote, percent voting in last election
m4 <- glm(miltcoup ~ oligarchy + pollib + parties + pctvote, family = poisson, 
    data = africa2)
summary(m4)
## 
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + parties + pctvote, 
##     family = poisson, data = africa2)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.546  -0.984  -0.188   0.595   1.671  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.09366    0.46328   -0.20   0.8398    
## oligarchy    0.09536    0.02242    4.25  2.1e-05 ***
## pollib      -0.66661    0.21756   -3.06   0.0022 ** 
## parties      0.02563    0.00950    2.70   0.0070 ** 
## pctvote      0.01213    0.00906    1.34   0.1803    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 65.945  on 35  degrees of freedom
## Residual deviance: 31.081  on 31  degrees of freedom
## AIC: 105.9
## 
## Number of Fisher Scoring iterations: 5
compare[1, 4] <- AIC(m4)
compare[2, 4] <- BIC(m4)
compare  # Since the AIC and BIC for m4 is higher than for m3, toss pctvote; this is also validated by the fact that the variable is less than 2 SD from 0
##        m1    m2    m3    m4 m5 m6 m7 m8
## AIC 111.8 109.0 105.7 105.9 NA NA NA NA
## BIC 114.9 113.8 112.0 113.8 NA NA NA NA

# m5: add to m3 popn, population in M in 1989
m5 <- glm(miltcoup ~ oligarchy + pollib + parties + popn, family = poisson, 
    data = africa2)
summary(m5)
## 
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + parties + popn, 
##     family = poisson, data = africa2)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.433  -1.066  -0.206   0.585   1.726  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.17853    0.38815    0.46  0.64556    
## oligarchy    0.08365    0.02459    3.40  0.00067 ***
## pollib      -0.55656    0.20851   -2.67  0.00760 ** 
## parties      0.02466    0.00945    2.61  0.00910 ** 
## popn         0.00414    0.00520    0.80  0.42628    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 65.945  on 35  degrees of freedom
## Residual deviance: 32.247  on 31  degrees of freedom
## AIC: 107.1
## 
## Number of Fisher Scoring iterations: 5
compare[1, 5] <- AIC(m5)
compare[2, 5] <- BIC(m5)
compare  # Since the AIC and BIC for m5 is higher than for m3, toss popn; this is also validated by the fact that the variable is less than 2 SD from 0
##        m1    m2    m3    m4    m5 m6 m7 m8
## AIC 111.8 109.0 105.7 105.9 107.1 NA NA NA
## BIC 114.9 113.8 112.0 113.8 115.0 NA NA NA

# m6: add to m3 size, area in 1000 square km
m6 <- glm(miltcoup ~ oligarchy + pollib + parties + size, family = poisson, 
    data = africa2)
summary(m6)
## 
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + parties + size, 
##     family = poisson, data = africa2)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.367  -1.006  -0.348   0.602   1.758  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.353843   0.411563    0.86   0.3899    
## oligarchy    0.095029   0.022148    4.29  1.8e-05 ***
## pollib      -0.600259   0.211167   -2.84   0.0045 ** 
## parties      0.021411   0.009006    2.38   0.0174 *  
## size        -0.000125   0.000226   -0.56   0.5788    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 65.945  on 35  degrees of freedom
## Residual deviance: 32.539  on 31  degrees of freedom
## AIC: 107.3
## 
## Number of Fisher Scoring iterations: 5
compare[1, 6] <- AIC(m6)
compare[2, 6] <- BIC(m6)
compare  # Since the AIC and BIC for m6 is higher than for m3, toss size; this is also validated by the fact that the variable is less than 2 SD from 0
##        m1    m2    m3    m4    m5    m6 m7 m8
## AIC 111.8 109.0 105.7 105.9 107.1 107.3 NA NA
## BIC 114.9 113.8 112.0 113.8 115.0 115.3 NA NA

# m7: add to m1 numelec, total # of elections
m7 <- glm(miltcoup ~ oligarchy + pollib + parties + numelec, family = poisson, 
    data = africa2)
summary(m7)
## 
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + parties + numelec, 
##     family = poisson, data = africa2)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.417  -1.079  -0.252   0.601   1.718  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.00732    0.71708    0.01  0.99185    
## oligarchy    0.09752    0.02520    3.87  0.00011 ***
## pollib      -0.52599    0.23457   -2.24  0.02494 *  
## parties      0.02108    0.00935    2.25  0.02415 *  
## numelec      0.02234    0.05520    0.40  0.68569    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 65.945  on 35  degrees of freedom
## Residual deviance: 32.689  on 31  degrees of freedom
## AIC: 107.5
## 
## Number of Fisher Scoring iterations: 5
compare[1, 7] <- AIC(m7)
compare[2, 7] <- BIC(m7)
compare  # Since the AIC and BIC for m7 is higher than for m3, toss numelec; this is also validated by the fact that the variable is less than 2 SD from 0
##        m1    m2    m3    m4    m5    m6    m7 m8
## AIC 111.8 109.0 105.7 105.9 107.1 107.3 107.5 NA
## BIC 114.9 113.8 112.0 113.8 115.0 115.3 115.4 NA

# m8: add to m3 numregim, # of regime types
m8 <- glm(miltcoup ~ oligarchy + pollib + parties + numregim, family = poisson, 
    data = africa2)
summary(m8)
## 
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + parties + numregim, 
##     family = poisson, data = africa2)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.320  -1.022  -0.335   0.592   1.753  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.05369    0.61884    0.09   0.9309    
## oligarchy    0.09112    0.02232    4.08  4.4e-05 ***
## pollib      -0.57610    0.20367   -2.83   0.0047 ** 
## parties      0.02180    0.00906    2.41   0.0161 *  
## numregim     0.07331    0.18187    0.40   0.6869    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 65.945  on 35  degrees of freedom
## Residual deviance: 32.692  on 31  degrees of freedom
## AIC: 107.5
## 
## Number of Fisher Scoring iterations: 5
compare[1, 8] <- AIC(m8)
compare[2, 8] <- BIC(m8)
compare  # Since the AIC and BIC for m8 is higher than for m3, toss numregim; this is also validated by the fact that the variable is less than 2 SD from 0
##        m1    m2    m3    m4    m5    m6    m7    m8
## AIC 111.8 109.0 105.7 105.9 107.1 107.3 107.5 107.5
## BIC 114.9 113.8 112.0 113.8 115.0 115.3 115.4 115.4

The final Poisson model is m3, which includes oligarchy, pollib, and parties as predictors of the number of successful military coups from independence to 1989.

2. Run the same model using the quasipoisson family. How do your results differ from what you found in part 1.1?

## Quasipoisson Model ##
m3_q <- glm(miltcoup ~ oligarchy + pollib + parties, family = quasipoisson, 
    data = africa2)
# Compare output of Poisson and quasipoisson models using display()
display(m3)
## glm(formula = miltcoup ~ oligarchy + pollib + parties, family = poisson, 
##     data = africa2)
##             coef.est coef.se
## (Intercept)  0.25     0.37  
## oligarchy    0.09     0.02  
## pollib      -0.57     0.20  
## parties      0.02     0.01  
## ---
##   n = 36, k = 4
##   residual deviance = 32.9, null deviance = 65.9 (difference = 33.1)
display(m3_q)  # Results don't differ from m3; overdispersion parameter is 1, which is just Poisson
## glm(formula = miltcoup ~ oligarchy + pollib + parties, family = quasipoisson, 
##     data = africa2)
##             coef.est coef.se
## (Intercept)  0.25     0.37  
## oligarchy    0.09     0.02  
## pollib      -0.57     0.20  
## parties      0.02     0.01  
## ---
##   n = 36, k = 4
##   residual deviance = 32.9, null deviance = 65.9 (difference = 33.1)
##   overdispersion parameter = 1.0
# Compare out of Poisson and quasipoisson models using summary()
summary(m3)
## 
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + parties, family = poisson, 
##     data = africa2)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.358  -1.042  -0.286   0.628   1.752  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.25138    0.37269    0.67    0.500    
## oligarchy    0.09262    0.02178    4.25  2.1e-05 ***
## pollib      -0.57410    0.20438   -2.81    0.005 ** 
## parties      0.02206    0.00896    2.46    0.014 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 65.945  on 35  degrees of freedom
## Residual deviance: 32.856  on 32  degrees of freedom
## AIC: 105.7
## 
## Number of Fisher Scoring iterations: 5
summary(m3_q)  # In fact, the standard errors are slightly different for the two models
## 
## Call:
## glm(formula = miltcoup ~ oligarchy + pollib + parties, family = quasipoisson, 
##     data = africa2)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.358  -1.042  -0.286   0.628   1.752  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.25138    0.36972    0.68  0.50145    
## oligarchy    0.09262    0.02161    4.29  0.00016 ***
## pollib      -0.57410    0.20276   -2.83  0.00795 ** 
## parties      0.02206    0.00888    2.48  0.01846 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 0.9841)
## 
##     Null deviance: 65.945  on 35  degrees of freedom
## Residual deviance: 32.856  on 32  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

The standard errors are slightly different between the Poisson and quasipoisson models with the same predictors, however they are the same up to 2 decimal places. They are similar enough that the overdispersion parameter for the quasipoisson is 1 (equal to Poisson).

3. Run a negative binomial model with the same specification as your Poisson model (use glm.nb()). Do you notice any differences between these and the other results?

## Negative Binomial Model ##
m3_nb <- glm.nb(miltcoup ~ oligarchy + pollib + parties, data = africa2)
## Warning: iteration limit reached
## Warning: iteration limit reached
summary(m3_nb)  # Overdispersion and sd values are unreasonable
## 
## Call:
## glm.nb(formula = miltcoup ~ oligarchy + pollib + parties, data = africa2, 
##     init.theta = 17716.24387, link = log)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.358  -1.042  -0.286   0.628   1.752  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.25137    0.37272    0.67    0.500    
## oligarchy    0.09262    0.02178    4.25  2.1e-05 ***
## pollib      -0.57409    0.20440   -2.81    0.005 ** 
## parties      0.02206    0.00896    2.46    0.014 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(17716) family taken to be 1)
## 
##     Null deviance: 65.939  on 35  degrees of freedom
## Residual deviance: 32.853  on 32  degrees of freedom
## AIC: 107.7
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  17716 
##           Std. Err.:  306762 
## Warning while fitting theta: iteration limit reached 
## 
##  2 x log-likelihood:  -97.67

# Check for multicollinearity
cor(africa2)  # Looks okay
##            miltcoup oligarchy  pollib  parties   pctvote     popn     size
## miltcoup   1.000000   0.60723 -0.3394  0.31106  0.009149  0.35931  0.13528
## oligarchy  0.607234   1.00000 -0.1291  0.15888  0.011192  0.34926  0.30190
## pollib    -0.339422  -0.12907  1.0000  0.17200  0.197183 -0.15424 -0.18475
## parties    0.311060   0.15888  0.1720  1.00000 -0.139612 -0.07516  0.06718
## pctvote    0.009149   0.01119  0.1972 -0.13961  1.000000 -0.17320 -0.09707
## popn       0.359307   0.34926 -0.1542 -0.07516 -0.173196  1.00000  0.42716
## size       0.135279   0.30190 -0.1848  0.06718 -0.097070  0.42716  1.00000
## numelec    0.030250  -0.19042 -0.1660  0.33215  0.233321  0.05891  0.20617
## numregim   0.255535   0.46160 -0.1619  0.19228  0.152832 -0.14044  0.02213
##            numelec numregim
## miltcoup   0.03025  0.25553
## oligarchy -0.19042  0.46160
## pollib    -0.16596 -0.16189
## parties    0.33215  0.19228
## pctvote    0.23332  0.15283
## popn       0.05891 -0.14044
## size       0.20617  0.02213
## numelec    1.00000  0.22991
## numregim   0.22991  1.00000

The output for m3_nb don't show a significant departure from that of m3 and m3_q. The coefficients are the same up to 3 decimal places. As well, the standard errors and z-statistics for the 3 models match up to 1 decimal place. However, the overdispersion parameter is incredibly large. The value seems unreasonable, so this may be a calculational error in estimating the parameter in the glm.nb() function.

4. Report the results of all three models in one LaTeX table. Optional: Plot all three models in a coefficient plot instead of reporting results in a table.

## Create LaTeX Table ##
apsrtable(m3, m3_q, m3_nb)
## \begin{table}[!ht]
## \caption{}
## \label{} 
## \begin{tabular}{ l D{.}{.}{2}D{.}{.}{2}D{.}{.}{2} } 
## \hline 
##   & \multicolumn{ 1 }{ c }{ Model 1 } & \multicolumn{ 1 }{ c }{ Model 2 } & \multicolumn{ 1 }{ c }{ Model 3 } \\ \hline
##  %                        & Model 1     & Model 2     & Model 3    \\ 
## (Intercept)              & 0.25        & 0.25        & 0.25       \\ 
##                          & (0.37)      & (0.37)      & (0.37)     \\ 
## oligarchy                & 0.09 ^*     & 0.09 ^*     & 0.09 ^*    \\ 
##                          & (0.02)      & (0.02)      & (0.02)     \\ 
## pollib                   & -0.57 ^*    & -0.57 ^*    & -0.57 ^*   \\ 
##                          & (0.20)      & (0.20)      & (0.20)     \\ 
## parties                  & 0.02 ^*     & 0.02 ^*     & 0.02 ^*    \\ 
##                          & (0.01)      & (0.01)      & (0.01)     \\ 
## \$$\backslash$theta\$ &             &             & 17716.24   \\ 
##                          &             &             & (306762.18) \\
##  $N$                      & 36          & 36          & 36         \\ 
## AIC                      & 105.66      &  NA         & 107.67     \\ 
## BIC                      & 131.00      &  NA         & 139.34     \\ 
## $\log L$                & -36.83      &  NA         & -33.83      \\ \hline
##  \multicolumn{4}{l}{\footnotesize{Standard errors in parentheses}}\\
## \multicolumn{4}{l}{\footnotesize{$^*$ indicates significance at $p< 0.05 $}} 
## \end{tabular} 
##  \end{table}

See \( \LaTeX \) document for results table.

Using QI simulation, plot the expected number of coups (y-axis) against the range of a continuous or ordinal independent variable of choice (x-axis). Include the point estimate and confidence intervals. Interpret this plot in 1-2 sentences.

## Simulate Coefficients ##
set.seed(23900)  # Set seed
m <- 1000  # Number of simulations

# Simulate coefficients from a multivariate normal
betas <- m3$coef  # Collecting betas
vcv <- vcov(m3)  # Collecting variance-covariance matrix 
sim.betas <- mvrnorm(m, betas, vcv)  # MASS package. Taking 1000 draws from multivariate normal, the mean is betas, vcv estimated from m3 (simulating from sampling distribution of the model) 

# Check to see if the simulated coefficients look like the real results
round(m3$coef, digits = 2)
## (Intercept)   oligarchy      pollib     parties 
##        0.25        0.09       -0.57        0.02
round(head(sim.betas, 10), digits = 2)
##       (Intercept) oligarchy pollib parties
##  [1,]       -0.36      0.14  -0.58    0.02
##  [2,]       -0.48      0.11  -0.33    0.03
##  [3,]        0.10      0.11  -0.67    0.02
##  [4,]       -0.63      0.10  -0.21    0.03
##  [5,]        0.42      0.06  -0.68    0.03
##  [6,]        0.28      0.09  -0.66    0.02
##  [7,]        0.62      0.05  -0.46    0.01
##  [8,]        0.32      0.11  -0.74    0.02
##  [9,]        0.27      0.09  -0.59    0.02
## [10,]        0.05      0.10  -0.35    0.01
data.frame(sim.means = apply(sim.betas, 2, mean), betas = betas, sim.sd = apply(sim.betas, 
    2, sd), se = sqrt(diag(vcv)))  # Mean of simulated betas, actual betas, sd of simulation, actual standard error; Looks pretty close overall
##             sim.means    betas   sim.sd       se
## (Intercept)   0.22576  0.25138 0.372688 0.372689
## oligarchy     0.09405  0.09262 0.022308 0.021779
## pollib       -0.56198 -0.57410 0.202837 0.204383
## parties       0.02206  0.02206 0.009017 0.008955

## Predicted Probability Plot ## Create a sequence of numbers spanning the
## range of length of oligarchy rule
oligarchy.seq <- seq(min(africa2$oligarchy), max(africa2$oligarchy), length = 50)

# Create hypothetical independent variable profile
x.africa2 <- data.frame(intercept = 0, oligarchy = oligarchy.seq, pollib = mean(africa2$pollib), 
    parties = mean(africa2$parties))  # Hold all other variables constant except oligarchy (not always the best)
x.africa2  # We're making this the variable we set to something because we are interested in oligarchy
##    intercept oligarchy pollib parties
## 1          0    0.0000  1.639   17.08
## 2          0    0.3673  1.639   17.08
## 3          0    0.7347  1.639   17.08
## 4          0    1.1020  1.639   17.08
## 5          0    1.4694  1.639   17.08
## 6          0    1.8367  1.639   17.08
## 7          0    2.2041  1.639   17.08
## 8          0    2.5714  1.639   17.08
## 9          0    2.9388  1.639   17.08
## 10         0    3.3061  1.639   17.08
## 11         0    3.6735  1.639   17.08
## 12         0    4.0408  1.639   17.08
## 13         0    4.4082  1.639   17.08
## 14         0    4.7755  1.639   17.08
## 15         0    5.1429  1.639   17.08
## 16         0    5.5102  1.639   17.08
## 17         0    5.8776  1.639   17.08
## 18         0    6.2449  1.639   17.08
## 19         0    6.6122  1.639   17.08
## 20         0    6.9796  1.639   17.08
## 21         0    7.3469  1.639   17.08
## 22         0    7.7143  1.639   17.08
## 23         0    8.0816  1.639   17.08
## 24         0    8.4490  1.639   17.08
## 25         0    8.8163  1.639   17.08
## 26         0    9.1837  1.639   17.08
## 27         0    9.5510  1.639   17.08
## 28         0    9.9184  1.639   17.08
## 29         0   10.2857  1.639   17.08
## 30         0   10.6531  1.639   17.08
## 31         0   11.0204  1.639   17.08
## 32         0   11.3878  1.639   17.08
## 33         0   11.7551  1.639   17.08
## 34         0   12.1224  1.639   17.08
## 35         0   12.4898  1.639   17.08
## 36         0   12.8571  1.639   17.08
## 37         0   13.2245  1.639   17.08
## 38         0   13.5918  1.639   17.08
## 39         0   13.9592  1.639   17.08
## 40         0   14.3265  1.639   17.08
## 41         0   14.6939  1.639   17.08
## 42         0   15.0612  1.639   17.08
## 43         0   15.4286  1.639   17.08
## 44         0   15.7959  1.639   17.08
## 45         0   16.1633  1.639   17.08
## 46         0   16.5306  1.639   17.08
## 47         0   16.8980  1.639   17.08
## 48         0   17.2653  1.639   17.08
## 49         0   17.6327  1.639   17.08
## 50         0   18.0000  1.639   17.08

# Compute the predicted probabilities and confidence intervals using the
# SIMULATED coefficients
ec.sim <- matrix(NA, nrow = m, ncol = length(oligarchy.seq))  # Compute the predicted probabilities 1000 times @ each of 50 education experience levels

for (i in 1:m) {
    ec.sim[i, ] <- exp(as.matrix(x.africa2) %*% sim.betas[i, ])
}

pe <- apply(ec.sim, 2, mean)
lo <- apply(ec.sim, 2, quantile, prob = 0.025)  # 95% CI, 2.5 percentile
hi <- apply(ec.sim, 2, quantile, prob = 0.975)  # 97.5 perentile
# pe, lo, and hi are vectors of length 50

# Make the plot
par(mar = c(4, 5, 0.1, 0.1))
hist(africa2$oligarchy, breaks = 15, col = "gray75", ylim = c(0, 25), main = "", 
    xlab = "", ylab = "", axes = F)
par(new = T)
plot(oligarchy.seq, pe, type = "n", ylim = c(0, 10), xlab = "", ylab = "", axes = F)
abline(v = seq(min(oligarchy.seq), max(oligarchy.seq), length = 10), col = "gray75", 
    lty = 3)
abline(h = seq(10, 25, 2), col = "gray75", lty = 3)
lines(oligarchy.seq, pe, lwd = 3, lty = 1)
lines(oligarchy.seq, lo, lwd = 2, lty = 2)
lines(oligarchy.seq, hi, lwd = 2, lty = 2)
title(ylab = expression("Expected Number of Military Coups"), line = 3.5, cex.lab = 1.5)
title(xlab = expression("Years of Military Oligarchy Rule"), line = 2.75, cex.lab = 1.5)
axis(1)
axis(2)
box()
legend("topleft", bty = "n", c(expression("Point Estimate"), expression("95% Conf. Interval")), 
    lty = c(1, 2), lwd = c(3, 2), cex = 1.25)

plot of chunk unnamed-chunk-6

As the time of military oligarchy rule increases, so does the expected number of military coups. However, as we move along the x-axis, the confidence intervals grow larger because there is more uncertainty. This is because there is less data for values on the right side of the x-axis, shown by the histograms.

2: Simulating Zero-Inflation

Conduct two Monte Carlo simulations showing the difference between a standard Poisson model and a zero-inflated Poisson (ZIP) model when the true DGP does and does not include a zero-inflation component.

  1. Produce a DGP that has true coefficient values for the count model and true coefficient values for the inflation model.
## Zero-Inflated Poisson Random Number Generator ## n is the # of obs, lambda
## is for the Poisson, zprob is the prob of 0 for Bernoulli
rzpois <- function(n, lambda, zprob) {
    ifelse(rbinom(n, 1, zprob) == 1, 0, rpois(n, lambda = lambda))  # This argument works like this: if rbinom()==T, return 0 for observation; else, draw from the Poisson distro
}

# Set true coefficient values for two models
g0 <- -0.1  # True value for the inflation intercept
g1 <- 0.3  # True value for the inflation slope
b0 <- 0.2  # True value for the count intercept
b1 <- 0.5  # True value for the count slope
n <- 1000  # Sample size

2. Generate at least one independent variable that will go in the count model.

set.seed(34003)
x <- runif(n, -1, 1)  # Count independent variable

3. Generate at least one independent variable that will go in the inflation model. This variable should be correlated with one of the variables from part 2.1.

set.seed(23489)
z <- rnorm(n, x, 1)  # Inflation independent variable

4. Create an empty matrix to store the coefficients on the count independent variable(s) for each model.

sims <- 1000
P.coeffs <- matrix(NA, nrow = sims, ncol = 2)  # Empty matrix to store count independent variable coeffs for standard And ZI Poisson models
colnames(P.coeffs) <- c("Pois", "ZIP")

5. In a for loop with at least 1,000 repetitions, generate the dependent variable, y, using the rzpois() function from class. Make sure you enter the systematic component you created (true coefficients and independent variables) correctly. Then estimate the Poisson and ZIP models and store the estimates on the count independent variable(s) from each model.

library(pscl)
## Loading required package: mvtnorm
## Loading required package: coda
## 
## Attaching package: 'coda'
## 
## The following object is masked from 'package:arm':
## 
##     traceplot
## 
## Loading required package: gam
## Loading required package: splines
## Loaded gam 1.09
## 
## Loading required package: vcd
## Loading required package: grid
## Classes and Methods for R developed in the
## 
## Political Science Computational Laboratory
## 
## Department of Political Science
## 
## Stanford University
## 
## Simon Jackman
## 
## hurdle and zeroinfl functions by Achim Zeileis

# For-loop
set.seed(98500)
for (i in 1:sims) {
    # Generate data
    y <- rzpois(n, lambda = exp(b0 + b1 * x), zprob = exp(g0 + g1 * z)/(1 + 
        exp(g0 + g1 * z)))

    # Model w/ Poisson and ZIP, interested in count independent variable x
    mP <- glm(y ~ x, family = poisson)
    mZIP <- zeroinfl(y ~ x | z, dist = "pois")

    # Store coeffs of x
    P.coeffs[i, 1] <- mP$coef[2]
    P.coeffs[i, 2] <- as.numeric(mZIP$coef$count[2])
}

6. Summarize the results by creating a histogram of the estimates stored from each model (i.e., a histogram of the Poisson estimates and another histogram of the ZIP estimates). Optional: Instead of histograms, plot two kernel density lines of the estimates on the same graph. Compute the mean and standard deviation of each set of estimates. Which one got closer to the true coefficient value? Why? Is one more efficient than the other (smaller standard deviation)?

P.coeffs.mean <- apply(P.coeffs, 2, mean)
P.coeffs.sd <- apply(P.coeffs, 2, sd)
P.coeffs.mean
##   Pois    ZIP 
## 0.3608 0.4994
P.coeffs.sd
##    Pois     ZIP 
## 0.08689 0.08377

The ZIP model is closer at estimating the true coefficient of 0.5. Efficiency-wise, both models are similar, though in this instance, the ZIP model's standard deviation is slightly lower (more efficient).

7. Now repeat steps #1–6, but with no zero-inflation in the true DGP. Again, which estimator got closer to your true coefficient value? Why? Is one more efficient than the other (smaller standard deviation)?

# Create another matrix to store count indepdent variable
P.coeffs.nozi <- matrix(NA, nrow = sims, ncol = 2)  # Empty matrix to store Poisson coeffs for standard and ZI Poisson models
colnames(P.coeffs.nozi) <- c("Pois - no ZI", "ZIP - no ZI")

# For-loop
set.seed(10980)
for (i in 1:sims) {
    # Generate data w/o ZI component
    y.nozi <- rzpois(n, lambda = exp(b0 + b1 * x), zprob = 0)  # Generate data w/ no ZI

    # Model w/ Poisson and ZIP, interested in count independent variable x
    mP.nozi <- glm(y.nozi ~ x, family = poisson)
    mZIP.nozi <- zeroinfl(y.nozi ~ x | z, dist = "pois")

    # Store coeffs of x for 2 variations
    P.coeffs.nozi[i, 1] <- mP.nozi$coef[2]
    P.coeffs.nozi[i, 2] <- as.numeric(mZIP.nozi$coef$count[2])
}

# Coefficents matrix for count independent variable
P.coeffs.nozi.mean <- apply(P.coeffs.nozi, 2, mean)
P.coeffs.nozi.sd <- apply(P.coeffs.nozi, 2, sd)
P.coeffs.nozi.mean
## Pois - no ZI  ZIP - no ZI 
##       0.5005       0.4990
P.coeffs.nozi.sd
## Pois - no ZI  ZIP - no ZI 
##       0.0480       0.0497

Both models estimate the true value of the parameter equally well, and both are equally efficient.