Executive Summary

To accurately analyze data, it is crucial to select the appropriate statistical method.

Binary regression is a statistical method used to analyze outcomes where the response variable can take only two values, by estimating the probability of an event occurring. Ward and Ahlquist (2018) emphasize the importance of understanding the assumptions and interpretation of logistic regression coefficients, as well as checking for goodness of fit, influential observations, and multicollinearity. Logistic regression assumes that the relationship between the predictor variables and the outcome variable is linear on the logit scale. Multinomial and ordered logistic regression are similar to binary regression in many ways: they too assume a linear relationship between the predictor variables and the outcome variable on the logit scale. Where they differ is that these two methods are used to analyze categorical outcomes in which the response variable can take on more than two values and may have a natural ordering, whereas binary regression handles only two non-ordered values.

Count outcomes differ from the previous two in several ways. The term refers to situations where the response variable represents the number of times an event occurs in a given period. Poisson, negative binomial, and hurdle models are the statistical methods used to analyze count outcomes, with the appropriate choice depending on the degree of dispersion and on whether there is an excess of zeros (Ward & Ahlquist, 2018). These models assume that the relationship between the predictor variables and the outcome variable is linear on the log scale; the Poisson model additionally assumes that the mean and variance of the response variable are equal, an assumption the negative binomial model relaxes.

Event history data, like count outcomes, involve analyzing the occurrence of events. The two differ, however, in the type of outcome variable being analyzed: event history analysis models the time until a specific event occurs, while count models analyze the number of times an event occurs in a given period. Event history data therefore call for survival models, which assume that the hazard rate (the probability of the event occurring at a specific time, given that it has not yet occurred) is constant over time or follows a specific distribution.

Finally, hierarchical data structures, as explained by Gelman (2007), refer to data sets with a nested structure, where observations are clustered within larger units. This type of data is commonly encountered in the social sciences, with individuals nested within households, neighborhoods, or schools. To properly analyze hierarchical data, it is necessary to account for the clustering effect, as observations within the same group are likely to be more similar to each other than to observations in other groups. Gelman emphasizes the importance of using hierarchical modeling techniques to account for this clustering and to properly estimate the standard errors of the model parameters. These techniques involve modeling the variation at different levels of the hierarchy and estimating the hyperparameters that govern this variation. Properly accounting for the hierarchical structure of the data can improve the accuracy and generalizability of the statistical analysis.

1. Summarize the argument that the authors make relating to H3 and make clear what comparisons they need to make in order to test their hypothesis.

Skiple et al. (2022) argue that judicial docket control mechanisms can have a significant impact on the success rate of private litigants. Specifically, they suggest that judicial systems with mandatory dockets, which lack case selection mechanisms, produce case outcomes that are skewed in favor of high-status litigants compared to judicial systems with discretionary docket control. The reasoning behind this relationship is that less resourceful and/or less experienced litigants are more likely to pursue appeals without taking the probability of winning into account, while more resourceful and more experienced litigants are more likely to settle a case with a low probability of success without going to court. In sum, Skiple et al. argue that courts with mandatory dockets often see a higher number of cases overall, because the court must hear all cases and low-status litigants are more likely to appeal even with low chances of success. This results in a higher proportion of cases that are easy to decide and that are skewed towards a lower probability of success for low-status litigants. In contrast, courts with discretionary docket control have the mandate to filter out cases that are unlikely to succeed, resulting in fewer cases overall and a lower proportion of easy-to-decide cases. Accordingly, they set up H3: Mandatory dockets increase the chance of success for high-status litigants, compared to discretionary dockets. The authors aim to test this hypothesis by comparing the success rates of high-status litigants in two otherwise similar judicial systems in which the docket control mechanisms vary.

2. Replicate the descriptive statistics from Table 2 (p. 122) and visualize the bi- and/or trivariate relationships implied in H3.

Within this section, we replicate Table 2 from p. 122, which presents the descriptive statistics pertaining to the variables that the authors include in their statistical modelling. In addition, we provide a visual depiction of the hypothesized relationship in Figure 1.

Table 2. Descriptive Statistics for Dependent and Independent Variables

| Statistic | N | Mean | St. Dev. | Min | Max |
|---|---|---|---|---|---|
| Nonunanimous | 537 | 0.15 | 0.36 | 0 | 1 |
| Dissent | 537 | 0.13 | 0.34 | 0 | 1 |
| Reversal | 535 | 0.23 | 0.42 | 0 | 1 |
| Government win | 534 | 0.78 | 0.42 | 0 | 1 |
| Docket type: Docket control | 537 | 0.39 | 0.49 | 0 | 1 |
| Docket type: Mandatory | 537 | 0.36 | 0.48 | 0 | 1 |
| Docket type: Mandatory principled | 537 | 0.23 | 0.42 | 0 | 1 |
| Docket type: External appeals board | 537 | 0.02 | 0.15 | 0 | 1 |
| Individuals | 536 | 0.37 | 0.48 | 0 | 1 |

Figure 1 displays a trivariate relationship between the government’s win rate (GWR), docket control mechanisms, and the litigant’s status as defined by the authors. The left side of Figure 1 illustrates the government’s win rate under different docket control mechanisms for litigants who are corporations (i.e., high-status litigants), while the right side of Figure 1 shows the same relationship for individuals (i.e., low-status litigants).

[Figure 1. Government win rate by docket control mechanism, for corporations (left panel) and individuals (right panel)]

Using Figure 1, we are able to demonstrate the authors' theoretical motivation for setting up H3. Specifically, we illustrate that low-status litigants are more likely to lose a case than high-status litigants in judicial systems with mandatory or mandatory principled dockets. Under mandatory and mandatory principled dockets, the government's win rate increases by 13 percentage points (from 84% to 97%, and from 80% to 93%, respectively). By contrast, in judicial systems with discretionary docket control, the government's win rate increases by less than one percentage point when moving from high-status to low-status litigants.

3. Replicate Model 4 in Table 3 (“Government Win”, p. 125) and interpret the results using text, numbers, graphics.


Dependent variable: Government win

| | Model 1 |
|---|---|
| Intercept | 1.68*** (.28) |
| Docket type: Mandatory principled | -.23 (.39) |
| Docket type: Discretionary | -1.01*** (.33) |
| Individuals | 1.81*** (.65) |
| Docket type: Mandatory principled x individuals | -.39 (1.01) |
| Docket type: Discretionary x individuals | -1.90*** (.73) |
| Control variable: Government Appellant | -.75*** (.29) |
| Observations | 521 |
| Log Likelihood | -240.68 |
| Akaike Inf. Crit. | 495.37 |

Notes: Coefficients are logits. Standard errors in parentheses. * p < .1. ** p < .05. *** p < .01.

Text:

The regression table presents the results of our logistic model with government win as the dependent variable and litigant status, docket type, and an interaction between the two as predictors. The findings indicate that both the type of docket and litigant status are statistically significant predictors. Specifically, cases heard under the discretionary docket type are associated with lower odds of a government win than under the reference category, the mandatory docket type: the odds are multiplied by exp(-1.01) ≈ 0.36, i.e., they decrease by roughly 64%, holding all else equal. In contrast, neither the mandatory principled docket type nor its interaction term is statistically significant, which can be understood in light of the mandatory docket type being the reference category. Furthermore, cases with individual litigants are associated with higher odds of a government win. Interestingly, the interaction between the discretionary docket type and individual litigants suggests that these higher odds are largely nullified under the discretionary docket type. However, getting a clearer picture of how this effect materializes requires theoretically grounded scenarios for which we can calculate predicted probabilities of a government win. Finally, it is important to note that the coefficient sizes are not reported in a straightforward manner, as they are presented as logits. Probabilities and their corresponding confidence intervals are therefore a more interpretable way to discuss the findings, and this is exactly what we calculate next.
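
As a quick check, logit coefficients can be converted into odds ratios in R. The following is a minimal sketch, assuming a fitted glm object named m1 (a hypothetical name for our replication of Model 4):

```r
# Convert logit coefficients to odds ratios with 95% Wald intervals.
# `m1` is a hypothetical name for the fitted logistic model, e.g.
# m1 <- glm(gov_win ~ docket * individual + gov_appellant,
#           data = dat, family = binomial)
odds_ratios <- exp(cbind(OR = coef(m1), confint.default(m1)))
round(odds_ratios, 2)  # e.g. exp(-1.01) is roughly 0.36 for the discretionary docket
```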

Numbers and graphics:

To calculate predicted probabilities under different docket types and litigant statuses, we first need to determine which scenarios are worth investigating. We have selected six key scenarios that allow us to observe changes in litigant status under discretionary, mandatory, and mandatory principled dockets, with the government appellant control variable held at its mean. While government appellant is a binary variable in the real world, holding it at its mean is unproblematic for calculating the average change in predicted probability across our scenarios.

| Docket | Litigant | Predicted probability (%) |
|---|---|---|
| Mandatory principled | Individual litigant | 94.06 |
| Mandatory principled | Group litigant | 79.37 |
| Mandatory | Individual litigant | 96.73 |
| Mandatory | Group litigant | 82.92 |
| Discretionary | Individual litigant | 61.86 |
| Discretionary | Group litigant | 63.93 |

The table above presents the absolute values of our calculated probabilities, while the plot shows the predicted probabilities with 95% confidence intervals. The confidence intervals for the probability estimates were calculated using simulations based on the R MASS package. The plot clearly indicates that litigant status has a statistically significant effect under mandatory dockets, as the confidence intervals do not overlap. However, litigant status is not statistically significant under discretionary or mandatory principled dockets, as the confidence intervals overlap.
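
A minimal sketch of the simulation approach, assuming the hypothetical objects m1 (the fitted logit model) and scenarios (a matrix of scenario profiles whose columns match coef(m1)):

```r
library(MASS)  # for mvrnorm()

set.seed(42)
# Draw 1,000 coefficient vectors from the estimated sampling distribution
betas <- mvrnorm(n = 1000, mu = coef(m1), Sigma = vcov(m1))

# `scenarios`: one row per scenario, columns matching coef(m1);
# government appellant held at its mean in every row
p <- plogis(betas %*% t(scenarios))

# Median predicted probability and 95% confidence interval per scenario
apply(p, 2, quantile, probs = c(0.025, 0.5, 0.975))
```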

[Figure 2. Predicted probabilities of a government win with 95% confidence intervals, by docket type and litigant status]

4. Assess the model against a base-line model without interactions. Does the more complex model improve over the base-line model?

In order to assess the two models, we first present the coefficients and standard errors in Table 3. We see that without the interaction term in Model 2, the coefficients for the docket types (and the intercept, which represents the reference docket type, mandatory) become somewhat stronger, and the effect of docket type remains statistically significant. This is to be expected. The effect of individuals, however, becomes weaker in Model 2. This is a first stab at the comparison. For a more thorough assessment, we present the models' Brier scores, comment on the AIC values, and show separation plots and receiver operating characteristic (ROC) curves for both models in what follows, before briefly concluding on the models.

Dependent variable: Government win

| | Model 1 | Model 2 |
|---|---|---|
| Intercept | 1.68*** (.28) | 2.02*** (.26) |
| Docket type: Mandatory principled | -.23 (.39) | -.43 (.35) |
| Docket type: Discretionary | -1.01*** (.33) | -1.51*** (.29) |
| Individuals | 1.81*** (.65) | .58** (.26) |
| Docket type: Mandatory principled x individuals | -.39 (1.01) | |
| Docket type: Discretionary x individuals | -1.90*** (.73) | |
| Control variable: Government Appellant | -.75*** (.29) | -.75*** (.29) |
| Observations | 521 | 521 |
| Log Likelihood | -240.68 | -245.61 |
| Akaike Inf. Crit. | 495.37 | 501.22 |

Notes: Coefficients for Model 1 and Model 2 are logits. Standard errors in parentheses. * p < .1. ** p < .05. *** p < .01.

AIC-scores:

In order to compare the models we have calculated the AIC for each of the two models. As a general rule of thumb, the AIC is a relative measure that can be used to compare different models: the model with the lowest AIC has the best goodness-of-fit. Hence, in this comparison Model 1 (AIC of 495) provides the best goodness-of-fit, though the measure should be used in combination with other measures of model fit and evaluation.

Brier scores:

The Brier score is a measure of the accuracy of probabilistic predictions. It ranges from 0 to 1, where a score of 0 indicates perfect accuracy and a score of 1 indicates no accuracy. To calculate the Brier scores for the two models we use the following formula, which states that a model's Brier score is the mean squared difference between the predicted probability and the actual outcome:

$$\text{Brier} = \frac{1}{N}\sum_{i=1}^{N}(\hat{p}_i - y_i)^2$$

| Model | Brier score |
|---|---|
| Model 1 | 0.22 |
| Model 2 | 0.21 |

Based on the Brier scores of Model 1 and Model 2, Model 2 has the lower score, indicating slightly higher accuracy in predicting the binary outcome.
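
A minimal sketch of the calculation, assuming the hypothetical objects m1 and m2 (the fitted logit models) and y (the observed 0/1 outcome for the estimation sample):

```r
# Brier score: mean squared difference between predicted probability and outcome
brier <- function(model, y) mean((fitted(model) - y)^2)

c(model_1 = brier(m1, y), model_2 = brier(m2, y))  # 0.22 and 0.21 in our data
```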

Separation plots:

To compare the models we have plotted separation plots for each model. These plots help us assess the adequacy of model fit. The vertical lines in the plots represent whether the government won a case or not in the observed data: the purple lines depict observed cases where the government won, while the white lines depict observed cases where the litigant won. Each vertical line is ranked based on the predicted probability of the outcome, visualized by the horizontal black line. The further to the left a vertical line is placed, the smaller the estimated probability of observing this case outcome. When analyzing the separation plots, we see very little difference between the two models. We also see many purple lines scattered throughout the plots, which can be explained by two factors. Firstly, the government on average has a high win rate, as shown in Table 2 of the descriptive statistics (winning 78% of all cases). Secondly, the authors' goal is not to predict the outcome of judicial cases with high accuracy; their aim is to determine whether docket type and litigant status have an impact on the government's win rate in general. For these reasons, we cannot expect the models to accurately predict whether the government or the litigant wins.
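
A minimal sketch using the separationplot package, again assuming the hypothetical model objects m1 and m2:

```r
library(separationplot)

# One separation plot per model; `pred` are the fitted probabilities,
# `actual` the observed 0/1 outcomes used in estimation
separationplot(pred = fitted(m1), actual = m1$y, heading = "Model 1")
separationplot(pred = fitted(m2), actual = m2$y, heading = "Model 2")
```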

ROC curves:

The ROC curve elaborates on the Brier score by showing how capable the model is of identifying true positives while avoiding false positives. The higher the AUC of a ROC curve, the better the model is at providing true positive results (Ward and Ahlquist, 2018). The ROC curves for the two models are shown below. From the AUC values calculated on the two curves, we see little to no noteworthy difference between them: the models are almost equally capable of distinguishing true positives from false positives. The AUC values of 0.75 and 0.74 indicate that the models discriminate between the two outcome types reasonably well without being perfect. Both models are, however, better than random prediction, which corresponds to an AUC of 0.5.
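
A minimal sketch using the pROC package (hypothetical object names as above):

```r
library(pROC)

roc1 <- roc(response = m1$y, predictor = fitted(m1))
roc2 <- roc(response = m2$y, predictor = fitted(m2))

plot(roc1)
lines(roc2, col = "grey")
c(auc(roc1), auc(roc2))  # roughly 0.75 and 0.74 in our data
```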

5. Estimate a linear model and compare the results. Which one would you choose? Justify your choice.

In the following table, we present the regression output for four models. Model 1 and Model 2 are logistic models, with Model 1 including an interaction term and Model 2 excluding it. Model 3 and Model 4 are the corresponding linear regressions, with Model 4 including the interaction term and Model 3 excluding it.
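
A minimal sketch of the four specifications, using hypothetical variable and data frame names:

```r
# Logistic models (Model 1 with interaction, Model 2 without)
m1 <- glm(gov_win ~ docket * individual + gov_appellant,
          data = dat, family = binomial)
m2 <- glm(gov_win ~ docket + individual + gov_appellant,
          data = dat, family = binomial)

# Linear probability models (Model 4 with interaction, Model 3 without)
m4 <- lm(gov_win ~ docket * individual + gov_appellant, data = dat)
m3 <- lm(gov_win ~ docket + individual + gov_appellant, data = dat)
```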

Dependent variable: Government win

| | Model 1 (logistic) | Model 2 (logistic) | Model 3 (OLS) | Model 4 (OLS) |
|---|---|---|---|---|
| Intercept | 1.68*** (.28) | 2.02*** (.26) | .85*** (.04) | .87*** (.03) |
| Docket type: Mandatory principled | -.23 (.39) | -.43 (.35) | -.03 (.06) | -.04 (.05) |
| Docket type: Discretionary | -1.01*** (.33) | -1.51*** (.29) | -.19*** (.05) | -.24*** (.04) |
| Individuals | 1.81*** (.65) | .58** (.26) | .13** (.06) | .08** (.04) |
| Docket type: Mandatory principled x individuals | -.39 (1.01) | | | .01 (.10) |
| Docket type: Discretionary x individuals | -1.90*** (.73) | | | -.15* (.08) |
| Control variable: Government Appellant | -.75*** (.29) | -.75*** (.29) | -.16*** (.05) | -.16*** (.05) |
| AIC | 495.37 | 501.22 | 512.16 | 511.99 |
| McFadden's R² / R² | 0.14 | 0.12 | 0.13 | 0.13 |
| Observations | 521 | 521 | 521 | 521 |

Notes: Coefficients for Model 1 and Model 2 are logits. Standard errors in parentheses. Model 1 and Model 2 are reported with McFadden's R², while Model 3 and Model 4 are reported with regular R². * p < .1. ** p < .05. *** p < .01.

Goodness-of-fit: AIC, R² for the linear models, and McFadden's R² for the logit models

In order to compare the models we have calculated the AIC and an R² value for each of the four models. Once again, the AIC is a relative measure, and the model with the lowest AIC has the best goodness-of-fit; in this comparison, Model 1 provides the best fit. In addition, we can comment on the R² values. First, it should be noted that logit models do not naturally provide R² values; we have therefore calculated McFadden's pseudo-R² for the logit models. For this reason, the logit models' pseudo-R² is not one-to-one comparable with the regular R² of the linear models. As a rough estimate, however, we see that all four models explain only a fraction of the total variance. This is in line with the authors not seeking to explain what predicts the government winning a case, but instead being interested in whether docket type and litigant status are correlated with the win rate. With these goodness-of-fit measures in mind, they all point towards Model 1 as having the best fit. We should note, however, that goodness-of-fit measures alone are not enough to decide which regression is best; we should also discuss in which scenarios a logistic regression is more adequate than a linear model.
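
McFadden's pseudo-R² can be computed directly from the log-likelihoods. A minimal sketch, with the hypothetical object names used above:

```r
# McFadden's pseudo-R-squared: 1 - logLik(model) / logLik(intercept-only model)
mcfadden <- function(model) {
  null_model <- update(model, . ~ 1)  # intercept-only benchmark
  1 - as.numeric(logLik(model)) / as.numeric(logLik(null_model))
}

c(model_1 = mcfadden(m1), model_2 = mcfadden(m2))  # about 0.14 and 0.12 here
```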

Choosing the regression type with regards to dependent variable and research question

There are several reasons why one might choose logistic regression over linear regression. One of the most important is that logistic regression is specifically designed for modeling binary outcomes (such as government win), whereas linear regression assumes that the response variable is continuous and normally distributed. If a binary response variable is analyzed using linear regression, the resulting model will likely have poor predictive accuracy and biased coefficient estimates. For this reason, in this specific context, there is a strong argument for favoring logistic regression over linear regression. Another reason to choose logistic regression is that it allows for the estimation of probabilities, which is useful for predicting the likelihood of a particular event occurring. For example, logistic regression can be used to predict the probability of a particular event happening (i.e., a government win) based on theoretically selected predictors. Once again, this is a compelling reason for favoring logistic regression in this specific case. That said, we fully understand why political scientists with a limited understanding of logit models dread using them: it is tedious to construct specific theory-driven scenarios and transform the logit coefficients into discrete probabilities, compared with the intuitive plug-and-play coefficients of linear models. In this regard, the linear model is more convenient when interpreting coefficients. With regard to rigorous scientific method, however, we cannot accept convenience over the accuracy of our estimates.

In sum, in this specific case we agree with the authors on the choice of using a logistic regression to model the relationship between government win rate, docket type and litigant status.

1. Summarize the argument that the author makes and explain how the theory justifies the choice of model. What is his theoretical question? How does he test it? Does he find support for his expectations?

Scharpf (2020) seeks to explain the rationale behind Latin American governments' deployment of their troops for training missions abroad. His theoretical question is whether this practice is driven by a desire to build stronger military capacity or by a desire to foster political alliances with foreign countries. To test this, Scharpf analyzes data from a survey of military attachés and diplomats from 17 Latin American countries, using, among other methods, a zero-inflated negative binomial regression.

Scharpf establishes that both capacity-building and alliance-forging objectives are significant factors that impel the dispatch of troops for overseas training, but their relative importance diverges from country to country. He finds support for his expectations that countries with weaker domestic military training programs and greater external security threats are more likely to send troops abroad for training, while countries with higher levels of military spending are less likely to use foreign military education.

2. Explore the distribution of the dependent variable. Begin by replicating Figure 1

To start with, we examine the distribution of the dependent variable, the number of SOA courses attended per country per year. Visually illustrating the distribution of the data provides insight into its patterns and allows us to identify outliers and skewness, among other features of the dataset. Furthermore, we need to understand the distribution of the data in order to select appropriate methods for analysis and to assess the possible challenges that may come with it. Without knowing the distribution of our data we cannot draw reliable conclusions from it, since features of the data affect the estimation of parameters and the calculation of probabilities.

Let us begin with an illustration of the dependent variable. As we can see, there is a very large number of small observations compared to high-valued observations, which tells us that the distribution is right-skewed. We back up the visual distribution of the dependent variable with a table of key figures. From this we can once again conclude that more than 25% of our values are zeros. Furthermore, the median and the mean differ by a wide margin, meaning that our data are affected by outliers on one side, in this case the few extreme observations. The distances between the higher quantiles (from the 90th to the 99th) are also shown in order to illustrate how the extreme cases affect the distribution. Lastly, the calculated variance of 6,167.7 tells us that the scores are widely dispersed around the mean. A large variance can be indicative of a skewed or non-normal distribution, which may require different statistical methods for analysis; as mentioned, our data are extremely right-skewed and far from normally distributed.
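
A minimal sketch of these checks, assuming a hypothetical data frame dat with a column courses (SOA courses attended per country-year):

```r
# Key figures for the dependent variable
summary(dat$courses)                                # min, quartiles, mean, max
quantile(dat$courses, probs = c(0.90, 0.95, 0.99))  # upper tail
var(dat$courses)                                    # 6,167.7 in our data
mean(dat$courses == 0)                              # share of zeros

hist(dat$courses, breaks = 50, main = "SOA courses per country-year")
```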

The first histogram shows that there is an excess of zeros in our dataset. But since it measures the number of SOA courses each country attends per year, it raises further questions: is it the same countries attending the courses, or are attendances evenly spread among the observed countries? To better understand the variation among countries on the dependent variable, we have created a histogram showing the total number of courses attended by each country. It illustrates that countries such as Colombia, El Salvador, Peru and Nicaragua are the main attendees of the SOA courses, with Colombia having more than 10,000 attendances. At the opposite end, Cuba and Grenada, among others, attended no courses at all. Our graph shows that most attendances are concentrated among a few countries, while more than half of the countries attended fewer than 1,000 courses.

[Figure: Histogram of SOA courses attended per country per year]
[Figure: Total SOA course attendance by country]

3. Replicate Model 1 in Table 1 (“Course attendance”, p. 742).

In order to gain a better understanding of the data we fit a ZINB regression model exploring the impact of various factors on the number of SOA courses attended. The model was fitted on a dataset comprising 1,532 observations clustered into 33 clusters. The inflation equation predicts the zeros, with negative coefficients indicating a higher likelihood of attending at least one course; positive coefficients in the count equation indicate a higher number of courses attended. The results indicate a statistically significant negative coefficient for similarity with US foreign policy in the inflation equation (β = -0.636, SE = 0.130, p < 0.001), meaning that countries whose foreign policy was more similar to that of the US were more likely to attend at least one course. A statistically significant positive association was also found between guerilla attacks and the number of courses attended (β = 0.875, SE = 0.270, p < 0.001). Other variables, including strike, demonstration, riot, and conventional war, showed no statistically significant association with the number of courses attended. The constant in the count equation was also statistically significant and positive (β = 3.937, SE = 0.142, p < 0.001). The AIC for the model was 10,747, a measure of the relative quality of statistical models for a given dataset that takes into account both goodness of fit and model complexity, where a lower AIC indicates a better fit (Ward & Ahlquist, 2018: 195).
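
A minimal sketch of how such a model can be fitted with the pscl package, using hypothetical variable names (the count equation before the | and the inflation equation after it):

```r
library(pscl)

# Zero-inflated negative binomial: count equation | inflation equation
zinb <- zeroinfl(courses ~ guerilla + strike + demonstration + riot + conv_war |
                   us_similarity,
                 data = dat, dist = "negbin")
summary(zinb)
AIC(zinb)
```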

Zero-inflated Negative Binomial model
Dependent variable: Number of SOA courses attended

| | Model 1 |
|---|---|
| Inflation equation: Similarity with US foreign policy | -0.636*** (0.130) |
| Inflation equation: Constant | 1.303** (0.538) |
| Count equation: Guerilla attack | 0.875*** (0.270) |
| Count equation: Strike | -0.214 (0.139) |
| Count equation: Demonstration | -0.014 (0.116) |
| Count equation: Riot | -0.071 (0.158) |
| Count equation: Conventional war | 0.167 (0.135) |
| Count equation: Constant | 3.937*** (0.142) |
| AIC | 10,747.36 |
| Clusters | 33 |
| Observations | 1586 |
| Zero Observations | 723 |

Note: Results from zero-inflated negative binomial regression. Robust standard errors, clustered on countries. The inflation equation predicts zeros, with negative coefficients indicating a higher probability of attending at least one course. Positive coefficients in the count equation indicate a higher number of courses attended.

4. Assess the dispersion of this model in at least two ways. How would you describe the challenge? Are you satisfied?

To assess the dispersion of the model we focus on two approaches. First, we compare the distribution of the observed and predicted outcomes, which allows us to check the model for overdispersion. Overdispersion means that the variance of the outcome is larger than its mean (Ward and Ahlquist, 2018: 197). This is very common and essentially means that the variation in the data is much greater than what would be expected given the average value of the data (Ward and Ahlquist, 2018: 197). Second, we use a zero-inflated negative binomial regression, also known as a ZINB model.
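
A minimal sketch of two quick dispersion checks, reusing the hypothetical names from above:

```r
# 1) Compare mean and variance of the outcome: a variance far above the
#    mean is a first indication of overdispersion
c(mean = mean(dat$courses), variance = var(dat$courses))

# 2) Formal overdispersion test on a Poisson fit (AER package)
library(AER)
pois <- glm(courses ~ guerilla + strike + demonstration + riot + conv_war +
              us_similarity, data = dat, family = poisson)
dispersiontest(pois)
```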

[Figure: Observed versus predicted number of SOA courses attended]

From the figure above a clear picture forms: zeros clearly dominate the data we have gathered, which supports our earlier findings when we assessed the distribution of the dependent variable. The green columns are a histogram like the one introduced earlier, but with an added element: the predicted values. The histogram lets us compare what we observe in the dependent variable, the number of SOA courses attended per country per year, with what the model predicts we should observe. It shows that we predict too few small observations (zeros and single digits) and too few extreme observations (one hundred and above). This means that the variation in the distribution of attended SOA courses per country per year is much greater than our model implies; in other words, we are facing overdispersion and our standard errors are too small.

As the previous figure illustrated, our data contain an excess of zero and single-digit observations compared to our predicted observations, and hence we have overdispersion (Ward & Ahlquist, 2018: 197). To analyze the data it is then preferable to use a zero-inflated negative binomial regression model. This model assumes that the data are generated by two processes: 1) a process of "always zeros", and 2) observations that are "eligible" for the event (Ward & Ahlquist, 2018: 211). Both processes can generate observed zeros, but only the latter can produce outcomes greater than zero (Ward & Ahlquist, 2018: 211). This model predicts the observations better: the ZINB model, albeit more complex, is better at capturing the excess of zeros as well as the more extreme observations that the previous histogram underpredicted.

We show in this part that there is overdispersion in the data if we do not take the excess of zeros into consideration. We show this through an observed-versus-predicted histogram, where it is visible that there is an excess of zeros and of extreme observations compared to what we would predict based on the average observation. To combat this, we have used a zero-inflated negative binomial regression model that is better equipped to capture the excess of zeros and the extreme observations. An important consequence of overdispersion is incorrect standard errors: when data display overdispersion, the standard errors of parameter estimates tend to be smaller than they should be, which can result in misleading inference. P-values may be erroneously low and confidence intervals narrower than they should be (Ward & Ahlquist, 2018: 200-202). Consequently, it is important to be aware of the potential for overdispersion.

5. Interpret the marginal effects of the model (not the predicted).

To interpret the effects more intuitively than the regression table allows, we can divide the variables into two dimensions, as Scharpf (2020) himself does, clustering them into a diplomatic and a military dimension. The first figure shows that countries whose foreign policy was most similar to US foreign policy (around 4-5 on the x-axis) were highly likely to send their troops to the SOA courses hosted by the US. Comparing this result with the earlier findings, it explains the decisions to a certain degree. For example, Cuba, ruled by Castro's communists, was not similar to the US in its foreign policy, which might explain why the Cubans did not attend any SOA courses in the period. It does not, however, explain everything.

[Figure: Marginal effect of similarity with US foreign policy on SOA course attendance]

We have also calculated the marginal effects for the military dimension, which contains five variables compared to the diplomatic dimension's one. Here we see that the only statistically significant variable is once more the guerilla attack variable: if a government suffered a guerrilla attack by insurgent rebels, the country attended more SOA courses (Scharpf, 2020). The remaining four variables show no statistical significance at either p < 0.1 or p < 0.05, so we will not spend much time discussing them.

[Figure: Marginal effects of the military-dimension variables]

6. Fit an ordinary Poisson model to the same predictors as in model 1. What do the results look like? And how is the dispersion of the model?

We have now fitted an ordinary Poisson model to the same predictors as the ZINB regression and illustrated the results. Note that in the zeroinfl() function earlier, we used a zero-inflated negative binomial distribution because the outcome variable had excess zeros. Also note that we focus only on the military dimension, and thus do not model the diplomatic dimension as Scharpf (2020) does in his analysis.
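
A minimal sketch, reusing the hypothetical pois object fitted in the dispersion check above, of how the clustered standard errors reported below could be obtained (the country column name is an assumption):

```r
library(lmtest)
library(sandwich)

# Cluster-robust standard errors, clustered on countries
coeftest(pois, vcov = vcovCL(pois, cluster = dat$country))
AIC(pois)  # 132,316.04 here, against 10,747.36 for the ZINB
```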

Poisson model
Dependent variable: Number of SOA courses attended

| | Model 2 |
|---|---|
| Constant | 2.545*** (0.555) |
| Strike | -0.013 (0.146) |
| Demonstration | 0.308** (0.139) |
| Riot | -0.195 (0.162) |
| Conventional war | 0.231 (0.190) |
| Similarity with US foreign policy | 0.252 (0.165) |
| Guerilla attack | 1.062*** (0.347) |
| AIC | 132,316.04 |
| Clusters | 33 |
| Observations | 1532 |
| Zero Observations | 676 |

Note: Results from Poisson regression. Robust standard errors, clustered on countries.

Using a Poisson regression on the data, we get a markedly different result from the ZINB regression.

The model was fitted on the exact same dataset, comprising 1,532 observations clustered into 33 clusters. Controlling for all other factors, the coefficient for similarity with US foreign policy is now positive (β = 0.252, SE = 0.165) rather than negative, although it does not reach statistical significance with clustered standard errors. Meanwhile, the variable demonstration shows a positive, statistically significant coefficient (β = 0.308, SE = 0.139, p < 0.05), compared to the previous ZINB regression, where it showed no significant association. Both regressions show a positive, statistically significant constant. What we should be wary of, though, is that the Poisson model has a much larger AIC than the ZINB model: 132,316 against the previous 10,747, indicating that the ZINB model offered a far better trade-off between goodness of fit and model complexity (Ward & Ahlquist, 2018: 195).

But why is the AIC so much higher for the Poisson model, and why do we get different significant results? To answer this, we turn to the illustrations of the Poisson model. From the figure we can see that we severely underpredict zeros, underpredict the values between one and twelve to a certain degree, and then overpredict the observations from fifteen and upwards, including a substantial overprediction around the mean of 38. Wave-like patterns of this kind, together with underprediction of zeros, are consistent with overdispersion, as previously mentioned (Ward & Ahlquist, 2018: 200). The Poisson model assumes that the outcome variable follows a Poisson distribution, which does not allow for excess zeros; if the outcome variable has excess zeros, the Poisson model may therefore not be appropriate, and other models such as the zero-inflated Poisson should be considered. Rootograms are useful for evaluating model fit on both in-sample and out-of-sample data (Ward & Ahlquist, 2018: 200). This is key, because a model might fit the in-sample data well without necessarily generalizing to new data; rootograms are therefore a useful method for evaluating whether a model is accurate and generalizes well. What we can conclude is that the ZINB model was much better suited to the dataset.
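
A minimal sketch of such rootograms, using the countreg package (available from R-Forge) and the hypothetical model objects from above:

```r
# install.packages("countreg", repos = "http://R-Forge.R-project.org")
library(countreg)

# Hanging rootograms: bars hanging below the zero line indicate
# counts the model underpredicts
rootogram(pois, main = "Poisson")
rootogram(zinb, main = "ZINB")
```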

7. Relying on the other fixes we have considered, propose a viable alternative to the zero-inflated/hurdle models. How is the overdispersion?

An alternative way to model the data would be a regular negative binomial (NB) model. The NB model also allows for overdispersion, just like the ZINB model does (Ward & Ahlquist, 2018: 202), so a negative binomial model can be justified in this case; the main reason to prefer a ZI model (whether ZIP or ZINB) is that the underlying theory justifies a separate zero-generating process. To figure out which model best suits our data, with its excess of zeros and extreme outliers, we compare the ZINB, the NB, a zero-inflated Poisson (ZIP) count model (Ward & Ahlquist, 2018: 211-212), and the Poisson model side by side. A few core numbers make it clearly visible that the ZINB model is best suited, followed closely by the NB. The Poisson and ZIP models have AIC values of 132,316 and 75,798 respectively, much higher than the 10,747 and 10,976 of the ZINB and NB. The Poisson model, as previously mentioned, does not take the excess zeros into account, giving us the extreme AIC value (Ward & Ahlquist, 2018: 200). The ZIP and ZINB models both assume that the excess zeros are generated by a process separate from the count data (Ward & Ahlquist, 2018: 211); where they differ is in the count component, which in the ZINB allows for overdispersion while in the ZIP it does not (Ward & Ahlquist, 2018: 201). This makes the ZINB more appropriate than the ZIP given the overdispersion in our count data. The ZINB model also accounts for the excess zeros, whereas the NB, which assumes the data follow a negative binomial distribution, does not (Ward & Ahlquist, 2018: 202).
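
A minimal sketch of the comparison, fitting the two remaining models and collecting the AICs (hypothetical names as above):

```r
library(MASS)  # glm.nb()

# Regular negative binomial
nb <- glm.nb(courses ~ strike + demonstration + riot + conv_war +
               us_similarity + guerilla, data = dat)

# Zero-inflated Poisson with the same specification as the ZINB
zip <- zeroinfl(courses ~ guerilla + strike + demonstration + riot + conv_war |
                  us_similarity, data = dat, dist = "poisson")

AIC(zinb, nb, zip, pois)  # lower AIC indicates better relative fit
```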

Model comparison
Dependent variable: Number of SOA courses attended

| | ZINB-count (1) | NB (2) | ZIP-count (3) | Poisson (4) |
|---|---|---|---|---|
| Constant | 3.937*** (0.069) | 1.813*** (0.185) | 4.027*** (0.007) | 2.545*** (0.014) |
| Strike | -0.214* (0.113) | 0.144 (0.173) | -0.166*** (0.011) | -0.013 (0.011) |
| Demonstration | -0.014 (0.104) | 0.334** (0.155) | 0.040*** (0.009) | 0.308*** (0.010) |
| Riot | -0.071 (0.104) | -0.149 (0.160) | -0.157*** (0.009) | -0.195*** (0.010) |
| Conventional war | 0.167 (0.109) | 0.212 (0.153) | 0.149*** (0.009) | 0.231*** (0.010) |
| Similarity with US foreign policy | | 0.465*** (0.060) | | 0.252*** (0.004) |
| Guerilla attack | 0.875*** (0.107) | 1.276*** (0.162) | 0.821*** (0.008) | 1.062*** (0.009) |
| AIC | 10,747.36 | 10,975.73 | 75,797.75 | 132,316.04 |
| Clusters | 33 | 33 | 33 | 33 |
| Observations | 1532 | 1532 | 1532 | 1532 |
| Zero Observations | 676 | 676 | 676 | 676 |
| Log Likelihood | -5,364.680 | -5,480.865 | -37,890.870 | -66,151.020 |

Note: Results from our regressions. ZINB = zero-inflated negative binomial model. NB = negative binomial model. ZIP = zero-inflated Poisson. Regular standard errors.

8. Which one would you prefer – Scharpf's or yours – and why?

To conclude, we would still prefer Scharpf's model. Not solely because his test statistics (a lower AIC and a statistically significant lrtest) support that the ZINB model fits the data better than a regular NB model, the Poisson, and the ZIP model, but also because the choice of the ZINB model has a theoretical motivation: there are two distinct data-generating processes producing the zeros, a diplomatic logic and a military logic. The ZINB model additionally takes the overdispersion of the data into account, which several of the other models lack.
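
A minimal sketch of such a test, using the hypothetical model objects from above; since the ZINB and NB are not strictly nested, the Vuong test from pscl is a common companion to the likelihood-ratio test referenced above:

```r
library(lmtest)
library(pscl)

lrtest(nb, zinb)  # likelihood-ratio comparison, as referenced above
vuong(zinb, nb)   # Vuong test for non-nested count models
```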