About 47% of observations have no value for the parties’ propensity for violence (v2paviol). The tables below look for systematic patterns in this missingness, first by country and then by regime type.
Country | Total Obs | Missing | Missing (%) |
---|---|---|---|
Afghanistan | 5 | 5 | 100.0 |
German Democratic Republic | 16 | 16 | 100.0 |
Oman | 9 | 9 | 100.0 |
Papua New Guinea | 64 | 64 | 100.0 |
Republic of Vietnam | 17 | 17 | 100.0 |
Solomon Islands | 52 | 52 | 100.0 |
South Yemen | 5 | 5 | 100.0 |
United Arab Emirates | 8 | 8 | 100.0 |
Suriname | 134 | 132 | 98.5 |
Cuba | 117 | 106 | 90.6 |
Eswatini | 16 | 14 | 87.5 |
Somalia | 17 | 14 | 82.4 |
Laos | 37 | 29 | 78.4 |
Jordan | 23 | 18 | 78.3 |
Syria | 51 | 38 | 74.5 |
The countries with the highest share of missing values are closed autocracies (e.g., Cuba, Laos, Oman), conflict-ridden states (e.g., Afghanistan, Somalia), and small democracies (e.g., Solomon Islands, Suriname). This points to an underlying challenge in the dataset: in regimes without party competition, or where political information is not transparent, coding party attitudes becomes difficult. The 100% missingness in several one-party and fragile states calls for caution when generalizing findings globally.
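A per-country summary like the one above can be produced by counting missing values within each group. The following is a minimal sketch, not the code used for this analysis: the records are hypothetical toy rows (countries "A", "B", "C" and their values are invented for illustration), and `missing_by_group` is an illustrative helper name.

```python
from collections import defaultdict

# Hypothetical toy records: (country, v2paviol value, or None when missing).
# These rows are invented for illustration and are not taken from the dataset.
rows = [
    ("A", 0.4), ("A", None), ("A", None), ("A", 1.2),
    ("B", None), ("B", None),
    ("C", -0.3), ("C", 0.8), ("C", None),
]

def missing_by_group(records):
    """Return {group: (total, missing, missing_pct)}, highest missing_pct first."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for group, value in records:
        totals[group] += 1
        if value is None:
            missing[group] += 1
    summary = {
        g: (totals[g], missing[g], round(100 * missing[g] / totals[g], 1))
        for g in totals
    }
    # Sort by the percentage column, descending, as in the table.
    return dict(sorted(summary.items(), key=lambda kv: kv[1][2], reverse=True))

summary = missing_by_group(rows)
for country, (total, miss, pct) in summary.items():
    print(f"{country} | {total} | {miss} | {pct} |")
```

Sorting by the percentage rather than the raw count puts small countries with complete missingness at the top, which is why the Total Obs column is worth reading alongside it.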
Regime | Total | Missing | Missing (%) |
---|---|---|---|
Closed Autocracy | 1831 | 1327 | 72.5 |
Electoral Autocracy | 3855 | 2004 | 52.0 |
Electoral Democracy | 2909 | 1187 | 40.8 |
Liberal Democracy | 3090 | 975 | 31.6 |
Missing data is strongly patterned by regime type. The rate of missingness is highest in closed autocracies (72.5%) and electoral autocracies (52.0%), and substantially lower in electoral democracies (40.8%) and liberal democracies (31.6%). This suggests that data is harder to code in regimes with limited political openness, press freedom, and transparency. In closed autocracies, the absence of competitive parties and a censored political environment likely make coding party stances difficult or even meaningless.
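The percentages in the regime table can be checked directly from its counts, and the same counts give each regime's share of all missing values, which is a different quantity from the within-regime rate. The sketch below only re-does arithmetic on the numbers in the table; the variable names are illustrative.

```python
# (total observations, missing observations), copied from the regime table.
regimes = {
    "Closed Autocracy": (1831, 1327),
    "Electoral Autocracy": (3855, 2004),
    "Electoral Democracy": (2909, 1187),
    "Liberal Democracy": (3090, 975),
}

total_obs = sum(total for total, _ in regimes.values())
total_missing = sum(miss for _, miss in regimes.values())

# Overall share of observations with a missing value.
overall_pct = round(100 * total_missing / total_obs, 1)

# Within-regime missingness rate (the "Missing (%)" column).
within = {name: round(100 * m / t, 1) for name, (t, m) in regimes.items()}

# Each regime's share of all missing values.
share_of_missing = {
    name: round(100 * m / total_missing, 1) for name, (_, m) in regimes.items()
}

print(f"Overall missing: {overall_pct}%")
for name in regimes:
    print(f"{name}: {within[name]}% missing within regime, "
          f"{share_of_missing[name]}% of all missing values")
```

Closed autocracies have the highest rate of missingness, but electoral autocracies contribute the largest number of missing observations because they have more observations overall.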
We can further plot the missingness by regime type.
On closer inspection, the missing data is less of a worry than it first appears, because the largest share of it is concentrated in electoral autocracies. Moreover, in the post-Cold War era, observed values still outnumber missing ones.
Several further checks can help allay these concerns.
Logistic regression; dependent variable: missing_v2paviol
Term | Coefficient | Std. Error |
---|---|---|
year | -0.013*** | (0.002) |
v2x_polyarchy | -1.442*** | (0.113) |
Constant | 24.939*** | (3.926) |
Observations | 7,712 | |
Log Likelihood | -3,704.606 | |
Akaike Inf. Crit. | 7,415.211 | |
Note: *p<0.1; **p<0.05; ***p<0.01
To check whether the missingness is systematic rather than random, we estimate a logistic regression with a binary missingness indicator as the outcome and year and polyarchy score as predictors. The results show that observations from earlier years and from countries with lower polyarchy scores are significantly more likely to have missing values, which matches the regime-type pattern above. This indicates that the data are not Missing Completely at Random (MCAR). The pattern is consistent with Missing at Random (MAR), i.e., missingness that is predictable from observed variables, although a model of this kind cannot rule out dependence on unobserved values. Systematic missingness does not make the problem disappear, but it does give a rationale for limiting the analysis to electoral and liberal democracies (and perhaps electoral autocracies, if we want to include some big cases).
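Logit coefficients are easier to read as odds ratios. The sketch below converts the two reported slopes; it assumes v2x_polyarchy is measured on a 0 to 1 scale, and it uses the rounded coefficients from the table, so the results are approximate. The helper name `odds_ratio` is illustrative. Predicted probabilities are deliberately not computed, since rounding the year coefficient to three decimals would shift the linear predictor substantially once multiplied by a calendar year.

```python
import math

# Rounded coefficients as reported in the regression table.
coef = {"year": -0.013, "v2x_polyarchy": -1.442}

def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds of missingness for a change of `delta`."""
    return math.exp(beta * delta)

# Ten years later, holding polyarchy fixed.
or_decade = odds_ratio(coef["year"], 10)

# A 0.1 increase in polyarchy (assumed 0 to 1 scale), holding year fixed.
or_polyarchy_tenth = odds_ratio(coef["v2x_polyarchy"], 0.1)

# Moving across the full assumed range of polyarchy, from 0 to 1.
or_polyarchy_full = odds_ratio(coef["v2x_polyarchy"], 1.0)

print(f"Odds ratio per decade: {or_decade:.3f}")
print(f"Odds ratio per 0.1 polyarchy: {or_polyarchy_tenth:.3f}")
print(f"Odds ratio, polyarchy 0 to 1: {or_polyarchy_full:.3f}")
```

Values below 1 mean lower odds of a missing value: roughly 12% lower per decade, and roughly 13% lower for each 0.1 increase in polyarchy, under the stated assumptions.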