Missing Data

About 47% of observations have no value for parties’ propensity for violence (v2paviol_ord). The breakdowns below look for systematic patterns in where those values are missing.

Countries with the most missing data

Top 15 Countries by Missingness in v2paviol_ord
Country                     Total Obs   Missing   Missing (%)
Afghanistan                         5         5         100.0
German Democratic Republic         16        16         100.0
Oman                                9         9         100.0
Papua New Guinea                   64        64         100.0
Republic of Vietnam                17        17         100.0
Solomon Islands                    52        52         100.0
South Yemen                         5         5         100.0
United Arab Emirates                8         8         100.0
Suriname                          134       132          98.5
Cuba                              117       106          90.6
Eswatini                           16        14          87.5
Somalia                            17        14          82.4
Laos                               37        29          78.4
Jordan                             23        18          78.3
Syria                              51        38          74.5

The countries with the highest share of missing values are closed autocracies (e.g., Cuba, Laos, Oman), conflict-ridden states (e.g., Afghanistan, Somalia), and small democracies (e.g., Solomon Islands, Suriname). This points to an underlying challenge in the dataset: where there is no party competition, or where political information is not transparent, coding party attitudes becomes difficult. Complete (100%) missingness in one-party or fragile states highlights the need for caution when generalizing findings globally.
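
For reference, a ranking like the one above can be computed by counting missing values within each group. The following is a minimal Python sketch, not the code that produced the table: the helper name `missing_share_by_group`, the `country` field name, and the toy rows are all hypothetical, and a missing value is assumed to be stored as `None`.

```python
from collections import defaultdict


def missing_share_by_group(rows, group_key, value_key):
    """Return {group: (total, missing, percent_missing)} for a list of dict rows."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        # Assumption: a missing value is represented as None (or an absent key).
        if row.get(value_key) is None:
            missing[group] += 1
    return {
        g: (totals[g], missing[g], round(100.0 * missing[g] / totals[g], 1))
        for g in totals
    }


# Hypothetical toy rows for illustration only; not taken from the real dataset.
toy_rows = [
    {"country": "A", "v2paviol_ord": None},
    {"country": "A", "v2paviol_ord": 2},
    {"country": "B", "v2paviol_ord": None},
    {"country": "B", "v2paviol_ord": None},
]

shares = missing_share_by_group(toy_rows, "country", "v2paviol_ord")

# Rank groups from highest to lowest share of missing values.
ranked = sorted(shares.items(), key=lambda item: item[1][2], reverse=True)
for country, (total, n_missing, pct) in ranked:
    print(country, total, n_missing, pct)
```

The same helper works for any grouping column (regime type, continent, year) by changing `group_key`.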

By regime type

Missing v2paviol_ord by Regime Type
Regime                  Total   Missing   Missing (%)
Closed Autocracy         1831      1327          72.5
Electoral Autocracy      3855      2004          52.0
Electoral Democracy      2909      1187          40.8
Liberal Democracy        3090       975          31.6

Missing data is strongly patterned by regime type. The rate of missingness falls steadily with political openness, from 72.5% in closed autocracies and 52.0% in electoral autocracies to 40.8% in electoral democracies and 31.6% in liberal democracies. This suggests that data is harder to code in regimes with limited political openness, press freedom, and transparency. In closed autocracies, the absence of competitive parties or a censored political environment likely makes coding party stances difficult or meaningless.
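
As a quick consistency check, the percentages in the regime table and the overall 47% figure can be recomputed from the reported counts. This is a small Python sketch that uses only the totals printed above; it assumes the four regime rows cover every observation behind the headline figure.

```python
# (total observations, missing observations) as reported in the regime table.
regime_counts = {
    "Closed Autocracy": (1831, 1327),
    "Electoral Autocracy": (3855, 2004),
    "Electoral Democracy": (2909, 1187),
    "Liberal Democracy": (3090, 975),
}

# Percent missing within each regime type.
rates = {
    regime: round(100.0 * n_missing / total, 1)
    for regime, (total, n_missing) in regime_counts.items()
}

# Overall missingness across all four regime types.
total_obs = sum(total for total, _ in regime_counts.values())
total_missing = sum(n_missing for _, n_missing in regime_counts.values())
overall_rate = round(100.0 * total_missing / total_obs, 1)

# Share of all missing observations that fall in the two autocratic categories.
autocracy_missing = (
    regime_counts["Closed Autocracy"][1] + regime_counts["Electoral Autocracy"][1]
)
autocracy_share = round(100.0 * autocracy_missing / total_missing, 1)

print(rates)
print(total_obs, total_missing, overall_rate)
print(autocracy_share)
```

The recomputed rates match the table, the overall rate comes out at 47.0% (5,493 of 11,685), and the two autocratic categories together hold about 60.6% of the missing observations.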

By country, year and continent

We can further plot missingness by regime type.

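On closer inspection, the missing data is less of a worry than the headline figure suggests, because it is concentrated in autocracies: electoral autocracies hold the largest single block (2,004 of the 5,493 missing observations), and closed and electoral autocracies together account for about 61% of them. Moreover, in the post-Cold War era, observed values still outnumber missing ones.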

We can still do a bunch of things to allay these concerns.

  • We can, as you suggested, just focus on categories other than closed autocracies.
  • We can model the missingness itself to show it’s not random but politically patterned.
  • We can restrict attention to the post-1990 period, where coverage improves significantly.
## 
## =============================================
##                       Dependent variable:    
##                   ---------------------------
##                        missing_v2paviol      
## ---------------------------------------------
## year                       -0.013***         
##                             (0.002)          
##                                              
## v2x_polyarchy              -1.442***         
##                             (0.113)          
##                                              
## Constant                   24.939***         
##                             (3.926)          
##                                              
## ---------------------------------------------
## Observations                 7,712           
## Log Likelihood            -3,704.606         
## Akaike Inf. Crit.          7,415.211         
## =============================================
## Note:             *p<0.1; **p<0.05; ***p<0.01

To check whether the missingness is random, we estimate a logistic regression with a binary missingness indicator (missing_v2paviol) as the outcome and year and v2x_polyarchy as predictors. Both coefficients are negative and significant at the 1% level: earlier years and countries with lower polyarchy scores are significantly more likely to have missing values. The data are therefore not Missing Completely at Random (MCAR). The pattern is consistent with Missing at Random (MAR), i.e., missingness that is predictable from observed variables, although the model cannot rule out dependence on unobserved factors. Because coverage is systematically better in more democratic settings, this can be a good rationale to limit our analysis to just electoral and liberal democracies (maybe even electoral autocracies if we want to include some big cases).
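
To make the coefficients in the regression table easier to read, they can be converted to odds ratios. The Python sketch below uses the rounded estimates exactly as printed above, so the results are approximate. It also assumes that v2x_polyarchy is on its usual 0 to 1 scale, so a one-unit change means moving across the full range of the index.

```python
import math

# Logit coefficients as printed in the regression table (rounded to 3 decimals).
coef = {"year": -0.013, "v2x_polyarchy": -1.442}

# Odds ratio for a one-unit increase in each predictor.
odds_ratio = {name: math.exp(beta) for name, beta in coef.items()}

# Odds ratio for a ten-year step forward in time.
decade_odds_ratio = math.exp(10 * coef["year"])

for name, ratio in odds_ratio.items():
    print(f"{name}: odds ratio per unit = {ratio:.3f}")
print(f"year: odds ratio per decade = {decade_odds_ratio:.3f}")
```

Holding the other predictor fixed, moving from the lowest to the highest polyarchy score multiplies the odds of a missing value by roughly 0.24, and each additional decade multiplies them by roughly 0.88.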