PCA: the Principal Components of Autocracies

Introduction

Fukuyama (1992) pronounced liberal democracy the end-point of mankind’s ideological evolution and the final form of human government. “But supposing the world has become ‘filled up’ with liberal democracies, such that there exist no tyranny and oppression worthy of the name against which to struggle? (…) if men cannot struggle on behalf of a just cause (…) they will struggle against the just cause. (…) they will struggle against that peace and prosperity, and against democracy,” he wrote over three decades ago, effectively predicting the third wave of autocratization that followed the 2008 financial crisis (Maerz et al., 2019). Now – during the time of monsters, of the old world dying and a new one struggling to be born (Gramsci, 1926) – might be the time to turn one’s eyes towards the chimera-like bodies of autocracies.

The aim of this project is to analyze the dimensionality of democracy (and hence, autocracy) in an attempt to challenge the clear-cut schism between these regimes, which are often treated as opposite ends of a single, one-dimensional spectrum. Dimension reduction methods (PCA and rPCA) will be applied to a multi-dimensional dataset constructed by Varieties of Democracy (V-Dem), a collaborative project of the V-Dem Institute (University of Gothenburg) that provides granular data on measures of democracy. The magnum opus of the V-Dem Institute, besides the dataset, is the Electoral Democracy Index (EDI, Polyarchy Index), which measures democracy on a scale of 0 to 1.

This project is structured as follows: the first section provides an overview of the employed data and describes the pre-processing: variable exclusion, the handling of missing values and the necessary transformations. Subsequently, the data is inspected for suitability for dimension reduction methods: the correlation matrix is analyzed and Bartlett’s and K-M-O tests are carried out. Afterwards, the Principal Component Analysis algorithms (PCA, rPCA) are run and their results analyzed. The results section compares the findings of this project with the Polyarchy Index.

Data Description and Pre-processing

This project is based on the Varieties of Democracy (V-Dem) dataset. It is a multidimensional dataset distinguishing between five key principles of democracy:

electoral,
liberal,
participatory,
deliberative,
egalitarian.

These principles are captured through a wide variety of variables, reflecting the complexity of democratic processes, which extends beyond the mere presence of elections. The dataset is compiled by a team of over 50 social scientists working with more than 3,000 local experts and a global International Advisory Board.

First, the dataset will be loaded into the environment using an R package dedicated to V-Dem data (vdemdata). The package contains the most recent V-Dem and V-Party datasets and provides additional functions, such as:

var_info(), which prints out basic information of a specific variable based on the V-Dem codebook,
find_var(), which allows for searching variables using keywords,
plot_indicator(), which plots V-Dem indicators for exploratory analysis.

devtools::install_github("vdeminstitute/vdemdata")

## Skipping install of 'vdemdata' from a github remote, the SHA1 (c7836967) has not changed since last install.
##   Use `force = TRUE` to force installation

vdem <- vdemdata::vdem

Now that the data is loaded, let’s inspect its dimensionality and content:

dim(vdem)

## [1] 27913  4607

str(vdem)

## 'data.frame':    27913 obs. of  4607 variables:
##  $ country_name                   : chr  "Mexico" "Mexico" "Mexico" "Mexico" ...
##  $ country_text_id                : chr  "MEX" "MEX" "MEX" "MEX" ...
##  $ country_id                     : num  3 3 3 3 3 3 3 3 3 3 ...
##  $ year                           : num  1789 1790 1791 1792 1793 ...
##  $ historical_date                : Date, format: "1789-12-31" "1790-12-31" ...
##  $ project                        : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ historical                     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ histname                       : chr  "Viceroyalty of New Spain" "Viceroyalty of New Spain" "Viceroyalty of New Spain" "Viceroyalty of New Spain" ...
##  $ codingstart                    : num  1789 1789 1789 1789 1789 ...
##  $ codingend                      : num  2024 2024 2024 2024 2024 ...
##  $ codingstart_contemp            : num  1900 1900 1900 1900 1900 1900 1900 1900 1900 1900 ...
##  $ codingend_contemp              : num  2024 2024 2024 2024 2024 ...
##  $ codingstart_hist               : num  1789 1789 1789 1789 1789 ...
##  $ codingend_hist                 : num  1920 1920 1920 1920 1920 1920 1920 1920 1920 1920 ...
##  $ gapstart1                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ gapstart2                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ gapstart3                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ gapend1                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ gapend2                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ gapend3                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ gap_index                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ COWcode                        : num  70 70 70 70 70 70 70 70 70 70 ...
##  $ v2x_polyarchy                  : num  0.028 0.028 0.028 0.028 0.028 0.028 0.028 0.028 0.028 0.028 ...
##  $ v2x_polyarchy_codelow          : num  0.016 0.016 0.016 0.016 0.016 0.016 0.016 0.016 0.016 0.016 ...
##  $ v2x_polyarchy_codehigh         : num  0.037 0.037 0.037 0.037 0.037 0.037 0.037 0.037 0.037 0.037 ...
##  $ v2x_polyarchy_sd               : num  0.011 0.011 0.011 0.011 0.011 0.011 0.011 0.011 0.011 0.011 ...
##  $ v2x_libdem                     : num  0.044 0.044 0.044 0.044 0.044 0.044 0.044 0.044 0.044 0.044 ...
##  $ v2x_libdem_codelow             : num  0.026 0.026 0.026 0.026 0.026 0.026 0.026 0.026 0.026 0.026 ...
##  $ v2x_libdem_codehigh            : num  0.055 0.055 0.055 0.055 0.055 0.055 0.055 0.055 0.055 0.055 ...
##  $ v2x_libdem_sd                  : num  0.014 0.014 0.014 0.014 0.014 0.014 0.014 0.014 0.014 0.014 ...
##  $ v2x_partipdem                  : num  0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.006 ...
##  $ v2x_partipdem_codelow          : num  0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 ...
##  $ v2x_partipdem_codehigh         : num  0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ...
##  $ v2x_partipdem_sd               : num  0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 ...
##  $ v2x_delibdem                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ v2x_delibdem_codelow           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ v2x_delibdem_codehigh          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ v2x_delibdem_sd                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ v2x_egaldem                    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ v2x_egaldem_codelow            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ v2x_egaldem_codehigh           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ v2x_egaldem_sd                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ v2x_api                        : num  0.056 0.056 0.056 0.056 0.056 0.056 0.056 0.056 0.056 0.056 ...
##  $ v2x_api_codelow                : num  0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 ...
##  $ v2x_api_codehigh               : num  0.074 0.074 0.074 0.074 0.074 0.074 0.074 0.074 0.074 0.074 ...
##  $ v2x_api_sd                     : num  0.021 0.021 0.021 0.021 0.021 0.021 0.021 0.021 0.021 0.021 ...
##  $ v2x_mpi                        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2x_mpi_codelow                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2x_mpi_codehigh               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2x_mpi_sd                     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2x_freexp_altinf              : num  0.175 0.175 0.175 0.175 0.175 0.175 0.175 0.175 0.175 0.175 ...
##  $ v2x_freexp_altinf_codelow      : num  0.085 0.085 0.085 0.085 0.085 0.085 0.085 0.085 0.085 0.085 ...
##  $ v2x_freexp_altinf_codehigh     : num  0.234 0.234 0.234 0.234 0.234 0.234 0.234 0.234 0.234 0.234 ...
##  $ v2x_freexp_altinf_sd           : num  0.078 0.078 0.078 0.078 0.078 0.078 0.078 0.078 0.078 0.078 ...
##  $ v2x_frassoc_thick              : num  0.042 0.042 0.042 0.042 0.042 0.042 0.042 0.042 0.042 0.042 ...
##  $ v2x_frassoc_thick_codelow      : num  0.009 0.009 0.009 0.009 0.009 0.009 0.009 0.009 0.009 0.009 ...
##  $ v2x_frassoc_thick_codehigh     : num  0.059 0.059 0.059 0.059 0.059 0.059 0.059 0.059 0.059 0.059 ...
##  $ v2x_frassoc_thick_sd           : num  0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 ...
##  $ v2x_suffr                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2xel_frefair                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2xel_frefair_codelow          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2xel_frefair_codehigh         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2xel_frefair_sd               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2x_elecoff                    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2x_liberal                    : num  0.169 0.169 0.169 0.169 0.169 0.169 0.169 0.169 0.169 0.169 ...
##  $ v2x_liberal_codelow            : num  0.105 0.105 0.105 0.105 0.105 0.105 0.105 0.105 0.105 0.105 ...
##  $ v2x_liberal_codehigh           : num  0.218 0.218 0.218 0.218 0.218 0.218 0.218 0.218 0.218 0.218 ...
##  $ v2x_liberal_sd                 : num  0.056 0.056 0.056 0.056 0.056 0.056 0.056 0.056 0.056 0.056 ...
##  $ v2xcl_rol                      : num  0.204 0.204 0.204 0.204 0.204 0.204 0.204 0.204 0.204 0.204 ...
##  $ v2xcl_rol_codelow              : num  0.118 0.118 0.118 0.118 0.118 0.118 0.118 0.118 0.118 0.118 ...
##  $ v2xcl_rol_codehigh             : num  0.275 0.275 0.275 0.275 0.275 0.275 0.275 0.275 0.275 0.275 ...
##  $ v2xcl_rol_sd                   : num  0.084 0.084 0.084 0.084 0.084 0.084 0.084 0.084 0.084 0.084 ...
##  $ v2x_jucon                      : num  0.293 0.293 0.293 0.293 0.293 0.293 0.293 0.293 0.293 0.293 ...
##  $ v2x_jucon_codelow              : num  0.132 0.132 0.132 0.132 0.132 0.132 0.132 0.132 0.132 0.132 ...
##  $ v2x_jucon_codehigh             : num  0.42 0.42 0.42 0.42 0.42 0.42 0.42 0.42 0.42 0.42 ...
##  $ v2x_jucon_sd                   : num  0.144 0.144 0.144 0.144 0.144 0.144 0.144 0.144 0.144 0.144 ...
##  $ v2xlg_legcon                   : num  0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 ...
##  $ v2xlg_legcon_codelow           : num  0.003 0.003 0.003 0.003 0.003 0.003 0.003 0.003 0.003 0.003 ...
##  $ v2xlg_legcon_codehigh          : num  0.158 0.158 0.158 0.158 0.158 0.158 0.158 0.158 0.158 0.158 ...
##  $ v2xlg_legcon_sd                : num  0.114 0.114 0.114 0.114 0.114 0.114 0.114 0.114 0.114 0.114 ...
##  $ v2x_partip                     : num  0.021 0.021 0.021 0.021 0.021 0.021 0.021 0.021 0.021 0.021 ...
##  $ v2x_partip_codelow             : num  0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 ...
##  $ v2x_partip_codehigh            : num  0.035 0.035 0.035 0.035 0.035 0.035 0.035 0.035 0.035 0.035 ...
##  $ v2x_partip_sd                  : num  0.027 0.027 0.027 0.027 0.027 0.027 0.027 0.027 0.027 0.027 ...
##  $ v2x_cspart                     : num  0.031 0.031 0.031 0.031 0.031 0.031 0.031 0.031 0.031 0.031 ...
##  $ v2x_cspart_codelow             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2x_cspart_codehigh            : num  0.057 0.057 0.057 0.057 0.057 0.057 0.057 0.057 0.057 0.057 ...
##  $ v2x_cspart_sd                  : num  0.065 0.065 0.065 0.065 0.065 0.065 0.065 0.065 0.065 0.065 ...
##  $ v2xdd_dd                       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ v2xel_locelec                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2xel_locelec_codelow          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2xel_locelec_codehigh         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2xel_locelec_sd               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2xel_regelec                  : num  0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.012 ...
##  $ v2xel_regelec_codelow          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ v2xel_regelec_codehigh         : num  0.03 0.03 0.03 0.03 0.03 0.03 0.03 0.03 0.03 0.03 ...
##  $ v2xel_regelec_sd               : num  0.047 0.047 0.047 0.047 0.047 0.047 0.047 0.047 0.047 0.047 ...
##  $ v2xdl_delib                    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ v2xdl_delib_codelow            : num  NA NA NA NA NA NA NA NA NA NA ...
##   [list output truncated]

When dealing with a dataset as large as V-Dem, which contains 27,913 rows of observations and 4,607 columns, it is important to understand what exactly the variables represent. This is a crucial step to take before moving on to further analysis, in order to be able to eliminate redundant variables and understand the cause of the many NA values. Therefore, this section provides an overview of the process behind the quantitative encoding of the quality of various attributes of democracy.

Before any data is collected, V-Dem breaks down the abstract concept of democracy, based on Robert Dahl’s concept of Polyarchy (the acquisition of institutions that leads to the participation of a plurality of actors), into the previously described five key principles (high-level democracy indices). These are then deconstructed into mid-level components, a set of aggregated indices based on yet another, even deeper, set of low-level indices. The analysis performed in this project will focus on the low-level indices, seeing as both the mid-level and high-level indices are computed from them. They represent the layer in which additional, perhaps previously uncaptured, dimensions of democracy could be found.

The construction of the low-level indices can be described in the following way. First, a survey is conducted among the aforementioned country experts, who rate the level of openness or strength of an institution on an ordinal scale. Subsequently, a Bayesian Item Response Theory (IRT) model is used to correct for the cross-national subjectivity (Differential Item Functioning) of the experts. This model treats the experts’ answers as data points and estimates each expert’s reliability and strictness threshold (e.g. if an expert is consistently harsher than others, the model adjusts their scores upward, and if experts disagree significantly, the model widens the uncertainty margin for a given data point). This process is partially responsible for the large number of columns in the data: the variables whose codes end with _sd, _codelow and _codehigh are by-products of the applied model. These variables, as well as those that end with _osp (original scale posterior), will therefore be excluded from further analysis, as they add little beyond the point estimates.
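The exclusion rule described above can be sketched in base R as a simple pattern match on column names. The column names below are a toy subset chosen for illustration (not the full V-Dem column list), and the regular expression is an assumption about how the suffixes could be matched; the analysis itself selects the required indicators by name instead.

```r
# Toy column names mimicking V-Dem naming conventions (illustrative subset only)
cols <- c("country_name", "year",
          "v2elfrfair", "v2elfrfair_codelow", "v2elfrfair_codehigh",
          "v2elfrfair_sd", "v2elfrfair_osp", "v2elfrfair_osp_sd", "v2mebias")

# Suffixes produced by the measurement model (assumed pattern, anchored at the end)
aux_pattern <- "_(sd|codelow|codehigh|osp)$"

# Keep only identifiers and point estimates
keep <- cols[!grepl(aux_pattern, cols)]
print(keep)
```

Anchoring the pattern with `$` ensures that only trailing suffixes are matched, so an indicator whose name merely contains one of these strings in the middle would not be dropped by accident.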

As described above, V-Dem recognizes several levels of aggregation, the magnum opus of which is the V-Dem Electoral Democracy Index (EDI, v2x_polyarchy) – a single index illustrating the level of democracy in a country. This index consists of five sub-components, which are, themselves, built from yet another set of lower-level indicators. These were selected so as to capture Dahl’s institutions of polyarchy:

freedom of association,
suffrage,
clean elections,
elected officials,
freedom of expression and alternative sources of information.

Further analysis will be focused on the lower-level indices used for EDI’s calculation, excluding the 20 indices pertaining to elected officials. These will be excluded following the methodology employed by Wilson et al. (2023), who justify this decision by the binary nature of the variables.

edi_variables <- c(
  # freedom of expression
  "v2clacfree", "v2cldiscm", "v2cldiscw", "v2mebias", "v2mecenefm", "v2mecrit", "v2meharjrn", "v2merange", "v2meslfcen",
  # freedom of association
  "v2cseeorgs", "v2csreprss", "v2elmulpar", "v2psbars", "v2psoppaut", "v2psparban",
  # clean elections
  "v2elembaut", "v2elembcap", "v2elfrfair", "v2elintim", "v2elirreg", "v2elpeace", "v2elrgstry", "v2elvotbuy",
  # suffrage
  "v2elsuffrage")

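library(dplyr)  # provides %>%, filter(), group_by(), mutate(), arrange()
library(tidyr)  # provides pivot_longer(), pivot_wider(), fill()

# selecting edi variables
vdem_edi <- vdem %>%
  dplyr::select(country_name, year, dplyr::any_of(edi_variables))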

dim(vdem_edi)

## [1] 27913    26

range(vdem_edi$year)

## [1] 1789 2024

The data coverage spans from 1789 to 2024 and contains information on 24 low-level indices for 27,913 country-year observations. This project, however, does not focus on the trajectory of democracy over time, but rather on its hidden dimensions. Therefore, further analysis is devoted to a single year – 2024 – the last year recorded in this version of V-Dem.

vdem_edi_2024 <- vdem_edi %>%
  filter(year == 2024)

anyNA(vdem_edi_2024)

## [1] TRUE

The problem of missing values for some variables will be tackled using the methodology proposed by Wilson et al. (2023). Variables regarding elections (e.g. v2elfrfair) are only coded for the years in which elections took place. The authors addressed this issue with a forward-fill method: the value from the last recorded election is carried forward for a maximum of five years. Their approach reasonably assumes that the characteristics of the electoral process remain constant until the next election takes place or the five-year limit is reached. Should any missing data remain after this imputation, the affected countries will be removed from the dataset listwise.

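vdem_edi_2024 <- vdem_edi %>%
  # long format: one row per country, year and indicator
  pivot_longer(cols = dplyr::all_of(edi_variables), names_to = "variable",
               values_to = "value") %>%
  group_by(country_name, variable) %>%
  arrange(year) %>%
  # remember the year in which each non-missing value was recorded
  mutate(source_year = ifelse(!is.na(value), year, NA)) %>%
  # carry the last observed value (and its year) forward within each group
  fill(value, source_year, .direction = "down") %>%
  # discard carried-forward values that are more than five years old
  mutate(age = year - source_year,
         value = ifelse(age>5, NA, value)) %>%
  ungroup() %>%
  filter(year==2024) %>%
  dplyr::select(-source_year, -age, -year) %>%
  # back to wide format, then listwise deletion of incomplete countries
  pivot_wider(names_from = "variable", values_from = "value") %>%
  na.omit()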
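
To see what the capped forward-fill does to a single indicator, the rule can be reproduced on a toy series in base R. This is only a sketch: the helper `ffill_capped()` and the example values are hypothetical and not part of Wilson et al.'s code or of the pipeline above.

```r
# Hypothetical helper: carry the last observed value forward for at most `max_age` years
ffill_capped <- function(years, values, max_age = 5) {
  stopifnot(length(years) == length(values))
  ord <- order(years)
  years <- years[ord]
  values <- values[ord]
  last_val <- NA_real_
  last_year <- NA_real_
  out <- rep(NA_real_, length(values))
  for (i in seq_along(values)) {
    if (!is.na(values[i])) {          # a new observation resets the clock
      last_val <- values[i]
      last_year <- years[i]
    }
    if (!is.na(last_year) && years[i] - last_year <= max_age) {
      out[i] <- last_val              # still within the five-year window
    }
  }
  out
}

# Toy series: elections coded in 2014 and 2022 only
years  <- 2014:2024
values <- c(1.2, NA, NA, NA, NA, NA, NA, NA, 0.4, NA, NA)
filled <- ffill_capped(years, values)
print(data.frame(year = years, observed = values, filled = filled))
```

In this toy series the 2014 value is carried through 2019 (age 5), while 2020 and 2021 stay missing because the last observation is by then more than five years old; the 2022 value then covers 2022 to 2024.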


# country name as row name 
vdem_edi_2024 <- as.data.frame(vdem_edi_2024)
row.names(vdem_edi_2024) <- vdem_edi_2024$country_name
vdem_edi_2024 <- vdem_edi_2024[,-1]

dim(vdem_edi_2024)

## [1] 169  24

summary(vdem_edi_2024)

##    v2clacfree        v2cldiscm         v2cldiscw         v2mebias      
##  Min.   :-3.3510   Min.   :-3.5820   Min.   :-3.631   Min.   :-3.0780  
##  1st Qu.:-0.0650   1st Qu.: 0.0840   1st Qu.:-0.127   1st Qu.: 0.4090  
##  Median : 1.0770   Median : 1.2420   Median : 1.126   Median : 0.9730  
##  Mean   : 0.7712   Mean   : 0.9106   Mean   : 0.831   Mean   : 0.7686  
##  3rd Qu.: 1.7900   3rd Qu.: 1.9840   3rd Qu.: 1.811   3rd Qu.: 1.5410  
##  Max.   : 3.2430   Max.   : 3.0860   Max.   : 2.993   Max.   : 3.0170  
##    v2mecenefm         v2mecrit         v2meharjrn        v2merange      
##  Min.   :-2.8230   Min.   :-2.9780   Min.   :-2.9980   Min.   :-2.7750  
##  1st Qu.:-0.5810   1st Qu.:-0.0210   1st Qu.:-0.3020   1st Qu.: 0.3270  
##  Median : 0.7210   Median : 0.8910   Median : 0.6050   Median : 1.0290  
##  Mean   : 0.4963   Mean   : 0.7494   Mean   : 0.5251   Mean   : 0.7655  
##  3rd Qu.: 1.5140   3rd Qu.: 1.6890   3rd Qu.: 1.4290   3rd Qu.: 1.7290  
##  Max.   : 3.5070   Max.   : 3.4060   Max.   : 3.7190   Max.   : 2.6290  
##    v2meslfcen        v2cseeorgs        v2csreprss        v2elmulpar    
##  Min.   :-2.9700   Min.   :-3.2890   Min.   :-3.1360   Min.   :-3.601  
##  1st Qu.:-0.2350   1st Qu.: 0.1260   1st Qu.:-0.3670   1st Qu.:-0.110  
##  Median : 0.5860   Median : 1.3380   Median : 1.0660   Median : 1.203  
##  Mean   : 0.4063   Mean   : 0.9004   Mean   : 0.7403   Mean   : 0.544  
##  3rd Qu.: 1.2280   3rd Qu.: 1.9830   3rd Qu.: 1.9350   3rd Qu.: 1.403  
##  Max.   : 3.1220   Max.   : 3.2080   Max.   : 3.0850   Max.   : 1.871  
##     v2psbars        v2psoppaut       v2psparban       v2elembaut    
##  Min.   :-3.246   Min.   :-3.388   Min.   :-3.641   Min.   :-2.950  
##  1st Qu.: 0.390   1st Qu.: 0.341   1st Qu.: 0.527   1st Qu.:-0.574  
##  Median : 1.430   Median : 1.314   Median : 1.326   Median : 0.958  
##  Mean   : 1.057   Mean   : 1.064   Mean   : 1.036   Mean   : 0.696  
##  3rd Qu.: 2.139   3rd Qu.: 2.102   3rd Qu.: 2.004   3rd Qu.: 1.919  
##  Max.   : 2.937   Max.   : 3.305   Max.   : 2.657   Max.   : 3.785  
##    v2elembcap        v2elfrfair        v2elintim        v2elirreg      
##  Min.   :-3.1410   Min.   :-3.2680   Min.   :-3.317   Min.   :-2.8820  
##  1st Qu.: 0.0290   1st Qu.:-1.0510   1st Qu.:-0.961   1st Qu.:-0.9350  
##  Median : 0.8800   Median : 0.2810   Median : 0.202   Median : 0.1140  
##  Mean   : 0.8574   Mean   : 0.2015   Mean   : 0.124   Mean   : 0.1583  
##  3rd Qu.: 1.8450   3rd Qu.: 1.7510   3rd Qu.: 1.512   3rd Qu.: 1.3440  
##  Max.   : 3.2150   Max.   : 2.2800   Max.   : 2.136   Max.   : 2.5270  
##    v2elpeace        v2elrgstry        v2elvotbuy        v2elsuffrage   
##  Min.   :-2.424   Min.   :-1.9520   Min.   :-2.64200   Min.   : 36.00  
##  1st Qu.:-0.717   1st Qu.:-0.0890   1st Qu.:-0.99400   1st Qu.:100.00  
##  Median : 0.345   Median : 1.0350   Median :-0.25500   Median :100.00  
##  Mean   : 0.214   Mean   : 0.7897   Mean   : 0.03216   Mean   : 99.62  
##  3rd Qu.: 1.310   3rd Qu.: 1.8150   3rd Qu.: 1.15000   3rd Qu.:100.00  
##  Max.   : 2.280   Max.   : 2.5810   Max.   : 2.70700   Max.   :100.00

The filtered dataset, which constitutes the basis for further analysis, contains information on 24 crucial, low-level indices for 169 countries. The table below provides a description of each variable employed in this project, together with the question the experts are asked in order to evaluate the corresponding dimension of democracy. The descriptions and questions are retrieved using a function of the vdemdata package: var_info().

Description of Used Variables
Code	Name	Question
v2clacfree	Freedom of academic and cultural expression	Is there academic freedom and freedom of cultural expression related to political issues?
v2cldiscm	Freedom of discussion for men	Are men able to openly discuss political issues in private homes and in public spaces?
v2cldiscw	Freedom of discussion for women	Are women able to openly discuss political issues in private homes and in public spaces?
v2mebias	Media bias	Is there media bias against opposition parties or candidates?
v2mecenefm	Government censorship effort — Media	Does the government directly or indirectly attempt to censor the print or broadcast media?
v2mecrit	Print/broadcast media critical	Of the major print and broadcast outlets, how many routinely criticize the government?
v2meharjrn	Harassment of journalists	Are individual journalists harassed — i.e., threatened with libel, arrested, imprisoned, beaten, or killed — by governmental or powerful nongovernmental actors while engaged in legitimate journalistic activities?
v2merange	Print/broadcast media perspectives	Do the major print and broadcast media represent a wide range of political perspectives?
v2meslfcen	Media self-censorship	Is there self-censorship among journalists when reporting on issues that the government considers politically sensitive?
v2cseeorgs	CSO entry and exit	To what extent does the government achieve control over entry and exit by civil society organizations (CSOs) into public life?
v2csreprss	CSO repression	Does the government attempt to repress civil society organizations (CSOs)?
v2elmulpar	Elections multiparty	Was this national election multiparty?
v2psbars	Barriers to parties	How restrictive are the barriers to forming a party?
v2psoppaut	Opposition parties autonomy	Are opposition parties independent and autonomous of the ruling regime?
v2psparban	Party ban	Are any parties banned?
v2elembaut	EMB autonomy	Does the Election Management Body (EMB) have autonomy from government to apply election laws and administrative rules impartially in national elections?
v2elembcap	EMB capacity	Does the Election Management Body (EMB) have sufficient staff and resources to administer a well-run national election?
v2elfrfair	Election free and fair	Taking all aspects of the pre-election period, election day, and the post-election process into account, would you consider this national election to be free and fair?
v2elintim	Election government intimidation	In this national election, were opposition candidates/parties/campaign workers subjected to repression, intimidation, violence, or harassment by the government, the ruling party, or their agents?
v2elirreg	Election other voting irregularities	In this national election, was there evidence of other intentional irregularities by incumbent and/or opposition parties, and/or vote fraud?
v2elpeace	Election other electoral violence	In this national election, was the campaign period, election day, and post-election process free from other types (not by the government, the ruling party, or their agents) of violence related to the conduct of the election and the campaigns (but not conducted by the government and its agents)?
v2elrgstry	Election voter registry	In this national election, was there a reasonably accurate voter registry in place and was it used?
v2elvotbuy	Election vote buying	In this national election, was there evidence of vote and/or turnout buying?
v2elsuffrage	Percentage of population with suffrage	What percentage (%) of adult citizens (as defined by statute) has the legal right to vote in national elections?

Dimension reduction methods require the data to be numeric and standardized. I will, therefore, convert the data frame into a numeric matrix suitable for algebraic operations and standardize it, so that the variance in the dataset is not dominated by variables that simply have a different scale (here, suffrage, which is measured in percent).

library(clusterSim)  # provides data.Normalization()

vdem_M <- as.matrix(vdem_edi_2024)

vdem_z <- data.Normalization(vdem_M, type = "n1", normalization = "column")
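
As a rough illustration of what this step does, the sketch below standardizes a toy matrix with base R's `scale()`. It assumes that type "n1" in clusterSim denotes the usual z-score, (x - mean) / sd, applied column by column; the toy data are simulated and unrelated to V-Dem.

```r
set.seed(1)

# Toy matrix: one index on a latent scale, one percentage on a very different scale
toy <- cbind(index = rnorm(50), suffrage = runif(50, min = 36, max = 100))

# Column-wise z-scores: subtract the column mean, divide by the column sd
toy_z <- scale(toy, center = TRUE, scale = TRUE)

col_means <- colMeans(toy_z)
col_sds   <- apply(toy_z, 2, sd)

# Pearson correlations do not change under standardization
same_cor <- isTRUE(all.equal(cor(toy), cor(toy_z)))

print(round(col_means, 10))
print(col_sds)
print(same_cor)
```

After standardization every column has mean 0 and standard deviation 1, so no variable dominates the total variance merely because of its units.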

Finally, the correlation matrix is computed using Pearson’s method (Pearson correlations are invariant to standardization, so computing them on the raw matrix gives the same result). The correlation plot makes it apparent that most of the variables are, indeed, highly correlated. This suggests a high level of information redundancy, which makes the data suitable for PCA.

library(corrplot)  # provides corrplot()
M_cor <- cor(vdem_M, method = "pearson")
corrplot(M_cor, tl.cex = 0.6)

Overall Measures of Intercorrelation

To base further analysis on statistical measures, rather than simple suppositions, Bartlett’s test and the Measure of Sampling Adequacy (MSA, K-M-O) will be computed on the previously obtained correlation matrix (Hair et al., 2019).

Bartlett’s Test

Bartlett’s test is a statistical test for the presence of correlations among the variables. It indicates whether the correlation matrix has significant correlations across at least some of the variables. The desired result is to reject the test’s null hypothesis:

\(H_0\) : The correlation matrix is an identity matrix (i.e., variables are orthogonal and completely unrelated).

library(psych)  # provides cortest.bartlett() and KMO()
bartlett_test <- cortest.bartlett(M_cor, n = nrow(vdem_z))
print(bartlett_test$p.value)

## [1] 0

The p-value of Bartlett’s test is reported as 0, meaning that it is too small to be distinguished from zero at machine precision. The null hypothesis is therefore rejected and further analysis can proceed.
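
For intuition, the statistic behind the test can be written out by hand. The sketch below uses the standard chi-square approximation for Bartlett’s test of sphericity,

\[\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln|R|, \qquad df = \frac{p(p-1)}{2},\]

where \(R\) is the correlation matrix, \(p\) the number of variables and \(n\) the sample size. The function name `bartlett_sphericity()` and the toy matrices are hypothetical; the analysis itself relies on `cortest.bartlett()`.

```r
# Hypothetical helper implementing the chi-square approximation above
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Toy case 1: identity matrix, i.e. exactly the null hypothesis
R_identity <- diag(3)
res_identity <- bartlett_sphericity(R_identity, n = 169)

# Toy case 2: two strongly correlated variables
R_correlated <- matrix(c(1, 0.9, 0.9, 1), nrow = 2)
res_correlated <- bartlett_sphericity(R_correlated, n = 169)

print(res_identity$p.value)
print(res_correlated$p.value)
```

The determinant of a correlation matrix shrinks towards zero as the variables become more collinear, which inflates the statistic; with 24 highly correlated indicators the resulting p-value underflows and prints as 0.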

Measure of Sampling Adequacy

MSA, or the Kaiser-Meyer-Olkin (K-M-O) measure, is an index ranging from 0 to 1, reaching 1 when each variable is perfectly predicted by the other variables. Hair et al. (2019) provide the following guidelines for interpreting the result: 0.8 or above – meritorious, 0.7 or above – middling, 0.6 or above – mediocre, 0.5 or above – miserable and below 0.5 – unacceptable. MSA increases with the size of the sample, the average correlations and the number of variables.

kmo <- KMO(M_cor)
print(kmo)

## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = M_cor)
## Overall MSA =  0.96
## MSA for each item = 
##   v2clacfree    v2cldiscm    v2cldiscw     v2mebias   v2mecenefm     v2mecrit 
##         0.98         0.95         0.94         0.97         0.98         0.98 
##   v2meharjrn    v2merange   v2meslfcen   v2cseeorgs   v2csreprss   v2elmulpar 
##         0.97         0.98         0.98         0.97         0.97         0.96 
##     v2psbars   v2psoppaut   v2psparban   v2elembaut   v2elembcap   v2elfrfair 
##         0.96         0.98         0.97         0.98         0.96         0.94 
##    v2elintim    v2elirreg    v2elpeace   v2elrgstry   v2elvotbuy v2elsuffrage 
##         0.95         0.92         0.95         0.96         0.90         0.55

The overall MSA of 0.96 falls well within the highest (meritorious) band of 0.8 or above, indicating that the variables are very well predicted by one another. Similarly, the results for every individual variable, save for suffrage, are no lower than 0.9, meaning that the variables share sufficient variance. Suffrage, at 0.55, only just clears the 0.5 threshold of acceptability.
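
The overall index compares the squared correlations \(r_{ij}\) with the squared partial correlations \(q_{ij}\) (each pair controlling for all remaining variables):

\[\mathrm{MSA} = \frac{\sum_{i \neq j} r_{ij}^2}{\sum_{i \neq j} r_{ij}^2 + \sum_{i \neq j} q_{ij}^2}\]

A minimal base-R sketch of this formula follows. The function name `kmo_overall()` and the toy matrix are hypothetical and serve only to illustrate the computation; the analysis itself relies on `KMO()`.

```r
# Hypothetical helper: overall KMO / MSA from a correlation matrix
kmo_overall <- function(R) {
  R_inv <- solve(R)
  Q <- -cov2cor(R_inv)           # off-diagonal entries are partial correlations
  off <- row(R) != col(R)        # mask selecting off-diagonal cells
  r2 <- sum(R[off]^2)
  q2 <- sum(Q[off]^2)
  r2 / (r2 + q2)
}

# Toy correlation matrix: three variables, all pairwise correlations equal to 0.5
R_toy <- matrix(0.5, nrow = 3, ncol = 3)
diag(R_toy) <- 1

kmo_toy <- kmo_overall(R_toy)
print(kmo_toy)  # 9/13, about 0.69 for this toy matrix
```

When the partial correlations are small relative to the raw correlations, the shared variance is spread across many variables and the index approaches 1, which is the situation observed for the V-Dem indicators.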

Principal Component Analysis (PCA)

To reduce the dimensionality of the V-Dem dataset and identify latent regime patterns, this project employs Principal Component Analysis (PCA). PCA is a linear dimensionality reduction technique. It transforms a large set of correlated variables into a smaller set of variables – Principal Components (PCs) – which are orthogonal to one another. It does so while retaining as much of the original information as possible, by finding the axes that account for the largest variance in the dataset.

pca <- prcomp(vdem_M, center = TRUE, scale. = TRUE)

Determining the Number of Components

Eigenvalue Decomposition

The step succeeding covariance analysis in PCA algorithms is the eigenvalue decomposition of the covariance matrix. It is the process of finding a set of scalars (eigenvalues) and vectors (eigenvectors) that satisfy the following equation:

\[\Sigma v = \lambda v\]

Where \({\Sigma}\) is the covariance matrix, \(v\) is a non-zero eigenvector and \({\lambda}\) is the eigenvalue. Eigenvectors indicate the directions of maximum variance in the data, while eigenvalues quantify the variance captured by each principal component.

Because the variables are standardized, the covariance matrix coincides with the correlation matrix M_cor, whose eigenvalues are computed below.

eigen(M_cor)$values

##  [1] 16.19574516  2.61040445  1.01651417  0.62463023  0.46549319  0.41371661
##  [7]  0.36441841  0.29674948  0.25355807  0.22859219  0.19688679  0.16425697
## [13]  0.15947244  0.14179693  0.13742931  0.12371808  0.10607827  0.10103766
## [19]  0.09316266  0.08904539  0.07592003  0.06517182  0.04798190  0.02821979
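
The link between the eigen-decomposition and the output of `prcomp()` can be verified on any small dataset. The sketch below uses the `USArrests` data shipped with base R purely as a stand-in (it is unrelated to V-Dem) and checks that the eigenvalue equation holds and that the squared standard deviations returned by `prcomp()` equal the eigenvalues of the correlation matrix.

```r
# Stand-in data: four numeric variables from a built-in dataset
X <- as.matrix(USArrests)

R_toy <- cor(X)          # correlation matrix = covariance of standardized data
eig <- eigen(R_toy)      # eigenvalues in decreasing order, unit-length eigenvectors

# Check the eigenvalue equation for the first pair: R v = lambda v
v1 <- eig$vectors[, 1]
lambda1 <- eig$values[1]
residual <- max(abs(R_toy %*% v1 - lambda1 * v1))

# PCA on standardized data: component variances are the same eigenvalues
pca_toy <- prcomp(X, center = TRUE, scale. = TRUE)
same_variances <- isTRUE(all.equal(pca_toy$sdev^2, eig$values))

print(residual)
print(same_variances)
print(sum(eig$values))   # equals the number of variables
```

The eigenvalues of a correlation matrix always sum to the number of variables, which is why the 24 eigenvalues reported above add up to 24 and why an eigenvalue of 1 corresponds to the variance of a single standardized variable.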

Eigenvalues are also used to determine the number of components to retain. According to Kaiser’s rule, only components with eigenvalues greater than 1 should be retained. In this dataset, 3 components exceed Kaiser’s threshold of 1 (granted, the third one barely exceeds it). Kaiser’s criterion is known to overestimate the number of components to retain in datasets with a large number of highly correlated variables. However, employing Cattell’s (1966) elbow rule leads to a similar conclusion. The first substantial drop in eigenvalues can be observed at the second component and another one at \(n = 3\), after which the decline is gradual and nearly flat all the way to the end.

library(factoextra)  # provides fviz_eig()
fviz_eig(pca, choice = "eigenvalue", ylim=c(0,20), ncp = 24, addlabels = TRUE, main = "Eigenvalues")

The first three components collectively explain 82.6% of the total variance. That level of variance preservation suggests that much of the noise is filtered out, while the vast majority of characteristics are retained.
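
Both retention criteria can be checked directly on the eigenvalues reported earlier. The sketch below copies those 24 values into a vector (named `ev` here for illustration), counts how many exceed Kaiser’s threshold and computes the cumulative share of variance.

```r
# Eigenvalues as printed by eigen(M_cor)$values above
ev <- c(16.19574516, 2.61040445, 1.01651417, 0.62463023, 0.46549319, 0.41371661,
        0.36441841, 0.29674948, 0.25355807, 0.22859219, 0.19688679, 0.16425697,
        0.15947244, 0.14179693, 0.13742931, 0.12371808, 0.10607827, 0.10103766,
        0.09316266, 0.08904539, 0.07592003, 0.06517182, 0.04798190, 0.02821979)

# Kaiser's rule: keep components whose eigenvalue exceeds 1
n_kaiser <- sum(ev > 1)

# Share of total variance per component and its running total
prop_var <- ev / sum(ev)
cum_var <- cumsum(prop_var)

print(n_kaiser)                # 3
print(round(cum_var[1:3], 3))  # 0.675 0.784 0.826
```

The third eigenvalue (about 1.017) clears the threshold only narrowly, so the choice between two and three components is a judgment call rather than a clear-cut result.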

summary(pca)

## Importance of components:
##                           PC1    PC2     PC3     PC4    PC5     PC6     PC7
## Standard deviation     4.0244 1.6157 1.00822 0.79034 0.6823 0.64321 0.60367
## Proportion of Variance 0.6748 0.1088 0.04235 0.02603 0.0194 0.01724 0.01518
## Cumulative Proportion  0.6748 0.7836 0.82594 0.85197 0.8714 0.88860 0.90379
##                            PC8     PC9    PC10   PC11    PC12    PC13    PC14
## Standard deviation     0.54475 0.50355 0.47811 0.4437 0.40529 0.39934 0.37656
## Proportion of Variance 0.01236 0.01056 0.00952 0.0082 0.00684 0.00664 0.00591
## Cumulative Proportion  0.91615 0.92672 0.93624 0.9445 0.95129 0.95793 0.96384
##                           PC15    PC16    PC17    PC18    PC19    PC20    PC21
## Standard deviation     0.37071 0.35174 0.32570 0.31786 0.30523 0.29840 0.27554
## Proportion of Variance 0.00573 0.00515 0.00442 0.00421 0.00388 0.00371 0.00316
## Cumulative Proportion  0.96957 0.97472 0.97914 0.98335 0.98724 0.99095 0.99411
##                           PC22   PC23    PC24
## Standard deviation     0.25529 0.2190 0.16799
## Proportion of Variance 0.00272 0.0020 0.00118
## Cumulative Proportion  0.99682 0.9988 1.00000
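
The figures in this table follow directly from the standard deviations. As a worked check, assuming the PCA was run on 24 standardized variables so that the total variance equals 24, the eigenvalue \(\lambda_k\) and the proportion of variance of component \(k\) with standard deviation \(s_k\) are:

\[
\lambda_k = s_k^2, \qquad \text{Proportion}_k = \frac{\lambda_k}{\sum_{j=1}^{24} \lambda_j} = \frac{\lambda_k}{24}
\]

For the first three components this gives:

\[
\frac{4.0244^2 + 1.6157^2 + 1.00822^2}{24} \approx \frac{16.196 + 2.610 + 1.017}{24} \approx 0.826
\]

which matches the cumulative proportion reported for PC3.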

The scree plot, illustrating the proportion of variance explained per component, allows for a visual interpretation of results. A distinct elbow can be observed at the second and third component, which is consistent with the number of components this analysis retains.

fviz_eig(pca, addlabels = TRUE, ylim = c(0,100), main = "Variance Explained")

Analysis of Components

The plot below projects the 24 features in the dataset onto the two-dimensional space formed by the first two principal components, which together account for 78% of the variance in the data. It illustrates how the variables relate to PC1 and PC2. The x-axis (PC1) explains the majority of the differences between countries. The arrows are concentrated on the right side of the plot, and none of them points to the left. This indicates both that the variables are strongly correlated and that democracy is largely a single, coherent concept. The second dimension, plotted on the y-axis, explains 10.9% of the variance and represents the hidden nuances within regimes. The arrows pointing upwards are variables measuring administrative quality and the orderliness of the election process (e.g. election peace, no irregularities, no vote buying); on the opposite side are variables measuring freedom of speech and association (media bias, media criticism, party bans, academic freedom). PC2 thus reveals a distinction between administrative election quality and civil liberties. One variable stands out for its exceptionally low contribution: the indicator measuring the level of suffrage, probably because suffrage is currently high across countries. Overall, the measures of democracy do seem to be mostly one-dimensional and homogeneous (PC1 accounts for 67.5% of the variance); upon closer inspection, however, they reveal a more subtle, nuanced dimension of democracy.

fviz_pca_var(pca, col.var = "contrib",
             gradient.cols = viridis::plasma(3),
             repel = TRUE,
             title = "Variables Factor Map")
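
To make the meaning of the arrows concrete: in a PCA on standardized variables, the coordinate of a variable on a component equals its correlation with that component's scores. The sketch below checks this on the built-in USArrests data, which is used purely for illustration and is not the V-Dem dataset; the names pca_demo, var_coord and var_cor are hypothetical.

```r
# Illustrative data only: USArrests ships with base R and is NOT the V-Dem data
pca_demo <- prcomp(USArrests, scale. = TRUE)

# Arrow coordinates: eigenvector entries scaled by the component standard deviation
var_coord <- sweep(pca_demo$rotation, 2, pca_demo$sdev, "*")

# The same quantities computed as plain correlations between variables and scores
var_cor <- cor(scale(USArrests), pca_demo$x)

# Largest absolute discrepancy between the two computations
max_diff <- max(abs(var_coord - var_cor))

print(round(var_coord[, 1:2], 3))
print(max_diff)
```

Read this way, an arrow lying close to the x-axis and reaching far to the right marks a variable that is almost perfectly correlated with PC1.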

The quality plot is inspected in order to validate the analysis. It shows the 169 countries reduced to a two-dimensional space. The color gradient represents the squared cosine (cos2), a measure of how well each observation (here: a country) is represented on the 2D map. The map, plotted below, reveals a U-shape of reliability: the points at the edges exhibit high squared cosine values, shown in bright yellow and orange. On the right side of the plot, if one really squints, countries such as Belgium, Norway, Germany, the United Kingdom and Australia (among others) can be seen. They fit the model very well and represent the standard democracies. At the other end of the spectrum are countries such as North Korea, Belarus and Nicaragua, which are equally clear representations of autocracies. One point strikes the observer with a force incomparable to that of the dense, nearly incomprehensible mass of points condensed near the x-axis: the United Arab Emirates, in the top-left of the plot. It exemplifies a country with low democracy (Dim 1) but high state capacity and order (Dim 2). The dense mass of points concentrated near the origin represents countries that do not fit the primary dimensions of democracy-autocracy or order-chaos neatly. Their low cosine values suggest that the complex characteristics of those transitional or unstable regimes (e.g. Somalia, Mali) are not fully represented by the two-dimensional model.

fviz_pca_ind(pca,
             geom = c("point", "text"),
             repel = TRUE,
             labelsize = 2,
             pointsize = 3,
             alpha.ind = 0.6,
             col.ind = "cos2", 
             gradient.cols = viridis::plasma(3),
             title = "Quality of Representation")
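
The squared cosine can be computed by hand, which clarifies why points near the origin tend to score low. The following is a minimal sketch on the built-in USArrests data (an illustrative stand-in, not the V-Dem data): cos2 on a component is the squared score divided by the squared distance of the observation from the origin in the full component space. All object names are hypothetical.

```r
# Illustrative data only: USArrests is NOT the dataset analyzed in this report
pca_demo <- prcomp(USArrests, scale. = TRUE)
scores <- pca_demo$x

# Squared distance of each observation from the origin, using all components
d2 <- rowSums(scores^2)

# cos2 of each observation on each component (each row is divided by its own d2)
cos2 <- scores^2 / d2

# Quality of representation on the 2D map = cos2 summed over the first two components
cos2_2d <- rowSums(cos2[, 1:2])

# The three observations that the 2D map represents worst
print(round(head(sort(cos2_2d), 3), 3))
```

Since the cos2 values of an observation sum to 1 across all components, a low value on the first two dimensions means that most of that observation's variation lives in the discarded components.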

To further understand the structure of the principal components, the contribution of each feature to the first three PCs is analyzed. To characterize the dimensions represented by these components adequately, this section refers to each variable by its definition (retrieved with vdem_info()) together with its super-category, rather than by its code. The red dashed line represents the average contribution if all variables contributed equally; variables that exceed that line are considered significant for a given component. As expected, the first dimension comprises the largest number of variables. These include:

EMB autonomy (clean elections),
Freedom of discussion for men and, subsequently, women (freedom of expression),
CSO repression (freedom of association),
Harassment of journalists (freedom of expression),
Opposition parties autonomy (freedom of association),
Government censorship effort - Media (freedom of expression),
Media self-censorship (freedom of expression),
CSO entry and exit (freedom of association),
Freedom of academic and cultural expression (freedom of expression),
Barriers to parties (freedom of association),
Print/broadcast criticism of government (freedom of expression),
Election free and fair (clean elections),
Media perspective (freedom of expression),
Media bias (freedom of expression),
Multiparty elections (freedom of association),
Election government intimidation (clean elections).

The first dimension is defined by a broad mix of civic freedoms and the autonomy of political actors, mostly reflected in freedom of association and expression. Every super-category of low-level indices, save for the one-element suffrage subgroup, is represented in this dimension. Such breadth does not make for an easy interpretation, but then again, were democracy a simple concept, this analysis would not have been carried out in the first place. PC1 can thus be defined as the civic liberty axis: it represents the freedom and autonomy of individual actors within the political body and is most closely associated with the quality of civil rights and liberties.

my_theme <- theme_minimal() + 
                 theme(axis.text.y = element_text(size = 7),
                       plot.title = element_text(size = 12, face = "bold"))

fviz_contrib(pca, choice = "var", axes = 1, fill = "blueviolet", color = "blueviolet", title = "Contributions to Dim1") +
  coord_flip() +
  my_theme
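
The contributions and the reference line can be reproduced by hand. As a sketch under the assumption that contributions are defined as squared eigenvector entries expressed in percent, the code below uses the built-in USArrests data (not the V-Dem data) and hypothetical object names. With \(p\) variables, the equal-contribution threshold is \(100/p\); for the 24 variables of this report that would be roughly 4.17%.

```r
# Illustrative data only: USArrests has 4 variables, the report's dataset has 24
pca_demo <- prcomp(USArrests, scale. = TRUE)

# Contribution (in %) of each variable to each component: squared eigenvector entries
contrib <- 100 * pca_demo$rotation^2

# Expected contribution if every variable contributed equally
threshold <- 100 / nrow(contrib)

# Variables exceeding the threshold on the first component
above <- rownames(contrib)[contrib[, "PC1"] > threshold]

print(round(contrib[, "PC1"], 1))
print(threshold)
print(above)
```

Because each eigenvector has unit length, the contributions to any single component always sum to 100%.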

The second dimension, with its smaller number of significant variables, is easier to interpret. The variables contributing significantly to PC2 include:

Election peace (clean elections),
Vote buying (clean elections),
Election other irregularities (clean elections),
EMB capacity (clean elections),
Party ban (freedom of association),
Suffrage.

This dimension is defined almost exclusively by the technical integrity and peaceful conduct of elections; it captures the electoral order of a given country. The second component is thus connected to the state’s ability to carry out elections effectively.

fviz_contrib(pca, choice = "var", axes = 2, fill = "orchid", color = "orchid", title = "Contributions to Dim2") +
  coord_flip() +
  my_theme

The third dimension is unequivocally dominated by the measure of suffrage, with a barely significant contribution from the election voter registry. Suffrage captures the share of the population allowed to vote; similarly, the voter registry variable captures how accurately voters are registered.

fviz_contrib(pca, choice = "var", axes = 3, fill = "lightpink", color = "lightpink", title = "Contributions to Dim3") +
  coord_flip() +
  my_theme

PCA Rotation

To deepen the analysis, loadings will be analyzed. Loadings represent the correlation coefficients between the original variables and the newly derived principal components. Essentially, they illustrate how much a given variable contributes to a given component. However, standard PCA is greedy: each component captures as much of the remaining variance as possible, so the first component can absorb a disproportionately large share. For the purpose of loading analysis, an orthogonal varimax rotation is therefore applied to the PCA solution. The rotation turns the component axes so that they align better with clusters of variables, which improves interpretability. The varimax algorithm looks for the rotation that maximizes the variance of the squared loadings, forcing a hard separation. Because the rotation is orthogonal, the new components (RC1, etc.) remain uncorrelated.
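
The mechanics of an orthogonal rotation can be sketched with base R alone. The example below applies stats::varimax() to the loadings of a PCA on the built-in USArrests data; both the dataset and the object names are illustrative assumptions, and the normalization details may differ from those of the function used in this report. It checks two properties described above: the rotation matrix is orthogonal, and the variance each variable shares with the retained components is unchanged, so only its distribution across components shifts.

```r
# Illustrative data only: USArrests is NOT the V-Dem dataset
pca_demo <- prcomp(USArrests, scale. = TRUE)
k <- 2  # number of retained components in this toy example

# Unrotated loadings: eigenvectors scaled by the component standard deviations
L <- sweep(pca_demo$rotation[, 1:k], 2, pca_demo$sdev[1:k], "*")

# Varimax rotation from the stats package
rot_demo <- varimax(L)
L_rot <- unclass(rot_demo$loadings)

# Orthogonality check: t(R) %*% R should be the identity matrix
orth_err <- max(abs(crossprod(rot_demo$rotmat) - diag(k)))

# Communalities (row sums of squared loadings) should be unchanged by the rotation
h2_err <- max(abs(rowSums(L^2) - rowSums(L_rot^2)))

# Variance per component before and after rotation
print(round(colSums(L^2), 3))
print(round(colSums(L_rot^2), 3))
```

The column sums printed at the end correspond to the "SS loadings" row of a rotated solution: their total is preserved, while the split between components becomes more even.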

The varimax rotation divided the dataset into three components whose explained variance is distributed slightly more evenly. RC1 comprises a set of variables similar to PC1, a highly differentiated cluster representing civil liberties and political autonomy, and is responsible for 54% of the variance. RC2 explains 23.7% of the variance and mostly comprises variables regarding the conduct of elections. However, as a result of the rotation, two media variables now also load on it above the 0.4 cutoff: government censorship effort (v2mecenefm) and harassment of journalists (v2meharjrn). The third component (RC3) consists solely of suffrage and explains 4.8% of the variance. In the social sciences, the hard separation forced by the varimax algorithm can be inefficient, as the data are often correlated and, by nature, hard to represent unambiguously.

rot <- principal(vdem_z, nfactors = 3, rotate = "varimax")
print(loadings(rot), digits = 3, cutoff = 0.4, sort = TRUE)

## 
## Loadings:
##              RC1    RC2    RC3   
## v2clacfree    0.859              
## v2cldiscm     0.866              
## v2cldiscw     0.856              
## v2mebias      0.907              
## v2mecenefm    0.794  0.432       
## v2mecrit      0.843              
## v2meharjrn    0.799  0.432       
## v2merange     0.882              
## v2meslfcen    0.855              
## v2cseeorgs    0.898              
## v2csreprss    0.843              
## v2elmulpar    0.856              
## v2psbars      0.865              
## v2psoppaut    0.859              
## v2psparban    0.868              
## v2elembaut    0.792  0.503       
## v2elfrfair    0.654  0.633       
## v2elembcap           0.785       
## v2elintim     0.605  0.652       
## v2elirreg            0.868       
## v2elpeace            0.856       
## v2elrgstry    0.533  0.658       
## v2elvotbuy           0.859       
## v2elsuffrage                0.962
## 
##                  RC1   RC2   RC3
## SS loadings    12.97 5.695 1.157
## Proportion Var  0.54 0.237 0.048
## Cumulative Var  0.54 0.778 0.826

The plot below displays how the variables score on uniqueness and complexity. Uniqueness is the proportion of a variable’s variance that is not explained by the retained components; ideally it should be low, since reducing the space to a smaller number of dimensions then loses less information. Complexity relates to how many components a single variable loads on. It should likewise be low, as that makes interpretation easier. Uniqueness is plotted on the y-axis and complexity on the x-axis. Most of the variables fall into the low-complexity, low-uniqueness quadrant, but some also fall into the high-uniqueness, low-complexity and low-uniqueness, high-complexity quadrants. It is important to note that the highest value of uniqueness is 0.3, which means that even for the worst variable the components still explain 70% of its variance. Variables such as suffrage and the civil liberties indicators show an ideal simple structure (low complexity and low uniqueness), anchoring their respective dimensions. Conversely, administrative variables such as EMB autonomy exhibit higher complexity, which reflects their role in both ensuring electoral integrity and representing democratic institutional capacity.

# data frame for plotting (named plot_df to avoid masking base::plot)
plot_df <- data.frame(Name = names(rot$uniquenesses),
  Complexity = rot$complexity,
  Uniqueness = rot$uniquenesses)

ggplot(plot_df, aes(x = Complexity, y = Uniqueness, label = Name)) +
  geom_point(color = "black") +
  geom_vline(xintercept = mean(rot$complexity), col = "red", linetype = "dashed") +
  geom_hline(yintercept = mean(rot$uniquenesses), col = "red", linetype = "dashed")+
  geom_text_repel(size = 3.5, box.padding = 0.5) +
  labs(title = "Complexity vs Uniqueness",
       x = "Complexity", 
       y = "Uniqueness") +
  theme_minimal()
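
Both measures can be computed from a single row of the loading matrix. The sketch below assumes that complexity is Hofmann’s index, \((\sum_i a_i^2)^2 / \sum_i a_i^4\), and that uniqueness is one minus the communality \(\sum_i a_i^2\). The helper functions and the two loading vectors are hypothetical and made up for illustration; they are not taken from the output above.

```r
# Hofmann's complexity index for one variable's loading vector a
hofmann_complexity <- function(a) sum(a^2)^2 / sum(a^4)

# Uniqueness: share of the variable's variance NOT explained by the components
uniqueness <- function(a) 1 - sum(a^2)

# Hypothetical loading vectors (invented values, not from the report's output)
pure  <- c(0.95, 0.00, 0.00)  # loads on a single component
mixed <- c(0.60, 0.60, 0.00)  # loads equally on two components

print(hofmann_complexity(pure))
print(hofmann_complexity(mixed))
print(uniqueness(pure))
print(uniqueness(mixed))
```

A variable that loads on one component only has complexity 1, while equal loadings on two components give complexity 2, which is the pattern that cross-loading variables such as EMB autonomy approach.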

Results

I have to confess to a slight distortion of reality of which I have been guilty throughout the preceding analysis. Namely, while claiming that the first component captures the notion of democracy with near-perfect accuracy, I have not presented the evidence behind so bold a claim. In the following section I will proceed to account for my mishaps, or blasphemies, whichever the reader prefers.

If the first principal component is to account for the general notion of democracy, it should be very strongly correlated with the established V-Dem Polyarchy Index (also known and referenced as the Electoral Democracy Index). A near-perfect linear relationship between PC1 and the EDI is therefore expected. To test this hypothesis, the unprocessed version of the dataset is used to retrieve the EDI (v2x_polyarchy) and, for later use, the regime type of each country (v2x_regime).

The relationship between the first component and the Polyarchy Index (EDI) is plotted below. The correlation coefficient and the p-value are displayed above the regression line. Visual analysis of the plot indicates that the relationship between PC1 and the EDI is, indeed, near-perfectly linear. The correlation coefficient exceeds 0.95, legitimizing my previous claims: PC1 effectively captures the latent concept of democraticness.

# country coordinates from the UNROTATED PCA
pca_scores <- data.frame(get_pca_ind(pca)$coord)
pca_scores$country_name <- rownames(pca_scores)

# original V-Dem indices for validation (polyarchy and regime type)
validation_data <- vdem %>%
  dplyr::filter(year == 2024) %>%
  dplyr::select(country_name, v2x_polyarchy, v2x_regime)

# merge
analysis_df <- left_join(pca_scores, validation_data, by = "country_name")

# pc1 and polyarchy
ggplot(analysis_df, aes(x = Dim.1, y = v2x_polyarchy)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", color = "red") +
  stat_cor(method = "pearson", label.x = -5, label.y = 0.8) + 
  labs(title = "Relationship between PC1 and EDI",
       x = "PC1", y = "Electoral Democracy Index") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'
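
The logic of this validation step can be illustrated on simulated data. In the sketch below, six noisy indicators are generated from a single hypothetical latent trait, and the first principal component is then correlated with that trait. Everything here (the seed, the sample size, the noise level, the object names) is an assumption chosen for illustration and has no connection to the V-Dem data. Note that the sign of a principal component is arbitrary, so the absolute value of the correlation is the quantity of interest.

```r
set.seed(1)
n <- 200

# Hypothetical latent trait and six noisy indicators derived from it
latent <- rnorm(n)
indicators <- sapply(1:6, function(j) latent + rnorm(n, sd = 0.5))

# PCA on the standardized indicators
pca_sim <- prcomp(indicators, scale. = TRUE)
pc1 <- pca_sim$x[, 1]

# Correlation between PC1 and the latent trait
r <- cor(pc1, latent)

# The sign of a principal component is arbitrary, so report |r|
print(abs(r))
```

If PC1 in a real analysis came out negatively correlated with an external index, flipping the sign of the scores would be a harmless relabeling rather than a substantive change.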

This finding, while interesting in itself, provokes yet another probing question about the nature of the second component: does it represent merely more democracy, or is it rather a dimension of democracy operating on a level distinct from the Polyarchy Index? Following the approach used for PC1, a locally estimated scatterplot smoothing (LOESS) of the EDI against the second principal component is illustrated below. It reveals a strikingly different, non-linear relationship, accompanied by a low correlation coefficient (0.16). The LOESS curve exposes a structural asymmetry: liberal democracies form a rather condensed cluster at the top of the plot, while the authoritarian regimes are highly scattered, ranging from high-capacity electoral autocracies to disordered, low-capacity states.

# pc2 and edi
ggplot(analysis_df, aes(x = Dim.2, y = v2x_polyarchy)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "loess", color = "blue") + 
  stat_cor(method = "pearson", label.x = -5, label.y = 0.8) +
  labs(title = "Relationship between PC2 and EDI",
       x = "PC2", y = "Electoral Democracy Index") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'
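
A low Pearson coefficient rules out only a linear relationship, which is why a LOESS smoother is the more informative tool here. The sketch below uses a deliberately artificial, noise-free example (a parabola, unrelated to the V-Dem data) in which the Pearson correlation is essentially zero although the relationship is perfectly deterministic. All names are hypothetical.

```r
# Artificial example: a symmetric parabola, NOT the PC2-EDI relationship itself
x <- seq(-3, 3, length.out = 101)
y <- x^2

# Pearson correlation captures only the linear component of the relationship
r_pearson <- cor(x, y)

# A LOESS fit follows the curvature that the linear coefficient misses
fit <- loess(y ~ x)
r_fit <- cor(fitted(fit), y)

print(round(r_pearson, 6))
print(round(r_fit, 4))
```

In practice the two summaries complement each other: the coefficient quantifies the linear trend, while the smoother shows whether any structure remains once that trend is set aside.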

The previously retrieved v2x_regime numeric index is reassigned its original text labels, which are described as follows:

0: Closed Autocracy – the most restrictive form of governance with no form of political competition – no multiparty elections for the chief executive or legislature.
1: Electoral Autocracy – de-jure multiparty elections, but not free and fair, or de-facto multiparty.
2: Electoral Democracy – de-facto free and fair multiparty elections and a minimum level of Dahl’s institutional prerequisites for polyarchy. However, access to justice, transparent law enforcement, or respect for personal liberties is lacking: personal liberties, the rule of law and judicial and legislative constraints on the executive are not fully satisfied.
3: Liberal Democracy – de-facto free and fair multiparty elections, fulfilling the measures of personal liberties, rule of law and judicial and legislative constraints on the executive.

This classification will now be used to evaluate the results of the PCA: do the dimensions of the principal components align with the regime types separated according to the Polyarchy Index?

Projecting the 2024 regime data onto the previously established two dimensions reveals that, while liberal democracies form a tight cluster characterized by relatively high scores on both dimensions, closed autocracies are widely dispersed across the entire second component. The x-axis (PC1: Civic Liberty) demarcates the level of liberal democracy, separating free regimes from autocracies, but the y-axis (PC2: Electoral Order/State Capacity) exposes a deep division within the authoritarian group. What may strike the viewer most upon a first, admittedly brief, glance at the plot are the countries occupying the upper-left quadrant: the high-capacity autocracies. It is the stomping ground of closed autocracies such as North Korea, which obtain high scores on the electoral variables constituting the PC2 Electoral Order/State Capacity dimension while, as expected, scoring low on civic liberties. This finding might seem contradictory (it certainly appeared so to me), but on longer consideration it makes sense: through absolute state control (hence, high state capacity) and a monopoly on violence, these countries do maintain perfect electoral peace, enforce accurate voter registration and prevent irregularities, even more effectively than some democracies. The autocracies occupying the lower-left quadrant, however, lack both the freedom of democracy and the organizational power of an effective dictatorship.

The PCA map reveals that while the path of freedom is singular, the path of subjugation/unfreedom, like Frost’s road in a yellow wood, diverges in two: the chaos of the weak state and the rigid, Orwellian order of the strong state.

# labels for regime types 
analysis_df$RegimeType <- factor(analysis_df$v2x_regime,
                                 levels = 0:3,
                                 labels = c("Closed Autocracy", 
                                            "Electoral Autocracy", 
                                            "Electoral Democracy", 
                                            "Liberal Democracy"))
top_contributors <- analysis_df %>%
  mutate(pca_contribution = Dim.1^2 + Dim.2^2) %>%
  group_by(RegimeType) %>%
  slice_max(order_by = pca_contribution, n = 5) %>%
  pull(country_name)

# regime map 
ggplot(analysis_df, aes(x = Dim.1, y = Dim.2, color = RegimeType)) +
  geom_point(size = 2.5, alpha = 0.7) +
  geom_text_repel(aes(label = ifelse(country_name %in% top_contributors, country_name, "")),
                  size = 3.5,
                  box.padding = 0.5,
                  max.overlaps = Inf,
                  show.legend = FALSE,
                  force = 10) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 0, linetype = "dashed") +
  scale_color_viridis_d() + 
  labs(title = "V-Dem Regimes vs PCA",
       x = "PC1: Civic Liberty",
       y = "PC2: Electoral Order/State Capacity") +
  theme_minimal() +
  theme(legend.position = "bottom")
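
The visual impression of dispersion can be backed by a simple number: the standard deviation of the PC2 scores within each regime type. The sketch below shows the calculation on a toy data frame whose values are invented purely for illustration and are not results of this analysis. With the real data, the same tapply() call could be applied to the Dim.2 and RegimeType columns of analysis_df.

```r
# Hypothetical scores, invented purely to show the calculation (NOT real results)
toy <- data.frame(
  RegimeType = rep(c("Closed Autocracy", "Liberal Democracy"), each = 4),
  Dim.2 = c(-3.0, -1.0, 1.5, 3.5, 0.4, 0.6, 0.8, 1.0)
)

# Spread of the second-component scores within each regime type
spread <- tapply(toy$Dim.2, toy$RegimeType, sd)

print(round(spread, 2))
```

A markedly larger within-group standard deviation for one regime type would be the numeric counterpart of the vertical scatter visible in the regime map.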

Conclusions

The main conclusion of the preceding analysis is that behind the broadly popularized, one-dimensional metrics of democracy lie dimensions that neither substitute for one another nor move in the same direction in a way that would justify subsuming them under a single index. This is, of course, in line with centuries of theoretical discourse on democracy, which has always treated it as a multifaceted notion. V-Dem’s Polyarchy Index, while ultimately single-dimensional, comprises over 40 low-level indices, which are supposed to illustrate five dimensions of democracy based on Dahl’s conception of polyarchy. The EDI is merely, and understandably so, a simplified metric of an outstandingly complex concept. I am not claiming to have discovered anything revolutionary when I say that such simplified symbols of democracy are, indeed, simplified; simplification is the sole purpose of this metric. However, what truly makes my findings interesting is the distinct relationship of the EDI with the first and the second component. The first component (Civic Liberty) by itself captures 67.5% of the variance and exhibits a near-perfect linear relationship with the Polyarchy Index, while the second component (Electoral Order/State Capacity) behaves almost independently of the EDI, in a manner that does not resemble a linear relationship. Thus, the first component captures the standard definition of democracy, while the second component constitutes a distinct dimension: the administrative capability of a state to manage elections and maintain civil order. In essence, while the two dimensions capture the differences between V-Dem’s categories of democratic states (Liberal Democracy: high freedom, high state capacity; Electoral Democracy: high but lower freedom, low state capacity), there is a substantial divergence between V-Dem’s definition of autocracies and their scores on PC1 and PC2.
It would seem that V-Dem’s standard definition of autocracies does not fully capture the institutional differences within the cluster of Closed Autocracies, which does not occupy a specific quadrant but rather scatters across the entire dimension of Electoral Order/State Capacity. This vertical bifurcation sheds light on a distinction among autocracies that is seemingly invisible to the Polyarchy Index: the difference between the coercive order of high-capacity totalitarian states and the disorder of fragile or failed states. State capacity is not a prerequisite for closed autocracies in the same way it is for liberal democracies. This would suggest that the paths to autocracy are diverse, while the path to liberal democracy appears to be captured well by the one-dimensional EDI.

Bibliography

Coppedge, M., Gerring, J., Knutsen, C. H., Lindberg, S. I., Teorell, J., Marquardt, K. L., … & Wilson, S. L. (2024). V-Dem Methodology v14. V-Dem Dataset.

Dahl, R. A. (2008). Polyarchy: Participation and opposition. Yale University Press.

Fukuyama, F. (1992). The end of history and the last man. Hamish Hamilton, London.

Gramsci, A. (1926). Prison Notebooks.

Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2019). Multivariate data analysis.

Maerz, S. F., Lührmann, A., Hellmeier, S., Grahn, S., & Lindberg, S. I. (2020). State of the world 2019: Autocratization surges – resistance grows. Democratization, 27(6), 909–927.

Wilson, M. C., Wiesner, K., & Bien, S. (2023). The Hidden Dimension in Democracy. V-Dem Working Paper, 137.