What determines good governance? Here, I explore some correlates of good governance with the aid of an interactive correlation network.

First, I need the requisite packages…

library(tidyverse) # to work with data
library(corrr)     # for tidy correlations
library(networkD3) # for interactive visualizations

And now I need the data…

dt = read_csv('dataviz-02.csv')
dt
## # A tibble: 195 x 37
##    country countryISO population surfaceArea X5    giniIndex happyPlanetIndex
##    <chr>   <chr>           <dbl>       <dbl> <lgl>     <dbl>            <dbl>
##  1 Afghan~ AFG          36000000      652230 NA         NA               20.2
##  2 Albania ALB           2900000       27398 NA         29               36.8
##  3 Algeria DZA          41000000     2381740 NA         35.3             33.3
##  4 Andorra AND             77000         468 NA         NA               NA  
##  5 Angola  AGO          30000000     1246700 NA         42.7             NA  
##  6 Antigu~ ATG            102000         443 NA         NA               NA  
##  7 Argent~ ARG          44000000     2736690 NA         41.7             35.2
##  8 Armenia ARM           3000000       28203 NA         31.5             25.7
##  9 Austra~ AUS          25000000     7682300 NA         30.3             21.2
## 10 Austria AUT           9000000       82445 NA         30.5             30.5
## # ... with 185 more rows, and 30 more variables: humanDevelopmentIndex <dbl>,
## #   worldHappinessReportScore <dbl>,
## #   sustainableEconomicDevelopmentAssessment <dbl>, X11 <lgl>, gdp <dbl>,
## #   gdpPerCapita <dbl>, gdpGrowth <dbl>, healthExpenditureShareOfGDP <dbl>,
## #   healthExpenditurePerPerson <dbl>, educationExpenditureShareOfGDP <dbl>,
## #   educationExpenditurePerPerson <dbl>, schoolLifeExpectancy <dbl>,
## #   percentUnemployed <dbl>, governmentSpendingScore <dbl>,
## #   governmentExpenditureShareOfGDP <dbl>, X23 <lgl>,
## #   politicalRightsScore <dbl>, civilLibertiesScore <dbl>,
## #   politicalStabilityScore <dbl>, governmentEffectivenessScore <dbl>,
## #   regulatoryQualityScore <dbl>, ruleOfLawScore <dbl>,
## #   controlOfCorruptionScore <dbl>, judicialEffectivenessScore <dbl>,
## #   governmentIntegrityScore <dbl>, propertyRightsScore <dbl>,
## #   taxBurdenScore <dbl>, economicFreedomScore <dbl>,
## #   financialFreedomScore <dbl>, percentWomenMP <dbl>

The above dataset contains a lot of variables for 195 countries, including GDP per capita, population, the Sustainable Economic Development Assessment, World Happiness Report scores, Control of Corruption, and a host of other measures.

There’s a lot to wrap our heads around here. One useful way to explore how so many variables are interrelated is with the aid of a correlation network (and an interactive one at that!).

First, let’s just look at the bivariate correlations in the data. We can do this with the correlate() function in the corrr package.

cors = correlate(
  dt %>%
    # get rid of empty vectors and country ID info:
    select(-country,-countryISO,-X5,-X11,-X23)
) 
cors # show output
## # A tibble: 32 x 33
##    rowname population surfaceArea giniIndex happyPlanetIndex humanDevelopmen~
##    <chr>        <dbl>       <dbl>     <dbl>            <dbl>            <dbl>
##  1 popula~   NA            0.447     0.0371           0.0645         -0.00663
##  2 surfac~    0.447       NA         0.0705          -0.0814          0.101  
##  3 giniIn~    0.0371       0.0705   NA               -0.121          -0.375  
##  4 happyP~    0.0645      -0.0814   -0.121           NA               0.359  
##  5 humanD~   -0.00663      0.101    -0.375            0.359          NA      
##  6 worldH~   -0.0979       0.0328   -0.346            0.406           0.787  
##  7 sustai~   -0.0698       0.0541   -0.496            0.277           0.936  
##  8 gdp        0.793        0.598     0.0395           0.0311          0.187  
##  9 gdpPer~   -0.0458       0.0542   -0.365            0.113           0.695  
## 10 gdpGro~    0.124       -0.0612    0.0885           0.0985         -0.00561
## # ... with 22 more rows, and 27 more variables:
## #   worldHappinessReportScore <dbl>,
## #   sustainableEconomicDevelopmentAssessment <dbl>, gdp <dbl>,
## #   gdpPerCapita <dbl>, gdpGrowth <dbl>, healthExpenditureShareOfGDP <dbl>,
## #   healthExpenditurePerPerson <dbl>, educationExpenditureShareOfGDP <dbl>,
## #   educationExpenditurePerPerson <dbl>, schoolLifeExpectancy <dbl>,
## #   percentUnemployed <dbl>, governmentSpendingScore <dbl>,
## #   governmentExpenditureShareOfGDP <dbl>, politicalRightsScore <dbl>,
## #   civilLibertiesScore <dbl>, politicalStabilityScore <dbl>,
## #   governmentEffectivenessScore <dbl>, regulatoryQualityScore <dbl>,
## #   ruleOfLawScore <dbl>, controlOfCorruptionScore <dbl>,
## #   judicialEffectivenessScore <dbl>, governmentIntegrityScore <dbl>,
## #   propertyRightsScore <dbl>, taxBurdenScore <dbl>,
## #   economicFreedomScore <dbl>, financialFreedomScore <dbl>,
## #   percentWomenMP <dbl>

We now have a tibble of correlations for 32 variables. While it’s simple enough to look through these results to get a sense for how variables are correlated, it will take some time to fully digest these results as they stand.
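For intuition about what correlate() computes, here is a minimal base R sketch on made-up data (the `toy` data frame and its columns are hypothetical, not part of the dataset above). As far as I understand it, correlate() defaults to Pearson correlations over pairwise complete observations, which is what cor() does with use = "pairwise.complete.obs"; the main differences are that correlate() returns a tibble and leaves the diagonal as NA.

```r
# Hypothetical toy data with one missing value (not from dataviz-02.csv)
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 6, 8, 10),  # perfectly correlated with a
  c = c(5, 3, NA, 2, 1)   # contains a missing value
)

# Pearson correlations; each pair uses only the rows where both values are present
toy_cors <- cor(toy, use = "pairwise.complete.obs")

# Mimic the NA diagonal (a variable's correlation with itself is uninformative)
diag(toy_cors) <- NA
toy_cors
```

Pairwise deletion matters for data like this, where many countries are missing at least one indicator: dropping every incomplete row would throw away far more information than necessary.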

Alternatively, I can make a network plot that lets me see more clearly how measures of governance correlate with other features of a country, like income (GDP per capita), rule of law, healthcare spending, and so on.

I can do this really easily with the simpleNetwork() function in the networkD3 package.

First, let’s recreate the correlation matrix. But, before doing that, I want to clean up the names of the variables in the data…

dt %>%
  select(
    -country,
    -countryISO,
    -X5,-X11,-X23
  ) %>%
  rename(
    GINI=giniIndex,
    'Happy Planet'=happyPlanetIndex,
    'World Happiness'=worldHappinessReportScore,
    'Human Dev.'=humanDevelopmentIndex,
    'Sustainable Dev.'=sustainableEconomicDevelopmentAssessment,
    Population=population,
    GDP=gdp,
    Income=gdpPerCapita,
    'GDP Growth'=gdpGrowth,
    'Health Spending per GDP'=healthExpenditureShareOfGDP,
    'Per Capita Health Spending'=healthExpenditurePerPerson,
    'Edu. Spending per GDP'=educationExpenditureShareOfGDP,
    'Per Capita Edu. Spending'=educationExpenditurePerPerson,
    'Years in School'=schoolLifeExpectancy,
    'Unemployment Rate'=percentUnemployed,
    'Govt. Spending Score'=governmentSpendingScore,
    'Govt. Spending per GDP'=governmentExpenditureShareOfGDP,
    'Lack of Political Rights'=politicalRightsScore,
    'Low Civil Liberties'=civilLibertiesScore,
    'Political Stability'=politicalStabilityScore,
    'Govt. Effectiveness'=governmentEffectivenessScore,
    'Regulatory Quality'=regulatoryQualityScore,
    'Rule of Law'=ruleOfLawScore,
    'Control of Corruption'=controlOfCorruptionScore,
    'Judicial Effectiveness'=judicialEffectivenessScore,
    'Govt. Integrity'=governmentIntegrityScore,
    'Property Rights'=propertyRightsScore,
    'Tax Burden'=taxBurdenScore,
    'Econ. Freedom'=economicFreedomScore,
    'Financial Freedom'=financialFreedomScore,
    'Women MPs'=percentWomenMP
  ) -> clean_dt

Next, I use correlate() on the data.

clean_dt %>%
  correlate() -> cors

# set the diagonal (each variable's self-correlation, NA by default) to zero
diag(cors[,-1]) = 0

I then use stretch() to reshape the correlations into long format: the first and second columns name the two variables in each pair, and the third column gives their correlation coefficient.

cors %>%
  stretch() -> stretch_cors
stretch_cors # show output
## # A tibble: 1,024 x 3
##    x          y                       r
##    <chr>      <chr>               <dbl>
##  1 Population Population        0      
##  2 Population surfaceArea       0.447  
##  3 Population GINI              0.0371 
##  4 Population Happy Planet      0.0645 
##  5 Population Human Dev.       -0.00663
##  6 Population World Happiness  -0.0979 
##  7 Population Sustainable Dev. -0.0698 
##  8 Population GDP               0.793  
##  9 Population Income           -0.0458 
## 10 Population GDP Growth        0.124  
## # ... with 1,014 more rows
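To make the reshaping step concrete, here is a rough base R sketch of the same wide-to-long idea on a made-up 3 x 3 correlation matrix (the matrix `m` and the names `A`, `B`, `C` are hypothetical). It is not how stretch() is implemented, and the row order may differ, but the resulting shape is the same: one row per ordered pair of variables.

```r
# Hypothetical 3 x 3 correlation matrix (made-up values, zero diagonal)
vars <- c("A", "B", "C")
m <- matrix(
  c( 0.0, 0.8, -0.2,
     0.8, 0.0,  0.5,
    -0.2, 0.5,  0.0),
  nrow = 3, dimnames = list(vars, vars)
)

# Wide-to-long: one row per ordered (x, y) pair
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c("x", "y", "r")
long

n_rows   <- nrow(long)               # n^2 ordered pairs, self-pairs included
n_unique <- choose(length(vars), 2)  # unique unordered pairs
```

With 3 variables this gives 9 rows but only 3 unique pairs. The same arithmetic applied to 32 variables gives 32^2 = 1,024 rows and choose(32, 2) = 496 unique pairs.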

Looking at the above, it’s clear we have a lot of pairwise correlations: 1,024 rows, to be precise, since the 32 × 32 table lists every pair in both orders along with each variable paired with itself. To simplify things, it can be useful to zero in on the most strongly correlated variable pairs in the data. I do this by keeping only the pairs whose correlation coefficient satisfies \(|r| \geq 0.7\).

stretch_cors %>%
  filter(abs(r)>=0.7) -> strong_cors
strong_cors
## # A tibble: 192 x 3
##    x          y                              r
##    <chr>      <chr>                      <dbl>
##  1 Population GDP                        0.793
##  2 Human Dev. World Happiness            0.787
##  3 Human Dev. Sustainable Dev.           0.936
##  4 Human Dev. Per Capita Health Spending 0.737
##  5 Human Dev. Years in School            0.888
##  6 Human Dev. Govt. Effectiveness        0.827
##  7 Human Dev. Regulatory Quality         0.777
##  8 Human Dev. Rule of Law                0.756
##  9 Human Dev. Control of Corruption      0.707
## 10 Human Dev. Property Rights            0.764
## # ... with 182 more rows

That narrows things down to 192 rows, or 96 unique variable pairs, since each pair still appears twice (once in each order).
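Because the long table lists every pair in both orders, the row count overstates the number of distinct relationships. The sketch below, on a made-up `pairs` data frame (hypothetical names and values), shows one way to threshold and then keep a single row per pair in base R. I believe stretch() also accepts a remove.dups argument for the same purpose, though I don't use it here.

```r
# Hypothetical long-format correlations, each pair listed in both orders
pairs <- data.frame(
  x = c("A", "B", "A", "C", "B", "C"),
  y = c("B", "A", "C", "A", "C", "B"),
  r = c(0.8, 0.8, -0.75, -0.75, 0.3, 0.3),
  stringsAsFactors = FALSE
)

# Keep strong correlations, whether positive or negative
strong <- pairs[abs(pairs$r) >= 0.7, ]

# Keep one row per unordered pair by requiring x to sort before y
unique_strong <- strong[strong$x < strong$y, ]

nrow(strong)         # each strong pair appears twice
nrow(unique_strong)  # one row per strong pair
```

For the network plot the duplicates are mostly harmless, since both rows describe the same link, but deduplicating gives an honest count of how many relationships clear the threshold.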

With the above in hand, I now use the simpleNetwork() function to make an interactive network plot.

# first, set the color of the edges (or links)
# between nodes (variables):
strong_cors$col='red' # for negative correlations
strong_cors$col[strong_cors$r>0]='lightblue' # for positive

# make the network plot
strong_cors %>%
  simpleNetwork(charge = -25, # add negative charge to force points apart
                fontSize = 6,
                linkColour = .$col,
                zoom = TRUE,
                opacity = .75,
                nodeColour = 'grey',
                fontFamily = 'calibri') 

Figure 1: Correlation Network of Correlates of Good Governance
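As a small aside on the edge colors, the two-step assignment used before the plot can also be written as a single vectorized ifelse() call. This is only a sketch on made-up coefficients (the vector `r` below is hypothetical), not a change to the plot itself.

```r
# Hypothetical correlation coefficients (made-up values)
r <- c(0.85, -0.72, 0.91, -0.80)

# One-step edge colors: light blue for positive correlations, red otherwise
edge_col <- ifelse(r > 0, "lightblue", "red")
edge_col
```

Either approach yields one color per link, in the same order as the rows passed to the plotting function.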

There are a few cool things about the above network plot.

  1. When you hover your mouse over nodes (or variables), the relevant node, as well as the other nodes connected with it, will be highlighted. This lets you easily see how one variable is connected with others. For example, if you hover your mouse over Income (GDP per capita), you will see Sustainable Development, Per Capita Education Spending, Government Effectiveness, Per Capita Health Spending, and World Happiness highlighted as well.

  2. If the positions of certain nodes obscure your view, you can also click on a given node in the network and pull it to a different position. Try it!

  3. You can zoom in and out if you want to take a closer look at a set of variables, or if you want a wider perspective on the entire network. If you have a mouse with a fancy rolly-thingy (that’s the technical name), you can spin that up or down with your mouse hovering over the plot to control the zoom. Or, if you have a fancy touch screen, you can just use your fingers.

All in all, the above results offer a nuanced picture of drivers of good governance, and ultimately, of happy citizens. For instance, look at the World Happiness measure, which is based on a simple survey about quality of life. If you highlight World Happiness, you’ll instantly see other parts of the network come to the fore: Property Rights, Rule of Law, Control of Corruption, Government Effectiveness, Sustainable Development, Regulatory Quality, Income (maybe money does buy happiness?), Per Capita Education Spending, Per Capita Health Spending, and Human Development (based on the Human Development Index, which measures healthy lives, standard of living, and knowledge).

There seems to be a story here about how investments in education and healthcare, together with strong property rights and low levels of corruption, promote quality of life (or at least are necessary conditions for it).