What determines good governance? Here, I explore some correlates of good governance with the aid of an interactive correlation network.
First, I need the requisite packages…
library(tidyverse) # to work with data
library(corrr) # for tidy correlations
library(networkD3) # for interactive visualizations
And now I need the data…
dt = read_csv('dataviz-02.csv')
dt
## # A tibble: 195 x 37
## country countryISO population surfaceArea X5 giniIndex happyPlanetIndex
## <chr> <chr> <dbl> <dbl> <lgl> <dbl> <dbl>
## 1 Afghan~ AFG 36000000 652230 NA NA 20.2
## 2 Albania ALB 2900000 27398 NA 29 36.8
## 3 Algeria DZA 41000000 2381740 NA 35.3 33.3
## 4 Andorra AND 77000 468 NA NA NA
## 5 Angola AGO 30000000 1246700 NA 42.7 NA
## 6 Antigu~ ATG 102000 443 NA NA NA
## 7 Argent~ ARG 44000000 2736690 NA 41.7 35.2
## 8 Armenia ARM 3000000 28203 NA 31.5 25.7
## 9 Austra~ AUS 25000000 7682300 NA 30.3 21.2
## 10 Austria AUT 9000000 82445 NA 30.5 30.5
## # ... with 185 more rows, and 30 more variables: humanDevelopmentIndex <dbl>,
## # worldHappinessReportScore <dbl>,
## # sustainableEconomicDevelopmentAssessment <dbl>, X11 <lgl>, gdp <dbl>,
## # gdpPerCapita <dbl>, gdpGrowth <dbl>, healthExpenditureShareOfGDP <dbl>,
## # healthExpenditurePerPerson <dbl>, educationExpenditureShareOfGDP <dbl>,
## # educationExpenditurePerPerson <dbl>, schoolLifeExpectancy <dbl>,
## # percentUnemployed <dbl>, governmentSpendingScore <dbl>,
## # governmentExpenditureShareOfGDP <dbl>, X23 <lgl>,
## # politicalRightsScore <dbl>, civilLibertiesScore <dbl>,
## # politicalStabilityScore <dbl>, governmentEffectivenessScore <dbl>,
## # regulatoryQualityScore <dbl>, ruleOfLawScore <dbl>,
## # controlOfCorruptionScore <dbl>, judicialEffectivenessScore <dbl>,
## # governmentIntegrityScore <dbl>, propertyRightsScore <dbl>,
## # taxBurdenScore <dbl>, economicFreedomScore <dbl>,
## # financialFreedomScore <dbl>, percentWomenMP <dbl>
The above dataset covers 195 countries and contains a wide range of variables: GDP per capita, population, the Sustainable Economic Development Assessment, World Happiness Report scores, Control of Corruption, and a host of other measures.
There’s a lot to wrap our heads around here. One useful way to explore how so many variables are interrelated is with the aid of a correlation network (and an interactive one at that!).
First, let’s just look at the bivariate correlations in the data. We can do this with the correlate() function in the corrr package.
cors = correlate(
  dt %>%
    # get rid of empty vectors and country ID info:
    select(-country, -countryISO, -X5, -X11, -X23)
)
cors # show output
## # A tibble: 32 x 33
## rowname population surfaceArea giniIndex happyPlanetIndex humanDevelopmen~
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 popula~ NA 0.447 0.0371 0.0645 -0.00663
## 2 surfac~ 0.447 NA 0.0705 -0.0814 0.101
## 3 giniIn~ 0.0371 0.0705 NA -0.121 -0.375
## 4 happyP~ 0.0645 -0.0814 -0.121 NA 0.359
## 5 humanD~ -0.00663 0.101 -0.375 0.359 NA
## 6 worldH~ -0.0979 0.0328 -0.346 0.406 0.787
## 7 sustai~ -0.0698 0.0541 -0.496 0.277 0.936
## 8 gdp 0.793 0.598 0.0395 0.0311 0.187
## 9 gdpPer~ -0.0458 0.0542 -0.365 0.113 0.695
## 10 gdpGro~ 0.124 -0.0612 0.0885 0.0985 -0.00561
## # ... with 22 more rows, and 27 more variables:
## # worldHappinessReportScore <dbl>,
## # sustainableEconomicDevelopmentAssessment <dbl>, gdp <dbl>,
## # gdpPerCapita <dbl>, gdpGrowth <dbl>, healthExpenditureShareOfGDP <dbl>,
## # healthExpenditurePerPerson <dbl>, educationExpenditureShareOfGDP <dbl>,
## # educationExpenditurePerPerson <dbl>, schoolLifeExpectancy <dbl>,
## # percentUnemployed <dbl>, governmentSpendingScore <dbl>,
## # governmentExpenditureShareOfGDP <dbl>, politicalRightsScore <dbl>,
## # civilLibertiesScore <dbl>, politicalStabilityScore <dbl>,
## # governmentEffectivenessScore <dbl>, regulatoryQualityScore <dbl>,
## # ruleOfLawScore <dbl>, controlOfCorruptionScore <dbl>,
## # judicialEffectivenessScore <dbl>, governmentIntegrityScore <dbl>,
## # propertyRightsScore <dbl>, taxBurdenScore <dbl>,
## # economicFreedomScore <dbl>, financialFreedomScore <dbl>,
## # percentWomenMP <dbl>
We now have a tibble of correlations for 32 variables. While it’s simple enough to look through these results to get a sense for how variables are correlated, it will take some time to fully digest these results as they stand.
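One quick way to skim a table like this is to pull out the correlations for a single variable with focus(), also from the corrr package. The sketch below uses the built-in mtcars data as a stand-in (an assumption for illustration, not the governance data) so that it runs on its own.

```r
library(corrr)

# mtcars is a stand-in here; swap in the governance data in practice.
toy_cors <- correlate(mtcars)

# Keep only the mpg column, with one row per remaining variable.
mpg_cors <- focus(toy_cors, mpg)
mpg_cors
```

With the real data, something like cors %>% focus(worldHappinessReportScore) should list how every other variable correlates with the World Happiness score.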
Alternatively, I can make a network plot that lets me more clearly see how measures of governance correlate with other features of a country, like income (GDP per capita), rule of law, healthcare spending, and so on.
I can do this really easily with the simpleNetwork() function in the networkD3 package.
First, let’s recreate the correlation matrix. But, before doing that, I want to clean up the names of the variables in the data…
dt %>%
  select(
    -country,
    -countryISO,
    -X5, -X11, -X23
  ) %>%
  rename(
    GINI=giniIndex,
    'Happy Planet'=happyPlanetIndex,
    'World Happiness'=worldHappinessReportScore,
    'Human Dev.'=humanDevelopmentIndex,
    'Sustainable Dev.'=sustainableEconomicDevelopmentAssessment,
    Population=population,
    GDP=gdp,
    Income=gdpPerCapita,
    'GDP Growth'=gdpGrowth,
    'Health Spending per GDP'=healthExpenditureShareOfGDP,
    'Per Capita Health Spending'=healthExpenditurePerPerson,
    'Edu. Spending per GDP'=educationExpenditureShareOfGDP,
    'Per Capita Edu. Spending'=educationExpenditurePerPerson,
    'Years in School'=schoolLifeExpectancy,
    'Unemployment Rate'=percentUnemployed,
    'Govt. Spending Score'=governmentSpendingScore,
    'Govt. Spending per GDP'=governmentExpenditureShareOfGDP,
    'Lack of Political Rights'=politicalRightsScore,
    'Low Civil Liberties'=civilLibertiesScore,
    'Political Stability'=politicalStabilityScore,
    'Govt. Effectiveness'=governmentEffectivenessScore,
    'Regulatory Quality'=regulatoryQualityScore,
    'Rule of Law'=ruleOfLawScore,
    'Control of Corruption'=controlOfCorruptionScore,
    'Judicial Effectiveness'=judicialEffectivenessScore,
    'Govt. Integrity'=governmentIntegrityScore,
    'Property Rights'=propertyRightsScore,
    'Tax Burden'=taxBurdenScore,
    'Econ. Freedom'=economicFreedomScore,
    'Financial Freedom'=financialFreedomScore,
    'Women MPs'=percentWomenMP
  ) -> clean_dt
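The empty columns X5, X11, and X23 are dropped by name above. As a sketch of an alternative, assuming dplyr 1.0 or later (for where()), columns that are entirely NA can be dropped without naming them. The toy tibble and its column names below are made up for illustration.

```r
library(dplyr)

# Made-up miniature of the data: one column is entirely NA.
toy <- tibble(
  country = c("A", "B", "C"),
  X5      = c(NA, NA, NA),
  gini    = c(29, NA, 35.3)
)

# Keep only columns that have at least one non-missing value.
kept <- toy %>%
  select(where(~ !all(is.na(.x))))

names(kept)
```

A column with some missing values, like gini here, is kept; only columns with no data at all are removed.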
Next, I use correlate() on the data.
clean_dt %>%
  correlate() -> cors
# set the diagonal (each variable's NA self-correlation) to zero
diag(cors[,-1]) = 0
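As an aside, recent versions of corrr let correlate() fill the diagonal directly through its diagonal argument, which would avoid the separate assignment. A minimal sketch, assuming such a version is installed and using the built-in mtcars data as a stand-in:

```r
library(corrr)

# Stand-in data; the diagonal is filled with 0 instead of the default NA.
toy_cors <- correlate(mtcars[, 1:4], diagonal = 0)

# Inspect the diagonal of the numeric part (the first column holds the names).
diag(as.matrix(toy_cors[, -1]))
```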
I then use stretch() to reformat the data so that the first and second columns denote the pairwise names of variables, and where the third column denotes the correlation coefficient.
cors %>%
stretch() -> stretch_cors
stretch_cors # show output
## # A tibble: 1,024 x 3
## x y r
## <chr> <chr> <dbl>
## 1 Population Population 0
## 2 Population surfaceArea 0.447
## 3 Population GINI 0.0371
## 4 Population Happy Planet 0.0645
## 5 Population Human Dev. -0.00663
## 6 Population World Happiness -0.0979
## 7 Population Sustainable Dev. -0.0698
## 8 Population GDP 0.793
## 9 Population Income -0.0458
## 10 Population GDP Growth 0.124
## # ... with 1,014 more rows
Looking at the above, it’s clear we have a lot of pairwise correlations: 1,024 rows, since stretch() lists all 32 × 32 ordered combinations, so each pair appears twice and each variable is also paired with itself. To simplify things, it can be useful to zero in on the most strongly correlated variable pairs in the data. I do this by keeping only the pairs whose correlation coefficient satisfies \(|r| \geq 0.7\).
stretch_cors %>%
filter(abs(r)>=0.7) -> strong_cors
strong_cors
## # A tibble: 192 x 3
## x y r
## <chr> <chr> <dbl>
## 1 Population GDP 0.793
## 2 Human Dev. World Happiness 0.787
## 3 Human Dev. Sustainable Dev. 0.936
## 4 Human Dev. Per Capita Health Spending 0.737
## 5 Human Dev. Years in School 0.888
## 6 Human Dev. Govt. Effectiveness 0.827
## 7 Human Dev. Regulatory Quality 0.777
## 8 Human Dev. Rule of Law 0.756
## 9 Human Dev. Control of Corruption 0.707
## 10 Human Dev. Property Rights 0.764
## # ... with 182 more rows
That narrows things down to 192 rows, or 96 distinct variable pairs, since each pair still appears in both orderings.
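Because the long format lists every pair in both orderings, the row count is double the number of distinct pairs. The base R sketch below shows the idea on a small stand-in (four columns of the built-in mtcars data, not the governance data): it builds the same kind of long table and keeps each unordered pair once.

```r
# Toy stand-in for the governance data: four columns of the built-in mtcars.
m <- cor(mtcars[, c("mpg", "disp", "hp", "wt")])

# Long format: one row per ordered pair, similar to what stretch() returns.
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c("x", "y", "r")

# Keep each unordered pair once; this also drops the self-pairs.
unique_pairs <- long[long$x < long$y, ]

nrow(long)          # 16 ordered pairs (4 x 4)
nrow(unique_pairs)  # 6 unordered pairs (4 * 3 / 2)
```

For the network plot the duplicates are harmless, since both orderings draw the same link. The stretch() function also has a remove.dups argument intended for this kind of de-duplication.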
With the above in hand, I now use the simpleNetwork() function to make an interactive network plot.
# first, set the color of the edges (or links)
# between nodes (variables):
strong_cors$col='red' # for negative correlations
strong_cors$col[strong_cors$r>0]='lightblue' # for positive
# make the network plot
strong_cors %>%
  simpleNetwork(charge = -25, # add negative charge to force points apart
                fontSize = 6,
                linkColour = .$col,
                zoom = TRUE,
                opacity = .75,
                nodeColour = 'grey',
                fontFamily = 'calibri')
Figure 1: Correlation Network of Correlates of Good Governance
There are a few cool things about the above network plot.
When you hover your mouse over nodes (or variables), the relevant node, as well as the other nodes connected with it, will be highlighted. This lets you easily see how one variable is connected with others. For example, if you hover your mouse over Income (GDP per capita), you will see Sustainable Development, Per Capita Education Spending, Government Effectiveness, Per Capita Health Spending, and World Happiness highlighted as well.
If the positions of certain nodes obscure your view, you can also click on a given node in the network and pull it to a different position. Try it!
You can zoom in and out if you want to take a closer look at a set of variables, or if you want a wider perspective on the entire network. If you have a mouse with a fancy rolly-thingy (that’s the technical name), you can spin that up or down with your mouse hovering over the plot to control the zoom. Or, if you have a fancy touch screen, you can just use your fingers.
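To share an interactive plot like this outside of R, the widget can be written to an HTML file with saveNetwork() from networkD3. The sketch below is only an illustration: the three-edge data frame is made up, and the output path is a temporary file rather than anything used in this post.

```r
library(networkD3)

# Hypothetical three-edge example; the pairs are illustrative only.
edges <- data.frame(
  x = c("Income", "Income", "Rule of Law"),
  y = c("Human Dev.", "Rule of Law", "Human Dev."),
  stringsAsFactors = FALSE
)

# By default, simpleNetwork() reads the first two columns as link endpoints.
net <- simpleNetwork(edges, zoom = TRUE)

# Write the widget to an HTML file (a temporary path here).
out_file <- tempfile(fileext = ".html")
saveNetwork(net, file = out_file, selfcontained = FALSE)
file.exists(out_file)
```

Setting selfcontained = FALSE keeps the JavaScript dependencies in a folder next to the HTML file; bundling everything into a single file with selfcontained = TRUE generally requires pandoc to be available.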
All in all, the above results offer a nuanced picture of drivers of good governance, and ultimately, of happy citizens. For instance, look at the World Happiness measure, which is based on a simple survey about quality of life. If you highlight World Happiness, you’ll instantly see other parts of the network come to the fore: Property Rights, Rule of Law, Control of Corruption, Government Effectiveness, Sustainable Development, Regulatory Quality, Income (maybe money does buy happiness?), Per Capita Education Spending, Per Capita Health Spending, and Human Development (based on the Human Development Index, which measures healthy lives, standard of living, and knowledge).
There seems to be a story here about how investment in education and healthcare, coupled with strong property rights and low levels of corruption, promotes quality of life (or at least is a necessary condition for it).