Introduction

This project investigates whether global inequality can be divided into two distinct, independent dimensions: the economic and the social. Economic inequalities are quantifiable and relatively easy to measure, whereas social inequalities, such as gender disparity or unequal life expectancy, are often more qualitative and harder to define.

A central debate in development economics is whether social inequalities are merely a derivative of economic ones. One might argue that the concentration of capital dictates access to healthcare and education, making social outcomes a secondary effect of wealth distribution. My goal is to test this hypothesis with principal component analysis (PCA), reducing multiple inequality indicators to two principal components. I aim to determine whether social and economic measurements load onto separate components, that is, whether the two groups of variables point in near-orthogonal directions in the component space. If they do, this would suggest that social inequality is not just a byproduct of economic wealth but a separate phenomenon that requires its own policy interventions.

Finally, this study explores the nature of environmental inequality, represented by the carbon footprint of the top 10%. The goal is to determine whether this variable aligns with the economic or the social dimension. Specifically, I aim to test whether high emissions are merely a function of financial capability (the rich can afford to emit more) or whether they reflect broader social structures, such as consumption norms.

Data

For this project I took data from two sources:

  1. World Inequality Database (WID): This database is maintained by the World Inequality Lab, led by researchers such as Thomas Piketty and Gabriel Zucman. From it I use the shares of total income and total wealth held by the richest 1% of the population, and the percentage of carbon dioxide emissions attributable to the richest 10%.

  2. Human Development Report database, compiled by the United Nations Development Programme (UNDP). This is my source for measurements of social inequality: gender inequality (gii), inequality in education (ineq_edu), and inequality in life expectancy (ineq_le).

# Loading necessary libraries
library(readxl)
library(WDI)
library(tidyr)
library(dplyr)
library(stringr)
library(countrycode)
library(corrplot)
library(psych)
library(factoextra)
library(gridExtra)
library(knitr)

# Loading data
# World Inequality Database (WID)
wid <- read_excel('WID_Data_DR.xlsx')
# Label each indicator by the prefix of its WID code
wid <- wid %>%
  mutate(type = case_when(
    startsWith(Indicator, "sptinc") ~ "Income",
    startsWith(Indicator, "lpfghg") ~ "CO2Emission",
    startsWith(Indicator, "shweal") ~ "Wealth"
  ))

# Column names become <type>_<percentile>, e.g. Income_p99p100
wid$type <- paste(wid$type, wid$Precentile, sep = '_')
wid <- pivot_wider(data = wid, id_cols = c(Country, Year), names_from = type, values_from = Value)
wid$iso3c <- countrycode(sourcevar = wid$Country, origin = "country.name", destination = "iso3c")

#Loading data from Human Development Report

undp<- read_excel('hdr_undp.xlsx')
undp<- pivot_wider(data = undp, id_cols = c(country, year) ,names_from = indicatorCode, values_from = value )
undp$iso3c<-countrycode(sourcevar = undp$country, origin = "country.name", destination = "iso3c")
undp<- rename(undp, Year = year, Country = country)
undp$gii<- as.numeric(undp$gii)
undp$ineq_edu<- as.numeric(undp$ineq_edu)
undp$ineq_le<- as.numeric(undp$ineq_le)
undp$Year<- as.numeric(undp$Year)
# Drop rows whose country name could not be matched to an ISO3 code
wid <- wid[!is.na(wid$iso3c), ]
undp <- undp[!is.na(undp$iso3c), ]

#Merging and handling missing values

DF <- wid %>%
  left_join(select(undp, -Country), by = c("iso3c", "Year"))
DF$ineq_edu <- DF$ineq_edu/100
DF$ineq_le <- DF$ineq_le/100
DF$CO2Emission_p90p100 <- DF$CO2Emission_p90p100/100
df<- DF[DF$Year == 2019,] #extracting data for year 2019
cat("Number of observations without missing variables:",sum(complete.cases(df)))
## Number of observations without missing variables: 155
missing_values <- colSums(is.na(df))
kable(as.data.frame(missing_values), col.names = "Number of NAs", caption = "Missing values per variable")
Missing values per variable

Variable              Number of NAs
Country                           0
Year                              0
Income_p99p100                    6
Wealth_p99p100                    6
CO2Emission_p90p100              54
iso3c                             0
gii                              46
ineq_edu                         32
ineq_le                          21

df<-na.omit(df)
kable(head(df, 10))

Country      Year  Income_p99p100  Wealth_p99p100  CO2Emission_p90p100  iso3c  gii    ineq_edu  ineq_le
Afghanistan  2019  0.1469          0.2362          0.0341253            AFG    0.676  0.45365   0.28075
Albania      2019  0.0934          0.2247          0.1067012            ALB    0.131  0.12333   0.06452
Algeria      2019  0.2261          0.2801          0.0791766            DZA    0.385  0.33283   0.13008
Angola       2019  0.2584          0.3841          0.0616115            AGO    0.536  0.34171   0.29883
Argentina    2019  0.1482          0.2457          0.1816059            ARG    0.283  0.05787   0.07992
Armenia      2019  0.1457          0.2341          0.0989824            ARM    0.216  0.02935   0.07804
Australia    2019  0.0992          0.2300          0.5162757            AUS    0.079  0.04306   0.03537
Austria      2019  0.0876          0.2958          0.3616485            AUT    0.052  0.02855   0.03354
Azerbaijan   2019  0.1345          0.2310          0.1572052            AZE    0.314  0.04045   0.12254
Bahamas      2019  0.2266          0.3103          0.2951544            BHS    0.335  0.05143   0.14526

summary(df)
##    Country               Year      Income_p99p100   Wealth_p99p100  
##  Length:155         Min.   :2019   Min.   :0.0681   Min.   :0.1383  
##  Class :character   1st Qu.:2019   1st Qu.:0.1212   1st Qu.:0.2405  
##  Mode  :character   Median :2019   Median :0.1528   Median :0.2685  
##                     Mean   :2019   Mean   :0.1610   Mean   :0.2829  
##                     3rd Qu.:2019   3rd Qu.:0.2016   3rd Qu.:0.3105  
##                     Max.   :2019   Max.   :0.3134   Max.   :0.5472  
##  CO2Emission_p90p100    iso3c                gii            ineq_edu      
##  Min.   :0.005952    Length:155         Min.   :0.0140   Min.   :0.01369  
##  1st Qu.:0.072093    Class :character   1st Qu.:0.1595   1st Qu.:0.05912  
##  Median :0.165056    Mode  :character   Median :0.3370   Median :0.14627  
##  Mean   :0.216709                       Mean   :0.3362   Mean   :0.18562  
##  3rd Qu.:0.273318                       3rd Qu.:0.5005   3rd Qu.:0.28663  
##  Max.   :1.436777                       Max.   :0.8160   Max.   :0.50124  
##     ineq_le       
##  Min.   :0.02331  
##  1st Qu.:0.04767  
##  Median :0.10057  
##  Mean   :0.13302  
##  3rd Qu.:0.21141  
##  Max.   :0.40913
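
The missing-value handling above relies on listwise deletion: na.omit() drops every row that has an NA in any column, so the retained sample can be smaller than any single column's NA count suggests. A minimal sketch on a made-up data frame (the values and the names toy, na_per_column, n_complete, and toy_complete are illustrative, not taken from the dataset):

```r
# Hypothetical toy data: each numeric column has exactly one missing value.
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  income  = c(0.15, NA, 0.22, 0.09),
  gii     = c(0.68, 0.13, NA, 0.08)
)

# NAs counted column by column
na_per_column <- colSums(is.na(toy))

# Rows with no NA in any column
n_complete <- sum(complete.cases(toy))

# Listwise deletion keeps exactly those rows
toy_complete <- na.omit(toy)
```

Here only two of the four rows survive, even though each column is missing a single value, because the gaps fall in different rows.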

Pre-estimation Diagnostics

The first step of the analysis is to examine the correlation matrix. My hypothesis is that variables will show strong intra-group correlation (within the economic and social categories) but weak inter-group correlation.

df_num <- select(df, -c(Country, Year, iso3c))
# Spearman rank correlations (monotonic association, less sensitive to outliers)
correlation <- cor(df_num, method = 'spearman')
corrplot(correlation, type = 'lower')

The correlation matrix is consistent with my assumptions: the social variables (gender, education, and life expectancy inequality) and the economic variables (wealth and income) are highly correlated within their groups but only weakly across them. The matrix also shows that the CO2 variable has a strong negative correlation with the social variables and a weak correlation with the economic ones.
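
The within-group versus between-group comparison can also be summarised numerically instead of read off the plot. The sketch below is only an illustration on simulated stand-in data: the two latent factors, the noise level, and the names toy, groups, mean_within, and mean_between are assumptions, not part of the analysis above.

```r
# Simulated stand-in data: two independent latent factors (an assumption
# made for illustration; these are not the WID/UNDP values).
set.seed(1)
n <- 1000
econ   <- rnorm(n)
social <- rnorm(n)
toy <- data.frame(
  income   = econ   + rnorm(n, sd = 0.5),
  wealth   = econ   + rnorm(n, sd = 0.5),
  gii      = social + rnorm(n, sd = 0.5),
  ineq_edu = social + rnorm(n, sd = 0.5),
  ineq_le  = social + rnorm(n, sd = 0.5)
)

# Group label for each column, in column order
groups <- c(income = "economic", wealth = "economic",
            gii = "social", ineq_edu = "social", ineq_le = "social")

rho <- cor(toy, method = "spearman")

# TRUE where the row and column variables belong to the same group
same_group <- outer(groups, groups, "==")
# Each pair of variables counted once
upper <- upper.tri(rho)

mean_within  <- mean(abs(rho[upper & same_group]))
mean_between <- mean(abs(rho[upper & !same_group]))
```

Applied to the real data, the same two averages could be computed from the correlation object by supplying a groups vector that matches the columns of df_num; where the CO2 variable belongs in that vector is precisely the open question, so it would be sensible to leave it out of both groups.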

KMO(correlation)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = correlation)
## Overall MSA =  0.79
## MSA for each item = 
##      Income_p99p100      Wealth_p99p100 CO2Emission_p90p100                 gii 
##                0.67                0.57                0.91                0.75 
##            ineq_edu             ineq_le 
##                0.93                0.79
cortest.bartlett(correlation, n = nrow(df_num))
## $chisq
## [1] 813.2199
## 
## $p.value
## [1] 1.276562e-163
## 
## $df
## [1] 15

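The preliminary tests indicate that the dataset is suitable for principal component analysis. The overall KMO score of 0.79 falls at the top of Kaiser’s ‘middling’ band, just short of ‘meritorious’ (0.80), meaning the variables share enough common variance to be grouped into dimensions; the per-variable values range from 0.57 (Wealth_p99p100) to 0.93 (ineq_edu). The null hypothesis of Bartlett’s test, that the correlation matrix is an identity matrix, is rejected (chi-squared = 813.22, df = 15, p < 0.001), so I can assume that the variables are related and proceed with dimension reduction.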
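
For readers who want to see what the two diagnostics measure, both can be written in a few lines of base R. This is a sketch of the textbook formulas, not the psych implementation: the function names kmo_overall and bartlett_sphericity are hypothetical, and the built-in mtcars data is used purely as a stand-in with six variables.

```r
# Overall KMO: squared correlations relative to squared correlations plus
# squared partial correlations (off-diagonal entries only).
kmo_overall <- function(R) {
  inv_R <- solve(R)
  # Partial correlations obtained from the inverse correlation matrix
  P <- -inv_R / sqrt(outer(diag(inv_R), diag(inv_R)))
  off <- row(R) != col(R)
  sum(R[off]^2) / (sum(R[off]^2) + sum(P[off]^2))
}

# Bartlett's test of sphericity: H0 is that R is an identity matrix.
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Stand-in data: six numeric columns from the built-in mtcars dataset
vars <- c("mpg", "disp", "hp", "wt", "qsec", "drat")
R_toy <- cor(mtcars[, vars])

kmo_toy      <- kmo_overall(R_toy)
bartlett_toy <- bartlett_sphericity(R_toy, n = nrow(mtcars))
```

With six variables the test has 6 * 5 / 2 = 15 degrees of freedom, which matches the $df value printed above.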

PCA

pca <- prcomp(df_num, center = TRUE, scale. = TRUE)
summary(pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6
## Standard deviation     1.7840 1.2879 0.76347 0.53912 0.44963 0.28814
## Proportion of Variance 0.5304 0.2764 0.09715 0.04844 0.03369 0.01384
## Cumulative Proportion  0.5304 0.8069 0.90403 0.95247 0.98616 1.00000
eig_val <- get_eigenvalue(pca)
eig_val$eigenvalue
## [1] 3.18256635 1.65871422 0.58287887 0.29064895 0.20216410 0.08302751
fviz_eig(pca, choice = 'eigenvalue', addlabels = TRUE,   main = "Eigenvalues", barfill = 'hotpink2')

To test my hypothesis regarding the dual nature of inequality, I retain only the first two principal components (PC1 and PC2). This choice is supported by the statistical evidence: the scree plot shows that only the first two components have eigenvalues greater than one (Kaiser’s criterion), at 3.18 and 1.66, while PC3 falls well below this threshold at 0.58. Together, these two components explain about 81% of the total variance in the dataset.
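
The retention rule can be checked directly from the eigenvalues printed earlier. A minimal sketch (the variable names eigenvalues, prop_var, cum_var, and n_keep are illustrative):

```r
# Eigenvalues copied from the eig_val$eigenvalue output above
eigenvalues <- c(3.18256635, 1.65871422, 0.58287887,
                 0.29064895, 0.20216410, 0.08302751)

# With standardized variables the eigenvalues sum to the number of variables
prop_var <- eigenvalues / sum(eigenvalues)
cum_var  <- cumsum(prop_var)

# Kaiser's criterion: keep components with an eigenvalue above 1
n_keep <- sum(eigenvalues > 1)

# Share of total variance explained by the retained components
explained <- cum_var[n_keep]
```

An eigenvalue above 1 means the component carries more variance than a single standardized variable, which is the rationale for the cutoff.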

fviz_pca_var(pca,
             col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE,
             title = "Correlation Circle",
             xlab = "PC1",
             ylab = "PC2")

PC1 <- fviz_contrib(pca, choice = "var", axes = 1)
PC2 <- fviz_contrib(pca, choice = "var", axes = 2)
grid.arrange(PC1, PC2, ncol =1)
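
The percentages shown in the contribution plots can be reproduced from a prcomp object by hand, which makes it easier to see what they mean. The sketch below uses the built-in USArrests data as a stand-in (an assumption for illustration; the names pca_toy, contrib, cos2, and expected are not from the analysis above).

```r
# Stand-in PCA on a built-in dataset with four numeric variables
pca_toy <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Contribution of each variable to each component, in percent:
# the squared loading, since every loading vector has unit length
contrib <- 100 * pca_toy$rotation^2

# cos2: squared correlation between each variable and each component,
# i.e. the squared loading times that component's eigenvalue
cos2 <- sweep(pca_toy$rotation^2, 2, pca_toy$sdev^2, `*`)

# Contribution each variable would have if all contributed equally;
# this is the reference line drawn in the contribution plots
expected <- 100 / ncol(USArrests)
```

Two consequences follow. The contributions to any one component sum to 100%, so the reference line for six variables sits at 100/6, about 16.7%. And because cos2 cannot exceed 1, a variable's contribution to a component with eigenvalue λ is capped at 100/λ percent, which is about 31% for a component with eigenvalue 3.18.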

Conclusions

The correlation circle shows a clear division between economic and social inequalities. The variables representing the social dimension, Gender Inequality (GII), Life Expectancy, and Education, align almost perfectly with PC1. According to the contribution plots, these variables are the primary drivers of this dimension, each contributing between 25% and 30%. Their near-perpendicular orientation relative to the economic variables is evidence that social inequality is a distinct phenomenon rather than a derivative of financial wealth.

The economic dimension is almost entirely defined by Wealth and Income Inequality, which dominate PC2, with each contributing over 40% (a contribution of that size is only possible on PC2: with an eigenvalue of 3.18, no single variable can contribute more than about 31% to PC1). Principal components are orthogonal by construction, so the evidence lies in how cleanly each group of variables loads on its own component; this separation suggests that a country’s economic distribution does not automatically dictate its social outcomes.

Regarding environmental inequality (CO2Emission_p90p100), the analysis shows that its contribution falls below the expected average contribution (100/6 ≈ 16.7%, the dashed reference line in the contribution plots) for both dimensions. While its overall impact is lower than that of the other indicators, it leans more towards the social dimension (nearly 15% contribution) than the economic one (under 10%). This suggests that extreme carbon footprints are more closely linked to social structure and development patterns than to capital accumulation alone.