Some years ago, statistician Andrew Gelman published a thought-provoking critique of the Human Development Index (HDI) as applied to U.S. states. The question was simple but powerful: if you apply the theory behind the global HDI (which includes measures such as life expectancy, education, and income) to the states of the United States, does it truly assess “human development”, or is it simply another proxy for GDP?
Gelman’s main point was that the HDI for U.S. states mostly reflects economic factors such as Gross National Income (GNI), rather than capturing broader components of human well-being such as education and healthcare. So we decided to dig deeper by analyzing actual data on HDI in U.S. states.
In this analysis, we’ll look at how different variables such as life expectancy, education, and income relate to HDI for U.S. states.
HDI Rank – The ranking of states based on their HDI value.
HDI Value – The value of the Human Development Index itself.
Life Expectancy – Average life expectancy for residents in the state.
Expected Years of Schooling – The number of years of schooling a child can expect to receive.
Mean Years of Schooling – The average number of years adults have been educated.
GNI per Capita – The gross national income per capita for the state.
HDI Rank Actual – A correction of the HDI rank, factoring in GNI per capita.
# Load libraries, suppressing startup messages and warnings
suppressWarnings({
  suppressPackageStartupMessages({
    library(readxl)
    library(tidyverse)
    library(corrr)
    library(GGally)
    library(writexl)
  })
})
Load the dataset
# Load the data from the Excel file
hdi_data <- read_excel("CynthiaFolajimi/Bussines Forcasting/hdi_data.xlsx")
View(hdi_data)
Data Structure
str(hdi_data)
## tibble [13 × 9] (S3: tbl_df/tbl/data.frame)
## $ HDI_Rank : num [1:13] 23 15 21 37 8 10 50 42 44 34 ...
## $ Country : chr [1:13] "California" "Texas" "Florida" "New York" ...
## $ HDI_Value : num [1:13] 0.879 0.929 0.891 0.938 0.944 ...
## $ Life_Expectancy : num [1:13] 83 84 85 79 77 85 83 83 83 77 ...
## $ Expected_Years_of_Schooling: num [1:13] 13 12 14 15 12 14 16 15 13 16 ...
## $ Mean_Years_of_Schooling : num [1:13] 12 13 14 14 12 10 11 14 14 13 ...
## $ GNI_per_Capita : num [1:13] 71150 64891 63055 55965 67370 ...
## $ GNI_Rank_minus_HDI_Rank : num [1:13] 17 33 9 -25 23 36 -20 -7 -30 -5 ...
## $ HDI_Rank_Actual : num [1:13] 23 15 21 37 8 10 50 42 44 34 ...
# Check the first few rows of the data
head(hdi_data)
How Does HDI Relate to Other Variables?
Our first task is to examine how the HDI correlates with other factors like GNI per Capita, Life Expectancy, and Education. If HDI is just a reflection of economic performance, we’d expect to see a very strong correlation between HDI and GNI per Capita.
The Correlation Matrix
First, we’ll compute the correlation between HDI and the other variables using a correlation matrix. This matrix will tell us how closely related the various components of HDI are. If GNI per Capita has a strong correlation with HDI, it might signal that HDI is largely driven by economic factors (i.e., GDP disguised as human development).
Here’s a glimpse of what we might find:
HDI and GNI per Capita: We expect a strong positive correlation. If it is high, it suggests that states with higher incomes also tend to have higher HDIs, which could imply HDI is just reflecting wealth.
HDI and Life Expectancy: We might see a moderately positive correlation here too, because states with better healthcare and living conditions often have higher life expectancy.
HDI and Education: As with life expectancy, we might see a moderate correlation, though perhaps not as strong as the one with GNI.
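To read these correlations as numbers rather than only from a plot, one option is a small base-R helper built on `cor()`. The sketch below is illustrative: the function name `hdi_correlations` is hypothetical, and the `toy` data frame contains made-up values that only show the call shape, not the real HDI data.

```r
# Hypothetical helper: correlation of every numeric column with a target column.
hdi_correlations <- function(df, target = "HDI_Value") {
  # Keep numeric columns only (drops character columns such as Country)
  numeric_cols <- df[sapply(df, is.numeric)]
  # Pairwise Pearson correlations, tolerating missing values
  cors <- cor(numeric_cols, use = "pairwise.complete.obs")
  # Return the target's column, strongest positive correlation first
  sort(cors[, target], decreasing = TRUE)
}

# Made-up toy values, used only to demonstrate the call (NOT the real data)
toy <- data.frame(
  Country         = c("A", "B", "C", "D", "E"),
  HDI_Value       = c(0.88, 0.90, 0.92, 0.94, 0.96),
  GNI_per_Capita  = c(50000, 55000, 60000, 65000, 70000),
  Life_Expectancy = c(80, 78, 82, 79, 81)
)
toy_cors <- hdi_correlations(toy)
print(toy_cors)
```

With the real data, the call would presumably be `hdi_correlations(hdi_data)`, assuming the column names shown in the `str()` output above. The target's correlation with itself is always 1, so the entries of interest are the remaining ones.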
GGally::ggpairs(hdi_data)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# HDI vs. GDP (GNI per Capita) scatter plot
ggplot(hdi_data, aes(x = GNI_per_Capita, y = HDI_Value)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(
    title = "HDI vs. GNI per Capita",
    x = "GNI per Capita (2017 PPP $)",
    y = "Human Development Index (HDI)"
  )
## `geom_smooth()` using formula = 'y ~ x'
The Correlation Between HDI Rank and GNI Rank
We’ll rank states by their GNI per Capita and compare that ranking to their HDI Rank. If the ranks are very similar, it would support the idea that HDI is really just a proxy for GDP. After all, wealthier states tend to do better in both rankings.
We’ll create a scatter plot comparing HDI Rank to GNI Rank, adding a regression line to see how closely the two rankings align.
hdi_data$GNI_Rank_Actual <- rank(-hdi_data$GNI_per_Capita) # Rank GNI per capita in descending order
# Plot HDI rank vs. GNI rank
ggplot(hdi_data, aes(x = GNI_Rank_Actual, y = HDI_Rank_Actual)) +
  geom_point(color = "purple") +
  geom_smooth(method = "lm", color = "green") +
  labs(
    title = "HDI Rank vs. GNI Rank",
    x = "GNI Rank",
    y = "HDI Rank"
  )
## `geom_smooth()` using formula = 'y ~ x'
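A single number can complement the rank scatter plot. One common choice is the Spearman rank correlation, which depends only on ordering; that is convenient here because the `str()` output suggests `HDI_Rank_Actual` runs on a 1 to 50 scale while the GNI rank is computed within the rows of the dataset. The helper name `rank_agreement` and the toy vectors below are hypothetical, with made-up values.

```r
# Hypothetical helper: Spearman correlation between a GNI-based rank and an HDI rank.
rank_agreement <- function(gni_per_capita, hdi_rank) {
  # Rank 1 = highest GNI per capita, mirroring rank(-x) in the analysis
  gni_rank <- rank(-gni_per_capita)
  cor(gni_rank, hdi_rank, method = "spearman")
}

# Made-up toy values (NOT the real data): GNI order matches HDI rank order exactly
toy_gni      <- c(70000, 65000, 60000, 55000)
toy_hdi_rank <- c(3, 8, 21, 40)  # ranks on a wider scale than 1..4

agreement <- rank_agreement(toy_gni, toy_hdi_rank)
print(agreement)  # 1: the two orderings agree perfectly in this toy case
```

On the real data the call would presumably be `rank_agreement(hdi_data$GNI_per_Capita, hdi_data$HDI_Rank_Actual)`. A value near 1 would support the "HDI is a proxy for GDP" reading, while a value near 0 would argue against it.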
Linear Regression Analysis
To assess the relationship between HDI and economic factors, we fit a linear regression of HDI on its components and plot the coefficients to see which variables have the most impact on HDI. This will allow us to assess whether HDI is a real development measure or just another form of GDP per capita.
model <- lm(HDI_Value ~ GNI_per_Capita + Life_Expectancy + Expected_Years_of_Schooling + Mean_Years_of_Schooling, data = hdi_data)
summary(model)
##
## Call:
## lm(formula = HDI_Value ~ GNI_per_Capita + Life_Expectancy + Expected_Years_of_Schooling +
## Mean_Years_of_Schooling, data = hdi_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.025061 -0.020823 0.001573 0.013503 0.031135
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.406e+00 3.161e-01 4.449 0.00214 **
## GNI_per_Capita -6.981e-07 1.120e-06 -0.623 0.55032
## Life_Expectancy -5.510e-03 2.838e-03 -1.941 0.08816 .
## Expected_Years_of_Schooling -8.400e-03 5.959e-03 -1.410 0.19629
## Mean_Years_of_Schooling 9.220e-03 4.643e-03 1.986 0.08233 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0236 on 8 degrees of freedom
## Multiple R-squared: 0.5267, Adjusted R-squared: 0.29
## F-statistic: 2.225 on 4 and 8 DF, p-value: 0.1559
# Plot the regression coefficients
coef_df <- data.frame(
  Variable = names(model$coefficients),
  Coefficient = model$coefficients
)
ggplot(coef_df, aes(x = reorder(Variable, Coefficient), y = Coefficient)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Regression Coefficients for HDI")
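One caveat when reading this bar chart: the raw coefficients are on very different scales (dollars for GNI, years for the other predictors), and the intercept (about 1.4) is far larger than any slope, so bar length alone says little about relative importance. A possible refinement, not part of the original analysis, is to standardize all variables before fitting and to drop the intercept. The helper name `standardized_coefs` and the toy data below are hypothetical, with made-up values.

```r
# Hypothetical helper: slopes from a regression on z-scored variables.
standardized_coefs <- function(df, response, predictors) {
  # z-score the response and every predictor (mean 0, sd 1)
  scaled <- as.data.frame(scale(df[c(response, predictors)]))
  # Build the formula response ~ p1 + p2 + ... from the column names
  fit <- lm(reformulate(predictors, response = response), data = scaled)
  # Drop the intercept, which is ~0 after centering
  coef(fit)[-1]
}

# Made-up toy values (NOT the real data), only to demonstrate the call
toy_reg <- data.frame(
  HDI_Value       = c(0.88, 0.90, 0.91, 0.93, 0.95, 0.96),
  GNI_per_Capita  = c(50000, 56000, 58000, 66000, 69000, 75000),
  Life_Expectancy = c(80, 78, 82, 79, 81, 83)
)
std <- standardized_coefs(toy_reg, "HDI_Value",
                          c("GNI_per_Capita", "Life_Expectancy"))
print(std)
```

Each standardized slope is the expected change in HDI, in standard deviations, for a one-standard-deviation change in that predictor with the others held fixed. With only 13 rows and none of the predictors significant at the 5% level in the summary above, any such comparison should be read cautiously.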