The research question I chose for my project is: Does income inequality predict life expectancy across countries?
For this analysis, I use data from the World Bank’s World Development Indicators (WDI) database, which provides internationally comparable statistics for countries around the world. The two main variables I focus on are life expectancy at birth (measured in years) and the Gini index, which measures income inequality within a country on a scale from 0 (perfect equality) to 100 (maximum inequality). Higher values of the Gini index indicate greater income inequality. Later in the analysis, I also include current health expenditure (% of GDP) as an additional predictor.
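To make the 0 to 100 scale concrete, the sketch below computes a Gini index from a vector of individual incomes using the mean-absolute-difference formula. The function name `gini_from_incomes` is hypothetical and not part of the original analysis, and the calculation is unweighted, so it will not reproduce the World Bank's survey-based estimates exactly; it only illustrates what the index measures.

```r
# Hypothetical helper (not part of the original analysis): Gini index on the
# 0-100 scale, using the mean-absolute-difference formula
#   G = 100 * sum_i sum_j |x_i - x_j| / (2 * n^2 * mean(x))
# Unweighted; official World Bank estimates are computed from household
# survey data and will not be reproduced exactly by this function.
gini_from_incomes <- function(income) {
  stopifnot(is.numeric(income), length(income) > 0,
            all(income >= 0), mean(income) > 0)
  n <- length(income)
  # outer() builds the n x n matrix of pairwise differences x_i - x_j
  abs_diffs <- sum(abs(outer(income, income, "-")))
  100 * abs_diffs / (2 * n^2 * mean(income))
}

gini_from_incomes(c(10, 10, 10, 10))  # 0: everyone has the same income
gini_from_incomes(c(1, 2, 3, 4))      # 25
gini_from_incomes(c(0, 0, 0, 1))      # 75: one person holds all income
```

With n people, the largest value this formula can reach is 100 * (n - 1) / n, which is why the last example gives 75 rather than 100 for four people.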
The dataset includes information for over 200 countries and multiple years. However, for my analysis, I focus on a single year, 2022, the most recent year with sufficient overlapping data for the variables I use, so that I can make consistent comparisons across countries.
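One way to check which year has the most overlapping data is to count, for each candidate year, the countries with non-missing values in both indicator tables. The sketch below is a minimal base-R version under stated assumptions: the helper `overlap_by_year`, the toy tables `life_toy` and `gini_toy`, and the threshold `min_countries` are all hypothetical, and the toy tables only mimic the layout `read.csv()` produces from the WIDEF files (a `REF_AREA` column plus one `X<year>` column per year).

```r
# Hypothetical helper: for each candidate year, count countries with
# non-missing values in BOTH wide-format tables (one row per REF_AREA,
# one X<year> column per year).
overlap_by_year <- function(a, b, years) {
  cols <- paste0("X", years)
  # Inner merge on country code; shared year columns get .a / .b suffixes
  both <- merge(a[, c("REF_AREA", cols)], b[, c("REF_AREA", cols)],
                by = "REF_AREA", suffixes = c(".a", ".b"))
  counts <- vapply(cols, function(col) {
    sum(!is.na(both[[paste0(col, ".a")]]) & !is.na(both[[paste0(col, ".b")]]))
  }, integer(1))
  setNames(counts, years)
}

# Hypothetical toy data standing in for the real WIDEF files
life_toy <- data.frame(REF_AREA = c("AAA", "BBB", "CCC"),
                       X2021 = c(70, 75, 80), X2022 = c(71, NA, 81))
gini_toy <- data.frame(REF_AREA = c("AAA", "BBB", "CCC"),
                       X2021 = c(30, NA, NA), X2022 = c(31, 35, 41))

counts <- overlap_by_year(life_toy, gini_toy, 2021:2022)
counts                # 2021: 1 country, 2022: 2 countries
min_countries <- 2    # hypothetical threshold for "sufficient" overlap
latest_year <- max(as.integer(names(counts))[counts >= min_countries])
latest_year           # 2022
```

Applied to the real tables, the same counts would show how many countries each year contributes before committing to a single column such as `X2022`.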
I chose this topic because, like my midterm, it relates to my major, Global and International Studies. I am also interested in global inequality and in how economic differences between countries may relate to health outcomes. Specifically, I want to understand whether countries with higher income inequality tend to have lower life expectancy, and what this might suggest about broader patterns across the world.
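library(dplyr)    # select(), rename-in-select, inner_join(), %>%
library(tidyr)    # drop_na()
library(ggplot2)  # ggplot() for the scatter plot below
life_exp <- read.csv("WB_WDI_SP_DYN_LE00_IN_WIDEF.csv")        # life expectancy at birth (years)
gini_index <- read.csv("WB_WDI_SI_POV_GINI_WIDEF.csv")         # Gini index
health_exp <- read.csv("WB_WDI_SH_XPD_CHEX_GD_ZS_WIDEF.csv")   # current health expenditure (% of GDP)
data <- life_exp %>% select(REF_AREA, LifeExpectancy = X2022) %>%   # keep and rename the 2022 column
  inner_join(gini_index %>% select(REF_AREA, Gini = X2022), by = "REF_AREA") %>%
  inner_join(health_exp %>% select(REF_AREA, HealthSpending = X2022), by = "REF_AREA") %>%
  drop_na()   # keep only countries with all three values for 2022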
head(data)
## REF_AREA LifeExpectancy Gini HealthSpending
## 1 CAN 81.09488 31.5 11.081197
## 2 RUS 72.54561 33.9 6.878866
## 3 IND 71.69800 25.5 3.396235
## 4 MEX 73.97300 43.5 5.709197
## 5 CYP 80.43400 31.5 8.272014
## 6 ITA 82.70000 33.7 8.850229
nrow(data)
## [1] 72
summary(data)
## REF_AREA LifeExpectancy Gini HealthSpending
## Length :72 Min. :54.08 Min. :24.1 Min. : 2.215
## N.unique :72 1st Qu.:72.49 1st Qu.:30.9 1st Qu.: 5.689
## N.blank : 0 Median :76.14 Median :33.9 Median : 7.596
## Min.nchar: 3 Mean :75.43 Mean :35.4 Mean : 7.486
## Max.nchar: 3 3rd Qu.:80.65 3rd Qu.:38.9 3rd Qu.: 9.165
## Max. :83.60 Max. :54.8 Max. :16.529
ggplot(data, aes(x = Gini, y = LifeExpectancy)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs( title = "Income Inequality vs Life Expectancy", x = "Gini Index (Income Inequality)", y = "Life Expectancy (Years)")
## `geom_smooth()` using formula = 'y ~ x'
cor(data$Gini, data$LifeExpectancy)
## [1] -0.293772
model <- lm(LifeExpectancy ~ Gini, data = data)
summary(model)
##
## Call:
## lm(formula = LifeExpectancy ~ Gini, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.7355 -2.9299 0.6904 4.5367 8.0135
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 84.5053 3.5996 23.477 <2e-16 ***
## Gini -0.2564 0.0997 -2.571 0.0123 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.983 on 70 degrees of freedom
## Multiple R-squared: 0.0863, Adjusted R-squared: 0.07325
## F-statistic: 6.612 on 1 and 70 DF, p-value: 0.01226
To account for another factor that may influence life expectancy, I added health spending (% of GDP) as a second predictor in a multiple linear regression model.
model2 <- lm(LifeExpectancy ~ Gini + HealthSpending, data = data)
summary(model2)
##
## Call:
## lm(formula = LifeExpectancy ~ Gini + HealthSpending, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.8904 -2.6069 0.6962 2.9534 9.3836
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 75.88844 3.52760 21.513 < 2e-16 ***
## Gini -0.26169 0.08568 -3.054 0.0032 **
## HealthSpending 1.17622 0.23154 5.080 3.08e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.141 on 69 degrees of freedom
## Multiple R-squared: 0.335, Adjusted R-squared: 0.3157
## F-statistic: 17.38 on 2 and 69 DF, p-value: 7.712e-07
The correlation between income inequality (Gini index) and life expectancy is approximately -0.29, which indicates a weak to moderate negative relationship: countries with higher income inequality tend, on average, to have lower life expectancy.
The simple linear regression found a statistically significant negative relationship between income inequality and life expectancy (p = 0.012). However, the model explained only about 8.6% of the variation in life expectancy (R² = 0.086), suggesting that many other factors also influence life expectancy.
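In a regression with a single predictor and an intercept, R² is exactly the square of the Pearson correlation, so the 8.6% figure can be checked directly from the correlation reported earlier. The value below is copied from the `cor()` output; nothing else is assumed.

```r
# Value copied from the cor(data$Gini, data$LifeExpectancy) output above
r <- -0.293772
r_squared <- r^2
round(r_squared, 4)   # 0.0863, the "Multiple R-squared" of the simple model
```

With the project data loaded, `all.equal(cor(data$Gini, data$LifeExpectancy)^2, summary(model)$r.squared)` should return `TRUE`, since this identity holds for any one-predictor regression with an intercept.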
To improve the analysis, I performed a multiple regression by adding health spending as a second predictor. The new model explained about 33.5% of the variation in life expectancy (R² = 0.335; adjusted R² = 0.316), a substantial improvement over the simple regression. Both income inequality (p = 0.003) and health spending (p < 0.001) were statistically significant predictors of life expectancy.
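Because R² can only rise when a predictor is added, adjusted R² is a fairer way to compare the two models. The sketch below recomputes adjusted R² and the F-statistic from the rounded R² values in the printed summaries, with n = 72 countries and p predictors (excluding the intercept); the helper names `adj_r2` and `f_stat` are hypothetical.

```r
# Hypothetical helpers using the standard formulas for a linear model
# with n observations and p predictors (intercept not counted in p)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat <- function(r2, n, p) (r2 / p) / ((1 - r2) / (n - p - 1))

n <- 72
adj_simple   <- adj_r2(0.0863, n, p = 1)   # about 0.0732 (printed: 0.07325)
adj_multiple <- adj_r2(0.3350, n, p = 2)   # about 0.3157 (printed: 0.3157)
f_multiple   <- f_stat(0.3350, n, p = 2)   # about 17.38  (printed: 17.38)
round(c(adj_simple, adj_multiple, f_multiple), 4)
```

Adjusted R² rises from about 0.07 to about 0.32, so the improvement is not merely an artifact of adding a second variable.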
The results from this analysis show a negative relationship between income inequality and life expectancy across countries. Countries with higher levels of income inequality tend to have lower average life expectancy.
The regression results support this pattern statistically. Specifically, a one-point increase in the Gini index is associated with a decrease of about 0.26 years in life expectancy. However, the relationship is not very strong overall: the R-squared value of 0.086 indicates that income inequality alone explains only a small portion of the differences in life expectancy between countries.
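To translate the slope into a more tangible comparison, the worked example below plugs the first and third quartiles of the Gini index (30.9 and 38.9, from `summary(data)`) into the simple model, using the rounded coefficients copied from `summary(model)`. These are predicted averages under the fitted line, not causal effects.

```r
# Coefficients copied from summary(model): LifeExpectancy ~ Gini
intercept <- 84.5053
slope     <- -0.2564

gini_q1 <- 30.9   # 1st quartile of Gini in the sample
gini_q3 <- 38.9   # 3rd quartile of Gini in the sample

pred_q1 <- intercept + slope * gini_q1   # about 76.58 years
pred_q3 <- intercept + slope * gini_q3   # about 74.53 years
gap <- pred_q1 - pred_q3                 # = -slope * 8, about 2.05 years
round(c(pred_q1, pred_q3, gap), 2)
```

With the fitted object available, `predict(model, newdata = data.frame(Gini = c(30.9, 38.9)))` gives nearly the same values, since it uses the unrounded coefficients.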
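After adding health spending as a second predictor in a multiple regression model, the model's explanatory power increased substantially: it explained about one third of the variation in life expectancy across countries (R² = 0.335), and both income inequality and health spending were statistically significant predictors. The Gini coefficient barely changed between the two models (from -0.256 to -0.262), which suggests that the association between inequality and life expectancy is not simply a by-product of differences in health spending. Overall, while income inequality is associated with lower life expectancy, health spending also appears to play an important role in explaining why life expectancy differs across countries.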
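One way to compare the two predictors on a common footing is to ask how much predicted life expectancy differs between countries at the first and third quartiles of each variable, holding the other fixed. The sketch below does this with the rounded coefficients from `summary(model2)` and the quartiles from `summary(data)`; as with any observational regression, these are associations rather than causal effects.

```r
# Coefficients copied from summary(model2): LifeExpectancy ~ Gini + HealthSpending
gini_coef   <- -0.26169   # years per Gini point
health_coef <-  1.17622   # years per percentage point of GDP spent on health

# Quartiles from summary(data)
gini_iqr   <- 38.9 - 30.9     # 8 Gini points
health_iqr <- 9.165 - 5.689   # 3.476 % of GDP

gini_gap   <- -gini_coef * gini_iqr       # about 2.09 years, health spending held fixed
health_gap <-  health_coef * health_iqr   # about 4.09 years, Gini held fixed
round(c(gini_gap, health_gap), 2)
```

On this quartile-to-quartile comparison, the health spending gap is roughly twice the inequality gap, which is consistent with its larger contribution to R².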
Future research could include additional variables such as GDP per capita, education levels, or access to healthcare to better understand the many factors that influence life expectancy.
World Bank. (2022). Life expectancy at birth, total (years) [SP.DYN.LE00.IN]. World Development Indicators. https://data360.worldbank.org/en/indicator/WB_WDI_SP_DYN_LE00_IN
World Bank. (2022). Gini index (World Bank estimate) [SI.POV.GINI]. World Development Indicators. https://data360.worldbank.org/en/indicator/WB_WDI_SI_POV_GINI