2026-05-08

Abstract

This study examines which socioeconomic and health factors are most strongly associated with life expectancy across 193 countries from 2000 to 2015, using 2,928 country-year observations from the WHO and the UN. The data contain one observation per country per year and include variables such as GDP per capita, schooling, alcohol consumption, BMI, and development status. This was an observational study with no experimental manipulation, so the results show associations, not causal effects. I ran a Welch two-sample t-test to compare mean life expectancy between developed and developing countries. I also fit a multiple linear regression to estimate how each variable is associated with life expectancy while holding the others fixed. The t-test revealed a statistically significant difference of about 12 years in mean life expectancy between developed and developing countries (95% CI: 11.6 to 12.6 years). In the regression, schooling had the strongest association: each additional year of schooling was associated with about 1.7 more years of life expectancy. Alcohol consumption had a negative association, with higher consumption linked to lower life expectancy. Government health expenditure showed only a weak, slightly negative association (p = 0.036). This analysis can suggest where countries might focus efforts to raise the average lifespan of their residents, with schooling showing the strongest association. Policymakers could prioritize areas where the data show positive associations and reconsider spending where associations are negative or negligible. Associations alone, however, do not show that a given policy change will raise life expectancy.

Global Life Expectancy

Research Question - What socioeconomic and health factors are most strongly associated with life expectancy across countries, and do these relationships differ between developed and developing nations?

What are we asking?

  • Is GDP a significant predictor of life expectancy?
  • Do years of schooling predict how long people live?
  • Is government health spending associated with life expectancy?
  • Is the gap between developed and developing nations statistically significant?

Overview: Data Collection

  • Source: World Health Organization (WHO) and United Nations
  • Time Period: 2000 – 2015
  • Countries: 193
  • Observations: 2,938 rows (one per country per year); the summaries and tests below use 2,928 of them
  • Study Type: Observational
  • Variables: life expectancy, mortality, vaccination rates, GDP, income composition…

The WHO collected health variables (life expectancy, mortality, vaccination rates), while the UN provided economic indicators (GDP, income composition).

Overview: Variables

Response Variable:

  • life_expectancy: average life expectancy in years

Explanatory Variables:

  • GDP: GDP per capita in USD
  • Schooling: average years of schooling
  • total_expenditure: government health spending as a percentage of total government expenditure
  • Alcohol: alcohol consumption per capita in litres
  • BMI: average BMI of the population
  • Status: Developed vs Developing (categorical)

Summary Statistics: Response Variable

Variable                   Min    Q1     Median   Mean    Q3     Max    NAs
Life expectancy (years)    36.3   63.1   72.1     69.22   75.7   89.0   0

Summary Statistics: Explanatory Variables

Variable                      Min    Q1       Median    Mean      Q3        Max         NAs
Alcohol (litres per capita)   0.01   0.90     3.77      4.61      7.72      17.87       193
BMI                           1.00   19.30    43.35     38.24     56.10     77.60       32
GDP (USD per capita)          1.68   463.85   1764.97   7494.21   5932.90   119172.74   443
Schooling (years)             0.00   10.10    12.30     12.00     14.30     20.70       160
total_expenditure (%)         0.37   4.26     5.75      5.93      7.49      17.60       226
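The summary tables above can be reproduced with a small helper. This is a minimal sketch, not the author's code. summary_table() is a hypothetical helper name, and toy is a made-up data frame standing in for df. Quantiles use R's default method (type 7), the same one summary() uses.

```r
# Hypothetical helper: one row per variable with Min, Q1, Median, Mean, Q3, Max, NAs
summary_table <- function(data, vars) {
  rows <- lapply(vars, function(v) {
    x <- data[[v]]
    # Type-7 quantiles (R's default), ignoring missing values
    q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
    data.frame(
      Variable = v,
      Min = q[1], Q1 = q[2], Median = q[3],
      Mean = mean(x, na.rm = TRUE),
      Q3 = q[4], Max = q[5],
      NAs = sum(is.na(x))  # count of missing values
    )
  })
  do.call(rbind, rows)
}

# Made-up example data (not the WHO/UN data)
toy <- data.frame(
  Alcohol   = c(0.5, 2.0, NA, 7.5, 10.0),
  Schooling = c(8.0, 10.5, 12.0, NA, NA)
)
summary_table(toy, c("Alcohol", "Schooling"))
```

On the real data this would be called as summary_table(df, c("Alcohol", "BMI", "GDP", "Schooling", "total_expenditure")). For a single column, summary(df$GDP) prints the same statistics along with an NA's count.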

Development Status Breakdown

Status       Count   Percentage
Developed    512     17.5%
Developing   2,416   82.5%
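The percentages follow directly from the counts. The sketch below rebuilds a Status factor from the two reported counts only to show the arithmetic. On the real data frame, the same table comes from table(df$Status).

```r
# Rebuild a Status factor from the counts reported above (512 and 2,416)
status <- factor(c(rep("Developed", 512), rep("Developing", 2416)))
counts <- table(status)
# Share of each group, as a percentage rounded to one decimal place
pct <- round(100 * prop.table(counts), 1)
data.frame(Status = names(counts),
           Count = as.vector(counts),
           Percentage = as.vector(pct))
```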

Figure: Distribution of Life Expectancy

Figure: Life Expectancy by Development Status

Figure: Life Expectancy vs GDP

Figure: Life Expectancy vs Schooling

Hypothesis Test: T-Test

H0: No difference in mean life expectancy between developed and developing countries

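H1: Mean life expectancy differs between developed and developing countries. The test is two-sided. Because the observed difference favors developed countries, the one-sided p-value for "developed > developing" is half the two-sided value, so it is also below 0.05.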

t.test(life_expectancy ~ Status, data = df)
## 
##  Welch Two Sample t-test
## 
## data:  life_expectancy by Status
## t = 47.868, df = 1807, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Developed and group Developing is not equal to 0
## 95 percent confidence interval:
##  11.59118 12.58159
## sample estimates:
##  mean in group Developed mean in group Developing 
##                 79.19785                 67.11147
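Welch's test does not assume equal variances, which is why its degrees of freedom (1807) are fractional and smaller than the total sample size. The minimal sketch below computes the Welch t statistic and Welch-Satterthwaite degrees of freedom by hand and checks them against t.test(). The two groups are simulated and hypothetical, not the WHO data. With the formula interface used above, alternative = "greater" tests the first factor level minus the second. Under R's default alphabetical ordering, that is Developed minus Developing.

```r
# Simulated groups (hypothetical, not the WHO data)
set.seed(42)
developed  <- rnorm(50, mean = 79, sd = 3)
developing <- rnorm(200, mean = 67, sd = 9)

# Welch's t statistic and Welch-Satterthwaite degrees of freedom
welch <- function(x, y) {
  vx <- var(x) / length(x)  # squared standard error of mean(x)
  vy <- var(y) / length(y)  # squared standard error of mean(y)
  tstat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  dof <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = tstat, df = dof)
}

manual  <- welch(developed, developing)
builtin <- t.test(developed, developing)  # Welch is the default (var.equal = FALSE)
print(manual)
print(c(t = unname(builtin$statistic), df = unname(builtin$parameter)))

# One-sided alternative: mean(developed) > mean(developing)
one_sided <- t.test(developed, developing, alternative = "greater")$p.value
```

When the t statistic is positive, the one-sided p-value is exactly half the two-sided one.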

T-Test: Interpretation

Metric                      Value
t statistic                 47.868
Degrees of freedom          1807
p-value                     < 2.2e-16
95% CI for the difference   [11.59, 12.58] years
Mean (Developed)            79.20 years
Mean (Developing)           67.11 years
  • p-value < 0.05 → reject the null hypothesis
  • The difference in mean life expectancy between developed and developing nations is statistically significant
  • Mean life expectancy is about 12.1 years higher in developed countries (79.20 - 67.11), with a 95% CI of 11.6 to 12.6 years

Multiple Linear Regression

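# Multiple linear regression. lm() drops any row with a missing value in the
# response or a predictor (the default na.action is na.omit).
# Status is treated as a factor with "Developed" as the reference level,
# so the output reports a StatusDeveloping coefficient.
model <- lm(life_expectancy ~ GDP + Schooling + total_expenditure +
              Alcohol + BMI + Status, data = df)
summary(model)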
## 
## Call:
## lm(formula = life_expectancy ~ GDP + Schooling + total_expenditure + 
##     Alcohol + BMI + Status, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.9848  -3.2022   0.4901   3.8236  30.6806 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        4.946e+01  8.089e-01  61.147  < 2e-16 ***
## GDP                7.633e-05  1.009e-05   7.563 5.66e-14 ***
## Schooling          1.666e+00  5.368e-02  31.041  < 2e-16 ***
## total_expenditure -1.145e-01  5.463e-02  -2.096   0.0362 *  
## Alcohol           -3.159e-01  4.119e-02  -7.669 2.54e-14 ***
## BMI                1.103e-01  7.598e-03  14.518  < 2e-16 ***
## StatusDeveloping  -3.595e+00  4.382e-01  -8.204 3.81e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.917 on 2301 degrees of freedom
##   (620 observations deleted due to missingness)
## Multiple R-squared:  0.6288, Adjusted R-squared:  0.6279 
## F-statistic: 649.7 on 6 and 2301 DF,  p-value: < 2.2e-16
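R's default treatment coding uses the first factor level as the reference. That is why the output above shows StatusDeveloping, read as "developing relative to developed, holding the other predictors fixed." The sketch below uses simulated data with a hypothetical Status effect of -3 (sim, fit_a, fit_b, and Status2 are illustrative names). It shows that releveling the factor only flips the sign of the Status coefficient and leaves the other slopes unchanged.

```r
# Simulated data (hypothetical, not the WHO data)
set.seed(1)
n <- 100
sim <- data.frame(
  Schooling = runif(n, 5, 18),
  Status    = factor(sample(c("Developed", "Developing"), n, replace = TRUE),
                     levels = c("Developed", "Developing"))
)
sim$life_expectancy <- 50 + 1.5 * sim$Schooling -
  3 * (sim$Status == "Developing") + rnorm(n, sd = 2)

# Reference level "Developed" (first level), as in the model above
fit_a <- lm(life_expectancy ~ Schooling + Status, data = sim)
coef(fit_a)["StatusDeveloping"]

# Same model with "Developing" as the reference level
sim$Status2 <- relevel(sim$Status, ref = "Developing")
fit_b <- lm(life_expectancy ~ Schooling + Status2, data = sim)
coef(fit_b)["Status2Developed"]
```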

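Regression Interpretation

Metric               Value
R-squared            0.6288
Adjusted R-squared   0.6279
  • R-squared: the model explains about 62.9% of the variation in life expectancy among the 2,308 complete-case observations
  • Adjusted R-squared penalizes the number of predictors; it is almost identical to R-squared here because the model has only 6 predictors relative to 2,308 observations
  • Schooling is the strongest predictor (t = 31.0): each additional year of schooling is associated with about 1.67 more years of life expectancy, holding the other variables fixed
  • All predictors except total_expenditure are significant at the 0.001 level (***). total_expenditure is significant only at the 0.05 level (p = 0.036) and has a small negative coefficient. Significance stars measure evidence against a zero coefficient, not the size of an effect
  • Status: with Developed as the reference level, developing country-years average about 3.6 years lower life expectancy after adjusting for the other predictors, much smaller than the raw 12-year gap from the t-test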
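Raw coefficients are not comparable across predictors because their units differ (one USD of GDP versus one year of schooling). The sketch below uses one rough comparison, an assumption rather than the author's method. It multiplies each coefficient from the regression output by the predictor's interquartile range (Q3 - Q1) from the summary table. The result is the predicted change in life expectancy for a Q1-to-Q3 difference, with the other predictors held fixed. The IQRs come from the full sample, not the 2,308 complete-case rows the model used, so the results are approximate.

```r
# Coefficients copied from the regression output
coefs <- c(GDP = 7.633e-05, Schooling = 1.666, total_expenditure = -0.1145,
           Alcohol = -0.3159, BMI = 0.1103)
# Interquartile ranges (Q3 - Q1) from the summary statistics table
iqr <- c(GDP = 5932.90 - 463.85, Schooling = 14.30 - 10.10,
         total_expenditure = 7.49 - 4.26, Alcohol = 7.72 - 0.90,
         BMI = 56.10 - 19.30)
# Predicted change in life expectancy (years) for a Q1-to-Q3 move
effect <- round(coefs * iqr[names(coefs)], 2)
effect
# Predictors ordered by the size of that change
sort(abs(effect), decreasing = TRUE)
```

With these numbers, schooling (about 7.0 years) stands out. It is followed by BMI (about 4.1), alcohol (about -2.2), GDP (about 0.4), and total_expenditure (about -0.4). Status is already a 0/1 contrast, so its coefficient (-3.6 years) can be compared directly.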

Residual Analysis

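  • Residuals vs fitted: points scatter randomly around zero, consistent with a linear relationship and roughly constant variance
  • Normal Q-Q plot: residuals follow the reference line closely, with minor deviations in both tails and a few outliers (residuals range from about -26 to +31 years)
  • These plots do not check independence. Each country contributes up to 16 yearly observations, so residuals from the same country are likely correlated, which can make the standard errors too small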
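The two diagnostic plots described above come from calling plot() on an lm object. The sketch below runs them on simulated data; x, y, and fit are hypothetical stand-ins. For the real model, use plot(model, which = 1) and plot(model, which = 2). It also adds one numeric summary that was not part of the original analysis: the correlation between the Q-Q plot's sample and theoretical quantiles. This correlation is close to 1 when the points follow the reference line.

```r
# Simulated data with normal errors (hypothetical, not the WHO data)
set.seed(7)
n <- 300
x <- runif(n, 0, 20)
y <- 50 + 1.5 * x + rnorm(n, sd = 5)
fit <- lm(y ~ x)

op <- par(mfrow = c(1, 2))
plot(fit, which = 1)  # residuals vs fitted values
plot(fit, which = 2)  # normal Q-Q plot of standardized residuals
par(op)

# Correlation between theoretical (x) and sample (y) quantiles of the
# standardized residuals; values near 1 mean the Q-Q points are nearly linear
qq <- qqnorm(rstandard(fit), plot.it = FALSE)
cor(qq$x, qq$y)
```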

Conclusion

Key Findings:

  • Schooling is the strongest predictor of life expectancy; GDP has a statistically significant but small association (about 0.76 years per additional $10,000 of GDP per capita)
  • The ~12-year gap between developed and developing nations is statistically significant
  • Our model explains approximately 62.9% of the variation in life expectancy

Limitations:

  • Observational data — association not causation
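  • Missing values (GDP 443, total_expenditure 226, Alcohol 193, Schooling 160, BMI 32) led to 620 rows being dropped from the regression, which may introduce bias if values are not missing at random
  • Some BMI values are implausible for a population average (minimum 1.0, median 43.35, maximum 77.6), so the BMI coefficient should be read with caution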
  • Country-level averages hide within-country inequality
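  • Data cover only 2000 to 2015
  • Status enters the regression only as an intercept shift, so the model does not test whether the slopes for schooling, GDP, and the other variables differ between developed and developing nations (that would require interaction terms such as Schooling:Status)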

References