A. Introduction: To what extent do unemployment, government spending, GDP per capita, and GDP per capita growth explain life expectancy across countries?
The dataset used in this project comes from the World Bank World Development Indicators and originally contained 2,359 observations and 32 variables. For this analysis, I selected eight variables that were relevant to the research question: country name, country code, year, life expectancy at birth, unemployment rate, government final consumption expenditure as a percentage of GDP, GDP per capita (constant 2015 U.S. dollars), and GDP per capita growth. Each observation represents a country in a given year, which allows comparisons across both countries and time.
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
setwd("C:/Users/mezni/OneDrive/Desktop/final project")
data <- read.csv("hehiiiiiiiiii.csv", stringsAsFactors = FALSE)
dim(data)
## [1] 2359 32
auto_data <- data %>%
  select(
    Country.Name,
    Country.Code,
    Time,
    Life.expectancy.at.birth..total..years...SP.DYN.LE00.IN.,
    Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS.,
    General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS.,
    GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD.,
    GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG.
  )
dim(auto_data)
## [1] 2359 8
Before running the regression, the data were cleaned by replacing the World Bank missing-value symbol (“..”) with NA, converting the year and all economic variables to numeric format so they could be used in the regression model, and removing rows with missing values. After cleaning, the final dataset contained 1,290 country-year observations covering 2005 to 2015. Summary statistics show mostly plausible ranges, although a few extreme values stand out (for example, government consumption as high as 105.2% of GDP and GDP per capita growth ranging from -24.1% to 91.8%). The data were therefore treated as suitable for multiple linear regression, with the caveat that outliers are present.
auto_data <- auto_data %>%
  mutate(across(where(is.character), ~ na_if(., ".."))) %>% # only character columns
  mutate(
    Time = as.numeric(Time),
    Life.expectancy.at.birth..total..years...SP.DYN.LE00.IN. =
      as.numeric(Life.expectancy.at.birth..total..years...SP.DYN.LE00.IN.),
    Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS. =
      as.numeric(Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS.),
    General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS. =
      as.numeric(General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS.),
    GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD. =
      as.numeric(GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD.),
    GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG. =
      as.numeric(GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG.)
  ) %>%
  filter(
    !is.na(Time),
    !is.na(Life.expectancy.at.birth..total..years...SP.DYN.LE00.IN.),
    !is.na(Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS.),
    !is.na(General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS.),
    !is.na(GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD.),
    !is.na(GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG.)
  )
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Time = as.numeric(Time)`.
## Caused by warning:
## ! NAs introduced by coercion
dim(auto_data)
## [1] 1290 8
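The three cleaning steps described above (placeholder to NA, conversion to numeric, dropping incomplete rows) can be illustrated on a small toy data frame. This is only a base-R sketch of the same idea: the column names and values below are hypothetical and are not taken from the World Bank file.

```r
# Toy data frame with hypothetical values; ".." marks a missing entry,
# mirroring the placeholder used in the World Bank export.
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  year    = c("2005", "2006", "2007", "2008"),
  life    = c("71.2", "..", "68.9", "80.1"),
  unemp   = c("5.4", "7.1", "..", "3.3"),
  stringsAsFactors = FALSE
)

# Step 1: turn the ".." placeholder into NA in every column.
toy[toy == ".."] <- NA

# Step 2: convert the columns that should be numeric.
num_cols <- c("year", "life", "unemp")
toy[num_cols] <- lapply(toy[num_cols], as.numeric)

# Step 3: keep only rows with no missing values.
toy_clean <- toy[complete.cases(toy), ]

nrow(toy)        # rows before cleaning
nrow(toy_clean)  # rows after cleaning
```

Comparing the two row counts shows how many observations the listwise deletion removes, which is the same comparison made with `dim()` in the analysis.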
summary(auto_data)
## Country.Name Country.Code Time
## Length:1290 Length:1290 Min. :2005
## Class :character Class :character 1st Qu.:2007
## Mode :character Mode :character Median :2010
## Mean :2010
## 3rd Qu.:2013
## Max. :2015
## Life.expectancy.at.birth..total..years...SP.DYN.LE00.IN.
## Min. :44.30
## 1st Qu.:70.35
## Median :74.47
## Mean :73.87
## 3rd Qu.:79.34
## Max. :84.60
## Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS.
## Min. : 0.170
## 1st Qu.: 4.119
## Median : 6.785
## Mean : 8.236
## 3rd Qu.:10.077
## Max. :45.400
## General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS.
## Min. : 5.075
## 1st Qu.: 12.328
## Median : 16.291
## Mean : 16.883
## 3rd Qu.: 19.647
## Max. :105.192
## GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD.
## Min. : 291.9
## 1st Qu.: 3245.0
## Median : 7690.2
## Mean : 17699.1
## 3rd Qu.: 27436.8
## Max. :118382.9
## GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG.
## Min. :-24.1024
## 1st Qu.: 0.4845
## Median : 2.3359
## Mean : 2.5479
## 3rd Qu.: 4.6445
## Max. : 91.7814
# The variable plotted is GDP per capita growth, so the labels say so explicitly.
hist(auto_data$GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG.,
     main = "Histogram of GDP per Capita Growth",
     xlab = "GDP per capita growth (annual %)",
     col = "lightblue",
     border = "black")
boxplot(auto_data$GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG.,
        main = "Boxplot of GDP per Capita Growth",
        ylab = "GDP per capita growth (annual %)",
        col = "orange")
A histogram and a boxplot were used to examine the distribution of GDP per capita growth. The histogram shows that most country-year observations have relatively small positive growth rates (the median is about 2.3%), while a few have very high or negative values. The boxplot confirms the presence of outliers, which is expected in cross-country economic data. These plots help provide context for why GDP per capita growth does not show a strong linear relationship with life expectancy in the regression results.
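As a rough illustration of how a boxplot decides which points count as outliers, the sketch below applies the usual 1.5 × IQR rule to a short vector of made-up growth rates. The values are hypothetical, not drawn from the dataset, and `boxplot.stats()` is included because it is the base-R helper that `boxplot()` relies on.

```r
# Hypothetical growth rates (%), for illustration only.
growth <- c(-12, -1, 0.5, 1.2, 2.3, 2.9, 4.6, 5.1, 6.0, 40)

# Quartiles and interquartile range.
q   <- unname(quantile(growth, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR beyond the quartiles.
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Points outside the fences are flagged as outliers.
outliers <- growth[growth < lower_fence | growth > upper_fence]
outliers

# boxplot.stats() uses hinges rather than quantile(), so its fences can
# differ slightly, but it flags the same kind of extreme points.
boxplot.stats(growth)$out
```

The same two lines could be run on the growth column of `auto_data` to list the flagged country-years rather than only seeing them as dots on the plot.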
B. Statistical Analysis
Multiple linear regression was used to examine how unemployment, government spending, GDP per capita, and GDP per capita growth are related to life expectancy across countries. Life expectancy at birth was used as the dependent variable, while the four economic indicators were included as independent variables. This method allows the effect of each predictor on life expectancy to be examined while holding the other variables constant.
The results show that the overall regression model is statistically significant (F = 229.8, p < 2.2e-16) and explains about 41.7% of the variation in life expectancy across countries (R² = 0.417). GDP per capita has a positive and statistically significant relationship with life expectancy: the coefficient of 2.020e-04 corresponds to roughly 0.2 additional years of life expectancy per additional US$1,000 of GDP per capita, holding the other predictors constant. Unemployment is also statistically significant, with a positive coefficient (about 0.10 years per percentage point), which may reflect differences in how unemployment is measured across countries and how labor markets function in more developed economies. Government spending as a percentage of GDP (p = 0.842) and GDP per capita growth (p = 0.473) are not statistically significant in this model, indicating that after controlling for income level and unemployment, they do not have a strong linear relationship with life expectancy in this dataset.
model <- lm(
  Life.expectancy.at.birth..total..years...SP.DYN.LE00.IN. ~
    Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS. +
    General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS. +
    GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD. +
    GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG.,
  data = auto_data
)
summary(model)
##
## Call:
## lm(formula = Life.expectancy.at.birth..total..years...SP.DYN.LE00.IN. ~
## Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS. +
## General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS. +
## GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD. +
## GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG.,
## data = auto_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.771 -1.844 1.155 3.203 7.731
##
## Coefficients:
## Estimate
## (Intercept) 6.959e+01
## Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS. 9.980e-02
## General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS. -3.901e-03
## GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD. 2.020e-04
## GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG. -2.112e-02
## Std. Error
## (Intercept) 4.099e-01
## Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS. 2.337e-02
## General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS. 1.960e-02
## GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD. 6.874e-06
## GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG. 2.941e-02
## t value
## (Intercept) 169.786
## Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS. 4.270
## General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS. -0.199
## GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD. 29.381
## GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG. -0.718
## Pr(>|t|)
## (Intercept) < 2e-16
## Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS. 2.1e-05
## General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS. 0.842
## GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD. < 2e-16
## GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG. 0.473
##
## (Intercept) ***
## Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS. ***
## General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS.
## GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD. ***
## GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG.
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.005 on 1285 degrees of freedom
## Multiple R-squared: 0.417, Adjusted R-squared: 0.4152
## F-statistic: 229.8 on 4 and 1285 DF, p-value: < 2.2e-16
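Because GDP per capita is measured in single dollars, its raw coefficient looks tiny. The sketch below rescales it to a per-US$1,000 effect with an approximate 95% confidence interval, and then recomputes the adjusted R² and F-statistic from the reported R² as a consistency check. All inputs are copied by hand from the output above, so the results are subject to rounding; the variable names are illustrative.

```r
# Estimates copied from the regression output above (rounded as printed).
b_gdp  <- 2.020e-04  # years of life expectancy per additional US$1
se_gdp <- 6.874e-06  # standard error of that estimate
n      <- 1290       # observations
k      <- 4          # predictors
df_res <- n - k - 1  # residual degrees of freedom (1285)
r2     <- 0.417      # multiple R-squared

# Rescale: effect of an additional US$1,000 of GDP per capita.
effect_per_1000 <- b_gdp * 1000

# 95% confidence interval for that effect, using the t distribution.
ci_per_1000 <- (b_gdp + c(-1, 1) * qt(0.975, df_res) * se_gdp) * 1000

# Consistency checks: adjusted R-squared and F-statistic implied by R-squared.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res
f_stat <- (r2 / k) / ((1 - r2) / df_res)

round(c(effect = effect_per_1000,
        lower  = ci_per_1000[1],
        upper  = ci_per_1000[2]), 3)
round(c(adj_r2 = adj_r2, f_stat = f_stat), 4)
```

With the fitted object available, `confint(model)` would give the unrounded intervals directly; the hand calculation here is only meant to show where the numbers come from.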
par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
plot(model)
par(mfrow = c(1, 1))
plot(auto_data$GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD.,
     auto_data$Life.expectancy.at.birth..total..years...SP.DYN.LE00.IN.,
     xlab = "GDP per capita (constant 2015 US$)",
     ylab = "Life expectancy",
     main = "Life expectancy vs GDP per capita")
abline(lm(
  Life.expectancy.at.birth..total..years...SP.DYN.LE00.IN. ~ GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD.,
  data = auto_data
))
cor(auto_data[, 5:8], use = "complete.obs")
## Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS.
## Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS. 1.00000000
## General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS. 0.17447664
## GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD. -0.18240250
## GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG. -0.03854393
## General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS.
## Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS. 0.17447664
## General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS. 1.00000000
## GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD. 0.11753408
## GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG. -0.09466996
## GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD.
## Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS. -0.1824025
## General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS. 0.1175341
## GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD. 1.0000000
## GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG. -0.1719687
## GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG.
## Unemployment..total....of.total.labor.force...national.estimate...SL.UEM.TOTL.NE.ZS. -0.03854393
## General.government.final.consumption.expenditure....of.GDP...NE.CON.GOVT.ZS. -0.09466996
## GDP.per.capita..constant.2015.US....NY.GDP.PCAP.KD. -0.17196867
## GDP.per.capita.growth..annual.....NY.GDP.PCAP.KD.ZG. 1.00000000
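One way to read the correlation matrix above is as a multicollinearity check. For a regression with an intercept, the variance inflation factors (VIFs) of the predictors equal the diagonal of the inverse of their correlation matrix. The sketch below retypes the printed correlations under short labels (`unemp`, `gov`, `gdp_pc`, `growth`), which are introduced here for readability and do not appear in the original code.

```r
# Predictor correlation matrix, copied from the cor() output above.
# Short row/column labels are illustrative stand-ins for the long names.
vars <- c("unemp", "gov", "gdp_pc", "growth")
R <- matrix(c(
   1.00000000,  0.17447664, -0.18240250, -0.03854393,
   0.17447664,  1.00000000,  0.11753408, -0.09466996,
  -0.18240250,  0.11753408,  1.00000000, -0.17196867,
  -0.03854393, -0.09466996, -0.17196867,  1.00000000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# VIFs are the diagonal elements of the inverse correlation matrix.
vif <- diag(solve(R))
round(vif, 3)
```

Values close to 1 would indicate that the predictors are only weakly related to one another, so the insignificant coefficients are unlikely to be a side effect of collinearity.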
C. Regression Assumptions
The assumptions of multiple linear regression were checked using diagnostic plots. The Residuals vs Fitted plot shows that residuals are generally centered around zero, although a slight curved pattern is present, suggesting minor nonlinearity. This pattern is consistent with the scatterplot of life expectancy versus GDP per capita, which shows diminishing returns at higher income levels. The Normal Q–Q plot indicates that residuals are approximately normally distributed, with some deviations in the tails; the residual summary (minimum -28.77, maximum 7.73) suggests that the lower tail is the longer one. Given the large sample size of 1,290 observations, this departure is considered acceptable.
The Scale–Location plot suggests that the variance of the residuals is mostly constant, with a small increase at higher fitted values. The Residuals vs Leverage plot does not show any highly influential observations based on Cook’s distance. Overall, while some assumptions are not perfectly met, the violations appear mild, and the model is reasonable for this analysis.
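The curved pattern and the diminishing returns noted above are what one would expect if life expectancy rises with the logarithm of income rather than with income itself. The sketch below uses simulated data, not the World Bank data, to show how a linear and a log specification can be compared; the sample size, coefficients, and noise level are arbitrary assumptions chosen only to produce a concave relationship.

```r
set.seed(1)

# Simulated (hypothetical) data: incomes spread over a wide range and
# life expectancy generated as a linear function of log income plus noise.
n    <- 500
gdp  <- exp(runif(n, log(300), log(100000)))
life <- 30 + 4.5 * log(gdp) + rnorm(n, sd = 3)

# Fit the same relationship with raw and with log-transformed income.
fit_lin <- lm(life ~ gdp)
fit_log <- lm(life ~ log(gdp))

# Compare the share of variance explained by each specification.
r2_lin <- summary(fit_lin)$r.squared
r2_log <- summary(fit_log)$r.squared
c(linear = r2_lin, log = r2_log)

# The residuals-vs-fitted plot for each fit shows whether curvature remains:
# plot(fit_lin, which = 1); plot(fit_log, which = 1)
```

In the actual analysis, the analogous step would be replacing the GDP per capita term in the model formula with its logarithm and rechecking the diagnostic plots; whether that improves the fit for this dataset has not been tested here.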
D. Conclusion
In conclusion, this analysis suggests that economic conditions help explain differences in life expectancy across countries. GDP per capita is the strongest predictor in the model and shows a clear positive association with life expectancy. Unemployment is also significantly related to life expectancy, although its positive sign is counterintuitive and highlights the complexity of comparing countries with different economic structures. Government spending and GDP per capita growth were not significant predictors in this model.
These results suggest that long-term income levels are more closely related to life expectancy than short-term economic changes. Future research could include additional variables such as healthcare access, education, or regional differences, and could use methods that better account for repeated observations within the same countries over time.
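One simple method of the kind mentioned above is a country fixed-effects regression, which adds a separate intercept for each country so that coefficients are estimated from changes within countries over time. The sketch below uses a simulated panel, not the World Bank data, with an arbitrary true slope of 1 and a predictor that is deliberately correlated with unobserved country differences; every number in it is an assumption made for illustration.

```r
set.seed(42)

# Simulated (hypothetical) panel: 30 countries observed for 11 years each.
n_countries <- 30
n_years     <- 11
country     <- factor(rep(seq_len(n_countries), each = n_years))

# Unobserved, time-invariant country differences.
country_effect <- rep(rnorm(n_countries, sd = 5), each = n_years)

# A predictor that is correlated with those country differences.
x <- 0.5 * country_effect + rnorm(n_countries * n_years)

# Outcome with a true slope of 1 on x.
y <- 70 + 1 * x + country_effect + rnorm(n_countries * n_years)

# Pooled regression ignores the repeated observations per country.
pooled <- lm(y ~ x)

# Fixed-effects regression adds one intercept per country.
fixed <- lm(y ~ x + country)

c(pooled = coef(pooled)[["x"]], fixed_effects = coef(fixed)[["x"]])
```

In this simulation the fixed-effects slope should sit much closer to the true value than the pooled one, because the country intercepts absorb the unobserved differences. Applying the same idea to the real data would mean adding `Country.Code` as a factor to the model formula, with the trade-off that predictors which vary little within a country become harder to estimate.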