Economic Development and Life Expectancy Across Countries
Question:
Do richer countries live longer?
Introduction
This project uses data from the World Bank’s World Development Indicators to examine whether richer countries tend to have higher life expectancy than poorer ones. The dataset includes categorical variables, such as country name and income group, and quantitative variables, such as GDP per capita, life expectancy at birth, population, secondary school enrollment, and the unemployment rate.
The goal of my project is to understand how economic and social conditions are related to life expectancy across countries.
Source: World Bank, World Development Indicators
library(readr)
library(dplyr)
library(ggplot2)
library(plotly)
library(scales)
library(RColorBrewer)
Data cleaning
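The data were downloaded from the World Bank. I kept only the year 2020, renamed the variables, merged the datasets by country, added each country's income group from the metadata file, and removed countries with a missing value in any variable. Regional and income aggregates (such as "World") have no income group in the metadata, so this step also leaves only individual countries.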
New names:
• `` -> `...71`
Rows: 266 Columns: 71
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): Country Name, Country Code, Indicator Name, Indicator Code
dbl (65): 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
lgl  (2): 2025, ...71
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# skip = 4: World Bank CSV downloads have four metadata lines above the header row
life <- read_csv("~/Downloads/project 2/API_SP.DYN.LE00.IN_DS2_en_csv_v2_163.csv", skip = 4)
New names:
• `` -> `...71`
Rows: 266 Columns: 71
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): Country Name, Country Code, Indicator Name, Indicator Code
dbl (65): 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
lgl  (2): 2025, ...71
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pop <- read_csv("~/Downloads/project 2/API_SP.POP.TOTL_DS2_en_csv_v2_58.csv", skip = 4)
New names:
• `` -> `...71`
Rows: 266 Columns: 71
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): Country Name, Country Code, Indicator Name, Indicator Code
dbl (65): 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
lgl  (2): 2025, ...71
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
school <- read_csv("~/Downloads/project 2/API_SE.SEC.ENRR_DS2_en_csv_v2_758.csv", skip = 4)
New names:
• `` -> `...71`
Rows: 266 Columns: 71
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): Country Name, Country Code, Indicator Name, Indicator Code
dbl (56): 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, ...
lgl (11): 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, ...71
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
New names:
• `` -> `...71`
Rows: 266 Columns: 71
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): Country Name, Country Code, Indicator Name, Indicator Code
dbl (35): 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, ...
lgl (32): 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
meta <- read_csv("~/Downloads/project 2/Metadata_Country_API_NY.GDP.PCAP.CD_DS2_en_csv_v2_245.csv")
New names:
• `` -> `...6`
Rows: 265 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Country Code, Region, IncomeGroup, SpecialNotes, TableName
lgl (1): ...6
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Keep only 2020 and give each indicator a descriptive name
gdp <- gdp %>%
  select(`Country Name`, `Country Code`, `2020`) %>%
  rename(GDP_per_capita = `2020`)
life <- life %>%
  select(`Country Name`, `Country Code`, `2020`) %>%
  rename(Life_expectancy = `2020`)
pop <- pop %>%
  select(`Country Name`, `Country Code`, `2020`) %>%
  rename(Population = `2020`)
school <- school %>%
  select(`Country Name`, `Country Code`, `2020`) %>%
  rename(School_enrollment = `2020`)
unemp <- unemp %>%
  select(`Country Name`, `Country Code`, `2020`) %>%
  rename(Unemployment_rate = `2020`)
# Income group comes from the GDP metadata file
meta_small <- meta %>%
  select(`Country Code`, IncomeGroup)
# Merge everything: inner joins keep only countries present in every file,
# then countries with a missing value in any variable are dropped
project2_data <- gdp %>%
  inner_join(life,   by = c("Country Name", "Country Code")) %>%
  inner_join(pop,    by = c("Country Name", "Country Code")) %>%
  inner_join(school, by = c("Country Name", "Country Code")) %>%
  inner_join(unemp,  by = c("Country Name", "Country Code")) %>%
  inner_join(meta_small, by = "Country Code") %>%
  filter(
    !is.na(GDP_per_capita),
    !is.na(Life_expectancy),
    !is.na(Population),
    !is.na(School_enrollment),
    !is.na(Unemployment_rate),
    !is.na(IncomeGroup)
  )
glimpse(project2_data)
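The cleaning logic can be checked on a small example. The sketch below is a base-R re-creation under stated assumptions and is not the code used above. The data frames contain made-up values, not World Bank figures. `keep_year()` is a hypothetical helper that stands in for the repeated `select()`/`rename()` pairs, and `merge()`, whose default `all = FALSE` keeps only matching rows, stands in for `inner_join()`. The example shows that a country with a missing value is dropped, and that an aggregate row such as "World", which has no income group, is dropped as well.

```r
# Toy illustration of the cleaning steps (hypothetical values, NOT World Bank data).
# Base R only, so it runs without dplyr; merge() with all = FALSE is an inner join.

# Hypothetical helper: keep the two ID columns plus one year, renamed to new_name
keep_year <- function(df, year, new_name) {
  out <- df[, c("Country Name", "Country Code", year)]
  names(out)[3] <- new_name
  out
}

gdp_raw <- data.frame(
  `Country Name` = c("A", "B", "C", "World"),
  `Country Code` = c("AAA", "BBB", "CCC", "WLD"),
  `2020` = c(1000, 20000, NA, 11000),   # CCC has no GDP value
  check.names = FALSE
)
life_raw <- data.frame(
  `Country Name` = c("A", "B", "C", "World"),
  `Country Code` = c("AAA", "BBB", "CCC", "WLD"),
  `2020` = c(62, 80, 70, 72),
  check.names = FALSE
)
meta_raw <- data.frame(
  `Country Code` = c("AAA", "BBB", "CCC", "WLD"),
  IncomeGroup = c("Low income", "High income", "Upper middle income", NA),  # aggregate: NA
  check.names = FALSE
)

gdp_2020  <- keep_year(gdp_raw,  "2020", "GDP_per_capita")
life_2020 <- keep_year(life_raw, "2020", "Life_expectancy")

# Inner joins: by country name + code, then by code for the income group
merged <- merge(gdp_2020, life_2020, by = c("Country Name", "Country Code"))
merged <- merge(merged, meta_raw, by = "Country Code")

# Drop any row with a missing value (same effect as the chain of !is.na() filters)
clean <- merged[complete.cases(merged), ]
print(clean)
```

Running it leaves only AAA and BBB. CCC is dropped because its GDP value is missing, and WLD is dropped because it has no income group. In the real pipeline, a helper like this could also reduce the five near-identical select/rename blocks to one call per indicator.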
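Regression model
The dependent variable in this project is life expectancy. The independent variables are GDP per capita, school enrollment, and the unemployment rate. Population is used only for point size in the visualization, not in the model.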
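Written out, the `lm()` call that follows fits the linear model below by ordinary least squares. Here $i$ indexes countries, $\beta_0, \dots, \beta_3$ are the coefficients that `lm()` estimates, and $\varepsilon_i$ is the error term.

$$
\text{Life\_expectancy}_i = \beta_0 + \beta_1\,\text{GDP\_per\_capita}_i + \beta_2\,\text{School\_enrollment}_i + \beta_3\,\text{Unemployment\_rate}_i + \varepsilon_i
$$

Each $\beta_k$ for $k \ge 1$ is the predicted change in life expectancy for a one-unit increase in that variable, with the other two held fixed.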
model <- lm(Life_expectancy ~ GDP_per_capita + School_enrollment + Unemployment_rate,
            data = project2_data)
summary(model)
Call:
lm(formula = Life_expectancy ~ GDP_per_capita + School_enrollment +
Unemployment_rate, data = project2_data)
Residuals:
     Min       1Q   Median       3Q      Max
-13.4412  -1.9395   0.5143   2.4052   8.4438
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       6.069e+01  1.485e+00  40.882  < 2e-16 ***
GDP_per_capita    1.353e-04  2.076e-05   6.518 1.61e-09 ***
School_enrollment 1.217e-01  1.864e-02   6.529 1.53e-09 ***
Unemployment_rate 1.212e-02  6.802e-02   0.178    0.859
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.859 on 124 degrees of freedom
Multiple R-squared: 0.646, Adjusted R-squared: 0.6374
F-statistic: 75.41 on 3 and 124 DF, p-value: < 2.2e-16
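To make the coefficients concrete, the first equation below is the fitted model, using the estimates from the summary above as printed. GDP per capita is in US dollars, as the plot's axis label indicates, and school enrollment is a percentage. Each slope is therefore the predicted change in years of life expectancy per dollar, or per percentage point, with the other predictors held fixed. The second line works through a $10,000 difference in GDP per capita and a 10-point difference in school enrollment. The third line recovers the sample size and checks the adjusted R².

$$
\widehat{\text{Life\_expectancy}} = 60.69 + 0.0001353\,\text{GDP\_per\_capita} + 0.1217\,\text{School\_enrollment} + 0.01212\,\text{Unemployment\_rate}
$$

$$
0.0001353 \times 10{,}000 = 1.353 \text{ years}, \qquad 0.1217 \times 10 = 1.217 \text{ years}
$$

$$
n = 124 + 4 = 128, \qquad \bar{R}^2 = 1 - (1 - 0.646)\,\frac{n - 1}{124} \approx 0.637
$$

The 124 residual degrees of freedom equal the sample size minus the four estimated coefficients (the intercept plus three slopes), which gives $n = 128$. The adjusted R² computed this way matches the printed 0.6374. The unemployment term is included only for completeness: with p = 0.859, the data give no evidence that its slope differs from zero.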
Visualization
This graph shows the relationship between GDP per capita and life expectancy across countries. Income group is shown with color, and population is shown with point size.
final_plot <- ggplot(
  project2_data,
  aes(
    x = GDP_per_capita,
    y = Life_expectancy,
    color = IncomeGroup,
    size = Population,
    # `text` is not drawn by ggplot2; ggplotly() uses it for the hover labels
    text = paste(
      "Country:", `Country Name`,
      "<br>Income Group:", IncomeGroup,
      "<br>GDP per capita:", round(GDP_per_capita, 2),
      "<br>Life expectancy:", round(Life_expectancy, 2),
      "<br>Population:", comma(Population)
    )
  )
) +
  geom_point(alpha = 0.8) +
  scale_color_brewer(palette = "Set2") +
  labs(
    title = "Do Richer Countries Live Longer?",
    x = "GDP per Capita (USD)",
    y = "Life Expectancy (Years)",
    color = "Income Group",
    size = "Population",
    caption = "Source: World Bank, World Development Indicators"
  ) +
  theme_minimal()
final_plot
This interactive version of the graph lets you hover over each point to see detailed information about that country, including its income group, GDP per capita, life expectancy, and population. I found it interesting to implement.
ggplotly(final_plot, tooltip = "text")
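The hover label is an ordinary string that `paste()` builds from each row. In plotly's hover text, `<br>` acts as an HTML line break. The base-R sketch below builds that string for one made-up country so you can see what `ggplotly()` receives. The country name and all numbers are hypothetical, not World Bank data. `format(big.mark = ",")` replaces `scales::comma()` here only so the example runs without extra packages.

```r
# Hypothetical values for one country (illustration only, not World Bank data)
country  <- "Examplestan"
income   <- "Upper middle income"
gdp_pc   <- 8123.456
life_exp <- 74.321
pop      <- 12345678L

# Same paste() pattern as the `text` aesthetic; paste() separates pieces with
# a space, and each "<br>" starts a new line in the plotly hover label.
# format(big.mark = ",") plays the role of scales::comma().
tooltip <- paste(
  "Country:", country,
  "<br>Income Group:", income,
  "<br>GDP per capita:", round(gdp_pc, 2),
  "<br>Life expectancy:", round(life_exp, 2),
  "<br>Population:", format(pop, big.mark = ",")
)
cat(tooltip, "\n")
```

Because the tooltip is built in advance as one string per row, `tooltip = "text"` tells `ggplotly()` to show only that string rather than the default list of mapped aesthetics.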
Final Analysis
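The results suggest that richer countries generally tend to have higher life expectancy. The visualization shows a positive relationship between GDP per capita and life expectancy: countries with higher incomes usually have longer average lifespans. The regression model supports this. Holding the other variables constant, both GDP per capita and school enrollment have positive, statistically significant coefficients (p < 0.001), while the unemployment rate is not statistically significant (p = 0.859). Together, the three variables explain about 65% of the variation in life expectancy across the 128 countries in the sample (R² = 0.646). One limitation of this analysis is that other important variables, such as healthcare spending or access to clean water, were not included. In addition, because the data are observational and cover a single year, the results show an association between income and life expectancy, not that higher income causes longer lives.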