Economic Development and Public Health Across Countries
Question:
Does access to clean water improve life expectancy across countries ?
Introduction
This project analyzes the relationship between public health infrastructure, economic development, and life expectancy across countries. The main goal is to understand whether countries with better access to clean drinking water, improved sanitation, higher healthcare spending, and stronger economic conditions tend to have higher life expectancy.
The dataset comes from the World Bank World Development Indicators database. The variables used in this project include country name, income group, region, life expectancy at birth, access to basic drinking water services, access to basic sanitation services, GDP per capita, current health expenditure per capita, and population.
I chose this topic because access to clean water and healthcare are important global issues. Many countries still face inequalities in public health and economic development, and this project helps show how those differences may affect people’s quality of life.
## Dataset Source
library(readr)library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(ggplot2)library(scales)
Attaching package: 'scales'
The following object is masked from 'package:readr':
col_factor
library(broom)
Data Cleaning
The data was downloaded from the World Bank. I selected the year 2020, renamed the variables, merged the datasets by country,
life <-read_csv("~/Downloads/Life expectancy at birth, total .csv", skip =4)
New names:
Rows: 266 Columns: 71
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(4): Country Name, Country Code, Indicator Name, Indicator Code dbl (65): 1960,
1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ... lgl (2): 2025,
...71
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...71`
water <-read_csv("~/Downloads/People using at least basic drinking water services .csv", skip =4)
New names:
Rows: 266 Columns: 71
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(4): Country Name, Country Code, Indicator Name, Indicator Code dbl (25): 2000,
2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, ... lgl (42): 1960,
1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...71`
sanitation <-read_csv("~/Downloads/People using at least basic sanitation services (% of population).csv", skip =4)
New names:
Rows: 266 Columns: 71
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(4): Country Name, Country Code, Indicator Name, Indicator Code dbl (25): 2000,
2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, ... lgl (42): 1960,
1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...71`
gdp <-read_csv("~/Downloads/GDP per capita (current US$).csv", skip =4)
New names:
Rows: 266 Columns: 71
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(4): Country Name, Country Code, Indicator Name, Indicator Code dbl (65): 1960,
1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ... lgl (2): 2025,
...71
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...71`
health <-read_csv("~/Downloads/Current health expenditure per capita (current US$).csv", skip =4)
New names:
Rows: 266 Columns: 71
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(4): Country Name, Country Code, Indicator Name, Indicator Code dbl (25): 2000,
2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, ... lgl (42): 1960,
1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...71`
population <-read_csv("~/Downloads/ppulation.csv", skip =4)
New names:
Rows: 266 Columns: 71
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(4): Country Name, Country Code, Indicator Name, Indicator Code dbl (65): 1960,
1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ... lgl (2): 2025,
...71
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...71`
New names:
Rows: 265 Columns: 6
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(5): Country Code, Region, IncomeGroup, SpecialNotes, TableName lgl (1): ...6
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...6`
Country Name Country Code life_expectancy clean_water_access
Length:229 Length:229 Min. :50.60 Min. : 35.51
Class :character Class :character 1st Qu.:66.36 1st Qu.: 83.70
Mode :character Mode :character Median :72.14 Median : 94.31
Mean :71.61 Mean : 88.42
3rd Qu.:77.16 3rd Qu.: 99.08
Max. :86.09 Max. :100.00
sanitation_access gdp_per_capita health_expenditure population
Min. : 9.013 Min. : 255.8 Min. : 12.80 Min. :1.040e+04
1st Qu.: 59.711 1st Qu.: 1993.6 1st Qu.: 79.81 1st Qu.:3.399e+06
Median : 88.927 Median : 5299.0 Median : 356.81 Median :1.553e+07
Mean : 76.879 Mean : 14388.6 Mean : 1252.97 Mean :3.682e+08
3rd Qu.: 98.034 3rd Qu.: 15369.6 3rd Qu.: 1182.54 3rd Qu.:1.012e+08
Max. :100.000 Max. :176891.9 Max. :11648.50 Max. :7.855e+09
Region IncomeGroup
Length:229 Length:229
Class :character Class :character
Mode :character Mode :character
Call:
lm(formula = life_expectancy ~ clean_water_access + sanitation_access +
gdp_per_capita + health_expenditure, data = world_health_clean)
Residuals:
Min 1Q Median 3Q Max
-11.1699 -1.5153 0.5194 2.0465 9.0706
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.300e+01 1.957e+00 27.089 < 2e-16 ***
clean_water_access 4.648e-02 3.682e-02 1.262 0.208088
sanitation_access 1.686e-01 2.047e-02 8.240 1.44e-14 ***
gdp_per_capita 6.640e-05 1.923e-05 3.453 0.000663 ***
health_expenditure 4.618e-04 2.127e-04 2.171 0.030961 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.185 on 224 degrees of freedom
Multiple R-squared: 0.8109, Adjusted R-squared: 0.8075
F-statistic: 240.1 on 4 and 224 DF, p-value: < 2.2e-16
Visualization
The first visualization shows that countries with higher access to basic drinking water services usually have higher life expectancy. Most countries with very high clean water access are grouped near the top of the graph, meaning they tend to have longer life expectancy. This suggests that clean water is an important public health factor.
# Visualization 1: Clean Water Access and Life Expectancyggplot(world_health_clean,aes(x = clean_water_access,y = life_expectancy,color = IncomeGroup)) +geom_point(size =3, alpha =0.7) +geom_smooth(method ="lm", se =FALSE, color ="black") +scale_color_brewer(palette ="Set2") +labs(title ="Access to Clean Water and Life Expectancy",x ="Access to Basic Drinking Water Services (%)",y ="Life Expectancy at Birth (Years)",color ="Income Group",caption ="Source: World Bank World Development Indicators" ) +theme_minimal()
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 50 rows containing missing values or values outside the scale range
(`geom_point()`).
The second visualization shows the relationship between GDP per capita and life expectancy. Countries with higher GDP per capita generally have higher life expectancy. This may happen because richer countries usually have more resources for healthcare, clean water, sanitation, and public services.
# Visualization 2:# GDP per Capita and Life Expectancyggplot(world_health_clean,aes(x = gdp_per_capita,y = life_expectancy,color = Region,size = population)) +geom_point(alpha = .7) +scale_x_log10(labels = dollar) +scale_color_brewer(palette ="Dark2") +labs(title ="GDP per Capita and Life Expectancy",x ="GDP per Capita",y ="Life Expectancy",color ="Region",size ="Population",caption ="Source: World Bank World Development Indicators") +theme_light()
Warning: Removed 48 rows containing missing values or values outside the scale range
(`geom_point()`).
# Save cleaned dataset for Tableauwrite_csv(world_health_clean, "world_health_tableau.csv")
## Tableau Visualization
For Tableau, I created a world map using the cleaned dataset called `world_health_tableau.csv`.
The Tableau visualization is titled **Global Health and Economic Development Dashboard**.
In Tableau, I used:
Country Name = Location
Life Expectancy = Color
Population = Size
Region = Filter
IncomeGroup = Filter
Clean Water Access = Tooltip
GDP per Capita = Tooltip
Health Expenditure = Tooltip
The Tableau map allows viewers to compare life expectancy across countries and explore how public health and economic development indicators differ by region and income group.
Tableau Public Link: paste your Tableau link here.
Final Analysis
This project shows that life expectancy is connected to public health infrastructure and economic development. Countries with better access to clean drinking water, stronger sanitation services, higher healthcare spending, and higher GDP per capita generally tend to have longer life expectancy.
The regression model also supports this pattern. The adjusted R-squared value is about 0.8075, which means the model explains about 80.75% of the variation in life expectancy. Sanitation access, GDP per capita, and health expenditure were statistically significant predictors because their p-values were below 0.05. Clean water access had a positive relationship with life expectancy, but it was not statistically significant in this model.
Overall, the results suggest that improving health infrastructure and economic conditions can help improve life expectancy across countries. One important takeaway is that life expectancy is not only about hospitals or medicine. It is also connected to clean water, sanitation, income, public investment, and global inequality.