This analysis is a statistical improvement because individual context matters. It aims to identify the countries currently most in need of healthcare or economic interventions, using a country-specific approach that focuses on each country's own data rather than regional or neighbor-based comparisons.
We first take the 50 countries with the lowest life expectancy and GDP per capita in the latest available year.
Why this is important:
* Low GDP and low life expectancy suggest poor healthcare systems, inadequate nutrition, and limited access to basic services.
* Small investments may yield significant improvements in these countries because of the low baseline conditions.
Hence, this data helps prioritize countries where funds can have the greatest proportional impact.
alldata <- gapminder_unfiltered
latest_year_data <- gapminder_unfiltered |>
filter(year == max(year))
countries_most_in_need_10 <- latest_year_data |>
arrange(lifeExp, gdpPercap) |>
head(50) |>
select(country,continent,lifeExp,gdpPercap,pop, year)
ggplot(countries_most_in_need_10, aes(x = gdpPercap, y = lifeExp)) +
geom_point(color = "blue") + # Blue points for simplicity
labs(
title = "Top 50 Countries by Lowest Life Expectancy and GDP per Capita",
x = "GDP per Capita",
y = "Life Expectancy"
) +
theme_minimal()
print(head(countries_most_in_need_10))
## # A tibble: 6 × 6
## country continent lifeExp gdpPercap pop year
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Swaziland Africa 39.6 4513. 1133066 2007
## 2 Mozambique Africa 42.1 824. 19951656 2007
## 3 Zambia Africa 42.4 1271. 11746035 2007
## 4 Sierra Leone Africa 42.6 863. 6144562 2007
## 5 Lesotho Africa 42.6 1569. 2012649 2007
## 6 Angola Africa 42.7 4797. 12420476 2007
The table above is ordered by life expectancy first: arrange() sorts by lifeExp and uses gdpPercap only to break ties, so GDP per capita is effectively ignored. To find the countries that are lowest when both are considered equally, we combine them in a weighted sum.
The weights are 50% GDP per capita and 50% life expectancy.
# Combine GDP per capita and life expectancy using an equal-weight sum (raw units, no rescaling)
top_50_countries <- countries_most_in_need_10 |>
mutate(combined_score = 0.5 * gdpPercap + 0.5 * lifeExp) |>
arrange(combined_score)
print(top_50_countries)
## # A tibble: 50 × 7
## country continent lifeExp gdpPercap pop year combined_score
## <fct> <fct> <dbl> <dbl> <int> <int> <dbl>
## 1 Congo, Dem. Rep. Africa 46.5 278. 6.46e7 2007 162.
## 2 Liberia Africa 45.7 415. 3.19e6 2007 230.
## 3 Burundi Africa 49.6 430. 8.39e6 2007 240.
## 4 Zimbabwe Africa 43.5 470. 1.23e7 2007 257.
## 5 Guinea-Bissau Africa 46.4 579. 1.47e6 2007 313.
## 6 Niger Africa 56.9 620. 1.29e7 2007 338.
## 7 Eritrea Africa 58.0 641. 4.91e6 2007 350.
## 8 Ethiopia Africa 52.9 691. 7.65e7 2007 372.
## 9 Central African Repu… Africa 44.7 706. 4.37e6 2007 375.
## 10 Malawi Africa 48.3 759. 1.33e7 2007 404.
## # ℹ 40 more rows
Most of the countries with low GDP and low life expectancy are in Africa; outside Africa, Myanmar and Afghanistan also appear.
Next, we check the correlation between GDP per capita and life expectancy within these 50 countries. This is necessary because if life expectancy does not increase with GDP per capita, the 50-50 weighting is appropriate. If the two are strongly correlated, we can instead give a higher weight to life expectancy, as that is one of the main variables we are trying to maximize.
correlation <- cor(top_50_countries$gdpPercap, top_50_countries$lifeExp)
print(paste("Correlation between GDP and Life Expectancy is ",correlation))
## [1] "Correlation between GDP and Life Expectancy is 0.0252053612609126"
The correlation is close to zero (about 0.025), so there is no reason to adjust the weights.
We keep them at 50:50.
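To judge whether a correlation is distinguishable from zero, a confidence interval and p-value can be more informative than the point estimate alone. The following is a minimal sketch using base R's cor.test(); the ten value pairs are copied from the rows printed in the combined-score table above purely for illustration, so the result will differ from the 50-country figure.

```r
# Illustrative subset: the ten rows shown in the combined-score table.
gdp_per_capita  <- c(278, 415, 430, 470, 579, 620, 641, 691, 706, 759)
life_expectancy <- c(46.5, 45.7, 49.6, 43.5, 46.4, 56.9, 58.0, 52.9, 44.7, 48.3)

# Pearson correlation with a 95% confidence interval and p-value.
result <- cor.test(gdp_per_capita, life_expectancy, method = "pearson")

result$estimate  # point estimate of the correlation
result$conf.int  # 95% confidence interval
result$p.value   # p-value for the null hypothesis of zero correlation
```

A confidence interval that comfortably includes zero would support leaving the weights unchanged.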
This is a statistical improvement because a combined score of life expectancy and GDP per capita is a more robust metric than assessing each independently, as was done in the Week 6 notebook.
It gives a more holistic evaluation of each country's current situation.
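One caveat about the equal weights: GDP per capita is measured in hundreds or thousands of dollars while life expectancy is in tens of years, so a sum of raw values is driven almost entirely by GDP. A possible remedy, sketched below as an assumption rather than as the method used in this notebook, is to rescale both variables to the 0-1 range before weighting. The helper rescale01 is hypothetical, and the five rows are copied from the printed table for illustration.

```r
# Hypothetical helper: min-max rescaling to the [0, 1] range.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Five rows copied from the combined-score table above.
scores <- data.frame(
  country   = c("Congo, Dem. Rep.", "Liberia", "Zimbabwe", "Niger", "Eritrea"),
  gdpPercap = c(278, 415, 470, 620, 641),
  lifeExp   = c(46.5, 45.7, 43.5, 56.9, 58.0)
)

# Raw sum, as in the notebook: its ordering follows GDP per capita.
scores$raw_score <- 0.5 * scores$gdpPercap + 0.5 * scores$lifeExp

# Rescaled sum: each variable contributes on the same 0-1 scale.
scores$scaled_score <- 0.5 * rescale01(scores$gdpPercap) +
  0.5 * rescale01(scores$lifeExp)

scores[order(scores$scaled_score), ]
```

With rescaling, a country with very low life expectancy but moderate GDP can move up the list, which the raw sum does not allow.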
Next, we take the 20 lowest-scoring of the bottom 50 countries and check which are already improving their life expectancy, so that we can omit them.
top_20_countries <- top_50_countries |> arrange(combined_score) |> head(20)
yearly_improvement <- gapminder_unfiltered |>
group_by(country) |>
arrange(country, year) |>
mutate(life_expectancy_change = lifeExp - lag(lifeExp)) |>
filter(country %in% top_20_countries$country) |>
filter(!is.na(life_expectancy_change))
ggplot(yearly_improvement, aes(x = year, y = lifeExp, color = country)) +
  # linewidth replaces the size aesthetic for lines (deprecated in ggplot2 3.4.0)
  geom_line(linewidth = 0.5) +
  # Point size shows the magnitude of the change since the previous observation
  geom_point(aes(size = abs(life_expectancy_change)), alpha = 1) +
  labs(
    title = "Yearly Improvement in Life Expectancy for Bottom 20 Countries",
    x = "Year",
    y = "Life Expectancy",
    color = "Country"
  ) +
  theme_minimal() +
  theme(legend.position = "left") +
  scale_size_continuous(name = "Yearly Change Magnitude")
top_20_countries <- yearly_improvement |>
  summarise(total_improvement = sum(life_expectancy_change, na.rm = TRUE)) |>
  arrange(total_improvement)
print(top_20_countries)
## # A tibble: 20 × 2
## country total_improvement
## <fct> <dbl>
## 1 Zimbabwe -4.96
## 2 Rwanda 6.24
## 3 Liberia 7.20
## 4 Congo, Dem. Rep. 7.32
## 5 Central African Republic 9.28
## 6 Burundi 10.5
## 7 Mozambique 10.8
## 8 Malawi 12.0
## 9 Sierra Leone 12.2
## 10 Guinea-Bissau 13.9
## 11 Afghanistan 15.0
## 12 Somalia 15.2
## 13 Ethiopia 18.9
## 14 Niger 19.4
## 15 Togo 19.8
## 16 Mali 20.8
## 17 Eritrea 22.1
## 18 Guinea 22.4
## 19 Myanmar 25.8
## 20 Gambia 29.4
countries_needing_help <- merge(top_20_countries, top_50_countries, by="country")
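The total_improvement column is a sum of changes between consecutive observations, and such a sum telescopes: it equals the last observed life expectancy minus the first. The short sketch below checks this on made-up values (not taken from the dataset), which may help when reading the table.

```r
# Made-up life expectancy series for one country, ordered by year.
life_exp <- c(40.0, 42.5, 45.0, 44.0, 43.5)

# Change between consecutive observations (what lifeExp - lag(lifeExp) computes).
yearly_change <- diff(life_exp)

# The sum of consecutive changes equals last value minus first value.
total_improvement <- sum(yearly_change)
net_change <- life_exp[length(life_exp)] - life_exp[1]

all.equal(total_improvement, net_change)
```

A negative total, as for Zimbabwe, therefore means life expectancy in the last observed year is lower than in the first.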
We can see that Zimbabwe's life expectancy has declined, and that Rwanda, Liberia, the Democratic Republic of the Congo, and the Central African Republic have shown the smallest improvements. We therefore recommend these five countries to the UN.
We bootstrap the combined score to assess its uncertainty and to provide estimates for improvements.
We bootstrap the combined score rather than the individual columns, which gives a fuller picture of how reliable our improvement metric is.
We have also improved the earlier visualization by mapping the absolute change in life expectancy to the size of each point.
countries_of_interest <- data.frame(
  # Names must match the dataset's country labels exactly; it uses "Congo, Dem. Rep."
  country = c("Zimbabwe", "Rwanda", "Liberia", "Congo, Dem. Rep.", "Central African Republic")
)
bootstrap_input_data <- gapminder |>
filter(country %in% countries_of_interest$country) |>
mutate(combined_score = 0.5 * gdpPercap + 0.5 * lifeExp) |>
arrange(combined_score)
n_iterations <- 100
bootstrap_results <- replicate(n_iterations, {
resampled_data <- bootstrap_input_data[sample(nrow(bootstrap_input_data), replace = TRUE), ]
mean_combined_score <- mean(resampled_data$combined_score, na.rm = FALSE)
return(mean_combined_score)
})
ci_lower <- quantile(bootstrap_results, 0.025, na.rm = TRUE)
ci_upper <- quantile(bootstrap_results, 0.975, na.rm = TRUE)
bootstrap_df <- data.frame(bootstrap_results)
ggplot(bootstrap_df, aes(x = bootstrap_results)) +
geom_histogram(bins = 30, fill = "skyblue", color = "black") +
  # Constant reference lines do not need an aes() mapping
  geom_vline(xintercept = ci_lower, color = "red", linetype = "dashed") +
  geom_vline(xintercept = ci_upper, color = "red", linetype = "dashed") +
labs(
title = "Bootstrap Distribution of Combined Scores for Recommended Countries",
x = "Combined Score",
y = "Frequency"
) +
theme_minimal()
The bootstrap distribution is fairly spread out, so the confidence interval is wide and the estimate carries some uncertainty. Additional data may be needed before relying on it.
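The percentile bootstrap used above can be sketched in a self-contained form. This version fixes the random seed so the interval is reproducible and uses 2000 resamples; both choices are assumptions for illustration. The input values are the rounded combined scores from the ten rows printed earlier, not the data actually passed to the bootstrap in this notebook.

```r
set.seed(42)  # assumption: any fixed seed makes the resampling reproducible

# Rounded combined scores copied from the ten rows printed earlier.
combined_score <- c(162, 230, 240, 257, 313, 338, 350, 372, 375, 404)

n_iterations <- 2000  # more resamples give steadier percentile estimates

# Resample with replacement and record the mean of each resample.
boot_means <- replicate(
  n_iterations,
  mean(sample(combined_score, replace = TRUE))
)

# 95% percentile confidence interval for the mean combined score.
ci <- quantile(boot_means, c(0.025, 0.975))
ci
```

With only 100 resamples, the 2.5% and 97.5% percentiles rest on very few draws, so the interval endpoints can shift noticeably from run to run.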