# Load necessary libraries
library(ggplot2)
library(dplyr)
library(gapminder)
# Load Gapminder data
data("gapminder")
# Keep country-year rows whose GDP per capita is above the median of all rows
high_gdp <- gapminder %>%
  filter(gdpPercap > median(gdpPercap))
# Within that subset, keep rows whose life expectancy is below the subset's median
low_life_expectancy <- high_gdp %>%
  filter(lifeExp < median(lifeExp))
# Ten country-year rows with the lowest life expectancy
# (the same country can appear for several years)
top_10_countries <- low_life_expectancy %>%
  arrange(lifeExp) %>%
  head(10)
# View the results
print(top_10_countries)
## # A tibble: 10 × 6
##    country      continent  year lifeExp      pop gdpPercap
##    <fct>        <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Angola       Africa     1957    32.0  4561361     3828.
##  2 Angola       Africa     1962    34    4826015     4269.
##  3 Angola       Africa     1967    36.0  5247469     5523.
##  4 Gabon        Africa     1952    37.0   420702     4293.
##  5 Angola       Africa     1972    37.9  5894858     5473.
##  6 Gabon        Africa     1957    39.0   434904     4976.
##  7 Swaziland    Africa     2007    39.6  1133066     4513.
##  8 Saudi Arabia Asia       1952    39.9  4005677     6460.
##  9 Gabon        Africa     1962    40.5   455661     6631.
## 10 Angola       Africa     2007    42.7 12420476     4797.
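Each row above is a country-year observation, so Angola fills five of the ten slots. A minimal sketch for listing ten distinct countries instead, assuming that one row per country (its lowest qualifying life expectancy) is the desired summary; `top_10_distinct` is a name introduced here for illustration. The sketch repeats the two filters so that it runs on its own.

```r
library(dplyr)
library(gapminder)

# Same filters as above: GDP per capita above the median of all
# country-year rows, then life expectancy below the median of that subset
high_gdp <- gapminder %>%
  filter(gdpPercap > median(gdpPercap))
low_life_expectancy <- high_gdp %>%
  filter(lifeExp < median(lifeExp))

# Keep one row per country (its lowest qualifying life expectancy),
# then take the 10 lowest across countries
top_10_distinct <- low_life_expectancy %>%
  group_by(country) %>%
  slice_min(lifeExp, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(lifeExp) %>%
  head(10)

print(top_10_distinct)
```

With this variant Angola appears once (its 1957 row), and the remaining slots go to other countries rather than to later years of the same country.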
Life Expectancy: a key variable for measuring health outcomes.
GDP per Capita: a measure of economic strength.
We will assume that higher GDP generally correlates with better health
outcomes, but in this case, we are interested in identifying countries
where life expectancy is low despite high GDP, potentially due to
factors like healthcare access, inequality, or other socio-economic
issues.
Objective
Our goal is to:
Identify the top 10 countries with low life expectancy despite having high GDP.
Analyze the relationship between life expectancy and GDP.
Visualize trends that could highlight any anomalies where countries with high GDP have relatively low life expectancy.
Success Criteria
We can visualize the relationship between GDP per capita and life expectancy using a scatter plot. The plot will help identify outliers where countries have high GDP but low life expectancy.
# Scatter plot of GDP per capita vs life expectancy
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.6) +
  geom_smooth() +    # one smoothed trend line per continent
  scale_x_log10() +  # log scale for GDP per capita
  labs(title = "GDP per Capita vs Life Expectancy",
       x = "GDP per Capita (log scale)",
       y = "Life Expectancy") +
  theme_minimal()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Observation
The plot shows a positive relationship: higher GDP per capita is generally associated with higher life expectancy. Data points are color-coded by continent, each with its own trend line, showing different patterns across regions. At comparable GDP per capita, African countries tend to have lower life expectancy than countries on other continents. One plausible explanation is that wealthier nations can invest more in healthcare, infrastructure, and social services, although the plot alone cannot establish this causal link.
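To check the claim about Africa more directly than by eye, one option (a swapped-in technique, not part of the original analysis) is a linear model of life expectancy on log10 GDP per capita with continent indicators. Africa is the reference level, so a positive continent coefficient means that continent's observations average higher life expectancy than African ones at the same log GDP. This is only a rough sketch: it pools all years and treats repeated measurements of the same country as independent. `fit` and `continent_offsets` are illustrative names.

```r
library(gapminder)

# Linear model: life expectancy ~ log10(GDP per capita) + continent.
# Africa is the first factor level, so it is the reference category.
fit <- lm(lifeExp ~ log10(gdpPercap) + continent, data = gapminder)

coefs <- coef(fit)
print(round(coefs, 2))

# Estimated life-expectancy offsets of each continent relative to Africa,
# holding log10 GDP per capita fixed
continent_offsets <- coefs[grepl("^continent", names(coefs))]
print(round(continent_offsets, 2))
```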
A boxplot of life expectancy by continent can help identify regional trends. It can also reveal outliers within each continent, although it does not show GDP, so any link between those outliers and high GDP has to be checked separately.
# Boxplot of life expectancy by continent
ggplot(gapminder, aes(x = continent, y = lifeExp, fill = continent)) +
  geom_boxplot() +
  labs(title = "Life Expectancy by Continent", x = "Continent", y = "Life Expectancy") +
  theme_minimal()
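Because the boxplot has no GDP dimension, it cannot by itself single out high-GDP countries that lag behind their region. A minimal sketch for listing such cases, assuming "behind their region" means below the median life expectancy of the same continent in the same year; `continent_year_median`, `gap`, and `flagged` are hypothetical names introduced for illustration.

```r
library(dplyr)
library(gapminder)

flagged <- gapminder %>%
  # Median life expectancy of each continent in each year
  group_by(continent, year) %>%
  mutate(continent_year_median = median(lifeExp)) %>%
  ungroup() %>%
  # Above-median GDP per capita (median over all country-year rows) ...
  filter(gdpPercap > median(gdpPercap)) %>%
  # ... but life expectancy below the continent-year median
  filter(lifeExp < continent_year_median) %>%
  # Negative gap = years of life expectancy below the regional median
  mutate(gap = lifeExp - continent_year_median) %>%
  arrange(gap)

print(head(flagged, 10))
```

Comparing within the same year avoids flagging a country simply because its observation comes from the 1950s, when life expectancy was lower everywhere.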
We can compute the Pearson correlation between GDP per capita and life expectancy
to measure the strength of their linear relationship.
# Pearson correlation between GDP per capita and life expectancy
correlation <- cor(gapminder$gdpPercap, gapminder$lifeExp, use = "complete.obs")
print(paste0("Correlation between GDP and Life Expectancy: ", round(correlation, 2)))
## [1] "Correlation between GDP and Life Expectancy: 0.58"
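The moderate positive correlation of 0.58 indicates that life expectancy tends to be higher where GDP per capita is higher. Since r² ≈ 0.34, GDP per capita accounts for only about a third of the variation in life expectancy under a linear fit, so other factors also matter.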
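Pearson's r measures linear association, while GDP per capita is strongly right-skewed and the scatter plot above uses a log axis. A small sketch comparing three variants (Pearson on raw GDP, Pearson on log10 GDP, and Spearman rank correlation) shows how much the 0.58 depends on that choice. Because Spearman's coefficient uses only ranks, it is unchanged by the log transform. The variable names are illustrative.

```r
library(gapminder)

# Pearson on raw GDP per capita (the figure reported above)
r_raw <- cor(gapminder$gdpPercap, gapminder$lifeExp)

# Pearson after log-transforming GDP, matching the log x-axis of the scatter plot
r_log <- cor(log10(gapminder$gdpPercap), gapminder$lifeExp)

# Spearman rank correlation: invariant to any monotonic transform of GDP
r_spearman <- cor(gapminder$gdpPercap, gapminder$lifeExp, method = "spearman")

print(round(c(raw = r_raw, log10 = r_log, spearman = r_spearman), 2))
```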
Observation
This boxplot displays the distribution of life expectancy across
continents. Europe and Oceania have the highest median life
expectancy, with less spread. Africa shows the lowest median and a
wide range, indicating variability in healthcare and living
conditions. Asia and the Americas also show a wide spread, suggesting
differing conditions across countries within each continent. Because
the plot pools every year from 1952 to 2007, part of each box's spread
reflects improvement over time, not only differences between
countries. Overall, the plot highlights the global disparity in life
expectancy, especially between Africa and other regions.
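The visual claims above can be backed with numbers. A minimal sketch that tabulates, per continent, the median and interquartile range (the middle line and the box height in a standard boxplot) of life expectancy; like the plot, it pools all years. `lifeexp_by_continent` is an illustrative name.

```r
library(dplyr)
library(gapminder)

# Median and interquartile range of life expectancy per continent,
# pooled over all years, sorted from lowest to highest median
lifeexp_by_continent <- gapminder %>%
  group_by(continent) %>%
  summarise(
    median_lifeExp = median(lifeExp),
    iqr_lifeExp    = IQR(lifeExp),
    n_countries    = n_distinct(country)
  ) %>%
  arrange(median_lifeExp)

print(lifeexp_by_continent)
```

The `n_countries` column is a reminder that Oceania contains only two countries, so its narrow box rests on very few data points.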
We use a composite score to combine GDP per capita and life expectancy into a single metric, so that countries can be compared on both dimensions at once: the score is high when GDP is high and life expectancy is low. This helps identify countries that may require targeted health interventions despite their economic success.
Rank Score Formula
The `rank_score` is calculated as follows, where the minima and maxima are taken over the filtered high-GDP, low-life-expectancy observations:
\[ \text{rank\_score} = \left( \frac{\text{GDP} - \min(\text{GDP})}{\max(\text{GDP}) - \min(\text{GDP})} \right) + \left( 1 - \frac{\text{Life Expectancy} - \min(\text{Life Expectancy})}{\max(\text{Life Expectancy}) - \min(\text{Life Expectancy})} \right) \]
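A worked example on three hypothetical countries (invented values, not gapminder data) shows how the two terms combine. Each term lies in [0, 1], so `rank_score` ranges from 0 to 2. The helper `min_max` and the data frame `toy` are defined here for illustration only.

```r
# Hypothetical toy data (invented values, not from gapminder)
toy <- data.frame(
  country   = c("A", "B", "C"),
  gdpPercap = c(4000, 12000, 20000),
  lifeExp   = c(60, 45, 70)
)

# Min-max scale a numeric vector to the range [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# First term: normalized GDP; second term: inverted normalized life expectancy
toy$gdp_normalized <- min_max(toy$gdpPercap)
toy$life_expectancy_normalized <- 1 - min_max(toy$lifeExp)
toy$rank_score <- toy$gdp_normalized + toy$life_expectancy_normalized

print(toy)
```

Here B scores highest (0.5 + 1 = 1.5) even though its GDP is only in the middle, because it has the lowest life expectancy. C has the highest GDP but also the highest life expectancy, so it scores 1 + 0 = 1.0, and A scores 0 + 0.4 = 0.4.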
# Min-max normalize GDP and life expectancy to [0, 1] so both carry equal weight;
# life expectancy is inverted so that lower values give a higher score
low_life_expectancy <- low_life_expectancy %>%
  mutate(
    gdp_normalized = (gdpPercap - min(gdpPercap)) / (max(gdpPercap) - min(gdpPercap)),
    life_expectancy_normalized = 1 - (lifeExp - min(lifeExp)) / (max(lifeExp) - min(lifeExp)),
    rank_score = gdp_normalized + life_expectancy_normalized
  )
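# Keep each country's most recent qualifying year, then rank countries
# by the composite rank_score
low_life_expectancy <- low_life_expectancy %>%
  group_by(country) %>%
  filter(year == max(year)) %>%
  ungroup() %>%                      # rank across all countries, not within each country
  mutate(rank = rank(-rank_score))   # rank 1 = highest score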
# Top 10 countries by composite rank_score (high GDP, low life expectancy)
top_10_countries <- low_life_expectancy %>%
  arrange(desc(rank_score)) %>%
  head(10)
# Bar plot of the composite rank_score for the top 10 countries
ggplot(top_10_countries, aes(x = reorder(country, rank_score), y = rank_score, fill = rank_score)) +
  geom_col() +
  labs(title = "Top 10 Countries with High GDP and Low Life Expectancy",
       x = "Country",
       y = "Composite Rank Score") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_gradient(low = "red", high = "green")
Observation
The bar chart ranks the top 10 countries with high GDP but low
life expectancy by composite score, using each country's most recent
qualifying year. Countries like Congo, Rep., Gabon, and Namibia stand
out with relatively low life expectancy despite high GDP. Their scores
suggest that high GDP alone does not guarantee high life expectancy,
and point to the importance of other factors, such as healthcare
access and wealth distribution. These nations may face challenges like
disease, corruption, or unequal resource distribution, which hinder
improvements in public health.
Overcoming biases (existing or potential):
The analysis may overlook factors like income inequality or healthcare disparities. To mitigate bias, additional data on socio-economic variables should be included.
Possible risks or societal implications:
The findings might lead to stigmatization of certain countries, reinforcing negative stereotypes. Clear communication is needed to highlight the complexity of health disparities beyond GDP alone.
Crucial issues which might not be measurable:
Social, cultural, and healthcare-system quality factors are often hard to quantify but crucial to health outcomes. These should be acknowledged, and qualitative insights could complement the data.
Who would be affected by this project, and how does that affect your critique?
Policymakers may make decisions based on flawed interpretations of data. It’s crucial to ensure that recommendations are contextually sensitive and do not harm vulnerable populations.