# Load necessary libraries
library(ggplot2)
library(dplyr)
library(gapminder)
# Load Gapminder data
data("gapminder")

# Keep country-year rows whose GDP per capita exceeds the overall median
high_gdp <- gapminder %>%
  filter(gdpPercap > median(gdpPercap))

# Of those, keep rows with life expectancy below the median of the high-GDP subset
low_life_expectancy <- high_gdp %>%
  filter(lifeExp < median(lifeExp))

# Ten country-year rows with the lowest life expectancy despite high GDP
# (a country can appear more than once, in different years)
top_10_countries <- low_life_expectancy %>%
  arrange(lifeExp) %>%
  head(10)

# View the results
print(top_10_countries)
## # A tibble: 10 × 6
##    country      continent  year lifeExp      pop gdpPercap
##    <fct>        <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Angola       Africa     1957    32.0  4561361     3828.
##  2 Angola       Africa     1962    34    4826015     4269.
##  3 Angola       Africa     1967    36.0  5247469     5523.
##  4 Gabon        Africa     1952    37.0   420702     4293.
##  5 Angola       Africa     1972    37.9  5894858     5473.
##  6 Gabon        Africa     1957    39.0   434904     4976.
##  7 Swaziland    Africa     2007    39.6  1133066     4513.
##  8 Saudi Arabia Asia       1952    39.9  4005677     6460.
##  9 Gabon        Africa     1962    40.5   455661     6631.
## 10 Angola       Africa     2007    42.7 12420476     4797.
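
In the pipeline above, `median(lifeExp)` is evaluated inside the high-GDP subset, so the life expectancy cutoff is not the same as the overall median. The sketch below compares the two cutoffs and shows an alternative filter that uses overall medians for both conditions. The names `gdp_cutoff`, `life_cutoff`, `subset_life_cutoff`, and `low_life_overall` are illustrative and not part of the original analysis.

```r
library(dplyr)
library(gapminder)

# Thresholds computed once on the full data set
gdp_cutoff  <- median(gapminder$gdpPercap)
life_cutoff <- median(gapminder$lifeExp)

# Median life expectancy inside the high-GDP subset (the cutoff the pipeline uses)
subset_life_cutoff <- gapminder %>%
  filter(gdpPercap > gdp_cutoff) %>%
  summarise(cutoff = median(lifeExp)) %>%
  pull(cutoff)

# Compare the two cutoffs side by side
c(overall = life_cutoff, high_gdp_subset = subset_life_cutoff)

# Alternative: apply overall medians to both conditions in a single filter
low_life_overall <- gapminder %>%
  filter(gdpPercap > gdp_cutoff, lifeExp < life_cutoff)
```

Which cutoff is appropriate depends on the question: the subset median asks "low for a high-GDP country", while the overall median asks "low in absolute terms".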

Goal 1: Business Scenario

Goal 2: Model Critique

Analysis 1: GDP vs Life Expectancy Plot

We can visualize the relationship between GDP per capita and life expectancy using a scatter plot. The plot will help identify outliers where countries have high GDP but low life expectancy.

# Scatter plot for GDP vs Life Expectancy
ggplot(gapminder, aes(x=gdpPercap, y=lifeExp, color=continent)) +
  geom_point(alpha=0.6) + 
  geom_smooth() +
  scale_x_log10() +  # Log scale for GDP
  labs(title="GDP per Capita vs Life Expectancy",
       x="GDP per Capita (log scale)",
       y="Life Expectancy") +
  theme_minimal()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Observation

The positive association shows that higher GDP per capita generally goes with higher life expectancy. Data points are color-coded by continent, showing different trends across regions. African countries tend to have lower life expectancy than countries on other continents, even at comparable levels of GDP per capita. One plausible explanation for the overall trend is that wealthier nations invest more in healthcare, infrastructure, and social services, which contributes to better life expectancy outcomes.

Analysis 2: Boxplot of Life Expectancy by Continent with Correlation

A boxplot of life expectancy by continent can help identify regional trends. It may also highlight countries that deviate from their continent’s average life expectancy despite high GDP.

# Boxplot for life expectancy by continent
ggplot(gapminder, aes(x=continent, y=lifeExp, fill=continent)) +
  geom_boxplot() +
  labs(title="Life Expectancy by Continent", x="Continent", y="Life Expectancy") +
  theme_minimal()

We can perform a correlation analysis between GDP and life expectancy to measure the strength of their relationship.

# Pearson correlation between raw (untransformed) GDP per capita and life expectancy
correlation <- cor(gapminder$gdpPercap, gapminder$lifeExp, use="complete.obs")
print(paste("Correlation between GDP and Life Expectancy: ", round(correlation, 2)))
## [1] "Correlation between GDP and Life Expectancy:  0.58"

The moderate positive correlation of 0.58 indicates that life expectancy tends to rise with GDP per capita, but GDP alone accounts for only about a third of the variation (0.58² ≈ 0.34), so other factors also matter.
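
The correlation above is computed on raw GDP per capita, while the scatter plot uses a log scale on the x-axis. As a rough check, the sketch below computes the correlation on the log scale as well, along with a Spearman rank correlation that does not depend on the choice of scale. The variable names `cor_raw`, `cor_log`, and `cor_rank` are illustrative.

```r
library(gapminder)

# Pearson correlation on the raw scale (as in the analysis above)
cor_raw <- cor(gapminder$gdpPercap, gapminder$lifeExp)

# Pearson correlation after log-transforming GDP, matching the plot's x-axis
cor_log <- cor(log10(gapminder$gdpPercap), gapminder$lifeExp)

# Spearman rank correlation, unaffected by monotone transformations of GDP
cor_rank <- cor(gapminder$gdpPercap, gapminder$lifeExp, method = "spearman")

round(c(raw = cor_raw, log10 = cor_log, spearman = cor_rank), 2)
```

If the log-scale value is noticeably higher than the raw one, that suggests the relationship is closer to linear in log GDP than in GDP itself.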

Observation

This boxplot displays the distribution of life expectancy values across continents. Europe and Oceania have higher median life expectancy, with less spread. Africa shows the lowest median life expectancy and a wide range, indicating variability in healthcare and living conditions. Asia and the Americas have a wider spread of values, suggesting differing conditions across countries within each continent. This visualization highlights the global disparity in life expectancy, especially between Africa and other regions.
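
The medians and spreads read off the boxplot can also be tabulated directly. The following is a minimal sketch; `continent_summary` and its column names are illustrative choices, not part of the original analysis.

```r
library(dplyr)
library(gapminder)

# Median, interquartile range, and row count of life expectancy per continent
continent_summary <- gapminder %>%
  group_by(continent) %>%
  summarise(
    median_lifeExp = median(lifeExp),
    iqr_lifeExp = IQR(lifeExp),
    n_obs = n(),
    .groups = "drop"
  ) %>%
  arrange(median_lifeExp)  # lowest median first

continent_summary
```

Note that these statistics pool all years for each continent, just as the boxplot does, so part of the spread reflects change over time rather than differences between countries.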

Analysis 3: Creating a Composite Score to Identify the Top Ten Countries

We combine GDP per capita and life expectancy into a single composite score so that countries with high economic strength but poor health outcomes can be compared on one scale. This helps identify countries that may require targeted interventions despite their economic success.

Rank Score Formula

The formula for calculating the rank_score is as follows:

\[ \text{rank\_score} = \left( \frac{\text{GDP} - \min(\text{GDP})}{\max(\text{GDP}) - \min(\text{GDP})} \right) + \left( 1 - \frac{\text{Life Expectancy} - \min(\text{Life Expectancy})}{\max(\text{Life Expectancy}) - \min(\text{Life Expectancy})} \right) \]
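
To make the arithmetic concrete, here is a small worked example of the formula on three made-up rows. The helper `min_max()` and the `toy` data frame are hypothetical and exist only for illustration; the values are not taken from Gapminder.

```r
# Min-max scaling to the [0, 1] range
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical toy values, chosen only to illustrate the arithmetic
toy <- data.frame(
  country = c("A", "B", "C"),
  gdp = c(4000, 6000, 8000),
  life_expectancy = c(40, 50, 45)
)

# Scaled GDP plus inverted scaled life expectancy
toy$rank_score <- min_max(toy$gdp) + (1 - min_max(toy$life_expectancy))

toy$rank_score  # 1.0 0.5 1.5
```

Each term lies between 0 and 1, so the score ranges from 0 to 2. Country C scores highest here because it has the highest GDP together with a below-maximum life expectancy.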

# Load necessary libraries
library(ggplot2)
library(dplyr)
library(gapminder)

# Load Gapminder data
data("gapminder")

# Keep country-year rows whose GDP per capita exceeds the overall median
high_gdp <- gapminder %>%
  filter(gdpPercap > median(gdpPercap))

# Of those, keep rows with life expectancy below the median of the high-GDP subset
low_life_expectancy <- high_gdp %>%
  filter(lifeExp < median(lifeExp))

# Normalize GDP and Life Expectancy for equal weightage
low_life_expectancy <- low_life_expectancy %>%
  mutate(
    gdp_normalized = (gdpPercap - min(gdpPercap)) / (max(gdpPercap) - min(gdpPercap)),
    life_expectancy_normalized = 1 - (lifeExp - min(lifeExp)) / (max(lifeExp) - min(lifeExp)),
    rank_score = gdp_normalized + life_expectancy_normalized
  )

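# Keep each country's most recent qualifying year, then rank across countries
low_life_expectancy <- low_life_expectancy %>%
  group_by(country) %>%
  filter(year == max(year)) %>%
  ungroup() %>%  # without this, rank() runs within each one-row country group
  mutate(rank = rank(-rank_score))  # rank 1 = highest composite score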

# Top 10 countries with low life expectancy despite high GDP
top_10_countries <- low_life_expectancy %>%
  arrange(desc(rank_score)) %>%
  head(10)

# Create a bar plot to visualize the results
ggplot(top_10_countries, aes(x = reorder(country, rank_score), y = rank_score, fill = rank_score)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 10 Countries with High GDP and Low Life Expectancy",
       x = "Country",
       y = "Composite Rank Score") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_gradient(low = "red", high = "green")

Observation

The bar chart ranks the top 10 countries with high GDP but low life expectancy by composite score. Countries such as Gabon, Namibia, and Congo, Rep. stand out with relatively low life expectancy despite high GDP. Their rank scores suggest that GDP alone is insufficient for improving life expectancy. The graph emphasizes the importance of other factors, such as healthcare access and wealth distribution, in determining life expectancy. These nations may face challenges like disease, corruption, or unequal resource distribution, which hinder improvements in public health.
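
Because the ranking step in the code follows a `group_by(country)`, it is worth confirming whether ranks are computed across countries or within each group. The sketch below illustrates the difference on hypothetical toy scores; `scores`, `grouped_rank`, and `global_rank` are illustrative names.

```r
library(dplyr)

# Hypothetical toy scores, one row per country
scores <- tibble(
  country = c("A", "B", "C"),
  rank_score = c(1.0, 0.5, 1.5)
)

# Ranking while still grouped: each country is its own group, so every rank is 1
grouped_rank <- scores %>%
  group_by(country) %>%
  mutate(rank = rank(-rank_score))

# Ranking after ungroup(): ranks are computed across all countries
global_rank <- scores %>%
  group_by(country) %>%
  ungroup() %>%
  mutate(rank = rank(-rank_score))

grouped_rank$rank  # 1 1 1
global_rank$rank   # 2 3 1
```

The bar chart itself orders countries by `rank_score` rather than by the `rank` column, so the plot is the same either way; the distinction only matters if `rank` is reported or used downstream.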

Goal 3: Ethical and Epistemological Concerns

Overcoming biases (existing or potential):

The analysis may overlook factors like income inequality or healthcare disparities. To mitigate bias, additional data on socio-economic variables should be included.

Possible risks or societal implications:

The findings might lead to stigmatization of certain countries, reinforcing negative stereotypes. Clear communication is needed to highlight the complexity of health disparities beyond GDP alone.

Crucial issues which might not be measurable:

Social, cultural, and healthcare system quality factors are often unquantifiable but crucial to health outcomes. These should be acknowledged, and qualitative insights could complement the data.

Who would be affected by this project, and how does that affect your critique?

Policymakers may make decisions based on flawed interpretations of data. It’s crucial to ensure that recommendations are contextually sensitive and do not harm vulnerable populations.