Problem Definition

This analysis uses the Gapminder dataset to examine how life expectancy differs across continents. It focuses on a single year, 2007: we analyze the relationship between GDP per capita and life expectancy and summarize both measures by continent.

Data Wrangling

We preprocess the data by filtering for the year 2007, selecting the relevant columns (country, continent, lifeExp, gdpPercap, pop), and adding a column for total GDP in billions, computed as GDP per capita times population divided by 1e9.

# Load necessary libraries
library(dplyr)
library(ggplot2)
library(gapminder)
# Step 1: Filter data for the year 2007
data_2007 <- gapminder %>%
  filter(year == 2007)

# Step 2: Select relevant columns
# Verify that `pop` exists in the dataset
if (!"pop" %in% colnames(gapminder)) {
  stop("The 'pop' column is missing from the dataset.")
}

selected_data <- data_2007 %>%
  select(country, continent, lifeExp, gdpPercap, pop)

# Step 3: Add a new column for GDP in billions
mutated_data <- selected_data %>%
  mutate(gdp_in_billions = gdpPercap * pop / 1e9)

head(mutated_data)
## # A tibble: 6 × 6
##   country     continent lifeExp gdpPercap      pop gdp_in_billions
##   <fct>       <fct>       <dbl>     <dbl>    <int>           <dbl>
## 1 Afghanistan Asia         43.8      975. 31889923            31.1
## 2 Albania     Europe       76.4     5937.  3600523            21.4
## 3 Algeria     Africa       72.3     6223. 33333216           207. 
## 4 Angola      Africa       42.7     4797. 12420476            59.6
## 5 Argentina   Americas     75.3    12779. 40301927           515. 
## 6 Australia   Oceania      81.2    34435. 20434176           704.
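
As a quick unit check on the `gdp_in_billions` formula, the sketch below recomputes it from the rounded values printed above and compares the result with the printed column. The `toy` tibble and its `printed` column are illustrative names, not part of the original analysis. Because the inputs are rounded, the match is only approximate.

```r
library(dplyr)

# Rounded values copied from the head() output above (illustrative only).
toy <- tibble(
  country   = c("Afghanistan", "Albania", "Algeria", "Angola", "Argentina", "Australia"),
  gdpPercap = c(975, 5937, 6223, 4797, 12779, 34435),
  pop       = c(31889923, 3600523, 33333216, 12420476, 40301927, 20434176),
  printed   = c(31.1, 21.4, 207, 59.6, 515, 704)
)

# Total GDP = GDP per capita * population; dividing by 1e9 converts to billions.
check <- toy %>%
  mutate(
    gdp_in_billions = gdpPercap * pop / 1e9,
    abs_diff        = abs(gdp_in_billions - printed)
  )

check
```

Every recomputed value should land within rounding distance of the printed one, which suggests the formula and the "billions" scaling are applied as intended.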

Summary Table

The table below summarizes the average and median life expectancy and GDP per capita for each continent, sorted by average life expectancy in descending order.

summary_table <- mutated_data %>%
  group_by(continent) %>%
  summarise(
    avg_life_expectancy = mean(lifeExp, na.rm = TRUE),
    median_life_expectancy = median(lifeExp, na.rm = TRUE),
    avg_gdp_per_capita = mean(gdpPercap, na.rm = TRUE),
    median_gdp_per_capita = median(gdpPercap, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_life_expectancy))

summary_table
## # A tibble: 5 × 5
##   continent avg_life_expectancy median_life_expectancy avg_gdp_per_capita
##   <fct>                   <dbl>                  <dbl>              <dbl>
## 1 Oceania                  80.7                   80.7             29810.
## 2 Europe                   77.6                   78.6             25054.
## 3 Americas                 73.6                   72.9             11003.
## 4 Asia                     70.7                   72.4             12473.
## 5 Africa                   54.8                   52.9              3089.
## # ℹ 1 more variable: median_gdp_per_capita <dbl>
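
The printed table hides `median_gdp_per_capita` because the console is too narrow. The sketch below shows one way to write the same kind of summary more compactly with `across()` and to print every column with `width = Inf`. It runs on the six rounded rows shown earlier, so its numbers are illustrative and do not reproduce the full 2007 table. The names `toy` and `toy_summary` are assumptions, and the generated column names (`avg_lifeExp`, and so on) differ from those used above.

```r
library(dplyr)

# Six rounded rows from the head() output earlier (illustrative only).
toy <- tibble(
  continent = c("Asia", "Europe", "Africa", "Africa", "Americas", "Oceania"),
  lifeExp   = c(43.8, 76.4, 72.3, 42.7, 75.3, 81.2),
  gdpPercap = c(975, 5937, 6223, 4797, 12779, 34435)
)

# Apply mean and median to both columns in a single across() call.
toy_summary <- toy %>%
  group_by(continent) %>%
  summarise(
    across(
      c(lifeExp, gdpPercap),
      list(avg = mean, median = median),
      .names = "{.fn}_{.col}"
    )
  ) %>%
  arrange(desc(avg_lifeExp))

# width = Inf prints all columns instead of truncating to the console width.
print(toy_summary, width = Inf)
```

On the real data, `print(summary_table, width = Inf)` should reveal the hidden median GDP column in the same way.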

Visualization

The scatter plot below shows the relationship between GDP per capita (on a log scale) and life expectancy in 2007, with points colored by continent. A dashed black linear trendline illustrates the correlation.

scatter_plot <- ggplot(mutated_data, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed", color = "black") +
  scale_x_log10() +
  labs(
    title = "Relationship Between GDP Per Capita and Life Expectancy (2007)",
    x = "GDP Per Capita (Log Scale)",
    y = "Life Expectancy",
    color = "Continent"
  ) +
  theme_minimal()

scatter_plot
## `geom_smooth()` using formula = 'y ~ x'
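
Because `scale_x_log10()` transforms the x values before `geom_smooth()` fits its model, the dashed line corresponds to a linear model of life expectancy on log10 GDP per capita. The sketch below fits that model form directly so the slope and correlation can be read as numbers. It uses only the six rounded rows printed earlier, so the values are illustrative and are not results for the full 2007 data. The variable names are assumptions.

```r
# Rounded values from the six rows printed earlier (illustrative only).
gdp_percap <- c(975, 5937, 6223, 4797, 12779, 34435)
life_exp   <- c(43.8, 76.4, 72.3, 42.7, 75.3, 81.2)

# Same model form as geom_smooth(method = "lm") combined with scale_x_log10().
fit   <- lm(life_exp ~ log10(gdp_percap))
slope <- unname(coef(fit)[2])

# Pearson correlation between log10 GDP per capita and life expectancy.
r_log <- cor(log10(gdp_percap), life_exp)

slope
r_log
```

The slope is the estimated change in life expectancy, in years, for a tenfold increase in GDP per capita. Applying the same two calls to `mutated_data` would give the figures for all countries.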

Summary and Interpretation

The analysis reveals the following insights:

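  1. Higher GDP, Higher Life Expectancy: The two continents with the highest average GDP per capita, Oceania (about 29,810) and Europe (about 25,054), also have the highest average life expectancy (80.7 and 77.6 years). The ordering is not strict: the Americas have a lower average GDP per capita than Asia (11,003 vs. 12,473) but a higher average life expectancy (73.6 vs. 70.7 years).
  2. Positive Correlation: There is a positive correlation between GDP per capita and life expectancy in the scatter plot. The relationship is not perfectly linear, and the trendline is fitted against GDP per capita on a log scale rather than in raw units.
  3. Disparities in Africa: Africa has the lowest average life expectancy (54.8 years) and the lowest average GDP per capita (about 3,089), highlighting disparities in development and health outcomes.
  4. Median vs. Average: Median life expectancy ranks the continents in the same order as the average. Medians are less sensitive to outliers, and the two measures differ by at most 1.9 years (Africa: average 54.8, median 52.9).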
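
The continent-level points in the list above can be checked directly from the printed summary table. The sketch below rebuilds that table as a plain data frame, computes the gap between average and median life expectancy, and measures how closely the GDP ranking matches the life expectancy ranking with a Spearman rank correlation. The data frame and column names are assumptions made for illustration, and with only five continents the correlation is a rough descriptive figure.

```r
# Values copied from the printed summary table (median GDP is not shown there).
continents <- data.frame(
  continent   = c("Oceania", "Europe", "Americas", "Asia", "Africa"),
  avg_life    = c(80.7, 77.6, 73.6, 70.7, 54.8),
  median_life = c(80.7, 78.6, 72.9, 72.4, 52.9),
  avg_gdp     = c(29810, 25054, 11003, 12473, 3089)
)

# Gap between average and median life expectancy, in years.
continents$life_gap <- continents$avg_life - continents$median_life

# Rank agreement between average GDP per capita and average life expectancy.
rho <- cor(continents$avg_gdp, continents$avg_life, method = "spearman")

continents
rho  # 0.9: only Asia and the Americas swap ranks
```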

Conclusion

This study shows a clear positive association between economic development and health outcomes in 2007. Although a single-year comparison cannot establish causation, the pattern suggests that addressing disparities in GDP per capita, especially in regions like Africa, may lead to improved life expectancy.