This analysis uses the Gapminder dataset to examine how life expectancy varies across continents. We focus on the year 2007, analyzing the relationship between GDP per capita and life expectancy and summarizing both measures by continent.
We preprocess the data by filtering for the year 2007, selecting the relevant columns (country, continent, lifeExp, gdpPercap, pop), and adding a new column for total GDP in billions, computed as gdpPercap * pop / 1e9.
# Load necessary libraries
library(dplyr)
library(ggplot2)
library(gapminder)
# Step 1: Filter data for the year 2007
data_2007 <- gapminder %>%
  filter(year == 2007)
# Step 2: Select relevant columns
# Verify that `pop` exists in the dataset
if (!"pop" %in% colnames(gapminder)) {
  stop("The 'pop' column is missing from the dataset.")
}
selected_data <- data_2007 %>%
  select(country, continent, lifeExp, gdpPercap, pop)
# Step 3: Add a new column for GDP in billions
mutated_data <- selected_data %>%
  mutate(gdp_in_billions = gdpPercap * pop / 1e9)
head(mutated_data)
## # A tibble: 6 × 6
##   country     continent lifeExp gdpPercap      pop gdp_in_billions
##   <fct>       <fct>       <dbl>     <dbl>    <int>           <dbl>
## 1 Afghanistan Asia         43.8      975. 31889923            31.1
## 2 Albania     Europe       76.4     5937.  3600523            21.4
## 3 Algeria     Africa       72.3     6223. 33333216           207.
## 4 Angola      Africa       42.7     4797. 12420476            59.6
## 5 Argentina   Americas     75.3    12779. 40301927           515.
## 6 Australia   Oceania      81.2    34435. 20434176           704.
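The three preprocessing steps above can also be written as one chained pipeline. The following is a minimal sketch under that assumption, not the report's own code; the name `gap_2007` is hypothetical and introduced only for illustration.

```r
library(dplyr)
library(gapminder)

# Same three steps as above, chained into a single pipeline.
# `gap_2007` is a hypothetical name used only in this sketch.
gap_2007 <- gapminder %>%
  filter(year == 2007) %>%
  select(country, continent, lifeExp, gdpPercap, pop) %>%
  # gdpPercap * pop gives total GDP; dividing by 1e9 expresses it in billions
  mutate(gdp_in_billions = gdpPercap * pop / 1e9)

# Spot check against the first printed row (Afghanistan):
# 975 * 31889923 / 1e9 is roughly 31.1
gap_2007 %>%
  filter(country == "Afghanistan") %>%
  pull(gdp_in_billions)
```

Chaining avoids the intermediate objects, while the step-by-step version makes each stage easier to inspect.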
The table below summarizes the average and median life expectancy and GDP per capita for each continent.
summary_table <- mutated_data %>%
  group_by(continent) %>%
  summarise(
    avg_life_expectancy = mean(lifeExp, na.rm = TRUE),
    median_life_expectancy = median(lifeExp, na.rm = TRUE),
    avg_gdp_per_capita = mean(gdpPercap, na.rm = TRUE),
    median_gdp_per_capita = median(gdpPercap, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_life_expectancy))
summary_table
## # A tibble: 5 × 5
##   continent avg_life_expectancy median_life_expectancy avg_gdp_per_capita
##   <fct>                   <dbl>                  <dbl>              <dbl>
## 1 Oceania                  80.7                   80.7             29810.
## 2 Europe                   77.6                   78.6             25054.
## 3 Americas                 73.6                   72.9             11003.
## 4 Asia                     70.7                   72.4             12473.
## 5 Africa                   54.8                   52.9              3089.
## # ℹ 1 more variable: median_gdp_per_capita <dbl>
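The printed summary truncates the last column, and the continent averages are unweighted means over countries, so the number of countries in each group matters when comparing them. The sketch below is one possible way to show both; `continent_summary` and `n_countries` are hypothetical names that do not appear in the original analysis.

```r
library(dplyr)
library(gapminder)

# Hypothetical variant of the summary that also counts countries per continent
continent_summary <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(
    n_countries = n(),                          # countries behind each average
    avg_life_expectancy = mean(lifeExp),
    median_gdp_per_capita = median(gdpPercap)
  ) %>%
  arrange(desc(avg_life_expectancy))

# width = Inf asks tibble to print every column instead of truncating
print(continent_summary, width = Inf)
```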
Below is a scatter plot of GDP per capita against life expectancy in 2007, with points colored by continent and GDP per capita shown on a log scale. A dashed linear trendline, fitted on the log-scaled x-axis, illustrates the association between the two variables.
scatter_plot <- ggplot(mutated_data, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed", color = "black") +
  scale_x_log10() +
  labs(
    title = "Relationship Between GDP Per Capita and Life Expectancy (2007)",
    x = "GDP Per Capita (Log Scale)",
    y = "Life Expectancy",
    color = "Continent"
  ) +
  theme_minimal()
scatter_plot
## `geom_smooth()` using formula = 'y ~ x'
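The plot shows the trend visually but does not quantify it. Because `scale_x_log10()` transforms the data before the smoother is computed, the dashed line should correspond to a linear fit of life expectancy on log10 GDP per capita. The sketch below computes that fit and the matching Pearson correlation directly; `gap_2007`, `log_cor`, and `fit` are hypothetical names, and this is one way to check the association rather than part of the original analysis.

```r
library(dplyr)
library(gapminder)

# Hypothetical helper object: the 2007 slice of the data
gap_2007 <- gapminder %>%
  filter(year == 2007)

# Pearson correlation between log10 GDP per capita and life expectancy
log_cor <- cor(log10(gap_2007$gdpPercap), gap_2007$lifeExp)

# Linear model on the same log10 scale as the plot's x-axis
fit <- lm(lifeExp ~ log10(gdpPercap), data = gap_2007)

log_cor
coef(fit)  # intercept, and change in life expectancy per tenfold increase in GDP per capita
```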
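## Summary and Interpretation

In 2007, Oceania and Europe had both the highest average life expectancy (80.7 and 77.6 years) and the highest average GDP per capita, while Africa had the lowest on both measures (54.8 years and an average GDP per capita of about 3,089). These results point to a strong association between economic development and health outcomes. Addressing disparities in GDP per capita, especially in regions like Africa, may lead to improved life expectancy.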