library(dslabs)
library(tidyverse)
data("gapminder")

# Create income_per_person (GDP divided by population), since the dataset has no income variable
# Keep only the year 2010 and drop rows missing income, life expectancy, or region
gap_clean <- gapminder %>%
  filter(year == 2010) %>%
  mutate(income_per_person = gdp / population) %>%  # creates a new variable
  drop_na(income_per_person, life_expectancy, region)

# Create the multivariable scatterplot
ggplot(gap_clean, aes(x = income_per_person,
                      y = life_expectancy,
                      color = region)) +
  geom_point(alpha = 0.8, size = 3) +
  scale_x_log10(labels = scales::comma) +   # log scale helps spread the data
  labs(
    title = "Income Per Person vs Life Expectancy Across World Regions (2010)",
    x = "Income Per Person (log scale)",
    y = "Life Expectancy (Years)",
    color = "World Region"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 18),
    legend.position = "right",
    panel.grid.major = element_line(color = "gray80")
  )

For this assignment I used the gapminder dataset from the dslabs package. It contains demographic and economic indicators for countries around the world across several decades, including fertility, life expectancy, GDP, and population. Because the dataset has no existing income variable, I created one by dividing GDP by population, which gives GDP per person. I then filtered the data to the year 2010 and dropped rows with missing income, life expectancy, or region values so that the plot only uses complete observations. The visualization places income per person on the x-axis and life expectancy on the y-axis, and uses color to show region as a third variable. I put income on a log scale because its distribution is strongly right-skewed; on a linear scale a few very wealthy countries would squeeze most of the points into the left edge of the plot. The plot shows clear regional patterns: countries in European and East Asian regions tend to cluster at higher income per person and higher life expectancy than countries in African regions.
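The claim that income is right-skewed, and that the log scale reduces the skew, can be checked numerically rather than only by eye. The sketch below is a minimal check under a few assumptions: it rebuilds the same 2010 subset so it runs on its own, it counts how many 2010 rows are lost to missing values, and it compares the skewness of income per person before and after a log10 transform. The helper `sample_skewness()` and the objects `gap_2010`, `n_missing`, and `skew_check` are hypothetical names introduced here for illustration; they are not part of dslabs or the tidyverse.

```r
# Sketch: check the skew that motivates the log scale (assumes dslabs and
# tidyverse are installed; rebuilds the 2010 subset so this runs standalone).
library(dslabs)
library(tidyverse)

data("gapminder")

# Hypothetical helper: moment-based sample skewness (0 means symmetric,
# positive means a long right tail)
sample_skewness <- function(x) {
  mean((x - mean(x))^3) / sd(x)^3
}

gap_2010 <- gapminder %>%
  filter(year == 2010) %>%
  mutate(income_per_person = gdp / population)

# Number of 2010 rows missing income or life expectancy
n_missing <- sum(is.na(gap_2010$income_per_person) |
                 is.na(gap_2010$life_expectancy))

gap_clean <- gap_2010 %>%
  drop_na(income_per_person, life_expectancy, region)

# Compare skewness on the raw and log10 scales
skew_check <- gap_clean %>%
  summarise(
    raw_skew = sample_skewness(income_per_person),
    log_skew = sample_skewness(log10(income_per_person))
  )

n_missing
skew_check
```

If the log scale is doing its job, `log_skew` should be much closer to zero than `raw_skew`. The moment-based formula is only one of several skewness definitions; packages such as e1071 offer variants, but the hand-written version keeps the check free of extra dependencies.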