# I loaded the dslabs package because it contains the dataset I need
# I loaded tidyverse since it helps me manipulate data and create graphs
library(dslabs)
## Warning: package 'dslabs' was built under R version 4.5.3
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.2
## Warning: package 'ggplot2' was built under R version 4.5.2
## Warning: package 'dplyr' was built under R version 4.5.2
## Warning: package 'forcats' was built under R version 4.5.2
## Warning: package 'lubridate' was built under R version 4.5.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# I loaded the gapminder dataset into my workspace so I can work with it
data(gapminder)
# I used filter() to select the rows where the year is equal to 2007
# This lets me compare all countries at a single point in time, so the comparison is fair and does not mix data from different years
gapminder_2007 <- gapminder %>%
  filter(year == 2007)
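A quick sanity check like this one can confirm that the filter kept only 2007 and show how many values are missing in the columns the plots use. This is a minimal sketch that assumes dplyr is available (tidyverse loads it above). The summary column names (`n_countries`, `missing_gdp`, and so on) are my own illustrative choices.

```r
library(dslabs)
library(dplyr)

data(gapminder)

# Keep only the 2007 rows, as in the main analysis
gapminder_2007 <- gapminder %>%
  filter(year == 2007)

# One row of counts: how many countries remain, how many distinct years
# remain (this should be 1), and how many values are missing in each plotted column
filter_check <- gapminder_2007 %>%
  summarise(
    n_countries = n(),
    n_years = n_distinct(year),
    missing_gdp = sum(is.na(gdp)),
    missing_life_expectancy = sum(is.na(life_expectancy)),
    missing_fertility = sum(is.na(fertility)),
    missing_population = sum(is.na(population))
  )
filter_check
```

The missing-value counts help explain the "Removed ... rows" warning that ggplot2 prints when it draws the first graph.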
# I am creating a multivariable scatterplot with faceting
# This graph compares GDP and life expectancy across continents in 2007

# I used color to represent continent and size to represent population
# I used ggplot() to create a visualization from my filtered 2007 dataset
# I used geom_point() to plot each country as a point
# I used alpha to make the points partly transparent so overlapping points stay visible
# Because every point comes from 2007, the graph compares countries at the same point in time
# I applied a log scale to GDP because the values are very large and spread out
# This makes the pattern easier to see and compare
# I added labels to make the graph easier to read
# I used theme_classic() to keep the graph clean and readable

ggplot(gapminder_2007, aes(
  x = gdp,
  y = life_expectancy,
  color = continent,
  size = population
)) +
  geom_point(alpha = 0.6) +
  
  # I used a log scale on the x-axis because GDP values vary widely between countries
  scale_x_log10() +
  
  # I split the graph into panels by continent to make comparisons clearer
  facet_wrap(~continent) +
  
  # I replaced the default color palette with the ColorBrewer "Set2" palette
  scale_color_brewer(palette = "Set2") +
  
  labs(
    title = "GDP vs Life Expectancy in 2007",
    subtitle = "Countries grouped by continent; point size shows population",
    x = "GDP (log scale)",
    y = "Life Expectancy",
    color = "Continent",
    size = "Population",
    caption = "Data Source: DS Labs (Gapminder Dataset)"
  ) +
  
  # I used theme_classic() instead of the default ggplot2 theme
  theme_classic()
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
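One way to handle the warning above is to drop incomplete rows before plotting, so ggplot2 has nothing left to remove. This is a sketch that assumes the removed rows come from missing values in the mapped variables rather than from values outside the scale range. The names `gapminder_2007_complete`, `n_dropped`, and `p_complete` are hypothetical.

```r
library(dslabs)
library(dplyr)
library(ggplot2)

data(gapminder)
gapminder_2007 <- gapminder %>%
  filter(year == 2007)

# Keep only rows that have every numeric variable mapped in the first plot
gapminder_2007_complete <- gapminder_2007 %>%
  filter(!is.na(gdp), !is.na(life_expectancy), !is.na(population))

# Number of 2007 rows dropped because of missing values
n_dropped <- nrow(gapminder_2007) - nrow(gapminder_2007_complete)
n_dropped

# Same core plot as above, drawn only from the complete rows
p_complete <- ggplot(gapminder_2007_complete, aes(
  x = gdp,
  y = life_expectancy,
  color = continent,
  size = population
)) +
  geom_point(alpha = 0.6) +
  scale_x_log10() +
  facet_wrap(~continent)
print(p_complete)
```

Filtering explicitly makes the dropped countries a deliberate choice that I can count and report, instead of a side effect reported in a warning.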

# I created a scatterplot to study the relationship between fertility and life expectancy in 2007
# I used ggplot() to start building the graph from my dataset (gapminder_2007)

# I set fertility as the x-axis variable, which is the average number of children per woman
# I set life_expectancy as the y-axis variable, which is the average lifespan in years

# I used geom_point() to create a scatterplot that draws each country as a point
# Each point represents one country in 2007

# I used geom_smooth() to add a trend line that summarizes the relationship in the data
# method = "loess" fits a curved line that follows the local pattern of the data
# span = 0.15 makes each local fit use only a small share of the points, so the curve follows small bumps closely

# I used labs() to add a title, axis labels, and a caption that explain the graph clearly

# I used theme_minimal() to give the plot a clean look

ggplot(gapminder_2007, aes(
  x = fertility,
  y = life_expectancy
)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "loess", span = 0.15) +
  labs(
    title = "Fertility vs Life Expectancy (2007)",
    x = "Fertility Rate",
    y = "Life Expectancy",
    caption = "Data Source: DS Labs (Gapminder Dataset)"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
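To see how much `span = 0.15` shapes the curve, I can overlay it with a loess fit that uses `span = 0.75`, which is the default span in both `loess()` and `geom_smooth()`. Passing `formula = y ~ x` explicitly also stops ggplot2 from printing the "using formula" message shown above. This is a sketch: the name `p_span` and the two line colors are arbitrary choices.

```r
library(dslabs)
library(dplyr)
library(ggplot2)

data(gapminder)
gapminder_2007 <- gapminder %>%
  filter(year == 2007)

# Two loess curves on the same points:
# blue uses span = 0.15 (as in my plot), red uses the default span = 0.75
p_span <- ggplot(gapminder_2007, aes(x = fertility, y = life_expectancy)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.15,
              se = FALSE, color = "steelblue") +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.75,
              se = FALSE, color = "firebrick") +
  labs(x = "Fertility Rate", y = "Life Expectancy")
print(p_span)
```

A smaller span uses fewer neighboring points for each local fit, so the curve reacts to small clusters of countries. A larger span averages over more countries and gives a smoother summary of the overall trend.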

Paragraph

For this assignment, I used the gapminder dataset from the dslabs package and focused only on the year 2007 so I could compare all countries at the same point in time. The dataset includes variables such as GDP, life expectancy, fertility rate, population, and continent. It let me explore how economic, health, and population measures relate to one another across countries.

In my first graph, I created a scatterplot to compare GDP and life expectancy. I colored countries by continent, split the graph into one panel per continent, and used population as the size of each point to show differences in population. I used a log scale for GDP because the values spread across a very wide range between countries, which made the graph easier to read and interpret.
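A short calculation shows why the log scale helps. The code compares how many times larger the biggest GDP is than the smallest, and shows the same spread on the log10 scale, where each unit is a tenfold difference. This is a minimal sketch that ignores countries with missing GDP. The variable names are hypothetical.

```r
library(dslabs)
library(dplyr)

data(gapminder)
gapminder_2007 <- gapminder %>%
  filter(year == 2007)

# GDP values for 2007 with missing entries removed
gdp_2007 <- gapminder_2007$gdp[!is.na(gapminder_2007$gdp)]

# How many times larger the largest GDP is than the smallest
gdp_ratio <- max(gdp_2007) / min(gdp_2007)
gdp_ratio

# Smallest and largest GDP on the log10 scale; each unit is a tenfold difference
gdp_log_range <- range(log10(gdp_2007))
gdp_log_range
```

The difference between the two log10 values equals `log10(gdp_ratio)`. A huge ratio therefore becomes a small number of evenly spaced units on the axis.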

In my second graph, I looked at the relationship between fertility rate and life expectancy. I added a smooth trend line to show the overall pattern, which made it easier to see the general relationship instead of focusing only on individual countries. Countries with higher fertility rates tend to have lower life expectancy, so the trend line slopes downward. Together, these graphs suggest that economic development is closely connected to health outcomes and population trends across the world.
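To put a number on the negative relationship, I could compute the correlation between fertility and life expectancy and the slope of a straight-line fit. This is a simple linear summary added for illustration, not the loess curve used in the graph. It only describes an association in 2007 and does not show cause and effect. The names `fert_le_cor` and `fert_le_fit` are hypothetical.

```r
library(dslabs)
library(dplyr)

data(gapminder)
gapminder_2007 <- gapminder %>%
  filter(year == 2007)

# Pearson correlation in 2007, skipping countries missing either value
fert_le_cor <- cor(gapminder_2007$fertility, gapminder_2007$life_expectancy,
                   use = "complete.obs")
fert_le_cor

# Linear fit: the slope is the average change in life expectancy (years)
# associated with one additional child per woman
fert_le_fit <- lm(life_expectancy ~ fertility, data = gapminder_2007)
coef(fert_le_fit)["fertility"]
```

A negative correlation and a negative slope both match the downward trend in the graph. A single straight line can miss curvature that the loess line shows.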