# I loaded the dslabs package because it contains the dataset I need
# I loaded tidyverse since it helps me manipulate data and create graphs
library(dslabs)
## Warning: package 'dslabs' was built under R version 4.5.3
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.2
## Warning: package 'ggplot2' was built under R version 4.5.2
## Warning: package 'dplyr' was built under R version 4.5.2
## Warning: package 'forcats' was built under R version 4.5.2
## Warning: package 'lubridate' was built under R version 4.5.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# I loaded the gapminder dataset into my workspace so I can work with it
data(gapminder)
# I used filter() to select the rows where the year is equal to 2007
# This helps me focus on a single point in time so the comparison between countries is fair and does not spread across different years
gapminder_2007 <- gapminder %>%
  filter(year == 2007)
# I am creating a multivariable scatterplot with faceting
# This graph compares GDP and life expectancy across continents in 2007
# I used color to represent continent and size to represent population
# I used ggplot() to create a visualization from my dataset filtered to 2007
# I used geom_point() to plot each country as a dot
# I used alpha to make the points semi-transparent so overlapping points stay visible
# This helps me compare countries at the same point in time
# I applied a log scale to GDP since the values are very large and spread out
# This makes the pattern easier to see and compare
# I added labels to make the graph easier to read
# I used theme_classic() so the graph is easy to read
ggplot(gapminder_2007, aes(
  x = gdp,
  y = life_expectancy,
  color = continent,
  size = population
)) +
  geom_point(alpha = 0.6) +
  # The log scale keeps the very wide range of GDP values readable
  scale_x_log10() +
  # I split the graph into panels by continent to make comparisons clearer
  facet_wrap(~continent) +
  # I changed the color palette to something non-default
  scale_color_brewer(palette = "Set2") +
  labs(
    title = "Global Development by Continent (2007)",
    subtitle = "GDP vs Life Expectancy Across Continents",
    x = "GDP (log scale)",
    y = "Life Expectancy",
    color = "Continent",
    size = "Population",
    caption = "Data Source: DS Labs (Gapminder Dataset)"
  ) +
  # I used a non-default theme
  theme_classic()
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
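The warning above says some rows were dropped from the plot. As a rough check, and assuming the dropped rows are caused by missing values rather than values outside the log scale, the sketch below counts missing values in the variables I mapped in the plot. The name `plot_vars` is just a helper I made up for this check.

```r
library(dslabs)

data(gapminder)
gapminder_2007 <- subset(gapminder, year == 2007)

# Variables mapped in the first plot (plot_vars is a hypothetical helper name)
plot_vars <- c("gdp", "life_expectancy", "population", "continent")

# Number of missing values in each plotted variable
na_per_variable <- colSums(is.na(gapminder_2007[plot_vars]))
na_per_variable

# Number of rows with at least one missing value among those variables
rows_with_na <- sum(!complete.cases(gapminder_2007[plot_vars]))
rows_with_na
```

If `rows_with_na` matches the number in the warning, the removed points are most likely countries with no recorded value for one of these variables in 2007.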
In this graph, I observed a positive relationship between GDP and life expectancy, which means that as GDP increases, life expectancy also tends to increase. By using faceting, I was able to compare the distribution across continents and see clear differences between them. I noticed that Europe and the Americas generally have higher values, while Africa and Oceania have lower values.
I applied a log transformation to GDP to help reduce skewness. In this graph, I found a clear association between economic development and health outcomes, as well as variation across continents.
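To illustrate why the log transformation helps, here is a small sketch that compares the mean and the median of GDP before and after taking log10. Using the mean-to-median ratio as a skewness indicator is my own simplification, not a formal skewness statistic, and `raw_ratio` and `log_ratio` are names I chose for this example.

```r
library(dslabs)

data(gapminder)

# GDP values for 2007, dropping countries with no recorded GDP
gdp_2007 <- gapminder$gdp[gapminder$year == 2007]
gdp_2007 <- gdp_2007[!is.na(gdp_2007)]

# On the raw scale, a few very large economies pull the mean above the median
raw_ratio <- mean(gdp_2007) / median(gdp_2007)

# On the log10 scale, the mean and the median sit much closer together
log_ratio <- mean(log10(gdp_2007)) / median(log10(gdp_2007))

c(raw = raw_ratio, log10 = log_ratio)
```

A ratio far above 1 suggests a long right tail, while a ratio close to 1 suggests a more symmetric distribution, which is what makes the points easier to compare on the log scale.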
# I created a scatterplot to study the relationship between fertility and life expectancy in 2007
# I used ggplot() to start building the graph using my dataset (gapminder_2007)
# I set fertility as the x-axis variable, which is the average number of children per woman
# I set life_expectancy as the y-axis variable, which is the average lifespan in years
# I used geom_point() to create a scatterplot that plots each country as a dot
# Each point represents a country in the year 2007
# I used geom_smooth() to add a line that shows the relationship in the data
# method = "loess" fits a curved line that follows the pattern of the data
# I used labs() to add a title and axis labels so the graph is clearer
# I used theme_minimal() to give my plot a clean look
ggplot(gapminder_2007, aes(
  x = fertility,
  y = life_expectancy
)) +
  geom_point(alpha = 0.4) +
  # A small span makes the loess curve follow local changes in the data closely
  geom_smooth(method = "loess", span = 0.15) +
  labs(
    title = "Fertility vs Life Expectancy (2007)",
    x = "Fertility Rate",
    y = "Life Expectancy",
    caption = "Data Source: DS Labs (Gapminder Dataset)"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
For this assignment, I used the gapminder dataset from the dslabs package and focused only on the year 2007 so that I could compare all countries at the same point in time. This dataset includes variables such as GDP, life expectancy, and fertility rate. It helped me explore the relationships between these variables across different countries.
In this graph, I looked at the relationship between fertility rate and life expectancy. I added a smooth trend line to show the overall pattern in the data, which made it easier to see the general relationship instead of focusing only on individual countries. I noticed that the higher the fertility rate, the shorter the life expectancy tends to be, so the trend line slopes downward. Together with the first graph, the results show how economic development is connected to health outcomes and population trends across the world.
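The negative trend can also be summarized with a single number. This is a minimal sketch that uses the Pearson correlation, which is my own choice of summary and assumes the relationship is roughly linear; the name `fert_life_cor` is made up for this example.

```r
library(dslabs)

data(gapminder)
gapminder_2007 <- subset(gapminder, year == 2007)

# Pearson correlation between fertility and life expectancy in 2007,
# using only countries where both values are recorded
fert_life_cor <- cor(
  gapminder_2007$fertility,
  gapminder_2007$life_expectancy,
  use = "complete.obs"
)
fert_life_cor
```

A value below zero is consistent with the downward trend line in the graph, and the closer it is to -1, the stronger the negative relationship.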