DsLabs Assignment

Author

Martia Eyi

Load required packages

To begin, I load the necessary packages, then list the datasets and the data-building scripts that ship with dslabs.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dslabs)
data(package="dslabs")
list.files(system.file("script",package = "dslabs"))
 [1] "make-admissions.R"                   
 [2] "make-brca.R"                         
 [3] "make-brexit_polls.R"                 
 [4] "make-calificaciones.R"               
 [5] "make-death_prob.R"                   
 [6] "make-divorce_margarine.R"            
 [7] "make-gapminder-rdas.R"               
 [8] "make-greenhouse_gases.R"             
 [9] "make-historic_co2.R"                 
[10] "make-mice_weights.R"                 
[11] "make-mnist_127.R"                    
[12] "make-mnist_27.R"                     
[13] "make-movielens.R"                    
[14] "make-murders-rda.R"                  
[15] "make-na_example-rda.R"               
[16] "make-nyc_regents_scores.R"           
[17] "make-olive.R"                        
[18] "make-outlier_example.R"              
[19] "make-polls_2008.R"                   
[20] "make-polls_us_election_2016.R"       
[21] "make-pr_death_counts.R"              
[22] "make-reported_heights-rda.R"         
[23] "make-research_funding_rates.R"       
[24] "make-stars.R"                        
[25] "make-temp_carbon.R"                  
[26] "make-tissue-gene-expression.R"       
[27] "make-trump_tweets.R"                 
[28] "make-weekly_us_contagious_diseases.R"
[29] "save-gapminder-example-csv.R"        

Load and preview the dataset

In this step, I load the ‘gapminder’ dataset from the ‘dslabs’ package, along with the ‘ggthemes’ and ‘ggrepel’ packages for plot styling.

library(ggthemes)
library(ggrepel)
data(gapminder)
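The chunk above loads the data but does not print its structure. A minimal sketch of how the available variables might be inspected is shown below; it assumes only that dslabs and dplyr are installed, and the choice of glimpse() over str() is a matter of preference.

```r
library(dslabs)
library(dplyr)

data(gapminder)

# Compact overview: column names, types, and the first few values
glimpse(gapminder)

# Number of rows and columns, and the range of years covered
dim(gapminder)
range(gapminder$year)
```

Each row of gapminder is one country in one year, which matters later when interpreting the scatterplot.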

Prepare the dataset

In this step, I select the variables ‘fertility’, ‘life_expectancy’, and ‘continent’ from the dataset. I also remove rows with missing values and make sure the ‘continent’ column is a factor so that the data are grouped properly.

df <- gapminder %>%
  select(fertility, life_expectancy, continent) %>%
  drop_na() %>%
  mutate(continent = as.factor(continent))
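Before dropping rows, it can help to see how many values are actually missing. The following is a sketch rather than part of the original analysis; the names selected, na_counts, and rows_dropped are illustrative.

```r
library(dslabs)
library(dplyr)
library(tidyr)

data(gapminder)

selected <- gapminder %>%
  select(fertility, life_expectancy, continent)

# Number of missing values in each selected column
na_counts <- colSums(is.na(selected))
na_counts

# Number of rows that drop_na() removes
rows_dropped <- nrow(selected) - nrow(drop_na(selected))
rows_dropped

# Check whether continent is already stored as a factor
is.factor(gapminder$continent)
```

If the last line prints TRUE, the as.factor() call in the preparation step leaves the column unchanged and serves only as a safeguard.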

Preview cleaned data

head(df)
  fertility life_expectancy continent
1      6.19           62.87    Europe
2      7.65           47.50    Africa
3      7.32           35.98    Africa
4      4.43           62.97  Americas
5      3.11           65.39  Americas
6      4.55           66.86      Asia

Create the multivariable scatterplot

Here, I visualize the relationship between fertility rate and life expectancy. I use continent as a third variable, represented by color.

ggplot(df, aes(x = fertility, y = life_expectancy, color = continent)) +
  geom_point(size = 3, alpha = 0.8) +
  labs(
    title = "Fertility Rate vs Life Expectancy by Continent",
    x = "Fertility Rate (Births per Woman)",
    y = "Life Expectancy (Years)",
    color = "Continent"
  ) +
  theme_economist() +  # From ggthemes
  scale_color_brewer(palette = "Set1")

Description

For this assignment, I used the gapminder dataset from the dslabs package. This dataset includes global statistics such as fertility rate, life expectancy, GDP, and population by country and year. I chose this dataset because it allows me to explore meaningful relationships between health and development indicators across continents.

I selected three variables: fertility, life_expectancy, and continent. After removing rows with missing values, I created a multivariable scatterplot. The x-axis represents fertility rate (births per woman), and the y-axis shows life expectancy (years). I used continent as a third variable, represented by color. I applied a non-default theme, theme_economist() from ggthemes, and changed the color palette using scale_color_brewer() to make the graph more readable and visually engaging.

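The resulting plot shows an inverse relationship between fertility and life expectancy: observations with higher fertility rates tend to have lower life expectancy. It also highlights regional differences across continents. Because the year variable was not selected, each point is one country in one year, so the plot pools all years in the dataset.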
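The visual impression of an inverse relationship can be checked numerically. The sketch below computes the Pearson correlation between the two variables, overall and within each continent. It is an illustrative addition, not part of the original analysis, and the names cor_overall and cor_by_continent are assumptions.

```r
library(dslabs)
library(dplyr)
library(tidyr)

data(gapminder)

# Same preparation as in the analysis above
df <- gapminder %>%
  select(fertility, life_expectancy, continent) %>%
  drop_na()

# Pearson correlation across all country-year observations
cor_overall <- cor(df$fertility, df$life_expectancy)
cor_overall

# Pearson correlation within each continent
cor_by_continent <- df %>%
  group_by(continent) %>%
  summarise(
    n = n(),
    correlation = cor(fertility, life_expectancy)
  )
cor_by_continent
```

A negative value supports the inverse relationship seen in the plot. Since the observations pool all years, the correlation reflects changes over time as well as differences between countries.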