dslabs Datasets Homework

Author

Senay LK

Loading the libraries

library(tidyverse)
library(dslabs)
library(plotly) # for interactivity 
data("gapminder")

Filtering

# Subset to eight countries and drop rows with missing life expectancy
gapminder1 <- gapminder |>
  filter(!is.na(life_expectancy),
         country %in% c("China", "Cambodia", "Mali", "Pakistan",
                        "United States", "Japan", "Ethiopia", "Rwanda"))
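A quick way to confirm the subset looks right is to count how many years survive for each country and check the year range. This is a minimal sketch, assuming dplyr and dslabs are installed; the names `countries` and `coverage` are hypothetical helpers introduced only for this check.

```r
library(dplyr)
library(dslabs)

data("gapminder")

# Hypothetical helper: the eight countries used in this report
countries <- c("China", "Cambodia", "Mali", "Pakistan",
               "United States", "Japan", "Ethiopia", "Rwanda")

# Same subset as above: chosen countries, no missing life expectancy
gapminder1 <- gapminder |>
  filter(!is.na(life_expectancy), country %in% countries)

# Hypothetical summary: one row per country with years kept and year range
coverage <- gapminder1 |>
  group_by(country) |>
  summarise(n_years = n(),
            first_year = min(year),
            last_year = max(year),
            .groups = "drop")

coverage
```

If a country name is misspelled, it silently disappears from the filter, so a row count below eight in `coverage` is an easy signal that something went wrong.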

Plotting

p1 <- gapminder1 |>
  ggplot(aes(x = year, y = life_expectancy, color = country)) +
  geom_line() + # line chart of the filtered countries since 1960
  scale_color_brewer(palette = "Dark2") + # Dark2 palette has 8 colors, one per country
  theme_minimal() + # minimal theme
  labs(x = "Year",
       y = "Life Expectancy (years)",
       title = "Life Expectancy of Different Countries Since 1960",
       color = "Country") # axis, title, and legend labels

p1 <- ggplotly(p1) # convert to an interactive plotly chart; hovering over a line shows the values at that point
p1

Summary

For my visualization, I used the "gapminder" data set from the dslabs package. I initially wanted to create an alluvial plot of the countries' life expectancy across the years, but I ran into difficulties, so I made a line chart instead. I started by creating a subset of the original data set, filtering for specific countries and removing rows with missing life expectancy values. I then plotted the line chart and added interactivity so that hovering over a line shows more information. An interesting and unfortunate insight is the plummeting life expectancy in Cambodia in 1977 and in Rwanda in 1994. After further research, I learned that these drops coincide with the genocides in both countries. This shows the immense toll these tragedies took on each population.
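The dips described above can also be located directly in the data rather than by hovering over the chart. A minimal sketch, assuming dplyr and dslabs are installed: it takes the year of lowest recorded life expectancy for each of the two countries. The name `lowest_points` is a hypothetical helper, and the exact values depend on the installed dslabs version.

```r
library(dplyr)
library(dslabs)

data("gapminder")

# Hypothetical helper: year of lowest recorded life expectancy
# for Cambodia and Rwanda (rows with missing values dropped first)
lowest_points <- gapminder |>
  filter(!is.na(life_expectancy), country %in% c("Cambodia", "Rwanda")) |>
  group_by(country) |>
  slice_min(life_expectancy, n = 1, with_ties = FALSE) |>
  ungroup() |>
  select(country, year, life_expectancy)

lowest_points
```

Using `with_ties = FALSE` guarantees exactly one row per country even if two years share the same minimum value.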