DSLabs

Author

Sam Rajabian

Load libraries

library(tidyverse)
library(dslabs)
library(highcharter)
library(RColorBrewer)
#data(package="dslabs")

Inspect dataset

#str(gapminder)
head(gapminder)
              country year infant_mortality life_expectancy fertility
1             Albania 1960           115.40           62.87      6.19
2             Algeria 1960           148.20           47.50      7.65
3              Angola 1960           208.00           35.98      7.32
4 Antigua and Barbuda 1960               NA           62.97      4.43
5           Argentina 1960            59.87           65.39      3.11
6             Armenia 1960               NA           66.86      4.55
  population          gdp continent          region
1    1636054           NA    Europe Southern Europe
2   11124892  13828152297    Africa Northern Africa
3    5270844           NA    Africa   Middle Africa
4      54681           NA  Americas       Caribbean
5   20619075 108322326649  Americas   South America
6    1867396           NA      Asia    Western Asia

Filter to 2011 and add a GDP (trillions) column

# Keep only the 2011 rows and express GDP in trillions of dollars
gapminder_df <- gapminder |>
  filter(year == 2011) |>
  mutate(gdp_trillions = gdp / 1e12)
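
The choice of 2011 can be checked directly against the data. The following is a minimal sketch, assuming the dplyr and dslabs packages are installed; the names gdp_coverage and latest_gdp_year are illustrative and not part of the original analysis.

```r
library(dplyr)
library(dslabs)

# Count, per year, how many countries report a non-missing GDP
gdp_coverage <- gapminder |>
  group_by(year) |>
  summarise(countries_with_gdp = sum(!is.na(gdp)), .groups = "drop")

# Most recent year in which at least one country has a GDP value
latest_gdp_year <- gdp_coverage |>
  filter(countries_with_gdp > 0) |>
  summarise(latest = max(year)) |>
  pull(latest)

latest_gdp_year
tail(gdp_coverage)
```

Looking at the tail of gdp_coverage also shows how GDP coverage changes in the final years of the dataset, which is the reason for filtering on a single year rather than simply taking the last one.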

Plot the data

#Additional Highcharter help from https://jkunst.com/highcharter/articles/highcharter.html

hchart(
  gapminder_df,
  "scatter",
  hcaes(x = gdp_trillions, y = infant_mortality, group = continent)) |>
  hc_title(text = "Infant Mortality vs GDP (2011)") |>
  hc_xAxis(type = "logarithmic", 
           title = list(text="GDP in $ trillions (log scale)")) |>
  hc_yAxis(title = list(text="Infant Mortality per 1000")) |>
  hc_colors(brewer.pal(5, "Dark2")) |>
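  hc_tooltip(borderColor = "black",
             # Build the tooltip from separate pieces so no stray newlines
             # or indentation end up inside the format string
             pointFormat = paste0(
               "{point.country}<br>",
               "GDP ($ trillions): {point.gdp_trillions:.4f}<br>",
               "IMR: {point.infant_mortality}")) |>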
  hc_add_theme(hc_theme_538())
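
Some 2011 rows have a missing GDP or infant mortality value, and a country needs both values to be drawn as a point. The sketch below counts how many countries are affected; it assumes dplyr and dslabs are installed, and missing_2011 is an illustrative name rather than part of the original code.

```r
library(dplyr)
library(dslabs)

# Summarise missingness in the two plotted variables for 2011
missing_2011 <- gapminder |>
  filter(year == 2011) |>
  summarise(
    countries = n(),
    missing_gdp = sum(is.na(gdp)),
    missing_infant_mortality = sum(is.na(infant_mortality)),
    # Rows with both values present can appear as points
    plottable = sum(!is.na(gdp) & !is.na(infant_mortality))
  )

missing_2011
```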

Paragraph

I chose the gapminder dataset from DSLabs for this assignment and kept only the 2011 data, because 2011 is the most recent year with GDP information. The dataset covers countries’ health and economic indicators from 1960 to 2016. Using the 2011 subset, I plotted each country’s infant mortality rate against its total GDP (in trillions of dollars) on a scatterplot, with point color indicating continent. I put GDP on a log scale because country GDPs span several orders of magnitude, and a linear axis would crowd most countries near zero. I also used the highcharter library to make the plot interactive, so hovering over a point shows the country’s name, GDP, and infant mortality rate. The plot shows that countries with larger economies generally have lower infant mortality rates.
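
Both the log-scale choice and the overall trend can be given a rough numeric check. This is a sketch under stated assumptions, not part of the original analysis: it assumes dplyr and dslabs are installed, drops rows with missing values, and uses a Spearman rank correlation as one simple way to summarise the direction of the relationship. The names gdp_2011, gdp_span, and rho are illustrative.

```r
library(dplyr)
library(dslabs)

# 2011 rows where both plotted variables are present
gdp_2011 <- gapminder |>
  filter(year == 2011, !is.na(gdp), !is.na(infant_mortality)) |>
  mutate(gdp_trillions = gdp / 1e12)

# Orders of magnitude between the smallest and largest GDP
gdp_span <- log10(max(gdp_2011$gdp_trillions) / min(gdp_2011$gdp_trillions))

# Rank correlation: a negative value means higher GDP tends to
# go with lower infant mortality
rho <- cor(gdp_2011$gdp_trillions, gdp_2011$infant_mortality,
           method = "spearman")

gdp_span
rho
```

A rank correlation is used here because it depends only on the ordering of the values, so it gives the same result whether GDP is on a linear or a log scale.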