HDS 5.5.3

Author

Alex Sevre

library(tidyverse)
library(gapminder)

Begin by loading the tidyverse and gapminder packages in the code chunk above and adding your name as the author.

The penguins tutorial dplyr::case_when() teaches you how to create new variables using the case_when() function, but you should also review the simpler if_else() function in Section 5.5.3 of HDS.

We will again use the gapminder data to illustrate these functions. Each code chunk below should start with the original gapminder data frame followed by a sequence of functions to create the new data frame, connected by the pipe operator, |>.

Creating New Variables Based on Conditions

Let’s start by using the if_else() function to create a new character variable called century that is either "20th" or "21st" based on the year. Modify this code by filling in the ______ to do so:

gapminder |>
  mutate(century = if_else(year <= 2000, "20th", "21st"))
# A tibble: 1,704 × 7
   country     continent  year lifeExp      pop gdpPercap century
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl> <chr>  
 1 Afghanistan Asia       1952    28.8  8425333      779. 20th   
 2 Afghanistan Asia       1957    30.3  9240934      821. 20th   
 3 Afghanistan Asia       1962    32.0 10267083      853. 20th   
 4 Afghanistan Asia       1967    34.0 11537966      836. 20th   
 5 Afghanistan Asia       1972    36.1 13079460      740. 20th   
 6 Afghanistan Asia       1977    38.4 14880372      786. 20th   
 7 Afghanistan Asia       1982    39.9 12881816      978. 20th   
 8 Afghanistan Asia       1987    40.8 13867957      852. 20th   
 9 Afghanistan Asia       1992    41.7 16317921      649. 20th   
10 Afghanistan Asia       1997    41.8 22227415      635. 20th   
# ℹ 1,694 more rows
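
The first ten rows printed above all fall in the 20th century, so the printout alone does not show whether "21st" was ever assigned. One way to check is to count the values of the new variable. The sketch below uses a small made-up tibble (years is a hypothetical stand-in holding a few gapminder survey years) rather than the full data; the same count() step could be added to the end of the gapminder pipeline.

```r
library(dplyr)

# Hypothetical stand-in for gapminder: a few of its survey years.
years <- tibble(year = c(1952L, 1997L, 2002L, 2007L))

# Create century as in the exercise, then tally each value.
century_counts <- years |>
  mutate(century = if_else(year <= 2000, "20th", "21st")) |>
  count(century)

century_counts
```

If both "20th" and "21st" appear in the tally, the condition is splitting the years as intended.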

Now create the exact same variable, but using the case_when() function:

gapminder |>
  mutate(century = case_when(
    year <= 2000 ~ "20th",
    year > 2000 ~ "21st"
  ))
# A tibble: 1,704 × 7
   country     continent  year lifeExp      pop gdpPercap century
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl> <chr>  
 1 Afghanistan Asia       1952    28.8  8425333      779. 20th   
 2 Afghanistan Asia       1957    30.3  9240934      821. 20th   
 3 Afghanistan Asia       1962    32.0 10267083      853. 20th   
 4 Afghanistan Asia       1967    34.0 11537966      836. 20th   
 5 Afghanistan Asia       1972    36.1 13079460      740. 20th   
 6 Afghanistan Asia       1977    38.4 14880372      786. 20th   
 7 Afghanistan Asia       1982    39.9 12881816      978. 20th   
 8 Afghanistan Asia       1987    40.8 13867957      852. 20th   
 9 Afghanistan Asia       1992    41.7 16317921      649. 20th   
10 Afghanistan Asia       1997    41.8 22227415      635. 20th   
# ℹ 1,694 more rows

Create a new integer variable, continent_code, that assigns the integers 1-5 to each of the five continents and gives any missing values a value of 99:

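gapminder <- gapminder |>
  mutate(continent_code = case_when(
    continent == "Asia" ~ 1L,
    continent == "Europe" ~ 2L,
    continent == "Africa" ~ 3L,
    continent == "Americas" ~ 4L,
    continent == "Oceania" ~ 5L,
    TRUE ~ 99L  # missing or unmatched continents
  ))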
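
The gapminder data has no missing continents, so the "99 for missing values" part of this task never shows up in the result. The sketch below tries the idea on a made-up tibble (toy is hypothetical, not part of gapminder) that does contain an NA. It assumes a final TRUE ~ 99L line as the fallback, which case_when() uses for any row that no earlier condition matched.

```r
library(dplyr)

# Hypothetical toy data: two real continents plus a missing value.
toy <- tibble(continent = factor(c("Asia", NA, "Europe")))

toy_coded <- toy |>
  mutate(continent_code = case_when(
    continent == "Asia" ~ 1L,
    continent == "Europe" ~ 2L,
    TRUE ~ 99L  # fallback: catches NA and anything unmatched
  ))

toy_coded
typeof(toy_coded$continent_code)
```

The L suffix makes each code an integer, so the new column is an integer vector rather than a factor or a double.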

Create a new integer variable, billion, that is 1 if a country has a population over one billion people, and 0 if not:

gapminder <- gapminder |>
  # 1L and 0L are integer literals; bare 1 and 0 would give a double
  mutate(billion = if_else(pop > 1000000000, 1L, 0L))
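
Because the task asks for an integer variable, it can help to check the type of the result. In R, a bare 1 or 0 is stored as a double, while 1L and 0L are integers. The sketch below compares the two on a made-up vector (pop here is a hypothetical pair of populations, not the gapminder column).

```r
library(dplyr)

pop <- c(5e8, 1.2e9)  # hypothetical populations

as_double  <- ifelse(pop > 1e9, 1, 0)     # base R with bare numbers
as_integer <- if_else(pop > 1e9, 1L, 0L)  # dplyr with integer literals

typeof(as_double)
typeof(as_integer)
```

The two vectors print the same values, but only the second is stored as an integer.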

Create a new integer variable, old, that is 1 if a country has a lifeExp over the median lifeExp, and 0 if not:

median_lifeExp <- median(gapminder$lifeExp, na.rm = TRUE)
gapminder <- gapminder |>
  # 1L and 0L keep the new variable an integer
  mutate(old = if_else(lifeExp > median_lifeExp, 1L, 0L))
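
The median does not have to be stored in a separate object first. As an alternative sketch, it can be computed inside mutate(), which keeps everything in one pipeline. The tibble below is made up for illustration (toy and its three lifeExp values are hypothetical).

```r
library(dplyr)

# Hypothetical toy data with a median lifeExp of 60.
toy <- tibble(lifeExp = c(40, 60, 80))

toy_old <- toy |>
  # median() sees the whole lifeExp column, so each row is
  # compared against the same overall median
  mutate(old = if_else(lifeExp > median(lifeExp, na.rm = TRUE), 1L, 0L))

toy_old
```

Only the row above the median gets a 1; the row equal to the median gets a 0 because the comparison is strictly greater than.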

Create a new character variable, pop_size, that is "Small", "Medium", "Large", "Really Large", or "Humongous" depending on whether the population of each country is between zero and 1 million, 1 and 10 million, 10 and 100 million, 100 million and 1 billion, or greater than 1 billion:

gapminder <- gapminder |>
  mutate(pop_size = case_when(
    pop <= 1000000 ~ "Small",
    pop <= 10000000 ~ "Medium",
    pop <= 100000000 ~ "Large",
    pop <= 1000000000 ~ "Really Large",
    pop > 1000000000 ~ "Humongous"
  ))
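
The conditions above give only an upper bound for each category. This works because case_when() checks the conditions from top to bottom and uses the first one that is TRUE for each row. The sketch below shows that behavior on a made-up vector (pop and the "Bigger" label are hypothetical and shortened for illustration).

```r
library(dplyr)

pop <- c(5e5, 5e6, 2e9)  # hypothetical populations

pop_size <- case_when(
  pop <= 1e6 ~ "Small",   # 5e5 matches here first...
  pop <= 1e7 ~ "Medium",  # ...so it is never tested against this line
  TRUE ~ "Bigger"         # everything not matched above
)

pop_size
```

If the conditions were listed from largest bound to smallest, the widest condition would claim every row it covers before the narrower ones were tried, so the order matters.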