library(tidyverse)
library(gapminder)
HDS 5.5.3
Begin by loading the tidyverse and gapminder packages in the code chunk above and adding your name as the author.
The penguins tutorial on dplyr::case_when() teaches you how to create new variables using the case_when() function, but you should also review the simpler if_else() function in Section 5.5.3 of HDS.
We will again use the gapminder data to illustrate these functions. Each code chunk below should start with the original gapminder data frame, followed by a sequence of functions that create the new data frame, connected by the pipe operator, |>.
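As a quick illustration of the pipe, here is a minimal sketch on a made-up two-row tibble (the name toy and its values are hypothetical and only stand in for gapminder). The expression x |> f(y) is evaluated as f(x, y), so the piped and nested forms should build the same data frame:

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder
toy <- tibble(year = c(1952L, 2007L), pop = c(1000000L, 2000000L))

# Piped form: read top to bottom, starting from the data frame
piped <- toy |>
  mutate(pop_millions = pop / 1e6)

# Equivalent nested form: the data frame is the first argument
nested <- mutate(toy, pop_millions = pop / 1e6)

# Compare the two results
all.equal(piped, nested)
```

Neither form changes toy itself; the new data frame only persists if you assign it to a name.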
Creating New Variables Based on Conditions
Let’s start by using the if_else() function to create a new character variable called century that is either "20th" or "21st" based on the year. Modify this code by filling in the ______ to do so:
gapminder |>
  mutate(century = if_else(year <= 2000, "20th", "21st"))
# A tibble: 1,704 × 7
   country     continent  year lifeExp      pop gdpPercap century
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl> <chr>
 1 Afghanistan Asia       1952    28.8  8425333      779. 20th
 2 Afghanistan Asia       1957    30.3  9240934      821. 20th
 3 Afghanistan Asia       1962    32.0 10267083      853. 20th
 4 Afghanistan Asia       1967    34.0 11537966      836. 20th
 5 Afghanistan Asia       1972    36.1 13079460      740. 20th
 6 Afghanistan Asia       1977    38.4 14880372      786. 20th
 7 Afghanistan Asia       1982    39.9 12881816      978. 20th
 8 Afghanistan Asia       1987    40.8 13867957      852. 20th
 9 Afghanistan Asia       1992    41.7 16317921      649. 20th
10 Afghanistan Asia       1997    41.8 22227415      635. 20th
# ℹ 1,694 more rows
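The new column takes its type from the values handed to if_else(), which is why century prints as <chr> above. The sketch below, on a hypothetical two-row tibble (toy, flag_dbl, and flag_int are illustrative names, not part of the lab), shows how the same idea decides whether a 0/1 indicator ends up as a double or an integer:

```r
library(dplyr)

# Hypothetical toy data, not taken from gapminder
toy <- tibble(pop = c(5e5, 2e9))

toy_flags <- toy |>
  mutate(
    flag_dbl = if_else(pop > 1e9, 1, 0),    # plain numbers are doubles
    flag_int = if_else(pop > 1e9, 1L, 0L)   # the L suffix makes integers
  )

typeof(toy_flags$flag_dbl)
typeof(toy_flags$flag_int)
```

When an exercise asks for an integer variable, writing 1L and 0L is one straightforward way to get an <int> column.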
Now create the exact same variable, but using the case_when() function:
gapminder |>
  mutate(century = case_when(
    year <= 2000 ~ "20th",
    year > 2000 ~ "21st"
  ))
# A tibble: 1,704 × 7
   country     continent  year lifeExp      pop gdpPercap century
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl> <chr>
 1 Afghanistan Asia       1952    28.8  8425333      779. 20th
 2 Afghanistan Asia       1957    30.3  9240934      821. 20th
 3 Afghanistan Asia       1962    32.0 10267083      853. 20th
 4 Afghanistan Asia       1967    34.0 11537966      836. 20th
 5 Afghanistan Asia       1972    36.1 13079460      740. 20th
 6 Afghanistan Asia       1977    38.4 14880372      786. 20th
 7 Afghanistan Asia       1982    39.9 12881816      978. 20th
 8 Afghanistan Asia       1987    40.8 13867957      852. 20th
 9 Afghanistan Asia       1992    41.7 16317921      649. 20th
10 Afghanistan Asia       1997    41.8 22227415      635. 20th
# ℹ 1,694 more rows
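In the gapminder data the two conditions cover every row, so case_when() and if_else() agree. A row that matches no condition gets NA instead. The sketch below uses a hypothetical three-row tibble (toy and the "unknown" label are assumptions for illustration) to show that behavior and a catch-all TRUE condition as a fallback:

```r
library(dplyr)

# Hypothetical toy data with one missing year
toy <- tibble(year = c(1952L, 2007L, NA))

toy_century <- toy |>
  mutate(
    # No fallback: the NA year matches neither condition
    no_fallback = case_when(
      year <= 2000 ~ "20th",
      year > 2000 ~ "21st"
    ),
    # TRUE matches every remaining row, so it acts as a catch-all
    with_fallback = case_when(
      year <= 2000 ~ "20th",
      year > 2000 ~ "21st",
      TRUE ~ "unknown"
    )
  )

toy_century
```

The catch-all has to come last, because case_when() uses the first condition that is TRUE for each row.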
Create a new integer variable, continent_code, that assigns the integers 1-5 to each of the five continents and gives any missing values a value of 99:
gapminder |>
  mutate(continent_code = case_when(
    continent == "Asia" ~ 1L,
    continent == "Europe" ~ 2L,
    continent == "Africa" ~ 3L,
    continent == "Americas" ~ 4L,
    continent == "Oceania" ~ 5L,
    TRUE ~ 99L  # any missing or unmatched continent
  ))
Create a new integer variable, billion, that is 1 if a country has a population over one billion people, and 0 if not:
gapminder |>
  mutate(billion = if_else(pop > 1000000000, 1L, 0L))
Create a new integer variable, old, that is 1 if a country has a lifeExp over the median lifeExp, and 0 if not:
gapminder |>
  mutate(old = if_else(lifeExp > median(lifeExp, na.rm = TRUE), 1L, 0L))
Create a new character variable, pop_size, that is "Small", "Medium", "Large", "Really Large", or "Humongous" depending on whether the population of each country is between zero and 1 million, 1 and 10 million, 10 and 100 million, 100 million and 1 billion, or greater than 1 billion:
gapminder |>
  mutate(pop_size = case_when(
    pop <= 1000000 ~ "Small",
    pop <= 10000000 ~ "Medium",
    pop <= 100000000 ~ "Large",
    pop <= 1000000000 ~ "Really Large",
    pop > 1000000000 ~ "Humongous"
  ))
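The pop_size conditions need no lower bounds because case_when() checks them in order and keeps the first one that is TRUE. A minimal sketch on hypothetical values (toy, in_order, out_of_order, and the "Larger" label are made up for illustration) shows what happens when the order is swapped:

```r
library(dplyr)

# Hypothetical toy populations: 500 thousand, 5 million, 2 billion
toy <- tibble(pop = c(5e5, 5e6, 2e9))

toy_sizes <- toy |>
  mutate(
    # Narrowest cutoff first: each row lands in the intended band
    in_order = case_when(
      pop <= 1e6 ~ "Small",
      pop <= 1e7 ~ "Medium",
      TRUE ~ "Larger"
    ),
    # Wider cutoff first: it also captures the rows meant for "Small"
    out_of_order = case_when(
      pop <= 1e7 ~ "Medium",
      pop <= 1e6 ~ "Small",
      TRUE ~ "Larger"
    )
  )

toy_sizes
```

With the cutoffs out of order, the 500 thousand row is labeled "Medium" and "Small" is never assigned.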