David Ranzolin
October 6, 2015
Creating new variables is often required for statistical modeling. A teacher, for example, may have a data frame of numeric variables (quiz scores, final grades, etc.) but want to fit a logistic regression model with a binary outcome variable.
mutate() and ifelse() will get it done.
mutate() is a basic verb from the dplyr package: it adds new columns to a data frame, computed from the existing ones. Numerous introductions to the package and its functions already exist. I recommend Kevin Markham's “Hands-on dplyr tutorial for faster data manipulation in R.”
Example from the mtcars dataset (output trimmed to the columns of interest):
library(dplyr)
head(mutate(mtcars, disp_l = disp / 61.0237), 3)
   mpg disp   disp_l
1 21.0  160 2.621932
2 21.0  160 2.621932
3 22.8  108 1.769804
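As a further sketch, mutate() can create several columns in one call, and a later expression can use a column defined earlier in the same call. The names engines and disp_l_per_cyl below are made up for illustration.

```r
library(dplyr)

# Add two columns at once; disp_l_per_cyl (a hypothetical name)
# reuses disp_l, which is created earlier in the same mutate() call.
engines <- mutate(mtcars,
                  disp_l = disp / 61.0237,        # cubic inches to liters
                  disp_l_per_cyl = disp_l / cyl)  # liters per cylinder

head(engines[, c("disp", "cyl", "disp_l", "disp_l_per_cyl")], 3)
```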
ifelse() is from base R. The function tests a logical condition in its first argument, element by element. Where the test is TRUE, ifelse() returns the corresponding value from the second argument. Where the test is FALSE, ifelse() returns the corresponding value from the third argument.
Example:
x <- 10
ifelse(x > 9, "x is greater than 9", "x is not greater than 9")
[1] "x is greater than 9"
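Because ifelse() is vectorized, the same call works on a whole vector. A short sketch with made-up values follows; note that a missing value in the test yields NA in the result.

```r
scores <- c(4, 10, NA, 12)  # illustrative values, not from the gradebook

# One label per element; the third label is NA because NA > 9 is NA
labels <- ifelse(scores > 9, "greater than 9", "not greater than 9")
print(labels)
```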
mutate() and ifelse() make a powerful combination. Together they let you run a logical test on every value of a variable (or vector) and then populate a new variable according to the outcome of each test.
In the following examples, ifelse() is called within mutate().
Example:
section <- c("MATH111", "MATH111", "ENG111")
grade <- c(78, 93, 56)
student <- c("David", "Kristina", "Mycroft")
gradebook <- data.frame(section, grade, student)
mutate(gradebook, Pass.Fail = ifelse(grade > 60, "Pass", "Fail"))
  section grade  student Pass.Fail
1 MATH111    78    David      Pass
2 MATH111    93 Kristina      Pass
3  ENG111    56  Mycroft      Fail
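For the logistic regression use case mentioned at the start, the new variable usually needs to be coded 0/1 rather than as text. A minimal sketch, assuming the same gradebook and the hypothetical names gradebook_binary and passed:

```r
library(dplyr)

gradebook <- data.frame(
  section = c("MATH111", "MATH111", "ENG111"),
  grade   = c(78, 93, 56),
  student = c("David", "Kristina", "Mycroft")
)

# passed is a hypothetical column name: 1 if the grade is above 60, else 0
gradebook_binary <- mutate(gradebook, passed = ifelse(grade > 60, 1, 0))
print(gradebook_binary)
```

A column coded this way could serve as the response in glm() with family = binomial, given a data set large enough to fit a model.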
There is no limit to the number of ifelse() calls you can nest within a single call to mutate():
mutate(gradebook, letter = ifelse(grade %in% 60:69, "D",
                           ifelse(grade %in% 70:79, "C",
                           ifelse(grade %in% 80:89, "B",
                           ifelse(grade %in% 90:99, "A", "F")))))
  section grade  student letter
1 MATH111    78    David      C
2 MATH111    93 Kristina      A
3  ENG111    56  Mycroft      F
Note how the final “no” argument (“F”) of the innermost ifelse() lands at the end of the statement. It is the fallback value, returned only when none of the tests is TRUE:
mutate(gradebook, letter = ifelse(grade %in% 60:69, "D",
                           ifelse(grade %in% 70:79, "C",
                           ifelse(grade %in% 80:89, "B",
                           ifelse(grade %in% 90:99, "A", "F")))))
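One caveat with this pattern: grade %in% 60:69 matches whole numbers only, so a grade of 89.5 or 100 falls through to "F". A possible alternative, assuming a dplyr release that provides case_when(), tests ranges with comparisons instead. The grades and the names scores, graded, letter_in, and letter_cmp below are made up for illustration.

```r
library(dplyr)

scores <- data.frame(grade = c(78, 93, 56, 89.5, 100))  # illustrative grades

graded <- mutate(scores,
  # Nested ifelse() with %in%: only whole numbers from 60 to 99 match
  letter_in = ifelse(grade %in% 60:69, "D",
              ifelse(grade %in% 70:79, "C",
              ifelse(grade %in% 80:89, "B",
              ifelse(grade %in% 90:99, "A", "F")))),
  # case_when() checks conditions in order and uses the first that is TRUE
  letter_cmp = case_when(
    grade >= 90 ~ "A",
    grade >= 80 ~ "B",
    grade >= 70 ~ "C",
    grade >= 60 ~ "D",
    TRUE        ~ "F"
  )
)
print(graded)
```

In this sketch the two columns agree for 78, 93, and 56, and differ only for 89.5 and 100.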
If you're working with a data frame or vector of character strings, calling grepl() inside ifelse() lets you build new variables from text patterns. grepl() is also from base R. It searches each element of its second argument for the pattern (a regular expression) given in its first argument. Where a match is found, grepl() returns TRUE. Where no match is found, grepl() returns FALSE.
Example:
grepl("MATH", gradebook$section)
[1] TRUE TRUE FALSE
Creating a new variable:
mutate(gradebook, department = ifelse(grepl("MATH", section), "Math Department",
                               ifelse(grepl("ENG", section), "English Department", "Other")))
  section grade  student         department
1 MATH111    78    David    Math Department
2 MATH111    93 Kristina    Math Department
3  ENG111    56  Mycroft English Department
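Because the first argument of grepl() is a regular expression, a bare "MATH" matches anywhere in the string. A small sketch with made-up section codes shows how anchoring the pattern with ^ or setting ignore.case = TRUE changes the result:

```r
sections <- c("MATH111", "AMATH201", "math090")  # made-up section codes

anywhere <- grepl("MATH", sections)                       # match anywhere in the string
at_start <- grepl("^MATH", sections)                      # match only at the start
any_case <- grepl("^math", sections, ignore.case = TRUE)  # at the start, any letter case

print(data.frame(sections, anywhere, at_start, any_case))
```

Anchoring the pattern is a cheap way to keep a section such as "AMATH201" from being assigned to the wrong department.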