Creating New Variables in R with mutate() and ifelse()

David Ranzolin
October 6, 2015

Creating New Variables in R

Creating new variables is often required for statistical modeling. A teacher, for example, may have a data frame with numeric variables (quiz scores, final grade, etc.) but want to fit a logistic regression model, which needs a binary outcome variable.

mutate() and ifelse() will get it done.

mutate()

mutate() is a basic verb from the dplyr package, and numerous introductions to the package and its functions already exist. I recommend Kevin Markham's “Hands-on dplyr tutorial for faster data manipulation in R.”

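Example from the mtcars dataset, converting engine displacement from cubic inches to liters (output abbreviated to the relevant columns):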

library(dplyr)
head(mutate(mtcars, disp_l = disp / 61.0237), 3)
   mpg disp   disp_l
1 21.0  160 2.621932
2 21.0  160 2.621932
3 22.8  108 1.769804
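
mutate() can also create several variables in one call, and a later expression can refer to a column created earlier in the same call. The following is a minimal sketch; the names cars, disp_l and disp_l_per_cyl are illustrative and not part of mtcars:

```r
library(dplyr)

# Keep only the columns needed so the result is easy to read
cars <- select(mtcars, mpg, cyl, disp)

# disp is recorded in cubic inches; 61.0237 cubic inches make one liter.
# disp_l_per_cyl is computed from disp_l, which is created in the same call.
cars <- mutate(cars,
               disp_l = disp / 61.0237,
               disp_l_per_cyl = disp_l / cyl)

head(cars, 3)
```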

ifelse()

ifelse() is from base R. The function evaluates a logical test in its first argument, element by element. Where the test is TRUE, ifelse() returns the corresponding value of the second argument. Where the test is FALSE, it returns the corresponding value of the third argument.

Example:

x <- 10
ifelse(x > 9, "x is greater than 9", "x is not greater than 9")
[1] "x is greater than 9"
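
Because ifelse() works element by element, the same call handles a whole vector, which is what makes it useful inside mutate(). A small sketch with made-up values (x and size are illustrative names):

```r
# The test x > 9 is evaluated separately for each element of x
x <- c(4, 10, 15)
size <- ifelse(x > 9, "greater than 9", "not greater than 9")
size  # the first element fails the test, the other two pass

# A missing value in the test produces NA in the result
with_na <- ifelse(c(4, NA, 15) > 9, "greater than 9", "not greater than 9")
with_na
```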

In tandem: mutate() and ifelse()

Used together, mutate() and ifelse() are a powerful combination. ifelse() runs a logical test on every value of an existing variable (or vector), and mutate() stores the outcome of each test as the corresponding value of a new variable.

In the following examples, ifelse() is called within mutate().

In tandem: mutate() and ifelse() cont.

Example:

section <- c("MATH111", "MATH111", "ENG111")
grade <- c(78, 93, 56)
student <- c("David", "Kristina", "Mycroft")
gradebook <- data.frame(section, grade, student)
mutate(gradebook, Pass.Fail = ifelse(grade > 60, "Pass", "Fail"))
  section grade  student Pass.Fail
1 MATH111    78    David      Pass
2 MATH111    93 Kristina      Pass
3  ENG111    56  Mycroft      Fail
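
For the modeling use case mentioned at the start, a 0/1 indicator is often more convenient than text labels. A sketch using the same gradebook; the column name passed is an assumption made for illustration:

```r
library(dplyr)

gradebook <- data.frame(
  section = c("MATH111", "MATH111", "ENG111"),
  grade   = c(78, 93, 56),
  student = c("David", "Kristina", "Mycroft")
)

# 1 if the grade is above 60, otherwise 0 (same cutoff as Pass.Fail)
gradebook <- mutate(gradebook, passed = ifelse(grade > 60, 1, 0))
gradebook$passed
```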

In tandem: mutate() and ifelse() cont.

There is no limit to the number of ifelse() calls you can nest within a single call to mutate():

mutate(gradebook, letter = ifelse(grade %in% 60:69, "D",
                                     ifelse(grade %in% 70:79, "C",
                                            ifelse(grade %in% 80:89, "B",
                                                   ifelse(grade %in% 90:99, "A", "F")))))
  section grade  student letter
1 MATH111    78    David      C
2 MATH111    93 Kristina      A
3  ENG111    56  Mycroft      F
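
One thing to keep in mind: grade %in% 60:69 matches whole numbers only, so a grade of 89.5 or 100 would fall through to "F". If that is not intended, threshold comparisons are one option. This is a sketch of that variant, not the code used above; the scores data frame and its extra grades are made up for illustration:

```r
library(dplyr)

# Hypothetical grades, including a fractional score and a perfect score
scores <- data.frame(grade = c(78, 93, 56, 89.5, 100))

# Tests run from the highest cutoff down; the first TRUE test decides the letter
scores <- mutate(scores, letter = ifelse(grade >= 90, "A",
                                  ifelse(grade >= 80, "B",
                                  ifelse(grade >= 70, "C",
                                  ifelse(grade >= 60, "D", "F")))))
scores$letter
```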

In tandem: mutate() and ifelse() cont.

Note how the final “no” argument (“F”) is pushed to the end of the statement. It belongs to the innermost ifelse() and is returned only when none of the tests is TRUE:

mutate(gradebook, letter = ifelse(grade %in% 60:69, "D",
                                     ifelse(grade %in% 70:79, "C",
                                            ifelse(grade %in% 80:89, "B",
                                                   ifelse(grade %in% 90:99, "A", "F")))))
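
Deeply nested ifelse() calls can become hard to read. More recent dplyr releases provide case_when(), which lists the same tests one per line. The sketch below assumes a dplyr version that includes case_when() and is an alternative to the nested form, not a replacement the original slides use:

```r
library(dplyr)

gradebook <- data.frame(
  section = c("MATH111", "MATH111", "ENG111"),
  grade   = c(78, 93, 56),
  student = c("David", "Kristina", "Mycroft")
)

# Each line is "test ~ value"; the first test that is TRUE wins.
# TRUE ~ "F" plays the role of the final "no" argument.
gradebook <- mutate(gradebook, letter = case_when(
  grade %in% 60:69 ~ "D",
  grade %in% 70:79 ~ "C",
  grade %in% 80:89 ~ "B",
  grade %in% 90:99 ~ "A",
  TRUE             ~ "F"
))
gradebook$letter
```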

A third wheel: grepl()

If you're working with a data frame or vector of character strings, calling grepl() within ifelse() lets you build new variables from text patterns. grepl() is also from base R. It searches each element of a character vector (the second argument) for the pattern given in the first argument, which is treated as a regular expression by default. Where a match is found, grepl() returns TRUE. Where no match is found, grepl() returns FALSE.

Example:

grepl("MATH", gradebook$section)
[1]  TRUE  TRUE FALSE
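
Two details of grepl() matter when building tests: matching is case-sensitive by default, and the pattern is a regular expression. A short sketch; the vector codes and the lowercase entry "math201" are hypothetical:

```r
codes <- c("MATH111", "math201", "ENG111")

# Default matching is case-sensitive, so "math201" is not matched
exact_case <- grepl("MATH", codes)

# ignore.case = TRUE matches regardless of case
any_case <- grepl("MATH", codes, ignore.case = TRUE)

# "^ENG" is a regular expression that matches only at the start of the string
starts_eng <- grepl("^ENG", codes)

exact_case
any_case
starts_eng
```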

grepl() cont.

Creating a new variable:

mutate(gradebook, department = ifelse(grepl("MATH", section), "Math Department",
                                      ifelse(grepl("ENG", section), "English Department", "Other")))
  section grade  student         department
1 MATH111    78    David    Math Department
2 MATH111    93 Kristina    Math Department
3  ENG111    56  Mycroft English Department
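
None of the three rows above reaches the "Other" fallback. To see it in action, here is the same logic applied to a made-up section code that matches neither pattern (the data frame sections and "HIST101" are hypothetical):

```r
library(dplyr)

# "HIST101" contains neither "MATH" nor "ENG"
sections <- data.frame(section = c("MATH111", "ENG111", "HIST101"))

sections <- mutate(sections,
  department = ifelse(grepl("MATH", section), "Math Department",
               ifelse(grepl("ENG", section), "English Department", "Other")))
sections$department
```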