I’m working to become more facile with tidyeval, which is the name of a new framework for “computing on the language” (the R language, that is). The immediate motivation for tidyeval is to provide better tools for programming around packages in the tidyverse, especially dplyr and, eventually, ggplot2. To learn more:

What does this have to do with spreadsheets?

Luis Verde Arregoitia tagged me in a tweet linking to his recent blog post Using Tidy Evaluation to untangle header rows. I admire his posts on the practical problems at the interface of R and spreadsheets. In this post, he does battle with non-data rows that are embedded in the data rectangle and not necessarily at the top. This is an nasty combination of two common patterns:

For the full setup, please read his post.

What does this have to do with tidy eval?

Luis wants a function to put the “trickle down” information into its own variable and then eliminate the non-data rows. And he wants it to feel like a dplyr verb: user can provide the names of these variables in bare form, i.e. not surrounded by quotes.

What does this have to do me?

First, I am easily sucked into spreadsheet problems. Second, the best way I have found to learn something, like tidyeval, is to take every opportunity to work seemingly simple examples. If they are simple … great! You’ve gotten practice. If they are not … great! You’ve learned something.

“My” version of untangle()

So here’s my version, where “my” means I actually benefitted from very helpful input from the masterminds behind tidyeval, Lionel Henry and Hadley Wickham.

Load dplyr, tidyr, and rlang. Why rlang? dplyr re-exports the rlang functions that are most useful for “civilians”, but it was helpful here to use sym(), which is not re-exported by dplyr.

library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(rlang)

Set up a tibble with one column, which is contaminated by both species and diet information. Ultimately, only the rows holding "Sp1", "Sp2", etc. should survive as data rows, with new variables for species and diet.

(dat <- tibble(
  jumble = c("Muridae", "diet:seeds", "Sp1", "Sp2", "Sp3",
             "diet:unknown", "Sp4", "Sp5", "Cricetidae", "diet:fruits", "Sp11",
             "Sp32", "Sp113")
))
## # A tibble: 13 x 1
##          jumble
##           <chr>
##  1      Muridae
##  2   diet:seeds
##  3          Sp1
##  4          Sp2
##  5          Sp3
##  6 diet:unknown
##  7          Sp4
##  8          Sp5
##  9   Cricetidae
## 10  diet:fruits
## 11         Sp11
## 12         Sp32
## 13        Sp113

Define the new verb, untangle2(). The regex identifies cells holding group-level info in the existing variable orig. new specifies the new variable you want to create.

 untangle2 <- function(df, regex, orig, new) {
  orig <- enquo(orig)
  new <- sym(quo_name(enquo(new)))

  df %>%
    mutate(
      !!new := if_else(grepl(regex, !! orig), !! orig, NA_character_)
    ) %>%
    fill(!! new) %>%
    filter(!grepl(regex, !! orig))
}

dat %>%
  untangle2("dae$", jumble, family) %>%
  untangle2("^diet", jumble, diet)
## # A tibble: 8 x 3
##   jumble     family         diet
##    <chr>      <chr>        <chr>
## 1    Sp1    Muridae   diet:seeds
## 2    Sp2    Muridae   diet:seeds
## 3    Sp3    Muridae   diet:seeds
## 4    Sp4    Muridae diet:unknown
## 5    Sp5    Muridae diet:unknown
## 6   Sp11 Cricetidae  diet:fruits
## 7   Sp32 Cricetidae  diet:fruits
## 8  Sp113 Cricetidae  diet:fruits

Voilà!

What are our innovations? How and why does this differ from Luis’s original (included below)?

Here’s Luis’s original, for easier comparison:

untangle_luis <- function(dframe, matchstring, newCol) {
  match <- enquo(match)
  newCol <- quo_name(newCol)

  dframe %>% mutate(!!newCol := case_when(grepl(!!matchstring,!!dframe[[1]])~!!dframe[[1]]),
                    removeLater = case_when(grepl(!!matchstring,!!dframe[[1]])~"yes")) %>%
    fill(newCol) %>%
    filter(is.na(removeLater)) %>% select(-removeLater)
}

Huge thanks again to Luis for this great series of spreadsheet posts and for sharing an excellent motivating example for practicing with tidyeval!