I put together this short tutorial on the unnest() function for an upcoming book on the tidyverse. The book was co-created by the spring 2018 class of DATA 607 (Data Acquisition & Management) students at the CUNY School of Professional Studies.


library(tidyverse)   # attaches dplyr, tidyr, readr and stringr, all used below
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

unnest()

Description

If you have a column with lists of items (a list-column), unnest() makes each list element its own row.
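
As a minimal sketch of that description, the snippet below builds a tiny tibble with a list-column and unnests it. The tibble baskets and its column names are made up for illustration and are not part of the biopics data used later.

```r
library(tidyr)
library(tibble)

# A hypothetical two-row tibble; `items` is a list-column
baskets <- tibble(
  shopper = c("Ann", "Bo"),
  items   = list(c("apples", "pears"), c("milk", "eggs", "tea"))
)

# Each list element gets its own row; `shopper` is repeated as needed
flat <- unnest(baskets, items)
flat
```

The two input rows become five output rows, one per item.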

Usage

unnest(data, ..., .drop = NA, .id = NULL, .sep = NULL, .preserve = NULL)

  • data — a data frame.

  • ... — the columns to unnest; defaults to all list-columns.

  • .drop — whether additional list-columns should be dropped. By default, unnest() drops them if unnesting the specified columns requires rows to be duplicated.

  • .id — data frame identifier; if supplied, creates a new column with the name given in .id, holding an identifier for the list element each row came from. Most useful when the list-column is named.

  • .sep — if supplied, the names of unnested data frame columns combine the name of the original list-column with the names from the nested data frame, separated by .sep.

  • .preserve — list-columns to preserve in the output.
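
The naming behaviour described for .sep is easiest to see on a list-column of data frames. The sketch below assumes tidyr 1.0.0 or later, where the dotted arguments listed above are deprecated and the counterpart of .sep is called names_sep. On the tidyr release this tutorial was written against, you would pass .sep instead. The tibble nested is made up for illustration.

```r
library(tidyr)
library(tibble)

# A hypothetical tibble whose `data` column holds one small data frame per row
nested <- tibble(
  group = c("a", "b"),
  data  = list(
    tibble(x = 1:2, y = c("p", "q")),
    tibble(x = 3L,  y = "r")
  )
)

# names_sep prefixes each inner column name with the list-column's name
# (assumes tidyr >= 1.0.0; older releases used `.sep` for this)
flat_named <- unnest(nested, data, names_sep = "_")
names(flat_named)
```

The inner columns x and y come out as data_x and data_y, which avoids clashes when several nested data frames share column names.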


Example

For this example, I used the biopics data set from FiveThirtyEight.

To begin, read in the data and keep only the biopics with more than one name listed in the director column.

biopics <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/biopics/biopics.csv") %>% 
             # Keep rows whose "director" entry contains a comma, i.e. lists more than one name
             filter(str_detect(director, ".\\,.")) %>%
             # Select a few columns of the dataframe for demonstration purposes
             select(title, country, director)           

head(biopics, 3) %>% 
  kable("html")  %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
title              country  director
Above and Beyond   US       Melvin Frank, Norman Panama
American Splendor  US       Shari Springer Berman, Robert Pulcini
Burning Blue       US       D.M.W. Greer With: Trent Ford, Tammy Blanchard, Morgan Spector
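
To see what the filter pattern used above keeps and drops, here is a small sketch on made-up strings (the vector examples is hypothetical). The pattern ".\\,." only matches a comma that has at least one character on each side of it.

```r
library(stringr)

# Hypothetical inputs: two names, one name, and a stray trailing comma
examples <- c("First Person, Second Person", "Single Director", "Trailing comma,")

# Same pattern as in the filter() call: any character, a comma, any character
keep <- str_detect(examples, ".\\,.")
keep
```

Only the first string passes, so single-name entries and entries that merely end in a comma are filtered out.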


Not all director names are separated by a comma. In Burning Blue, for example, the first director is separated from the others by a colon. The director column is also still a plain character column, so we first turn it into a list-column with strsplit() and let unnest() expand it. To handle both separators, we can apply unnest() twice: first splitting on commas, then on colons. Finally, we can remove the word “With” to get a clean list of individual directors.

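# Unnest the "director" column twice, splitting on two different separators.
# strsplit() returns a list, so each call creates a list-column for unnest() to expand.
dir_unnest <- unnest(biopics, director = strsplit(director, ",")) %>% 
              unnest(director = strsplit(director, ":"))
  
# Remove the pattern of a space and the word "With", then trim the
# whitespace that splitting on "," and ":" leaves around each name
dir_unnest$director <- str_trim(str_replace(dir_unnest$director, "[[:space:]]With", ""))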

head(dir_unnest, 8) %>% 
  kable("html")  %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
title              country  director
Above and Beyond   US       Melvin Frank
Above and Beyond   US       Norman Panama
American Splendor  US       Shari Springer Berman
American Splendor  US       Robert Pulcini
Burning Blue       US       D.M.W. Greer
Burning Blue       US       Trent Ford
Burning Blue       US       Tammy Blanchard
Burning Blue       US       Morgan Spector
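
As a cross-check that does not need the CSV download, the same reshaping can be sketched on the three rows printed earlier. This is a variation rather than the code above: it builds the list-column with mutate() first and splits on both separators in one pass, which I would expect to work on newer tidyr releases where passing an expression directly to unnest() is deprecated. The names toy and dir_long are mine.

```r
library(dplyr)
library(tidyr)
library(stringr)

# The three rows shown in the first table, typed in by hand
toy <- tibble(
  title    = c("Above and Beyond", "American Splendor", "Burning Blue"),
  country  = "US",
  director = c("Melvin Frank, Norman Panama",
               "Shari Springer Berman, Robert Pulcini",
               "D.M.W. Greer With: Trent Ford, Tammy Blanchard, Morgan Spector")
)

dir_long <- toy %>%
  # Split on commas or colons; the result is a list-column
  mutate(director = str_split(director, "[,:]")) %>%
  # Give every list element its own row
  unnest(director) %>%
  # Drop a trailing "With" and trim stray whitespace
  mutate(director = str_trim(str_remove(director, "\\s+With$")))

dir_long$director
```

Anchoring the pattern with $ is a design choice: it only strips "With" at the end of a piece, so a name that happens to contain "With" elsewhere is left alone.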