unnest() function for an upcoming book on the tidyverse. This book was co-created by the spring 2018 class of DATA 607 (Data Acquisition & Management) students at the CUNY School of Professional Studies.
library(dplyr)
library(tidyverse)
library(tidyr)
library(knitr)
library(kableExtra)
unnest() makes each list element its own row.
unnest(data, ..., .drop = NA, .id = NULL, .sep = NULL, .preserve = NULL)
data: a data frame
...: the columns to unnest; defaults to all list-columns
.drop: whether additional list-columns should be dropped
.id: data frame identifier; creates a new column with the name given in .id, holding a unique identifier for each unnested element
.sep: a separator to use in the names of unnested data frame columns, which combine the name of the original list-column with the names from the nested data frame
.preserve: list-columns to preserve in the output
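As a minimal sketch of the basic behavior described above, the toy example below builds a small tibble with a list-column and unnests it. The `films` data and its values are hypothetical and serve only as an illustration; they are not part of the biopics data set.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: one row per film, with directors stored in a list-column
films <- tibble(
  title = c("Film A", "Film B"),
  director = list(c("Ann", "Bob"), "Cy")
)

# Each element of the list-column becomes its own row;
# values in the other columns are repeated as needed
flat <- unnest(films, director)
```

Here `flat` has three rows: "Film A" appears twice (once per director) and "Film B" once.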
biopics <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/biopics/biopics.csv") %>%
  # Keep only rows whose "director" entry contains a comma, i.e. lists more than one name
  filter(str_detect(director, ".\\,.")) %>%
  # Select a few columns of the data frame for demonstration purposes
  select(title, country, director)
head(biopics, 3) %>%
  kable("html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| title | country | director |
|---|---|---|
| Above and Beyond | US | Melvin Frank, Norman Panama |
| American Splendor | US | Shari Springer Berman, Robert Pulcini |
| Burning Blue | US | D.M.W. Greer With: Trent Ford, Tammy Blanchard, Morgan Spector |
We can use the unnest() function twice: first to split along commas, and then to split along colons. Finally, we can remove the word “With” to get a clean list of individual directors.
# Unnest the "director" column twice along two different separators
dir_unnest <- unnest(biopics, director = strsplit(director, ",")) %>%
  unnest(director = strsplit(director, ":"))
# Remove the pattern of a space and the word "With", then trim the whitespace left around each name by the splits
dir_unnest$director <- str_trim(str_replace(dir_unnest$director, "[[:space:]]With", ""))
head(dir_unnest, 8) %>%
kable("html") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| title | country | director |
|---|---|---|
| Above and Beyond | US | Melvin Frank |
| Above and Beyond | US | Norman Panama |
| American Splendor | US | Shari Springer Berman |
| American Splendor | US | Robert Pulcini |
| Burning Blue | US | D.M.W. Greer |
| Burning Blue | US | Trent Ford |
| Burning Blue | US | Tammy Blanchard |
| Burning Blue | US | Morgan Spector |
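The two-step split can also be tried without downloading anything. The sketch below is one possible variant, not the code used above: it assumes a hypothetical two-row `sample_films` tibble that mimics the biopics data, and it builds each list-column explicitly with `mutate()` and `strsplit()` before calling `unnest()` on it.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical two-row sample mimicking the biopics data, so no download is needed
sample_films <- tibble(
  title = c("Above and Beyond", "Burning Blue"),
  director = c("Melvin Frank, Norman Panama",
               "D.M.W. Greer With: Trent Ford, Tammy Blanchard")
)

names_long <- sample_films %>%
  # Split on commas into a list-column, then give each element its own row
  mutate(director = strsplit(director, ",")) %>%
  unnest(director) %>%
  # Repeat the same step for colons
  mutate(director = strsplit(director, ":")) %>%
  unnest(director) %>%
  # Drop the trailing " With" and trim stray whitespace left by the splits
  mutate(director = str_trim(str_replace(director, "[[:space:]]With", "")))
```

The result has five rows, one per name, with the title repeated for each. Making the list-column explicit in its own `mutate()` step keeps the splitting and the unnesting visibly separate.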