In this assignment we try out some Tidyverse recipes. As part of this, I would like to demonstrate the unnest() function from tidyr, one of the tidyverse packages.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## -- Attaching packages -------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0 v readr 1.1.1
## v tibble 1.4.2 v purrr 0.2.5
## v tidyr 0.8.1 v stringr 1.3.1
## v ggplot2 3.0.0 v forcats 0.3.0
## -- Conflicts ----------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tidyr)
library(knitr)
library(kableExtra)
unnest(data, ..., .drop = NA, .id = NULL, .sep = NULL, .preserve = NULL)
- data: a data frame.
- ...: the columns to unnest; defaults to all list-columns.
- .drop: whether additional list-columns should be dropped.
- .id: data frame identifier; if supplied, creates a new column with the name given in .id, holding a unique identifier.
- .sep: a separator; if non-NULL, the names of unnested data frame columns combine the name of the original list-column with the names from the nested data frame, separated by .sep.
- .preserve: list-columns to preserve in the output.
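As a minimal sketch of what these arguments act on, the example below builds a list-column by hand and unnests it. The two title/director pairs are copied from the biopics data used in this post; the object names toy and toy_long are made up for illustration. The signature above is the one from tidyr 0.8.1 (the version loaded here), and tidyr 1.0.0 changed the interface, so the sketch passes the column name positionally, a form that should work in both.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Two rows copied from the biopics data in this post, so nothing is downloaded
toy <- tibble(
  title    = c("Above and Beyond", "American Splendor"),
  director = c("Melvin Frank, Norman Panama",
               "Shari Springer Berman, Robert Pulcini")
)

toy_long <- toy %>%
  # strsplit() returns one character vector per row, i.e. a list-column
  mutate(director = strsplit(director, ",")) %>%
  # unnest() gives every list element its own row and repeats the title
  unnest(director) %>%
  # splitting on "," leaves a leading space on the second name
  mutate(director = str_trim(director))

toy_long
```

Each film now occupies one row per director, which is the shape needed for counting or grouping by individual name.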
biopics <- read_csv("https://raw.githubusercontent.com/vijay564/R-Maincode/master/tidyverse_recipies.csv") %>%
  # Keep only rows whose "director" entry contains a comma, i.e. lists more than one name
  filter(str_detect(director, ".\\,.")) %>%
  # Select a few columns of the data frame for demonstration purposes
  select(title, country, director)
## Parsed with column specification:
## cols(
## title = col_character(),
## site = col_character(),
## country = col_character(),
## year_release = col_integer(),
## box_office = col_character(),
## director = col_character(),
## number_of_subjects = col_integer(),
## subject = col_character(),
## type_of_subject = col_character(),
## race_known = col_character(),
## subject_race = col_character(),
## person_of_color = col_integer(),
## subject_sex = col_character(),
## lead_actor_actress = col_character()
## )
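To make the filter step easier to follow, here is a small sketch of how the pattern ".\\,." behaves on its own. The first string comes from the biopics data; "Jane Doe" is a hypothetical single-director entry added only for contrast.

```r
library(stringr)

directors <- c(
  "Melvin Frank, Norman Panama",  # from the biopics data
  "Jane Doe"                      # hypothetical entry with a single name
)

# "." matches any character, so the pattern requires a comma that has
# at least one character on each side of it
has_comma <- str_detect(directors, ".\\,.")
has_comma
```

filter() keeps the rows where this logical vector is TRUE, so only entries with more than one name survive.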
head(biopics, 3) %>%
  kable("html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| title | country | director |
|---|---|---|
| Above and Beyond | US | Melvin Frank, Norman Panama |
| American Splendor | US | Shari Springer Berman, Robert Pulcini |
| Burning Blue | US | D.M.W. Greer With: Trent Ford, Tammy Blanchard, Morgan Spector |
# Split "director" into a list-column and unnest it twice, once per separator ("," and ":")
dir_unnest <- unnest(biopics, director = strsplit(director, ",")) %>%
  unnest(director = strsplit(director, ":"))
# Remove the leftover " With" (a space followed by the word "With"), then trim stray spaces left by the split
dir_unnest$director <- str_trim(str_replace(dir_unnest$director, "[[:space:]]With", ""))
head(dir_unnest, 8) %>%
  kable("html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| title | country | director |
|---|---|---|
| Above and Beyond | US | Melvin Frank |
| Above and Beyond | US | Norman Panama |
| American Splendor | US | Shari Springer Berman |
| American Splendor | US | Robert Pulcini |
| Burning Blue | US | D.M.W. Greer |
| Burning Blue | US | Trent Ford |
| Burning Blue | US | Tammy Blanchard |
| Burning Blue | US | Morgan Spector |
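The two unnest() calls and the cleanup can also be folded into one pipeline. The sketch below is an alternative under stated assumptions, not the method used above: it splits on either separator with the single regular expression "[,:]" and uses one row copied from the table above. The names burning and burning_long are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

# One row copied from the table above
burning <- tibble(
  title    = "Burning Blue",
  country  = "US",
  director = "D.M.W. Greer With: Trent Ford, Tammy Blanchard, Morgan Spector"
)

burning_long <- burning %>%
  # "[,:]" matches either separator, so a single strsplit() is enough
  mutate(director = strsplit(director, "[,:]")) %>%
  unnest(director) %>%
  # Drop the leftover " With", then trim the spaces around each name
  mutate(director = str_trim(str_replace(director, "[[:space:]]With", "")))

burning_long
```

Splitting once keeps the pipeline shorter, at the cost of a slightly less obvious pattern than two plain separators.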