Title: Tidyverse Recipes

Introduction

In this assignment we try out some Tidyverse recipes. As part of this, I would like to demonstrate the unnest() function from tidyr, one of the core tidyverse packages.

Load Libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## -- Attaching packages -------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0     v readr   1.1.1
## v tibble  1.4.2     v purrr   0.2.5
## v tidyr   0.8.1     v stringr 1.3.1
## v ggplot2 3.0.0     v forcats 0.3.0
## -- Conflicts ----------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(tidyr)
library(knitr)
library(kableExtra)

unnest()

unnest() expands a list-column so that each element of each list gets its own row. The values in the other columns are repeated for every new row.

unnest(data, ..., .drop = NA, .id = NULL, .sep = NULL, .preserve = NULL)

data - a data frame.
... - the columns to unnest; defaults to all list-columns.
.drop - whether additional list-columns should be dropped.
.id - data frame identifier; if supplied, creates a new column with the name given in .id, holding a unique identifier for each unnested element.
.sep - if not NULL, the names of the unnested data frame columns combine the name of the original list-column with the names from the nested data frame, separated by .sep.
.preserve - list-columns to preserve in the output.
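
To make the description above concrete, here is a minimal sketch on a small made-up table. The `films` data frame and the names in it are hypothetical and are not part of the biopics data used later. The call passes the list-column by bare name, which should work with the tidyr version shown above as well as with newer releases.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: one row per film, with a list-column of directors
films <- tibble(
  title    = c("Film A", "Film B"),
  director = list(c("Ann", "Bob"), "Cy")
)

# Each element of the list-column becomes its own row;
# `title` is repeated for every director of the same film
films_long <- unnest(films, director)
films_long
```

The result has three rows: two for "Film A" (one per director) and one for "Film B".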

Example

biopics <- read_csv("https://raw.githubusercontent.com/vijay564/R-Maincode/master/tidyverse_recipies.csv") %>% 
    # Keep only rows whose "director" entry contains a comma, i.e. lists more than one name
    filter(str_detect(director, ".\\,.")) %>%
    # Select a few columns of the dataframe for demonstration purposes
    select(title, country, director)           
## Parsed with column specification:
## cols(
##   title = col_character(),
##   site = col_character(),
##   country = col_character(),
##   year_release = col_integer(),
##   box_office = col_character(),
##   director = col_character(),
##   number_of_subjects = col_integer(),
##   subject = col_character(),
##   type_of_subject = col_character(),
##   race_known = col_character(),
##   subject_race = col_character(),
##   person_of_color = col_integer(),
##   subject_sex = col_character(),
##   lead_actor_actress = col_character()
## )
head(biopics, 3) %>% 
    kable("html")  %>% 
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
title               country   director
Above and Beyond    US        Melvin Frank, Norman Panama
American Splendor   US        Shari Springer Berman, Robert Pulcini
Burning Blue        US        D.M.W. Greer With: Trent Ford, Tammy Blanchard, Morgan Spector
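# Split "director" into a list-column and unnest it, twice, once per separator
# (this is the tidyr 0.8.1 interface; newer tidyr releases use mutate() followed by unnest())
dir_unnest <- unnest(biopics, director = strsplit(director, ",")) %>% 
    unnest(director = strsplit(director, ":"))

# Remove the pattern of a space and the word "With", then trim the
# leading spaces that the splits leave behind
dir_unnest$director <- str_trim(str_replace(dir_unnest$director, "[[:space:]]With", ""))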

head(dir_unnest, 8) %>% 
    kable("html")  %>% 
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
title               country   director
Above and Beyond    US        Melvin Frank
Above and Beyond    US        Norman Panama
American Splendor   US        Shari Springer Berman
American Splendor   US        Robert Pulcini
Burning Blue        US        D.M.W. Greer
Burning Blue        US        Trent Ford
Burning Blue        US        Tammy Blanchard
Burning Blue        US        Morgan Spector
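
As an alternative sketch, not the approach used in this recipe, tidyr's separate_rows() can split and expand a delimited character column in one step. The example below rebuilds the three rows shown earlier by hand as `biopics_demo` (a hypothetical name) so that it runs without downloading the CSV, and it assumes a single regular expression matching either separator is acceptable.

```r
library(tibble)
library(tidyr)
library(stringr)

# The three example rows from the table above, rebuilt by hand
biopics_demo <- tibble(
  title    = c("Above and Beyond", "American Splendor", "Burning Blue"),
  country  = "US",
  director = c(
    "Melvin Frank, Norman Panama",
    "Shari Springer Berman, Robert Pulcini",
    "D.M.W. Greer With: Trent Ford, Tammy Blanchard, Morgan Spector"
  )
)

# Split on either a comma or a colon and give every piece its own row
dir_rows <- separate_rows(biopics_demo, director, sep = "[,:]")

# Drop a trailing " With" and trim the spaces left over from the split
dir_rows$director <- str_trim(str_replace(dir_rows$director, "[[:space:]]With$", ""))
dir_rows
```

This should give the same eight rows as the two-step unnest() version. The trade-off is that the splitting rule lives in one regular expression, which is shorter but harder to extend if the separators need different handling.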