Chapter 8 - Automated Data Collection with R

library(stringr)

Problem #3

A) Rearrange the vector so all names conform to firstname_lastname

  1. Extract the names from the raw data into a character vector called name
raw.data <-"555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

name <- unlist(str_extract_all(raw.data, "[[:alpha:],. ]{2,}"))
name
## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"
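
To see why the pattern asks for at least two characters, here is a minimal sketch on a short fragment of the raw data. The variable names fragment, with_min, and without_min are illustrative and not part of the original solution.

```r
library(stringr)

# A short fragment of raw.data that has a lone space between digits.
fragment <- "555 8904Ned Flanders"

# With {2,}: only runs of letters, commas, periods, and spaces
# that are at least two characters long are returned.
with_min <- unlist(str_extract_all(fragment, "[[:alpha:],. ]{2,}"))

# With +: single characters also match, so the lone space
# inside the phone number is picked up as well.
without_min <- unlist(str_extract_all(fragment, "[[:alpha:],. ]+"))

with_min
without_min
```

The {2,} quantifier is what keeps stray single spaces inside the phone numbers out of the result.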

  2. Split the name character vector into a two-column character matrix. The pattern is the comma, so that for names written as lastname, firstname the first name ends up in a separate (second) column.

names_split <- str_split_fixed(name, pattern = fixed(","), n = 2)
names_split
##      [,1]                   [,2]            
## [1,] "Moe Szyslak"          ""              
## [2,] "Burns"                " C. Montgomery"
## [3,] "Rev. Timothy Lovejoy" ""              
## [4,] "Ned Flanders"         ""              
## [5,] "Simpson"              " Homer"        
## [6,] "Dr. Julius Hibbert"   ""
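
As a side note on why str_split_fixed is convenient here, the following sketch compares it with str_split on two of the names. The names sample_names, as_list, and as_matrix are made up for this illustration.

```r
library(stringr)

# Two names from the vector above: one without a comma, one with.
sample_names <- c("Ned Flanders", "Simpson, Homer")

# str_split returns a list whose elements can have different lengths.
as_list <- str_split(sample_names, fixed(","))
lengths(as_list)

# str_split_fixed always returns a matrix with n columns and
# fills the missing pieces with the empty string "".
as_matrix <- str_split_fixed(sample_names, fixed(","), n = 2)
dim(as_matrix)
as_matrix[1, 2]
```

Because every row has exactly two columns, the columns can be recombined element by element without checking lengths first.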
  3. Create the names in first name, last name order by using the str_c function to join column 2 to the front of column 1. If a name was already in the proper order, its second column is an empty string, so the joined name simply starts with the separator space. Because collapse is set, the result is a single string rather than a vector; I used a double space as the collapse separator between names to make extracting easier.
firstname_lastname <- str_c(names_split[,2], names_split[,1], sep = " ", collapse = "  ")
firstname_lastname
## [1] " Moe Szyslak   C. Montgomery Burns   Rev. Timothy Lovejoy   Ned Flanders   Homer Simpson   Dr. Julius Hibbert"
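
If a character vector with one element per name is preferred over a single collapsed string, one possible variant is sketched below. It assumes the same name vector as above; firstname_lastname_vec is a name introduced here for illustration.

```r
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")
names_split <- str_split_fixed(name, fixed(","), n = 2)

# Join column 2 in front of column 1 element by element (no collapse),
# then trim the leading space left over from the split or the separator.
firstname_lastname_vec <- str_trim(
  str_c(names_split[, 2], names_split[, 1], sep = " ")
)
firstname_lastname_vec
```

Leaving out collapse keeps the six names as separate elements, and str_trim removes the leading space that every joined name would otherwise carry.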

B) Construct a logical vector indicating whether a character has a title

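  1. Reuse the names_split matrix from part A. Then use str_detect to look for a period in the first column. Every title in this data ends with a period, and the only other period (the initial C.) falls in the second column.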
names_split
##      [,1]                   [,2]            
## [1,] "Moe Szyslak"          ""              
## [2,] "Burns"                " C. Montgomery"
## [3,] "Rev. Timothy Lovejoy" ""              
## [4,] "Ned Flanders"         ""              
## [5,] "Simpson"              " Homer"        
## [6,] "Dr. Julius Hibbert"   ""
has_title <- str_detect(names_split[,1], pattern = fixed("."))
has_title
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
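
The check above depends on the initial C. landing in the second column. A sketch of an alternative that works on the name vector directly, under the assumption that a title is two or more letters followed by a period (has_title_alt is an illustrative name):

```r
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Two or more letters directly followed by a period, e.g. "Rev." or "Dr.".
# A one-letter initial such as "C." does not satisfy the {2,} quantifier.
has_title_alt <- str_detect(name, "[[:alpha:]]{2,}\\.")
has_title_alt
```

For this data it gives the same answer as has_title, but it would not hold for a title written without a period.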

C) Construct a logical vector indicating whether a character has a second name

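  1. Apply the same str_detect call to the second column. In this data the only second name is the abbreviated initial C., which ends with a period and falls in the second column after the split.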
names_split
##      [,1]                   [,2]            
## [1,] "Moe Szyslak"          ""              
## [2,] "Burns"                " C. Montgomery"
## [3,] "Rev. Timothy Lovejoy" ""              
## [4,] "Ned Flanders"         ""              
## [5,] "Simpson"              " Homer"        
## [6,] "Dr. Julius Hibbert"   ""
has_second_name <- str_detect(names_split[,2], pattern = fixed("."))
has_second_name
## [1] FALSE  TRUE FALSE FALSE FALSE FALSE
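
The period check would miss a second name that is written out in full. A rough sketch of another approach is to strip the title and count the remaining words, treating three or more words as a sign of a second name. This is an assumption about what counts as a second name, and no_title, word_count, and has_second_name_alt are illustrative names.

```r
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Remove a title (two or more letters, a period, and a space) if present.
no_title <- str_replace(name, "[[:alpha:]]{2,}\\. ", "")

# Count runs of letters; first name + last name gives 2,
# so 3 or more suggests a second name or initial.
word_count <- str_count(no_title, "[[:alpha:]]+")
has_second_name_alt <- word_count >= 3
has_second_name_alt
```

On this data the result matches has_second_name, with only C. Montgomery Burns flagged.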