Assignment

Please deliver links to an R Markdown file (in GitHub and rpubs.com) with solutions to problems 3 and 4 from chapter 8 of Automated Data Collection with R.

Answers

3. Copy the introductory example. The vector ‘name’ stores the extracted names.

raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
library(stringr)
name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
name
## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"

(a) Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.

My first attempt excluded the middle name after the first initial.

unlist(str_extract_all(name, "(\\w+),\\s(\\w+)"))
## [1] "Burns, C"       "Simpson, Homer"

With inclusion of the middle name after the first initial:

unlist(str_extract_all(name, "(\\w+),\\s(\\w+)?(\\.\\s(\\w+))?"))
## [1] "Burns, C. Montgomery" "Simpson, Homer"
newname <- str_replace_all(name, "(\\w+),\\s(\\w+)?(\\.\\s(\\w+))?", "\\2\\3 \\1")
newname
## [1] "Moe Szyslak"          "C. Montgomery Burns"  "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Homer Simpson"        "Dr. Julius Hibbert"

source: https://stackoverflow.com/questions/33826650/last-name-first-name-to-first-name-last-name
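
To see why the replacement "\\2\\3 \\1" puts the pieces in the right order, it can help to look at what each capture group holds. The sketch below uses str_match() on the same names; the period is escaped so that it only matches a literal dot, and the variable names pattern and groups are illustrative assumptions, not part of the original solution.

```r
library(stringr)

# The names extracted from raw.data in the introductory example.
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Group 1: last name, group 2: first name or initial,
# group 3: optional ". middle_name" part, group 4: the middle name alone.
pattern <- "(\\w+),\\s(\\w+)?(\\.\\s(\\w+))?"

# str_match() returns a character matrix: column 1 is the full match,
# the remaining columns are the capture groups in order.
groups <- str_match(name, pattern)

# Show only the two names written as "last_name, first_name".
groups[c(2, 5), ]
```

Names without a comma do not match the pattern at all, so their rows are NA and str_replace_all() leaves them unchanged.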

(b) Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.)

str_detect(string = newname, pattern = "(Rev|Dr)\\.")
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE

(c) Construct a logical vector indicating whether a character has a second name.

str_extract(string = newname, pattern = "\\s(\\w+)(.\\s(\\w+))")
## [1] NA                  " Montgomery Burns" " Timothy Lovejoy" 
## [4] NA                  NA                  " Julius Hibbert"

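Unfortunately, I could not figure out how to do this one. The pattern above returns the matched text rather than a logical vector, and it also matches names that merely follow a title (Rev. Timothy Lovejoy, Dr. Julius Hibbert).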
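
One possible approach, offered as a sketch rather than the book's intended answer: treat "Rev." and "Dr." as titles, strip them, and then count the remaining name parts. Any name with more than two parts is assumed to include a second name. The names no_title and has_second_name are hypothetical, and the rule itself is an assumption about what "second name" means here.

```r
library(stringr)

# The rearranged names from part (a).
newname <- c("Moe Szyslak", "C. Montgomery Burns", "Rev. Timothy Lovejoy",
             "Ned Flanders", "Homer Simpson", "Dr. Julius Hibbert")

# Assumption: "Rev." and "Dr." are titles, not names, so remove them first.
no_title <- str_trim(str_replace(newname, "^(Rev|Dr)\\.\\s*", ""))

# Count whitespace-separated parts; more than two is taken to mean
# that the character has a second name.
has_second_name <- str_count(no_title, "\\S+") > 2
has_second_name
```

Under this rule only "C. Montgomery Burns" is flagged, since it is the only name with three parts once titles are removed.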