Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.
Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).
Construct a logical vector indicating whether a character has a second name.
library(stringr)
raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
To extract the names, we use the regex '[[:alpha:]., ]{2,}'. [[:alpha:]] matches any letter of the alphabet; we add a period, a comma, and a space to the character class because names can contain them along with the letters. Finally, the quantifier {min,max} is given only a minimum of 2 and no maximum, so a match needs at least two consecutive characters from the class. This prevents the extraction of isolated single characters, such as the space in "555 8904".
name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
name
## [1] "Moe Szyslak" "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Simpson, Homer" "Dr. Julius Hibbert"
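The effect of the {2,} quantifier can be seen by comparing it with a plain + quantifier. The following is a minimal sketch, assuming stringr is installed; the snippet string is a hypothetical excerpt written in the style of raw.data, and the variable names loose and strict are illustrative only.

```r
library(stringr)

# Hypothetical excerpt in the same style as raw.data
snippet <- "(636) 555-0113Burns, C. Montgomery555 8904Ned Flanders"

# One or more characters: isolated spaces between digit groups are kept
loose <- unlist(str_extract_all(snippet, "[[:alpha:]., ]+"))

# At least two characters: the isolated spaces are dropped
strict <- unlist(str_extract_all(snippet, "[[:alpha:]., ]{2,}"))

loose
strict
```

With +, the single spaces after "(636)" and inside "555 8904" come back as matches of their own; with {2,} only the two names remain.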
3(a) Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.
firstName <- str_extract(str_extract(name, "[[:alpha:]]{3,} | [[:alpha:]]{3,}"), "[[:alpha:]]{1,}")
firstName
## [1] "Moe" "Montgomery" "Timothy" "Ned" "Homer"
## [6] "Julius"
lastName <- str_extract(str_extract(name, "[[:alpha:]]{1,}\\,|\\b [[:alpha:]]{3,}"), "[[:alpha:]]{1,}")
lastName
## [1] "Szyslak" "Burns" "Lovejoy" "Flanders" "Simpson" "Hibbert"
# paste() already separates its arguments with a single space
fullName <- data.frame(name = paste(firstName, lastName))
fullName
## name
## 1 Moe Szyslak
## 2 Montgomery Burns
## 3 Timothy Lovejoy
## 4 Ned Flanders
## 5 Homer Simpson
## 6 Julius Hibbert
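A different route to the same result is to clean each name in place rather than extract the first and last names separately. This is only a sketch of an alternative, not the approach used above: it assumes titles are two or more letters followed by a period, that middle initials are a single letter followed by a period, and that reversed names have the form "Last, First". The variable cleaned is illustrative.

```r
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Drop titles such as "Rev." or "Dr." (two or more letters plus a period)
cleaned <- str_remove_all(name, "\\b[[:alpha:]]{2,}\\.\\s*")

# Drop single-letter initials such as "C."
cleaned <- str_remove_all(cleaned, "\\b[[:alpha:]]\\.\\s*")

# Turn "Last, First" into "First Last"
cleaned <- str_replace(cleaned, "^(\\w+),\\s*(.+)$", "\\2 \\1")

# Collapse any leftover whitespace
cleaned <- str_squish(cleaned)
cleaned
```

The order of the steps matters here: the title and initial are removed before the swap so that the comma pattern only has to deal with two remaining words.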
3(b) Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.). We use str_detect with a regex that matches at least 2 letters followed by a period.
title <- data.frame(fullName = name, hasTitle = str_detect(name, "[[:alpha:]]{2,}\\."))
title
## fullName hasTitle
## 1 Moe Szyslak FALSE
## 2 Burns, C. Montgomery FALSE
## 3 Rev. Timothy Lovejoy TRUE
## 4 Ned Flanders FALSE
## 5 Simpson, Homer FALSE
## 6 Dr. Julius Hibbert TRUE
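The title pattern is not anchored, so it would also fire on an abbreviation that appears later in the string. The sketch below illustrates this with one hypothetical input, "Homer Simpson Jr.", which is not part of raw.data; anchoring with ^ is one possible refinement under the assumption that titles always come first.

```r
library(stringr)

# "Homer Simpson Jr." is a hypothetical input, not taken from raw.data
candidates <- c("Rev. Timothy Lovejoy", "Burns, C. Montgomery", "Homer Simpson Jr.")

# Unanchored: any run of 2+ letters followed by a period counts
unanchored <- str_detect(candidates, "[[:alpha:]]{2,}\\.")

# Anchored: the run must sit at the very start of the string
anchored <- str_detect(candidates, "^[[:alpha:]]{2,}\\.")

data.frame(candidates, unanchored, anchored)
```

For the six names in this exercise both patterns give the same answer; the difference only shows up on inputs with a trailing abbreviation.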
3(c) Construct a logical vector indicating whether a character has a second name. We use str_detect with a regex that matches a space, then exactly 1 letter, then a period.
secondName <- data.frame(fullName = name, hasSecondName = str_detect(name, " [[:alpha:]]\\."))
secondName
## fullName hasSecondName
## 1 Moe Szyslak FALSE
## 2 Burns, C. Montgomery TRUE
## 3 Rev. Timothy Lovejoy FALSE
## 4 Ned Flanders FALSE
## 5 Simpson, Homer FALSE
## 6 Dr. Julius Hibbert FALSE
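Because the pattern requires a leading space, an initial at the very start of a string would be missed. As a hedged sketch, a word boundary (\\b) can replace the space; "C. Montgomery Burns" is a hypothetical input used only to show the difference.

```r
library(stringr)

# "C. Montgomery Burns" is a hypothetical input, not taken from raw.data
candidates <- c("Burns, C. Montgomery", "C. Montgomery Burns", "Dr. Julius Hibbert")

# Space-prefixed pattern: misses an initial at the start of the string
with_space <- str_detect(candidates, " [[:alpha:]]\\.")

# Word-boundary pattern: matches a single-letter initial anywhere
with_boundary <- str_detect(candidates, "\\b[[:alpha:]]\\.")

data.frame(candidates, with_space, with_boundary)
```

"Dr." is still not flagged by either pattern, since the letter before the period is not a word on its own.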
myExample <- "123$dajfk."
# One or more digits followed by a literal dollar sign
result <- str_extract(myExample, "[0-9]+\\$")
result
## [1] "123$"
myExample2 <- "WHAT that is?"
# A whole word made of one to four lowercase letters (first match only)
result2 <- str_extract(myExample2, "\\b[a-z]{1,4}\\b")
result2
## [1] "that"
myExample3 <- "blability blam23jkd akfda.jpg fjdskf.txt"
# Any characters up to a literal ".txt" at the very end of the string
result3 <- str_extract(myExample3, ".*?\\.txt$")
result3
## [1] "blability blam23jkd akfda.jpg fjdskf.txt"
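The whole string is returned here because the match starts at the first character and the lazy .*? simply grows until ".txt" at the end is reached. If only the file name is wanted, one option (a sketch, not part of the original exercise) is to replace .*? with \\S+, which cannot cross a space. The name file_only is illustrative.

```r
library(stringr)

myExample3 <- "blability blam23jkd akfda.jpg fjdskf.txt"

# Lazy dot: matching starts at position 1, so everything is captured
whole <- str_extract(myExample3, ".*?\\.txt$")

# Non-space characters only: the match cannot span earlier words
file_only <- str_extract(myExample3, "\\S+\\.txt$")

whole
file_only
```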
myExample4 <- "2131fjdksa 323 09/15/2019riew83"
# A date-like pattern: two digits, a slash, two digits, a slash, four digits
result4 <- str_extract(myExample4, "\\d{2}/\\d{2}/\\d{4}")
result4
## [1] "09/15/2019"
myExample5 <- "<p> blah blah paragraph </p> daslj212 !jlsakjf"
# An opening tag, some content, and the matching closing tag (\\1 refers back to the tag name)
result5 <- str_extract(myExample5, "<(.+?)>.+?</\\1>")
result5
## [1] "<p> blah blah paragraph </p>"
encoded <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
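# Keep only uppercase letters and punctuation, join them, then turn periods into spaces
hidden <- unlist(str_extract_all(encoded, "[[A-Z][:punct:]]"))
decoded <- str_replace_all(paste0(hidden, collapse = ""), "\\.", " ")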
decoded
## [1] "CONGRATULATIONS YOU ARE A SUPERNERD!"
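The same decoding idea can be written the other way around: instead of extracting the characters to keep, remove everything else. The sketch below uses a short made-up string (toy) rather than the real encoded message, and assumes the same convention that periods stand for spaces.

```r
library(stringr)

# Hypothetical short message built with the same scheme as `encoded`
toy <- "abH1cI.dT2eHfE3R4gE!"

# Remove every character that is neither an uppercase letter nor punctuation
kept <- str_remove_all(toy, "[^[:upper:][:punct:]]")

# Periods stand in for spaces
toy_decoded <- str_replace_all(kept, fixed("."), " ")
toy_decoded
```

Using fixed(".") avoids having to escape the period, since the pattern is then treated as a literal string rather than a regex.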