3. Copy the introductory example. The vector name stores the extracted names.
  1. Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.
  2. Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr. ).
  3. Construct a logical vector indicating whether a character has a second name.
library(stringr)
raw.data <-"555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))

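#a Rearrange the vector so that all elements conform to the standard first_name last_name
# The inner call moves the ", first" part in front of the last name; the outer call then drops the leftover ", ".
name <- str_replace_all(str_replace_all(name, "(.+)(, .+)$", "\\2 \\1"), ", ", "")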

#b A logical vector indicating whether a character has a title (i.e. Rev. and Dr.)
title <- str_detect(name, "[[:alpha:]]{2,}\\.")

#c A logical vector indicating whether a character has a second name (detected here as a capital initial followed by a period).
middle_name <- str_detect(name, "[[:upper:]]\\.")

print(data.frame(name, title, middle_name))
##                   name title middle_name
## 1          Moe Szyslak FALSE       FALSE
## 2  C. Montgomery Burns FALSE        TRUE
## 3 Rev. Timothy Lovejoy  TRUE       FALSE
## 4         Ned Flanders FALSE       FALSE
## 5        Homer Simpson FALSE       FALSE
## 6   Dr. Julius Hibbert  TRUE       FALSE
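The rearrangement can also be sketched as a single replacement. This is only an illustration: the vector `raw_names` is typed in by hand on the assumption that it matches what the introductory example extracts, and the names `raw_names`, `tidy_names`, `has_title`, and `has_second_name` are hypothetical.

```r
library(stringr)

# Hand-typed copy of the names, assumed to match the extracted vector.
raw_names <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
               "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Swap "last, first" into "first last" in one pass; names without a comma
# do not match the pattern and are returned unchanged.
tidy_names <- str_replace(raw_names, "^(.+), (.+)$", "\\2 \\1")

# Assumption: a title is two or more letters followed by a period.
has_title <- str_detect(tidy_names, "[[:alpha:]]{2,}\\.")

# Assumption: a second name shows up as a single capital initial with a period.
has_second_name <- str_detect(tidy_names, "\\b[[:upper:]]\\.")

print(data.frame(tidy_names, has_title, has_second_name))
```

Anchoring the pattern with `^` and `$` and capturing the two halves without the comma avoids the second clean-up step.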
4. Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.
#a [0-9]+\\$
# Matches one or more digits (0 to 9) immediately followed by a literal dollar sign; the backslashes escape the $ so it is not read as an end-of-string anchor.
q4a <- "We write dollar amount as $1000 not 1000$"
unlist(str_extract_all(q4a, "[0-9]+\\$"))
## [1] "1000$"
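To see why the escape matters, the following sketch compares the escaped and unescaped forms on a made-up string (the variable `prices` and its contents are hypothetical).

```r
library(stringr)

prices <- "1000$ and 20"

# Escaped: \\$ is a literal dollar sign, so the digits must be followed by "$".
with_escape <- str_extract_all(prices, "[0-9]+\\$")[[1]]

# Unescaped: $ is the end-of-string anchor, so only trailing digits match.
without_escape <- str_extract_all(prices, "[0-9]+$")[[1]]

print(with_escape)     # "1000$"
print(without_escape)  # "20"
```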
#b \\b[a-z]{1,4}\\b
# Matches a whole word made up of 1 to 4 lowercase letters; the \\b on each side requires a word boundary, so fragments of longer words and words containing capitals are skipped.
q4b <- "The Best preparation for tomorrow is doing your best today"
unlist(str_extract_all(q4b, "\\b[a-z]{1,4}\\b"))
## [1] "for"  "is"   "your" "best"
#c .*?\\.txt$
# Matches any string that ends in a literal ".txt"; .*? lazily consumes whatever precedes it, and $ anchors the match to the end of the string.
q4c <- c("This is a text", "This is my file.txt")
unlist(str_extract_all(q4c, ".*?\\.txt$"))
## [1] "This is my file.txt"
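The lazy quantifier in this pattern has little visible effect because of the `$` anchor. A small sketch on a hypothetical string (`files`) shows the difference between the anchored and unanchored forms.

```r
library(stringr)

files <- "notes.txt and report.txt"

# Anchored: the match starts at the first character and must reach the end
# of the string, so the lazy .*? is forced to expand over everything.
anchored <- str_extract(files, ".*?\\.txt$")

# Unanchored: the lazy .*? stops at the first ".txt" it can reach.
unanchored <- str_extract(files, ".*?\\.txt")

print(anchored)    # "notes.txt and report.txt"
print(unanchored)  # "notes.txt"
```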
#d \\d{2}/\\d{2}/\\d{4}
# Matches two digits, a forward slash, two digits, a forward slash, and four digits, in that order (a date-like shape such as dd/mm/yyyy).
q4d <- c("This is a date 09/05/2018", "This is a division 2/4/8")
unlist(str_extract_all(q4d, "\\d{2}/\\d{2}/\\d{4}"))
## [1] "09/05/2018"
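The pattern only checks the shape of the string, not whether it is a real date. The sketch below illustrates this with hypothetical inputs; the day/month/year reading passed to `as.Date()` is an assumption, since the exercise does not say which order is intended.

```r
library(stringr)

candidates <- c("09/05/2018", "99/99/9999", "9/5/2018")

# Shape check with the regular expression from the exercise.
shape_ok <- str_detect(candidates, "\\d{2}/\\d{2}/\\d{4}")
print(shape_ok)  # TRUE TRUE FALSE

# Validity check: as.Date() returns NA when the string is not a real date
# under the assumed day/month/year format.
parsed <- as.Date(candidates, format = "%d/%m/%Y")
print(is.na(parsed))
```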
#e <(.+?)>.+?</\\1>
# Matches an opening tag in angle brackets (one or more characters, captured as group 1), then one or more characters of content, then a closing tag </...> whose name is exactly the captured text; \\1 is a backreference to group 1.
q4e <- c("<tag> </tag>", "<tag>Hello World!</tag>", "<taG>Hello World!</tag>")
unlist(str_extract_all(q4e, "<(.+?)>.+?</\\1>"))
## [1] "<tag> </tag>"            "<tag>Hello World!</tag>"
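To make the backreference visible, `str_match()` can be used instead of `str_extract_all()`: it returns a character matrix with the full match in the first column and each capture group in the following columns. The input vector `tags` is a hypothetical example.

```r
library(stringr)

tags <- c("<b>bold</b>", "<i>mixed</b>", "<p></p>")

# Column 1 holds the full match, column 2 the tag name captured by (.+?).
m <- str_match(tags, "<(.+?)>.+?</\\1>")
print(m)

# Only the first element matches: the second has mismatched tag names, and
# the third has no content between the tags, which .+? requires.
```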
9. The following code hides a secret message. Crack it with R and regular expressions.
q9 <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"

# At a glance, the code seems to be a mixture of digits, lowercase letters, uppercase letters, and some punctuation.
# unlist(str_extract_all(q9, "\\d"))
# unlist(str_extract_all(q9, "[:lower:]"))
# unlist(str_extract_all(q9, "[:punct:]"))
# unlist(str_extract_all(q9, "[:upper:]"))
message <- unlist(str_extract_all(q9, "[[:upper:].!]")) # this could be the message; let's clean it
paste(str_replace(message, "[.]", " "), collapse = "")
## [1] "CONGRATULATIONS YOU ARE A SUPERNERD!"
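The same idea can be sketched from the opposite direction: instead of extracting the characters to keep, delete everything else. The example uses a short made-up string (`toy`) rather than the full code, and the names `kept` and `decoded` are hypothetical.

```r
library(stringr)

# Toy string built the same way as the puzzle: noise around capitals,
# with "." standing in for spaces.
toy <- "abH1cdI2e.fT3gH4hEiR5jE!k"

# Remove every character that is not an uppercase letter, "." or "!".
kept <- str_replace_all(toy, "[^[:upper:].!]", "")

# Turn the literal periods into spaces; fixed() avoids regex interpretation.
decoded <- str_replace_all(kept, fixed("."), " ")

print(decoded)  # "HI THERE!"
```

Working on the whole string avoids the split-and-paste step, at the cost of a negated character class that is slightly harder to read.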