1. Copy the introductory example. The vector name stores the extracted names.
   R> name
   [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
   [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"
  1. Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.
  2. Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).
  3. Construct a logical vector indicating whether a character has a second name.

Load the stringr library and extract the names from the raw data.

library(stringr)
raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5543642Dr. Julius Hibbert"
name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))  # runs of 2+ letters, periods, commas, or spaces
name
## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"
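The pattern asks for at least two name characters in a row. The minimal sketch below compares {2,} with + on the same raw string (raw.data is repeated so the block runs on its own). With +, the lone spaces inside the phone numbers "(636) 555-0113" and "555 8904" come back as extra one-character matches.

```r
library(stringr)

raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5543642Dr. Julius Hibbert"

# One or more name characters: also picks up the single spaces in the phone numbers
loose <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]+"))
# Two or more name characters: only the six names survive
strict <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))

length(loose)   # 8, including two " " entries
length(strict)  # 6
```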

Rearrange each name into first_name last_name order: drop the middle initial, swap "Last, First" names, and remove titles.

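N2 <- sub(" [A-Za-z]\\. ", " ", name)          # drop a middle initial such as " C. "
N3 <- sub("(\\w+),\\s(\\w+)", "\\2 \\1", N2)    # "Burns, Montgomery" -> "Montgomery Burns"
N4 <- sub("^[A-Za-z]{2,3}\\. ", "", N3)         # drop a leading title such as "Rev. " or "Dr. "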
N4
## [1] "Moe Szyslak"      "Montgomery Burns" "Timothy Lovejoy" 
## [4] "Ned Flanders"     "Homer Simpson"    "Julius Hibbert"
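A letter range written as [A-z] is a common pitfall: it runs over every code point from "A" (65) to "z" (122), which also includes the characters [, \, ], ^, _ and the backtick. The sketch below checks this with stringr, which uses ICU regular expressions; [A-Za-z] or [[:alpha:]] matches letters only.

```r
library(stringr)

chars <- c("a", "Z", "_", "^", "`")

# [A-z] is a code-point range from 'A' (65) to 'z' (122), so it also covers [ \ ] ^ _ `
str_detect(chars, "[A-z]")     # TRUE TRUE TRUE TRUE TRUE
# Listing the two letter ranges separately excludes the punctuation in between
str_detect(chars, "[A-Za-z]")  # TRUE TRUE FALSE FALSE FALSE
```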

Split the cleaned names into a data frame of first and last names.

first_name <- str_extract(N4, "^[[:alpha:]]+")  # first word
last_name <- str_extract(N4, "[[:alpha:]]+$")   # last word, without a leading space
Simpsonnames <- data.frame(first_name, last_name)
Simpsonnames
##   first_name last_name
## 1        Moe   Szyslak
## 2 Montgomery     Burns
## 3    Timothy   Lovejoy
## 4        Ned  Flanders
## 5      Homer   Simpson
## 6     Julius   Hibbert
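A pattern that begins with [[:space:]] makes the space part of the match, so the extracted last name starts with a blank that the data-frame printout hides. A minimal sketch, using the cleaned names as printed above, compares such a pattern with str_split_fixed(), which splits each name at the first space. It assumes every cleaned name has exactly one space.

```r
library(stringr)

# Cleaned names as printed above
N4 <- c("Moe Szyslak", "Montgomery Burns", "Timothy Lovejoy",
        "Ned Flanders", "Homer Simpson", "Julius Hibbert")

# The match starts at the space, so the space is kept
with_space <- str_extract(N4, "[[:space:]][[:alpha:]]{2,}")
with_space[1]  # " Szyslak"

# Split each name at the first space into a two-column character matrix
parts <- str_split_fixed(N4, " ", 2)
Simpsonnames <- data.frame(first_name = parts[, 1], last_name = parts[, 2],
                           stringsAsFactors = FALSE)
Simpsonnames$last_name[1]  # "Szyslak"
```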
Title <- str_detect(name, "^[[:alpha:]]{2,3}\\.")  # a title such as "Rev." or "Dr." at the start
data.frame(Simpsonnames,Title)
##   first_name last_name Title
## 1        Moe   Szyslak FALSE
## 2 Montgomery     Burns FALSE
## 3    Timothy   Lovejoy  TRUE
## 4        Ned  Flanders FALSE
## 5      Homer   Simpson FALSE
## 6     Julius   Hibbert  TRUE
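The title check looks for two or three letters followed by a period. Anchoring it to the start of the string with ^ keeps a suffix from being counted as a title. The sketch below adds a hypothetical name, "Homer Simpson Jr.", that is not in the data, only to show the difference.

```r
library(stringr)

# The last element is a made-up name used only to illustrate the anchor
people <- c("Rev. Timothy Lovejoy", "Dr. Julius Hibbert",
            "Burns, C. Montgomery", "Homer Simpson Jr.")

# Unanchored: 2-3 letters followed by a period anywhere in the string
str_detect(people, "[[:alpha:]]{2,3}\\.")   # TRUE TRUE FALSE TRUE
# Anchored: the title must open the name
str_detect(people, "^[[:alpha:]]{2,3}\\.")  # TRUE TRUE FALSE FALSE
```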
Secondname <- str_detect(name, "\\b[A-Z]\\.")  # a single-letter initial such as "C."
data.frame(Simpsonnames,Secondname)
##   first_name last_name Secondname
## 1        Moe   Szyslak      FALSE
## 2 Montgomery     Burns       TRUE
## 3    Timothy   Lovejoy      FALSE
## 4        Ned  Flanders      FALSE
## 5      Homer   Simpson      FALSE
## 6     Julius   Hibbert      FALSE
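Detecting an initial works here because the only second name in the data is abbreviated ("C."). A different reading of the task, sketched below as an assumption, removes a leading title, counts the remaining words, and flags names with more than two. This version would also catch a spelled-out middle name.

```r
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Remove a leading title such as "Rev. " or "Dr. "
no_title <- str_remove(name, "^[[:alpha:]]{2,3}\\. ")
# Count runs of letters; more than two means there is a second (middle) name
Secondname <- str_count(no_title, "[[:alpha:]]+") > 2
Secondname  # FALSE TRUE FALSE FALSE FALSE FALSE
```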
  1. Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.
  1. [0-9]+\$
pattern1 <- "[0-9]+\\$"
example1 <- c("123$345", "zse$as")
str_detect(example1, pattern1)
## [1]  TRUE FALSE

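Strings containing one or more digits immediately followed by a literal dollar sign, such as "123$". Because the pattern is not anchored, "123$345" also matches.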
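Because [0-9]+\$ is not anchored, the digits-and-dollar sequence only has to appear somewhere in the string. A sketch comparing it with a fully anchored version, ^[0-9]+\$$, which accepts only strings made entirely of digits followed by one dollar sign:

```r
library(stringr)

x <- c("123$", "123$345", "costs 5$", "$5")

str_detect(x, "[0-9]+\\$")    # TRUE TRUE TRUE FALSE
str_detect(x, "^[0-9]+\\$$")  # TRUE FALSE FALSE FALSE
```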

  2. \b[a-z]{1,4}\b
pattern2 <- "\\b[a-z]{1,4}\\b"
example2 <- c("a", "bc", "def", "ghij")
str_detect(example2, pattern2)
## [1] TRUE TRUE TRUE TRUE

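Strings containing a stand-alone word of one to four lowercase letters, such as "a" or "ghij". The word boundaries \b prevent a match inside a longer word.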
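A few contrasting inputs show what the word boundaries do: uppercase words, words of five or more letters, and letters glued to digits do not match, while a short word followed by punctuation does.

```r
library(stringr)

x <- c("the cat", "ABCD", "abcde", "ab1", "hi!")

# TRUE only where a whole word of 1-4 lowercase letters appears
str_detect(x, "\\b[a-z]{1,4}\\b")  # TRUE FALSE FALSE FALSE TRUE

# Pull out every such word from a sentence
str_extract_all("the quick brown fox", "\\b[a-z]{1,4}\\b")[[1]]  # "the" "fox"
```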

  3. .*?\.txt$
pattern3 <- ".*?\\.txt$"
example3 <- c(".txt", "123.txt", "abc.txt", "a$b#1.txt")
str_detect(example3, pattern3)
## [1] TRUE TRUE TRUE TRUE

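Strings that end in ".txt", such as file names with a .txt extension. The lazy .*? accepts any prefix, including an empty one, so ".txt" on its own also matches.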
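Two quick checks: the dot before "txt" is literal and ".txt" must come last, and because .*? may match nothing and the pattern is not anchored at the start, dropping it does not change which strings match.

```r
library(stringr)

x <- c("notes.txt", ".txt", "notes.txt.bak", "notes_txt")

# ".txt" must be the last four characters; the escaped dot is literal
str_detect(x, ".*?\\.txt$")  # TRUE TRUE FALSE FALSE

# Same TRUE/FALSE result without the leading .*?
identical(str_detect(x, ".*?\\.txt$"), str_detect(x, "\\.txt$"))  # TRUE
```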

  4. \d{2}/\d{2}/\d{4}
pattern4 <- "\\d{2}/\\d{2}/\\d{4}"
example4 <- c("12/31/2017", "01/01/2018")
str_detect(example4, pattern4)
## [1] TRUE TRUE

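Dates written as two digits, two digits, and four digits separated by slashes, such as MM/DD/YYYY (or DD/MM/YYYY). The pattern checks only the digit layout, so it does not reject impossible dates such as "99/99/9999".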
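Since the regular expression only checks the layout, a real date check needs a calendar step. The sketch below pairs an anchored version of the pattern with as.Date(); the helper is_mdy() is a hypothetical name and assumes the MM/DD/YYYY reading.

```r
library(stringr)

x <- c("12/31/2017", "99/99/9999", "1/1/2018", "due 01/02/2018")

# Layout check only: two digits / two digits / four digits, anywhere in the string
str_detect(x, "\\d{2}/\\d{2}/\\d{4}")  # TRUE TRUE FALSE TRUE

# Hypothetical helper: whole-string layout check plus a calendar check (MM/DD/YYYY assumed)
is_mdy <- function(s) {
  str_detect(s, "^\\d{2}/\\d{2}/\\d{4}$") &
    !is.na(as.Date(s, format = "%m/%d/%Y"))  # NA means not a real date
}
is_mdy(x)  # TRUE FALSE FALSE FALSE
```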

  5. <(.+?)>.+?</\1>
pattern5 <- "<(.+?)>.+?</\\1>"
example5 <- "<h1>blah</h1>"
str_detect(example5, pattern5)
## [1] TRUE

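An HTML- or XML-style element: an opening tag such as <h1>, at least one character of content, and a closing tag. The backreference \1 requires the closing tag to repeat the name captured in the opening tag, so "<h1>blah</h1>" matches but "<h1>blah</h2>" does not.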
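The backreference is easiest to see on a few contrasting strings: mismatched tags, an empty element, and an unclosed tag all fail. str_match() also returns the captured group, here the tag name.

```r
library(stringr)

x <- c("<h1>blah</h1>", "<h1>blah</h2>", "<b></b>", "<p>text")
pattern5 <- "<(.+?)>.+?</\\1>"

# Matching tag names and at least one character of content are required
str_detect(x, pattern5)  # TRUE FALSE FALSE FALSE

# Column 2 of str_match() holds the captured tag name
str_match("<h1>blah</h1>", pattern5)[, 2]  # "h1"
```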

  1. The following code hides a secret message. Crack it with R and regular expressions. Hint: Some of the characters are more revealing than others! The code snippet is also available in the materials at www.r-datacollection.com.
SecretMessage <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
Test1 <- unlist(str_extract_all(SecretMessage, "[[:upper:].!]+"))  # keep only capitals, periods, and "!"
Test2 <- str_replace_all(paste(Test1, collapse = ""), "[.]", " ")   # periods separate the words
Test2
## [1] "CONGRATULATIONS YOU ARE A SUPERNERD!"
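An equivalent way to decode the message, sketched below, deletes every character that is not an uppercase letter, a period, or "!" with str_remove_all() and then turns the periods into spaces. The message is repeated so the block runs on its own.

```r
library(stringr)

SecretMessage <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"

# Drop everything except capitals, periods, and the exclamation mark
kept <- str_remove_all(SecretMessage, "[^[:upper:].!]")
# Periods mark the word breaks
decoded <- str_replace_all(kept, "\\.", " ")
decoded  # "CONGRATULATIONS YOU ARE A SUPERNERD!"
```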