library(stringr)

Load raw data

raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

Extract all names

name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}")) 
name
## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"

Rearrange the vector so that every name reads first name, then last name

name
## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"
# Split "Last, First" on the comma and trim the leading space left behind
fl_extract <- str_trim(unlist(str_split(name[5], ",")))
fl_extract
## [1] "Simpson" "Homer"
new_name <- str_c(fl_extract[2], fl_extract[1], sep = " ")
new_name
## [1] "Homer Simpson"
name[5] <- new_name
fl_extract <- str_trim(unlist(str_split(name[2], ",")))
fl_extract
## [1] "Burns"         "C. Montgomery"
new_name <- str_c(fl_extract[2], fl_extract[1], sep = " ")
new_name
## [1] "C. Montgomery Burns"
name[2] <- new_name

Show all new names

name
## [1] "Moe Szyslak"          "C. Montgomery Burns"  "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Homer Simpson"        "Dr. Julius Hibbert"
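The two manual swaps above could also be done in one vectorized step. The following is a minimal sketch, assuming every name is written either as "First Last" or as "Last, First"; `names_raw` and `names_fixed` are illustrative names and not part of the original script.

```r
library(stringr)

# Hypothetical input: the names as extracted from raw.data
names_raw <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
               "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Capture the text before and after ", " and swap the two groups.
# Names without a comma do not match and are returned unchanged.
names_fixed <- str_replace(names_raw, "^(.+), (.+)$", "\\2 \\1")
names_fixed
```

Because the pattern consumes the comma and the space after it, the swapped names come out without a leading blank.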

Positions of the characters whose name starts with a title (pmatch returns indices, not TRUE/FALSE values)

pmatch(c("Dr.", "Rev."), name)
## [1] 6 3
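If a TRUE/FALSE value per name is wanted instead of positions, `str_detect` is one option. This is only a sketch: it assumes "Dr." and "Rev." are the only titles in the data, and `names_fixed` and `has_title` are illustrative names.

```r
library(stringr)

# Hypothetical input: the rearranged names
names_fixed <- c("Moe Szyslak", "C. Montgomery Burns", "Rev. Timothy Lovejoy",
                 "Ned Flanders", "Homer Simpson", "Dr. Julius Hibbert")

# TRUE where the name begins with one of the known titles followed by a period
has_title <- str_detect(names_fixed, "^(Dr|Rev)\\.")
has_title
# FALSE FALSE TRUE FALSE FALSE TRUE
```

Listing the titles explicitly keeps the initial "C." from being mistaken for a title.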

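Vector checking if a character has a second name. Note that this pattern only looks for a letter, a blank, and another letter, so it matches every name and does not isolate second names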

str_extract(name, "[[:alpha:]][[:blank:]][[:alpha:]]")
## [1] "e S" "y B" "y L" "d F" "r S" "s H"
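Since the pattern above finds a match in every name, a stricter check might drop the title first and then count the remaining words. A rough sketch, assuming titles are limited to "Dr." and "Rev." and that a second name means more than two words are left; `names_fixed`, `no_title`, and `has_second_name` are illustrative names.

```r
library(stringr)

# Hypothetical input: the rearranged names
names_fixed <- c("Moe Szyslak", "C. Montgomery Burns", "Rev. Timothy Lovejoy",
                 "Ned Flanders", "Homer Simpson", "Dr. Julius Hibbert")

# Remove a leading title and any blanks after it
no_title <- str_replace(names_fixed, "^(Dr|Rev)\\.\\s*", "")

# Count runs of word characters; more than two suggests a second name
has_second_name <- str_count(no_title, "\\w+") > 2
has_second_name
# FALSE TRUE FALSE FALSE FALSE FALSE
```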

Question 4 - Describe the type of strings each expression matches and construct an example

4.1 One or more digits followed by a literal dollar sign. Because the $ is escaped, it is not an end-of-string anchor. ex: “87$”

#[0-9]+\\$
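A quick check of this expression, as a sketch with made-up test strings (`pattern_41` and `res_41` are illustrative names):

```r
library(stringr)

pattern_41 <- "[0-9]+\\$"

# The escaped $ is a literal dollar sign, so digits at the end of a
# string only match when a "$" follows them
res_41 <- str_extract(c("Total: 87$", "Street number 87"), pattern_41)
res_41
# "87$" NA
```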

4.2 A word boundary, then 1 to 4 lowercase letters a-z, followed by another word boundary; that is, a whole lowercase word of at most 4 letters. ex: “book”

#\\b[a-z]{1,4}\\b
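To see which words qualify, the expression can be run over a short sentence. A sketch with a made-up sentence; `pattern_42` and `res_42` are illustrative names.

```r
library(stringr)

pattern_42 <- "\\b[a-z]{1,4}\\b"

# Only whole words of 1 to 4 lowercase letters are returned;
# longer words such as "about" are skipped entirely
res_42 <- unlist(str_extract_all("a book about regular expressions", pattern_42))
res_42
# "a" "book"
```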

4.3 Any characters (zero or more, matched lazily) followed by a literal “.txt” at the end of the string. ex: “file.txt”

#.*?\\.txt$
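A short test of the end-of-string anchor, as a sketch with made-up file names (`pattern_43` and `res_43` are illustrative names):

```r
library(stringr)

pattern_43 <- ".*?\\.txt$"

# ".txt" must be the last four characters; the part before it may be empty
res_43 <- str_detect(c("file.txt", "notes.txt.bak", ".txt"), pattern_43)
res_43
# TRUE FALSE TRUE
```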

4.4 2 digits, a forward slash, 2 more digits, another forward slash, and then 4 digits, like a date. ex: 02/01/2018

#\\d{2}/\\d{2}/\\d{4}
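The expression can be tried on text that contains a date. A sketch with made-up strings; `pattern_44` and `res_44` are illustrative names.

```r
library(stringr)

pattern_44 <- "\\d{2}/\\d{2}/\\d{4}"

# Two-digit day and month are required, so "2/1/2018" does not match
res_44 <- str_extract(c("Due on 02/01/2018 at noon", "Due on 2/1/2018"), pattern_44)
res_44
# "02/01/2018" NA
```

The pattern checks only the shape of the string, not whether the date is valid.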

4.5 An opening tag in angle brackets, at least one character of content, and a closing tag whose name repeats the captured opening tag through the backreference \\1, like an HTML element. ex: “<b>bold</b>”

#<(.+?)>.+?</\\1>
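The backreference can be checked against matching and mismatched tags. A sketch with made-up HTML fragments; `pattern_45`, `res_45`, and `res_45_bad` are illustrative names.

```r
library(stringr)

pattern_45 <- "<(.+?)>.+?</\\1>"

# Each match needs a closing tag with the same name as the opening tag
res_45 <- unlist(str_extract_all("<b>bold</b> and <i>italic</i>", pattern_45))
res_45
# "<b>bold</b>" "<i>italic</i>"

# A mismatched closing tag yields no match
res_45_bad <- str_extract("<b>bold</i>", pattern_45)
res_45_bad
# NA
```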