Assignment 3

Copy the introductory example. The vector name stores the extracted names.

Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.
Construct a logical vector indicating whether a character has a title (i.e., Rev. or Dr.).
Construct a logical vector indicating whether a character has a second name.

library(stringr)
raw.data <-"555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))

#a Rearrange the vector so that all elements conform to the standard first_name last_name
name <- str_replace_all(str_replace_all(name, "(.+)(, .+)$", "\\2 \\1"), ", ", "")

#b A logical vector indicating whether a character has a title (i.e. Rev. and Dr.)
title <- str_detect(name, "[[:alpha:]]{2,}\\.")

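#c A logical vector indicating whether a character has a second name.
# Heuristic: a second name appears here as a single uppercase initial followed by a period (e.g. "C.").
middle_name <- str_detect(name, "[[:upper:]]\\.")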

print(data.frame(name, title, middle_name))

##                   name title middle_name
## 1          Moe Szyslak FALSE       FALSE
## 2  C. Montgomery Burns FALSE        TRUE
## 3 Rev. Timothy Lovejoy  TRUE       FALSE
## 4         Ned Flanders FALSE       FALSE
## 5        Homer Simpson FALSE       FALSE
## 6   Dr. Julius Hibbert  TRUE       FALSE
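
As a side note on part a, the surname swap can also be sketched in one pass by leaving the comma outside both capture groups. This is only an illustrative variant, not the solution used above; the vector `mixed` is a hypothetical input made up for the example.

```r
library(stringr)

# Hypothetical input: two "last, first" entries and one already in order
mixed <- c("Burns, C. Montgomery", "Simpson, Homer", "Moe Szyslak")

# Group 1 is the surname, group 2 is everything after the comma and space.
# The comma sits outside both groups, so it is dropped in the same pass.
reordered <- str_replace(mixed, "^(.+), (.+)$", "\\2 \\1")
print(reordered)
```

Elements without a comma do not match the pattern and are returned unchanged.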

Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.

#a [0-9]+\\$
# This regular expression matches one or more digits (0-9) immediately followed by a literal dollar sign.
q4a <- "We write dollar amounts as $1000 not 1000$"
unlist(str_extract_all(q4a, "[0-9]+\\$"))

## [1] "1000$"
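
The escape matters here: without the backslashes, `$` is an end-of-string anchor rather than a literal character. A small sketch of the difference, using a made-up vector `amounts`:

```r
library(stringr)

# Hypothetical test strings
amounts <- c("1000$", "$1000", "costs 25$ today")

# Escaped: digits followed by a literal dollar sign
escaped <- str_detect(amounts, "[0-9]+\\$")

# Unescaped: digits at the very end of the string
anchored <- str_detect(amounts, "[0-9]+$")

print(data.frame(amounts, escaped, anchored))
```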

#b \\b[a-z]{1,4}\\b
# This regular expression matches a whole word made of 1 to 4 lowercase letters; \\b marks the word boundaries.
q4b <- "The Best preparation for tomorrow is doing your best today"
unlist(str_extract_all(q4b, "\\b[a-z]{1,4}\\b"))

## [1] "for"  "is"   "your" "best"
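
To see what the word boundaries contribute, the same character class can be tried with and without `\\b`. This is a rough sketch on a shortened, made-up sentence:

```r
library(stringr)

phrase <- "doing your best today"

# With boundaries: only whole words of 1 to 4 lowercase letters
with_bounds <- str_extract_all(phrase, "\\b[a-z]{1,4}\\b")[[1]]

# Without boundaries: longer words are chopped into pieces of up to 4 letters
no_bounds <- str_extract_all(phrase, "[a-z]{1,4}")[[1]]

print(with_bounds)
print(no_bounds)
```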

#c .*?\\.txt$
# This regular expression matches a string that ends in the literal text .txt; the lazy .*? picks up whatever precedes it.
q4c <- c("This is a text", "This is my file.txt")
unlist(str_extract_all(q4c, ".*?\\.txt$"))

## [1] "This is my file.txt"
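
A possible use of this pattern is filtering file names by extension. The sketch below assumes a hypothetical vector `files` and also shows why the dot needs escaping: an unescaped `.` matches any character, including the underscore in "report_txt".

```r
library(stringr)

# Hypothetical file names
files <- c("notes.txt", "notes.txt.bak", "report_txt", "a.txt")

# Escaped dot: a literal ".txt" at the end of the string
strict <- str_detect(files, "\\.txt$")

# Unescaped dot: any character followed by "txt" at the end
loose <- str_detect(files, ".txt$")

print(data.frame(files, strict, loose))
```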

#d \\d{2}/\\d{2}/\\d{4}
# This regular expression matches two digits, a forward slash, two digits, a forward slash, and four digits, in that order.
q4d <- c("This is a date 09/05/2018", "This is a division 2/4/8")
unlist(str_extract_all(q4d, "\\d{2}/\\d{2}/\\d{4}"))

## [1] "09/05/2018"
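
Note that the pattern checks only the shape of the string, not whether it is a real date, and it is not anchored. A minimal sketch with made-up inputs:

```r
library(stringr)

# Hypothetical candidates: a plausible date, a short form, an impossible date,
# and a longer digit run that happens to contain the shape
candidates <- c("09/05/2018", "2/4/8", "99/99/9999", "123/45/67890")

found <- str_extract(candidates, "\\d{2}/\\d{2}/\\d{4}")
print(found)
```

The third element matches even though it is not a valid date, and the fourth yields a match from the middle of the string because nothing restricts what surrounds the digits.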

#e <(.+?)>.+?</\\1>
# This regular expression matches an opening tag (one or more characters inside angle brackets), then one or more characters of content, then a closing tag whose name repeats the opening one; \\1 is a backreference to the captured name.
q4e <- c("<tag> </tag>", "<tag>Hello World!</tag>", "<taG>Hello World!</tag>")
unlist(str_extract_all(q4e, "<(.+?)>.+?</\\1>"))

## [1] "<tag> </tag>"            "<tag>Hello World!</tag>"
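
The role of the backreference can be made visible with `str_match()`, which returns the full match alongside each capture group. The sketch below uses hypothetical tags and is only meant to illustrate the mechanism:

```r
library(stringr)

tag_pattern <- "<(.+?)>.+?</\\1>"

# Hypothetical inputs: matching tags, then mismatched tags
snippets <- c("<b>bold</b>", "<b>bold</i>")

# Column 1 is the whole match, column 2 is the captured tag name
m <- str_match(snippets, tag_pattern)
print(m)

# An empty element does not match, because .+? needs at least one character
empty_ok <- str_detect("<tag></tag>", tag_pattern)
print(empty_ok)
```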

The following code hides a secret message. Crack it with R and regular expressions.

q9 <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"

# At a glance, the code appears to mix digits, lowercase letters, uppercase letters, and some punctuation.
# unlist(str_extract_all(q9, "\\d"))
# unlist(str_extract_all(q9, "[:lower:]"))
# unlist(str_extract_all(q9, "[:punct:]"))
# unlist(str_extract_all(q9, "[:upper:]"))
message <- unlist(str_extract_all(q9, "[[:upper:].!]")) # this could be the message; let's clean it
paste(str_replace(message, "[.]", " "), collapse = "")

## [1] "CONGRATULATIONS YOU ARE A SUPERNERD!"
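
The same idea can be sketched the other way around: instead of extracting the characters to keep, remove everything else and work on a single string. The short string `coded` is a made-up stand-in for q9, so this shows the approach only and is not a second decoding of the actual message.

```r
library(stringr)

# Hypothetical miniature of the puzzle: noise around uppercase letters, "." and "!"
coded <- "a1Hb2Ic3.d4Y5Oe6U!"

# Drop every character that is not an uppercase letter, a period, or "!"
kept <- str_remove_all(coded, "[^[:upper:].!]")

# Periods stand in for spaces
decoded <- str_replace_all(kept, fixed("."), " ")
print(decoded)
```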

Saayed Alam

September 15, 2018