Please deliver links to an R Markdown file (on GitHub and rpubs.com) with solutions to problems 3 and 4 from chapter 8 of Automated Data Collection with R. Problem 9 is extra credit.

Problems

3. Copy the introductory example. The vector name stores the extracted names.

library(stringr)
raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
main_names <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))  # extract names only.
main_names
## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"

(a) Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.

names1 <- str_replace(main_names, "\\s[A-Za-z]\\. ", " ")  # drop a single-letter initial such as "C."
names2 <- str_replace(names1, "(\\w+),\\s(\\w+)", "\\2 \\1")  # swap "last_name, first_name" into "first_name last_name"
final_names <- sub("[A-Za-z]{2,3}\\. ", "", names2)  # remove titles such as "Rev." and "Dr."
final_names 
## [1] "Moe Szyslak"      "Montgomery Burns" "Timothy Lovejoy" 
## [4] "Ned Flanders"     "Homer Simpson"    "Julius Hibbert"
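
A note on character classes: a range such as [A-z] is sometimes used as shorthand for "any letter", but in ASCII order it also spans the six punctuation characters that sit between Z and a. The sketch below illustrates this, assuming stringr's default (ICU) regex engine; the vector name between_Z_and_a is made up for illustration.

```r
library(stringr)

# The six ASCII characters located between "Z" and "a" (hypothetical test vector)
between_Z_and_a <- c("[", "\\", "]", "^", "_", "`")

# [A-z] covers the whole code point range from "A" to "z", punctuation included
in_range <- str_detect(between_Z_and_a, "[A-z]")

# [A-Za-z] lists the two letter ranges explicitly and skips the punctuation
in_letters <- str_detect(between_Z_and_a, "[A-Za-z]")

in_range
in_letters
```

For the six names in this exercise both classes give the same result, so the choice only matters for input that contains such punctuation.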

(b) Construct a logical vector indicating whether a character has a title.

title <- str_detect(names2, "[A-Za-z]{2,3}\\. ")  # TRUE when a two- or three-letter abbreviation is followed by ". "
df_title <- data.frame(names2,title)
df_title 
##                 names2 title
## 1          Moe Szyslak FALSE
## 2     Montgomery Burns FALSE
## 3 Rev. Timothy Lovejoy  TRUE
## 4         Ned Flanders FALSE
## 5        Homer Simpson FALSE
## 6   Dr. Julius Hibbert  TRUE

(c) Construct a logical vector indicating whether a character has a second name.

second_name <- str_detect(main_names, " [A-Z]\\. ")  # a single capital initial such as " C. " counts as a second name
df_name <- data.frame(main_names,second_name)
df_name
##             main_names second_name
## 1          Moe Szyslak       FALSE
## 2 Burns, C. Montgomery        TRUE
## 3 Rev. Timothy Lovejoy       FALSE
## 4         Ned Flanders       FALSE
## 5       Simpson, Homer       FALSE
## 6   Dr. Julius Hibbert       FALSE

4. Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.

  1. [0-9]+\$

It matches one or more digits (0 to 9) immediately followed by a literal dollar sign. The backslash escapes $, which would otherwise anchor the end of the string.

reg_exp <- "The average salary amount for data scientist can be over 100000$"
str_extract_all(reg_exp, "[0-9]+\\$" )
## [[1]]
## [1] "100000$"
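
To see why the escape matters, the following sketch compares the escaped and unescaped patterns. The strings in price are hypothetical examples, not taken from the exercise.

```r
library(stringr)

# Hypothetical example strings: one ends with "$", one ends with digits
price <- c("It costs 100000$", "It costs 100000")

# Escaped: digits followed by a literal "$" character
escaped <- str_extract(price, "[0-9]+\\$")

# Unescaped: "$" is an anchor, so this asks for digits at the very end of the string
unescaped <- str_extract(price, "[0-9]+$")

escaped    # "100000$" NA
unescaped  # NA        "100000"
```

In R source the regex \$ is written as "\\$" because the backslash itself has to be escaped inside a string literal.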
  1. \b[a-z]{1,4}\b

It matches whole words made of one to four lowercase letters; \b marks a word boundary (the edge of a word) on each side.

lower <- "automated data collection with r"
unlist(str_extract_all(lower, "\\b[a-z]{1,4}\\b"))
## [1] "data" "with" "r"
  1. .*?\.txt$

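It matches any string that ends in .txt: .*? lazily matches whatever comes first, \. is a literal period, and $ anchors the match to the end of the string.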

text <- c("You should write a text with 500 words", "You can save the file.txt")
unlist(str_extract_all(text, ".*?\\.txt$"))
## [1] "You can save the file.txt"
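
The lazy quantifier .*? only changes the result when the pattern is not anchored. A small sketch of the difference, using a made-up string files:

```r
library(stringr)

# Hypothetical string containing two ".txt" names
files <- "notes.txt and draft.txt"

lazy     <- str_extract(files, ".*?\\.txt")   # stops at the first ".txt"
greedy   <- str_extract(files, ".*\\.txt")    # runs to the last ".txt"
anchored <- str_extract(files, ".*?\\.txt$")  # "$" forces the match to reach the end

lazy      # "notes.txt"
greedy    # "notes.txt and draft.txt"
anchored  # "notes.txt and draft.txt"
```

With the $ anchor in place, the lazy and greedy forms return the same match.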
  1. \d{2}/\d{2}/\d{4}

It matches two digits, a forward slash, two digits, a forward slash, and four digits, which is the shape of a date such as dd/mm/yyyy or mm/dd/yyyy.

digit <- c("Today is  09/11/2019", "Can you guess 9/11/9")
unlist(str_extract_all(digit, "\\d{2}/\\d{2}/\\d{4}"))
## [1] "09/11/2019"
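
The pattern checks only the shape of the string, not whether the date is valid, and without anchors it also matches inside longer digit runs. A brief sketch with hypothetical inputs (the vector candidates is made up for illustration):

```r
library(stringr)

# Hypothetical inputs: a plausible date, an impossible date, and extra digits on both sides
candidates <- c("09/11/2019", "99/99/9999", "109/11/20199")

# Unanchored: any substring with the right shape counts
loose <- str_detect(candidates, "\\d{2}/\\d{2}/\\d{4}")

# Anchored with ^ and $: the whole string must have the right shape
strict <- str_detect(candidates, "^\\d{2}/\\d{2}/\\d{4}$")

loose   # TRUE TRUE TRUE
strict  # TRUE TRUE FALSE
```

Neither version rejects "99/99/9999"; checking calendar validity would need a date parser rather than a regular expression.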
  1. <(.+?)>.+?</\1>

It matches an opening tag in angle brackets, at least one character of content, and a closing tag whose name repeats the opening one; \1 is a backreference to the text captured by (.+?). This is the shape of a simple HTML or XML element.

compare <- c("<p> Assignment number three </p>")
unlist(str_extract_all(compare, "<(.+?)>.+?</\\1>"))
## [1] "<p> Assignment number three </p>"
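
The backreference can be checked with a few hypothetical snippets. The sketch below assumes stringr and uses a made-up vector html; it shows that the closing tag must repeat exactly what the group captured, so tags with attributes do not match.

```r
library(stringr)

pattern <- "<(.+?)>.+?</\\1>"

# Hypothetical snippets: matching tags, mismatched tags, and a tag with an attribute
html <- c("<b>bold</b>", "<b>bold</i>", "<a href=\"x\">link</a>")

matched <- str_detect(html, pattern)

# Column 2 of str_match() holds the text captured by the first group
tag_name <- str_match(html[1], pattern)[, 2]

matched   # TRUE FALSE FALSE
tag_name  # "b"
```

The third snippet fails because the group captures the attribute along with the tag name, and that whole text would have to reappear in the closing tag.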

9. The following code hides a secret message. Crack it with R and regular expressions. Hint: Some of the characters are more revealing than others!

clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5 fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr

secret_message <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"

Get all capital letters and periods

secret_message1 <- unlist(str_extract_all(secret_message, "[A-Z.]"))  # keep only capital letters and periods
secret_message1
##  [1] "C" "O" "N" "G" "R" "A" "T" "U" "L" "A" "T" "I" "O" "N" "S" "." "Y"
## [18] "O" "U" "." "A" "R" "E" "." "A" "." "S" "U" "P" "E" "R" "N" "E" "R"
## [35] "D"

Collapse the extracted characters into a single string

secret_message1 <- paste(secret_message1, collapse = "")
secret_message1
## [1] "CONGRATULATIONS.YOU.ARE.A.SUPERNERD"

Replace ‘.’ with a space

secret_message1 <- str_replace_all(secret_message1, "[.]", " ")  # store the result so the variable holds the decoded text
secret_message1
## [1] "CONGRATULATIONS YOU ARE A SUPERNERD"
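
The three steps can also be combined into one expression. The sketch below uses a short made-up string, hidden, as a stand-in for the real message. It also adds "!" to the character class as an assumption about what should be kept: the original message contains an exclamation mark near the end that a class of capital letters and periods alone leaves out.

```r
library(stringr)

# Hypothetical stand-in for the coded message, built the same way:
# capital letters, periods and "!" carry the text, everything else is noise
hidden <- "h1Hq7Ew2Lz9Lx0O.w3Wo4Ok5Rm6Ln7Dp!x"

# Extract the revealing characters, glue them together, turn periods into spaces
revealing <- str_extract_all(hidden, "[A-Z.!]")[[1]]
decoded <- str_replace_all(str_c(revealing, collapse = ""), fixed("."), " ")

decoded  # "HELLO WORLD!"
```

Using fixed(".") treats the period as a plain character, which avoids the need to escape it or wrap it in a character class.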