Please deliver links to an R Markdown file (in GitHub and rpubs.com) with solutions to problems 3 and 4 from chapter 8 of Automated Data Collection in R. Problem 9 is extra credit. You may work in a small group, but please submit separately with names of all group participants in your submission.
Here is the referenced code for the introductory example in #3:
library(stringr)
raw.data <-"555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
raw_names <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
raw_names
## [1] "Moe Szyslak" "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Simpson, Homer" "Dr. Julius Hibbert"
a. Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.
First remove the titles and the middle initial:
no_title_names <- str_trim(sub("[A-Za-z]{1,}\\.","",raw_names))
no_title_names
## [1] "Moe Szyslak" "Burns, Montgomery" "Timothy Lovejoy"
## [4] "Ned Flanders" "Simpson, Homer" "Julius Hibbert"
Reverse the last name and first name:
real_name <- sub("(\\w+),\\s+(\\w+)","\\2 \\1", no_title_names )
real_name
## [1] "Moe Szyslak" "Montgomery Burns" "Timothy Lovejoy"
## [4] "Ned Flanders" "Homer Simpson" "Julius Hibbert"
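The two steps above can be folded into one function. The following is a minimal sketch in base R rather than stringr; `normalize_name` is a hypothetical helper name that does not appear in the original solution, and `raw_names` is retyped here so the block runs on its own.

```r
# Hypothetical helper (not part of the original solution): rewrite a name
# as "first_name last_name" using base R only.
normalize_name <- function(x) {
  # Drop the first run of letters that ends in a period (a title or an initial).
  x <- sub("[A-Za-z]+\\.", "", x)
  # Swap "Last, First" into "First Last"; \\s+ absorbs any leftover spaces.
  x <- sub("(\\w+),\\s+(\\w+)", "\\2 \\1", x)
  # Trim the whitespace left behind by the removals.
  trimws(x)
}

raw_names <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
               "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")
normalize_name(raw_names)
```

Because `sub()` replaces only the first match, a name that carried both a title and an initial would need `gsub()` in the first step; none of the six names here does.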
b. Construct a logical vector indicating whether a character has a title:
title <- str_detect(raw_names, "[[:alpha:]]{2,}\\.")
title
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
raw_names[title]
## [1] "Rev. Timothy Lovejoy" "Dr. Julius Hibbert"
c. Construct a logical vector indicating whether a character has a second name:
second_name <- str_detect(raw_names, "[A-Z]\\.{1}")
second_name
## [1] FALSE TRUE FALSE FALSE FALSE FALSE
raw_names[second_name]
## [1] "Burns, C. Montgomery"
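The two detections differ only in how many letters precede the period. As a rough side-by-side check, the sketch below uses base R `grepl()` instead of `str_detect()`; the names `has_title` and `has_initial` are illustrative and not from the original.

```r
raw_names <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
               "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Two or more letters before the period: a title such as "Rev." or "Dr."
has_title <- grepl("[[:alpha:]]{2,}\\.", raw_names)

# A single capital letter directly before the period: an initial such as "C."
has_initial <- grepl("[A-Z]\\.", raw_names)

# One row per name, so the two flags can be compared at a glance.
data.frame(name = raw_names, has_title, has_initial)
```

This relies on titles ending in a lowercase letter. An all-caps title such as "DR." would also be flagged as an initial, so the patterns are only safe for data shaped like this example.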
a. [0-9]+\$: Matches one or more digits followed by a literal “$”.
x <- c("1$", "153$", "$123", "abc1$", "12345", "abc9$2", "we are having$45$ fun$35$")
a <- unlist(str_extract_all(x,"[0-9]+\\$"))
a
## [1] "1$" "153$" "1$" "9$" "45$" "35$"
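To see which test strings fail the pattern, and not only which substrings are extracted, a small base R check along these lines may help. `matched` is an illustrative name.

```r
x <- c("1$", "153$", "$123", "abc1$", "12345", "abc9$2",
       "we are having$45$ fun$35$")

# TRUE where the string contains at least one digit run followed by "$".
matched <- grepl("[0-9]+\\$", x)

# The strings that do not match: the "$" comes first, or there is no "$" at all.
x[!matched]
```

The two rejected strings show that the dollar sign has to come after the digits, not before them.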
b. \b[a-z]{1,4}\b: Matches a whole word of one to four lowercase letters.
t <- "Iam studying at99 cuny. I work for10 CTS. i like playing wit$ my son. My son's name is Arnav ."
b <- unlist(str_extract_all(t, "\\b[a-z]{1,4}\\b"))
b
## [1] "cuny" "work" "i" "like" "wit" "my" "son" "son" "s" "name"
## [11] "is"
Note: The word boundary \b treats the apostrophe as a separator, so “son's” is split into “son” and “s”.
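If the apostrophe should count as part of a word, one possible workaround is to replace `\b` with lookarounds that reject a neighbouring word character or apostrophe. This is only a sketch using base R with `perl = TRUE`; the variable names `txt` and `words` are illustrative and not from the original.

```r
txt <- "Iam studying at99 cuny. I work for10 CTS. i like playing wit$ my son. My son's name is Arnav ."

# (?<![\\w']) : the match must not be preceded by a word character or "'"
# (?![\\w'])  : the match must not be followed by a word character or "'"
pattern <- "(?<![\\w'])[a-z]{1,4}(?![\\w'])"

# Lookarounds need the PCRE engine, hence perl = TRUE.
words <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]]
words
```

With this pattern neither half of “son's” is returned, because the apostrophe blocks both the “son” before it and the “s” after it.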
c. .*?\.txt$: Matches a string that ends in “.txt”.
y <- c("Iam writing my output to a .txt file", ".txt", "i have a txt file", ".txtfile", "All .txt's are in a folder", "no file", "100.txt", "5txt files", "I have kept my records in a file that is .txt")
c <- unlist(str_extract_all(y, ".*?\\.txt$"))
c
## [1] ".txt"
## [2] "100.txt"
## [3] "I have kept my records in a file that is .txt"
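When the goal is only to test whether a string ends in “.txt”, the leading `.*?` is not needed. A rough base R equivalent, with `ends_in_txt` as an illustrative name, could look like this:

```r
y <- c("Iam writing my output to a .txt file", ".txt", "i have a txt file",
       ".txtfile", "All .txt's are in a folder", "no file", "100.txt",
       "5txt files", "I have kept my records in a file that is .txt")

# "$" anchors the match at the end, so ".txt" must be the last four characters.
ends_in_txt <- grepl("\\.txt$", y)

# Positions of the strings that end in ".txt".
which(ends_in_txt)
```

The escaped dot matters: without the backslashes, `.` would match any character and “5txt” at the end of a string would also qualify.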
d. \d{2}/\d{2}/\d{4}: Matches two digits, a “/”, two more digits, another “/”, and then four digits.
p <- c("03/12/2008", "4/09/1979","04/20/59","03/09/2008", "1a/2b/2001", "12202008", "aa/bb/cccc", "02/16/2018", "aaa12/12/2008", "bbbb/03/12/2020cccc")
d <- unlist(str_extract_all(p,"\\d{2}/\\d{2}/\\d{4}"))
d
## [1] "03/12/2008" "03/09/2008" "02/16/2018" "12/12/2008" "03/12/2020"
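The last two results come from strings that only contain a date-like substring. To accept a string only when the whole of it has this shape, the pattern can be anchored with `^` and `$`. The sketch below uses base R and writes `\d` as `[0-9]`; `partial` and `whole` are illustrative names.

```r
p <- c("03/12/2008", "4/09/1979", "04/20/59", "03/09/2008", "1a/2b/2001",
       "12202008", "aa/bb/cccc", "02/16/2018", "aaa12/12/2008",
       "bbbb/03/12/2020cccc")

# Unanchored: the pattern may match anywhere inside the string.
partial <- grepl("[0-9]{2}/[0-9]{2}/[0-9]{4}", p)

# Anchored: the whole string must consist of the pattern and nothing else.
whole <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4}$", p)

# Strings that contain a match but are not themselves of that form.
p[partial & !whole]
```

Neither version checks that the digits form a valid calendar date; a string such as 99/99/9999 would pass both.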
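e. <(.+?)>.+?</\1>: Matches an opening tag, at least one character of content, and a closing tag with the same name.
z <- c("<d> The superbowl was very exciting.</d>.", "<b> I watched <it on </tv/>", "The score was <41> eagles and <32> patriots")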
e <- unlist(str_extract_all(z, "<(.+?)>.+?</\\1>"))
e
## [1] "<d> The superbowl was very exciting.</d>"
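The backreference `\1` is what ties the closing tag to the opening one. As a small illustration, the sketch below repeats the extraction in base R with `perl = TRUE` and then uses the same capture group to pull out the tag name; `hits` and `tag_name` are illustrative names.

```r
z <- c("<d> The superbowl was very exciting.</d>.",
       "<b> I watched <it on </tv/>",
       "The score was <41> eagles and <32> patriots")

pattern <- "<(.+?)>.+?</\\1>"

# regexpr() finds the first match per string; regmatches() keeps only the hits.
hits <- regmatches(z, regexpr(pattern, z, perl = TRUE))

# Replace each hit by its first capture group, i.e. the tag name.
tag_name <- sub(pattern, "\\1", hits, perl = TRUE)

hits
tag_name
```

The second and third strings contain angle brackets but no closing tag that repeats the opening one, so only the first string yields a match.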
secret_message <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
f <- unlist(str_extract_all(secret_message,"[A-Z]"))
f
## [1] "C" "O" "N" "G" "R" "A" "T" "U" "L" "A" "T" "I" "O" "N" "S" "Y" "O"
## [18] "U" "A" "R" "E" "A" "S" "U" "P" "E" "R" "N" "E" "R" "D"
Read in order, the extracted capital letters spell the secret message: “CONGRATULATIONS YOU ARE A SUPER NERD.”
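The capital letters alone carry no word breaks. The string also contains a few periods and an exclamation mark, and keeping them appears to recover the spacing. This is a sketch in base R that treats each period as a word separator, which is an interpretation and not something the exercise states; `chars` and `decoded` are illustrative names.

```r
secret_message <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"

# Keep capital letters plus "." and "!" (both are literal inside brackets).
chars <- regmatches(secret_message, gregexpr("[A-Z.!]", secret_message))[[1]]

# Join the characters, then turn each period into a space (assumed separator).
decoded <- gsub(".", " ", paste(chars, collapse = ""), fixed = TRUE)
decoded
```

Read this way, the last two words run together as a single word ending in “!”.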