Simpsons!
library(stringr)
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy", "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")
The target format is first_name last_name, so any name that contains a comma must be in last_name, first_name order and needs rearranging. If it has only blank spaces as separators, then it is already in the correct order.
str_detect(name, ",")
## [1] FALSE TRUE FALSE FALSE TRUE FALSE
We know there are two that are not already in the correct format - the second and fifth elements of name. To conform to the first_name last_name format, we only need to change those elements. Let’s do this with a copy.
formatted.name <- name
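reformat.names <- function(x){
  # Find the entries of x (the argument, not the global name) that contain a comma
  comma <- str_detect(x, ",")
  comma.pos <- which(comma)
  for(i in comma.pos){
    # Position of the comma within this entry
    l <- str_locate(x[i], ",")
    last <- str_sub(x[i], end = l[1] - 1)
    # Skip the comma and the space that follows it
    first <- str_sub(x[i], start = l[1] + 2)
    full.name <- str_c(first, last, sep = " ")
    x[i] <- full.name
  }
  return(x)
}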
formatted.name <- reformat.names(formatted.name)
formatted.name
## [1] "Moe Szyslak" "C. Montgomery Burns" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Homer Simpson" "Dr. Julius Hibbert"
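The loop above can also be written as a single vectorized replacement. This is only a minimal sketch, assuming every reversed entry has exactly the form "last, first" with one comma followed by one space; the name swapped.name is introduced here purely for illustration.

```r
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Capture the text before and after ", " and swap the two groups.
# Entries without a comma do not match and are returned unchanged.
swapped.name <- str_replace(name, "^(.+), (.+)$", "\\2 \\1")
swapped.name
```

Because str_replace works on the whole vector at once, no index bookkeeping is needed, but the pattern is less forgiving than the loop if an entry deviates from the assumed "last, first" shape.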
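To flag the names that carry a title, we can use the str_detect function in stringr.
# The dots are escaped so that they match a literal period rather than any character
titles.name <- str_detect(name, "Dr\\.|Rev\\.")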
titles.name
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
We could even expand on it a little more by adding in other titles, such as str_detect(name, "Dr\\.|Rev\\.|Mr\\.|Mrs\\."). The periods are escaped so that each one matches a literal "." instead of any character.
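One caveat with title patterns like these is that an unescaped period in a regular expression matches any character. The sketch below illustrates the difference on two made-up names (no.title, unescaped, and escaped are hypothetical names used only for this example); neither name carries a title.

```r
library(stringr)

# Hypothetical test names: neither one has a title
no.title <- c("Drew Miller", "Reva Smith")

# Unescaped: "Dr." matches "Dre" and "Rev." matches "Reva"
unescaped <- str_detect(no.title, "Dr.|Rev.|Mr.|Mrs.")

# Escaped: a literal period is required after the abbreviation
escaped <- str_detect(no.title, "Dr\\.|Rev\\.|Mr\\.|Mrs\\.")

unescaped
escaped
```

With the Simpsons names both versions happen to give the same answer, so the difference only shows up on data where a first name begins with the same letters as a title.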
To find out who has a second name, we can combine str_count and str_detect. We assume that if a person has three names, then their entry must have at least two blank spaces (“first last” vs “first mid last”). Of those, we can then check whether the extra space is there because of a title.
second.name <- str_count(name, " ") > 1 & !str_detect(name, "Dr\\.|Rev\\.")
second.name
## [1] FALSE TRUE FALSE FALSE FALSE FALSE
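The check above rules out every titled entry, so it would miss someone who has both a title and a second name. A possible variant, sketched here under the assumption that a title only ever appears at the start of the entry, strips the title first and then counts spaces; no.title.name and second.name.alt are names invented for this illustration.

```r
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Drop a leading "Dr. " or "Rev. " so that a title never counts as a name
no.title.name <- str_remove(name, "^(Dr|Rev)\\.\\s+")

# More than one remaining space means more than two name parts
second.name.alt <- str_count(no.title.name, " ") > 1
second.name.alt
```

On this data the result is the same as before: only the second entry is flagged.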
ex1 <- "I never know if it's 32$ or $32."
str_extract_all(ex1, "[0-9]+\\$")
## [[1]]
## [1] "32$"
ex2 <- "This should extract all words of FOUR lowercase letters or Less."
str_extract_all(ex2, "\\b[a-z]{1,4}\\b")
## [[1]]
## [1] "all" "of" "or"
ex3 <- "Was the file test.txt or test1.txt?\nActually, it's test2.txt\n"
str_extract_all(ex3, ".*?\\.txt$")
## [[1]]
## [1] "Actually, it's test2.txt"
ex4 <- "Do you write your birthday 6/6/80 or 06/06/1980 or do you use dashes?"
str_extract_all(ex4, "\\d{2}/\\d{2}/\\d{4}")
## [[1]]
## [1] "06/06/1980"
ex5 <- "<h2>I <b>HATE</b> having to format my blog <i>manually</i>! <img scr = angry_cute_cat.gif /> "
str_extract_all(ex5, "<(.+?)>.+?</\\1>")
## [[1]]
## [1] "<b>HATE</b>" "<i>manually</i>"
At first glance, the code seems like a bunch of gibberish; however, the hint points out that “some of the characters are more revealing than others”. On closer inspection, some of the letters are capitalized. Looking at the first line alone, you can easily pick out C O N G R A T, which is pretty close to “Congratulations” or “CONGRATS!”. So let’s try that out first.
code <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5 fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
answer <- str_extract_all(code, "[:upper:]")
answer
## [[1]]
## [1] "C" "O" "N" "G" "R" "A" "T" "U" "L" "A" "T" "I" "O" "N" "S" "Y" "O"
## [18] "U" "A" "R" "E" "A" "S" "U" "P" "E" "R" "N" "E" "R" "D"
Oops, that’s still a little hard to decipher, but clearly we’re on the right track. Looking at the original string again, now that we know which letters matter, we see that there are strategically placed periods separating each word, like a telegram, and an exclamation mark at the end. Let’s extract those as well.
answer <- str_extract_all(code, "[[:upper:].!]")
cat(unlist(answer))
## C O N G R A T U L A T I O N S . Y O U . A R E . A . S U P E R N E R D !
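As a final touch, the extracted characters can be glued back together into a readable sentence. This is a small sketch rather than part of the original solution; message.text is a name introduced here for illustration.

```r
library(stringr)

code <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5 fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"

# Keep uppercase letters, periods, and exclamation marks, as in the step above
answer <- str_extract_all(code, "[[:upper:].!]")

# Glue the pieces into one string, then turn each literal period into a space
message.text <- str_c(unlist(answer), collapse = "")
message.text <- str_replace_all(message.text, fixed("."), " ")
message.text
## [1] "CONGRATULATIONS YOU ARE A SUPERNERD!"
```

Wrapping the period in fixed() tells stringr to treat it as a literal character, which avoids having to escape it.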