(a) Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.
Solution:
# Load the stringr library
library(stringr)
# Input the data
raw.data <-"555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
# Extract just the names (by looking for letters)
name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
name
## [1] "Moe Szyslak" "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Simpson, Homer" "Dr. Julius Hibbert"
# Run a loop through the subset name. If there's a comma in that name, split it into a 1x2 vector, and switch the order. Lastly, 'trim' the whitespace from the left side.
for (i in 1:(length(name)))
{
pat <- ","
if (grepl(pat, name[i]) == 1)
{
separate_names <- unlist(str_split(name[i], ","))
name[i] <- str_c(separate_names[2], separate_names[1], sep = " ")
}
}
str_trim(name, side = "left")
## [1] "Moe Szyslak" "C. Montgomery Burns" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Homer Simpson" "Dr. Julius Hibbert"
(b) Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).
Solution:
# Search our subset for either of these titles
grep("(Rev.|Dr.)", name)
## [1] 3 6
# Identify the two results
name[c(3,6)]
## [1] "Rev. Timothy Lovejoy" "Dr. Julius Hibbert"
(c) Construct a logical vector indicating whether a character has a second name.
Solution:
second_name <- str_count(fixed(name[c(1,2,4,5)]), "\\w+")
grep("3", second_name)
## [1] 2
name[2]
## [1] " C. Montgomery Burns"
Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.
# Matches one or more digits followed by a dollar sign.
pat <- "[0-9]+\\$"
ex <- c("1$", "abc123$", "123$abc")
grepl(pat, ex)
## [1] TRUE TRUE TRUE
# Matches between 1 and 4 consecutive lowercase letters
pat <- "\\b[a-z]{1,4}\\b"
ex <- c("alpha", "beta", "gamma", "ZeTa", "eta")
grepl(pat, ex)
## [1] FALSE TRUE FALSE FALSE TRUE
# Matches wildcard string ending in .txt
pat <- ".*?\\.txt$"
ex <- c("abc.txt", "123.txt", "123abc123.txt", "%#LMV#Ckdnf3i2.txt")
grepl(pat, ex)
## [1] TRUE TRUE TRUE TRUE
# Matches a string of digits in the format 12/34/5678
pat <- "\\d{2}/\\d{2}/\\d{4}"
ex <- c("98/76/5432", "1/2/3", "12/34/56")
grepl(pat, ex)
## [1] TRUE FALSE FALSE
# Matches
The following code hides a secret message. Crack it with R and regular expressions.
clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5 fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr
Solution:
My first guess was all the capital letters, since there were only a handful of them. Extracting each manually, I found this: CONGRATULATIONSYOUAREASUPERNERD. Now to do it in R:
# Load the encrypted message
encrypt <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
# Extract just the capital letters
decrypt <- unlist(str_extract_all(encrypt, "[[:upper:]]{1,}"))
decrypt
## [1] "C" "O" "N" "G" "R" "A" "T" "U" "L" "AT" "I" "O" "N" "S"
## [15] "Y" "O" "U" "A" "R" "E" "A" "S" "U" "P" "E" "R" "N" "E"
## [29] "R" "D"