library(stringr)
names <- c('Moe Szyslak', 'Burns, C. Montgomery', 'Rev. Timothy Lovejoy', 'Ned Flanders', 'Simpson, Homer', 'Dr. Julius Hibbert')
names
## [1] "Moe Szyslak" "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Simpson, Homer" "Dr. Julius Hibbert"
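# Split "Last, First" entries at the comma; names without a comma stay whole
names <- str_split(names, ",")
# Trim the whitespace left around each piece
names <- sapply(names, str_trim)
# Reverse the pieces so the first name comes first
names <- sapply(names, rev)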
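Ducks. In. A. Line. Now we need to merge the pieces back together. This is tricky, since the function you’d expect to work, str_c, doesn’t work well on a list: sep only joins separate arguments, so each two-piece element comes back as the text of a c(...) call instead of a joined name.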
str_c(names, sep = " ")
## [1] "Moe Szyslak" "c(\"C. Montgomery\", \"Burns\")"
## [3] "Rev. Timothy Lovejoy" "Ned Flanders"
## [5] "c(\"Homer\", \"Simpson\")" "Dr. Julius Hibbert"
For that we can use a for loop. The loop takes each element of the list in turn, collapses its pieces into a single string separated by a space, and moves on to the next element.
# Collapse each element's pieces into a single space-separated string
for (i in seq_along(names)) {
  names[[i]] <- paste(names[[i]], collapse = " ")
}
unlist(names)
## [1] "Moe Szyslak" "C. Montgomery Burns" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Homer Simpson" "Dr. Julius Hibbert"
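The same merge can also be written without an explicit loop. The sketch below is one possible alternative, not the approach used above. `raw_names`, `parts`, and `merged` are names made up for this illustration, and the data is rebuilt from scratch so the snippet runs on its own.

```r
library(stringr)

# Hypothetical variable names; the data is the same as above
raw_names <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
               "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Split on the comma, then trim and reverse the pieces of each name
parts <- lapply(str_split(raw_names, ","), function(p) rev(str_trim(p)))

# Collapse each element's pieces into one string;
# vapply guarantees a plain character vector rather than a list
merged <- vapply(parts, paste, FUN.VALUE = character(1), collapse = " ")
merged
```

Because `vapply` returns a character vector directly, no final `unlist()` is needed.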
3b. Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.). Simple enough: we just want to check whether the name contains a period preceded by at least two letters (C. Montgomery Burns has a period too, but it marks an abbreviated name, not a title). str_detect does this with a pattern that matches a run of two or more alphabetic characters followed by a literal period (escaped, because an unescaped period is a metacharacter that matches any character).
str_detect(names, "[[:alpha:]]{2,}\\.")
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
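To see why the period has to be escaped, here is a small sketch comparing the two patterns on a name with no title and no period. The variables `plain_name`, `unescaped`, and `escaped` are made up for this illustration.

```r
library(stringr)

plain_name <- "Ned Flanders"  # no title, no period anywhere

# Unescaped: "." matches any character, so the space after "Ned" satisfies it
unescaped <- str_detect(plain_name, "[[:alpha:]]{2,}.")
unescaped
## [1] TRUE

# Escaped: "\\." matches only a literal period, which this name lacks
escaped <- str_detect(plain_name, "[[:alpha:]]{2,}\\.")
escaped
## [1] FALSE
```

With the unescaped version every name in the vector would be flagged as having a title.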
3c. Construct a logical vector indicating whether a character has a second name.
abbrnames <- str_extract(names, "[[:alpha:]]+\\.")
str_length(abbrnames) < 3
## [1] NA TRUE FALSE NA NA FALSE
# Side note: there is probably an easier way to do this. I tried to detect or extract, in a single step, every string containing a lone letter followed by a period, but could not get it to work. This longer version is clunky, and it returns NA rather than FALSE for names that contain no period at all.
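One pattern that appears to do this in a single step uses a word boundary. This is only a sketch, and it assumes, as the code above does, that a second name always shows up as a single-letter initial followed by a period. `full_names` and `has_initial` are names made up for this illustration.

```r
library(stringr)

# The cleaned-up names from 3a, retyped so the snippet runs on its own
full_names <- c("Moe Szyslak", "C. Montgomery Burns", "Rev. Timothy Lovejoy",
                "Ned Flanders", "Homer Simpson", "Dr. Julius Hibbert")

# \\b is a word boundary, so this matches exactly one letter
# standing at the start of a word and followed by a literal period
has_initial <- str_detect(full_names, "\\b[[:alpha:]]\\.")
has_initial
## [1] FALSE  TRUE FALSE FALSE FALSE FALSE
```

Titles such as Rev. and Dr. are not matched because the letter before their period is not at a word boundary, and names without a period come back as FALSE instead of NA.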
title <- c("<title>++BREAKING NEWS+++</title>")
str_extract(title, "<[[:alnum:]]+>")
## [1] "<title>"
binomialstr <- c("(5-3)^2=5^2-2*5*3+3^2")
str_extract(binomialstr, "[\\^\\-0-9=+*()]+")
## [1] "(5-3)^2=5^2-2*5*3+3^2"
code <- c("clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr")
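# Drop every lowercase letter and digit, leaving capitals and punctuation
newcode <- str_replace_all(code, "[[:lower:][:digit:]]", "")
# The remaining literal periods mark the word breaks
newcode <- str_replace_all(newcode, "\\.", " ")
newcode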
## [1] "CONGRATULATIONS YOU ARE A SUPERNERD!"