## [1] "Moe Szyslak" "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Simpson,Homer" "Dr. Julius Hibbert"
# separate last_name, first_name by comma
last_first <- str_split(name, ",")
# reorder to first_name last_name
for (i in seq_along(name)) {
  name[i] <- paste(str_trim(rev(last_first[[i]])), collapse = " ")
}
name
## [1] "Moe Szyslak" "C. Montgomery Burns" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Homer Simpson" "Dr. Julius Hibbert"
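The split-and-reverse loop above can also be written as a single vectorized replacement. The following is a minimal sketch, not the method used above: `raw_names` and `reordered` are hypothetical names, and the input is a small sample in the same format as the original vector.

```r
library(stringr)

# Hypothetical sample input mirroring the format of the original names
raw_names <- c("Moe Szyslak", "Burns, C. Montgomery", "Simpson,Homer")

# Capture the text before and after the comma and swap the two groups.
# Strings without a comma do not match and are returned unchanged.
reordered <- str_replace(raw_names, "^(.+?),\\s*(.+)$", "\\2 \\1")
reordered
```

The optional `\\s*` after the comma absorbs any space there, so no separate trimming step is needed in this sketch.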
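The titles given end with periods. However, one first name is abbreviated to an initial with a period as well ("C."), so the number of letters before the period has to be considered. The titles have 2 or 3 letters before the period, while the initial has only 1, so requiring at least 2 letters before the period separates the two cases. Using these criteria, a regular expression can be created: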
(has_title <- str_detect(name, "[:alpha:]{2,}\\."))
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
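To see which part of each name the pattern picks up, the same expression can be passed to `str_extract()` instead of `str_detect()`. This is an illustrative sketch on a hypothetical three-name sample (`full_names` and `title` are names introduced here):

```r
library(stringr)

# Hypothetical sample: one initial, two titles
full_names <- c("C. Montgomery Burns", "Rev. Timothy Lovejoy", "Dr. Julius Hibbert")

# A single letter before the period (an initial) does not satisfy {2,},
# so the first element yields NA rather than "C."
title <- str_extract(full_names, "[[:alpha:]]{2,}\\.")
title
```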
Characters with a second name have an extra space in their full name. Because a title also adds a space, the threshold is 2 spaces for names with a title and 1 space otherwise:
(second_name <- str_count(name, " ") > ifelse(has_title, 2, 1))
## [1] FALSE TRUE FALSE FALSE FALSE FALSE
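An alternative to adjusting the space threshold is to strip a leading title first and then count the remaining name parts. The sketch below assumes a title always appears at the start of the string and uses the hypothetical names `full_names`, `without_title`, and `has_second_name`:

```r
library(stringr)

# Hypothetical sample in "first last" order
full_names <- c("C. Montgomery Burns", "Rev. Timothy Lovejoy", "Ned Flanders")

# Drop a leading title: two or more letters followed by a period
without_title <- str_remove(full_names, "^[[:alpha:]]{2,}\\.\\s*")

# More than two whitespace-separated parts implies a second name
has_second_name <- str_count(without_title, "\\S+") > 2
has_second_name
```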
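The expression, as written, returns the entire string because the + quantifier is greedy and matches as many characters as possible before the final >: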
str_extract("<title>+++BREAKING NEWS+++</title>", "<.+>")
## [1] "<title>+++BREAKING NEWS+++</title>"
This can be corrected by adding a question mark after the quantifier, which makes it lazy so that it matches the shortest sequence of characters between a < and the next >:
str_extract("<title>+++BREAKING NEWS+++</title>", "<.+?>")
## [1] "<title>"
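The difference between the greedy and lazy quantifier is easier to see when all matches are extracted. The following sketch uses `str_extract_all()` on the same string; the variable names and the negated-class variant `<[^>]+>` are additions for illustration and are not part of the exercise:

```r
library(stringr)

html <- "<title>+++BREAKING NEWS+++</title>"

# Greedy: one match spanning from the first "<" to the last ">"
greedy <- str_extract_all(html, "<.+>")[[1]]

# Lazy: each tag is matched separately
lazy <- str_extract_all(html, "<.+?>")[[1]]

# Alternative without a lazy quantifier: exclude ">" from the repeated set
negated <- str_extract_all(html, "<[^>]+>")[[1]]

greedy
lazy
negated
```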
The expression, as written, returns only a dash:
str_extract("(5-3)^2=5^2-2*5*3+3^2 conforms to the binomial theorem", "[^0-9=+*()]+")
## [1] "-"
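This fails because a caret at the start of a character class negates the class, so the expression matches the first run of characters that are not listed, which is the dash. The caret needs to be escaped to be treated as a literal character. A literal dash also needs to be added to the class (and the one in 0-9, which defines the digit range, needs to remain).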
str_extract("(5-3)^2=5^2-2*5*3+3^2 conforms to the binomial theorem", "[\\^0-9=+*()\\-]+")
## [1] "(5-3)^2=5^2-2*5*3+3^2"
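Extracting every match of the original expression makes the negation visible: each result is a run of characters that are not in the listed set. This is a small sketch with hypothetical variable names (`equation`, `negated_runs`, `formula`):

```r
library(stringr)

equation <- "(5-3)^2=5^2-2*5*3+3^2 conforms to the binomial theorem"

# Leading caret negates the class: matches runs of characters NOT listed
negated_runs <- str_extract_all(equation, "[^0-9=+*()]+")[[1]]

# Escaped caret and escaped dash are literals: matches the formula itself
formula <- str_extract(equation, "[\\^0-9=+*()\\-]+")

negated_runs
formula
```

The negated version returns the dashes, the carets, and the trailing sentence, which are exactly the pieces the corrected class has to account for or stop at.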