R> name
[1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
[4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")
# First name: a word followed by a space, or the final word after ", " or ". "
first <- unlist(str_extract_all(name, '\\w+ |, \\w+$|\\. \\w+$'))
first <- unlist(str_extract_all(first, '\\w+'))
# Last name: a word preceded by a word boundary and a space, or a word followed by a comma
last <- unlist(str_extract_all(name, '\\b \\w+|\\w+,'))
last <- unlist(str_extract_all(last, '\\w+'))
full_name <- paste(first, last)
full_name
## [1] "Moe Szyslak" "Montgomery Burns" "Timothy Lovejoy"
## [4] "Ned Flanders" "Homer Simpson" "Julius Hibbert"
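The same rearrangement can also be done by substitution instead of extraction. The following is a minimal sketch of that alternative, not the method used above. It uses only base R (`sub` and `gsub`), and the variable name `clean` and the three-step order are assumptions made for illustration.

```r
# Illustrative alternative using base R only.
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Step 1: drop a leading title such as "Rev. " or "Dr. "
# (two or more letters followed by a period and a space).
clean <- sub("^[A-Za-z]{2,}\\. ", "", name, perl = TRUE)

# Step 2: turn "Last, First" into "First Last" using backreferences.
clean <- sub("^(\\w+), (.+)$", "\\2 \\1", clean, perl = TRUE)

# Step 3: drop a single-letter initial such as "C. ".
clean <- gsub("\\b[A-Z]\\. ", "", clean, perl = TRUE)

print(clean)
```

The steps depend on their order: the title is removed first so that the comma swap in step 2 only ever sees name parts.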
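# Logical vector: does the name carry a title? The periods are escaped so they match literally.
title <- str_detect(name, 'Rev\\.|Dr\\.')
# Logical vector: does the name include a second name (a single initial followed by a period)?
second_name <- str_detect(name, ' [A-Z]\\. ')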
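Printing the two logical vectors makes the detection easy to check against the names. The sketch below uses base R's `grepl` as a stand-in for `str_detect`; the variable names `has_title` and `has_second_name` and the exact patterns are assumptions made for illustration.

```r
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# A title is "Rev" or "Dr" at the start, followed by a literal period and a space.
has_title <- grepl("^(Rev|Dr)\\. ", name, perl = TRUE)

# A second name shows up here as a single capital initial followed by a period.
has_second_name <- grepl("\\b[A-Z]\\. ", name, perl = TRUE)

print(has_title)
# [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
print(has_second_name)
# [1] FALSE  TRUE FALSE FALSE FALSE FALSE
```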
4. Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.
One or more digits followed by a dollar sign
unlist(str_extract_all('212$13 33$', '[0-9]+\\$'))
## [1] "212$" "33$"
A standalone word of one to four lowercase letters, delimited by word boundaries
unlist(str_extract_all('a wordExample4 2 word', '\\b[a-z]{1,4}\\b'))
## [1] "a" "word"
Any string, possibly with nothing before the extension, that ends with .txt
unlist(str_extract_all('a wordExample4 2 word.txt', '.*?\\.txt$'))
## [1] "a wordExample4 2 word.txt"
unlist(str_extract_all('.txt', '.*?\\.txt$'))
## [1] ".txt"
A date written with '/' separators: two digits, two digits, and four digits, as in dd/mm/yyyy
unlist(str_extract_all('23/02/1902 23/12/1998', '\\d{2}/\\d{2}/\\d{4}'))
## [1] "23/02/1902" "23/12/1998"
An HTML- or XML-style element: an opening tag, some content, and a closing tag with the same name (enforced by the backreference \\1)
unlist(str_extract_all('<html>content</html>, <tag>text</tag>', '<(.+?)>.+?</\\1>'))
## [1] "<html>content</html>" "<tag>text</tag>"
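The backreference is what ties the closing tag to the opening one. As a small sketch of that behaviour, the check below uses base R's `grepl` with `perl = TRUE`. The mismatched sample `<a>text</b>` and the variable names are made up for illustration.

```r
# Same pattern as above: \\1 must repeat whatever the first group captured.
pattern <- "<(.+?)>.+?</\\1>"

samples <- c("<html>content</html>",  # matching pair
             "<tag>text</tag>",       # matching pair
             "<a>text</b>")           # closing tag differs from opening tag

matched <- grepl(pattern, samples, perl = TRUE)
print(matched)
# [1]  TRUE  TRUE FALSE
```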
Reference: https://github.com/wwells/CUNY_DATA_607/blob/master/Week3/Regex_Week3.Rmd