Copy the introductory example. The vector name stores the extracted names. R> name [1] “Moe Szyslak” “Burns, C. Montgomery” “Rev. Timothy Lovejoy” [4] “Ned Flanders” “Simpson, Homer” “Dr. Julius Hibbert”
library(stringr)
raw.data <-"555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
# Remove the title and exchange last name and first name if the it has" ,"
arrange_name<-str_replace(str_replace(name,"\\w{2,3}\\. ",""),"(\\w+), (.+)","\\2 \\1")
#remove C.
arrange_name<-str_replace(arrange_name,"C. ","")
arrange_name
## [1] "Moe Szyslak" "Montgomery Burns" "Timothy Lovejoy"
## [4] "Ned Flanders" "Homer Simpson" "Julius Hibbert"
str_detect(name,"[[:alpha:]]{2,3}[\\.]")
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
str_detect(name,"[:upper:]. ")
## [1] FALSE TRUE FALSE FALSE FALSE FALSE
Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.
[0-9]means the expression has numbers from 0 to 9. Except the numbers, it should also has $ sign. “\” means literal match.
name1<-"455$"
str_detect(name1,"[0-9]+\\$")
## [1] TRUE
[a-z] means the experession has the lower case letter from a to z,and {1,4} limits the word to be at least one but no more than four. The symble / b match the empty string at either edge of a word.
name2<-"our"
str_detect(name2,"\\b[a-z]{1,4}\\b")
## [1] TRUE
Any expression end with.txt. “.” means match any character,"*“means the preceding item will be matched zero or more times.”?" means the preceding item is optional and will be matched at most once. Double slash before “.” means literal match.
name3<-"our.txt"
str_detect(name3,".*?\\.txt$")
## [1] TRUE
Any expression with two digits, forward slash, two digits, forward slash, and then 4 digits.
name4<-"09/15/2019"
str_detect(name4,"\\d{2}/\\d{2}/\\d{4}")
## [1] TRUE
Any expression starts with <> and end with </>. “.” means match any character,“?” means preceding item is optional and will be matched at most once.“+?” looks for any tag, not one in particular. “//1” is a reference back to what was inside the starting <>
name5<-"<html>abc</html>"
str_detect(name5,"<(.+?)>.+?</\\1>")
## [1] TRUE
The following code hides a secret message. Crack it with R and regular expressions. Hint: Some of the characters are more revealing than others! The code snippet is also available in the materials at www.r-datacollection.com. clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5 fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr
code<-"clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
# Extract upper case letters, "." and ”!“
decode<-unlist(str_extract_all(code,"[[:upper:].!]"))
#replace . as blank
paste(str_replace(decode, "[.]", " "), collapse = "")
## [1] "CONGRATULATIONS YOU ARE A SUPERNERD!"