Copy the introductory example. The vector name stores the extracted names.
R> name
[1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
[4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"
(a) Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.
(b) Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).
(c) Construct a logical vector indicating whether a character has a second name.
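library(stringr)  # str_detect(), str_extract_all(), str_replace_all()
# The vector printed in the exercise
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")
# (a). Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.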
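# Remove titles ("Rev. ", "Dr. ") and middle initials ("C. ") completely, then swap "Last, First" into "First Last".
# [[:alpha:]] is used instead of [A-z], whose range can also cover non-letters such as _ and ^.
firstName_lastName <- sub("([[:alpha:]]+), *([[:alpha:]]+)", "\\2 \\1", sub("[[:alpha:]]+\\. ", "", name))
firstName_lastName
## [1] "Moe Szyslak" "Montgomery Burns" "Timothy Lovejoy"
## [4] "Ned Flanders" "Homer Simpson" "Julius Hibbert"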
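As a cross-check on part (a), the same rearrangement can be written as a small base R function that runs without stringr. This is a sketch, not the exercise's intended solution: the name `toFirstLast` is hypothetical, and it assumes that titles are two or more letters plus a period at the start of the name, that middle initials are a single letter plus a period, and that a comma always separates the last name from the first name.

```r
# Hypothetical helper: rewrite names as "first_name last_name".
# Assumes titles look like "Rev." or "Dr." at the start, middle initials
# look like "C.", and "Last, First" names use a comma.
toFirstLast <- function(x) {
  # Drop a leading title such as "Rev. " or "Dr. "
  x <- sub("^[[:alpha:]]{2,}\\. ", "", x)
  # Drop a single-letter middle initial such as " C. " (keep one space)
  x <- sub(" [[:alpha:]]\\. ", " ", x)
  # Swap "Last, First" into "First Last"; other names are left unchanged
  sub("^([[:alpha:]]+), ([[:alpha:]]+)$", "\\2 \\1", x)
}

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")
firstLast <- toFirstLast(name)
print(firstLast)
```

Splitting the work into three `sub()` calls keeps each pattern responsible for one kind of clutter, which makes it easier to adjust if, say, the middle initial should be kept instead of dropped.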
# (b). Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).
# A title is a word of two or more letters followed by a period at the very start of the name.
titleCheckVector <- str_detect(name, "^[[:alpha:]]{2,}\\. ")
titleCheckDF <- data.frame(name, titleCheckVector)
names(titleCheckDF) <- c("Name", "Has Title?")
titleCheckDF
## Name Has Title?
## 1 Moe Szyslak FALSE
## 2 Burns, C. Montgomery FALSE
## 3 Rev. Timothy Lovejoy TRUE
## 4 Ned Flanders FALSE
## 5 Simpson, Homer FALSE
## 6 Dr. Julius Hibbert TRUE
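Part (b) can likewise be answered with base R's `grepl()`. The sketch below also illustrates why a POSIX class such as `[[:alpha:]]` is safer than a range like `[A-z]`: with PCRE (`perl = TRUE`) ranges are compared by code point, so `[A-z]` also covers the six characters between `Z` and `a` (`[`, `\`, `]`, `^`, `_` and the backtick). The test strings `"_^"` and `"Dr_"` are made up for illustration.

```r
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Base R equivalent of the str_detect() call: a word of two or more
# letters followed by a period at the very start of the string
hasTitle <- grepl("^[[:alpha:]]{2,}\\.", name)
print(hasTitle)

# With PCRE, [A-z] spans code points 0x41..0x7A, which includes [ \ ] ^ _ `
probe <- c("Dr", "_^", "Dr_")
rangeHits <- grepl("^[A-z]+$", probe, perl = TRUE)
# [[:alpha:]] accepts letters only
alphaHits <- grepl("^[[:alpha:]]+$", probe, perl = TRUE)
print(rbind(probe, rangeHits, alphaHits))
```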
# (c). Construct a logical vector indicating whether a character has a second name.
# In this data a second name appears as an abbreviated middle name between spaces, e.g. " C. "
secondNameCheckVector <- str_detect(name, " [[:alpha:]]+\\. ")
secondNameCheckDF <- data.frame(name, secondNameCheckVector)
names(secondNameCheckDF) <- c("Name", "Has Second Name?")
secondNameCheckDF
## Name Has Second Name?
## 1 Moe Szyslak FALSE
## 2 Burns, C. Montgomery TRUE
## 3 Rev. Timothy Lovejoy FALSE
## 4 Ned Flanders FALSE
## 5 Simpson, Homer FALSE
## 6 Dr. Julius Hibbert FALSE
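For part (c), base R can return not only the logical vector but also the abbreviated second name itself. This is a minimal sketch, assuming a second name always appears as a single letter followed by a period and surrounded by spaces (as in `C.`). `regexpr()` reports -1 for names without a match and `regmatches()` drops those elements, so only matched initials come back.

```r
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# TRUE where a space, one letter, a period and a space occur, as in " C. "
hasSecondName <- grepl(" [[:alpha:]]\\. ", name)

# Extract the matched initial and strip the surrounding spaces
initials <- trimws(regmatches(name, regexpr(" [[:alpha:]]\\. ", name)))

print(hasSecondName)
print(initials)
```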
Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.
(a) [0-9]+\$
(b) \b[a-z]{1,4}\b
(c) .*?\.txt$
(d) \d{2}/\d{2}/\d{4}
(e) <(.+?)>.+?</\1>
# (a) [0-9]+\\$ - one or more digits immediately followed by a literal dollar sign
q4aData <- c("Kalyan: 4073647173$", "Partha: 4073647173")
q4aDataDF <- data.frame(q4aData, str_detect(q4aData, "[0-9]+\\$"))
names(q4aDataDF) <- c("Data", "Pattern Matched?")
q4aDataDF
## Data Pattern Matched?
## 1 Kalyan: 4073647173$ TRUE
## 2 Partha: 4073647173 FALSE
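The backslash in answer (a) matters: in the R string `"[0-9]+\\$"` the regex engine sees `\$`, a literal dollar sign, whereas an unescaped `$` anchors the match at the end of the string. A short base R comparison on made-up strings:

```r
x <- c("costs 100$", "costs 100", "$100")

# \\$ in the R string becomes \$ in the regex: a literal dollar sign
literalDollar <- grepl("[0-9]+\\$", x)

# An unescaped $ is the end-of-string anchor: digits at the very end
endAnchor <- grepl("[0-9]+$", x)

print(data.frame(x, literalDollar, endAnchor))
```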
# (b) \\b[a-z]{1,4}\\b - a standalone word of one to four lower-case letters
namesList <- c("tom phan", "chris esser", "eric fisher", "james johnson")
namesListDF <- data.frame(namesList, str_detect(namesList, "\\b[a-z]{1,4}\\b"))
names(namesListDF) <- c("Name", "Pattern Matched?")
namesListDF
## Name Pattern Matched?
## 1 tom phan TRUE
## 2 chris esser FALSE
## 3 eric fisher TRUE
## 4 james johnson FALSE
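The `\\b` anchors in (b) are what restrict the match to whole words. Without them, any run of one to four lower-case letters inside a longer word also counts. The sketch below uses base R with `perl = TRUE`; the example strings are illustrative.

```r
x <- c("tom phan", "chris esser", "Tom Phan")

# Whole words of 1-4 lower-case letters only; "Tom" fails because [a-z]
# is case-sensitive and "om" does not start at a word boundary
withBoundary <- grepl("\\b[a-z]{1,4}\\b", x, perl = TRUE)

# Without \b, "chri" inside "chris" (or "om" inside "Tom") is enough
noBoundary <- grepl("[a-z]{1,4}", x, perl = TRUE)

print(data.frame(x, withBoundary, noBoundary))
```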
# (c) .*?\\.txt$ - any string that ends in ".txt", such as a text file name
filesList <- c("ReadMe.txt", "index.html", "setup.ini", "resume.txt")
filesListDF <- data.frame(filesList, str_detect(filesList, ".*?\\.txt$"))
names(filesListDF) <- c("File Name", "Pattern Matched?")
filesListDF
## File Name Pattern Matched?
## 1 ReadMe.txt TRUE
## 2 index.html FALSE
## 3 setup.ini FALSE
## 4 resume.txt TRUE
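Two details of (c) are easy to miss: the dot before `txt` must be escaped, otherwise it matches any character, and the lazy prefix `.*?` can match the empty string, so it does not change which strings match. A base R check on hypothetical file names:

```r
x <- c("notes.txt", "notes.txt.bak", "notestxt")

# Escaped dot: a literal "." followed by "txt" at the end of the string
escapedDot <- grepl(".*?\\.txt$", x, perl = TRUE)

# Unescaped dot: "notestxt" now matches because "." can be the "s"
unescapedDot <- grepl(".*?.txt$", x, perl = TRUE)

# Dropping the .*? prefix gives the same TRUE/FALSE pattern
withoutPrefix <- grepl("\\.txt$", x)

print(data.frame(x, escapedDot, unescapedDot, withoutPrefix))
```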
# (d) \\d{2}/\\d{2}/\\d{4} - two digits, a slash, two digits, a slash and four digits, i.e. a date written like 03/05/1981
dobList <- c("03/05/1981", "02-09-1976", "04/05/2005", "05-22-2010")
dobListDF <- data.frame(dobList, str_detect(dobList, "(\\d{2}/\\d{2}/\\d{4})"))
names(dobListDF) <- c("DOB", "Pattern Matched?")
dobListDF
## DOB Pattern Matched?
## 1 03/05/1981 TRUE
## 2 02-09-1976 FALSE
## 3 04/05/2005 TRUE
## 4 05-22-2010 FALSE
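Pattern (d) only checks the shape of the text, not whether it is a real date, and because it is not anchored it also matches inside longer strings. Adding `^` and `$` restricts it to strings that consist of the date alone. A sketch with made-up inputs:

```r
x <- c("03/05/1981", "born 03/05/1981", "003/05/19812", "99/99/0000")

# Unanchored: the pattern may sit anywhere inside the string
unanchored <- grepl("\\d{2}/\\d{2}/\\d{4}", x, perl = TRUE)

# Anchored: the whole string must be exactly dd/dd/dddd
anchored <- grepl("^\\d{2}/\\d{2}/\\d{4}$", x, perl = TRUE)

# Note that "99/99/0000" passes both: digits are not checked as a date
print(data.frame(x, unanchored, anchored))
```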
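# (e) <(.+?)>.+?</\\1> - an opening tag, some content, and a closing tag with the same name (\\1 refers back to the first captured group)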
htmlScript <- c("<table>Table</table>", "<tr>Rows</tr>", "<td>Improper cell definition<td>")
htmlScriptDF <- data.frame(htmlScript, str_detect(htmlScript, "<(.+?)>.+?</\\1>"))
names(htmlScriptDF) <- c("HTML Syntax", "Pattern Matched?")
htmlScriptDF
## HTML Syntax Pattern Matched?
## 1 <table>Table</table> TRUE
## 2 <tr>Rows</tr> TRUE
## 3 <td>Improper cell definition<td> FALSE
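Because of the back-reference `\\1`, pattern (e) only matches when the closing tag repeats the name captured in the opening tag. Base R's `regexec()` also returns that captured group, so the tag name can be pulled out directly. A sketch using the same three strings as the answer above; unmatched strings are mapped to `NA`.

```r
htmlScript <- c("<table>Table</table>", "<tr>Rows</tr>",
                "<td>Improper cell definition<td>")

# Each list element holds the full match followed by the captured group,
# or character(0) when the string does not match
m <- regmatches(htmlScript,
                regexec("<(.+?)>.+?</\\1>", htmlScript, perl = TRUE))

# Keep the captured tag name, or NA where there was no match
tagName <- unname(vapply(m, function(v) if (length(v) > 1) v[2] else NA_character_,
                         character(1)))
print(tagName)
```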
The following code hides a secret message. Crack it with R and regular expressions. Hint: Some of the characters are more revealing than others! The code snippet is also available in the materials at www.r-datacollection.com.
clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr
dataString <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
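# The upper-case letters and periods carry the message; extract every run of them
dataStringDecoded <- unlist(str_extract_all(dataString, "[[:upper:].]{1,}"))
# Glue the runs together and turn the periods that separate words into spaces
dataStringDecoded <- str_replace_all(paste(dataStringDecoded, collapse = ''), fixed("."), " ")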
dataStringDecoded
## [1] "CONGRATULATIONS YOU ARE A SUPERNERD"
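An alternative to extracting the matches is to delete everything else. The sketch below uses base R only: `gsub()` with a negated bracket expression removes every character that is not an upper-case letter or a period, and `chartr()` then turns the periods into spaces. The snippet is split with `paste0()` purely to keep the lines short.

```r
dataString <- paste0(
  "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo",
  "Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO",
  "d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5",
  "fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
)

# Delete every character that is neither an upper-case letter nor a period
kept <- gsub("[^[:upper:].]", "", dataString)

# Replace each period with a space to separate the words
decoded <- chartr(".", " ", kept)
print(decoded)
```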