R Markdown

Question 3.

Copy the introductory example. The vector name stores the extracted names.

R> name
[1] "Moe Szyslak" "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
[4] "Ned Flanders" "Simpson, Homer" "Dr. Julius Hibbert"

(a) Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.

library(stringr)

# Keep the string on a single line: a line break inside the quotes would split "Simpson, Homer" into two matches.
raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
step1 <- sub("^[[:alpha:]]{2,}\\. ", "", name)  # remove leading titles such as "Rev." and "Dr."
step2 <- sub(" [[:alpha:]]\\. ", " ", step1)  # remove middle initials such as "C."
standard_name <- sub("^(\\w+), (\\w+)$", "\\2 \\1", step2)  # switch "last, first" to "first last"; names without a comma stay as they are

standard_name
## [1] "Moe Szyslak"      "Montgomery Burns" "Timothy Lovejoy" 
## [4] "Ned Flanders"     "Homer Simpson"    "Julius Hibbert"

(b) Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).

title <- str_detect(name, "^[[:alpha:]]{2,}\\. ")  # TRUE where the name starts with a title
title
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE

(c) Construct a logical vector indicating whether a character has a second name.

second_name <- str_detect(name, " [[:alpha:]]\\. ")  # TRUE where the name contains a middle initial
second_name
## [1] FALSE  TRUE FALSE FALSE FALSE FALSE

nameDF <- data.frame(standard_name, title, second_name)
names(nameDF) <- c("Name", "Title", "Second_Name")
nameDF
##               Name Title Second_Name
## 1      Moe Szyslak FALSE       FALSE
## 2 Montgomery Burns FALSE        TRUE
## 3  Timothy Lovejoy  TRUE       FALSE
## 4     Ned Flanders FALSE       FALSE
## 5    Homer Simpson FALSE       FALSE
## 6   Julius Hibbert  TRUE       FALSE

Question 4.

Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.

(a) [0-9]+\$

This regex matches one or more digits immediately followed by a literal dollar sign.

strA <- "The price of my watch is 67$"

dollarValues <- str_extract_all(strA , "[0-9]+\\$")
dollarValues
## [[1]]
## [1] "67$"
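
To make the description concrete, the sketch below checks a few made-up strings against the same pattern. It uses base R's grepl so that it runs without stringr; the sample strings are assumptions chosen for illustration.

```r
# Illustrative inputs only; they are not part of the exercise.
samples <- c("67$", "$67", "price: 5$ each", "67")

# TRUE where one or more digits are directly followed by a literal "$".
has_dollar_suffix <- grepl("[0-9]+\\$", samples)
print(has_dollar_suffix)
```

Only the first and third strings match, because the dollar sign has to come after the digits.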
(b) \b[a-z]{1,4}\b

This regex matches whole words consisting of one to four lowercase letters.

strB <-  "Lord says, My grace is sufficient for you, for my power is made perfect in weakness. Therefore I will boast all the more gladly about my weaknesses, so that Christ power may rest on me."

words <- str_extract_all(strB , "\\b[a-z]{1,4}\\b")
words
## [[1]]
##  [1] "says" "is"   "for"  "you"  "for"  "my"   "is"   "made" "in"   "will"
## [11] "all"  "the"  "more" "my"   "so"   "that" "may"  "rest" "on"   "me"
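
The role of the word boundaries can be seen on a shorter sentence. This is a small sketch in base R (regmatches with gregexpr and perl = TRUE), and the sentence is a shortened, assumed example rather than the one used above.

```r
# Shortened example sentence, assumed for illustration.
sentence <- "Lord says my power is made perfect"

# \b on both sides forces the match to cover a whole word,
# so "power" (5 letters) and "Lord" (capital L) are skipped.
short_words <- regmatches(
  sentence,
  gregexpr("\\b[a-z]{1,4}\\b", sentence, perl = TRUE)
)[[1]]
print(short_words)
```

Without the boundaries, the pattern would also pick up fragments of longer words such as "powe".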
(c) .*?\.txt$

This regex matches any sequence of characters that ends in .txt, so it can be used to pick out text file names.

filenames <- "Automatic_data_collection.txt"
files <- str_extract_all(filenames , ".*?\\.txt$")
files
## [[1]]
## [1] "Automatic_data_collection.txt"
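
A possible use is filtering a vector of file names. The sketch below assumes a few hypothetical file names and uses base R's grepl with perl = TRUE; it illustrates the pattern and is not part of the exercise.

```r
# Hypothetical file names, used only for illustration.
candidates <- c("Automatic_data_collection.txt", "notes.txt.bak", "report.TXT", ".txt")

# The "$" anchor requires ".txt" at the very end; the match is case-sensitive.
is_txt <- grepl(".*?\\.txt$", candidates, perl = TRUE)
txt_files <- candidates[is_txt]
print(txt_files)
```

Because .*? may match zero characters, the bare string ".txt" also passes, while "notes.txt.bak" and "report.TXT" do not.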
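(d) \d{2}/\d{2}/\d{4}

This regex matches strings with the shape of a dd/mm/yyyy date: two digits, a slash, two digits, a slash, and four digits. It checks only the shape, not whether the date exists. For example, it matches 01/01/1982.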

numeric_value <- "56/56 20/11/2017"
date_pattern <- str_extract_all(numeric_value, "\\d{2}/\\d{2}/\\d{4}")
date_pattern
## [[1]]
## [1] "20/11/2017"
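
Since the pattern describes only the shape of a date, a string such as 56/56/2017 would also match. One way to separate shape from validity is sketched below in base R; the candidate strings, the added ^ and $ anchors, and the day-first reading of the format are assumptions made for this illustration.

```r
# Illustrative candidates: a real date, an impossible date, and a wrong shape.
candidates <- c("20/11/2017", "56/56/2017", "1/1/1982")

# Shape check: anchored so the whole string must look like dd/mm/yyyy.
has_shape <- grepl("^\\d{2}/\\d{2}/\\d{4}$", candidates, perl = TRUE)

# Validity check: as.Date() returns NA when the day or month is impossible.
parsed <- as.Date(candidates, format = "%d/%m/%Y")
is_valid <- has_shape & !is.na(parsed)

print(data.frame(candidates, has_shape, is_valid))
```

The second candidate has the right shape but is rejected by the date parser, and the third parses as a date but lacks the two-digit day and month the regex requires.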
(e) <(.+?)>.+?</\1>

This regex matches an opening tag, the content inside it, and the closing tag. The backreference \1 requires the closing tag to carry the same name as the opening tag.

htmltag <- "5635 <name>charls joseph</name>38787"
tags <- str_extract_all(htmltag , "<(.+?)>.+?</\\1>")
tags
## [[1]]
## [1] "<name>charls joseph</name>"
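
The effect of the backreference shows up when the tags do not agree. The following base R sketch (grepl with perl = TRUE) uses assumed sample strings: one well-formed element, one with a mismatched closing tag, and one with empty content.

```r
tag_pattern <- "<(.+?)>.+?</\\1>"

# Sample strings assumed for illustration.
samples <- c(
  "<name>charls joseph</name>",   # closing tag repeats the opening name
  "<name>charls joseph</title>",  # closing tag differs from the opening name
  "<b></b>"                       # no content between the tags
)

tag_match <- grepl(tag_pattern, samples, perl = TRUE)
print(tag_match)
```

Only the first string matches: the second fails on \1, and the third fails because .+? needs at least one character between the tags.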

Question 9.

The following code hides a secret message. Crack it with R and regular expressions. Hint: Some of the characters are more revealing than others! The code snippet is also available in the materials at www.r-datacollection.com.

clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr

message <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"

secret_message <- str_extract_all(message , "[[:upper:][:punct:]]")
secret_message
## [[1]]
##  [1] "C" "O" "N" "G" "R" "A" "T" "U" "L" "A" "T" "I" "O" "N" "S" "." "Y"
## [18] "O" "U" "." "A" "R" "E" "." "A" "." "S" "U" "P" "E" "R" "N" "E" "R"
## [35] "D" "!"
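
The extracted characters are easier to read once they are joined into one string. The sketch below repeats the extraction in base R (regmatches with gregexpr) so that it runs on its own, then collapses the result; treating each period as a word separator is an assumption about how the message is meant to be read.

```r
message <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"

# Keep only uppercase letters and punctuation, as in the solution above.
chars <- regmatches(message, gregexpr("[[:upper:][:punct:]]", message))[[1]]

# Join the characters, then read each period as a space (an assumption).
joined <- paste(chars, collapse = "")
readable <- gsub(".", " ", joined, fixed = TRUE)
print(readable)
```

This prints "CONGRATULATIONS YOU ARE A SUPERNERD!".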