Raw Data

The given data is unstructured and needs to be structured before it can be processed.
  • Using the stringr library, extract only runs of letters, commas, periods, and spaces; unlist then flattens the result of str_extract_all into a character vector of names.
raw.data <-"555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

library("stringr")

name <- unlist(str_extract_all(raw.data, "[A-Za-z,. ]{2,}"))
name
## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"

Vector

3.1 Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name
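  • Each name is split at the comma with strsplit; sapply then takes the first and the last piece of each split. For names without a comma both pieces are identical, so the ifelse leaves last_name empty and the whole name stays in first_name. For the comma-separated entries, the part before the comma (the family name) ends up in first_name.
  • The knitr and kableExtra libraries display the two vectors in tabular format.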
fullname <- strsplit(name, ',')

first_name <- sapply(fullname, function(x) x[1])
lastname1 <- sapply(fullname, function(x) x[length(x)])
last_name <- ifelse(first_name==lastname1,'',lastname1)

library("knitr")
library("kableExtra")
kable(data.frame(first_name,last_name)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = T, color = "white", background = "#ea7872")
first_name            last_name
Moe Szyslak
Burns                 C. Montgomery
Rev. Timothy Lovejoy
Ned Flanders
Simpson               Homer
Dr. Julius Hibbert
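
For comparison, a minimal base-R sketch that produces the literal "first_name last_name" order. It assumes that a comma always separates "Last, First", that Rev. and Dr. are the only titles, and that titles should be dropped; the names std_name, first, and last are illustrative and not part of the original solution.

```r
# Names as extracted from the raw data above.
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Swap "Last, First" into "First Last"; names without a comma are unchanged.
std_name <- sub("^(.+),\\s*(.+)$", "\\2 \\1", name, perl = TRUE)

# Drop a leading title (assumption: only "Rev." and "Dr." occur).
std_name <- sub("^(Rev|Dr)\\.\\s+", "", std_name, perl = TRUE)

# The last word is the last name; everything before it is the first name
# (including a middle initial, if any).
first <- sub("\\s+\\S+$", "", std_name, perl = TRUE)
last  <- sub("^.*\\s", "", std_name, perl = TRUE)

print(data.frame(first, last))
```

With this variant "Burns, C. Montgomery" becomes "C. Montgomery Burns" and "Simpson, Homer" becomes "Homer Simpson".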
3.2 Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.)
  • Using the str_detect function, flag the names that contain the title Rev. or Dr. The dots are escaped so that they match a literal period rather than any character.
title <- str_detect(first_name, "Rev\\.|Dr\\.")
kable(data.frame(first_name,last_name,title)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#ea7872") %>%
  column_spec(3, background = "#f7f1e1")
first_name            last_name      title
Moe Szyslak                          FALSE
Burns                 C. Montgomery  FALSE
Rev. Timothy Lovejoy                 TRUE
Ned Flanders                         FALSE
Simpson               Homer          FALSE
Dr. Julius Hibbert                   TRUE
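
A dot in a regular expression matches any character, so a title pattern needs an escaped dot to require a literal period. The base-R sketch below illustrates the difference; the names "Drew Smith" and "Reva Jones" are hypothetical test cases, not part of the data set.

```r
# Two real entries plus two hypothetical names that carry no title.
candidates <- c("Dr. Julius Hibbert", "Rev. Timothy Lovejoy",
                "Drew Smith", "Reva Jones")

# Unescaped dot: "Dr." also matches "Dre" and "Rev." also matches "Reva".
loose <- grepl("Rev.|Dr.", candidates, perl = TRUE)

# Escaped dot, anchored at the start of the string.
strict <- grepl("^(Rev|Dr)\\.", candidates, perl = TRUE)

print(data.frame(candidates, loose, strict))
```

Here loose is TRUE for all four names, while strict is TRUE only for the two names that actually carry a title.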
3.3 Construct a logical vector indicating whether a character has a second name
  • Using the str_detect function, flag the names whose last_name still contains a space after trimming, which indicates a second name.
second_name <- str_detect(str_trim(last_name), " ")
kable(data.frame(first_name,last_name,title,second_name)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#ea7872") %>%
  column_spec(3, background = "#f7f1e1") %>%
  column_spec(4, background = "#f7f1e1")
first_name            last_name      title  second_name
Moe Szyslak                          FALSE  FALSE
Burns                 C. Montgomery  FALSE  TRUE
Rev. Timothy Lovejoy                 TRUE   FALSE
Ned Flanders                         FALSE  FALSE
Simpson               Homer          FALSE  FALSE
Dr. Julius Hibbert                   TRUE   FALSE
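
An alternative way to reach the same logical vector, sketched in base R: strip the title and the comma, then count the words that remain. It assumes Rev. and Dr. are the only titles and that a name with more than two words has a second name; stripped, n_words, and has_second_name are illustrative names.

```r
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Remove a leading title, then the comma used in "Last, First" entries.
stripped <- sub("^(Rev|Dr)\\.\\s+", "", name, perl = TRUE)
stripped <- gsub(",", "", stripped, fixed = TRUE)

# Count the whitespace-separated words in each name.
n_words <- lengths(strsplit(stripped, "\\s+", perl = TRUE))

# More than two words means a first name, a second name, and a last name.
has_second_name <- n_words > 2
print(has_second_name)
```

Only "Burns, C. Montgomery" has three words after stripping, so it is the only TRUE.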

Types of Strings

4. Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression
4.1 [0-9]+\$
  • One or more digits followed by a literal dollar sign
string <- c("findonlynumber456$?<>helloyes786*.data")
str_extract(string, "[0-9]+\\$")
## [1] "456$"
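
A few more inputs make the behaviour of this pattern easier to see. This is a base-R sketch with made-up test strings; grepl and regmatches are used here in place of stringr.

```r
x <- c("price: 456$", "$456", "12.50$", "no digits$")
pattern <- "[0-9]+\\$"

# TRUE where a run of digits is immediately followed by "$".
has_match <- grepl(pattern, x, perl = TRUE)

# First match in each string that has one (non-matching strings are dropped).
matched <- regmatches(x, regexpr(pattern, x, perl = TRUE))

print(has_match)  # TRUE FALSE TRUE FALSE
print(matched)    # "456$" "50$"
```

In "12.50$" only "50$" is matched, because the period is not a digit and interrupts the run.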

4.2 \b[a-z]{1,4}\b
  • A whole word of one to four lowercase letters, delimited by word boundaries; if the string contains no such word, the result is NA
string <- c("echo")
str_extract(string, "\\b[a-z]{1,4}\\b")
## [1] "echo"
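
The word boundaries matter more than they might seem: the pattern does not match a short stretch inside a longer word. A base-R sketch with made-up test strings:

```r
x <- c("echo", "a", "hello", "Echo", "go to the park")
pattern <- "\\b[a-z]{1,4}\\b"

# "hello" has five letters; "Echo" starts with an upper-case letter, and
# there is no word boundary between "E" and "cho".
has_match <- grepl(pattern, x, perl = TRUE)
print(has_match)  # TRUE TRUE FALSE FALSE TRUE

# All matching words in the last string.
words <- regmatches(x[5], gregexpr(pattern, x[5], perl = TRUE))[[1]]
print(words)      # "go" "to" "the" "park"
```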

4.3 .*?\.txt$
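  • Any string that ends with the extension .txt; the lazy .*? consumes the characters before the extension and $ anchors the match at the end of the string
string <- c("filename.txt")
str_extract(string, ".*?\\.txt$")
## [1] "filename.txt"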
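
Because of the $ anchor, the lazy quantifier does not shorten the match: it still has to reach the final .txt. A base-R sketch with made-up file names:

```r
x <- c("filename.txt", "notes.txt.bak", "a.txt b.txt", ".txt")
pattern <- ".*?\\.txt$"

# "notes.txt.bak" does not end in ".txt", so it does not match.
has_match <- grepl(pattern, x, perl = TRUE)
print(has_match)  # TRUE FALSE TRUE TRUE

# The match starts at the first character and runs to the end of the string.
matched <- regmatches(x, regexpr(pattern, x, perl = TRUE))
print(matched)    # "filename.txt" "a.txt b.txt" ".txt"
```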

4.4 \d{2}/\d{2}/\d{4}
  • Two digits, two digits, and four digits with / as delimiter, such as a date written as mm/dd/yyyy
string <- c("02/16/2019")
str_extract(string, "\\d{2}/\\d{2}/\\d{4}")
## [1] "02/16/2019"
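
The pattern checks only the shape of the string, not whether it is a valid date. The base-R sketch below uses made-up inputs and, as one possible follow-up check, parses the string with as.Date under the assumption that the format is month/day/year.

```r
x <- c("02/16/2019", "99/99/9999", "2/16/2019", "02-16-2019")
pattern <- "\\d{2}/\\d{2}/\\d{4}"

# "99/99/9999" has the right shape, so it matches as well.
has_shape <- grepl(pattern, x, perl = TRUE)
print(has_shape)  # TRUE TRUE FALSE FALSE

# Parsing rejects impossible dates by returning NA.
parsed <- as.Date(x[1:2], format = "%m/%d/%Y")
print(is.na(parsed))  # FALSE TRUE
```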

4.5 <(.+?)>.+?</\1>
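  • An opening tag, at least one character of content, and a closing tag with the same name, as in <tag>some_text</tag>; the backreference \1 repeats the captured tag name. The extracted match is the whole element, including both tags
string <- c("<xml>find-tag-info</xml>")
str_extract(string, "<(.+?)>.+?</\\1>")
## [1] "<xml>find-tag-info</xml>"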
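
The effect of the backreference shows up when the opening and closing tags differ. A base-R sketch with made-up inputs:

```r
x <- c("<xml>find-tag-info</xml>", "<b>bold</i>", "<a></a>",
       "<p>one</p><p>two</p>")
pattern <- "<(.+?)>.+?</\\1>"

# "<b>bold</i>" fails because the closing tag does not repeat "b";
# "<a></a>" fails because .+? needs at least one character of content.
has_match <- grepl(pattern, x, perl = TRUE)
print(has_match)  # TRUE FALSE FALSE TRUE

# The lazy quantifiers keep each element separate in the last string.
elements <- regmatches(x[4], gregexpr(pattern, x[4], perl = TRUE))[[1]]
print(elements)   # "<p>one</p>" "<p>two</p>"
```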

Secret Message

  • Hint: Some of the characters are more revealing than others!
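  • Following the hint, the upper- and lower-case letters were compared: the upper-case letters spell out the secret message, and the periods mark the spaces between words.
  • str_extract_all pulls out the runs of upper-case letters and periods, str_replace_all turns each period into a space, and paste with collapse joins the pieces into the decoded message.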
msg <- paste("clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr")
msg <- paste(str_replace_all(unlist(str_extract_all(msg, "[[:upper:].]{1,}")),"[.]", " "), collapse="")
msg

## [1] "CONGRATULATIONS YOU ARE A SUPERNERD"
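
The same decoding can also be sketched in base R by deleting everything that is not an upper-case letter or a period, instead of extracting the characters to keep. The names msg_raw, kept, and decoded are illustrative.

```r
# The encoded message, joined from four pieces without separators.
msg_raw <- paste0(
  "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo",
  "Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO",
  "d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5",
  "fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr")

# Delete every character that is neither an upper-case letter nor a period.
kept <- gsub("[^[:upper:].]", "", msg_raw)

# Periods stand for the spaces between words.
decoded <- gsub(".", " ", kept, fixed = TRUE)
print(decoded)
```

Both approaches drop the trailing "!", since it is neither an upper-case letter nor a period.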