Raw Data
The given data is unstructured and needs to be structured before it can be processed.
- Using the stringr library, str_extract_all pulls out runs of two or more letters, commas, periods, and spaces, and unlist flattens the result into a character vector of names.
raw.data <-"555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
library("stringr")
name <- unlist(str_extract_all(raw.data, "[A-Za-z,. ]{2,}"))
name
## [1] "Moe Szyslak" "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Simpson, Homer" "Dr. Julius Hibbert"
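As a rough check on why the pattern asks for at least two characters, the sketch below compares it with a `+` quantifier on the same raw.data string. The variable names loose and strict are introduced here for illustration only. With `+`, the single spaces inside the phone numbers "(636) 555-0113" and "555 8904" are picked up as extra elements.

```r
library(stringr)

raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

# "+" accepts runs of length 1, so lone spaces inside phone numbers also match
loose <- unlist(str_extract_all(raw.data, "[A-Za-z,. ]+"))

# "{2,}" requires at least two characters and drops those lone spaces
strict <- unlist(str_extract_all(raw.data, "[A-Za-z,. ]{2,}"))

length(loose)   # 8 elements, two of them are " "
length(strict)  # 6 elements, one per name
```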
Vector
3.1 Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name
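- The names are split on the comma with strsplit. sapply then takes the part before the comma as first_name and the part after it as last_name. For names without a comma, the whole name stays in first_name and last_name is left empty. Note that for entries written as "last name, first name" (Burns and Simpson), the first_name column therefore holds the surname.
- The knitr and kableExtra libraries are used to display the result as a table.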
fullname <- strsplit(name, ',')
first_name <- sapply(fullname, function(x) x[1])
lastname1 <- sapply(fullname, function(x) x[length(x)])
last_name <- ifelse(first_name==lastname1,'',lastname1)
library("knitr")
library("kableExtra")
kable(data.frame(first_name,last_name)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#ea7872")
| first_name | last_name |
| Moe Szyslak | |
| Burns | C. Montgomery |
| Rev. Timothy Lovejoy | |
| Ned Flanders | |
| Simpson | Homer |
| Dr. Julius Hibbert | |
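The table keeps the two parts of each comma-separated name in separate columns. One possible way to put every element into the "first_name last_name" order that the exercise asks for is sketched below. It assumes that a comma always separates "last name, first name", and the name standard_name is introduced here for illustration only.

```r
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Capture the text before and after the comma and swap the two groups.
# Names without a comma do not match the pattern and are returned unchanged.
standard_name <- str_replace(name, "^(.+?),\\s*(.+)$", "\\2 \\1")
standard_name
```

Titles such as Rev. and Dr. are left in place by this sketch, so they still appear in front of the first name.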
3.2 Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.)
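- The str_detect function flags the names that contain the title Rev. or Dr. The periods are escaped so that they match a literal "." rather than any character.
title <- str_detect(first_name, "Rev\\.|Dr\\.")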
kable(data.frame(first_name,last_name,title)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#ea7872")%>%
column_spec(3, background = "#f7f1e1")
| first_name | last_name | title |
| Moe Szyslak | | FALSE |
| Burns | C. Montgomery | FALSE |
| Rev. Timothy Lovejoy | | TRUE |
| Ned Flanders | | FALSE |
| Simpson | Homer | FALSE |
| Dr. Julius Hibbert | | TRUE |
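In a regular expression an unescaped "." matches any single character, so a pattern such as Dr. also matches a name that merely starts with "Dre". The sketch below illustrates the difference. "Drew Smith" is a made-up name that is not part of the exercise data, and the variable names are hypothetical.

```r
library(stringr)

# "Drew Smith" is a hypothetical name used only to show the false positive
people <- c("Dr. Julius Hibbert", "Drew Smith")

# Unescaped dot: "Dr." matches "Dre" in "Drew"
unescaped <- str_detect(people, "Rev.|Dr.")

# Escaped dot: only a literal period after the title counts
escaped <- str_detect(people, "Rev\\.|Dr\\.")

unescaped  # TRUE TRUE
escaped    # TRUE FALSE
```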
3.3 Construct a logical vector indicating whether a character has a second name
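- The str_detect function checks whether the trimmed last_name column still contains a space, which indicates that more than one name follows the comma (for example "C. Montgomery").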
second_name <- str_detect(str_trim(last_name), " ")
kable(data.frame(first_name,last_name,title,second_name)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#ea7872") %>%
column_spec(3, background = "#f7f1e1")%>%
column_spec(4, background = "#f7f1e1")
| first_name | last_name | title | second_name |
| Moe Szyslak | | FALSE | FALSE |
| Burns | C. Montgomery | FALSE | TRUE |
| Rev. Timothy Lovejoy | | TRUE | FALSE |
| Ned Flanders | | FALSE | FALSE |
| Simpson | Homer | FALSE | FALSE |
| Dr. Julius Hibbert | | TRUE | FALSE |
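The space check depends on the comma-separated layout of the names. A possible alternative, sketched below under the assumption that the names are already in "first_name last_name" order and that Rev. and Dr. are the only titles, is to strip the title and count the remaining words. The names standard_name, no_title, and has_second_name are introduced here for illustration only.

```r
library(stringr)

# Names assumed to be in "first_name last_name" order already
standard_name <- c("Moe Szyslak", "C. Montgomery Burns", "Rev. Timothy Lovejoy",
                   "Ned Flanders", "Homer Simpson", "Dr. Julius Hibbert")

# Drop a leading title so that it is not counted as a name
no_title <- str_remove(standard_name, "^(Rev|Dr)\\.\\s*")

# More than two words left means a second name is present
has_second_name <- str_count(no_title, "\\S+") > 2
has_second_name
```

On this data the sketch gives the same answer as the second_name column in the table: only the Burns entry has a second name.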
Types of Strings
4. Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression
4.1 [0-9]+\$
- One or more digits (0 to 9) immediately followed by a literal $ character
string<-c("findonlynumber456$?<>helloyes786*.data")
str_extract(string, "[0-9]+\\$")
[1] "456$"
4.2 \b[a-z]{1,4}\b
- A whole word made up of one to four lowercase letters, with a word boundary on each side. If the string contains no such word, the result is NA.
string<-c("echo")
str_extract(string, "\\b[a-z]{1,4}\\b")
[1] "echo"
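To make the role of the {1,4} quantifier and the \b word boundaries more concrete, the sketch below applies the same pattern to a few made-up inputs. The vector words and the variable short_word are hypothetical and not part of the exercise.

```r
library(stringr)

# Hypothetical test strings: short word, single letter, long word,
# capitalised word, and two short words
words <- c("echo", "a", "hello", "Echo", "one two")

short_word <- str_extract(words, "\\b[a-z]{1,4}\\b")
short_word
# "echo" and "a" match in full, "hello" has five letters and gives NA,
# "Echo" starts with an uppercase letter and gives NA,
# "one two" returns only the first matching word, "one"
```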
4.3 .*?\.txt$
- Any string that ends with the literal extension .txt. The .*? part lazily matches whatever precedes it, and $ anchors the match to the end of the string.
string<-c("filename.txt")
str_extract(string, ".*?\\.txt$")
[1] "filename.txt"
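The effect of the $ anchor and of case sensitivity can be checked with a few made-up file names, as in the sketch below. The vector files and the variable is_txt are hypothetical.

```r
library(stringr)

# Hypothetical file names used only for illustration
files <- c("filename.txt", "notes.txt.bak", "report.TXT", ".txt")

is_txt <- str_detect(files, ".*?\\.txt$")
is_txt
# TRUE  : ends with .txt
# FALSE : .txt is not at the end of the string
# FALSE : the match is case sensitive
# TRUE  : .*? may match nothing, so a bare ".txt" also conforms
```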
4.4 \d{2}/\d{2}/\d{4}
- Two digits, two digits, and four digits separated by / characters, such as a date written as mm/dd/yyyy or dd/mm/yyyy
string<-c("02/16/2019")
str_extract(string, "\\d{2}/\\d{2}/\\d{4}")
[1] "02/16/2019"
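The pattern only checks the shape of the string, not whether it is a valid calendar date. The sketch below shows this on a few made-up inputs. The vector dates and the variable found are hypothetical.

```r
library(stringr)

# Hypothetical inputs used only for illustration
dates <- c("02/16/2019", "2/16/2019", "99/99/9999", "due 02/16/2019 at noon")

found <- str_extract(dates, "\\d{2}/\\d{2}/\\d{4}")
found
# "2/16/2019" gives NA because the first part has only one digit,
# "99/99/9999" still matches because only the digit counts are checked,
# and the date embedded in a sentence is extracted on its own
```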
4.5 <(.+?)>.+?</\1>
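- An opening tag, the text inside it, and a closing tag with the same name, as in XML or HTML: <tag>some_text</tag>. The back-reference \1 requires the closing tag name to repeat whatever the first group captured.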
string<-c("<xml>find-tag-info</xml>")
str_extract(string, "<(.+?)>.+?</\\1>")
[1] "<xml>find-tag-info</xml>"
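str_extract returns the whole match, tags included. To look at the captured tag name and the enclosed text separately, one option is str_match, as in the sketch below. The second pair of parentheses around the inner text is an addition made for this illustration and is not part of the pattern in the exercise, and "<a>text</b>" is a made-up counterexample.

```r
library(stringr)

tagged <- "<xml>find-tag-info</xml>"

# Group 1 is the tag name; group 2 (added here) is the enclosed text
m <- str_match(tagged, "<(.+?)>(.+?)</\\1>")
m[1, 1]  # whole match
m[1, 2]  # tag name captured by group 1
m[1, 3]  # text between the tags

# A closing tag with a different name does not satisfy the back-reference
mismatch <- str_detect("<a>text</b>", "<(.+?)>.+?</\\1>")
mismatch
```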
Secret Message
- Hint: Some of the characters are more revealing than others!
- Following the hint, the upper and lower case letters were examined separately, which showed that the upper case letters carry the secret message.
- str_extract_all keeps the upper case letters and periods, and str_replace_all turns the periods into spaces to reveal the message.
msg <- paste("clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr")
msg <- paste(str_replace_all(unlist(str_extract_all(msg, "[[:upper:].]{1,}")),"[.]", " "), collapse="")
msg
[1] "CONGRATULATIONS YOU ARE A SUPERNERD"
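The same idea can also be written the other way round, by deleting every character that is not an upper case letter or a period. The sketch below uses a short made-up string rather than the exercise message. The names coded, kept, and decoded are hypothetical, and the class [^A-Z.] assumes plain ASCII letters.

```r
library(stringr)

# Hypothetical coded string, built the same way as the exercise message
coded <- "h3Hq1Ei2LzLo9O.w7Wk4OrRl8Ld0D!"

# Remove everything except upper case letters and periods
kept <- str_remove_all(coded, "[^A-Z.]")

# Periods stand for spaces between words
decoded <- str_replace_all(kept, fixed("."), " ")
decoded  # "HELLO WORLD"
```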