Assignment

R Character Manipulation and Date Processing

Question 3

Copy the introductory example. The vector name stores the extracted names.

library(stringr)
library(knitr)

Introductory Example

raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

Extract vectors, Vector name stores the extracted name

name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
name

## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"

Replace the title and the middle name for name dataframe with blank

remove_title_middle <- str_replace(name,"([[:alpha:]]{1,3}\\.\\s)", "")

Rearrange the vector so that all elements conform to the standard first_name last_name.

Reverse the first and last name where necessary

firstName_lastName <- str_replace(remove_title_middle, "([[:alpha:]]+), ([[:alpha:]]+)", "\\2 \\1")

kable(list(data.frame(name, firstName_lastName)), caption = "Reorder \"name\" so it conform to the standard first and last name format.")

Reorder “name” so it conform to the standard first and last name format.

name	firstName_lastName
Moe Szyslak	Moe Szyslak
Burns, C. Montgomery	Montgomery Burns
Rev. Timothy Lovejoy	Timothy Lovejoy
Ned Flanders	Ned Flanders
Simpson, Homer	Homer Simpson
Dr. Julius Hibbert	Julius Hibbert

As we can see, all the names are separated as the standard format, firstName, lastName

Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).

title_char <- str_detect(name, "[[:alpha:]]{2,3}\\. ")
kable(list(data.frame(name, title_char)), caption = "Detect if names in \"name\" has a title.")

Detect if names in “name” has a title.

name	title_char
Moe Szyslak	FALSE
Burns, C. Montgomery	FALSE
Rev. Timothy Lovejoy	TRUE
Ned Flanders	FALSE
Simpson, Homer	FALSE
Dr. Julius Hibbert	TRUE

Construct a logical vector indicating whether a character has a second name.

second_name <- str_detect(name, " [[:alpha:],]{1,}")
kable(list(data.frame(name, second_name)), caption = "Detect if names in \"name\" has a second name.")

Detect if names in “name” has a second name.

name	second_name
Moe Szyslak	TRUE
Burns, C. Montgomery	TRUE
Rev. Timothy Lovejoy	TRUE
Ned Flanders	TRUE
Simpson, Homer	TRUE
Dr. Julius Hibbert	TRUE

All the names in the name vector has a second name, therefore, the resulting table has TRUE for all the inputs.

Question 4

Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.

[0-9]+\$

This regular expression is matched by a string which starts with a number/s [0-9], + sign means that the preceeding item will be matched one or more time, \$ ends with a $ sign

example:

str_1<- c("123$", "abc$", "-1098$", "test$", 27)
test_1<- unlist(str_extract_all(str_1, "[0-9]+\\$"))
test_1

## [1] "123$"  "1098$"

To detect if the above regular expression works:

test_1 <- str_detect(str_1, "[0-9]+\\$")
test_1

## [1]  TRUE FALSE  TRUE FALSE FALSE

\b[a-z]{1,4}\b

This regular expression is matched by a string which is followed by lowe case alphabets which ranges between a to z.We are asking the function for all instances where this sequence appears at least once, but at most four times. \b indicates the word edges

str_2<- c("aa", "bzdg", "bnewyork", "jeny")
test_2<- unlist(str_extract_all(str_2, "\\b[a-z]{1,4}\\b"))
test_2

## [1] "aa"   "bzdg" "jeny"

To detect if the above regular expression works:

test_2 <- str_detect(str_2, "\\b[a-z]{1,4}\\b")
test_2

## [1]  TRUE  TRUE FALSE  TRUE

.*?\.txt$

This regular expression is matched by any string which ends by .txt

str_3<- c("aa.txt", "bzdg", "$bnewyork", "sneha.txt")
test_3<- unlist(str_extract_all(str_3, ".*?\\.txt$"))
test_3

## [1] "aa.txt"    "sneha.txt"

To detect if the above regular expression works:

test_3 <- str_detect(str_3, "\\b[a-z]{1,4}\\b")
test_3

## [1]  TRUE  TRUE FALSE  TRUE

\d{2}/\d{2}/\d{4}

This regular expression is matched by a number pattern dd/dd/dddd

str_4<- c("12/12/2019", "11-22/2018", "11/22/2018", "sneha.txt")
test_4<- unlist(str_extract_all(str_4, "\\d{2}/\\d{2}/\\d{4}"))
test_4

## [1] "12/12/2019" "11/22/2018"

To detect if the above regular expression works:

test_4 <- str_detect(str_4, "\\b[a-z]{1,4}\\b")
test_4

## [1] FALSE FALSE FALSE  TRUE

<(.+?)>.+?</\1>

This regular expression is matched by a pattern with open and closed tags, like the html tags.

str_5 <- c("<b> qwerty </b>", "<h1>priya <h1>", "<>test</tag>", "<tag> helloworld </tag>")
test_5 <- unlist(str_extract_all(str_5, "<(.+?)>.+?</\\1>"))
test_5

## [1] "<b> qwerty </b>"         "<tag> helloworld </tag>"

To detect if the above regular expression works:

test_5 <- str_detect(str_5, "<(.+?)>.+?</\\1>")
test_5

## [1]  TRUE FALSE FALSE  TRUE

Question 9

The following code hides a secret message. Crack it with R and regular expressions. Hint: Some of the characters are more revealing than others! The code snippet is also available in the materials at www.r-datacollection.com. clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr

str_test <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
str_extract_all(str_test, "[a-z]")

## [[1]]
##   [1] "c" "l" "c" "o" "p" "o" "w" "z" "m" "s" "t" "c" "d" "w" "n" "k" "i"
##  [18] "g" "v" "d" "i" "c" "p" "u" "g" "g" "v" "h" "r" "y" "n" "j" "u" "w"
##  [35] "c" "z" "i" "h" "q" "r" "f" "p" "x" "s" "j" "d" "w" "p" "n" "a" "n"
##  [52] "w" "o" "w" "i" "s" "d" "i" "j" "j" "k" "p" "f" "d" "r" "c" "o" "c"
##  [69] "b" "t" "y" "c" "z" "j" "a" "t" "a" "o" "o" "t" "j" "t" "j" "n" "e"
##  [86] "c" "f" "e" "k" "r" "w" "w" "w" "o" "j" "i" "g" "d" "v" "r" "f" "r"
## [103] "b" "z" "b" "k" "n" "b" "h" "z" "g" "v" "i" "z" "c" "r" "o" "p" "w"
## [120] "g" "n" "b" "q" "o" "f" "a" "o" "t" "f" "b" "w" "m" "k" "t" "s" "z"
## [137] "q" "e" "f" "y" "n" "d" "t" "k" "c" "f" "g" "m" "c" "g" "x" "o" "n"
## [154] "h" "k" "g" "r"

str_extract_all(str_test, "[A-Z]")

## [[1]]
##  [1] "C" "O" "N" "G" "R" "A" "T" "U" "L" "A" "T" "I" "O" "N" "S" "Y" "O"
## [18] "U" "A" "R" "E" "A" "S" "U" "P" "E" "R" "N" "E" "R" "D"

The hidden message is revealed. “CONGRATULATIONS YOU ARE A SUPER NERD”

Assignment_3

Priya Shaji

2/14/2019

R Character Manipulation and Date Processing

Question 3

Question 4

Question 9