Data 607 Week 3 Assignment

3. Copy the introductory example. The vector name stores the extracted names.

a. Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.

Answer:

library(stringr)
raw.data <-"555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

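names <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
# Split "last, first" entries on the comma; names without a comma leave column 2 empty.
names_split <- str_split(names, ', ', simplify = TRUE)
# Put the first name in front, then trim the leading space left by an empty column 2.
formatted_first_last_name <- str_trim(str_c(names_split[, 2], names_split[, 1], sep = ' '))

formatted_first_last_name

## [1] "Moe Szyslak"          "C. Montgomery Burns"  "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Homer Simpson"        "Dr. Julius Hibbert"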
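
The reordering can also be done with a single substitution that uses backreferences instead of splitting and pasting. This is only a sketch: it assumes the extraction step yields exactly the six names hard-coded below (so the block runs on its own), and the variable names `extracted` and `first_last` are illustrative.

```r
library(stringr)

# Names as produced by the extraction step (hard-coded here as an assumption).
extracted <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
               "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Swap "last, first" into "first last". Strings without a comma do not
# match the pattern, so str_replace() returns them unchanged.
first_last <- str_replace(extracted, "^(.+), (.+)$", "\\2 \\1")
first_last
```

Because unmatched strings pass through untouched, no trimming step is needed with this variant.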

b. Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).

Answer:

library(stringr)
raw.data <-"555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

names <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
names_split <- str_split(names, ', ', simplify = TRUE)
# str_trim() removes the leading space produced when a name has no "last, first" comma.
formatted_first_last_name <- str_trim(str_c(names_split[, 2], names_split[, 1], sep = ' '))

name_has_title <- str_detect(formatted_first_last_name, "[[:alpha:]]{2,}\\.")
name_has_title

## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
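
The `{2,}` quantifier is what keeps the initial "C." from being counted as a title. The sketch below contrasts it with a looser pattern; the names are hard-coded as an assumption so the block runs on its own, and `has_title` and `has_abbreviation` are illustrative names.

```r
library(stringr)

full_names <- c("Moe Szyslak", "C. Montgomery Burns", "Rev. Timothy Lovejoy",
                "Ned Flanders", "Homer Simpson", "Dr. Julius Hibbert")

# At least two letters before the period: flags "Rev." and "Dr." only.
has_title <- str_detect(full_names, "[[:alpha:]]{2,}\\.")

# One or more letters before the period: also flags the initial "C.".
has_abbreviation <- str_detect(full_names, "[[:alpha:]]+\\.")

has_title
has_abbreviation
```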

c. Construct a logical vector indicating whether a character has a second name.

Answer:

I read “second name” as a last name. All Simpsons’ characters in this data set have last names, so every element of the vector is TRUE.

library(stringr)
raw.data <-"555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

names <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
names_split <- str_split(names, ', ', simplify = TRUE)

# str_trim() removes the leading space produced when a name has no "last, first" comma.
formatted_first_last_name <- str_trim(str_c(names_split[, 2], names_split[, 1], sep = ' '))

character_has_last_name <- str_detect(formatted_first_last_name, "[[:alpha:]]+$")
character_has_last_name

## [1] TRUE TRUE TRUE TRUE TRUE TRUE
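
If “second name” is instead read as a middle name, a different check is needed. The following is only a sketch under that assumed reading: it strips a leading title, then counts the remaining words. The names are hard-coded so the block runs on its own, and `no_title` and `has_middle_name` are illustrative names.

```r
library(stringr)

full_names <- c("Moe Szyslak", "C. Montgomery Burns", "Rev. Timothy Lovejoy",
                "Ned Flanders", "Homer Simpson", "Dr. Julius Hibbert")

# Drop a leading title such as "Rev." or "Dr." (two or more letters and a period).
no_title <- str_remove(full_names, "^[[:alpha:]]{2,}\\.\\s*")

# More than two remaining words means there is a name between first and last.
has_middle_name <- str_count(no_title, "\\S+") > 2
has_middle_name
```

Under this reading only "C. Montgomery Burns" qualifies.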

4. Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.

Answers:

a. [0-9]+\$

Matches one or more digits (0 to 9) immediately followed by a literal dollar sign; the backslash escapes the $ so that it is not read as the end-of-string anchor.

Example:

library(stringr)
example_a_data <- c('136654$', 'aswwe', '1234', '$123', '500$$$$$')  
example_a <- str_extract(example_a_data, "[0-9]+\\$")
example_a

## [1] "136654$" NA        NA        NA        "500$"
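
To see what the escape changes, the same strings can be run through the pattern with and without the backslash. This is a small sketch; `amounts`, `escaped`, and `unescaped` are illustrative names.

```r
library(stringr)

amounts <- c("136654$", "1234", "500$$$$$")

# Escaped: "\\$" is a literal dollar sign that must follow the digits.
escaped <- str_extract(amounts, "[0-9]+\\$")

# Unescaped: "$" is the end-of-string anchor, so only trailing digits match.
unescaped <- str_extract(amounts, "[0-9]+$")

escaped
unescaped
```

The escaped pattern matches the first and third strings, while the unescaped one matches only "1234".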

b. \b[a-z]{1,4}\b

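Matches a whole word made up of 1 to 4 lowercase letters (a to z); the \b on each side is a word boundary, so longer words and words containing digits or uppercase letters are skipped. str_extract returns only the first such word in each string.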

Example:

library(stringr)
example_b_data <- c('the first example', '123 as23', 'clouds in the sky', '23452', 'hot water')  
example_b <- str_extract(example_b_data, "\\b[a-z]{1,4}\\b")
example_b

## [1] "the" NA    "in"  NA    "hot"
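
str_extract stops at the first match. A short sketch with str_extract_all shows every word in a string that fits the pattern; `all_short_words` and `first_lowercase` are illustrative names.

```r
library(stringr)

# Every standalone lowercase word of 1 to 4 letters; "clouds" is too long.
all_short_words <- str_extract_all("clouds in the sky", "\\b[a-z]{1,4}\\b")[[1]]

# "The" starts with an uppercase letter, so the first match is "sky".
first_lowercase <- str_extract("The sky", "\\b[a-z]{1,4}\\b")

all_short_words
first_lowercase
```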

c. .*?\.txt$

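Matches any string that ends in “.txt”: any sequence of characters (numeric, alphabetical, special, or none at all) followed by a literal “.txt” at the end of the string. The match is case sensitive, so “.TXT” does not count.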

Example:

library(stringr)
example_c_data <- c('s34;%^&%45#*(@#*@(42ed.txt', 'file.txt', '1234566', '23abc.txt', 'abcdefg', 'FILE.txt', '12.txt')  
example_c <- str_extract(example_c_data, ".*?\\.txt$")
example_c

## [1] "s34;%^&%45#*(@#*@(42ed.txt" "file.txt"                  
## [3] NA                           "23abc.txt"                 
## [5] NA                           "FILE.txt"                  
## [7] "12.txt"
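
The edge cases are easier to see with str_detect. In this sketch the file names are made up for illustration, and the second call uses stringr's regex() helper with ignore_case = TRUE to relax the case sensitivity.

```r
library(stringr)

files <- c("report.txt", "notes.txt.bak", "FILE.TXT", "readmetxt", ".txt")

# Case-sensitive: ".txt" must be a literal suffix at the very end.
ends_in_txt <- str_detect(files, ".*?\\.txt$")

# Same pattern, ignoring case, so "FILE.TXT" is accepted as well.
ends_in_txt_any_case <- str_detect(files, regex(".*?\\.txt$", ignore_case = TRUE))

ends_in_txt
ends_in_txt_any_case
```

Note that ".txt" on its own matches, because the lazy `.*?` is allowed to match nothing.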

d. \d{2}/\d{2}/\d{4}

The first part of the expression, \d{2}/\d{2}/, matches two 2-digit groups, each followed by a forward slash. The final part, \d{4}, matches a 4-digit group. The expression is well suited to finding dates written like 06/02/2019, although it checks only the shape of the string, not whether the date is valid.

Example:

library(stringr)
example_d_data <- c('derfd06/02/2019fd', '$#/#$/*&()', '06/02/2019', '066/0222/20199', 'as/we/rrdd', '05/12/2018abcd')   
example_d <- str_extract(example_d_data, "\\d{2}/\\d{2}/\\d{4}")
example_d

## [1] "06/02/2019" NA           "06/02/2019" NA           NA          
## [6] "05/12/2018"
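
Two limits of the pattern are worth a quick sketch: it accepts impossible dates, and it will match inside a longer run of digits. The stricter variant below is my own addition rather than part of the exercise; it uses lookarounds to require that no digit touches either end of the match.

```r
library(stringr)

dates <- c("06/02/2019", "99/99/9999", "106/02/20199")

# Original pattern: shape only, and it matches inside longer digit runs.
loose <- str_extract(dates, "\\d{2}/\\d{2}/\\d{4}")

# Stricter variant (an assumption): no digit directly before or after.
strict <- str_extract(dates, "(?<!\\d)\\d{2}/\\d{2}/\\d{4}(?!\\d)")

loose
strict
```

Neither version rejects "99/99/9999"; checking that a date actually exists needs a separate step.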

e. <(.+?)>.+?</\1>

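Matches an opening tag ‘<...>’ containing at least one character, followed by at least one character of content, and closed by a tag ‘</...>’ that repeats the opening tag’s contents exactly; the backreference \1 stands for whatever the first group (.+?) captured. This is the shape of a simple HTML element such as <h1>Title</h1>.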

Example:

library(stringr)
example_e_data <- c('<234>desd</234>', '<h1>This is a Title</h1>', 'this will not pass')   
example_e <- str_extract(example_e_data, "<(.+?)>.+?</\\1>")
example_e

## [1] "<234>desd</234>"          "<h1>This is a Title</h1>"
## [3] NA
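
str_match returns the captured group alongside the full match, which makes the role of the backreference visible. A small sketch with made-up strings (`html` and `m` are illustrative names):

```r
library(stringr)

html <- c("<h1>This is a Title</h1>", "<b>bold</i>", "<p></p>")

# Column 1 holds the full match, column 2 the text captured by (.+?).
m <- str_match(html, "<(.+?)>.+?</\\1>")

m[, 1]  # full matches
m[, 2]  # captured tag names
```

The second string fails because the closing tag differs from the opening one, and the third fails because `.+?` needs at least one character between the tags.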

9. The following code hides a secret message. Crack it with R and regular expressions. Hint: Some of the characters are more revealing than others!

Answer:

The hint “Some of the characters are more revealing than others!” drew my attention to the digits and uppercase letters in the code. The digits do not form a meaningful message, so I extracted all the uppercase letters, which revealed the hidden message: “CONGRATULATIONS YOU ARE A SUPERNERD”.

code <- 'clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr'

library(stringr)
# Extract every uppercase letter, in order of appearance.
hidden_message <- str_extract_all(code, "[[:upper:]]")
hidden_message

## [[1]]
##  [1] "C" "O" "N" "G" "R" "A" "T" "U" "L" "A" "T" "I" "O" "N" "S" "Y" "O"
## [18] "U" "A" "R" "E" "A" "S" "U" "P" "E" "R" "N" "E" "R" "D"
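
The extracted letters can be joined into a single string with str_c and its collapse argument. To keep the sketch short it runs on only the opening stretch of the coded string, so the result is just the start of the message; `code_start`, `letters_found`, and `message_start` are illustrative names.

```r
library(stringr)

# Opening stretch of the coded string, shortened to keep the example small.
code_start <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92G"

# Pull out the uppercase letters, then glue them together with no separator.
letters_found <- str_extract_all(code_start, "[[:upper:]]")[[1]]
message_start <- str_c(letters_found, collapse = "")
message_start
```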

Stephen Haslett

9/12/2019