Assignment 3

Problem 3

Copy the introductory example. The vector name stores the extracted names.

library(tidyverse)
raw.data <-"555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

(a) Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.

# Extract every run of letters, commas, periods, and whitespace from the raw data
chara <- unlist(str_extract_all(raw.data, "[A-Za-z,.\\s]+"))
# Drop the single-space matches picked up between the phone-number fragments
name <- chara[chara != " "]
name

## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"
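The extraction above leaves two names in "last, first" order, so part (a) still needs a rearranging step. The following is a minimal sketch of one way to do it, not the only one. It assumes that titles (Rev., Dr.) and single-letter initials (C.) should be dropped to reach first_name last_name; the variable names `swapped`, `no_title`, and `first_last` are introduced here for illustration.

```r
library(stringr)

# Names as extracted above, copied here so the block runs on its own.
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Swap "last, first" into "first last"; names without a comma are unchanged.
swapped <- str_replace(name, "^(.+), (.+)$", "\\2 \\1")

# Assumption: a leading title is not part of first_name last_name.
no_title <- str_replace(swapped, "^(Rev|Dr)\\. ", "")

# Assumption: a single-letter initial such as "C. " is dropped as well.
first_last <- str_replace(no_title, "\\b[A-Z]\\. ", "")
first_last
```

With these assumptions, "Burns, C. Montgomery" becomes "Montgomery Burns" and "Simpson, Homer" becomes "Homer Simpson".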

(b) Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).

# Escape the periods so they match a literal "." rather than any character
title_pat <- "Rev\\. |Dr\\. "
title_logi <- str_detect(name,title_pat)
title_logi

## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE

(c) Construct a logical vector indicating whether a character has a second name.

# Remove the title from each name
name_notitle <- str_replace_all(name, title_pat, "")
# A name that still contains two or more spaces has a second name (e.g. "C. Montgomery")
sec_name_logi <- str_count(name_notitle, " ") >= 2
sec_name_logi

## [1] FALSE  TRUE FALSE FALSE FALSE FALSE

Problem 4

Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.

(a) [0-9]+\$

Matches one or more digits immediately followed by a literal "$". The match can sit anywhere inside a longer string.
Example: "12345$"

example1 <- c("1$","12345$","12345abc$","abc12345$","ab$")
str_view(example1,"[0-9]+\\$")
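As a complement to `str_view`, the same test vector can be checked with `str_detect` and `str_extract`, which return plain vectors that are easier to inspect in a rendered report. This is only a sketch; the names `detected1` and `matched1` are introduced here for illustration.

```r
library(stringr)

example1 <- c("1$", "12345$", "12345abc$", "abc12345$", "ab$")

# TRUE where a run of digits is immediately followed by a literal "$"
detected1 <- str_detect(example1, "[0-9]+\\$")
detected1

# The matched substring itself; NA where nothing matches
matched1 <- str_extract(example1, "[0-9]+\\$")
matched1
```

"12345abc$" does not match because letters separate the digits from the "$", while "abc12345$" matches on the substring "12345$".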

(b) \b[a-z]{1,4}\b

Matches a whole word made up of 1 to 4 lowercase letters. The word boundaries (\b) on both sides mean the letters cannot be directly adjacent to other letters or digits.
Example: "how"

example2 <- c("how","ARE","you?","?fine?"," datascience")
str_view(example2,"\\b[a-z]{1,4}\\b")
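The effect of the word boundaries can be seen by testing a few extra strings. The vector `words` and the result `detected2` below are illustrative additions, not part of the original exercise.

```r
library(stringr)

# Hypothetical test strings chosen to probe the \b boundaries
words <- c("how", "ARE", "you?", "a1", "datascience")

# "a1" fails because there is no word boundary between a letter and a digit;
# "datascience" fails because it is longer than four letters.
detected2 <- str_detect(words, "\\b[a-z]{1,4}\\b")
detected2
```

Punctuation such as "?" does count as a boundary, which is why "you?" still matches on "you".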

(c) .*?\.txt$

Matches any string that ends in ".txt". The lazy .*? accepts anything (including nothing) before the extension, and $ requires ".txt" to sit at the end of the string.
Example: "text.txt"

example3 <- c("text.txt","text.txtt",".txt","?text.txt?","\n.txt")
str_view(example3,".*?\\.txt$")
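Extracting the match shows exactly which part of each test string the pattern picks up. This is a small sketch using `str_extract`; the name `matched3` is introduced here for illustration.

```r
library(stringr)

example3 <- c("text.txt", "text.txtt", ".txt", "?text.txt?", "\n.txt")

# NA where the string does not end in ".txt"
matched3 <- str_extract(example3, ".*?\\.txt$")
matched3
```

For the last element only ".txt" is returned: by default `.` does not match a newline, so the match has to start after it.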

(d) \d{2}/\d{2}/\d{4}

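Matches two digits, a "/", two more digits, another "/", and then four digits, as in a date written with two-digit day and month and a four-digit year. The pattern is not anchored, so it can match inside a longer string, and it does not check that the date is valid.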
Example: "02/17/2019"

example4 <- c("02/17/2019","ab/cd/2019","023/17/20199","2/17/19","AB02/17/2019CD")
str_view(example4,"\\d{2}/\\d{2}/\\d{4}")
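Because the pattern is unanchored, it can match part of a longer string. The sketch below extracts the matched substrings and then, as an illustrative variation that is not part of the exercise, adds `^` and `$` to require that the whole string has the date shape. The names `matched4` and `whole4` are assumptions made for this example.

```r
library(stringr)

example4 <- c("02/17/2019", "ab/cd/2019", "023/17/20199", "2/17/19", "AB02/17/2019CD")

# Unanchored: returns the date-shaped substring wherever it occurs
matched4 <- str_extract(example4, "\\d{2}/\\d{2}/\\d{4}")
matched4

# Anchored variant (an addition): the entire string must be dd/dd/dddd
whole4 <- str_detect(example4, "^\\d{2}/\\d{2}/\\d{4}$")
whole4
```

The unanchored pattern finds "23/17/2019" inside "023/17/20199", while the anchored variant accepts only the first element.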

(e) <(.+?)>.+?</\1>

The first part is an opening tag: "<", one or more characters (captured as group 1), then ">".
The middle part is the content: one or more characters.
The last part is the closing tag: "</", then the same text that group 1 captured (the backreference \1), then ">".
Example: "<img>image</img>"

example5 <- c("<img>image</img>","<img>image</>","<img></img>","<>image</img>","<i>image</img>")
str_view(example5,"<(.+?)>.+?</\\1>")
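To see what the backreference is matching against, the capture groups can be inspected with `str_match`. The sketch below wraps the content in a second pair of parentheses, which is a small change to the exercise's pattern made only so that the content shows up as its own column; `tag_match` is a name introduced here.

```r
library(stringr)

example5 <- c("<img>image</img>", "<img>image</>", "<img></img>",
              "<>image</img>", "<i>image</img>")

# Column 1: full match, column 2: tag name (group 1), column 3: content (added group 2)
tag_match <- str_match(example5, "<(.+?)>(.+?)</\\1>")
tag_match
```

Only the first element matches. The others fail because the closing tag differs from the opening tag, or because the tag name or the content is empty.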

Problem 9

The following code hides a secret message. Crack it with R and regular expressions. Hint: Some of the characters are more revealing than others! The code snippet is also available in the materials at www.r-datacollection.com.

clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5 fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr

txt <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5 fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"

upper_txt <- unlist(str_extract_all(txt,"[A-Z.]+"))
upper_txt_ws <- paste0(upper_txt,collapse = "")
upper_txt_ws

## [1] "CONGRATULATIONS.YOU.ARE.A.SUPERNERD"
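The periods in the extracted string act as word separators. As a small follow-up sketch, they can be replaced with spaces to make the message easier to read; `message_raw` simply holds the result shown above, and `message_clean` is a name introduced here.

```r
library(stringr)

# Result of the extraction step above, copied so the block runs on its own
message_raw <- "CONGRATULATIONS.YOU.ARE.A.SUPERNERD"

# fixed(".") treats the period as a literal character rather than a regex wildcard
message_clean <- str_replace_all(message_raw, fixed("."), " ")
message_clean
```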

Qixing Li

February 17, 2019
