week-3-assig

Copy the introductory example. The vector name stores the extracted names.

library(stringr)

raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson,Homer5553642Dr. Julius Hibbert"
name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
name

## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson,Homer"        "Dr. Julius Hibbert"

(a) Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.

#strip the titles (assumed to be only Rev. and Dr.), then swap "last, first" into "first last"
no_title <- str_replace(name,"^(Rev|Dr)\\.\\s*","")
standard <- str_replace(no_title,"^([^,]+),\\s*(.+)$","\\2 \\1")
standard

## [1] "Moe Szyslak"         "C. Montgomery Burns" "Timothy Lovejoy"
## [4] "Ned Flanders"        "Homer Simpson"       "Julius Hibbert"

(b) Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).

has_tittle <- str_detect(name,"(^[a-zA-Z]{1,}\\.)")
has_tittle

## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE

#titles
unlist(str_extract_all(name,"(^[a-zA-Z]{1,}\\.)"))

## [1] "Rev." "Dr."

#vector type
typeof(has_tittle)

## [1] "logical"
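
The pattern above treats any leading word that ends in a period as a title, so a name that begins with an initial would be flagged as well. The following is a minimal sketch of a stricter variant, assuming that titles always have at least two letters. The `people` vector and the names `loose` and `strict` are hypothetical and not part of the exercise data.

```r
library(stringr)

# Hypothetical input: the second name starts with an initial, not a title.
people <- c("Rev. Timothy Lovejoy", "C. Montgomery Burns",
            "Dr. Julius Hibbert", "Moe Szyslak")

# Pattern from part (b): any leading word followed by a period.
loose <- str_detect(people, "^[a-zA-Z]{1,}\\.")
loose   # TRUE TRUE TRUE FALSE

# Assumed refinement: require at least two letters before the period.
strict <- str_detect(people, "^[a-zA-Z]{2,}\\.")
strict  # TRUE FALSE TRUE FALSE
```

On the six names of the exercise both patterns give the same answer; the difference only shows once the names have been rearranged so that an initial comes first.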

(c) Construct a logical vector indicating whether a character has a second name.

has_2name <- str_detect(name,"[a-zA-Z]{1,}\\.\\s(?=([a-zA-Z]{1,})$)")

#extract second name
unlist(str_extract_all(name,"[a-zA-Z]{1,}\\.\\s(?=([a-zA-Z]{1,})$)"))

## [1] "C. "

typeof(has_2name)

## [1] "logical"
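
The logical vector itself is never printed above, so the sketch below shows it together with one edge case. It assumes the same six names as the exercise; the input "Dr. Hibbert" and the names `abbrev_pattern`, `initial_pattern` and `has_initial` are hypothetical additions for illustration.

```r
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson,Homer", "Dr. Julius Hibbert")

# Pattern from part (c): an abbreviation and a space, followed only by a final word.
abbrev_pattern <- "[a-zA-Z]{1,}\\.\\s(?=([a-zA-Z]{1,})$)"
has_2name <- str_detect(name, abbrev_pattern)
has_2name  # FALSE TRUE FALSE FALSE FALSE FALSE

# Edge case (hypothetical input): a title directly before the last name also matches.
title_only <- str_detect("Dr. Hibbert", abbrev_pattern)
title_only  # TRUE

# Assumed alternative: treat a single capital initial as the second name.
initial_pattern <- "\\b[A-Z]\\.\\s"
has_initial <- str_detect(c(name, "Dr. Hibbert"), initial_pattern)
has_initial  # FALSE TRUE FALSE FALSE FALSE FALSE FALSE
```

Either pattern only finds second names that are abbreviated; a second name written out in full would need a different approach, such as counting the words left after the title is removed.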

Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.

(a) [0-9]+\$

Matches one or more digits immediately followed by a literal $ sign, anywhere in the string.

#true if digits are immediately followed by $ in the same string
str_detect("6685$6","[0-9]+\\$")

## [1] TRUE

#false if the line does not have numbers
str_detect("$","[0-9]+\\$")

## [1] FALSE

#false if the line does not have $ sign
str_detect("6685","[0-9]+\\$")

## [1] FALSE
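
Why the backslash matters can be seen by extracting the match instead of only detecting it. This is a small sketch using the same test string as above; the variable names `escaped` and `unescaped` are made up for the illustration.

```r
library(stringr)

# Escaped: \\$ is a literal dollar sign, so the match is the digits plus "$".
escaped <- str_extract("6685$6", "[0-9]+\\$")
escaped    # "6685$"

# Unescaped: $ anchors to the end of the string, so only the trailing digits match.
unescaped <- str_extract("6685$6", "[0-9]+$")
unescaped  # "6"
```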

(b) \b[a-z]{1,4}\b

Matches whole words of one to four lowercase letters; the \b on each side keeps it from matching inside longer words.

#returns every whole word made of 1 to 4 lowercase letters
str_extract_all("I Am trying this reg Expression","\\b[a-z]{1,4}\\b")

## [[1]]
## [1] "this" "reg"

(c) .*?\.txt$

Matches any string that ends in .txt: the lazy .*? takes up whatever precedes the extension, \. is a literal dot, and $ anchors the match to the end of the string.

#true if .txt is at the end
str_detect("I am lewris.txt",".*?\\.txt$")

## [1] TRUE

#false if the line does not end with .txt
str_detect("I am lewris.txt but not at the end",".*?\\.txt$")

## [1] FALSE
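
The escaped dot is what restricts the match to a real .txt extension. The sketch below compares it with an unescaped dot, which stands for any character. The file names in `files` are hypothetical examples, not taken from the exercise.

```r
library(stringr)

# Hypothetical file names for illustration.
files <- c("report.txt", "notes_txt", "report.txt.bak")

# Escaped dot: only strings that end in a literal ".txt" match.
escaped_dot <- str_detect(files, ".*?\\.txt$")
escaped_dot  # TRUE FALSE FALSE

# Unescaped dot: any character before "txt" is accepted, so "notes_txt" matches too.
any_char <- str_detect(files, ".*?.txt$")
any_char     # TRUE TRUE FALSE
```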

(d) \d{2}/\d{2}/\d{4}

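Matches any occurrence of two digits, a slash, two digits, a slash and four digits, i.e. dates written as mm/dd/yyyy or dd/mm/yyyy, anywhere in a string. The values are not validated, and because the pattern has no boundaries it also matches inside longer digit runs (5505/06/1886 yields 05/06/1886).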

str_extract_all("5505/06/1886, 10/06/1885, 1/2/2016  ","\\d{2}/\\d{2}/\\d{4}")

## [[1]]
## [1] "05/06/1886" "10/06/1885"
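
If the partial match inside 5505/06/1886 is not wanted, one option is to wrap the pattern in word boundaries. This is a sketch of that variant, not part of the original exercise; the names `text`, `plain` and `bounded` are assumptions made for the example.

```r
library(stringr)

text <- "5505/06/1886, 10/06/1885, 1/2/2016  "

# Original pattern: also matches the tail of "5505/06/1886".
plain <- str_extract_all(text, "\\d{2}/\\d{2}/\\d{4}")[[1]]
plain    # "05/06/1886" "10/06/1885"

# Variant with \\b on both sides: the date must not be glued to other digits.
bounded <- str_extract_all(text, "\\b\\d{2}/\\d{2}/\\d{4}\\b")[[1]]
bounded  # "10/06/1885"
```

Neither version checks that the day and month are in range, so a string such as 99/99/9999 would still match.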

(e) <(.+?)>.+?</\1>

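Matches an HTML-style element: an opening tag, at least one character of content, and a closing tag whose name repeats the opening one through the backreference \1. Elements with no content, such as <empty></empty>, are skipped, and any tags nested inside the element are returned as part of the match.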

str_extract_all("<empty></empty>><tittle>aaaa</tittle>","<(.+?)>.+?</\\1>")

## [[1]]
## [1] "<tittle>aaaa</tittle>"

#tags inside tags are also returned
str_extract_all("<empty></empty>><tittle>aaaa</tittle><body><h1>trying html tags</h1></body>","<(.+?)>.+?</\\1>")

## [[1]]
## [1] "<tittle>aaaa</tittle>"                 
## [2] "<body><h1>trying html tags</h1></body>"
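
To see what the backreference \1 refers to, the captured groups can be inspected with str_match. The sketch below adds a second capture group around the content, which is a change to the exercise pattern made only for illustration; the input string and the names `parts`, `tag_name` and `content` are hypothetical.

```r
library(stringr)

html <- "<title>aaaa</title>"  # hypothetical input

# Group 1 is the tag name reused by \\1; group 2 (added here) is the content.
parts <- str_match(html, "<(.+?)>(.+?)</\\1>")

tag_name <- parts[1, 2]
content  <- parts[1, 3]
tag_name  # "title"
content   # "aaaa"
```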


Lewris Mota

February 15, 2019
