Assignment 4

3 (a)

library(stringr)

names <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
           "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

Names written in “Last, First” order contain a comma followed by a space. We split every name at “, ” using strsplit(); names without a comma stay in one piece. sapply() then applies an anonymous function to each result, which reverses the order of the pieces with rev() and joins them with a single space using paste(..., collapse = " "). The combined call is as follows.

names

## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"

upd_names <- sapply(strsplit(names, split = ", "),
                    function(x) paste(rev(x), collapse = " "))
upd_names

## [1] "Moe Szyslak"          "C. Montgomery Burns"  "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Homer Simpson"        "Dr. Julius Hibbert"
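
The same reordering can also be sketched with a single backreference replacement instead of splitting and reversing. This alternative assumes each name contains at most one “, ” separator; the name alt_names and the pattern are illustrative and not part of the assignment.

```r
library(stringr)

names <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
           "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Group 1 = text before ", " (last name), group 2 = text after it (first names).
# Names without ", " do not match the pattern and are returned unchanged.
alt_names <- str_replace(names, "^(.+), (.+)$", "\\2 \\1")
alt_names
```

This should give the same vector as upd_names above.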

3 (b)

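We build a single pattern that matches either “Rev.” or “Dr.” and use str_detect() to obtain a logical vector indicating which of the original names contain one of these titles. The two titles are combined with | into one pattern because str_detect() pairs a vector of patterns with the names element by element instead of testing every title against every name.

title_pattern <- "Rev\\.|Dr\\."

str_detect(names, title_pattern)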

## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
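
When a character vector of patterns is passed to str_detect(), it is paired with the input element by element (recycled), so the result depends on the order of the names. A small sketch of this pitfall, using a hypothetical reordered vector shuffled that is not part of the assignment:

```r
library(stringr)

# Hypothetical reordering of two of the names, for illustration only.
shuffled <- c("Dr. Julius Hibbert", "Rev. Timothy Lovejoy")

# Element-wise pairing: "Rev." is tested only against name 1, "Dr." only against name 2.
paired <- str_detect(shuffled, c("Rev.", "Dr."))

# One pattern with alternation is tested against every name.
# The dots are escaped so that they match a literal ".".
combined <- str_detect(shuffled, "Rev\\.|Dr\\.")

paired    # FALSE FALSE
combined  # TRUE TRUE
```

Both names carry a title, but only the single alternation pattern reports it.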

This approach works in our case because we know in advance which titles occur in the list.

To generalize to a longer list with unknown titles, we can rely on the usual convention for titles: one upper case letter followed by one or more lower case letters and a “.”.

str_extract(names, "[[:upper:]][[:lower:]]+\\.")

## [1] NA     NA     "Rev." NA     NA     "Dr."
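
To see how a generalized title pattern behaves on titles that are not in the original list, here is a small sketch with made-up input (other_names is hypothetical). It also shows one limitation: any capitalized abbreviation ending in a period, such as a name suffix, is reported as if it were a title.

```r
library(stringr)

# Made-up names for illustration; not part of the assignment data.
other_names <- c("Prof. John Frink", "Mrs. Edna Krabappel",
                 "Sideshow Bob", "Waylon Smithers Jr.")

# One upper case letter, one or more lower case letters, then a literal ".".
title_pattern <- "[[:upper:]][[:lower:]]+\\."

found <- str_extract(other_names, title_pattern)
found  # "Prof." "Mrs." NA "Jr."
```

Anchoring the pattern to the start of the string with ^ would be one way to exclude suffixes, assuming titles always come first.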

3 (c)

The most commonly used convention for an abbreviated second name (middle name) is an upper case letter followed by “.”. We use this to detect the middle name in the list of names.

str_extract(names, "[[:upper:]]\\.+")

## [1] NA   "C." NA   NA   NA   NA

7.

HTML_tag <- c("<title>+++BREAKING NEWS+++</title>")

The regular expression <.+> matches a sequence that runs from < to >. Because the + quantifier is greedy, .+ consumes as many characters as possible, so the match extends from the first < to the last > in the string.

str_extract(HTML_tag, "<.+>")

## [1] "<title>+++BREAKING NEWS+++</title>"

To get only the first HTML tag, we make the quantifier lazy by adding ? after +, so the expression becomes <.+?> and the match stops at the first >.

str_extract(HTML_tag, "<.+?>")

## [1] "<title>"
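
The difference between greedy and lazy matching can be checked side by side. The sketch below also includes a negated character class, <[^>]+>, which is a common alternative to the lazy quantifier; it is added here for comparison and is not part of the exercise.

```r
library(stringr)

HTML_tag <- "<title>+++BREAKING NEWS+++</title>"

# Greedy: .+ runs to the last ">" in the string.
greedy <- str_extract(HTML_tag, "<.+>")

# Lazy: .+? stops at the first ">" that completes a match.
lazy <- str_extract(HTML_tag, "<.+?>")

# Negated class: "<", then one or more characters that are not ">", then ">".
all_tags <- str_extract_all(HTML_tag, "<[^>]+>")[[1]]

greedy    # "<title>+++BREAKING NEWS+++</title>"
lazy      # "<title>"
all_tags  # "<title>" "</title>"
```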

8.

binomial_th <- c("(5-3)^2=5^2-2*5*3+3^2")

str_extract(binomial_th, "[^(0-9)=+*()]")

## [1] "-"

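A caret (^) at the beginning of a character class negates the whole class, so [^(0-9)=+*()] matches a single character that is not a digit, a parenthesis, =, + or *. The first such character in the string is “-”, which explains the result. To match a literal -, we place it at the beginning or at the end of the character class; to match a literal ^, we place it anywhere but the beginning. A + quantifier after the class then matches the whole run of characters. The correct expression is therefore [-^0-9=+*()]+, which yields the required result.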

str_extract(binomial_th, "[-^0-9=+*()]+")

## [1] "(5-3)^2=5^2-2*5*3+3^2"

Another expression for obtaining the same result will be:

str_extract(binomial_th, "[-^[:digit:]=+*()]+")

## [1] "(5-3)^2=5^2-2*5*3+3^2"
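
Extracting every match of the first pattern, [^(0-9)=+*()], makes it visible which characters the negated class lets through. A short sketch, assuming the same binomial_th string; the names leftover and whole are chosen for this example only.

```r
library(stringr)

binomial_th <- "(5-3)^2=5^2-2*5*3+3^2"

# Negated class: keeps only characters that are NOT digits, parentheses, =, + or *.
leftover <- str_extract_all(binomial_th, "[^(0-9)=+*()]")[[1]]
leftover  # "-" "^" "^" "-" "^"

# Positive class with a literal "-" first and a literal "^" not first,
# repeated with + so the whole formula is matched in one piece.
whole <- str_extract(binomial_th, "[-^0-9=+*()]+")
identical(whole, binomial_th)  # TRUE
```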

9.

code <- c("clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr")

#No Luck!!
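
One possible way to approach this puzzle, offered as a sketch rather than as the author's solution: assuming the hidden message is spelled by the upper case letters and the punctuation marks “.” and “!” scattered through the string, they can be pulled out with str_extract_all() and pasted together. The names hidden and message, and the construction of code from four pieces with paste0(), are choices made for this example.

```r
library(stringr)

# Same characters as the code string above, joined without line breaks.
code <- paste0(
  "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo",
  "Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO",
  "d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5",
  "fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
)

# Keep only upper case letters, "." and "!" (assumption: these carry the message).
hidden <- paste(str_extract_all(code, "[[:upper:].!]")[[1]], collapse = "")
hidden  # "CONGRATULATIONS.YOU.ARE.A.SUPERNERD!"

# Replace the literal dots with spaces for readability.
message <- str_replace_all(hidden, fixed("."), " ")
message  # "CONGRATULATIONS YOU ARE A SUPERNERD!"
```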

GP SINGH

February 18, 2016
