HW 3 Data 607

3. Copy the introductory example. The vector name stores the extracted names.

R> name
[1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
[4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"

Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.

require(stringr)

## Loading required package: stringr

raw.data <-"555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

Using the code provided in the book, we first extract all of the names.

name<-unlist(str_extract_all(raw.data,"[[:alpha:]., ]{2,}"))
name

## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"

Only two names are not in the first_name last_name format, so one way to fix them is to use the str_replace function to swap the order. The reference for this is listed at the bottom.

names1 <- str_replace(name, "Burns, C. Montgomery", "C. Montgomery Burns")
fnamelname <- str_replace(names1, "Simpson, Homer", "Homer Simpson")
fnamelname

## [1] "Moe Szyslak"          "C. Montgomery Burns"  "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Homer Simpson"        "Dr. Julius Hibbert"
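
The two hard-coded replacements above could also be written as a single pattern. The following is only a sketch, assuming that every name needing a swap has the form "last, first" with exactly one comma; the variable name reordered is introduced here for illustration.

```r
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Group 1 captures everything before the comma (the last name) and
# group 2 captures everything after it (the first names).
# The replacement swaps the two groups. Names without a comma do not
# match the pattern and are returned unchanged.
reordered <- str_replace(name, "^([^,]+),\\s*(.+)$", "\\2 \\1")
reordered
```

This avoids typing each name by hand, but it would mis-handle a name that contains more than one comma.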

We put the names in a data frame so that they are listed one per row.

names3<-data.frame(fnamelname)
names3

##             fnamelname
## 1          Moe Szyslak
## 2  C. Montgomery Burns
## 3 Rev. Timothy Lovejoy
## 4         Ned Flanders
## 5        Homer Simpson
## 6   Dr. Julius Hibbert

Building on the pattern used to extract the names, and using the stringr cheat sheet (which lists the regular expressions along with explanations of what they do), we can construct logical vectors that show whether a name has a title or a second name.

Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).

tname<-str_detect(fnamelname,"\\w[:alpha:]+\\.{1,2}")
tname

## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE

str_extract(fnamelname,"\\w[:alpha:]+\\.{1,2}")

## [1] NA     NA     "Rev." NA     NA     "Dr."

Construct a logical vector indicating whether a character has a second name.

sname<-str_detect(fnamelname,"[:upper:]+\\.{1,}")
sname

## [1] FALSE  TRUE FALSE FALSE FALSE FALSE

str_extract(fnamelname,"[:upper:]+\\.{1,}")

## [1] NA   "C." NA   NA   NA   NA
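
As a cross-check, the same two questions can be asked with somewhat more explicit patterns. This is a sketch under two assumptions that are not stated in the exercise: a title is two or more letters directly followed by a period, and a second name appears as a single capital initial directly followed by a period. The names has_title and has_second_name are introduced here for illustration.

```r
library(stringr)

fnamelname <- c("Moe Szyslak", "C. Montgomery Burns", "Rev. Timothy Lovejoy",
                "Ned Flanders", "Homer Simpson", "Dr. Julius Hibbert")

# Title: at least two letters immediately before a period ("Rev.", "Dr.").
# "C." has only one letter before the period, so it is not counted.
has_title <- str_detect(fnamelname, "[[:alpha:]]{2,}\\.")

# Second name: a single capital letter at a word boundary, followed by a period.
has_second_name <- str_detect(fnamelname, "\\b[[:upper:]]\\.")

data.frame(fnamelname, has_title, has_second_name)
```

Both vectors agree with the results shown above for these six names; other data may need different rules (for example, a second name that is spelled out in full).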

4. Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.

[0-9]+\$

This matches a string of one or more digits (0-9) followed by a literal $ sign.

ex1<-"6163$dfase16464$dfpiowejof445245644$"
str_extract_all(ex1,"[0-9]+\\$")

## [[1]]
## [1] "6163$"      "16464$"     "445245644$"

\b[a-z]{1,4}\b

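This matches whole words of one to four lowercase letters (a-z); \b marks a word boundary on each side. In the example, "The" is not matched because it starts with a capital letter.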

ex2<-"The four letter word is nice"
str_extract_all(ex2,"\\b[a-z]{1,4}\\b")

## [[1]]
## [1] "four" "word" "is"   "nice"
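
If capitalised short words should count as well, one option is to make the match case-insensitive. This is a small sketch using stringr's regex() helper with ignore_case = TRUE; the name short_words is introduced here for illustration.

```r
library(stringr)

ex2 <- "The four letter word is nice"

# Same pattern as above, but ignoring case, so "The" is matched too.
# "letter" is still excluded because it has six letters.
short_words <- str_extract_all(
  ex2, regex("\\b[a-z]{1,4}\\b", ignore_case = TRUE)
)[[1]]
short_words
```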

.*?\.txt$

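This matches any string that ends with the extension .txt: .*? lazily matches any characters, \. is a literal dot, and $ anchors the match to the end of the string.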

ex3<-c("yes.html","no.csv","baseball.txt","basketball.txt","this.com")
str_extract(ex3,".*?\\.txt$")

## [1] NA               NA               "baseball.txt"   "basketball.txt"
## [5] NA

\d{2}/\d{2}/\d{4}

This matches digits laid out like 12/12/1234: two digits, a slash, two more digits, a slash, and then four digits. The pattern checks only the layout, not whether the digits form a valid date.

ex4<-"54/15/1486/4646/45/6456/4165/454/65"
str_extract_all(ex4,"\\d{2}/\\d{2}/\\d{4}")

## [[1]]
## [1] "54/15/1486" "46/45/6456"

<(.+?)>.+?</\1>

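This matches an opening tag such as <c>, followed by some content and a closing tag such as </c>. The backreference \1 requires the name inside the closing tag to be the same as the name captured in the opening tag; otherwise the string is not recognised.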

ex5<-"<c>why</c>nknlk<x>igvbcui</x>khjgkldfhgohdfn<r>there</r>nohio<o>hjkgjk</c>"
str_extract_all(ex5,"<(.+?)>.+?</\\1>")

## [[1]]
## [1] "<c>why</c>"     "<x>igvbcui</x>" "<r>there</r>"
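
To see the backreference at work in isolation, the sketch below tests one well-formed and one mismatched pair of tags. The two test strings and the variable names are made up for illustration.

```r
library(stringr)

tag_pattern <- "<(.+?)>.+?</\\1>"
tags <- c("<b>bold</b>", "<b>bold</i>")

# The first string closes with the same tag name it opened with;
# the second does not, so \\1 cannot be satisfied.
tag_ok <- str_detect(tags, tag_pattern)
tag_ok

# Column 2 of str_match() holds the text captured by group 1,
# which is the tag name that \\1 must repeat.
tag_name <- str_match(tags[1], tag_pattern)[, 2]
tag_name
```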

9. The following code hides a secret message. Crack it with R and regular expressions. Hint: Some of the characters are more revealing than others! The code snippet is also available in the materials at www.r-datacollection.com.

clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5 fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr

Following the hint, one can notice that the capital letters and the punctuation marks stand out, so the task is to extract just the capital letters and the punctuation.

smsg<-"clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"

smsg1<-str_extract_all(smsg,"[[:upper:].!]")
smsg1

## [[1]]
##  [1] "C" "O" "N" "G" "R" "A" "T" "U" "L" "A" "T" "I" "O" "N" "S" "." "Y"
## [18] "O" "U" "." "A" "R" "E" "." "A" "." "S" "U" "P" "E" "R" "N" "E" "R"
## [35] "D" "!"
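
The extracted characters can be joined back into a readable sentence. The sketch below assumes that the periods act as word separators, which is an interpretation of the output rather than something the exercise states; the names chars and decoded are introduced here for illustration.

```r
library(stringr)

# The snippet is split into four pieces only to keep the lines short.
smsg <- paste0(
  "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo",
  "Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO",
  "d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5",
  "fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
)

# Keep only capital letters, periods, and exclamation marks.
chars <- unlist(str_extract_all(smsg, "[[:upper:].!]"))

# Collapse the characters into one string, then turn each period into a space.
decoded <- str_replace_all(str_c(chars, collapse = ""), fixed("."), " ")
decoded
```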

References

How to replace a word inside a string: https://www.tutorialrepublic.com/faq/how-to-replace-a-word-inside-a-string-in-php.php

stringr cheat sheet: http://edrub.in/CheatSheets/cheatSheetStringr.pdf

Maryluz Cruz

9/12/2019