Data 607 Assignment 3

John Kellogg

2019-09-13


Question 3

Copy the introductory example. The vector name stores the extracted names.

  • raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"
## [1] "555-1239"       "(636) 555-0113" "555-6542"       "555 8904"      
## [5] "636-555-3226"   "5553642"
##              user name   phone number
## 1          Moe Szyslak       555-1239
## 2 Burns, C. Montgomery (636) 555-0113
## 3 Rev. Timothy Lovejoy       555-6542
## 4         Ned Flanders       555 8904
## 5       Simpson, Homer   636-555-3226
## 6   Dr. Julius Hibbert        5553642
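
The extraction above can be sketched in base R as follows. This is a minimal sketch, not necessarily the code that produced the output: the use of regmatches() and gregexpr() instead of a string package is an assumption, as are the two patterns and the data frame name contacts.

```r
# The string from the introductory example
raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

# Names: runs of letters, periods, commas and spaces, at least two characters long
name <- unlist(regmatches(raw.data, gregexpr("[[:alpha:]., ]{2,}", raw.data)))

# Phone numbers: optional area code (with optional parentheses),
# three digits, an optional separator, then four digits
phone_pattern <- "\\(?(\\d{3})?\\)?(-| )?\\d{3}(-| )?\\d{4}"
phone <- unlist(regmatches(raw.data, gregexpr(phone_pattern, raw.data, perl = TRUE)))

# Both vectors follow the order of appearance in raw.data, so they pair up row by row
contacts <- data.frame(name = name, phone = phone, stringsAsFactors = FALSE)
print(contacts)
```

Because both vectors are extracted in the order in which they appear in raw.data, binding them column-wise keeps each name next to its own number.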

a. Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.

## [1] "Moe Szyslak"          " C. Montgomery Burns" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         " Homer Simpson"       "Dr. Julius Hibbert"
## [1]    5551239 6365550113    5556542    5558904 6365553226    5553642
## [1] "555-1239"     "636-555-3226" "555-8904"     "555-3642"    
## [5] "636-555-0113" "555-6542"
##                   name        phone
## 1          Moe Szyslak     555-1239
## 2  C. Montgomery Burns 636-555-3226
## 3 Rev. Timothy Lovejoy     555-8904
## 4         Ned Flanders     555-3642
## 5        Homer Simpson 636-555-0113
## 6   Dr. Julius Hibbert     555-6542
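
One way to do the reordering is a single substitution with two capture groups. The sketch below assumes the name vector from the introductory example; the variable name name_ordered and the pattern are illustrative choices, not necessarily the method used for the output above.

```r
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Group 1 captures everything before the comma (the last name),
# group 2 everything after it (the first names).
# ",[[:space:]]*" consumes the comma and the blanks that follow it,
# so the swapped name does not start with a space.
# Names without a comma do not match and are returned unchanged.
name_ordered <- sub("^([^,]+),[[:space:]]*(.+)$", "\\2 \\1", name)
print(name_ordered)
```

Consuming the whitespace after the comma inside the pattern avoids a separate trimming step afterwards.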

b. Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.)

## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
## [1] "NA"   "NA"   "Rev." "NA"   "NA"   "Dr."
##   title                 name        phone
## 1    NA          Moe Szyslak     555-1239
## 2    NA  C. Montgomery Burns 636-555-3226
## 3  Rev.      Timothy Lovejoy     555-8904
## 4    NA         Ned Flanders     555-3642
## 5    NA        Homer Simpson 636-555-0113
## 6   Dr.       Julius Hibbert     555-6542
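
A logical vector like the one above can be produced with grepl(). The following sketch assumes the reordered names and checks only for the two titles named in the question; the variable name has_title is an illustrative choice.

```r
name <- c("Moe Szyslak", "C. Montgomery Burns", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Homer Simpson", "Dr. Julius Hibbert")

# TRUE where the name starts with "Rev." or "Dr."; the period is escaped
# so that it matches a literal dot rather than any character
has_title <- grepl("^(Rev|Dr)\\.", name)
print(has_title)
```

Anchoring the pattern with ^ keeps the initial "C." in "C. Montgomery Burns" from being mistaken for a title.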

c. Construct a logical vector indicating whether a character has a second name.

##   title                 name        phone middle
## 1    NA          Moe Szyslak     555-1239   <NA>
## 2    NA  C. Montgomery Burns 636-555-3226    yes
## 3  Rev.      Timothy Lovejoy     555-8904   <NA>
## 4    NA         Ned Flanders     555-3642   <NA>
## 5    NA        Homer Simpson 636-555-0113   <NA>
## 6   Dr.       Julius Hibbert     555-6542   <NA>
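
The question asks for a logical vector, which can be sketched by counting the words that remain once a title is removed. The approach and the names no_title, n_words and has_second_name are assumptions made for illustration.

```r
name <- c("Moe Szyslak", "C. Montgomery Burns", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Homer Simpson", "Dr. Julius Hibbert")

# Drop a leading title so that it is not counted as a name
no_title <- sub("^(Rev|Dr)\\.[[:space:]]*", "", name)

# Split on whitespace and count the pieces; more than two means a second name
n_words <- lengths(strsplit(no_title, "[[:space:]]+"))
has_second_name <- n_words > 2
print(has_second_name)
```

Only "C. Montgomery Burns" has more than two name parts, which agrees with the middle column above.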

Question 4

Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.

(a) [0-9]+\$

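  • Any string that contains one or more digits immediately followed by a literal dollar sign. The backslash escapes the $, so it is not an end-of-string anchor and the match may occur anywhere in the string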
## [1] "12$"  "1$"   "578$" "487$"
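
As a quick check, the pattern can be tested against a few strings. The test vector x is made up for illustration; the last two elements show that the dollar sign must follow the digits but need not end the string.

```r
x <- c("12$", "1$", "578$", "487$", "$5", "costs 12$ today")

# "\\$" is a literal dollar sign, not the end-of-string anchor
matches_a <- grepl("[0-9]+\\$", x)
print(matches_a)
```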

(b) \b[a-z]{1,4}\b

  • Any standalone word of one to four lowercase letters, delimited by a word boundary (\b) on each side
## [1] "drt"  "txt"  "few"  "data" "txt"  "html" "html" "load"
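
The effect of the word boundaries can be seen by extracting all matches from a sentence. The sentence is a made-up example, and perl = TRUE is used here only to make the handling of \b explicit.

```r
x <- "Please download the data as txt or html"

# Words longer than four letters, or containing an uppercase letter, are skipped
short_words <- unlist(regmatches(x, gregexpr("\\b[a-z]{1,4}\\b", x, perl = TRUE)))
print(short_words)
```

"Please" and "download" are not matched, not even in part, because a match has to start and end at a word boundary.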

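(c) .?\.txt$

  • Any string that ends with the literal text .txt. The .? allows at most one character of any kind in front of it, so the match itself covers only the last four or five characters of the string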
## [1] "Lenovo.txt" "phone8.txt"
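
A short check of the pattern, using a made-up test vector x, shows both which strings match and how little of the string the match covers.

```r
x <- c("Lenovo.txt", "phone8.txt", ".txt", "notes.txt.bak", "txt")

# TRUE only when ".txt" is the very end of the string
matches_c <- grepl(".?\\.txt$", x)
print(matches_c)

# The matched part is just the optional single character plus ".txt"
matched_part <- regmatches(x[1], regexpr(".?\\.txt$", x[1]))
print(matched_part)
```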

(d) \d{2}/\d{2}/\d{4}

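  • Any string that contains two digits, a slash, two digits, a slash, and four digits, such as a date written as MM/DD/YYYY or DD/MM/YYYY. The pattern checks only the layout, not whether the date is valid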
## [1] "02/24/1954" "01/08/2019"
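
The following check uses a made-up test vector x to show that the pattern accepts anything with the right layout, including impossible dates.

```r
x <- c("02/24/1954", "01/08/2019", "99/99/9999", "2/24/1954", "02-24-1954")

# perl = TRUE so that \d is read as a digit class
matches_d <- grepl("\\d{2}/\\d{2}/\\d{4}", x, perl = TRUE)
print(matches_d)
```

"99/99/9999" matches even though it is not a real date, while a one-digit month or a different separator does not.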

(e) <(.+?)>.+?</\1>

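  • Any string that contains an opening tag, at least one character of content, and a closing tag, i.e. <text1>any text</text1>
  • The backreference \1 requires the closing tag to repeat the same characters as the opening tag (text1), preceded by a forward slash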
## [1] "<html> table </html>"
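
The role of the backreference can be checked with a few made-up strings. perl = TRUE is used so that the lazy quantifier .+? and \1 are handled by the PCRE engine.

```r
x <- c("<html> table </html>", "<b>bold</b>", "<b>bold</i>", "<p></p>")

# The closing tag must repeat the text captured by group 1,
# and there must be at least one character between the tags
pattern_e <- "<(.+?)>.+?</\\1>"
matches_e <- grepl(pattern_e, x, perl = TRUE)
print(matches_e)

# The captured tag name can be pulled out with the same group
tag_name <- sub(pattern_e, "\\1", x[1], perl = TRUE)
print(tag_name)
```

The third string fails because the closing tag differs from the opening one, and the fourth because nothing sits between the tags.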

Secret Message

The following code hides a secret message. Crack it with R and regular expressions. Hint: Some of the characters are more revealing than others! The code snippet is also available in the materials at www.r-datacollection.com.

## [1] "CONGRATULATIONS YOU ARE A SUPERNERD"
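
The encoded snippet is not reproduced here, so the idea can only be illustrated on a toy string. The sketch below assumes, based on the hint and the all-caps result, that the message is carried by the uppercase letters; the string toy is hypothetical and is not the assignment's data.

```r
# Hypothetical stand-in for the encoded text (NOT the original snippet)
toy <- "xk4Hq9Ezz2Lp7L1Oab"

# Remove every character that is not an uppercase letter
decoded <- gsub("[^[:upper:]]", "", toy)
print(decoded)
```

With the real snippet, word separators would need extra handling, since stripping everything but uppercase letters also removes whatever stands in for the spaces.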