Regular Expressions and String Functions

Anil Akyildirim

2019-09-12

Question 1

Copy the introductory example. The vector name stores the extracted names.

R> name
[1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
[4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"

  1. Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.

  2. Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).

  3. Construct a logical vector indicating whether a character has a second name.

Answer 1

## [1] "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555 -6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"
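The code that produced this output is not shown. A minimal sketch of how the names could be pulled out of the raw string follows, assuming base R's `gregexpr()` and `regmatches()` rather than the stringr functions the report appears to use. The variable names `raw_data` and `name` are illustrative.

```r
# Raw string as printed in the output above.
raw_data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555 -6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

# A name is a run of at least two letters, periods, commas, or spaces.
# Phone numbers contain none of these in runs of two or more.
name <- unlist(regmatches(raw_data, gregexpr("[[:alpha:]., ]{2,}", raw_data)))
name
```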

Answer (a)

Looking at the name list, we need to handle the following:

  • Title of a person

  • Spaces

  • Order of the first name and the last name

  • Middle Initial of an individual.

  • Commas

## [1] "Moe Szyslak"          "Burns C. Montgomery"  "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson Homer"        "Dr. Julius Hibbert"
## [1] "Moe Szyslak"         "Burns C. Montgomery" "Timothy Lovejoy"    
## [4] "Ned Flanders"        "Simpson Homer"       "Julius Hibbert"
## [1] "Moe Szyslak"      "Burns Montgomery" "Timothy Lovejoy" 
## [4] "Ned Flanders"     "Simpson Homer"    "Julius Hibbert"
## [1] "Szyslak Moe"      "Montgomery Burns" "Lovejoy Timothy" 
## [4] "Flanders Ned"     "Homer Simpson"    "Hibbert Julius"
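The final output above reverses every element, so names that were already in first_name last_name order (for example Moe Szyslak) end up as last_name first_name. A possible alternative, sketched here in base R under the assumption that only the comma-separated entries are stored as "last, first", swaps just those entries and then drops titles and initials as in the earlier steps. The intermediate variable names are hypothetical.

```r
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Swap "Last, First" into "First Last"; entries without a comma are left alone.
swapped <- sub("^([^,]+),\\s*(.+)$", "\\2 \\1", name, perl = TRUE)

# Drop a leading title: two or more letters followed by a period (Rev., Dr.).
no_title <- sub("^[[:alpha:]]{2,}\\.\\s+", "", swapped, perl = TRUE)

# Drop single-letter initials such as "C. ".
clean <- gsub("\\b[[:alpha:]]\\.\\s+", "", no_title, perl = TRUE)
clean
## [1] "Moe Szyslak"      "Montgomery Burns" "Timothy Lovejoy"
## [4] "Ned Flanders"     "Homer Simpson"    "Julius Hibbert"
```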

Answer (b) Logical Vector indicating whether a character has a title

## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
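One way to obtain such a vector, sketched in base R. The pattern assumes that the only titles of interest are the two named in the question, Rev. and Dr., and that a title always comes first.

```r
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# TRUE where the element starts with "Rev." or "Dr."
has_title <- grepl("^(Rev|Dr)\\.", name)
has_title
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
```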

Answer (c) Logical Vector indicating whether a character has a middle name

## [1] FALSE  TRUE FALSE FALSE FALSE FALSE
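A comparable sketch for the second name, assuming that a second name only ever appears as a single capital initial followed by a period (as in "C."). Titles such as Rev. and Dr. do not match because they have more than one letter before the period.

```r
name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# TRUE where a one-letter word is immediately followed by a period.
has_second_name <- grepl("\\b[A-Z]\\.", name, perl = TRUE)
has_second_name
## [1] FALSE  TRUE FALSE FALSE FALSE FALSE
```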

Question 2

Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.

  1. [0-9]+\$

  2. \b[a-z]{1,4}\b

  3. .*?\.txt$

  4. \d{2}/\d{2}/\d{4}

  5. <(.+?)>.+?</\1>

Answer 2

Answer (a)

This matches one or more digits (0 to 9) immediately followed by a literal dollar sign. The backslash escapes the $, so it is not treated as an end-of-string anchor. An example is 589$.

## [1] "In Europe they use $ sign at the end of the numbers, for example 435$"
## [[1]]
## [1] "435$"
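The match above can be reproduced with a short base R sketch. The variable name `sentence` is illustrative, and the original chunk may have used stringr instead.

```r
sentence <- "In Europe they use $ sign at the end of the numbers, for example 435$"

# The lone "$" after "use" is skipped because no digit precedes it.
amounts <- unlist(regmatches(sentence, gregexpr("[0-9]+\\$", sentence)))
amounts
## [1] "435$"
```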

Answer (b)

This matches whole words made up of one to four lowercase letters. The \b on each side is a word boundary, so longer words and words containing an uppercase letter are not matched.

## [[1]]
## [1] "they" "use"  "sign" "at"   "the"  "end"  "of"   "the"  "for"
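A sketch of the same extraction in base R, applied to the sentence from part (a). `perl = TRUE` is used here so that `\b` is handled by the PCRE engine. "In" is skipped because of its capital letter, and "Europe", "numbers", and "example" are too long.

```r
sentence <- "In Europe they use $ sign at the end of the numbers, for example 435$"

short_words <- unlist(regmatches(
  sentence,
  gregexpr("\\b[a-z]{1,4}\\b", sentence, perl = TRUE)
))
short_words
## [1] "they" "use"  "sign" "at"   "the"  "end"  "of"   "the"  "for"
```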

Answer (c)

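This matches any string that ends with .txt. The lazy .*? consumes whatever precedes the extension, \. is a literal period, and $ anchors the match to the end of the string.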

## [1] "We have a lot of files named info.txt"
## [[1]]
## [1] "We have a lot of files named info.txt"
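To illustrate what the escaped period and the end anchor do, here is a small base R sketch. The second and third test strings are made up for this example.

```r
candidates <- c("We have a lot of files named info.txt",  # ends in .txt
                "info.txt.bak",                           # .txt is not at the end
                "infotxt")                                # no literal period

ends_in_txt <- grepl(".*?\\.txt$", candidates, perl = TRUE)
ends_in_txt
## [1]  TRUE FALSE FALSE
```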

Answer (d)

This matches two digits, a "/", two digits, another "/", and four digits. It looks like a date in day/month/year or month/day/year format, although the pattern does not check that the numbers form a valid date.

## [[1]]
## [1] "08/27/2019"
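A short base R sketch shows both what the pattern accepts and that it does no validation. The second and third test strings are invented for illustration.

```r
candidates <- c("08/27/2019",   # two digits / two digits / four digits
                "8/27/2019",    # only one digit before the first slash
                "99/99/9999")   # right shape, but not a real date

looks_like_date <- grepl("\\d{2}/\\d{2}/\\d{4}", candidates, perl = TRUE)
looks_like_date
## [1]  TRUE FALSE  TRUE
```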

Answer (e)

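This matches an opening tag <...>, at least one character of content, and a closing tag </...>. The backreference \1 requires the closing tag to repeat exactly what was captured inside the opening tag. It looks like an HTML element.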

## [[1]]
## [1] "<p>These pretzels are making me thirsty!</p>"
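The effect of the backreference can be seen in a small base R sketch. The mismatched second string is a made-up counterexample.

```r
snippets <- c("<p>These pretzels are making me thirsty!</p>",  # tags agree
              "<b>bold</i>")                                   # tags differ

# \\1 must repeat the text captured between "<" and ">".
is_paired <- grepl("<(.+?)>.+?</\\1>", snippets, perl = TRUE)
is_paired
## [1]  TRUE FALSE
```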

Question 9

The following code hides a secret message. Crack it with R and regular expressions. Hint: Some of the characters are more revealing than others! The code snippet is also available in the materials at www.r-datacollection.com.

clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5 fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr

Answer 9

## [1] "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5 fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
## [[1]]
##  [1] "1" "0" "8" "7" "7" "9" "2" "8" "5" "5" "0" "7" "8" "0" "3" "5" "3"
## [18] "0" "7" "5" "5" "3" "3" "6" "4" "1" "1" "6" "2" "2" "4" "9" "0" "5"
## [35] "6" "5" "1" "7" "2" "4" "6" "3" "9" "5" "8" "9" "6" "5" "9" "4" "9"
## [52] "0" "5" "4" "5"
## [[1]]
##  [1] "C" "O" "N" "G" "R" "A" "T" "U" "L" "A" "T" "I" "O" "N" "S" "Y" "O"
## [18] "U" "A" "R" "E" "A" "S" "U" "P" "E" "R" "N" "E" "R" "D"

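The uppercase letters spell out "Congratulations you are a super nerd", but each character is returned as a separate element. Extracting the periods as well, which mark the word breaks, and printing the characters on one line gives: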

## C O N G R A T U L A T I O N S . Y O U . A R E . A . S U P E R N E R D
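The message can also be recovered as a single readable string. This is a minimal sketch in base R, not necessarily the approach used for the output above. It assumes the periods stand for spaces and that the exclamation mark near the end belongs to the message. The variable names are illustrative.

```r
# paste() joins the pieces with single spaces, as in the printed string.
secret <- paste(
  "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo",
  "Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO",
  "d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5",
  "fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
)

# Remove everything except uppercase letters, periods, and "!".
kept <- gsub("[^[:upper:].!]", "", secret)

# Treat each period as a word separator.
decoded <- gsub(".", " ", kept, fixed = TRUE)
decoded
## [1] "CONGRATULATIONS YOU ARE A SUPERNERD!"
```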