Data 607 - Week 3 Assignment

Assignment

Please deliver links to an R Markdown file (in GitHub and rpubs.com) with solutions to problems 3 and 4 from chapter 8 of Automated Data Collection in R. Problem 9 is extra credit. You may work in a small group, but please submit separately with names of all group participants in your submission.

Here is the referenced code for the introductory example in #3:

raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

Due end of day Sunday, September 18th.

Question 3

Copy the introductory example. The vector character_names stores the extracted names.

# stringr provides str_extract_all, str_replace_all and str_detect used below
library(stringr)
character_names <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
character_names

## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"

Use the tools of this chapter to rearrange the vector so that all elements conform to the standard first_name last_name.

character_names2 <- str_replace_all(str_replace_all(character_names, "(.+)(, .+)$", "\\2 \\1"), ", ", "")
character_names2

## [1] "Moe Szyslak"          "C. Montgomery Burns"  "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Homer Simpson"        "Dr. Julius Hibbert"
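The nested replacement above can also be written as a single substitution that captures the last name and the remaining names separately, then swaps them with backreferences. This is a minimal sketch, assuming stringr is installed; `swap_names` is a hypothetical helper name that does not appear in the assignment.

```r
library(stringr)

# Same names as extracted in the introductory example.
character_names <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
                     "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Hypothetical helper: turns "Last, First" into "First Last".
# Names without a comma do not match the pattern and are returned unchanged.
swap_names <- function(x) {
  str_replace(x, "^(.+), (.+)$", "\\2 \\1")
}

swapped <- swap_names(character_names)
swapped
```

Because the comma is matched outside both capture groups, no second pass is needed to strip it.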

Construct a logical vector indicating whether a character has a title (i.e., Rev. and Dr.).

# Escape the dots so they match a literal period rather than any character
title <- str_detect(character_names, "Rev\\.|Dr\\.")
title

## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
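In a regular expression an unescaped `.` matches any character, so a title pattern is safer with the period escaped. The sketch below compares the two forms on a few hypothetical names (they are not part of the assignment data).

```r
library(stringr)

# Hypothetical test strings chosen to expose the difference.
candidates <- c("Dr. Julius Hibbert", "Drew Barrymore",
                "Rev. Timothy Lovejoy", "Reva Smith")

# Unescaped dots match any character, so "Drew" and "Reva" look like titles.
loose <- str_detect(candidates, "Rev.|Dr.")

# Escaped dots match only a literal period.
strict <- str_detect(candidates, "Rev\\.|Dr\\.")

loose
strict
```

On the six Simpsons names both patterns give the same answer; the difference only shows up on names that happen to start with "Dr" or "Rev".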

Construct a logical vector indicating whether a character has a second name.

second_name <- str_detect(character_names, " [A-Z]\\.")
second_name

## [1] FALSE  TRUE FALSE FALSE FALSE FALSE
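The pattern above treats a second name as a space, one capital letter, and a period, which is enough for this data set. As a quick check of that assumption, the sketch below applies the same pattern to a hypothetical name whose middle name is spelled out.

```r
library(stringr)

# "Homer Jay Simpson" is a hypothetical input, not one of the extracted names.
names_to_check <- c("Burns, C. Montgomery", "Homer Jay Simpson", "Ned Flanders")

# Only an abbreviated middle initial such as " C." is detected.
has_initial <- str_detect(names_to_check, " [A-Z]\\.")
has_initial
```

A spelled-out middle name is not flagged, so the approach would need to change for data that is not limited to initials.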

Question 4

Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.

[0-9]+\$

Matches one or more consecutive digits (0-9) immediately followed by a literal dollar sign ($). The match can occur anywhere in the string.

test_1 <- c("hello everyone", "counting digits 1234$", "reversed prices 500$")
unlist(str_extract_all(test_1, "[0-9]+\\$"))

## [1] "1234$" "500$"
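The backslash matters here: escaped, `$` is a literal dollar sign; unescaped, it is the end-of-string anchor. A small sketch on hypothetical strings shows the difference.

```r
library(stringr)

# Hypothetical inputs, not from the assignment.
prices <- c("total 1234$", "500$ off", "room 42")

# Escaped: digits followed by a literal "$", anywhere in the string.
literal_dollar <- str_extract(prices, "[0-9]+\\$")

# Unescaped: digits that sit at the very end of the string.
end_anchor <- str_extract(prices, "[0-9]+$")

literal_dollar
end_anchor
```

`str_extract` returns `NA` for elements with no match, which makes it easy to see which pattern applies to which string.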

\b[a-z]{1,4}\b

Matches whole words made up of 1 to 4 lowercase letters (a-z). The \b on each side is a word boundary, so longer words and words containing uppercase letters are not matched.

test_2 <- c("Location", "Test", "test", "Cars", "cars", "car", "kids", "last", "a", "is")
unlist(str_extract_all(test_2, "\\b[a-z]{1,4}\\b"))

## [1] "test" "cars" "car"  "kids" "last" "a"    "is"
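The effect of the word boundaries is easier to see on a full sentence than on single words. The sketch below, using a hypothetical sentence, runs the pattern with and without `\\b`.

```r
library(stringr)

# Hypothetical sentence used only for illustration.
sentence <- "the quick brown fox is at home"

# With boundaries: only complete words of 1 to 4 lowercase letters.
whole_words <- str_extract_all(sentence, "\\b[a-z]{1,4}\\b")[[1]]

# Without boundaries: longer words are chopped into pieces of up to 4 letters.
pieces <- str_extract_all(sentence, "[a-z]{1,4}")[[1]]

whole_words
pieces
```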

.*?\.txt$

Matches any string that ends in .txt, together with whatever characters precede it (possibly none).

test_3 <- c(".txt", "a lot of text before.txt", "test.txt", "nothing", "no matches here txt")
unlist(str_extract_all(test_3, ".*?\\.txt$"))

## [1] ".txt"                     "a lot of text before.txt"
## [3] "test.txt"
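Two details of this pattern are worth checking: the `$` anchor requires `.txt` to be the very last thing in the string, and the match is case sensitive. A brief sketch on hypothetical file names:

```r
library(stringr)

# Hypothetical file names, not from the assignment.
files <- c("notes.txt", "notes.txt.bak", "archive.TXT", "txt")

# For a yes/no test, the suffix alone is enough.
ends_in_txt <- str_detect(files, "\\.txt$")

# The full pattern returns the whole string when it matches, NA otherwise.
extracted <- str_extract(files, ".*?\\.txt$")

ends_in_txt
extracted
```

Even though `.*?` is lazy, the anchor forces the match to run to the end of the string, so the whole file name is returned.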

\d{2}/\d{2}/\d{4}

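Matches strings containing 2 digits, a forward slash, 2 more digits, another forward slash, and then 4 digits, such as a date written with a four-digit year. The pattern checks only the format, so an impossible date such as 12/35/2005 still matches.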

test_4 <- c("12/35/2005", "12.23.2016", "11/24/04", "11/24/2004")
unlist(str_extract_all(test_4, "\\d{2}/\\d{2}/\\d{4}"))

## [1] "12/35/2005" "11/24/2004"
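Since the pattern only checks the shape of the string, a second step is needed if the dates must also be real. The sketch below assumes the strings are in month/day/year order (the assignment does not say so) and uses base R's `as.Date` to flag impossible dates.

```r
library(stringr)

candidates <- c("12/35/2005", "11/24/2004")

# Keep only strings that are entirely in the dd/dd/dddd shape.
well_formed <- str_subset(candidates, "^\\d{2}/\\d{2}/\\d{4}$")

# Assumption: month/day/year order. Invalid dates are parsed as NA.
parsed <- as.Date(well_formed, format = "%m/%d/%Y")
is_valid <- !is.na(parsed)

is_valid
```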

<(.+?)>.+?</\1>

Matches one or more characters enclosed in a pair of matching tags. The backreference \1 requires the closing tag name to be identical to the opening one, including case, so <HTML>...</HTml> does not match.

test_5 <- c("<tag> </tag>", "<tag>maybe anything here</tag>", "<HTML>mismatched tags</HTml>")
unlist(str_extract_all(test_5, "<(.+?)>.+?</\\1>"))

## [1] "<tag> </tag>"                   "<tag>maybe anything here</tag>"
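The backreference `\\1` reuses whatever the first capture group matched. A short sketch on hypothetical snippets shows which cases pass and what the group captures.

```r
library(stringr)

# Hypothetical snippets: matching tags, mismatched tags, and empty content.
snippets <- c("<b>bold</b>", "<b>bold</i>", "<b></b>")

# The closing tag must repeat the captured name, and .+? needs at least
# one character between the tags.
is_match <- str_detect(snippets, "<(.+?)>.+?</\\1>")

# Column 2 of str_match holds the first capture group (the tag name).
tag_name <- str_match(snippets[1], "<(.+?)>.+?</\\1>")[, 2]

is_match
tag_name
```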

Question 9 (Extra Credit)

The following code hides a secret message. Crack it with R and regular expressions. Hint: Some of the characters are more revealing than others!

secret <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
secret

## [1] "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"

My first solution made something that was more readable than the above but still not a sentence.

unlist(str_extract_all(secret, "[[:upper:].!]"))

##  [1] "C" "O" "N" "G" "R" "A" "T" "U" "L" "A" "T" "I" "O" "N" "S" "." "Y"
## [18] "O" "U" "." "A" "R" "E" "." "A" "." "S" "U" "P" "E" "R" "N" "E" "R"
## [35] "D" "!"

I needed to replace the periods with spaces and merge the elements of the vector into a single string. The replace function is built into stringr, and after some searching I found that paste with the collapse argument joins the elements. The paste approach was taken from here.

paste(str_replace(unlist(str_extract_all(secret, "[[:upper:].!]")), "[.]", " "), collapse = "")

## [1] "CONGRATULATIONS YOU ARE A SUPERNERD!"
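The same three steps (extract, join, replace) can be done in a slightly different order: join the characters first, then replace every period in the resulting string. This is a sketch on a short hypothetical string rather than the real secret, using stringr's `str_c` in place of `paste`.

```r
library(stringr)

# Hypothetical toy message built the same way as the real one.
toy_secret <- "xhHq1Ia.m7MtOo2Mz!k"

# Step 1: keep only uppercase letters, periods and exclamation marks.
kept <- str_extract_all(toy_secret, "[[:upper:].!]")[[1]]

# Step 2: join into one string, then turn every literal period into a space.
decoded <- str_replace_all(str_c(kept, collapse = ""), fixed("."), " ")

decoded
```

When the replacement runs on the joined string, `str_replace_all` is needed because `str_replace` only changes the first match in each element.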

Brandon OHara

9/18/2016