Week3Assignment

Problem 3

library(stringr)
# raw.data (defined elsewhere in the assignment) holds the raw input string
name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
name

## [1] "Moe Szyslak"          "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders"         "Simpson, Homer"       "Dr. Julius Hibbert"

# Matches a run of word characters followed by a space OR
# a space followed by a run of word characters at the end of the string.
# str_trim() then removes the surrounding whitespace.
firstname <- str_trim(str_extract(name, "\\w+ | \\w+$"))
firstname
firstname

## [1] "Moe"        "Montgomery" "Timothy"    "Ned"        "Homer"     
## [6] "Julius"

# First match a run of word characters followed by a comma OR
# a space followed by a run of word characters at the end of the string.
# The second extraction keeps only the word characters, dropping the comma or space.
lastname <- str_extract(name, "\\w+,| \\w+$")
lastname <- str_extract(lastname, "\\w+")
lastname

## [1] "Szyslak"  "Burns"    "Lovejoy"  "Flanders" "Simpson"  "Hibbert"

# Join the first name and last name, separated by a space.
first_last <- str_c(firstname, " ", lastname)
first_last

## [1] "Moe Szyslak"      "Montgomery Burns" "Timothy Lovejoy" 
## [4] "Ned Flanders"     "Homer Simpson"    "Julius Hibbert"
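
As an alternative to extracting the two parts separately, the "Last, First" entries can be reordered in one step with capture groups. This is only a minimal sketch: the `people` vector is a hypothetical sample that mirrors the name layouts above, and unlike the approach above it keeps a middle initial such as "C." rather than dropping it.

```r
library(stringr)

# Hypothetical sample mirroring the name layouts seen above
people <- c("Moe Szyslak", "Simpson, Homer", "Burns, C. Montgomery")

# Group 1 captures the surname before the comma, group 2 everything after it.
# The replacement swaps the two groups; strings without a comma do not match
# and are returned unchanged.
reordered <- str_replace(people, "^(\\w+), (.+)$", "\\2 \\1")
reordered
# "Moe Szyslak" "Homer Simpson" "C. Montgomery Burns"
```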

This matches any string that begins with “Rev.” or “Dr.” and returns TRUE where such a title is present.

title <- str_detect(name, "^Rev\\.|^Dr\\.")
title

## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
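
The same check can be written with the alternatives grouped so that the anchor and the escaped period appear only once. The sketch below assumes a small hypothetical `people` vector. It also shows that the same pattern can pull the title out or strip it off.

```r
library(stringr)

# Hypothetical sample: two names with a title, one without
people <- c("Rev. Timothy Lovejoy", "Ned Flanders", "Dr. Julius Hibbert")

# ^ anchors at the start, (Rev|Dr) lists the titles, \\. is a literal period
title_pattern <- "^(Rev|Dr)\\."

has_title <- str_detect(people, title_pattern)    # TRUE FALSE TRUE
titles    <- str_extract(people, title_pattern)   # "Rev." NA "Dr."

# Remove the title and any whitespace that follows it
no_title <- str_replace(people, "^(Rev|Dr)\\.\\s*", "")
no_title
# "Timothy Lovejoy" "Ned Flanders" "Julius Hibbert"
```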

This might be oversimplifying things, but I consider someone to have a last name if there is a space between their names. Names with a title such as “Rev.” or “Dr.” contain two spaces: one between the title and the first name, and another between the names.

This script counts the number of spaces in each string and checks whether there is at least one. If there is, it returns TRUE, meaning there is a last name.

str_count(name, " ") >= 1

## [1] TRUE TRUE TRUE TRUE TRUE TRUE
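
To see the space-counting idea behave differently for different inputs, here is a small sketch with a hypothetical single-word name ("Cher") added. Comparing the count to a number is one way to express "at least one space". Matching the count against a digit pattern such as "\\d" would also accept a count of 0, because "0" is itself a digit.

```r
library(stringr)

# Hypothetical sample; "Cher" is not in the assignment data
people <- c("Moe Szyslak", "Dr. Julius Hibbert", "Cher")

n_spaces <- str_count(people, " ")   # 1 2 0
has_last <- n_spaces >= 1            # TRUE TRUE FALSE

# A digit pattern cannot tell 0 apart from 1 or 2
digit_check <- str_detect(as.character(n_spaces), "\\d")   # TRUE TRUE TRUE
```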

Problem 4

This expression looks for one or more digits followed immediately by a literal dollar sign ($).

testdata <- c("999", "$123", "456$")
str_extract(testdata, "[0-9]+\\$")

## [1] NA     NA     "456$"
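
The backslashes matter here. As a quick sketch on the same three test strings (stored in a new variable, `amounts`), an unescaped $ is the end-of-string anchor rather than a literal character, so the two patterns pick out different strings.

```r
library(stringr)

amounts <- c("999", "$123", "456$")

# Escaped: digits followed by a literal dollar sign
literal_dollar <- str_extract(amounts, "[0-9]+\\$")   # NA NA "456$"

# Unescaped: digits that run to the end of the string
end_anchor <- str_extract(amounts, "[0-9]+$")         # "999" "123" NA
```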

This expression matches a standalone word of one to four lower-case letters. The \b on each side is a word boundary, so the letters cannot sit directly next to other letters, digits, or underscores.

testdata <- c("abcde", "12abc12", "ABC", "abc")
str_extract(testdata, "\\b[a-z]{1,4}\\b")

## [1] NA    NA    NA    "abc"
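
Because \b marks a word boundary rather than the start or end of the string, the pattern can match several words inside one longer string. The sketch below uses a hypothetical sentence and contrasts \b with the ^ and $ anchors, which require the whole string to be one short word.

```r
library(stringr)

# Hypothetical sentence; "window" has six letters and is skipped
short_words <- str_extract_all("the cat sat on a window", "\\b[a-z]{1,4}\\b")[[1]]
short_words
# "the" "cat" "sat" "on" "a"

# Word boundaries versus whole-string anchors
inputs   <- c("abc", "abc def")
boundary <- str_detect(inputs, "\\b[a-z]{1,4}\\b")   # TRUE TRUE
anchored <- str_detect(inputs, "^[a-z]{1,4}$")       # TRUE FALSE
```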

This expression matches a string that ends in “.txt”. Any number of characters of any kind, including none at all, may come before the “.”.

testdata <- c("txt.123", "data.txt", ".txt")
str_extract(testdata, ".*?\\.txt$")

## [1] NA         "data.txt" ".txt"
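
The escaped period is what ties the match to a real file extension. As a small sketch with hypothetical file names, leaving the period unescaped lets it stand for any character, so a name like "datatxt" slips through.

```r
library(stringr)

# Hypothetical file names
files <- c("data.txt", "datatxt", "notes.txt.bak")

escaped   <- str_detect(files, "\\.txt$")   # TRUE FALSE FALSE
unescaped <- str_detect(files, ".txt$")     # TRUE TRUE FALSE
```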

This expression looks for two digits, a forward slash, two digits, a forward slash, and then four digits. An example is a date in month/day/year format.

testdata <- c("111", "1/2/3", "10/15/1985")
str_extract(testdata, "\\d{2}/\\d{2}/\\d{4}")

## [1] NA           NA           "10/15/1985"
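
If the individual pieces of the date are needed, the same pattern can be wrapped in capture groups and passed to str_match(). This is a sketch only: the anchors ^ and $ are my addition, and the string "110/15/19855" is a hypothetical input showing that the unanchored pattern will also match inside a longer run of digits.

```r
library(stringr)

# Capture the three numeric fields; column 1 of the result is the full match
parts  <- str_match("10/15/1985", "^(\\d{2})/(\\d{2})/(\\d{4})$")
fields <- parts[1, 2:4]
fields
# "10" "15" "1985"

# Without anchors the pattern matches inside a longer string
loose  <- str_extract("110/15/19855", "\\d{2}/\\d{2}/\\d{4}")     # "10/15/1985"
strict <- str_extract("110/15/19855", "^\\d{2}/\\d{2}/\\d{4}$")   # NA
```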

This expression uses a backreference to find a repeated pattern. It first matches a pair of angle brackets containing one or more characters, which the parentheses capture as a group. Next comes at least one character of any kind. Finally, it matches a second pair of angle brackets containing a / followed by exactly the text that the parentheses captured (\1). HTML tags are an example.

testdata <- c("<a>This is a paragraph</a>")
str_extract(testdata, "<(.+?)>.+?</\\1>")

## [1] "<a>This is a paragraph</a>"
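
To show what the backreference adds, here is a short sketch with two hypothetical strings, one with matching tags and one where the closing tag differs. A second capture group (my addition) also pulls out the text between the tags.

```r
library(stringr)

# Hypothetical inputs: matching tags, then mismatched tags
tags <- c("<b>bold</b>", "<a>link</b>")

# \\1 must repeat whatever the first group captured, so "<a>...</b>" fails
matched <- str_extract(tags, "<(.+?)>.+?</\\1>")   # "<b>bold</b>" NA

# Capture the tag name and the enclosed text separately
pieces   <- str_match(tags[1], "<(.+?)>(.+?)</\\1>")
tag_name <- pieces[1, 2]   # "b"
tag_text <- pieces[1, 3]   # "bold"
```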

Problem 9

problemdata <- "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"

# Keep only the upper-case letters and punctuation marks, then paste them together
message <- str_extract_all(problemdata, "([[:upper:]]|[[:punct:]])", simplify = TRUE)
str_c(message, collapse = "")

## [1] "CONGRATULATIONS.YOU.ARE.A.SUPERNERD!"
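
The same decoding can be approached from the other direction by deleting what is not wanted. The sketch below uses a short hypothetical string, `scrambled`, in place of the real data and assumes the noise consists only of lower-case letters and digits. Both routes give the same result on it.

```r
library(stringr)

# Hypothetical miniature of the puzzle string
scrambled <- "a1Hb2Ic3!"

# Route 1: keep upper-case letters and punctuation (one combined character class)
kept    <- str_extract_all(scrambled, "[[:upper:][:punct:]]")[[1]]
decoded <- str_c(kept, collapse = "")   # "HI!"

# Route 2: delete lower-case letters and digits
stripped <- str_replace_all(scrambled, "[[:lower:][:digit:]]", "")   # "HI!"
```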

Chad Smith

September 17, 2017
