Please create an R Markdown file that provides a solution for #4, #5 and #6 in Automated Data Collection with R, chapter 8. Publish the R Markdown file to rpubs.com, and include links to your R Markdown file (in GitHub) and your rpubs.com URL in your assignment solution.


Answer 4:

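(a) [0-9]+\\$

One or more digits immediately followed by a literal dollar sign. The escaped \\$ is a literal $, not the end-of-string anchor.

Example: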

library(stringr)
four.a <- c("Sue is 2$nd place", "Fred is 3", "550$", "5th is Jen", "Nile is number$ 9")
unlist(str_extract(four.a, "[0-9]+\\$"))
## [1] "2$"   NA     "550$" NA     NA
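To see why the backslashes matter, here is a small sketch (assuming stringr is installed; the test strings are made up for illustration) that runs the pattern with and without the escape. Unescaped, $ becomes the end-of-string anchor, so the pattern only finds digits at the very end of a string.

```r
library(stringr)

# Hypothetical strings chosen to contrast an escaped and an unescaped "$"
x <- c("costs 40$ today", "ends in 40", "40$")

# "\\$" is a literal dollar sign: the digits must be followed by "$"
str_extract(x, "[0-9]+\\$")  # returns "40$", NA, "40$"

# An unescaped "$" is the end-of-string anchor: the digits must end the string
str_extract(x, "[0-9]+$")    # returns NA, "40", NA
```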

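(b) \\b[a-z]{1,4}\\b

A whole word of one to four lowercase letters. Each \\b marks a word boundary, so the letters cannot be part of a longer word.

Example: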

four.b <- "I think sunsets are very beautiful. The best place to see it is near a body of water."
str_extract(four.b, "\\b[a-z]{1,4}\\b")
## [1] "are"
unlist(str_extract_all(four.b, "\\b[a-z]{1,4}\\b"))
##  [1] "are"  "very" "best" "to"   "see"  "it"   "is"   "near" "a"    "body"
## [11] "of"
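Because [a-z] is case-sensitive, capitalised short words such as "I" and "The" are not extracted. A sketch of a case-insensitive variant, using stringr's regex() modifier with ignore_case = TRUE (one possible approach, not the only one):

```r
library(stringr)

four.b <- "I think sunsets are very beautiful. The best place to see it is near a body of water."

# [a-z] is case-sensitive, so capitalised words such as "I" and "The" are skipped
lower_only <- str_extract_all(four.b, "\\b[a-z]{1,4}\\b")[[1]]

# With ignore_case = TRUE, short capitalised words are matched as well
any_case <- str_extract_all(four.b, regex("\\b[a-z]{1,4}\\b", ignore_case = TRUE))[[1]]

# Words found only by the case-insensitive version
setdiff(any_case, lower_only)  # returns "I", "The"
```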

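(c) .*?\\.txt$

Any string that ends in ".txt". The lazy .*? still has to expand up to the final ".txt" because $ anchors the match to the end of the string, so the match runs from the start of the string.

Example: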

four.c <- c("web: abd/nyc.org/assignment.txt", "nml/like.edu/4txt.cvs", "cdn/fyc.com/four.txt")
unlist(str_extract_all(four.c, ".*?\\.txt$")) 
## [1] "web: abd/nyc.org/assignment.txt" "cdn/fyc.com/four.txt"
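A quick check with a few hypothetical file names shows what the $ anchor and the escaped dot rule out. The match is case-sensitive, so ".TXT" is rejected as well:

```r
library(stringr)

# Hypothetical file names
files <- c("notes.txt", "notes.txt.bak", "notestxt", "data/notes.TXT")

# "\\." requires a literal dot, "$" requires ".txt" at the very end,
# and matching is case-sensitive
str_detect(files, ".*?\\.txt$")  # returns TRUE, FALSE, FALSE, FALSE
```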

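(d) \\d{2}/\\d{2}/\\d{4}

Two digits, a slash, two digits, a slash and four digits, i.e. a date written like mm/dd/yyyy.

Example: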

four.d <- "He was born on February 7th 1940. His father died on 07/18/1953. At the age of 13/14 he had to drop out of school and start work. He had his first child on 01/06/1966. He boarded the ship on 12/24/1968."
unlist(str_extract_all(four.d, "\\d{2}/\\d{2}/\\d{4}"))
## [1] "07/18/1953" "01/06/1966" "12/24/1968"
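The pattern only checks the shape of the string, so an impossible value like 99/99/9999 still matches. One way to validate the matches, assuming the dates are in month/day/year order as in the example text, is to parse them with base R's as.Date(), which returns NA for dates that do not exist:

```r
library(stringr)

# One real date and one that only has the right shape
candidates <- c("07/18/1953", "99/99/9999")

# Both strings have the form nn/nn/nnnn, so both match
str_detect(candidates, "\\d{2}/\\d{2}/\\d{4}")  # returns TRUE, TRUE

# Parsing as month/day/year (an assumption about the date order)
# turns the impossible date into NA
as.Date(candidates, format = "%m/%d/%Y")
```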

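(e) <(.+?)>.+?</\\1>

An opening tag such as <b>, some text, and a closing tag whose name repeats the text captured by the first group (\\1), as in a simple HTML or XML element.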

Example:

four.e <- c("This is <4ever21> an  example.  Not a long example </4ever21>. Very small example.")
unlist(str_extract_all(four.e, "<(.+?)>.+?</\\1>"))
## [1] "<4ever21> an  example.  Not a long example </4ever21>"
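The backreference \\1 forces the closing tag to repeat whatever the first group captured. In this sketch (the HTML snippets are invented for illustration), a mismatched closing tag returns NA, and stringr's str_match() exposes the captured tag name as a second column:

```r
library(stringr)

# Hypothetical snippets: matching and mismatched tags
html <- c("<b>bold</b>", "<b>bold</i>")

# \\1 must repeat exactly what group 1 captured, so "</i>" cannot close "<b>"
str_extract(html, "<(.+?)>.+?</\\1>")  # returns "<b>bold</b>", NA

# Column 1 holds the full match, column 2 the captured tag name
str_match(html, "<(.+?)>.+?</\\1>")
```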

Answer 5:

I rewrote the regular expression [0-9]+\\$ as [[:digit:]]{1,}[$], changing every element while keeping the same behaviour. [[:digit:]] wraps the predefined class [:digit:] in a bracket expression and matches any digit, just like [0-9]. The quantifier {1,} matches the preceding element one or more times, just like +. Finally, [$] is a bracket expression containing only the dollar sign; inside brackets $ loses its meaning as the end-of-string anchor, so [$] matches a literal $, just like \\$.

Let’s reuse the vector from answer 4(a), with the second element changed to "Fred is 1$2" so that it also contains a match:

four.a <- c("Sue is 2$nd place", "Fred is 1$2", "550$", "5th is Jen", "Nile is number$ 9")
unlist(str_extract(four.a, "[0-9]+\\$"))
## [1] "2$"   "1$"   "550$" NA     NA
unlist(str_extract(four.a, "[[:digit:]]{1,}[$]")) 
## [1] "2$"   "1$"   "550$" NA     NA
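To check the rewrite on more than one vector, the sketch below compares the two patterns on a few made-up strings, including some that should not match. It is a spot check, not a proof that the two expressions are equivalent:

```r
library(stringr)

# Hypothetical test strings, including cases that should not match
tests <- c("Sue is 2$nd place", "Fred is 1$2", "550$", "5th is Jen", "$5", "12 $")

original  <- "[0-9]+\\$"
rewritten <- "[[:digit:]]{1,}[$]"

# Both patterns should extract the same substrings from every element
identical(str_extract_all(tests, original), str_extract_all(tests, rewritten))
```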

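Answer 6: Consider the mail address chunkylover53[at]aol[dot]com.

To convert it to a standard mail address, replace [at] with @ and [dot] with a period. The opening bracket must be escaped, because an unescaped [ starts a character class: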

six.a <- "Consider the mail address chunkylover53[at]aol[dot]com."
# Escape both brackets so "[at]" and "[dot]" are matched literally
six.a <- str_replace_all(six.a, pattern = "\\[at\\]", replacement = "@")
six.a <- str_replace_all(six.a, pattern = "\\[dot\\]", replacement = ".")
six.a
## [1] "Consider the mail address chunkylover53@aol.com."
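stringr can also apply both replacements in one call when the pattern argument is a named vector of pattern = replacement pairs. A compact alternative sketch:

```r
library(stringr)

six.a <- "Consider the mail address chunkylover53[at]aol[dot]com."

# A named vector applies several pattern = replacement pairs in one call;
# the brackets are escaped so "[at]" and "[dot]" are matched literally
str_replace_all(six.a, c("\\[at\\]" = "@", "\\[dot\\]" = "."))
```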

Next, try to extract the digits with the bare expression [:digit:]:

unlist(str_extract_all(six.a, "[:digit:]"))
## [1] "5" "3"

This fails because the bare [:digit:] matches a single digit, so each digit comes back as a separate string, "5" and "3", instead of the number "53". The corrected expression [[:digit:]]{1,} wraps the predefined class [:digit:] in a bracket expression and adds the quantifier {1,}, which matches one or more consecutive digits:

unlist(str_extract_all(six.a, "[[:digit:]]{1,}"))
## [1] "53"
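The bracketed class is POSIX syntax, so the same expression also works outside stringr. A sketch with base R's gregexpr() and regmatches(), which needs no extra packages:

```r
six.a <- "Consider the mail address chunkylover53@aol.com."

# gregexpr() finds every run of digits; regmatches() pulls out the matched text
digits <- unlist(regmatches(six.a, gregexpr("[[:digit:]]+", six.a)))
digits              # returns "53"

# Convert the extracted string to a number
as.numeric(digits)  # returns 53
```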

Using the predefined symbol \\D instead:

unlist(str_extract_all(six.a, "\\D"))
##  [1] "C" "o" "n" "s" "i" "d" "e" "r" " " "t" "h" "e" " " "m" "a" "i" "l"
## [18] " " "a" "d" "d" "r" "e" "s" "s" " " "c" "h" "u" "n" "k" "y" "l" "o"
## [35] "v" "e" "r" "@" "a" "o" "l" "." "c" "o" "m" "."

\\D returns no digits because it is the negation of \\d: it is equivalent to [^[:digit:]] and matches any character except a digit, so it extracts every other character one at a time. The fix is the lowercase \\d, which matches a digit.

As with [[:digit:]]{1,} above, adding the quantifier {1,} makes \\d{1,} match one or more consecutive digits, which returns the number "53":

unlist(str_extract_all(six.a, "\\d{1,}"))
## [1] "53"
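Since \\D and \\d are complements, removing every non-digit leaves exactly the digits, and removing every digit leaves the rest. A short sketch with stringr's str_remove_all():

```r
library(stringr)

six.a <- "Consider the mail address chunkylover53@aol.com."

# Deleting every non-digit (\\D) leaves only the digits
str_remove_all(six.a, "\\D")  # returns "53"

# Deleting every digit (\\d) leaves everything else
str_remove_all(six.a, "\\d")  # returns "Consider the mail address chunkylover@aol.com."
```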
