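“[0-9]” matches any single digit, the “+” requires one or more of them, and “\$” matches a literal dollar sign immediately after the digits. The escape matters: an unescaped “$” would be the end-of-string anchor instead.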
library("stringr")
a_ex = c("1$", "12$", "123$")
grep(pattern = "[0-9]+\\$", a_ex, value = TRUE)
## [1] "1$" "12$" "123$"
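Because “\$” is a literal dollar sign rather than an anchor, the pattern should also match when more text follows the dollar sign. A minimal sketch using base R's grepl(); `a_more` is a made-up vector for illustration, not part of the exercise:

```r
# Hypothetical test strings: the second has text after the "$",
# the third has no dollar sign at all.
a_more = c("12$", "12$ left", "12")

# Escaped "\\$" is a literal dollar sign, so trailing text is allowed.
grepl("[0-9]+\\$", a_more)
## [1]  TRUE  TRUE FALSE

# Adding an unescaped "$" anchors the match to the end of the string.
grepl("[0-9]+\\$$", a_more)
## [1]  TRUE FALSE FALSE
```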
“\b” marks a word boundary, “[a-z]” matches any lowercase letter, and “{1,4}” repeats it one to four times, so the pattern matches whole lowercase words of 1 to 4 letters.
b_ex = c("a", "word")
grep(pattern = "\\b[a-z]{1,4}\\b", b_ex, value = TRUE)
## [1] "a" "word"
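Both strings above match, so it may help to see a few that do not. A small sketch with a made-up vector (`b_more` is hypothetical), using grepl() with perl = TRUE so that “\b” is handled by the PCRE engine:

```r
# Hypothetical test strings: a five-letter word, a phrase containing
# a short word, and an all-uppercase word.
b_more = c("a", "word", "words", "a longer phrase", "ABC")

# "words" has no 1-4 letter stretch with a boundary on both sides;
# "a longer phrase" matches through the standalone "a".
grepl("\\b[a-z]{1,4}\\b", b_more, perl = TRUE)
## [1]  TRUE  TRUE FALSE  TRUE FALSE
```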
“.” matches any character and “*” repeats it zero or more times; the “?” after “*” makes that quantifier lazy, so it takes as few characters as possible rather than making anything optional. “\.txt$” then requires a literal “.txt” at the very end of the string, so the pattern describes file names ending in “.txt”.
I spent some time looking into this pattern because I was having trouble breaking it. The “$” anchor ties the match to the end of the string, and since “.*?” can stand for any run of characters (including none), every string that ends in “.txt” matches, whatever precedes it. It is the “*” that is greedy by default, not the “.”, and the “?” only switches it to lazy matching; grep() just tests whether a match exists, so that switch does not change which strings are returned here.
c_ex = c("a.txt ab.txt", "aaa.txt", "abcd.txt", "1abc.txt", ".txt", " .txt", "asdf asdf.txt", "asdf.txt asdf.txt", "asdf.txt.asdf.txt", "b a.txt a.txt")
grep(pattern = ".*?\\.txt$", c_ex, value = TRUE)
## [1] "a.txt ab.txt" "aaa.txt" "abcd.txt"
## [4] "1abc.txt" ".txt" " .txt"
## [7] "asdf asdf.txt" "asdf.txt asdf.txt" "asdf.txt.asdf.txt"
## [10] "b a.txt a.txt"
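To check that reading, the lazy pattern can be compared with a greedy version and with the bare suffix test. This is only a sketch: `c_more` and `x` are made-up inputs, and perl = TRUE is my choice of engine, not something the exercise prescribes.

```r
# Hypothetical test strings, including two that should fail.
c_more = c("aaa.txt", ".txt", "asdf.txt.asdf.txt", "aaa.txt.bak", "atxt")

lazy   = grepl(".*?\\.txt$", c_more, perl = TRUE)
greedy = grepl(".*\\.txt$",  c_more, perl = TRUE)
bare   = grepl("\\.txt$",    c_more, perl = TRUE)

lazy
## [1]  TRUE  TRUE  TRUE FALSE FALSE

# All three patterns select the same strings.
identical(lazy, greedy) && identical(lazy, bare)
## [1] TRUE

# Laziness only shows when extracting and no "$" forces the end.
x = "a.txt b.txt"
regmatches(x, regexpr(".*?\\.txt", x, perl = TRUE))
## [1] "a.txt"
regmatches(x, regexpr(".*\\.txt", x, perl = TRUE))
## [1] "a.txt b.txt"
```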
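Strings with two digits, a forward slash, two digits, another forward slash, and four digits conform to this pattern. That fits dates written as mm/dd/yyyy (as in the example) or dd/mm/yyyy, although the pattern does not check that the numbers form a valid date.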
d_ex = "07/31/1982"
str_extract(d_ex, "\\d{2}/\\d{2}/\\d{4}")
## [1] "07/31/1982"
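The pattern only checks the shape of the string, which a few extra inputs make visible. A sketch with a made-up vector (`d_more` is hypothetical), using base R's grepl():

```r
# Hypothetical inputs: two plausible dates, an impossible one,
# a one-digit month, and a different separator.
d_more = c("07/31/1982", "31/07/1982", "99/99/9999", "7/31/1982", "07-31-1982")

# Digit counts and slashes are checked; calendar validity is not.
grepl("\\d{2}/\\d{2}/\\d{4}", d_more, perl = TRUE)
## [1]  TRUE  TRUE  TRUE FALSE FALSE
```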
An opening tag in angle brackets, followed by at least one character of content, followed by a closing tag made of “</”, the backreferenced tag name (“\1”), and “>”. This is the paired-tag structure used in markup languages like HTML, where the closing tag marks the end of the element.
e_ex = c("<asdf>asdc</asdf>")
str_extract(e_ex, "<(.+?)>.+?</\\1>")
## [1] "<asdf>asdc</asdf>"
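The backreference is what forces the closing tag to repeat the opening one. A short sketch with made-up strings (`e_more` is hypothetical), using grepl() with perl = TRUE:

```r
# Hypothetical inputs: matching tags, mismatched tags, and an empty element.
e_more = c("<asdf>asdc</asdf>", "<b>bold</i>", "<p></p>")

# "\\1" must repeat whatever group 1 captured, so "<b>...</i>" fails;
# "<p></p>" fails because ".+?" needs at least one character of content.
grepl("<(.+?)>.+?</\\1>", e_more, perl = TRUE)
## [1]  TRUE FALSE FALSE
```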
five_ex = "234$"
str_extract(five_ex, "[0-9]+\\$")
## [1] "234$"
str_extract(five_ex, "[[:digit:]]{1,}[$]")
## [1] "234$"
An equivalent pattern with every element changed: “[[:digit:]]{1,}[$]”
six_ex = "chunkylover52[at]aol[dot]com"
# The brackets in the pattern are escaped so they match literally; the replacements need no escaping.
sixa_ans1 = str_replace(six_ex, pattern = "\\[at\\]", replacement = "@")
sixa_ans2 = str_replace(sixa_ans1, pattern = "\\[dot\\]", replacement = ".")
sixa_ans2
## [1] "chunkylover52@aol.com"
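Since “[at]” and “[dot]” are fixed strings, an alternative is to skip regular expressions altogether. A sketch using base R's gsub() with fixed = TRUE (my substitution for the stringr calls above; `step1` is a hypothetical intermediate name):

```r
six_ex = "chunkylover52[at]aol[dot]com"

# fixed = TRUE treats the pattern as a plain string, so no escaping is needed.
step1 = gsub("[at]", "@", six_ex, fixed = TRUE)
gsub("[dot]", ".", step1, fixed = TRUE)
## [1] "chunkylover52@aol.com"
```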
str_extract_all(six_ex, "[:digit:]{2}")
## [[1]]
## [1] "52"
str_extract_all(six_ex, "[[:digit:]]{2}")
## [[1]]
## [1] "52"
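I believe the answer, according to the text, is that it should be “[[:digit:]]”, or else R will only search for the literal characters “:”, “d”, “i”, “g”, and “t”. However, “[:digit:]” works fine for me in RStudio, as well as in knitr, presumably because the regex engine behind stringr accepts the bare form, unlike base R functions such as grep().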
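The behaviour the text describes can be seen with base R's own regex functions. This is a sketch of my understanding of the default base R engine, not of stringr; `probe` is a made-up vector:

```r
# Hypothetical inputs: one made of digits, one made of the letters in "digit".
probe = c("52", "digit")

# Double brackets: a proper character class, matches digits only.
grepl("[[:digit:]]", probe)
## [1]  TRUE FALSE

# Single brackets in base R: a set of the literal characters : d i g t.
grepl("[:digit:]", probe)
## [1] FALSE  TRUE
```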
Another potential answer is that “[:digit:]” on its own extracts only one digit at a time, which presumably is not what we want. To extract two digits together, one option is to follow it with “{2}”.
bad = str_extract_all(six_ex, "\\D")
good = str_extract_all(six_ex, "\\d{2}")
good
## [[1]]
## [1] "52"
“\D” matches any character that is not a digit, so it collects everything but the digits; “\d” matches the digits themselves.
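The two classes are complements, which is easy to see by deleting each in turn. A minimal sketch using base R's gsub() with perl = TRUE (my choice of function; the answer above uses str_extract_all()):

```r
six_ex = "chunkylover52[at]aol[dot]com"

# Removing every non-digit leaves only the digits.
gsub("\\D", "", six_ex, perl = TRUE)
## [1] "52"

# Removing every digit leaves everything else.
gsub("\\d", "", six_ex, perl = TRUE)
## [1] "chunkylover[at]aol[dot]com"
```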