4. Describe the types of strings that conform to the following regular expressions and construct an example that is matched by the regular expression.

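  a. [0-9]+\$
    Answer: Matches one or more digits (0-9) immediately followed by a literal dollar sign, e.g. "2124496571$". Inside the R string the dollar sign is written \\$, because an unescaped $ is the end-of-string anchor.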
library(stringr)  # provides str_extract_all() and str_replace()
question.one <- "My number is 1236534512 but i can also be reached through 2124496571$"
unlist(str_extract_all(question.one, "[0-9]+\\$"))
## [1] "2124496571$"
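A quick check of which strings conform and which do not. This sketch uses base R's regmatches() with gregexpr(perl = TRUE) instead of stringr so it runs without extra packages; the test strings are made up for illustration.

```r
# Hypothetical test strings
prices <- c("Total: 250$", "Total: $250", "Total: 250 $", "$")

# TRUE only where one or more digits are immediately followed by a literal $
conforms <- grepl("[0-9]+\\$", prices, perl = TRUE)
print(conforms)      # TRUE FALSE FALSE FALSE

# Every match inside a single string
paid <- "Paid 12$ then 7$"
paid_matches <- regmatches(paid, gregexpr("[0-9]+\\$", paid, perl = TRUE))[[1]]
print(paid_matches)  # "12$" "7$"
```

A dollar sign in front of the number ("$250") or a space between digits and sign ("250 $") does not conform.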
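  b. \b[a-z]{1,4}\b
    Answer: Matches whole words made up of one to four lowercase letters (a to z). The word boundaries \b prevent matches inside longer words, so "number" and "reached" are not partially matched, and "My" is skipped because of its uppercase M.
unlist(str_extract_all(question.one, "\\b[a-z]{1,4}\\b"))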
## [1] "is"   "but"  "i"    "can"  "also" "be"
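To see what the word boundaries contribute, here is a small base R sketch (regmatches() with gregexpr(perl = TRUE) standing in for str_extract_all()) on a made-up sentence, with and without \b.

```r
words <- "the cat sat on a windowsill in May"

# With \b: only complete words of 1 to 4 lowercase letters
bounded <- regmatches(words, gregexpr("\\b[a-z]{1,4}\\b", words, perl = TRUE))[[1]]
print(bounded)    # "the" "cat" "sat" "on" "a" "in"

# Without \b: longer words are chopped into pieces of up to 4 letters
unbounded <- regmatches(words, gregexpr("[a-z]{1,4}", words, perl = TRUE))[[1]]
print(unbounded)  # ... "wind" "owsi" "ll" "in" "ay"
```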
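  c. .*?\.txt$
    Answer: Matches a string that ends with ".txt". Note that .*? is lazy (non-greedy), not greedy. However, the engine starts the match at the leftmost possible position and $ forces it to run to the end of the string, so the whole string (assuming it has no line breaks) is returned whenever it ends with .txt.
my.file <- "My name 1 is myPersonalDetails.txt"
unlist(str_extract_all(my.file, ".*?\\.txt$"))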
## [1] "My name 1 is myPersonalDetails.txt"
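The difference between lazy and greedy only shows up when the pattern is not anchored. This base R sketch (regexpr() returns only the first, leftmost match) compares the variants on a made-up string containing two file names.

```r
files <- "notes.txt and backup.txt"

# Return the first (leftmost) match of `pattern` in `x`
first_match <- function(pattern, x) regmatches(x, regexpr(pattern, x, perl = TRUE))

lazy_unanchored   <- first_match(".*?\\.txt", files)   # stops at the first ".txt"
greedy_unanchored <- first_match(".*\\.txt", files)    # runs to the last ".txt"
lazy_anchored     <- first_match(".*?\\.txt$", files)  # $ forces it to the end
print(c(lazy_unanchored, greedy_unanchored, lazy_anchored))

# No match when ".txt" is not at the very end
not_at_end <- grepl(".*?\\.txt$", "report.txt.bak", perl = TRUE)
print(not_at_end)  # FALSE
```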
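  d. \d{2}/\d{2}/\d{4}
    Answer: Matches two digits, a slash, two digits, a slash and four digits, the layout of dates such as 05/11/2023. Because the pattern is not anchored, it also matches inside longer digit runs, as the example shows.
random.number <- "This is a random abx333/44/333322"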
unlist(str_extract_all(random.number, "\\d{2}/\\d{2}/\\d{4}"))
## [1] "33/44/3333"
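If only stand-alone dates are wanted, one option (an addition of mine, not part of the exercise) is to forbid a digit directly before or after the match with PCRE lookarounds. The sketch below uses base R with perl = TRUE on a made-up string; note it still does not check that the numbers form a valid date (44 is not a month).

```r
text <- "Due 05/11/2023, ref abx333/44/333322"

# Pattern from the answer: any 2/2/4 digit run, even inside longer numbers
loose <- regmatches(text, gregexpr("\\d{2}/\\d{2}/\\d{4}", text, perl = TRUE))[[1]]
print(loose)   # "05/11/2023" "33/44/3333"

# Stricter variant: (?<!\d) and (?!\d) reject a digit right before or after
strict <- regmatches(text, gregexpr("(?<!\\d)\\d{2}/\\d{2}/\\d{4}(?!\\d)", text, perl = TRUE))[[1]]
print(strict)  # "05/11/2023"
```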
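  e. <(.+?)>.+?</\1>
    Answer: Matches an opening tag, its content and the matching closing tag, i.e. strings of the form <tag>content</tag>. The backreference \1 forces the closing tag to repeat the name captured by (.+?) in the opening tag.
random.tag <- "<html>Hello from Arun</html>"
unlist(str_extract_all(random.tag, "<(.+?)>.+?</\\1>"))
## [1] "<html>Hello from Arun</html>"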
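A sketch of what the backreference does and where the pattern falls short, using base R with perl = TRUE on made-up HTML snippets: mismatched tags are rejected, and with nested tags of the same name the lazy .+? stops at the first closing tag.

```r
html <- "<b>bold</b> <i>broken</b> <p>text</p>"
pattern <- "<(.+?)>.+?</\\1>"

# <i>broken</b> is skipped because \1 demands the closing tag "</i>"
tags <- regmatches(html, gregexpr(pattern, html, perl = TRUE))[[1]]
print(tags)   # "<b>bold</b>" "<p>text</p>"

# Nested tags with the same name: the match ends at the inner </div>
nested <- "<div><div>a</div></div>"
inner <- regmatches(nested, regexpr(pattern, nested, perl = TRUE))
print(inner)  # "<div><div>a</div>"
```

This is one reason regular expressions are fine for quick checks but not for parsing real HTML.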

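5. Rewriting the same expression [0-9]+\$ with different elements

Here \d replaces [0-9]. The quantifier must stay +: with \d* the expression would also match a lone $ that has no digits in front of it.
unlist(str_extract_all(question.one, "\\d+\\$"))
## [1] "2124496571$"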
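Why the quantifier matters: on question.one both \d* and \d+ give the same result, because that string contains only one $. A made-up string with a bare $ shows the difference (base R, perl = TRUE). The last line is a further rewrite using a POSIX class and a bracketed $.

```r
amounts <- "Tips: 20$ and a lone $ sign"

# Return every match of `pattern` in `x`
extract_all <- function(pattern, x) regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]

star  <- extract_all("\\d*\\$", amounts)           # zero digits allowed
plus  <- extract_all("\\d+\\$", amounts)           # at least one digit
posix <- extract_all("[[:digit:]]+[$]", amounts)   # same task, other elements
print(star)   # "20$" "$"
print(plus)   # "20$"
print(posix)  # "20$"
```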

6.

my.email <- "chunkylover53[at]aol[dot]com"
  a. Transforming the string to the standard mail format.
my.email <- str_replace(my.email, "\\[at\\]", "@")
my.email <- str_replace(my.email, "\\[dot\\]", ".")

The transformed email address: chunkylover53@aol.com
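str_replace() only replaces the first match, which is enough here because the address contains a single [dot]. For addresses with several, every occurrence has to be replaced (str_replace_all() in stringr). A base R sketch with gsub(); the second address is hypothetical.

```r
raw <- c("chunkylover53[at]aol[dot]com",
         "jane[dot]doe[at]mail[dot]example[dot]org")  # hypothetical second address

# fixed = TRUE treats the patterns as literal text, so "[" and "]" need no escaping;
# gsub() replaces every occurrence in each string
clean <- gsub("[at]", "@", raw, fixed = TRUE)
clean <- gsub("[dot]", ".", clean, fixed = TRUE)
print(clean)
```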
  b. What’s wrong with using [[:digit:]]?

unlist(str_extract_all(my.email, "[[:digit:]]"))
## [1] "5" "3"
The expression [[:digit:]] matches exactly one digit per match, so str_extract_all() returns every digit as a separate element instead of the whole number.
To get the complete number, add the quantifier + so that one or more consecutive digits are matched.
unlist(str_extract_all(my.email, "[[:digit:]]+"))
## [1] "53"
  c. Why does the expression \D fail to extract the digits?
    Answer: \D matches any non-digit character, so it extracts everything except the digits.
unlist(str_extract_all(my.email, "\\D+"))
## [1] "chunkylover" "@aol.com"
The correct expression uses \d, which matches a digit; adding + returns the whole number.
unlist(str_extract_all(my.email, "\\d+"))
## [1] "53"
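\D can still be used to get the digits indirectly, by deleting every run of non-digits. A base R sketch (gsub() with perl = TRUE) shows this and its catch: separate numbers get glued together, which is why extracting with \d+ is usually the safer choice. The second string is made up for illustration.

```r
email <- "chunkylover53@aol.com"

# Deleting every run of non-digits (\D+) leaves only the digits
via_D <- gsub("\\D+", "", email, perl = TRUE)
print(via_D)     # "53"

# The catch: two separate numbers become one
sentence <- "order 12 of 30"
glued    <- gsub("\\D+", "", sentence, perl = TRUE)
separate <- regmatches(sentence, gregexpr("\\d+", sentence, perl = TRUE))[[1]]
print(glued)     # "1230"
print(separate)  # "12" "30"
```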