IS607: Week 4 Assignment

[0-9]+\$
Answer: Matches one or more digits (0-9) immediately followed by a literal dollar sign. The backslash is needed because an unescaped $ is the end-of-string anchor.

library(stringr)
question.one <- "My number is 1236534512 but i can also be reached through 2124496571$";
unlist(str_extract_all(question.one, "[0-9]+\\$"));

## [1] "2124496571$"
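To see what the escape changes, here is a minimal sketch comparing the escaped and unescaped dollar sign. The string `s` and the variable names are illustrative and not part of the assignment.

```r
library(stringr)

# Illustrative input: one number followed by "$", another at the end of the string
s <- "Call 1236534512 or 2124496571$ or 555"

# Escaped: "\\$" is a literal dollar sign that must follow the digits
escaped <- unlist(str_extract_all(s, "[0-9]+\\$"))

# Unescaped: "$" is an anchor, so only digits at the very end of the string match
unescaped <- unlist(str_extract_all(s, "[0-9]+$"))

print(escaped)    # "2124496571$"
print(unescaped)  # "555"
```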

\b[a-z]{1,4}\b
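Answer: Finds whole words of 1 to 4 lowercase letters (a to z). The \b anchors are word boundaries, so longer words and words containing uppercase letters or digits are skipped (for example "My" and "number" below).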

unlist(str_extract_all(question.one, "\\b[a-z]{1,4}\\b"));

## [1] "is"   "but"  "i"    "can"  "also" "be"
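The effect of the lowercase-only class can be checked with a small sketch. The shortened string `s` and the alternative class `[A-Za-z]` are assumptions made for illustration.

```r
library(stringr)

# Illustrative input, shortened from the sentence used above
s <- "My number is 1236534512 but i can also be reached"

# Lowercase only: "My" is skipped because "M" is not in [a-z]
# and "y" alone does not start at a word boundary
lower_only <- unlist(str_extract_all(s, "\\b[a-z]{1,4}\\b"))

# Both cases allowed: "My" is now a whole word of length 2
any_case <- unlist(str_extract_all(s, "\\b[A-Za-z]{1,4}\\b"))

print(lower_only)  # "is" "but" "i" "can" "also" "be"
print(any_case)    # "My" "is" "but" "i" "can" "also" "be"
```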

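.*?\.txt$
Answer: Matches a string that ends with ".txt". The .*? is lazy (non-greedy), but the search starts at the first character and the $ anchor forces the match to run to the end of the string, so the whole sentence is returned if it ends with .txt.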

my.file <- "My name 1 is myPersonalDetails.txt";
unlist(str_extract_all(my.file, ".*?\\.txt$"));

## [1] "My name 1 is myPersonalDetails.txt"
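The difference between lazy and greedy matching only shows up once the $ anchor is removed. The following sketch uses a made-up string `files` with several .txt names to illustrate this.

```r
library(stringr)

# Illustrative input with three candidate endings
files <- "a.txt b.txt c.txt"

# Lazy with anchor: the match must reach the end of the string
lazy_anchored <- str_extract(files, ".*?\\.txt$")

# Lazy without anchor: each match stops at the first ".txt" it can reach
lazy_free <- unlist(str_extract_all(files, ".*?\\.txt"))

# Greedy without anchor: one match that runs to the last ".txt"
greedy_free <- unlist(str_extract_all(files, ".*\\.txt"))

print(lazy_anchored)  # "a.txt b.txt c.txt"
print(lazy_free)      # "a.txt"  " b.txt" " c.txt"
print(greedy_free)    # "a.txt b.txt c.txt"
```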

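\d{2}/\d{2}/\d{4}
Answer: Finds any substring of the form two digits/two digits/four digits, such as a date written dd/mm/yyyy. The pattern has no boundaries, so it can also match inside a longer run of digits, as in the example.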

random.number <- "This is a random abx333/44/333322";
unlist(str_extract_all(random.number, "\\d{2}/\\d{2}/\\d{4}"))

## [1] "33/44/3333"
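If only standalone dates should match, one option is to add word boundaries. This is a sketch under that assumption. The second test string and the variable names are invented for illustration.

```r
library(stringr)

# First element is the string from above, second is a hypothetical clean date
s <- c("This is a random abx333/44/333322", "due 12/25/2014.")

# No boundaries: matches inside the longer digit runs as well
loose <- str_extract(s, "\\d{2}/\\d{2}/\\d{4}")

# With \b on both sides: the digits must not touch other word characters
strict <- str_extract(s, "\\b\\d{2}/\\d{2}/\\d{4}\\b")

print(loose)   # "33/44/3333" "12/25/2014"
print(strict)  # NA           "12/25/2014"
```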

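<(.+?)>.+?</\1>
Answer: Finds an opening tag, some content, and a closing tag with the same name. The parentheses capture the tag name and the backreference \1 requires the closing tag to repeat it.

random.tag <- "<html>Hellow from Arun</html>";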
unlist(str_extract_all(random.tag, "<(.+?)>.+?</\\1>"));

## [1] "<html>Hellow from Arun</html>"
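A short sketch of what the backreference \1 enforces: the closing tag has to repeat whatever the first group captured. The vector `tags` is made up for illustration.

```r
library(stringr)

pattern <- "<(.+?)>.+?</\\1>"

# Illustrative inputs: matching tags, mismatched tags, two pairs in a row
tags <- c("<html>Hello</html>", "<b>bold</i>", "<p>one</p><p>two</p>")

# TRUE only where an opening tag has a closing tag with the same name
has_pair <- str_detect(tags, pattern)

# The captured tag name is in column 2 of the str_match() result
tag_name <- str_match(tags[1], pattern)[, 2]

# Lazy quantifiers keep the two <p> pairs as separate matches
pairs <- unlist(str_extract_all(tags[3], pattern))

print(has_pair)  # TRUE FALSE TRUE
print(tag_name)  # "html"
print(pairs)     # "<p>one</p>" "<p>two</p>"
```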

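An equivalent form of [0-9]+\$ is \d+\$. The quantifier has to stay + (one or more); with \d* a lone "$" with no digits in front of it would also match.

unlist(str_extract_all(question.one, "\\d+\\$"));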

## [1] "2124496571$"
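The two quantifiers give the same result on the sentence above but differ once a bare dollar sign appears. The sketch below uses an invented string `s`. The third pattern, `[[:digit:]]{1,}[$]`, is one possible rewrite in which every element differs from `[0-9]+\$`.

```r
library(stringr)

# Illustrative input containing a lone "$" and a number followed by "$"
s <- "Total: $ and 2124496571$"

# Zero or more digits: the lone "$" also matches
star <- unlist(str_extract_all(s, "\\d*\\$"))

# One or more digits: same behaviour as [0-9]+\$
plus <- unlist(str_extract_all(s, "\\d+\\$"))

# Same task with a POSIX class, a {1,} quantifier and "$" inside brackets
alt <- unlist(str_extract_all(s, "[[:digit:]]{1,}[$]"))

print(star)  # "$" "2124496571$"
print(plus)  # "2124496571$"
print(alt)   # "2124496571$"
```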

my.email <- "chunkylover53[at]aol[dot]com";

my.email <- str_replace(my.email, "\\[at\\]", "@");
my.email <- str_replace(my.email, "\\[dot\\]", ".");

The transformed email address: chunkylover53@aol.com
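str_replace() only changes the first occurrence, which is enough here because the address has one [at] and one [dot]. For an address with several [dot] parts, a sketch along these lines could be used. The address `obfuscated` is hypothetical, and `fixed()` is used as an alternative to escaping the brackets.

```r
library(stringr)

# Hypothetical address with more than one "[dot]"
obfuscated <- "homer[dot]simpson[at]mail[dot]example[dot]com"

# str_replace() stops after the first "[dot]"
first_only <- str_replace(obfuscated, "\\[dot\\]", ".")

# str_replace_all() replaces every "[dot]"
all_dots <- str_replace_all(obfuscated, "\\[dot\\]", ".")

# fixed() treats "[at]" as plain text, so no escaping is needed
cleaned <- str_replace(all_dots, fixed("[at]"), "@")

print(first_only)  # "homer.simpson[at]mail[dot]example[dot]com"
print(cleaned)     # "homer.simpson@mail.example.com"
```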
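
b. What’s wrong with using [[:digit:]]?
Answer: On its own it matches a single digit, so each digit comes back as a separate element. Adding the + quantifier returns the run of digits as one match.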

unlist(str_extract_all(my.email, "[[:digit:]]"));

## [1] "5" "3"

unlist(str_extract_all(my.email, "[[:digit:]]+"));

## [1] "53"

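c. Why does the expression \D fail to extract the digits?
Answer: \D matches any character that is not a digit, so it returns everything except the digits. The lowercase \d matches the digits themselves.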

unlist(str_extract_all(my.email, "\\D+"));

## [1] "chunkylover" "@aol.com"

unlist(str_extract_all(my.email, "\\d+"));

## [1] "53"
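\D can still be used to get at the digits if the non-digits are removed instead of extracted. A minimal sketch, with `email` standing in for the transformed address:

```r
library(stringr)

email <- "chunkylover53@aol.com"

# Extracting with \D keeps the non-digit runs
non_digits <- unlist(str_extract_all(email, "\\D+"))

# Extracting with \d keeps the digits
digits <- unlist(str_extract_all(email, "\\d+"))

# Removing every non-digit character leaves only the digits
stripped <- str_replace_all(email, "\\D", "")

print(non_digits)  # "chunkylover" "@aol.com"
print(digits)      # "53"
print(stripped)    # "53"
```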