Basic String Handling
This is the Markdown file for my third homework assignment. The problem statements are as follows:
Before we begin, let’s load the stringr library.
Problem 3: Copy the introductory example.
- Use the tools…..
Yank the given string into the program variable raw.data:
Remove the non-alphabetic characters, e.g. numbers, brackets, and hyphens, keeping only the names:
## [1] "Moe Szyslak" "Burns, C. Montgomery" "Rev. Timothy Lovejoy"
## [4] "Ned Flanders" "Simpson, Homer" "Dr. Julius Hibbert"
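The code for these two steps is not shown above. A minimal sketch that reproduces the output, assuming raw.data holds the phone-book string from the chapter's introductory example (reproduced below) and that a name is any run of two or more letters, periods, commas, or spaces:

```r
library(stringr)

# Assumed: the phone-book string from the chapter's introductory example
raw.data <- "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555-6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"

# Keep runs of at least two letters, periods, commas or spaces.
# Digits, brackets and hyphens break the runs, so they are dropped.
name <- unlist(str_extract_all(raw.data, "[[:alpha:]., ]{2,}"))
name
```

The `{2,}` lower bound matters: single spaces sitting between phone-number digits, such as the one in "(636) 555", would otherwise come back as one-character matches.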
Remove “C. ”, “Rev. ”, “Dr. ”, and “,”. I piped the steps together with “%>%”, which works like the Unix/Linux pipe “|”: the result of each step is passed as the first argument of the next one. It’s not mentioned in the chapter, but I looked it up.
name1 <- str_replace(name, "C\\. ", "") %>% str_replace("Rev\\. ", "") %>%
  str_replace("Dr\\. ", "") %>% str_replace(",", "")
name1
## [1] "Moe Szyslak" "Burns Montgomery" "Timothy Lovejoy" "Ned Flanders"
## [5] "Simpson Homer" "Julius Hibbert"
Split each name into first_name and last_name.
Create the column headings.
Print names.df with proper headings and formatting.
## first_name last_name
## 1 Moe Szyslak
## 2 Burns Montgomery
## 3 Timothy Lovejoy
## 4 Ned Flanders
## 5 Simpson Homer
## 6 Julius Hibbert
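In the table above, "Burns" and "Simpson" land under first_name because those two entries are written surname first ("Burns, C. Montgomery", "Simpson, Homer"), and removing the comma does not reorder them. A sketch of an alternative that swaps the "Last, First" entries before splitting; the name vector is typed in from the output shown earlier so the block runs on its own, and the name `clean` is just for illustration:

```r
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# Drop the titles and the single-letter middle initial (dots escaped)
clean <- str_replace_all(name, "(Rev|Dr)\\. |\\b[A-Z]\\. ", "")

# "Last, First" -> "First Last": swap the parts around the comma
clean <- str_replace(clean, "^([^,]+), (.+)$", "\\2 \\1")

# Split on the first space into two columns
parts <- str_split_fixed(clean, " ", 2)
names.df <- data.frame(first_name = parts[, 1], last_name = parts[, 2],
                       stringsAsFactors = FALSE)
names.df
```

Entries without a comma are already in first-last order, so the swap leaves them unchanged.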
- Construct a logical vector ….. has a title (i.e., Rev. and Dr.)
Logical vector to find Dr. or Rev.
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
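The code behind this vector is not shown. A sketch that gives the same result, assuming the only titles are "Rev." and "Dr." and that they open the name; the period is escaped so it matches a literal dot rather than any character:

```r
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# A title here means "Rev." or "Dr." at the start of the name
has_title <- str_detect(name, "^(Rev|Dr)\\.")
has_title
```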
- Construct a logical vector ….. has a second name.
Logical vector to find a middle initial such as “C.”, which occurs in only one name. This is done differently from the previous one.
## [1] FALSE TRUE FALSE FALSE FALSE FALSE
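A sketch for the middle-initial check, assuming a second name appears only as a single capital letter followed by a period (as in "C."). The pattern needs the period directly after one capital letter, so "Dr." and "Rev." do not count; `\b` makes that capital the start of a word:

```r
library(stringr)

name <- c("Moe Szyslak", "Burns, C. Montgomery", "Rev. Timothy Lovejoy",
          "Ned Flanders", "Simpson, Homer", "Dr. Julius Hibbert")

# One capital letter at the start of a word, immediately followed by "."
has_second_name <- str_detect(name, "\\b[A-Z]\\.")
has_second_name
```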
Problem 4: Describe the types of strings…..
- [0-9]+\$ : Anywhere in the string, there has to be one or more digits immediately followed by a $ character, e.g. "CCCC4$BBB".
## [1] TRUE FALSE FALSE
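The code for this case is not shown. A sketch with hypothetical inputs; the second call shows why the backslash matters, since an unescaped `$` is the end-of-string anchor:

```r
library(stringr)

# Hypothetical test strings, chosen for illustration
x <- c("CCCC4$BBB", "price: $45", "$ 9")

# "\\$" escapes the dollar sign so it is matched literally
with_dollar <- str_detect(x, "[0-9]+\\$")
with_dollar

# Without the escape, "$" means "end of string": digits at the very end
at_end <- str_detect(c("CCCC4$BBB", "abc123"), "[0-9]+$")
at_end
```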
- \b[a-z]{1,4}\b : Anywhere in the string, there has to be a lowercase word of one to four letters, bounded on both sides by a word boundary, i.e. by a non-word character (such as white space or punctuation) or by the start or end of the string.
str <- c("A 22g uShy Try KKKK", "LLLL kkkl LLL MMMM", "CCCC4BBB$ hhhP")
str_detect(str, "\\b[a-z]{1,4}\\b")
## [1] FALSE TRUE FALSE
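Because `\b` is a word boundary rather than a space, punctuation and the ends of the string also delimit a word, while digits and underscores do not (they are word characters). A small check with hypothetical inputs:

```r
library(stringr)

# Hypothetical inputs
x <- c("abc", "(done)", "end.", "ABCD ab1", "toolong")

# "ab1" fails because "1" is a word character, so there is no boundary after "ab";
# "toolong" fails because it has more than four letters
word_boundary <- str_detect(x, "\\b[a-z]{1,4}\\b")
word_boundary
```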
- .*?\.txt$ : The string has to end with .txt.
str <- c("A 22g uShy Try KKKK.txt", "LLLL kkkl.txt LLL MMMM", "CCCC4BBB$ hhhP")
str_detect(str, ".*?\\.txt$")
## [1] TRUE FALSE FALSE
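For detection, the leading `.*?` does not change the result, because `str_detect()` looks for a match anywhere in the string; what matters is the escaped dot and the `$` anchor. A sketch with hypothetical file names comparing a few variants:

```r
library(stringr)

# Hypothetical file names
x <- c("notes.txt", "notes.txt.bak", "notes_txt", "NOTES.TXT")

lazy <- str_detect(x, ".*?\\.txt$")   # the pattern from the text
plain <- str_detect(x, "\\.txt$")     # same result without the lazy prefix
unescaped <- str_detect(x, ".txt$")   # bare "." matches any character, e.g. "_"
any_case <- str_detect(x, regex("\\.txt$", ignore_case = TRUE))

rbind(lazy, plain, unescaped, any_case)
```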
- \d{2}/\d{2}/\d{4} : The string must contain nn/nn/nnnn, where n stands for a digit.
str <- c("A 02/12/9988 KKKK.txt", "LLLL 88/33/9878 SDDD.txt LLL MMMM", " 77-99-676")
str_detect(str, "\\d{2}/\\d{2}/\\d{4}")
## [1] TRUE TRUE FALSE
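The pattern checks only the shape nn/nn/nnnn, so "88/33/9878" counts as a match even though it is not a valid date. A sketch that pulls the match out with `str_extract()` and then tries to parse it, assuming a month/day/year order:

```r
library(stringr)

x <- c("A 02/12/9988 KKKK.txt", "LLLL 88/33/9878 SDDD.txt LLL MMMM", " 77-99-676")

# First nn/nn/nnnn in each string, NA where there is none
dates <- str_extract(x, "\\d{2}/\\d{2}/\\d{4}")
dates

# Parsing rejects month 88 / day 33, which the regex alone accepts
parsed <- as.Date(dates, format = "%m/%d/%Y")
parsed
```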
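- <(.+?)>.+?</\1> : The string must contain an HTML/XML-style element: an opening tag <x>, at least one character of content, and a closing tag </x> with the same name, because the backreference \1 repeats whatever the group (.+?) captured. For example: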
str <- c("LLLL <hkLd1> this is string</hkLd1> lljlj", "<p>Wall Street</pt>", "<man>j</man")
str_detect(str, "<(.+?)>.+?</\\1>")
## [1] TRUE FALSE FALSE
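To see what the group `(.+?)` captured, `str_match()` returns the whole match in its first column and each group in the following columns. A sketch using two of the example strings plus one hypothetical string with two tag pairs (only the first pair is returned):

```r
library(stringr)

x <- c("LLLL <hkLd1> this is string</hkLd1> lljlj",
       "<p>Wall Street</pt>",
       "<b>bold</b> and <i>italic</i>")   # hypothetical

m <- str_match(x, "<(.+?)>.+?</\\1>")
tag <- m[, 2]   # captured tag name; NA where no matching closing tag exists
tag
```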
Problem 9: Secret message.
First, yank the string containing the secret message into the program variable password_string:
password_string <- c("clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo
Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO
d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5
fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr")
password_string
## [1] "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo\nUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO\nd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5\nfy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr"
The string consists mostly of lowercase letters and digits, with a few capital letters scattered through it. Let’s try to extract those capital letters.
## [1] "C" "O" "N" "G" "R" "A" "T" "U" "L" "AT" "I" "O" "N" "S" "."
## [16] "Y" "O" "U" "." "A" "R" "E" "." "A" ".S" "U" "P" "E" "R" "N"
## [31] "E" "R" "D"
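The extraction code is not shown above. A sketch that reproduces the same 33 pieces, assuming the pattern keeps runs of capital letters and periods (the "AT" and ".S" entries suggest runs rather than single characters). The string is rebuilt with `paste0()` so it has no embedded newlines; every original line break already sits next to a lowercase letter or digit, so the pieces come out the same:

```r
library(stringr)

password_string <- paste0(
  "clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0Tanwo",
  "Uwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigO",
  "d6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5",
  "fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr")

# Runs of capital letters and periods; a "." inside [...] is literal
message <- unlist(str_extract_all(password_string, "[A-Z.]+"))
message
```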
The above process produces a bunch of capital letters, with occasional dots that mark the breaks between words. We’ll build words out of those capital letters.
first_word <- str_c(message[1], message[2], message[3], message[4], message[5], message[6], message[7], message[8], message[9], message[10], message[11], message[12], message[13], message[14])
second_word <- str_c(message[16], message[17], message[18])
third_word <- str_c(message[20], message[21], message[22])
fourth_word <- str_c(message[24])
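# fixed(".") matches a literal period; a bare "." is a regex wildcard that only removed the dot here because it happened to be the first character
fifth_word <- str_c(str_replace(message[25], pattern = fixed("."), replacement = ""), message[26], message[27], message[28], message[29], message[30], message[31], message[32], message[33])
Created 5 words. Now, I’ll create a complete sentence out of the five words.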
## [1] "CONGRATULATIONS YOU ARE A SUPERNERD"
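Building each word by hand depends on knowing the position of every piece in advance. Since the periods already mark the word breaks, an alternative sketch (not the method used above) glues all pieces together and turns each period into a space; message is typed in from the extraction output shown earlier:

```r
library(stringr)

message <- c("C", "O", "N", "G", "R", "A", "T", "U", "L", "AT", "I", "O", "N",
             "S", ".", "Y", "O", "U", ".", "A", "R", "E", ".", "A", ".S", "U",
             "P", "E", "R", "N", "E", "R", "D")

# Join everything, then replace each literal period with a space
sentence <- str_replace_all(str_c(message, collapse = ""), fixed("."), " ")
sentence
```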
Marker: 607-03