library(httr)
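# Load the list of college majors published by FiveThirtyEight
url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"
majors <- read.csv(url, header = TRUE)
# Return the majors whose name contains "STATISTICS" or "DATA" (case-insensitive)
grep(pattern = 'STATISTICS|DATA', majors$Major, value = TRUE, ignore.case = TRUE)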
## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"
## [3] "STATISTICS AND DECISION SCIENCE"
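The same search can be tried without a network connection. The sketch below uses a small made-up vector (`sample_majors` is an assumption for illustration, not data from the FiveThirtyEight file) to show the difference between `grep()` with `value = TRUE`, which returns the matching strings, and `grepl()`, which returns one logical value per element.

```r
# Hypothetical sample; these entries are illustrative, not the real data set
sample_majors <- c("APPLIED MATHEMATICS",
                   "STATISTICS AND DECISION SCIENCE",
                   "Data Processing",
                   "HISTORY")
pattern <- "STATISTICS|DATA"

# value = TRUE returns the matching elements instead of their indices
matches <- grep(pattern, sample_majors, value = TRUE, ignore.case = TRUE)

# grepl() returns TRUE/FALSE for every element, which is handy for filtering
is_match <- grepl(pattern, sample_majors, ignore.case = TRUE)

print(matches)
print(is_match)
```

With `ignore.case = TRUE`, the mixed-case entry "Data Processing" is matched as well.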
Transform the data below:
[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"
Into a format like this:
c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
library(stringr)
startstr <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'
dbl_quote <- '"'
# Use gregexpr() to find the position of every double quote in startstr
double_quotes_positions <- gregexpr(pattern = dbl_quote, text = startstr)
# position of the first double quote
double_quotes_positions[[1]][1]
## [1] 5
# print head of double_quotes positions
head(double_quotes_positions[[1]])
## [1] 5 17 20 29 35 46
# store all the double-quote positions in the plain vector dq_pos
# (as.vector drops the match attributes that gregexpr attaches)
dq_pos <- as.vector(double_quotes_positions[[1]])
# every word sits between one opening and one closing quote
no_of_words <- length(dq_pos) / 2
desired_output <- character(no_of_words)
# check the length of desired_output
length(desired_output)
## [1] 14
# extract the text between each opening and closing quote
for (i in seq_len(no_of_words)) {
  desired_output[i] <- substring(startstr, dq_pos[2 * i - 1] + 1, dq_pos[2 * i] - 1)
}
# join the words with the separator \", \" and wrap the result in c(\" ... \")
# writeLines prints the string without escaping the embedded quotes
end_result <- paste0("c(\"", paste0(desired_output, collapse = "\", \""), "\")")
writeLines(end_result)
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
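A more compact route to the same result, offered as a sketch and not as the approach used above, is to pull out every quoted substring in one step with `regmatches()` and `gregexpr()`. The names `quoted` and `items` are illustrative, and `startstr` is repeated so the block runs on its own.

```r
startstr <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'

# Match a quote, any run of non-quote characters, then the closing quote
quoted <- regmatches(startstr, gregexpr('"[^"]*"', startstr))[[1]]

# Strip the surrounding quotes to get the bare words
items <- gsub('"', '', quoted, fixed = TRUE)

# Re-quote each word, join with ", " and wrap in c( )
end_result <- paste0("c(", paste0('"', items, '"', collapse = ", "), ")")
writeLines(end_result)
```

Because the pattern matches whole quoted strings, no position arithmetic is needed and the index markers such as `[5]` are ignored automatically.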
The two exercises below are taken from R for Data Science, 14.3.5.1 in the on-line version:
(.)\1\1 - This regular expression matches the same character three times in a row. It would match, for example, "abbbc" but not "abc".
"(.)(.)\\2\\1" - This is a string representing a regular expression that matches a pair of characters followed by the same pair in reverse order. Both "abba" and "abbbba" contain a match (in "abbbba" the matched part is "bbbb").
(..)\1 - This regular expression matches any two characters followed by the same sequence of two characters, for example "abab".
"(.).\\1.\\1" - Five characters where the first, third and fifth are the same and the second and fourth can be anything. Possible matches include "abaaa" and "dedad".
"(.)(.)(.).*\\3\\2\\1" - Six or more characters where the first three characters reappear at the end in reverse order. If the match is of length 7, the fourth character can be anything.
Patterns for the second exercise:
"^(.).*\\1$"
".*(.?...?).*\\1"
"(.?[A-Za-z].?){3,}"
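The descriptions of the backreference patterns can be checked against a few test strings. The sketch below uses base R `grepl()`; the helper `has_match` and the test strings are assumptions made for illustration, and `perl = TRUE` is a choice made here, not something taken from the exercises.

```r
# Hypothetical helper: TRUE where the pattern matches somewhere in the string
has_match <- function(pattern, x) grepl(pattern, x, perl = TRUE)

# Same character three times in a row
triple <- has_match("(.)\\1\\1", c("abbbc", "abc"))            # TRUE FALSE

# A pair followed by the same pair reversed
mirror2 <- has_match("(.)(.)\\2\\1", c("abba", "abbbba", "abcd"))  # TRUE TRUE FALSE

# Two characters followed by the same two characters
pair <- has_match("(..)\\1", c("abab", "abba"))                # TRUE FALSE

# First, third and fifth characters identical
odd_same <- has_match("(.).\\1.\\1", c("abaaa", "dedad", "abcde"))  # TRUE TRUE FALSE

# Three characters, anything, then the same three reversed
mirror3 <- has_match("(.)(.)(.).*\\3\\2\\1", c("abccba", "abcXcba", "abcabc"))  # TRUE TRUE FALSE

# Starts and ends with the same character
same_end <- has_match("^(.).*\\1$", c("abca", "abcd"))         # TRUE FALSE

print(list(triple = triple, mirror2 = mirror2, pair = pair,
           odd_same = odd_same, mirror3 = mirror3, same_end = same_end))
```

Note that inside an R string each backslash is doubled, so the regular expression `(.)\1\1` is written as `"(.)\\1\\1"`.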