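library(dplyr)  # provides %>% and filter()

coll_dt <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv")
# Keep only the majors whose name contains "DATA" or "STATISTICS"
data_stat <- coll_dt %>%
  filter(grepl("DATA|STATISTICS", Major))
data_stat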
##   FOD1P                                         Major          Major_Category
## 1  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 2  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics
text <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'
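library(stringr)
# Extract each quoted item (lowercase letters and spaces between double quotes), then strip the quotes
new_text <- str_remove_all(unlist(str_extract_all(text, "\"[a-z ]+\"")), "\"")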
new_text
##  [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange" "blueberry"
##  [6] "cantaloupe"   "chili pepper" "cloudberry"   "elderberry"   "lime"
## [11] "lychee"       "mulberry"     "olive"        "salal berry"
"\\1" is a backreference: it matches the exact same text that was matched by the first capture group (i.e. the first parenthesized expression). Written as an R string, the expression is "(.)\\1\\1". It matches any character followed by the same character two more times, i.e. three identical characters in a row, such as "aaa".
"\\2" is a reference to the second capture group (i.e. the second parenthesized expression), just as "\\1" refers to the first. "(.)(.)\\2\\1" matches any two characters followed by the same two characters in reverse order, such as "abba".
"(..)\1" does not work as an R string, because R interprets \1 as an escape sequence instead of passing a backreference to the regex engine. Written with a doubled backslash, "(..)\\1" matches any two characters followed immediately by the same two characters, such as "abab".
In "(.).\\1.\\1", (.) captures one character. The whole expression matches that character, then any character (the first .), then the captured character again, then any character (the second .), then the captured character a third time, such as "abaca".
(.)(.)(.) are three distinct capture groups, namely group 1, group 2 and group 3. "(.)(.)(.).\\3\\2\\1" matches three characters captured by those groups, then any single character matched by the ., then the same three characters in reverse order: group 3, group 2 and lastly group 1, such as "abcxcba".
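The descriptions above can be checked with a short sketch using `str_detect()` from stringr. The test strings and variable names below are hypothetical, chosen only for illustration; in each pair the first string is expected to match and the second is not.

```r
library(stringr)

# Hypothetical test strings (not from the original text).
# In each pair, the first string should match and the second should not.
triple      <- str_detect(c("aaa", "aab"), "(.)\\1\\1")                    # same character three times
mirror_pair <- str_detect(c("abba", "abab"), "(.)(.)\\2\\1")               # two characters, then reversed
repeat_pair <- str_detect(c("abab", "abba"), "(..)\\1")                    # two characters, then repeated
alternating <- str_detect(c("abaca", "abcde"), "(.).\\1.\\1")              # same character at positions 1, 3, 5
mirror_trio <- str_detect(c("abcxcba", "abcxabc"), "(.)(.)(.).\\3\\2\\1")  # three characters, any one, then reversed

triple
mirror_pair
repeat_pair
alternating
mirror_trio
```

Using `str_detect()` here instead of `str_view()` gives plain logical vectors, which are easier to compare against the written descriptions.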
fruit <- c('coconut', 'cucumber', 'jujube', 'papaya', 'salal berry', 'eleven')
str_view(fruit, "(.).\\1")
str_view(fruit, "(..)\\1")
str_view(fruit, "(.).\\1.\\1")
str_view(fruit, "([A-Za-z]).\\1.\\1")