The FiveThirtyEight College Majors table (https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv) lists 173 college majors. To identify the majors whose names contain the strings “DATA” or “STATISTICS,” I first loaded the .csv file from GitHub:
# Load the tidyverse (readr, dplyr, stringr), then read the csv file from GitHub and verify
library(tidyverse)
majors <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv")
## Rows: 174 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): FOD1P, Major, Major_Category
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
To find majors that contain the strings “DATA” or “STATISTICS” in any position, I used the stringr function str_detect to filter the dataframe:
# Use str_detect inside filter, combining the two tests with the OR operator
majors %>%
  filter(str_detect(Major, "DATA") | str_detect(Major, "STATISTICS"))
## # A tibble: 3 × 3
## FOD1P Major Major_Category
## <chr> <chr> <chr>
## 1 6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS Business
## 2 2101 COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3 3702 STATISTICS AND DECISION SCIENCE Computers & Mathematics
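The two str_detect calls can also be folded into a single pattern, because | inside a regular expression means alternation. The sketch below illustrates this on a small hypothetical vector (sample_majors is made up for the example and is not read from the csv file), assuming the stringr package is installed.

```r
library(stringr)

# Hypothetical sample of major names (not the full FiveThirtyEight table)
sample_majors <- c(
  "STATISTICS AND DECISION SCIENCE",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "GENERAL AGRICULTURE"
)

# "DATA|STATISTICS" matches any string that contains either word
has_match <- str_detect(sample_majors, "DATA|STATISTICS")

# Keep only the matching names
sample_majors[has_match]
```

Inside a dplyr pipeline, the same pattern would presumably be used as filter(str_detect(Major, "DATA|STATISTICS")).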
Another way to accomplish this is to add flag fields to the dataframe that mark majors containing the words DATA or STATISTICS, and then print the records where either flag is TRUE. Adding flags to a dataframe (or table, or view) can be useful for frequently used searches, particularly those requiring complex logic:
# Add flags for Data and Stats majors
majors <- mutate(majors, data_flag = str_detect(Major, "DATA"))
majors <- mutate(majors, stat_flag = str_detect(Major, "STATISTICS"))
# Print records where either is TRUE (the flags are already logical, so no comparison is needed)
majors %>%
  filter(data_flag | stat_flag)
## # A tibble: 3 × 5
## FOD1P Major Major_Category data_flag stat_flag
## <chr> <chr> <chr> <lgl> <lgl>
## 1 6212 MANAGEMENT INFORMATION SYSTEMS AND S… Business FALSE TRUE
## 2 2101 COMPUTER PROGRAMMING AND DATA PROCES… Computers & M… TRUE FALSE
## 3 3702 STATISTICS AND DECISION SCIENCE Computers & M… FALSE TRUE
First, I copied the text string provided in the question to create a vector v. Then I converted that vector to a single string using the paste0 function, adding the characters “c(” and “)” at either end; i.e., I recreated the call to the combine function c() that was used to create the vector.
# Create vector v and print
v<-c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
v
## [1] "bell pepper" "bilberry" "blackberry" "blood orange" "blueberry"
## [6] "cantaloupe" "chili pepper" "cloudberry" "elderberry" "lime"
## [11] "lychee" "mulberry" "olive" "salal berry"
# Convert vector back to temporary string ("a") with commas and quotes
# (paste0 has no sep argument; collapse supplies the '","' between elements)
a <- paste0(v, collapse = '","')
cat(a)
## bell pepper","bilberry","blackberry","blood orange","blueberry","cantaloupe","chili pepper","cloudberry","elderberry","lime","lychee","mulberry","olive","salal berry
# Use paste0 to add necessary beginning and ending characters to the final string "v_str"
v_str<-paste0('c("',a,'")')
cat(v_str)
## c("bell pepper","bilberry","blackberry","blood orange","blueberry","cantaloupe","chili pepper","cloudberry","elderberry","lime","lychee","mulberry","olive","salal berry")
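One way to check the result is to parse the string back into R and compare it with the original vector. The following is a minimal sketch using base R only, on a shortened hypothetical vector (fruit_vec, inner, and fruit_str are illustrative names, not from the question). It assumes no element contains a double quote, since such an element would break the reconstructed code.

```r
# Shortened, hypothetical version of the vector from the question
fruit_vec <- c("bell pepper", "bilberry", "lime")

# Join the elements with '","' between them, then wrap in c(" ... ")
inner <- paste0(fruit_vec, collapse = '","')
fruit_str <- paste0('c("', inner, '")')
cat(fruit_str)

# Parse and evaluate the string as R code, then compare with the original
round_trip <- eval(parse(text = fruit_str))
identical(round_trip, fruit_vec)
```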
The given expressions will match strings as follows:
(.)\1\1 - To be used as an R string, this needs double quotes and escaped (doubled) backslashes. With those corrections (see below), "(.)\\1\\1" matches the same character three times in a row
"(.)(.)\\2\\1" - Two characters followed immediately by the same two characters in reverse order
(..)\1 - To be used as an R string, this needs double quotes and an escaped (doubled) backslash. With that done (see below), "(..)\\1" matches two characters followed immediately by the same two characters in the same order
"(.).\\1.\\1" - One character followed by any character, followed by the first character again, followed by any character, followed by the first character again (i.e. a string of five characters, with one character repeating in places 1, 3, and 5)
"(.)(.)(.).*\\3\\2\\1" - Three characters, followed by zero or more other characters, then the same three characters in reverse order
test<-c("ed","bed","hundred","anna","banana","aaabbaaa")
str_view(test,"(.)\\1\\1")
## [6] │ <aaa>bb<aaa>
str_view(test,"(.)(.)\\2\\1")
## [4] │ <anna>
## [6] │ aa<abba>aa
str_view(test,"(..)\\1")
## [5] │ b<anan>a
str_view(test,"(.).\\1.\\1")
## [5] │ b<anana>
str_view(test,"(.)(.)(.).*\\3\\2\\1")
## [6] │ <aaabbaaa>
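The escaping rule can be seen directly by printing a pattern: the R string literal doubles each backslash, but the regular expression engine receives single ones. The sketch below assumes the stringr package is installed; the test strings are made up for illustration. It also checks that .* may match nothing at all, so the mirrored halves in the last expression can sit side by side.

```r
library(stringr)

# The R string literal doubles each backslash ...
pattern <- "(.)\\1\\1"

# ... but the stored string (what the regex engine sees) has single backslashes
writeLines(pattern)
pattern_length <- nchar(pattern)

# Same character three times in a row
triple <- str_detect(c("aaa", "aab"), pattern)

# .* can match zero characters, so "abccba" matches as well as "abcxcba"
mirror <- str_detect(c("abccba", "abcxcba", "abcabc"), "(.)(.)(.).*\\3\\2\\1")
```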
Construct regular expressions to match words that:
Start and end with the same character: str_view(words, "(^.).*\\1$")
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.): str_view(words, "(..).*\\1")
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.): str_view(words, ".*(.).*\\1.*\\1")
# Start and end with the same character
str_view(words,"(^.).*\\1$")
## [36] │ <america>
## [49] │ <area>
## [209] │ <dad>
## [213] │ <dead>
## [223] │ <depend>
## [258] │ <educate>
## [266] │ <else>
## [268] │ <encourage>
## [270] │ <engine>
## [278] │ <europe>
## [283] │ <evidence>
## [285] │ <example>
## [287] │ <excuse>
## [288] │ <exercise>
## [291] │ <expense>
## [292] │ <experience>
## [296] │ <eye>
## [386] │ <health>
## [394] │ <high>
## [450] │ <knock>
## ... and 16 more
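Note that this pattern requires at least two characters, so a one-letter word such as "a" is not matched. Whether such a word should count is a matter of interpretation. If it should, one possible variant makes the rest of the match optional. This is a sketch on a hypothetical vector (candidates), assuming the stringr package is installed:

```r
library(stringr)

# Hypothetical test words, including a one-letter word
candidates <- c("a", "dad", "else", "ab")

# Pattern used above: the backreference needs a second character to match
strict <- str_detect(candidates, "(^.).*\\1$")

# Variant: the optional group lets a one-character word match on its own
relaxed <- str_detect(candidates, "^(.)(.*\\1)?$")

# Compare the two patterns side by side
data.frame(candidates, strict, relaxed)
```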
# Contain a repeated pair of letters
str_view(words,"(..).*\\1")
## [48] │ ap<propr>iate
## [152] │ <church>
## [181] │ c<ondition>
## [217] │ <decide>
## [275] │ <environmen>t
## [487] │ l<ondon>
## [598] │ pa<ragra>ph
## [603] │ p<articular>
## [617] │ <photograph>
## [638] │ p<repare>
## [641] │ p<ressure>
## [696] │ r<emem>ber
## [698] │ <repre>sent
## [699] │ <require>
## [739] │ <sense>
## [858] │ the<refore>
## [903] │ u<nderstand>
## [946] │ w<hethe>r
# Contain one letter repeated in at least three places
str_view(words,".*(.).*\\1.*\\1")
## [48] │ <approp>riate
## [62] │ <availa>ble
## [86] │ <believe>
## [90] │ <betwee>n
## [119] │ <business>
## [221] │ <degree>
## [229] │ <difference>
## [233] │ <discuss>
## [265] │ <eleve>n
## [275] │ <environmen>t
## [283] │ <evidence>
## [288] │ <exercise>
## [291] │ <expense>
## [292] │ <experience>
## [423] │ <indivi>dual
## [598] │ <paragra>ph
## [684] │ <receive>
## [696] │ <remembe>r
## [698] │ <represe>nt
## [845] │ <telephone>
## ... and 2 more