library(tidyverse)
Using the fivethirtyeight College Majors dataset from the following article [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], how many times does the word DATA or STATISTICS appear in the major names?
college_majors = read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv')
sum(str_detect(college_majors$Major, "DATA"))
## [1] 1
sum(str_detect(college_majors$Major, "STATISTICS"))
## [1] 2
Now we know that one major contains the word “DATA” and two majors contain the word “STATISTICS”. Next, we can find out which majors these are.
college_majors %>% select("Major") %>%
filter(stringr::str_detect(Major, 'DATA|STATISTICS') )
## Major
## 1 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS
## 2 COMPUTER PROGRAMMING AND DATA PROCESSING
## 3 STATISTICS AND DECISION SCIENCE
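The counts above rely on the majors being stored in upper case, since str_detect() is case-sensitive by default. A minimal sketch on a small made-up vector (the `majors_demo` values are hypothetical, not taken from the dataset) shows how summing the logical vector counts matches, and how `regex(..., ignore_case = TRUE)` would relax the case requirement:

```r
library(stringr)

# Hypothetical sample values, for illustration only
majors_demo <- c("Data Science", "STATISTICS", "BIOLOGY", "applied statistics")

# str_detect() returns one TRUE/FALSE per element; sum() counts the TRUEs
exact_hits <- str_detect(majors_demo, "DATA|STATISTICS")
sum(exact_hits)

# Case-insensitive version of the same alternation
loose_hits <- str_detect(majors_demo, regex("data|statistics", ignore_case = TRUE))
majors_demo[loose_hits]
```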
Next, we transform the data below:
[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”
Into a format like this:
c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)
fruits_original <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'
(fruits_original)
## [1] "[1] \"bell pepper\" \"bilberry\" \"blackberry\" \"blood orange\"\n\n[5] \"blueberry\" \"cantaloupe\" \"chili pepper\" \"cloudberry\" \n\n[9] \"elderberry\" \"lime\" \"lychee\" \"mulberry\" \n\n[13] \"olive\" \"salal berry\""
fruits_new <- str_extract_all(fruits_original,pattern = '[A-Za-z]+.?[A-Za-z]+')
(fruits_new)
## [[1]]
## [1] "bell pepper" "bilberry" "blackberry" "blood orange" "blueberry"
## [6] "cantaloupe" "chili pepper" "cloudberry" "elderberry" "lime"
## [11] "lychee" "mulberry" "olive" "salal berry"
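# Quote each fruit, join with commas, and wrap in c( ) explicitly instead of relying on list-to-string coercion
fruits_final <- str_c('c(', str_c('"', fruits_new[[1]], '"', collapse = ", "), ')')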
(fruits_final)
## [1] "c(\"bell pepper\", \"bilberry\", \"blackberry\", \"blood orange\", \"blueberry\", \"cantaloupe\", \"chili pepper\", \"cloudberry\", \"elderberry\", \"lime\", \"lychee\", \"mulberry\", \"olive\", \"salal berry\")"
writeLines(fruits_final)
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
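As a cross-check, a similar result can be reached by extracting whatever sits between double quotes instead of matching runs of letters, which would also cope with names containing hyphens or digits. This is a sketch under that assumption, not the approach used above; `fruits_raw` is a shortened, hypothetical copy of the input, and the round trip through parse() and eval() is only there to confirm that the output is valid R.

```r
library(stringr)

# Shortened, hypothetical copy of the printed vector
fruits_raw <- '[1] "bell pepper" "bilberry"
[3] "lime" "salal berry"'

# Match each quoted item, including its surrounding quotes
quoted <- str_extract_all(fruits_raw, '"[^"]+"')[[1]]

# Join the quoted items and wrap them in c( )
fruits_code <- str_c("c(", str_c(quoted, collapse = ", "), ")")
writeLines(fruits_code)

# Round trip: evaluating the text should give back a character vector
fruits_vec <- eval(parse(text = fruits_code))
```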
Next, we describe in words what each of the following expressions will match.
No. 1
(.)\1\1
I was not sure whether we were supposed to catch something different because the quotation marks are missing from No. 1 and No. 3, but I wrote the expression as an R string with escaped backslashes and proceeded. In that form, it matches any character repeated three times consecutively in a string.
test1 <- c("Abbb","1234","aaah", "2333")
str_view(test1,"(.)\\1\\1")
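The quoting matters because R processes backslash escapes in a string literal before the regex engine sees the pattern. A small sketch, using base R's nchar() and grepl() (chosen here only for illustration), shows that the unescaped literal contains control characters rather than backreferences:

```r
# "\1" inside an R string literal is an octal escape for the character with code 1
unescaped <- "(.)\1\1"
escaped   <- "(.)\\1\\1"

nchar(unescaped)    # "(", ".", ")" plus two control characters
nchar(escaped)      # "(", ".", ")" plus two "\1" backreferences
writeLines(escaped) # the pattern as the regex engine receives it

# Only the escaped form behaves as a backreference pattern
grepl(unescaped, "aaah", perl = TRUE)
grepl(escaped, "aaah", perl = TRUE)
```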
No. 2
“(.)(.)\2\1”
This will match two consecutive characters followed immediately by the same two characters in reverse order, anywhere within a string.
test2 <- c("noon","boop","follow", "2332")
str_view(test2,"(.)(.)\\2\\1")
No. 3
(..)\1
This again was formatted as an R string, in which case it will match any pair of characters that is immediately repeated within a string.
test3 <- c("4242","bidi","haha", "onno")
str_view(test3,"(..)\\1")
No. 4
“(.).\1.\1”
This will match a five-character sequence in which the first character is repeated as the third and fifth characters, anywhere within the string.
test4 <- c("gooog","pepep","12121", "onnno")
str_view(test4,"(.).\\1.\\1")
No. 5
"(.)(.)(.).*\3\2\1"
This will match three consecutive characters that reappear in reverse order later in the string, with any number of characters (including none) between the two sets.
test5 <- c("abceeeecba","jajaj","haha", "357hello753")
str_view(test5,"(.)(.)(.).*\\3\\2\\1")
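Because str_view() shows its result interactively, a sketch like the following can confirm the five descriptions programmatically. It reuses the patterns and test strings from above; the `patterns`, `inputs`, and `matched` names are introduced here only for illustration.

```r
library(stringr)

# Patterns and test strings copied from the five exercises above
patterns <- c(no1 = "(.)\\1\\1", no2 = "(.)(.)\\2\\1", no3 = "(..)\\1",
              no4 = "(.).\\1.\\1", no5 = "(.)(.)(.).*\\3\\2\\1")
inputs <- list(
  no1 = c("Abbb", "1234", "aaah", "2333"),
  no2 = c("noon", "boop", "follow", "2332"),
  no3 = c("4242", "bidi", "haha", "onno"),
  no4 = c("gooog", "pepep", "12121", "onnno"),
  no5 = c("abceeeecba", "jajaj", "haha", "357hello753")
)

# For each exercise, keep only the test strings that the pattern matches
matched <- lapply(names(patterns), function(id) {
  inputs[[id]][str_detect(inputs[[id]], patterns[[id]])]
})
names(matched) <- names(patterns)
matched
```

In this sketch, strings such as "boop", "bidi", "gooog", and "jajaj" are filtered out, since each comes close to its pattern without satisfying it.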
Finally, we construct regular expressions to match words that:
No. 1
Start and end with the same character
sample1 <- c("fluff","blob","haha", "tart")
str_view(sample1,"^([A-Za-z]).*\\1$")
No. 2
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)
sample2 <- c("church","blob","haha", "tart")
str_view(sample2,"([A-Za-z][A-Za-z]).*\\1")
No. 3
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)
sample3 <- c("eleven","elderberry","haha", "tart")
# Use .* rather than .+ so that adjacent repeats (e.g. "aaa") also count
str_view(sample3,"(.).*\\1.*\\1")
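One subtlety with "at least three places": whether the repeats may sit next to each other depends on the quantifier between the backreferences. The sketch below compares `.+` (at least one character in between) with `.*` (zero or more) on a few made-up strings; the `words_demo` vector is hypothetical.

```r
library(stringr)

# Hypothetical test words: "shhh" has three adjacent repeats
words_demo <- c("eleven", "shhh", "tart")

# Requires at least one character between each repeat
gap_required <- str_detect(words_demo, "(.).+\\1.+\\1")
gap_required

# Allows the repeats to be adjacent as well
gap_optional <- str_detect(words_demo, "(.).*\\1.*\\1")
gap_optional
```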