Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”.
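library(tidyverse)  # loads readr (read_csv), dplyr (filter, %>%) and stringr (str_detect)
college_majors = read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv',
                          show_col_types = FALSE)
# Keep majors whose name contains "DATA" or "STATISTICS" (a single regex with alternation)
college_majors %>% filter(str_detect(Major, "DATA|STATISTICS"))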
## # A tibble: 3 × 3
##   FOD1P Major                                         Major_Category
##   <chr> <chr>                                         <chr>
## 1 6212  MANAGEMENT INFORMATION SYSTEMS AND STATISTICS Business
## 2 2101  COMPUTER PROGRAMMING AND DATA PROCESSING      Computers & Mathematics
## 3 3702  STATISTICS AND DECISION SCIENCE               Computers & Mathematics
Write code that transforms the data below:
 [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
 [5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
 [9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"
Into a format like this:
c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
## [1] "[1] \"bell pepper\" \"bilberry\" \"blackberry\" \"blood orange\"\n[5] \"blueberry\" \"cantaloupe\" \"chili pepper\" \"cloudberry\" \n[9] \"elderberry\" \"lime\" \"lychee\" \"mulberry\" \n[13] \"olive\" \"salal berry\""
## [1] "c(\"bell pepper\", \"bilberry\", \"blackberry\", \"blood orange\", \"blueberry\", \"cantaloupe\", \"chili pepper\", \"cloudberry\", \"elderberry\", \"lime\", \"lychee\", \"mulberry\", \"olive\", \"salal berry\")"
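The code that produced the two outputs above is not shown. The sketch below is one way to get from the first string to the second. It assumes the printed vector has been pasted into a single string called `raw`; `raw`, `fruits` and `fruit_call` are hypothetical names used only for illustration. The approach pulls out every double-quoted item, then re-quotes the items and joins them inside `c( )`.

```r
library(stringr)

# Hypothetical input: the printed vector pasted in as one raw string
raw <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'

# Capture the text between each pair of double quotes. This keeps multi-word
# names such as "bell pepper" intact and ignores the [1], [5], ... indices.
fruits <- str_match_all(raw, '"([^"]+)"')[[1]][, 2]

# Re-quote each item, join with ", " and wrap the result in c( ... )
fruit_call <- str_c('c(', str_c('"', fruits, '"', collapse = ", "), ')')
fruit_call
```

Because `fruit_call` is valid R source, `eval(parse(text = fruit_call))` gives back the same 14-element vector.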
Describe, in words, what these expressions will match:
This matches the same character three times in a row, e.g. in the string “422-3777” it would match 777.
This matches a character, followed by a second character repeated twice, followed by the first character again (a four-character palindrome), e.g. in the string “gamma” it would match amma.
This matches any two characters immediately followed by the same two characters again, e.g. in the string “cucumber” it would match cucu.
This matches a character, followed by any character, followed by the first character, followed by any character, followed by the first character again, e.g. in the string “707372456” it would match 70737.
This matches any three characters, followed by zero or more characters, followed by the third, second and first characters in that order (the first three reversed), e.g. in the string “4563abc45643cba1231” it would match abc45643cba.
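The expressions themselves are not reproduced above. The sketch below therefore assumes the pattern each description implies and checks the worked examples with stringr's `str_extract()`, which returns the leftmost match in each string. Treat the pattern list as a reconstruction, not necessarily the exercise's exact expressions. The backslashes are doubled because the patterns are written as R strings.

```r
library(stringr)

# ASSUMPTION: patterns reconstructed from the five descriptions above
patterns <- c(
  "(.)\\1\\1",             # same character three times in a row
  "(.)(.)\\2\\1",          # two characters, then the same two reversed
  "(..)\\1",               # a pair of characters repeated immediately
  "(.).\\1.\\1",           # char, any char, char, any char, char
  "(.)(.)(.).*\\3\\2\\1"   # three characters, anything, the three reversed
)
examples <- c("422-3777", "gamma", "cucumber", "707372456", "4563abc45643cba1231")

# str_extract() is vectorised over both arguments, so example i is tested
# against pattern i
matches <- str_extract(examples, patterns)
matches
# expected values: "777", "amma", "cucu", "70737", "abc45643cba"
```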
Construct regular expressions to match words that:
**^(.).*\1$**
**(..).*\1**
**(.).\1.\1**
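The three patterns above can be tried on stringr's built-in `words` vector. In the sketch below the patterns are written as R strings, with each backslash doubled. The variable names and the spot-check words ("area", "church", "eleven", "excellence") are illustrative choices, not part of the original exercise.

```r
library(stringr)

# The three patterns above, as R strings
starts_ends_same <- "^(.).*\\1$"   # first character reappears as the last one
repeated_pair    <- "(..).*\\1"    # some two-character sequence appears twice
three_spaced     <- "(.).\\1.\\1"  # one character at three positions, one character apart

# Try each pattern on stringr's `words` sample
head(str_subset(words, starts_ends_same))
head(str_subset(words, repeated_pair))
str_subset(words, three_spaced)

# Spot checks, pairwise: word i against pattern i
checks <- str_detect(c("area", "church", "eleven"),
                     c(starts_ends_same, repeated_pair, three_spaced))
checks

# three_spaced needs exactly one character between the repeats, so
# "excellence" (its e's are separated by two characters) does not match,
# while a version allowing any gap with ".*" does
str_detect("excellence", c(three_spaced, "(.).*\\1.*\\1"))
```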