Using the 173 majors listed in fivethirtyeight.com’s [College Majors dataset](https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/), provide code that identifies the majors that contain either “DATA” or “STATISTICS”.
library(dplyr)
library(stringr)
URL <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"
majors <- read.csv(URL)
# Keep only the rows whose Major contains "DATA" or "STATISTICS"
majors %>% filter(str_detect(Major, "(DATA)|(STATISTICS)"))
##   FOD1P                                         Major          Major_Category
## 1  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 2  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics
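As a cross-check that needs no packages or network access, the same alternation pattern can be tried with base R's `grepl()`. This is only a sketch: `majors_demo` is a hypothetical three-row stand-in for the real `majors` data frame, not the downloaded file.

```r
# Hypothetical stand-in for the majors data frame (three illustrative rows)
majors_demo <- data.frame(
  Major = c("STATISTICS AND DECISION SCIENCE",
            "COMPUTER PROGRAMMING AND DATA PROCESSING",
            "GENERAL AGRICULTURE"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE where the pattern matches; "|" means "either side"
hits <- majors_demo[grepl("DATA|STATISTICS", majors_demo$Major), , drop = FALSE]
hits$Major
```

Only the two rows containing "DATA" or "STATISTICS" are kept; the row without either word is dropped.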
Write code that transforms the data below:
[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”
Into a format like this:
c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)
fruits <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'
# Extract each double-quoted name (one or two lowercase words), quotes included
fruits <- str_extract_all(fruits, '"[a-z]*\\s*[a-z]*"')
fruits <- unlist(fruits)
fruits
## [1] "\"bell pepper\"" "\"bilberry\"" "\"blackberry\"" "\"blood orange\""
## [5] "\"blueberry\"" "\"cantaloupe\"" "\"chili pepper\"" "\"cloudberry\""
## [9] "\"elderberry\"" "\"lime\"" "\"lychee\"" "\"mulberry\""
## [13] "\"olive\"" "\"salal berry\""
writeLines(fruits)
## "bell pepper"
## "bilberry"
## "blackberry"
## "blood orange"
## "blueberry"
## "cantaloupe"
## "chili pepper"
## "cloudberry"
## "elderberry"
## "lime"
## "lychee"
## "mulberry"
## "olive"
## "salal berry"
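The output above lists the quoted names one per line. To reach the single `c(...)` string asked for in the prompt, the quoted tokens still need to be joined with commas. The following is one possible sketch in base R; the names `raw`, `quoted`, and `as_call` are illustrative and not part of the original solution.

```r
raw <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'

# Pull out every double-quoted token, keeping the surrounding quotes
quoted <- regmatches(raw, gregexpr('"[^"]*"', raw))[[1]]

# Join the tokens with ", " and wrap the result in c( ... )
as_call <- paste0("c(", paste(quoted, collapse = ", "), ")")
writeLines(as_call)
```

Because the result is valid R source, `eval(parse(text = as_call))` would turn it back into a 14-element character vector.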
The two exercises below are taken from section 14.3.5.1 of the online version of R for Data Science:
Describe, in words, what these expressions will match:
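3.a "(.)\\1\\1": aaa. The same character three times in a row.
3.b "(.)(.)\\2\\1": abba. There are two characters followed by the second character and then the first character.
3.c "(..)\\1": abab
3.d "(.).\\1.\\1": abaca
3.e "(.)(.)(.).*\\3\\2\\1": abcdefcba or abccba
teststring <- c("aaah", "noon", "mama", "rarer", "racecar")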
# 3.a
str_view(teststring, "(.)\\1\\1")
# 3.b
str_view(teststring, "(.)(.)\\2\\1")
# 3.c
str_view(teststring, "(..)\\1")
# 3.d
str_view(teststring, "(.).\\1.\\1")
# 3.e
str_view(teststring, "(.)(.)(.).*\\3\\2\\1")
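`str_view()` highlights matches interactively. A logical table can summarize the same information in one place. The sketch below uses base R's `grepl()` with `perl = TRUE` rather than stringr, and the names `patterns` and `match_table` are illustrative.

```r
teststring <- c("aaah", "noon", "mama", "rarer", "racecar")

patterns <- c(
  "3.a" = "(.)\\1\\1",
  "3.b" = "(.)(.)\\2\\1",
  "3.c" = "(..)\\1",
  "3.d" = "(.).\\1.\\1",
  "3.e" = "(.)(.)(.).*\\3\\2\\1"
)

# One column per pattern, one row per test string; TRUE marks a match
match_table <- sapply(patterns, function(p) grepl(p, teststring, perl = TRUE))
rownames(match_table) <- teststring
match_table
```

With these five test strings, each word matches exactly one pattern, so the TRUE values fall on the diagonal of the table.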
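Construct regular expressions to match words that:
4.a Start and end with the same character: "^(.).*\\1$"
4.b Contain a repeated pair of letters: "(..).*\\1", or "([a-z][a-z]).*\\1" to specify that the characters are letters
4.c Contain one letter repeated in at least three places: "(.).*\\1.*\\1", or "([a-z]).*\\1.*\\1" to specify that the characters are letters
teststring2 <- c("Mississippi", "noon", "mama", "rarer", "racecar")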
# 4.a
str_view(teststring2, "^(.).*\\1$")
# 4.b
str_view(teststring2, "(..).*\\1")
# 4.c
str_view(teststring2, "(.).*\\1.*\\1")
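To see at a glance which test words each expression catches, the patterns can be applied in a loop. This is a sketch using base R's `grepl()` with `perl = TRUE` instead of `str_view()`; `patterns2` and `matched_words` are illustrative names.

```r
teststring2 <- c("Mississippi", "noon", "mama", "rarer", "racecar")

patterns2 <- c(
  "4.a" = "^(.).*\\1$",
  "4.b" = "(..).*\\1",
  "4.c" = "(.).*\\1.*\\1"
)

# For each pattern, collect the test words it matches
matched_words <- lapply(patterns2, function(p) {
  teststring2[grepl(p, teststring2, perl = TRUE)]
})
matched_words
```

For this test vector, 4.a picks out "noon", "rarer", and "racecar"; 4.b picks out "Mississippi" and "mama"; and 4.c picks out "Mississippi" and "rarer".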