1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset, provide code that identifies the majors that contain either “DATA” or “STATISTICS”.

library(magrittr)
library(stringr)
url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"

# read.csv() returns a data frame directly
data <- read.csv(url)

data$Major[str_detect(data$Major, "DATA")]
## [1] COMPUTER PROGRAMMING AND DATA PROCESSING
## 174 Levels: ACCOUNTING ACTUARIAL SCIENCE ... ZOOLOGY
data$Major[str_detect(data$Major, "STATISTICS")]
## [1] MANAGEMENT INFORMATION SYSTEMS AND STATISTICS
## [2] STATISTICS AND DECISION SCIENCE              
## 174 Levels: ACCOUNTING ACTUARIAL SCIENCE ... ZOOLOGY
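Because the question asks for majors containing either word, the two searches can also be collapsed into a single pass with regex alternation (`|`). The sketch below is a minimal illustration under two assumptions: it uses base R's `grepl()` so it runs without extra packages, and it works on a small hand-copied subset of majors (names taken from the output above) instead of downloading the CSV. On the full dataset, `str_detect(data$Major, "DATA|STATISTICS")` should produce the same logical index as `grepl()`.

```r
# Hand-copied subset of majors (assumption: stands in for data$Major from majors-list.csv)
majors <- c("ACCOUNTING",
            "ACTUARIAL SCIENCE",
            "COMPUTER PROGRAMMING AND DATA PROCESSING",
            "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
            "STATISTICS AND DECISION SCIENCE",
            "ZOOLOGY")

# "DATA|STATISTICS" is TRUE for any major containing either substring
hits <- majors[grepl("DATA|STATISTICS", majors)]
print(hits)
```

Note that this is plain substring matching, so a major containing a longer word such as "DATABASE" would also be returned; wrapping the terms in word boundaries (`\\b`) would restrict it to whole words.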

2. Write code that transforms the data below:

[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"

Into a format like this:

c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")

x <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"  
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"  
[9] "elderberry"   "lime"         "lychee"       "mulberry"    
[13] "olive"        "salal berry"'

y <- str_remove_all(unlist(str_extract_all(x, '"[a-z]*\\s*[a-z]*"')), '\"')
y
##  [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange" "blueberry"   
##  [6] "cantaloupe"   "chili pepper" "cloudberry"   "elderberry"   "lime"        
## [11] "lychee"       "mulberry"     "olive"        "salal berry"
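The pattern `"[a-z]*\\s*[a-z]*"` allows at most two lowercase words per item, so an entry with three words, a hyphen, or a capital letter would be missed. A slightly more general sketch, assuming the same input string, matches any run of non-quote characters between double quotes and then uses `dput()` to print the result as R source, which gives the `c("bell pepper", "bilberry", ...)` form the question asks for. It sticks to base R (`regmatches()` with `gregexpr()`); the variable names `quoted` and `fruits` are illustrative.

```r
# The printed vector from the question, stored as a single string
x <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Extract every double-quoted run; [^"]+ accepts any number of words per item
quoted <- regmatches(x, gregexpr('"[^"]+"', x))[[1]]

# Strip the surrounding quote characters
fruits <- gsub('"', "", quoted, fixed = TRUE)

# dput() prints the vector as R source code in c("...", "...") form
dput(fruits)
```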

3. Describe, in words, what these expressions will match:

4. Construct regular expressions to match words that: