Problem 1

Using the 173 majors listed in fivethirtyeight.com’s [College Majors dataset](https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/), provide code that identifies the majors that contain either “DATA” or “STATISTICS”.

# dplyr provides filter() and %>%; stringr provides str_detect()
library(dplyr)
library(stringr)

URL <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"
majors <- read.csv(URL)

majors %>% filter(str_detect(Major, "(DATA)|(STATISTICS)"))
##   FOD1P                                         Major          Major_Category
## 1  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 2  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics
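The same alternation can be checked without downloading the CSV. The sketch below runs it on a small hand-written vector of major names (a hypothetical sample, not the full 173-row list) with `str_subset()` and with base R's `grepl()`. It also shows `regex(..., ignore_case = TRUE)`, which would matter only if the source text were not consistently upper case.

```r
library(stringr)

# Hypothetical sample of major names, not the full majors-list.csv
sample_majors <- c(
  "ACCOUNTING",
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "ECONOMICS",
  "STATISTICS AND DECISION SCIENCE"
)

# str_subset() keeps the elements that match the pattern
hits <- str_subset(sample_majors, "DATA|STATISTICS")

# Base-R equivalent: logical index from grepl()
hits_base <- sample_majors[grepl("DATA|STATISTICS", sample_majors)]

# Case-insensitive variant, in case the names were not all upper case
hits_ci <- str_subset(sample_majors, regex("data|statistics", ignore_case = TRUE))

print(hits)
```

The grouping parentheses in `"(DATA)|(STATISTICS)"` are not needed for a plain alternation; `"DATA|STATISTICS"` matches the same strings.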

Problem 2

Write code that transforms the data below:

[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"

Into a format like this:

c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")

fruits <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Extract each quoted fruit name (the surrounding double quotes are kept)
fruits <- str_extract_all(fruits, '"[a-z]*\\s*[a-z]*"')

fruits <- unlist(fruits)

fruits
##  [1] "\"bell pepper\""  "\"bilberry\""     "\"blackberry\""   "\"blood orange\""
##  [5] "\"blueberry\""    "\"cantaloupe\""   "\"chili pepper\"" "\"cloudberry\""  
##  [9] "\"elderberry\""   "\"lime\""         "\"lychee\""       "\"mulberry\""    
## [13] "\"olive\""        "\"salal berry\""
writeLines(fruits)
## "bell pepper"
## "bilberry"
## "blackberry"
## "blood orange"
## "blueberry"
## "cantaloupe"
## "chili pepper"
## "cloudberry"
## "elderberry"
## "lime"
## "lychee"
## "mulberry"
## "olive"
## "salal berry"
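The vector above still carries the double quotes inside each element (hence `"\"bell pepper\""`). A minimal sketch of one way to get plain strings is to capture only the text between the quotes with `str_match_all()`, whose second matrix column holds capture group 1. `dput()` then prints the result as a `c("bell pepper", ...)` expression. The variable names `printed` and `fruits_clean` are illustrative.

```r
library(stringr)

# The printed vector, as one string
printed <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Column 1 of the match matrix is the full match (with quotes),
# column 2 is the capture group (the fruit name without quotes)
fruits_clean <- str_match_all(printed, '"([a-z ]+)"')[[1]][, 2]

# Print the vector as an R expression of the form c("bell pepper", ...)
dput(fruits_clean)
```

Applied to the `fruits` vector from the answer above, `str_remove_all(fruits, '"')` would strip the embedded quotes in the same way.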

The two exercises below are taken from R for Data Science, 14.3.5.1 in the on-line version:

Problem 3

Describe, in words, what these expressions will match:

teststring <- c("aaah", "noon", "mama", "rarer", "racecar")
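# 3.a: the same character three times in a row, e.g. "aaa" in "aaah"
str_view(teststring, "(.)\\1\\1")
# 3.b: two characters followed by the same two in reverse order,
#      a four-character palindrome such as "abba" or "noon"
str_view(teststring, "(.)(.)\\2\\1")
# 3.c: a pair of characters repeated immediately, such as "abab" or "mama"
str_view(teststring, "(..)\\1")
# 3.d: a character, any character, the first character again, any character,
#      and the first character a third time ("abaca"), e.g. "rarer"
str_view(teststring, "(.).\\1.\\1")
# 3.e: three characters, then zero or more of any character, then the same
#      three characters in reverse order, e.g. "racecar"
str_view(teststring, "(.)(.)(.).*\\3\\2\\1")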
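Because `str_view()` output is an interactive highlight that does not survive in plain text, a logical matrix from `str_detect()` is a compact way to confirm which test strings each pattern matches. This is a sketch; the names `patterns` and `hits` are illustrative. Each pattern matches exactly one of the five test strings, so the matches fall on the diagonal.

```r
library(stringr)

teststring <- c("aaah", "noon", "mama", "rarer", "racecar")

# The five patterns from 3.a to 3.e, named by sub-problem
patterns <- c(
  a = "(.)\\1\\1",
  b = "(.)(.)\\2\\1",
  c = "(..)\\1",
  d = "(.).\\1.\\1",
  e = "(.)(.)(.).*\\3\\2\\1"
)

# One column per pattern, one row per test string
hits <- sapply(patterns, function(p) str_detect(teststring, p))
rownames(hits) <- teststring
print(hits)
```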

Problem 4

Construct regular expressions to match words that:

teststring2 <- c("Mississippi", "noon", "mama", "rarer", "racecar")
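# 4.a: start and end with the same character
str_view(teststring2, "^(.).*\\1$")
# 4.b: contain a repeated pair of characters (e.g. "church" contains "ch" twice)
str_view(teststring2, "(..).*\\1")
# 4.c: contain one character repeated in at least three places
#      (e.g. "eleven" contains three "e"s)
str_view(teststring2, "(.).*\\1.*\\1")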
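The same `str_detect()` check works for the three constructed expressions. In this sketch (names illustrative), "Mississippi" fails 4.a because it starts with "M" and ends with "i", but it satisfies 4.b ("is" or "ss" repeats) and 4.c ("i" and "s" each occur four times).

```r
library(stringr)

teststring2 <- c("Mississippi", "noon", "mama", "rarer", "racecar")

patterns2 <- c(
  a = "^(.).*\\1$",      # same first and last character
  b = "(..).*\\1",       # a pair of characters that appears twice
  c = "(.).*\\1.*\\1"    # one character in at least three places
)

hits2 <- sapply(patterns2, function(p) str_detect(teststring2, p))
rownames(hits2) <- teststring2
print(hits2)
```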