Question 1

We are examining the list of college majors from the FiveThirtyEight article “The Economic Guide To Picking A College Major.”

Below we import the data from the FiveThirtyEight GitHub repository:

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)

fileURL <- 'https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv'

# read.csv() accepts a URL directly, so no explicit url() connection is needed
majorsDF <- read.csv(fileURL)

We want to find the majors whose names contain either “DATA” or “STATISTICS”:

majorsOfInterest <- grep("DATA|STATISTICS", majorsDF$Major, value = TRUE, ignore.case = FALSE)

print(majorsOfInterest)
## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"     
## [3] "STATISTICS AND DECISION SCIENCE"
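
The same pattern can also be used to keep whole rows rather than just the matching names. The sketch below is only an illustration: `majorsSample` is a small hypothetical stand-in for `majorsDF` (so it runs without a download), and it assumes a `Major` column as in the data above. `grepl()` returns a logical vector, which is then used to subset the data frame.

```r
# Hypothetical stand-in for majorsDF; only the Major column is assumed
majorsSample <- data.frame(
  Major = c("MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
            "COMPUTER PROGRAMMING AND DATA PROCESSING",
            "STATISTICS AND DECISION SCIENCE",
            "ZOOLOGY"),
  stringsAsFactors = FALSE
)

# grepl() gives TRUE/FALSE per row; "|" means "DATA" or "STATISTICS"
isMatch <- grepl("DATA|STATISTICS", majorsSample$Major)

# Keep the matching rows; drop = FALSE keeps the result a data frame
matchedRows <- majorsSample[isMatch, , drop = FALSE]
print(matchedRows)
```

Subsetting rows this way keeps any other columns in the data frame alongside the major name, which `grep(..., value = TRUE)` does not.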

Question 2

We want to collapse the vector below into a single comma-separated string:

foodVector <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")

str_flatten_comma(foodVector)
## [1] "bell pepper, bilberry, blackberry, blood orange, blueberry, cantaloupe, chili pepper, cloudberry, elderberry, lime, lychee, mulberry, olive, salal berry"
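
For comparison, a base R sketch of the same idea, assuming a plain ", " separator is all that is needed. `foodSample` is a shortened, hypothetical copy of the vector above, used only to keep the example small.

```r
# Shortened, hypothetical copy of foodVector
foodSample <- c("bell pepper", "bilberry", "blackberry")

# collapse = ", " joins all elements into one string
flattened <- paste(foodSample, collapse = ", ")
print(flattened)
## [1] "bell pepper, bilberry, blackberry"
```

`paste()` with `collapse` needs no extra packages, while `str_flatten_comma()` reads more clearly when stringr is already loaded.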

Question 3

Describe, in words, what these expressions will match:

Question 4

Construct regular expressions to match words that: