Week 3 Assignment

1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset, provide code that identifies the majors that contain either “DATA” or “STATISTICS”

library(stringr)

# Raw CSV of majors from the fivethirtyeight data repository
url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"

# read.csv() already returns a data frame
data <- read.csv(url)

data$Major[which(str_detect(data$Major, "DATA"))]

## [1] COMPUTER PROGRAMMING AND DATA PROCESSING
## 174 Levels: ACCOUNTING ACTUARIAL SCIENCE ... ZOOLOGY

data$Major[which(str_detect(data$Major, "STATISTICS"))]

## [1] MANAGEMENT INFORMATION SYSTEMS AND STATISTICS
## [2] STATISTICS AND DECISION SCIENCE              
## 174 Levels: ACCOUNTING ACTUARIAL SCIENCE ... ZOOLOGY
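Because the question asks for majors containing either word, the two lookups can also be combined into one pattern with the alternation operator `|`. The following is a minimal sketch in base R rather than stringr; the `majors` vector is a small hypothetical sample, not the full dataset.

```r
# Hypothetical sample of major names (assumption: stands in for data$Major)
majors <- c(
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "ZOOLOGY",
  "STATISTICS AND DECISION SCIENCE",
  "ACCOUNTING"
)

# "DATA|STATISTICS" matches any string containing either word
hits <- majors[grepl("DATA|STATISTICS", majors)]
hits
```

With the real data, the same pattern could be passed to `str_detect()` in place of the two separate calls.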

2. Write code that transforms the data below:

[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”

Into a format like this:

c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)

x <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"  
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"  
[9] "elderberry"   "lime"         "lychee"       "mulberry"    
[13] "olive"        "salal berry"'

# Pull out each quoted item, then strip the surrounding quotation marks
y <- str_remove_all(unlist(str_extract_all(x, '"[a-z]*\\s*[a-z]*"')), '"')
y

##  [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange" "blueberry"   
##  [6] "cantaloupe"   "chili pepper" "cloudberry"   "elderberry"   "lime"        
## [11] "lychee"       "mulberry"     "olive"        "salal berry"
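If the target format is read literally, as a single string that looks like a `c(...)` call, the character vector can be pasted back together. This is a sketch under that assumption, using base R and a shortened hypothetical vector `fruits` in place of `y`.

```r
# Hypothetical short vector (assumption: stands in for the full result y)
fruits <- c("bell pepper", "bilberry", "lime")

# Wrap each item in double quotes
quoted <- paste0('"', fruits, '"')

# Join the items with ", " and wrap the result in c( ... )
as_call <- paste0("c(", paste(quoted, collapse = ", "), ")")
cat(as_call, "\n")
```

Printing with `cat()` shows the string without R's escaping of the inner quotation marks.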

3. Describe, in words, what these expressions will match:

(.)\1\1
- The same character three times in a row, e.g. aaa.
"(.)(.)\2\1"
- Any two characters followed by the same two characters in reverse order, e.g. abba, all inside quotes.
(..)\1
- Any pair of characters immediately repeated, e.g. abab.
"(.).\1.\1"
- A character, then any character, then the first character again, then any character, then the first character a third time, e.g. abaca, all inside quotes.
"(.)(.)(.).*\3\2\1"
- Any three characters, followed by zero or more characters of any kind, followed by the same three characters in reverse order, e.g. abcxyzcba, all inside quotes.
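The backreference behaviour can be checked by running each pattern against a few test strings. The sketch below makes two assumptions: it leaves out the quotation marks and tests only the backreference part of each pattern, and the sample strings and pattern names are hypothetical.

```r
# Hypothetical test strings, chosen so each pattern matches exactly one of them
samples <- c("aaa", "abba", "abab", "abaca", "abcxyzcba", "abcd")

# Backslashes are doubled because each pattern is written as an R string
patterns <- c(
  tripled       = "(.)\\1\\1",
  mirrored_pair = "(.)(.)\\2\\1",
  repeated_pair = "(..)\\1",
  alternating   = "(.).\\1.\\1",
  mirrored_trio = "(.)(.)(.).*\\3\\2\\1"
)

# For each pattern, keep the sample strings that it matches
matches <- lapply(patterns, function(p) samples[grepl(p, samples, perl = TRUE)])
matches
```

Each pattern picks out one string here: aaa, abba, abab, abaca and abcxyzcba respectively, while abcd matches none.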

4. Construct regular expressions to match words that:

Start and end with the same character
- \b([a-z])([a-z]*\1)?\b
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)
- ([a-z]{2})[a-z]*\1
Contain one letter repeated in at least three places (e.g. “eleven” contains three e’s.)
- ([a-z])[a-z]*\1[a-z]*\1
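One way to check candidate patterns for these three cases is to run them against a few sample words. The patterns below are one possible formulation, written as R strings with doubled backslashes. The word list and variable names are illustrative assumptions, and lowercase single-word input is assumed.

```r
# Hypothetical sample words (assumption: lowercase, one word per element)
words <- c("church", "eleven", "area", "banana", "stats", "data")

# First letter captured, then optionally any letters ending in that same letter
same_ends   <- "\\b([a-z])([a-z]*\\1)?\\b"
# A two-letter pair that appears again later in the word
repeat_pair <- "([a-z]{2})[a-z]*\\1"
# One letter that appears at least three times, not necessarily adjacent
three_times <- "([a-z])[a-z]*\\1[a-z]*\\1"

ends_hit  <- words[grepl(same_ends, words, perl = TRUE)]    # "area" "stats"
pair_hit  <- words[grepl(repeat_pair, words, perl = TRUE)]  # "church" "banana"
three_hit <- words[grepl(three_times, words, perl = TRUE)]  # "eleven" "banana"
```

The gap `[a-z]*` between the capture group and each backreference is what lets the repeats sit anywhere in the word instead of only next to each other.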

Sam Reeves

2/17/2021
