1. 538 Majors

college <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv", header = TRUE, sep = ",")

#pull majors into a matrix
skills <- as.matrix(college[,2, drop=FALSE])

#Find data 
sapply("DATA", function(y) grep(y,skills))
## DATA 
##   52
#Find Statistics
sapply("STATISTICS", function(y) grep(y,skills))
##      STATISTICS
## [1,]         44
## [2,]         59
#Match majors to skills, using the row indices found above
majors <- skills[c(52, 44, 59)]
majors
## [1] "COMPUTER PROGRAMMING AND DATA PROCESSING"     
## [2] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [3] "STATISTICS AND DECISION SCIENCE"
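
The two searches and the manual index lookup can also be collapsed into a single step. A minimal sketch, assuming a small stand-in vector (`sample_majors`, a hypothetical name holding the three values from the output above plus one illustrative non-matching entry) in place of the downloaded file:

```r
# Hypothetical stand-in for the Major column: the three values printed above,
# plus one entry that should not match either keyword.
sample_majors <- c("COMPUTER PROGRAMMING AND DATA PROCESSING",
                   "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
                   "STATISTICS AND DECISION SCIENCE",
                   "GENERAL AGRICULTURE")

# value = TRUE returns the matching strings rather than their positions;
# the alternation "DATA|STATISTICS" covers both keywords in one call.
matched <- grep("DATA|STATISTICS", sample_majors, value = TRUE)
matched
```

This avoids copying row numbers by hand, which would silently break if the order of rows in the source file changed.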

2. Transform data

groceries <- c('[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
 [5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"  
 [9] "elderberry"   "lime"         "lychee"       "mulberry"    
[13] "olive"        "salal berry"')

groceries
## [1] "[1] \"bell pepper\"  \"bilberry\"     \"blackberry\"   \"blood orange\"\n [5] \"blueberry\"    \"cantaloupe\"   \"chili pepper\" \"cloudberry\"  \n [9] \"elderberry\"   \"lime\"         \"lychee\"       \"mulberry\"    \n[13] \"olive\"        \"salal berry\""
library(stringr)

# Pull out every double-quoted item, then strip the quotes.
# This also discards the [1], [5], ... index markers and the line breaks.
groceries <- str_extract_all(groceries, '"[^"]+"')[[1]]
groceries <- str_replace_all(groceries, '"', "")

# The result is a character vector with one element per item
length(groceries)
## [1] 14
groceries[1]
## [1] "bell pepper"
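
The quoted items can also be extracted with base R alone. A sketch, assuming a shortened two-line version of the input (`raw`, a hypothetical name) so the example stays self-contained:

```r
# Shortened, hypothetical version of the printed-vector string
raw <- '[1] "bell pepper"  "bilberry"\n[3] "blood orange" "lime"'

# gregexpr() locates every double-quoted run; regmatches() extracts them
quoted <- regmatches(raw, gregexpr('"[^"]+"', raw))[[1]]

# Drop the surrounding quotes to leave a plain character vector
items <- gsub('"', "", quoted, fixed = TRUE)
items
```

Because the match is anchored on the quotes, items that contain a space (such as "bell pepper") stay intact, which is difficult to guarantee when splitting on whitespace.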

Describe, in words, what these expressions will match:

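  1. In a string, this matches any single character repeated three times in a row (e.g. "aaa").
  2. This matches two characters followed by the same two in reverse order: a-b-b-a (e.g. "abba").
  3. Incorrect backreferencing.
  4. This matches one character in positions one, three, and five, with any character in between: a-b-a-b-a (the characters in between need not be equal).
  5. This matches three characters followed, later in the string, by the same three characters in reverse order.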
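
The expressions themselves are not reproduced in this section, so the patterns below are assumptions chosen to fit descriptions 1, 2, and 4. The sketch uses base R's `grepl()` with `perl = TRUE` rather than stringr, and the test strings are made up for illustration:

```r
# Assumed patterns (not taken from the source) that fit descriptions 1, 2 and 4
three_in_a_row <- "(.)\\1\\1"      # same character three times in a row
abba           <- "(.)(.)\\2\\1"   # two characters, then the same two reversed
every_other    <- "(.).\\1.\\1"    # same character in positions 1, 3 and 5

r1 <- grepl(three_in_a_row, c("aaa", "aab"), perl = TRUE)
r2 <- grepl(abba,           c("abba", "abab"), perl = TRUE)
r3 <- grepl(every_other,    c("ababa", "abcab"), perl = TRUE)
r1; r2; r3

# In an R string a single backslash starts an escape sequence, so "\1" is the
# control character with code 1, not a backreference.
code_point <- utf8ToInt("\1")
code_point
```

The last two lines suggest one way a backreference can go wrong in R: with a single backslash, the regex engine never sees `\1` at all.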

Construct regular expressions to match words that:

Start and end with the same character: "^(.).*\\1$"

eve <- "eve"

str_view(eve, "^(.).*\\1$")
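
The pattern "^(.).*\\1$" needs at least two characters, so a one-letter word such as "a" is not matched even though it trivially starts and ends with the same character. One possible variant makes the tail optional. This is a sketch using base R's `grepl()` with `perl = TRUE` (an assumption; the section itself uses `str_view()`), and the extra test words are illustrative:

```r
# First character captured; the optional group requires the word to end with
# that same character. If the group is absent, the word is a single character.
same_ends <- "^(.)(.*\\1)?$"

words <- c("eve", "a", "church", "level")
result <- grepl(same_ends, words, perl = TRUE)
result
```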

Contain a repeated pair of letters (e.g. "church" contains "ch" repeated twice): "(..).*\\1"

word1 <- "church"
word2 <- "shellshock"

str_view(word1, "(..).*\\1")
str_view(word2, "(..).*\\1")
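
The repeated-pair pattern can be checked against several words at once, and the captured group shows which pair was found. A sketch using base R with `perl = TRUE` (an assumption, in place of `str_view()`); "banana" and "apple" are extra illustrative words:

```r
# Two characters captured as a group, anything in between, then the same pair
pair_pattern <- "(..).*\\1"

words <- c("church", "shellshock", "banana", "apple")
has_pair <- grepl(pair_pattern, words, perl = TRUE)
has_pair

# regexec() also reports the captured group, i.e. the pair that repeats
m <- regmatches("church", regexec(pair_pattern, "church", perl = TRUE))[[1]]
repeated_pair <- m[2]
repeated_pair
```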

Contain one letter repeated in at least three places (e.g. "eleven" contains three "e"s): "(.).*\\1.*\\1"

word3 <- "eleven"
word4 <- "mississippi"

str_view(word3, "(.).*\\1.*\\1")
str_view(word4, "(.).*\\1.*\\1")
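
One way to gain confidence in this pattern is to compare it with a direct letter count. A sketch in base R, assuming `perl = TRUE` for the regex and adding "church" as an illustrative word that should not match:

```r
words <- c("eleven", "mississippi", "church")

# Regex check: one character, then the same character twice more, anywhere later
by_regex <- grepl("(.).*\\1.*\\1", words, perl = TRUE)

# Independent check: count each letter and see whether any count reaches three
by_count <- vapply(strsplit(words, ""),
                   function(chars) any(table(chars) >= 3),
                   logical(1))

agree <- identical(by_regex, by_count)
agree
```

The `.*` between the backreferences is what allows the repeats to be spread out; without it the three occurrences would have to sit at fixed offsets.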