library(tidyverse)

Exercise 1

Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”.

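# Read the majors list from the fivethirtyeight GitHub repository
col_majors_df <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv")

# The second column holds the major names; keep those containing DATA or STATISTICS
str_subset(col_majors_df[[2]], "DATA|STATISTICS")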
## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"     
## [3] "STATISTICS AND DECISION SCIENCE"
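
The same call can be tried without a network connection. The sketch below uses a small hand-typed vector (the name `majors` and its contents are illustrative, not the full dataset) to show how the `|` alternation behaves, and how matching could be made case-insensitive if the names were not all upper case.

```r
library(stringr)

# Hypothetical hand-typed sample of major names (not the full dataset)
majors <- c(
  "GENERAL AGRICULTURE",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "STATISTICS AND DECISION SCIENCE",
  "ACCOUNTING"
)

# "|" is alternation: keep elements containing DATA or STATISTICS
hits <- str_subset(majors, "DATA|STATISTICS")
print(hits)

# str_detect() gives the logical mask behind that subset
mask <- str_detect(majors, "DATA|STATISTICS")
print(mask)

# Matching is case-sensitive by default; regex(ignore_case = TRUE) relaxes that
mixed_case_hits <- str_subset(
  c("Data Science", "History"),
  regex("data|statistics", ignore_case = TRUE)
)
print(mixed_case_hits)
```

The logical mask from `str_detect()` is the more convenient form when the goal is to filter rows of a data frame rather than a bare vector.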

Exercise 2

Write code that transforms the data below:
[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"

Into a format like this:

c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")

fruit_str <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"  
[9] "elderberry"   "lime"         "lychee"       "mulberry"    
[13] "olive"        "salal berry"'

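# Extract every double-quoted item (quotes included) from the printed vector
fruit_list <- str_extract_all(string = fruit_str, pattern = '".*?"')
# Join the quoted items into one comma-separated string
fruit_items <- str_c(fruit_list[[1]], collapse = ', ')

# Wrap the comma-separated items in c( ... )
str_glue('c({fruit_items})')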
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
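
One way to check a result like this is to parse the generated text back into R and confirm that it yields a 14-element character vector. This is a minimal sketch; the names `fruit_printed`, `quoted_items`, `vector_code` and `fruits` are introduced here for illustration only.

```r
library(stringr)

fruit_printed <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Pull out each quoted item, quotes included
quoted_items <- str_extract_all(fruit_printed, '"[^"]*"')[[1]]

# Build the c(...) text, then parse and evaluate it as R code
vector_code <- str_c("c(", str_c(quoted_items, collapse = ", "), ")")
fruits <- eval(parse(text = vector_code))

# The round trip should give back a plain character vector of 14 fruits
print(length(fruits))
print(fruits)
```

Once the items exist as a real character vector, `dput(fruits)` also prints them in `c(...)` form, which may be a simpler route when the input is already an R object rather than printed text.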

Exercise 3

Describe, in words, what these expressions will match:

string_exam <- c('aaap', 'anna','paneaean', 'asafa', 'rdghjgdr')

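This regular expression matches any single character that appears three times in a row.
example: aaa in aaap

str_view(string_exam, "(.)\\1\\1")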

This regular expression matches a pair of characters immediately followed by the same pair in reverse order, that is, a four-character palindrome.
example: anna, goog

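str_view(string_exam, "(.)(.)\\2\\1")

This regular expression matches any two characters that are immediately repeated in the same order.
example: eaea in paneaean

str_view(string_exam, "(..)\\1")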

This regular expression matches five characters in which the 1st, 3rd and 5th are the same; the 2nd and 4th characters can be anything.
example: asafa

str_view(string_exam, "(.).\\1.\\1")

This regular expression captures three characters, each in its own group, followed by zero or more characters and then the same three characters in reverse order.
example: rdghjgdr

str_view(string_exam, "(.)(.)(.).*\\3\\2\\1")
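
The five expressions can also be checked side by side. The sketch below builds a table of `str_detect()` results for every string and pattern; the names `patterns`, `match_table` and `matched_text` are assumptions made for this illustration. With these sample strings, each pattern matches exactly one string.

```r
library(stringr)

string_exam <- c("aaap", "anna", "paneaean", "asafa", "rdghjgdr")
patterns <- c(
  "(.)\\1\\1",
  "(.)(.)\\2\\1",
  "(..)\\1",
  "(.).\\1.\\1",
  "(.)(.)(.).*\\3\\2\\1"
)

# Rows are strings, columns are patterns; TRUE marks a match
match_table <- sapply(patterns, function(p) str_detect(string_exam, p))
rownames(match_table) <- string_exam
print(match_table)

# str_extract() pairs element i of the strings with element i of the patterns,
# so this shows the text each pattern matched in "its" string
matched_text <- str_extract(string_exam, patterns)
print(matched_text)
```

Because every TRUE falls on the diagonal of the table, the sample vector works as a small test suite: one string per pattern.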

Exercise 4

Construct regular expressions to match words that:

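string_examp <- c('goog', 'church', 'eleven')

Start and end with the same character.
example: goog

str_view(string_examp, "^(.).*\\1$")

Contain a repeated pair of letters.
example: church ("ch" appears twice)

str_view(string_examp, "(..).*\\1")

Contain one letter repeated in at least three places.
example: eleven (three "e"s)

str_view(string_examp, "(.).*\\1.*\\1")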
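
To see which words each expression picks out, the patterns can be run through `str_subset()` on a slightly larger list. This is a sketch only: the word list and the variable names below are made up for illustration.

```r
library(stringr)

# Hypothetical test words chosen to exercise each pattern
test_words <- c("goog", "church", "eleven", "banana", "tree")

starts_ends_same  <- "^(.).*\\1$"     # first character equals last character
repeated_pair     <- "(..).*\\1"      # some two-character pair occurs twice
three_of_a_letter <- "(.).*\\1.*\\1"  # some character occurs at least three times

same_ends  <- str_subset(test_words, starts_ends_same)
pair_twice <- str_subset(test_words, repeated_pair)
triple     <- str_subset(test_words, three_of_a_letter)

print(same_ends)
print(pair_twice)
print(triple)
```

Note that `"^(.).*\\1$"` needs at least two characters, so a one-letter word would not match even though it trivially starts and ends with the same character.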