Part I:

Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset, provide code that identifies the majors that contain either “DATA” or “STATISTICS”

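# major_data is assumed to be the majors list already read into a data frame
# Keep only the rows whose Major contains "DATA" or "STATISTICS"
filtered_data <- major_data[grepl("DATA|STATISTICS", major_data$Major), ]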
Filtered Majors
FOD1P  Major                                          Major_Category
6212   MANAGEMENT INFORMATION SYSTEMS AND STATISTICS  Business
2101   COMPUTER PROGRAMMING AND DATA PROCESSING       Computers & Mathematics
3702   STATISTICS AND DECISION SCIENCE                Computers & Mathematics
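
The filter can be tried without downloading the full dataset. The sketch below is only an illustration: `major_sample` is a hypothetical four-row stand-in for the real data frame, built from the three rows shown above plus one non-matching row added for contrast.

```r
# Hypothetical stand-in for the full majors data frame (illustration only)
major_sample <- data.frame(
  FOD1P = c("6212", "2101", "3702", "1100"),
  Major = c("MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
            "COMPUTER PROGRAMMING AND DATA PROCESSING",
            "STATISTICS AND DECISION SCIENCE",
            "GENERAL AGRICULTURE"),
  Major_Category = c("Business",
                     "Computers & Mathematics",
                     "Computers & Mathematics",
                     "Agriculture & Natural Resources"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE for each Major that contains either keyword
keep <- grepl("DATA|STATISTICS", major_sample$Major)

# Use the logical vector to select rows; the empty column index keeps all columns
filtered_sample <- major_sample[keep, ]
print(filtered_sample)
```

The alternation `DATA|STATISTICS` does not need surrounding parentheses here, because nothing else in the pattern has to be grouped with it.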

Part II:

Write code that transforms the data below:
[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”

fruit_list
 [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange" "blueberry"   
 [6] "cantaloupe"   "chili pepper" "cloudberry"   "elderberry"   "lime"        
[11] "lychee"       "mulberry"     "olive"        "salal berry" 

Into a format like this:
c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)

dput(fruit_list)
c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", 
"cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", 
"lychee", "mulberry", "olive", "salal berry")

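dput() prints the vector as R code directly. If the result is needed as a single character string instead, one possible approach (a sketch, not the only way) is to build it with paste0(). The name `as_code` is introduced here purely for illustration.

```r
fruit_list <- c("bell pepper", "bilberry", "blackberry", "blood orange",
                "blueberry", "cantaloupe", "chili pepper", "cloudberry",
                "elderberry", "lime", "lychee", "mulberry", "olive",
                "salal berry")

# Wrap each element in double quotes and join the pieces with ", "
quoted <- paste0('"', fruit_list, '"', collapse = ", ")

# Surround the joined elements with c( ... )
as_code <- paste0("c(", quoted, ")")
cat(as_code, "\n")

# Evaluating the generated text should give back the original vector
round_trip <- eval(parse(text = as_code))
```

Unlike dput(), this version does no escaping, so it assumes that none of the elements contain a double quote or a backslash.
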
Part III:

Describe, in words, what these expressions will match:

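  • (.)\1\1
    • As written, this raises an error because the pattern is not in quotes. Quoting alone is not enough, either: inside an R string, "\1" with a single backslash is read as an octal escape (the control character with code 1), not as a backreference, so the backslashes must be doubled. Written correctly as "(.)\\1\\1", the pattern matches any character repeated three times in a row, e.g. AAA.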
Error: <text>:1:25: unexpected input
1: str_view(fruit_list,(..)\
                            ^
  • "(.)(.)\\2\\1"
    • Matches any two characters followed by the same two characters in reverse order: the second character again, then the first. The shape is ABBA, as in "eppe" in "bell pepper".
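  • (..)\1
    • This also raises an error because the pattern is not in quotes. In addition, inside an R string the backreference must be written as \\1 and not \1; with a single backslash, "\1" is treated as an escape character rather than a reference to the first group. Written correctly as "(..)\\1", the pattern matches any two characters that are immediately repeated, e.g. "anan" in "banana".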
Error: <text>:1:25: unexpected input
1: str_view(fruit_list,(..)\
                            ^
  • "(.).\\1.\\1"
    • Matches a character, then any character, then the first character again, then any character, then the first character a third time. The shape is ABACA, where B and C may be any characters (including the same one), as in "anana" in "banana".
  • "(.)(.)(.).*\\3\\2\\1"
    • Matches any three characters, then zero or more of any characters, then the same three characters in reverse order. The shape is ABC...CBA, e.g. ABCDDDCBA.

Part IV:

Construct regular expressions to match words that:

  • Start and end with the same character.
    • "^(.).*\\1$"
  • Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)
    • "([A-Za-z][A-Za-z]).*\\1"
  • Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)
    • "([A-Za-z]).*\\1.*\\1"
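
As a quick check, the three expressions can be run against a handful of words. This is a minimal sketch using base R's grepl() with perl = TRUE; the word list and the variable names are chosen only for illustration.

```r
# Sample words chosen only for illustration
words <- c("area", "church", "eleven", "lime")

same_ends     <- "^(.).*\\1$"               # first character equals last
repeated_pair <- "([A-Za-z][A-Za-z]).*\\1"  # a two-letter pair occurs twice
triple_letter <- "([A-Za-z]).*\\1.*\\1"     # one letter occurs three times

# Keep the words each pattern matches
ends_match   <- words[grepl(same_ends, words, perl = TRUE)]
pair_match   <- words[grepl(repeated_pair, words, perl = TRUE)]
triple_match <- words[grepl(triple_letter, words, perl = TRUE)]

print(ends_match)    # "area"
print(pair_match)    # "church"
print(triple_match)  # "eleven"

# A one-letter word does not match the first pattern, because the group
# and the backreference each need a character of their own
single_letter <- grepl(same_ends, "a", perl = TRUE)
print(single_letter) # FALSE
```

If single-letter words should also count as starting and ending with the same character, the first pattern would need an extra alternative for that case.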