#1 Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”

Import data from URL - explore all the variables of the FiveThirtyEight majors dataset

majors <- read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv')

str(majors)
## 'data.frame':    174 obs. of  3 variables:
##  $ FOD1P         : chr  "1100" "1101" "1102" "1103" ...
##  $ Major         : chr  "GENERAL AGRICULTURE" "AGRICULTURE PRODUCTION AND MANAGEMENT" "AGRICULTURAL ECONOMICS" "ANIMAL SCIENCES" ...
##  $ Major_Category: chr  "Agriculture & Natural Resources" "Agriculture & Natural Resources" "Agriculture & Natural Resources" "Agriculture & Natural Resources" ...
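library(dplyr)  # provides %>% and filter()

# Keep only the rows whose Major contains DATA or STATISTICS
data_stats_majors <- majors %>% 
  filter(grepl('DATA|STATISTICS', Major))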

data_stats_majors
##   FOD1P                                         Major          Major_Category
## 1  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 2  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics
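The same filter can be sketched in base R without dplyr or a network connection. The vector below is a small hand-copied sample of Major values from the output above, and the names `sample_majors`, `is_match` and `matched_majors` are only illustrative.

```r
# Small hand-copied sample of Major values from the output above;
# the full dataset has many more rows.
sample_majors <- c(
  "GENERAL AGRICULTURE",
  "AGRICULTURAL ECONOMICS",
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "STATISTICS AND DECISION SCIENCE"
)

# grepl() returns TRUE for every element containing DATA or STATISTICS
is_match <- grepl("DATA|STATISTICS", sample_majors)

# Logical indexing keeps only the matching majors
matched_majors <- sample_majors[is_match]
print(matched_majors)
```

Note that the pattern is case-sensitive, which is fine here because the Major column is stored in upper case.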

#2 Create a vector

list_of_fruits <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")

list_of_fruits
##  [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange" "blueberry"   
##  [6] "cantaloupe"   "chili pepper" "cloudberry"   "elderberry"   "lime"        
## [11] "lychee"       "mulberry"     "olive"        "salal berry"

#3 Describe, in words, what these expressions will match

-(.)\1\1 - The dot matches any character except a line break, and the parentheses capture it as group #1. Each \1 is a backreference that must match that same captured character again, so this expression matches any character repeated three times in a row, such as the “aaa” in “baaa”.

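-“(.)(.)\2\1” - Read as a regular expression, this matches two characters followed by the same two characters in reverse order, such as “abba” or the “eppe” in “pepper”. In an R string each backslash has to be doubled, “(.)(.)\\2\\1”, for the backreferences to reach the regex engine.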

-(..)\1 - The parentheses form a capture group of any two characters, and \1 is a backreference that must match exactly what capture group #1 matched. In “bbbb” the first “bb” is captured and the next “bb” repeats it, so “bbbb” is a match for this expression. A pair of different characters works too, as in “abab”.

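-“(.).\1.\1” - Read as a regular expression, this matches a character, then any character, then the first character again, then any character, then the first character once more, such as “abaca” or the “anana” in “banana”. Each dot can match any character, while each \1 must repeat the character captured by group #1.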

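-“(.)(.)(.).*\3\2\1” - Matches three characters, followed by zero or more characters of any kind, followed by the same three characters in reverse order, such as “abccba” or “abcdecba”.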
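One way to check descriptions like these is to run each pattern against a few test words. The sketch below assumes base R's grepl() with perl = TRUE; the pattern names and test words are made up for illustration, and every backslash is doubled because the patterns are written as R strings.

```r
# Each pattern is written as an R string, so every backslash is doubled.
patterns <- c(
  triple      = "(.)\\1\\1",
  mirror_pair = "(.)(.)\\2\\1",
  double_pair = "(..)\\1",
  alternating = "(.).\\1.\\1",
  mirror_trio = "(.)(.)(.).*\\3\\2\\1"
)

# Illustrative test words, including one that should match nothing
test_words <- c("baaa", "abba", "bbbb", "abaca", "abcdecba", "plain")

# Rows are test words, columns are patterns; TRUE marks a match.
match_table <- sapply(patterns, function(p) grepl(p, test_words, perl = TRUE))
rownames(match_table) <- test_words
print(match_table)
```

Reading across a row shows which patterns a word satisfies; a word such as “bbbb” matches more than one pattern because the captured characters are allowed to be identical.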

#4 Construct regular expressions to match words that:

-Start and end with the same character.

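-^([a-z]).*\1$ I thought it would be something like this, but it doesn’t seem to work as typed. The idea is sound as a regular expression: capture the first letter, allow anything in between, and require the same letter at the end. A likely cause of the failure is that in an R string the backslash must be doubled, as in “^([a-z]).*\\1$”. The pattern also needs at least two characters, so it will not match a one-letter word.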

-Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)

-Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)
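The three descriptions above can be sketched as R patterns and tried on a handful of words. This is one possible set of answers rather than the only one; the variable names and the test words are hypothetical, and grepl() is called with perl = TRUE as an assumption about the regex engine.

```r
# Hypothetical names; each pattern is an R string with doubled backslashes.
same_ends     <- "^(.)(.*\\1)?$"        # optional group also accepts one-letter words
repeated_pair <- "([a-z][a-z]).*\\1"    # a pair of letters that occurs again later
three_times   <- "([a-z]).*\\1.*\\1"    # one letter found in at least three places

words <- c("area", "church", "eleven", "a", "plain")

# Subset the test words by each pattern
starts_and_ends_same <- words[grepl(same_ends, words, perl = TRUE)]
has_repeated_pair    <- words[grepl(repeated_pair, words, perl = TRUE)]
has_triple_letter    <- words[grepl(three_times, words, perl = TRUE)]

print(starts_and_ends_same)
print(has_repeated_pair)
print(has_triple_letter)
```

The `.*` between backreferences is what lets the repeats sit anywhere in the word instead of being adjacent.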