Please deliver links to an R Markdown file (in GitHub and rpubs.com) with solutions to the problems below. You may work in a small group, but please submit separately with names of all group participants in your submission.
Load the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/] from Github
CollegeMajorData = read_csv('https://raw.githubusercontent.com/BeshkiaKvarnstrom/MSDS-DATA607/main/majors-list.csv',
show_col_types = FALSE)
QUESTION 1: Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”
CollegeMajorDF <- CollegeMajorData%>% filter(str_detect(Major,"STATISTICS") | str_detect(Major,"DATA"))
CollegeMajorDF
## # A tibble: 3 × 3
## FOD1P Major Major_Category
## <chr> <chr> <chr>
## 1 6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS Business
## 2 2101 COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3 3702 STATISTICS AND DECISION SCIENCE Computers & Mathematics
QUESTION 2: Write code that transforms the data below: [1] “bell
pepper” “bilberry” “blackberry” “blood orange” [5] “blueberry”
“cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”
Into a format like this: c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”) The two exercises below are taken from R for Data Science, 14.3.5.1 in the on-line version:
fv_input <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'
fruits_veg <- str_extract_all(string = fv_input, pattern = '\\".*?\\"')
fv_items <- str_c(fruits_veg[[1]], collapse = ', ')
str_glue('c({fv_items})', fv_items = fv_items)
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
QUESTION 3: Describe, in words, what these expressions will
match:
(.)\1\1 - This expression will match any set of three
repeated characters in a row
“(.)(.)\2\1” - This expression will
match any two repeat characters in a string in the reverse order.
(..)\1 - This expression will match any two characters in a string that
repeat immediately in the same order.
“(.).\1.\1” - This expression
will match characters in a string where the character to be matched is
repeated three times with a different character in between each
occurrence
“(.)(.)(.).*\3\2\1” - This expression will find any three
unique characters in a string, followed by any amount of other
characters then followed by the original three characters in reverse
order
QUESTION 4: Construct regular expressions to match words that: Start and end with the same character. Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.) Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)
inp_string <- c("radar","apple","dog","dad","mom","test","lateral","sense","church","banana","pepper","area","screen","England","eleven","ten","twelve","soso","treat","bandana", "Louisiana", "Missouri", "Mississippi", "Connecticut", "google", "mood", "madam")
#Start and end with the same character.
str_subset(inp_string, "(^|\\s)([a-z])(([a-z]+\\2(\\s|$))|\\2?(\\s|$))")
## [1] "radar" "dad" "mom" "test" "lateral" "area" "treat"
## [8] "madam"
#Contain a repeated pair of letters (e.g. "church" contains "ch" repeated twice.
str_subset(inp_string, "([A-Za-z][A-Za-z]).*\\1")
## [1] "sense" "church" "banana" "pepper" "soso"
## [6] "bandana" "Mississippi"
#Contain one letter repeated in at least three places (e.g. "eleven" contains three "e"s.)
str_subset(inp_string, "([A-Za-z]).*\\1.*\\1")
## [1] "banana" "pepper" "eleven" "bandana" "Mississippi"