This homework assignment explores regular expressions (regex) in R.
Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset, provide code that identifies the majors that contain either “DATA” or “STATISTICS”.
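# Load readr, dplyr, and stringr (all attached by the tidyverse)
library(tidyverse)
file_path <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"
majors <- read_csv(file_path, show_col_types = FALSE)
# Flag majors whose name contains "DATA" or "STATISTICS" (1 = match, 0 = no match)
majors <- majors %>% mutate(has_data_stats = as.numeric(str_detect(Major, "DATA|STATISTICS")))
majors %>% filter(has_data_stats == 1)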
## # A tibble: 3 × 4
##   FOD1P Major                                         Major_Category has_data_stats
##   <chr> <chr>                                         <chr>                   <dbl>
## 1 6212  MANAGEMENT INFORMATION SYSTEMS AND STATISTICS Business                    1
## 2 2101  COMPUTER PROGRAMMING AND DATA PROCESSING      Computers & M…              1
## 3 3702  STATISTICS AND DECISION SCIENCE               Computers & M…              1
Write code that transforms the data below:
[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”
Into a format like this:
c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)
temp <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
temp2 <- str_c(temp, collapse = ', ')
temp2
## [1] "bell pepper, bilberry, blackberry, blood orange, blueberry, cantaloupe, chili pepper, cloudberry, elderberry, lime, lychee, mulberry, olive, salal berry"
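The collapsed string above separates the fruits with commas, but it has no quotes around each element and no c( ) wrapper, so it is close to the target format rather than identical to it. A minimal sketch of one way to close that gap is shown here, assuming stringr is available; the names `fruits`, `quoted`, and `result` are introduced only for this illustration and are not part of the original answer.

```r
library(stringr)

# Same 14 fruits as in the exercise (hypothetical variable name).
fruits <- c("bell pepper", "bilberry", "blackberry", "blood orange",
            "blueberry", "cantaloupe", "chili pepper", "cloudberry",
            "elderberry", "lime", "lychee", "mulberry", "olive",
            "salal berry")

# Wrap every element in double quotes, then join the pieces with ", ".
quoted <- str_c('"', fruits, '"', collapse = ", ")

# Add the c( ) wrapper around the joined string.
result <- str_c("c(", quoted, ")")

# writeLines() shows the raw string, so the inner quotes are not escaped.
writeLines(result)
```

Because `result` is valid R source, parsing and evaluating it should give back the original vector, which is a convenient sanity check. Base R's `dput(fruits)` prints a similar representation without any string manipulation.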
Describe, in words, what these expressions will match:
(.)\1\1
"(.)(.)\\2\\1"
(..)\1
"(.).\\1.\\1"
"(.)(.)(.).*\\3\\2\\1"