Introduction

This homework assignment explores regular expressions (regex) in R, using functions from the stringr package.

Problem 1

Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset, provide code that identifies the majors that contain either “DATA” or “STATISTICS”.

Solution

  • Read in the data from GitHub
  • Create a new column indicating whether a major includes “DATA” or “STATISTICS” using ‘str_detect’
  • Filter on that column to identify the majors of interest
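library(readr)    # read_csv()
library(dplyr)    # mutate(), filter(), %>%
library(stringr)  # str_detect(), str_c()

file_path <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"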

majors <- read_csv(file_path, show_col_types = FALSE)

majors <- majors %>% mutate(has_data_stats = as.numeric(str_detect(Major, "DATA|STATISTICS")))

majors %>% filter(has_data_stats == 1)
## # A tibble: 3 × 4
##   FOD1P Major                                         Major_Category has_data_stats
##   <chr> <chr>                                         <chr>                   <dbl>
## 1 6212  MANAGEMENT INFORMATION SYSTEMS AND STATISTICS Business                    1
## 2 2101  COMPUTER PROGRAMMING AND DATA PROCESSING      Computers & M…              1
## 3 3702  STATISTICS AND DECISION SCIENCE               Computers & M…              1
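
The helper column is not strictly needed, because ‘filter’ accepts the logical vector returned by ‘str_detect’ directly. The sketch below shows that variant under a few assumptions: `majors_sample` is a small hypothetical stand-in for the full table (so the example runs without a download), and the `\\b` word boundaries are an optional addition that restricts the match to whole words. All three majors in the output above contain “DATA” or “STATISTICS” as whole words, so the stricter pattern selects the same rows from the full dataset.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the majors table read from GitHub above
majors_sample <- tibble(
  Major = c(
    "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
    "COMPUTER PROGRAMMING AND DATA PROCESSING",
    "STATISTICS AND DECISION SCIENCE",
    "GENERAL AGRICULTURE"
  )
)

# Filter directly on the logical result of str_detect();
# \\b limits the match to whole words
data_stats_majors <- majors_sample %>%
  filter(str_detect(Major, "\\b(DATA|STATISTICS)\\b"))

data_stats_majors
```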

Problem 2

Write code that transforms the data below:

 [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
 [5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
 [9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"

Into a format like this:

c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")

Solution

  • Put the fruit names in a vector
  • Use ‘str_c’ with the parameter ‘collapse’ to collapse the vector into a single character string
temp <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
temp2 <- str_c(temp, collapse = ', ')
temp2
## [1] "bell pepper, bilberry, blackberry, blood orange, blueberry, cantaloupe, chili pepper, cloudberry, elderberry, lime, lychee, mulberry, olive, salal berry"
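
The string above joins the names, but it lacks the quotation marks around each fruit and the surrounding c( ) shown in the target format. A minimal sketch of one way to reproduce the target exactly is given below. The variable names `fruit`, `quoted`, and `as_code` are illustrative and not part of the original solution.

```r
library(stringr)

fruit <- c("bell pepper", "bilberry", "blackberry", "blood orange",
           "blueberry", "cantaloupe", "chili pepper", "cloudberry",
           "elderberry", "lime", "lychee", "mulberry", "olive",
           "salal berry")

# Wrap each element in double quotes, then join the pieces with ", "
quoted <- str_c('"', fruit, '"', collapse = ", ")

# Add the c( ) wrapper around the joined string
as_code <- str_c("c(", quoted, ")")

# writeLines() shows the string without escaping the inner quotes
writeLines(as_code)
```

Because the result is valid R code, one way to check it is to parse and evaluate the string and compare it with the original vector, for example with `identical(eval(parse(text = as_code)), fruit)`.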

Problem 3

Describe, in words, what these expressions will match:

(.)\1\1
"(.)(.)\\2\\1"
(..)\1
"(.).\\1.\\1"
"(.)(.)(.).*\\3\\2\\1"
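
One way to check a written description of these patterns is to run them against a few short test strings. The sketch below is illustrative only: the test strings are made up, and every pattern is written as an R string, so each backreference such as \1 is typed as `\\1`. The first and third expressions appear above as bare regular expressions, so the sketch assumes they are meant to be passed as `"(.)\\1\\1"` and `"(..)\\1"`.

```r
library(stringr)

# Made-up test strings: in each pair, the first matches and the second does not

# The same character three times in a row
r1 <- str_detect(c("aaa", "aab"), "(.)\\1\\1")

# Two characters followed by the same two in reverse order
r2 <- str_detect(c("abba", "abab"), "(.)(.)\\2\\1")

# A pair of characters immediately repeated
r3 <- str_detect(c("abab", "abba"), "(..)\\1")

# One character appearing at positions 1, 3, and 5 of a five-character span
r4 <- str_detect(c("abaca", "abcde"), "(.).\\1.\\1")

# Three characters, anything (possibly nothing), then the same three reversed
r5 <- str_detect(c("abcxyzcba", "abcabc"), "(.)(.)(.).*\\3\\2\\1")

list(r1, r2, r3, r4, r5)
```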