Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”
Response1: str_detect
library(tidyverse)
major_df <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv")
spec(major_df)
## cols(
## FOD1P = col_character(),
## Major = col_character(),
## Major_Category = col_character()
## )
major_df |>
  filter(str_detect(Major, "DATA") | str_detect(Major, "STATISTICS") | str_detect(Major_Category, "DATA") | str_detect(Major_Category, "STATISTICS"))
## # A tibble: 3 × 3
## FOD1P Major Major_Category
## <chr> <chr> <chr>
## 1 6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS Business
## 2 2101 COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3 3702 STATISTICS AND DECISION SCIENCE Computers & Mathematics
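The same filter can be written more compactly with a single alternation pattern. The following is a minimal sketch on a small hand-made vector rather than the full dataset: `majors` and `matches` are illustrative names, and the fourth entry is included only as an assumed non-matching example.

```r
library(stringr)

# Illustrative sample: the three majors from the output above plus one non-match
majors <- c(
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "STATISTICS AND DECISION SCIENCE",
  "GENERAL AGRICULTURE"
)

# "DATA|STATISTICS" matches a string that contains either word
matches <- str_subset(majors, "DATA|STATISTICS")
print(matches)
```

Inside a pipeline, the equivalent would be filter(str_detect(Major, "DATA|STATISTICS")).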
Write code that transforms the data below:
[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"
Response1: Using regex, str_squish and str_replace_all
task2 <-'[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'
# Remove the index digits, square brackets, newlines, and tabs
task2_result <- str_replace_all(task2, "\\d+|\\[|\\]|\\n|\\t", "")
# Collapse repeated whitespace and trim both ends
task2_result <- str_squish(task2_result)
# Mark the boundary between adjacent elements with a dash
# (this assumes no element itself contains a dash)
task2_result <- str_replace_all(task2_result, '" "', "-")
task2_result <- task2_result %>%
  str_replace_all('"', "") %>%         # Remove the remaining quotes
  str_split("-", simplify = TRUE) %>%  # Split on the dash separator
  as.vector()                          # Convert the one-row matrix to a vector
# Print the vector
print(task2_result)
## [1] "bell pepper" "bilberry" "blackberry" "blood orange" "blueberry"
## [6] "cantaloupe" "chili pepper" "cloudberry" "elderberry" "lime"
## [11] "lychee" "mulberry" "olive" "salal berry"
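An alternative that avoids the temporary dash separator is to extract each quoted item directly. This is a sketch, not the approach used above; it assumes every item is wrapped in straight double quotes, and `fruits` is an illustrative name.

```r
library(stringr)

task2 <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'

# Extract every run of non-quote characters enclosed in double quotes
fruits <- str_extract_all(task2, '"[^"]+"')[[1]]
# Drop the surrounding quotes from each extracted item
fruits <- str_remove_all(fruits, '"')
print(fruits)
```

Because the pattern only looks at quoted text, the index markers such as [5] and the line breaks never need to be removed explicitly.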
Describe, in words, what these expressions will match:
(.)\1\1 : this matches any single character repeated three times in a row, such as "III" or "aaa". In the example, the match "III" was replaced with "IV".
## [1] "This is a version IV"
"(.)(.)\2\1" : this matches a pair of characters followed by the same pair in reverse order, such as "IXXI" or "abba". The double quotes delimit the R string and are not part of the pattern. In the example, it matches "IXXI" in the string below.
## [1] "The Roman numeral IXXI is not valid"
(..)\1 : this matches any two characters immediately followed by the same two characters, such as "XIXI". In the example, it matches "XIXI" in the string below.
## [1] "The Roman numeral XIXI is not valid"
"(.).\1.\1" : this matches five characters in which the first, third, and fifth are the same character and the second and fourth can be anything, such as "XIXIX" or "abaca". In the example, it matches "XIXIX" in the string below.
## [1] "The Roman numeral XIXIX is not valid"
"(.)(.)(.).*\3\2\1" : this matches three characters, followed by zero or more arbitrary characters, followed by the same three characters in reverse order. For example, it matches the string "XIVXIV is not valid same as VIXVIX", where "XIV" at the start is mirrored by "VIX" at the end.
## [1] "String matches Regex"
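The descriptions above can be checked with a short sketch. The patterns are written as R strings, so each backslash is doubled. The first input string is an assumption (the original text before "III" was replaced is not shown), and the variable names are illustrative.

```r
library(stringr)

# Same character three times in a row
m1 <- str_extract("This is a version III", "(.)\\1\\1")
# A pair of characters followed by the same pair reversed
m2 <- str_extract("The Roman numeral IXXI is not valid", "(.)(.)\\2\\1")
# Two characters immediately repeated
m3 <- str_extract("The Roman numeral XIXI is not valid", "(..)\\1")
# Same character in positions 1, 3, and 5
m4 <- str_extract("The Roman numeral XIXIX is not valid", "(.).\\1.\\1")
# Three characters, anything, then the same three reversed
m5 <- str_detect("XIVXIV is not valid same as VIXVIX", "(.)(.)(.).*\\3\\2\\1")

print(c(m1, m2, m3, m4))
print(m5)
```

str_extract() returns the first matching substring, which makes it easy to see exactly which part of each string the pattern picks up.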
Construct regular expressions to match words that:
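Start and end with the same character: ^(.).*\1$ (a back-reference has no special meaning inside a character class, so [^\\1] does not mean "anything but the first character"; use ^(.)(.*\1)?$ to also accept one-character words)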
## [1] "String matches Regex"
Contain a repeated pair of letters (e.g. “church” contains “ch” twice): (.{2}).*\1
## [1] "String matches Regex .\""
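Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s): (.).*\1.*\1 (the .* between the back-references lets the repeats sit any distance apart, whereas (.).\1.\1 requires exactly one character between them)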
## [1] "String matches Regex "
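As a quick check, one possible set of patterns for these three cases can be run against a few test words. This is a sketch under stated assumptions: the word list and variable names are illustrative, and the patterns are written as R strings with doubled backslashes.

```r
library(stringr)

# Illustrative test words
words <- c("church", "eleven", "area", "banana", "stats")

same_ends     <- "^(.).*\\1$"     # first character reappears at the end
repeated_pair <- "(..).*\\1"      # some two-character pair occurs twice
three_times   <- "(.).*\\1.*\\1"  # some character occurs at least three times

ends_match  <- str_subset(words, same_ends)
pair_match  <- str_subset(words, repeated_pair)
three_match <- str_subset(words, three_times)

print(ends_match)   # "area" "stats"
print(pair_match)   # "church" "banana"
print(three_match)  # "eleven" "banana"
```

Note that . matches any character, not only letters; replacing it with [A-Za-z] would restrict the patterns to letters.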