library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)

Question 1

data <- read.csv("https://raw.githubusercontent.com/sphill12/DATA607/main/majors-list%20data%20607.csv")
major_filtered <- data %>% filter(str_detect(Major, regex("STATISTICS|DATA")))
major_filtered
##   FOD1P                                         Major          Major_Category
## 1  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 2  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics

Question 2

raw_str <- r"([1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"  
[9] "elderberry"   "lime"         "lychee"       "mulberry"    
[13] "olive"        "salal berry")"
pattern <- '"([^"]+)"'
find_match <- str_extract_all(raw_str, pattern)

final <- lapply(find_match, function(x) substr(x, 2, nchar(x)-1))
print(final)
## [[1]]
##  [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange" "blueberry"   
##  [6] "cantaloupe"   "chili pepper" "cloudberry"   "elderberry"   "lime"        
## [11] "lychee"       "mulberry"     "olive"        "salal berry"
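
The result above is a list holding one character vector. As a minimal sketch of getting a plain vector instead, the capture group can be read directly with `str_match_all()`, which avoids trimming the quotes by hand. The shortened `raw_str` and the name `fruits` are illustrative assumptions, not part of the original answer.

```r
library(stringr)

# Shortened stand-in for the printed vector used above (raw strings need R >= 4.0).
raw_str <- r"([1] "bell pepper"  "bilberry"
[3] "blackberry"   "blood orange")"

# str_match_all() returns one matrix per input string; column 2 holds the
# text captured by the first group, i.e. each name without its quotes.
fruits <- str_match_all(raw_str, '"([^"]+)"')[[1]][, 2]

# dput() prints the vector in c("...", "...") form.
dput(fruits)
```

Taking column 2 of the match matrix drops the need for `substr()` and `lapply()`, at the cost of assuming the input is a single string.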

Question 3

“(.)\1\1” The “(.)” captures any single character, and each “\1” must then match that same captured character again. The expression therefore matches one character repeated three times in a row, such as “aaa”

“(.)(.)\2\1” The “(.)(.)” forms two capture groups. The expression then requires the second group's character followed by the first group's character, so it matches a mirrored sequence such as “abba”

“(..)\1” The group captures any two characters, and the “\1” requires the same two characters again immediately afterwards. This would match a string such as “abab”

“(.).\1.\1”

This expression captures the first character, then allows any single character to follow it. The next character must repeat the captured one, followed by any single character, and finally the captured character once more. This would match a string such as “abaca”

“(.)(.)(.).*\3\2\1*”

This expression forms three capture groups at “(.)(.)(.)”. The “.*” then matches zero or more characters after the first three. The string must next contain the third group followed by the second group, and the “\1*” matches zero or more occurrences of the first group. Several strings fit this pattern: “abccb”, “abcxcb”, and “abccba” would all match
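
The descriptions above can be checked with `str_detect()`. This is a rough sketch rather than a definitive test: it assumes the fifth pattern is read as `(.)(.)(.).*\3\2\1*`, as its description suggests, and the names `patterns`, `words`, and `hits` are made up for illustration. Inside an R string literal each backslash has to be doubled, so `\1` is written `"\\1"`.

```r
library(stringr)

# Backslashes are doubled because these are ordinary R string literals.
patterns <- c(
  triple      = "(.)\\1\\1",
  mirrored    = "(.)(.)\\2\\1",
  pair_twice  = "(..)\\1",
  alternating = "(.).\\1.\\1",
  reversed    = "(.)(.)(.).*\\3\\2\\1*"
)

# "abcabca" is included as a near miss: no "cb" follows its first three letters.
words <- c("aaa", "abba", "abab", "abaca", "abccb", "abcxcba", "abcabca")

# One row per word, one column per pattern; TRUE marks a match.
hits <- sapply(patterns, function(p) str_detect(words, p))
rownames(hits) <- words
print(hits)
```

Because `str_detect()` is unanchored, a pattern counts as matching when it fits anywhere inside the word, so “abccb” also satisfies the mirrored pattern through its “bccb” substring.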

Question 4

Construct regular expressions to match words that:

Start and end with the same character

The following regex would do this for words of two or more characters: “^(.).*\1$”
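
A quick way to try the pattern is `str_subset()` on a few sample words. The word list is invented for illustration, and the second pattern is my own assumption about how a one-letter word could be included by making everything after the first character optional.

```r
library(stringr)

words <- c("area", "dad", "stats", "a", "church", "eleven")

# Pattern from the answer above, with backslashes doubled for an R string.
same_ends <- "^(.).*\\1$"

# Variant: the optional tail lets a one-letter word count as well.
same_ends_any_length <- "^(.)(.*\\1)?$"

two_plus   <- str_subset(words, same_ends)
any_length <- str_subset(words, same_ends_any_length)
print(two_plus)
print(any_length)
```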

Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)

The following regex would do this: “(..).*\1”
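
As a small sketch of this pattern in use, again with an invented word list: “(..)” accepts any two characters, so the second pattern below is a hypothetical stricter version that only accepts pairs of ASCII letters.

```r
library(stringr)

words <- c("church", "banana", "mississippi", "eleven", "data", "x--y--")

# Any two characters that appear again later in the string.
any_pair <- str_subset(words, "(..).*\\1")

# Stricter variant (an assumption, not the original answer): letters only.
letter_pair <- str_subset(words, "([A-Za-z][A-Za-z]).*\\1")

print(any_pair)
print(letter_pair)
```

The made-up string “x--y--” is there only to show the difference: its repeated “--” satisfies the first pattern but not the letters-only one.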

Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)

I was not able to get this one
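
One candidate that seems to fit, offered as a sketch rather than a confirmed answer: capture a single character and require it to appear twice more, allowing anything in between each time. The pattern `(.).*\1.*\1` and the sample words are assumptions for illustration, and “(.)” would need to become “[A-Za-z]” to restrict the match to letters.

```r
library(stringr)

words <- c("eleven", "banana", "church", "data", "statistics")

# A captured character, then the same character two more times,
# with any number of characters between the occurrences.
three_times <- "(.).*\\1.*\\1"

repeated <- str_subset(words, three_times)
print(repeated)
```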