library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

college_major <- read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv')

1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”

glimpse(college_major)
## Rows: 174
## Columns: 3
## $ FOD1P          <chr> "1100", "1101", "1102", "1103", "1104", "1105", "1106",…
## $ Major          <chr> "GENERAL AGRICULTURE", "AGRICULTURE PRODUCTION AND MANA…
## $ Major_Category <chr> "Agriculture & Natural Resources", "Agriculture & Natur…
grep(pattern = "DATA|STATISTICS", college_major$Major, value = TRUE, ignore.case = TRUE)
## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"     
## [3] "STATISTICS AND DECISION SCIENCE"
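As a hedged variation on the call above, the same pattern can be passed to `grepl()` to keep whole rows instead of only the matching strings, so any other columns stay attached to each match. The sketch below runs on a small stand-in data frame (`majors_sample`, a hypothetical name holding a few `Major` values copied from the output above) rather than the downloaded file, so it works offline.

```r
# Stand-in for college_major: a few Major values taken from the output above.
# (majors_sample is a hypothetical name used only for this illustration.)
majors_sample <- data.frame(
  Major = c(
    "GENERAL AGRICULTURE",
    "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
    "COMPUTER PROGRAMMING AND DATA PROCESSING",
    "STATISTICS AND DECISION SCIENCE"
  ),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per row, so it can index rows directly.
is_match <- grepl("DATA|STATISTICS", majors_sample$Major, ignore.case = TRUE)

# drop = FALSE keeps the result a data frame even with a single column.
matching_rows <- majors_sample[is_match, , drop = FALSE]
print(matching_rows)
```

With the full data set, the same indexing on `college_major` would return the matching rows with all three columns.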

2. Write code that transforms the data below:

[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”

Into a format like this:

c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)

fruits_basket <- c("bell pepper", "bilberry", "blackberry", "blood orange",
"blueberry", "cantaloupe", "chili pepper", "cloudberry", 
"elderberry", "lime", "lychee", "mulberry",    
"olive", "salal berry")
fruits_basket
##  [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange" "blueberry"   
##  [6] "cantaloupe"   "chili pepper" "cloudberry"   "elderberry"   "lime"        
## [11] "lychee"       "mulberry"     "olive"        "salal berry"
# Quote each element, join with ", ", and wrap the result in c( ... ).
# cat() returns NULL, so the string is stored first and printed afterwards.
transformed_data <- paste0("c(", paste0('"', fruits_basket, '"', collapse = ", "), ")")
cat(transformed_data)
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
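One way to check a string like this is to parse it back into R and compare the result with the original vector. The sketch below rebuilds the string for a shortened vector (`fruits_demo`, `as_code`, and `round_trip` are hypothetical names used only here) and also calls `dput()`, the base R function that writes an object out as code. Treat it as an illustration, not as the required answer.

```r
# Shortened copy of the vector, to keep the example small.
fruits_demo <- c("bell pepper", "bilberry", "blackberry")

# Wrap each element in double quotes, join with ", ", then wrap in c( ... ).
as_code <- paste0("c(", paste0('"', fruits_demo, '"', collapse = ", "), ")")
cat(as_code, "\n")

# Round trip: evaluating the text should give back the original vector.
round_trip <- eval(parse(text = as_code))
stopifnot(identical(round_trip, fruits_demo))

# dput() produces the same kind of representation directly.
dput(fruits_demo)
```

Building the string by hand shows how `paste0()` and `collapse` work, while `dput()` is the shorter route when only the output matters.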

3. Describe, in words, what these expressions will match:

(.)\1\1
“(.)(.)\2\1”
(..)\1
“(.).\1.\1”
“(.)(.)(.).*\3\2\1”

Answers:

(.)\1\1 matches any character that occurs three times in a row, e.g. “aaa”.

“(.)(.)\2\1” matches two characters followed by the same two characters in reverse order, e.g. “abba”.

(..)\1 matches any two characters immediately followed by the same two characters, e.g. “abab”. The pattern is not anchored, so the repeated pair can appear anywhere inside a longer string.

“(.).\1.\1” matches a character (the first capturing group), then any character, then the first character again, then any character, then the first character once more, e.g. “azaxa”.

“(.)(.)(.).*\3\2\1” matches any three characters, then zero or more of any character, then the same three characters in reverse order, e.g. “abcxyzcba”.
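The descriptions above can be spot-checked with `grepl()`. In the sketch below the backslashes are doubled because the patterns are written as R strings, `perl = TRUE` is a choice made here to select the PCRE engine, and the test strings and the `checks` data frame are made up for illustration.

```r
# Each pattern is paired with one string expected to match ("hit")
# and one string expected not to match ("miss").
checks <- data.frame(
  pattern = c("(.)\\1\\1", "(.)(.)\\2\\1", "(..)\\1",
              "(.).\\1.\\1", "(.)(.)(.).*\\3\\2\\1"),
  hit  = c("aaa", "abba", "abab", "azaxa", "abcxyzcba"),
  miss = c("aab", "abab", "abba", "azbxa", "abcabc"),
  stringsAsFactors = FALSE
)

# mapply() applies grepl() pairwise: pattern i is tested against string i.
checks$hit_matches <- unname(
  mapply(grepl, checks$pattern, checks$hit, MoreArgs = list(perl = TRUE))
)
checks$miss_matches <- unname(
  mapply(grepl, checks$pattern, checks$miss, MoreArgs = list(perl = TRUE))
)

print(checks)
```

If the descriptions are right, `hit_matches` is `TRUE` in every row and `miss_matches` is `FALSE` in every row.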

4. Construct regular expressions to match words that:

Start and end with the same character.
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)

Answers:

^(.).*\1$ matches a word that starts with any character, continues with zero or more of any character, and ends with a back-reference to the first capturing group, i.e. the same character it started with.

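([A-Za-z][A-Za-z]).*\1 matches any two letters (upper or lower case), followed by zero or more of any character, followed by a back-reference to the first capturing group, i.e. the same two letters again. The back-reference is case-sensitive, so “Ch” followed later by “ch” does not count as a repeat.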

([a-z]).*\1.*\1 matches a lowercase letter, then zero or more of any character, then the same letter again, then zero or more of any character, then the same letter a third time.
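A quick way to try the three expressions is to run them over a few words. The sketch assumes the `.*` forms that the descriptions spell out ("zero or more occurrences"), writes the patterns as R strings with doubled backslashes, and uses a made-up word list. `perl = TRUE` and all variable names are choices made for this example only.

```r
# Made-up word list; "a" is included to show the single-letter edge case.
words <- c("level", "church", "eleven", "banana", "apple", "a")

# Starts and ends with the same character.
same_ends <- grepl("^(.).*\\1$", words, perl = TRUE)

# Contains a repeated pair of letters.
repeated_pair <- grepl("([A-Za-z][A-Za-z]).*\\1", words, perl = TRUE)

# Contains one letter in at least three places.
triple_letter <- grepl("([a-z]).*\\1.*\\1", words, perl = TRUE)

# Variation (not part of the answers above): the optional group also
# accepts single-character words such as "a".
same_ends_or_single <- grepl("^(.)(.*\\1)?$", words, perl = TRUE)

print(data.frame(words, same_ends, repeated_pair, triple_letter,
                 same_ends_or_single))
```

The first pattern needs at least two characters, so the one-letter word “a” is not matched by `same_ends`; the variation with the optional group is one possible way to cover that case.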