library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)

#1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”

rawlink <- 'https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv'
dataset <- read.csv(rawlink)
filter_majors <- dataset %>%
  filter(str_detect(Major, regex("DATA|STATISTICS", ignore_case = TRUE)))
print(filter_majors$Major)
## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"     
## [3] "STATISTICS AND DECISION SCIENCE"

There are three majors that contain either “DATA” or “STATISTICS”: MANAGEMENT INFORMATION SYSTEMS AND STATISTICS, COMPUTER PROGRAMMING AND DATA PROCESSING, and STATISTICS AND DECISION SCIENCE.

#2 Write code that transforms the data below:

[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”

Into a format like this: c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)

# The target vector, typed out by hand in the requested c(...) format
fruits <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry",
            "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime",
            "lychee", "mulberry", "olive", "salal berry")
print(fruits)
##  [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange" "blueberry"   
##  [6] "cantaloupe"   "chili pepper" "cloudberry"   "elderberry"   "lime"        
## [11] "lychee"       "mulberry"     "olive"        "salal berry"
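
The exercise asks for a transformation, so it may help to see one way of getting from the printed console output to the c(...) form without retyping it. The following is a minimal sketch in base R, assuming the printed output is available as a single string; the names raw_output, quoted, fruits_parsed, and fruits_code are hypothetical and chosen only for illustration.

```r
# The printed console output, stored as one raw string (an assumption about
# how the input arrives; `raw_output` is a hypothetical name).
raw_output <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Pull out every double-quoted run, then strip the surrounding quotes.
quoted <- regmatches(raw_output, gregexpr('"[^"]*"', raw_output))[[1]]
fruits_parsed <- gsub('"', "", quoted, fixed = TRUE)

# Rebuild the c("...", "...") form as a single string and print it.
fruits_code <- paste0("c(", paste0('"', fruits_parsed, '"', collapse = ", "), ")")
cat(fruits_code, "\n")
```

Because the index markers such as [1] and [5] sit outside the quotes, matching only the quoted runs drops them without a separate cleanup step.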

The two exercises below are taken from R for Data Science, section 14.3.5.1 in the online version:

#3 Describe, in words, what these expressions will match:

(.)\1\1 This will match the same character three times in a row, for example: 111.

"(.)(.)\\2\\1" (an R string, so each backslash is doubled) This will match two characters followed by the same two characters in reverse order, for example: abba. The two characters do not have to be different, so aaaa also matches.

(..)\1 This will match any two characters immediately followed by the same two characters, for example: abab. The two characters in the pair do not have to be equal to each other, although a string such as 1111 also matches.

"(.).\\1.\\1" (an R string, so each backslash is doubled) This will match a run of 5 characters in which the first, third, and fifth characters are the same, while the second and fourth can be any characters. For example: ababa or abaca.

"(.)(.)(.).*\\3\\2\\1" (an R string, so each backslash is doubled) This will match three characters, followed by zero or more of any characters, followed by the same three characters in reverse order. The three characters do not have to be different, and the shortest possible match is 6 characters long. For example: abccba or 1234321.
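
Descriptions like these are easy to check empirically. The sketch below uses base R's grepl with perl = TRUE rather than stringr so that it runs without extra packages; the sample strings and the names patterns, samples, and hits are hypothetical and chosen only for illustration.

```r
# Regex patterns from the exercise; backslashes are doubled inside R strings.
patterns <- c(
  three_same   = "(.)\\1\\1",
  mirrored     = "(.)(.)\\2\\1",
  pair_twice   = "(..)\\1",
  alternating  = "(.).\\1.\\1",
  reversed_end = "(.)(.)(.).*\\3\\2\\1"
)

# Hypothetical test strings chosen for illustration.
samples <- c("aaa", "abba", "abab", "abaca", "abccba", "1234321", "abc")

# Rows are test strings, columns are patterns.
# TRUE means the pattern matches somewhere in the string.
hits <- sapply(patterns, function(p) grepl(p, samples, perl = TRUE))
rownames(hits) <- samples
print(hits)
```

Note that grepl reports a match anywhere in the string, so a longer string containing abba would also count as a hit for the second pattern.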

#4 Construct regular expressions to match words that:

Start and end with the same character. ^(.).*\1$ I used the word radar as an example for this one. The group (.) captures the first character, r, then .* allows any characters in between, and \1 requires the captured character again at the end of the word. To also accept one-character words, the pattern can be written as ^(.)(.*\1)?$.

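Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.) (..).*\1 The group (..) captures any two characters, .* allows zero or more characters in between, and the backreference \1 requires the same two characters to appear again. To restrict the pair to letters only, the group can be written as ([A-Za-z][A-Za-z]).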

Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.) (.).*\1.*\1 For the word “eleven”, the group (.) captures the first “e”, and each \1 matches a later “e”, with .* covering whatever letters fall in between.
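
Candidate patterns for this exercise can be tested against a few words. This is a minimal sketch using base R's grepl with perl = TRUE instead of stringr; the patterns shown are one possible set of answers rather than the only valid ones, and the names same_ends, repeated_pair, three_times, and words are hypothetical.

```r
# Candidate patterns (assumptions for illustration, not the only valid answers).
same_ends     <- "^(.)(.*\\1)?$"  # first character repeated at the end; also allows one-character words
repeated_pair <- "(..).*\\1"      # any two characters that appear again later
three_times   <- "(.).*\\1.*\\1"  # one character found in at least three places

# Hypothetical test words.
words <- c("radar", "church", "eleven", "apple", "a")

# Each call returns one logical value per word.
ends_hit  <- grepl(same_ends, words, perl = TRUE)
pair_hit  <- grepl(repeated_pair, words, perl = TRUE)
three_hit <- grepl(three_times, words, perl = TRUE)

print(data.frame(words, ends_hit, pair_hit, three_hit))
```

The backreference \1 is what ties the later occurrence to the captured text; a second plain group such as (..) would match any two characters, not the same two.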