Question 1:

Problem: Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”

Solution to Question 1

To solve Question 1, I opened the article through the link provided on the fivethirtyeight website and imported the raw college majors data from GitHub.
I loaded the tidyverse, dplyr, and readr packages, then searched the majors data frame with filter(). The filter uses regular expressions (through str_detect()) to keep only the majors that contain either ‘DATA’ or ‘STATISTICS’ and stores them in a new data frame called mathematical_majors.

#import data and packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(readr)
majors <- read.csv(url("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"))

#keep majors that contain "DATA" or "STATISTICS"
(mathematical_majors <- majors %>%
  filter(str_detect(Major, "DATA") | str_detect(Major, "STATISTICS")))
##   FOD1P                                         Major          Major_Category
## 1  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 2  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics
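
The two str_detect() calls can also be written as a single pattern with regex alternation. The following is a minimal sketch of that idea, assuming base R's grepl() in place of stringr so that it runs without extra packages. The sample_majors data frame is a small stand-in built for illustration (three rows from the output above plus two placeholder rows), not the full dataset.

```r
# Stand-in data for illustration only: three majors from the output above
# plus two placeholder rows that should not match.
sample_majors <- data.frame(
  Major = c("MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
            "COMPUTER PROGRAMMING AND DATA PROCESSING",
            "STATISTICS AND DECISION SCIENCE",
            "PLACEHOLDER MAJOR ONE",
            "PLACEHOLDER MAJOR TWO"),
  stringsAsFactors = FALSE
)

# "DATA|STATISTICS" matches a string containing either word.
is_match <- grepl("DATA|STATISTICS", sample_majors$Major)

# Keep only the matching rows (drop = FALSE keeps the data frame shape).
matches <- sample_majors[is_match, , drop = FALSE]
print(matches)
```

With stringr loaded, the same pattern should work as filter(str_detect(Major, "DATA|STATISTICS")).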

Question 2

Write code that transforms the data below:

[1] “bell pepper” “bilberry” “blackberry” “blood orange”

[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”

[9] “elderberry” “lime” “lychee” “mulberry”

[13] “olive” “salal berry”

Into a format like this:

c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)

Solution to Question 2

fruits <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")

fruit_vec <- paste0("\"", fruits, "\"")

fru_vec2 <- paste0(fruit_vec, collapse = ", ") 

fru_vec3 <- paste0("c(", fru_vec2, ")")

cat(fru_vec3)
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")

Steps

I had a bit of a hard time getting the output to look exactly as shown. My first attempt produced the right format but without the quotation marks around each element, so I eventually arrived at this solution using a different approach.
I first created the character vector “fruits”. Then I added an escaped quotation mark (a backslash followed by a quotation mark) around each element and stored the result in a new vector called “fruit_vec”.
Next, I collapsed the quoted words into a single string, separating the words with a comma and a space. This string is “fru_vec2”.
Then, I added the leading “c(” and the trailing “)”, which gives “fru_vec3”.
Finally, I printed the string with the cat function, which displays the quotation marks without the escaping backslashes. The output is now in the desired format.
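
The steps above can also be condensed, and the result checked by parsing it back into R. This is a minimal sketch, assuming a shortened three-element vector for illustration; the names quoted, out, and round_trip are made up for this example.

```r
# Shortened vector for illustration; the full fruits vector works the same way.
fruits <- c("bell pepper", "bilberry", "lime")

# Wrap each element in double quotes (single-quoted R strings avoid the backslash).
quoted <- paste0('"', fruits, '"')

# Join with ", " and add the surrounding c( ... ).
out <- paste0("c(", paste(quoted, collapse = ", "), ")")

# cat() prints the raw characters, so the quotes appear without backslashes.
cat(out, "\n")
# c("bell pepper", "bilberry", "lime")

# Check: evaluating the printed text should reproduce the original vector.
round_trip <- eval(parse(text = out))
```

If round_trip is identical to fruits, the printed text is valid R code for the same vector.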

Question 3

Describe, in words, what these expressions will match:

Solution to Question 3

Note: in bullet points 2, 4, and 5, the script contains an extra backslash so that the original expression is displayed correctly, because the first backslash acts as an escape character. The expressions as originally given are the ones in the question above.

Question 4

Construct regular expressions to match words that:

Solution to Question 4

Note: the script contains an extra backslash so that the expression I intend as the solution is displayed correctly. The solutions for a to c use 2 backslashes, but I wrote 3 backslashes in the script so that 2 backslashes appear in the HTML output.

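a). Start and end with the same character -> ^(.).*\\1$ (this matches words of two or more characters)

b). Contain a repeated pair of letters -> (..).*\\1 (the group must capture two characters so that the pair itself is repeated)

c). Contain one letter repeated in at least three places -> (.).*\\1.*\\1 (the .* allows any number of characters between the repeats)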
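
The three cases can be checked on a few sample words. This is a minimal sketch, assuming base R's grepl() with perl = TRUE in place of stringr; the word list and variable names are made up for illustration, and each pattern is one possible reading of its case (the . matches any character, not only letters).

```r
# Made-up sample words for illustration.
words <- c("level", "church", "banana", "eleven", "cat")

# One candidate pattern per case, written as R strings (hence the doubled backslash).
same_ends   <- "^(.).*\\1$"      # a) first character reappears at the end
pair_twice  <- "(..).*\\1"       # b) a two-character pair occurs again later
three_times <- "(.).*\\1.*\\1"   # c) one character occurs at least three times

hits_a <- words[grepl(same_ends,   words, perl = TRUE)]
hits_b <- words[grepl(pair_twice,  words, perl = TRUE)]
hits_c <- words[grepl(three_times, words, perl = TRUE)]

print(hits_a)  # "level"
print(hits_b)  # "church" "banana"
print(hits_c)  # "banana" "eleven"
```

With stringr loaded, str_subset(words, pattern) should return the same subsets.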