Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”.
# The URL of the dataset
url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"
# Read the dataset into a tibble (read_csv() comes from the readr package)
library(readr)
college_majors <- read_csv(url)
# Identify the majors that contain either "DATA" or "STATISTICS"
filtered_majors <- college_majors[grep("(DATA|STATISTICS)", college_majors$Major, ignore.case = TRUE), ]
# View the filtered majors
print(filtered_majors)
## # A tibble: 3 × 3
##   FOD1P Major                                         Major_Category
##   <chr> <chr>                                         <chr>
## 1 6212  MANAGEMENT INFORMATION SYSTEMS AND STATISTICS Business
## 2 2101  COMPUTER PROGRAMMING AND DATA PROCESSING      Computers & Mathematics
## 3 3702  STATISTICS AND DECISION SCIENCE               Computers & Mathematics
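The same filter can be tried without downloading the file. The following is a minimal sketch in base R, assuming a small hand-made vector (`sample_majors` is hypothetical: its first three names are copied from the output above and the last two are added only as non-matching entries):

```r
# Hypothetical sample; the first three names come from the output above
sample_majors <- c(
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "STATISTICS AND DECISION SCIENCE",
  "GENERAL AGRICULTURE",
  "ZOOLOGY"
)

# grepl() returns one TRUE/FALSE per element; "|" matches either word
is_match <- grepl("DATA|STATISTICS", sample_majors)

# Logical indexing keeps only the matching names
matched <- sample_majors[is_match]
print(matched)
```

Using `grepl()` with logical indexing instead of `grep()` with row positions is a matter of taste here; both select the same elements.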
Write code that transforms the data below:
 [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
 [5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
 [9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"
Into a format like this:
c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
fruits <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
# Converting the entire data into a single string
data_string <- paste(fruits, collapse = " ")
# Splitting the string into individual words
words <- unlist(strsplit(data_string, " "))
# The 'words' variable contains individual words extracted from the data string
print(words)
## [1] "bell" "pepper" "bilberry" "blackberry" "blood"
## [6] "orange" "blueberry" "cantaloupe" "chili" "pepper"
## [11] "cloudberry" "elderberry" "lime" "lychee" "mulberry"
## [16] "olive" "salal" "berry"
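Splitting on spaces breaks multi-word names such as "bell pepper" into separate words, whereas the exercise asks for the `c(...)` form. One possible way to get there is sketched below, assuming the printed output is available as a single string with straight double quotes (the names `raw_output`, `quoted`, `fruit_names`, and `result` are illustrative only):

```r
# Printed console output as one string (assumed input for this sketch)
raw_output <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Extract every run of text enclosed in double quotes
quoted <- regmatches(raw_output, gregexpr('"[^"]+"', raw_output))[[1]]

# Drop the surrounding quotes to get a plain character vector
fruit_names <- gsub('"', "", quoted)

# Rebuild the c("...", "...") form as a single string
result <- paste0("c(", paste0('"', fruit_names, '"', collapse = ", "), ")")
cat(result, "\n")
```

Matching on the quotes rather than on whitespace keeps each multi-word name together and ignores the `[1]`, `[5]` index markers.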
Describe, in words, what these expressions will match:
(.)\1\1 Matches the same character three
times in a row (e.g., “aaa”).
## [1] "aaabbb" "ffff"
"(.)(.)\\2\\1" Written as an R string (the quotes and doubled
backslashes are R syntax, not part of the pattern), this matches two
characters followed by the same two characters in reverse order (e.g.,
“abba”).
## [1] "abba"
(..)\1 Matches any two characters that
appear twice in a row (e.g., “abab”).
## character(0)
"(.).\\1.\\1" Written as an R string, this matches a
character, then any character, then the first character again, then any
character, then the first character a third time (e.g., “aXaYa”).
## [1] "axaya"
"(.)(.)(.).*\\3\\2\\1" Written as an R string, this
matches three characters, then any sequence of characters (possibly
empty), then the same three characters in reverse order (e.g.,
“abc…cba”).
## [1] "abc...cba" "abz...zba"
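These descriptions can be checked directly. The sketch below assumes a hypothetical set of test strings (`tests`) and writes all five patterns as R strings with doubled backslashes; the list names are illustrative labels, not part of the exercise:

```r
# Patterns from the exercise, written as R strings (backslashes doubled)
patterns <- c(
  triple       = "(.)\\1\\1",
  mirrored     = "(.)(.)\\2\\1",
  pair_twice   = "(..)\\1",
  every_other  = "(.).\\1.\\1",
  reversed_end = "(.)(.)(.).*\\3\\2\\1"
)

# Hypothetical test strings: one intended match per pattern, plus a non-match
tests <- c("aaa", "abba", "abab", "aXaYa", "abcXcba", "hello")

# For each pattern, keep the test strings that it matches
hits <- lapply(patterns, function(p) tests[grepl(p, tests, perl = TRUE)])
print(hits)
```

With these inputs each pattern picks out exactly one string, and "hello" matches none of them.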
Construct regular expressions to match words that:
- Start and end with the same character:
^(\w)(\w*\1)?$
- Contain a repeated pair of letters:
\b\w*(\w\w).*\1
## [1] "church"
- Contain one letter repeated in at least three places:
\b\w*(\w).*\1.*\1
## [1] "eleven" "banana"
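The three kinds of word can be tested side by side. The following is a sketch only, assuming a hypothetical word list `test_words` and, for the first case, a pattern anchored with `^` and `$` so that the captured first character must also be the last one (single-letter words included):

```r
# Hypothetical word list for trying out the patterns
test_words <- c("banana", "church", "eleven", "area", "a", "stats", "tree")

# Start and end with the same character
same_ends <- test_words[grepl("^(\\w)(\\w*\\1)?$", test_words, perl = TRUE)]

# Contain a repeated pair of letters
repeated_pair <- test_words[grepl("(\\w\\w).*\\1", test_words, perl = TRUE)]

# Contain one letter in at least three places
three_times <- test_words[grepl("(\\w).*\\1.*\\1", test_words, perl = TRUE)]

print(same_ends)
print(repeated_pair)
print(three_times)
```

Without the anchors, a backreference only requires the character to recur somewhere in the word, which is why the first case needs `^` and `$` while the other two do not.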