#1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS” Finding major with the word “Data”
library(stringr)
Word_Data_or_Stat <-read.csv("https://raw.githubusercontent.com/Wilchau/Data607assignment3/main/Major%20list.csv%20-%20Sheet1.csv",header = TRUE, sep = ",")
Words_Data <- str_detect(Word_Data_or_Stat$Major, fixed("DATA"))
Word_Data_or_Stat[Words_Data,]
## FOD1P Major Major_Category
## 52 2101 COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
Words_STAT <-str_detect(Word_Data_or_Stat$Major, fixed("STATISTICS"))
Word_Data_or_Stat[Words_STAT,]
## FOD1P Major Major_Category
## 44 6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS Business
## 59 3702 STATISTICS AND DECISION SCIENCE Computers & Mathematics
Utilizing str_detect I was able to find a total of 3 majors that can
had Data OR Statistics. #2 Write code that transforms the data below:
[1] “bell pepper” “bilberry” “blackberry” “blood orange” [5] “blueberry”
“cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry” Into a format like this: c(“bell pepper”,
“bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”,
“chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”,
“mulberry”, “olive”, “salal berry”)
unorganized_fruits <- '“bell pepper” “bilberry” “blackberry” “blood orange”
“blueberry” “cantaloupe” “chili pepper” “cloudberry”
“elderberry” “lime” “lychee” “mulberry”
“olive” “salal berry”'
organized_fruits<- paste('c(', paste('"',unorganized_fruits,'"',sep = "", collapse = ','), sep = "",')')
organized_fruits
## [1] "c(\"“bell pepper” “bilberry” “blackberry” “blood orange”\n“blueberry” “cantaloupe” “chili pepper” “cloudberry”\n“elderberry” “lime” “lychee” “mulberry”\n“olive” “salal berry”\")"
#3 Describe, in words, what these expressions will match: #This will match any one character, followed by two repetitions, like “aaa” or “555”. The correct expression would
(.)\1\1
This can be a regular expression is used to match a pattern in a string with the same character appearing three times in a row. Example: 777,www
“(.)(.)\2\1”
Matches the 2 strings or character group follow by ‘\2\1’. Example: ab\2\1
(..)\1
This can match with any two characters before \1. Example: aa\1
“(.).\1.\1”
A character followed by any character, the original character, any other character, and the original character again. Example: a\1b\1
“(.)(.)(.).*\3\2\1”
This can match with 3 characters followed by zero or more characters of any kinds followed by the same three characters in reverse. Example: abc\3\2\1
#4 1)Construct regular expressions to match words that: 1)Start and end with the same character. str_match(exercise_four,’^(.).*\1$’)
2)Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.) ``` str_bring(exercise_four_two, “(..).*\1”)
3)Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.) ``` str_bring(exercise_four_three, “([a-z]).\1.\1”)