Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”
#There should be three majors that belong to this subset.
# Read the majors list straight from the FiveThirtyEight GitHub repository.
majors <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv")
str(majors)
## 'data.frame': 174 obs. of 3 variables:
## $ FOD1P : chr "1100" "1101" "1102" "1103" ...
## $ Major : chr "GENERAL AGRICULTURE" "AGRICULTURE PRODUCTION AND MANAGEMENT" "AGRICULTURAL ECONOMICS" "ANIMAL SCIENCES" ...
## $ Major_Category: chr "Agriculture & Natural Resources" "Agriculture & Natural Resources" "Agriculture & Natural Resources" "Agriculture & Natural Resources" ...
desired_majors <- grep("DATA|STATISTICS", majors$Major, value=TRUE)
desired_majors
## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"
## [3] "STATISTICS AND DECISION SCIENCE"
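The same pattern also works with grepl(), which returns TRUE or FALSE for every element instead of the matching values. The sketch below is only an illustration: `sample_majors` is a hypothetical six-element vector copied from the output above so that it runs without downloading the file.

```r
# Hypothetical sample: six major names copied from the output shown above.
sample_majors <- c(
  "GENERAL AGRICULTURE",
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "ANIMAL SCIENCES",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "AGRICULTURAL ECONOMICS",
  "STATISTICS AND DECISION SCIENCE"
)

# grepl() gives one logical value per element.
is_match <- grepl("DATA|STATISTICS", sample_majors)

# The logical vector can then be used to index the original vector.
matched <- sample_majors[is_match]
print(matched)

# Counting the TRUE values is a quick sanity check on the subset size.
n_matched <- sum(is_match)
print(n_matched)
```

On the full data frame, the same logical vector could be used as a row index, for example majors[grepl("DATA|STATISTICS", majors$Major), ], to keep the FOD1P and Major_Category columns as well.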
Write code that transforms the data below:
[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”
Into a format like this:
c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)
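library(stringr) # provides str_remove_all()
ugly_string <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'
# The negated class excludes alphanumerics and non-word characters, so it only removes
# word characters that are not alphanumeric (such as "_"); this input comes out unchanged.
better_string <- str_remove_all(ugly_string, "[^[:alnum:]\\W]")
writeLines(better_string)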
## [1] "bell pepper" "bilberry" "blackberry" "blood orange"
##
## [5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
##
## [9] "elderberry" "lime" "lychee" "mulberry"
##
## [13] "olive" "salal berry"
#Still to do: drop the bracketed indices such as [1], add a comma after each closing quotation mark, and wrap the result in c().
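One possible way to finish the transformation is to extract the quoted items rather than delete everything around them. This is a minimal sketch in base R, not necessarily the intended solution; the names `printed`, `quoted` and `as_vector_code` are made up for the illustration.

```r
# Assumption: the printed vector from the exercise is stored as a single string.
printed <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'

# Extract every double-quoted item. The indices [1], [5], ... are skipped
# automatically because they are not inside quotation marks.
quoted <- regmatches(printed, gregexpr('"[^"]+"', printed))[[1]]

# Join the quoted items with commas and wrap them in c( ... ).
as_vector_code <- paste0("c(", paste(quoted, collapse = ", "), ")")
writeLines(as_vector_code)
```

Because the result is valid R code, eval(parse(text = as_vector_code)) would turn it back into a character vector of 14 fruits.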
Describe, in words, what these expressions will match:
(.)\1\1
library(htmlwidgets)
## Warning: package 'htmlwidgets' was built under R version 4.1.1
test <- c("a", "abab","aabb", "abba", "abracadabra", "a\1\1", "aaaaaaa", "aabbaa")
str_view_all(test, "(.)\1\1")
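As a regular expression, (.)\1\1 matches any character repeated three times in a row, such as “aaa”. In the call above, however, the pattern is typed as the R string "(.)\1\1", where "\1" is an escape for the control character with code 1 rather than a backreference. That call therefore searches for any character followed by two of those control characters, which only the "a\1\1" element of test contains. To use the backreference from R, the pattern has to be written as "(.)\\1\\1".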
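The difference between "\1" and "\\1" inside an R string can be checked directly. The sketch below uses base R's grepl() with perl = TRUE instead of str_view_all() so that it prints plain logical vectors; `words` is a hypothetical three-element test vector.

```r
# Hypothetical test vector: a triple letter, a repeated pair, and the literal case.
words <- c("aaa", "abab", "a\1\1")

# "\\1" reaches the regex engine as \1, a backreference to group 1,
# so this pattern means "the same character three times in a row".
backref <- grepl("(.)\\1\\1", words, perl = TRUE)
print(backref)

# "\1" is a single control character (code 1) inside the R string,
# so this pattern means "any character followed by two of those control characters".
literal <- grepl("(.)\1\1", words, perl = TRUE)
print(literal)

# The escape really is one character long.
print(nchar("\1"))
```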
“(.)(.)\2\1”
str_view_all(test, "(.)(.)\\2\\1")
This expression searches for two characters followed by the same two characters in reverse order, i.e. the form char(a)char(b)char(b)char(a), as in “abba”.
(..)\1
str_view_all(test, "(..)\1")
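As a regular expression, (..)\1 matches any two characters that are immediately repeated, as in “abab”. Typed as the R string "(..)\1", it instead looks for any two characters followed by the control character with code 1; the pattern has to be written as "(..)\\1" to get the backreference.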
“(.).\1.\1”
str_view_all(test, "(.).\\1.\\1")
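This expression searches for any string of the form a[]a[]a, where a can be any character and each bracket stands for exactly one character of any kind (it may differ from a, but it does not have to). For example, it matches “acada” inside “abracadabra”.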
"(.)(.)(.).*\3\2\1"
str_view_all(test, "(.)(.)(.).*\\3\\2\\1")
This expression, I believe, takes any three characters abc and searches for strings that have the form abc[]cba, where the bracket stands for zero or more characters of any kind. It would match “abccba” as well as “abcxyzcba”.
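The descriptions above can be sanity-checked on a few hand-picked strings. This is only a sketch: the example strings are hypothetical and chosen to give one match and one non-match per pattern, and base R's grepl() is used in place of str_view_all() so the results print as logical vectors.

```r
# Two characters followed by the same two in reverse order.
mirror_pair <- grepl("(.)(.)\\2\\1", c("abba", "abab"), perl = TRUE)
print(mirror_pair)

# Two characters that are immediately repeated.
repeated_pair <- grepl("(..)\\1", c("abab", "abba"), perl = TRUE)
print(repeated_pair)

# The same character in positions 1, 3 and 5 of a five-character stretch;
# the characters in between may or may not be the same one.
every_other <- grepl("(.).\\1.\\1", c("abaca", "aaaaa", "abcde"), perl = TRUE)
print(every_other)

# Three characters, then zero or more characters, then the three in reverse.
mirror_triple <- grepl("(.)(.)(.).*\\3\\2\\1",
                       c("abccba", "abcxyzcba", "abcabc"), perl = TRUE)
print(mirror_triple)
```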
#Exercise 4
Construct regular expressions to match words that:
Start and end with the same character.
str_view_all(test, "^(.).*\\1$")
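One edge case worth noting: the pattern above needs at least two characters, so a one-letter word such as “a” is not matched even though it arguably starts and ends with the same character. A possible variant, offered as a sketch rather than the required answer, makes the tail optional. The vector `edge_words` is hypothetical.

```r
# Hypothetical test vector including a one-letter word.
edge_words <- c("a", "abba", "ab", "church")

# Pattern from above: first character, anything, then the first character again.
two_or_more <- grepl("^(.).*\\1$", edge_words, perl = TRUE)
print(two_or_more)

# Variant: the ".*\\1" tail is optional, so a single character also matches.
one_or_more <- grepl("^(.)(.*\\1)?$", edge_words, perl = TRUE)
print(one_or_more)
```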
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)
test_new <- c("church", "eleven")
str_view_all(test_new, "(..).*\\1")
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)
str_view_all(test_new, "(.).*\\1.*\\1")
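As a quick check of the last two patterns, the sketch below runs them through base R's grepl() on a hypothetical vector `check_words` that adds “banana” to the two words used above.

```r
# Hypothetical test vector: the two words from above plus "banana".
check_words <- c("church", "eleven", "banana")

# A pair of characters that appears again later in the word.
has_repeated_pair <- grepl("(..).*\\1", check_words, perl = TRUE)
print(has_repeated_pair)

# One character that appears in at least three places.
has_triple_letter <- grepl("(.).*\\1.*\\1", check_words, perl = TRUE)
print(has_triple_letter)
```

Note that . matches any character, not only letters. If the input could contain digits or punctuation, replacing each . inside the groups with [A-Za-z] would restrict the match to letters.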