Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors whose names contain either “DATA” or “STATISTICS”.
##load stringr
library(stringr)
##import data (read.csv is base R, so no extra package is needed)
majors_list <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/2d2ff3e9457549d51f8e571c52099bfe9b2017ad/college-majors/majors-list.csv")
##look at import
str(majors_list)
## 'data.frame': 174 obs. of 3 variables:
## $ FOD1P : chr "1100" "1101" "1102" "1103" ...
## $ Major : chr "GENERAL AGRICULTURE" "AGRICULTURE PRODUCTION AND MANAGEMENT" "AGRICULTURAL ECONOMICS" "ANIMAL SCIENCES" ...
## $ Major_Category: chr "Agriculture & Natural Resources" "Agriculture & Natural Resources" "Agriculture & Natural Resources" "Agriculture & Natural Resources" ...
summary(majors_list)
## FOD1P Major Major_Category
## Length:174 Length:174 Length:174
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##select majors whose titles contain DATA or STATISTICS
grep(pattern = "DATA|STATISTICS", majors_list$Major, value = TRUE)
## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"
## [3] "STATISTICS AND DECISION SCIENCE"
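The same filter can also be written with stringr, which is loaded above. The sketch below is only an illustration: `sample_majors` is a small hand-copied sample of the Major column (an assumption, not the full dataset) so that it runs without a download, and `regex(ignore_case = TRUE)` is an optional guard in case some titles are not upper case.

```r
library(stringr)

# Hypothetical sample: a few titles copied from the Major column shown above
sample_majors <- c(
  "GENERAL AGRICULTURE",
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "STATISTICS AND DECISION SCIENCE",
  "ANIMAL SCIENCES"
)

# Keep only the titles that contain DATA or STATISTICS, ignoring case
data_stat_majors <- str_subset(
  sample_majors,
  regex("DATA|STATISTICS", ignore_case = TRUE)
)
print(data_stat_majors)
```

On the full data, `sample_majors` would be replaced by `majors_list$Major`.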
##2 Write code that transforms the data below:
[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"
Into a format like this:
c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
newfruit <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
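Typing the vector by hand gives the right result, but the question asks for code that performs the transformation. One possible approach, sketched below, assumes the printed output has been pasted in as a single string (the name `raw` and that input form are assumptions): extract every quoted item with stringr, then build the `c(...)` text with `str_c()`.

```r
library(stringr)

# Assumed input: the printed vector pasted in as one string
raw <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Extract each double-quoted item, then strip the surrounding quotes
quoted <- str_extract_all(raw, '"[^"]+"')[[1]]
fruits <- str_remove_all(quoted, '"')

# Rebuild the items as the text of a c(...) call
formatted <- str_c("c(", str_c('"', fruits, '"', collapse = ", "), ")")
writeLines(formatted)
```

Here `fruits` is the character vector itself, and `formatted` is a single string holding the `c(...)` expression.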
(.)\1\1 The same character three times in a row (e.g., "aaa")
"(.)(.)\2\1" Two characters followed by the same two characters in reverse order (e.g., "abba")
(..)\1 Any two characters repeated immediately (e.g., "abab")
"(.).\1.\1" A character, any character, the first character again, any character, and the first character once more (e.g., "abaca")
"(.)(.)(.).*\3\2\1" Three characters, then zero or more of any characters, then the same three characters in reverse order (e.g., "abcxyzcba")
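These descriptions can be checked against a few strings. In the sketch below, the test strings and the pattern names are chosen for illustration only (they are not part of the exercise), and each pattern is treated as a regular expression, so every backslash is doubled inside the R string.

```r
library(stringr)

# Illustrative test strings (hypothetical, one intended match per pattern)
tests <- c("aaa", "abba", "abab", "abaca", "abcxyzcba")

# Each regex backslash is doubled inside an R string
patterns <- c(
  triple      = "(.)\\1\\1",
  mirror_pair = "(.)(.)\\2\\1",
  double_pair = "(..)\\1",
  alternating = "(.).\\1.\\1",
  mirror_trio = "(.)(.)(.).*\\3\\2\\1"
)

# For each pattern, list the test strings it matches
matches <- lapply(patterns, function(p) str_subset(tests, p))
print(matches)
```

With these test strings, each pattern matches exactly one string: the one constructed for it.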
##construct regular expressions to match words that
##start and end with the same character
list <-c("anna", "church", "bob", "harry", "paul", "eleven", "bubble")
regex_expr <- "^(.)((.*\\1$)|\\1?$)"
str_subset(list, regex_expr)
## [1] "anna" "bob"
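The alternation in this pattern exists so that very short words are handled: the second branch lets a one-letter word count as starting and ending with the same character. The sketch below tries a few edge cases (the vector `edge_cases` is hypothetical) and compares the pattern with a shorter form, `^(.)(.*\\1)?$`, which behaves the same way on these inputs.

```r
library(stringr)

# Hypothetical edge cases: one letter, two equal letters,
# two different letters, and a three-letter word
edge_cases <- c("a", "aa", "ab", "aba")

original <- "^(.)((.*\\1$)|\\1?$)"  # pattern used above
simpler  <- "^(.)(.*\\1)?$"         # shorter form, same results on these inputs

res_original <- str_subset(edge_cases, original)
res_simpler  <- str_subset(edge_cases, simpler)

print(res_original)
print(res_simpler)
```

Both patterns keep "a", "aa", and "aba" and reject "ab".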
##contain a repeated pair of letters (e.g., "church" contains "ch" twice)
regex_expr2 <- "([A-Za-z][A-Za-z]).*\\1"
str_subset(list, regex_expr2)
## [1] "church"
##contain one letter repeated in at least three places (e.g., "eleven" contains three "e"s)
regex_expr3 <- "([A-Za-z]).*\\1.*\\1"
str_subset(list, regex_expr3)
## [1] "eleven" "bubble"