Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”
# Load the CSV from GitHub (read_csv() comes from the readr package)
library(readr)
all_ages <- read_csv("https://raw.githubusercontent.com/cliftonleesps/607_acq_mgt/main/week3/college-majors/all-ages.csv", show_col_types = FALSE)
# Now grep for DATA or STATISTICS
data_statistics_majors <- grep(pattern = "(STATISTICS|DATA)", all_ages$Major)
# Output the matched majors
for (i in data_statistics_majors) {
  cat(sprintf("%s\n", all_ages$Major[i]))
}
## COMPUTER PROGRAMMING AND DATA PROCESSING
## STATISTICS AND DECISION SCIENCE
## MANAGEMENT INFORMATION SYSTEMS AND STATISTICS
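The index loop can be avoided entirely. As a minimal sketch, assuming a small hand-typed sample of major names (the `majors` vector below is illustrative and stands in for `all_ages$Major`), base R's `grep()` with `value = TRUE` returns the matching strings directly:

```r
# Illustrative sample only; the real values come from all_ages$Major
majors <- c(
  "GENERAL AGRICULTURE",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "STATISTICS AND DECISION SCIENCE",
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "ACCOUNTING"
)

# value = TRUE returns the matched strings instead of their positions
data_statistics_sample <- grep("STATISTICS|DATA", majors, value = TRUE)
print(data_statistics_sample)
```

Printing the vector gives the same information as the loop, with one call instead of three lines.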
Write code that transforms the data below:
[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”
Into a format like this:
c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)
# Store the original comma-separated data as a single string
library(stringr)
stuff <- "bell pepper, bilberry, blackberry, blood orange, blueberry, cantaloupe, chili pepper, cloudberry, elderberry, lime, lychee, mulberry, olive, salal berry"
# Split on commas (with optional surrounding spaces) into a character vector
items <- str_split(stuff, " *, *")[[1]]
# Wrap each element in double quotes and join the elements with commas
output <- str_c('"', items, '"', collapse = ",")
# Wrap with c( and )
output <- str_c("c(", output, ")")
# Display the result
cat(output)
## c("bell pepper","bilberry","blackberry","blood orange","blueberry","cantaloupe","chili pepper","cloudberry","elderberry","lime","lychee","mulberry","olive","salal berry")
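The exercise can also be read as starting from the printed vector itself, index markers included. The following is a rough sketch under that assumption, using only base R and assuming straight double quotes in the input; the variable names `raw`, `quoted`, and `result` are illustrative. It extracts every quoted item and joins them with a comma and a space, which matches the target format more closely:

```r
# Printed vector output held as one multi-line string (straight quotes assumed)
raw <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'

# Extract every double-quoted item, keeping the quotes; the [n] markers are skipped
quoted <- regmatches(raw, gregexpr('"[^"]+"', raw))[[1]]

# Join with ", " and wrap in c( )
result <- paste0("c(", paste(quoted, collapse = ", "), ")")
cat(result, "\n")
```

Because the result is valid R source, `eval(parse(text = result))` should give back a character vector of the 14 items.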
The two exercises below are taken from R for Data Science, section 14.3.5.1 in the online version:
will match any single character repeated three times in a row (ex: CCC)
will match any 2 characters followed by the same 2 captured characters in reverse order (ex: eppe)
will match any 2 characters followed by the same 2 characters (ex: anan)
will match any character, followed by any character, followed by the 1st captured character, followed by any character, followed by the 1st captured character again (ex: anana, anama)
will match any 3 characters, followed by any or no characters, followed by the first captured 3 characters in reverse order (ex: aprrpa, apricotrpa)
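The descriptions above can be checked against their example strings. The sketch below is hedged: the five patterns are reconstructed from the descriptions and are an assumption, as are the names in `patterns`. It uses base R's `grepl()` with `perl = TRUE`, and note that each backslash is doubled inside an R string:

```r
# Patterns assumed from the five descriptions above
patterns <- c(
  same_char_x3   = "(.)\\1\\1",
  mirrored_pair  = "(.)(.)\\2\\1",
  repeated_pair  = "(..)\\1",
  alternating    = "(.).\\1.\\1",
  mirrored_three = "(.)(.)(.).*\\3\\2\\1"
)

# One example string per pattern, in the same order
examples <- c("CCC", "pepper", "banana", "banana", "apricotrpa")

# Test each pattern against its own example
matched <- mapply(function(p, x) grepl(p, x, perl = TRUE), patterns, examples)
print(matched)

# The "." positions accept any character, including the captured one
all_same <- grepl(patterns[["alternating"]], "aaaaa", perl = TRUE)
print(all_same)
```

Every entry of `matched` should be `TRUE`, and `all_same` shows that a string such as "aaaaa" also satisfies the fourth pattern.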
# Words that start and end with the same character
words[str_detect(words, "^(.)((.*\\1$)|\\1?$)")]
## [1] "a" "america" "area" "dad" "dead"
## [6] "depend" "educate" "else" "encourage" "engine"
## [11] "europe" "evidence" "example" "excuse" "exercise"
## [16] "expense" "experience" "eye" "health" "high"
## [21] "knock" "level" "local" "nation" "non"
## [26] "rather" "refer" "remember" "serious" "stairs"
## [31] "test" "tonight" "transport" "treat" "trust"
## [36] "window" "yesterday"
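One way to gain confidence in the start-and-end pattern is to compare it with a direct character comparison. This is a minimal sketch on a hypothetical `sample_words` vector (not the full `words` data), using base R only:

```r
# Hypothetical sample; the exercise itself uses stringr::words
sample_words <- c("a", "area", "level", "apple", "window", "ab")

# Regex approach: same pattern as in the exercise
pattern <- "^(.)((.*\\1$)|\\1?$)"
by_regex <- grepl(pattern, sample_words, perl = TRUE)

# Direct approach: compare the first and last characters
first_char <- substr(sample_words, 1, 1)
last_char <- substr(sample_words, nchar(sample_words), nchar(sample_words))
by_substr <- first_char == last_char

# Both approaches should flag the same words
print(sample_words[by_regex])
print(identical(by_regex, by_substr))
```

The second alternative in the pattern, `\\1?$`, is what lets a one-letter word such as "a" count as starting and ending with the same character.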
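# Words that contain a repeated pair of letters
str_view(words, "([a-zA-Z][a-zA-Z]).*\\1", match = TRUE)
# Words that contain one letter repeated in at least three places
str_view(words, "([a-zA-Z]).*\\1.*\\1", match = TRUE)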
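Since `str_view()` renders its matches visually, a logical vector can be easier to inspect in plain output. The sketch below applies the same two patterns with base R's `grepl()` to a hypothetical `sample_words` vector; the variable names are illustrative:

```r
# Hypothetical sample; the exercise itself uses stringr::words
sample_words <- c("church", "remember", "banana", "eleven", "apple", "cat")

# A pair of letters that appears again later in the word
has_repeated_pair <- grepl("([a-zA-Z][a-zA-Z]).*\\1", sample_words, perl = TRUE)

# A single letter that appears in at least three places
has_triple_letter <- grepl("([a-zA-Z]).*\\1.*\\1", sample_words, perl = TRUE)

print(data.frame(word = sample_words, has_repeated_pair, has_triple_letter))
```

"church" repeats the pair "ch" but no letter three times, while "eleven" has three "e"s without repeating any pair, so the two patterns are not interchangeable.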