#1
Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”.
library(knitr)
library(stringr)

# Download the majors list from the fivethirtyeight data repository
path <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"
download.file(url = path, destfile = "majors-list.csv")
majors <- read.csv("majors-list.csv", stringsAsFactors = FALSE)

# Keep only the majors whose name contains "DATA" or "STATISTICS"
kable(str_subset(majors$Major, pattern = "DATA|STATISTICS"))
| x |
|---|
| MANAGEMENT INFORMATION SYSTEMS AND STATISTICS |
| COMPUTER PROGRAMMING AND DATA PROCESSING |
| STATISTICS AND DECISION SCIENCE |
#2
Write code that transforms the data below:
[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”
Into a format like this: c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)
library(knitr)
library(stringr)
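# Paste the printed values (without the [n] index markers) into one string
blob <- '"bell pepper" "bilberry" "blackberry" "blood orange"
"blueberry" "cantaloupe" "chili pepper" "cloudberry"
"elderberry" "lime" "lychee" "mulberry"
"olive" "salal berry"'
# scan() splits on whitespace but keeps each quoted item together,
# returning a character vector of the 14 names
scan(text = blob, what = "character")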
[1] “bell pepper” “bilberry” “blackberry” “blood orange” “blueberry”
[6] “cantaloupe” “chili pepper” “cloudberry” “elderberry” “lime”
[11] “lychee” “mulberry” “olive” “salal berry”
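The scan() call above returns a character vector, which R prints in its usual indexed form instead of the c(...) text the exercise asks for. A minimal sketch of one way to produce that text is shown below. It assumes the raw input uses straight double quotes and still carries the [n] index markers; the names raw, fruits, and result are illustrative only.

```r
library(stringr)

# Raw printed output, index markers included (straight quotes assumed)
raw <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Capture the text between each pair of double quotes;
# column 2 of the match matrix holds the capture group
fruits <- str_match_all(raw, '"([^"]+)"')[[1]][, 2]

# Re-quote each name, join with commas, and wrap in c( ... )
result <- paste0("c(", paste0('"', fruits, '"', collapse = ", "), ")")
writeLines(result)
```

Because the index markers are never inside quotes, the pattern skips them without any extra cleanup. Calling dput(fruits) is a shorter alternative, although it may wrap the output across several lines.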
#3
Describe, in words, what these expressions will match:
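(.)\1\1 - the same character three times in a row, e.g. “aaa”.
“(.)(.)\2\1” - a pair of characters followed by the same pair in reverse order, e.g. “abba”.
(..)\1 - any two characters repeated immediately, e.g. “abab”.
“(.).\1.\1” - a character, then any character, the first character again, any character, and the first character a third time, e.g. “abaca”.
“(.)(.)(.).*\3\2\1” - three characters, followed by zero or more characters of any kind, then the same three characters in reverse order, e.g. “abcxyzcba”.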
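The backreference patterns in this exercise can be checked against a few test strings. The sketch below is only an illustration: the strings in x are made up for this purpose, and each backslash is doubled because the patterns are written as R string literals.

```r
library(stringr)

# Hypothetical test strings, one per pattern plus a non-matching control
x <- c("aaa", "abba", "abab", "abaca", "abcxyzcba", "hello")

patterns <- c(
  triple       = "(.)\\1\\1",             # same character three times in a row
  mirror_pair  = "(.)(.)\\2\\1",          # two characters, then the pair reversed
  double_pair  = "(..)\\1",               # a two-character block repeated
  alternating  = "(.).\\1.\\1",           # same character at positions 1, 3, and 5
  mirror_three = "(.)(.)(.).*\\3\\2\\1"   # three characters, anything, then reversed
)

# For each pattern, list the test strings it matches
matches <- lapply(patterns, function(p) str_subset(x, p))
print(matches)
```

With these inputs each pattern matches exactly one string, and "hello" matches none of them because its repeated "l" occurs only twice.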
#4
Construct regular expressions to match words that:
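Start and end with the same character. “^([a-z]).*\1$” (this requires at least two characters, so a one-letter word is not matched)
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.) “([a-z][a-z]).*\1” (the captured pair must appear again later in the word)
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.) “([a-z]).*\1.*\1” (the captured letter must appear two more times, with anything in between)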
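One way to sanity-check expressions like these is to run them over a small word list. The sketch below assumes lowercase input, and the words vector is a hand-picked example, not part of the original exercise. The backslashes are doubled because the patterns are R string literals.

```r
library(stringr)

# Hypothetical sample words chosen to exercise each pattern
words <- c("church", "eleven", "dad", "banana", "apple", "level", "tree")

same_ends     <- "^([a-z]).*\\1$"       # first letter equals last letter
repeated_pair <- "([a-z][a-z]).*\\1"    # some two-letter pair occurs twice
three_times   <- "([a-z]).*\\1.*\\1"    # some letter occurs at least three times

ends_hits  <- str_subset(words, same_ends)
pair_hits  <- str_subset(words, repeated_pair)
three_hits <- str_subset(words, three_times)

print(ends_hits)
print(pair_hits)
print(three_hits)
```

Here "banana" satisfies both the repeated-pair pattern (through "an") and the three-occurrence pattern (through "a"), which shows that the two conditions are independent of each other.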