Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”
library(stringr)
library(tidyverse)
library(kableExtra)
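library(knitr)
# stringsAsFactors = TRUE keeps the text columns as factors, which the levels() calls below rely on
# (since R 4.0, read.csv reads strings as character vectors by default)
major_df <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv", stringsAsFactors = TRUE)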
kable(head(major_df))

| FOD1P | Major | Major_Category |
|---|---|---|
| 1100 | GENERAL AGRICULTURE | Agriculture & Natural Resources |
| 1101 | AGRICULTURE PRODUCTION AND MANAGEMENT | Agriculture & Natural Resources |
| 1102 | AGRICULTURAL ECONOMICS | Agriculture & Natural Resources |
| 1103 | ANIMAL SCIENCES | Agriculture & Natural Resources |
| 1104 | FOOD SCIENCE | Agriculture & Natural Resources |
| 1105 | PLANT SCIENCE AND AGRONOMY | Agriculture & Natural Resources |
summary(major_df)
## FOD1P Major
## 1100 : 1 ACCOUNTING : 1
## 1101 : 1 ACTUARIAL SCIENCE : 1
## 1102 : 1 ADVERTISING AND PUBLIC RELATIONS : 1
## 1103 : 1 AEROSPACE ENGINEERING : 1
## 1104 : 1 AGRICULTURAL ECONOMICS : 1
## 1105 : 1 AGRICULTURE PRODUCTION AND MANAGEMENT: 1
## (Other):168 (Other) :168
## Major_Category
## Engineering :29
## Education :16
## Humanities & Liberal Arts:15
## Biology & Life Science :14
## Business :13
## (Other) :86
## NA's : 1
reg_data_stats <- str_detect(levels(major_df$Major), regex("DATA|STATISTICS", ignore_case = TRUE))
reg_data_stats
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [34] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [100] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [122] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [144] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [155] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [166] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
levels(major_df$Major)[reg_data_stats]
## [1] "COMPUTER PROGRAMMING AND DATA PROCESSING"
## [2] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [3] "STATISTICS AND DECISION SCIENCE"
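The same filter can be sketched in base R without relying on factor levels. The `majors` vector below is a small hand-copied subset of the names shown above, used as a stand-in for the downloaded column so the example runs offline; `is_match` and `data_stats_majors` are names introduced here for illustration.

```r
# Hypothetical stand-in for major_df$Major: a few names copied from the output above
majors <- c(
  "GENERAL AGRICULTURE",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "FOOD SCIENCE",
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "STATISTICS AND DECISION SCIENCE"
)

# grepl() returns one TRUE/FALSE per element; "|" means "either alternative"
is_match <- grepl("DATA|STATISTICS", majors, ignore.case = TRUE)

# which() gives the matching positions, and logical indexing gives the names
match_positions <- which(is_match)
data_stats_majors <- majors[is_match]
print(data_stats_majors)
```

Indexing the character vector directly avoids printing the full logical vector and works whether or not the column was read as a factor.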
Write code that transforms the data below:
[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”
input_text <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'
# Extract each quoted item whole, so multi-word names such as "bell pepper" stay together
char_Vector <- str_remove_all(unlist(str_extract_all(input_text, '"[^"]+"')), '"')
vec_str <- str_c('"', char_Vector, '"', collapse = ", " )
final_text <- str_c('c(', vec_str, ')', collapse = " " )
#Final Output text
writeLines(final_text)
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
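As a check, the generated text can be parsed back into R and compared with the 14 items in the prompt. This is a minimal sketch in base R only: `regmatches()` and `gregexpr()` stand in for the stringr calls, and `quoted` and `fruits` are names introduced here for illustration.

```r
input_text <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Pull out every double-quoted item, quotes included, so multi-word names stay whole
quoted <- regmatches(input_text, gregexpr('"[^"]+"', input_text))[[1]]

# Build the text of a c(...) call from the quoted items
final_text <- paste0("c(", paste(quoted, collapse = ", "), ")")

# Parse and evaluate the text to confirm it is valid R that yields a character vector
fruits <- eval(parse(text = final_text))
print(length(fruits))
print(fruits)
```

Evaluating the string is only a sanity check on a value built in the same script; it is not something to do with untrusted input.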
Describe, in words, what these expressions will match:
word_ex_char <- c("(.)\1\1", "(.)(.)\\2\\1", "(..)\1", "(.).\\1.\\1", "(.)(.)(.).*\\3\\2\\1")
word_exprs <- c("aaa", "abc", "abba", "afada", "ab\1", "a\1\1", "abccba")
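str_view(word_exprs, '(.)\1\1')
Because the backslashes are not doubled, R reads each “\1” as the control character with code 1 rather than as a backreference, so this expression matches any character followed by two literal “\1” characters, as in "a\1\1". (Matching three identical characters in a row, such as "aaa", would require '(.)\\1\\1'.)
str_view(word_exprs, '(.)(.)\\2\\1')
It matches strings that contain a pair of characters followed by the same pair in reverse order, such as "abba".
str_view(word_exprs, '(..)\1')
It matches any two characters followed by a literal “\1” character, as in "ab\1"; the undoubled backslash again prevents a backreference.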
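The difference between a single and a doubled backslash in an R string can be sketched with base R's `grepl()`. The names `literal_pattern`, `backref_pattern`, and `words` are introduced here for illustration, and `perl = TRUE` is an assumption made to get PCRE-style backreferences.

```r
words <- c("aaa", "abc", "abba", "a\1\1", "ab\1")

# In an R string, "\1" is an octal escape for the character with code 1,
# so this pattern is "(.)" followed by two literal control characters
literal_pattern <- "(.)\1\1"

# Doubling the backslash passes "\1" through to the regex engine,
# where it is a backreference to the first capture group
backref_pattern <- "(.)\\1\\1"

# The two patterns have different lengths once R has parsed the escapes
print(nchar(literal_pattern))  # 5 characters: ( . ) and two control characters
print(nchar(backref_pattern))  # 7 characters: ( . ) \ 1 \ 1

literal_hits <- words[grepl(literal_pattern, words, perl = TRUE)]
backref_hits <- words[grepl(backref_pattern, words, perl = TRUE)]
```

Here `literal_hits` contains only the string built from "a" and two control characters, while `backref_hits` contains only "aaa".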
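str_view(word_exprs, '(.).\\1.\\1')
It matches a character that appears three times with exactly one arbitrary character between consecutive occurrences, such as "afada".
str_view(word_exprs, '(.)(.)(.).*\\3\\2\\1')
It matches three characters followed by any number of characters (including none) and then the same three characters in reverse order, such as "abccba".
Construct regular expressions to match words that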
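word_1 <- c("blurb", "9Thousand9", "Light", "101DATA101", "MAGMA", "BANANA")
# Start and end with the same character
str_view(word_1, "^(.)(.*)\\1$")
# Contain a repeated pair of letters
str_view(word_1, '([A-Za-z][A-Za-z]).*\\1')
# Contain one letter repeated in at least three places
str_view(word_1, '([A-Za-z]).*\\1.*\\1')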
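Since `str_view()` only highlights matches in a viewer, the three patterns can also be checked by listing which test words each one matches. This is a sketch in base R with `grepl(..., perl = TRUE)`; the helper `matching_words()` and the result names are hypothetical and introduced here for illustration.

```r
word_1 <- c("blurb", "9Thousand9", "Light", "101DATA101", "MAGMA", "BANANA")

# Hypothetical helper: return the words that a pattern matches
matching_words <- function(pattern, words) {
  words[grepl(pattern, words, perl = TRUE)]
}

# First and last character are the same
same_ends <- matching_words("^(.)(.*)\\1$", word_1)

# A pair of letters occurs again later in the word
repeated_pair <- matching_words("([A-Za-z][A-Za-z]).*\\1", word_1)

# One letter occurs in at least three places
triple_letter <- matching_words("([A-Za-z]).*\\1.*\\1", word_1)

print(same_ends)
print(repeated_pair)
print(triple_letter)
```

Note that the first pattern requires at least two characters, so a single-character word would not match it.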