Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”.
library(tidyverse)
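# Read the list of majors directly from the FiveThirtyEight GitHub repository
College_maj <- read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv')
# Regular expression: "|" means a major matches if it contains either word
find <- 'DATA|STATISTICS'
# grep() returns the indices of the matching elements, used here to subset Major
College_maj_sub <- College_maj$Major[grep(find, College_maj$Major)]
print(College_maj_sub)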
## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"
## [3] "STATISTICS AND DECISION SCIENCE"
The grep() function searches each element of a character vector for a match to the pattern and returns the indices of the elements that match. Those indices are then used to subset the Major column.
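As a minimal sketch of how grep() and its relatives behave, the example below uses a small hypothetical vector of major names (typed in by hand, not read from the dataset). The variable names `majors`, `idx`, `hits`, and `flags` are illustrative only.

```r
# Hypothetical sample of major names (an assumption, not the real dataset)
majors <- c("STATISTICS AND DECISION SCIENCE",
            "ZOOLOGY",
            "COMPUTER PROGRAMMING AND DATA PROCESSING")
pattern <- "DATA|STATISTICS"

# Integer positions of the elements that match
idx <- grep(pattern, majors)

# The matching strings themselves, without a separate subsetting step
hits <- grep(pattern, majors, value = TRUE)

# One TRUE/FALSE per element, handy inside filter() or subset()
flags <- grepl(pattern, majors)

print(idx)
print(hits)
print(flags)
```

Using `value = TRUE` gives the same strings as indexing with the result of grep(), so either form would work for the majors question.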
[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"
c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
# A plain character vector is enough here; no data frame is needed
Fruits <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
# Quote each element, join with ", " inside paste0(), then wrap the result in c( )
cat(paste0("c(", paste0('"', Fruits, '"', collapse = ", "), ")"))
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
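If the starting point is the printed output as raw text rather than a vector that already exists, the fruit names first have to be pulled out of that text. The following is a sketch under that assumption, using only base R. The variable names `raw`, `quoted`, and `fruits` are hypothetical, and the raw text is assumed to use straight double quotes.

```r
# Console-style text stored as one string (straight quotes are assumed)
raw <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'

# Find every run of characters enclosed in double quotes;
# the bracketed index markers such as [5] are skipped automatically
quoted <- regmatches(raw, gregexpr('"[^"]+"', raw))[[1]]

# Drop the surrounding quotes to get a clean character vector
fruits <- gsub('"', "", quoted, fixed = TRUE)

# dput() prints an R expression that recreates the vector
dput(fruits)
```

dput() may wrap its output over several lines, but the printed expression still evaluates to the same 14-element vector.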
(.)\1\1
"(.)(.)\\2\\1"
(..)\1
"(.).\\1.\\1"
"(.)(.)(.).*\\3\\2\\1"
exp <- c("toooo little", "sooo cute", "blackberry", "blackberrrry", "limeee", "lime", "12345", "347565", "07770")
str_subset(exp, "(.)\\1\\1")
## [1] "toooo little" "sooo cute" "blackberrrry" "limeee" "07770"
str_view(fruit, "(.)(.)\\2\\1")
## [5] │ bell p<eppe>r
## [17] │ chili p<eppe>r
str_view(fruit, "(..)\\1")
## [4] │ b<anan>a
## [20] │ <coco>nut
## [22] │ <cucu>mber
## [41] │ <juju>be
## [56] │ <papa>ya
## [73] │ s<alal> berry
str_view(fruit, "(.).\\1.\\1")
## [4] │ b<anana>
## [56] │ p<apaya>
exp <- c("toooo little", "abcdeffedcba", "blackberry", "blackberrrry", "limeee", "lime", "12345", "07770", "077770", "347743", "34788743")
str_subset(exp, "(.)(.)(.).*\\3\\2\\1")
## [1] "abcdeffedcba" "077770" "347743" "34788743"
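To compare the five back-reference patterns side by side, the sketch below runs each of them against one small test vector and extracts the substring that matched. The `words` vector and the result names are hypothetical; str_extract() from stringr returns NA where a pattern finds nothing.

```r
library(stringr)

# Hypothetical test words chosen to trigger one pattern each
words <- c("sooo cute", "pepper", "banana", "abcdeffedcba", "lime")

# The same character three times in a row
triple <- str_extract(words, "(.)\\1\\1")

# Two characters followed by the same two in reverse order
mirror_pair <- str_extract(words, "(.)(.)\\2\\1")

# A two-character pair repeated immediately
double_pair <- str_extract(words, "(..)\\1")

# One character appearing at every other position, three times
alternating <- str_extract(words, "(.).\\1.\\1")

# Three characters, anything in between, then the same three reversed
mirror_triple <- str_extract(words, "(.)(.)(.).*\\3\\2\\1")

print(data.frame(words, triple, mirror_pair, double_pair,
                 alternating, mirror_triple))
```

Each `\\1`, `\\2`, or `\\3` refers back to the text captured by the corresponding parenthesised group, which is why the order of the back-references decides whether a sequence is repeated or mirrored.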
Start and end with the same character.
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)
Answers:
df.names <-c("alisha", "farhana", "anna", "sahana", "church", "bob", "harry", "paul", "eleven", "bubble", "cell", "apple", "dog", "ada", "sense", "banana", "pepperoni", "india", "ten", "twelve", "soso", "oso", "bandana", "Louisiana", "Missouri", "Mississippi", "Connecticut", "google", "conscience", "dalda", "short", "Evon", "ele", "Tort")
regex_expr1 <-"^(.)((.*\\1$)|\\1?$)"
str_subset(df.names,regex_expr1)
## [1] "alisha" "anna" "bob" "ada" "oso" "ele"
regex_expr2 <-"([A-Za-z][A-Za-z]).*\\1"
str_subset(df.names,regex_expr2)
## [1] "church" "sense" "banana" "pepperoni" "soso"
## [6] "bandana" "Mississippi" "dalda"
regex_expr3 <-"([A-Za-z]).*\\1.*\\1"
str_subset(df.names,regex_expr3)
## [1] "farhana" "sahana" "eleven" "bubble" "banana"
## [6] "pepperoni" "bandana" "Mississippi" "conscience"
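All three patterns are case-sensitive, which is why "Tort" is not reported as starting and ending with the same character. Whether case should be ignored is not stated in the task, so the following is only a sketch of one option: lower-case the words before matching. The names `names_sample` and `same_ends` are hypothetical.

```r
library(stringr)

# Same pattern as regex_expr1: first character repeated at the end,
# or a string of one or two identical characters
same_ends <- "^(.)((.*\\1$)|\\1?$)"

# Hypothetical subset of the test words
names_sample <- c("Tort", "Evon", "ada", "dog")

# Case-sensitive match: "T" and "t" count as different characters
case_sensitive <- str_subset(names_sample, same_ends)

# Match on a lower-cased copy, but return the original spelling
case_insensitive <- names_sample[str_detect(str_to_lower(names_sample), same_ends)]

print(case_sensitive)
print(case_insensitive)
```

The same lower-casing step could be applied before the other two patterns if mixed-case words such as "Connecticut" should count an upper-case and a lower-case letter as the same letter.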