607 W3

majorDS <- read.csv(file="https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv",header = TRUE, sep=",")

1

Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”

key='DATA|STATISTICS'
majorDnS <- majorDS$Major[grep(key, majorDS$Major)]
majorDnS

## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"     
## [3] "STATISTICS AND DECISION SCIENCE"
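The same filter can also be written with `grepl()`, which returns one TRUE/FALSE per element and makes it easy to add case-insensitive matching. The sketch below is only an illustration: the `majors` vector is a hypothetical stand-in for `majorDS$Major`, not the real dataset.

```r
# Hypothetical sample standing in for majorDS$Major (not the real dataset)
majors <- c("STATISTICS AND DECISION SCIENCE", "Data Science", "ACCOUNTING",
            "COMPUTER PROGRAMMING AND DATA PROCESSING")

# grepl() returns a logical vector; ignore.case = TRUE also catches "Data"
is_match <- grepl("DATA|STATISTICS", majors, ignore.case = TRUE)
hits <- majors[is_match]
print(hits)
```

In the real dataset the majors are already upper case, so `ignore.case` is a precaution rather than a requirement.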

2

Write code that transforms the data below:

[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”

Into a format like this:

c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)

fruits <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
# Quote each element, join the pieces with ", ", and wrap the result in c( ... )
cat(paste0("c(", paste0('"', fruits, '"', collapse = ", "), ")"))

## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
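If the starting point is the printed console text rather than an R object, one possible approach is to extract every quoted run with a regular expression. This is a minimal sketch in base R; it assumes the printed output has been pasted into a single string, and `raw`, `quoted`, and `fruits` are illustrative names.

```r
# The printed console output, pasted in as one string (assumed input form)
raw <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Pull out every double-quoted run, then strip the quote marks
quoted <- regmatches(raw, gregexpr('"[^"]+"', raw))[[1]]
fruits <- gsub('"', "", quoted, fixed = TRUE)

length(fruits)  # 14
dput(fruits)    # prints the vector in c("...", "...") form
```

The index markers such as `[1]` and `[5]` are never quoted, so the pattern skips them without any extra cleanup.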

3

Describe, in words, what these expressions will match:

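“(.)\1\1” does not work as a backreference when written with single backslashes, because R reads \1 inside a string as an escape sequence (the control character with code 1) rather than passing \1 to the regex engine. Written as “(.)\\1\\1”, it matches any character repeated three times in a row, e.g. “aaa”.
“(.)(.)\\2\\1” matches any two characters followed immediately by the same two characters in reverse order, e.g. “abba”.
(..)\1 has the same single-backslash problem. Written as “(..)\\1”, it matches any two characters that are immediately repeated, e.g. “abab”.
“(.).\\1.\\1” matches a character that occurs three times with exactly one arbitrary character between each occurrence, e.g. “abaca”.
“(.)(.)(.).*\\3\\2\\1” matches any three characters, then zero or more arbitrary characters, then the same three characters in reverse order, e.g. “abccba” or “abcxcba”.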
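The single versus double backslash distinction can be checked directly. The sketch below uses base R's `grepl()` rather than stringr so that it runs without extra packages; the variable names are illustrative.

```r
# In an R string, "\1" is one control character; "\\1" is a backslash plus "1"
nchar("\1")    # 1
nchar("\\1")   # 2

# With single backslashes the pattern looks for two \001 characters
single  <- grepl("(.)\1\1", "aaa")      # FALSE
# With doubled backslashes the regex engine sees a backreference to group 1
doubled <- grepl("(.)\\1\\1", "aaa")    # TRUE
print(c(single = single, doubled = doubled))
```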

library("stringr")
exampleDS <-c("lool", "aa", "aaa", "aaaa", "is is", "aa", "lolol", "abccba", "abcddcba", "aaaalaaa", "aaaaaal", "aaaaaa", "abccab", "abb", "aba", "awww")

#"(.)\\1\\1"
expression ="(.)\\1\\1"
result <- str_subset(exampleDS,expression)
result

## [1] "aaa"      "aaaa"     "aaaalaaa" "aaaaaal"  "aaaaaa"   "awww"

#"(.)(.)\\2\\1"
expression ="(.)(.)\\2\\1"
result <- str_subset(exampleDS,expression )
result

## [1] "lool"     "aaaa"     "abccba"   "abcddcba" "aaaalaaa" "aaaaaal"  "aaaaaa"

#"(..)\\1"
expression ="(..)\\1"
result <- str_subset(exampleDS,expression )
result

## [1] "aaaa"     "lolol"    "aaaalaaa" "aaaaaal"  "aaaaaa"

#"(.).\\1.\\1"
expression ="(.).\\1.\\1"
result <- str_subset(exampleDS,expression )
result

## [1] "lolol"    "aaaalaaa" "aaaaaal"  "aaaaaa"

#"(.)(.)(.).*\\3\\2\\1"
expression ="(.)(.)(.).*\\3\\2\\1"
result <- str_subset(exampleDS,expression )
result

## [1] "abccba"   "abcddcba" "aaaalaaa" "aaaaaal"  "aaaaaa"
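To see which part of each string a pattern actually matches, rather than only which strings contain a match, the matched substring can be extracted. This is a small sketch using base R's `regexpr()` and `regmatches()` with `perl = TRUE`; the vector `x` is a hand-picked subset of the examples above.

```r
x <- c("lool", "abccba", "abcddcba", "abccab")

# regexpr() locates the first match in each string (-1 when there is none)
m <- regexpr("(.)(.)\\2\\1", x, perl = TRUE)

# regmatches() returns the matched text and drops strings with no match
matched <- regmatches(x, m)
print(matched)
# "lool" "bccb" "cddc"
```

“abccab” is dropped because it contains no two characters followed by the same pair reversed.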

4

Construct regular expressions to match words that:

Start and end with the same character.
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)

sampleDS <-c("eye", "dad", "mom", "prep", "lateral", "pulp", "pop", "madam", "treat", "tent", "essence", "nose", "stress" , "high" , "tall", "call", "zeus", "church", "chicken", "chocolate", "banana", "lululemon")

#Start and end with the same character.
expression ="^(.)(.*\\1$)"
result <- str_subset(sampleDS,expression )
result

##  [1] "eye"     "dad"     "mom"     "prep"    "lateral" "pulp"    "pop"    
##  [8] "madam"   "treat"   "tent"    "essence" "stress"  "high"
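One edge case worth noting: the pattern "^(.)(.*\\1$)" needs at least two characters, so a one-letter word such as “a” is not matched even though it trivially starts and ends with the same character. A possible variant makes the second group optional. The sketch below uses base R `grepl()` with `perl = TRUE`, and the `words` vector is a hypothetical test set.

```r
words <- c("a", "eye", "nose", "ab")   # hypothetical test words

# Pattern from this section: requires a second occurrence of the first character
two_plus <- grepl("^(.)(.*\\1$)", words, perl = TRUE)

# Variant: the trailing ".*\\1" is optional, so single characters also match
with_single <- grepl("^(.)(.*\\1)?$", words, perl = TRUE)

print(rbind(two_plus, with_single))
```

On the `sampleDS` vector used here both patterns give the same result, since it contains no one-letter words.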

#Contain a repeated pair of letters (e.g. "church" contains "ch" repeated twice.)
expression="([A-Za-z][A-Za-z]).*\\1"
result <- str_subset(sampleDS,expression )
result

## [1] "church"    "banana"    "lululemon"

#Contain one letter repeated in at least three places (e.g. "eleven" contains three "e"s.)
expression="([A-Za-z]).*\\1.*\\1"
result <- str_subset(sampleDS,expression )
result

## [1] "essence"   "stress"    "banana"    "lululemon"
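The three-occurrence result can be cross-checked without regular expressions by counting letters directly. This is a rough sketch, not part of the original exercise; `has_triple` and `words` are hypothetical names, and “eleven” is added from the prompt's own example.

```r
# TRUE when any single character occurs at least three times in the word
has_triple <- function(w) any(table(strsplit(w, "")[[1]]) >= 3)

words <- c("essence", "stress", "banana", "lululemon", "church", "eleven")
triples <- words[vapply(words, has_triple, logical(1))]
print(triples)
# "essence" "stress" "banana" "lululemon" "eleven"
```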

Benson

2/19/2022