#1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”.
library(RCurl)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
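# Download the raw CSV text, then parse it into a data frame
data_link <- getURL("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv")
majors <- read.csv(text = data_link, stringsAsFactors = FALSE)
head(majors)
# Keep rows whose Major contains "DATA" or "STATISTICS" (grepl() is case-sensitive by default)
majors %>% filter(grepl("DATA|STATISTICS", Major))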
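The same filter can be tried without a network connection. The sketch below is a minimal base R version that assumes a small hand-typed sample of major names in place of the downloaded file; `sample_majors` and `matched` are illustrative names, and `ignore.case = TRUE` is an optional extra in case the names are not all upper case.

```r
# Hand-typed sample standing in for the Major column (not the full dataset)
sample_majors <- c(
  "GENERAL AGRICULTURE",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "STATISTICS AND DECISION SCIENCE",
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "ZOOLOGY"
)

# "|" is regex alternation: match DATA or STATISTICS anywhere in the string
is_match <- grepl("DATA|STATISTICS", sample_majors, ignore.case = TRUE)
matched <- sample_majors[is_match]
print(matched)
```

Indexing the vector with the logical result of `grepl()` does the same job as `filter()` on a data frame column.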
#2 Write code that transforms the data below:
[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"
Into a format like this:
c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
On #2, you should treat the lines starting with [1] "bell pepper" as an input. In a way, you’re reverse-engineering the code from its output.
Exercises #3 and #4 are taken from R for Data Science, section 14.3.5.1 of the online version.
fruits <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
fruits
## [1] "bell pepper" "bilberry" "blackberry" "blood orange" "blueberry"
## [6] "cantaloupe" "chili pepper" "cloudberry" "elderberry" "lime"
## [11] "lychee" "mulberry" "olive" "salal berry"
# dput() prints the R expression that recreates the vector; fruits is already a character vector
dput(fruits)
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry",
## "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime",
## "lychee", "mulberry", "olive", "salal berry")
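Since the exercise asks for the printed lines to be treated as input, one possible approach is to paste that output into a single string and extract the quoted items with a regular expression. This is a sketch under that assumption; `raw_output` and `fruits_parsed` are illustrative names, not part of the original answer.

```r
# The printed output, pasted in as one string (single quotes so the inner
# double quotes need no escaping)
raw_output <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Find every double-quoted run of non-quote characters
quoted <- regmatches(raw_output, gregexpr('"[^"]+"', raw_output))[[1]]

# Strip the surrounding quote characters to leave the bare fruit names
fruits_parsed <- gsub('"', "", quoted, fixed = TRUE)

# Print the vector as an R expression
dput(fruits_parsed)
```

The index markers such as `[1]` and `[5]` are never inside quotes, so the pattern skips them without any extra cleanup.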
#3 Describe, in words, what these expressions will match:
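(.)\1\1 : The same character three times in a row, e.g. "aaa".
"(.)(.)\\2\\1" : A pair of characters followed by the same pair in reverse order, e.g. "abba". (The quoted items are R strings, so each backslash is doubled.)
(..)\1 : Any two characters, immediately repeated, e.g. "abab".
"(.).\\1.\\1" : A character, then any character, then the first character again, then any character, then the first character a third time, e.g. "abaca". The second and fourth characters are unconstrained, so "ababa" also matches.
"(.)(.)(.).*\\3\\2\\1" : Three characters, then zero or more characters of any kind, then the same three characters in reverse order, e.g. "abcxyzcba".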
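The descriptions above can be checked against one example string per pattern. The sketch below writes every pattern as an R string (backslashes doubled) and uses base `grepl()` with `perl = TRUE`; the vector names and the example strings are illustrative choices.

```r
# Each pattern from the exercise, written as an R string
patterns <- c(
  triple     = "(.)\\1\\1",
  mirror2    = "(.)(.)\\2\\1",
  pair_twice = "(..)\\1",
  axaya      = "(.).\\1.\\1",
  mirror3    = "(.)(.)(.).*\\3\\2\\1"
)

# One string that each pattern should match, in the same order
examples <- c("aaa", "abba", "abab", "abaca", "abcxyzcba")

# Test pattern i against example i
hits <- mapply(grepl, patterns, examples, MoreArgs = list(perl = TRUE))
print(hits)

# The 2nd and 4th characters of the AXAYA pattern may be equal
same_fillers <- grepl(patterns[["axaya"]], "ababa", perl = TRUE)
print(same_fillers)
```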
#4 Construct regular expressions to match words that:
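Start and end with the same character : ^(.)((.*\1$)|\1?$) The first branch matches words whose last character equals the first; the second branch, \1?$, lets a single-character word match too.
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.) : ([A-Za-z][A-Za-z]).*\1 Because the exercise asks for letters, the group uses the ASCII letter class [A-Za-z] rather than (.), which would match any character.
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.) : ([A-Za-z]).*\1.*\1 The .* between occurrences lets the three copies sit anywhere in the word; a single . there would only match copies spaced exactly two positions apart.
When these are written as R strings, each backslash is doubled (\\1).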
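A quick check of the three constructions on a handful of words is sketched below. The patterns are written as R strings, with `.*` between the occurrences in the third one so the repeated letters need not be evenly spaced. The word list and the names `word_patterns` and `word_hits` are illustrative.

```r
# The three patterns as R strings (backslashes doubled)
word_patterns <- c(
  start_end     = "^(.)((.*\\1$)|\\1?$)",
  repeated_pair = "([A-Za-z][A-Za-z]).*\\1",
  three_times   = "([A-Za-z]).*\\1.*\\1"
)

words <- c("a", "area", "church", "banana", "eleven", "apple")

# One column per pattern, one row per word
word_hits <- sapply(word_patterns, grepl, x = words, perl = TRUE)
rownames(word_hits) <- words
print(word_hits)
```

In this sample, "a" and "area" start and end with the same character, "church" and "banana" contain a repeated pair, and "banana" and "eleven" contain a letter in three places.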