# Loading data sets for the college majors
majors<-read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv")
grep(pattern = 'STATISTICS|DATA', majors$Major, value = TRUE, ignore.case = TRUE)
## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"
## [3] "STATISTICS AND DECISION SCIENCE"
Two of the matched majors contain STATISTICS and one contains DATA.
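As a quick check on that tally, the sketch below counts the matches per keyword. It uses the three matched majors copied from the output above instead of downloading the CSV again; the names `matched`, `keywords`, and `counts` are illustrative only.

```r
# The three majors returned by the grep() call above, copied from its output
matched <- c("MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
             "COMPUTER PROGRAMMING AND DATA PROCESSING",
             "STATISTICS AND DECISION SCIENCE")

# Count how many of the matched majors contain each keyword
keywords <- c("STATISTICS", "DATA")
counts <- sapply(keywords, function(k) sum(grepl(k, matched, ignore.case = TRUE)))
print(counts)
```

Counting per keyword like this makes it easy to see which part of the `STATISTICS|DATA` alternation produced each match.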
The task is to transform the data below:
[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"
Into a format like this:
c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
fruitveggie<-'[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'
fruitveggie
## [1] "[1] \"bell pepper\" \"bilberry\" \"blackberry\" \"blood orange\"\n\n[5] \"blueberry\" \"cantaloupe\" \"chili pepper\" \"cloudberry\" \n\n[9] \"elderberry\" \"lime\" \"lychee\" \"mulberry\" \n\n[13] \"olive\" \"salal berry\""
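# Split the string on the double-quote character ("). The resulting pieces alternate
# between text outside the quotes (index markers, whitespace) and the quoted names.
splitfruitveggie <- strsplit(fruitveggie, "\"")[[1]]
# c(FALSE, TRUE) is recycled along the vector, so it keeps every second piece,
# which are exactly the fruit and vegetable names.
fruitveg <- splitfruitveggie[c(FALSE, TRUE)]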
fruitveg
## [1] "bell pepper" "bilberry" "blackberry" "blood orange" "blueberry"
## [6] "cantaloupe" "chili pepper" "cloudberry" "elderberry" "lime"
## [11] "lychee" "mulberry" "olive" "salal berry"
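An alternative to splitting on the quote character is to extract the quoted pieces directly with a regular expression, and then build the `c(...)` text the exercise asks for. The sketch below is one possible way to do this in base R, not the only one; `quoted` and `as_code` are names I made up for illustration.

```r
# Same raw text as above: the printed form of a character vector
fruitveggie <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'

# Pull out every run of non-quote characters enclosed in double quotes
quoted <- regmatches(fruitveggie, gregexpr('"[^"]+"', fruitveggie))[[1]]

# Drop the surrounding quotes to get the bare names
fruitveg <- gsub('"', '', quoted, fixed = TRUE)

# Build the target text: c("bell pepper", "bilberry", ...)
as_code <- paste0("c(", paste0('"', fruitveg, '"', collapse = ", "), ")")
cat(as_code, "\n")
```

Once `fruitveg` is a proper character vector, `dput(fruitveg)` prints it in the same `c(...)` form, so the manual `paste0()` step is only needed when the text itself has to be stored as a string.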
The two exercises below are taken from R for Data Science, section 14.3.5.1 in the online version:
(.)\1\1
"(.)(.)\\2\\1"
(..)\1
"(.).\\1.\\1"
"(.)(.)(.).*\\3\\2\\1"
(.)\1\1 matches the same character three times in a row. For example, “bbb”.
"(.)(.)\\2\\1" matches a pair of characters followed by the same pair in reverse order. For example, “cddc”.
(..)\1 matches any two characters repeated immediately. For example, “f6f6”.
"(.).\\1.\\1" matches a character, followed by any character, then the original character, then any character, and finally the original character again. For example, “abada”.
"(.)(.)(.).*\\3\\2\\1" matches three characters, followed by zero or more characters of any kind, followed by the same three characters in reverse order. For example, “hij6jih”.
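To check these descriptions, the sketch below runs each pattern against the example string given for it. All five patterns are written as R strings (backslashes doubled), and base R's `grepl()` with `perl = TRUE` is used so the block does not depend on stringr; `patterns`, `examples`, and `matches` are illustrative names.

```r
# The five patterns from the exercise, written as R strings
patterns <- c("(.)\\1\\1",
              "(.)(.)\\2\\1",
              "(..)\\1",
              "(.).\\1.\\1",
              "(.)(.)(.).*\\3\\2\\1")

# The example string proposed for each pattern, in the same order
examples <- c("bbb", "cddc", "f6f6", "abada", "hij6jih")

# Test each pattern against its own example
matches <- mapply(function(p, x) grepl(p, x, perl = TRUE), patterns, examples)
print(matches)

# A counterexample: "bab" has no character three times in a row
print(grepl("(.)\\1\\1", "bab", perl = TRUE))
```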
Start and end with the same character.
str_subset(words, "^(.)((.*\\1$)|\\1?$)")
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice).
str_subset(words, "([A-Za-z][A-Za-z]).*\\1")
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s).
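The sketch below tries the three kinds of pattern on a small hand-made vector. Several parts are my assumptions rather than part of the original answers: `sample_words` is a hypothetical stand-in for stringr's `words`, the start-and-end pattern is written with balanced parentheses, and the pattern for a letter repeated in at least three places is my own suggestion. Base R's `grep(..., value = TRUE)` stands in for `str_subset()` so the block runs without stringr.

```r
# Hypothetical sample vector standing in for stringr::words
sample_words <- c("area", "church", "eleven", "a", "banana", "dog", "level")

# Start and end with the same character (a single-character word also counts)
same_ends <- grep("^(.)((.*\\1$)|\\1?$)", sample_words, value = TRUE, perl = TRUE)

# Contain a repeated pair of letters
repeated_pair <- grep("([A-Za-z][A-Za-z]).*\\1", sample_words, value = TRUE, perl = TRUE)

# Contain one letter repeated in at least three places (assumed pattern)
three_times <- grep("([A-Za-z]).*\\1.*\\1", sample_words, value = TRUE, perl = TRUE)

print(same_ends)
print(repeated_pair)
print(three_times)
```

With stringr loaded, the same pattern strings can be passed to `str_subset(words, ...)` unchanged.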