The Overview:
You can find the file on GitHub here
You can find the file on RPubs here
1 Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”
Loading the required library
library(stringr)
Getting the data from fivethirtyeight github
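# Raw CSV of the majors list in the fivethirtyeight GitHub repository
url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"
# Keep text columns as character vectors so they can be searched with grepl()
data <- read.csv(url, stringsAsFactors = FALSE)
head(data)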
##   FOD1P                                 Major                  Major_Category
## 1  1100                   GENERAL AGRICULTURE Agriculture & Natural Resources
## 2  1101 AGRICULTURE PRODUCTION AND MANAGEMENT Agriculture & Natural Resources
## 3  1102                AGRICULTURAL ECONOMICS Agriculture & Natural Resources
## 4  1103                       ANIMAL SCIENCES Agriculture & Natural Resources
## 5  1104                          FOOD SCIENCE Agriculture & Natural Resources
## 6  1105            PLANT SCIENCE AND AGRONOMY Agriculture & Natural Resources
Getting the majors containing “DATA”
data$Major[grepl("DATA", data$Major)]
## [1] "COMPUTER PROGRAMMING AND DATA PROCESSING"
Getting the majors containing “STATISTICS”
data$Major[grepl("STATISTICS", data$Major)]
## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "STATISTICS AND DECISION SCIENCE"
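The two searches above can also be combined into one, since the exercise asks for majors containing either word. The following is a minimal sketch, assuming a small hand-typed `majors` vector as a stand-in for `data$Major` (the vector and its name are illustrative, not part of the original report):

```r
# Hypothetical sample standing in for data$Major
majors <- c("GENERAL AGRICULTURE",
            "COMPUTER PROGRAMMING AND DATA PROCESSING",
            "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
            "STATISTICS AND DECISION SCIENCE",
            "FOOD SCIENCE")

# "|" is regex alternation: keep a major if it contains either word
matches <- grep("DATA|STATISTICS", majors, value = TRUE)
print(matches)
```

With the real data frame, the same pattern should work as `grep("DATA|STATISTICS", data$Major, value = TRUE)`.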
2 Write code that transforms the data below:
[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”
Into a format like this:
c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)
fruits <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'
fruits
## [1] "[1] \"bell pepper\" \"bilberry\" \"blackberry\" \"blood orange\"\n\n[5] \"blueberry\" \"cantaloupe\" \"chili pepper\" \"cloudberry\" \n\n[9] \"elderberry\" \"lime\" \"lychee\" \"mulberry\" \n\n[13] \"olive\" \"salal berry\""
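Here we extract every fruit name with str_extract_all(), wrap each name in double quotes, and join the names with a comma separator inside c( ).
# Pull out each fruit name, then rebuild the text of a c(...) call
fruit_names <- str_extract_all(fruits, pattern = '[A-Za-z]+ ?[A-Za-z]+')[[1]]
fruits_vector <- str_c('c(', str_c('"', fruit_names, '"', collapse = ", "), ')')
writeLines(fruits_vector)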
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
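As a cross-check, the same transformation can be sketched in base R without stringr. This is only an illustration under stated assumptions: `raw` is a shortened, hypothetical three-fruit version of the printed vector, and `deparse()` is used to produce the `c(...)` text. For long vectors `deparse()` may return several pieces, so the full list could need extra handling.

```r
# Hypothetical shortened input in the same printed-vector layout
raw <- '[1] "bell pepper"  "bilberry"\n[3] "lime"'

# Grab everything between double quotes, then drop the quotes themselves
quoted <- regmatches(raw, gregexpr('"[^"]+"', raw))[[1]]
fruit_names <- gsub('"', "", quoted, fixed = TRUE)

# deparse() returns the R source text that would recreate the vector
fruit_call <- deparse(fruit_names)
writeLines(fruit_call)
```

Matching on the quotes rather than on letters means names with hyphens or more than two words would also be captured.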
The two exercises below are taken from R for Data Science, 14.3.5.1 in the on-line version:
3 Describe, in words, what these expressions will match:
(.)\1\1 Matches the same character three times in a row, like "AAA".
"(.)(.)\\2\\1" Matches a pair of characters followed by the same pair in reverse order, like "ABBA".
(..)\1 Matches any two characters repeated immediately, like "ABAB".
"(.).\\1.\\1" Matches five characters in which the first, third, and fifth are the same, like "ABACA".
"(.)(.)(.).*\\3\\2\\1" Matches three characters, then any number of characters (possibly none), then the same three characters in reverse order, like "ABC42342CBA".
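The descriptions above can be checked against their example strings. The sketch below assumes base R `grepl()` with `perl = TRUE` rather than stringr, writes every pattern as an R string with doubled backslashes, and uses illustrative names (`patterns`, `examples`, `hits`) that do not appear in the original exercise.

```r
# Each pattern written as an R string, so every backslash is doubled
patterns <- c(triple = "(.)\\1\\1",
              abba   = "(.)(.)\\2\\1",
              abab   = "(..)\\1",
              abaca  = "(.).\\1.\\1",
              mirror = "(.)(.)(.).*\\3\\2\\1")

# One example per pattern, plus "ABCD" as a string none of them should match
examples <- c("AAA", "ABBA", "ABAB", "ABACA", "ABC42342CBA", "ABCD")

# Logical matrix: rows are example strings, columns are patterns
hits <- sapply(patterns, function(p) grepl(p, examples, perl = TRUE))
rownames(hits) <- examples
print(hits)
```

Each example string should be matched only by the pattern it illustrates, and "ABCD" by none of them.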
4 Construct regular expressions to match words that:
- Start and end with the same character. The answer: "^(.).*\\1$"
data <- c("church", "individual", "phillip")
str_view(data, "^(.).*\\1$", match = TRUE)
- Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.) The answer: "(..).*\\1"
data <- c("church", "individual", "phillip")
# No ^ or $ anchors: the repeated pair may appear anywhere in the word
str_view(data, "(..).*\\1", match = TRUE)
- Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.) The answer: "(.).*\\1.*\\1"
data <- c("church", "individual", "phillip")
str_view(data, "(.).*\\1.*\\1", match = TRUE)
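To see which words each of the three answers selects, here is a minimal sketch using base R `grep()` with `perl = TRUE` in place of `str_view()`. The words "eleven" and "banana" and the variable names are additions for illustration only.

```r
# The report's three words plus two extra illustrative ones
words <- c("church", "individual", "phillip", "eleven", "banana")

# Start and end with the same character (anchored to the whole word)
same_ends <- grep("^(.).*\\1$", words, value = TRUE, perl = TRUE)

# Contain a repeated pair of letters anywhere in the word
repeated_pair <- grep("(..).*\\1", words, value = TRUE, perl = TRUE)

# Contain one letter in at least three places
three_times <- grep("(.).*\\1.*\\1", words, value = TRUE, perl = TRUE)

print(same_ends)
print(repeated_pair)
print(three_times)
```

Note that the first pattern needs at least two characters, so a one-letter word would not be reported as starting and ending with the same character.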
…