library(stringr)
url<-"https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"
major<-read.csv(url)  # read.csv() already returns a data frame, so no extra conversion is needed
head(major)
## FOD1P Major Major_Category
## 1 1100 GENERAL AGRICULTURE Agriculture & Natural Resources
## 2 1101 AGRICULTURE PRODUCTION AND MANAGEMENT Agriculture & Natural Resources
## 3 1102 AGRICULTURAL ECONOMICS Agriculture & Natural Resources
## 4 1103 ANIMAL SCIENCES Agriculture & Natural Resources
## 5 1104 FOOD SCIENCE Agriculture & Natural Resources
## 6 1105 PLANT SCIENCE AND AGRONOMY Agriculture & Natural Resources
str(major)
## 'data.frame': 174 obs. of 3 variables:
## $ FOD1P : chr "1100" "1101" "1102" "1103" ...
## $ Major : chr "GENERAL AGRICULTURE" "AGRICULTURE PRODUCTION AND MANAGEMENT" "AGRICULTURAL ECONOMICS" "ANIMAL SCIENCES" ...
## $ Major_Category: chr "Agriculture & Natural Resources" "Agriculture & Natural Resources" "Agriculture & Natural Resources" "Agriculture & Natural Resources" ...
# There are 174 observations and 3 variables.
# Find the row indices of majors whose name contains "DATA" or "STATISTICS" (case-insensitive)
data_stat<-grep("DATA|STATISTICS", major$Major, value=FALSE, ignore.case=TRUE)
data_stat
## [1] 44 52 59
# Rows 44, 52, and 59 are the majors whose name contains "DATA" or "STATISTICS".
major[data_stat,]
## FOD1P Major Major_Category
## 44 6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS Business
## 52 2101 COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 59 3702 STATISTICS AND DECISION SCIENCE Computers & Mathematics
Three majors contain either DATA or STATISTICS: “Management Information Systems and Statistics”, “Computer Programming and Data Processing”, and “Statistics and Decision Science”.
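The same filter can also be written with stringr, which this document already loads. The sketch below is only an illustration: it assumes a small hand-built data frame (`major_sample`, a hypothetical name) holding a few of the rows printed above, so that it runs without downloading the CSV.

```r
library(stringr)

# Hypothetical sample: a few rows copied from the output shown above
major_sample <- data.frame(
  FOD1P = c("1100", "6212", "2101", "3702"),
  Major = c("GENERAL AGRICULTURE",
            "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
            "COMPUTER PROGRAMMING AND DATA PROCESSING",
            "STATISTICS AND DECISION SCIENCE"),
  stringsAsFactors = FALSE
)

# regex(..., ignore_case = TRUE) plays the role of grep(ignore.case = TRUE)
is_match <- str_detect(major_sample$Major,
                       regex("DATA|STATISTICS", ignore_case = TRUE))

# Logical indexing keeps only the matching rows
data_stat_majors <- major_sample[is_match, ]
print(data_stat_majors$Major)
```

Indexing with the logical vector avoids hard-coding row numbers, so the result stays correct if the source file gains or loses rows.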
Write code that transforms the data below:
[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”
Into a format like this:
c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)
fruit<-c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
dput(fruit)
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry",
## "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime",
## "lychee", "mulberry", "olive", "salal berry")
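The answer above types the vector in by hand. As an alternative, the printed console text itself can be parsed. The sketch below is one possible approach, not the only one: it assumes the printout is stored as a single string with straight double quotes, and the names `raw`, `quoted`, `fruit_parsed`, and `fruit_code` are hypothetical.

```r
library(stringr)

# The console printout from the exercise, stored as one string (straight quotes assumed)
raw <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'

# Pull out every quoted item, quotes included; the [n] index markers are skipped
quoted <- str_extract_all(raw, '"[^"]+"')[[1]]

# Strip the surrounding quotes to get a plain character vector
fruit_parsed <- str_remove_all(quoted, '"')

# Rebuild the c("...", "...") text form as a single string
fruit_code <- str_c("c(", str_c('"', fruit_parsed, '"', collapse = ", "), ")")
writeLines(fruit_code)
```

Matching the quotes as part of the pattern, and removing them afterwards, keeps the spaces between neighbouring items from being picked up as matches.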
The two exercises below are taken from R for Data Science, section 14.3.5.1 in the online version:
### 3 Describe, in words, what these expressions will match:
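(.)\1\1
The same character three times in a row, e.g. “aaa”.
"(.)(.)\\2\\1"
A pair of characters followed by the same pair in reverse order, e.g. “abba”.
(..)\1
Any two characters immediately repeated, e.g. “abab”.
"(.).\\1.\\1"
A character, then any character, then the first character again, then any character, then the first character a third time, e.g. “abaca”.
"(.)(.)(.).*\\3\\2\\1"
Three characters, followed by zero or more characters of any kind, followed by the same three characters in reverse order, e.g. “abcxcba”.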
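The behaviour of these five patterns can be checked with `str_detect()` on short test strings. This is a minimal sketch; the test strings and the names `checks` and `non_matches` are made up for illustration.

```r
library(stringr)

# Each pattern is written as an R string, so every regex backslash is doubled
checks <- c(
  triple      = str_detect("aaa",     "(.)\\1\\1"),
  mirror_pair = str_detect("abba",    "(.)(.)\\2\\1"),
  double_pair = str_detect("abab",    "(..)\\1"),
  alternating = str_detect("abaca",   "(.).\\1.\\1"),
  mirror_trio = str_detect("abcxcba", "(.)(.)(.).*\\3\\2\\1")
)
print(checks)       # every entry is TRUE

# Counterexamples that should not match
non_matches <- c(
  str_detect("aab",    "(.)\\1\\1"),            # only two equal characters in a row
  str_detect("abab",   "(.)(.)\\2\\1"),         # pair repeated, but not reversed
  str_detect("abcabc", "(.)(.)(.).*\\3\\2\\1")  # trio repeated, but not reversed
)
print(non_matches)  # every entry is FALSE
```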
Construct regular expressions to match words that:
- Start and end with the same character.
- Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice).
- Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s).
library(stringr)
data("fruit")
data<-data.frame(fruit)
head(fruit)
## [1] "apple" "apricot" "avocado" "banana" "bell pepper"
## [6] "bilberry"
The fruit data set from the stringr package is used for this exercise.
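# Start and end with the same character.
# The backreference must be written "\\1" inside an R string; a single "\1" is read as an escape character.
str_view(fruit, "^(.).*\\1$", match = TRUE)
# It appears that no name in the fruit data set starts and ends with the same character.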
# Contain a repeated pair of letters (e.g. "church" contains "ch" repeated twice.)
str_view(fruit, "(..).*\\1", match = TRUE)
# Contain one letter repeated in at least three places (e.g. "eleven" contains three "e"s.)
str_view(fruit, "(.).*\\1.*\\1", match = TRUE)
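One pitfall with backreferences in R is the number of backslashes: inside an R string, `"\1"` is an octal escape for a single control character, so the regex engine only sees a backreference when it is written `"\\1"`. The sketch below illustrates this and runs the three patterns on a small vector, `sample_words`, which is a hypothetical set of words chosen for illustration (it includes the two examples from the exercise text).

```r
library(stringr)

# In an R string, "\1" is an octal escape (one control character), not a backreference
nchar("\1")    # 1
nchar("\\1")   # 2: a backslash and "1", which the regex engine reads as \1

# Hypothetical sample words, including the two examples from the exercise text
sample_words <- c("church", "eleven", "area", "banana")

# Start and end with the same character
same_ends <- str_subset(sample_words, "^(.).*\\1$")
print(same_ends)       # "area"

# Contain a repeated pair of characters
repeated_pair <- str_subset(sample_words, "(..).*\\1")
print(repeated_pair)   # "church" "banana"

# Contain one character repeated in at least three places
three_times <- str_subset(sample_words, "(.).*\\1.*\\1")
print(three_times)     # "eleven" "banana"
```

`str_subset()` returns the matching strings as a plain character vector, which is easier to inspect or test than the rendered output of `str_view()`.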