The first step is to load my packages:
library(dplyr)
library(ggplot2)
library(stringr)
Now let’s begin by loading the data:
degrees <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv", header = TRUE, sep = ",")
str(degrees)
## 'data.frame': 174 obs. of 3 variables:
## $ FOD1P : chr "1100" "1101" "1102" "1103" ...
## $ Major : chr "GENERAL AGRICULTURE" "AGRICULTURE PRODUCTION AND MANAGEMENT" "AGRICULTURAL ECONOMICS" "ANIMAL SCIENCES" ...
## $ Major_Category: chr "Agriculture & Natural Resources" "Agriculture & Natural Resources" "Agriculture & Natural Resources" "Agriculture & Natural Resources" ...
Now we have to look for all the majors that contain “DATA” or “STATISTICS”:
grep(pattern = 'data|statistics', degrees$Major, value = TRUE, ignore.case = TRUE)
## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"
## [3] "STATISTICS AND DECISION SCIENCE"
There are only three. How disappointing!
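grep() with value = TRUE returns only the matching strings. If the whole rows are wanted (for example, to keep Major_Category next to each match), a logical index from grepl() is one option. This is a minimal sketch: `toy` is a hypothetical stand-in for `degrees` that holds only a few Major values from the output above, so it runs without downloading the file.

```r
# Hypothetical toy data frame standing in for `degrees`;
# the Major values are copied from the output shown above.
toy <- data.frame(
  Major = c("GENERAL AGRICULTURE",
            "ANIMAL SCIENCES",
            "COMPUTER PROGRAMMING AND DATA PROCESSING",
            "STATISTICS AND DECISION SCIENCE"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE/FALSE per row, so it can be used to subset rows
keep <- grepl("data|statistics", toy$Major, ignore.case = TRUE)
matches <- toy[keep, , drop = FALSE]
print(matches)
```

The same subsetting should work on the real `degrees` data frame by swapping `toy` for `degrees`.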
The next task is to transform data that prints like this:
[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"
into a format like this:
c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
As I understand the question, it wants me to take a character vector and print it out as the R code that would create that vector.
The only tricky part, which took me longer to figure out than I am proud to admit, is that you need a paste within a paste, i.e. “Paste Inception”: the inner paste wraps each element in quotes and joins them with commas, and the outer paste wraps the result in c( and ).
l <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
x <- paste('c(', paste('"', l, '"', sep = "", collapse = ", "), ')', sep = "")
writeLines(x)
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
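The nested call can be easier to follow when it is split into two steps. The sketch below does that on a shortened vector, then checks that the printed text parses back into the original vector. The names `quoted` and `round_trip` are mine, and the three-element vector is only there to keep the example short.

```r
l <- c("bell pepper", "bilberry", "lime")

# Inner step: wrap every element in double quotes
quoted <- paste0('"', l, '"')

# Outer step: join with ", " and wrap the whole thing in c( ... )
x <- paste0("c(", paste(quoted, collapse = ", "), ")")
writeLines(x)

# Sanity check: evaluating the text as R code should give back `l`
round_trip <- eval(parse(text = x))
print(identical(round_trip, l))
```

Base R's dput(l) prints a vector in this c(...) form as well, which is a handy way to compare against the hand-built string.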
“(.)\1\1”
This matches any single character (anything except a newline) that appears three times in a row, anywhere in the string, e.g. “aaa”.
“(.)(.)\2\1”
This matches any two characters followed by the same two characters in reverse order, e.g. “abba”.
“(..)\1”
This matches any two characters that are immediately repeated, e.g. “a1a1”.
“(.).\1.\1”
This matches a character, then any character, then the first character again, then any character, and then the first character once more, e.g. “abaca”.
"(.)(.)(.).*\3\2\1"
This matches any three characters, followed by zero or more characters of any kind (not just numbers), followed by the same three characters in reverse order, e.g. “abccba” or “abc1cba”.
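One way to check these descriptions is to run each pattern against a string that should match. The sketch below uses base R's grepl() with perl = TRUE rather than str_view(), so it needs no packages. The pattern names and the example strings are my own, and the backslashes are doubled because each pattern is written as an R string.

```r
# Each regex from above, written as an R string (backslashes doubled)
patterns <- c(
  triple   = "(.)\\1\\1",
  mirror   = "(.)(.)\\2\\1",
  pair     = "(..)\\1",
  every2   = "(.).\\1.\\1",
  reverse3 = "(.)(.)(.).*\\3\\2\\1"
)

# One string per pattern that should match according to the descriptions
examples <- c("aaa", "abba", "a1a1", "abaca", "abc-cba")

# Test pattern i against example i
hits <- mapply(function(p, s) grepl(p, s, perl = TRUE), patterns, examples)
print(hits)

# "abc" has no repeated characters, so no pattern should match it
misses <- sapply(patterns, grepl, x = "abc", perl = TRUE)
print(misses)
```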
I was struggling to figure out this regex, which matches strings that start and end with the same character, so I tried it on a few test words:
y <- c("bob", "toot", "gag", "oo")
str_view(y, "^(.)((.*\\1$)|\\1$)")
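To see what the pattern accepts and rejects, it helps to add a couple of words that should fail. The sketch below uses base R's grepl() instead of str_view(). It also tries a shorter pattern, "^(.).*\\1$", which I believe is equivalent because `.*` can match zero characters. The extra test words "bobs" and "a" are my additions.

```r
y <- c("bob", "toot", "gag", "oo", "bobs", "a")

# Pattern from above: first character, then either anything ending
# in that character, or that character straight away
original <- grepl("^(.)((.*\\1$)|\\1$)", y, perl = TRUE)

# Shorter candidate: `.*` may be empty, so the second branch looks redundant
simplified <- grepl("^(.).*\\1$", y, perl = TRUE)

print(data.frame(word = y, original, simplified))
```

Note that neither version matches a single-character string such as "a", since both require the first character to appear a second time.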
The hint on this one is that it asks specifically for a letter; I first wrote it as if I were looking for any character. The steps are:
1. Find the pair of letters with ‘([A-Za-z][A-Za-z])’. Note that the letter specification needs to be in parentheses so that it can be referenced later.
2. Skip over any text in between with ‘.*’.
3. Find a match for the same pair with ‘\1’.
yy <- c("church", "toto", "yoyo", "appropriate")
str_view(yy, "([A-Za-z][A-Za-z]).*\\1")
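The difference between “any character” and “a letter” shows up once the input contains digits. This is a small sketch with base R's grepl(); the extra strings "apple" and "12-12" are my own test cases, not part of the exercise.

```r
yy <- c("church", "toto", "yoyo", "appropriate", "apple", "12-12")

# Letters only: the repeated pair must consist of two letters
letters_only <- grepl("([A-Za-z][A-Za-z]).*\\1", yy, perl = TRUE)

# Any character: the repeated pair may be digits, punctuation, etc.
any_character <- grepl("(..).*\\1", yy, perl = TRUE)

print(data.frame(word = yy, letters_only, any_character))
```

"12-12" contains a repeated pair, but not a repeated pair of letters, so only the any-character version matches it.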
Again, notice the letters! This pattern finds a letter that appears at least three times in a word:
z <- c("eleven", "believe", "tomorrow", "individual")
str_view(z, "([A-Za-z]).*\\1.*\\1")
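As a cross-check, the regex result can be compared with a plain count of letters per word. This is only a sketch: the helper names `has_triple_regex` and `has_triple_count` are hypothetical, and "letter" is an extra word I added because no letter in it appears three times.

```r
z <- c("eleven", "believe", "tomorrow", "individual", "letter")

# Regex approach: one letter, then the same letter twice more later on
has_triple_regex <- function(w) {
  grepl("([A-Za-z]).*\\1.*\\1", w, perl = TRUE)
}

# Counting approach: split into characters, keep letters, tabulate
has_triple_count <- function(w) {
  vapply(strsplit(w, ""), function(ch) {
    ch <- ch[grepl("[A-Za-z]", ch)]
    length(ch) > 0 && max(table(ch)) >= 3
  }, logical(1))
}

by_regex <- has_triple_regex(z)
by_count <- has_triple_count(z)
print(data.frame(word = z, by_regex, by_count))
```

Both approaches are case-sensitive, so "E" and "e" would count as different letters here.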