Problem 1
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringr)
download.file('https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv', 'data.txt')
df <- read.csv('data.txt')
data_statistics <- df %>% filter(str_detect(Major, 'DATA|STATISTICS'))
data_statistics
##   FOD1P                                         Major          Major_Category
## 1  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 2  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics
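The pattern 'DATA|STATISTICS' is case-sensitive, which is fine for the upper-case majors shown above. As a minimal sketch, assuming a small hand-built data frame (the name majors_sample and the GENERAL AGRICULTURE row are illustrative and are not read from the CSV), the same filter can be made case-insensitive by wrapping the pattern in regex(ignore_case = TRUE):

```r
library(dplyr)
library(stringr)

# Illustrative rows only: two majors from the output above plus one
# non-matching row added for contrast.
majors_sample <- data.frame(
  Major = c("STATISTICS AND DECISION SCIENCE",
            "COMPUTER PROGRAMMING AND DATA PROCESSING",
            "GENERAL AGRICULTURE"),
  stringsAsFactors = FALSE
)

# regex() wraps the pattern so that matching ignores letter case.
pattern <- regex("data|statistics", ignore_case = TRUE)

# Same filter as above, applied to the sample rows.
hits <- majors_sample %>% filter(str_detect(Major, pattern))
hits$Major
```

This only matters if the input might contain mixed-case majors; for the file used here the original pattern already finds all three rows.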
Problem 2
test <- "[1] \"bell pepper\" \"bilberry\" \"blackberry\""
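# Extract each quoted item, including its surrounding double quotes
res <- str_extract_all(test, '"[a-z ]+"')
# str_extract_all() returns a list; flatten it into a character vector
final <- unlist(res, use.names = FALSE)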
final
## [1] "\"bell pepper\"" "\"bilberry\"" "\"blackberry\""
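This looks a little funky because print() shows every double quote stored inside a string as \". The backslashes are not part of the strings; they are only print()'s escaping of the literal " characters that the regex captured. Calling writeLines(final) displays each string without the backslashes, for example "bell pepper".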
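If the goal is a character vector whose elements do not contain the quote characters at all, one option is to remove them after extraction. The following is a minimal sketch, not the only way to do it; the names quoted and fruits are introduced here for illustration:

```r
library(stringr)

test <- "[1] \"bell pepper\" \"bilberry\" \"blackberry\""

# Step 1: pull out each quoted item, quotes included (same pattern as above).
quoted <- str_extract_all(test, '"[a-z ]+"')[[1]]

# Step 2: delete the literal double quotes from each item.
fruits <- str_remove_all(quoted, '"')

print(fruits)       # print() adds its own display quotes around each element
writeLines(quoted)  # writeLines() shows the stored characters as they are
```

Comparing the two calls at the end shows the difference between quotes that print() adds for display and quotes that are actually stored in the string.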
Problem 3
test_cases <- c('aaa', 'abba', 'abab', 'aaaa', 'abaca', 'abccba', 'acdc', 'acdddddca')
a <- '(.)\\1\\1' ## The backslash is doubled so that the R string holds the regex (.)\1\1
b <- "(.)(.)\\2\\1"
c <- '(..)\\1' ## Again, the doubled backslash gives the regex (..)\1
d <- "(.).\\1.\\1"
e <- "(.)(.)(.).*\\3\\2\\1"
str_view_all(test_cases, e)
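str_view_all() shows one pattern at a time. As a rough sketch for checking all five patterns against every test case at once, the patterns can be collected in a named vector and passed to str_detect(); the names patterns and hits are introduced here for illustration:

```r
library(stringr)

test_cases <- c('aaa', 'abba', 'abab', 'aaaa', 'abaca', 'abccba', 'acdc', 'acdddddca')

# The same five patterns, stored under the letters used above.
patterns <- c(
  a = '(.)\\1\\1',
  b = '(.)(.)\\2\\1',
  c = '(..)\\1',
  d = '(.).\\1.\\1',
  e = '(.)(.)(.).*\\3\\2\\1'
)

# writeLines() shows the regex the engine receives: a single backslash.
writeLines(patterns[["a"]])

# One column per pattern, one row per test string; TRUE means a match.
hits <- sapply(patterns, function(p) str_detect(test_cases, p))
rownames(hits) <- test_cases
hits
```

Reading across a row of the resulting logical matrix shows which patterns a given string satisfies; for instance, abba should be TRUE only in column b.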
(.)\1\1 matches the same character three times in a row, such as aaa.
(.)(.)\2\1 matches a pair of characters followed by the same pair in reverse order, such as abba.
(..)\1 matches any two characters repeated, such as abab.
(.).\1.\1 matches a character, then any character, then the first character again, then any character, then the first character a third time, such as abaca.
(.)(.)(.).*\3\2\1 matches three characters, followed by zero or more of any characters, followed by the same three characters in reverse order, such as abcddddcba.
Problem 4
test_cases <- c('arma', 'alabama', 'church', 'chinchilla', 'mississippi', 'eleven')
a <- '^(.).*\\1$'
b <- '.*(..).*\\1.*'
c <- '.*(.).*\\1.*\\1.*'
str_view_all(test_cases, c)
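^(.).*\1$ matches words that start and end with the same character (this assumes that the word is at least 2 letters long).
.* (..).* \1.* matches words that contain a repeated pair of letters, such as church.
.* (.).* \1.* \1.* matches words that contain one letter repeated in at least three places, such as eleven.
Please note that the added spaces in these regexes are not part of the regular expressions. I had to add them, or else R Markdown would treat the *’s as formatting rather than as part of the expression.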
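As a quick check, the three patterns can be run against the test cases with str_subset(), which returns only the strings that match. This is a minimal sketch; the variable names are introduced here for illustration, and the last pattern is a hypothetical variant (not part of the answer above) that drops the two-letter assumption by making everything after the first character optional:

```r
library(stringr)

test_cases <- c('arma', 'alabama', 'church', 'chinchilla', 'mississippi', 'eleven')

same_ends   <- '^(.).*\\1$'         # starts and ends with the same character
repeat_pair <- '.*(..).*\\1.*'      # contains a repeated pair of letters
three_times <- '.*(.).*\\1.*\\1.*'  # one letter appears at least three times

str_subset(test_cases, same_ends)
str_subset(test_cases, repeat_pair)
str_subset(test_cases, three_times)

# Hypothetical variant: the optional group lets a one-letter word match too.
same_ends_any_length <- '^(.)(.*\\1)?$'
str_detect(c('a', 'ab', 'arma'), same_ends_any_length)
```

On these test cases the three patterns should pick out arma and alabama; church, chinchilla, and mississippi; and alabama, mississippi, and eleven, respectively.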