library(tidyverse)
library(dplyr)
library(stringr)
Let’s use the College Majors data set and filter for the majors whose names include the words “data” or “statistics”.
I will use dplyr’s filter function together with grepl to keep only those rows. grepl is the pattern matcher here: “DATA” or “STATISTICS” may appear anywhere in a major’s name rather than being the whole value, so an exact match such as Major == “DATA” would not work.
major.data<-read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv",header = TRUE)
major.data<-major.data%>%filter(grepl("DATA",Major)|grepl("STATISTICS",Major))
print(major.data)
## FOD1P Major Major_Category
## 1 6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS Business
## 2 2101 COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3 3702 STATISTICS AND DECISION SCIENCE Computers & Mathematics
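The same filter can also be written with stringr’s str_detect and a single alternation pattern. This is only a sketch of an alternative, not a replacement for the code above: the small `majors` data frame is a hand-made stand-in so the example runs without downloading the file, and “GENERAL BUSINESS” is included purely as a non-matching row.

```r
library(dplyr)
library(stringr)

# Hand-made stand-in for the majors table (assumption: not the full data set)
majors <- data.frame(
  Major = c("MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
            "COMPUTER PROGRAMMING AND DATA PROCESSING",
            "STATISTICS AND DECISION SCIENCE",
            "GENERAL BUSINESS"),
  stringsAsFactors = FALSE
)

# "DATA|STATISTICS" matches either word anywhere in the string
matches <- majors %>% filter(str_detect(Major, "DATA|STATISTICS"))
print(matches$Major)
```

Inside filter, the column is referred to by its bare name (Major), so the data frame does not need to be repeated.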
For this exercise, we are given a vector of fruits and must turn it into a single string of the form “c(ourlist)”. First, I realized each item needs quotes around it before the items are joined. I could not get an escaped double quote (\") to come out cleanly with str_c the way a single quote (') does, so I will use single quotes for now.
fruit<-c("bell pepper","bilberry","blackberry","blood orange","blueberry","cantaloupe","chili pepper","cloudberry","elderberry","lime","lychee","mulberry","olive","berry")
fruit<-str_c("'",fruit,"'")
fruit<-str_c(fruit,collapse = ",")
fruit<-str_c("c(",fruit,")")
print(fruit)
## [1] "c('bell pepper','bilberry','blackberry','blood orange','blueberry','cantaloupe','chili pepper','cloudberry','elderberry','lime','lychee','mulberry','olive','berry')"
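A possible double-quote version is sketched below. It assumes the difficulty came from print(), which displays strings in escaped form (with backslashes), while writeLines() shows the raw characters. The fruit list is shortened for illustration, and the names `quoted` and `result` are made up for this example.

```r
library(stringr)

fruit <- c("bell pepper", "bilberry", "blackberry")  # shortened list for illustration

# "\"" is a one-character string holding a double quote
quoted <- str_c("\"", fruit, "\"")
result <- str_c("c(", str_c(quoted, collapse = ", "), ")")

print(result)       # displays the escaped form, backslashes included
writeLines(result)  # displays c("bell pepper", "bilberry", "blackberry")
```

The backslashes shown by print() are not part of the string itself; they are only how R displays a double quote inside a quoted string.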
(.)\1\1 The pattern matches the same character three times in a row. mmmary -> [mmm]
“(.)(.)\2\1” The pattern matches two characters followed by the same two characters in reverse order. abba -> [abba] (wowo does not match, because its second pair is not reversed.)
“(..)\1” The pattern matches a pair of characters that is immediately repeated. banana -> [anan]
“(.).\1.\1” The pattern matches a character, then any character, then the first character again, then any character, then the first character a third time. lalol -> [l-l-l]
“(.)(.)(.).*\3\2\1” The pattern matches three characters, followed by zero or more characters of any kind, followed by the same three characters in reverse order. pawxwap -> [paw-wap]
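These descriptions can be checked with str_extract, which returns the first match in each string. In this sketch the variable names are made up, and “abba” and “wowo” are extra test strings chosen here to show a match and a non-match for the second pattern.

```r
library(stringr)

# In R source each regex backslash is doubled: \1 is written "\\1"
triple   <- str_extract("mmmary",  "(.)\\1\\1")
mirror   <- str_extract("abba",    "(.)(.)\\2\\1")
pair     <- str_extract("banana",  "(..)\\1")
spaced   <- str_extract("lalol",   "(.).\\1.\\1")
reversed <- str_extract("pawxwap", "(.)(.)(.).*\\3\\2\\1")

# "wowo" repeats the pair in the same order, so the mirrored pattern fails
no_match <- str_detect("wowo", "(.)(.)\\2\\1")

print(c(triple, mirror, pair, spaced, reversed))
print(no_match)
```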
Start and end with the same character.
str_view(c("wow","not","me"),"^(.).*\\1$")
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)
str_view(c("church","banana","play"),"([a-z][a-z]).*\\1")
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)
str_view("eleven","(.).*\\1.*\\1")
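As a quick check, the three patterns can be run through str_subset on one vector of words. The word list and variable names are made up for illustration, and the third pattern uses .* between the back-references so that the repeated letter does not have to sit exactly one character apart.

```r
library(stringr)

words <- c("wow", "church", "eleven", "banana", "play", "not")

# Start and end with the same character
same_ends <- str_subset(words, "^(.).*\\1$")

# Contain a repeated pair of letters
repeated_pair <- str_subset(words, "([a-z][a-z]).*\\1")

# Contain one letter in at least three places
three_places <- str_subset(words, "([a-z]).*\\1.*\\1")

print(same_ends)      # "wow"
print(repeated_pair)  # "church" "banana"
print(three_places)   # "eleven" "banana"
```

str_subset returns only the matching strings, which makes it easier to compare results across patterns than reading highlighted output from str_view.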