library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library(stringr)
majors_raw <- read.csv(url("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"))
majors_frame <- as.data.frame(majors_raw)
desired_majors <- majors_frame %>%
  filter(str_detect(Major, regex("DATA|STATISTICS")))
desired_majors
##   FOD1P                                         Major          Major_Category
## 1  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 2  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics
Write code that transforms the data below:
[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"
Into a format like this:
c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
library(stringr)
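fruit_names <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
# Wrap each name in double quotes, join the names with ", ", then add the surrounding c( and ).
fruits <- str_c("c(", str_flatten(str_c('"', fruit_names, '"'), ", "), ")")
# writeLines() shows the string without escaping the inner quotes.
writeLines(fruits)
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
# The input is a character vector of fruit names; the result is a single string holding the R code that would recreate that vector.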
(.)\1\1 This matches any string that contains the same character (anything except a newline) three times in a row. Because the . sits inside the capturing parentheses (), we can refer back to whatever it matched with \N, where N is a number from 1 to 9. The first \1 requires the captured character to appear again immediately, and the second \1 requires it once more. The match is case-sensitive and can occur anywhere in the string. For example, AA will not match, but AAA will. AAaAA will not match, since the lowercase a breaks the run, but AAAaAAA will. AAAA also matches, because it contains AAA, and so does AAABBBCCC.
"(.)(.)\\2\\1" Because the pattern is written as an R string inside quotes, each backslash has to be doubled, so this is effectively the regular expression (.)(.)\2\1. It captures two characters and then requires them again in reverse order. For example, abba will match: a is the first (.), b is the second (.), and the expression then looks for b as the 3rd character and a as the 4th. Another match is cddce, since it contains cddc, where the 1st and 4th characters match, as do the 2nd and 3rd.
(..)\1 This expression captures any two characters and requires the same two characters to follow immediately. For example, ABAB matches because AB is followed by another AB. ABABAB and ABABABAB also match, since each of them contains ABAB. The two captured characters do not have to differ, so aaaa matches as well, while ABBA does not. It simply takes 2 characters and repeats them once.
(.).\1.\1 This expression captures a character and requires it to appear again in the 3rd and 5th positions of the match, while the 2nd and 4th characters can be anything. For example, abaca is a match, as is ofogoa.
(.)(.)(.).*\3\2\1 This expression matches text of at least 6 characters in which three consecutive characters appear again later in reverse order, with any number of characters (including none) in between. Matches include abccba, abcddddcba, and abctyfgtyuuyghcba.
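One way to check these readings is to run each pattern against a few test strings with str_detect(), which reports TRUE when the pattern is found anywhere in the string. The sketch below assumes stringr is installed; the variable names and the extra test strings (AAAA, abab, ABBA, abcde, abcabc) are illustrative choices, not part of the exercise.

```r
library(stringr)

# Inside an R string literal each backreference needs a doubled backslash.
three_in_a_row   <- "(.)\\1\\1"
mirrored_pair    <- "(.)(.)\\2\\1"
doubled_pair     <- "(..)\\1"
every_other      <- "(.).\\1.\\1"
reversed_triplet <- "(.)(.)(.).*\\3\\2\\1"

# Same character three times in a row, anywhere in the string.
str_detect(c("AA", "AAA", "AAaAA", "AAAA"), three_in_a_row)
## [1] FALSE  TRUE FALSE  TRUE

# Two characters followed by the same two in reverse order.
str_detect(c("abba", "cddce", "abab"), mirrored_pair)
## [1]  TRUE  TRUE FALSE

# Two characters followed immediately by the same two characters.
str_detect(c("ABAB", "ABABAB", "ABBA"), doubled_pair)
## [1]  TRUE  TRUE FALSE

# One character recurring in the 1st, 3rd, and 5th positions of the match.
str_detect(c("abaca", "ofogoa", "abcde"), every_other)
## [1]  TRUE  TRUE FALSE

# Three characters that reappear later in reverse order.
str_detect(c("abccba", "abcddddcba", "abctyfgtyuuyghcba", "abcabc"), reversed_triplet)
## [1]  TRUE  TRUE  TRUE FALSE
```

Because str_detect() looks for the pattern anywhere in the string, longer strings such as AAAA and ABABAB still count as matches; anchoring with ^ and $ would be needed to require that the whole string fits the pattern.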
Start and end with the same character. ^(.).*\1$ (This needs at least two characters, so a one-letter word is not matched.)
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.) (.)(.).*?\1\2
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.) (.).*\1.*\1
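A quick way to try patterns for these three conditions is str_subset(), which keeps only the strings that match. This is a minimal sketch assuming stringr is installed. The word list and variable names are made up for illustration. For the third condition it uses (.).*\1.*\1, which captures one letter and requires it to appear twice more later in the word.

```r
library(stringr)

# Hypothetical sample words chosen to exercise each pattern.
test_words <- c("area", "church", "eleven", "banana", "stats", "apple")

same_first_last <- "^(.).*\\1$"         # first character equals last character
repeated_pair   <- "(.)(.).*?\\1\\2"    # a two-letter pair that occurs again later
letter_x3       <- "(.).*\\1.*\\1"      # one letter in at least three places

str_subset(test_words, same_first_last)
## [1] "area"  "stats"

str_subset(test_words, repeated_pair)
## [1] "church" "banana"

str_subset(test_words, letter_x3)
## [1] "eleven" "banana"
```

A pattern that only uses optional or lazy quantifiers, with no backreference, can match the empty string and so would return every word; the backreference \1 is what forces the same letter to recur.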