library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
majors <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/all-ages.csv")
Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors whose names contain either “DATA” or “STATISTICS”.
majors_with_data_or_stats <- majors %>%
  filter(str_detect(Major, regex("DATA|STATISTICS", ignore_case = TRUE)))
print(majors_with_data_or_stats)
## Major_code Major
## 1 2101 COMPUTER PROGRAMMING AND DATA PROCESSING
## 2 3702 STATISTICS AND DECISION SCIENCE
## 3 6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS
## Major_category Total Employed Employed_full_time_year_round
## 1 Computers & Mathematics 29317 22828 18747
## 2 Computers & Mathematics 24806 18808 14468
## 3 Business 156673 134478 118249
## Unemployed Unemployment_rate Median P25th P75th
## 1 2265 0.09026422 60000 40000 85000
## 2 1138 0.05705405 70000 43000 102000
## 3 6186 0.04397714 72000 50000 100000
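If only the major names are needed rather than every column, str_subset() can return them directly. The sketch below is an illustration under assumptions: majors_demo is a small, hypothetical stand-in for the downloaded data frame so that the example runs without a network connection, and the upper-case pattern assumes the Major column is stored in capitals, as in the output above.

```r
library(stringr)

# Hypothetical stand-in for the `majors` data frame (illustration only).
majors_demo <- data.frame(
  Major = c(
    "COMPUTER PROGRAMMING AND DATA PROCESSING",
    "STATISTICS AND DECISION SCIENCE",
    "GENERAL AGRICULTURE"
  )
)

# Keep only the names that contain DATA or STATISTICS.
matching_names <- str_subset(majors_demo$Major, "DATA|STATISTICS")
print(matching_names)
```

With the real data, the same call would be applied to majors$Major.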
Write code that transforms the data below:
[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"
Into a format like this:
c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
## [1] "bell pepper" "bilberry" "blackberry" "blood orange" "blueberry"
## [6] "cantaloupe" "chili pepper" "cloudberry" "elderberry" "lime"
## [11] "lychee" "mulberry" "olive" "salal berry"
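The code for this step is not shown above. One possible approach, offered as a sketch rather than the original solution, is to treat the printed text as a single string, capture everything between double quotes, and then paste the pieces back together as a c(...) call. The names raw_fruit, fruit, and fruit_code are hypothetical, and the input is assumed to use straight double quotes.

```r
library(stringr)

# The printed vector, pasted in as one raw string (assumed input form).
raw_fruit <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Capture the text between each pair of double quotes;
# column 2 of the match matrix holds the first capture group.
fruit <- str_match_all(raw_fruit, '"([^"]+)"')[[1]][, 2]

# Rebuild the vector as R source text: c("a", "b", ...).
fruit_code <- str_c("c(", str_c('"', fruit, '"', collapse = ", "), ")")
writeLines(fruit_code)
```

The quotes are consumed by the pattern (rather than matched with lookarounds) so that the whitespace between two quoted items cannot be mistaken for an item.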
Describe, in words, what these expressions will match:
(..)\1
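This matches any two characters, captured by (..), followed immediately by the same two characters, as in “abab”. In an R string the backreference is written with a doubled backslash, because a single backslash would be read by R as the start of a string escape; the doubled form passes a literal backslash, and therefore \1, to the regex engine.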
text3 <- "aedaddaxyzabab"
str_view(text3, "(..)\\1", match = TRUE)
## [1] │ aedaddaxyz<abab>
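The doubled backslash can be checked directly. This short sketch prints the pattern the way the regex engine receives it and compares it with a raw string literal; raw strings assume R 4.0 or later, and the variable name pattern is only illustrative.

```r
# The R string literal uses two backslashes...
pattern <- "(..)\\1"

# ...but the string itself holds only one, as writeLines() shows.
writeLines(pattern)   # prints (..)\1
nchar(pattern)        # 6 characters: ( . . ) \ 1

# A raw string (R >= 4.0) spells the same pattern without doubling.
identical(pattern, r"((..)\1)")
```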
“(.).\1.\1”
This matches a five-character sequence: a character, then any character, then the first character again, then any character, then the first character a third time (for example, “abaza”).
text4 <- "dcbabazadcbsdsms"
str_view(text4, "(.).\\1.\\1", match = TRUE)
## [1] │ dcb<abaza>dcb<sdsms>
“(.)(.)(.).*\3\2\1”
This matches any three characters, followed by zero or more characters of any kind, followed by the same three characters in reverse order (for example, “abczcba”).
text5 <- "stuwyqabczcbaioeuo"
str_view(text5, "(.)(.)(.).*\\3\\2\\1", match = TRUE)
## [1] │ stuwyq<abczcba>ioeuo
Construct regular expressions to match words that:
Start and end with the same character.
words <- c("banana", "apple", "cherry", "radar", "level", "pop")
str_detect(words, "^(.).*\\1$")
## [1] FALSE FALSE FALSE TRUE TRUE TRUE
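One edge case worth noting: the pattern above needs at least two characters, so a one-letter word such as “a” is not matched. If single-character words should count as starting and ending with the same character (an assumption about the intent of the exercise), the tail can be made optional. The vector edge_words is a hypothetical test set.

```r
library(stringr)

edge_words <- c("a", "pop", "apple")

# Original pattern: requires a second occurrence of the first character.
two_plus <- str_detect(edge_words, "^(.).*\\1$")
print(two_plus)      # FALSE TRUE FALSE

# Optional tail: a lone character also counts as a match.
one_plus <- str_detect(edge_words, "^(.)(.*\\1)?$")
print(one_plus)      # TRUE TRUE FALSE
```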
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)
pair_words <- c("church", "success", "committee", "occurred", "aggressive", "necessary")
str_detect(pair_words, "(..).*\\1")
## [1] TRUE FALSE FALSE FALSE FALSE FALSE
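Only one word in the test vector above contains a repeated pair, so a few more positive cases help confirm the pattern. In this sketch the vector candidates is a hypothetical test set; str_subset() lists the matching words and str_match() shows which pair was captured.

```r
library(stringr)

candidates <- c("church", "banana", "cucumber", "success")

# Words that contain some two-character pair at least twice.
repeated <- str_subset(candidates, "(..).*\\1")
print(repeated)      # "church" "banana" "cucumber"

# Column 2 of the match matrix is the captured pair (NA when no match).
pairs <- str_match(candidates, "(..).*\\1")[, 2]
print(pairs)         # "ch" "an" "cu" NA
```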
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)
triple_words <- c("church", "success", "committee", "occurred", "aggressive", "necessary", "eleven", "accelerate")
str_detect(triple_words, "(.).*\\1.*\\1")
## [1] FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE
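To see which letter is responsible for each match, the capture group can be inspected with str_match(). This is a small sketch on a hypothetical subset of the words above; the letter reported is the one found by the leftmost match.

```r
library(stringr)

sample_words <- c("church", "success", "eleven", "accelerate")

# Column 2 holds the letter that occurs at least three times (NA if none).
repeated_letter <- str_match(sample_words, "(.).*\\1.*\\1")[, 2]
print(repeated_letter)   # NA "s" "e" "e"
```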