I used the following libraries in order to answer the below questions.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(stringr)
You can find the raw data used to answer this question here https://github.com/fivethirtyeight/data/blob/master/college-majors/majors-list.csv
college_majors <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv")
## Rows: 174 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): FOD1P, Major, Major_Category
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
college_majors
## # A tibble: 174 × 3
## FOD1P Major Major_Category
## <chr> <chr> <chr>
## 1 1100 GENERAL AGRICULTURE Agriculture & Natural Resources
## 2 1101 AGRICULTURE PRODUCTION AND MANAGEMENT Agriculture & Natural Resources
## 3 1102 AGRICULTURAL ECONOMICS Agriculture & Natural Resources
## 4 1103 ANIMAL SCIENCES Agriculture & Natural Resources
## 5 1104 FOOD SCIENCE Agriculture & Natural Resources
## 6 1105 PLANT SCIENCE AND AGRONOMY Agriculture & Natural Resources
## 7 1106 SOIL SCIENCE Agriculture & Natural Resources
## 8 1199 MISCELLANEOUS AGRICULTURE Agriculture & Natural Resources
## 9 1302 FORESTRY Agriculture & Natural Resources
## 10 1303 NATURAL RESOURCES MANAGEMENT Agriculture & Natural Resources
## # ℹ 164 more rows
college_majors |>
filter(str_detect(Major, "DATA|STATISTICS")) |>
count(Major, sort=TRUE)
## # A tibble: 3 × 2
## Major n
## <chr> <int>
## 1 COMPUTER PROGRAMMING AND DATA PROCESSING 1
## 2 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS 1
## 3 STATISTICS AND DECISION SCIENCE 1
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
Write code that transforms the data below:
[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”
Into a format like this:
c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)
*I was slightly confused by the wording of this question. I assumed what was meant was to create a new string vector whose output would match the output of c(fruits).
output_string <- "[1] \"bell pepper\" \"bilberry\" \"blackberry\" \"blood orange\"\n[5] \"blueberry\" \"cantaloupe\" \"chili pepper\" \"cloudberry\"\n[9] \"elderberry\" \"lime\" \"lychee\" \"mulberry\"\n[13] \"olive\" \"salal berry\""
"OG String Vector:"
## [1] "OG String Vector:"
cat(output_string)
## [1] "bell pepper" "bilberry" "blackberry" "blood orange"
## [5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
## [9] "elderberry" "lime" "lychee" "mulberry"
## [13] "olive" "salal berry"
new_word_vec <- function(string) {
words <- str_extract_all(string, '"[a-zA-Z ]+"')[[1]]
words <- str_replace_all(words, '"', '')
return(words)
}
"Desired Output:"
## [1] "Desired Output:"
c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
## [1] "bell pepper" "bilberry" "blackberry" "blood orange" "blueberry"
## [6] "cantaloupe" "chili pepper" "cloudberry" "elderberry" "lime"
## [11] "lychee" "mulberry" "olive" "salal berry"
vec <- new_word_vec(output_string)
"Reorganized output:"
## [1] "Reorganized output:"
print(vec)
## [1] "bell pepper" "bilberry" "blackberry" "blood orange" "blueberry"
## [6] "cantaloupe" "chili pepper" "cloudberry" "elderberry" "lime"
## [11] "lychee" "mulberry" "olive" "salal berry"
Describe, in words, what these expressions will match:
(.)\1\1
A character is repeated three times (.) selects any character, then
repeated twice consecutively
“(.)(.)\2\1”
This would select four character palindromes where the first two
characters are randomly selected and then the third character is the
second selection group and the fourth is the first selection
group
(..)\1
Selects the two consecutive characters and then repeats that group of
characters once
“(.).\1.\1”
First character group and then selects a single character immediately
after, then it repeats the first selection group with the next trailing
character and then finally repeats the first selection group
again
“(.)(.)(.).\3\2\1”
Captures three characters, then “.’ captures 0-any
characters except for a new line (essentially the rest of the string),
finally the 3rd then 2nd and 1st capture group are repeated in that
order
str_view(words, "\\b(.).*\\1$")
## [36] │ <america>
## [49] │ <area>
## [209] │ <dad>
## [213] │ <dead>
## [223] │ <depend>
## [258] │ <educate>
## [266] │ <else>
## [268] │ <encourage>
## [270] │ <engine>
## [278] │ <europe>
## [283] │ <evidence>
## [285] │ <example>
## [287] │ <excuse>
## [288] │ <exercise>
## [291] │ <expense>
## [292] │ <experience>
## [296] │ <eye>
## [386] │ <health>
## [394] │ <high>
## [450] │ <knock>
## ... and 16 more
str_view(words, "(..).*\\1")
## [48] │ ap<propr>iate
## [152] │ <church>
## [181] │ c<ondition>
## [217] │ <decide>
## [275] │ <environmen>t
## [487] │ l<ondon>
## [598] │ pa<ragra>ph
## [603] │ p<articular>
## [617] │ <photograph>
## [638] │ p<repare>
## [641] │ p<ressure>
## [696] │ r<emem>ber
## [698] │ <repre>sent
## [699] │ <require>
## [739] │ <sense>
## [858] │ the<refore>
## [903] │ u<nderstand>
## [946] │ w<hethe>r
str_view(words, "(.)(.*\\1){2,}")
## [48] │ a<pprop>riate
## [62] │ <availa>ble
## [86] │ b<elieve>
## [90] │ b<etwee>n
## [119] │ bu<siness>
## [221] │ d<egree>
## [229] │ diff<erence>
## [233] │ di<scuss>
## [265] │ <eleve>n
## [275] │ e<nvironmen>t
## [283] │ <evidence>
## [288] │ <exercise>
## [291] │ <expense>
## [292] │ <experience>
## [423] │ <indivi>dual
## [598] │ p<aragra>ph
## [684] │ r<eceive>
## [696] │ r<emembe>r
## [698] │ r<eprese>nt
## [845] │ t<elephone>
## ... and 2 more