Question 1

We are examining the list of college majors from the FiveThirtyEight article “The Economic Guide To Picking A College Major”.

Below, we import the data from the FiveThirtyEight GitHub repository.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

fileURL <- 'https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv'

majorsDF <- read.csv(fileURL)  # read.csv() accepts a URL directly, so wrapping it in url() is not needed

We are interested in finding majors whose names contain “DATA” or “STATISTICS”.

majorsOfInterest <- grep("DATA|STATISTICS", majorsDF$Major, value = TRUE, ignore.case = FALSE)

print(majorsOfInterest)

## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"     
## [3] "STATISTICS AND DECISION SCIENCE"
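The pattern above is case-sensitive (ignore.case = FALSE is also grep()'s default). That works here because the major names in this file appear to be written in upper case, as the output above suggests. A minimal sketch on a small, made-up vector (sampleMajors is hypothetical, not the FiveThirtyEight data) shows what ignore.case changes:

```r
# Hypothetical sample of major names; not taken from the FiveThirtyEight file
sampleMajors <- c("STATISTICS AND DECISION SCIENCE",
                  "COMPUTER PROGRAMMING AND DATA PROCESSING",
                  "ACCOUNTING",
                  "Data Science")

# Default case-sensitive matching skips the mixed-case "Data Science"
caseSensitive <- grep("DATA|STATISTICS", sampleMajors, value = TRUE)

# ignore.case = TRUE matches "DATA" or "STATISTICS" in any letter case
caseInsensitive <- grep("DATA|STATISTICS", sampleMajors, value = TRUE, ignore.case = TRUE)

print(caseSensitive)
print(caseInsensitive)
```

With stringr, str_subset(majorsDF$Major, "DATA|STATISTICS") should return the same matches as the grep() call above, and wrapping the pattern in regex(..., ignore_case = TRUE) gives the case-insensitive version.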

Question 2

We want to collapse the vector below into a single line of output, with the elements separated by commas.

foodVector <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")

str_flatten_comma(foodVector)

## [1] "bell pepper, bilberry, blackberry, blood orange, blueberry, cantaloupe, chili pepper, cloudberry, elderberry, lime, lychee, mulberry, olive, salal berry"
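str_flatten_comma() comes from stringr, which is attached with the tidyverse. For comparison, a base R sketch using paste() with collapse = ", " builds the same kind of single string; the vector here is a shortened copy of foodVector, for illustration only.

```r
# Shortened copy of foodVector, for illustration only
foodVector <- c("bell pepper", "bilberry", "blackberry", "blood orange")

# collapse = ", " joins every element into one string with ", " between elements
flattened <- paste(foodVector, collapse = ", ")
print(flattened)
```

str_flatten_comma() also takes a last argument; for example, str_flatten_comma(foodVector, last = ", and ") places ", and " before the final element.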

Question 3

Describe, in words, what these expressions will match:

(.)\1\1

Matches the same character three times in a row, e.g. “aaa”.

None of the strings in fruit or words meet this requirement, so both calls below return no matches.

str_view(fruit, "(.)\\1\\1")
str_view(words, "(.)\\1\\1")

str_view(c('abbba', 'goooal', 'ball'), "(.)\\1\\1") # returns only 'abbba' and 'goooal' with the repeating characters highlighted

## [1] │ a<bbb>a
## [2] │ g<ooo>al

"(.)(.)\\2\\1"

Matches two characters followed by the same two characters in reverse order, e.g. “abba”.

str_view(words, "(.)(.)\\2\\1")

##  [19] │ after<noon>
##  [43] │ <appa>rent
##  [53] │ <arra>nge
## [107] │ b<otto>m
## [112] │ br<illi>ant
## [174] │ c<ommo>n
## [230] │ d<iffi>cult
## [259] │ <effe>ct
## [329] │ f<ollo>w
## [422] │ in<deed>
## [470] │ l<ette>r
## [521] │ m<illi>on
## [581] │ <oppo>rtunity
## [582] │ <oppo>se
## [877] │ tom<orro>w
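A small check on made-up strings (the candidates vector is hypothetical) confirms the description, and also shows that the two captured characters do not have to differ: “aaaa” matches as well. This sketch uses base R's grepl() with perl = TRUE, i.e. the PCRE engine, which supports back-references.

```r
# Hypothetical test strings
candidates <- c("abba", "aaaa", "abab", "noon")

# TRUE where a pair of characters is immediately followed by the same pair reversed
reversedPair <- grepl("(.)(.)\\2\\1", candidates, perl = TRUE)
print(setNames(reversedPair, candidates))
```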

(..)\1

Matches the same two characters repeated back to back, in the same order, e.g. “abab”.

str_view(words,"(..)\\1")

## [696] │ r<emem>ber

str_view(fruit,"(..)\\1")

##  [4] │ b<anan>a
## [20] │ <coco>nut
## [22] │ <cucu>mber
## [41] │ <juju>be
## [56] │ <papa>ya
## [73] │ s<alal> berry
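Unlike "(.)(.)\\2\\1", this pattern requires the pair to repeat in the same order. A sketch on hypothetical strings, using regexpr() to locate the first match in each string and regmatches() to extract the matched text:

```r
# Hypothetical test strings
candidates <- c("abab", "abba", "banana", "remember")

# Which strings contain a two-character sequence repeated back to back?
# (perl = TRUE selects the PCRE engine)
hasRepeatedPair <- grepl("(..)\\1", candidates, perl = TRUE)

# regexpr() finds the first match; regmatches() returns the matched text
# and drops strings that have no match
firstMatch <- regmatches(candidates, regexpr("(..)\\1", candidates, perl = TRUE))

print(hasRepeatedPair)
print(firstMatch)
```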

"(.).\\1.\\1"

Matches a character, followed by any character, the original character again, any character, and the original character a third time, e.g. “abaca”.
```
str_view(words,"(.).\\1.\\1")
```
```
## [265] │ <eleve>n
```
```
str_view(fruit,"(.).\\1.\\1")
```
```
##  [4] │ b<anana>
## [56] │ p<apaya>
```
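Because each . matches any character, the characters between the repeats are not required to differ from the captured one. A quick check on hypothetical strings:

```r
# Hypothetical test strings
candidates <- c("abaca", "aaaaa", "eleven", "abcde")

# The captured character must appear at positions 1, 3 and 5 of a five-character
# window; positions 2 and 4 can be anything, including that same character
alternating <- grepl("(.).\\1.\\1", candidates, perl = TRUE)
print(setNames(alternating, candidates))
```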
"(.)(.)(.).*\\3\\2\\1"

Matches three characters, followed by any sequence of zero or more characters, followed by the same three characters in reverse order.
```
str_view(words,"(.)(.)(.).*\\3\\2\\1")
```
```
## [598] │ <paragrap>h
```
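The .* in the middle can match zero characters, so the reversed triple may follow the first one immediately. A sketch on hypothetical strings:

```r
# Hypothetical test strings
candidates <- c("abccba", "abcxyzcba", "paragraph", "abcabc")

# Three captured characters, any (possibly empty) gap, then the three in reverse
mirrored <- grepl("(.)(.)(.).*\\3\\2\\1", candidates, perl = TRUE)
print(setNames(mirrored, candidates))
```

“abcabc” does not match because the triple repeats in the same order rather than reversed.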

Question 4

Construct regular expressions to match words that:

Start and end with the same character.

str_view(words, "^(.).*\\1$")

##  [36] │ <america>
##  [49] │ <area>
## [209] │ <dad>
## [213] │ <dead>
## [223] │ <depend>
## [258] │ <educate>
## [266] │ <else>
## [268] │ <encourage>
## [270] │ <engine>
## [278] │ <europe>
## [283] │ <evidence>
## [285] │ <example>
## [287] │ <excuse>
## [288] │ <exercise>
## [291] │ <expense>
## [292] │ <experience>
## [296] │ <eye>
## [386] │ <health>
## [394] │ <high>
## [450] │ <knock>
## ... and 16 more
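One caveat, assuming single-letter words should count: "^(.).*\\1$" needs at least two characters, so a one-letter word such as “a” is skipped. Making the rest of the pattern optional covers that case. This sketch passes perl = TRUE to use the PCRE engine; the test words are hypothetical.

```r
# Hypothetical test words
candidates <- c("dad", "eye", "a", "apple")

# Pattern from above: the first character must reappear as the last one
twoOrMore <- grepl("^(.).*\\1$", candidates, perl = TRUE)

# The group (.*\\1)? is optional, so a single character on its own also matches
anyLength <- grepl("^(.)(.*\\1)?$", candidates, perl = TRUE)

print(rbind(twoOrMore, anyLength))
```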

Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)

str_view(words, "(..).*\\1")

##  [48] │ ap<propr>iate
## [152] │ <church>
## [181] │ c<ondition>
## [217] │ <decide>
## [275] │ <environmen>t
## [487] │ l<ondon>
## [598] │ pa<ragra>ph
## [603] │ p<articular>
## [617] │ <photograph>
## [638] │ p<repare>
## [641] │ p<ressure>
## [696] │ r<emem>ber
## [698] │ <repre>sent
## [699] │ <require>
## [739] │ <sense>
## [858] │ the<refore>
## [903] │ u<nderstand>
## [946] │ w<hethe>r
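Since . matches any character, (..) also accepts pairs that contain digits, spaces or punctuation. If the input can contain such characters, a bracket class restricts the repeated pair to letters. A sketch on hypothetical strings:

```r
# Hypothetical test strings
candidates <- c("church", "12x12", "photograph")

# (..) accepts any two characters, so the repeated "12" counts
anyPair <- grepl("(..).*\\1", candidates, perl = TRUE)

# [[:alpha:]]{2} only accepts two letters, so "12x12" no longer matches
letterPair <- grepl("([[:alpha:]]{2}).*\\1", candidates, perl = TRUE)

print(rbind(anyPair, letterPair))
```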

Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)

str_view(words, "(.).*\\1.*\\1")

##  [48] │ a<pprop>riate
##  [62] │ <availa>ble
##  [86] │ b<elieve>
##  [90] │ b<etwee>n
## [119] │ bu<siness>
## [221] │ d<egree>
## [229] │ diff<erence>
## [233] │ di<scuss>
## [265] │ <eleve>n
## [275] │ e<nvironmen>t
## [283] │ <evidence>
## [288] │ <exercise>
## [291] │ <expense>
## [292] │ <experience>
## [423] │ <indivi>dual
## [598] │ p<aragra>ph
## [684] │ r<eceive>
## [696] │ r<emembe>r
## [698] │ r<eprese>nt
## [845] │ t<elephone>
## ... and 2 more
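As a cross-check, the same condition can be tested without a regular expression by counting how often each character occurs. A minimal sketch, where hasTripleCharacter() is a hypothetical helper and the test words are made up:

```r
# Hypothetical test words
candidates <- c("eleven", "banana", "apple", "experience")

# Regex version: some character appears, then appears again twice later on
regexResult <- grepl("(.).*\\1.*\\1", candidates, perl = TRUE)

# Counting version (hypothetical helper): split a word into single characters,
# tabulate them, and check whether any character occurs three or more times
hasTripleCharacter <- function(word) {
  counts <- table(strsplit(word, "")[[1]])
  any(counts >= 3)
}
countResult <- vapply(candidates, hasTripleCharacter, logical(1), USE.NAMES = FALSE)

print(rbind(regexResult, countResult))
```

Both approaches should agree on every word; the regex is more compact, while the counting version makes the “three or more” threshold explicit and easy to change.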

Assignment3

Semyon Toybis

2024-02-09
