DATA607

#1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”

First, let's pull the data from the GitHub repository linked in the article.

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.6     v purrr   0.3.4
## v tibble  3.1.8     v dplyr   1.0.9
## v tidyr   1.2.0     v stringr 1.4.1
## v readr   2.1.2     v forcats 0.5.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

path = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv'
# read.csv() treats only double quotes as quote characters, so an apostrophe
# in a field is not read as an opening quote (the cause of read.table()'s
# "EOF within quoted string" warning with its default quote setting)
majors = read.csv(file=path, header=TRUE)

df = data.frame(majors)
head(df)

##   FOD1P                                 Major                  Major_Category
## 1  1100                   GENERAL AGRICULTURE Agriculture & Natural Resources
## 2  1101 AGRICULTURE PRODUCTION AND MANAGEMENT Agriculture & Natural Resources
## 3  1102                AGRICULTURAL ECONOMICS Agriculture & Natural Resources
## 4  1103                       ANIMAL SCIENCES Agriculture & Natural Resources
## 5  1104                          FOOD SCIENCE Agriculture & Natural Resources
## 6  1105            PLANT SCIENCE AND AGRONOMY Agriculture & Natural Resources

Next, let's identify the majors whose names contain “DATA” or “STATISTICS”. As the output below shows, three majors qualify: MANAGEMENT INFORMATION SYSTEMS AND STATISTICS, COMPUTER PROGRAMMING AND DATA PROCESSING, and STATISTICS AND DECISION SCIENCE.

majors %>% filter(str_detect(Major, "DATA|STATISTICS"))

##   FOD1P                                         Major          Major_Category
## 1  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 2  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics
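For comparison, here is a minimal base R sketch of the same alternation pattern. The `sample_majors` vector is a hypothetical stand-in rather than the dataset itself, and the mixed-case entry is included only to show that the match is case-sensitive.

```r
# Hypothetical sample of major names (not read from the dataset)
sample_majors <- c("GENERAL AGRICULTURE",
                   "COMPUTER PROGRAMMING AND DATA PROCESSING",
                   "STATISTICS AND DECISION SCIENCE",
                   "Data Science")

# "DATA|STATISTICS" matches any string that contains either substring
hits <- grepl("DATA|STATISTICS", sample_majors)
sample_majors[hits]

# The match is case-sensitive, so "Data Science" is only picked up
# when ignore.case = TRUE is passed
hits_ci <- grepl("DATA|STATISTICS", sample_majors, ignore.case = TRUE)
sample_majors[hits_ci]
```

Because the dataset stores major names in upper case, the case-sensitive form is enough there.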

#2 Write code that transforms the data below:

[1] “bell pepper” “bilberry” “blackberry” “blood orange”

[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”

[9] “elderberry” “lime” “lychee” “mulberry”

[13] “olive” “salal berry”

Into a format like this:

c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)

data = '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"

 [5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"  

 [9] "elderberry"   "lime"         "lychee"       "mulberry"    

 [13] "olive"        "salal berry"'
data

## [1] "[1] \"bell pepper\"  \"bilberry\"     \"blackberry\"   \"blood orange\"\n\n [5] \"blueberry\"    \"cantaloupe\"   \"chili pepper\" \"cloudberry\"  \n\n [9] \"elderberry\"   \"lime\"         \"lychee\"       \"mulberry\"    \n\n [13] \"olive\"        \"salal berry\""

w = c("bell pepper","bilberry","blackberry","blood orange")
x = c("blueberry","cantaloupe","chili pepper","cloudberry")
y = c("elderberry","lime","lychee","mulberry")
z = c("olive","salal berry")

join = c(w,x,y,z)
join

##  [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange" "blueberry"   
##  [6] "cantaloupe"   "chili pepper" "cloudberry"   "elderberry"   "lime"        
## [11] "lychee"       "mulberry"     "olive"        "salal berry"
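The vector above is typed in by hand. As an alternative, the following sketch derives it from the raw printed text with stringr, assuming the package is installed. The names `raw`, `quoted`, `fruits`, and `as_code` are illustrative only.

```r
library(stringr)

# Copy of the printed output from the exercise, stored as one string
raw <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Pull out every double-quoted token, then drop the surrounding quotes
quoted <- str_extract_all(raw, '"[^"]+"')[[1]]
fruits <- str_remove_all(quoted, '"')

# Build the c("...", "...") text form requested in the exercise
as_code <- str_c("c(", str_c('"', fruits, '"', collapse = ", "), ")")
writeLines(as_code)
```

The pattern `"[^"]+"` keeps only the quoted tokens, so the bracketed index markers such as `[5]` and the line breaks are ignored.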

#3 Describe, in words, what these expressions will match:

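“(.)\1\1” = One character followed by the same character twice more, i.e. three identical characters in a row (e.g. “aaa”).

“(.)(.)\2\1” = Two characters followed by the same two characters in reverse order (e.g. “abba”).

“(..)\1” = A two-character sequence immediately repeated once (e.g. “abab”).

“(.).\1.\1” = A five-character sequence in which the same character occupies positions 1, 3, and 5; positions 2 and 4 can be any character (e.g. “abaca”).

“(.)(.)(.).*\3\2\1” = Three characters, then any number of characters (possibly none), then the same three characters in reverse order (e.g. “abcxcba”).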
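These back-reference patterns can be checked with a short sketch, assuming stringr is installed. The example strings are made up for illustration. In an R string literal each regex backslash has to be doubled, so `\1` is written `"\\1"`.

```r
library(stringr)

# One illustrative string per pattern (hypothetical examples)
examples <- c("aaa", "abba", "abab", "abaca", "abcxcba")

patterns <- c("(.)\\1\\1",             # one character, three times in a row
              "(.)(.)\\2\\1",          # two characters, then the same two reversed
              "(..)\\1",               # a two-character block, repeated once
              "(.).\\1.\\1",           # same character at positions 1, 3 and 5
              "(.)(.)(.).*\\3\\2\\1")  # three characters ... then the same three reversed

# str_detect() is vectorised over both arguments, pairing each string
# with the pattern at the same position
matches <- str_detect(examples, patterns)
setNames(matches, examples)

# A contrasting case: "abab" contains no pair followed by its reverse
no_match <- str_detect("abab", "(.)(.)\\2\\1")
no_match
```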

#4 Construct regular expressions to match words that:

Start and end with the same character.

“^(.)(.*)\1$”

Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)

“([A-Za-z][A-Za-z]).*\1”

Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)

“([A-Za-z]).*\1.*\1”
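A quick way to sanity-check expressions like these is to run them against a few words. The sketch below assumes stringr is installed and uses a hypothetical word list. The third pattern is written with `.*` between the back-references so that the repeated letter may be separated by any number of characters, not exactly one.

```r
library(stringr)

# Hypothetical test words
words <- c("level", "church", "eleven", "banana", "apple")

same_ends   <- "^(.)(.*)\\1$"             # first and last character are equal
repeat_pair <- "([A-Za-z][A-Za-z]).*\\1"  # some letter pair occurs twice
three_times <- "([A-Za-z]).*\\1.*\\1"     # some letter occurs at least three times

ends_hits  <- words[str_detect(words, same_ends)]
pair_hits  <- words[str_detect(words, repeat_pair)]
three_hits <- words[str_detect(words, three_times)]

ends_hits
pair_hits
three_hits
```

Note that `same_ends` needs at least two characters, so a one-letter word such as “a” would not match it.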

DATA607_W4

Tyler Brown

2022-09-12