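library(RCurl)      # getURL() for downloading the CSV
library(stringr)    # string functions and the fruit example vector
library(tidyverse)  # %>% pipe and filter()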

#1.

Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”.

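# Download the majors list from fivethirtyeight's GitHub repository and parse it
getfile <- getURL("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv")
majors <- read.csv(text = getfile)

# Keep only the majors whose name contains "DATA" or "STATISTICS"
data_or_stat <- majors %>%
  filter(str_detect(Major, "DATA|STATISTICS"))
print(data_or_stat)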

##   FOD1P                                         Major          Major_Category
## 1  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 2  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 3  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics
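Because the chunk above needs a network connection, here is a minimal offline sketch of the same `str_detect()` alternation. The `sample_majors` vector is hypothetical: it reuses the three matches printed above plus two illustrative non-matching names, so it only shows how `"DATA|STATISTICS"` behaves, not what the full dataset contains.

```r
library(stringr)

# Hypothetical sample: the three matches shown above plus two non-matching names
sample_majors <- c(
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "STATISTICS AND DECISION SCIENCE",
  "MATHEMATICS",
  "ECONOMICS"
)

# The | alternation keeps a name if either substring appears anywhere in it
hits <- str_subset(sample_majors, "DATA|STATISTICS")
print(hits)

# Matching is case-sensitive; regex(ignore_case = TRUE) would also catch lower case
str_detect("statistics", "DATA|STATISTICS")
str_detect("statistics", regex("DATA|STATISTICS", ignore_case = TRUE))
```

Since the majors in the dataset are stored in upper case, the plain pattern is enough there; the `ignore_case` variant only matters for mixed-case input.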

#2

Write code that transforms the data below:

 [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
 [5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
 [9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"

Into a format like this:

c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")

needtoclean <-' "bell pepper"  "bilberry"     "blackberry"   "blood orange"

[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"  

[9] "elderberry"   "lime"         "lychee"       "mulberry"    

[13] "olive"        "salal berry" '

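# Extract every double-quoted name so that multi-word names such as
# "bell pepper" stay intact (a word-boundary split breaks them in two),
# then strip the surrounding quotes
cleaned <- needtoclean %>%
  str_extract_all('"[^"]+"') %>%
  .[[1]] %>%
  str_remove_all('"')

# Print the result in the requested c("...", ...) format
writeLines(str_c("c(", str_c('"', cleaned, '"', collapse = ", "), ")"))

## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")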
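As a base-R alternative that also keeps multi-word names such as "bell pepper" intact, `scan()` can tokenise the pasted console output directly: with its default whitespace separator it treats each double-quoted field as a single value. This is a sketch under the assumption that the pasted text contains no apostrophes (which `scan()` would also treat as quotes); the names `pasted`, `tokens` and `fruits` are illustrative.

```r
# Console output pasted as a single string (same text as needtoclean above)
pasted <- ' "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry" '

# scan() splits on whitespace but reads each double-quoted field as one value,
# stripping the quotes, so "bell pepper" stays whole
tokens <- scan(text = pasted, what = character(), quiet = TRUE)

# Drop the console index markers such as "[5]" and "[13]"
fruits <- tokens[!grepl("^\\[[0-9]+\\]$", tokens)]

# dput() prints the vector in c("...", ...) form
dput(fruits)
```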

The two exercises below are taken from R for Data Science, section 14.3.5.1 of the online version:

#3

Describe, in words, what these expressions will match:

(.)\1\1: The same character three times in a row, e.g. "aaa".

"(.)(.)\2\1": Two characters followed by the same two characters in reverse order, e.g. "abba". The surrounding double quotes only delimit the R string; they are not part of the match.

(..)\1: Any two characters immediately repeated, e.g. "abab".

"(.).\1.\1": A character, then any character, then the first character again, then any character, then the first character a third time, e.g. "xyxyx" or "abaca".

"(.)(.)(.).*\3\2\1": Three characters, then zero or more of any character, then the same three characters in reverse order, e.g. "xyzaddazyx".
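The descriptions above can be checked with a quick sketch that pairs each expression with one hypothetical example it should match and one near miss it should not. Inside an R string the backreferences must be written with doubled backslashes (`"\\1"`); a bare `"\1"` is read as an octal escape for a control character, not as a backreference.

```r
library(stringr)

# The five expressions from the exercise, with backslashes doubled for R strings
patterns <- c("(.)\\1\\1", "(.)(.)\\2\\1", "(..)\\1", "(.).\\1.\\1", "(.)(.)(.).*\\3\\2\\1")

# Hypothetical strings: one expected match per pattern, in the same order
examples <- c("aaa", "abba", "abab", "xyxyx", "xyzaddazyx")

# Hypothetical near misses: each differs just enough to fail its pattern
near_misses <- c("aab", "abab", "abba", "xyxyy", "xyzaddaxyz")

# str_detect() is vectorised over both string and pattern, pairing them element-wise
str_detect(examples, patterns)
str_detect(near_misses, patterns)
```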

#4

Construct regular expressions to match words that:

Start and end with the same character.

str_view(fruit, "^(.).*\\1$", match = TRUE)

Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)

str_view(fruit, "(.)(.).*\\1\\2", match = TRUE)

Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)

str_view(fruit, "(.).*\\1.*\\1", match = TRUE)
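Since `str_view()` renders its matches in the viewer, a small sketch using `str_subset()` makes the same three patterns checkable as plain output. The `test_words` vector is hypothetical, chosen so that each pattern picks out different words. Note that `.` matches any character, including spaces, so on multi-word entries in `fruit` a "repeated pair" may include a space.

```r
library(stringr)

# Hypothetical test words chosen to exercise each pattern
test_words <- c("area", "church", "eleven", "kiwi", "window")

# Starts and ends with the same character: "area" and "window"
str_subset(test_words, "^(.).*\\1$")

# Contains a repeated pair of characters: "church" ("ch" twice)
str_subset(test_words, "(.)(.).*\\1\\2")

# Contains one character in at least three places: "eleven" (three "e"s)
str_subset(test_words, "(.).*\\1.*\\1")
```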

Data 607: Week 3 Assignment

R Character Manipulation and Date Processing

Mustafa Telab

9/9/2020
