Data 607 Week 3 Assignment

Library

library(tidyverse)
library(stringr)
library(dplyr)

1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”.

# Load the list of majors from the fivethirtyeight GitHub repository
major_list <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv", header = TRUE, sep = ",")

head(major_list)

##   FOD1P                                 Major                  Major_Category
## 1  1100                   GENERAL AGRICULTURE Agriculture & Natural Resources
## 2  1101 AGRICULTURE PRODUCTION AND MANAGEMENT Agriculture & Natural Resources
## 3  1102                AGRICULTURAL ECONOMICS Agriculture & Natural Resources
## 4  1103                       ANIMAL SCIENCES Agriculture & Natural Resources
## 5  1104                          FOOD SCIENCE Agriculture & Natural Resources
## 6  1105            PLANT SCIENCE AND AGRONOMY Agriculture & Natural Resources

# Return the major names that contain "data" or "statistics", ignoring case
grep(pattern = 'data|statistics', major_list$Major, value = TRUE, ignore.case = TRUE)

## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"     
## [3] "STATISTICS AND DECISION SCIENCE"
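The same filter can also be sketched with stringr, which is loaded above. The vector `majors` below is a small hand-made sample (the three matching names from the output above plus two non-matching names from the head() listing), not the full dataset, so the block runs without a network connection. The variable names are illustrative assumptions.

```r
library(stringr)

# Hand-made sample of major names; an assumption for illustration, not the full CSV
majors <- c(
  "GENERAL AGRICULTURE",
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "STATISTICS AND DECISION SCIENCE",
  "FOOD SCIENCE"
)

# Keep only the names that contain DATA or STATISTICS, ignoring case
matching_majors <- str_subset(majors, regex("DATA|STATISTICS", ignore_case = TRUE))
print(matching_majors)
```

On the real data, `majors` would be replaced by `major_list$Major`.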

2. Write code that transforms the data below:

[1] “bell pepper” “bilberry” “blackberry” “blood orange”
[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”
[9] “elderberry” “lime” “lychee” “mulberry”
[13] “olive” “salal berry”

Into a format like this: c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)

data1 <- '"bell pepper" "bilberry" "blackberry" "blood orange" "blueberry" "cantaloupe" "chili pepper" "cloudberry" "elderberry" "lime" "lychee" "mulberry" "olive" "salal berry"'

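# Split the string on single spaces.
# Note: this also splits multi-word items such as "bell pepper" into two pieces.
data2 <- strsplit(data1, " ")

# Convert the list returned by strsplit() to a data frame and inspect the first rows
data4 <- as.data.frame(data2)
head(data4)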

##   c....bell....pepper........bilberry........blackberry........blood...
## 1                                                                 "bell
## 2                                                               pepper"
## 3                                                            "bilberry"
## 4                                                          "blackberry"
## 5                                                                "blood
## 6                                                               orange"
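The output above shows that splitting on single spaces breaks multi-word items such as "bell pepper" into two rows. One possible way to reach the requested c(...) format is to extract each quoted item instead of splitting on spaces. This is a minimal sketch, assuming stringr is available; the names `quoted_items` and `fruit_vector` are illustrative and not part of the original answer.

```r
library(stringr)

data1 <- '"bell pepper" "bilberry" "blackberry" "blood orange" "blueberry" "cantaloupe" "chili pepper" "cloudberry" "elderberry" "lime" "lychee" "mulberry" "olive" "salal berry"'

# Extract every double-quoted item, so spaces inside the quotes are preserved
quoted_items <- str_extract_all(data1, '"[^"]+"')[[1]]

# Join the still-quoted items with commas and wrap them in c(...)
fruit_vector <- str_c("c(", str_c(quoted_items, collapse = ", "), ")")

# writeLines() prints the string without escaping the inner quotes
writeLines(fruit_vector)
```

Because the result is valid R code, `eval(parse(text = fruit_vector))` should turn it back into a character vector of 14 items.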

3. Describe, in words, what these expressions will match: (.)\1\1, “(.)(.)\2\1”, (..)\1, “(.).\1.\1”, “(.)(.)(.).*\3\2\1”

(.)\1\1: Matches any single character repeated three times in a row, e.g. “aaa”.

“(.)(.)\2\1”: Matches a pair of characters followed by the same pair in reverse order, e.g. “abba”.

(..)\1: Matches any two characters that are immediately repeated once, e.g. “abab”.

“(.).\1.\1”: Matches a character, then any character, then the first character again, then any character, then the first character a third time, e.g. “abaca”.

“(.)(.)(.).*\3\2\1”: Matches three characters, then zero or more characters of any kind, then the same three characters in reverse order, e.g. “abcxyzcba”.
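These descriptions can be checked with a short sketch. It assumes each expression is read as a regular expression, so every backslash is doubled when the pattern is written as an R string. The test strings and the names `examples`, `patterns`, and `results` are hypothetical, chosen only for illustration.

```r
library(stringr)

# Hypothetical test strings, one per pattern
examples <- c("baaad", "abba", "cucumber", "banana", "abcxyzcba")

# The five patterns, with backslashes doubled for R string syntax
patterns <- c(
  "(.)\\1\\1",
  "(.)(.)\\2\\1",
  "(..)\\1",
  "(.).\\1.\\1",
  "(.)(.)(.).*\\3\\2\\1"
)

# str_extract() is vectorised over both arguments, so each pattern
# is applied to the string in the same position
results <- str_extract(examples, patterns)
print(results)
```

Each element of `results` is the first substring matched by the corresponding pattern, for instance "aaa" inside "baaad" and "anana" inside "banana".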

4. Construct regular expressions to match words that:

Start and end with the same character.

str_view("apple", "^(.)((.*\\1$)|\\1?$)")
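Since “apple” does not start and end with the same character, it produces no match. A minimal sketch with a few more words may make the behaviour easier to see. It uses a compact variant of the pattern, `^(.)(.*\\1)?$`, which is an assumption here and is meant to accept one-letter words as well; the word list and names are hypothetical.

```r
library(stringr)

# Hypothetical sample words
words <- c("apple", "area", "level", "a", "stats", "banana")

# First character captured, optionally followed by anything that ends with it again
same_ends <- "^(.)(.*\\1)?$"

# Keep only the words that start and end with the same character
same_end_words <- str_subset(words, same_ends)
print(same_end_words)
```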

Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)

str_view("church", "([A-Za-z][A-Za-z]).*\\1")

## [1] │ <church>
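As a quick check on more than one word, the same pattern can be applied to a small vector. This is a sketch with a hypothetical word list; `pair_words` is an illustrative name.

```r
library(stringr)

# Hypothetical sample words
words <- c("church", "banana", "apple", "mississippi")

# Keep the words in which some pair of letters occurs twice
pair_words <- str_subset(words, "([A-Za-z][A-Za-z]).*\\1")
print(pair_words)
```

Here “apple” is dropped because none of its letter pairs occurs a second time.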

Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)

str_view("eleven", "([A-Za-z]).*\\1.*\\1")

## [1] │ <eleve>n
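A minimal sketch of this check on several words, assuming the pattern `([A-Za-z]).*\\1.*\\1` (one captured letter followed by two later occurrences of it). The word list and the name `triple_words` are hypothetical.

```r
library(stringr)

# Hypothetical sample words
words <- c("eleven", "banana", "church", "apple")

# Keep the words in which one letter appears in at least three places
triple_words <- str_subset(words, "([A-Za-z]).*\\1.*\\1")
print(triple_words)
```

“banana” is a useful test case because its third “a” is the last character of the word.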


CHUN SHING LEUNG

2023-09-16
