Assignment 3

Hazal Gunduz

1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”.

library(readr)
majors_list <- read_csv("~/Desktop/majors-list.csv")
## Rows: 174 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): FOD1P, Major, Major_Category
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(majors_list)
## # A tibble: 6 × 3
##   FOD1P Major                                 Major_Category                 
##   <chr> <chr>                                 <chr>                          
## 1 1100  GENERAL AGRICULTURE                   Agriculture & Natural Resources
## 2 1101  AGRICULTURE PRODUCTION AND MANAGEMENT Agriculture & Natural Resources
## 3 1102  AGRICULTURAL ECONOMICS                Agriculture & Natural Resources
## 4 1103  ANIMAL SCIENCES                       Agriculture & Natural Resources
## 5 1104  FOOD SCIENCE                          Agriculture & Natural Resources
## 6 1105  PLANT SCIENCE AND AGRONOMY            Agriculture & Natural Resources

DATA

   FOD1P Major                                    Major_Category
52 2101  COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics

STATISTICS

   FOD1P Major                                         Major_Category
44 6212  MANAGEMENT INFORMATION SYSTEMS AND STATISTICS Business
59 3702  STATISTICS AND DECISION SCIENCE               Computers & Mathematics
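The filtering step that produces the two result tables is not shown. A minimal sketch, assuming base R's `grepl()` and a small stand-in data frame (`majors_sample`, a hypothetical name) built only from rows that appear in this document, might look like this:

```r
# Stand-in for majors_list (assumption): only rows shown in this document.
majors_sample <- data.frame(
  FOD1P = c("1100", "2101", "6212", "3702"),
  Major = c(
    "GENERAL AGRICULTURE",
    "COMPUTER PROGRAMMING AND DATA PROCESSING",
    "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
    "STATISTICS AND DECISION SCIENCE"
  ),
  Major_Category = c(
    "Agriculture & Natural Resources",
    "Computers & Mathematics",
    "Business",
    "Computers & Mathematics"
  ),
  stringsAsFactors = FALSE
)

# Keep rows whose Major contains either "DATA" or "STATISTICS".
matches <- majors_sample[grepl("DATA|STATISTICS", majors_sample$Major), ]
print(matches)
```

With the full dataset loaded, the same subsetting expression could be applied to `majors_list` in place of the stand-in.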

2. Write code that transforms the data below:

[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"

Into a format like this:

c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")

no1 <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")

dput(as.character(no1))
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", 
## "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", 
## "lychee", "mulberry", "olive", "salal berry")
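If the starting point is the printed console text rather than an existing vector, the quoted items can be pulled out with a regular expression. The sketch below assumes the printed output has been pasted into a single string (`printed`, a hypothetical name) and uses base R's `gregexpr()` and `regmatches()`; it is one possible approach, not the only one.

```r
# Printed output pasted as one multi-line string (assumption for illustration).
printed <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Extract every double-quoted item, then strip the surrounding quotes.
quoted <- regmatches(printed, gregexpr('"[^"]+"', printed))[[1]]
fruits <- gsub('"', "", quoted, fixed = TRUE)

# Build the c("...", "...") text form by hand.
formatted <- paste0("c(", paste0('"', fruits, '"', collapse = ", "), ")")
cat(formatted, "\n")
```

Calling `dput(fruits)` gives an equivalent result, wrapped across lines by R.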

3. Describe, in words, what these expressions will match:

(.)\1\1

Answer: The same character three times in a row, e.g. "aaa".

"(.)(.)\2\1"

Answer: Two characters followed by the same two characters in reverse order, e.g. "abba".

(..)\1

Answer: Any two characters repeated immediately, e.g. "abab".

"(.).\1.\1"

Answer: The same character three times, with exactly one arbitrary character between each occurrence, e.g. "abaca".

"(.)(.)(.).*\3\2\1"

Answer: Three characters, followed by zero or more characters, followed by the original three characters in reverse order, e.g. "abcxyzcba".
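The descriptions above can be checked against small example strings. This is a sketch under a few assumptions: the patterns are written as R strings (so each backslash is doubled), the example strings and the names in `patterns` are made up for illustration, and `perl = TRUE` is used so that back-references follow PCRE semantics.

```r
# Each pattern from the question, paired with a string it should match.
patterns <- c(
  triple        = "(.)\\1\\1",
  mirror_pair   = "(.)(.)\\2\\1",
  repeated_pair = "(..)\\1",
  every_other   = "(.).\\1.\\1",
  mirror_triple = "(.)(.)(.).*\\3\\2\\1"
)
examples <- c("aaa", "abba", "abab", "abaca", "abcxyzcba")

# TRUE when the pattern matches its paired example.
hits <- mapply(function(p, s) grepl(p, s, perl = TRUE), patterns, examples)
print(hits)

# "abc" has no repeated characters, so none of the patterns should match it.
misses <- vapply(patterns, function(p) grepl(p, "abc", perl = TRUE), logical(1))
print(misses)
```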

4. Construct regular expressions to match words that:

- Start and end with the same character: "^(.).+\1$"

- Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice): "\b\w*(\w{2})\w*\1"

- Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s): "([a-z]).*\1.*\1"

test_words = list("banana", "peep", "strawberry", "cucumber", "olive", "test")

# Same first and last character
regex1 = "^(.).+\\1$"
Filter(function(x) any(grepl(regex1, x)), test_words)
## [[1]]
## [1] "peep"
## 
## [[2]]
## [1] "test"
# A pair of letters that occurs again later in the word
regex2 = "\\b\\w*(\\w{2})\\w*\\1"
Filter(function(x) any(grepl(regex2, x)), test_words)
## [[1]]
## [1] "banana"
## 
## [[2]]
## [1] "cucumber"
# One letter occurring in at least three places (not necessarily adjacent)
regex3 = "([a-z]).*\\1.*\\1"
Filter(function(x) any(grepl(regex3, x)), test_words)
## [[1]]
## [1] "banana"
## 
## [[2]]
## [1] "strawberry"
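As a further sanity check, the three kinds of pattern can be tried on the example words from the prompt ("church" and "eleven"). This is a sketch rather than the submitted answer: the names in `patterns` and the `check` helper are hypothetical, the third pattern is one possible formulation, and `perl = TRUE` is an assumption made so that back-references use PCRE semantics.

```r
words <- c("church", "eleven", "peep", "olive")

patterns <- c(
  same_ends     = "^(.).+\\1$",             # first character equals last
  repeated_pair = "\\b\\w*(\\w{2})\\w*\\1", # a two-letter pair occurs twice
  three_places  = "([a-z]).*\\1.*\\1"       # one letter occurs three times
)

# Hypothetical helper: return the words matched by a pattern.
check <- function(pattern) words[grepl(pattern, words, perl = TRUE)]

result <- lapply(patterns, check)
print(result)
```

Each pattern picks out exactly one of the four words here, which makes it easy to see whether a pattern is too loose or too strict.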

https://github.com/Gunduzhazal/https-rpubs.com-gunduzhazal-808190

https://rpubs.com/gunduzhazal/808190

