###1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”
data = read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/all-ages.csv', header = TRUE)
head(data)
## Major_code Major
## 1 1100 GENERAL AGRICULTURE
## 2 1101 AGRICULTURE PRODUCTION AND MANAGEMENT
## 3 1102 AGRICULTURAL ECONOMICS
## 4 1103 ANIMAL SCIENCES
## 5 1104 FOOD SCIENCE
## 6 1105 PLANT SCIENCE AND AGRONOMY
## Major_category Total Employed Employed_full_time_year_round
## 1 Agriculture & Natural Resources 128148 90245 74078
## 2 Agriculture & Natural Resources 95326 76865 64240
## 3 Agriculture & Natural Resources 33955 26321 22810
## 4 Agriculture & Natural Resources 103549 81177 64937
## 5 Agriculture & Natural Resources 24280 17281 12722
## 6 Agriculture & Natural Resources 79409 63043 51077
## Unemployed Unemployment_rate Median P25th P75th
## 1 2423 0.02614711 50000 34000 80000
## 2 2266 0.02863606 54000 36000 80000
## 3 821 0.03024832 63000 40000 98000
## 4 3619 0.04267890 46000 30000 72000
## 5 894 0.04918845 62000 38500 90000
## 6 2070 0.03179089 50000 35000 75000
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v stringr 1.4.0
## v tidyr 1.1.2 v forcats 0.5.1
## v readr 1.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
ds1 = data[str_detect(data$Major, regex("DATA|STATISTICS", ignore_case = TRUE)), ]
head(ds1)
## Major_code Major
## 20 2101 COMPUTER PROGRAMMING AND DATA PROCESSING
## 93 3702 STATISTICS AND DECISION SCIENCE
## 170 6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS
## Major_category Total Employed Employed_full_time_year_round
## 20 Computers & Mathematics 29317 22828 18747
## 93 Computers & Mathematics 24806 18808 14468
## 170 Business 156673 134478 118249
## Unemployed Unemployment_rate Median P25th P75th
## 20 2265 0.09026422 60000 40000 85000
## 93 1138 0.05705405 70000 43000 102000
## 170 6186 0.04397714 72000 50000 100000
###2 Write code that transforms the data below:
[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"
Into a format like this: c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
strng= paste("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry", sep=',')
strng = paste('c("', gsub(pattern = ",", replacement = '\",\"', strng), '")')
strng = gsub(pattern = '\" ', replacement = '\"', strng)
strng = gsub(pattern = ' \"', replacement = '\"', strng)
message(strng)
## c("bell pepper","bilberry","blackberry","blood orange","blueberry","cantaloupe","chili pepper","cloudberry","elderberry","lime","lychee","mulberry","olive","salal berry")
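An alternative sketch starts from the printed text itself rather than from values that have already been retyped. It assumes the printed output has been pasted into a single string (here called `raw`, a name chosen for illustration) with straight double quotes, pulls out each quoted item with a regular expression, and lets `dput()` produce the `c(...)` form.

```r
library(stringr)

# The printed vector as raw text, including the [n] index markers.
# Assumption: curly quotes have been replaced by straight quotes.
raw <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Extract every run of non-quote characters enclosed in double quotes
fruits <- str_extract_all(raw, '"[^"]+"')[[1]]

# Strip the surrounding quotes so each element is the bare name
fruits <- str_remove_all(fruits, '"')

# dput() writes the vector as R source code, i.e. in c("...", "...") form
dput(fruits)
```

Because the result is an ordinary character vector, it can be used directly in later code instead of only being printed as a message.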
###3 Describe, in words, what these expressions will match:
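(.)\1\1 - the first group captures any character, and the two backreferences require that same character twice more: one character three times in a row (e.g. "aaa").
(.)(.)\2\1 - two captured characters followed by the same two characters in reverse order, i.e. a four-character palindrome such as "abba".
(..)\1 - any two characters immediately followed by the same two characters (e.g. "abab").
(.).\1.\1 - a captured character, any character, the captured character again, any character, and the captured character a third time (e.g. "abaca").
(.)(.)(.).*\3\2\1 - three captured characters, then zero or more of any characters, then the same three characters in reverse order (e.g. "abcxcba").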
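The descriptions can be checked empirically. The sketch below is a minimal check, assuming the expressions are used as regular expressions (so each backslash is doubled inside an R string) and using a handful of made-up test strings; `words`, `patterns` and `hits` are names chosen for illustration.

```r
library(stringr)

# Hypothetical test strings, one chosen to trigger each pattern
words <- c("aaa", "abba", "abab", "abaca", "abcxcba", "banana")

patterns <- c(
  "(.)\\1\\1",              # same character three times in a row
  "(.)(.)\\2\\1",           # two characters, then the same two reversed
  "(..)\\1",                # a pair of characters immediately repeated
  "(.).\\1.\\1",            # same character at positions 1, 3 and 5 of the match
  "(.)(.)(.).*\\3\\2\\1"    # three characters, anything, then the three reversed
)

# Rows are words, columns are patterns;
# TRUE means the pattern matches somewhere in the word
hits <- sapply(patterns, function(p) str_detect(words, p))
rownames(hits) <- words
hits
```

Reading the matrix row by row shows which constructs each string satisfies; "banana", for instance, matches both the repeated-pair pattern and the alternating-character pattern.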
###4 Construct regular expressions to match words that:
Start and end with the same character.
str_view(c("qwq", "qwe"), "^(.)(.*\\1)?$", match = TRUE)
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)
str_view(c("chur", "church", "chch"), "(..).*\\1", match = TRUE)
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)
str_view(c("eleven", "church"), "(.).*\\1.*\\1", match = TRUE)
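As a non-interactive check, the three requirements can also be tested with `str_detect()`. This is a sketch of one possible formulation, not the only valid one; the word list and the variable names are made up for illustration.

```r
library(stringr)

# Hypothetical sample words covering each case
words <- c("qwq", "qwe", "church", "chur", "eleven", "banana")

# Starts and ends with the same character; the optional group
# also lets a single-character word count as a match
same_ends <- "^(.)(.*\\1)?$"

# Some pair of characters occurs twice, not necessarily adjacent
repeat_pair <- "(..).*\\1"

# Some character occurs in at least three places
three_places <- "(.).*\\1.*\\1"

# Keep only the words that satisfy each requirement
words[str_detect(words, same_ends)]
words[str_detect(words, repeat_pair)]
words[str_detect(words, three_places)]
```

The `.*` between the group and its backreference is what allows the repeats to be separated by other characters, as in "church", where the two "ch" pairs are not adjacent.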