Homework 3

1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”.

library(tidyverse)
theURL <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"
all_majors <- read.csv(file=theURL, fileEncoding="UTF-8-BOM")
data_or_stats <- subset(all_majors, grepl("DATA|STATISTICS", Major))
data_or_stats
##    FOD1P                                         Major          Major_Category
## 44  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 52  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 59  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics

2. Write code that transforms the data below:

vector_1 <- c("bell pepper", "bilberry", "blackberry", "blood orange")
vector_2 <- c("blueberry", "cantaloupe", "chili pepper", "cloudberry")
vector_3 <- c("elderberry", "lime", "lychee", "mulberry")
vector_4 <- c("olive", "salal berry")
main_vector <- c(vector_1, vector_2, vector_3, vector_4)
main_vector
##  [1] "bell pepper"  "bilberry"     "blackberry"   "blood orange" "blueberry"   
##  [6] "cantaloupe"   "chili pepper" "cloudberry"   "elderberry"   "lime"        
## [11] "lychee"       "mulberry"     "olive"        "salal berry"
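
The prompt does not state the target format. Assuming the goal is to turn the printed vector into a single line of R code of the form c("bell pepper", ...), a minimal base R sketch could look like the following. The names quoted and as_code are illustrative and not part of the original answer.

```r
main_vector <- c("bell pepper", "bilberry", "blackberry", "blood orange",
                 "blueberry", "cantaloupe", "chili pepper", "cloudberry",
                 "elderberry", "lime", "lychee", "mulberry",
                 "olive", "salal berry")

# Wrap every element in double quotes (hypothetical helper variable).
quoted <- paste0('"', main_vector, '"')

# Join the quoted elements with ", " and wrap the result in c( ... ).
as_code <- paste0("c(", paste(quoted, collapse = ", "), ")")
cat(as_code, "\n")

# Round-trip check: evaluating the generated code rebuilds the same vector.
round_trip_ok <- identical(eval(parse(text = as_code)), main_vector)
print(round_trip_ok)
```

The round-trip check is only a sanity test that the generated string is valid R and reproduces the input.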

3. Describe, in words, what these expressions will match:

lista <- c("abba", "hello\2\1","aaaabbbcccdddde", "aabbmmmmmjmmkmkkk","banana", "amanaplanacanalpanama","civic", "racecar")
str_view(lista, "(.)\\1\\1")
str_view(lista, "(.)(.)\\2\\1")
str_view(lista, "(..)\\1")
str_view(lista, "(.).\\1.\\1")
str_view(lista, "(.)(.)(.).*\\3\\2\\1")
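  1. "(.)\1\1" matches the same character three times in a row, e.g. "aaa" in "aaaabbbcccdddde".
  2. "(.)(.)\2\1" matches two characters followed by the same two characters in reverse order, e.g. "abba". The two characters may be identical, so "aaaa" also matches.
  3. "(..)\1" matches any two characters that are immediately repeated, e.g. "anan" in "banana".
  4. "(.).\1.\1" matches a character that occurs three times with exactly one arbitrary character between occurrences, e.g. "anana" in "banana". The two separating characters need not be the same.
  5. "(.)(.)(.).*\3\2\1" matches three characters, then zero or more arbitrary characters, then the same three characters in reverse order, e.g. "racecar". The middle is unconstrained, so the match is not necessarily a palindrome.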
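
To check descriptions like these against concrete strings, one option is to extract the first match of each pattern. The sketch below uses base R (regexpr and regmatches with perl = TRUE) so it runs without extra packages. The helper first_match, the pattern labels, and the extra test strings "abab" and "abcxyzcba" are illustrative additions, not part of the original exercise.

```r
# A subset of the strings from the exercise plus two illustrative additions.
test_strings <- c("abba", "aaaabbbcccdddde", "banana", "racecar",
                  "abab", "abcxyzcba")

# The five patterns, written as R strings (backslashes doubled).
patterns <- c(
  triple      = "(.)\\1\\1",
  mirror_pair = "(.)(.)\\2\\1",
  double_pair = "(..)\\1",
  alternating = "(.).\\1.\\1",
  mirror_ends = "(.)(.)(.).*\\3\\2\\1"
)

# Hypothetical helper: first match of `pattern` in each string, NA if none.
first_match <- function(pattern, x) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

# One row per string, one column per pattern.
results <- sapply(patterns, first_match, x = test_strings)
rownames(results) <- test_strings
print(results)
```

In this table "abcxyzcba" matches the last pattern even though it is not a palindrome, and "abab" matches only the third pattern.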

4. Construct regular expressions to match words that:

  1. Start and end with the same character.
  2. Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice).
  3. Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s).

words <- c("civic", "church", "eleven")
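# Start and end with the same character (words of two or more characters)
str_view(words, "^(.).*\\1$")
# Contain a repeated pair of letters
str_view(words, "(.)(.).*\\1\\2")
# Contain one letter in at least three places, with any gap between occurrences
str_view(words, "(.).*\\1.*\\1")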
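
A pattern with fixed one-character gaps, such as "(.).\\1.\\1", only finds a letter whose three occurrences alternate with single characters, while ".*" gaps accept any spacing. The sketch below compares the two variants alongside the other patterns using base R grepl. The extra words "banana", "apple" and "bookkeeper" and all variable names are illustrative assumptions.

```r
words <- c("civic", "church", "eleven", "banana", "apple", "bookkeeper")

same_ends     <- "^(.).*\\1$"      # first and last character are equal
repeated_pair <- "(..).*\\1"       # some two-character pair occurs twice
three_gap_one <- "(.).\\1.\\1"     # three occurrences, exactly one char apart
three_any_gap <- "(.).*\\1.*\\1"   # three occurrences, any spacing

# One logical column per pattern; perl = TRUE for backreference support.
checks <- data.frame(
  word          = words,
  same_ends     = grepl(same_ends, words, perl = TRUE),
  repeated_pair = grepl(repeated_pair, words, perl = TRUE),
  three_gap_one = grepl(three_gap_one, words, perl = TRUE),
  three_any_gap = grepl(three_any_gap, words, perl = TRUE)
)
print(checks)
```

"bookkeeper" has three "e"s but is only detected by the variant with ".*" gaps, which is why that form is the safer reading of "repeated in at least three places".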