Homework 3
1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”.
theURL <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"
all_majors <- read.csv(file=theURL, fileEncoding="UTF-8-BOM")
data_or_stats <- subset(all_majors, grepl("DATA|STATISTICS", Major))
data_or_stats
##    FOD1P                                         Major          Major_Category
## 44  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 52  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 59  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics
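The filter above relies on two properties of grepl(): the "|" in the pattern means "either side may match", and matching is case-sensitive unless ignore.case = TRUE is set. The sketch below shows both on a small hand-written vector. The vector `majors` and the mixed-case entry "Applied Statistics" are made up for illustration and are not read from the dataset.

```r
# Hypothetical sample of major names (assumption: not taken from the CSV)
majors <- c("COMPUTER PROGRAMMING AND DATA PROCESSING",
            "STATISTICS AND DECISION SCIENCE",
            "GENERAL BUSINESS",
            "Applied Statistics")

# "DATA|STATISTICS" matches any string containing either word (case-sensitive)
hits <- grepl("DATA|STATISTICS", majors)
majors[hits]

# ignore.case = TRUE also catches the mixed-case entry
hits_ci <- grepl("DATA|STATISTICS", majors, ignore.case = TRUE)
majors[hits_ci]
```

Since the dataset stores the Major column in upper case, the case-sensitive form is enough there. The case-insensitive variant would only matter for data with mixed casing.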
3. Describe, in words, what these expressions will match:
library(stringr)  # provides str_view()
lista <- c("abba", "hello\2\1", "aaaabbbcccdddde", "aabbmmmmmjmmkmkkk", "banana", "amanaplanacanalpanama", "civic", "racecar")
str_view(lista, "(.)\\1\\1")
str_view(lista, "(.)(.)\\2\\1")
str_view(lista, "(..)\\1")
str_view(lista, "(.).\\1.\\1")
str_view(lista, "(.)(.)(.).*\\3\\2\\1")
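- “(.)\1\1” matches the same character three times in a row, e.g. “aaa” in “aaaabbbcccdddde”.
- “(.)(.)\2\1” matches two characters followed by the same two characters in reverse order, i.e. a four-character palindrome such as “abba”.
- “(..)\1” matches any two characters immediately followed by the same two characters, e.g. “anan” in “banana”.
- “(.).\1.\1” matches five characters in which the 1st, 3rd, and 5th are the same and the 2nd and 4th can be anything, e.g. “anana” in “banana”.
- “(.)(.)(.).*\3\2\1” matches three characters, then zero or more arbitrary characters, then the same three characters in reverse order, e.g. “racecar”. The middle part need not be symmetric, so the match is not necessarily a palindrome.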
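One way to check descriptions like these is to extract the first match instead of only viewing it. The sketch below uses str_extract() from stringr on a few of the test strings. The input "abcxyzcba" is an extra, made-up example, added to show that the last pattern also accepts a string that is not a palindrome.

```r
library(stringr)

# First match of each pattern on a string chosen to trigger it
first_match <- c(
  triple      = str_extract("aaaabbbcccdddde", "(.)\\1\\1"),
  mirror_pair = str_extract("abba", "(.)(.)\\2\\1"),
  double_pair = str_extract("banana", "(..)\\1"),
  alternating = str_extract("banana", "(.).\\1.\\1"),
  reversed    = str_extract("racecar", "(.)(.)(.).*\\3\\2\\1")
)
first_match

# Hypothetical input: "abc" + "xyz" + "cba" matches, yet is not a palindrome
not_palindrome <- str_extract("abcxyzcba", "(.)(.)(.).*\\3\\2\\1")
not_palindrome
```

In R source code each backslash in the regular expression has to be doubled, which is why the string literal "(.)\\1\\1" corresponds to the regex (.)\1\1.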
4. Construct regular expressions to match words that:
- Start and end with the same character.
- Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice).
- Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s).
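words <- c("civic", "church", "eleven")
# Start and end with the same character (words of two or more characters)
str_view(words, "^(.).*\\1$")
# Contain a repeated pair of letters
str_view(words, "(.)(.).*\\1\\2")
# Contain one letter in at least three places, with any number of characters in between
str_view(words, "(.).*\\1.*\\1")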
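As a rough check of the three requirements, the sketch below runs str_detect() over the same words plus one extra, hypothetical word, "available", whose three “a”s are not evenly spaced. It also compares two ways of writing the third pattern: "(.).\\1.\\1" requires exactly one character between repeats, while "(.).*\\1.*\\1" allows any number. The variable names are made up for illustration.

```r
library(stringr)

# "available" is an added test word (assumption: not part of the exercise)
test_words <- c("civic", "church", "eleven", "available")

same_ends     <- str_detect(test_words, "^(.).*\\1$")      # first == last character
repeated_pair <- str_detect(test_words, "(.)(.).*\\1\\2")  # same pair occurs twice
three_any_gap <- str_detect(test_words, "(.).*\\1.*\\1")   # letter in 3+ places
three_one_gap <- str_detect(test_words, "(.).\\1.\\1")     # only the x?x?x layout

data.frame(test_words, same_ends, repeated_pair, three_any_gap, three_one_gap)
```

The two versions of the third pattern agree on “eleven” but differ on “available”, which suggests the ".*" form is the closer fit to “repeated in at least three places”.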