WEEK 3

Exercise 1 - Find “DATA” or “STATISTICS”

Downloading the majors list from the FiveThirtyEight (538) GitHub repository

library(tidyverse)
## ── Attaching packages ───────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
majors <- read.csv(file = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv')

Using str_subset() to identify majors containing “DATA” or “STATISTICS”

ex_1<-str_subset(majors$Major, "DATA|STATISTICS")
ex_1
## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"     
## [3] "STATISTICS AND DECISION SCIENCE"

Exercise 2

Concatenating strings. I’m a bit confused by this one. c(“a”, “b”, “c”) is usually a way to store multiple strings in a character vector. It seems the exercise is asking for the inverse: to go from a character vector back to the function call? Or does the exercise want to go from the character vector to a single string that looks like the c() call? I have two solutions, using the ‘fruit’ list.

1: Here, it builds a single string with double quotes around each element

x<-c(fruit)
y<-str_c("\"", x , "\"" , collapse = ",")
writeLines(y)
## "apple","apricot","avocado","banana","bell pepper","bilberry","blackberry","blackcurrant","blood orange","blueberry","boysenberry","breadfruit","canary melon","cantaloupe","cherimoya","cherry","chili pepper","clementine","cloudberry","coconut","cranberry","cucumber","currant","damson","date","dragonfruit","durian","eggplant","elderberry","feijoa","fig","goji berry","gooseberry","grape","grapefruit","guava","honeydew","huckleberry","jackfruit","jambul","jujube","kiwi fruit","kumquat","lemon","lime","loquat","lychee","mandarine","mango","mulberry","nectarine","nut","olive","orange","pamelo","papaya","passionfruit","peach","pear","persimmon","physalis","pineapple","plum","pomegranate","pomelo","purple mangosteen","quince","raisin","rambutan","raspberry","redcurrant","rock melon","salal berry","satsuma","star fruit","strawberry","tamarillo","tangerine","ugli fruit","watermelon"
y<-c("c(",str_c("\"", x , "\"" , collapse = ","),")")

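2: Here it adds c( at the beginning and ) at the end. Because the pieces are combined with c(), the result is a character vector of three elements rather than a single string, so writeLines() prints each piece on its own line; combining them with str_c() instead would give one string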

z<-c(fruit)
yy<-c("c(",str_c("\"", z , "\"" , collapse = ","),")")
writeLines(yy)
## c(
## "apple","apricot","avocado","banana","bell pepper","bilberry","blackberry","blackcurrant","blood orange","blueberry","boysenberry","breadfruit","canary melon","cantaloupe","cherimoya","cherry","chili pepper","clementine","cloudberry","coconut","cranberry","cucumber","currant","damson","date","dragonfruit","durian","eggplant","elderberry","feijoa","fig","goji berry","gooseberry","grape","grapefruit","guava","honeydew","huckleberry","jackfruit","jambul","jujube","kiwi fruit","kumquat","lemon","lime","loquat","lychee","mandarine","mango","mulberry","nectarine","nut","olive","orange","pamelo","papaya","passionfruit","peach","pear","persimmon","physalis","pineapple","plum","pomegranate","pomelo","purple mangosteen","quince","raisin","rambutan","raspberry","redcurrant","rock melon","salal berry","satsuma","star fruit","strawberry","tamarillo","tangerine","ugli fruit","watermelon"
## )
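
Note that c() returns a character vector, so yy above has three elements and writeLines() prints them on three lines. A minimal sketch of getting one single string instead, assuming a short made-up vector (fruits_small is hypothetical, standing in for ‘fruit’) and nesting str_c():

```r
library(stringr)

# Hypothetical short vector standing in for stringr's `fruit`
fruits_small <- c("apple", "banana", "cherry")

# Inner str_c() quotes each element and collapses them with commas;
# outer str_c() glues "c(" and ")" around that one string.
quoted  <- str_c("\"", fruits_small, "\"", collapse = ",")
as_call <- str_c("c(", quoted, ")")

writeLines(as_call)
# c("apple","banana","cherry")

# The result is a single string, and parsing it gives back the vector
length(as_call)
round_trip <- eval(parse(text = as_call))
identical(round_trip, fruits_small)
```

The round trip through parse() and eval() is only there as a check that the string really has the shape of a c() call.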

Exercise 3

(.)\1\1 <- matches the same character three times in a row at any position in a string. The code below shows the same character two times in a row; adding one more \1 makes it three. In the ‘fruit’ list, there seem to be no matches like bbb or ccc.

str_view(fruit,"(.)\\1",match = TRUE)
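
Since ‘fruit’ does not seem to contain a tripled character, the three-in-a-row pattern can be tried on a few made-up strings. This is only a sketch; test_strings is a hypothetical vector, not part of the exercise:

```r
library(stringr)

# Made-up strings; none of them come from `fruit`
test_strings <- c("aaa", "zzzap", "hello", "bookkeeper", "xaaay")

# (.) captures one character; each \\1 requires that same character again,
# so the pattern needs three identical characters in a row.
triples <- str_subset(test_strings, "(.)\\1\\1")
triples
# "aaa" "zzzap" "xaaay"
```

“hello” and “bookkeeper” are dropped because they only contain doubled characters.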

(.)(.)\2\1 <- matches a pair of characters followed by the same pair in reverse order, like eppe in pepper

str_view(fruit,"(.)(.)\\2\\1",match = TRUE)

(..)\1 <- matches any two characters that are immediately repeated, like anan in banana

str_view(fruit,"(..)\\1", match = TRUE)

(.).\1.\1 <- matches a character, followed by any character, then the first character again, then any character, then the first character once more, like anana in banana

str_view(fruit,"(.).\\1.\\1",match = TRUE)

(.)(.)(.).*\3\2\1 <- matches three characters, followed by zero or more characters of any kind, followed by the same three characters in reverse order (3rd group first, 2nd group second and 1st group third), like cdefgzhuhjedc. The ‘fruit’ list doesn’t have any matches for this expression, but I found one for (.)(.)(.).*\2\1 in clementine (entine)

x<-str_view(fruit,"(.)(.)(.).*\\3\\2\\1",match = TRUE)
x
y<-str_view(fruit,"(.)(.)(.).*\\2\\1",match = TRUE)
y
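
Because ‘fruit’ gives no match for the full pattern, here is a small sketch on made-up strings (reverse_test is hypothetical) that shows what it picks up, using str_extract() so the matched part is visible:

```r
library(stringr)

# Made-up strings plus "clementine" from the `fruit` list
reverse_test <- c("cdefgzhuhjedc", "abccba", "abcxabc", "clementine")

# Three captured characters, anything in between (possibly nothing),
# then the same three characters in reverse order.
reversed <- str_extract(reverse_test, "(.)(.)(.).*\\3\\2\\1")
reversed
# "cdefgzhuhjedc" "abccba" NA NA
```

“abcxabc” repeats abc in the same order, not reversed, so it returns NA; “clementine” only matches the shorter pattern that drops \3.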

Exercise 4

Words that start and end with the same character: I could not find a solution
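
One possible approach, offered only as a sketch and not as the intended answer, is to anchor a backreference at both ends of the string. The vector sample_words is made up for illustration; the same pattern could be passed to str_view(words, ...):

```r
library(stringr)

# Hypothetical sample; the real exercise would use stringr's `words`
sample_words <- c("area", "church", "dad", "a", "eye", "test", "banana")

# ^(.)      capture the first character
# (.*\\1)?  optionally: anything, then that same character again
# $         ...which must be the end of the string
# The optional group lets one-letter words such as "a" count as well.
same_ends <- str_subset(sample_words, "^(.)(.*\\1)?$")
same_ends
# "area" "dad" "a" "eye" "test"
```

Without the ^ and $ anchors the backreference could match anywhere inside the word, which is the difference from the patterns in Exercise 3.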

A repeated pair of letters: I started with (..)\1, as in the previous exercise, which finds emem in remember, but it seems too restrictive because it does not pick up church

str_view(words,"(..)\\1", match=TRUE)

I then added .* as in the previous exercise to allow any number of characters between the two occurrences of the pair

str_view(words,"(..).*\\1", match=TRUE)
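
The difference between the two patterns can be checked on a few words. A small sketch, assuming a made-up vector pair_words:

```r
library(stringr)

# Hypothetical sample words
pair_words <- c("remember", "church", "letter")

# Pair repeated immediately: only "emem" in remember qualifies
adjacent_pairs <- str_subset(pair_words, "(..)\\1")

# Pair repeated anywhere later: also catches "ch...ch" in church
any_pairs <- str_subset(pair_words, "(..).*\\1")

adjacent_pairs
# "remember"
any_pairs
# "remember" "church"
```

“letter” matches neither, since no two-character sequence in it occurs twice.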

One letter repeated in at least three places: using the example above but with one character in the group, (.).*\1 only finds letters that appear twice

str_view(words,"(.).*\\1", match=TRUE)

Adding one more repetition should do it

str_view(words,"(.).*\\1.*\\1", match=TRUE)
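
As a quick check of the three-places pattern, here is a sketch on a made-up vector (triple_words is hypothetical) where the expected answer is easy to count by hand:

```r
library(stringr)

# Hypothetical sample words
triple_words <- c("eleven", "banana", "church", "letter", "apple", "never")

# One captured character that must show up two more times later on,
# with anything (or nothing) in between each occurrence.
three_times <- str_subset(triple_words, "(.).*\\1.*\\1")
three_times
# "eleven" "banana"
```

“church” and “letter” each have letters that appear twice, but none that appears three times, so they are excluded. Note that . matches any character, not only letters, which is fine for lowercase word lists.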