Title: “DATA 607 Assignment 3”

output: html_document


Gehad Gad

February 16th, 2020

Assignment 3

#Import libraries and/or Packages

#install.packages("tidyverse")
#install.packages("htmlwidgets")
library (stringr)
library (tidyverse)
## -- Attaching packages -------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.4
## v tidyr   1.0.2     v forcats 0.4.0
## v readr   1.3.1
## -- Conflicts ----------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Question 1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”

#Import the data into R.

majors <- read.csv ("https://github.com/fivethirtyeight/data/raw/master/college-majors/majors-list.csv")
#Select the major(s)the contain the word DATA.

DATA = majors [grep ("DATA", majors$Major),]
#Select the major(s)the contain the word STATISTICS.


STAT = majors [grep ("STATISTICS", majors$Major),]
# Combine the two togther.

Data_Stat = rbind (DATA, STAT)

Question 2. Write code that transforms the data below:

[1] “bell pepper” “bilberry” “blackberry” “blood orange”

[5] “blueberry” “cantaloupe” “chili pepper” “cloudberry”

[9] “elderberry” “lime” “lychee” “mulberry”

[13] “olive” “salal berry”

Into a format like this:

c(“bell pepper”, “bilberry”, “blackberry”, “blood orange”, “blueberry”, “cantaloupe”, “chili pepper”, “cloudberry”, “elderberry”, “lime”, “lychee”, “mulberry”, “olive”, “salal berry”)

# We will create an array for all the item listed.

Array = array (c("bell pepper", "bilberry",   "blackberry",  "blood orange", 
"blueberry",  "cantaloupe" ,  "chili, pepper","cloudberry",  
"elderberry",   "lime", "lychee", "mulberry", 
"olive",    "salal berry"))

#Array

#[1] “bell pepper” “bilberry” “blackberry” “blood orange” #[5] “blueberry” “cantaloupe” “chili, pepper” “cloudberry”
#[9] “elderberry” “lime” “lychee” “mulberry”
#[13] “olive” “salal berry”

#Create change the array to a vector and display the vector.

Vector = as.vector (Array)

dput(Vector)
## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", 
## "cantaloupe", "chili, pepper", "cloudberry", "elderberry", "lime", 
## "lychee", "mulberry", "olive", "salal berry")

3 Describe, in words, what these expressions will match:

(.)\1\1

() represents group one and the . represents a character other than /n or a space. Followed by numerical reference for group 1 then remerical reference for group 1

“(.)(.)\2\1”

The first dot represenes a character and the first () represents group one and the second () represent group two. The second dot is a character other than /n or a space, followed by another character. Followed by "2" then "1"

(..)\1

The () represents a group one. The .. represent a set of two characters are followed by a numerical reference of group 1.

“(.).\1.\1”

The () represents a group one. The two .. represent a character followed by "1" followed by a . which represents a character other than /n or a space followed by "1". 

"(.)(.)(.).\3\2\1"*

The first dot represenes a character and the first () represents group one. The second () represent group two and the second dot is a character other than /n or a space, the third dot represenes a character and the third () represents group three followed by another character, "3", "2" then "1"

Question 4.Construct regular expressions to match words that:

1. Start and end with the same character.

x1 <- c("sales", "saults", "scalps", "shapes", "scoop")

str_view (x1, "^s")
str_view (x1, "s$")

Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)

x2 <- c("shushing", "sharpshooter", "crosshatch", "sharp")
  
str_view (x2, " sh+")
str_count (x2, "sh")
## [1] 2 2 1 1

Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)

x3 <- c ("cheese", "deeper", "breeze", "Green")

str_view (x3, "ee+")
for (i in x3){count = str_count (i, "e")   
if (count >= 3){  print(i)}}
## [1] "cheese"
## [1] "deeper"
## [1] "breeze"