Data607 - Week 3 Assignment

This assignment will use the following packages:

library(readr)
library(stringr)
library(dplyr)
library(tidyr)
library(tidyverse) # also attaches readr, stringr, dplyr, tidyr and ggplot2
library(ggplot2)

Question 1

This question uses the 173 majors listed in FiveThirtyEight's college majors dataset. The dataset is available at the GitHub link here and will also be uploaded to my GitHub page. The following code reads the CSV file and identifies the majors that contain either “Data” or “Statistics”.

# Assumes majors-list.csv is in the current working directory
# (setwd(getwd()) would be a no-op, so no directory change is needed)
majors <- read.csv("majors-list.csv")

head(majors)

##   FOD1P                                 Major                  Major_Category
## 1  1100                   GENERAL AGRICULTURE Agriculture & Natural Resources
## 2  1101 AGRICULTURE PRODUCTION AND MANAGEMENT Agriculture & Natural Resources
## 3  1102                AGRICULTURAL ECONOMICS Agriculture & Natural Resources
## 4  1103                       ANIMAL SCIENCES Agriculture & Natural Resources
## 5  1104                          FOOD SCIENCE Agriculture & Natural Resources
## 6  1105            PLANT SCIENCE AND AGRONOMY Agriculture & Natural Resources

grep("DATA|STATISTICS", majors$Major, value = TRUE)

## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "COMPUTER PROGRAMMING AND DATA PROCESSING"     
## [3] "STATISTICS AND DECISION SCIENCE"

As the output shows, only 3 of the 173 majors contain “DATA” or “STATISTICS”.
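The pattern above works because the major names are stored in upper case. A minimal sketch of a case-insensitive version uses `stringr::regex(ignore_case = TRUE)`; the vector below is a small hand-copied subset of the major names (an assumption for illustration), not the full CSV.

```r
library(stringr)

# Hand-copied subset of major names; the real data comes from majors-list.csv
major_names <- c("GENERAL AGRICULTURE",
                 "FOOD SCIENCE",
                 "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
                 "COMPUTER PROGRAMMING AND DATA PROCESSING",
                 "STATISTICS AND DECISION SCIENCE")

# ignore_case = TRUE lets a lower-case pattern match upper-case text
pattern <- regex("data|statistics", ignore_case = TRUE)

# str_subset() keeps only the elements that match the pattern
data_majors <- str_subset(major_names, pattern)
print(data_majors)
```

Base R's `grep()` gives the same behavior through its `ignore.case = TRUE` argument.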

Question 2

The following code block converts this printed character vector:

[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"

into a single c() expression:

c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")

# Initialization: the printed vector stored as one string
fruitsMain <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Remove the [n] index labels and runs of two or more whitespace characters
fruitsMod <- gsub('\\[\\d+\\]|\\s{2,}', '', fruitsMain)

# Split at the double quotes and trim the surrounding whitespace
fruitsModded <- trimws(unlist(strsplit(fruitsMod, '"')))

# Drop the empty strings left between adjacent quotes
fruitsModded <- fruitsModded[fruitsModded != ""]

# Wrap each fruit in double quotes; type = "cmd" gives double quotes on every OS
# (the default shQuote() type uses single quotes outside Windows)
fruitsModded <- paste0('c(', paste(shQuote(fruitsModded, type = "cmd"), collapse = ", "), ')')

# Print the final product
cat(fruitsModded)

## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
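An alternative sketch, offered as one option rather than the method used above, captures the text between each pair of quotes directly with `stringr::str_match_all()`, so no cleanup of index labels or whitespace is needed. The variable names are hypothetical.

```r
library(stringr)

# The printed vector, pasted in as a single string
fruits_printed <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Column 2 of the match matrix holds the text captured between each pair of quotes
fruits <- str_match_all(fruits_printed, '"([^"]+)"')[[1]][, 2]

# Rebuild the c(...) expression with double-quoted elements
fruits_code <- paste0("c(", paste0('"', fruits, '"', collapse = ", "), ")")
cat(fruits_code, "\n")
```

Because each match consumes both of its quotes, the search resumes after a closing quote and never captures the spaces between two fruits.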

Question 3

Each expression below is tested against the same sample vector, alpha, followed by a description of what it matches.

(.)\\1\\1

alpha <- c("aaabcdef", "cheese", "banana", "wawawawaw", "Starlette", "Aurora", "Thalassa", "Apollo", "Bobobo", "haroldlorah")

str_view(alpha, "(.)\\1\\1", match=TRUE)

## [1] │ <aaa>bcdef

This matches any character repeated three times in a row, such as “aaa” in “aaabcdef”.
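A quick check with `str_detect()` makes the “in a row” requirement concrete. The test words are hypothetical: “cheese” contains three e's, but never three identical characters back to back.

```r
library(stringr)

# Only the first word has three identical characters next to each other
words <- c("aaabcdef", "aba", "cheese", "bookkeeper")
three_in_a_row <- str_detect(words, "(.)\\1\\1")
print(three_in_a_row)
```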

"(.)(.)\\2\\1"

str_view(alpha, "(.)(.)\\2\\1", match=TRUE)

## [5] │ Starl<ette>
## [7] │ Thal<assa>
## [8] │ Ap<ollo>

This matches two characters followed immediately by the same two characters in reverse order, an xyyx pattern such as “ette” in “Starlette”.
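To see what each backreference is bound to, `str_match()` returns the full match in the first column and each capture group in the following columns. This sketch reuses one word from alpha.

```r
library(stringr)

# Column 1: full match; column 2: group 1 (x); column 3: group 2 (y)
m <- str_match("Starlette", "(.)(.)\\2\\1")
print(m)
```

Here group 1 captures “e” and group 2 captures “t”, so `\\2\\1` must then match “te”.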

(..)\\1

str_view(alpha, "(..)\\1", match=TRUE)

## [3] │ b<anan>a
## [4] │ <wawa><wawa>w
## [9] │ B<obob>o

This matches any pair of characters repeated immediately, an xyxy pattern such as “anan” in “banana”.

"(.).\\1.\\1"

str_view(alpha, "(.).\\1.\\1", match=TRUE)

## [3] │ b<anana>
## [4] │ <wawaw>awaw
## [9] │ B<obobo>

This matches a character that appears three times with exactly one arbitrary character between each occurrence, an x?x?x pattern such as “anana” in “banana”.
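The contrast with the first expression is easiest to see on a few hypothetical strings: the repeated character sits in positions 1, 3 and 5, and anything may fill positions 2 and 4.

```r
library(stringr)

# x?x?x: the same character in positions 1, 3 and 5, anything in positions 2 and 4
candidates <- c("anana", "abaca", "aaa", "abcab")
spaced_repeat <- str_detect(candidates, "(.).\\1.\\1")
print(spaced_repeat)
```

“aaa” fails because the pattern needs at least five characters, even though it matches `(.)\\1\\1`.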

"(.)(.)(.).*\\3\\2\\1"

str_view(alpha, "(.)(.)(.).*\\3\\2\\1", match=TRUE)

##  [4] │ <wawawawaw>
## [10] │ <haroldlorah>

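This matches any three characters that are followed, after zero or more other characters, by the same three characters in reverse order (an xyz…zyx pattern such as “abccba” or “abcxyzcba”). The three characters do not have to be at the start of the string, and any palindrome of six or more characters, such as “haroldlorah”, also matches.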
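A short check on hypothetical strings shows that `.*` may match nothing, and that a repeated triplet only counts when it comes back reversed.

```r
library(stringr)

# .* may be empty, so "abccba" qualifies; "abcabc" repeats but does not reverse
tests <- c("abccba", "abcxyzcba", "abcabc", "haroldlorah")
reversed_triplet <- str_detect(tests, "(.)(.)(.).*\\3\\2\\1")
print(reversed_triplet)
```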

Question 4

The following regular expressions answer each of the prompts below, again tested against alpha:

Start and end with the same character. Answer: ^(.).*\\1$

str_view(alpha, "^(.).*\\1$", match=TRUE)

##  [4] │ <wawawawaw>
## [10] │ <haroldlorah>
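Two details of this pattern are worth checking on hypothetical inputs: it needs at least two characters, and the comparison is case-sensitive, which is why “Aurora” is not in the output above. Lower-casing the input first with `str_to_lower()` is one way to relax the second point.

```r
library(stringr)

# The pattern needs at least two characters and compares case-sensitively
names_to_check <- c("a", "aba", "Aurora")
same_ends <- str_detect(names_to_check, "^(.).*\\1$")
print(same_ends)

# Lower-casing first lets "Aurora" match
same_ends_lower <- str_detect(str_to_lower(names_to_check), "^(.).*\\1$")
print(same_ends_lower)
```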

Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.) Answer: (..).*\\1

str_view(alpha, "(..).*\\1", match=TRUE)

## [3] │ b<anan>a
## [4] │ <wawawawa>w
## [9] │ B<obob>o
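Strictly, `(..)` accepts any two characters, including digits and punctuation. A stricter variant, sketched here as one possible reading of “pair of letters”, limits the repeated pair to ASCII letters with `[A-Za-z]{2}`; the sample strings are hypothetical.

```r
library(stringr)

# (..) accepts any two characters, including digits and punctuation;
# [A-Za-z]{2} restricts the repeated pair to letters
samples <- c("church", "12-12", "banana")
any_pair <- str_detect(samples, "(..).*\\1")
letter_pair <- str_detect(samples, "([A-Za-z]{2}).*\\1")
print(any_pair)
print(letter_pair)
```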

Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.) Answer: (.).*\\1.*\\1

str_view(alpha, "(.).*\\1.*\\1", match=TRUE)

## [1] │ <aaa>bcdef
## [2] │ ch<eese>
## [3] │ b<anana>
## [4] │ <wawawawaw>
## [5] │ S<tarlett>e
## [7] │ Th<alassa>
## [9] │ B<obobo>
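One way to double-check this answer without regular expressions is to count how often the most frequent character occurs in each word. This sketch redefines alpha so it runs on its own; like the regex, `table()` treats upper- and lower-case letters as different characters.

```r
library(stringr)

alpha <- c("aaabcdef", "cheese", "banana", "wawawawaw", "Starlette",
           "Aurora", "Thalassa", "Apollo", "Bobobo", "haroldlorah")

# Regex answer: some character appears in at least three places
by_regex <- str_detect(alpha, "(.).*\\1.*\\1")

# Cross-check without regex: highest count of any single character in each word
max_count <- vapply(strsplit(alpha, ""), function(chars) max(table(chars)), integer(1))
by_count <- max_count >= 3

print(identical(by_regex, by_count))
```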

Anthony Josue Roman

2024-09-14