This document answers the four parts of Assignment 3, giving code either as the answer itself or as a supporting example.
In this section, the readr package imports data from a CSV file hosted in a GitHub repository. Majors whose names contain DATA or STATISTICS are then extracted from the original data frame.
#Import data
library(readr)
library(stringr)
library(gt)
majors <- read_csv("https://raw.githubusercontent.com/juliaDataScience-22/cuny-fall-23/manage-acquire-data/majors-list.csv")
## Rows: 174 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): FOD1P, Major, Major_Category
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
gt(head(majors)) |>
  tab_header(
    title = "Table 1",
    subtitle = "All Majors"
  )
Table 1: All Majors

| FOD1P | Major | Major_Category |
|---|---|---|
| 1100 | GENERAL AGRICULTURE | Agriculture & Natural Resources |
| 1101 | AGRICULTURE PRODUCTION AND MANAGEMENT | Agriculture & Natural Resources |
| 1102 | AGRICULTURAL ECONOMICS | Agriculture & Natural Resources |
| 1103 | ANIMAL SCIENCES | Agriculture & Natural Resources |
| 1104 | FOOD SCIENCE | Agriculture & Natural Resources |
| 1105 | PLANT SCIENCE AND AGRONOMY | Agriculture & Natural Resources |
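The column-specification message printed by read_csv can be avoided by declaring the column types up front. The following is a minimal sketch, assuming a small inline sample in place of the remote file: `sample_csv` and `majors_sample` are hypothetical names, and the rows are copied from Table 1. All three columns are declared as character, matching the `chr (3)` specification reported in the import message.

```r
library(readr)

# Hypothetical inline sample standing in for the remote CSV (rows from Table 1).
# I() tells readr to treat the string as literal data rather than a file path.
sample_csv <- I("FOD1P,Major,Major_Category
1100,GENERAL AGRICULTURE,Agriculture & Natural Resources
1101,AGRICULTURE PRODUCTION AND MANAGEMENT,Agriculture & Natural Resources
")

# Declaring every column type means readr has nothing to guess
majors_sample <- read_csv(
  sample_csv,
  col_types = cols(
    FOD1P = col_character(),
    Major = col_character(),
    Major_Category = col_character()
  )
)

majors_sample
```

Declaring FOD1P as character keeps the codes as text even when a sample happens to contain only digits.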
#Keep majors whose names contain DATA
dataMajors <- majors[str_detect(majors$Major, "DATA"),]
#Keep majors whose names contain STATISTICS
statisticsMajors <- majors[str_detect(majors$Major, "STATISTICS"),]
#Combine both subsets into one data frame
newMajors <- rbind(dataMajors, statisticsMajors)
#Print information about all majors containing DATA or STATISTICS
gt(head(newMajors)) |>
  tab_header(
    title = "Table 2",
    subtitle = "Majors Containing DATA or STATISTICS"
  )
Table 2: Majors Containing DATA or STATISTICS

| FOD1P | Major | Major_Category |
|---|---|---|
| 2101 | COMPUTER PROGRAMMING AND DATA PROCESSING | Computers & Mathematics |
| 6212 | MANAGEMENT INFORMATION SYSTEMS AND STATISTICS | Business |
| 3702 | STATISTICS AND DECISION SCIENCE | Computers & Mathematics |
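The two filters and the rbind call can also be collapsed into a single step with a regular-expression alternation. This is a sketch of an alternative, not the method used above: `majors_sample` and `pattern_majors` are hypothetical names, and the rows are copied from Tables 1 and 2.

```r
library(stringr)

# Hypothetical sample: rows copied from Tables 1 and 2
majors_sample <- data.frame(
  FOD1P = c("1100", "2101", "3702", "6212"),
  Major = c(
    "GENERAL AGRICULTURE",
    "COMPUTER PROGRAMMING AND DATA PROCESSING",
    "STATISTICS AND DECISION SCIENCE",
    "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
  ),
  Major_Category = c(
    "Agriculture & Natural Resources",
    "Computers & Mathematics",
    "Computers & Mathematics",
    "Business"
  )
)

# "DATA|STATISTICS" matches any name containing either word
keep <- str_detect(majors_sample$Major, "DATA|STATISTICS")
pattern_majors <- majors_sample[keep, ]

pattern_majors$FOD1P
```

Unlike rbind, this keeps the rows in their original order and cannot list a major twice when its name contains both words.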
Next, the character vector foodItems is converted into a single string that reproduces the c(...) call used to create it.
foodItems <- c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
noquote(paste0("c(",'"',foodItems[1],'"',", ",'"',foodItems[2],'"',', ','"',foodItems[3],'"',', ','"',foodItems[4],'"',', ','"',foodItems[5],'"',', ','"',foodItems[6],'"',', ','"',foodItems[7],'"',', ','"',foodItems[8],'"',', ','"',foodItems[9],'"',', ','"',foodItems[10],'"',', ','"',foodItems[11],'"',', ','"',foodItems[12],'"',', ','"',foodItems[13],'"',', ','"',foodItems[14],'"',')'))
## [1] c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
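The same string can be built without indexing each element by hand. The following is a minimal sketch, assuming the same foodItems vector; `quoted` and `rebuilt` are hypothetical names introduced for illustration.

```r
foodItems <- c("bell pepper", "bilberry", "blackberry", "blood orange",
               "blueberry", "cantaloupe", "chili pepper", "cloudberry",
               "elderberry", "lime", "lychee", "mulberry", "olive",
               "salal berry")

# Wrap every element in double quotes, then join the pieces with ", "
quoted <- paste0('"', foodItems, '"', collapse = ", ")

# Surround the joined elements with c( and )
rebuilt <- paste0("c(", quoted, ")")

noquote(rebuilt)
```

Because paste0 is vectorized, this version keeps working if the vector changes length.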
(.)\1\1
This expression matches any word that contains the same character three times in a row. For example, “treee” matches because of the triple E, while “reference” does not because its E’s are not adjacent.
"(.)(.)\\2\\1"
This expression matches any word that contains the pattern abba: two characters followed by the same two characters in reverse order, so the first character repeats as the fourth and the second repeats as the third. For example, “sees” matches, but “keep” does not.
(..)\1
This expression matches any word in which a pair of characters is immediately repeated, giving the pattern abab. For example, “banana” matches because it contains anan, while “decide” does not because its two de pairs are not adjacent.
"(.).\\1.\\1"
This expression matches any word in which the same character appears three times at alternating positions. The pattern is abaca, where b and c can be any characters. For example, “caravan” matches because of arava, while “greeter” does not because its three E’s do not fall on alternating positions.
"(.)(.)(.).*\\3\\2\\1"
This expression matches any word that contains three characters followed later by the same three characters in reverse order, as in abcdecba. Because .* matches zero or more characters, the middle section can have any length, including none. For example, “snellens” and “acegheca” match, while “grilling” and “automation” do not.
All examples are provided in the code below:
words <- c("one","book","treee","banana","hello","sees","decide","snellens","abcdecba","tacos","sweetest","cucumber")
str_view(words, "(.)\\1\\1")
## [3] │ tr<eee>
str_view(words, "(.)(.)\\2\\1")
## [6] │ <sees>
## [8] │ sn<elle>ns
str_view(words, "(..)\\1")
## [4] │ b<anan>a
## [12] │ <cucu>mber
str_view(words, "(.).\\1.\\1")
## [4] │ b<anana>
str_view(words, "(.)(.)(.).*\\3\\2\\1")
## [8] │ <snellens>
## [9] │ <abcdecba>
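The words named in the descriptions that are missing from the words vector can be checked the same way. The following is a minimal sketch using str_detect, which returns TRUE or FALSE per word instead of highlighting the match. The variable names and the extra word “abccba” are hypothetical additions for illustration. The writeLines call shows why each backreference is written with two backslashes inside an R string.

```r
library(stringr)

# An R string needs "\\" to hold one literal backslash for the regex engine
writeLines("(.)\\1\\1")   # prints (.)\1\1

# Same character three times in a row
triple <- str_detect(c("treee", "reference"), "(.)\\1\\1")

# abba pattern
abba <- str_detect(c("sees", "keep"), "(.)(.)\\2\\1")

# Same character at three alternating positions (abaca)
alternating <- str_detect(c("caravan", "greeter"), "(.).\\1.\\1")

# Three characters, then later the same three reversed;
# "abccba" has nothing in the middle and still matches
mirrored <- str_detect(
  c("acegheca", "abccba", "grilling", "automation"),
  "(.)(.)(.).*\\3\\2\\1"
)

triple        # TRUE FALSE
abba          # TRUE FALSE
alternating   # TRUE FALSE
mirrored      # TRUE TRUE FALSE FALSE
```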
Start and end with the same character
"^(.).*?\\1$"
Contain a repeated pair of letters (the two letters sit together, like ab ... ab)
"([a-z]|[A-Z])([a-z]|[A-Z]).*?\\1\\2"
Contain one letter repeated in at least three places (as written, with at most one other character between repeats)
"([a-z]|[A-Z]).?\\1.?\\1"
Examples of each are provided in the following code:
str_view(words, "^(.).*?\\1$")
## [6] │ <sees>
## [8] │ <snellens>
## [9] │ <abcdecba>
str_view(words, "([a-z]|[A-Z])([a-z]|[A-Z]).*?\\1\\2")
## [4] │ b<anan>a
## [7] │ <decide>
## [12] │ <cucu>mber
str_view(words, "([a-z]|[A-Z]).?\\1.?\\1")
## [3] │ tr<eee>
## [4] │ b<anana>
## [11] │ sw<eete>st
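The last pattern allows at most one character between repeats, so it misses words whose repeated letter is spread further apart. The following is a sketch of a looser variant that swaps .? for .* so that any number of characters may separate the repeats. This is an assumption about the intended behavior, not the pattern used above; `spread_words`, `tight`, and `loose` are hypothetical names.

```r
library(stringr)

# Hypothetical test words; "evergreen" has four E's spread across the word
spread_words <- c("evergreen", "banana", "hello")

# Pattern from above: at most one character between repeats
tight <- str_detect(spread_words, "([a-z]|[A-Z]).?\\1.?\\1")

# Variant: any number of characters between repeats
loose <- str_detect(spread_words, "([a-z]|[A-Z]).*\\1.*\\1")

tight   # FALSE TRUE FALSE
loose   # TRUE TRUE FALSE
```

Both patterns give the same matches on the words vector used above, so the difference only appears with words such as “evergreen”.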
Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. R for Data Science, 2nd Edition. 2023. https://r4ds.hadley.nz/
Lander, Jared P. R for Everyone: Advanced Analytics and Graphics. Addison Wesley, 2017.