Please deliver links to an R Markdown file (in GitHub and rpubs.com) with solutions to the problems below. You may work in a small group, but please submit separately with names of all group participants in your submission.
Dataframe 1 - supermarket customer purchase data including what item they bought
supermarket_customer <- data.frame(
  customer_id = c(1, 2, 3, 4, 5),
  name = c("Addie", "Eddie", "Elma", "Saif", "Dawa"),
  item = c("Apple", "Banana", "Pie", "Apple", "Donut"),
  stringsAsFactors = FALSE
)
print(supermarket_customer)
##   customer_id  name   item
## 1           1 Addie  Apple
## 2           2 Eddie Banana
## 3           3  Elma    Pie
## 4           4  Saif  Apple
## 5           5  Dawa  Donut
Dataframe 2 - the date the customers ordered the items
customer_orders <- data.frame(
  order_id = c(200, 201, 202, 203, 204),
  customer_id = c(1, 2, 3, 4, 5),
  order_date = as.Date(c("2025-02-01", "2025-02-02", "2025-02-03", "2025-02-04", "2025-02-05")),
  stringsAsFactors = FALSE
)
print(customer_orders)
##   order_id customer_id order_date
## 1      200           1 2025-02-01
## 2      201           2 2025-02-02
## 3      202           3 2025-02-03
## 4      203           4 2025-02-04
## 5      204           5 2025-02-05
Dataframe 3 - the total amount of money the customers spent for each transaction
order_total <- data.frame(
  order_total_id = c(300, 301, 302, 303, 304),
  order_id = c(200, 201, 202, 203, 204),
  order_total = c(20.25, 34.52, 11.23, 23.87, 5.36),
  stringsAsFactors = FALSE
)
print(order_total)
##   order_total_id order_id order_total
## 1            300      200       20.25
## 2            301      201       34.52
## 3            302      202       11.23
## 4            303      203       23.87
## 5            304      204        5.36
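Because the three data frames share the keys customer_id and order_id, they can be recombined when a single wide view is needed. The following is a minimal sketch, assuming base R's merge() is an acceptable way to join them; the names orders_with_customers and full_orders are illustrative and not part of the original solution.

```r
# Rebuild the three normalized tables so this sketch runs on its own.
supermarket_customer <- data.frame(
  customer_id = c(1, 2, 3, 4, 5),
  name = c("Addie", "Eddie", "Elma", "Saif", "Dawa"),
  item = c("Apple", "Banana", "Pie", "Apple", "Donut"),
  stringsAsFactors = FALSE
)
customer_orders <- data.frame(
  order_id = c(200, 201, 202, 203, 204),
  customer_id = c(1, 2, 3, 4, 5),
  order_date = as.Date(c("2025-02-01", "2025-02-02", "2025-02-03",
                         "2025-02-04", "2025-02-05")),
  stringsAsFactors = FALSE
)
order_total <- data.frame(
  order_total_id = c(300, 301, 302, 303, 304),
  order_id = c(200, 201, 202, 203, 204),
  order_total = c(20.25, 34.52, 11.23, 23.87, 5.36),
  stringsAsFactors = FALSE
)

# Join customers to their orders on customer_id (hypothetical helper name).
orders_with_customers <- merge(supermarket_customer, customer_orders,
                               by = "customer_id")

# Join the result to the order totals on order_id.
full_orders <- merge(orders_with_customers, order_total, by = "order_id")

print(full_orders)
```

Each merge() call here performs an inner join on a single key, so a row appears in full_orders only when the key is present in both tables being joined.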
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringr)
college_major_data <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/refs/heads/master/college-majors/majors-list.csv")
## Rows: 174 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): FOD1P, Major, Major_Category
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(college_major_data)
## Rows: 174
## Columns: 3
## $ FOD1P          <chr> "1100", "1101", "1102", "1103", "1104", "1105", "1106",…
## $ Major          <chr> "GENERAL AGRICULTURE", "AGRICULTURE PRODUCTION AND MANA…
## $ Major_Category <chr> "Agriculture & Natural Resources", "Agriculture & Natur…
data_stats_majors <- college_major_data %>%
  filter(str_detect(Major, regex("DATA|STATISTICS", ignore_case = TRUE)))
print(data_stats_majors)
## # A tibble: 3 × 3
##   FOD1P Major                                         Major_Category
##   <chr> <chr>                                         <chr>
## 1 6212  MANAGEMENT INFORMATION SYSTEMS AND STATISTICS Business
## 2 2101  COMPUTER PROGRAMMING AND DATA PROCESSING      Computers & Mathematics
## 3 3702  STATISTICS AND DECISION SCIENCE               Computers & Mathematics
(.)\1\1
The (.) group captures any single character, and \1\1 then matches two more occurrences of that same character, so:
this expression matches any single character repeated three times in a row
example: “???”
“(.)(.)\2\1”
This expression captures two characters separately, then matches the second character followed by the first, so the pair is immediately followed by its own mirror image
example: “abba”
(..)\1
This expression matches any two-character sequence that is immediately repeated
example: “hihi”
“(.).\1.\1”
This expression matches a five-character sequence in which the first, third, and fifth characters are the same
example: “hahyh”
“(.)(.)(.).*\3\2\1”
This expression matches three characters, followed by any number of other characters, followed by the same three characters in reverse order
example: “xyzhihowareyouzyx”
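The descriptions above can be checked directly in R. The following is a minimal sketch, assuming base R's grepl() with perl = TRUE (a choice made here so that backreferences are handled by PCRE); the vector names and the non-matching strings are illustrative additions, while the matching strings are the examples given above.

```r
# The five backreference patterns, written as R strings (backslashes doubled).
patterns <- c(
  triple    = "(.)\\1\\1",
  mirrored  = "(.)(.)\\2\\1",
  doubled   = "(..)\\1",
  alternate = "(.).\\1.\\1",
  reversed  = "(.)(.)(.).*\\3\\2\\1"
)

# One string per pattern that should match, taken from the examples above.
matching <- c("???", "abba", "hihi", "hahyh", "xyzhihowareyouzyx")

# One string per pattern that should not match (illustrative counterexamples).
non_matching <- c("aab", "abab", "hiho", "hahyx", "xyzabc")

# Hypothetical helper: TRUE when `pattern` is found anywhere in `text`.
check <- function(pattern, text) grepl(pattern, text, perl = TRUE)

# Pair each pattern with its own test string.
should_match <- mapply(check, patterns, matching)
should_not_match <- mapply(check, patterns, non_matching)

print(should_match)
print(should_not_match)
```

None of the patterns is anchored with ^ or $, so each one matches when the described sequence occurs anywhere inside a longer string.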
start_end_regex <- "^(.).*\\1$"
# Test: keep only the words that start and end with the same character
test_words <- c("banana", "apple", "alpha", "bulb")
grep(start_end_regex, test_words, value = TRUE)
## [1] "alpha" "bulb"
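One edge case worth noting: the pattern above requires at least two characters, so a one-letter word such as "a" is not matched even though it trivially starts and ends with the same character. The following is a minimal sketch of a variant that also accepts single characters; the name start_end_or_single and the use of perl = TRUE are assumptions made for this illustration, not part of the original solution.

```r
# Original pattern: first character, anything, then the first character again.
start_end_regex <- "^(.).*\\1$"

# Hypothetical variant: the trailing part is optional, so "a" alone qualifies.
start_end_or_single <- "^(.)(.*\\1)?$"

test_words <- c("a", "alpha", "bulb", "banana")

# perl = TRUE is an assumption here, chosen so PCRE handles the backreference.
original_hits <- grep(start_end_regex, test_words, value = TRUE, perl = TRUE)
variant_hits <- grep(start_end_or_single, test_words, value = TRUE, perl = TRUE)

print(original_hits)
print(variant_hits)
```

Whether single-character words should count depends on how the exercise is read, so the original pattern may well be the intended answer.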
repeated_pair_regex <- "([A-Za-z]{2}).*\\1"
# Test: keep only the words that contain a repeated pair of letters
test_words <- c("church", "happy", "jolly", "cat")
grep(repeated_pair_regex, test_words, value = TRUE)
## [1] "church"
repeat_3_regex <- ".*([A-Za-z]).*\\1.*\\1.*"
# Test: keep only the words in which one letter appears at least three times
test_words <- c("eleven", "banana", "ice cream", "apple")
grep(repeat_3_regex, test_words, value = TRUE)
## [1] "eleven" "banana"