Project 2 Code Base Submission Dataset 2

Author

Long Lin

Overview

For this project, I used three different datasets from the Week 5 Discussion 5A post. I prepared each dataset by creating a .csv file and importing the data, then tidied the data and performed an analysis on it. I also made sure that the code in the Quarto Markdown file is reproducible in a clean environment. I followed a process similar to the one we used in Assignment 5A with the Airline Delays data, since that assignment is very similar to this one.

Dataset 2:

Generational Takeout Spending posted by Kiera Griffiths

source: https://raw.githubusercontent.com/longflin/DATA-607/refs/heads/main/Project%202/Untidy%20Generational%20Takeout%20Spending.csv

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ ggplot2   4.0.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyr)  # already attached by tidyverse
library(dplyr)  # already attached by tidyverse
library(gt)     # used for the formatted tables

generational_spending_url <- "https://raw.githubusercontent.com/longflin/DATA-607/refs/heads/main/Project%202/Untidy%20Generational%20Takeout%20Spending.csv"

generational_spending_df <- read_csv(
  file = generational_spending_url,
  show_col_types = FALSE,
  progress = FALSE
)

head(generational_spending_df)
# A tibble: 5 × 7
  Generation   Jan.2025 Feb.2025 Mar.2025 Apr.2025 May.2025 Jun.2025
  <chr>           <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1 Gen Z             185      172      198      210      205      225
2 Millenials        240      228      255      270      265      285
3 Gen X             195      188      205      215      210      220
4 Baby Boomers      120      115      130      140      135      145
5 Silent Gen         75       70       82       88       85       90

I replaced any missing values in the monthly columns with 0 using the following code chunk. The output is unchanged because none of the five rows shown contains a missing value.

generational_spending_df <- generational_spending_df |>
  mutate(across("Jan.2025":"Jun.2025", ~replace_na(.x, 0)))

head(generational_spending_df)
# A tibble: 5 × 7
  Generation   Jan.2025 Feb.2025 Mar.2025 Apr.2025 May.2025 Jun.2025
  <chr>           <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1 Gen Z             185      172      198      210      205      225
2 Millenials        240      228      255      270      265      285
3 Gen X             195      188      205      215      210      220
4 Baby Boomers      120      115      130      140      135      145
5 Silent Gen         75       70       82       88       85       90
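
Before replacing missing values, it can help to confirm where they occur, since a 0 is treated as real spending in any later average. Here is a minimal sketch of that check, assuming a small hypothetical tibble (`example_df`, with one made-up `NA`) in place of the real data:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for generational_spending_df; the NA is made up
example_df <- tibble::tibble(
  Generation = c("Gen Z", "Gen X"),
  Jan.2025   = c(185, NA),
  Feb.2025   = c(172, 188)
)

# Count missing values in each monthly column
na_counts <- example_df |>
  summarise(across(-Generation, ~ sum(is.na(.x))))

# Replace missing values with 0, as in the chunk above
filled_df <- example_df |>
  mutate(across(-Generation, ~ replace_na(.x, 0)))

print(na_counts)
```

If every count is 0, the replacement step leaves the data unchanged.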

I converted the data from wide format to long format using pivot_longer(), so that each row holds one generation, one month, and that month's average takeout spending.

long_generational_spending_df <- generational_spending_df |>
  pivot_longer(
    cols = c("Jan.2025", "Feb.2025", "Mar.2025", "Apr.2025", "May.2025", "Jun.2025"),
    names_to = "Month",
    values_to = "Average Takeout Spending"
  )
head(long_generational_spending_df, 20)
# A tibble: 20 × 3
   Generation   Month    `Average Takeout Spending`
   <chr>        <chr>                         <dbl>
 1 Gen Z        Jan.2025                        185
 2 Gen Z        Feb.2025                        172
 3 Gen Z        Mar.2025                        198
 4 Gen Z        Apr.2025                        210
 5 Gen Z        May.2025                        205
 6 Gen Z        Jun.2025                        225
 7 Millenials   Jan.2025                        240
 8 Millenials   Feb.2025                        228
 9 Millenials   Mar.2025                        255
10 Millenials   Apr.2025                        270
11 Millenials   May.2025                        265
12 Millenials   Jun.2025                        285
13 Gen X        Jan.2025                        195
14 Gen X        Feb.2025                        188
15 Gen X        Mar.2025                        205
16 Gen X        Apr.2025                        215
17 Gen X        May.2025                        210
18 Gen X        Jun.2025                        220
19 Baby Boomers Jan.2025                        120
20 Baby Boomers Feb.2025                        115
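
The `Month` column produced by the pivot is a character vector, so sorting or plotting it would order the months alphabetically (Apr, Feb, Jan, ...). Below is a minimal sketch, assuming a two-row hypothetical subset of the data (`wide_df`), that selects the month columns with `-Generation` and keeps calendar order with `forcats::fct_inorder()`:

```r
library(dplyr)
library(tidyr)
library(forcats)

# Hypothetical subset of the wide table (values copied from the output above)
wide_df <- tibble::tibble(
  Generation = c("Gen Z", "Gen X"),
  Jan.2025   = c(185, 195),
  Feb.2025   = c(172, 188),
  Mar.2025   = c(198, 205)
)

long_df <- wide_df |>
  pivot_longer(
    cols = -Generation,            # every column except Generation
    names_to = "Month",
    values_to = "Average Takeout Spending"
  ) |>
  # Factor levels follow first appearance, i.e. the original column order
  mutate(Month = fct_inorder(Month))

levels(long_df$Month)
```

This relies on the month columns already being in calendar order in the source file, which is the case for the output shown above.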

I filtered the data for January 2025 using the following code chunk. Then I created a table for the January 2025 data.

Jan_only_df <- long_generational_spending_df |>
  filter(`Month` == 'Jan.2025')
head(Jan_only_df, 20)
# A tibble: 5 × 3
  Generation   Month    `Average Takeout Spending`
  <chr>        <chr>                         <dbl>
1 Gen Z        Jan.2025                        185
2 Millenials   Jan.2025                        240
3 Gen X        Jan.2025                        195
4 Baby Boomers Jan.2025                        120
5 Silent Gen   Jan.2025                         75
Jan_only_df |>
  gt() |>
  cols_hide(columns = c(`Month`)) |>
  tab_header(
    title = "Average Monthly Takeout Spending by Generation in Jan 2025"
  )
Average Monthly Takeout Spending by Generation in Jan 2025
Generation     Average Takeout Spending
Gen Z                               185
Millenials                          240
Gen X                               195
Baby Boomers                        120
Silent Gen                           75

I created a bar graph for the January 2025 data using the following code chunk.

ggplot(Jan_only_df, aes(x = Generation, y = `Average Takeout Spending`)) +
  geom_col(fill = "steelblue") +
  theme_minimal() +
  labs(title = "Generational Average Monthly Takeout Spending in Jan 2025")

Based on the plot for January 2025, we can see that Millennials spend the most on takeout on average (240) and the Silent Gen spends the least (75).
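
By default, ggplot2 draws the bars in alphabetical order of `Generation`, which makes the ranking harder to read. Below is a minimal sketch of one way to sort the bars by value, using base R's `reorder()`. The data frame `jan_df` and its `spending` column are hypothetical names, with values copied from the January table above:

```r
library(ggplot2)

# January values copied from the table above; "Millenials" is spelled as in the dataset
jan_df <- data.frame(
  Generation = c("Gen Z", "Millenials", "Gen X", "Baby Boomers", "Silent Gen"),
  spending   = c(185, 240, 195, 120, 75)
)

# reorder() sorts the factor levels by spending; the minus sign puts the largest first
jan_df$Generation <- reorder(jan_df$Generation, -jan_df$spending)

p <- ggplot(jan_df, aes(x = Generation, y = spending)) +
  geom_col(fill = "steelblue") +
  theme_minimal() +
  labs(y = "Average Takeout Spending")

levels(jan_df$Generation)
```

Printing `p` draws the chart with the highest-spending generation on the left.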

Next I filtered the data for June 2025 using the following code chunk. Then I created another table for the June 2025 data.

Jun_only_df <- long_generational_spending_df |>
  filter(`Month` == 'Jun.2025')
head(Jun_only_df, 20)
# A tibble: 5 × 3
  Generation   Month    `Average Takeout Spending`
  <chr>        <chr>                         <dbl>
1 Gen Z        Jun.2025                        225
2 Millenials   Jun.2025                        285
3 Gen X        Jun.2025                        220
4 Baby Boomers Jun.2025                        145
5 Silent Gen   Jun.2025                         90
Jun_only_df |>
  gt() |>
  cols_hide(columns = c(`Month`)) |>
  tab_header(
    title = "Average Monthly Takeout Spending by Generation in June 2025"
  )
Average Monthly Takeout Spending by Generation in June 2025
Generation     Average Takeout Spending
Gen Z                               225
Millenials                          285
Gen X                               220
Baby Boomers                        145
Silent Gen                           90

Next I created a bar graph for the June 2025 data using the following code chunk.

ggplot(Jun_only_df, aes(x = Generation, y = `Average Takeout Spending`)) +
  geom_col(fill = "steelblue") +
  theme_minimal() +
  labs(title = "Generational Average Monthly Takeout Spending in June 2025")

From the plot for June 2025, we can see that Millennials still spend the most on takeout on average (285) and the Silent Gen still spends the least (90).
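
To put a number on the January-to-June comparison, the change for each generation can be computed directly from the two monthly columns. This is a minimal sketch using values copied from the tables above. The names `jan_jun_df`, `change_df`, and `Change` are hypothetical labels, not part of the original analysis:

```r
library(dplyr)

# January and June values copied from the tables above
jan_jun_df <- tibble::tibble(
  Generation = c("Gen Z", "Millenials", "Gen X", "Baby Boomers", "Silent Gen"),
  Jan.2025   = c(185, 240, 195, 120, 75),
  Jun.2025   = c(225, 285, 220, 145, 90)
)

change_df <- jan_jun_df |>
  mutate(Change = Jun.2025 - Jan.2025) |>   # absolute increase from January to June
  arrange(desc(Change))

print(change_df)
```

With these values, every generation spends more in June than in January. Millennials show the largest increase (45) and the Silent Gen the smallest (15).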