Project 2 - Data Tidying

Author

Kiera Griffiths, Desiree Thomas,

Approach

For Project 2, we tidy three datasets. This file focuses on the Generational Takeout Spending dataset. We import the wide-format CSV file from GitHub into R and transform it into a tidy structure using the tidyr and dplyr packages from the tidyverse. We reshape the monthly spending columns into long format with pivot_longer(), split the combined month-year label with separate(), and standardize column names with rename(). The dataset contains no missing or inconsistent values, so functions such as drop_na() are not needed to prepare the data for analysis. Finally, we analyze trends in takeout spending across generations and visualize the results with ggplot2. The main challenge we anticipated was restructuring the monthly columns correctly during the wide-to-long transformation.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tidyr)
library(dplyr)
library(ggplot2)
library(gt)

Data Source

We created the dataset based on studies of takeout spending habits across generations. We imported the raw CSV file from the main branch of our Project 2 public GitHub repository.

generational_takeout <- read.csv("https://raw.githubusercontent.com/meiqing39/607-Project-2/refs/heads/main/Generational%20Takeout%20Spending.csv")

Data Structure Before Tidying

To inspect the dataset, we printed the generational_takeout data frame to the console. The printout shows 5 rows and 7 columns, confirms that the spending values were imported correctly, and shows that the dataset is in wide format, with one column per month.

generational_takeout

    Generation Jan.25 Feb.25 Mar.25 Apr.25 May.25 Jun.25
1        Gen Z    185    172    198    210    205    225
2  Millennials    240    228    255    270    265    285
3        Gen X    195    188    205    215    210    220
4 Baby Boomers    120    115    130    140    135    145
5   Silent Gen     75     70     82     88     85     90
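
The structure described above can also be checked programmatically rather than by eye. The sketch below is a minimal illustration, not part of the original workflow: it rebuilds the wide table from the printed values (so it runs without network access) under the hypothetical name takeout_wide, then confirms the dimensions and the absence of missing values.

```r
# Rebuild the wide table from the printed output above.
# 'takeout_wide' is a hypothetical name used only for this check.
takeout_wide <- data.frame(
  Generation = c("Gen Z", "Millennials", "Gen X", "Baby Boomers", "Silent Gen"),
  Jan.25 = c(185L, 240L, 195L, 120L, 75L),
  Feb.25 = c(172L, 228L, 188L, 115L, 70L),
  Mar.25 = c(198L, 255L, 205L, 130L, 82L),
  Apr.25 = c(210L, 270L, 215L, 140L, 88L),
  May.25 = c(205L, 265L, 210L, 135L, 85L),
  Jun.25 = c(225L, 285L, 220L, 145L, 90L)
)

dim(takeout_wide)     # number of rows, then number of columns
str(takeout_wide)     # column names and types
anyNA(takeout_wide)   # TRUE if any value is missing
```

The same three calls can be run directly on generational_takeout after the import.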

Transformation

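We reshaped the dataset from wide to tidy (long) format. We then split the combined month-year label into separate month and year columns, expanded the year to four digits, converted the generation column to a factor, and renamed it to lowercase so that all variables follow a consistent naming convention. There were no missing or inconsistent values to address.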

tidy_gen_takeout <- generational_takeout %>%
  
  # Reshape from wide to long (tidy) format
  pivot_longer(
    cols = -Generation,       # all columns except 'Generation'
    names_to = "period",      # column names (Jan.25, Feb.25...) go into 'period'
    values_to = "expenses"    # values go into 'expenses'
  ) %>%
 
  # Split 'period' into month and year as part of the normalization process.
  # separate() treats 'sep' as a regular expression, where '.' on its own is a
  # wildcard that matches any character, so it is escaped as "\\." to split on
  # the literal period.
  separate(period, into = c("month", "year"), sep = "\\.") %>%
 
   # Standardize year to 4 digits and convert Generation to factor
  mutate(
    year = paste0("20", year),
    Generation = as.factor(Generation)
  ) %>%
 
   # Standardize column names manually
  rename(
    generation = Generation
  )

# Preview tidy dataset
tidy_gen_takeout %>%
   print(n = Inf)

# A tibble: 30 × 4
   generation   month year  expenses
   <fct>        <chr> <chr>    <int>
 1 Gen Z        Jan   2025       185
 2 Gen Z        Feb   2025       172
 3 Gen Z        Mar   2025       198
 4 Gen Z        Apr   2025       210
 5 Gen Z        May   2025       205
 6 Gen Z        Jun   2025       225
 7 Millennials  Jan   2025       240
 8 Millennials  Feb   2025       228
 9 Millennials  Mar   2025       255
10 Millennials  Apr   2025       270
11 Millennials  May   2025       265
12 Millennials  Jun   2025       285
13 Gen X        Jan   2025       195
14 Gen X        Feb   2025       188
15 Gen X        Mar   2025       205
16 Gen X        Apr   2025       215
17 Gen X        May   2025       210
18 Gen X        Jun   2025       220
19 Baby Boomers Jan   2025       120
20 Baby Boomers Feb   2025       115
21 Baby Boomers Mar   2025       130
22 Baby Boomers Apr   2025       140
23 Baby Boomers May   2025       135
24 Baby Boomers Jun   2025       145
25 Silent Gen   Jan   2025        75
26 Silent Gen   Feb   2025        70
27 Silent Gen   Mar   2025        82
28 Silent Gen   Apr   2025        88
29 Silent Gen   May   2025        85
30 Silent Gen   Jun   2025        90
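
The comment on escaping the dot in separate() can be illustrated with base R's strsplit(), which also interprets its split argument as a regular expression by default. This is a small sketch using two of the column labels from the wide table; the variable names are illustrative only.

```r
period <- c("Jan.25", "Feb.25")

# Unescaped: "." is a regex wildcard, so every character counts as a
# delimiter and only empty strings are left.
unescaped <- strsplit(period[1], ".")[[1]]

# Escaped: "\\." matches only the literal dot.
escaped <- strsplit(period, "\\.")

# In strsplit(), fixed = TRUE is an alternative that turns off regex matching.
literal <- strsplit(period, ".", fixed = TRUE)

unescaped
escaped
identical(escaped, literal)
```

With the escaped pattern, each label splits cleanly into a month part and a two-digit year part, which is the behavior the pipeline above relies on.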

Analysis

We performed two analyses. First, we calculated each generation’s percentage of each month’s total spending. We derived the data frame gen_monthly_percentage from tidy_gen_takeout, calculated the percentages, and presented the results in a stacked bar chart and a summary table.

Second, we calculated each generation’s average monthly spending over the entire 6-month period. We derived the data frame generational_analysis from tidy_gen_takeout, ranked the generations in descending order of average spending, and presented the results in a bar chart and a summary table.

# Generational Share of Monthly Takeout Spending

gen_monthly_percentage <- tidy_gen_takeout %>%
  group_by(month) %>%                       # group by month
  mutate(total_month = sum(expenses)) %>%  # total spending across all generations per month
  ungroup() %>%
  mutate(percent = round((expenses / total_month) * 100)) %>%  # round to whole %
  select(month, generation, expenses, percent) %>%
  arrange(match(month, c("Jan","Feb","Mar","Apr","May","Jun")), generation)

gen_monthly_percentage %>%
  mutate(
    expenses = paste0("$", expenses),
    percent = paste0(percent, "%")
  )

# A tibble: 30 × 4
   month generation   expenses percent
   <chr> <fct>        <chr>    <chr>  
 1 Jan   Baby Boomers $120     15%    
 2 Jan   Gen X        $195     24%    
 3 Jan   Gen Z        $185     23%    
 4 Jan   Millennials  $240     29%    
 5 Jan   Silent Gen   $75      9%     
 6 Feb   Baby Boomers $115     15%    
 7 Feb   Gen X        $188     24%    
 8 Feb   Gen Z        $172     22%    
 9 Feb   Millennials  $228     29%    
10 Feb   Silent Gen   $70      9%     
# ℹ 20 more rows
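
As a spot check on the percentage calculation, the January row can be recomputed by hand. The sketch below uses only the January values from the wide table; the variable names are illustrative.

```r
# January spending by generation, copied from the wide table.
jan <- c("Gen Z" = 185, "Millennials" = 240, "Gen X" = 195,
         "Baby Boomers" = 120, "Silent Gen" = 75)

jan_total <- sum(jan)                         # total across all generations
jan_percent <- round(jan / jan_total * 100)   # share of the monthly total, whole %

jan_total
jan_percent
```

The rounded shares should match the January rows in the output above (15% for Baby Boomers, 24% for Gen X, 23% for Gen Z, 29% for Millennials, 9% for the Silent Generation).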

# Month order
month_levels <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun")

# Convert month column to ordered factor
gen_monthly_percentage <- gen_monthly_percentage %>%
  mutate(month = factor(month, levels = month_levels))

ggplot(gen_monthly_percentage, aes(x = month, y = percent, fill = generation)) +
  geom_bar(stat = "identity", position = "stack") +   # stacked by generation
  geom_text(aes(label = paste0(percent, "%")),
            position = position_stack(vjust = 0.5),   # labels inside each bar
            size = 3) +
  labs(
    title = "Generational Percentage of Monthly Takeout Spending",
    x = "Month",
    y = "Percentage of Total Monthly Spending (%)",
    fill = "Generation"
  ) +
  theme_minimal()
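
The factor conversion before the plot matters because character values sort alphabetically, which would put Apr first on the x-axis. A minimal sketch of the difference, using base R only:

```r
months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun")

alphabetical <- sort(months)                        # plain character sort
calendar <- sort(factor(months, levels = months))   # sort follows the level order

alphabetical
as.character(calendar)

# Base R also ships the month abbreviations as a built-in constant.
identical(month.abb[1:6], months)
```

Using month.abb[1:6] as the levels would be an equivalent way to define month_levels without typing the abbreviations by hand.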

# Prepare summary table: months as rows, generations as columns
summary_table <- gen_monthly_percentage %>%
  select(month, generation, percent) %>%
  pivot_wider(names_from = generation, values_from = percent) %>%
  arrange(match(month, c("Jan","Feb","Mar","Apr","May","Jun")))

# Create styled summary chart
summary_table %>%
  gt() %>%
  tab_header(
    title = "Generational Percentages per Month (%)"
  ) %>%
  cols_label(
    month = "Month"
  ) %>%
  cols_align(
    align = "center",
    columns = everything()
  ) %>%
  cols_align(
    align = "left",
    columns = month
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_body(
      columns = month
    )
  ) %>%
  tab_style(
    style = cell_fill(color = "lightgray"),
    locations = cells_body(
      rows = seq(1, nrow(summary_table), 2)
    )
  )

Generational Percentages per Month (%)
Month	Baby Boomers	Gen X	Gen Z	Millennials	Silent Gen
Jan	15	24	23	29	9
Feb	15	24	22	29	9
Mar	15	24	23	29	9
Apr	15	23	23	29	10
May	15	23	23	29	9
Jun	15	23	23	30	9
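
Because each share is rounded to a whole percent before it is plotted and tabulated, a month's shares do not have to add up to exactly 100. The sketch below re-enters the table values shown above and sums each row; the object names are illustrative.

```r
# Rounded shares copied from the summary table (rows = months).
shares <- matrix(
  c(15, 24, 23, 29,  9,
    15, 24, 22, 29,  9,
    15, 24, 23, 29,  9,
    15, 23, 23, 29, 10,
    15, 23, 23, 29,  9,
    15, 23, 23, 30,  9),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Jan", "Feb", "Mar", "Apr", "May", "Jun"),
    c("Baby Boomers", "Gen X", "Gen Z", "Millennials", "Silent Gen")
  )
)

row_totals <- rowSums(shares)
row_totals
row_totals[row_totals != 100]   # months whose rounded shares miss 100
```

February and May come to 99, so their stacked bars are slightly shorter than the others. One possible alternative, offered only as a suggestion, is to keep the unrounded percentage for the bar heights and round only inside the text labels.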

generational_analysis <- tidy_gen_takeout %>%

# Group by Generation
  group_by(generation) %>%

# Calculate the average spending across the months
  summarise(avg_monthly_spending = round(mean(expenses))) %>%
  
# Order from highest to lowest
  arrange(desc(avg_monthly_spending)) 

# Add $ to the amounts for presentation only, so the stored column stays numeric.
generational_analysis %>%
  mutate(
    avg_monthly_spending = paste0("$", avg_monthly_spending)
  )

# A tibble: 5 × 2
  generation   avg_monthly_spending
  <fct>        <chr>               
1 Millennials  $257                
2 Gen X        $206                
3 Gen Z        $199                
4 Baby Boomers $131                
5 Silent Gen   $82

# View the final ranked results
print(generational_analysis)

# A tibble: 5 × 2
  generation   avg_monthly_spending
  <fct>                       <dbl>
1 Millennials                   257
2 Gen X                         206
3 Gen Z                         199
4 Baby Boomers                  131
5 Silent Gen                     82
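
One detail of the averages is worth a quick check: the Gen X mean falls exactly on a half, so the reported $206 depends on how round() treats ties. The sketch below uses the six Gen X values from the dataset; R's round() follows the IEC 60559 convention of rounding exact halves to the nearest even number.

```r
# Gen X monthly spending, Jan through Jun.
gen_x <- c(195, 188, 205, 215, 210, 220)

gen_x_mean <- mean(gen_x)
gen_x_mean            # 205.5
round(gen_x_mean)     # 206, because 206 is the even neighbor

# An exact half next to an even lower neighbor rounds down instead.
round(204.5)          # 204
```

Here the tie happens to round up, so the table matches what conventional rounding would give, but the same code would report 204 for a mean of 204.5.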

ggplot(generational_analysis, 
       aes(x = reorder(generation, avg_monthly_spending),
           y = avg_monthly_spending, 
           fill = generation)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = paste0("$", avg_monthly_spending)),
            vjust = -0.5, size = 3.8) +
  labs(
    title = "Average Monthly Takeout Spending by Generation",
    x = "Generation",
    y = "Average Monthly Spending ($)",
    fill = "Generation"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

generational_analysis %>%
  gt() %>%
  tab_header(
    title = "Average Monthly Takeout Spending by Generation"
  ) %>%
  cols_label(
    generation = "Generation",
    avg_monthly_spending = "Monthly Average ($)"
  ) %>%
  cols_align(
    align = "center",
    columns = everything()
  ) %>%
  tab_style(
    style = cell_fill(color = "lightgray"),
    locations = cells_body(
      rows = seq(1, nrow(generational_analysis), 2)
    )
  )

Average Monthly Takeout Spending by Generation
Generation	Monthly Average ($)
Millennials	257
Gen X	206
Gen Z	199
Baby Boomers	131
Silent Gen	82

Conclusion

After transforming the dataset from wide to tidy, we performed two analyses: the generational percentage of total monthly spending and the average monthly spending by generation.

The generational percentage analysis shows that Millennials account for the largest share of each month’s takeout spending, at 29-30%. Gen X and Gen Z hold almost identical shares each month, at 23-24% and 22-23%, respectively. The two older generations account for much less: Baby Boomers hold steady at 15%, while the Silent Generation stays at 9-10% each month.

The average monthly spending analysis supports these findings. Millennials spend the most, with an average monthly takeout expenditure of approximately $257, followed by Gen X at $206 and Gen Z at roughly $199. Baby Boomers rank fourth with an average of $131, while the Silent Generation has the lowest average at $82 per month, $175 less than Millennials.

Overall, the visualizations and summary tables highlight several generational differences in spending behavior. Roughly three quarters of monthly takeout spending comes from the three younger generations, and Millennials, who are in the middle of the three, account for 29-30% on their own, a disproportionately large share. The data also shows that Gen X and Gen Z spend almost on par each month, even though there is a generation between them. Gen X is often the parent generation of Gen Z, so it could be informative to explore their spending habits in more depth. As expected, the two older generations, Baby Boomers and the Silent Generation, spend the least.
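
The share attributed to the three younger generations can be cross-checked against the raw figures. The sketch below is an additional check rather than part of the original analysis: it pools all six months, so it reports one overall share per generation instead of the per-month shares in the summary table. The object names are illustrative.

```r
# Monthly spending (Jan through Jun) per generation, from the dataset.
spending <- rbind(
  "Gen Z"        = c(185, 172, 198, 210, 205, 225),
  "Millennials"  = c(240, 228, 255, 270, 265, 285),
  "Gen X"        = c(195, 188, 205, 215, 210, 220),
  "Baby Boomers" = c(120, 115, 130, 140, 135, 145),
  "Silent Gen"   = c(75, 70, 82, 88, 85, 90)
)

gen_totals <- rowSums(spending)                       # six-month total per generation
overall_share <- gen_totals / sum(gen_totals) * 100   # pooled share in %
younger_three <- sum(overall_share[c("Gen Z", "Millennials", "Gen X")])

round(overall_share, 1)
round(younger_three, 1)
```

Pooled over the six months, the three younger generations account for about 75.7% of spending and Millennials for about 29.4%, which is consistent with the figures quoted in the conclusion.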

Citation

Google DeepMind. (2026). Gemini Pro 3.1 [Large language model]. https://gemini.google.com. Accessed March 7, 2026.