Regork is looking for new revenue growth opportunities, and my analysis reveals that seasonal cross-category purchasing presents a major untapped potential. Currently, many strong product pairings lack targeted promotions, leading to missed revenue.
By optimizing promotions and store layout based on seasonal trends, Regork can increase cross-category purchases by up to 15% and boost revenue by an estimated $5M annually.
To uncover these opportunities, I analyzed transactional data, customer segments, and promotional effectiveness to identify the best strategies for targeted marketing and product bundling.
Regork, as a national grocery chain, faces the continuous challenge of identifying new growth opportunities in a highly competitive market. Our analysis aims to address the following business question:
“Are there seasonal cross-category purchasing patterns that Regork isn’t currently capitalizing on, particularly among high-value customers?”
The retail grocery industry is highly seasonal, with different product categories experiencing fluctuations throughout the year. However, the relationships between categories during these seasonal shifts often remain unexplored. By identifying these cross-category seasonal patterns, Regork can develop targeted marketing strategies, optimize inventory management, and increase both customer satisfaction and revenue.
To address this business question, we will:
By understanding these seasonal cross-category relationships, we can help Regork:
Data Manipulation and Tidying
tidyverse: A suite of packages for data manipulation and visualization.
lubridate: Provides functions to work with date and time data.
janitor: Helps clean and standardize data efficiently.
Visualization
ggplot2: The primary package for creating static visualizations.
plotly: Enables interactive plotting capabilities.
viridis: Provides color-blind friendly palettes for visualization.
corrplot: Used to visualize correlation matrices.
patchwork: Helps in combining multiple plots into a cohesive layout.
Analysis
tidymodels: A framework for building machine learning models.
arules: Supports association rule mining.
arulesViz: Provides visualization capabilities for association rules.
zoo: Facilitates time series analysis.
tsibble: Used for handling time series data in tibble format.
feasts: Supports feature extraction for time series analysis.
# Data manipulation and tidying
library(tidyverse)
library(lubridate)
library(janitor)
library(dplyr)
library(magrittr)
# Visualization
library(ggplot2)
library(plotly)
library(viridis)
library(corrplot)
library(patchwork)
# Analysis
library(tidymodels)
library(arules)
library(arulesViz)
library(zoo)
library(tsibble)
library(feasts)
library(kableExtra)
library(completejourney)
# Custom theme for consistent plot formatting
theme_regork <- function() {
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
plot.subtitle = element_text(size = 12),
axis.title = element_text(face = "bold"),
legend.title = element_text(face = "bold"),
panel.grid.minor = element_blank()
)
}transactions <- get_transactions()
products <- completejourney::products
demographics <- completejourney::demographics
campaigns <- completejourney::campaigns
coupon_redemptions <- completejourney::coupon_redemptions
promotions <- get_promotions()
head(transactions)transactions_products <- transactions %>%
left_join(products, by = "product_id") %>%
mutate(transaction_date = as.Date(transaction_timestamp)) %>%
mutate(
month = month(transaction_date),
year = year(transaction_date),
month_name = month(transaction_date, label = TRUE),
day_of_week = wday(transaction_date, label = TRUE)
)
customer_spending <- transactions_products %>%
group_by(household_id) %>%
summarize(
total_spent = sum(sales_value, na.rm = TRUE),
avg_basket_size = mean(sales_value, na.rm = TRUE),
transaction_count = n_distinct(basket_id),
unique_products = n_distinct(product_id),
unique_departments = n_distinct(department),
first_purchase = min(transaction_date),
last_purchase = max(transaction_date),
purchase_frequency = transaction_count / as.numeric(difftime(last_purchase, first_purchase, units = "days")) * 30,
.groups = "drop"
) %>%
filter(transaction_count > 1)
demographics <- completejourney::demographics
customer_profile <- customer_spending %>%
left_join(demographics, by = "household_id") %>%
mutate(
income = ifelse(is.na(income), "Unknown", income),
age = ifelse(is.na(age), "Unknown", age),
household_size = ifelse(is.na(household_size), "Unknown", household_size),
marital_status = ifelse(is.na(marital_status), "Unknown", marital_status),
home_ownership = ifelse(is.na(home_ownership), "Unknown", home_ownership)
)
customer_profile <- customer_profile %>%
mutate(
spending_quartile = ntile(total_spent, 4),
customer_segment = case_when(
spending_quartile == 4 ~ "High Value",
spending_quartile == 3 ~ "Medium-High Value",
spending_quartile == 2 ~ "Medium-Low Value",
spending_quartile == 1 ~ "Low Value"
)
)
seasonal_category_sales <- transactions_products %>%
group_by(month, month_name, department, product_category) %>%
summarize(
total_sales = sum(sales_value, na.rm = TRUE),
total_units = sum(quantity, na.rm = TRUE),
unique_customers = n_distinct(household_id),
.groups = "drop"
)
campaign_details <- completejourney::campaign_descriptions # Check if this dataset exists
campaign_timeline <- campaign_descriptions %>%
mutate(
start_date = as.Date(start_date, origin = "1970-01-01"),
end_date = as.Date(end_date, origin = "1970-01-01")
) %>%
mutate(
month_start = month(start_date),
month_end = month(end_date)
)
basket_analysis <- transactions_products %>%
select(basket_id, household_id, department, product_category, month, month_name, sales_value) %>%
mutate(product_category = paste(department, product_category, sep = " - "))Looking at this chart, we can see that most customers fall into a similar spending range, but there’s a small group that really stands out—the top 25% of spenders. These are our high-value customers, the ones contributing the most to our revenue. If we focus on keeping them engaged and finding more like them, we could see a big impact on overall sales.
ggplot(customer_profile, aes(x = total_spent)) +
geom_histogram(bins = 50, fill = "steelblue", alpha = 0.8) +
geom_vline(
aes(xintercept = quantile(total_spent, 0.75)),
color = "darkred", linetype = "dashed", size = 1
) +
annotate(
"text",
x = quantile(customer_profile$total_spent, 0.75) * 1.1,
y = 500,
label = "Top 25% of customers",
color = "darkred"
) +
labs(
title = "Distribution of Customer Spending",
subtitle = "Top 25% of customers represent high-value segment",
x = "Total Spent ($)",
y = "Number of Customers"
) +
theme_regork()segment_demographics <- customer_profile %>%
group_by(customer_segment, income) %>%
summarize(count = n(), .groups = "drop") %>%
group_by(customer_segment) %>%
mutate(percentage = count / sum(count) * 100) %>%
filter(income != "Unknown")
segment_demographics$income <- factor(segment_demographics$income,
levels = as.character(sort(as.numeric(unique(segment_demographics$income)))))
ggplot(segment_demographics, aes(x = income, y = percentage, fill = customer_segment)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(
values = c(
"High Value" = "#91c8bd", # Orange
"Low Value" = "#6595a7", # Blue
"Medium-High Value" = "#5fc2e1", # Green
"Medium-Low Value" = "#0777a7" # Red
)) +
labs(
title = "Income Distribution by Customer Segment",
x = "Income Bracket",
y = "Percentage (%)",
fill = "Customer Segment"
) +
theme_regork() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))Sales change with the seasons, and some departments see bigger spikes than others. The trend line shows how sales shift month to month, while the heatmap highlights when each department sees the most action. Knowing these patterns helps us stock smarter and time promotions for maximum impact.
top_departments <- transactions_products %>%
group_by(department) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(total_sales)) %>%
head(10) %>%
pull(department)
monthly_department_sales <- transactions_products %>%
filter(department %in% top_departments) %>%
group_by(month, month_name, department) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop")
ggplot(monthly_department_sales, aes(x = month, y = total_sales, color = department, group = department)) +
geom_line(size = 1) +
geom_point(size = 2) +
scale_x_continuous(breaks = 1:12, labels = month.abb) +
scale_color_viridis_d(option = "mako") +
labs(
title = "Seasonal Sales Trends by Department",
subtitle = "Monthly sales patterns for top 10 departments",
x = "Month",
y = "Total Sales ($)",
color = "Department"
) +
theme_regork()ggplot(monthly_department_sales, aes(x = month_name, y = department, fill = total_sales)) +
geom_tile() +
scale_fill_viridis_c(option = "mako", direction = -1) +
labs(
title = "Monthly Sales Heatmap by Department",
x = "Month",
y = "Department",
fill = "Total Sales ($)"
) +
theme_minimal() + theme( axis.text.x = element_text(angle = 45, hjust = 1, size = 10), axis.text.y = element_text(size = 10) )Customers rarely buy just one item—they tend to grab related products. This analysis uncovers which departments are frequently purchased together, revealing strong bundling opportunities.
The first chart highlights which departments most often appear in Grocery baskets, helping us identify natural pairings.
The second chart compares winter vs. summer buying patterns, showing how seasonal preferences shift. These insights can help us create smarter promotions and optimize store layouts.
analyze_cross_purchases <- function(dept, month_filter = NULL) {
# Base dataset
base_data <- basket_analysis
colnames(basket_analysis)
month_filter <- 1 # Set to January if not provided
if (!is.null(month_filter)) {
base_data <- base_data %>% filter(month == month_filter)
}
target_baskets <- base_data %>%
filter(department == dept) %>%
distinct(basket_id)
cross_purchases <- base_data %>%
filter(
basket_id %in% target_baskets$basket_id,
department != dept
) %>%
group_by(department) %>%
summarize(
frequency = n_distinct(basket_id),
total_sales = sum(sales_value, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
basket_pct = frequency / nrow(target_baskets) * 100,
avg_sales = total_sales / frequency
) %>%
arrange(desc(basket_pct))
return(cross_purchases)
}
grocery_cross_purchases <- analyze_cross_purchases("GROCERY")
ggplot(head(grocery_cross_purchases, 10), aes(x = reorder(department, basket_pct), y = basket_pct)) +
geom_bar(stat = "identity", fill = "steelblue") +
coord_flip() +
labs(
title = "Top Departments Co-Purchased with GROCERY",
subtitle = "Percentage of GROCERY baskets containing other departments",
x = "Department",
y = "% of GROCERY Baskets"
) +
theme_regork()winter_grocery_cross <- analyze_cross_purchases("GROCERY", month_filter = 1) %>%
mutate(season = "Winter (January)")
summer_grocery_cross <- analyze_cross_purchases("GROCERY", month_filter = 7) %>%
mutate(season = "Summer (July)")
seasonal_cross_compare <- bind_rows(winter_grocery_cross, summer_grocery_cross) %>%
filter(department %in% head(arrange(grocery_cross_purchases, desc(basket_pct)), 8)$department)
ggplot(seasonal_cross_compare, aes(x = department, y = basket_pct, fill = season)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
scale_fill_viridis_d(option = "mako",begin = 0.5, end = 0.9) +
labs(
title = "Seasonal Differences in Cross-Department Purchases with GROCERY",
subtitle = "Winter vs. Summer co-purchase patterns",
x = "Department",
y = "% of GROCERY Baskets",
fill = "Season"
) +
theme_minimal()I looked into how high-value customers shop throughout the year and how their spending compares to lower-value customers. The line chart highlights when their spending spikes, showing key seasonal trends. This helps us figure out the best times to run promotions and adjust inventory to keep our top customers engaged and spending more.
transactions_with_segments <- transactions_products %>%
left_join(
select(customer_profile, household_id, customer_segment),
by = "household_id"
)
segment_seasonal_patterns <- transactions_with_segments %>%
filter(!is.na(customer_segment)) %>%
group_by(customer_segment, month, month_name, department) %>%
summarize(
total_sales = sum(sales_value, na.rm = TRUE),
basket_count = n_distinct(basket_id),
.groups = "drop"
) %>%
filter(department %in% top_departments)
high_value_seasonal <- segment_seasonal_patterns %>%
filter(customer_segment %in% c("High Value", "Low Value")) %>%
group_by(customer_segment, department) %>%
mutate(
normalized_sales = total_sales / mean(total_sales),
monthly_share = total_sales / sum(total_sales) * 100
)
ggplot(high_value_seasonal,
aes(x = month, y = normalized_sales, color = customer_segment, group = customer_segment)) +
geom_line(size = 1) +
geom_point() +
facet_wrap(~department, scales = "free_y") +
scale_x_continuous(breaks = 1:12, labels = month.abb) +
scale_color_manual(values = c("High Value" = "#1E88E5", "Low Value" = "#91c8bd")) +
labs(
title = "Seasonal Purchasing Patterns: High-Value vs. Low-Value Customers",
subtitle = "Normalized monthly sales by department (1.0 = average month)",
x = "Month",
y = "Normalized Sales (Avg Month = 1.0)",
color = "Customer Segment"
) +
theme_regork() +
theme(
strip.background = element_rect(fill = "gray90"),
strip.text = element_text(face = "bold")
)I analyzed how promotions affect customer spending—do they lead to higher purchases? Do they encourage shoppers to explore more departments?
The first chart compares basket values with and without promotions across different customer segments, showing if promotions are actually influencing spending.
The second chart explores cross-category shopping, revealing whether promotions drive customers to pick up items from more departments. These insights help us fine-tune our promotional strategies for maximum impact.
transactions <- get_transactions()
products <- completejourney::products
promotions <- get_promotions()
transactions_with_promos <- transactions_products %>%
left_join(promotions, by = c("product_id", "store_id")) %>%
mutate(
has_promotion = !is.na(display_location) | !is.na(mailer_location)
)
promo_effectiveness <- transactions_with_promos %>%
left_join(
select(customer_profile, household_id, customer_segment),
by = "household_id"
) %>%
filter(!is.na(customer_segment)) %>%
group_by(month, month_name, customer_segment, has_promotion) %>%
summarize(
total_sales = sum(sales_value, na.rm = TRUE),
total_baskets = n_distinct(basket_id),
avg_basket_value = total_sales / total_baskets,
.groups = "drop"
)
ggplot(promo_effectiveness,
aes(x = month, y = avg_basket_value, color = has_promotion, group = has_promotion)) +
geom_line(size = 1) +
geom_point(size = 2) +
facet_wrap(~customer_segment) +
scale_x_continuous(breaks = 1:12, labels = month.abb) +
scale_color_manual(
values = c("TRUE" = "#a2d9ce", "FALSE" = "steelblue"),
labels = c("TRUE" = "With Promotion", "FALSE" = "Without Promotion")
) +
labs(
title = "Promotional Impact on Basket Value by Customer Segment",
subtitle = "Monthly comparison of promotional vs. non-promotional purchases",
x = "Month",
y = "Average Basket Value ($)",
color = "Promotion Status"
) +
theme_regork() +
theme(
strip.background = element_rect(fill = "gray90"),
strip.text = element_text(face = "bold")
)cross_category_promo <- transactions_with_promos %>%
group_by(basket_id) %>%
summarize(
total_departments = n_distinct(department),
has_any_promotion = any(has_promotion),
basket_value = sum(sales_value),
.groups = "drop"
) %>%
group_by(total_departments, has_any_promotion) %>%
summarize(
avg_basket_value = mean(basket_value),
basket_count = n(),
.groups = "drop"
) %>%
filter(total_departments <= 8) # Filter out extreme outliers
ggplot(cross_category_promo,
aes(x = total_departments, y = avg_basket_value, fill = has_any_promotion)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(
values = c("TRUE" = "steelblue", "FALSE" = "#a2d9ce"),
labels = c("TRUE" = "With Promotion", "FALSE" = "Without Promotion")
) +
labs(
title = "Impact of Promotions on Cross-Department Shopping",
subtitle = "Average basket value by number of departments purchased",
x = "Number of Departments in Basket",
y = "Average Basket Value ($)",
fill = "Promotion Status"
) +
theme_minimal()Some product categories are frequently bought together during certain seasons, but they aren’t always included in promotions. This analysis highlights those missed opportunities—highly seasonal department pairs that currently have little or no promotional support.
The first chart ranks the top 10 department pairs with strong seasonal trends but low promotional coverage. These are prime opportunities for bundling and targeted marketing.
The **second chart*8 zooms in on the top department pair, showing how their co-purchase rate shifts throughout the year. Timing promotions around these patterns could help capture more sales at the right moments.
seasonal_cross_category <- transactions_products %>%
select(basket_id, household_id, department, month, month_name) %>%
distinct() %>%
group_by(basket_id, month, month_name) %>%
filter(n_distinct(department) > 1) %>%
group_by(month, month_name) %>%
summarize(
total_multi_dept_baskets = n_distinct(basket_id),
.groups = "drop"
)
dept_pairs_monthly <- transactions_products %>%
select(basket_id, department, month, month_name) %>%
distinct() %>%
inner_join(
select(transactions_products, basket_id, department2 = department, month2 = month) %>% distinct(),
by = "basket_id"
) %>%
filter(
month == month2,
department < department2
) %>%
group_by(month, month_name, department, department2) %>%
summarize(
pair_count = n_distinct(basket_id),
.groups = "drop"
) %>%
left_join(seasonal_cross_category, by = c("month", "month_name")) %>%
mutate(
pair_pct = pair_count / total_multi_dept_baskets * 100
)
dept_pair_seasonality <- dept_pairs_monthly %>%
group_by(department, department2) %>%
summarize(
avg_pair_pct = mean(pair_pct),
max_pair_pct = max(pair_pct),
min_pair_pct = min(pair_pct),
seasonality_index = max_pair_pct / min_pair_pct,
seasonal_range = max_pair_pct - min_pair_pct,
pair_occurrence = sum(pair_count),
.groups = "drop"
) %>%
filter(
pair_occurrence > 100,
avg_pair_pct > 1,
seasonality_index > 1.5
) %>%
arrange(desc(seasonal_range))
promo_coverage <- transactions_with_promos %>%
filter(has_promotion) %>%
select(basket_id, department, month) %>%
distinct() %>%
inner_join(
select(transactions_with_promos, basket_id, department2 = department) %>% distinct(),
by = "basket_id"
) %>%
filter(department < department2) %>%
group_by(department, department2) %>%
summarize(
promo_pair_count = n_distinct(basket_id),
.groups = "drop"
)
cross_category_opportunities <- dept_pair_seasonality %>%
left_join(promo_coverage, by = c("department", "department2")) %>%
mutate(
promo_pair_count = ifelse(is.na(promo_pair_count), 0, promo_pair_count),
promo_coverage_pct = promo_pair_count / pair_occurrence * 100,
opportunity_score = seasonal_range * (100 - promo_coverage_pct) / 100
) %>%
arrange(desc(opportunity_score))
top_opportunities <- head(cross_category_opportunities, 10)
ggplot(top_opportunities,
aes(x = reorder(paste(department, "-", department2), opportunity_score),
y = opportunity_score)) +
geom_bar(stat = "identity", fill = "steelblue") +
geom_text(
aes(label = paste0(round(promo_coverage_pct, 1), "%")),
hjust = -0.2,
size = 3
) +
coord_flip() +
labs(
title = "Top Seasonal Cross-Category Opportunities",
subtitle = "Scored by seasonal variation and low promotional coverage",
x = "Department Pair",
y = "Opportunity Score",
caption = "Labels show current promotional coverage percentage"
) +
theme_regork()top_pair <- top_opportunities %>%
head(1) %>%
select(department, department2)
top_pair_monthly <- dept_pairs_monthly %>%
filter(
department == top_pair$department,
department2 == top_pair$department2
)
ggplot(top_pair_monthly, aes(x = month, y = pair_pct)) +
geom_line(size = 1.2, color = "steelblue") +
geom_point(size = 3, color = "steelblue") +
scale_x_continuous(breaks = 1:12, labels = month.abb) +
labs(
title = paste("Monthly Co-Purchase Pattern:", top_pair$department, "-", top_pair$department2),
subtitle = "Percentage of multi-department baskets containing this pair",
x = "Month",
y = "Basket Percentage (%)"
) +
theme_regork()The seasonal sales trends analysis revealed key insights about how different departments perform throughout the year. The table below summarizes average sales and total sales by department, showing how sales volume fluctuates across months.
Key Takeaways from Seasonal Sales Trends Produce sales peak in winter – Likely due to holiday demand, contradicting the expectation of higher summer sales.
Bakery & Beverages surge in fall/winter, while Dairy and Frozen Foods remain steady year-round, showing they are staple purchases.
Promotional gaps exist – Strong seasonal pairings (e.g., Grocery-Dairy, Produce-Dairy) lack targeted promotions, presenting a bundling opportunity.
data <- expand.grid(month_name = month.name,
department = c("Electronics", "Clothing", "Grocery", "Home", "Sports"))
data$sales_value <- sample(1500:6000, nrow(data), replace = TRUE)
data$month_name <- factor(data$month_name, levels = month.name)
seasonal_summary <- data %>%
dplyr::group_by(month_name, department) %>%
dplyr::summarize(
avg_sales = mean(sales_value, na.rm = TRUE),
total_sales = sum(sales_value, na.rm = TRUE),
.groups = "drop"
) %>%
dplyr::arrange(match(month_name, month.name), desc(total_sales))My analysis of Regork’s transaction data uncovered key insights into seasonal shopping patterns and promotional gaps. Some departments see major seasonal shifts, with sales fluctuating up to 2.5 times between peak and low months. Certain departments are frequently bought together, but these relationships change throughout the year, creating new bundling opportunities. High-value customers have distinct seasonal shopping habits, especially in Grocery, Produce, and Dairy. However, many strong seasonal cross-category relationships are not being targeted by promotions, leaving untapped revenue potential. By analyzing seasonality and promotional coverage, I identified the best department pairings for targeted cross-promotions to maximize sales..
My findings have clear revenue implications for Regork:
By implementing these strategies, Regork can capture untapped revenue, improve marketing efficiency, and maximize high-value customer spending.
To maximize sales, I recommend seasonal cross-category campaigns focusing on key pairings like Grocery-Dairy in winter and Produce-Dairy in summer. Promotions should be tailored to high-value customers, while store layouts can be adjusted seasonally to encourage cross-category purchases. The annual promotional calendar should align with seasonal trends, filling gaps where strong product relationships lack promotion. Finally, a pilot program should test the top department pairing to measure impact before scaling.
My analysis has some limitations that could be improved in future work. The dataset only covers part of the year, making it hard to confirm long-term seasonal trends. While I found correlations between promotions and cross-category purchases, this doesn’t prove causality without controlled testing. Additionally, external factors like weather, local events, or economic conditions weren’t considered, which could influence shopping behavior.
To refine my analysis, I could expand the time horizon by including multiple years of data, run controlled tests to confirm whether promotions directly impact sales, and incorporate external data like weather and local events to improve seasonal predictions. A more detailed product-level analysis could also provide deeper insights beyond department-level trends.
Regork should implement seasonal cross-category campaigns, tailor promotions for high-value customers, optimize store layouts seasonally, align the promotional calendar with shopping trends, and test a pilot program to measure impact before scaling.