As a national grocery chain, Regork must continuously identify new areas of growth to maintain competitive advantage and drive profitability. This analysis was commissioned to identify a potential area of growth where the company could invest future resources to increase revenue and profits.
We have decided to take the following approach to find growth opportunities for Regork Grocery Chain:
Our goal is to identify specific days, times, and customer segments where targeted promotional investments can generate the highest return on revenue.
Primary Finding: We identified two consistently underperforming days of the week that represent a significant revenue opportunity.
Three-Tiered Approach:
Expected Outcome: By bringing weak days up to average performance, we can capture substantial annual revenue growth with focused, low-risk promotional investments.
Quantified Opportunity:
Based on our analysis of one full year of transaction data from 2,700 households, we have identified a clear path to revenue growth:
Strategic Value:
This approach provides a blueprint for systematic revenue optimization that can be replicated across all store locations and expanded to other identified opportunities.
The following packages are required in order to run the code efficiently without any errors.
library(completejourney) # dataset source
library(dplyr) # for dynamic data manipulation
library(tidyr) # for data tidying operations
library(lubridate) # for easy date and time manipulation
library(ggplot2) # for creating visualizations
library(scales) # for formatting plot scales and labels
library(knitr) # for dynamic report generation
library(kableExtra) # for enhanced table styling
# Set global theme for consistent visualizations
theme_set(theme_minimal(base_size = 13))
Package Descriptions:
This analysis uses two primary datasets from the CompleteJourney package:
Transactions Dataset (transactions_sample)
Variable | Data Type | Description |
---|---|---|
household_id | character | Uniquely identifies each household |
basket_id | character | Uniquely identifies each purchase occasion |
product_id | character | Uniquely identifies each product |
quantity | numeric | Number of products purchased |
sales_value | numeric | Amount in dollars from the sale |
transaction_timestamp | datetime | Date and time of transaction |
Demographics Dataset
Variable | Data Type | Description |
---|---|---|
household_id | character | Links to transactions dataset |
age | character | Age group of household head |
income | character | Income bracket |
household_size | numeric | Number of people in household (1-5+) |
# Load datasets
transactions <- transactions_sample
demographics <- demographics
# Define business parameters
STORE_OPEN_HOUR <- 6 # 6 AM
STORE_CLOSE_HOUR <- 23 # 11 PM
cat("**Dataset Overview:**\n\n")
Dataset Overview:
cat("- Date Range:", as.character(min(transactions$transaction_timestamp)), "to",
as.character(max(transactions$transaction_timestamp)), "\n")
We transformed the raw transaction data by extracting temporal features and creating meaningful time segments for analysis.
# Prepare transaction data with time features
transactions_clean <- transactions %>%
mutate(
# Extract date/time components
transaction_date = as.Date(transaction_timestamp),
day_of_week = wday(transaction_timestamp, label = TRUE, abbr = FALSE),
day_of_week_num = wday(transaction_timestamp),
hour_of_day = hour(transaction_timestamp),
month = month(transaction_timestamp, label = TRUE),
# Create time periods
time_period = case_when(
hour_of_day >= 6 & hour_of_day < 9 ~ "Early Morning (6-9 AM)",
hour_of_day >= 9 & hour_of_day < 12 ~ "Mid Morning (9-12 PM)",
hour_of_day >= 12 & hour_of_day < 15 ~ "Early Afternoon (12-3 PM)",
hour_of_day >= 15 & hour_of_day < 18 ~ "Late Afternoon (3-6 PM)",
hour_of_day >= 18 & hour_of_day < 21 ~ "Evening (6-9 PM)",
hour_of_day >= 21 & hour_of_day < 24 ~ "Late Evening (9-11 PM)",
TRUE ~ "Closed Hours"
),
# Weekend vs Weekday
day_type = ifelse(day_of_week %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
) %>%
# Filter to business hours only
filter(hour_of_day >= STORE_OPEN_HOUR & hour_of_day <= STORE_CLOSE_HOUR)
cat("**Data prepared:**", format(nrow(transactions_clean), big.mark = ","),
"transactions during business hours\n\n")
Data prepared: 73,813 transactions during business hours
cat("**Coverage:**", round(nrow(transactions_clean)/nrow(transactions)*100, 1),
"% of total transactions\n")
Coverage: 98.4 % of total transactions
Question: Which specific days need promotional investment?
# Calculate daily metrics
day_analysis <- transactions_clean %>%
group_by(day_of_week, day_of_week_num) %>%
summarise(
total_transactions = n(),
unique_households = n_distinct(household_id),
total_sales = sum(sales_value, na.rm = TRUE),
avg_basket_size = mean(sales_value, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(day_of_week_num) %>%
mutate(
pct_transactions = total_transactions / sum(total_transactions) * 100,
pct_sales = total_sales / sum(total_sales) * 100
)
# Identify weak days
weakest_day <- day_analysis %>% arrange(total_sales) %>% slice(1)
second_weakest <- day_analysis %>% arrange(total_sales) %>% slice(2)
avg_daily_sales <- mean(day_analysis$total_sales)
weak_gap <- avg_daily_sales - weakest_day$total_sales
second_gap <- avg_daily_sales - second_weakest$total_sales
total_gap <- weak_gap + second_gap
weak_day_data <- day_analysis %>%
mutate(
is_weak = day_of_week %in% c(as.character(weakest_day$day_of_week),
as.character(second_weakest$day_of_week)),
status = ifelse(is_weak, "Target for Growth", "Current Performance")
)
ggplot(weak_day_data, aes(x = day_of_week, y = total_sales/1000, fill = status)) +
geom_col(width = 0.7) +
geom_hline(yintercept = avg_daily_sales/1000, linetype = "dashed",
color = "#D32F2F", size = 1.2) +
geom_text(aes(label = paste0("$", round(total_sales/1000, 1), "K")),
vjust = -0.5, size = 4.5, fontface = "bold") +
annotate("segment", x = 2.5, xend = 4.5,
y = avg_daily_sales/1000, yend = avg_daily_sales/1000,
linetype = "dashed", color = "#D32F2F", size = 1.2) +
annotate("text", x = 3.3, y = avg_daily_sales/1000 + 2.5,
label = "Daily Average", color = "#D32F2F", hjust = 1,
fontface = "bold", size = 4.5) +
scale_y_continuous(labels = function(x) paste0("$", x, "K"),
expand = expansion(mult = c(0, 0.18)),
limits = c(0, max(weak_day_data$total_sales/1000) * 1.15)) +
scale_fill_manual(values = c("Target for Growth" = "#E74C3C",
"Current Performance" = "#95A5A6")) +
labs(
title = "Daily Revenue Performance Across the Week",
x = "Day of Week",
y = "Total Sales (in thousands)",
fill = ""
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 18, color = "#2C3E50"),
axis.text.x = element_text(angle = 45, hjust = 1, size = 12),
legend.position = "top",
legend.text = element_text(size = 12),
plot.margin = margin(10, 10, 10, 10)
)
Data Interpretation:
Based on the above graph, We can conclude that Wednesday and Tuesday consistently fall below the daily average revenue benchmark.On Wednesday current Sales is $28.5K/week lower than daily avg sales by $4.9k while for Tuesday current sales is $28.5k/week lower than daily avg sales by $4.4k/week.
These two days represent $9,357 in weekly unrealized revenue potential. When extrapolated across 52 weeks, this gap translates to an annual opportunity of $486,551, representing approximately 211% potential increase in total annual revenue.
A point of observation to be noted here is that the underperformance is consistent rather than sporadic, indicating a systematic pattern rather than random variation. This suggests that targeted interventions can reliably capture this opportunity.
Question: How does revenue vary across the week?
# Prepare data for dual visualization
daily_plot_data <- day_analysis %>%
mutate(
revenue_scaled = total_sales / 1000,
traffic_scaled = total_transactions
)
# Calculate scaling factor for secondary axis
scale_factor <- max(daily_plot_data$revenue_scaled) / max(daily_plot_data$traffic_scaled)
# Color palette
revenue_color <- "#4A90E2"
traffic_color <- "#E74C3C"
# Create dual-axis chart
ggplot(daily_plot_data, aes(x = day_of_week)) +
geom_col(aes(y = revenue_scaled), fill = revenue_color, alpha = 0.8, width = 0.65) +
geom_text(aes(y = revenue_scaled / 2, label = paste0("$", round(revenue_scaled, 1), "K")),
size = 4, fontface = "bold", color = "white") +
geom_line(aes(y = traffic_scaled * scale_factor, group = 1),
color = traffic_color, size = 2) +
geom_point(aes(y = traffic_scaled * scale_factor),
color = traffic_color, size = 5, shape = 19) +
geom_text(aes(y = traffic_scaled * scale_factor,
label = format(traffic_scaled, big.mark = ",")),
vjust = -2.2, size = 3.8, fontface = "bold", color = traffic_color) +
scale_y_continuous(
name = "Revenue (Thousands)",
labels = function(x) paste0("$", x, "K"),
expand = expansion(mult = c(0, 0.3)),
sec.axis = sec_axis(
~./scale_factor,
name = "Number of Shopping Trips",
labels = comma
)
) +
labs(
title = "Daily Performance: Revenue vs Customer Traffic",
subtitle = "BLUE BARS = Revenue Earned ($) | RED LINE = Customer Shopping Trips",
x = "Day of Week",
caption = "Note: Both metrics displayed on separate scales. Bars show total revenue; line shows transaction count."
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 18, hjust = 0.5),
plot.subtitle = element_text(size = 12, color = "gray30", hjust = 0.5, margin = margin(b = 20)),
plot.caption = element_text(size = 11, color = "gray30", hjust = 0, margin = margin(t = 10)),
axis.title.y.left = element_text(color = revenue_color, face = "bold", size = 12),
axis.title.y.right = element_text(color = traffic_color, face = "bold", size = 12),
axis.text.y.left = element_text(color = revenue_color, face = "bold"),
axis.text.y.right = element_text(color = traffic_color, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1, size = 11, face = "bold"),
panel.grid.major.x = element_blank()
)
Data Interpretation:
This visualization reveals a strong correlation between customer traffic (red line) and revenue generation (blue bars). The parallel movement of both metrics indicates that our revenue challenges on weak days are primarily driven by lower foot traffic rather than reduced spending per customer.
For these weak days, we further deep-dive and find opportunities for growth by analyzing their spend patterns based on factors like family size, time of day, and shopping frequency. The consistent gap between peak days (Saturday, Sunday) and weak days (Wednesday, Tuesday) represents a 49% difference in customer visits - a substantial opportunity for traffic-driving promotions.
Question: Which exact day/hour combinations should we target with promotions? Part A: Shopping Traffic Heatmap
day_hour_analysis <- transactions_clean %>%
group_by(day_of_week, day_of_week_num, hour_of_day) %>%
summarise(
total_transactions = n(),
total_sales = sum(sales_value, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(day_of_week_num)
avg_hourly <- mean(day_hour_analysis$total_transactions)
ggplot(day_hour_analysis, aes(x = hour_of_day, y = day_of_week, fill = total_transactions)) +
geom_tile(color = "white", size = 1) +
geom_text(aes(label = comma(total_transactions, accuracy = 1)),
size = 3.5, color = "black", fontface = "bold") +
scale_fill_gradient2(
low = "#E74C3C",
mid = "#F39C12",
high = "#27AE60",
midpoint = avg_hourly,
labels = comma,
name = "Transactions"
) +
scale_x_continuous(breaks = seq(6, 23, 1)) +
labs(
title = "Shopping Traffic Intensity: Day × Hour Analysis",
x = "Hour of Day",
y = "Day of Week"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 16),
legend.position = "right",
panel.grid = element_blank(),
axis.text = element_text(size = 11)
)
Part B: Specific Time-Based Growth Opportunities
# Calculate hour benchmarks
hour_benchmarks <- day_hour_analysis %>%
group_by(hour_of_day) %>%
summarise(
hour_avg_transactions = mean(total_transactions),
hour_avg_sales = mean(total_sales),
.groups = "drop"
)
# Find opportunities
growth_opps <- day_hour_analysis %>%
left_join(hour_benchmarks, by = "hour_of_day") %>%
mutate(
pct_of_hour_avg = (total_transactions / hour_avg_transactions) * 100,
revenue_gap = hour_avg_sales - total_sales,
is_opportunity = hour_of_day >= 9 & hour_of_day <= 21 & pct_of_hour_avg < 85
) %>%
filter(is_opportunity) %>%
arrange(pct_of_hour_avg)
if(nrow(growth_opps) > 0) {
print(
kable(
growth_opps %>%
head(15) %>%
mutate(
time_label = paste0(sprintf("%02d", hour_of_day), ":00"),
day_hour = paste(day_of_week, time_label)
) %>%
select(
`Day & Hour` = day_hour,
`Current Traffic` = total_transactions,
`Hour Average` = hour_avg_transactions,
`% of Average` = pct_of_hour_avg,
`Revenue Gap` = revenue_gap
) %>%
mutate(
`Hour Average` = round(`Hour Average`, 0),
`% of Average` = paste0(round(`% of Average`, 1), "%"),
`Revenue Gap` = dollar(`Revenue Gap`)
),
format.args = list(big.mark = ","),
caption = "Top 15 Underperforming Day/Hour Combinations (Prime Hours 9 AM - 9 PM)"
) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE) %>%
row_spec(1:5, bold = TRUE, background = "#F8D7DA")
)
} else {
cat("\n**No significant time-based opportunities identified.**\n\n")
cat("Traffic is well-distributed across prime shopping hours (9 AM - 9 PM).\n\n")
cat("**Strategic Recommendation:** Focus resources on day-level and family-size targeting instead, as these offer greater impact.\n")
}
Day & Hour | Current Traffic | Hour Average | % of Average | Revenue Gap |
---|---|---|---|---|
Thursday 10:00 | 330 | 471 | 70% | $501.09 |
Wednesday 13:00 | 595 | 823 | 72.3% | $676.53 |
Wednesday 10:00 | 341 | 471 | 72.3% | $227.33 |
Thursday 11:00 | 505 | 677 | 74.6% | $395.38 |
Thursday 12:00 | 578 | 773 | 74.7% | $607.60 |
Thursday 14:00 | 634 | 839 | 75.5% | $649.20 |
Wednesday 09:00 | 206 | 272 | 75.8% | $230.36 |
Tuesday 14:00 | 637 | 839 | 75.9% | $888.67 |
Tuesday 12:00 | 588 | 773 | 76% | $625.65 |
Tuesday 13:00 | 635 | 823 | 77.2% | $692.30 |
Wednesday 12:00 | 599 | 773 | 77.5% | $674.47 |
Wednesday 11:00 | 525 | 677 | 77.6% | $507.12 |
Wednesday 14:00 | 657 | 839 | 78.3% | $434.41 |
Tuesday 15:00 | 696 | 874 | 79.7% | $731.32 |
Friday 11:00 | 540 | 677 | 79.8% | $397.27 |
Data Interpretation:
Based on the above heatmap, we identify shopping intensity patterns across 126 unique day-hour combinations (7 days × 18 hours). The color gradient from red (low traffic) to green (high traffic) reveals that weak days show consistently lower activity across most operating hours.
The table chart helps us to further drill down and identify Top 15 Underperfroming time slots,Prioritize targeted promotions for late-morning to mid-afternoon promotions rather than broad all-day campaigns.
Question: Which family sizes are MISSING on our weakest days?
# Prepare family data
weak_day_trans <- transactions_clean %>%
filter(day_of_week %in% c(as.character(weakest_day$day_of_week),
as.character(second_weakest$day_of_week))) %>%
left_join(demographics, by = "household_id") %>%
filter(!is.na(household_size))
all_trans_demo <- transactions_clean %>%
left_join(demographics, by = "household_id") %>%
filter(!is.na(household_size))
# Calculate metrics
weak_family <- weak_day_trans %>%
group_by(household_size) %>%
summarise(
transactions = n(),
total_sales = sum(sales_value, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(weak_day_pct = transactions / sum(transactions) * 100)
overall_family <- all_trans_demo %>%
group_by(household_size) %>%
summarise(transactions = n(), .groups = "drop") %>%
mutate(overall_pct = transactions / sum(transactions) * 100)
family_comparison <- weak_family %>%
left_join(overall_family %>% select(household_size, overall_pct),
by = "household_size") %>%
mutate(
gap = weak_day_pct - overall_pct,
gap_label = paste0(ifelse(gap > 0, "+", ""), round(gap, 1), " pp"),
status = ifelse(gap < 0, "Missing on Weak Days", "Already Shopping Well")
) %>%
arrange(gap)
most_underrep <- family_comparison %>% filter(gap < 0) %>% slice(1)
ggplot(family_comparison, aes(x = reorder(factor(household_size), gap), y = gap)) +
geom_col(aes(fill = gap < 0), width = 0.6, show.legend = FALSE) +
geom_hline(yintercept = 0, size = 1.5, color = "black") +
geom_text(aes(label = gap_label, hjust = ifelse(gap < 0, 1.0 , -0.3)),
size = 3.5, fontface = "bold") +
coord_flip() +
scale_fill_manual(values = c("TRUE" = "#E74C3C", "FALSE" = "#27AE60")) +
scale_y_continuous(
labels = function(x) paste0(x, " pp"),
breaks = seq(-10, 10, 2)
) +
labs(
title = "Customer Segment Analysis: Family Size Representation on Weak Days",
x = "Household Size (Number of People)",
y = "Gap in Shopping Frequency (percentage points)"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 17, color = "#2C3E50"),
panel.grid.major.y = element_blank(),
axis.text = element_text(size = 12, face = "bold"),
axis.title = element_text(size = 13, face = "bold")
)
Data Interpretation:
Based on the above graph, we identified customer profiles based on their shopping patterns across weak days versus the overall weekly average. Negative values (red bars) indicate family sizes that shop significantly less on Wednesday and Tuesday compared to their normal behavior.
All the other customers with positive values (green bars) already shop at their typical frequency on these days. For these family size profiles, we further deep-dive and find opportunities for growth by creating targeted promotional bundles and messaging that address their specific household needs.
We observed that family size 3 shows the most significant underrepresentation (0.8 percentage points below normal). These are existing Regork customers who actively avoid weak days - making them prime candidates for targeted “family pack” promotions and personalized incentives to shift their shopping behavior.
The primary objective of the project is to find growth opportunities for Regork Grocery Chain by analyzing customer shopping patterns, identifying weak days, and discovering underrepresented customer segments for the year 2017.
The completejourney dataset is available in R under the package of the same name, and provides access to datasets characterizing household level transactions over one year from a group of 2,467 households who are frequent shoppers at Regork.
1. Two Weakest Days
cat("- **Target Days:** ", as.character(weakest_day$day_of_week), " and ",
as.character(second_weakest$day_of_week), sep = "")
cat("- **Combined revenue gap:** $", format(round(total_gap, 0), big.mark = ","),
" per week", sep = "")
cat("- **Potential revenue increase:** ",
round(total_gap * 52 / sum(transactions_clean$sales_value) * 100, 1), sep = "")
2. Family Size Targeting
if(exists("most_underrep") && nrow(most_underrep) > 0) {
underrep_families <- family_comparison %>% filter(gap < -1.5)
cat("- **Most underrepresented:** Family size ", most_underrep$household_size, "\n", sep = "")
if(nrow(underrep_families) > 1) {
cat("- **Total underrepresented segments:** ", nrow(underrep_families),
" family sizes identified\n", sep = "")
}
cat("- **Action:** Create family-pack promotions and bulk discounts\n")
}
3. Time-Based Opportunities
if(exists("growth_opps") && nrow(growth_opps) > 0) {
top_15_opps <- growth_opps %>% head(15)
cat("- **Underperforming time slots:** Top 15 day/hour combinations identified for targeted promotions\n")
cat("- **Action:** Launch flash sales or limited-time offers during these specific windows\n")
cat("- **Combined potential:** $",
format(round(sum(top_15_opps$revenue_gap), 0), big.mark = ","),
" weekly if these top 15 slots reach their hour average\n", sep = "")
} else {
cat("- **Time distribution:** Well-balanced across prime hours\n")
cat("- **Focus:** Day-level and family-size strategies will have greater impact\n")
}
Recommendation 1: Launch Day-Specific Loyalty Program
Create “Wednesday & Tuesday Rewards” program:
Recommendation 2: Family-Targeted Promotions
if(exists("most_underrep") && nrow(most_underrep) > 0) {
underrep_sizes <- family_comparison %>% filter(gap < -1.5) %>% pull(household_size)
cat("Focus on underrepresented family sizes: **", paste(underrep_sizes, collapse = ", "), "**\n\n", sep = "")
cat("**Specific Tactics:**\n\n")
cat("- **Targeted Campaigns** - e.g. Family of ", most_underrep$household_size,
"? Get 20% off our weekly dinner box!\n", sep = "")
cat("- **Exclusive deal on 'Family Bundles'** - Create ", most_underrep$household_size,
"-pack bundles (meals, snacks, beverages)\n", sep = "")
cat("- **Bulk buy promotions** - 'Buy 3 Get 1 Free' on staples and household essentials\n")
cat("- **Highlight value packs** - Emphasize nutritional value and family-friendly items\n")
cat("- **Partner with family-oriented apps** for targeted digital advertising\n")
} else {
cat("All family sizes well-represented. Focus on frequency and basket size.\n")
}
Focus on underrepresented family sizes: ****
Specific Tactics:
Recommendation 3: Promotional Calendar
Create a structured approach:
Recommendation 4: Marketing Channel Strategy
Primary Growth Lever: Target the two weakest days (Wednesday and Tuesday) with day-specific promotions ($486,551 annual potential)
Secondary Growth Lever: Create family-size targeted promotions to attract underrepresented household segments on weak days
We have identified top 15 high-priority day/hour combinations, successful targeted campaigns could potentially increase the annual revenue by $428,411
Why This Will Work:
Success Metrics:
Analysis Date: 10 October, 2025
Data Source: CompleteJourney Dataset - 2,700
Households, Full Year Analysis
Business Hours Analyzed: 6:00 AM - 11:00 PM Daily