To: Regork CEO
From: Data Science Team
Re: Identifying High-Impact Growth Investment
Opportunities
As a national grocery chain, Regork must continuously identify new areas of growth to maintain competitive advantage and drive profitability. This analysis was commissioned to identify a potential area of growth where the company could invest future resources to increase revenue and profits.
Data Source: CompleteJourney dataset - 1 year of transaction data from 2,700 households
Analytical Methodology:
Phase 1: Identify Weakest Days - Analyzed daily revenue performance to find the two weakest performing days, calculated revenue gaps compared to average daily sales, and quantified annual opportunity
Phase 2: Temporal Analysis - Analyzed daily and hourly traffic patterns to identify peak vs. off-peak periods, filtered to business hours (6 AM - 11 PM), and created day × hour heatmaps
Phase 3: Family Size Gap Analysis - Compared family size shopping patterns on weak days vs. overall average and identified which household sizes are underrepresented
Phase 4: Smart Time-Based Benchmarking - Compared same hour across different days (not all hours against one average), focused on prime shopping hours (9 AM - 9 PM)
Tier 1: Day-Level Targeting (Primary Focus) - Target the two weakest performing days with day-specific promotions - Launch exclusive loyalty rewards for these days
Tier 2: Family Size Targeting - Identify underrepresented family sizes on weak days - Create family-pack promotions and bulk discounts
Tier 3: Time-Based Targeting (If Applicable) - Target specific hour/day combinations during prime hours
# Data manipulation and analysis
library(completejourney) # Dataset source
library(dplyr) # Data manipulation
library(tidyr) # Data tidying
library(lubridate) # Date/time manipulation
# Visualization
library(ggplot2) # Core plotting
library(scales) # Scale functions for plots
# Table formatting
library(knitr) # Table generation
library(kableExtra) # Enhanced table styling
# Set global theme
theme_set(theme_minimal(base_size = 13))
# Load datasets
transactions <- transactions_sample
demographics <- demographics
# Define business parameters
STORE_OPEN_HOUR <- 6
STORE_CLOSE_HOUR <- 23
cat('<div class="card">\n\n')
Dataset Overview:
cat("- **Unique Households:** ", format(n_distinct(transactions$household_id), big.mark = ","), "\n")
cat("- **Date Range:** ", as.character(min(transactions$transaction_timestamp)), " to ",
as.character(max(transactions$transaction_timestamp)), "\n")
# Prepare transaction data with time features
transactions_clean <- transactions %>%
mutate(
transaction_date = as.Date(transaction_timestamp),
day_of_week = wday(transaction_timestamp, label = TRUE, abbr = FALSE),
day_of_week_num = wday(transaction_timestamp),
hour_of_day = hour(transaction_timestamp),
month = month(transaction_timestamp, label = TRUE),
time_period = case_when(
hour_of_day >= 6 & hour_of_day < 9 ~ "Early Morning (6-9 AM)",
hour_of_day >= 9 & hour_of_day < 12 ~ "Mid Morning (9-12 PM)",
hour_of_day >= 12 & hour_of_day < 15 ~ "Early Afternoon (12-3 PM)",
hour_of_day >= 15 & hour_of_day < 18 ~ "Late Afternoon (3-6 PM)",
hour_of_day >= 18 & hour_of_day < 21 ~ "Evening (6-9 PM)",
hour_of_day >= 21 & hour_of_day < 24 ~ "Late Evening (9-11 PM)",
TRUE ~ "Closed Hours"
),
day_type = ifelse(day_of_week %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
) %>%
filter(hour_of_day >= STORE_OPEN_HOUR & hour_of_day <= STORE_CLOSE_HOUR)
Data prepared: 73,813 transactions during business hours (98.4% of total)
# Calculate daily metrics
day_analysis <- transactions_clean %>%
group_by(day_of_week, day_of_week_num) %>%
summarise(
total_transactions = n(),
unique_households = n_distinct(household_id),
total_sales = sum(sales_value, na.rm = TRUE),
avg_basket_size = mean(sales_value, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(day_of_week_num) %>%
mutate(
pct_transactions = total_transactions / sum(total_transactions) * 100,
pct_sales = total_sales / sum(total_sales) * 100
)
weakest_day <- day_analysis %>% arrange(total_sales) %>% slice(1)
second_weakest <- day_analysis %>% arrange(total_sales) %>% slice(2)
avg_daily_sales <- mean(day_analysis$total_sales)
weak_gap <- avg_daily_sales - weakest_day$total_sales
second_gap <- avg_daily_sales - second_weakest$total_sales
total_gap <- weak_gap + second_gap
weak_day_data <- day_analysis %>%
mutate(
is_weak = day_of_week %in% c(as.character(weakest_day$day_of_week),
as.character(second_weakest$day_of_week)),
status = ifelse(is_weak, "Target for Growth", "Current Performance")
)
ggplot(weak_day_data, aes(x = day_of_week, y = total_sales/1000, fill = status)) +
geom_col(width = 0.7) +
geom_hline(yintercept = avg_daily_sales/1000, linetype = "dashed",
color = "#E74C3C", size = 1.5) +
geom_text(aes(label = paste0("$", round(total_sales/1000, 1), "K")),
vjust = -0.5, size = 4.5, fontface = "bold") +
annotate("text", x = 1.5, y = avg_daily_sales/1000 * 1.08,
label = "Daily Average", color = "#E74C3C", hjust = 0, fontface = "bold", size = 5) +
scale_y_continuous(labels = function(x) paste0("$", x, "K"),
expand = expansion(mult = c(0, 0.15))) +
scale_fill_manual(values = c("Target for Growth" = "#F3A712",
"Current Performance" = "#8A99A6")) +
labs(
title = "Growth Opportunity: Two Weakest Days Identified",
subtitle = "Days below the dashed line need promotional investment",
x = "Day of Week",
y = "Total Sales (in thousands)",
fill = ""
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 18, color = "#1F7A8C"),
plot.subtitle = element_text(size = 12, color = "#2E3A46"),
axis.text.x = element_text(angle = 45, hjust = 1, size = 12),
legend.position = "top",
plot.margin = margin(10, 10, 10, 10)
)
Two weakest performing days identified for targeted growth initiatives
Weakest Day: Wednesday - Current sales: $28,005 per week - Gap to average: $4,932 - Represents 12.1% of weekly revenue
Second Weakest: Tuesday - Current sales: $28,513 per week - Gap to average: $4,424 - Represents 12.4% of weekly revenue
Annual revenue potential: $486,551 - This represents a 211% increase in total annual revenue
daily_plot_data <- day_analysis %>%
mutate(
revenue_scaled = total_sales / 1000,
traffic_scaled = total_transactions
)
scale_factor <- max(daily_plot_data$revenue_scaled) / max(daily_plot_data$traffic_scaled)
revenue_color <- "#1F7A8C"
traffic_color <- "#F3A712"
ggplot(daily_plot_data, aes(x = day_of_week)) +
geom_col(aes(y = revenue_scaled), fill = revenue_color, alpha = 0.8, width = 0.65) +
geom_text(aes(y = revenue_scaled / 2, label = paste0("$", round(revenue_scaled, 1), "K")),
size = 4, fontface = "bold", color = "white") +
geom_line(aes(y = traffic_scaled * scale_factor, group = 1),
color = traffic_color, size = 2) +
geom_point(aes(y = traffic_scaled * scale_factor),
color = traffic_color, size = 5, shape = 19) +
geom_text(aes(y = traffic_scaled * scale_factor,
label = format(traffic_scaled, big.mark = ",")),
vjust = -2.2, size = 3.8, fontface = "bold", color = traffic_color) +
scale_y_continuous(
name = "Revenue (Thousands)",
labels = function(x) paste0("$", x, "K"),
expand = expansion(mult = c(0, 0.3)),
sec.axis = sec_axis(
~./scale_factor,
name = "Number of Shopping Trips",
labels = comma
)
) +
labs(
title = "Daily Performance: Revenue vs Customer Traffic",
x = "Day of Week"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 18, hjust = 0.5, color = "#1F7A8C"),
axis.title.y.left = element_text(color = revenue_color, face = "bold", size = 12),
axis.title.y.right = element_text(color = traffic_color, face = "bold", size = 12),
axis.text.y.left = element_text(color = revenue_color, face = "bold", size = 10),
axis.text.y.right = element_text(color = traffic_color, face = "bold", size = 10),
axis.text.x = element_text(angle = 45, hjust = 1, size = 11, face = "bold"),
panel.grid.major.x = element_blank(),
plot.margin = margin(10, 10, 10, 10)
)
Revenue and customer traffic comparison across days of the week
busiest <- day_analysis %>% arrange(desc(total_sales)) %>% slice(1)
busiest_traffic <- day_analysis %>% arrange(desc(total_transactions)) %>% slice(1)
cat('<div class="card">\n\n')
Revenue Performance:
cat("- **Highest Revenue Day:** ", as.character(busiest$day_of_week), " - $",
format(round(busiest$total_sales, 0), big.mark = ","), " (",
round(busiest$pct_sales, 1), "% of weekly revenue)\n", sep = "")
cat("- **Lowest Revenue Day:** ", as.character(weakest_day$day_of_week), " - $",
format(round(weakest_day$total_sales, 0), big.mark = ","), " (",
round(weakest_day$pct_sales, 1), "% of weekly revenue)\n\n", sep = "")
Traffic Performance:
cat("- **Busiest Day:** ", as.character(busiest_traffic$day_of_week),
" with ", format(busiest_traffic$total_transactions, big.mark = ","), " shopping trips\n", sep = "")
cat("- **Slowest Day:** ", as.character(weakest_day$day_of_week),
" with ", format(weakest_day$total_transactions, big.mark = ","), " shopping trips\n", sep = "")
Revenue Gap: 46% difference between strongest and weakest day | Traffic Gap: 49% difference in customer visits
day_hour_analysis <- transactions_clean %>%
group_by(day_of_week, day_of_week_num, hour_of_day) %>%
summarise(
total_transactions = n(),
total_sales = sum(sales_value, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(day_of_week_num)
avg_hourly <- mean(day_hour_analysis$total_transactions)
ggplot(day_hour_analysis, aes(x = hour_of_day, y = day_of_week, fill = total_transactions)) +
geom_tile(color = "white", size = 1) +
geom_text(aes(label = comma(total_transactions, accuracy = 1)),
size = 3.5, color = "black", fontface = "bold") +
scale_fill_gradient2(
low = "#F3A712",
mid = "#1F7A8C",
high = "#2E8540",
midpoint = avg_hourly,
labels = comma,
name = "Transactions"
) +
scale_x_continuous(breaks = seq(6, 23, 1)) +
labs(
title = "Shopping Traffic Heatmap: Every Day × Every Hour",
subtitle = paste0("Business Hours: 6 AM - 11 PM | Average: ", round(avg_hourly, 0),
" transactions/hour"),
x = "Hour of Day",
y = "Day of Week"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 16, color = "#1F7A8C"),
plot.subtitle = element_text(size = 11, color = "#2E3A46"),
legend.position = "right",
panel.grid = element_blank(),
plot.margin = margin(10, 10, 10, 10)
)
Shopping traffic heatmap showing peak and off-peak hours throughout the week
hour_benchmarks <- day_hour_analysis %>%
group_by(hour_of_day) %>%
summarise(
hour_avg_transactions = mean(total_transactions),
hour_avg_sales = mean(total_sales),
.groups = "drop"
)
growth_opps <- day_hour_analysis %>%
left_join(hour_benchmarks, by = "hour_of_day") %>%
mutate(
pct_of_hour_avg = (total_transactions / hour_avg_transactions) * 100,
revenue_gap = hour_avg_sales - total_sales,
is_opportunity = hour_of_day >= 9 & hour_of_day <= 21 & pct_of_hour_avg < 85
) %>%
filter(is_opportunity) %>%
arrange(pct_of_hour_avg)
if(nrow(growth_opps) > 0) {
print(
kable(
growth_opps %>%
head(15) %>%
mutate(
time_label = paste0(sprintf("%02d", hour_of_day), ":00"),
day_hour = paste(day_of_week, time_label)
) %>%
select(
`Day & Hour` = day_hour,
`Current Traffic` = total_transactions,
`Hour Average` = hour_avg_transactions,
`% of Average` = pct_of_hour_avg,
`Revenue Gap` = revenue_gap
) %>%
mutate(
`Hour Average` = round(`Hour Average`, 0),
`% of Average` = paste0(round(`% of Average`, 1), "%"),
`Revenue Gap` = dollar(`Revenue Gap`)
),
format.args = list(big.mark = ","),
caption = "Top 15 Underperforming Day/Hour Combinations (Prime Hours 9 AM - 9 PM)"
) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
)
} else {
cat('<div class="insight">\n')
cat("Traffic is well-distributed across prime shopping hours (9 AM - 9 PM). Focus resources on day-level and family-size targeting.\n")
cat('</div>')
}
Day & Hour | Current Traffic | Hour Average | % of Average | Revenue Gap |
---|---|---|---|---|
Thursday 10:00 | 330 | 471 | 70% | $501.09 |
Wednesday 13:00 | 595 | 823 | 72.3% | $676.53 |
Wednesday 10:00 | 341 | 471 | 72.3% | $227.33 |
Thursday 11:00 | 505 | 677 | 74.6% | $395.38 |
Thursday 12:00 | 578 | 773 | 74.7% | $607.60 |
Thursday 14:00 | 634 | 839 | 75.5% | $649.20 |
Wednesday 09:00 | 206 | 272 | 75.8% | $230.36 |
Tuesday 14:00 | 637 | 839 | 75.9% | $888.67 |
Tuesday 12:00 | 588 | 773 | 76% | $625.65 |
Tuesday 13:00 | 635 | 823 | 77.2% | $692.30 |
Wednesday 12:00 | 599 | 773 | 77.5% | $674.47 |
Wednesday 11:00 | 525 | 677 | 77.6% | $507.12 |
Wednesday 14:00 | 657 | 839 | 78.3% | $434.41 |
Tuesday 15:00 | 696 | 874 | 79.7% | $731.32 |
Friday 11:00 | 540 | 677 | 79.8% | $397.27 |
weak_day_trans <- transactions_clean %>%
filter(day_of_week %in% c(as.character(weakest_day$day_of_week),
as.character(second_weakest$day_of_week))) %>%
left_join(demographics, by = "household_id") %>%
filter(!is.na(household_size))
all_trans_demo <- transactions_clean %>%
left_join(demographics, by = "household_id") %>%
filter(!is.na(household_size))
weak_family <- weak_day_trans %>%
group_by(household_size) %>%
summarise(
transactions = n(),
total_sales = sum(sales_value, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(weak_day_pct = transactions / sum(transactions) * 100)
overall_family <- all_trans_demo %>%
group_by(household_size) %>%
summarise(transactions = n(), .groups = "drop") %>%
mutate(overall_pct = transactions / sum(transactions) * 100)
family_comparison <- weak_family %>%
left_join(overall_family %>% select(household_size, overall_pct),
by = "household_size") %>%
mutate(
gap = weak_day_pct - overall_pct,
gap_label = paste0(ifelse(gap > 0, "+", ""), round(gap, 1), " pp"),
status = ifelse(gap < 0, "Missing on Weak Days", "Already Shopping Well")
) %>%
arrange(gap)
most_underrep <- family_comparison %>% filter(gap < 0) %>% slice(1)
ggplot(family_comparison, aes(x = reorder(factor(household_size), gap), y = gap)) +
geom_col(aes(fill = gap < 0), width = 0.6, show.legend = FALSE) +
geom_hline(yintercept = 0, size = 1.5, color = "#2E3A46") +
geom_text(aes(label = gap_label, hjust = ifelse(gap < 0, 1.2, -0.2)),
size = 5.5, fontface = "bold") +
coord_flip() +
scale_fill_manual(values = c("TRUE" = "#F3A712", "FALSE" = "#2E8540")) +
scale_y_continuous(
labels = function(x) paste0(x, " pp"),
breaks = seq(-10, 10, 2)
) +
labs(
title = "Family Size Opportunity Gap: Who's Missing on Weak Days?",
subtitle = paste0("Analysis of ", as.character(weakest_day$day_of_week), " and ",
as.character(second_weakest$day_of_week)),
x = "Household Size (Number of People)",
y = "Gap in Shopping Frequency (percentage points)"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 17, color = "#1F7A8C"),
plot.subtitle = element_text(size = 12, color = "#2E3A46"),
panel.grid.major.y = element_blank(),
plot.margin = margin(10, 10, 10, 10)
)
Family size opportunity gap - identifying underrepresented segments on weak days
Create “Wednesday & Tuesday Rewards” program:
if(exists("most_underrep") && nrow(most_underrep) > 0) {
underrep_sizes <- family_comparison %>% filter(gap < -1.5) %>% pull(household_size)
cat('<div class="recommendation">\n')
cat("#### Recommendation 2: Family-Targeted Promotions\n\n")
cat("Focus on underrepresented family sizes: **", paste(underrep_sizes, collapse = ", "), "**\n\n", sep = "")
cat("**Specific Tactics:**\n\n")
cat("- **Targeted Campaigns** - e.g. Family of ", most_underrep$household_size,
"? Get 20% off our weekly dinner box!\n", sep = "")
cat("- **Exclusive 'Family Bundles'** - Create ", most_underrep$household_size,
"-pack bundles (meals, snacks, beverages)\n", sep = "")
cat("- **Bulk buy promotions** - 'Buy 3 Get 1 Free' on staples and household essentials\n")
cat("- **Highlight value packs** - Emphasize nutritional value and family-friendly items\n")
cat("- **Partner with family-oriented apps** for targeted digital advertising\n")
cat('</div>')
}
Focus on underrepresented family sizes: ****
Specific Tactics:
Create a structured approach:
Primary Growth Lever: Target the two weakest days (Wednesday and Tuesday) with day-specific promotions
$486,551 Annual Potential
Secondary Growth Lever: Create family-size targeted promotions to attract underrepresented household segments on weak days
$428,411 Time-Based Potential (Top 15)
# All code chunks are shown here for reference
# Daily analysis
day_analysis <- transactions_clean %>%
group_by(day_of_week, day_of_week_num) %>%
summarise(
total_transactions = n(),
unique_households = n_distinct(household_id),
total_sales = sum(sales_value, na.rm = TRUE),
avg_basket_size = mean(sales_value, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(day_of_week_num) %>%
mutate(
pct_transactions = total_transactions / sum(total_transactions) * 100,
pct_sales = total_sales / sum(total_sales) * 100
)
# Identify weak days
weakest_day <- day_analysis %>% arrange(total_sales) %>% slice(1)
second_weakest <- day_analysis %>% arrange(total_sales) %>% slice(2)
# Calculate gaps
avg_daily_sales <- mean(day_analysis$total_sales)
weak_gap <- avg_daily_sales - weakest_day$total_sales
second_gap <- avg_daily_sales - second_weakest$total_sales
total_gap <- weak_gap + second_gap
Analysis Date: 2025-10-09 | Business Hours Analyzed: 6:00 AM - 11:00 PM | Data Source: CompleteJourney - 2,700 households, 1 year of transactions