Executive Summary

Business Problem

To: Regork CEO
From: Data Science Team
Re: Identifying High-Impact Growth Investment Opportunities

As a national grocery chain, Regork must continuously identify new areas of growth to maintain competitive advantage and drive profitability. This analysis was commissioned to identify a potential area of growth where the company could invest future resources to increase revenue and profits.

Key Business Questions

Q1

Which days of the week have the lowest sales performance?

Q2

What are the daily and hourly traffic patterns?

Q3

Which customer segments (family sizes) are underrepresented on weak days?

Q4

Are there specific day/hour combinations that underperform?

Our Approach

Data Source: CompleteJourney dataset - 1 year of transaction data from 2,700 households

Analytical Methodology:

  • Phase 1: Identify Weakest Days - Analyzed daily revenue performance to find the two weakest performing days, calculated revenue gaps compared to average daily sales, and quantified annual opportunity

  • Phase 2: Temporal Analysis - Analyzed daily and hourly traffic patterns to identify peak vs. off-peak periods, filtered to business hours (6 AM - 11 PM), and created day × hour heatmaps

  • Phase 3: Family Size Gap Analysis - Compared family size shopping patterns on weak days vs. overall average and identified which household sizes are underrepresented

  • Phase 4: Smart Time-Based Benchmarking - Compared same hour across different days (not all hours against one average), focused on prime shopping hours (9 AM - 9 PM)

Proposed Solution


Data & Methods

Packages Required

# Data manipulation and analysis
library(completejourney)  # Dataset source
library(dplyr)            # Data manipulation
library(tidyr)            # Data tidying
library(lubridate)        # Date/time manipulation

# Visualization
library(ggplot2)          # Core plotting
library(scales)           # Scale functions for plots

# Table formatting
library(knitr)            # Table generation
library(kableExtra)       # Enhanced table styling

# Set global theme
theme_set(theme_minimal(base_size = 13))

Dataset Overview

# Load datasets
transactions <- transactions_sample
demographics <- demographics

# Define business parameters
STORE_OPEN_HOUR <- 6
STORE_CLOSE_HOUR <- 23

cat('<div class="card">\n\n')
cat("**Dataset Overview:**\n\n")

Dataset Overview:

cat("- **Transaction Records:** ", format(nrow(transactions), big.mark = ","), "\n")
  • Transaction Records: 75,000
cat("- **Unique Households:** ", format(n_distinct(transactions$household_id), big.mark = ","), "\n")
  • Unique Households: 2,377
cat("- **Date Range:** ", as.character(min(transactions$transaction_timestamp)), " to ", 
    as.character(max(transactions$transaction_timestamp)), "\n")
  • Date Range: 2017-01-01 07:30:27 to 2017-12-31 22:47:38
cat("- **Business Hours:** 6:00 AM - 11:00 PM Daily\n")
  • Business Hours: 6:00 AM - 11:00 PM Daily
cat('\n</div>')

Data Preparation

# Prepare transaction data with time features
transactions_clean <- transactions %>%
  mutate(
    transaction_date = as.Date(transaction_timestamp),
    day_of_week = wday(transaction_timestamp, label = TRUE, abbr = FALSE),
    day_of_week_num = wday(transaction_timestamp),
    hour_of_day = hour(transaction_timestamp),
    month = month(transaction_timestamp, label = TRUE),
    time_period = case_when(
      hour_of_day >= 6 & hour_of_day < 9 ~ "Early Morning (6-9 AM)",
      hour_of_day >= 9 & hour_of_day < 12 ~ "Mid Morning (9-12 PM)",
      hour_of_day >= 12 & hour_of_day < 15 ~ "Early Afternoon (12-3 PM)",
      hour_of_day >= 15 & hour_of_day < 18 ~ "Late Afternoon (3-6 PM)",
      hour_of_day >= 18 & hour_of_day < 21 ~ "Evening (6-9 PM)",
      hour_of_day >= 21 & hour_of_day < 24 ~ "Late Evening (9-11 PM)",
      TRUE ~ "Closed Hours"
    ),
    day_type = ifelse(day_of_week %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
  ) %>%
  filter(hour_of_day >= STORE_OPEN_HOUR & hour_of_day <= STORE_CLOSE_HOUR)

Data prepared: 73,813 transactions during business hours (98.4% of total)


Key Findings

Finding 1: Two Weakest Days Identified

# Calculate daily metrics
day_analysis <- transactions_clean %>%
  group_by(day_of_week, day_of_week_num) %>%
  summarise(
    total_transactions = n(),
    unique_households = n_distinct(household_id),
    total_sales = sum(sales_value, na.rm = TRUE),
    avg_basket_size = mean(sales_value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(day_of_week_num) %>%
  mutate(
    pct_transactions = total_transactions / sum(total_transactions) * 100,
    pct_sales = total_sales / sum(total_sales) * 100
  )

weakest_day <- day_analysis %>% arrange(total_sales) %>% slice(1)
second_weakest <- day_analysis %>% arrange(total_sales) %>% slice(2)
avg_daily_sales <- mean(day_analysis$total_sales)
weak_gap <- avg_daily_sales - weakest_day$total_sales
second_gap <- avg_daily_sales - second_weakest$total_sales
total_gap <- weak_gap + second_gap
weak_day_data <- day_analysis %>%
  mutate(
    is_weak = day_of_week %in% c(as.character(weakest_day$day_of_week), 
                                   as.character(second_weakest$day_of_week)),
    status = ifelse(is_weak, "Target for Growth", "Current Performance")
  )

ggplot(weak_day_data, aes(x = day_of_week, y = total_sales/1000, fill = status)) +
  geom_col(width = 0.7) +
  geom_hline(yintercept = avg_daily_sales/1000, linetype = "dashed", 
             color = "#E74C3C", size = 1.5) +
  geom_text(aes(label = paste0("$", round(total_sales/1000, 1), "K")),
            vjust = -0.5, size = 4.5, fontface = "bold") +
  annotate("text", x = 1.5, y = avg_daily_sales/1000 * 1.08, 
           label = "Daily Average", color = "#E74C3C", hjust = 0, fontface = "bold", size = 5) +
  scale_y_continuous(labels = function(x) paste0("$", x, "K"),
                     expand = expansion(mult = c(0, 0.15))) +
  scale_fill_manual(values = c("Target for Growth" = "#F3A712", 
                               "Current Performance" = "#8A99A6")) +
  labs(
    title = "Growth Opportunity: Two Weakest Days Identified",
    subtitle = "Days below the dashed line need promotional investment",
    x = "Day of Week",
    y = "Total Sales (in thousands)",
    fill = ""
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 18, color = "#1F7A8C"),
    plot.subtitle = element_text(size = 12, color = "#2E3A46"),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 12),
    legend.position = "top",
    plot.margin = margin(10, 10, 10, 10)
  )
Two weakest performing days identified for targeted growth initiatives

Two weakest performing days identified for targeted growth initiatives

Growth Opportunity Identified

Weakest Day: Wednesday - Current sales: $28,005 per week - Gap to average: $4,932 - Represents 12.1% of weekly revenue

Second Weakest: Tuesday - Current sales: $28,513 per week - Gap to average: $4,424 - Represents 12.4% of weekly revenue

Annual revenue potential: $486,551 - This represents a 211% increase in total annual revenue


Finding 2: Daily Traffic Patterns

daily_plot_data <- day_analysis %>%
  mutate(
    revenue_scaled = total_sales / 1000,
    traffic_scaled = total_transactions
  )

scale_factor <- max(daily_plot_data$revenue_scaled) / max(daily_plot_data$traffic_scaled)
revenue_color <- "#1F7A8C"
traffic_color <- "#F3A712"

ggplot(daily_plot_data, aes(x = day_of_week)) +
  geom_col(aes(y = revenue_scaled), fill = revenue_color, alpha = 0.8, width = 0.65) +
  geom_text(aes(y = revenue_scaled / 2, label = paste0("$", round(revenue_scaled, 1), "K")),
            size = 4, fontface = "bold", color = "white") +
  geom_line(aes(y = traffic_scaled * scale_factor, group = 1), 
            color = traffic_color, size = 2) +
  geom_point(aes(y = traffic_scaled * scale_factor), 
             color = traffic_color, size = 5, shape = 19) +
  geom_text(aes(y = traffic_scaled * scale_factor, 
                label = format(traffic_scaled, big.mark = ",")),
            vjust = -2.2, size = 3.8, fontface = "bold", color = traffic_color) +
  scale_y_continuous(
    name = "Revenue (Thousands)",
    labels = function(x) paste0("$", x, "K"),
    expand = expansion(mult = c(0, 0.3)),
    sec.axis = sec_axis(
      ~./scale_factor, 
      name = "Number of Shopping Trips",
      labels = comma
    )
  ) +
  labs(
    title = "Daily Performance: Revenue vs Customer Traffic",
    x = "Day of Week"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 18, hjust = 0.5, color = "#1F7A8C"),
    axis.title.y.left = element_text(color = revenue_color, face = "bold", size = 12),
    axis.title.y.right = element_text(color = traffic_color, face = "bold", size = 12),
    axis.text.y.left = element_text(color = revenue_color, face = "bold", size = 10),
    axis.text.y.right = element_text(color = traffic_color, face = "bold", size = 10),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 11, face = "bold"),
    panel.grid.major.x = element_blank(),
    plot.margin = margin(10, 10, 10, 10)
  )
Revenue and customer traffic comparison across days of the week

Revenue and customer traffic comparison across days of the week

busiest <- day_analysis %>% arrange(desc(total_sales)) %>% slice(1)
busiest_traffic <- day_analysis %>% arrange(desc(total_transactions)) %>% slice(1)

cat('<div class="card">\n\n')
cat("**Revenue Performance:**\n\n")

Revenue Performance:

cat("- **Highest Revenue Day:** ", as.character(busiest$day_of_week), " - $",
    format(round(busiest$total_sales, 0), big.mark = ","), " (", 
    round(busiest$pct_sales, 1), "% of weekly revenue)\n", sep = "")
  • Highest Revenue Day: Sunday - $40,956 (17.8% of weekly revenue)
cat("- **Lowest Revenue Day:** ", as.character(weakest_day$day_of_week), " - $",
    format(round(weakest_day$total_sales, 0), big.mark = ","), " (", 
    round(weakest_day$pct_sales, 1), "% of weekly revenue)\n\n", sep = "")
  • Lowest Revenue Day: Wednesday - $28,005 (12.1% of weekly revenue)
cat("**Traffic Performance:**\n\n")

Traffic Performance:

cat("- **Busiest Day:** ", as.character(busiest_traffic$day_of_week), 
    " with ", format(busiest_traffic$total_transactions, big.mark = ","), " shopping trips\n", sep = "")
  • Busiest Day: Sunday with 13,476 shopping trips
cat("- **Slowest Day:** ", as.character(weakest_day$day_of_week),
    " with ", format(weakest_day$total_transactions, big.mark = ","), " shopping trips\n", sep = "")
  • Slowest Day: Wednesday with 9,040 shopping trips
cat('\n</div>')

Revenue Gap: 46% difference between strongest and weakest day | Traffic Gap: 49% difference in customer visits


Finding 3: Hourly Traffic Patterns

day_hour_analysis <- transactions_clean %>%
  group_by(day_of_week, day_of_week_num, hour_of_day) %>%
  summarise(
    total_transactions = n(),
    total_sales = sum(sales_value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(day_of_week_num)

avg_hourly <- mean(day_hour_analysis$total_transactions)
ggplot(day_hour_analysis, aes(x = hour_of_day, y = day_of_week, fill = total_transactions)) +
  geom_tile(color = "white", size = 1) +
  geom_text(aes(label = comma(total_transactions, accuracy = 1)), 
            size = 3.5, color = "black", fontface = "bold") +
  scale_fill_gradient2(
    low = "#F3A712", 
    mid = "#1F7A8C",
    high = "#2E8540", 
    midpoint = avg_hourly,
    labels = comma,
    name = "Transactions"
  ) +
  scale_x_continuous(breaks = seq(6, 23, 1)) +
  labs(
    title = "Shopping Traffic Heatmap: Every Day × Every Hour",
    subtitle = paste0("Business Hours: 6 AM - 11 PM | Average: ", round(avg_hourly, 0), 
                     " transactions/hour"),
    x = "Hour of Day",
    y = "Day of Week"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 16, color = "#1F7A8C"),
    plot.subtitle = element_text(size = 11, color = "#2E3A46"),
    legend.position = "right",
    panel.grid = element_blank(),
    plot.margin = margin(10, 10, 10, 10)
  )
Shopping traffic heatmap showing peak and off-peak hours throughout the week

Shopping traffic heatmap showing peak and off-peak hours throughout the week

Time-Based Opportunities

hour_benchmarks <- day_hour_analysis %>%
  group_by(hour_of_day) %>%
  summarise(
    hour_avg_transactions = mean(total_transactions),
    hour_avg_sales = mean(total_sales),
    .groups = "drop"
  )

growth_opps <- day_hour_analysis %>%
  left_join(hour_benchmarks, by = "hour_of_day") %>%
  mutate(
    pct_of_hour_avg = (total_transactions / hour_avg_transactions) * 100,
    revenue_gap = hour_avg_sales - total_sales,
    is_opportunity = hour_of_day >= 9 & hour_of_day <= 21 & pct_of_hour_avg < 85
  ) %>%
  filter(is_opportunity) %>%
  arrange(pct_of_hour_avg)
if(nrow(growth_opps) > 0) {
  print(
    kable(
      growth_opps %>%
        head(15) %>%
        mutate(
          time_label = paste0(sprintf("%02d", hour_of_day), ":00"),
          day_hour = paste(day_of_week, time_label)
        ) %>%
        select(
          `Day & Hour` = day_hour,
          `Current Traffic` = total_transactions,
          `Hour Average` = hour_avg_transactions,
          `% of Average` = pct_of_hour_avg,
          `Revenue Gap` = revenue_gap
        ) %>%
        mutate(
          `Hour Average` = round(`Hour Average`, 0),
          `% of Average` = paste0(round(`% of Average`, 1), "%"),
          `Revenue Gap` = dollar(`Revenue Gap`)
        ),
      format.args = list(big.mark = ","),
      caption = "Top 15 Underperforming Day/Hour Combinations (Prime Hours 9 AM - 9 PM)"
    ) %>%
      kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                    full_width = FALSE)
  )
} else {
  cat('<div class="insight">\n')
  cat("Traffic is well-distributed across prime shopping hours (9 AM - 9 PM). Focus resources on day-level and family-size targeting.\n")
  cat('</div>')
}
Top 15 Underperforming Day/Hour Combinations (Prime Hours 9 AM - 9 PM)
Day & Hour Current Traffic Hour Average % of Average Revenue Gap
Thursday 10:00 330 471 70% $501.09
Wednesday 13:00 595 823 72.3% $676.53
Wednesday 10:00 341 471 72.3% $227.33
Thursday 11:00 505 677 74.6% $395.38
Thursday 12:00 578 773 74.7% $607.60
Thursday 14:00 634 839 75.5% $649.20
Wednesday 09:00 206 272 75.8% $230.36
Tuesday 14:00 637 839 75.9% $888.67
Tuesday 12:00 588 773 76% $625.65
Tuesday 13:00 635 823 77.2% $692.30
Wednesday 12:00 599 773 77.5% $674.47
Wednesday 11:00 525 677 77.6% $507.12
Wednesday 14:00 657 839 78.3% $434.41
Tuesday 15:00 696 874 79.7% $731.32
Friday 11:00 540 677 79.8% $397.27

Finding 4: Family Size Opportunities

weak_day_trans <- transactions_clean %>%
  filter(day_of_week %in% c(as.character(weakest_day$day_of_week),
                             as.character(second_weakest$day_of_week))) %>%
  left_join(demographics, by = "household_id") %>%
  filter(!is.na(household_size))

all_trans_demo <- transactions_clean %>%
  left_join(demographics, by = "household_id") %>%
  filter(!is.na(household_size))

weak_family <- weak_day_trans %>%
  group_by(household_size) %>%
  summarise(
    transactions = n(),
    total_sales = sum(sales_value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(weak_day_pct = transactions / sum(transactions) * 100)

overall_family <- all_trans_demo %>%
  group_by(household_size) %>%
  summarise(transactions = n(), .groups = "drop") %>%
  mutate(overall_pct = transactions / sum(transactions) * 100)

family_comparison <- weak_family %>%
  left_join(overall_family %>% select(household_size, overall_pct),
            by = "household_size") %>%
  mutate(
    gap = weak_day_pct - overall_pct,
    gap_label = paste0(ifelse(gap > 0, "+", ""), round(gap, 1), " pp"),
    status = ifelse(gap < 0, "Missing on Weak Days", "Already Shopping Well")
  ) %>%
  arrange(gap)

most_underrep <- family_comparison %>% filter(gap < 0) %>% slice(1)
ggplot(family_comparison, aes(x = reorder(factor(household_size), gap), y = gap)) +
  geom_col(aes(fill = gap < 0), width = 0.6, show.legend = FALSE) +
  geom_hline(yintercept = 0, size = 1.5, color = "#2E3A46") +
  geom_text(aes(label = gap_label, hjust = ifelse(gap < 0, 1.2, -0.2)),
            size = 5.5, fontface = "bold") +
  coord_flip() +
  scale_fill_manual(values = c("TRUE" = "#F3A712", "FALSE" = "#2E8540")) +
  scale_y_continuous(
    labels = function(x) paste0(x, " pp"),
    breaks = seq(-10, 10, 2)
  ) +
  labs(
    title = "Family Size Opportunity Gap: Who's Missing on Weak Days?",
    subtitle = paste0("Analysis of ", as.character(weakest_day$day_of_week), " and ", 
                     as.character(second_weakest$day_of_week)),
    x = "Household Size (Number of People)",
    y = "Gap in Shopping Frequency (percentage points)"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 17, color = "#1F7A8C"),
    plot.subtitle = element_text(size = 12, color = "#2E3A46"),
    panel.grid.major.y = element_blank(),
    plot.margin = margin(10, 10, 10, 10)
  )
Family size opportunity gap - identifying underrepresented segments on weak days

Family size opportunity gap - identifying underrepresented segments on weak days


Strategic Recommendations

Recommendation 1: Launch Day-Specific Loyalty Program

Create “Wednesday & Tuesday Rewards” program:

  • Double loyalty points on these two days
  • Exclusive deals available only on weak days
  • Email/app notifications highlighting special offers
  • Expected impact: 40-50% closure of revenue gap
if(exists("most_underrep") && nrow(most_underrep) > 0) {
  underrep_sizes <- family_comparison %>% filter(gap < -1.5) %>% pull(household_size)
  
  cat('<div class="recommendation">\n')
  cat("#### Recommendation 2: Family-Targeted Promotions\n\n")
  cat("Focus on underrepresented family sizes: **", paste(underrep_sizes, collapse = ", "), "**\n\n", sep = "")
  cat("**Specific Tactics:**\n\n")
  cat("- **Targeted Campaigns** - e.g. Family of ", most_underrep$household_size, 
      "? Get 20% off our weekly dinner box!\n", sep = "")
  cat("- **Exclusive 'Family Bundles'** - Create ", most_underrep$household_size, 
      "-pack bundles (meals, snacks, beverages)\n", sep = "")
  cat("- **Bulk buy promotions** - 'Buy 3 Get 1 Free' on staples and household essentials\n")
  cat("- **Highlight value packs** - Emphasize nutritional value and family-friendly items\n")
  cat("- **Partner with family-oriented apps** for targeted digital advertising\n")
  cat('</div>')
}

Recommendation 2: Family-Targeted Promotions

Focus on underrepresented family sizes: ****

Specific Tactics:

  • Targeted Campaigns - e.g. Family of 3? Get 20% off our weekly dinner box!
  • Exclusive ‘Family Bundles’ - Create 3-pack bundles (meals, snacks, beverages)
  • Bulk buy promotions - ‘Buy 3 Get 1 Free’ on staples and household essentials
  • Highlight value packs - Emphasize nutritional value and family-friendly items
  • Partner with family-oriented apps for targeted digital advertising

Recommendation 3: Promotional Calendar

Create a structured approach:

  • Week 1: Launch pilot “Weak Day Rewards” with double points
  • Week 2-4: A/B test different family promotions (discount % vs. bundle deals)
  • Month 2: Scale successful tactics; add time-specific flash sales if needed
  • Month 3: Measure results and refine strategy

Recommendation 4: Marketing Channel Strategy

  • Email campaigns: Target families by household size with personalized offers
  • Mobile app push notifications: Day-specific deals on Tuesday/Wednesday mornings
  • In-store signage: Highlight “Family Pack Deals” on weak days
  • Social media: “Weak Day Wednesday” and “Tuesday Treats” campaigns

Implementation Summary

Key Takeaways

Primary Growth Lever: Target the two weakest days (Wednesday and Tuesday) with day-specific promotions

$486,551 Annual Potential

Secondary Growth Lever: Create family-size targeted promotions to attract underrepresented household segments on weak days

$428,411 Time-Based Potential (Top 15)

Why This Will Work

  1. Focused: Two specific days vs. broad-based promotions = efficient marketing spend
  2. Targeted: Family-size segmentation = relevant offers that resonate
  3. Measurable: Clear baseline and KPIs for tracking success
  4. Low-risk: Pilot approach allows testing before full rollout
  5. Scalable: Success model can extend to other opportunities

Success Metrics

  • Week-over-week traffic increase on target days
  • Family size distribution shift on weak days
  • Revenue per day trending toward average
  • Customer satisfaction scores during promotional periods

Appendix: Technical Details

Data Processing Code

# All code chunks are shown here for reference
# Daily analysis
day_analysis <- transactions_clean %>%
  group_by(day_of_week, day_of_week_num) %>%
  summarise(
    total_transactions = n(),
    unique_households = n_distinct(household_id),
    total_sales = sum(sales_value, na.rm = TRUE),
    avg_basket_size = mean(sales_value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(day_of_week_num) %>%
  mutate(
    pct_transactions = total_transactions / sum(total_transactions) * 100,
    pct_sales = total_sales / sum(total_sales) * 100
  )

# Identify weak days
weakest_day <- day_analysis %>% arrange(total_sales) %>% slice(1)
second_weakest <- day_analysis %>% arrange(total_sales) %>% slice(2)

# Calculate gaps
avg_daily_sales <- mean(day_analysis$total_sales)
weak_gap <- avg_daily_sales - weakest_day$total_sales
second_gap <- avg_daily_sales - second_weakest$total_sales
total_gap <- weak_gap + second_gap

Analysis Date: 2025-10-09 | Business Hours Analyzed: 6:00 AM - 11:00 PM | Data Source: CompleteJourney - 2,700 households, 1 year of transactions