REGORK FINAL PROJECT

INTRODUCTION

Introduction:

In an increasingly competitive grocery industry, Regork must invest strategically to maintain its strong market position. By acting on our findings, Regork can not only remain one of the top grocery chains but also thrive as the most adaptable, affordable, and advanced one in the market.

Problem Statement:

Our Regork customers are not as satisfied as they need to be; their loyalty to Regork is waning as they turn to our competitors. Our current practices are resulting in customer churn, signaling an error on our end that is costing us severely.

Question: Where should Regork invest in targeted, calendar-aware promotions and in-aisle bundles, based on observed purchase seasonality and frequently co-purchased items, to lift revenue with existing traffic?

How We Address It:

To stop this negative trend, we first need to understand why our customers are shopping with other grocers. That means gathering data on buyer behavior so we can draw conclusions and make educated predictions. Our findings are based on the Complete Journey Study provided by 84.51°, a partner brand of Regork that specializes in data analytics. By joining transactions to product attributes and aggregating sales by department, month, and day of week, we can develop a better understanding of our consumers, leading to business practices that better suit them.

Our Solution

Based on our findings, we believe that a seasonally conscious marketing campaign will help build customer loyalty. We want our customers to feel that we offer the best deals exactly when they need them; the best way to do this is to structure promotions around general shopping trends. This will also give us more of a competitive edge against other grocery chains. For example, timing promotions for the days and months when foot traffic is highest should generate more sales, because the offers reach the most shoppers.

Moreover, a marketing campaign built around customer incentives for our top departments should encourage both existing and new customers to shop at Regork repeatedly. This will involve offering promotions, exclusive loyalty programs, and well-timed discounts. Widespread communication that alerts our consumers to these deals will also be imperative. We need to capitalize on our strengths, and highlighting our best departments is a great way to do that.

Furthermore, a personalized communication strategy will convey understanding and familiarity to our customers, resulting in increased customer engagement. Mass-audience communication will reach the largest audience possible, while personalized opt-in messaging will create a stronger pull to shop at Regork for a smaller group of individuals.

Based on our data, we believe these strategies will be the most effective at reducing customer churn. After understanding our consumers’ needs and desires, we developed strategies that respond to them directly to increase customer satisfaction and loyalty. In turn, Regork will build stronger, longer-lasting relationships with its customers.

PACKAGES/LIBRARIES REQUIRED:

Libraries

knitr::opts_chunk$set(message = FALSE, warning = FALSE)

# Import everything up front (i) and quietly (ii)
suppressPackageStartupMessages({
  library(completejourney)  # Complete Journey data access
  library(tidyverse)        # wrangling + plotting
  library(lubridate)        # date features (month/week/day-of-week)
  library(janitor)          # clean_names() and quick QA tables
  library(gt)               # clean, presentation-ready tables
  library(scales)           # currency/percent labels
  library(widyr)            # simple co-occurrence (bundle ideas)
})

What each package is for:

completejourney: loads the full transactions, products, households, and promotions datasets.
tidyverse: makes data cleaning, joining, and plotting simple.
lubridate: turns timestamps into month/week/day-of-week for seasonality.
janitor: fixes messy column names and helps quick data checks.
gt: creates polished tables for business-friendly display.
scales: formats dollars/percentages in plots and tables.
widyr: finds items frequently bought together (bundle candidates).
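
As an illustration of the co-purchase idea behind the widyr entry, the sketch below counts how many baskets contain each pair of categories. It is a minimal example under stated assumptions: the `baskets` data is made up, the column names `basket_id` and `product_category` are borrowed from the Complete Journey tables, and the pairing uses a plain dplyr self-join (dplyr 1.1 or later is assumed for the `relationship` argument) rather than widyr itself.

```r
library(dplyr)

# Hypothetical toy baskets: one row per (basket, category) purchase
baskets <- tibble::tribble(
  ~basket_id, ~product_category,
  "b1", "CHIPS",
  "b1", "SALSA",
  "b1", "SODA",
  "b2", "CHIPS",
  "b2", "SALSA",
  "b3", "CHIPS",
  "b3", "SODA",
  "b4", "BREAD",
  "b4", "MILK"
)

# Count each category at most once per basket
items <- distinct(baskets, basket_id, product_category)

# Self-join within a basket, keep each unordered pair once, count baskets
pair_counts <- items |>
  inner_join(items, by = "basket_id", suffix = c("_a", "_b"),
             relationship = "many-to-many") |>
  filter(product_category_a < product_category_b) |>
  count(product_category_a, product_category_b, name = "baskets", sort = TRUE)

pair_counts
```

Pairs that appear together in many baskets are candidates for in-aisle bundles. On the full transaction data, a self-join like this can grow large, so filtering to the top categories first is a reasonable precaution.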

EXPLORATORY DATA

Table: Top 10 Departments by Sales

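# Full transactions joined to product attributes (department, category, ...)
tx   <- completejourney::get_transactions()
prod <- completejourney::products
txp  <- dplyr::left_join(tx, prod, by = "product_id")

# Calendar fields used by the seasonality views below
txp$tx_date <- as.Date(txp$transaction_timestamp)
txp$month   <- lubridate::month(txp$tx_date, label = TRUE, abbr = TRUE)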

dept_tbl <- txp |>
  dplyr::group_by(department) |>
  dplyr::summarise(
    sales = sum(sales_value, na.rm = TRUE),
    units = sum(quantity,    na.rm = TRUE),
    .groups = "drop"
  ) |>
  dplyr::arrange(dplyr::desc(sales)) |>
  dplyr::slice_head(n = 10)

dept_tbl |>
  dplyr::mutate(sales = scales::dollar(sales)) |>
  gt::gt() |>
  gt::tab_header("Top 10 Departments by Sales")

Top 10 Departments by Sales
department	sales	units
GROCERY	$2,316,394	1242944
DRUG GM	$596,827	198635
FUEL	$329,594	129662940
PRODUCE	$322,859	185444
MEAT	$308,575	67526
MEAT-PCKGD	$232,283	83423
DELI	$148,344	37954
MISCELLANEOUS	$78,859	21361882
PASTRY	$69,117	28132
NUTRITION	$57,261	25024
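
To put the table in proportion, the sketch below recomputes two simple ratios from the printed figures: each department's share of the top-10 sales total, and sales per unit. The numbers are copied from the table above. The column names `share_of_top10` and `sales_per_unit` are introduced here for illustration only.

```r
# Figures copied from the "Top 10 Departments by Sales" table
dept <- data.frame(
  department = c("GROCERY", "DRUG GM", "FUEL", "PRODUCE", "MEAT",
                 "MEAT-PCKGD", "DELI", "MISCELLANEOUS", "PASTRY", "NUTRITION"),
  sales = c(2316394, 596827, 329594, 322859, 308575,
            232283, 148344, 78859, 69117, 57261),
  units = c(1242944, 198635, 129662940, 185444, 67526,
            83423, 37954, 21361882, 28132, 25024)
)

# Share of the top-10 total (not of all Regork sales)
dept$share_of_top10 <- dept$sales / sum(dept$sales)

# Dollars per recorded unit; very small values flag a different unit scale
dept$sales_per_unit <- dept$sales / dept$units

dept[order(dept$sales_per_unit), c("department", "share_of_top10", "sales_per_unit")]
```

FUEL and MISCELLANEOUS come out at well under one cent per unit, while every other department is above one dollar. That suggests `quantity` is recorded on a different scale for those two departments (an assumption worth checking against the data documentation), so their unit counts should not be compared directly with the rest.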

Graph: Sales by Month

by_month <- txp |>
  dplyr::filter(!is.na(month)) |>
  dplyr::group_by(month) |>
  dplyr::summarise(sales = sum(sales_value, na.rm = TRUE), .groups = "drop")

ggplot2::ggplot(by_month, ggplot2::aes(month, sales, fill = month)) +
  ggplot2::geom_col(color = "white") +
  ggplot2::scale_y_continuous(labels = scales::dollar) +
  ggplot2::scale_fill_brewer(palette = "Set3", guide = "none") +
  ggplot2::labs(title = "Sales by Month", x = NULL, y = "Sales ($)")

Graph: Calendar Heatmap of Daily Sales

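# Daily sales with calendar coordinates for the heatmap
dy <- txp |>
  dplyr::filter(!is.na(tx_date)) |>
  dplyr::group_by(tx_date) |>
  dplyr::summarise(sales = sum(sales_value, na.rm = TRUE), .groups = "drop") |>
  dplyr::arrange(tx_date) |>
  dplyr::mutate(
    month_start = lubridate::floor_date(tx_date, "month"),
    # Factor in date order so facets run chronologically, not alphabetically
    month_lab   = forcats::fct_inorder(format(month_start, "%b %Y")),
    dow         = lubridate::wday(tx_date, label = TRUE, abbr = TRUE, week_start = 1),
    # Week of month: day of month shifted by the weekday the month starts on
    wom         = ceiling((lubridate::day(tx_date) + 
                           lubridate::wday(month_start, week_start = 1) - 1) / 7)
  )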

ggplot2::ggplot(dy, ggplot2::aes(x = wom, y = dow, fill = sales)) +
  ggplot2::geom_tile(color = "white", linewidth = 0.3) +
  ggplot2::scale_fill_viridis_c(labels = scales::dollar) +
  ggplot2::facet_wrap(~ month_lab, ncol = 3) +
  ggplot2::labs(
    title = "Calendar Heatmap: Daily Sales",
    x = "Week of Month", y = NULL, fill = "Sales"
  ) +
  ggplot2::theme_minimal(base_size = 12) +
  ggplot2::theme(
    strip.text = ggplot2::element_text(face = "bold"),
    panel.spacing = grid::unit(0.8, "lines")
  )
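
The week-of-month calculation in the heatmap code can be checked on a few known dates. The sketch below wraps the same formula in a helper, `week_of_month()`, which is a name introduced here for illustration, and applies it to June 2017, a month that starts on a Thursday.

```r
library(lubridate)

# Same formula as the heatmap code: shift the day of month by the
# weekday offset of the 1st (Monday = 0), then split into 7-day rows
week_of_month <- function(d) {
  month_start <- floor_date(d, "month")
  offset <- wday(month_start, week_start = 1) - 1
  ceiling((day(d) + offset) / 7)
}

dates <- as.Date(c("2017-06-01", "2017-06-04", "2017-06-05", "2017-06-30"))

data.frame(
  date = dates,
  dow  = wday(dates, label = TRUE, abbr = TRUE, week_start = 1),
  wom  = week_of_month(dates)
)
# wom is 1, 1, 2, 5: Thu 1 June and Sun 4 June share the first row,
# Mon 5 June starts the second, and Fri 30 June falls in the fifth
```

Because rows are Monday-based, the first and last rows of a month are usually partial weeks, so those tiles cover fewer days than the full rows in between.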

Graph: Sales by Day of Week

# Day-of-week label, Monday first to match the calendar heatmap
txp$dow <- lubridate::wday(txp$tx_date, label = TRUE, abbr = TRUE, week_start = 1)
by_dow <- txp |>
  dplyr::filter(!is.na(dow)) |>
  dplyr::group_by(dow) |>
  dplyr::summarise(sales = sum(sales_value, na.rm = TRUE), .groups = "drop")

ggplot2::ggplot(by_dow, ggplot2::aes(dow, sales)) +
  ggplot2::geom_segment(ggplot2::aes(x = dow, xend = dow, y = 0, yend = sales),
                        linewidth = 1.2, color = "green") +
  ggplot2::geom_point(size = 3, color = "purple") +
  ggplot2::scale_y_continuous(labels = scales::dollar) +
  ggplot2::labs(title = "Sales by Day of Week", x = NULL, y = "Sales ($)")

Graph: Category Seasonality (Lift vs. Own Average)

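# Use the first category-like column found in txp (in txp's column order);
# fall back to department if none of the candidate names is present
nm_lower <- tolower(names(txp))
cand <- c("commodity","commodity_desc","product_category","category","sub_commodity")
has  <- names(txp)[nm_lower %in% cand]
cat_col <- if (length(has)) has[1] else "department"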

# top 6 categories by sales
comm_tbl <- txp |>
  dplyr::count(!!rlang::sym(cat_col), wt = sales_value, name = "sales") |>
  dplyr::arrange(dplyr::desc(sales)) |>
  dplyr::slice_head(n = 6) |>
  dplyr::rename(category = !!rlang::sym(cat_col))

top_cats <- comm_tbl$category

# monthly lift vs each category's own average
comm_month <- txp |>
  dplyr::filter(!!rlang::sym(cat_col) %in% top_cats, !is.na(month)) |>
  dplyr::group_by(!!rlang::sym(cat_col), month) |>
  dplyr::summarise(sales = sum(sales_value, na.rm = TRUE), .groups = "drop_last") |>
  dplyr::mutate(lift = sales / mean(sales)) |>
  dplyr::rename(category = !!rlang::sym(cat_col)) |>
  dplyr::ungroup()

ggplot2::ggplot(comm_month, ggplot2::aes(month, lift, group = category)) +
  ggplot2::geom_line(linewidth = 1, color = "#1F78B4") +
  ggplot2::geom_hline(yintercept = 1, linetype = 2) +
  ggplot2::facet_wrap(~ category, scales = "free_y") +
  ggplot2::labs(
    title = paste0("Seasonality by ", cat_col, " (Lift vs. Own Average)"),
    x = NULL, y = "Lift"
  )
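
The lift measure plotted above is each month's sales divided by that category's own average monthly sales, so a value of 1 means an average month. The sketch below works through the same calculation on made-up numbers. The `toy` data frame and its values are hypothetical.

```r
library(dplyr)

# Hypothetical monthly sales for two categories
toy <- tibble::tibble(
  category = rep(c("A", "B"), each = 3),
  month    = rep(c("Jan", "Feb", "Mar"), times = 2),
  sales    = c(100, 200, 300, 50, 50, 50)
)

# Lift = month's sales / the category's own mean monthly sales
toy_lift <- toy |>
  group_by(category) |>
  mutate(lift = sales / mean(sales)) |>
  ungroup()

toy_lift
# Category A: mean is 200, so lift is 0.5, 1.0, 1.5
# Category B: flat sales, so lift is 1 in every month
```

Dividing by the category's own mean puts large and small categories on the same scale, which is what allows their seasonal shapes to be compared even though their dollar volumes differ.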

BANA 7025 FINAL

Kylie Hannum, Anthony Busch and Jeena Patel

2025-10-08

SUMMARY