Final Project

Introduction

This analysis will be performed using Exploratory Data Analysis from the Complete Journey dataset and will focus on identifying purchasing trends, demographic patterns related to household size and income, and the impact of specific marketing campaigns (based on their IDs) on candy sales. By uncovering these insights, we aim to provide actionable recommendations to boost candy sales and enhance customer engagement.

Data Used:

Data Tables used from complete Journey:

Transactions
Demographics
Products

Using these datasets, data frames will be created and visualized in various graph styles, which will be explained in detail throughout this report.

Packages Needed:

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)
library(scales)

## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

library(dplyr)
library(completejourney)

## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.

library(ggCorpIdent)

## Loading required package: extrafont
## Registering fonts with R
## 
## Attaching package: 'ggCorpIdent'
## 
## The following objects are masked from 'package:ggplot2':
## 
##     ggplot, scale_colour_discrete, scale_fill_discrete

library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

library(reshape2)

## 
## Attaching package: 'reshape2'
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths

library(scales)
library(sunburstR)
library(RColorBrewer) 
library(extrafont)
library(paletteer)

# Tidy up the data into a single dataframe

transactions <- get_transactions()
prod <- products
demo <- demographics
coupon <- coupon_redemptions

df <- transactions %>%
  inner_join(prod) %>%
  inner_join(demo) %>%
  inner_join(coupon)

## Joining with `by = join_by(product_id)`
## Joining with `by = join_by(household_id)`
## Joining with `by = join_by(household_id)`

## Warning in inner_join(., coupon): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 3 of `x` matches multiple rows in `y`.
## ℹ Row 1549 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

df

df_forcandy <- transactions %>%
  inner_join(prod) %>%
  inner_join(demo)

## Joining with `by = join_by(product_id)`
## Joining with `by = join_by(household_id)`

# bottom 10 products in Grocery
df_grocery <- df_forcandy %>%
  filter(department == "GROCERY") %>%  
  group_by(product_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
  arrange(total_sales) 


df_bottom_grocery <- df_grocery %>%
  slice_head(n = 10)

# Plot
ggplot(df_bottom_grocery, aes(x = reorder(product_type, total_sales), y = total_sales)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  coord_flip() + 
  scale_y_continuous(labels = label_comma()) + 
  labs(title = "Lowest-Performing Products in Grocery by Total Sales",
       x = "Product Type",
       y = "Total Sales") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),  
    axis.text.y = element_text(size = 14),  
    axis.title = element_text(size = 12),  
    axis.title.y = element_text(margin = margin(r = 20))  
  )

Look at the top lowest performing products, we chose to look into candy.

# DF candy
candy_df <- df %>%
  filter(grepl("candy", product_type, ignore.case = TRUE))

# timestamp
candy_df <- candy_df %>%
  mutate(month = month(transaction_timestamp))

# DF monthly quantity
candy_by_month <- candy_df %>%
  group_by(month) %>%
  summarise(quantity = sum(quantity, na.rm = TRUE)) %>%
  ungroup()

candy_by_month <- candy_by_month %>%
  mutate(month_name = factor(month.name[month], levels = month.name))

candy_by_month <- candy_by_month %>%
  arrange(month)

# Bar plot
ggplot(candy_by_month, aes(x = month_name, y = quantity, fill = month_name)) +
  geom_bar(stat = "identity") +
  scale_fill_paletteer_d("colorBlindness::Blue2DarkRed12Steps") +  
  labs(
    title = "Amount of Candy Purchased Monthly in 2017",
    x = "Month",
    y = "Total Quantity Sold"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 12),
    plot.title = element_text(size = 16)
  ) +
  geom_text(
    aes(label = scales::comma(quantity)), 
    vjust = -0.5, 
    color = "black",
    size = 4
  )

# df
candy_df <- df %>%
  filter(grepl("candy", product_type, ignore.case = TRUE)) %>%
  select(income, transaction_timestamp, sales_value, product_type) %>%
  mutate(month = month(transaction_timestamp))

# Monthly candy sales by income group
candy_by_month_income <- candy_df %>%
  group_by(month, income) %>%
  summarise(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
  arrange(month) %>%
  mutate(month_name = factor(month.name[month], levels = month.name))

# Plot 
ggplot(candy_by_month_income, aes(x = month_name, y = total_sales, fill = income)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_paletteer_d("impressionist.colors::korenveld_onder_onweerslucht") +  
  labs(
    title = "Monthly Trends in Candy Sales by Income Group",
    x = "Month",
    y = "Total Sales"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 12),
    plot.title = element_text(size = 16),
    plot.margin = margin(10, 10, 10, 10)
  )

We see that higher income group purchase less candy than lower income group.

Monthly Trends in Candy Sales

# Candy DF
candy_df <- df %>%
  filter(grepl("candy", product_type, ignore.case = TRUE))

candy_df <- candy_df %>%
  mutate(month = month(transaction_timestamp))

# Sum of sales of Candy
candy_by_month <- candy_df %>%
  group_by(month) %>%
  summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
  ungroup()

candy_by_month <- candy_by_month %>%
  mutate(month_name = factor(month.name[month], levels = month.name))

# Months sort
candy_by_month <- candy_by_month %>%
  arrange(month)

# Line plot 
ggplot(candy_by_month, aes(x = month_name, y = sales_value, group = 1)) +
  geom_line(color = "#F9FFAF", linewidth = 1.5) +  
  geom_point(color = "#00A3A8", size = 4) +  
  labs(
    title = "Monthly Trends in Candy Sales (2017)",
    x = "Month",
    y = "Total Quantity Sold"
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title = element_text(size = 12),
    plot.title = element_text(size = 16)
  ) +
  geom_text(
    aes(label = scales::comma(sales_value)),  
    vjust = -0.5, 
    color = "black",
    size = 4
  ) +
  theme(
    plot.margin = margin(10, 10, 10, 10)
  )

Candy sales have clear seasonal patterns, especially in relation to Halloween (October) and December (likely driven by the holidays). These patterns suggest that candy sales peak during specific months and experience a dip in January due to post-holiday, New Year’s resolutions, and customers making healthier choices.

# Candy DF
df_candy <- df %>%
  filter(grepl("candy", product_type, ignore.case = TRUE))
candy_sales_by_income_household <- df_candy %>%
  group_by(income, household_size) %>%
  summarise(total_sales_value = sum(sales_value, na.rm = TRUE), .groups = "drop")

pivot_table <- candy_sales_by_income_household %>%
  pivot_wider(names_from = household_size, values_from = total_sales_value, values_fill = list(total_sales_value = 0))

melted_data <- melt(pivot_table, id.vars = "income", variable.name = "household_size", value.name = "total_sales_value")

# Heatmap
plot_ly(melted_data, x = ~household_size, y = ~income, z = ~total_sales_value, type = "heatmap", colors = "Blues", 
        colorbar = list(title = "Total Sales ($)")) %>%
  layout(
    title = "Sales of Candy Products by Income Group and Household Size",
    xaxis = list(title = "Household Size"),
    yaxis = list(title = "Income Group")
  )

# Candy DF
df_candy <- df %>%
  filter(grepl("candy", product_type, ignore.case = TRUE))

candy_sales_by_income_kids <- df_candy %>%
  group_by(income, kids_count) %>%
  summarise(total_sales_value = sum(sales_value, na.rm = TRUE), .groups = "drop")

pivot_table <- candy_sales_by_income_kids %>%
  pivot_wider(names_from = kids_count, values_from = total_sales_value, values_fill = list(total_sales_value = 0))

melted_data <- melt(pivot_table, id.vars = "income", variable.name = "kids_count", value.name = "total_sales_value")

# Heatmap using plotly
plot_ly(melted_data, x = ~kids_count, y = ~income, z = ~total_sales_value, type = "heatmap", colors = "Blues", 
        colorbar = list(title = "Total Sales ($)")) %>%
  layout(
    title = "Sales of Candy Products by Income Group and Kids Count",
    xaxis = list(title = "Kids Count"),
    yaxis = list(title = "Income Group")
  )

Create Targeted Bundling Promotions

df_candy <- df %>%
  filter(grepl("candy", product_type, ignore.case = TRUE))  

candy_baskets <- unique(df_candy$basket_id)

purchases <- df %>%
  filter(basket_id %in% candy_baskets)

purchases_grocery <- purchases %>%
  filter(department == "GROCERY") %>%
  filter(!(grepl("candy", product_type, ignore.case = TRUE)))  


top_purchases <- purchases_grocery %>%
  group_by(product_type) %>%
  summarise(total_quantity = sum(quantity, na.rm = TRUE)) %>%
  arrange(desc(total_quantity)) %>%
  head(10)

#plot
ggplot(top_purchases, aes(x = reorder(product_type, total_quantity), y = total_quantity, fill = total_quantity)) +
  geom_bar(stat = "identity") +
  coord_flip() +  
  scale_fill_gradient(low = "lightblue", high = "darkblue") +  
  labs(
    title = "Top 10 Products Bought with Candy in the GROCERY Department",
    x = "Product Type",
    y = "Total Quantity Purchased"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.title = element_text(size = 12),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10)
  ) +
  theme(
    plot.margin = margin(10, 10, 10, 10)
  )

By bundling candy with milk, yogurt, and soft drinks, Regork can increase sales and encourage customers to purchase more items together. Bundling also enhances the perceived value of the offer, making it attractive for customers.

df_candy <- df %>%
  filter(grepl("candy", product_type, ignore.case = TRUE))

# sales and quantity by campaign_id
candy_sales_by_campaign <- df_candy %>%
  group_by(campaign_id) %>%
  summarise(
    total_sales_value = sum(sales_value, na.rm = TRUE),
    total_quantity = sum(quantity, na.rm = TRUE),
    .groups = 'drop'  
  )

# top 10 campaigns 
candy_sales_campaign <- candy_sales_by_campaign %>%
  arrange(desc(total_sales_value)) %>%
  slice_head(n = 10)  


candy_sales_campaign <- candy_sales_campaign %>%
  mutate(
    campaign_id = as.character(campaign_id), 
    total_sales_value = as.character(total_sales_value)  
  ) %>%
  select(campaign_id, total_sales_value)  


# pie chart
sunburst(candy_sales_campaign, count = TRUE)

# df
df_top_sellers <- df %>%
  group_by(income, store_id) %>%
  summarize(Avg_total_sales = mean(sales_value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(Avg_total_sales)) %>%  # Sort all by Avg_total_sales in descending order
  group_by(income) %>%
  slice_head(n = 4)

# Plot
ggplot(df_top_sellers, aes(reorder(store_id, Avg_total_sales), 
        y = Avg_total_sales, fill = income)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_y_continuous(labels = label_comma()) +
  scale_fill_paletteer_d("impressionist.colors::korenveld_onder_onweerslucht") + # Custom color palette
  labs(title = "Most Popular & Profitable Stores (per Income Group)",
       x = "Store_ID",
       y = "Average Total Sales",
       fill = "Income Group") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1),
        axis.text.y = element_text(size = 10),
        axis.title = element_text(size = 12),
        axis.title.y = element_text(margin = margin(r = 20)))

# df
df_top_sellers <- df %>%
  group_by(store_id) %>%
  summarize(Avg_total_sales = mean(sales_value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(Avg_total_sales)) %>%  
  slice_head(n = 12)  

# Plot
ggplot(df_top_sellers, aes(reorder(store_id, Avg_total_sales), 
        y = Avg_total_sales, fill = store_id)) +  
  geom_bar(stat = "identity") +  
  coord_flip() + 
  scale_y_continuous(labels = label_comma()) +  
  scale_fill_paletteer_d("palettetown::dragonite") +  
  labs(title = "Most Popular & Profitable Stores",
       x = "Store ID",
       y = "Average Total Sales") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1),
        axis.text.y = element_text(size = 10),
        axis.title = element_text(size = 12),
        axis.title.y = element_text(margin = margin(r = 20)))

Top average sales is store id: 901, 3149, 602, 197, 3163. Targeting marketing campaigns to those stores can improve on sales.

Summary:

Our project focused on Regork’s promotional campaigns for Candy product. Since sales for Candy is one of the lowest perform (bottom 5 sales in Grocery) and saw potential in growth. We focus our business question to: How can Regork optimize candy sales by identifying key product pairings, seasonal demand patterns, and effective marketing campaigns?

Key insights:

Top Performing Stores:

The stores with the highest average candy sales are:
- Store 901
- Store 3149
- Store 602
- Store 197
- Store 3163
These stores consistently outperform others in terms of candy sales, suggesting that there may be specific factors contributing to their success.

Target Customer Segments:

Candy sales are highest in households with:
- Household Size 2 and Income Group 50-74K.
- Household Size 1 and Income Group 50-74K.
- Households with 0 Kids and Income Group 50-74K, followed by those with an income of 75-99K.

Top Products Purchased with Candy:

Milk, yogurt, and soft drinks are the most frequently bought products along with candy, indicating potential cross-selling opportunities.

Seasonal Demand Patterns:

Candy sales peak in October (likely due to Halloween) and December (due to holiday season), while January sees the lowest sales.

Conclusion:

To optimize candy sales and maximize revenue, Regork should prioritize marketing efforts on the highest-performing stores (901, 3149, 602, 197, 3163) while also tailoring campaigns to the specific customer segments that are most likely to purchase candy. Seasonal promotions, strategic cross-selling, and bundled product offerings with milk, yogurt, and soft drinks can also drive increased sales. By leveraging data and aligning marketing strategies with the store’s performance and customer demographics, Regork can significantly enhance candy sales and overall profitability.