This analysis will be performed using Exploratory Data Analysis from
the Complete Journey dataset and will focus on identifying
purchasing trends, demographic patterns related to
household size and income, and the impact of
specific marketing campaigns (based on their IDs) on candy
sales. By uncovering these insights, we aim to provide actionable
recommendations to boost candy sales and enhance customer
engagement.
Data Tables used from complete Journey:
TransactionsDemographicsProductsUsing these datasets, data frames will be created and visualized in various graph styles, which will be explained in detail throughout this report.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library(dplyr)
library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
library(ggCorpIdent)
## Loading required package: extrafont
## Registering fonts with R
##
## Attaching package: 'ggCorpIdent'
##
## The following objects are masked from 'package:ggplot2':
##
## ggplot, scale_colour_discrete, scale_fill_discrete
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(reshape2)
##
## Attaching package: 'reshape2'
##
## The following object is masked from 'package:tidyr':
##
## smiths
library(scales)
library(sunburstR)
library(RColorBrewer)
library(extrafont)
library(paletteer)
# Tidy up the data into a single dataframe
transactions <- get_transactions()
prod <- products
demo <- demographics
coupon <- coupon_redemptions
df <- transactions %>%
inner_join(prod) %>%
inner_join(demo) %>%
inner_join(coupon)
## Joining with `by = join_by(product_id)`
## Joining with `by = join_by(household_id)`
## Joining with `by = join_by(household_id)`
## Warning in inner_join(., coupon): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 3 of `x` matches multiple rows in `y`.
## ℹ Row 1549 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
df
df_forcandy <- transactions %>%
inner_join(prod) %>%
inner_join(demo)
## Joining with `by = join_by(product_id)`
## Joining with `by = join_by(household_id)`
# bottom 10 products in Grocery
df_grocery <- df_forcandy %>%
filter(department == "GROCERY") %>%
group_by(product_type) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
arrange(total_sales)
df_bottom_grocery <- df_grocery %>%
slice_head(n = 10)
# Plot
ggplot(df_bottom_grocery, aes(x = reorder(product_type, total_sales), y = total_sales)) +
geom_bar(stat = "identity", fill = "skyblue") +
coord_flip() +
scale_y_continuous(labels = label_comma()) +
labs(title = "Lowest-Performing Products in Grocery by Total Sales",
x = "Product Type",
y = "Total Sales") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
axis.text.y = element_text(size = 14),
axis.title = element_text(size = 12),
axis.title.y = element_text(margin = margin(r = 20))
)
Look at the top lowest performing products, we chose to look into candy.
# DF candy
candy_df <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE))
# timestamp
candy_df <- candy_df %>%
mutate(month = month(transaction_timestamp))
# DF monthly quantity
candy_by_month <- candy_df %>%
group_by(month) %>%
summarise(quantity = sum(quantity, na.rm = TRUE)) %>%
ungroup()
candy_by_month <- candy_by_month %>%
mutate(month_name = factor(month.name[month], levels = month.name))
candy_by_month <- candy_by_month %>%
arrange(month)
# Bar plot
ggplot(candy_by_month, aes(x = month_name, y = quantity, fill = month_name)) +
geom_bar(stat = "identity") +
scale_fill_paletteer_d("colorBlindness::Blue2DarkRed12Steps") +
labs(
title = "Amount of Candy Purchased Monthly in 2017",
x = "Month",
y = "Total Quantity Sold"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
axis.title = element_text(size = 12),
plot.title = element_text(size = 16)
) +
geom_text(
aes(label = scales::comma(quantity)),
vjust = -0.5,
color = "black",
size = 4
)
# df
candy_df <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE)) %>%
select(income, transaction_timestamp, sales_value, product_type) %>%
mutate(month = month(transaction_timestamp))
# Monthly candy sales by income group
candy_by_month_income <- candy_df %>%
group_by(month, income) %>%
summarise(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
arrange(month) %>%
mutate(month_name = factor(month.name[month], levels = month.name))
# Plot
ggplot(candy_by_month_income, aes(x = month_name, y = total_sales, fill = income)) +
geom_bar(stat = "identity", position = "stack") +
scale_fill_paletteer_d("impressionist.colors::korenveld_onder_onweerslucht") +
labs(
title = "Monthly Trends in Candy Sales by Income Group",
x = "Month",
y = "Total Sales"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
axis.title = element_text(size = 12),
plot.title = element_text(size = 16),
plot.margin = margin(10, 10, 10, 10)
)
We see that higher income group purchase less candy than lower income group.
# Candy DF
candy_df <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE))
candy_df <- candy_df %>%
mutate(month = month(transaction_timestamp))
# Sum of sales of Candy
candy_by_month <- candy_df %>%
group_by(month) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
ungroup()
candy_by_month <- candy_by_month %>%
mutate(month_name = factor(month.name[month], levels = month.name))
# Months sort
candy_by_month <- candy_by_month %>%
arrange(month)
# Line plot
ggplot(candy_by_month, aes(x = month_name, y = sales_value, group = 1)) +
geom_line(color = "#F9FFAF", linewidth = 1.5) +
geom_point(color = "#00A3A8", size = 4) +
labs(
title = "Monthly Trends in Candy Sales (2017)",
x = "Month",
y = "Total Quantity Sold"
) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
axis.title = element_text(size = 12),
plot.title = element_text(size = 16)
) +
geom_text(
aes(label = scales::comma(sales_value)),
vjust = -0.5,
color = "black",
size = 4
) +
theme(
plot.margin = margin(10, 10, 10, 10)
)
Candy sales have clear seasonal patterns, especially in relation to Halloween (October) and December (likely driven by the holidays). These patterns suggest that candy sales peak during specific months and experience a dip in January due to post-holiday, New Year’s resolutions, and customers making healthier choices.
# Candy DF
df_candy <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE))
candy_sales_by_income_household <- df_candy %>%
group_by(income, household_size) %>%
summarise(total_sales_value = sum(sales_value, na.rm = TRUE), .groups = "drop")
pivot_table <- candy_sales_by_income_household %>%
pivot_wider(names_from = household_size, values_from = total_sales_value, values_fill = list(total_sales_value = 0))
melted_data <- melt(pivot_table, id.vars = "income", variable.name = "household_size", value.name = "total_sales_value")
# Heatmap
plot_ly(melted_data, x = ~household_size, y = ~income, z = ~total_sales_value, type = "heatmap", colors = "Blues",
colorbar = list(title = "Total Sales ($)")) %>%
layout(
title = "Sales of Candy Products by Income Group and Household Size",
xaxis = list(title = "Household Size"),
yaxis = list(title = "Income Group")
)
# Candy DF
df_candy <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE))
candy_sales_by_income_kids <- df_candy %>%
group_by(income, kids_count) %>%
summarise(total_sales_value = sum(sales_value, na.rm = TRUE), .groups = "drop")
pivot_table <- candy_sales_by_income_kids %>%
pivot_wider(names_from = kids_count, values_from = total_sales_value, values_fill = list(total_sales_value = 0))
melted_data <- melt(pivot_table, id.vars = "income", variable.name = "kids_count", value.name = "total_sales_value")
# Heatmap using plotly
plot_ly(melted_data, x = ~kids_count, y = ~income, z = ~total_sales_value, type = "heatmap", colors = "Blues",
colorbar = list(title = "Total Sales ($)")) %>%
layout(
title = "Sales of Candy Products by Income Group and Kids Count",
xaxis = list(title = "Kids Count"),
yaxis = list(title = "Income Group")
)
df_candy <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE))
candy_baskets <- unique(df_candy$basket_id)
purchases <- df %>%
filter(basket_id %in% candy_baskets)
purchases_grocery <- purchases %>%
filter(department == "GROCERY") %>%
filter(!(grepl("candy", product_type, ignore.case = TRUE)))
top_purchases <- purchases_grocery %>%
group_by(product_type) %>%
summarise(total_quantity = sum(quantity, na.rm = TRUE)) %>%
arrange(desc(total_quantity)) %>%
head(10)
#plot
ggplot(top_purchases, aes(x = reorder(product_type, total_quantity), y = total_quantity, fill = total_quantity)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
labs(
title = "Top 10 Products Bought with Candy in the GROCERY Department",
x = "Product Type",
y = "Total Quantity Purchased"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
axis.title = element_text(size = 12),
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10)
) +
theme(
plot.margin = margin(10, 10, 10, 10)
)
By bundling candy with milk, yogurt, and soft drinks, Regork can increase sales and encourage customers to purchase more items together. Bundling also enhances the perceived value of the offer, making it attractive for customers.
df_candy <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE))
# sales and quantity by campaign_id
candy_sales_by_campaign <- df_candy %>%
group_by(campaign_id) %>%
summarise(
total_sales_value = sum(sales_value, na.rm = TRUE),
total_quantity = sum(quantity, na.rm = TRUE),
.groups = 'drop'
)
# top 10 campaigns
candy_sales_campaign <- candy_sales_by_campaign %>%
arrange(desc(total_sales_value)) %>%
slice_head(n = 10)
candy_sales_campaign <- candy_sales_campaign %>%
mutate(
campaign_id = as.character(campaign_id),
total_sales_value = as.character(total_sales_value)
) %>%
select(campaign_id, total_sales_value)
# pie chart
sunburst(candy_sales_campaign, count = TRUE)
# df
df_top_sellers <- df %>%
group_by(income, store_id) %>%
summarize(Avg_total_sales = mean(sales_value, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(Avg_total_sales)) %>% # Sort all by Avg_total_sales in descending order
group_by(income) %>%
slice_head(n = 4)
# Plot
ggplot(df_top_sellers, aes(reorder(store_id, Avg_total_sales),
y = Avg_total_sales, fill = income)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(labels = label_comma()) +
scale_fill_paletteer_d("impressionist.colors::korenveld_onder_onweerslucht") + # Custom color palette
labs(title = "Most Popular & Profitable Stores (per Income Group)",
x = "Store_ID",
y = "Average Total Sales",
fill = "Income Group") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1),
axis.text.y = element_text(size = 10),
axis.title = element_text(size = 12),
axis.title.y = element_text(margin = margin(r = 20)))
# df
df_top_sellers <- df %>%
group_by(store_id) %>%
summarize(Avg_total_sales = mean(sales_value, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(Avg_total_sales)) %>%
slice_head(n = 12)
# Plot
ggplot(df_top_sellers, aes(reorder(store_id, Avg_total_sales),
y = Avg_total_sales, fill = store_id)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(labels = label_comma()) +
scale_fill_paletteer_d("palettetown::dragonite") +
labs(title = "Most Popular & Profitable Stores",
x = "Store ID",
y = "Average Total Sales") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1),
axis.text.y = element_text(size = 10),
axis.title = element_text(size = 12),
axis.title.y = element_text(margin = margin(r = 20)))
Top average sales is store id: 901, 3149, 602, 197, 3163. Targeting marketing campaigns to those stores can improve on sales.
Our project focused on Regork’s promotional campaigns for Candy product. Since sales for Candy is one of the lowest perform (bottom 5 sales in Grocery) and saw potential in growth. We focus our business question to: How can Regork optimize candy sales by identifying key product pairings, seasonal demand patterns, and effective marketing campaigns?
Seasonal Demand Patterns:
Candy sales peak in October (likely due to Halloween) and December (due to holiday season), while January sees the lowest sales.
To optimize candy sales and maximize revenue, Regork
should prioritize marketing efforts on the highest-performing stores
(901, 3149, 602, 197, 3163) while also tailoring campaigns to the
specific customer segments that are most likely to purchase candy.
Seasonal promotions, strategic cross-selling, and bundled product
offerings with milk, yogurt, and
soft drinks can also drive increased sales. By leveraging
data and aligning marketing strategies with the store’s performance and
customer demographics, Regork can significantly enhance candy sales and
overall profitability.