This analysis will be performed using Exploratory Data Analysis from
the Complete Journey dataset and will focus on identifying
purchasing trends, demographic patterns related to
household size and income, and the impact of
specific marketing campaigns (based on their IDs) on candy
sales. By uncovering these insights, we aim to provide actionable
recommendations to boost candy sales and enhance customer
engagement.
Data Tables used from complete Journey:
TransactionsDemographicsProductsUsing these datasets, data frames will be created and visualized in various graph styles, which will be explained in detail throughout this report.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library(dplyr)
library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
library(ggCorpIdent)
## Loading required package: extrafont
## Registering fonts with R
##
## Attaching package: 'ggCorpIdent'
##
## The following objects are masked from 'package:ggplot2':
##
## ggplot, scale_colour_discrete, scale_fill_discrete
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(reshape2)
##
## Attaching package: 'reshape2'
##
## The following object is masked from 'package:tidyr':
##
## smiths
library(scales)
library(sunburstR)
library(RColorBrewer)
library(extrafont)
library(paletteer)
# Tidy up the data into a single dataframe
transactions <- get_transactions()
prod <- products
demo <- demographics
coupon <- coupon_redemptions
df <- transactions %>%
inner_join(prod) %>%
inner_join(demo) %>%
inner_join(coupon)
## Joining with `by = join_by(product_id)`
## Joining with `by = join_by(household_id)`
## Joining with `by = join_by(household_id)`
## Warning in inner_join(., coupon): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 3 of `x` matches multiple rows in `y`.
## ℹ Row 1549 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
df
df_forcandy <- transactions %>%
inner_join(prod) %>%
inner_join(demo)
## Joining with `by = join_by(product_id)`
## Joining with `by = join_by(household_id)`
# bottom 10 products in Grocery
df_grocery <- df_forcandy %>%
filter(department == "GROCERY") %>%
group_by(product_type) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
arrange(total_sales)
df_bottom_grocery <- df_grocery %>%
slice_head(n = 10)
# Plot
ggplot(df_bottom_grocery, aes(x = reorder(product_type, total_sales), y = total_sales)) +
geom_bar(stat = "identity", fill = "skyblue") +
coord_flip() +
scale_y_continuous(labels = label_comma()) +
labs(title = "Lowest-Performing Products in Grocery by Total Sales",
x = "Product Type",
y = "Total Sales") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
axis.text.y = element_text(size = 14),
axis.title = element_text(size = 12),
axis.title.y = element_text(margin = margin(r = 20))
)
The lowest performing product-categories in Regork’s sales portfolio. The underwhelming sales performance of the candy-category in 2017 is surprising and disappointing, but also gives us an opportunity to develop practical solutions for tangible sales growth.
# DF candy
candy_df <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE))
# timestamp
candy_df <- candy_df %>%
mutate(month = month(transaction_timestamp))
# DF monthly quantity
candy_by_month <- candy_df %>%
group_by(month) %>%
summarise(quantity = sum(quantity, na.rm = TRUE)) %>%
ungroup()
candy_by_month <- candy_by_month %>%
mutate(month_name = factor(month.name[month], levels = month.name))
candy_by_month <- candy_by_month %>%
arrange(month)
# Bar plot
ggplot(candy_by_month, aes(x = month_name, y = quantity, fill = month_name)) +
geom_bar(stat = "identity") +
scale_fill_paletteer_d("colorBlindness::Blue2DarkRed12Steps") +
labs(
title = "Amount of Candy Purchased Monthly in 2017",
x = "Month",
y = "Total Quantity Sold"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
axis.title = element_text(size = 12),
plot.title = element_text(size = 16)
) +
geom_text(
aes(label = scales::comma(quantity)),
vjust = -0.5,
color = "black",
size = 4
)
A Monthly comparison analysis of candy sales in 2017.
Key Takeaway: Monthly changes do not significantly impact the overall total amount of candy purchased by consumers. Therefore, an effective mechanism implemented to improve the category (overall) should result in continual and robust performance.
# df
candy_df <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE)) %>%
select(income, transaction_timestamp, sales_value, product_type) %>%
mutate(month = month(transaction_timestamp))
# Monthly candy sales by income group
candy_by_month_income <- candy_df %>%
group_by(month, income) %>%
summarise(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
arrange(month) %>%
mutate(month_name = factor(month.name[month], levels = month.name))
# Plot
ggplot(candy_by_month_income, aes(x = month_name, y = total_sales, fill = income)) +
geom_bar(stat = "identity", position = "stack") +
scale_fill_paletteer_d("impressionist.colors::korenveld_onder_onweerslucht") +
labs(
title = "Monthly Trends in Candy Sales by Income Group",
x = "Month",
y = "Total Sales"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
axis.title = element_text(size = 12),
plot.title = element_text(size = 16),
plot.margin = margin(10, 10, 10, 10)
)
An illustration of the monthly candy sales by Income Group.
Key Takeaways: Overall, candy sales spike during the holiday months (Halloween & Christmas) for all income groups. A drop in candy purchases for the other months is an opportunity to significantly improve the annual sales total.
# Candy DF
candy_df <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE))
candy_df <- candy_df %>%
mutate(month = month(transaction_timestamp))
# Sum of sales of Candy
candy_by_month <- candy_df %>%
group_by(month) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
ungroup()
candy_by_month <- candy_by_month %>%
mutate(month_name = factor(month.name[month], levels = month.name))
# Months sort
candy_by_month <- candy_by_month %>%
arrange(month)
# Line plot
ggplot(candy_by_month, aes(x = month_name, y = sales_value, group = 1)) +
geom_line(color = "#F9FFAF", linewidth = 1.5) +
geom_point(color = "#00A3A8", size = 4) +
labs(
title = "Monthly Trends in Candy Sales (2017)",
x = "Month",
y = "Total Quantity Sold"
) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
axis.title = element_text(size = 12),
plot.title = element_text(size = 16)
) +
geom_text(
aes(label = scales::comma(sales_value)),
vjust = -0.5,
color = "black",
size = 4
) +
theme(
plot.margin = margin(10, 10, 10, 10)
)
Another illustration showing the effect of seasonal patterns on sales performance (especially in relation to the Holidays: Halloween/October, Christmas/December). These observations suggest that candy sales peak during certain months; and then dip after holiday-celebrations have concluded, which suggests that consumers than choose alternative food options for health (or New Years Resolution) reasons.
# Candy DF
df_candy <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE))
candy_sales_by_income_household <- df_candy %>%
group_by(income, household_size) %>%
summarise(total_sales_value = sum(sales_value, na.rm = TRUE), .groups = "drop")
pivot_table <- candy_sales_by_income_household %>%
pivot_wider(names_from = household_size, values_from = total_sales_value, values_fill = list(total_sales_value = 0))
melted_data <- melt(pivot_table, id.vars = "income", variable.name = "household_size", value.name = "total_sales_value")
# Heatmap
plot_ly(melted_data, x = ~household_size, y = ~income, z = ~total_sales_value, type = "heatmap", colors = "Blues",
colorbar = list(title = "Total Sales ($)")) %>%
layout(
title = "Sales of Candy Products by Income Group and Household Size",
xaxis = list(title = "Household Size"),
yaxis = list(title = "Income Group")
)
An illustration of candy sales based on household (family) size & income group.
Key Takeaways: The largest demographic of candy consumers is comprised of a household size of 1 or 2 people, and earn $50 - $74K per year. This demographic should be our target audience in marketing & promotional campaigns.
# Candy DF
df_candy <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE))
candy_sales_by_income_kids <- df_candy %>%
group_by(income, kids_count) %>%
summarise(total_sales_value = sum(sales_value, na.rm = TRUE), .groups = "drop")
pivot_table <- candy_sales_by_income_kids %>%
pivot_wider(names_from = kids_count, values_from = total_sales_value, values_fill = list(total_sales_value = 0))
melted_data <- melt(pivot_table, id.vars = "income", variable.name = "kids_count", value.name = "total_sales_value")
# Heatmap using plotly
plot_ly(melted_data, x = ~kids_count, y = ~income, z = ~total_sales_value, type = "heatmap", colors = "Blues",
colorbar = list(title = "Total Sales ($)")) %>%
layout(
title = "Sales of Candy Products by Income Group and Kids Count",
xaxis = list(title = "Kids Count"),
yaxis = list(title = "Income Group")
)
An illustration of candy sales based on household (family) size & # of Kids.
Key Takeaways: Consumers who make an annual income of $50 - $74K and have no kids are the largest spending group of candy. This is our #1 target audience.
df_candy <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE))
candy_baskets <- unique(df_candy$basket_id)
purchases <- df %>%
filter(basket_id %in% candy_baskets)
purchases_grocery <- purchases %>%
filter(department == "GROCERY") %>%
filter(!(grepl("candy", product_type, ignore.case = TRUE)))
top_purchases <- purchases_grocery %>%
group_by(product_type) %>%
summarise(total_quantity = sum(quantity, na.rm = TRUE)) %>%
arrange(desc(total_quantity)) %>%
head(10)
#plot
ggplot(top_purchases, aes(x = reorder(product_type, total_quantity), y = total_quantity, fill = total_quantity)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
labs(
title = "Top 10 Products Bought with Candy in the GROCERY Department",
x = "Product Type",
y = "Total Quantity Purchased"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
axis.title = element_text(size = 12),
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10)
) +
theme(
plot.margin = margin(10, 10, 10, 10)
)
Sale Strategy - By bundling candy with [milk,
yogurt, and soft drinks], Regork is ensured to
increase annual sales and encourage customers to purchase more store
items as a result. A “Bundling Strategy” also enhances the perceived
value of the offer, and thus makes it even more attractive to
customers.
df_candy <- df %>%
filter(grepl("candy", product_type, ignore.case = TRUE))
# sales and quantity by campaign_id
candy_sales_by_campaign <- df_candy %>%
group_by(campaign_id) %>%
summarise(
total_sales_value = sum(sales_value, na.rm = TRUE),
total_quantity = sum(quantity, na.rm = TRUE),
.groups = 'drop'
)
# top 10 campaigns
candy_sales_campaign <- candy_sales_by_campaign %>%
arrange(desc(total_sales_value)) %>%
slice_head(n = 10)
candy_sales_campaign <- candy_sales_campaign %>%
mutate(
campaign_id = as.character(campaign_id),
total_sales_value = as.character(total_sales_value)
) %>%
select(campaign_id, total_sales_value)
# pie chart
sunburst(candy_sales_campaign, count = TRUE) %>%
ggtitle("Top 10 Campaigns for Candy Sales")
## $title
##
## $subtitle
## [1] "Top 10 Campaigns for Candy Sales"
##
## attr(,"class")
## [1] "labels"
# df
df_top_sellers <- df %>%
group_by(income, store_id) %>%
summarize(Avg_total_sales = mean(sales_value, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(Avg_total_sales)) %>%
group_by(income) %>%
slice_head(n = 4)
# Plot
ggplot(df_top_sellers, aes(reorder(store_id, Avg_total_sales),
y = Avg_total_sales, fill = income)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(labels = label_comma()) +
scale_fill_paletteer_d("impressionist.colors::korenveld_onder_onweerslucht") +
labs(title = "Most Popular & Profitable Stores (per Income Group)",
x = "Store ID",
y = "Average Total Sales",
fill = "Income Group") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1),
axis.text.y = element_text(size = 10),
axis.title = element_text(size = 12),
axis.title.y = element_text(margin = margin(r = 20)))
An Analysis of the Most Popular & Profitable Stores (per Income Group)
Key Takeaways: Consumer prefer specific stores to spend their money;and these stores should be our target locations for product placement, and marketing campaigns. Placing our product where consumers spend the most maximizes product visibility & presence, therefore increasing consumer sales as a result.
# df
df_top_sellers <- df %>%
group_by(store_id) %>%
summarize(Avg_total_sales = mean(sales_value, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(Avg_total_sales)) %>%
slice_head(n = 12)
# Plot
ggplot(df_top_sellers, aes(reorder(store_id, Avg_total_sales),
y = Avg_total_sales, fill = store_id)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(labels = label_comma()) +
scale_fill_paletteer_d("palettetown::dragonite") +
labs(title = "Most Popular & Profitable Stores",
x = "Store ID",
y = "Average Total Sales") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1),
axis.text.y = element_text(size = 10),
axis.title = element_text(size = 12),
axis.title.y = element_text(margin = margin(r = 20)))
An Identification of the Most Popular & Profitable Stores (based on Story ID) Key Takeaways: The most profitable stores (based on the Average Sales metric & Store ID) are the following: 901, 3149, 602, 197, 3163. A targeted marketing campaign on these stores will improve annual (candy) sales.
Our project focused on Regork’s promotional campaigns for Candy product. Since sales for Candy is one of the lowest perform (bottom 5 sales in Grocery) and saw potential in growth. We focus our business question to: How can Regork optimize candy sales by identifying key product pairings, seasonal demand patterns, and effective marketing campaigns?
Seasonal Demand Patterns:
Candy sales peak in October (likely due to Halloween) and December (due to holiday season), while January sees the lowest sales.
To optimize candy sales and maximize revenue, Regork
should prioritize marketing efforts on the highest-performing stores
(901, 3149, 602, 197, 3163) while also tailoring campaigns to the
specific customer segments that are most likely to purchase candy.
Seasonal promotions, strategic cross-selling, and bundled product
offerings with milk, yogurt, and
soft drinks can also drive increased sales. By leveraging
data and aligning marketing strategies with the store’s performance and
customer demographics, Regork can significantly enhance candy sales and
overall profitability.