Introduction
Business Problem
The business problem revolves around identifying the sales strategies that differentiate top-performing stores from low-performing ones within Regork. By analyzing data from coupons, marketing campaigns, and customer demographics, such as loyalty, age, and income, the focus is to uncover how top-performing stores effectively engage their customer base. The goal is to pinpoint the gaps in the strategies of low-performing stores and recommend actionable improvements.
Approach
This analysis leverages customer demographic data and coupon usage patterns to compare top and underperforming stores. The methodology involves examining key factors like coupon usage, loyalty programs, and demographic influences on spending behavior. Insights from this analysis will guide the Regork CEO in understanding how to target high-potential customer segments and optimize marketing strategies, providing a pathway for underperforming stores to enhance their performance. This approach aims to increase overall profitability and ensure consistent success across all stores in the chain.
COMPLETEJOURNEY - This is a collection of data sets capturing household level transactions over one year from a group of 2,469 households who are frequent shoppers at a grocery store.
DPLYR - This package is helpful for data manipulation.
GGPLOT2 - This package is helpful as it contains various charting functions.
LUBRIDATE - This package is helpful when working with dates.
TIDYVERSE - This collection of packages aids in creating a seamless and consistent coding style.
FORCATS - The goal of the forcats package is to provide a suite of useful tools that solve common problems with factors.
gridExtra - The gridExtra package provides useful extensions to the grid system
library(gridExtra)
library(tidyverse)
library(completejourney)
library(ggplot2)
library(lubridate)
library(forcats)
# Get transactions and promotions
transactions <- get_transactions()
promotions <- get_promotions()
# Calculate store sales and unique households
store_sales <- transactions %>%
group_by(store_id) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE),
Unique_Households = n_distinct(household_id)) %>%
ungroup() %>%
mutate(sales_per_household = sales_value / Unique_Households)
# Sort values to identify top and bottom stores
store_sales_sorted <- store_sales %>%
arrange(desc(sales_per_household))
# Top and bottom 25 stores
top_25_stores <- head(store_sales_sorted, 25) %>%
arrange(desc(sales_value)) # Sort top stores by sales value
bottom_25_stores <- tail(store_sales_sorted, 25) %>%
arrange(sales_value) # Sort bottom stores by sales value
# Plot for top 25 stores (Lollipop chart) sorted by sales_per_household
p1 <- ggplot(top_25_stores, aes(x = reorder(store_id, -sales_per_household), y = sales_per_household)) + # Note the negative sign for descending order
geom_segment(aes(x = store_id, xend = store_id, y = 0, yend = sales_per_household), size = 1, color = "#A7C7E7") +
geom_point(size = 4, color = "#A7C7E7") +
labs(x = "Store IDs", y = "Sales per Household", title = "Sales for Top 25 Stores") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_x_discrete(labels = function(x) substr(x, 1, 5)) + # Abbreviate store IDs if needed
scale_y_continuous(labels = scales::dollar_format() )
# Plot for bottom 25 stores (Lollipop chart) sorted by sales_per_household
p2 <- ggplot(bottom_25_stores, aes(x = reorder(store_id, sales_per_household), y = sales_per_household)) + # Ascending order
geom_segment(aes(x = store_id, xend = store_id, y = 0, yend = sales_per_household), size = 1, color = "#FF9999") +
geom_point(size = 4, color = "#FF9999") +
labs(x = "Store IDs", y = "Sales per Household", title = "Sales for Bottom 25 Stores") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_x_discrete(labels = function(x) substr(x, 1, 5)) + # Abbreviate store IDs if needed
scale_y_continuous(labels = scales::dollar_format() )
# Print the plots
print(p1)
print(p2)
# Map income to the specified ranges
income_dict <- c(
'Under 15K' = '<50K',
'15-24K' = '<50K',
'25-34K' = '<50K',
'35-49K' = '<50K',
'50-74K' = '50-100K',
'75-99K' = '50-100K',
'100-124K' = '100-150K',
'125-149K' = '100-150K',
'150-174K' = '150-200K',
'175-199K' = '150-200K',
'200-249K' = '200-250K',
'250K+' = '250K+'
)
# Define the columns of interest
trans_cols <- c('household_id', 'store_id', 'sales_value')
coupon_red_cols <- c('household_id', 'store_id', 'sales_value','coupon_upc','campaign_id')
demo_cols <- c('household_id', 'age', 'income')
# Top performing stores
top_stores <- transactions %>%
group_by(store_id) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(sales_value)) %>%
slice(1:25)
# Select top stores
df_top_stores <- top_stores %>% select(store_id)
# Merge top stores with transaction data
df_top_stores1 <- df_top_stores %>%
inner_join(transactions %>% select(trans_cols), by = 'store_id') %>%
group_by(store_id, household_id) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
ungroup()
# Merge with demographics data
df_top_stores2 <- df_top_stores1 %>%
inner_join(demographics %>% select(demo_cols), by = 'household_id')
# Group by store_id, age, and income to calculate total sales
df_top_stores_age <- df_top_stores2 %>%
group_by(age) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
ungroup()
# Convert store_id to factor for better visualization
df_top_stores_age$age <- as.factor(df_top_stores_age$age)
age1 <- ggplot(df_top_stores_age, aes(x = factor(age), y = sales_value)) +
geom_bar(stat = "identity", fill = "#A7C7E7", color = "black") +
labs(x = "Age Groups", y = "Total Spends", title = "Sales for Top 25 Stores") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_y_continuous(labels = scales::dollar_format() ) +
scale_x_discrete(labels = function(x) substr(x, 1, 5)) # Abbreviate store IDs if needed
# Merge top stores with coupon redemtions data
df_campaign_by_age <- df_top_stores1 %>%
inner_join(coupon_redemptions,by = 'household_id') %>%
select(store_id, household_id, sales_value, campaign_id,coupon_upc) %>%
group_by(store_id, household_id,coupon_upc,campaign_id) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
ungroup()
df_campaign_by_age <- df_campaign_by_age %>%
inner_join(demographics,by = 'household_id') %>%
select(campaign_id,coupon_upc, age, income) %>%
ungroup()
df_campaign_by_age <- df_campaign_by_age %>%
group_by(age) %>%
summarise(no_of_coupon = n(),
no_of_campaigns = n_distinct(campaign_id)) %>%
ungroup()
# Convert store_id to factor for better visualization
df_campaign_by_age$age <- as.factor(df_campaign_by_age$age)
age2 <- ggplot(df_campaign_by_age, aes(x = factor(age), y = no_of_coupon)) +
geom_bar(stat = "identity", fill = "#E7A7C7", color = "black") +
labs(x = "Age Groups", y = "Number of Coupons", title = "Coupon usage by Age group") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_x_discrete(labels = function(x) substr(x, 1, 5)) # Abbreviate store IDs if needed
age3 <- ggplot(df_campaign_by_age, aes(x = factor(age), y = no_of_campaigns)) +
geom_bar(stat = "identity", fill = "#C7E7A7", color = "black") +
labs(x = "Age Groups", y = "Number of distinct campaigns", title = "Campaign by Age group") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_x_discrete(labels = function(x) substr(x, 1, 5)) # Abbreviate store IDs if needed
print(age1)
# Arrange plots side by side
grid.arrange(age1, age2, age3, ncol = 3)
# Merge top stores with coupon redemtions data
df_campaign_by_income <- df_top_stores1 %>%
inner_join(coupon_redemptions,by = 'household_id') %>%
select(store_id, household_id, sales_value, campaign_id,coupon_upc) %>%
group_by(store_id, household_id,coupon_upc,campaign_id) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
ungroup()
df_campaign_by_income <- df_campaign_by_income %>%
inner_join(demographics,by = 'household_id') %>%
select(campaign_id,coupon_upc, age, income) %>%
ungroup()
df_campaign_by_income <- df_campaign_by_income %>%
mutate(income = recode(income, !!!income_dict)) %>%
group_by(income) %>%
summarise(no_of_coupon = n(),
no_of_campaigns = n_distinct(campaign_id)) %>%
ungroup()
# Merge with demographics data
df_top_stores3 <- df_top_stores1 %>%
inner_join(demographics %>% select(demo_cols), by = 'household_id')
df_top_stores3 <- df_top_stores3 %>%
mutate(income = recode(income, !!!income_dict))
# Group by store_id, age, and income to calculate total sales
df_top_stores_income <- df_top_stores3 %>%
# group_by(store_id, age, income) %>%
group_by(income) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
ungroup()
# Convert store_id to factor for better visualization
df_top_stores_income$income <- as.factor(df_top_stores_income$income)
income1 <- ggplot(df_top_stores_income, aes(x = factor(income), y = sales_value)) +
geom_bar(stat = "identity", fill = "#A7C7E7", color = "black") +
labs(x = "Income Groups", y = "Total Spends", title = "Sales for Top 25 Stores") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_x_discrete(labels = function(x) substr(x, 1, 9)) + # Abbreviate store IDs if needed
scale_y_continuous(labels = scales::dollar_format() )
# Convert store_id to factor for better visualization
df_campaign_by_income$income <- as.factor(df_campaign_by_income$income)
income2 <- ggplot(df_campaign_by_income, aes(x = factor(income), y = no_of_coupon)) +
geom_bar(stat = "identity", fill = "#E7A7C7", color = "black") +
labs(x = "Income Groups", y = "Number of Coupons", title = "Coupon usage by Income group") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_x_discrete(labels = function(x) substr(x, 1, 9)) # Abbreviate store IDs if needed
income3 <- ggplot(df_campaign_by_income, aes(x = factor(income), y = no_of_campaigns)) +
geom_bar(stat = "identity", fill = "#C7E7A7", color = "black") +
labs(x = "Income Groups", y = "Number of distinct campaigns", title = "Campaign by Income group") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_x_discrete(labels = function(x) substr(x, 1, 9)) # Abbreviate store IDs if needed
print(income1)
# Arrange plots side by side
grid.arrange(income1, income2,income3, ncol = 3)
# Under-performing stores
bottom_stores <- transactions %>%
group_by(store_id) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
arrange(sales_value) %>%
slice(1:25)
# Select bottom stores
df_bottom_stores <- bottom_stores %>% select(store_id)
# Merge bottom stores with transaction data
df_bottom_stores1 <- df_bottom_stores %>%
inner_join(transactions %>% select(trans_cols), by = 'store_id') %>%
group_by(store_id, household_id) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
ungroup()
# Merge with demographics data
df_bottom_stores2 <- df_bottom_stores1 %>%
inner_join(demographics %>% select(demo_cols), by = 'household_id')
# Group by store_id, age, and income to calculate total sales
df_bottom_stores_age <- df_bottom_stores2 %>%
group_by(age) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
ungroup()
# Convert store_id to factor for better visualization
df_bottom_stores_age$age <- as.factor(df_bottom_stores_age$age)
bottom_age1 <- ggplot(df_bottom_stores_age, aes(x = factor(age), y = sales_value)) +
geom_bar(stat = "identity", fill = "#A7C7E7", color = "black") +
labs(x = "Age Groups", y = "Total Spends", title = "Sales for Bottom 25 Stores") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_x_discrete(labels = function(x) substr(x, 1, 5)) + # Abbreviate store IDs if needed
scale_y_continuous(labels = scales::dollar_format() )
# Merge bottom stores with coupon redemtions data
df_campaign_by_age <- df_bottom_stores1 %>%
inner_join(coupon_redemptions,by = 'household_id') %>%
select(store_id, household_id, sales_value, campaign_id,coupon_upc) %>%
group_by(store_id, household_id,coupon_upc,campaign_id) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
ungroup()
df_campaign_by_age <- df_campaign_by_age %>%
inner_join(demographics,by = 'household_id') %>%
select(campaign_id,coupon_upc, age, income) %>%
ungroup()
df_campaign_by_age <- df_campaign_by_age %>%
group_by(age) %>%
summarise(no_of_coupon = n(),
no_of_campaigns = n_distinct(campaign_id)) %>%
ungroup()
# Convert store_id to factor for better visualization
df_campaign_by_age$age <- as.factor(df_campaign_by_age$age)
bottom_age2 <- ggplot(df_campaign_by_age, aes(x = factor(age), y = no_of_coupon)) +
geom_bar(stat = "identity", fill = "#E7A7C7", color = "black") +
labs(x = "Age Groups", y = "Number of Coupons", title = "Coupon usage by Age group") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_x_discrete(labels = function(x) substr(x, 1, 5)) # Abbreviate store IDs if needed
bottom_age3 <- ggplot(df_campaign_by_age, aes(x = factor(age), y = no_of_campaigns)) +
geom_bar(stat = "identity", fill = "#C7E7A7", color = "black") +
labs(x = "Age Groups", y = "Number of distinct campaigns", title = "Campaign by Age group") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_x_discrete(labels = function(x) substr(x, 1, 5)) # Abbreviate store IDs if needed
print(bottom_age1)
# Arrange plots side by side
grid.arrange(bottom_age1, bottom_age2, bottom_age3, ncol = 3)
# Merge top stores with coupon redemtions data
df_campaign_by_income <- df_bottom_stores1 %>%
inner_join(coupon_redemptions,by = 'household_id') %>%
select(store_id, household_id, sales_value, campaign_id,coupon_upc) %>%
group_by(store_id, household_id,coupon_upc,campaign_id) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
ungroup()
df_campaign_by_income <- df_campaign_by_income %>%
inner_join(demographics,by = 'household_id') %>%
select(campaign_id,coupon_upc, age, income) %>%
ungroup()
df_campaign_by_income <- df_campaign_by_income %>%
mutate(income = recode(income, !!!income_dict)) %>%
group_by(income) %>%
summarise(no_of_coupon = n(),
no_of_campaigns = n_distinct(campaign_id)) %>%
ungroup()
# Merge with demographics data
df_bottom_stores3 <- df_bottom_stores1 %>%
inner_join(demographics %>% select(demo_cols), by = 'household_id')
df_bottom_stores3 <- df_bottom_stores3 %>%
mutate(income = recode(income, !!!income_dict))
# Group by store_id, age, and income to calculate total sales
df_bottom_stores_income <- df_bottom_stores3 %>%
group_by(income) %>%
summarise(sales_value = sum(sales_value, na.rm = TRUE)) %>%
ungroup()
# Convert store_id to factor for better visualization
df_bottom_stores_income$income <- as.factor(df_bottom_stores_income$income)
bottom_income1 <- ggplot(df_bottom_stores_income, aes(x = factor(income), y = sales_value)) +
geom_bar(stat = "identity", fill = "#A7C7E7", color = "black") +
labs(x = "Income Groups", y = "Total Spends", title = "Sales for Bottom 25 Stores") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_x_discrete(labels = function(x) substr(x, 1, 9)) + # Abbreviate store IDs if needed
scale_y_continuous(labels = scales::dollar_format() )
# Convert store_id to factor for better visualization
df_campaign_by_income$income <- as.factor(df_campaign_by_income$income)
bottom_income2 <- ggplot(df_campaign_by_income, aes(x = factor(income), y = no_of_coupon)) +
geom_bar(stat = "identity", fill = "#E7A7C7", color = "black") +
labs(x = "Income Groups", y = "Number of Coupons", title = "Coupon usage by Income group") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_x_discrete(labels = function(x) substr(x, 1, 9)) # Abbreviate store IDs if needed
bottom_income3 <- ggplot(df_campaign_by_income, aes(x = factor(income), y = no_of_campaigns)) +
geom_bar(stat = "identity", fill = "#C7E7A7", color = "black") +
labs(x = "Income Groups", y = "Number of distinct campaigns", title = "Campaign by Income group") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Rotate x-axis labels
axis.ticks.x = element_blank(),
panel.grid.major = element_line(linetype = "dashed", color = "grey")) +
scale_x_discrete(labels = function(x) substr(x, 1, 9)) # Abbreviate store IDs if needed
print(bottom_income1)
# Arrange plots side by side
grid.arrange(bottom_income1, bottom_income2, bottom_income3, ncol = 3)
Problem Statement:
The project aimed to address the question: What sales strategies are top-performing stores using that others lack? Under-performing stores are struggling to keep up due to ineffective marketing strategies and a lack of loyalty programs that effectively target key age and income groups. This leads to missed opportunities for growth.
Approach:
To tackle this, we used coupon and marketing campaign data to conduct a comparative analysis between top-performing and under-performing stores. Our methodology involved identifying key customer demographics (age and income groups) and analyzing the effectiveness of customer loyalty programs and targeted marketing efforts.
Key Insights:
• Age Group: Top-performing stores drive the majority of their sales from the 25-54 age group, focusing on customer loyalty through coupons and marketing campaigns. However, they over-target the 65+ age group, which contributes less to sales.
• Income Group: Stores performing well generate most of their revenue from customers earning less than 100K but have effectively used loyalty programs to engage high-income groups (180-200K) as well.
• Under-performance Factors: Low-performing stores fail to target key demographics, particularly the 35-44 age group, which is crucial for increasing sales. They also lack loyalty programs for high-income customers, and their marketing efforts focus too much on less impactful groups.
Implications to the Consumer:
For consumers, this analysis reveals that their engagement with stores—through loyalty programs and targeted campaigns—varies significantly depending on which store they shop at. While top-performing stores reward loyal customers more effectively, under-performing stores miss opportunities to engage with potential repeat customers. Improving these stores’ marketing strategies could lead to better customer experiences and more tailored offers.
Recommendations:
To capitalize on these findings, the following steps are recommended:
1. Implement Loyalty Programs for High-Income Groups:
Under-performing stores should focus on establishing robust loyalty programs for customers earning over $100K. By providing tailored rewards and targeted marketing, they can increase both retention and sales from this lucrative segment.
2. Target Key Age Groups:
Shift marketing focus towards the 35-44 age group, which has shown strong sales potential but is currently under-targeted by under-performing stores. This group should be the focal point for campaigns aimed at driving both new and repeat purchases.
3. Balance Campaign Efforts:
Re-balance marketing strategies to ensure a more even distribution of effort across different demographics. Avoid over-targeting the 65+ age group and focus on high-potential segments like the 25-34 and 55-64 age groups to optimize overall campaign effectiveness.
Limitations of the Analysis:
While our analysis offers valuable insights, it is not without limitations. The data is limited to coupon and campaign usage, which may not capture the full range of customer interactions with stores. Additional data, such as in-store purchase behaviors or customer feedback, could provide a more comprehensive view of customer engagement and store performance. Future analyses could also explore external factors such as location or competitive pricing, which may influence store performance.
Building on this analysis would involve expanding the dataset to include more detailed customer behaviors and considering factors beyond marketing, such as store operations, inventory management, and customer service quality.