Future Marketing Campaign

Introduction

Problem Summary

Given the current economic climate, increasing sales is critical as we aim to increase our market share. Getting consumers in our doors and making purchases has posed a challenge due to numerous external/internal factors including (but not limited to) stagnant wages, increased costs, inflation, etc. This has lead to certain departments and products declining in sales and puts Regork at risk for more potential losses.

Objective

“What is our next marketing strategy to drive consumers to shop at our stores and drive sales in our lowest performing product category?”

Our solution is to do a holistic review and with the goal of:

Identifying the lowest performing product category
Determining the most successful display location
Determining the most successful mailer type
Locating the best time frame to complete the campaign
Pinpointing the top 5 stores to test the campaign based on customer demographics

Impact

As a member of the marketing analytic team, it is my job to stay ahead of industry trends and create campaigns on a more granular level. Taking a step back, as a company, we must work towards increasing value for our shareholders and customers.

We should be working towards improving all segments of our business, but in the current state, I believe we should first tackle the area that we can show immediate results.

Throughout this report, you will see the marketing analytics team’s proposed first step.

Packages/Libraries Used

Please see the below list of packages used in this report:

completejourney - package provides household level transaction data for one year
tidyverse - group of packages that aids in the manipulation of data
ggplot2 - package that improves visualizations
lubridate - package that aids in the manipulation and calculation of date/time related fields
kableExtra - package that improves table visualizations
scales - package that eases number formatting into dollars
TTR - package that easily calculates moving average

suppressWarnings(suppressMessages(library(completejourney)))
suppressWarnings(suppressMessages(library(tidyverse)))
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(lubridate)))
suppressWarnings(suppressMessages(library(kableExtra)))
suppressWarnings(suppressMessages(library(scales)))
suppressWarnings(suppressMessages(library(TTR)))

transactions <- get_transactions()
products <- products
promotions <- get_promotions()
coupons <- coupons
coupon_redemptions <- coupon_redemptions
campaign_descriptions <- campaign_descriptions
campaigns <- campaigns
demographics <- demographics

Exploratory Data Analysis

Category Identification

Our first goal is to identify the product category that we will focus the campaign on. For this, we are looking at the total sales by month, calculating the difference from the previous month, and totaling the difference. This will give us our loss leader.

products_all <- transactions %>%
  left_join(products, by = "product_id") %>%
  mutate(trans_month = month(as.POSIXlt(transaction_timestamp))) %>%
  group_by(product_category,trans_month) %>%
  summarize(month_sales = sum(sales_value)) %>%
  mutate(change_in_sales = c(0, diff(month_sales))) %>%
  group_by(product_category) %>%
  summarize(month_to_month_net = sum(change_in_sales), .groups = "keep") %>%
  arrange(month_to_month_net)

## `summarise()` has grouped output by 'product_category'. You can override using
## the `.groups` argument.

top_10_loss <- head(products_all,10)


ggplot(top_10_loss, aes(x= reorder(product_category,-month_to_month_net), y = month_to_month_net)) +
  geom_bar(stat = "identity", fill = "red") +
  scale_x_discrete("Bottom 10 Product Categories") + 
  scale_y_continuous("Total Month to Month Losses", labels = scales::dollar) +
  labs(title = "Loss Leaders in Month to Month Sales") +
  geom_text(aes(label = product_category), 
            position = position_stack(vjust = 0.5), 
            color = "white", size = 2.5,angle = 90, fontface = "bold") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),  # Centers title
    plot.subtitle = element_text(hjust = 0.5),
    axis.text.x = element_text(vjust = -1, color = "white", size = 5))

top_10_loss$month_to_month_net <- dollar(top_10_loss$month_to_month_net)
top_10_loss <- top_10_loss %>%
  rename(Category=product_category, TotalLoss = month_to_month_net)

kable(top_10_loss, caption = "Worst Month to Month Losses") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "bordered"))

Worst Month to Month Losses
Category	TotalLoss
CHICKEN	-$910.52
MEAT - MISC	-$721.64
BERRIES	-$675.74
YOGURT	-$667.47
SEAFOOD - SHELF STABLE	-$572.17
CONVENIENT BRKFST/WHLSM SNACKS	-$529.33
LUNCHMEAT	-$517.35
MEAT - SHELF STABLE	-$515.66
FRZN NOVELTIES/WTR ICE	-$502.81
SALAD MIX	-$433.10

After reviewing the 10 product categories that had the lowest month to month loss in sales, we can clearly see the product category that has the greatest growth potential is: CHICKEN.

Display Location

Our second goal is to determine the best location to place the product. We will accomplish this by comparing the percent of chicken sales for each display type.

chicken_display <- transactions %>%
  left_join(products, by = "product_id") %>%
  filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>%
  mutate(product_category = ifelse(product_category == "CHICKEN/POULTRY", "POULTRY", "CHICKEN")) %>%
  filter(str_detect(product_category, regex("CHICKEN", ignore_case = T)))%>%
  left_join(promotions, by = "product_id") %>%
  group_by(display_location)%>%
  summarize(location_sales = sum(sales_value)) %>%
  mutate(display_percent = round((location_sales / sum(location_sales)),4)) %>%
  mutate(
    display_location = as.factor(
      ifelse(str_detect(display_location, regex("A", ignore_case = T)), "IN SHELF",
      ifelse(str_detect(display_location, regex("9", ignore_case = T)), "SECONDARY",
      ifelse(str_detect(display_location, regex("7", ignore_case = T)), "IN AISLE",
      ifelse(str_detect(display_location, regex("6", ignore_case = T)), "SIDE AISLE END CAP",
      ifelse(str_detect(display_location, regex("5", ignore_case = T)), "REAR END CAP",
      ifelse(str_detect(display_location, regex("4", ignore_case = T)), "MID AISLE END CAP",
      ifelse(str_detect(display_location, regex("3", ignore_case = T)), "FRONT END CAP",
      ifelse(str_detect(display_location, regex("2", ignore_case = T)), "STORE REAR",
      ifelse(str_detect(display_location, regex("1", ignore_case = T)), "STORE FRONT",
      ifelse(str_detect(display_location, regex("0", ignore_case = T)), "DISPLAY", "NA"))))))))))))  %>%
  arrange(desc(display_percent))

## Warning in left_join(., promotions, by = "product_id"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 328899 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

chicken_display$display_percent <- percent(chicken_display$display_percent)
chicken_display$location_sales <- dollar(chicken_display$location_sales)
chicken_display <- chicken_display %>%
  rename(SalesPerLocation=location_sales, PercentofTotalSales = display_percent, DisplayLocation = display_location)

kable(chicken_display, caption = "Sales By Product Location") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "bordered"))

Sales By Product Location
DisplayLocation	SalesPerLocation	PercentofTotalSales
DISPLAY	$86,545,471	98.82%
STORE REAR	$387,985	0.44%
SECONDARY	$289,096	0.33%
REAR END CAP	$205,348	0.23%
FRONT END CAP	$51,569	0.06%
IN AISLE	$55,194	0.06%
IN SHELF	$14,210	0.02%
STORE FRONT	$9,021	0.01%
MID AISLE END CAP	$12,103	0.01%
NA	$8,349	0.01%
SIDE AISLE END CAP	$3,297	0.00%

Throughout the year, there is a single product location that performed incredibly for our Chicken products resulting in over 98% of yearly chicken sales. That location would be an individualized DISPLAY.

Mailer Location

Our third goal would be to determine the best placement for the mailer to be placed in the campaign.

chicken_mailer <- transactions %>%
  left_join(products, by = "product_id") %>%
  filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>%
  mutate(product_category = ifelse(product_category == "CHICKEN/POULTRY", "POULTRY", "CHICKEN")) %>%
  filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>%
  left_join(promotions, by = "product_id") %>%
  mutate(
    mailer_location = as.factor(
      ifelse(str_detect(mailer_location, regex("Z", ignore_case = T)), "FREE ON FRONT/BACK PAGE OR WRAP",
      ifelse(str_detect(mailer_location, regex("X", ignore_case = T)), "FREE ON INTERIOR PAGE",
      ifelse(str_detect(mailer_location, regex("P", ignore_case = T)), "INTERIOR PAGE COUPON",
      ifelse(str_detect(mailer_location, regex("L", ignore_case = T)), "WRAP BACK FEATURE",
      ifelse(str_detect(mailer_location, regex("J", ignore_case = T)), "WRAP INTERIOR COUPON",
      ifelse(str_detect(mailer_location, regex("H", ignore_case = T)), "WRAP FRONT FEATURE",
      ifelse(str_detect(mailer_location, regex("D", ignore_case = T)), "FRONT PAGE FEATURE",
      ifelse(str_detect(mailer_location, regex("C", ignore_case = T)), "INTERIOR PAGE LINE ITEM",
      ifelse(str_detect(mailer_location, regex("A", ignore_case = T)), "INTERIOR PAGE FEATURE",
      ifelse(str_detect(mailer_location, regex("0", ignore_case = T)), "NOT AN AD", "NA"))))))))))))

## Warning in left_join(., promotions, by = "product_id"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 328899 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

ggplot(chicken_mailer, aes(x = factor(mailer_location), y = sales_value)) +
  geom_violin(fill = "skyblue", color = "black") +
  scale_x_discrete("Mailer Location") + 
  scale_y_continuous("Distribution of Chicken Sales", labels = scales::dollar) +
  labs(title = "Chicken Sales per Mailer Location")+
  theme(axis.text.x = element_text(angle = 15,vjust=0.6))+
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(plot.subtitle = element_text(hjust = 0.5))

chicken_mailer <- chicken_mailer %>%
  group_by(mailer_location)%>%
  summarize(SalesByMailer = sum(sales_value)) %>%
  mutate(PercentOfTotal = round((SalesByMailer / sum(SalesByMailer)),4))
  
chicken_mailer$PercentOfTotal <- percent(chicken_mailer$PercentOfTotal)
chicken_mailer$SalesByMailer <- dollar(chicken_mailer$SalesByMailer)
chicken_mailer <- chicken_mailer %>%
  rename(MailerType = mailer_location)

kable(chicken_mailer, caption = "Sales By Mailer Type") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "bordered"))

Sales By Mailer Type
MailerType	SalesByMailer	PercentOfTotal
FRONT PAGE FEATURE	$51,284,211	58.56%
INTERIOR PAGE FEATURE	$35,333,474	40.34%
NOT AN AD	$940,303	1.07%
WRAP FRONT FEATURE	$15,308	0.02%
NA	$8,349	0.01%

While the first graph didn’t clarify an ideal mailer type, we are able to see a better picture in the table. Ideally, for a new campaign I would recommend a FRONT PAGE FEATURE or (depending on cost) an INTERIOR PAGE MAILER.These two types combined make up 98% of all chicken sales.

Campaign Time Frame

Our fourth goal is going to determine when we run the new marketing campaign. To obtain this information we are going calculate a 4 week average. The week with the lowest 4 week average will mark the end of the campaign. To quickly visualize how Chicken sales have trended, see the below heat map:

weekly_sales <- transactions %>%
  left_join(products, by = "product_id") %>%
  filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>%
  mutate(product_category = ifelse(product_category == "CHICKEN/POULTRY", "POULTRY", "CHICKEN")) %>%
  filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>%
  mutate(DayofWeek = wday(transaction_timestamp, label = TRUE, abbr = TRUE, week_start = 7))%>%
  group_by(week,DayofWeek) %>%
  summarise(DailySales = sum(sales_value))

## `summarise()` has grouped output by 'week'. You can override using the
## `.groups` argument.

ggplot(weekly_sales, aes(x = DayofWeek, y = week, fill = DailySales)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "white", high = "blue") +
  labs(title = "Weekly Calendar Heatmap",
       x = "Day of the Week",
       y = "Week of the Year",
       fill = "Value") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(plot.subtitle = element_text(hjust = 0.5))

weekly_sales <- weekly_sales %>%
  group_by(week) %>%
  summarise(WeeklySales = sum(DailySales)) %>%
  mutate(FourWeekAvg = SMA(WeeklySales, 4)) 
  

weekly_sales$WeeklySales <- dollar(weekly_sales$WeeklySales)
weekly_sales$FourWeekAvg <- dollar(weekly_sales$FourWeekAvg)

kable(weekly_sales, caption = "Four Week Moving Average") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "bordered")) %>%
  row_spec(46, color = "orange") %>%
  row_spec(47, color = "orange") %>%
  row_spec(48, color = "orange") %>%
  row_spec(49, color = "red")

Four Week Moving Average
week	WeeklySales	FourWeekAvg
1	$195.18	NA
2	$1,201.42	NA
3	$1,276.96	NA
4	$963.07	$909.16
5	$1,211.85	$1,163.32
6	$1,092.58	$1,136.11
7	$1,432.57	$1,175.02
8	$1,071.19	$1,202.05
9	$973.07	$1,142.35
10	$1,315.60	$1,198.11
11	$1,095.69	$1,113.89
12	$1,018.01	$1,100.59
13	$907.51	$1,084.20
14	$1,046.77	$1,017.00
15	$939.29	$977.90
16	$836.42	$932.50
17	$1,098.88	$980.34
18	$955.38	$957.49
19	$1,215.19	$1,026.47
20	$1,300.60	$1,142.51
21	$1,124.11	$1,148.82
22	$1,125.88	$1,191.44
23	$1,196.43	$1,186.76
24	$989.23	$1,108.91
25	$1,210.50	$1,130.51
26	$800.78	$1,049.23
27	$1,162.36	$1,040.72
28	$1,359.69	$1,133.33
29	$994.02	$1,079.21
30	$1,104.92	$1,155.25
31	$838.02	$1,074.16
32	$1,043.18	$995.03
33	$1,001.64	$996.94
34	$923.02	$951.46
35	$926.29	$973.53
36	$1,392.02	$1,060.74
37	$986.36	$1,056.92
38	$861.49	$1,041.54
39	$1,371.23	$1,152.77
40	$1,162.94	$1,095.50
41	$1,076.07	$1,117.93
42	$847.34	$1,114.39
43	$963.77	$1,012.53
44	$929.21	$954.10
45	$1,298.93	$1,009.81
46	$761.19	$988.27
47	$768.21	$939.38
48	$737.90	$891.56
49	$1,180.43	$861.93
50	$1,181.29	$966.96
51	$1,005.64	$1,026.31
52	$740.05	$1,026.85
53	$807.76	$933.68

Again, to determine the 4 week span of the campaign (which is the typical length of previous campaigns), we are looking for the lowest 4 week moving average. We can see that week 49 has the lowest 4 week average of $861.93. This means our new campaign should run from WEEK 46 to WEEK 49.

This makes sense when you compare those weeks sales numbers individually.

Targeted Stores

Our fifth and final goal is to pinpoint the five stores that this campaign would be the most promising.

chicken_income <- transactions %>%
  left_join(products, by = "product_id") %>%
  filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>%
  mutate(product_category = ifelse(product_category == "CHICKEN/POULTRY", "POULTRY", "CHICKEN")) %>%
  filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>% 
  inner_join(demographics, by = "household_id") %>%
  mutate(Month = month(transaction_timestamp)) %>%
  group_by(Month, income) %>%
  summarise(sales_value = sum(sales_value))

## `summarise()` has grouped output by 'Month'. You can override using the
## `.groups` argument.

ggplot(chicken_income, aes(x= Month, y = sales_value, group = income, color = as.factor(income))) +
  geom_line(size = 1)+
  labs(title = "Chicken Sales by Income Bracket", color = "Income Bracket")+
  scale_x_continuous("Month", breaks = seq(1,12,1)) +
  scale_y_continuous("Total Sales", labels = scales::dollar)

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

total_sales_income <- transactions %>%
  inner_join(demographics, by = "household_id") %>%
  filter(str_detect(income, regex("50-74K", ignore_case = T))) %>%
  group_by(store_id,household_id) %>%
  summarise(HouseholdSpend = sum(sales_value)) %>%
  summarise(HouseholdAvg = mean(HouseholdSpend)) %>%
  arrange(desc(HouseholdAvg))

## `summarise()` has grouped output by 'store_id'. You can override using the
## `.groups` argument.

total_sales_income$HouseholdAvg <- dollar(total_sales_income$HouseholdAvg)
total_sales_income <- total_sales_income %>%
  rename(StoreID = store_id)

top_5_stores <- head(total_sales_income,5)

kable(top_5_stores, caption = "Top 5 Stores for Avg Household Spend (50-74K Bracket)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "bordered"))

Top 5 Stores for Avg Household Spend (50-74K Bracket)
StoreID	HouseholdAvg
311	$4,748.14
323	$3,987.04
408	$3,675.03
361	$2,365.45
439	$2,291.16

Comparing each income bracket against itself, we can see the biggest area for improvement during the Week 46 to Week 49 time frame, would be the 50-74K bracket.

Additionally, we can compare all the stores sales for that income bracket and can see where we have best chance of having customers in that income bracket. Simplifying it down, we can see that stores 311, 323, 408, 361, and 439 are the stores we should focus this campaign on.

Summary

Problem

“What is our next marketing strategy to drive consumers to shop at our stores and drive sales in our lowest performing product category?”

Solution

Throughout this report we completed our five objectives. Please see the below recommendation for our next marketing campaign:

Identify the lowest performing product category
- CHICKEN was the most under performing category with -$910.52 in month to month losses.
Determining the most successful display location
- 98% of all chicken sales has an individualized DISPLAY.
Determining the most successful mailer type
- 58% of all sales that involved promotions utilized a FRONT PAGE FEATURE and would be the recommended choice. Although an argument could be made for an INTERIOR PAGE FEATURE (40% of all chicken sales that involved promotions).
Locating the best time frame to complete the campaign
- WEEK 46 to WEEK 49 had the lowest average spent over the four weeks ($861.93).
Pinpointing the top 5 stores to test the campaign based on customer demographics
- During the time period above, the 50-74K income bracket had the largest decrease in sales.
- Stores 311, 323, 408, 361, and 439 had the highest average household spent for that income bracket.

Insights and Limitations

Throughout this report we kept coming back to the first objective - the product category. We were thoroughly surprised to see a very common and rather inexpensive meat contribute the most to the month to month losses. This could be done on purpose though to draw customers into the store to purchase more items.

The biggest limitation in this the unknown cost of marketing. Like do certain mailers/product placement cost more money? The recommendations are based on if all options have equal cost. Adding in a cost factor would be the next thing we focus on in the future.

Final Project - Regork

Lee Brodbeck-Moore

2025-02-26