Given the current economic climate, increasing sales is critical as we aim to increase our market share. Getting consumers in our doors and making purchases has posed a challenge due to numerous external/internal factors including (but not limited to) stagnant wages, increased costs, inflation, etc. This has lead to certain departments and products declining in sales and puts Regork at risk for more potential losses.
“What is our next marketing strategy to drive consumers to shop at our stores and drive sales in our lowest performing product category?”
Our solution is to do a holistic review and with the goal of:
As a member of the marketing analytic team, it is my job to stay ahead of industry trends and create campaigns on a more granular level. Taking a step back, as a company, we must work towards increasing value for our shareholders and customers.
We should be working towards improving all segments of our business, but in the current state, I believe we should first tackle the area that we can show immediate results.
Throughout this report, you will see the marketing analytics team’s proposed first step.
Please see the below list of packages used in this report:
suppressWarnings(suppressMessages(library(completejourney)))
suppressWarnings(suppressMessages(library(tidyverse)))
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(lubridate)))
suppressWarnings(suppressMessages(library(kableExtra)))
suppressWarnings(suppressMessages(library(scales)))
suppressWarnings(suppressMessages(library(TTR)))
transactions <- get_transactions()
products <- products
promotions <- get_promotions()
coupons <- coupons
coupon_redemptions <- coupon_redemptions
campaign_descriptions <- campaign_descriptions
campaigns <- campaigns
demographics <- demographics
Our first goal is to identify the product category that we will focus the campaign on. For this, we are looking at the total sales by month, calculating the difference from the previous month, and totaling the difference. This will give us our loss leader.
products_all <- transactions %>%
left_join(products, by = "product_id") %>%
mutate(trans_month = month(as.POSIXlt(transaction_timestamp))) %>%
group_by(product_category,trans_month) %>%
summarize(month_sales = sum(sales_value)) %>%
mutate(change_in_sales = c(0, diff(month_sales))) %>%
group_by(product_category) %>%
summarize(month_to_month_net = sum(change_in_sales), .groups = "keep") %>%
arrange(month_to_month_net)
## `summarise()` has grouped output by 'product_category'. You can override using
## the `.groups` argument.
top_10_loss <- head(products_all,10)
ggplot(top_10_loss, aes(x= reorder(product_category,-month_to_month_net), y = month_to_month_net)) +
geom_bar(stat = "identity", fill = "red") +
scale_x_discrete("Bottom 10 Product Categories") +
scale_y_continuous("Total Month to Month Losses", labels = scales::dollar) +
labs(title = "Loss Leaders in Month to Month Sales") +
geom_text(aes(label = product_category),
position = position_stack(vjust = 0.5),
color = "white", size = 2.5,angle = 90, fontface = "bold") +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5), # Centers title
plot.subtitle = element_text(hjust = 0.5),
axis.text.x = element_text(vjust = -1, color = "white", size = 5))
top_10_loss$month_to_month_net <- dollar(top_10_loss$month_to_month_net)
top_10_loss <- top_10_loss %>%
rename(Category=product_category, TotalLoss = month_to_month_net)
kable(top_10_loss, caption = "Worst Month to Month Losses") %>%
kable_styling(bootstrap_options = c("striped", "hover", "bordered"))
| Category | TotalLoss |
|---|---|
| CHICKEN | -$910.52 |
| MEAT - MISC | -$721.64 |
| BERRIES | -$675.74 |
| YOGURT | -$667.47 |
| SEAFOOD - SHELF STABLE | -$572.17 |
| CONVENIENT BRKFST/WHLSM SNACKS | -$529.33 |
| LUNCHMEAT | -$517.35 |
| MEAT - SHELF STABLE | -$515.66 |
| FRZN NOVELTIES/WTR ICE | -$502.81 |
| SALAD MIX | -$433.10 |
After reviewing the 10 product categories that had the lowest month to month loss in sales, we can clearly see the product category that has the greatest growth potential is: CHICKEN.
Our second goal is to determine the best location to place the product. We will accomplish this by comparing the percent of chicken sales for each display type.
chicken_display <- transactions %>%
left_join(products, by = "product_id") %>%
filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>%
mutate(product_category = ifelse(product_category == "CHICKEN/POULTRY", "POULTRY", "CHICKEN")) %>%
filter(str_detect(product_category, regex("CHICKEN", ignore_case = T)))%>%
left_join(promotions, by = "product_id") %>%
group_by(display_location)%>%
summarize(location_sales = sum(sales_value)) %>%
mutate(display_percent = round((location_sales / sum(location_sales)),4)) %>%
mutate(
display_location = as.factor(
ifelse(str_detect(display_location, regex("A", ignore_case = T)), "IN SHELF",
ifelse(str_detect(display_location, regex("9", ignore_case = T)), "SECONDARY",
ifelse(str_detect(display_location, regex("7", ignore_case = T)), "IN AISLE",
ifelse(str_detect(display_location, regex("6", ignore_case = T)), "SIDE AISLE END CAP",
ifelse(str_detect(display_location, regex("5", ignore_case = T)), "REAR END CAP",
ifelse(str_detect(display_location, regex("4", ignore_case = T)), "MID AISLE END CAP",
ifelse(str_detect(display_location, regex("3", ignore_case = T)), "FRONT END CAP",
ifelse(str_detect(display_location, regex("2", ignore_case = T)), "STORE REAR",
ifelse(str_detect(display_location, regex("1", ignore_case = T)), "STORE FRONT",
ifelse(str_detect(display_location, regex("0", ignore_case = T)), "DISPLAY", "NA")))))))))))) %>%
arrange(desc(display_percent))
## Warning in left_join(., promotions, by = "product_id"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 328899 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
chicken_display$display_percent <- percent(chicken_display$display_percent)
chicken_display$location_sales <- dollar(chicken_display$location_sales)
chicken_display <- chicken_display %>%
rename(SalesPerLocation=location_sales, PercentofTotalSales = display_percent, DisplayLocation = display_location)
kable(chicken_display, caption = "Sales By Product Location") %>%
kable_styling(bootstrap_options = c("striped", "hover", "bordered"))
| DisplayLocation | SalesPerLocation | PercentofTotalSales |
|---|---|---|
| DISPLAY | $86,545,471 | 98.82% |
| STORE REAR | $387,985 | 0.44% |
| SECONDARY | $289,096 | 0.33% |
| REAR END CAP | $205,348 | 0.23% |
| FRONT END CAP | $51,569 | 0.06% |
| IN AISLE | $55,194 | 0.06% |
| IN SHELF | $14,210 | 0.02% |
| STORE FRONT | $9,021 | 0.01% |
| MID AISLE END CAP | $12,103 | 0.01% |
| NA | $8,349 | 0.01% |
| SIDE AISLE END CAP | $3,297 | 0.00% |
Throughout the year, there is a single product location that performed incredibly for our Chicken products resulting in over 98% of yearly chicken sales. That location would be an individualized DISPLAY.
Our third goal would be to determine the best placement for the mailer to be placed in the campaign.
chicken_mailer <- transactions %>%
left_join(products, by = "product_id") %>%
filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>%
mutate(product_category = ifelse(product_category == "CHICKEN/POULTRY", "POULTRY", "CHICKEN")) %>%
filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>%
left_join(promotions, by = "product_id") %>%
mutate(
mailer_location = as.factor(
ifelse(str_detect(mailer_location, regex("Z", ignore_case = T)), "FREE ON FRONT/BACK PAGE OR WRAP",
ifelse(str_detect(mailer_location, regex("X", ignore_case = T)), "FREE ON INTERIOR PAGE",
ifelse(str_detect(mailer_location, regex("P", ignore_case = T)), "INTERIOR PAGE COUPON",
ifelse(str_detect(mailer_location, regex("L", ignore_case = T)), "WRAP BACK FEATURE",
ifelse(str_detect(mailer_location, regex("J", ignore_case = T)), "WRAP INTERIOR COUPON",
ifelse(str_detect(mailer_location, regex("H", ignore_case = T)), "WRAP FRONT FEATURE",
ifelse(str_detect(mailer_location, regex("D", ignore_case = T)), "FRONT PAGE FEATURE",
ifelse(str_detect(mailer_location, regex("C", ignore_case = T)), "INTERIOR PAGE LINE ITEM",
ifelse(str_detect(mailer_location, regex("A", ignore_case = T)), "INTERIOR PAGE FEATURE",
ifelse(str_detect(mailer_location, regex("0", ignore_case = T)), "NOT AN AD", "NA"))))))))))))
## Warning in left_join(., promotions, by = "product_id"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 328899 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
ggplot(chicken_mailer, aes(x = factor(mailer_location), y = sales_value)) +
geom_violin(fill = "skyblue", color = "black") +
scale_x_discrete("Mailer Location") +
scale_y_continuous("Distribution of Chicken Sales", labels = scales::dollar) +
labs(title = "Chicken Sales per Mailer Location")+
theme(axis.text.x = element_text(angle = 15,vjust=0.6))+
theme(plot.title = element_text(hjust = 0.5)) +
theme(plot.subtitle = element_text(hjust = 0.5))
chicken_mailer <- chicken_mailer %>%
group_by(mailer_location)%>%
summarize(SalesByMailer = sum(sales_value)) %>%
mutate(PercentOfTotal = round((SalesByMailer / sum(SalesByMailer)),4))
chicken_mailer$PercentOfTotal <- percent(chicken_mailer$PercentOfTotal)
chicken_mailer$SalesByMailer <- dollar(chicken_mailer$SalesByMailer)
chicken_mailer <- chicken_mailer %>%
rename(MailerType = mailer_location)
kable(chicken_mailer, caption = "Sales By Mailer Type") %>%
kable_styling(bootstrap_options = c("striped", "hover", "bordered"))
| MailerType | SalesByMailer | PercentOfTotal |
|---|---|---|
| FRONT PAGE FEATURE | $51,284,211 | 58.56% |
| INTERIOR PAGE FEATURE | $35,333,474 | 40.34% |
| NOT AN AD | $940,303 | 1.07% |
| WRAP FRONT FEATURE | $15,308 | 0.02% |
| NA | $8,349 | 0.01% |
While the first graph didn’t clarify an ideal mailer type, we are able to see a better picture in the table. Ideally, for a new campaign I would recommend a FRONT PAGE FEATURE or (depending on cost) an INTERIOR PAGE MAILER.These two types combined make up 98% of all chicken sales.
Our fourth goal is going to determine when we run the new marketing campaign. To obtain this information we are going calculate a 4 week average. The week with the lowest 4 week average will mark the end of the campaign. To quickly visualize how Chicken sales have trended, see the below heat map:
weekly_sales <- transactions %>%
left_join(products, by = "product_id") %>%
filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>%
mutate(product_category = ifelse(product_category == "CHICKEN/POULTRY", "POULTRY", "CHICKEN")) %>%
filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>%
mutate(DayofWeek = wday(transaction_timestamp, label = TRUE, abbr = TRUE, week_start = 7))%>%
group_by(week,DayofWeek) %>%
summarise(DailySales = sum(sales_value))
## `summarise()` has grouped output by 'week'. You can override using the
## `.groups` argument.
ggplot(weekly_sales, aes(x = DayofWeek, y = week, fill = DailySales)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "white", high = "blue") +
labs(title = "Weekly Calendar Heatmap",
x = "Day of the Week",
y = "Week of the Year",
fill = "Value") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
theme(plot.title = element_text(hjust = 0.5)) +
theme(plot.subtitle = element_text(hjust = 0.5))
weekly_sales <- weekly_sales %>%
group_by(week) %>%
summarise(WeeklySales = sum(DailySales)) %>%
mutate(FourWeekAvg = SMA(WeeklySales, 4))
weekly_sales$WeeklySales <- dollar(weekly_sales$WeeklySales)
weekly_sales$FourWeekAvg <- dollar(weekly_sales$FourWeekAvg)
kable(weekly_sales, caption = "Four Week Moving Average") %>%
kable_styling(bootstrap_options = c("striped", "hover", "bordered")) %>%
row_spec(46, color = "orange") %>%
row_spec(47, color = "orange") %>%
row_spec(48, color = "orange") %>%
row_spec(49, color = "red")
| week | WeeklySales | FourWeekAvg |
|---|---|---|
| 1 | $195.18 | NA |
| 2 | $1,201.42 | NA |
| 3 | $1,276.96 | NA |
| 4 | $963.07 | $909.16 |
| 5 | $1,211.85 | $1,163.32 |
| 6 | $1,092.58 | $1,136.11 |
| 7 | $1,432.57 | $1,175.02 |
| 8 | $1,071.19 | $1,202.05 |
| 9 | $973.07 | $1,142.35 |
| 10 | $1,315.60 | $1,198.11 |
| 11 | $1,095.69 | $1,113.89 |
| 12 | $1,018.01 | $1,100.59 |
| 13 | $907.51 | $1,084.20 |
| 14 | $1,046.77 | $1,017.00 |
| 15 | $939.29 | $977.90 |
| 16 | $836.42 | $932.50 |
| 17 | $1,098.88 | $980.34 |
| 18 | $955.38 | $957.49 |
| 19 | $1,215.19 | $1,026.47 |
| 20 | $1,300.60 | $1,142.51 |
| 21 | $1,124.11 | $1,148.82 |
| 22 | $1,125.88 | $1,191.44 |
| 23 | $1,196.43 | $1,186.76 |
| 24 | $989.23 | $1,108.91 |
| 25 | $1,210.50 | $1,130.51 |
| 26 | $800.78 | $1,049.23 |
| 27 | $1,162.36 | $1,040.72 |
| 28 | $1,359.69 | $1,133.33 |
| 29 | $994.02 | $1,079.21 |
| 30 | $1,104.92 | $1,155.25 |
| 31 | $838.02 | $1,074.16 |
| 32 | $1,043.18 | $995.03 |
| 33 | $1,001.64 | $996.94 |
| 34 | $923.02 | $951.46 |
| 35 | $926.29 | $973.53 |
| 36 | $1,392.02 | $1,060.74 |
| 37 | $986.36 | $1,056.92 |
| 38 | $861.49 | $1,041.54 |
| 39 | $1,371.23 | $1,152.77 |
| 40 | $1,162.94 | $1,095.50 |
| 41 | $1,076.07 | $1,117.93 |
| 42 | $847.34 | $1,114.39 |
| 43 | $963.77 | $1,012.53 |
| 44 | $929.21 | $954.10 |
| 45 | $1,298.93 | $1,009.81 |
| 46 | $761.19 | $988.27 |
| 47 | $768.21 | $939.38 |
| 48 | $737.90 | $891.56 |
| 49 | $1,180.43 | $861.93 |
| 50 | $1,181.29 | $966.96 |
| 51 | $1,005.64 | $1,026.31 |
| 52 | $740.05 | $1,026.85 |
| 53 | $807.76 | $933.68 |
Again, to determine the 4 week span of the campaign (which is the typical length of previous campaigns), we are looking for the lowest 4 week moving average. We can see that week 49 has the lowest 4 week average of $861.93. This means our new campaign should run from WEEK 46 to WEEK 49.
This makes sense when you compare those weeks sales numbers individually.
Our fifth and final goal is to pinpoint the five stores that this campaign would be the most promising.
chicken_income <- transactions %>%
left_join(products, by = "product_id") %>%
filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>%
mutate(product_category = ifelse(product_category == "CHICKEN/POULTRY", "POULTRY", "CHICKEN")) %>%
filter(str_detect(product_category, regex("CHICKEN", ignore_case = T))) %>%
inner_join(demographics, by = "household_id") %>%
mutate(Month = month(transaction_timestamp)) %>%
group_by(Month, income) %>%
summarise(sales_value = sum(sales_value))
## `summarise()` has grouped output by 'Month'. You can override using the
## `.groups` argument.
ggplot(chicken_income, aes(x= Month, y = sales_value, group = income, color = as.factor(income))) +
geom_line(size = 1)+
labs(title = "Chicken Sales by Income Bracket", color = "Income Bracket")+
scale_x_continuous("Month", breaks = seq(1,12,1)) +
scale_y_continuous("Total Sales", labels = scales::dollar)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
total_sales_income <- transactions %>%
inner_join(demographics, by = "household_id") %>%
filter(str_detect(income, regex("50-74K", ignore_case = T))) %>%
group_by(store_id,household_id) %>%
summarise(HouseholdSpend = sum(sales_value)) %>%
summarise(HouseholdAvg = mean(HouseholdSpend)) %>%
arrange(desc(HouseholdAvg))
## `summarise()` has grouped output by 'store_id'. You can override using the
## `.groups` argument.
total_sales_income$HouseholdAvg <- dollar(total_sales_income$HouseholdAvg)
total_sales_income <- total_sales_income %>%
rename(StoreID = store_id)
top_5_stores <- head(total_sales_income,5)
kable(top_5_stores, caption = "Top 5 Stores for Avg Household Spend (50-74K Bracket)") %>%
kable_styling(bootstrap_options = c("striped", "hover", "bordered"))
| StoreID | HouseholdAvg |
|---|---|
| 311 | $4,748.14 |
| 323 | $3,987.04 |
| 408 | $3,675.03 |
| 361 | $2,365.45 |
| 439 | $2,291.16 |
Comparing each income bracket against itself, we can see the biggest area for improvement during the Week 46 to Week 49 time frame, would be the 50-74K bracket.
Additionally, we can compare all the stores sales for that income bracket and can see where we have best chance of having customers in that income bracket. Simplifying it down, we can see that stores 311, 323, 408, 361, and 439 are the stores we should focus this campaign on.
“What is our next marketing strategy to drive consumers to shop at our stores and drive sales in our lowest performing product category?”
Throughout this report we completed our five objectives. Please see the below recommendation for our next marketing campaign:
Throughout this report we kept coming back to the first objective - the product category. We were thoroughly surprised to see a very common and rather inexpensive meat contribute the most to the month to month losses. This could be done on purpose though to draw customers into the store to purchase more items.
The biggest limitation in this the unknown cost of marketing. Like do certain mailers/product placement cost more money? The recommendations are based on if all options have equal cost. Adding in a cost factor would be the next thing we focus on in the future.