Final Project - Group 1

Introduction

To be able to grow as a company at Regork, we (the marketing team) believe it is important to assess where we are at holistically. We have used data analysis to compile our monthly sales revenue from each month as well as to determine what our best promotional campaigns have been. What we have found is that we don’t have to reinvent the wheel. Our analyses have shown us that we should be able to meet our ultimate goal of maximizing profits and increasing sales revenue by simply answering our two research questions below:

Research Question 1: Which Promotional Campaigns led to the most sales and transactions?
Research Question 2: How can we utilize these Promotional Campaigns to maximize profit?

By assessing our top-selling products and cross-referencing the products that sell the best without promotions, we are able to work on key strategies to maximize sales for the products that are typically not in promotions (i.e. soft drinks).

Packages and Data Frames Used

Packages

library(dplyr)            # manipulating and transforming data
library(stringr)          # text manipulation
library(tidyverse)        # tidying data and working with other R packages
library(lubridate)        # working with dates and times
library(tidyr)            # tidying or cleaning up messy data
library(RColorBrewer)     # allows different color palettes for visuals 
library(ggplot2)          # data visualization
library(gt)               # creates more aesthetically pleasing tables
library(completejourney)  # grocery store shopping transactions data

Data Frames Created

transacts<-get_transactions()
promos<-get_promotions()
demo<-demographics
cd<-campaign_descriptions
camp<-campaigns
coup<-coupons
cr<-coupon_redemptions
prods<-products

Holistic Data (Precursor Information)

As mentioned, we first wanted to look at where we may have gaps. We looked at the historical data and saw the amount of sales per month for the whole year. We also looked at our top selling products (we will compare these later). Lastly, we wanted to see what our best marketing campaigns were in terms of transactions and sales value. Coincidentally, they were both the same. See below!

sales_by_month <- transacts%>%
  mutate(month = month(transaction_timestamp, label = TRUE))%>%
  group_by(month)%>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop")%>%
  arrange(month)

ggplot(sales_by_month, aes(x = month, y = total_sales, group = 1)) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  labs(title = "Regork Total Sales by Month (2017)",
       x = "Month",
       y = "Total Sales") +
  theme_minimal()

Looking at this chart, February and September stand out as opportunities for us to pull some resources into bolstering these sales months.

highest_selling_products <- transacts%>% #Highest Selling Products df
  group_by(product_id)%>%
  summarize(
    total_sales = sum(sales_value, na.rm = TRUE), 
    total_units = n(),                            
    .groups = "drop"
  )%>%
  inner_join(prods, by = "product_id")%>%        
  select(product_id, total_sales, total_units, product_type)%>%
  arrange(desc(total_sales))%>%
  slice_max(order_by = total_sales, n = 15) 
highest_selling_products%>% #Highest Selling Products Table Creation
  gt()%>%
  tab_header(title = "Top 15 Highest Selling Products")%>%
  fmt_currency(columns = vars(total_sales),
    currency = "USD")%>%
  cols_label(
    product_id = "Product ID",
    total_sales = "Total Sales",
    total_units = "Total Units",
    product_type = "Product Type"
  )
Top 15 Highest Selling Products
Product ID Total Sales Total Units Product Type
6534178 $303,116.02 11582 GASOLINE-REG UNLEADED
6533889 $27,467.61 823 GASOLINE-REG UNLEADED
1029743 $22,729.71 7874 FLUID MILK WHITE ONLY
6534166 $20,477.54 722 GASOLINE-REG UNLEADED
6533765 $19,451.66 1150 GASOLINE-REG UNLEADED
1082185 $17,219.59 16992 BANANAS
916122 $16,120.01 2497 CHICKEN BREAST BONELESS
1106523 $15,629.95 5424 FLUID MILK WHITE ONLY
995242 $15,602.59 7441 FLUID MILK WHITE ONLY
5569230 $13,410.46 2771 SOFT DRINKS 12/18&15PK CAN CAR
1127831 $11,500.01 3551 STRAWBERRIES
1044078 $11,445.70 2464 LEAN
844179 $11,270.46 2208 PRIMAL
1133018 $8,436.41 4231 FLUID MILK WHITE ONLY
874972 $7,905.59 710 SELECT BEEF

While Gasoline remains our top selling item, it is not an item that would be beneficial in campaigning as the demand for gas will always be high. As you will see in the Marketing Plan tab, there are other products on this Highest Selling Products list that we can market better.

top_campaigns<-camp%>%
  inner_join(cd, by = "campaign_id")

top_campaigns<-top_campaigns%>%
  full_join(coup, by = "campaign_id")

top_campaigns_view<-top_campaigns%>% #Top Campaigns in terms of Coupon Redemptions
  semi_join(cr)%>%
  group_by(campaign_id, campaign_type, start_date, end_date)%>%
  summarize(total_redemptions = n())%>%
  arrange(desc(total_redemptions))

transacts_with_coups<-transacts%>%
  full_join(coup, by = "product_id")%>%
  semi_join(cr, by = "coupon_upc")

group_campaigns_sales<- transacts_with_coups %>% #Top Campaigns in terms of Sales Value
  group_by(campaign_id) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE))%>%
  arrange(desc(total_sales))
ggplot(top_campaigns_view, aes(x = reorder(as.factor(campaign_id), total_redemptions), 
                               y = total_redemptions / 1000, 
                               fill = campaign_type)) +
  geom_bar(stat = "identity") +
  labs(title = "Coupon Redemptions by Campaign (in Thousands)",
       x = "Campaign ID",
       y = "Total Redemptions (in Thousands)",
       fill = "Campaign Type") +
  coord_flip() +
  theme_minimal()

ggplot(group_campaigns_sales, aes(x = reorder(as.factor(campaign_id), total_sales), y = total_sales / 1000)) +
  geom_bar(stat = "identity", fill = "forestgreen") +
  labs(title = "Total Sales by Campaign (in $1000's)",
       x = "Campaign ID",
       y = "Total Sales ($1000's)") +
  theme_minimal()

From the tables above you can see that 3 of the 4 Type A Campaigns far and away bring in the most business for Regork in terms of sales and coupon redemptions.

We can take full advantage of this knowledge to create the optimal Marketing Plan moving forward!

Marketing Plan

Our plan is simple:

1. We are going to cut Campaign Types B and C AND run more Type A campaigns.

Reasoning: Roughly 91% of our stores total coupon redemptions comes from Type A campaign. We can allocate the resources used from the other 23 campaigns to develop more Type A ones.

2. Our new Type A campaigns will focus on the top selling items that have not already been attached to a promotion.

Reasoning: If those items are already selling well (Top 15 product overall) then adding a promotion to a target demographic should only benefit Regork in sales and customer traction.

3. We will focus on our least profitable months.

Reasoning: Focusing on our least profitable months allows us to create more demand when sales are traditionally low. This will ultimately improve overall profitability by changing off-peak periods into growth opportunities.

Overview

Let’s start by looking at our top selling items that are “non-coupon” transactions.

non_coupon_transacts <- transacts%>%
  anti_join(coup, by = "product_id")

highest_selling_non_coupon_products <- non_coupon_transacts %>% #Highest Selling without Coupons
  group_by(product_id)%>%
  summarize(
    total_sales = sum(sales_value, na.rm = TRUE),
    total_units = n(),
    .groups = "drop"
  )%>%
  inner_join(prods, by = "product_id")%>%
  select(product_id, total_sales, total_units, product_category, product_type)%>%
  arrange(desc(total_sales))
group_of_non_coupon<-highest_selling_non_coupon_products%>%
  group_by(product_category)%>%
  summarize(
    total_category_sales = sum(total_sales, na.rm = TRUE),
    total_category_units = sum(total_units, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(total_category_sales))%>%
  slice_max(order_by = total_category_sales, n = 15)

group_of_non_coupon%>%
  gt()%>%
  tab_header(title = "Top 15 Product Categories Without Coupons",
    subtitle = "Aggregated by Total Sales and Total Units Sold") %>%
  fmt_number(columns = vars(total_category_sales, total_category_units),
    decimals = 0)%>%
  cols_label(
    product_category = "Product Category",
    total_category_sales = "Total Sales",
    total_category_units = "Total Units"
  )
Top 15 Product Categories Without Coupons
Aggregated by Total Sales and Total Units Sold
Product Category Total Sales Total Units
COUPON/MISC ITEMS 379,394 14,705
SOFT DRINKS 114,118 42,212
BEERS/ALES 81,943 9,938
BAKED BREAD/BUNS/ROLLS 67,858 35,705
DELI MEATS 58,197 13,668
BAG SNACKS 54,936 25,205
CIGARETTES 54,350 6,834
CANDY - PACKAGED 40,994 18,569
DOMESTIC WINE 35,928 3,294
SOUP 35,818 19,989
BAKED SWEET GOODS 31,690 15,931
COFFEE 30,364 6,471
CANNED JUICES 29,566 12,362
WATER - CARBONATED/FLVRD DRINK 28,100 9,638
CHEESES 26,704 7,147

Gasoline and Gift Cards comprise the Coupon/Misc Items product category despite not actually being attached to any coupon promotions. As mentioned earlier we are going to ignore that category. We have devised a plan to use a section of our store that has huge potential.

The Drinks Section!

top_selling_feb_no_coupons <- non_coupon_transacts %>% #February
  mutate(month = month(transaction_timestamp, label = TRUE)) %>%
  filter(month %in% c("Feb")) %>%
  group_by(product_id) %>%
  summarize(
    total_sales = sum(sales_value, na.rm = TRUE),
    total_units = n(),
    .groups = "drop"
  ) %>%
  arrange(desc(total_sales)) %>%
  inner_join(prods, by = "product_id")

top_selling_feb_no_coupons<-top_selling_feb_no_coupons%>% 
  filter(product_category != "COUPON/MISC ITEMS") %>% 
  group_by(product_category)%>%
  summarize(
    total_category_sales = sum(total_sales, na.rm = TRUE),
    total_category_units = sum(total_units, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(total_category_sales))%>%
  slice_max(order_by = total_category_sales, n = 10)

top_selling_sep_no_coupons <- non_coupon_transacts %>% #September
  mutate(month = month(transaction_timestamp, label = TRUE)) %>%
  filter(month %in% c("Sep")) %>%
  group_by(product_id) %>%
  summarize(
    total_sales = sum(sales_value, na.rm = TRUE),
    total_units = n(),
    .groups = "drop"
  ) %>%
  arrange(desc(total_sales)) %>%
  inner_join(prods, by = "product_id")

top_selling_sep_no_coupons<-top_selling_sep_no_coupons%>% 
  filter(product_category != "COUPON/MISC ITEMS") %>% 
  group_by(product_category)%>%
  summarize(
    total_category_sales = sum(total_sales, na.rm = TRUE),
    total_category_units = sum(total_units, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(total_category_sales))%>%
  slice_max(order_by = total_category_sales, n = 10)
top_selling_sep_no_coupons <- top_selling_sep_no_coupons %>% 
  mutate(month = "Sep")

top_selling_feb_no_coupons <- top_selling_feb_no_coupons %>% 
  mutate(month = "Feb")

combined_top_categories <- bind_rows(top_selling_feb_no_coupons, top_selling_sep_no_coupons)

unique_categories <- length(unique(combined_top_categories$product_category))
mycolors <- colorRampPalette(brewer.pal(8, "Paired"))(unique_categories)

ggplot(combined_top_categories, aes(x = month, y = total_category_sales, fill = product_category)) +
  geom_col(color = "black") +
  scale_fill_manual(values = mycolors) +
  labs(title = "Top 10 Product Categories by Month",
       subtitle = "Items not linked to campaign promotions (excludes COUPON/MISC ITEMS)",
       x = "Month",
       y = "Total Sales",
       fill = "Product Category") +
  theme_minimal()

Soft drinks, Beers & Ales, as well as Domestic Wine bring in the most collective amount of money as a group. In the months of February and September that still rings true. We believe we can take advantage of special promotions and the “smaller” holidays to bolster sales in this market.

The targeted promotions will be sent to a specific age demographic. This group is a range of 25-54 year-olds (comprised of 3 separate age ranges: 25-34, 35-44, 45-54) as they were the collective age demographic that purchased 81% of the drink sales (see graph below).

sales_by_age <- non_coupon_transacts %>%
  inner_join(prods, by = "product_id") %>%
  inner_join(demo, by = "household_id") %>%
  group_by(product_category, age) %>%
  filter(product_category %in% c("SOFT DRINKS", "BEERS/ALES", "DOMESTIC WINE")) %>%
  summarize(
    total_sales = sum(sales_value, na.rm = TRUE),
    total_units = n(),
    .groups = "drop"
  ) %>%
  arrange(product_category, desc(total_sales))

ggplot(sales_by_age, aes(x = product_category, y = total_sales, fill = age)) +
  geom_bar(stat = "identity", position = "dodge", color = "black") +
  labs(title = "Total Sales by Product Category and Age Group",
       x = "Product Category",
       y = "Total Sales",
       fill = "Age Group") +
  theme_minimal() +
  coord_flip()

February Type A Campaign

Implementation strategy is to run a Type A campaign of our drink section that runs from December 29th-February 28th to the designated age demographic. During this time we would have promotions for Soft Drinks, Beers and Ales, and Domestic Wines through the following holidays or special events with added sales the day before through the day after the event. While we understand this will allow sales to grow in January - which is obviously a good thing - it should also show growth in February. Any growth in this month is a huge win for us.

List of of Holidays and Special Events:

-New Years Eve (January)

-The Super Bowl (February)

-Valentine’s Day (February)

September Type A Campaign

Implementation strategy is to run a Type A campaign of our drink section that runs the entire month of September to the designated age demographic. While the only major holiday is Labor Day, we believe running sales during this time, ‘Peak Fall Sport and Tailgating Season,’ can lead to some exponential growth and movement of product.

Summary

Our primary research questions were: Which promotional campaigns drive the most sales and transactions? And how can we leverage these campaigns to boost profit? We needed to create a comprehensive review of Regork and its performance to maximize profits.

What we did:

We analyzed monthly sales data, promotional campaign performance, and product sales (both with and without coupons). Using this data, we were able to create a marketing campaign from transactions, promotions, as well as demographics. We employed R (using packages like dplyr, ggplot2, gt, and others) to clean, summarize, and visualize the data.

What we found:

Our analysis revealed that Type A campaigns account for roughly 91% of coupon redemptions. This indicates that there is a strong correlation with sales. We found that there are other items that could benefit from promotional opportunities outside of gasoline sales. This opportunity lies in promoting high-margin items (e.g., soft drinks, beers, domestic wine) which aren’t typically attached to promotions. We found that February and September emerged as potential months to boost sales while targeting these promotions to 25-54 year-old consumers.

As mentioned in our plan, we propose cutting Campaign Types B and C and reallocating resources to increase Type A campaigns. Focusing these campaigns on top-selling items that were not previously promoted can drive both revenue growth and customer engagement, especially in months that had low-sales historically.

Limitations

The analysis is limited by data quality as well as information provided almost eight years ago. Future work should be done employing these strategies in a fixed experiment to see whether or not this strategy proves to be effective.