Introduction:

Super Bowl Sales

The business problem we are trying to solve is how can Regork boost sales during the Super Bowl season? This is one the biggest events in America. We aim to figure out how to maximize sales during this time and provide consumers with what they need to celebrate this big event.

Analyzing the Sales

In order to answer our question we decided to break it down into three smaller sections. Who are the top spenders during the Super Bowl? What are the top products sold during the Super Bowl? When do sales pick up before the Super Bowl? 

Proposed Solution

After analyzing the data we recommend promoting bundles for the top purchased items leading up to the Super Bowl. We can use demographic data to best decide pricing for bundles, and how to direct marketing campaigns. After looking at when sales start to increase leading up to the Super Bowl, promoting deals starting roughly 2 weeks before the Super Bowl could maximize sales. 

Libraries Needed

dplyr - helps with working with large data sets

stringr - makes working with strings easier

ggplot2- package for graphing data

tidyverse - package containing many data cleaning tools

completejourney - Data set with thousands of grocery store transactions and demographics

#libraries needed 
library(dplyr) # helps with large data sets 
library(stringr) # working with strings easier
library(ggplot2) #plotting
library(tidyverse) #cleaning data
library(completejourney) # data 

#data needed from completejourney 
transactions <- completejourney::get_transactions()
demographics <- completejourney::demographics

We started with cleaning the data. We had to filter through dates, so we were only gathering transaction info up to a week before the Super Bowl that year (2/5/2017). We found the top product categories in order to look for opportunities for bundling and promotions to boost sales. We also decided to filter out coupons and fuel since we are focusing on products to bundle. We decided to look deeper upon seeing milk as number 1. We noticed that that is common throughout the entire year, so we didn’t see a need to add to a potential bundle.

  • Leading Products: Milk, Soda, Meat, Alcohol, Fruit
#Right at Superbowl Day
superbowl_sales <- transactions %>%
  inner_join(products, by = "product_id") %>% 
  filter(as.Date(transaction_timestamp) >= "2017-02-05" & 
           as.Date(transaction_timestamp) < "2017-02-06",
         product_category != "COUPON/MISC ITEMS",
         product_category != "FUEL"
         )

top_products_named <- superbowl_sales %>%
  group_by(product_id, product_category, product_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE),
            total_quantity = sum(quantity, na.rm = TRUE)) %>%
  select(product_id, product_category, product_type, total_sales, total_quantity) %>%
  arrange(desc(total_sales))

# 18 is needed to reach 10 categories for the graph due to different sizes of the same item being purchase (ex. 6pk of soda vs 12pk)
top_ten_products <- head(top_products_named, 18) 

#graph for top products 
ggplot(top_ten_products, aes(x = reorder(product_category, total_sales), y = total_sales, fill = product_category)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Top 10 Product Categories During Super Bowl Week",
       x = "Product Category", y = "Total Sales") 

Next, we filtered the data by date to gather the sales for each week leading up to the Super Bowl (1/9/17 - 2/5/17) We aimed to use this data to see when sales would increase, so it would be clear when to start sales & promotions the next Super Bowl season.

#A week before Superbowl Day excluding coupons and fuel 
superbowl_sales_1w <- transactions %>%
  inner_join(products, by = "product_id") %>% 
  filter(as.Date(transaction_timestamp) >= "2017-01-29" & 
           as.Date(transaction_timestamp) < "2017-02-06",
         product_category != "COUPON/MISC ITEMS",
         product_category != "FUEL")

#2 weeks before Superbowl Day excluding coupons and fuel 
superbowl_sales_2w <- transactions %>%
  inner_join(products, by = "product_id") %>% 
  filter(as.Date(transaction_timestamp) >= "2017-01-22" & 
           as.Date(transaction_timestamp) < "2017-01-29",
         product_category != "COUPON/MISC ITEMS",
         product_category != "FUEL")

#3 weeks before Superbowl Day excluding coupons and fuel 
superbowl_sales_3w <- transactions %>%
  inner_join(products, by = "product_id") %>% 
  filter(as.Date(transaction_timestamp) >= "2017-01-15" & 
           as.Date(transaction_timestamp) < "2017-01-22",
         product_category != "COUPON/MISC ITEMS",
         product_category != "FUEL")

#4 weeks before Superbowl Day excluding coupons and fuel 
superbowl_sales_4w <- transactions %>%
  inner_join(products, by = "product_id") %>% 
  filter(as.Date(transaction_timestamp) >= "2017-01-09" & 
           as.Date(transaction_timestamp) < "2017-01-15",
         product_category != "COUPON/MISC ITEMS",
         product_category != "FUEL")


#Top products sold by each week a month before Superbowl
superbowl_sales_ew <- superbowl_sales_4w%>%
  mutate(date = as.Date(superbowl_sales_4w$transaction_timestamp))

superbowl_sales_ew$week_num <- strftime(superbowl_sales_ew$date, format = "%V")

top_products_named_ew <- superbowl_sales_ew %>%
  group_by(product_id, product_category, product_type, week_num) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE),
            total_quantity = sum(quantity, na.rm = TRUE)) %>%
  select(week_num, product_id, product_category, product_type, total_sales, total_quantity) %>%
  arrange(desc(total_sales)) %>%
  arrange(week_num)
  

head(top_products_named, 10)
## # A tibble: 10 × 5
## # Groups:   product_id, product_category [10]
##    product_id product_category    product_type        total_sales total_quantity
##    <chr>      <chr>               <chr>                     <dbl>          <dbl>
##  1 1029743    FLUID MILK PRODUCTS FLUID MILK WHITE O…       139.              53
##  2 874972     BEEF                SELECT BEEF               104.              11
##  3 5569230    SOFT DRINKS         SOFT DRINKS 12/18&…        95               38
##  4 5569471    SOFT DRINKS         SOFT DRINKS 12/18&…        95               38
##  5 1106523    FLUID MILK PRODUCTS FLUID MILK WHITE O…        83.8             32
##  6 867420     SEAFOOD - FROZEN    SEAFOOD-FRZ-IQF RA…        68.0              4
##  7 1055503    BEEF                PRIMAL                     66.6             15
##  8 8090521    SOFT DRINKS         SOFT DRINKS 12/18&…        62.5             25
##  9 8090537    SOFT DRINKS         SOFT DRINKS 12/18&…        62.5             25
## 10 1025611    MEAT - MISC         WINGS (IQF)                59.9             10

As we can see by the graph, it starts to pick up roughly 2 weeks before the Super Bowl and then spikes dramatically the week of.

#combiining sales for each week
weekly_sales <- data.frame(
  Week = c("4 Weeks Before", "3 Weeks Before", "2 Weeks Before", "1 Week Before"),
  Total_Sales = c(
    sum(superbowl_sales_4w$sales_value, na.rm = TRUE),
    sum(superbowl_sales_3w$sales_value, na.rm = TRUE),
    sum(superbowl_sales_2w$sales_value, na.rm = TRUE),
    sum(superbowl_sales_1w$sales_value, na.rm = TRUE)
  )
)

weekly_sales$Week <- factor(weekly_sales$Week, levels = c("4 Weeks Before", "3 Weeks Before", "2 Weeks Before", "1 Week Before"))
#graphing sales for each week 
library(ggplot2)
ggplot(weekly_sales, aes(x = Week, y = Total_Sales, group = 1)) +
  geom_line(color = "blue", size = 1.2) +
  geom_point(size = 3, color = "red") + 
  labs(title = "Total Sales Leading Up to the Super Bowl",
       x = "Week Before Super Bowl", y = "Total Sales ($)") +
  theme_minimal()

We can use demographics to help determine how to best market to customers and determine what to bundle. Main Consumers: Ages 35-54, Income: 35-74k

#restting the superbowl_sales
superbowl_sales <- transactions %>%
  inner_join(products, by = "product_id") %>%
  inner_join(demographics, by = "household_id") %>% 
  filter(transaction_timestamp >= "2017-01-28 00:00:00" & 
           transaction_timestamp < "2017-02-06 00:00:00",
         product_category != "COUPON/MISC ITEMS")
#grouping by age 
top_ages <- superbowl_sales %>%
  group_by(age) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  arrange(desc(total_sales)) %>%
  head(5) %>% 
  pull(age)  

#grouping by income
top_incomes <- superbowl_sales %>%
  group_by(income) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  arrange(desc(total_sales)) %>%
  head(5) %>%  
  pull(income) 

#combing by age and income
top_age_income_sales <- superbowl_sales %>%
  filter(age %in% top_ages, income %in% top_incomes) %>%
  group_by(age, income) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  arrange(desc(total_sales))
#plotting by age and income
ggplot(top_age_income_sales, aes(x = age, y = total_sales, fill = income)) +
  geom_bar(stat = "identity", position = "dodge") + 
  labs(title = "Super Bowl Sales by Age & Income (Grouped Bar Chart)",
       x = "Age Group", y = "Total Sales", fill = "Income Level") 

Report: Summary

(i) Summarize the problem statement you addressed
  • The Super Bowl is a major annual event that consistently generates excitement and high consumer engagement.
  • Given this, we aimed to explore opportunities to incentivize purchases through strategic promotional campaigns, such as bundling, to maximize sales during this period.
(ii) Summarize how you addressed this problem statement (the data used and the methodology employed)
  • Demographics & Demand Trends: Identifying the primary consumer base and their purchasing behavior.
  • Top Purchased Products: Examining sales patterns four weeks, three weeks, two weeks, one week, and on the day of the Super Bowl.
  • Optimal Timing for Promotions: Pinpointing when sales naturally spiked to strategically introduce promotional campaigns.
  • By assessing these factors, we determined the ideal timeframe to launch promotions and the most effective bundling and pricing strategies.
(iii) Summarize the interesting insights that your analysis provided

1) Target Consumer Segment:

  • The core customer base falls within a mid-income range of $35K–$74K.
  • Since this group isn’t highly price-sensitive, discounting bundles by 10–15% (rather than deep price cuts) is sufficient to drive sales.

2) Best-Selling Products:

  • As the Super Bowl approaches, the top purchases are Milk, Soda, Meat, Alcohol, and Fruit.
  • However, Milk and Fruit remain top sellers year-round, making them less relevant for Super Bowl-specific bundles.

3) Timing Matters:

  • Sales spiked in the second week leading up to the event, making it the optimal time to launch promotions.
(iv) Summarize the implications to the consumer of your analysis. What would you propose to the Regork CEO?

1) Focus on key products:

  • Meat, Soda, and Alcohol, as they are frequently bought together.

2) Differentiate bundles

  • Meat Type (Beef vs. Chicken Wings) paired with specific Soda and Alcohol brands.

3) Timing for Promotions:

  • Launch alcohol and soda promotions in the mid-week of the second week before the event.
  • Raw meat sales peak in the final week, so stronger promotions should be introduced then to capitalize on last-minute grocery runs.

(v) Discuss the limitations of your analysis and how you, or someone else, could improve or build on it.

1) Data Limitations:

  • The biggest limitation is that this analysis is based on only one year of data.
  • Running the same tests on more recent Super Bowl sales would strengthen the findings.

2) Demographics & Location Factors:

  • Exploring regional preferences could refine bundling strategies.
  • Purchasing behavior in Ohio may differ significantly from Florida, so tailored promotions by region may be more effective.