The business problem we are trying to solve is how can Regork boost sales during the Super Bowl season? This is one the biggest events in America. We aim to figure out how to maximize sales during this time and provide consumers with what they need to celebrate this big event.
In order to answer our question we decided to break it down into three smaller sections. Who are the top spenders during the Super Bowl? What are the top products sold during the Super Bowl? When do sales pick up before the Super Bowl?
After analyzing the data we recommend promoting bundles for the top purchased items leading up to the Super Bowl. We can use demographic data to best decide pricing for bundles, and how to direct marketing campaigns. After looking at when sales start to increase leading up to the Super Bowl, promoting deals starting roughly 2 weeks before the Super Bowl could maximize sales.
dplyr - helps with working with large data sets
stringr - makes working with strings easier
ggplot2- package for graphing data
tidyverse - package containing many data cleaning tools
completejourney - Data set with thousands of grocery store transactions and demographics
#libraries needed
library(dplyr) # helps with large data sets
library(stringr) # working with strings easier
library(ggplot2) #plotting
library(tidyverse) #cleaning data
library(completejourney) # data
#data needed from completejourney
transactions <- completejourney::get_transactions()
demographics <- completejourney::demographics
We started with cleaning the data. We had to filter through dates, so we were only gathering transaction info up to a week before the Super Bowl that year (2/5/2017). We found the top product categories in order to look for opportunities for bundling and promotions to boost sales. We also decided to filter out coupons and fuel since we are focusing on products to bundle. We decided to look deeper upon seeing milk as number 1. We noticed that that is common throughout the entire year, so we didn’t see a need to add to a potential bundle.
#Right at Superbowl Day
superbowl_sales <- transactions %>%
inner_join(products, by = "product_id") %>%
filter(as.Date(transaction_timestamp) >= "2017-02-05" &
as.Date(transaction_timestamp) < "2017-02-06",
product_category != "COUPON/MISC ITEMS",
product_category != "FUEL"
)
top_products_named <- superbowl_sales %>%
group_by(product_id, product_category, product_type) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE),
total_quantity = sum(quantity, na.rm = TRUE)) %>%
select(product_id, product_category, product_type, total_sales, total_quantity) %>%
arrange(desc(total_sales))
# 18 is needed to reach 10 categories for the graph due to different sizes of the same item being purchase (ex. 6pk of soda vs 12pk)
top_ten_products <- head(top_products_named, 18)
#graph for top products
ggplot(top_ten_products, aes(x = reorder(product_category, total_sales), y = total_sales, fill = product_category)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Top 10 Product Categories During Super Bowl Week",
x = "Product Category", y = "Total Sales")
Next, we filtered the data by date to gather the sales for each week leading up to the Super Bowl (1/9/17 - 2/5/17) We aimed to use this data to see when sales would increase, so it would be clear when to start sales & promotions the next Super Bowl season.
#A week before Superbowl Day excluding coupons and fuel
superbowl_sales_1w <- transactions %>%
inner_join(products, by = "product_id") %>%
filter(as.Date(transaction_timestamp) >= "2017-01-29" &
as.Date(transaction_timestamp) < "2017-02-06",
product_category != "COUPON/MISC ITEMS",
product_category != "FUEL")
#2 weeks before Superbowl Day excluding coupons and fuel
superbowl_sales_2w <- transactions %>%
inner_join(products, by = "product_id") %>%
filter(as.Date(transaction_timestamp) >= "2017-01-22" &
as.Date(transaction_timestamp) < "2017-01-29",
product_category != "COUPON/MISC ITEMS",
product_category != "FUEL")
#3 weeks before Superbowl Day excluding coupons and fuel
superbowl_sales_3w <- transactions %>%
inner_join(products, by = "product_id") %>%
filter(as.Date(transaction_timestamp) >= "2017-01-15" &
as.Date(transaction_timestamp) < "2017-01-22",
product_category != "COUPON/MISC ITEMS",
product_category != "FUEL")
#4 weeks before Superbowl Day excluding coupons and fuel
superbowl_sales_4w <- transactions %>%
inner_join(products, by = "product_id") %>%
filter(as.Date(transaction_timestamp) >= "2017-01-09" &
as.Date(transaction_timestamp) < "2017-01-15",
product_category != "COUPON/MISC ITEMS",
product_category != "FUEL")
#Top products sold by each week a month before Superbowl
superbowl_sales_ew <- superbowl_sales_4w%>%
mutate(date = as.Date(superbowl_sales_4w$transaction_timestamp))
superbowl_sales_ew$week_num <- strftime(superbowl_sales_ew$date, format = "%V")
top_products_named_ew <- superbowl_sales_ew %>%
group_by(product_id, product_category, product_type, week_num) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE),
total_quantity = sum(quantity, na.rm = TRUE)) %>%
select(week_num, product_id, product_category, product_type, total_sales, total_quantity) %>%
arrange(desc(total_sales)) %>%
arrange(week_num)
head(top_products_named, 10)
## # A tibble: 10 × 5
## # Groups: product_id, product_category [10]
## product_id product_category product_type total_sales total_quantity
## <chr> <chr> <chr> <dbl> <dbl>
## 1 1029743 FLUID MILK PRODUCTS FLUID MILK WHITE O… 139. 53
## 2 874972 BEEF SELECT BEEF 104. 11
## 3 5569230 SOFT DRINKS SOFT DRINKS 12/18&… 95 38
## 4 5569471 SOFT DRINKS SOFT DRINKS 12/18&… 95 38
## 5 1106523 FLUID MILK PRODUCTS FLUID MILK WHITE O… 83.8 32
## 6 867420 SEAFOOD - FROZEN SEAFOOD-FRZ-IQF RA… 68.0 4
## 7 1055503 BEEF PRIMAL 66.6 15
## 8 8090521 SOFT DRINKS SOFT DRINKS 12/18&… 62.5 25
## 9 8090537 SOFT DRINKS SOFT DRINKS 12/18&… 62.5 25
## 10 1025611 MEAT - MISC WINGS (IQF) 59.9 10
As we can see by the graph, it starts to pick up roughly 2 weeks before the Super Bowl and then spikes dramatically the week of.
#combiining sales for each week
weekly_sales <- data.frame(
Week = c("4 Weeks Before", "3 Weeks Before", "2 Weeks Before", "1 Week Before"),
Total_Sales = c(
sum(superbowl_sales_4w$sales_value, na.rm = TRUE),
sum(superbowl_sales_3w$sales_value, na.rm = TRUE),
sum(superbowl_sales_2w$sales_value, na.rm = TRUE),
sum(superbowl_sales_1w$sales_value, na.rm = TRUE)
)
)
weekly_sales$Week <- factor(weekly_sales$Week, levels = c("4 Weeks Before", "3 Weeks Before", "2 Weeks Before", "1 Week Before"))
#graphing sales for each week
library(ggplot2)
ggplot(weekly_sales, aes(x = Week, y = Total_Sales, group = 1)) +
geom_line(color = "blue", size = 1.2) +
geom_point(size = 3, color = "red") +
labs(title = "Total Sales Leading Up to the Super Bowl",
x = "Week Before Super Bowl", y = "Total Sales ($)") +
theme_minimal()
We can use demographics to help determine how to best market to customers and determine what to bundle. Main Consumers: Ages 35-54, Income: 35-74k
#restting the superbowl_sales
superbowl_sales <- transactions %>%
inner_join(products, by = "product_id") %>%
inner_join(demographics, by = "household_id") %>%
filter(transaction_timestamp >= "2017-01-28 00:00:00" &
transaction_timestamp < "2017-02-06 00:00:00",
product_category != "COUPON/MISC ITEMS")
#grouping by age
top_ages <- superbowl_sales %>%
group_by(age) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(total_sales)) %>%
head(5) %>%
pull(age)
#grouping by income
top_incomes <- superbowl_sales %>%
group_by(income) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(total_sales)) %>%
head(5) %>%
pull(income)
#combing by age and income
top_age_income_sales <- superbowl_sales %>%
filter(age %in% top_ages, income %in% top_incomes) %>%
group_by(age, income) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(total_sales))
#plotting by age and income
ggplot(top_age_income_sales, aes(x = age, y = total_sales, fill = income)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Super Bowl Sales by Age & Income (Grouped Bar Chart)",
x = "Age Group", y = "Total Sales", fill = "Income Level")
1) Target Consumer Segment:
2) Best-Selling Products:
3) Timing Matters:
1) Focus on key products:
2) Differentiate bundles
3) Timing for Promotions:
1) Data Limitations:
2) Demographics & Location Factors: