Introduction

The Business Question

The main question we are asking about Regork is: which product categories are most responsive to promotional campaigns? We also segmented the data to determine which household income level purchases these promotion-responsive categories the most. Finally, we examined how the most responsive income level purchases those categories over the year.

How We Got Here

By separating promotional sales by product category, we were able to find the products that are most responsive to promotions. Then, by segmenting coupon usage by income level, we were able to find the household income level that is most responsive to promotions. Finally, we plotted the promotional sales from the most responsive income level for each of the top 10 product categories over the year. This showed us which months Regork could improve on. We found this approach interesting because we already knew that the top 10 product categories were responsive to promotional campaigns. This led us to believe that Regork would only need to make small adjustments to improve, rather than create a whole new marketing strategy.

Why and How This Will Help Regork

We have shown how the most frequently purchased discounted product categories are purchased by household income level and how they are purchased over time. This information will be helpful to Regork’s executives, allowing them to strategically plan promotional campaigns that not only drive sales but also enhance the effectiveness of their marketing efforts.

Packages and Libraries Required

library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(lubridate)
library(RColorBrewer)

transactions <- get_transactions()

Why These Packages

  • completejourney: used to load the data that we are analyzing
  • tidyverse: provides the functions used in the calculations and throughout the analysis
  • dplyr: used to join data
  • ggplot2: used when visualizing data and making graphs aesthetically pleasing
  • lubridate: used for our third question, about purchase habits over time
  • RColorBrewer: used when visualizing data and changing aesthetics of graphs

Exploratory Data Analysis

# Total sales by product type, excluding coupon/misc items and gasoline
most_profit_products <- products %>%
  inner_join(transactions, by = "product_id") %>%
  filter(product_category != 'COUPON/MISC ITEMS') %>%
  filter(product_type != 'GASOLINE-REG UNLEADED') %>%
  mutate(discounted_sales_value = sales_value * (1 - retail_disc)) %>%
  group_by(product_type) %>%
  summarise(total_sales = sum(discounted_sales_value, na.rm = TRUE)) %>%
  arrange(desc(total_sales))

# Keep the ten product types with the highest totals
top10_products <- head(most_profit_products, 10)

# Horizontal bar chart of the top ten product types
ggplot(top10_products, aes(x = reorder(product_type, -total_sales), y = total_sales)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(title = "Top 10 Highest Total Sales Discounted Product Types",
       x = "Product Type",
       y = "Total Sales") +
  theme_minimal() +
  coord_flip()

coupon_data <- coupon_redemptions %>%
  left_join(demographics, by = "household_id")

coupon_data %>%
  filter(!is.na(income)) %>%  # drop households with no demographic record
  group_by(income) %>%
  summarize(total_coupons = n()) %>%
  ggplot(aes(x = income, y = total_coupons)) +
  geom_bar(stat = "identity", width = 0.8, fill = 'navy') +
  scale_x_discrete(expand = c(-0.11, -0.11)) +
  labs(title = "Coupon Usage by Income Group",
       x = "Income Group",
       y = "Total Coupons Used")+
  theme_minimal() +
  coord_flip()

transactions %>%
  inner_join(demographics, by = "household_id") %>%
  inner_join(products, by = "product_id") %>%
  filter(income == "50-74K") %>%
  filter(product_type %in% top10_products$product_type) %>%
  mutate(product_type = recode(product_type,
                                "YOGURT NOT MULTI-PACKS" = "YOGURT",
                                "POTATOES RUSSET (BULK&BAG)" = "POTATOES",
                                "SS ECONOMY ENTREES/DINNERS ALL" = "DINNERS",
                                "SALAD BAR FRESH FRUIT" = "FRUIT",
                                "CONDENSED SOUP" = "SOUP",
                                "FLUID MILK WHITE ONLY" = "MILK",
                                "BEERALEMALT LIQUORS" = "LIQUORS")) %>%
  group_by(product_type, month = month(transaction_timestamp, label = TRUE)) %>%
  summarize(total_sales = sum(sales_value *  quantity, na.rm = TRUE), .groups = 'drop') %>%
  filter(total_sales >= 0) %>%
  ggplot(aes(x = month, y = total_sales, group = product_type, color = month)) +  
  geom_line(linewidth = 1) +
  geom_point(size = 2) +  
  labs(title = "Total Sales of Top Discounted Products (Income Group: 50-74K)",
       x = NULL,  # Remove x-axis title
       y = "Total Sales") +
  scale_color_manual(values = brewer.pal(n = 12, name = "Set3")) +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        strip.text.x = element_text(size = 10)) + 
  facet_wrap(~ product_type, scales = "free_y")

Insights Obtained/New Information

Promotional campaign responsiveness by product category:

  • Through this data analysis we discovered the most promotional campaign responsive product categories
  • When we first listed the top ten categories, several gas/fuel categories appeared among them. We decided not to focus on these, so we excluded gas from our results
  • The top ten categories (excluding gas and fuel) are beer/liquors, cigarettes, potato chips, white milk, bananas, condensed soup, salad bar/fresh fruit, economy entrees, russet potatoes, and yogurt.
  • We were surprised by the top two categories, “beer, ale, malt liquors” and cigarettes, because those items are not on sale very often. This suggests that promotional campaigns are effective for those categories and that people are very likely to purchase those items when they are on sale.
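
One way to make "promotional sales" explicit is to flag each line item that received any discount and compare promotional sales with total sales per product type. The sketch below is illustrative only: it uses a small made-up table whose column names mirror the completejourney transactions data, and the rule that any non-zero `retail_disc` or `coupon_disc` marks a promotional purchase is an assumption, not the method used for the chart above.

```r
library(dplyr)

# Toy line items; column names mirror completejourney's transactions table,
# but the values are made up for illustration.
toy_transactions <- tibble(
  product_type = c("BANANAS", "BANANAS", "YOGURT", "YOGURT", "YOGURT"),
  sales_value  = c(1.50, 2.00, 3.00, 1.00, 4.00),
  retail_disc  = c(0.50, 0.00, 0.00, 0.25, 0.00),
  coupon_disc  = c(0.00, 0.00, 1.00, 0.00, 0.00)
)

promo_share <- toy_transactions %>%
  # Assumption: a line item is "promotional" if any discount was applied
  mutate(on_promo = retail_disc > 0 | coupon_disc > 0) %>%
  group_by(product_type) %>%
  summarise(
    promo_sales = sum(sales_value[on_promo]),  # sales on discounted items
    total_sales = sum(sales_value),            # all sales for the type
    promo_share = promo_sales / total_sales,   # fraction sold on promotion
    .groups = "drop"
  ) %>%
  arrange(desc(promo_share))

promo_share
```

Ranking by a share rather than a raw total keeps large categories from dominating simply because they sell more overall.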

Coupon usage by household income:

  • The household income level that uses coupons the most is, by far, 50-74K

  • The household income levels that use coupons the least are the two highest: 250K+ and 200-249K

  • The 50-74K income group used the most coupons; however, this may be because it is a large income group.

Total Sales of Discounted Products (Income Group 50-74K):

  • We segmented this data to show purchases just by the income level 50-74K

  • Bananas extremely low in February

  • Cigarettes low in November

  • Dinners low in January, July, and November

  • Fruit low in April, August, and November

  • Liquors low in March and November

  • Milk low in February, April, and November

  • Potato chips low in January, February, and November

  • Potatoes low in August

  • Soup low in June and July

  • These months with extremely low promotional sales are our focus: because we know these products can sell, it should be possible to improve these months

  • We noticed that promotional sales are low for almost every product in November. This could have something to do with Thanksgiving and with other sales on products like turkeys and stuffing. However, these products are still being bought during that month, so we believe there is room for improvement.
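
The low months listed above were read off the faceted chart. As a cross-check, the same kind of monthly summary can be queried directly for each product's weakest month. This is a minimal sketch on made-up numbers: `monthly_sales` is a hypothetical stand-in for the table of totals per product type and month.

```r
library(dplyr)

# Toy monthly totals (made-up numbers), shaped like the summarised data
# behind the faceted plot: one row per product type and month.
monthly_sales <- tibble(
  product_type = rep(c("MILK", "SOUP"), each = 4),
  month        = rep(c("Jan", "Feb", "Jun", "Nov"), times = 2),
  total_sales  = c(120, 80, 110, 60,
                   90, 95, 40, 70)
)

lowest_months <- monthly_sales %>%
  group_by(product_type) %>%
  # Keep the single lowest month for each product type
  slice_min(total_sales, n = 1, with_ties = FALSE) %>%
  ungroup()

lowest_months
```

Raising `n` in `slice_min()` would return the two or three weakest months per product instead of one.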

Summary

When we started this analysis, we wanted to know what product categories were most responsive to promotional campaigns.

How We Answered the Business Question

To address this business question, we began by analyzing the product categories and identified the top 10 that showed the highest responsiveness to promotional campaigns. In this analysis, we excluded gasoline/fuel. Although we considered examining the bottom 10 categories, we ultimately chose to focus on the top performers. This approach allowed us to uncover seasonal buying trends and campaign patterns and to recommend strategies that further optimize these high-response categories.

Next, we analyzed household income levels, as we believed this would be an important factor to consider when developing targeted campaigns. Our analysis revealed that households with an income of $50K–$74K are the most frequent users of coupons. This was somewhat unexpected, as we initially assumed that the lowest income brackets would have the highest coupon usage, but the data showed otherwise.

Finally, we analyzed promotional sales over the year in order to discover seasonal trends. We did this by graphing the promotional sales of the top ten product categories for the household income level with the highest coupon usage. We chose this group because, as with the product categories, we know that this income level is responsive to promotional campaigns. This let us see in which month each product category has its highest and lowest promotional sales, and we discovered a few patterns on which we can base recommendations.

Recommendation

November presents an opportunity to boost sales by focusing on two key areas: grocery essentials and Thanksgiving-related products. We recommend launching targeted promotional campaigns that combine these items, appealing to customers who are preparing for both the holiday and their regular shopping needs. By bundling common household items from the top 10 categories (such as milk, bananas, and yogurt) with Thanksgiving essentials like turkey, stuffing, and seasonal produce, Regork can create deals that encourage larger basket sizes. Customers are already planning bigger shopping trips for holiday meals, and highlighting savings on daily-use items within the same campaign can enhance the appeal of these promotions. This approach will not only cater to holiday shoppers but also capture the attention of those looking for regular household items, ultimately driving sales across multiple categories and enhancing customer loyalty during a key shopping period.

Limitations

  • One limitation we encountered in analyzing coupon usage by household income level is the potential impact of income group size. It’s possible that households in the $50K–$74K range have the highest coupon usage simply because they represent one of the largest income groups. However, we chose to proceed with this data, as even though their coupon usage may not be the highest proportionally, they still account for the largest total number of coupons used. This makes them a key target group for our campaigns.

  • A different approach to this analysis would be to use averages (for example, promotional sales or coupons per household) rather than totals in each exploration. This would reduce the risk that the size of each population is affecting the results.
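
To illustrate the averaging idea, the sketch below divides coupon redemptions by the number of households in each income group. The two toy tables are made up, and `toy_demographics` and `toy_redemptions` are hypothetical stand-ins for the demographics and coupon_redemptions data. It also assumes that households with no redemptions should count toward the group size.

```r
library(dplyr)

# Made-up households and their income groups
toy_demographics <- tibble(
  household_id = c("1", "2", "3", "4", "5"),
  income       = c("50-74K", "50-74K", "50-74K", "Under 15K", "Under 15K")
)

# Made-up redemptions: one row per coupon redeemed
toy_redemptions <- tibble(
  household_id = c("1", "1", "2", "3", "4", "4", "4", "4")
)

coupons_per_household <- toy_demographics %>%
  # Count redemptions per household, keeping households with none
  left_join(count(toy_redemptions, household_id, name = "n_coupons"),
            by = "household_id") %>%
  mutate(n_coupons = coalesce(n_coupons, 0L)) %>%
  group_by(income) %>%
  summarise(
    households            = n(),
    total_coupons         = sum(n_coupons),
    coupons_per_household = total_coupons / households,
    .groups = "drop"
  )

coupons_per_household
```

In this toy data both groups redeem the same total number of coupons, but the smaller group redeems more per household, which is the kind of difference a totals-only chart cannot show.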