Introduction

In an ever-changing and volatile market, growth opportunities are vitally important to seize and take advantage of. With this data I intend to present Regork with a major opportunity that is sure to give great returns on investment.

The Problem

Coupons provide time-tested incentives for consumers to spend and purchase products they would otherwise ignore. While combing the transaction data, I noticed a significant mismatch between coupons issued vs redeemed as well as which household compositions were redeeming which coupons.

The Opportunity

With more focused coupon campaigns and better application of the economic principles of price discrimination, Regork could capture a great deal more of the consumer surplus and thus collect on a greater amount of untapped profits.

Libraries to be included in the data processing:

Library	Purpose
`tidyverse`	Packages for tidying the data
`tidytext`	To help tidy labels
`scales`	For graph aesthetic mapping
`knitr`	Compile results to HTML
`glue`	String helpers
`completejourney`	The data in question

Differences of Household Compositions

Our first step requires looking at the different household compositions. We use the completejourney package to examine the transactions separated by household compositions, and look at the top 5 product category purchased by each of the four: 1 Adult No Kids, 2 Adults No Kids, 1 Adult with Kids, 2 Adults with Kids.

joined <- transactions %>%
  left_join(products, by = "product_id") %>%
  left_join(demographics, by = "household_id") %>%
  mutate(hh_group = case_when(
    grepl("2 Adults No Kids", household_comp, ignore.case = TRUE) ~ "2 Adults, No Kids",
    grepl("2 Adults Kids",         household_comp, ignore.case = TRUE) ~ "2 Adults, Kids",
    grepl("1 Adult No Kids",household_comp, ignore.case = TRUE) ~ "1 Adult, No Kids",
    grepl("1 Adult Kids",        household_comp, ignore.case = TRUE) ~ "1 Adult, Kids",
    TRUE ~ NA_character_
  )) %>%
  filter(!is.na(hh_group), !is.na(product_category), product_category != "COUPON/MISC ITEMS")
#We disinclude Coupon/misc items as these items are
#either already couponed or ineligible for coupons
by_cat <- joined %>%
  group_by(hh_group, product_category) %>%
  summarise(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop")
top5_each <- by_cat %>%
  group_by(hh_group) %>%
  slice_max(total_sales, n = 5, with_ties = FALSE) %>%
  ungroup()
#breakdown of the top 5 items by sale value
ggplot(top5_each,
       aes(x = reorder_within(product_category, total_sales, hh_group),
           y = total_sales,
           fill = hh_group)) +                     
  geom_col(show.legend = FALSE) +                 
  coord_flip() +
  facet_wrap(~ hh_group, scales = "free_y", ncol = 2) +
  scale_x_reordered() +
  scale_fill_brewer(palette = "Paired") +           
  labs(
    title = "Top 5 Categories by Household Comp",
    x = "Product Category",
    y = "Total Sales ($)"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    strip.background = element_rect(fill = "#f0f0f0", color = NA),
    strip.text = element_text(face = "bold"),
    plot.title = element_text(face = "bold", hjust = 0.5)
  )

As we can see from this data, all four demographics show similar splits among top five categories for sales volume. This is not unexpected as the items in question are daily consumption items like groceries, dairy, soft drinks, frozen food, meats, etc. But let’s take a look at income among the various compositions:

demo4 <- demographics %>%
  mutate(hh_norm  = str_squish(str_to_lower(household_comp)),
         hh_group = case_when(
           str_detect(hh_norm, "^1\\s*adult\\s*no\\s*kids$")  ~ "1 Adult No Kids",
           str_detect(hh_norm, "^1\\s*adult.*kids$")          ~ "1 Adult Kids",
           str_detect(hh_norm, "^2\\s*adults\\s*no\\s*kids$") ~ "2 Adults No Kids",
           str_detect(hh_norm, "^2\\s*adults.*kids$")         ~ "2 Adults Kids",
           TRUE ~ NA_character_
         )) %>% #helps to read strings, something spicy was happening when importing 
  filter(!is.na(hh_group), !is.na(income))


income_levels <- c("Under 15K","15-24K","25-34K","35-49K","50-74K",
                   "75-99K","100-124K","125-149K","150-174K","175-199K",
                   "200-249K","250K+")
demo4 <- demo4 %>%
  mutate(income = factor(income, levels = income_levels))

ggplot(demo4, aes(x = income, fill = income)) +
  geom_bar() +
  coord_flip() +
  facet_wrap(~ hh_group, ncol = 2) +
  labs(title = "Income Distribution by Household Composition",
       x = "Income", y = "Households") +
  scale_fill_brewer(palette = "Paired") +  
  theme_minimal(base_size = 13) +
  theme(legend.position = "none",
        strip.text = element_text(face = "bold"))

We can see that the income distribution heavily favors dual-income adult households with the highest concentrations of >$100k incomes, and particularly the “2 Adults No Kids” section has the highest concentration of $200k and higher. This is also not unexpected, as dual-income no kid (colloquially referred to as DINKs) households have earned a reputation for having significant disposable income. But then, when we look at coupon usage:

data("demographics")
data("products")
data("coupon_redemptions")
data("coupons")

# Join redemptions to coupon details and demographics
coupon_data <- coupon_redemptions %>%
  left_join(coupons, by = "coupon_upc") %>%
  left_join(products, by = "product_id") %>%
  left_join(demographics, by = "household_id") %>%
  filter(!is.na(household_comp))
coupon_data <- coupon_data %>%
  #Gives more legible labels and prevents reading issues
  mutate(hh_group = case_when(
    grepl("2 Adults No Kids", household_comp, ignore.case = TRUE) ~ "2 Adults, No Kids",
    grepl("2 Adults Kids",         household_comp, ignore.case = TRUE) ~ "2 Adults, Kids",
    grepl("1 Adult No Kids",household_comp, ignore.case = TRUE) ~ "1 Adult, No Kids",
    grepl("1 Adult Kids",        household_comp, ignore.case = TRUE) ~ "1 Adult, Kids",
    TRUE ~ NA_character_
  )) %>%
  filter(!is.na(hh_group)) %>%
  mutate(hh_group = factor(
    hh_group,
    levels = c("1 Adult, No Kids", "1 Adult, Kids", "2 Adults, No Kids", "2 Adults, Kids")
  ))

coupon_summary <- coupon_data %>%
  group_by(hh_group, product_category) %>%
  summarise(redemptions = n(), .groups = "drop")

top5_coupons <- coupon_summary %>%
  group_by(hh_group) %>%
  slice_max(redemptions, n = 5, with_ties = FALSE) %>%
  ungroup()
#separate and plot the top 5 coupons used for each group
ggplot(top5_coupons,
       aes(x = reorder_within(product_category, redemptions, hh_group),
           y = redemptions,
           fill = hh_group)) +  
  
  geom_col(show.legend = FALSE) +
  coord_flip() +
  facet_wrap(~ hh_group, ncol = 2, scales = "free_y") +
  scale_x_reordered() +
  scale_fill_brewer(palette = "Set2") +   
  labs(
    title = "Top 5 Coupon Categories Redeemed by Household Type",
    x = "Product Category",
    y = "Number of Coupons Redeemed"
  ) +
  theme_minimal(base_size = 13)

The results have a massively different breakdown. While some of the previous staples remain high on coupon usage, categories like makeup and hair care are suddenly in the top five items the households redeemed coupons for. This begs the question, if these coupons were so heavily redeemed, what ranks do these items hold in terms of sales at Regork?

seeking <- c("MAKEUP AND TREATMENT", "HAIR CARE PRODUCTS")
#rankings in total sales
ranks <- by_cat %>%
  group_by(hh_group) %>%
  mutate(rank = dense_rank(desc(total_sales))) %>%
  ungroup() %>%
  filter(product_category %in% seeking) %>%
  arrange(hh_group)

ranks

## # A tibble: 8 × 4
##   hh_group          product_category     total_sales  rank
##   <chr>             <chr>                      <dbl> <int>
## 1 1 Adult, Kids     HAIR CARE PRODUCTS         1229.    67
## 2 1 Adult, Kids     MAKEUP AND TREATMENT        914.    86
## 3 1 Adult, No Kids  HAIR CARE PRODUCTS         4548.    39
## 4 1 Adult, No Kids  MAKEUP AND TREATMENT       2180.    95
## 5 2 Adults, Kids    HAIR CARE PRODUCTS         4584.    37
## 6 2 Adults, Kids    MAKEUP AND TREATMENT       2229.    87
## 7 2 Adults, No Kids HAIR CARE PRODUCTS         5101.    40
## 8 2 Adults, No Kids MAKEUP AND TREATMENT       2780.    87

As we can see, despite being the top coupon category used by dual-income households with children, makeup only comes in at rank 87 for total sales. Meanwhile the hair care products with less coupon useage has a remarkable rank of 37th in these same households, despite far less coupon redemption.

So What?

To further explore this thought, we should look at which coupons are issued to which household compositions to identify the opportunity at hand.

hh_levels <- c("1 Adult Kids","1 Adult No Kids","2 Adults Kids","2 Adults No Kids")
#Joining coupons issued caused a new batch of households with unknown comps to emerge, so for the sake of integration those had to be filtered out

#separate out the top 5 for each group
issued_top5 <- campaigns %>%
  inner_join(coupons,  by = "campaign_id") %>%
  inner_join(products, by = "product_id") %>%
  inner_join(demographics, by = "household_id") %>%
  filter(household_comp %in% hh_levels,
         !is.na(product_category),
         product_category != "COUPON/MISC ITEMS") %>%
  count(household_comp, product_category, name = "coupons_issued") %>%
  mutate(household_comp = factor(household_comp, levels = hh_levels)) %>%
  group_by(household_comp) %>%
  slice_max(coupons_issued, n = 5, with_ties = FALSE) %>%

  mutate(cat_ord = forcats::fct_reorder(product_category, coupons_issued)) %>%
  ungroup()
#This is largely just to force recognition of singular categories
#so the system does not see 2 adults with kids beef as different
#from one adult with kids beef

ggplot(issued_top5,
       aes(x = coupons_issued, y = cat_ord, fill = household_comp)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ household_comp, ncol = 2) +
  scale_fill_brewer(palette = "Paired") +
  labs(
    title = "Top 5 Coupons Issued by Household Comp",
    x = "Coupons Issued", y = "Product Category"
  ) +
  theme_minimal(base_size = 13)

As we can see from this data, there is a punishing mismatch in coupons issued to household compositions vs what is redeemed. Despite the 2 Adults with Kids demographic being the largest consumer of beef coupons and is far more likely to make further purchases with additional beef coupons, this demographic is the second lowest recipient of coupons for beef. Similarly, 1 Adult No Kids receives the second-most makeup coupons despite not being a particularly remarkable consumer of such, while 2 adults with kids were shown to be the largest consumer yet again receive the second-least coupons.

Also worth noting is that despite soft drinks being a staple among all four demographics’ highest sales value for purchases, coupons for such items are not in their top issued or used. It would be worth examining if the issuing of slight coupons for such an already profitable category could potentially stimulate significant returns, as even a five percent increase in soft drink sales would rank it above beef for overall champion of the product categories for sales value.

Conclusion

Thus, it is my recommendation that Regork invests more focus into couponing these mentioned categories, as currently a great deal of surplus is being left on the table and profits are going untapped. By focusing on which demographic consumes which products with or without coupons, and targeting coupons to stimulate sales of a product the consumer already wants to buy, more product can be moved at great benefit to Regork.

Worth particular consideration are the dual-adult households, as both those with and without kids have significantly more disposable income and are thus more inclined to add-on, luxury, or even frivolous purchases. This is reflected in the data in that these two demographics were the only ones to significantly make use of coupons for luxury items like makeup and both made more than double the sales value of single adult with kids households for both makeup and hair care products. Where there is more available income to spend is worth considering for bonus coupons for additional purchases.

Final Project BANA7025

Steven Malseed