In an ever-changing and volatile market, growth opportunities are vitally important to seize and take advantage of. With this data I intend to present Regork with a major opportunity that is sure to give great returns on investment.
Coupons provide time-tested incentives for consumers to spend and
purchase products they would otherwise ignore. While combing the
transaction data, I noticed a significant mismatch between coupons
issued vs redeemed as well as which household compositions were
redeeming which coupons.
With more focused coupon campaigns and better application of the
economic principles of price discrimination, Regork could capture a
great deal more of the consumer surplus and thus collect on a greater
amount of untapped profits.
| Library | Purpose |
|---|---|
tidyverse |
Packages for tidying the data |
tidytext |
To help tidy labels |
scales |
For graph aesthetic mapping |
knitr |
Compile results to HTML |
glue |
String helpers |
completejourney |
The data in question |
Our first step requires looking at the different household compositions. We use the completejourney package to examine the transactions separated by household compositions, and look at the top 5 product category purchased by each of the four: 1 Adult No Kids, 2 Adults No Kids, 1 Adult with Kids, 2 Adults with Kids.
joined <- transactions %>%
left_join(products, by = "product_id") %>%
left_join(demographics, by = "household_id") %>%
mutate(hh_group = case_when(
grepl("2 Adults No Kids", household_comp, ignore.case = TRUE) ~ "2 Adults, No Kids",
grepl("2 Adults Kids", household_comp, ignore.case = TRUE) ~ "2 Adults, Kids",
grepl("1 Adult No Kids",household_comp, ignore.case = TRUE) ~ "1 Adult, No Kids",
grepl("1 Adult Kids", household_comp, ignore.case = TRUE) ~ "1 Adult, Kids",
TRUE ~ NA_character_
)) %>%
filter(!is.na(hh_group), !is.na(product_category), product_category != "COUPON/MISC ITEMS")
#We disinclude Coupon/misc items as these items are
#either already couponed or ineligible for coupons
by_cat <- joined %>%
group_by(hh_group, product_category) %>%
summarise(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop")
top5_each <- by_cat %>%
group_by(hh_group) %>%
slice_max(total_sales, n = 5, with_ties = FALSE) %>%
ungroup()
#breakdown of the top 5 items by sale value
ggplot(top5_each,
aes(x = reorder_within(product_category, total_sales, hh_group),
y = total_sales,
fill = hh_group)) +
geom_col(show.legend = FALSE) +
coord_flip() +
facet_wrap(~ hh_group, scales = "free_y", ncol = 2) +
scale_x_reordered() +
scale_fill_brewer(palette = "Paired") +
labs(
title = "Top 5 Categories by Household Comp",
x = "Product Category",
y = "Total Sales ($)"
) +
theme_minimal(base_size = 13) +
theme(
strip.background = element_rect(fill = "#f0f0f0", color = NA),
strip.text = element_text(face = "bold"),
plot.title = element_text(face = "bold", hjust = 0.5)
)
As we can see from this data, all four demographics show similar splits among top five categories for sales volume. This is not unexpected as the items in question are daily consumption items like groceries, dairy, soft drinks, frozen food, meats, etc. But let’s take a look at income among the various compositions:
demo4 <- demographics %>%
mutate(hh_norm = str_squish(str_to_lower(household_comp)),
hh_group = case_when(
str_detect(hh_norm, "^1\\s*adult\\s*no\\s*kids$") ~ "1 Adult No Kids",
str_detect(hh_norm, "^1\\s*adult.*kids$") ~ "1 Adult Kids",
str_detect(hh_norm, "^2\\s*adults\\s*no\\s*kids$") ~ "2 Adults No Kids",
str_detect(hh_norm, "^2\\s*adults.*kids$") ~ "2 Adults Kids",
TRUE ~ NA_character_
)) %>% #helps to read strings, something spicy was happening when importing
filter(!is.na(hh_group), !is.na(income))
income_levels <- c("Under 15K","15-24K","25-34K","35-49K","50-74K",
"75-99K","100-124K","125-149K","150-174K","175-199K",
"200-249K","250K+")
demo4 <- demo4 %>%
mutate(income = factor(income, levels = income_levels))
ggplot(demo4, aes(x = income, fill = income)) +
geom_bar() +
coord_flip() +
facet_wrap(~ hh_group, ncol = 2) +
labs(title = "Income Distribution by Household Composition",
x = "Income", y = "Households") +
scale_fill_brewer(palette = "Paired") +
theme_minimal(base_size = 13) +
theme(legend.position = "none",
strip.text = element_text(face = "bold"))
We can see that the income distribution heavily favors dual-income adult households with the highest concentrations of >$100k incomes, and particularly the “2 Adults No Kids” section has the highest concentration of $200k and higher. This is also not unexpected, as dual-income no kid (colloquially referred to as DINKs) households have earned a reputation for having significant disposable income. But then, when we look at coupon usage:
data("demographics")
data("products")
data("coupon_redemptions")
data("coupons")
# Join redemptions to coupon details and demographics
coupon_data <- coupon_redemptions %>%
left_join(coupons, by = "coupon_upc") %>%
left_join(products, by = "product_id") %>%
left_join(demographics, by = "household_id") %>%
filter(!is.na(household_comp))
coupon_data <- coupon_data %>%
#Gives more legible labels and prevents reading issues
mutate(hh_group = case_when(
grepl("2 Adults No Kids", household_comp, ignore.case = TRUE) ~ "2 Adults, No Kids",
grepl("2 Adults Kids", household_comp, ignore.case = TRUE) ~ "2 Adults, Kids",
grepl("1 Adult No Kids",household_comp, ignore.case = TRUE) ~ "1 Adult, No Kids",
grepl("1 Adult Kids", household_comp, ignore.case = TRUE) ~ "1 Adult, Kids",
TRUE ~ NA_character_
)) %>%
filter(!is.na(hh_group)) %>%
mutate(hh_group = factor(
hh_group,
levels = c("1 Adult, No Kids", "1 Adult, Kids", "2 Adults, No Kids", "2 Adults, Kids")
))
coupon_summary <- coupon_data %>%
group_by(hh_group, product_category) %>%
summarise(redemptions = n(), .groups = "drop")
top5_coupons <- coupon_summary %>%
group_by(hh_group) %>%
slice_max(redemptions, n = 5, with_ties = FALSE) %>%
ungroup()
#separate and plot the top 5 coupons used for each group
ggplot(top5_coupons,
aes(x = reorder_within(product_category, redemptions, hh_group),
y = redemptions,
fill = hh_group)) +
geom_col(show.legend = FALSE) +
coord_flip() +
facet_wrap(~ hh_group, ncol = 2, scales = "free_y") +
scale_x_reordered() +
scale_fill_brewer(palette = "Set2") +
labs(
title = "Top 5 Coupon Categories Redeemed by Household Type",
x = "Product Category",
y = "Number of Coupons Redeemed"
) +
theme_minimal(base_size = 13)
The results have a massively different breakdown. While some of the previous staples remain high on coupon usage, categories like makeup and hair care are suddenly in the top five items the households redeemed coupons for. This begs the question, if these coupons were so heavily redeemed, what ranks do these items hold in terms of sales at Regork?
seeking <- c("MAKEUP AND TREATMENT", "HAIR CARE PRODUCTS")
#rankings in total sales
ranks <- by_cat %>%
group_by(hh_group) %>%
mutate(rank = dense_rank(desc(total_sales))) %>%
ungroup() %>%
filter(product_category %in% seeking) %>%
arrange(hh_group)
ranks
## # A tibble: 8 Ă— 4
## hh_group product_category total_sales rank
## <chr> <chr> <dbl> <int>
## 1 1 Adult, Kids HAIR CARE PRODUCTS 1229. 67
## 2 1 Adult, Kids MAKEUP AND TREATMENT 914. 86
## 3 1 Adult, No Kids HAIR CARE PRODUCTS 4548. 39
## 4 1 Adult, No Kids MAKEUP AND TREATMENT 2180. 95
## 5 2 Adults, Kids HAIR CARE PRODUCTS 4584. 37
## 6 2 Adults, Kids MAKEUP AND TREATMENT 2229. 87
## 7 2 Adults, No Kids HAIR CARE PRODUCTS 5101. 40
## 8 2 Adults, No Kids MAKEUP AND TREATMENT 2780. 87
As we can see, despite being the top coupon category used by dual-income households with children, makeup only comes in at rank 87 for total sales. Meanwhile the hair care products with less coupon useage has a remarkable rank of 37th in these same households, despite far less coupon redemption.
To further explore this thought, we should look at which coupons are issued to which household compositions to identify the opportunity at hand.
hh_levels <- c("1 Adult Kids","1 Adult No Kids","2 Adults Kids","2 Adults No Kids")
#Joining coupons issued caused a new batch of households with unknown comps to emerge, so for the sake of integration those had to be filtered out
#separate out the top 5 for each group
issued_top5 <- campaigns %>%
inner_join(coupons, by = "campaign_id") %>%
inner_join(products, by = "product_id") %>%
inner_join(demographics, by = "household_id") %>%
filter(household_comp %in% hh_levels,
!is.na(product_category),
product_category != "COUPON/MISC ITEMS") %>%
count(household_comp, product_category, name = "coupons_issued") %>%
mutate(household_comp = factor(household_comp, levels = hh_levels)) %>%
group_by(household_comp) %>%
slice_max(coupons_issued, n = 5, with_ties = FALSE) %>%
mutate(cat_ord = forcats::fct_reorder(product_category, coupons_issued)) %>%
ungroup()
#This is largely just to force recognition of singular categories
#so the system does not see 2 adults with kids beef as different
#from one adult with kids beef
ggplot(issued_top5,
aes(x = coupons_issued, y = cat_ord, fill = household_comp)) +
geom_col(show.legend = FALSE) +
facet_wrap(~ household_comp, ncol = 2) +
scale_fill_brewer(palette = "Paired") +
labs(
title = "Top 5 Coupons Issued by Household Comp",
x = "Coupons Issued", y = "Product Category"
) +
theme_minimal(base_size = 13)
As we can see from this data, there is a punishing mismatch in
coupons issued to household compositions vs what is redeemed. Despite
the 2 Adults with Kids demographic being the largest consumer of beef
coupons and is far more likely to make further purchases with additional
beef coupons, this demographic is the second lowest recipient of coupons
for beef. Similarly, 1 Adult No Kids receives the second-most makeup
coupons despite not being a particularly remarkable consumer of such,
while 2 adults with kids were shown to be the largest consumer yet again
receive the second-least coupons.
Also worth noting is that despite soft drinks being a staple among
all four demographics’ highest sales value for purchases, coupons for
such items are not in their top issued or used. It would be worth
examining if the issuing of slight coupons for such an already
profitable category could potentially stimulate significant returns, as
even a five percent increase in soft drink sales would rank it above
beef for overall champion of the product categories for sales
value.
Thus, it is my recommendation that Regork invests more focus into
couponing these mentioned categories, as currently a great deal of
surplus is being left on the table and profits are going untapped. By
focusing on which demographic consumes which products with or without
coupons, and targeting coupons to stimulate sales of a product the
consumer already wants to buy, more product can be moved at great
benefit to Regork.
Worth particular consideration are the dual-adult households, as
both those with and without kids have significantly more disposable
income and are thus more inclined to add-on, luxury, or even frivolous
purchases. This is reflected in the data in that these two demographics
were the only ones to significantly make use of coupons for luxury items
like makeup and both made more than double the sales value of single
adult with kids households for both makeup and hair care products. Where
there is more available income to spend is worth considering for bonus
coupons for additional purchases.