Introduction

Business Problem

Coupons play a critical role in shaping consumer purchasing behavior, especially across different household sizes and income levels. The business problem we explored was identifying which groups redeem the most coupons and how income and household size influence coupon redemption patterns.

How We Addressed the Problem

We analyzed customer demographics to find which income levels and household sizes redeem the most coupons. We then combined these findings to identify the customer cohort that uses the most coupons and the cohort that uses the fewest.

Importance of the Analysis

This report explores which groups, based on income level and household size, redeem the most coupons. Understanding these trends will allow retailers to optimize their coupon strategies by targeting the right customer segments to maximize engagement and redemption.

Packages Required

The following R packages were used to run this analysis.

library(tidyverse)      
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
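# ggplot2 and dplyr are already attached by library(tidyverse) above;
# loading them again is harmless but redundant.
library(ggplot2)
library(dplyr)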

Data Preparation

We utilized transaction data from the completejourney package and joined it with coupon and demographic data to analyze coupon redemption.

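# Download the full transactions and promotions data sets
# (promotions is loaded here but not used in the steps below).
transactions <- get_transactions()
promotions <- get_promotions()

# Keep transactions for coupon-eligible products where a coupon discount
# was applied. `coupons` is a data set that ships with completejourney.
redeemed_coupons <- transactions %>%
  inner_join(coupons, by = "product_id") %>%
  filter(coupon_disc > 0)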
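
Because a single product can be listed under more than one coupon, an inner join on product_id can repeat a transaction row once per matching coupon, which would inflate any summed discount. The sketch below uses made-up rows (toy_transactions and toy_coupons are hypothetical, not completejourney data) to show the effect, and uses semi_join() as one possible guard that keeps matching transactions without duplicating them. It is an illustration under those assumptions, not the method used to produce the figures in this report.

```r
library(dplyr)

# Made-up transactions: one row per purchase.
toy_transactions <- tibble(
  household_id = c("1", "2"),
  product_id   = c("A", "B"),
  coupon_disc  = c(1.00, 0.50)
)

# Made-up coupon table: product "A" is listed under two coupons.
toy_coupons <- tibble(
  coupon_upc = c("c1", "c2", "c3"),
  product_id = c("A", "A", "B")
)

# inner_join() returns one row per (transaction, coupon) match,
# so the purchase of product "A" appears twice.
joined <- toy_transactions %>%
  inner_join(toy_coupons, by = "product_id")

# semi_join() only filters: each matching transaction is kept once.
guarded <- toy_transactions %>%
  semi_join(toy_coupons, by = "product_id")

inflated_total <- sum(joined$coupon_disc)
guarded_total  <- sum(guarded$coupon_disc)
```

Comparing nrow(joined) with nrow(guarded) is a quick way to see whether a join of this kind has multiplied rows.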

Coupons Redemption by Income

To determine how income influences coupon redemption, we aggregated the total coupon discount redeemed by income levels.

# Attach household demographics and keep only the columns needed.
redeemed_demo_income <- redeemed_coupons %>%
  inner_join(demographics, by = "household_id") %>%
  select(household_id, coupon_disc, income)

# Total coupon discount per income bracket, largest first.
coupon_by_income <- redeemed_demo_income %>%
  group_by(income) %>%
  summarize(total_redeemed = sum(coupon_disc, na.rm = TRUE)) %>%
  arrange(desc(total_redeemed))

Findings

  • Middle-income households (35k to 99k, with the greatest volume in the 50-74k range) redeem the most in coupon discounts.

  • The data suggests that while lower-income households do use coupons, middle-income households redeem higher total values in coupons.

ggplot(coupon_by_income, aes(x = income, y = total_redeemed, fill = income)) + 
  geom_col() + 
  labs(title = "Coupon Redemptions by Income", x = "Income Level", y = "Total Discount Redeemed") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_brewer(palette = "Paired")

Coupon Redemption by Household Size

Next, we examined whether larger households redeem more coupons.
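
The aggregation behind coupon_by_household_size is not shown here. A minimal sketch, assuming it mirrors the income aggregation (join to demographics, group, and sum coupon_disc), might look like the following. The helper name summarize_by_household_size and the toy inputs are hypothetical and exist only so the sketch runs on its own.

```r
library(dplyr)

# Hypothetical helper, not part of the original analysis: totals coupon
# discounts per household size, mirroring the income aggregation.
summarize_by_household_size <- function(redeemed, demo) {
  redeemed %>%
    inner_join(demo, by = "household_id") %>%
    group_by(household_size) %>%
    summarize(total_redeemed = sum(coupon_disc, na.rm = TRUE)) %>%
    arrange(desc(total_redeemed))
}

# Toy inputs with made-up values.
toy_redeemed <- tibble(
  household_id = c("1", "1", "2", "3"),
  coupon_disc  = c(0.50, 1.00, 0.75, 2.00)
)
toy_demo <- tibble(
  household_id   = c("1", "2", "3"),
  household_size = c("2", "1", "2")
)

toy_by_size <- summarize_by_household_size(toy_redeemed, toy_demo)
```

Under the same assumption, the report's table would come from summarize_by_household_size(redeemed_coupons, demographics).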

## # A tibble: 5 × 2
##   household_size total_redeemed
##   <ord>                   <dbl>
## 1 2                       9826.
## 2 1                       7939.
## 3 3                       4008.
## 4 5+                      3562.
## 5 4                       2684.

Findings

  • Two-person households redeem the most in coupons (about 9,826 in total discounts), followed by single-person households (about 7,939). Households with 3 or more members each redeem considerably less.

  • This trend was unexpected: we hypothesized that larger families would use more coupons because they need to purchase more items, which increases their opportunities for coupon use.

ggplot(coupon_by_household_size, aes(x = household_size, y = total_redeemed, fill = household_size)) + 
  geom_col() + 
  labs(
    title = "Coupon Redemptions by Household Size",
    x = "Household Size",
    y = "Total Redeemed"
  ) +
  scale_fill_brewer(palette = "Paired")

Interaction Between Household Size and Income

To understand the combined impact of household size and income, we examined coupon redemptions segmented by both factors.
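
The step that builds coupon_by_size_income is not shown here. A minimal sketch, assuming it groups the redeemed transactions by both demographic columns before summing coupon_disc, could look like this. The helper name summarize_by_size_and_income and the toy inputs are hypothetical.

```r
library(dplyr)

# Hypothetical helper, not part of the original analysis: totals coupon
# discounts for every household size and income combination.
summarize_by_size_and_income <- function(redeemed, demo) {
  redeemed %>%
    inner_join(demo, by = "household_id") %>%
    group_by(household_size, income) %>%
    # .groups = "drop" returns an ungrouped result, ready for plotting.
    summarize(total_redeemed = sum(coupon_disc, na.rm = TRUE), .groups = "drop")
}

# Toy inputs with made-up values.
toy_redeemed <- tibble(
  household_id = c("1", "2", "3", "3"),
  coupon_disc  = c(1.00, 0.50, 0.25, 0.75)
)
toy_demo <- tibble(
  household_id   = c("1", "2", "3"),
  household_size = c("1", "2", "2"),
  income         = c("50-74K", "50-74K", "35-49K")
)

toy_by_size_income <- summarize_by_size_and_income(toy_redeemed, toy_demo)
```

Each row of the result is one household size and income combination, which is the shape a faceted bar chart by household size expects.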

Findings

  • Households with 1 or 2 people in the middle-income brackets (35k to 99k, peaking in the 50-74k bracket) exhibit the highest coupon redemption values.

  • Households with 3 or more people in either the lower-income brackets (under 35k) or the higher-income brackets (100k and above) redeem the least in coupons.

  • Among households with 3 or more people, households of 3 with an income of 75-99k redeemed the most in coupons.

ggplot(coupon_by_size_income, aes(x = income, y = total_redeemed, fill = income)) +
  geom_col() +
  facet_wrap(~ household_size, scales = "free_x") +
  labs(
    title = "Coupon Redemptions by Income within Each Household Size",
    x = "Income Level",
    y = "Total Redeemed",
    fill = "Income Level"
  ) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1)) +
  scale_fill_brewer(palette = "Paired")

Summary

Our business problem was to determine which groups redeem the most coupons and how income and household size influence coupon redemption patterns.

Our analysis revealed key insights:

  1. Households of 1 or 2 people with a middle income redeem the most coupons, possibly due to a combination of strategic saving, available time, and frequent shopping.

  2. Households of 4 or 5+ people redeem fewer coupons than smaller households regardless of income.

Recommendations

Based on these findings, we developed four recommendations:

  • Focus promotions on middle-income consumers who are the most engaged

  • Offer bulk discounts for larger households

  • Create premium incentives for higher-income groups

  • Improve accessibility for lower-income consumers through targeted outreach