UNIVERSITY OF CINCINNATI
LINDNER COLLEGE OF BUSINESS
TEAM: Eli Bales, Andrew McCurrach, Kazuhide Watanabe
COUPON REDEMPTION AND CAMPAIGN SUCCESS
Exploring how campaign length, consumer income brackets, and product categories shape coupon utilization patterns before, during, and after promotions
1. Objective
As data scientists for Regork, a national grocery chain, our team has been tasked with identifying areas of potential growth where the company could strategically invest resources to drive revenue and profitability. This analysis focuses specifically on customer engagement with marketing campaigns, with particular attention to redemption rates.
By examining how campaign redemption rates vary across different dimensions, such as time, product category, and other relevant factors we aim to uncover meaningful patterns that can inform business strategy. Understanding these dynamics will not only provide insight into customer behavior, but also support data-driven decision making regarding promotional design, resource allocation, and long-term growth opportunities.
Through this targeted analysis, Regork will be better positioned to optimize marketing investments, strengthen customer relationships, and improve overall campaign effectiveness, ultimately contributing to sustainable increases in both revenue and profit
2. Datasource: The Complete Journey
This report will use the ‘completejourney’ which “provides access to data sets characterizing household level transactions over one year from a group of 2,469 households who are frequent shoppers at a grocery store. It contains all of each household’s purchases, not just those from a limited number of categories. For certain households, demographic information as well as direct marketing contact history are captured” (CompleteJourney).
3. Methodology
3.1 Tools & Packages
library(tidyverse)
library(completejourney)
library(dplyr)
library(lubridate)
library(ggplot2)
library(kableExtra)
library(patchwork)
library(tibble)
3.2 Data Import
transactions <- get_transactions()
promotions <- get_promotions()
4. Findings
4.1 Coupon Redemption Overall Trend
To begin, we filter transactions by finding all transactions with a coupon_disc >0, and with a quantity > 0. There are some erroneous transactions within the transactions table that have a quantity and subsequent sales value of 0. These occasionally have a coupon_disc > 0. This is obviously an error, and should be removed from our data for cleaning.
We’ll then visualize the overall trend of coupon redemptions across all campaigns.
# add transaction date for comparison later, filter out transactions with no coupon usage and no quantity (bad data)
transactions <- transactions %>%
mutate(transactions_date = as_date(transaction_timestamp)) %>%
filter((coupon_disc > 0) & (quantity > 0))
We have data for coupons that were redeemed in 2017. Looking at the overall trend for all campaigns and when coupons were redeemed, we can see there are three clear times when coupon activity peaked. May/June, August/September, and November/December.
We now want to see which campaigns have the greatest impact by counting the number of redemptions per campaign.
redemption_count_by_campaign <- coupon_redemptions %>%
group_by(campaign_id) %>%
tally() %>%
arrange(desc(n))
campaign_id | n |
---|---|
18 | 653 |
13 | 629 |
8 | 372 |
27 | 64 |
17 | 45 |
16 | 43 |
22 | 43 |
9 | 43 |
14 | 34 |
19 | 29 |
Campaign 18, 13, and 8 have the largest count, and the other campaigns do not have enough redemptions to dive deeper into analysis, so the top 3 campaigns will be used to further analyze the trends and patterns.
# filter for coupon redemptions
campaign_18 <- coupon_redemptions %>%
filter(campaign_id == 18) %>%
group_by(redemption_date) %>%
summarise(daily_redemptions = n())
campaign_13 <- coupon_redemptions %>%
filter(campaign_id == 13) %>%
group_by(redemption_date) %>%
summarise(daily_redemptions = n())
campaign_8 <- coupon_redemptions %>%
filter(campaign_id == 8) %>%
group_by(redemption_date) %>%
summarise(daily_redemptions = n())
campaign_other <- coupon_redemptions %>%
filter(campaign_id != c(8,13,18)) %>%
group_by(redemption_date) %>%
summarise(daily_redemptions = n())
#plot the trends
c18_gen <- ggplot(campaign_18, aes(x = redemption_date, y = daily_redemptions)) +
geom_line() +
geom_smooth(method = "lm", se = FALSE, color = "steelblue") +
labs(
title = "Campaign 18",
subtitle = "Total Redemptions",
x = "",
y = "Coupon Redemptions"
) + scale_x_date(date_breaks = "1 week", date_labels = "%b %e") +
theme_bw(base_family = "serif")
c13_gen <- ggplot(campaign_13, aes(x = redemption_date, y = daily_redemptions)) +
geom_line() +
geom_smooth(method = "lm", se = FALSE, color = "steelblue") +
labs(
title = "Campaign 13",
subtitle = "Total Redemptions",
x = "",
y = "Coupon Redemptions"
) + scale_x_date(date_breaks = "1 week", date_labels = "%b %e") +
theme_bw(base_family = "serif")
c8_gen <- ggplot(campaign_8, aes(x = redemption_date, y = daily_redemptions)) +
geom_line() +
geom_smooth(method = "lm", se = FALSE, color = "steelblue") +
labs(
title = "Campaign 8",
subtitle = "Total Redemptions",
x = "",
y = "Coupon Redemptions"
) + scale_x_date(date_breaks = "1 week", date_labels = "%b %e") +
theme_bw(base_family = "serif")
cOther_gen <- campaign_other %>%
ggplot(aes(x = redemption_date, y = daily_redemptions)) +
geom_line() +
labs(
title = "Others",
x = "",
y = "Total Redemptions") +
theme_bw(base_family = "serif")
We see some interesting insights when analyzing the three most popular campaigns as well as all the other campaigns combined together. Campaign 18 was the most popular and was run in November/December, campaign 13 was not far behind and was run in August/September, and campaign 8 also saw a significant amount of redemptions and was run in May/June. These three promotions coincide with the most popular times coupons were redeemed throughout the year. The most popular campaign matches the highest peak in redemption activity, and the third most popular campaign matches the third highest peak in redemption activity. Going even further, we can see that all the other campaigns, which were less popular but in aggregate total a significant amount of redemptions, also follow the overall redemption pattern for the entire year. The fit is very close, and indicates that these are the three most popular times of the year that coupons are redeemed regardless of campaign promotion time or length.
Finally, when looking from the entire year’s perspective for the data we have regarding when coupons were redeemed, we can see how transactions involving these coupons compare in terms of total sales value. We see it follows a similar trend as the overall coupon redemptions, but with the value of coupons during the holiday season of November-December going a bit higher than the other peaks. This indicates that higher-value items are being redeemed at this time of the year and it is the time of the year we both redeem the most coupons as well as generate the most sales value in these transactions.
4.2 Coupon Redemptions by Top Product Categories
We now want to dive further into Campaigns 18, 13, and 8, and find out what the top products sold per campaign are. By viewing how these products are sold across the campaigns, we will gain more insight into the buying patterns of customers that the larger, general buying trends can’t show us.
# add transaction date for comparison later, filter out transactions with no coupon usage and no quantity (bad data)
transactions <- transactions %>%
mutate(transactions_date = as_date(transaction_timestamp)) %>%
filter((coupon_disc > 0) & (quantity > 0))
With transactions now cleaned, we can begin to find which transactions are related to which campaign. To achieve this, we will look specifically at campaign 18, the most successful campaign by coupons redeems. We filter the coupon_redemptions table for campaign_id == 18, and find distinct household_id and redemption_date pairs. We find these pairs for when we inner join with the transactions table, as these pairs will serve as what we join by.
# find all (household, date) pairs for campaign 18
df_unique_18 <- coupon_redemptions %>%
filter(campaign_id == 18) %>%
distinct(household_id, redemption_date)
Now, we inner join with these pairs, and keep only the transactions of households who redeemed coupons on the day they redeemed them. That way, we narrow our transactions down further to only include the transactions that used the coupons from the campaign we are looking at, as unfortunately the transactions table does not include a variable on which coupon was used for the transaction. Filtering and joining our data this way allows us to get around that.
# find all transactions for campaign 18
campaign_18_transactions <- transactions %>%
inner_join(df_unique_18, by = c("household_id" = "household_id",
"transactions_date" = "redemption_date")) %>%
inner_join(products, by = "product_id")
To finally find out what product categories sell the best per campaign, we group our campaign 18 transactions by product category, summarise to find the total sales value, and then arrange in descending order.
# find the top products by total sales value for c18
top_c18_products <- campaign_18_transactions %>%
group_by(product_category) %>%
summarise(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales))
All the previous work results in the following tibble:
product_category | total_sales |
---|---|
BATH TISSUES | 197.40 |
DIAPERS & DISPOSABLES | 182.85 |
REFRGRATD DOUGH PRODUCTS | 156.44 |
COLD CEREAL | 156.28 |
VITAMINS | 143.81 |
SOFT DRINKS | 129.90 |
LAUNDRY DETERGENTS | 124.89 |
ORAL HYGIENE PRODUCTS | 115.11 |
FROZEN PIZZA | 107.08 |
REFRGRATD JUICES/DRNKS | 96.84 |
SOUP | 94.50 |
FRZN MEAT/MEAT DINNERS | 88.87 |
SHAVING CARE PRODUCTS | 86.33 |
ICE CREAM/MILK/SHERBTS | 83.23 |
HAIR CARE PRODUCTS | 82.28 |
As you can see, for our top five products, Bath Tissues have the most sales value over the course of campaign 18, followed by Diapers & Disposables, Refrigerated Dough Products, Cold Cereal, and Vitamins. We will do the same steps for the other two campaigns, Campaign 13 and Campaign 8.
product_category | total_sales |
---|---|
BATH TISSUES | 187.27 |
DIAPERS & DISPOSABLES | 179.65 |
COLD CEREAL | 120.50 |
FROZEN PIZZA | 118.75 |
ICE CREAM/MILK/SHERBTS | 114.45 |
DOG FOODS | 104.90 |
FRZN MEAT/MEAT DINNERS | 91.73 |
SOFT DRINKS | 88.85 |
CONVENIENT BRKFST/WHLSM SNACKS | 77.55 |
HAND/BODY/FACIAL PRODUCTS | 70.43 |
SALD DRSNG/SNDWCH SPRD | 68.29 |
HOUSEHOLD CLEANG NEEDS | 65.65 |
CIGARETTES | 59.52 |
REFRGRATD DOUGH PRODUCTS | 58.55 |
CAT FOOD | 57.54 |
product_category | total_sales |
---|---|
BATH TISSUES | 185.90 |
DOG FOODS | 105.59 |
ICE CREAM/MILK/SHERBTS | 100.23 |
FROZEN PIZZA | 89.95 |
DIAPERS & DISPOSABLES | 78.54 |
COLD CEREAL | 77.06 |
COFFEE | 75.46 |
CONDIMENTS/SAUCES | 63.46 |
SOFT DRINKS | 62.37 |
YOGURT | 61.39 |
FRZN MEAT/MEAT DINNERS | 54.99 |
SOAP - LIQUID & BAR | 53.12 |
LAUNDRY DETERGENTS | 51.92 |
FRZN NOVELTIES/WTR ICE | 50.71 |
CONVENIENT BRKFST/WHLSM SNACKS | 49.98 |
Looking at each tibble, we see similarities across the different campaigns. Bath tissue is always the highest selling product. Diapers and disposables is second for Campaign 18 and 13, but drops heavily to fifth in Campaign 8. We see other common categories such as cereal, pizza, and ice creams making into the top products more than once.
Now knowing what the top products are for each category, we need to find how often these products were bought across the length of each of the different campaigns. This is a long and repetitive task, and therefore we created the function get_dfs. This function takes four parameters: n, top_products, c(ampaign)_transations, and range. N defines how many of the top products you want, in this case 5. Top_products takes in the list of top products in descending order we derived previously. C_transactions takes in the transactions from the campaign you are currently looking at. Range is a tuple, with the start and end date of the campaign.
get_dfs <- function(n, top_products, c_transactions, range) {
dfs <- list()
for (i in 1:n) {
# get the product name
product_cat <- top_products[[1]][i]
# print(product_cat)
# get the transactions of just that product type
product_transactions <- c_transactions %>%
filter(product_category == product_cat)
# summarise transactions per day
redemption <- product_transactions %>%
group_by(transactions_date) %>%
summarise(daily_redemptions = n())
# create dataframe from dates and counts
df <- data.frame(redemption[1], redemption[2])
# fill out the dataframe so days where no item was bought = 0
df <- df %>%
complete(
transactions_date = seq(range[1], range[2], by = "day"),
fill = list(daily_redemptions = 0)
)
# append dataframe of next category to list
dfs[[length(dfs) + 1]] <- df
}
return(dfs)
}
Briefly explaining the function, we begin with an empty list dfs, which will hold our data frames of product frequency data. We will return this list. We iterate with a for loop, looping n times. In this case, we could have hard-coded five iterations, but keeping the function flexible for future use is best. We first obtain the i-th top product category, and then filter our transactions to only contain transactions with this product_category. We group these transactions by transaction_date, and then summarise by count to obtain the number of items bought of that product type per day. We explicitly make a data frame out of this data, allowing us to then use the tidyr “complete” function. This function lets us “fill out” this data frame to cover any missing data. For example, say no bath tissues were bought on the fourth day of the campaign. Before using complete, our data frame would not have an observation for the fourth day of the campaign. When we graph this data, we obviously want to show when no products were bought, and therefore use the “complete” function to fill in any missing dates with a count of 0, representing no products of that type were bought with a coupon that day. This is why we need the range parameter, so we know when to start and stop filling in empty dates. We then add this dataframe to our list, and iterate again until we are finished.
To run get_dfs, we need the start and end date of each campaign. Obtaining that, we run get_dfs on each of the three campaigns’ data. This results in three lists containing five dataframes each, representing the top five products sold. These lists are stored in c18_graphs, c13_graphs, and c8_graphs.
# get the range of the three campaigns we are looking at, used for filling out df's later on
c18_range <- c(filter(campaign_descriptions, campaign_id == 18)$start_date, filter(campaign_descriptions, campaign_id == 18)$end_date)
c13_range <- c(filter(campaign_descriptions, campaign_id == 13)$start_date, filter(campaign_descriptions, campaign_id == 13)$end_date)
c8_range <- c(filter(campaign_descriptions, campaign_id == 8)$start_date, filter(campaign_descriptions, campaign_id == 8)$end_date)
c18_graphs <- get_dfs(5, top_c18_products, campaign_18_transactions, c18_range)
c13_graphs <- get_dfs(5, top_c13_products, campaign_13_transactions, c13_range)
c8_graphs <- get_dfs(5, top_c8_products, campaign_8_transactions, c8_range)
Taking a look at the first data frame in c8_graphs, representing the frequency of bath tissue sold during Campaign 8, we see no bath tissue sold on the first day of the campaign, 2017-05-08, likely due to the coupons being received that day, and then on the second day of the campaign, three households used coupons to buy bath tissue.
transactions_date | daily_redemptions |
---|---|
2017-05-08 | 0 |
2017-05-09 | 3 |
2017-05-10 | 0 |
2017-05-11 | 2 |
2017-05-12 | 1 |
2017-05-13 | 0 |
2017-05-14 | 2 |
2017-05-15 | 0 |
2017-05-16 | 3 |
2017-05-17 | 0 |
2017-05-18 | 1 |
2017-05-19 | 0 |
2017-05-20 | 0 |
2017-05-21 | 0 |
2017-05-22 | 1 |
campaign_18 <- coupon_redemptions %>%
filter(campaign_id == 18) %>%
group_by(redemption_date) %>%
summarise(daily_redemptions = n())
campaign_13 <- coupon_redemptions %>%
filter(campaign_id == 13) %>%
group_by(redemption_date) %>%
summarise(daily_redemptions = n())
campaign_8 <- coupon_redemptions %>%
filter(campaign_id == 8) %>%
group_by(redemption_date) %>%
summarise(daily_redemptions = n())
c8p <- ggplot() +
geom_line(data = c8_graphs[[1]], aes(x = transactions_date, y = daily_redemptions), color = "red") +
geom_line(data = c8_graphs[[2]], aes(x = transactions_date, y = daily_redemptions), color = "blue") +
geom_line(data = c8_graphs[[3]], aes(x = transactions_date, y = daily_redemptions), color = "cyan2") +
geom_line(data = c8_graphs[[4]], aes(x = transactions_date, y = daily_redemptions), color = "orange") +
geom_line(data = c8_graphs[[5]], aes(x = transactions_date, y = daily_redemptions), color = "purple") +
labs(
subtitle = "Top 5 Products (by sales value)",
x = "",
y = "Coupon Redemptions"
) + scale_x_date(date_breaks = "1 week", date_labels = "%b %e") +
theme_bw(base_family = "serif")
c13p <- ggplot() +
geom_line(data = c13_graphs[[1]], aes(x = transactions_date, y = daily_redemptions), color = "red") +
geom_line(data = c13_graphs[[2]], aes(x = transactions_date, y = daily_redemptions), color = "blue") +
geom_line(data = c13_graphs[[3]], aes(x = transactions_date, y = daily_redemptions), color = "cyan2") +
geom_line(data = c13_graphs[[4]], aes(x = transactions_date, y = daily_redemptions), color = "orange") +
geom_line(data = c13_graphs[[5]], aes(x = transactions_date, y = daily_redemptions), color = "purple") +
labs(
subtitle = "Top 5 Products (by sales value)",
x = "",
y = "Coupon Redemptions"
) + scale_x_date(date_breaks = "1 week", date_labels = "%b %e") +
theme_bw(base_family = "serif")
c18p <- ggplot() +
geom_line(data = c18_graphs[[1]], aes(x = transactions_date, y = daily_redemptions), color = "red") +
geom_line(data = c18_graphs[[2]], aes(x = transactions_date, y = daily_redemptions), color = "blue") +
geom_line(data = c18_graphs[[3]], aes(x = transactions_date, y = daily_redemptions), color = "cyan2") +
geom_line(data = c18_graphs[[4]], aes(x = transactions_date, y = daily_redemptions), color = "orange") +
geom_line(data = c18_graphs[[5]], aes(x = transactions_date, y = daily_redemptions), color = "purple") +
labs(
subtitle = "Top 5 Products (by sales value)",
x = "",
y = "Coupon Redemptions"
) + scale_x_date(date_breaks = "1 week", date_labels = "%b %e") +
theme_bw(base_family = "serif")
The graph for the top products for campaign 18 showcases similarities to what we saw across the general campaign 18 trend. Large spikes and larger volume early on, and then the coupon redemptions dwindle as time stretches towards the end of the campaign.
We then thought it would be interesting to see if the trends of the top five products would look even more similar if we combined their frequencies together into one line. We did so by binding the different campaign specific data frames and summarizing their daily redemptions.
combined_18_df <- bind_rows(c18_graphs[[1]], c18_graphs[[2]], c18_graphs[[3]], c18_graphs[[4]], c18_graphs[[5]]) %>%
group_by(transactions_date) %>%
summarise(daily_redemptions = sum(daily_redemptions, na.rm = TRUE))
combined_13_df <- bind_rows(c13_graphs[[1]], c13_graphs[[2]], c13_graphs[[3]], c13_graphs[[4]], c13_graphs[[5]]) %>%
group_by(transactions_date) %>%
summarise(daily_redemptions = sum(daily_redemptions, na.rm = TRUE))
combined_8_df <- bind_rows(c8_graphs[[1]], c8_graphs[[2]], c8_graphs[[3]], c8_graphs[[4]], c8_graphs[[5]]) %>%
group_by(transactions_date) %>%
summarise(daily_redemptions = sum(daily_redemptions, na.rm = TRUE))
Plotting these combined dataframes, again with a trendline, yields:
combined18p <- ggplot(data = combined_18_df, aes(x = transactions_date, y = daily_redemptions)) +
geom_line(color = "green3") +
geom_smooth(method = "lm", se = FALSE, color = "steelblue") +
labs(
subtitle = "Combined Top 5 Product Redemptions",
x = "",
y = "Coupon Redemptions"
) + scale_x_date(date_breaks = "1 week", date_labels = "%b %e") +
theme_bw(base_family = "serif")
combined13p <- ggplot(data = combined_13_df, aes(x = transactions_date, y = daily_redemptions)) +
geom_line(color = "green3") +
geom_smooth(method = "lm", se = FALSE, color = "steelblue") +
labs(
subtitle = "Combined Top 5 Product Redemptions",
x = "",
y = "Coupon Redemptions"
) + scale_x_date(date_breaks = "1 week", date_labels = "%b %e") +
theme_bw(base_family = "serif")
combined8p <- ggplot(data = combined_8_df, aes(x = transactions_date, y = daily_redemptions)) +
geom_line(color = "green3") +
geom_smooth(method = "lm", se = FALSE, color = "steelblue") +
labs(
subtitle = "Combined Top 5 Product Redemptions",
x = "",
y = "Coupon Redemptions"
) + scale_x_date(date_breaks = "1 week", date_labels = "%b %e") +
theme_bw(base_family = "serif")
The downward trendline, while still present, is less harsh compared to the general Campaign 18 trendline. We see similar graphs for the other two campaigns, but both other graphs have large spikes in the second half of the campaign, on par with the spikes in the first half in terms of total redemptions on a day. We do not really see this in the general trends, pointing to some difference between the redemption trends of the top five products versus all products in total.
Now, if we plot the total redemptions across a campaign against the two other graphs we have derived so far, we can provide a deeper analysis and hopefully find some conclusions that can optimize Regork’s campaigns. We will first look at Campaign 18’s graphs:
For our analysis, I will refer to the top (all products) as Graph 1, the graph in the middle (top five products, split) as Graph 2, and the graph in the bottom (top five products, combined) as Graph 3. In Campaign 18, by viewing Graph 1, we see the common large spike in redemptions a few days after the start of the campaign. If we look at Graph 2, we can see that only one top product followed this trend, the cyan product. The other four product types did not spike early on, and had a large portion of their redemptions occur around November 15th. This observation is mirrored if we look down to Graph 3, where the first spike of the campaign is not the largest spike, breaking from how Graph 1 looks above. Again comparing Graph 1 and Graph 3, the large spike just after December begins in Graph 3 is much smaller comparatively in Graph 1. In fact, by observing the scale of our y-axis, the redemptions for our top five products make up half of all redemptions that day, pointing to a concentration of what people are buying with coupons. If we view this same time period in Graph 2, we observe that this spike is not due to one singular item spiking, but all the items receiving a decent amount of redemptions that day (with cyan still having the most redemptions).
Moving on to Campaign 13, looking at Graph 1 and Graph 3, the trends they follow are incredibly similar, more than Campaign 18. The largest spike in coupon redemptions is not a few days after the start of the campaign, but about a week and a half after the start. Observing Graph 2, this is largely driven by the spike in redemptions by the purple and cyan products. We again see a mid-campaign spike around September 1st, but as with Campaign 18, the number of redemptions slowly dwindles as the campaign ends. Finally, we will look at Campaign 8, and then draw conclusions.
Looking at Campaign 8, we observe the same similarities between Graphs 1 and 3 as we did in the other two campaigns. For Graph 8, we actually see two early spikes, one a few days after the campaign starts, and one about a week and a half after the campaign starts. Graph 2 shows us no one product category was responsible for a large number of coupon redemptions. Then, in Graph 3, we see our mid-campaign spike fueled by one product category appear again. This time, the gold product had 5 redemptions in one day, making up over half of the coupon redemptions across all products if we look at the scale of Graph 1. This spike is almost non-existent if you were to just observe Graph 1, meaning there were almost no other coupon redemptions that day. From this analysis of the top products across a campaign, we can derive a plan to further increase redemptions in other products as well.
Throughout this analysis of the top products, it is clear one of the largest problems with campaigns and coupon redemptions is how customers stop using coupons the closer the campaign gets to its conclusion. Whether this is due to Regork stopping sending as many coupons out as time progresses, or customers have simply forgotten about or lost their coupons, we do not know. This trend could also point to customers who would be likely to use a coupon do so right away, and then they never receive another, and the redeemed coupons dwindle. Fortunately, our data may point to a solution. Looking at Graph 3 across the three campaigns, we consistently see a mid-campaign spike in redemptions that rival the spike we see at the start of the campaign. Yet this same time period in Graph 1 sees a much smaller coupon redemption spike. What this points to is a large number of customers are redeeming coupons for top products around the mid-point of the campaign. Sometimes this is driven by one particular item, such as Campaign 8, and sometimes it is driven by many products, such as Campaign 18 and 13. If we can identify why these mid-campaign spikes happen, we can hopefully replicate them throughout the campaign, and curtail the dwindling effects that are consistent across each campaign. Additionally, there were a couple occasions where one particular product or collection of products accounted for +50% of all coupon redemptions. See December 3rd from Campaign 18, and June 8th from Campaign 8. Identifying whether these spikes are due to seasonal causes, marketing causes, or another option, can point us towards optimizing coupon redemptions and therefore product purchases across our campaigns.
4.3 Coupon Redemptions by Income Level
This section examines coupon redemption behavior across different income levels to better gauge how various income brackets engage with promotional offers. By grouping households into Low, Medium, and High Income categories, we can observe how purchasing power and financial position influence coupon usage patterns and responsiveness to campaigns.
We’ll first create bins called “Low Income”, “Medium Income”, “High Income” and look into some general statistics about number of households and redemptions
demographics <- demographics %>%
mutate(income_group = case_when(
income %in% c("Under 15K", "15-24K", "25-34K", "35-49K") ~ "Low Income",
income %in% c("50-74K", "75-99K", "100-124K", "125-149K") ~ "Medium Income",
income %in% c("150-174K", "175-199K", "200-249K", "250K+") ~ "High Income",
TRUE ~ NA_character_
))
transactions <- transactions %>%
left_join(demographics %>% select(household_id, income_group),
by = "household_id")
# Create mapping of income groups to their bins
income_bins <- tibble(
income_group = c("Low Income", "Medium Income", "High Income"),
bins = c("Under 15K, 15-24K, 25-34K, 35-49K",
"50-74K, 75-99K, 100-124K, 125-149K",
"150-174K, 175-199K, 200-249K, 250K+")
)
# Household counts per income group
household_summary <- demographics %>%
filter(!is.na(income_group)) %>%
group_by(income_group) %>%
summarise(num_households = n(), .groups = "drop")
# Summarise redemptions by income group
redemption_summary <- transactions %>%
filter(coupon_disc > 0, quantity > 0, !is.na(income_group)) %>%
mutate(income_group = factor(income_group,
levels = c("Low Income", "Medium Income", "High Income"))) %>%
group_by(income_group) %>%
summarise(total_redemptions = n(), .groups = "drop") %>%
left_join(income_bins, by = "income_group") %>%
select(income_group, bins, total_redemptions)
# Join them together
combined_summary <- redemption_summary %>%
left_join(household_summary, by = "income_group")
income_group | bins | total_redemptions | num_households |
---|---|---|---|
Low Income | Under 15K, 15-24K, 25-34K, 35-49K | 4480 | 384 |
Medium Income | 50-74K, 75-99K, 100-124K, 125-149K | 7626 | 360 |
High Income | 150-174K, 175-199K, 200-249K, 250K+ | 1021 | 57 |
The function serves a similar purpose to that of the product section of this report. The function is tweaked to look into the income groups in order to obtain the right information.
get_income_graphs <- function(income, transactions, range) {
group_transactions <- transactions %>%
left_join(demographics %>% select(household_id, income_group),
by = "household_id") %>%
filter(income_group == income)
redemption <- group_transactions %>%
group_by(transactions_date) %>%
summarise(daily_redemptions = n(), .groups = "drop")
df <- data.frame(redemption[1], redemption[2])
df <- df %>%
complete(
transactions_date = seq(range[1], range[2], by = "day"),
fill = list(daily_redemptions = 0)
)
return(df)
}
We are now going to take a look at the frequency of coupon redemptions by income groups and campaign.
c8_graphs_High <- get_income_graphs(income = "High Income", campaign_8_transactions, c8_range)
c8_graphs_Med <- get_income_graphs(income = "Medium Income", campaign_8_transactions, c8_range)
c8_graphs_Low <- get_income_graphs(income = "Low Income", campaign_8_transactions, c8_range)
# bind your three dataframes and tag them
c8_all <- bind_rows(
mutate(c8_graphs_High, group = "High Income"),
mutate(c8_graphs_Med, group = "Medium Income"),
mutate(c8_graphs_Low, group = "Low Income")
)
c13_graphs_High <- get_income_graphs(income = "High Income", campaign_13_transactions, c13_range)
c13_graphs_Med <- get_income_graphs(income = "Medium Income", campaign_13_transactions, c13_range)
c13_graphs_Low <- get_income_graphs(income = "Low Income", campaign_13_transactions, c13_range)
# bind your three dataframes and tag them
c13_all <- bind_rows(
mutate(c13_graphs_High, group = "High Income"),
mutate(c13_graphs_Med, group = "Medium Income"),
mutate(c13_graphs_Low, group = "Low Income")
)
c18_graphs_High <- get_income_graphs(income = "High Income", campaign_18_transactions, c18_range)
c18_graphs_Med <- get_income_graphs(income = "Medium Income", campaign_18_transactions, c18_range)
c18_graphs_Low <- get_income_graphs(income = "Low Income", campaign_18_transactions, c18_range)
# bind your three dataframes and tag them
c18_all <- bind_rows(
mutate(c18_graphs_High, group = "High Income"),
mutate(c18_graphs_Med, group = "Medium Income"),
mutate(c18_graphs_Low, group = "Low Income")
)
c18_income_trend <- ggplot(c18_all, aes(x = transactions_date, y = daily_redemptions, color = group)) +
geom_line(alpha = 0.6) +
geom_smooth(se = FALSE, method = "loess", size = 1.2) +
scale_color_manual(values = c("High Income" = "darkred",
"Medium Income" = "steelblue",
"Low Income" = "forestgreen")) +
labs(
title = "Campaign 18: Coupon Redemption Trends by Income Group",
x = "",
y = "Daily Redemptions",
color = "Income Group" # make sure legend has a label
) +
theme_bw(base_family = "serif")
c13_income_trend <- ggplot(c13_all, aes(x = transactions_date, y = daily_redemptions, color = group)) +
geom_line(alpha = 0.6) +
geom_smooth(se = FALSE, method = "loess", size = 1.2) +
scale_color_manual(values = c("High Income" = "darkred",
"Medium Income" = "steelblue",
"Low Income" = "forestgreen")) +
labs(
title = "Campaign 13: Coupon Redemption Trends by Income Group",
x = "",
y = "Daily Redemptions",
color = "Income Group" # make sure legend has a label
) +
theme_bw(base_family = "serif")
c8_income_trend <- ggplot(c8_all, aes(x = transactions_date, y = daily_redemptions, color = group)) +
geom_line(alpha = 0.6) +
geom_smooth(se = FALSE, method = "loess", size = 1.2) +
scale_color_manual(values = c("High Income" = "darkred",
"Medium Income" = "steelblue",
"Low Income" = "forestgreen")) +
labs(
title = "Campaign 8: Coupon Redemption Trends by Income Group",
x = "",
y = "Daily Redemptions",
color = "Income Group" # make sure legend has a label
) +
theme_bw(base_family = "serif")
Across Campaigns 18, 13, and 8, Medium Income households continue to show the highest level of coupon redemption activity, with clear peaks and consistent engagement throughout each campaign period. In contrast, Low Income and High Income households exhibit far fewer redemptions, with more sporadic spikes and lower overall participation. This initial observation may suggest that Medium Income households are the most responsive to promotional campaigns, perhaps reflecting both purchasing power and motivation to use coupons effectively.
However, it is important to recognize that the Medium Income group actually represents a smaller share of households compared to the Low Income group. This means that their higher redemption totals cannot be explained simply by population size. Instead, the data suggest that Medium Income households are redeeming coupons at a disproportionately higher rate, demonstrating greater engagement per household.
To verify and better understand this difference, we turn to ratios of household counts and redemption counts. By comparing coupon redemptions relative to the number of households in each income group, we can more accurately assess which groups are most active on a per-household basis and determine whether this elevated participation among Medium Income households truly indicates stronger responsiveness to promotional efforts.
Type | Med_to_Low_Ratio |
---|---|
Household | 0.937500 |
Redemption | 1.702232 |
Interpretation of Ratios:
The results show that there are slightly fewer Medium Income households than Low Income households in the dataset, with a household ratio of about 0.94. This indicates that Medium Income households make up a smaller share of the population compared to Low Income households.
However, the redemption ratio tells a different story: Medium Income households redeemed approximately 1.70 times as many coupons as Low Income households. This is notably higher than the household ratio, which means that, despite being fewer in number, Medium Income households are contributing disproportionately more to coupon redemption activity.
If both groups were redeeming at the same rate per household, we would expect the redemption ratio to roughly match the household ratio (around 0.94). Instead, the much higher redemption ratio (1.70) shows that Medium Income households are significantly more active on a per-household basis.
From this analysis, we have been able to land on a couple insights:
First, Medium Income households have shown the highest coupon redemption rates relative to their population size, making them the most efficient and responsive target for promotions. This group represents consumers who are financially stable but still price-conscious—they value savings opportunities and are likely to respond to deals that stretch their purchasing power.
Second, although Low Income households make up a larger share of the population, their coupon redemption rate is lower than expected. Intuitively, we might expect the opposite — that households with tighter financial constraints would be more motivated to seek savings and therefore redeem more coupons. However, the data shows that this isn’t the case, raising important questions about accessibility and outreach.
Possible explanations include:
Limited exposure to promotions: Low Income consumers may not be reached as effectively by digital or targeted marketing channels.
Access barriers: Coupons might primarily be distributed through mobile apps, loyalty programs, or digital platforms that are less accessible to some households.
Product relevance: Coupons may focus on premium or non-essential items, which do not align with the shopping priorities of lower-income consumers.
Awareness and engagement gaps: Some households may be unaware of redemption opportunities or find the process cumbersome.
5. Summary
5.1 Strategic Implications
Avoid sending promotions only at the campaign launch; distribute communications strategically throughout the campaign period to maintain engagement and maximize impact.
Target products beyond the top five categories, as these leading categories tend to show smaller performance declines over the course of the campaign.
Focus Marketing and Promotional Resources on Medium Income Households
Re-evaluate Outreach to Low Income Households
5.2 Limitations of the Analysis
We have some serious concerns with regards to the quality of our data. Exploratory data analysis reveals many troubling aspects to the data we are wrangling and brings into question how accurate our data is. There are certain products in the transactions table that do not appear at all in the products table. When you take the transactions data frame and anti join it with products, there are 17 products that we are not able to account for in our products data frame accounting for nearly 5000 transactions. Some of the transactions have a quantity of 0, so we filtered these transactions out. On the other end, there are many transactions with a quantity in the tens of thousands, most of them being for gasoline. These transactions did not involve coupons so they were not a problem for our analysis, but if the quantity of products sold was in question this would certainly skew our data in a problematic way. The link between our transactions and coupons was a particular problem, since they are only linked by the foreign key of product ID. It would be helpful for future analysis of transactions to record if a coupon was used in some way in our transactional data, maybe by listing the coupon upc or a simple True/False indicating if a coupon was used. To join these data frames, we had to extract the date of the transaction as well as the date a coupon was redeemed, and also filtered the transactions for a coupon_disc > 0 to figure out which transactions involved a coupon. With a clearer link between the data sets, it would make analysis of them much easier in the future and would lead to clearer insights regarding the effectiveness of our coupon campaigns.