Idea

Coupons are a very important part of many businesses, especially grocery stores. So I wanted to see who coupons should be targeted at to get the most success from them. To do this I wanted to see the rate at which different demographics use coupons as well as how much their average purchases cost. This will help me conclude what demographics use coupons as well as which ones use them effectively.

Necessary Packages

tidyverse - provides tools to manipulate and interpret data
dplyr - easy functions for transforming data
dbplyr - allows users to manipulate dat bases just like data frames
completejourney - grocery purchase data
data.table - create enhanced data frames for easy data manipulation
ggplot2 - data visualization package to create complex graphs
RColorBrewer - creates color pallets for better visual plots

Setup

To set this up I had to first focus on accesses the important data and putting it into a readable and usable format. To do this I used a couple of left joins in order to combine the transactions, products, and demographics data.

# Access transactions, products, and demographics data
data("transactions")
data("products")
data("demographics")
# Join data into one tabke
transactions_with_products <- transactions %>%
  left_join(products, by = "product_id")
transactions_complete <- transactions_with_products %>%
  left_join(demographics, by = "household_id")

I also wanted to filter just the transactions where coupons were used or where the coupon_disc was greater than 0. I could then use then to calculate the frequency of use along with the average sales values based on different demographics.

# Filter for transactions where a coupon was redeemed
transactions_with_coupons <- transactions_complete %>%
  filter(coupon_disc > 0)

# Calculate frequency of coupon usage and average sales value by demographic factors
coupon_analysis <- transactions_complete %>%
  group_by(age, income, kids_count, marital_status) %>%
  summarise(
    coupon_usage = sum(coupon_disc > 0),
    total_transactions = n(),
    avg_sales_value = mean(sales_value, na.rm = TRUE)
  ) %>%
  mutate(coupon_usage_rate = coupon_usage / total_transactions)

Data

Variable Type Description
household_id character household identifier
store_id character store identifier
basket_id character basket identifier
project_id character product identifier
quantity number number of products purchase
sales_value number sales price
retail_disc number discount from retail price
coupon_disc number discount from coupon
coupon_match_disc number coupon that was used
week integer week of purchase
transaction_timestamp date-time purchase time
manufacturer_id character manufacturer of product
department character department of store
brand factor brand of product
product_category character category of product
product_type character type of product
package_size character size of product
age ordered factor age of purchaser
income ordered factor income of purchaser
home_ownership ordered factor do they own a home
marital_status ordered factor marital status
household_size ordered factor size of household
household_comp ordered factor composition of household
kids_count ordered factor number of kids in family

Plotting Findings

To get a better understanding of my findings I also wanted to plot them throughout different graphs. In doing so I also wanted to make them look a little nicer than the standard plots. To do this I used both ggplot2 and RColorBrewer to create my graphs. Using RColorBrewer I was able to create custom color palettes for each graph to get a better view of the data. From here I made a graph to compare each demographic with the frequency of coupon usage as well as the average purchase price.

# Create a custom color palette based on the number of unique values
# Adjust for different plots
age_colors <- scale_fill_brewer(palette = "Set2", n = length(unique(coupon_analysis$age)))
income_colors <- scale_fill_brewer(palette = "Set2", n = length(unique(coupon_analysis$income)))
kids_count_colors <- scale_fill_brewer(palette = "Set2", n = length(unique(coupon_analysis$kids_count)))
marital_status_colors <- scale_fill_brewer(palette = "Set2", n = length(unique(coupon_analysis$marital_status)))

# Plot frequency of coupon usage by age
ggplot(coupon_analysis, aes(x = age, y = coupon_usage_rate, fill = factor(age))) +
  geom_bar(stat = "identity") +
  age_colors +
  labs(title = "Coupon Usage Rate by Age", x = "Age", y = "Coupon Usage Rate") +
  theme_light(base_size = 14) +
  theme(legend.position = "none")

# Plot average sales value by age
ggplot(coupon_analysis, aes(x = age, y = avg_sales_value, fill = factor(age))) +
  geom_bar(stat = "identity") +
  age_colors +
  labs(title = "Average Sales Value by Age", x = "Age", y = "Average Sales Value") +
  theme_light(base_size = 14) +
  theme(legend.position = "none")

# Plot frequency of coupon usage by income
ggplot(coupon_analysis, aes(x = income, y = coupon_usage_rate, fill = factor(income))) +
  geom_bar(stat = "identity") +
  income_colors +
  labs(title = "Coupon Usage Rate by Income", x = "Income", y = "Coupon Usage Rate") +
  theme_light(base_size = 14) +
  theme(legend.position = "none")

# Plot average sales value by income
ggplot(coupon_analysis, aes(x = income, y = avg_sales_value, fill = factor(income))) +
  geom_bar(stat = "identity") +
  income_colors +
  labs(title = "Average Sales Value by Income", x = "Income", y = "Average Sales Value") +
  theme_light(base_size = 14) +
  theme(legend.position = "none")

# Plot frequency of coupon usage by kids_count
ggplot(coupon_analysis, aes(x = factor(kids_count), y = coupon_usage_rate, fill = factor(kids_count))) +
  geom_bar(stat = "identity") +
  kids_count_colors +
  labs(title = "Coupon Usage Rate by Kids Count", x = "Kids Count", y = "Coupon Usage Rate") +
  theme_light(base_size = 14) +
  theme(legend.position = "none")

# Plot average sales value by kids_count
ggplot(coupon_analysis, aes(x = factor(kids_count), y = avg_sales_value, fill = factor(kids_count))) +
  geom_bar(stat = "identity") +
  kids_count_colors +
  labs(title = "Average Sales Value by Kids Count", x = "Kids Count", y = "Average Sales Value") +
  theme_light(base_size = 14) +
  theme(legend.position = "none")

# Plot frequency of coupon usage by marital_status
ggplot(coupon_analysis, aes(x = marital_status, y = coupon_usage_rate, fill = marital_status)) +
  geom_bar(stat = "identity") +
  marital_status_colors +
  labs(title = "Coupon Usage Rate by Marital Status", x = "Marital Status", y = "Coupon Usage Rate") +
  theme_light(base_size = 14) +
  theme(legend.position = "none")

# Plot average sales value by marital_status
ggplot(coupon_analysis, aes(x = marital_status, y = avg_sales_value, fill = marital_status)) +
  geom_bar(stat = "identity") +
  marital_status_colors +
  labs(title = "Average Sales Value by Marital Status", x = "Marital Status", y = "Average Sales Value") +
  theme_light(base_size = 14) +
  theme(legend.position = "none")

Analysis

This data doesn’t really provide much and is pretty straight forward. You would expect grocery bills to slowly increase after college years as people are promoted and start to make money. You would also expect it to slowly fall once kids are out of the house and adults are just providing for themselves.

Based on the last plot this also makes a lot of sense. Purchasing prices go up as people get older and along with that they have more coupons to use. One distinct difference I do see is that coupons are definitely used at a higher rate in the 25-34 category. This makes sense as people fresh out of college are probably trying to buy more real food but may still not have the money to afford everything. I was also surprised at the low usage from the 65+ category as I think the elderly are who we typically view as coupon users.

This graph is actually quite interesting. I would expect the sales values to increase as income increases until reaching a point where it flat lines. This makes sense because at some point buying more and more groceries no longer makes much sense so you would spend the same amount as you had previously. However, we can see that the average price increases before reaching a maximum at 50K-74K and then taking an immediate drop off. This is very surprising because I can’t quite understand why it would take such a dip immediately at 75K. However, it does show that as a grocery store our primary focus should be on customers who are making less than 75K a year.

Once again we can see a lot of similarities between the coupon and income graphs with a few differences here and there. To no ones surprise we see very little usage by people who make over 100K. It makes sense that you wouldn’t be nearly as focused on saving money at that point. What is interesting is the drop off from the maximum is not nearly as steep. In fact, this time the 75K-99K range is the second highest rate of coupon usage. Combined with the oddly low grocery bill this is very interesting. It appears that this might be the range of people who are most effective with their coupon usage and might not be the group of people we want to target.

This graph is incredibly surprising. Why would the sales value be higher when you have less kids? Now the best guess I can make is that with kids you are more likely to make more grocery store trips in a week. This would cause individual trips to be of less money. However, people with no kids making $400 purchases is very surprising none the less.

From this graph the main takeaway is that kids have little to no effect on your likelihood of using coupons. While this is surprising there is definitely a higher ratio of coupon use to purchase price for families with 2 or 3 kids.

There isn’t too much to takeaway from this graph as married couples have around double the sales value of unmarried individuals.

Here we can see that married couples are much more likely to use coupons. I believe this is mostly due to them being more likely to have kids and thus having high coupon to sales price ratios. It is however surprising that the coupon usage is so low for unmarried people.

Conclusion

After doing this analysis there are a couple conclusions for retailers to consider. To start let’s go over the age demographic. Here, retailers should target people from age 25-54. These categories have both the highest rate of coupon usage and the largest average bills. Moving on to income, retailers should target those who make less than 75k. They should especially put a focus on the 15K-24K demographic as they have an oddly low usage and could benefit from using more. They should also avoid the 74K-99K group because this group is much more effective with their coupon usage. Looking at the useage by kids retailers could benefit from targeting households with multiple kids. From these findings they have a very low coupon usage and that is a group that would benefit from having more access. Similarly, they could also focus on targeting unmarried people as they may be more likely to need these coupons and don’t often use them. I was honestly a little disappointed in my findings, I thought I would find more definitive patterns that could be used by retailers to increase their profits. However, these findings were still interesting and should be considered for future promotions.