Coupons are a very important part of many businesses, especially grocery stores. So I wanted to see who coupons should be targeted at to get the most success from them. To do this I wanted to see the rate at which different demographics use coupons as well as how much their average purchases cost. This will help me conclude what demographics use coupons as well as which ones use them effectively.
tidyverse - provides tools to manipulate and interpret data
dplyr - easy functions for transforming data
dbplyr - allows users to manipulate dat bases just like data
frames
completejourney - grocery purchase data
data.table - create enhanced data frames for easy data
manipulation
ggplot2 - data visualization package to create complex graphs
RColorBrewer - creates color pallets for better visual plots
To set this up I had to first focus on accesses the important data and putting it into a readable and usable format. To do this I used a couple of left joins in order to combine the transactions, products, and demographics data.
# Access transactions, products, and demographics data
data("transactions")
data("products")
data("demographics")
# Join data into one tabke
transactions_with_products <- transactions %>%
left_join(products, by = "product_id")
transactions_complete <- transactions_with_products %>%
left_join(demographics, by = "household_id")
I also wanted to filter just the transactions where coupons were used or where the coupon_disc was greater than 0. I could then use then to calculate the frequency of use along with the average sales values based on different demographics.
# Filter for transactions where a coupon was redeemed
transactions_with_coupons <- transactions_complete %>%
filter(coupon_disc > 0)
# Calculate frequency of coupon usage and average sales value by demographic factors
coupon_analysis <- transactions_complete %>%
group_by(age, income, kids_count, marital_status) %>%
summarise(
coupon_usage = sum(coupon_disc > 0),
total_transactions = n(),
avg_sales_value = mean(sales_value, na.rm = TRUE)
) %>%
mutate(coupon_usage_rate = coupon_usage / total_transactions)
| Variable | Type | Description |
|---|---|---|
| household_id | character | household identifier |
| store_id | character | store identifier |
| basket_id | character | basket identifier |
| project_id | character | product identifier |
| quantity | number | number of products purchase |
| sales_value | number | sales price |
| retail_disc | number | discount from retail price |
| coupon_disc | number | discount from coupon |
| coupon_match_disc | number | coupon that was used |
| week | integer | week of purchase |
| transaction_timestamp | date-time | purchase time |
| manufacturer_id | character | manufacturer of product |
| department | character | department of store |
| brand | factor | brand of product |
| product_category | character | category of product |
| product_type | character | type of product |
| package_size | character | size of product |
| age | ordered factor | age of purchaser |
| income | ordered factor | income of purchaser |
| home_ownership | ordered factor | do they own a home |
| marital_status | ordered factor | marital status |
| household_size | ordered factor | size of household |
| household_comp | ordered factor | composition of household |
| kids_count | ordered factor | number of kids in family |
To get a better understanding of my findings I also wanted to plot them throughout different graphs. In doing so I also wanted to make them look a little nicer than the standard plots. To do this I used both ggplot2 and RColorBrewer to create my graphs. Using RColorBrewer I was able to create custom color palettes for each graph to get a better view of the data. From here I made a graph to compare each demographic with the frequency of coupon usage as well as the average purchase price.
# Create a custom color palette based on the number of unique values
# Adjust for different plots
age_colors <- scale_fill_brewer(palette = "Set2", n = length(unique(coupon_analysis$age)))
income_colors <- scale_fill_brewer(palette = "Set2", n = length(unique(coupon_analysis$income)))
kids_count_colors <- scale_fill_brewer(palette = "Set2", n = length(unique(coupon_analysis$kids_count)))
marital_status_colors <- scale_fill_brewer(palette = "Set2", n = length(unique(coupon_analysis$marital_status)))
# Plot frequency of coupon usage by age
ggplot(coupon_analysis, aes(x = age, y = coupon_usage_rate, fill = factor(age))) +
geom_bar(stat = "identity") +
age_colors +
labs(title = "Coupon Usage Rate by Age", x = "Age", y = "Coupon Usage Rate") +
theme_light(base_size = 14) +
theme(legend.position = "none")
# Plot average sales value by age
ggplot(coupon_analysis, aes(x = age, y = avg_sales_value, fill = factor(age))) +
geom_bar(stat = "identity") +
age_colors +
labs(title = "Average Sales Value by Age", x = "Age", y = "Average Sales Value") +
theme_light(base_size = 14) +
theme(legend.position = "none")
# Plot frequency of coupon usage by income
ggplot(coupon_analysis, aes(x = income, y = coupon_usage_rate, fill = factor(income))) +
geom_bar(stat = "identity") +
income_colors +
labs(title = "Coupon Usage Rate by Income", x = "Income", y = "Coupon Usage Rate") +
theme_light(base_size = 14) +
theme(legend.position = "none")
# Plot average sales value by income
ggplot(coupon_analysis, aes(x = income, y = avg_sales_value, fill = factor(income))) +
geom_bar(stat = "identity") +
income_colors +
labs(title = "Average Sales Value by Income", x = "Income", y = "Average Sales Value") +
theme_light(base_size = 14) +
theme(legend.position = "none")
# Plot frequency of coupon usage by kids_count
ggplot(coupon_analysis, aes(x = factor(kids_count), y = coupon_usage_rate, fill = factor(kids_count))) +
geom_bar(stat = "identity") +
kids_count_colors +
labs(title = "Coupon Usage Rate by Kids Count", x = "Kids Count", y = "Coupon Usage Rate") +
theme_light(base_size = 14) +
theme(legend.position = "none")
# Plot average sales value by kids_count
ggplot(coupon_analysis, aes(x = factor(kids_count), y = avg_sales_value, fill = factor(kids_count))) +
geom_bar(stat = "identity") +
kids_count_colors +
labs(title = "Average Sales Value by Kids Count", x = "Kids Count", y = "Average Sales Value") +
theme_light(base_size = 14) +
theme(legend.position = "none")
# Plot frequency of coupon usage by marital_status
ggplot(coupon_analysis, aes(x = marital_status, y = coupon_usage_rate, fill = marital_status)) +
geom_bar(stat = "identity") +
marital_status_colors +
labs(title = "Coupon Usage Rate by Marital Status", x = "Marital Status", y = "Coupon Usage Rate") +
theme_light(base_size = 14) +
theme(legend.position = "none")
# Plot average sales value by marital_status
ggplot(coupon_analysis, aes(x = marital_status, y = avg_sales_value, fill = marital_status)) +
geom_bar(stat = "identity") +
marital_status_colors +
labs(title = "Average Sales Value by Marital Status", x = "Marital Status", y = "Average Sales Value") +
theme_light(base_size = 14) +
theme(legend.position = "none")
This data doesn’t really provide much and is pretty straight forward. You would expect grocery bills to slowly increase after college years as people are promoted and start to make money. You would also expect it to slowly fall once kids are out of the house and adults are just providing for themselves.
Based on the last plot this also makes a lot of sense. Purchasing
prices go up as people get older and along with that they have more
coupons to use. One distinct difference I do see is that coupons are
definitely used at a higher rate in the 25-34 category. This makes sense
as people fresh out of college are probably trying to buy more real food
but may still not have the money to afford everything. I was also
surprised at the low usage from the 65+ category as I think the elderly
are who we typically view as coupon users.
This graph is actually quite interesting. I would expect the sales
values to increase as income increases until reaching a point where it
flat lines. This makes sense because at some point buying more and more
groceries no longer makes much sense so you would spend the same amount
as you had previously. However, we can see that the average price
increases before reaching a maximum at 50K-74K and then taking an
immediate drop off. This is very surprising because I can’t quite
understand why it would take such a dip immediately at 75K. However, it
does show that as a grocery store our primary focus should be on
customers who are making less than 75K a year.
Once again we can see a lot of similarities between the coupon and
income graphs with a few differences here and there. To no ones surprise
we see very little usage by people who make over 100K. It makes sense
that you wouldn’t be nearly as focused on saving money at that point.
What is interesting is the drop off from the maximum is not nearly as
steep. In fact, this time the 75K-99K range is the second highest rate
of coupon usage. Combined with the oddly low grocery bill this is very
interesting. It appears that this might be the range of people who are
most effective with their coupon usage and might not be the group of
people we want to target.
This graph is incredibly surprising. Why would the sales value be
higher when you have less kids? Now the best guess I can make is that
with kids you are more likely to make more grocery store trips in a
week. This would cause individual trips to be of less money. However,
people with no kids making $400 purchases is very surprising none the
less.
From this graph the main takeaway is that kids have little to no
effect on your likelihood of using coupons. While this is surprising
there is definitely a higher ratio of coupon use to purchase price for
families with 2 or 3 kids.
There isn’t too much to takeaway from this graph as married couples
have around double the sales value of unmarried individuals.
Here we can see that married couples are much more likely to use coupons. I believe this is mostly due to them being more likely to have kids and thus having high coupon to sales price ratios. It is however surprising that the coupon usage is so low for unmarried people.
After doing this analysis there are a couple conclusions for retailers to consider. To start let’s go over the age demographic. Here, retailers should target people from age 25-54. These categories have both the highest rate of coupon usage and the largest average bills. Moving on to income, retailers should target those who make less than 75k. They should especially put a focus on the 15K-24K demographic as they have an oddly low usage and could benefit from using more. They should also avoid the 74K-99K group because this group is much more effective with their coupon usage. Looking at the useage by kids retailers could benefit from targeting households with multiple kids. From these findings they have a very low coupon usage and that is a group that would benefit from having more access. Similarly, they could also focus on targeting unmarried people as they may be more likely to need these coupons and don’t often use them. I was honestly a little disappointed in my findings, I thought I would find more definitive patterns that could be used by retailers to increase their profits. However, these findings were still interesting and should be considered for future promotions.