Introduction

The Business Problem

In this report, the question I will be addressing is “How can we optimize coupon-related profits? Who should we send coupons to and at what time?” Currently, Regork does not have a strategy for delivering coupons to specific customers at specific times.

Explanation of How Business Problem is Addressed

To help increase/optimize coupon-related profits, I am going to look at coupon redemption counts across different consumer segments and months. Tables, bar charts, and line charts were generated using coupon redemption data, demographic data, and transaction data. Additionally, the visuals shown in this report will be briefly discussed in a summary at the end of this presentation.

Explanation of how Analysis with help the Regork CEO

This will enable the CEO to conveniently compare coupon redemptions across the different consumer segments and month, which in turn will help him offset coupon delivery costs.

Packages

Importing the packages

library(completejourney)
library(tidyverse)
library(lubridate)
library(DT)

Explanation for each of the packages

completejourney:
- Package required for access to the data used for generating insights into this report
tidyverse:
- Has both ggplot2 (required for visualizations) and dplyr (required for data frame transformations and creation)
Lubridate:
- Has date-time functions required for the time-series plot used for this analysis
DT:
- Allows end user to preview dataframe in a readable/easy-to-digest fashion in an R markdown file

Exploratory Data Analysis

Pre-Requisite

Prior to looking at coupon redemptions by different customer groups and different months, going to create a couple dataframes via joining pre-existing datasources within the completejourney package.

# combining dmographics and coupon redemptions datasets
cp_demo <- demographics %>%
  inner_join(coupon_redemptions)

# combining transactions and coupon redemptions datasets
cp_trans <- get_transactions() %>%
  inner_join(coupon_redemptions)

Creation and Preview of the Needed Tables

Before creating the tables required for this analysis, going to create function to avoid repetitive code. See function for creating counts and percentages in dataframe below.

cts_pcts <- function(df, grouping) {
  df %>% group_by({{grouping}}) %>%
    summarize(redemption_count = n()) %>%
    mutate(percent_of_coupon_redemptions = 100* redemption_count/sum(redemption_count))
}

Each of the following tables will be grouped accordingly – each will have counts and percentages by their respective groupings, as well as green color-coding in top 3 rows by coupon redemptions:

Coupon Redemptions by income class
Coupon Redemptions by age
Coupon Redemptions by household size
Coupon Redemptions by month

Coupon Redemptions by income class

# create the dataframe
cps_by_income <- cts_pcts(cp_demo, income)

# preview data table with top 3 rows manually color coded in green
datatable(cps_by_income) %>% formatStyle(
  'income',
  target = 'row',
  backgroundColor = styleEqual(c('35-49K','50-74K','75-99K'), c('green'))
)

Coupon Redemptions by age

# create the dataframe
cps_by_age <- cts_pcts(cp_demo, age)

# preview data table with top 3 rows manually color coded in green
datatable(cps_by_age) %>% formatStyle(
  'age',
  target = 'row',
  backgroundColor = styleEqual(c('25-34','35-44','45-54'), c('green'))
)

Coupon Redemptions by household size

# create the dataframe
cps_by_hhs <- cts_pcts(cp_demo, household_size)

# preview data table with top 3 rows manually color coded in green
datatable(cps_by_hhs) %>% formatStyle(
  'household_size',
  target = 'row',
  backgroundColor = styleEqual(c('1','2','3'), c('green'))
)

Coupon Redemptions by Month

# create the dataframe
cps_by_month <- cts_pcts(cp_trans, month(redemption_date))

# preview data table with top 3 rows manually color coded in green
datatable(cps_by_month) %>% formatStyle(
  'month(redemption_date)',
  target = 'row',
  backgroundColor = styleEqual(c('5','8','11'), c('green'))
)

Visualizations for Each of the Coupon Redemption Groupings

Once again, the top three categories – according to their coupon redemption counts – will be color-coded in green.

Coupon Redemptions by income class

cps_by_income %>%
  ggplot(aes(x=income, y=percent_of_coupon_redemptions, fill = income)) +
  geom_col() +
  labs(
    title = "Coupon Redemptions by Income Group",
    x = "Income Group",
    y = "Percent of Total Coupon Redemptions (%)") + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  theme(plot.subtitle = element_text(hjust = 0.5)) + 
  theme(plot.title = element_text(hjust = 0.5, face = "bold")) + 
  scale_fill_manual(values = c("35-49K" = "green",
                               "50-74K" = "green", 
                               "75-99K" = "green")) + 
  theme(legend.position='none')

Coupon Redemptions by age

cps_by_age %>%
  ggplot(aes(x=age, y=percent_of_coupon_redemptions, fill = age)) +
  geom_col() +
  labs(
    title = "Coupon Redemptions by Age",
    x = "Age",
    y = "Percent of Total Coupon Redemptions (%)") + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  theme(plot.subtitle = element_text(hjust = 0.5)) + 
  theme(plot.title = element_text(hjust = 0.5, face = "bold")) + 
  scale_fill_manual(values = c("25-34" = "green",
                               "35-44" = "green", 
                               "45-54" = "green")) + 
  theme(legend.position='none')

Coupon Redemptions by household size

cps_by_hhs %>%
  ggplot(aes(x=household_size, y=percent_of_coupon_redemptions, fill = household_size)) +
  geom_col() +
  labs(
    title = "Coupon Redemptions by Household Size",
    x = "Household Size",
    y = "Percent of Total Coupon Redemptions (%)") + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  theme(plot.subtitle = element_text(hjust = 0.5)) + 
  theme(plot.title = element_text(hjust = 0.5, face = "bold")) + 
  scale_fill_manual(values = c("1" = "green",
                               "2" = "green", 
                               "3" = "green")) + 
  theme(legend.position='none')

Coupon Redemptions by month

cps_by_month %>%
  ggplot(aes(x=`month(redemption_date)`, y=percent_of_coupon_redemptions)) +
  geom_line(color = "blue") +
  scale_x_continuous(breaks = seq(1, 12, by = 1)) + 
  labs(
    title = "Coupon Redemptions by Month",
    x = "Month (1 = January, 2 = February, ...)",
    y = "Percent of Total Coupon Redemptions (%)") + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  theme(plot.subtitle = element_text(hjust = 0.5)) + 
  theme(plot.title = element_text(hjust = 0.5, face = "bold")) + 
  annotate("text", x = 5, y= 15, label = "Start of Summer!") + 
  annotate("text", x = 8, y= 23, label = "Back to School Shopping!") + 
  annotate("text", x = 11, y= 25, label = "Christmas Shopping!")

Summary

The Problem Statement summarized:

Regork could be more tactful in deciding to whom (consumer segment) and when they deliver their coupons.

How I addressed the problem statement:

Data Used (from the “completejourney” package):

get_transactions()
demographics
coupon_redemptions

Methodology

Created the following tables:
- Coupon Redemptions by Income Group
- Coupon Redemptions by Age
- Coupon Redemptions by Household Size
- Coupon Redemptions by Month
Created Graphs for each of the above tables:
- Bar Charts for coupon redemptions by each of the consumer groups w/ top three groups highlighted
- Line Chart for the coupon redemptions by month w/ annotations at the top three months

Interesting Insights

Coupon Redemptions by income class

The “middle class” income groups have the most coupon redemptions:
- 50-74k
- 75-99k
- 35-49k

Coupon Redemptions by income age

“Post-College” to “Middle Aged” age groups have the most coupon redemptions:
- 45-54
- 35-44
- 25-34

Coupon Redemptions by household size

Smaller households – 1-2 people – redeem the most coupons

Coupon Redemptions by month

Coupon redemption counts are highest in the months of May, August, and November

Implications/Recommendations

Deliver more coupons to the following consumer segments and deliver fewer to the others they were compared to:
- “Middle Class” income households
- Post-College to Middle-Aged
- Smaller Households (1-2 People)
Deliver more coupons during the following months and deliver fewer during the other months:
- May
- August
- November
The hunches behind the “high coupon redemption” months:
- May: Start of Summer!
- August: Back to School!
- November: Christmas Shopping!

While I do not have exact odds/probabilities that a coupon would be redeemed within a given month or be redeemed by a given consumer segment, I think it is safe to assume that these percentages indicate the relative likelihoods.

Limitations of the Analysis

Limitations:

We only had 2017 data for this analysis – the trends/patterns identified could have changed since then
- Example:
  - Coupon Redemptions for 2020 could have been highest in March instead of November
Did not have exact figures for a couple of the attributes I looked at. Could have looked at these figures in greater detail if I had exact figures intead of “buckets”/approximations
- Incomes/Salaries
- Ages

How to Improve upon limitations:

In the future, the datasets could cover the relevant information for the 2017-2020 timeframe, as opposed to just the 2017 timeframe
In the future, the income/salaries and ages listed could be exact figures, as opposed to bins/ranges

Benjamin Harris – Final Project – Report

Benjamin Harris

2024-02-22

Introduction

The Business Problem

Explanation of How Business Problem is Addressed

Explanation of how Analysis with help the Regork CEO

Packages

Importing the packages

Explanation for each of the packages

Exploratory Data Analysis

Pre-Requisite

Creation and Preview of the Needed Tables

Visualizations for Each of the Coupon Redemption Groupings

Summary

The Problem Statement summarized:

How I addressed the problem statement:

Interesting Insights

Implications/Recommendations

Limitations of the Analysis