Introduction

This report explores the impact of coupon usage on customer spending at Regork. We are going to analyze whether customers who use coupons tend to spend more than those who do not and identify which coupons are most effective in driving sales volume. We will try to provide actionable insights into the value of coupons in increasing revenue, suggesting future strategies for maximum impact. To conduct the data analysis, we utilized several key libraries, including completejourney, tidyverse, dplyr, and ggplot2. Initially, we performed an overview of each dataset to gain a comprehensive understanding of their structure, which allowed us to efficiently join and manipulate the data. Our analysis focused on comparing the total spending and transaction counts of customers who used coupons versus those who did not. To ensure a more accurate comparison, we calculated the average spending per transaction, offering a clearer representation of the differences between the two groups.

Packages Required

completejourney- data sets characterizing household level transactions

tidyverse- system of packages for data manipulation, exploration and visualization

dplyr - manipulating and transforming data (i.e., filtering, joining, etc.)

ggplot2 - data visualization plotting system using “Grammar of Graphics”

knitr - dynamic report generation in R

#load libraries
library(completejourney)
library(tidyverse)
# Load datasets
transactions <- get_transactions()  # Load full transactions data
coupons <- completejourney::coupons # Load coupons
coupon_redemptions <- completejourney::coupon_redemptions  # Load coupon redemption data
#showing different ways to inspect data structure
glimpse(transactions)
## Rows: 1,469,307
## Columns: 11
## $ household_id          <chr> "900", "900", "1228", "906", "906", "906", "906"…
## $ store_id              <chr> "330", "330", "406", "319", "319", "319", "319",…
## $ basket_id             <chr> "31198570044", "31198570047", "31198655051", "31…
## $ product_id            <chr> "1095275", "9878513", "1041453", "1020156", "105…
## $ quantity              <dbl> 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ sales_value           <dbl> 0.50, 0.99, 1.43, 1.50, 2.78, 5.49, 1.50, 1.88, …
## $ retail_disc           <dbl> 0.00, 0.10, 0.15, 0.29, 0.80, 0.50, 0.29, 0.21, …
## $ coupon_disc           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ coupon_match_disc     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ week                  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ transaction_timestamp <dttm> 2017-01-01 06:53:26, 2017-01-01 07:10:28, 2017-…
str(coupons)
## spc_tbl_ [116,204 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ coupon_upc : chr [1:116204] "10000085207" "10000085207" "10000085207" "10000085207" ...
##  $ product_id : chr [1:116204] "9676830" "9676943" "9676944" "9676947" ...
##  $ campaign_id: chr [1:116204] "26" "26" "26" "26" ...
str(coupon_redemptions)
## spc_tbl_ [2,102 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ household_id   : chr [1:2102] "1029" "1029" "165" "712" ...
##  $ coupon_upc     : chr [1:2102] "51380041013" "51380041313" "53377610033" "51380041013" ...
##  $ campaign_id    : chr [1:2102] "26" "26" "26" "26" ...
##  $ redemption_date: Date[1:2102], format: "2017-01-01" "2017-01-01" ...
# checking for any missing values in transactions data set
sum(is.na(transactions))
## [1] 0
sum(is.na(coupons))
## [1] 0
sum(is.na(coupon_redemptions))
## [1] 0
transactions_with_coupons <- transactions %>%
  left_join(coupon_redemptions, by = "household_id") %>%
  mutate(coupon_used = ifelse(!is.na(coupon_upc), 1, 0))

# Check the joined data
glimpse(transactions_with_coupons)
## Rows: 3,970,675
## Columns: 15
## $ household_id          <chr> "900", "900", "1228", "1228", "1228", "1228", "1…
## $ store_id              <chr> "330", "330", "406", "406", "406", "406", "406",…
## $ basket_id             <chr> "31198570044", "31198570047", "31198655051", "31…
## $ product_id            <chr> "1095275", "9878513", "1041453", "1041453", "104…
## $ quantity              <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, …
## $ sales_value           <dbl> 0.50, 0.99, 1.43, 1.43, 1.43, 1.43, 1.43, 1.43, …
## $ retail_disc           <dbl> 0.00, 0.10, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, …
## $ coupon_disc           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ coupon_match_disc     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ week                  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ transaction_timestamp <dttm> 2017-01-01 06:53:26, 2017-01-01 07:10:28, 2017-…
## $ coupon_upc            <chr> NA, NA, "10000085363", "10000085429", "550000130…
## $ campaign_id           <chr> NA, NA, "8", "13", "13", "18", "18", "18", NA, N…
## $ redemption_date       <date> NA, NA, 2017-05-22, 2017-09-12, 2017-09-12, 201…
## $ coupon_used           <dbl> 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
# Summarize total spending by coupon usage
spending_summary <- transactions_with_coupons %>%
  group_by(coupon_used) %>%
  summarise(
    total_spending = sum(sales_value, na.rm = TRUE),  # Total sales
    avg_spending_per_basket = mean(sales_value, na.rm = TRUE),  # Average spending per basket
    transaction_count = n()  # Number of transactions
  )

# View the summary
print(spending_summary)
## # A tibble: 2 × 4
##   coupon_used total_spending avg_spending_per_basket transaction_count
##         <dbl>          <dbl>                   <dbl>             <int>
## 1           0       3001246.                    3.11            966179
## 2           1       9652022.                    3.21           3004496

Coupon Effectiveness: Holiday vs. Non-Holiday Periods

The analysis compares the total spending by customers who used coupons during the holiday season and non-holiday periods. The bar chart highlights that customers using coupons (blue bars) consistently spent more than non-coupon users (red bars) in both periods. During non-holiday periods, coupon users drove a significant amount of total spending, over $8.7 million, compared to $2.7 million for non-coupon users. In contrast, during the holiday season, coupon users contributed just under $1 million, while non-coupon users spent $271,968. Interestingly, the average spending per basket was slightly higher during the holiday season for both coupon and non-coupon users, indicating that holiday shoppers may make larger purchases in individual transactions, but overall spending is much higher outside of holiday periods.

This suggests that while coupons are effective year-round, they are particularly impactful during non-holiday periods when customers may be more influenced by promotions or everyday purchases. Regork could consider optimizing coupon strategies during non-holiday times, where the return on investment appears to be higher, while maintaining holiday-specific offers to capture seasonal spending behavior.

# Summarize coupon effectiveness by holiday periods
coupon_holiday_effectiveness <- transactions_with_coupons %>%
  group_by(holiday, coupon_used) %>%
  summarise(
    total_spending = sum(sales_value, na.rm = TRUE),
    avg_spending_per_basket = mean(sales_value, na.rm = TRUE),
    redemption_count = n()
  ) %>%
  arrange(desc(total_spending))

# View the summarized data
print(coupon_holiday_effectiveness)
## # A tibble: 4 × 5
## # Groups:   holiday [2]
##   holiday     coupon_used total_spending avg_spending_per_bas…¹ redemption_count
##   <chr>             <dbl>          <dbl>                  <dbl>            <int>
## 1 Non-Holiday           1       8708096.                   3.19          2729369
## 2 Non-Holiday           0       2729278.                   3.10           881628
## 3 Holiday Se…           1        943926.                   3.43           275127
## 4 Holiday Se…           0        271968.                   3.22            84551
## # ℹ abbreviated name: ¹​avg_spending_per_basket
ggplot(coupon_holiday_effectiveness, aes(x = holiday, y = total_spending, fill = factor(coupon_used))) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("red", "blue"), labels = c("No Coupon", "Coupon Used")) +
  labs(
    title = "Coupon Effectiveness: Holiday Season vs. Non-Holiday",
    x = "Holiday Period",
    y = "Total Spending ($)",
    fill = "Coupon Usage"
  ) +
  theme_minimal()


Analyzing effectiveness of coupons by Product Departments

This analysis examines coupon usage effectiveness across different product departments. As seen in the chart, the Grocery department overwhelmingly leads in both coupon-driven spending (blue bar) and non-coupon spending (red bar), contributing over $4.7 million from coupon users alone. Departments such as Drug GM, Fuel, and Produce also show significant coupon-driven spending, with Drug GM and Fuel standing out in terms of high average spending per basket. In contrast, smaller departments such as Miscellaneous and Pastry display minimal coupon impact, suggesting that coupons have less influence in these categories. Overall, coupons are most effective in high-traffic departments like Grocery and Drug GM, where they seem to drive both higher total and average spending. This suggests that optimizing coupon strategies in these key departments could further enhance revenue, while reducing or redesigning coupon offers in departments where they have a lower impact could cut costs.

transactions <- get_transactions()
products <- completejourney::products
coupons <- completejourney::coupons
coupon_redemptions <- completejourney::coupon_redemptions

products_clean <- products %>% distinct(product_id, .keep_all = TRUE)
coupons_clean <- coupons %>% distinct(coupon_upc, .keep_all = TRUE)

# Join transactions with coupon redemptions by household_id
transactions_with_coupons <- transactions %>%
  left_join(coupon_redemptions, by = "household_id") %>%
  mutate(coupon_used = ifelse(!is.na(coupon_upc), 1, 0))  # Flag coupon usage

# Now join the resulting data with product details using product_id to get department
transactions_with_coupons <- transactions_with_coupons %>%
  left_join(products_clean, by = "product_id")

# Inspect the joined data
glimpse(transactions_with_coupons)
## Rows: 3,970,675
## Columns: 21
## $ household_id          <chr> "900", "900", "1228", "1228", "1228", "1228", "1…
## $ store_id              <chr> "330", "330", "406", "406", "406", "406", "406",…
## $ basket_id             <chr> "31198570044", "31198570047", "31198655051", "31…
## $ product_id            <chr> "1095275", "9878513", "1041453", "1041453", "104…
## $ quantity              <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, …
## $ sales_value           <dbl> 0.50, 0.99, 1.43, 1.43, 1.43, 1.43, 1.43, 1.43, …
## $ retail_disc           <dbl> 0.00, 0.10, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, …
## $ coupon_disc           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ coupon_match_disc     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ week                  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ transaction_timestamp <dttm> 2017-01-01 06:53:26, 2017-01-01 07:10:28, 2017-…
## $ coupon_upc            <chr> NA, NA, "10000085363", "10000085429", "550000130…
## $ campaign_id           <chr> NA, NA, "8", "13", "13", "18", "18", "18", NA, N…
## $ redemption_date       <date> NA, NA, 2017-05-22, 2017-09-12, 2017-09-12, 201…
## $ coupon_used           <dbl> 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ manufacturer_id       <chr> "2", "69", "69", "69", "69", "69", "69", "69", "…
## $ department            <chr> "PASTRY", "GROCERY", "GROCERY", "GROCERY", "GROC…
## $ brand                 <fct> National, Private, Private, Private, Private, Pr…
## $ product_category      <chr> "ROLLS", "FACIAL TISS/DNR NAPKIN", "BAG SNACKS",…
## $ product_type          <chr> "ROLLS: BAGELS", "FACIAL TISSUE & PAPER HANDKE",…
## $ package_size          <chr> "4 OZ", "85 CT", "11.5 OZ", "11.5 OZ", "11.5 OZ"…
# Summarize coupon effectiveness by product department with .groups = 'drop' to remove the warning
coupon_effectiveness_by_department <- transactions_with_coupons %>%
  group_by(department, coupon_used) %>%
  summarise(
    total_spending = sum(sales_value, na.rm = TRUE),
    avg_spending_per_basket = mean(sales_value, na.rm = TRUE),
    redemption_count = n(),
    .groups = 'drop'  # This will suppress the grouping message
  ) %>%
  arrange(desc(total_spending))

# View the summary of coupon effectiveness by department
print(coupon_effectiveness_by_department)
## # A tibble: 59 × 5
##    department coupon_used total_spending avg_spending_per_bas…¹ redemption_count
##    <chr>            <dbl>          <dbl>                  <dbl>            <int>
##  1 GROCERY              1       4717628.                   2.50          1886556
##  2 GROCERY              0       1522675.                   2.48           613966
##  3 DRUG GM              1       1214013.                   3.91           310721
##  4 FUEL                 1       1010241.                  29.1             34767
##  5 PRODUCE              1        695108.                   2.10           330865
##  6 MEAT                 1        589397.                   6.32            93289
##  7 MEAT-PCKGD           1        409560.                   3.69           110936
##  8 DRUG GM              0        396592.                   3.81           103981
##  9 DELI                 1        296191.                   4.21            70433
## 10 MISCELLAN…           1        230879.                  19.7             11711
## # ℹ 49 more rows
## # ℹ abbreviated name: ¹​avg_spending_per_basket
# Summarize coupon effectiveness by product department, filter out NA departments and no sales
coupon_effectiveness_by_department <- transactions_with_coupons %>%
  group_by(department, coupon_used) %>%
  summarise(
    total_spending = sum(sales_value, na.rm = TRUE),
    avg_spending_per_basket = mean(sales_value, na.rm = TRUE),
    redemption_count = n(),
    .groups = 'drop'
  ) %>%
  filter(!is.na(department), total_spending > 10000) %>%  # Filter out NA and departments with no sales
  arrange(desc(total_spending))

# View the filtered summary
print(coupon_effectiveness_by_department)
## # A tibble: 32 × 5
##    department coupon_used total_spending avg_spending_per_bas…¹ redemption_count
##    <chr>            <dbl>          <dbl>                  <dbl>            <int>
##  1 GROCERY              1       4717628.                   2.50          1886556
##  2 GROCERY              0       1522675.                   2.48           613966
##  3 DRUG GM              1       1214013.                   3.91           310721
##  4 FUEL                 1       1010241.                  29.1             34767
##  5 PRODUCE              1        695108.                   2.10           330865
##  6 MEAT                 1        589397.                   6.32            93289
##  7 MEAT-PCKGD           1        409560.                   3.69           110936
##  8 DRUG GM              0        396592.                   3.81           103981
##  9 DELI                 1        296191.                   4.21            70433
## 10 MISCELLAN…           1        230879.                  19.7             11711
## # ℹ 22 more rows
## # ℹ abbreviated name: ¹​avg_spending_per_basket
# Visualize total spending by department and coupon usage
ggplot(coupon_effectiveness_by_department, aes(x = reorder(department, total_spending), y = total_spending, fill = factor(coupon_used))) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  scale_fill_manual(values = c("red", "blue"), labels = c("No Coupon", "Coupon Used")) +
  labs(
    title = "Coupon Effectiveness by Product Department",
    x = "Department",
    y = "Total Spending ($)",
    fill = "Coupon Usage"
  ) +
  theme_minimal()


Summary and Recommendations

This analysis clearly shows that coupon users at Regork consistently outspent non-coupon users, with higher total sales and average spending per transaction, particularly in key departments such as Grocery, Drug GM, and Produce. Coupons were especially effective during non-holiday periods and weekdays, driving substantial increases in spending outside traditional peak seasons. These findings highlight that coupons are a powerful tool for influencing regular shopping habits and driving consistent revenue. Given these insights, we recommend that Regork focus its coupon strategies on high-performing departments like Grocery and Drug GM, where they have proven most effective. Additionally, by offering weekday-specific promotions, Regork can capitalize on the stronger coupon-related spending seen during these times, further boosting sales during quieter shopping days. Emphasizing coupon campaigns during non-holiday periods also provides a significant opportunity to increase sales during non-seasonal times, where coupon use drives greater customer engagement. Finally, resources should be reallocated away from lower-impact departments, such as Pastry and Miscellaneous, where coupons showed minimal effect, allowing Regork to optimize marketing spend.

The CEO should accept these recommendations because they are data-driven and focused on maximizing return on investment. By concentrating efforts where coupons have the most impact, Regork can enhance overall sales, improve customer engagement, and reduce unnecessary marketing expenditures in areas that are not yielding sufficient returns. These recommendations provide actionable strategies that can lead to both immediate revenue increases and long-term customer loyalty, making them essential for Regork’s growth and competitive advantage in the market.