This report explores the impact of coupon usage on customer spending at Regork. We are going to analyze whether customers who use coupons tend to spend more than those who do not and identify which coupons are most effective in driving sales volume. We will try to provide actionable insights into the value of coupons in increasing revenue, suggesting future strategies for maximum impact. To conduct the data analysis, we utilized several key libraries, including completejourney, tidyverse, dplyr, and ggplot2. Initially, we performed an overview of each dataset to gain a comprehensive understanding of their structure, which allowed us to efficiently join and manipulate the data. Our analysis focused on comparing the total spending and transaction counts of customers who used coupons versus those who did not. To ensure a more accurate comparison, we calculated the average spending per transaction, offering a clearer representation of the differences between the two groups.
completejourney- data sets characterizing household level transactions
tidyverse- system of packages for data manipulation, exploration and visualization
dplyr - manipulating and transforming data (i.e., filtering, joining, etc.)
ggplot2 - data visualization plotting system using “Grammar of Graphics”
knitr - dynamic report generation in R
#load libraries
library(completejourney)
library(tidyverse)
# Load datasets
transactions <- get_transactions() # Load full transactions data
coupons <- completejourney::coupons # Load coupons
coupon_redemptions <- completejourney::coupon_redemptions # Load coupon redemption data
#showing different ways to inspect data structure
glimpse(transactions)
## Rows: 1,469,307
## Columns: 11
## $ household_id <chr> "900", "900", "1228", "906", "906", "906", "906"…
## $ store_id <chr> "330", "330", "406", "319", "319", "319", "319",…
## $ basket_id <chr> "31198570044", "31198570047", "31198655051", "31…
## $ product_id <chr> "1095275", "9878513", "1041453", "1020156", "105…
## $ quantity <dbl> 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ sales_value <dbl> 0.50, 0.99, 1.43, 1.50, 2.78, 5.49, 1.50, 1.88, …
## $ retail_disc <dbl> 0.00, 0.10, 0.15, 0.29, 0.80, 0.50, 0.29, 0.21, …
## $ coupon_disc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ coupon_match_disc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ week <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ transaction_timestamp <dttm> 2017-01-01 06:53:26, 2017-01-01 07:10:28, 2017-…
str(coupons)
## spc_tbl_ [116,204 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ coupon_upc : chr [1:116204] "10000085207" "10000085207" "10000085207" "10000085207" ...
## $ product_id : chr [1:116204] "9676830" "9676943" "9676944" "9676947" ...
## $ campaign_id: chr [1:116204] "26" "26" "26" "26" ...
str(coupon_redemptions)
## spc_tbl_ [2,102 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ household_id : chr [1:2102] "1029" "1029" "165" "712" ...
## $ coupon_upc : chr [1:2102] "51380041013" "51380041313" "53377610033" "51380041013" ...
## $ campaign_id : chr [1:2102] "26" "26" "26" "26" ...
## $ redemption_date: Date[1:2102], format: "2017-01-01" "2017-01-01" ...
# checking for any missing values in transactions data set
sum(is.na(transactions))
## [1] 0
sum(is.na(coupons))
## [1] 0
sum(is.na(coupon_redemptions))
## [1] 0
transactions_with_coupons <- transactions %>%
left_join(coupon_redemptions, by = "household_id") %>%
mutate(coupon_used = ifelse(!is.na(coupon_upc), 1, 0))
# Check the joined data
glimpse(transactions_with_coupons)
## Rows: 3,970,675
## Columns: 15
## $ household_id <chr> "900", "900", "1228", "1228", "1228", "1228", "1…
## $ store_id <chr> "330", "330", "406", "406", "406", "406", "406",…
## $ basket_id <chr> "31198570044", "31198570047", "31198655051", "31…
## $ product_id <chr> "1095275", "9878513", "1041453", "1041453", "104…
## $ quantity <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, …
## $ sales_value <dbl> 0.50, 0.99, 1.43, 1.43, 1.43, 1.43, 1.43, 1.43, …
## $ retail_disc <dbl> 0.00, 0.10, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, …
## $ coupon_disc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ coupon_match_disc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ week <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ transaction_timestamp <dttm> 2017-01-01 06:53:26, 2017-01-01 07:10:28, 2017-…
## $ coupon_upc <chr> NA, NA, "10000085363", "10000085429", "550000130…
## $ campaign_id <chr> NA, NA, "8", "13", "13", "18", "18", "18", NA, N…
## $ redemption_date <date> NA, NA, 2017-05-22, 2017-09-12, 2017-09-12, 201…
## $ coupon_used <dbl> 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
# Summarize total spending by coupon usage
spending_summary <- transactions_with_coupons %>%
group_by(coupon_used) %>%
summarise(
total_spending = sum(sales_value, na.rm = TRUE), # Total sales
avg_spending_per_basket = mean(sales_value, na.rm = TRUE), # Average spending per basket
transaction_count = n() # Number of transactions
)
# View the summary
print(spending_summary)
## # A tibble: 2 × 4
## coupon_used total_spending avg_spending_per_basket transaction_count
## <dbl> <dbl> <dbl> <int>
## 1 0 3001246. 3.11 966179
## 2 1 9652022. 3.21 3004496
In this analysis, we explored the impact of coupon usage during weekends compared to weekdays. Using transactional data, we categorized each transaction as either a weekend or weekday based on the is_weekend variable (where 1 represents weekend transactions and 0 represents weekday transactions). We also examined whether coupons were used during each transaction.
From the bar plot, we can observe that customers who used coupons (represented by the blue bars) spent significantly more than those who did not use coupons (red bars), both during weekdays and weekends. Specifically, total spending for customers using coupons on weekdays far exceeded both non-coupon users and weekend coupon users. This suggests that coupon usage drives higher spending during weekdays, which could be attributed to higher traffic or promotional activities targeted at weekday shoppers. Additionally, while weekend shoppers who used coupons also showed a boost in spending, the overall difference between weekday and weekend coupon users was substantial, with weekday coupon users contributing over $6 million in spending compared to $3.4 million from weekend coupon users.
The data implies that coupons are effective in increasing spending regardless of the day, but their impact is particularly pronounced during weekdays. This could suggest an opportunity for Regork to optimize its coupon strategies by focusing promotions on weekdays, when customers seem more responsive to using coupons.
transactions_with_coupons <- transactions_with_coupons %>%
mutate(
date = as.Date(transaction_timestamp), # Ensure this is your date column
day_of_week = weekdays(date), # Extract the day of the week
is_weekend = ifelse(day_of_week %in% c("Saturday", "Sunday"), 1, 0), # Identify weekends
month = months(date), # Extract the month
year = format(date, "%Y"), # Extract year for potential seasonal analysis
holiday = ifelse(month == "December", "Holiday Season", "Non-Holiday") # Example of holiday flag (December as holiday season)
)
glimpse(transactions_with_coupons)
## Rows: 3,970,675
## Columns: 21
## $ household_id <chr> "900", "900", "1228", "1228", "1228", "1228", "1…
## $ store_id <chr> "330", "330", "406", "406", "406", "406", "406",…
## $ basket_id <chr> "31198570044", "31198570047", "31198655051", "31…
## $ product_id <chr> "1095275", "9878513", "1041453", "1041453", "104…
## $ quantity <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, …
## $ sales_value <dbl> 0.50, 0.99, 1.43, 1.43, 1.43, 1.43, 1.43, 1.43, …
## $ retail_disc <dbl> 0.00, 0.10, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, …
## $ coupon_disc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ coupon_match_disc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ week <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ transaction_timestamp <dttm> 2017-01-01 06:53:26, 2017-01-01 07:10:28, 2017-…
## $ coupon_upc <chr> NA, NA, "10000085363", "10000085429", "550000130…
## $ campaign_id <chr> NA, NA, "8", "13", "13", "18", "18", "18", NA, N…
## $ redemption_date <date> NA, NA, 2017-05-22, 2017-09-12, 2017-09-12, 201…
## $ coupon_used <dbl> 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ date <date> 2017-01-01, 2017-01-01, 2017-01-01, 2017-01-01,…
## $ day_of_week <chr> "Sunday", "Sunday", "Sunday", "Sunday", "Sunday"…
## $ is_weekend <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ month <chr> "January", "January", "January", "January", "Jan…
## $ year <chr> "2017", "2017", "2017", "2017", "2017", "2017", …
## $ holiday <chr> "Non-Holiday", "Non-Holiday", "Non-Holiday", "No…
# Summarize coupon effectiveness by weekend usage
coupon_weekend_effectiveness <- transactions_with_coupons %>%
group_by(is_weekend, coupon_used) %>%
summarise(
total_spending = sum(sales_value, na.rm = TRUE),
avg_spending_per_basket = mean(sales_value, na.rm = TRUE),
redemption_count = n()
) %>%
arrange(desc(total_spending))
# View the summarized data
print(coupon_weekend_effectiveness)
## # A tibble: 4 × 5
## # Groups: is_weekend [2]
## is_weekend coupon_used total_spending avg_spending_per_basket redemption_count
## <dbl> <dbl> <dbl> <dbl> <int>
## 1 0 1 6218804. 3.25 1910765
## 2 1 1 3433218. 3.14 1093731
## 3 0 0 1950814. 3.10 629447
## 4 1 0 1050432. 3.12 336732
ggplot(coupon_weekend_effectiveness, aes(x = factor(is_weekend), y = total_spending, fill = factor(coupon_used))) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c("red", "blue"), labels = c("No Coupon", "Coupon Used")) +
labs(
title = "Coupon Effectiveness: Weekend vs. Weekday",
x = "Weekend (1 = Yes, 0 = No)",
y = "Total Spending ($)",
fill = "Coupon Usage"
) +
theme_minimal()
The analysis compares the total spending by customers who used coupons during the holiday season and non-holiday periods. The bar chart highlights that customers using coupons (blue bars) consistently spent more than non-coupon users (red bars) in both periods. During non-holiday periods, coupon users drove a significant amount of total spending, over $8.7 million, compared to $2.7 million for non-coupon users. In contrast, during the holiday season, coupon users contributed just under $1 million, while non-coupon users spent $271,968. Interestingly, the average spending per basket was slightly higher during the holiday season for both coupon and non-coupon users, indicating that holiday shoppers may make larger purchases in individual transactions, but overall spending is much higher outside of holiday periods.
This suggests that while coupons are effective year-round, they are particularly impactful during non-holiday periods when customers may be more influenced by promotions or everyday purchases. Regork could consider optimizing coupon strategies during non-holiday times, where the return on investment appears to be higher, while maintaining holiday-specific offers to capture seasonal spending behavior.
# Summarize coupon effectiveness by holiday periods
coupon_holiday_effectiveness <- transactions_with_coupons %>%
group_by(holiday, coupon_used) %>%
summarise(
total_spending = sum(sales_value, na.rm = TRUE),
avg_spending_per_basket = mean(sales_value, na.rm = TRUE),
redemption_count = n()
) %>%
arrange(desc(total_spending))
# View the summarized data
print(coupon_holiday_effectiveness)
## # A tibble: 4 × 5
## # Groups: holiday [2]
## holiday coupon_used total_spending avg_spending_per_bas…¹ redemption_count
## <chr> <dbl> <dbl> <dbl> <int>
## 1 Non-Holiday 1 8708096. 3.19 2729369
## 2 Non-Holiday 0 2729278. 3.10 881628
## 3 Holiday Se… 1 943926. 3.43 275127
## 4 Holiday Se… 0 271968. 3.22 84551
## # ℹ abbreviated name: ¹avg_spending_per_basket
ggplot(coupon_holiday_effectiveness, aes(x = holiday, y = total_spending, fill = factor(coupon_used))) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c("red", "blue"), labels = c("No Coupon", "Coupon Used")) +
labs(
title = "Coupon Effectiveness: Holiday Season vs. Non-Holiday",
x = "Holiday Period",
y = "Total Spending ($)",
fill = "Coupon Usage"
) +
theme_minimal()
This analysis examines coupon usage effectiveness across different product departments. As seen in the chart, the Grocery department overwhelmingly leads in both coupon-driven spending (blue bar) and non-coupon spending (red bar), contributing over $4.7 million from coupon users alone. Departments such as Drug GM, Fuel, and Produce also show significant coupon-driven spending, with Drug GM and Fuel standing out in terms of high average spending per basket. In contrast, smaller departments such as Miscellaneous and Pastry display minimal coupon impact, suggesting that coupons have less influence in these categories. Overall, coupons are most effective in high-traffic departments like Grocery and Drug GM, where they seem to drive both higher total and average spending. This suggests that optimizing coupon strategies in these key departments could further enhance revenue, while reducing or redesigning coupon offers in departments where they have a lower impact could cut costs.
transactions <- get_transactions()
products <- completejourney::products
coupons <- completejourney::coupons
coupon_redemptions <- completejourney::coupon_redemptions
products_clean <- products %>% distinct(product_id, .keep_all = TRUE)
coupons_clean <- coupons %>% distinct(coupon_upc, .keep_all = TRUE)
# Join transactions with coupon redemptions by household_id
transactions_with_coupons <- transactions %>%
left_join(coupon_redemptions, by = "household_id") %>%
mutate(coupon_used = ifelse(!is.na(coupon_upc), 1, 0)) # Flag coupon usage
# Now join the resulting data with product details using product_id to get department
transactions_with_coupons <- transactions_with_coupons %>%
left_join(products_clean, by = "product_id")
# Inspect the joined data
glimpse(transactions_with_coupons)
## Rows: 3,970,675
## Columns: 21
## $ household_id <chr> "900", "900", "1228", "1228", "1228", "1228", "1…
## $ store_id <chr> "330", "330", "406", "406", "406", "406", "406",…
## $ basket_id <chr> "31198570044", "31198570047", "31198655051", "31…
## $ product_id <chr> "1095275", "9878513", "1041453", "1041453", "104…
## $ quantity <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, …
## $ sales_value <dbl> 0.50, 0.99, 1.43, 1.43, 1.43, 1.43, 1.43, 1.43, …
## $ retail_disc <dbl> 0.00, 0.10, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, …
## $ coupon_disc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ coupon_match_disc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ week <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ transaction_timestamp <dttm> 2017-01-01 06:53:26, 2017-01-01 07:10:28, 2017-…
## $ coupon_upc <chr> NA, NA, "10000085363", "10000085429", "550000130…
## $ campaign_id <chr> NA, NA, "8", "13", "13", "18", "18", "18", NA, N…
## $ redemption_date <date> NA, NA, 2017-05-22, 2017-09-12, 2017-09-12, 201…
## $ coupon_used <dbl> 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ manufacturer_id <chr> "2", "69", "69", "69", "69", "69", "69", "69", "…
## $ department <chr> "PASTRY", "GROCERY", "GROCERY", "GROCERY", "GROC…
## $ brand <fct> National, Private, Private, Private, Private, Pr…
## $ product_category <chr> "ROLLS", "FACIAL TISS/DNR NAPKIN", "BAG SNACKS",…
## $ product_type <chr> "ROLLS: BAGELS", "FACIAL TISSUE & PAPER HANDKE",…
## $ package_size <chr> "4 OZ", "85 CT", "11.5 OZ", "11.5 OZ", "11.5 OZ"…
# Summarize coupon effectiveness by product department with .groups = 'drop' to remove the warning
coupon_effectiveness_by_department <- transactions_with_coupons %>%
group_by(department, coupon_used) %>%
summarise(
total_spending = sum(sales_value, na.rm = TRUE),
avg_spending_per_basket = mean(sales_value, na.rm = TRUE),
redemption_count = n(),
.groups = 'drop' # This will suppress the grouping message
) %>%
arrange(desc(total_spending))
# View the summary of coupon effectiveness by department
print(coupon_effectiveness_by_department)
## # A tibble: 59 × 5
## department coupon_used total_spending avg_spending_per_bas…¹ redemption_count
## <chr> <dbl> <dbl> <dbl> <int>
## 1 GROCERY 1 4717628. 2.50 1886556
## 2 GROCERY 0 1522675. 2.48 613966
## 3 DRUG GM 1 1214013. 3.91 310721
## 4 FUEL 1 1010241. 29.1 34767
## 5 PRODUCE 1 695108. 2.10 330865
## 6 MEAT 1 589397. 6.32 93289
## 7 MEAT-PCKGD 1 409560. 3.69 110936
## 8 DRUG GM 0 396592. 3.81 103981
## 9 DELI 1 296191. 4.21 70433
## 10 MISCELLAN… 1 230879. 19.7 11711
## # ℹ 49 more rows
## # ℹ abbreviated name: ¹avg_spending_per_basket
# Summarize coupon effectiveness by product department, filter out NA departments and no sales
coupon_effectiveness_by_department <- transactions_with_coupons %>%
group_by(department, coupon_used) %>%
summarise(
total_spending = sum(sales_value, na.rm = TRUE),
avg_spending_per_basket = mean(sales_value, na.rm = TRUE),
redemption_count = n(),
.groups = 'drop'
) %>%
filter(!is.na(department), total_spending > 10000) %>% # Filter out NA and departments with no sales
arrange(desc(total_spending))
# View the filtered summary
print(coupon_effectiveness_by_department)
## # A tibble: 32 × 5
## department coupon_used total_spending avg_spending_per_bas…¹ redemption_count
## <chr> <dbl> <dbl> <dbl> <int>
## 1 GROCERY 1 4717628. 2.50 1886556
## 2 GROCERY 0 1522675. 2.48 613966
## 3 DRUG GM 1 1214013. 3.91 310721
## 4 FUEL 1 1010241. 29.1 34767
## 5 PRODUCE 1 695108. 2.10 330865
## 6 MEAT 1 589397. 6.32 93289
## 7 MEAT-PCKGD 1 409560. 3.69 110936
## 8 DRUG GM 0 396592. 3.81 103981
## 9 DELI 1 296191. 4.21 70433
## 10 MISCELLAN… 1 230879. 19.7 11711
## # ℹ 22 more rows
## # ℹ abbreviated name: ¹avg_spending_per_basket
# Visualize total spending by department and coupon usage
ggplot(coupon_effectiveness_by_department, aes(x = reorder(department, total_spending), y = total_spending, fill = factor(coupon_used))) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
scale_fill_manual(values = c("red", "blue"), labels = c("No Coupon", "Coupon Used")) +
labs(
title = "Coupon Effectiveness by Product Department",
x = "Department",
y = "Total Spending ($)",
fill = "Coupon Usage"
) +
theme_minimal()
This analysis clearly shows that coupon users at Regork consistently outspent non-coupon users, with higher total sales and average spending per transaction, particularly in key departments such as Grocery, Drug GM, and Produce. Coupons were especially effective during non-holiday periods and weekdays, driving substantial increases in spending outside traditional peak seasons. These findings highlight that coupons are a powerful tool for influencing regular shopping habits and driving consistent revenue. Given these insights, we recommend that Regork focus its coupon strategies on high-performing departments like Grocery and Drug GM, where they have proven most effective. Additionally, by offering weekday-specific promotions, Regork can capitalize on the stronger coupon-related spending seen during these times, further boosting sales during quieter shopping days. Emphasizing coupon campaigns during non-holiday periods also provides a significant opportunity to increase sales during non-seasonal times, where coupon use drives greater customer engagement. Finally, resources should be reallocated away from lower-impact departments, such as Pastry and Miscellaneous, where coupons showed minimal effect, allowing Regork to optimize marketing spend.
The CEO should accept these recommendations because they are data-driven and focused on maximizing return on investment. By concentrating efforts where coupons have the most impact, Regork can enhance overall sales, improve customer engagement, and reduce unnecessary marketing expenditures in areas that are not yielding sufficient returns. These recommendations provide actionable strategies that can lead to both immediate revenue increases and long-term customer loyalty, making them essential for Regork’s growth and competitive advantage in the market.