knitr::opts_chunk$set(echo = TRUE)

Introduction

The business question I am trying to answer is: Are certain coupons or promotions more impactful than others?

To answer this question, I examine the folowing sub questions: - Which promotions are most impactful (generate higher average sales per transactions)? - Which coupons are most impactful (create highest average sales per transactions)? - Which demographics generate higher average sales?

These questions are highly relevant to the Regork chain because promotions and coupons can directly affect profit. Marketing expenses are crucial as they help to promote products and generate more profit for a company. Assessing whether the marketing (in this case, promotions and coupons) are effective and impactful allows the company to determine whether to increase, decrease or maintain the current marketing budget.

Required Packages

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(skimr)
library(scales)
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
library(lubridate)
library(gt)

Package purpose

tidyverse:data manipulation,exploration and visualization

completejourney: data sets characterizing household level transactions

janitor:simplifies the cleaning of data frames-examining and cleaning dirty data)

skimr:designed to provide comprehensive, easy-to-use, and visually appealing summary statistics

scales:provides the internal scaling infrastructure used by ggplot2, and gives tools to override the default breaks, labels, transformations and palettes

lubridate:easier statistical computing environment

gt:creates beautiful, publication-quality, and highly customizable display tables

Data

transactions <-get_transactions()
promotions <- get_promotions()
data("coupon_redemptions")
data("coupons")
data("demographics")
data("products")

Cleaning Up Data

transactions <-transactions%>% clean_names()
promotions <-promotions%>% clean_names()
coupon_redemptions <-coupon_redemptions%>% clean_names()
coupons <-coupons%>% clean_names()
demographics <- demographics%>% clean_names()

Joining Data

promo_transactions <-transactions%>%
  inner_join(promotions, by = c("product_id","store_id","week"))
## Warning in inner_join(., promotions, by = c("product_id", "store_id", "week")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 25 of `x` matches multiple rows in `y`.
## ℹ Row 352741 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

Analysis

promo_transactions <-promo_transactions%>%
  mutate(
    promo_type = case_when(
      !is.na(display_location) & !is.na(mailer_location) ~ "Display + Mailer",
      !is.na(display_location) ~ "Display only",
      !is.na(mailer_location) ~ "Mailer only",
      TRUE ~ "Other"
    )
  )

Calculating Impact of Promotion

promo_impact <-promo_transactions%>%
  group_by(promo_type)%>%
  summarise(
    avg_sales = mean(sales_value, na.rm = TRUE),
    total_sales = sum(sales_value, na.rm = TRUE),
    transactions = n()
  )%>%
  arrange(desc(avg_sales))

Most Impactful Promotion Method

ggplot(promo_impact,
       aes(x=reorder(promo_type,avg_sales),
           y= avg_sales)) +
  geom_col(fill = "lightblue")+
  coord_flip()+
  scale_y_continuous(labels = dollar) +
  labs(
    title = "Average Sales per Transactions by Promotion Type", 
    x= "Promotion Type",
    y= "Average Sales per Transaction"
  ) +
  theme_minimal()

promo_impact%>%
  gt()%>%
  fmt_currency(columns = avg_sales)
promo_type avg_sales total_sales transactions
Display + Mailer $2.76 905726 327736
promo_f <-transactions%>%
  left_join(promotions,
            by = c("store_id","product_id","week"))%>%
  mutate(
    promo_f = if_else(
      !is.na(display_location)| !is.na(mailer_location),
      "Promotion",
      "Non-Promotion"
    )
  )
## Warning in left_join(., promotions, by = c("store_id", "product_id", "week")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 25 of `x` matches multiple rows in `y`.
## ℹ Row 352741 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
transaction_level <-promo_f%>%
  group_by(household_id,week,basket_id,promo_f)%>%
  summarise(
    transaction_sales = sum(sales_value, na.rm = TRUE),
    .groups = "drop"
  )

Promotion Comparison Table

promo_comparison_table <-transaction_level%>%
  group_by(promo_f)%>%
  summarise(
    avg_sales_per_transaction = mean(transaction_sales, na.rm = TRUE),
    total_transactions = n(),
    total_sales = sum(transaction_sales,na.rm = TRUE),
    .groups= "drop"
  )

Upon analyzing the data, the combination of display and mailer promotion structure is the most effective as it generates the most average sales per transacctions. This promo- -tion generates an average of $2.76 per transaction and they are widely used as they generate a large transaction volume (more than 327 thousands).

While these promotions a generating large transactions numbers,the average of $2.76 per transaction is relatively low meaning that promotions are increasing traffic but are not significantly impacting profit.

When comparing promotions (all structures) to non-promotion sales, It is evident that non-promotion sales are generating more average sales per transactions and more transactions overall as seen in the table below. This shows that promotion strategies and budgeting may need to be realocated.

promo_comparison_table%>%
  gt()%>%
  fmt_currency(
    columns = c(avg_sales_per_transaction,total_sales)
  )%>%
  cols_label(
    promo_f = "Transaction Type", 
    avg_sales_per_transaction = "Average Sales per Transaction", 
    total_transactions = "Total Transactions",
    total_sales = "Total Sales"
  )%>%
  tab_header(
    title = "Promotion vs Non-Promotion Sales Comparison",
    subtitle = "Average Sale per Transaction"
  )
Promotion vs Non-Promotion Sales Comparison
Average Sale per Transaction
Transaction Type Average Sales per Transaction Total Transactions Total Sales
Non-Promotion $24.51 150620 $3,691,519.78
Promotion $10.39 87172 $905,725.98

Joining & Linking Data

coupon_products <-coupon_redemptions%>%
  left_join(coupons, by = "coupon_upc")
## Warning in left_join(., coupons, by = "coupon_upc"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 91127 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
coupon_sales <-coupon_products%>%
  inner_join(transactions,
             by= c("household_id","product_id"))%>%
  group_by(coupon_upc)%>%
  summarise(
    avg_sales = mean(sales_value, na.rm = TRUE),
    redemptions = n()
  )%>%
  arrange(desc(avg_sales))
## Warning in inner_join(., transactions, by = c("household_id", "product_id")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 3 of `x` matches multiple rows in `y`.
## ℹ Row 8816 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

Most Impactful Coupons

The most impactful coupons are listed below and they generate more that $10 per transaction on average.

coupon_sales%>%
  slice_max(avg_sales, n=10)%>%
  ggplot(aes(x = reorder(coupon_upc,avg_sales),
             y= avg_sales)) +
  geom_col(fill = "darkgreen") +
  coord_flip() +
  scale_y_continuous(labels = dollar) +
  labs(
    title = "Top 10 Coupons by Average Sales per Transaction", 
    x= "Coupon UPC",
    y= "Average Sales per Transactions"
  ) +
  theme_minimal()

Joining Data

coupon_demog <-coupon_products%>%
  inner_join(demographics, by = "household_id")%>%
  inner_join(transactions,
             by = c("household_id","product_id"))%>%
  group_by(income)%>%
  summarise(
    avg_sales = mean(sales_value, na.rm = TRUE), 
    redemptions = n()
  )%>%
  arrange(desc(avg_sales))
## Warning in inner_join(., transactions, by = c("household_id", "product_id")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 26 of `x` matches multiple rows in `y`.
## ℹ Row 8816 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
promo_transactions <-transactions%>%
  inner_join(promotions,
             by = c("store_id","product_id","week"))
## Warning in inner_join(., promotions, by = c("store_id", "product_id", "week")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 25 of `x` matches multiple rows in `y`.
## ℹ Row 352741 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
promo_demo <-promo_transactions%>%
  inner_join(demographics, by = "household_id")
income_sales <-promo_demo%>%
  group_by(income)%>%
  summarise(
    avg_sales = mean(sales_value, na.rm = TRUE),
    transactions = n()
  )%>%
  arrange(desc(avg_sales))

Which demographic generate higher average sales?

The table below shows the average sales per transactions by income level. It shows that there is almost what appears to be a direct correlation between the income level and the average sales per transactions.

ggplot(income_sales,
       aes(x = reorder(income,avg_sales),
           y= avg_sales)) +
  geom_col(fill = "darkblue") +
  coord_flip() +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = "Average Sales per Transactions by Income Level",
    subtitle = "Transactions Occuring During Promotions", 
    x = "Income Level",
    y = "Average Sales per Transaction"
  ) +
  theme_minimal(base_size = 14)

Summary

This report examined wheter certain coupons and promotion strategies are more impactful than others in generating average sales per transaction. The goal was to determine how Regork can optimize marketing investments and budgeting to potentially increase revenue. To do so, I analyzed which promotion types generate the highest average sales per tran- saction. I then compared promotion structures to non-promotion to compare their average sales per transaction. I also looked at the top ten coupons that generate thre highest sales transactions. Finally, I looked at which demographics produced the highest average sales per transactions during promotions. This data analysis reveals that while promotions effectively generate large numbers of transactions, they produce relatively low average sales per transaction. It also high- lights the most effective coupons.

Recommendations

I recommend maintaining display and mailer promotion structures as they generate traffic, however, I would recommend assessing their ability to increase sales per transaction. I would recommend increasing investment in high-performing coupons, particularly those that are associated with larger transaction values. I would recommend implementing demographic targeting strategies, particularly targeting demographics that produced lower average sales per transaction.

Limitations and Areas of Improvement

While this analysis provides some valuable insight, it has some limitations. For instance, the analysis is mostly observational and does not establish causal relationships.Another limitation is that long-term habits cannot be analyzed and the data represented only spans one year. Future work could include predictive modeling that would allow the company to easily identify trends and patterns in order to adjust strategies as needed and optimize promtion and coupon based expenses.