Final Project

"Introduction:"

## [1] "Introduction:"

"Today, we will be reviewing recommended strategies for future promotions looking at the previous year’s historical data.  
Through the analysis we were able to identify 3 different approaches for future promotions, consolidating in one strategic plan.  
We were able to look at about 1.5 million transactions placed by 801 card-members for regwork.  We will use information from these loyalty customers to project information about the rest of our consumer group.

There are 3 recommendations we will be going through: the first is to focus coupon efforts into the Grocery category items.   The second is to use coupons to assist in converting consumers from general high-profitability items into privately produced equivalents.  The third is to pull additional information from non-loyalty users to validate profitability findings"

## [1] "Today, we will be reviewing recommended strategies for future promotions looking at the previous year’s historical data.  \nThrough the analysis we were able to identify 3 different approaches for future promotions, consolidating in one strategic plan.  \nWe were able to look at about 1.5 million transactions placed by 801 card-members for regwork.  We will use information from these loyalty customers to project information about the rest of our consumer group.\n\nThere are 3 recommendations we will be going through: the first is to focus coupon efforts into the Grocery category items.   The second is to use coupons to assist in converting consumers from general high-profitability items into privately produced equivalents.  The third is to pull additional information from non-loyalty users to validate profitability findings"

library(scales)
library(completejourney)

## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ readr::col_factor() masks scales::col_factor()
## ✖ purrr::discard()    masks scales::discard()
## ✖ dplyr::filter()     masks stats::filter()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(ggplot2)
library(lubridate)
library(RColorBrewer)


transactions <- get_transactions()
print(transactions)

## # A tibble: 1,469,307 × 11
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 900          330      31198570044 1095275           1        0.5         0   
##  2 900          330      31198570047 9878513           1        0.99        0.1 
##  3 1228         406      31198655051 1041453           1        1.43        0.15
##  4 906          319      31198705046 1020156           1        1.5         0.29
##  5 906          319      31198705046 1053875           2        2.78        0.8 
##  6 906          319      31198705046 1060312           1        5.49        0.5 
##  7 906          319      31198705046 1075313           1        1.5         0.29
##  8 1058         381      31198676055 985893            1        1.88        0.21
##  9 1058         381      31198676055 988791            1        1.5         1.29
## 10 1058         381      31198676055 9297106           1        2.69        0   
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>

promotions <- get_promotions()
print(promotions)

## # A tibble: 20,940,529 × 5
##    product_id store_id display_location mailer_location  week
##    <chr>      <chr>    <fct>            <fct>           <int>
##  1 1000050    316      9                0                   1
##  2 1000050    337      3                0                   1
##  3 1000050    441      5                0                   1
##  4 1000092    292      0                A                   1
##  5 1000092    293      0                A                   1
##  6 1000092    295      0                A                   1
##  7 1000092    298      0                A                   1
##  8 1000092    299      0                A                   1
##  9 1000092    304      0                A                   1
## 10 1000092    306      0                A                   1
## # ℹ 20,940,519 more rows

dim(promotions)

## [1] 20940529        5

demographics

## # A tibble: 801 × 8
##    household_id age   income    home_ownership marital_status household_size
##    <chr>        <ord> <ord>     <ord>          <ord>          <ord>         
##  1 1            65+   35-49K    Homeowner      Married        2             
##  2 1001         45-54 50-74K    Homeowner      Unmarried      1             
##  3 1003         35-44 25-34K    <NA>           Unmarried      1             
##  4 1004         25-34 15-24K    <NA>           Unmarried      1             
##  5 101          45-54 Under 15K Homeowner      Married        4             
##  6 1012         35-44 35-49K    <NA>           Married        5+            
##  7 1014         45-54 15-24K    <NA>           Married        4             
##  8 1015         45-54 50-74K    Homeowner      Unmarried      1             
##  9 1018         45-54 35-49K    Homeowner      Married        5+            
## 10 1020         45-54 25-34K    Homeowner      Married        2             
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>

print(demographics)

## # A tibble: 801 × 8
##    household_id age   income    home_ownership marital_status household_size
##    <chr>        <ord> <ord>     <ord>          <ord>          <ord>         
##  1 1            65+   35-49K    Homeowner      Married        2             
##  2 1001         45-54 50-74K    Homeowner      Unmarried      1             
##  3 1003         35-44 25-34K    <NA>           Unmarried      1             
##  4 1004         25-34 15-24K    <NA>           Unmarried      1             
##  5 101          45-54 Under 15K Homeowner      Married        4             
##  6 1012         35-44 35-49K    <NA>           Married        5+            
##  7 1014         45-54 15-24K    <NA>           Married        4             
##  8 1015         45-54 50-74K    Homeowner      Unmarried      1             
##  9 1018         45-54 35-49K    Homeowner      Married        5+            
## 10 1020         45-54 25-34K    Homeowner      Married        2             
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>

products

## # A tibble: 92,331 × 7
##    product_id manufacturer_id department    brand  product_category product_type
##    <chr>      <chr>           <chr>         <fct>  <chr>            <chr>       
##  1 25671      2               GROCERY       Natio… FRZN ICE         ICE - CRUSH…
##  2 26081      2               MISCELLANEOUS Natio… <NA>             <NA>        
##  3 26093      69              PASTRY        Priva… BREAD            BREAD:ITALI…
##  4 26190      69              GROCERY       Priva… FRUIT - SHELF S… APPLE SAUCE 
##  5 26355      69              GROCERY       Priva… COOKIES/CONES    SPECIALTY C…
##  6 26426      69              GROCERY       Priva… SPICES & EXTRAC… SPICES & SE…
##  7 26540      69              GROCERY       Priva… COOKIES/CONES    TRAY PACK/C…
##  8 26601      69              DRUG GM       Priva… VITAMINS         VITAMIN - M…
##  9 26636      69              PASTRY        Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691      16              GROCERY       Priva… PNT BTR/JELLY/J… HONEY       
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>

print(products)

## # A tibble: 92,331 × 7
##    product_id manufacturer_id department    brand  product_category product_type
##    <chr>      <chr>           <chr>         <fct>  <chr>            <chr>       
##  1 25671      2               GROCERY       Natio… FRZN ICE         ICE - CRUSH…
##  2 26081      2               MISCELLANEOUS Natio… <NA>             <NA>        
##  3 26093      69              PASTRY        Priva… BREAD            BREAD:ITALI…
##  4 26190      69              GROCERY       Priva… FRUIT - SHELF S… APPLE SAUCE 
##  5 26355      69              GROCERY       Priva… COOKIES/CONES    SPECIALTY C…
##  6 26426      69              GROCERY       Priva… SPICES & EXTRAC… SPICES & SE…
##  7 26540      69              GROCERY       Priva… COOKIES/CONES    TRAY PACK/C…
##  8 26601      69              DRUG GM       Priva… VITAMINS         VITAMIN - M…
##  9 26636      69              PASTRY        Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691      16              GROCERY       Priva… PNT BTR/JELLY/J… HONEY       
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>

demographics_transactions <- inner_join(demographics, transactions)

## Joining with `by = join_by(household_id)`

promtions_transactions <- inner_join(promotions, transactions)

## Joining with `by = join_by(product_id, store_id, week)`

## Warning in inner_join(promotions, transactions): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1882 of `x` matches multiple rows in `y`.
## ℹ Row 25 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

transactions_products <- inner_join(transactions, products)

## Joining with `by = join_by(product_id)`

demographics_promotions_transactions <- inner_join(demographics_transactions, promotions)

## Joining with `by = join_by(store_id, product_id, week)`

## Warning in inner_join(demographics_transactions, promotions): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 16864 of `x` matches multiple rows in `y`.
## ℹ Row 6484322 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

products_demographics_promotions_transactions <- inner_join(demographics_promotions_transactions, products)

## Joining with `by = join_by(product_id)`

#Graph1: 
demographics_promotions_transactions$transaction_timestamp <- as.POSIXct(demographics_promotions_transactions$transaction_timestamp, format = "%Y-%m-%d %H:%M:%S")
demographics_promotions_transactions <- demographics_promotions_transactions %>% arrange(household_id, transaction_timestamp)
first_coupon_purchase <- demographics_promotions_transactions %>%
  group_by(household_id, product_id) %>%
  filter(coupon_disc > 0) %>%
  slice(1) %>%
  select(household_id, product_id, transaction_timestamp) %>%
  rename(first_coupon_purchase_date = transaction_timestamp)
data_with_first_coupon <- left_join(demographics_promotions_transactions, first_coupon_purchase, by = c("household_id", "product_id"))
repeat_purchases_after_coupon <- data_with_first_coupon %>%
  filter(transaction_timestamp > first_coupon_purchase_date) %>%
  group_by(household_id, product_id) %>%
  summarise(total_repeat_purchases = n()) %>%
  ungroup()

## `summarise()` has grouped output by 'household_id'. You can override using the
## `.groups` argument.

repeat_after_coupon_with_products <- left_join(repeat_purchases_after_coupon, products)

## Joining with `by = join_by(product_id)`

total_repeat_purchases_by_department <- repeat_after_coupon_with_products %>%
  group_by(department) %>%
  summarise(total_repeat_purchases = sum(total_repeat_purchases))
total_repeat_purchases_by_department <- total_repeat_purchases_by_department %>%
  arrange(desc(total_repeat_purchases))

bar_plot <- ggplot(total_repeat_purchases_by_department, aes(x = reorder(department, -total_repeat_purchases), y = total_repeat_purchases)) +
  geom_bar(stat = "identity", fill = "skyblue", color = "black") +
  labs(title = "Repeat Purchases After Coupon by Department",
       x = "Department",
       y = "Total Repeat Purchases") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) 

# Display the bar graph
print(bar_plot)

"The first proposal is to focus on coupons in the grocery category, as grocery items have a much higher likelihood of repeat purchase after initial buy.  This will help maintain high profitabily after coupon is completed."

## [1] "The first proposal is to focus on coupons in the grocery category, as grocery items have a much higher likelihood of repeat purchase after initial buy.  This will help maintain high profitabily after coupon is completed."

# Graph 2 -------------------------------------------------------------------------
percent_off_data <- demographics_promotions_transactions %>%
  group_by(marital_status, household_size) %>%
  summarise(percent_off = sum((retail_disc + coupon_match_disc) / sum(sales_value)) * 100)

## `summarise()` has grouped output by 'marital_status'. You can override using
## the `.groups` argument.

marital_colors <- c("Married" = "blue", "Unmarried" = "grey", "NA" = "white")

bar_plot_percent_off <- ggplot(percent_off_data, aes(x = household_size, y = percent_off, fill = marital_status)) +
  geom_bar(stat = "identity", position = "dodge", color = "black") +
  labs(title = "Percent Off of Total Transactions by Marital Status and Household Size",
       x = "Household Size",
       y = "Percent Off") +
  scale_fill_manual(values = marital_colors) +  # Set colors for marital status
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))  # Rotate x-axis labels
print(bar_plot_percent_off)

"The Second Proposal is to complete additional consumer research, as our information is only from Loyalty members. These consumers are receiving 40-50% off the total retail price in this year timeframes. This leads to the recommendation of increasing base prices and pushing the loyalty membership to recieve more consumer information.  "

## [1] "The Second Proposal is to complete additional consumer research, as our information is only from Loyalty members. These consumers are receiving 40-50% off the total retail price in this year timeframes. This leads to the recommendation of increasing base prices and pushing the loyalty membership to recieve more consumer information.  "

# Graph 3 -------------------------------------------------------------------------
# 

profit_transactions <- transactions %>%
  mutate(profit = sales_value - coupon_disc - retail_disc - coupon_match_disc)

total_profit_per_product <- profit_transactions %>%
  group_by(product_id) %>%
  summarise(total_profit = sum(profit))

profit_by_category_brand <- products %>%
  inner_join(total_profit_per_product, by = "product_id")

profit_summary <- profit_by_category_brand %>%
  group_by(product_category, brand) %>%
  summarise(total_profit = sum(total_profit))

## `summarise()` has grouped output by 'product_category'. You can override using
## the `.groups` argument.

# Find the top 5 categories for both national and private brands
top_categories <- profit_summary %>%
  group_by(brand) %>%
  top_n(5, total_profit)

# Create a bar chart with the top 5 categories for both national and private brands
bar_chart <- ggplot(top_categories, aes(x = interaction(product_category, brand), y = total_profit, fill = brand)) +
  geom_bar(stat = "identity", position = "dodge", color = "black") +
  labs(title = "Top 5 Profitable Product Categories by Brand Groups",
       x = "Product Category and Brand",
       y = "Total Profit") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))  # Rotate x-axis labels

# Display the bar chart
print(bar_chart)

"The Third proposal is to convert high-profitability items from general to privately produced items.  Since there is such strong brand loyalty in the Grocery industry.  By focusing our promotions on the Bag Snacks and Soft Drink/Beer industry, we can gain a higher margin on sales."

## [1] "The Third proposal is to convert high-profitability items from general to privately produced items.  Since there is such strong brand loyalty in the Grocery industry.  By focusing our promotions on the Bag Snacks and Soft Drink/Beer industry, we can gain a higher margin on sales."

Final Project

Danielle Robben

2023-10-05