"Introduction:"
## [1] "Introduction:"
"Today, we will be reviewing recommended strategies for future promotions looking at the previous year’s historical data.
Through the analysis we were able to identify 3 different approaches for future promotions, consolidating in one strategic plan.
We were able to look at about 1.5 million transactions placed by 801 card-members for regwork. We will use information from these loyalty customers to project information about the rest of our consumer group.
There are 3 recommendations we will be going through: the first is to focus coupon efforts into the Grocery category items. The second is to use coupons to assist in converting consumers from general high-profitability items into privately produced equivalents. The third is to pull additional information from non-loyalty users to validate profitability findings"
## [1] "Today, we will be reviewing recommended strategies for future promotions looking at the previous year’s historical data. \nThrough the analysis we were able to identify 3 different approaches for future promotions, consolidating in one strategic plan. \nWe were able to look at about 1.5 million transactions placed by 801 card-members for regwork. We will use information from these loyalty customers to project information about the rest of our consumer group.\n\nThere are 3 recommendations we will be going through: the first is to focus coupon efforts into the Grocery category items. The second is to use coupons to assist in converting consumers from general high-profitability items into privately produced equivalents. The third is to pull additional information from non-loyalty users to validate profitability findings"
library(scales)
library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ readr::col_factor() masks scales::col_factor()
## ✖ purrr::discard() masks scales::discard()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(lubridate)
library(RColorBrewer)
transactions <- get_transactions()
print(transactions)
## # A tibble: 1,469,307 × 11
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 900 330 31198570044 1095275 1 0.5 0
## 2 900 330 31198570047 9878513 1 0.99 0.1
## 3 1228 406 31198655051 1041453 1 1.43 0.15
## 4 906 319 31198705046 1020156 1 1.5 0.29
## 5 906 319 31198705046 1053875 2 2.78 0.8
## 6 906 319 31198705046 1060312 1 5.49 0.5
## 7 906 319 31198705046 1075313 1 1.5 0.29
## 8 1058 381 31198676055 985893 1 1.88 0.21
## 9 1058 381 31198676055 988791 1 1.5 1.29
## 10 1058 381 31198676055 9297106 1 2.69 0
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>
promotions <- get_promotions()
print(promotions)
## # A tibble: 20,940,529 × 5
## product_id store_id display_location mailer_location week
## <chr> <chr> <fct> <fct> <int>
## 1 1000050 316 9 0 1
## 2 1000050 337 3 0 1
## 3 1000050 441 5 0 1
## 4 1000092 292 0 A 1
## 5 1000092 293 0 A 1
## 6 1000092 295 0 A 1
## 7 1000092 298 0 A 1
## 8 1000092 299 0 A 1
## 9 1000092 304 0 A 1
## 10 1000092 306 0 A 1
## # ℹ 20,940,519 more rows
dim(promotions)
## [1] 20940529 5
demographics
## # A tibble: 801 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
print(demographics)
## # A tibble: 801 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
products
## # A tibble: 92,331 × 7
## product_id manufacturer_id department brand product_category product_type
## <chr> <chr> <chr> <fct> <chr> <chr>
## 1 25671 2 GROCERY Natio… FRZN ICE ICE - CRUSH…
## 2 26081 2 MISCELLANEOUS Natio… <NA> <NA>
## 3 26093 69 PASTRY Priva… BREAD BREAD:ITALI…
## 4 26190 69 GROCERY Priva… FRUIT - SHELF S… APPLE SAUCE
## 5 26355 69 GROCERY Priva… COOKIES/CONES SPECIALTY C…
## 6 26426 69 GROCERY Priva… SPICES & EXTRAC… SPICES & SE…
## 7 26540 69 GROCERY Priva… COOKIES/CONES TRAY PACK/C…
## 8 26601 69 DRUG GM Priva… VITAMINS VITAMIN - M…
## 9 26636 69 PASTRY Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691 16 GROCERY Priva… PNT BTR/JELLY/J… HONEY
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
print(products)
## # A tibble: 92,331 × 7
## product_id manufacturer_id department brand product_category product_type
## <chr> <chr> <chr> <fct> <chr> <chr>
## 1 25671 2 GROCERY Natio… FRZN ICE ICE - CRUSH…
## 2 26081 2 MISCELLANEOUS Natio… <NA> <NA>
## 3 26093 69 PASTRY Priva… BREAD BREAD:ITALI…
## 4 26190 69 GROCERY Priva… FRUIT - SHELF S… APPLE SAUCE
## 5 26355 69 GROCERY Priva… COOKIES/CONES SPECIALTY C…
## 6 26426 69 GROCERY Priva… SPICES & EXTRAC… SPICES & SE…
## 7 26540 69 GROCERY Priva… COOKIES/CONES TRAY PACK/C…
## 8 26601 69 DRUG GM Priva… VITAMINS VITAMIN - M…
## 9 26636 69 PASTRY Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691 16 GROCERY Priva… PNT BTR/JELLY/J… HONEY
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
demographics_transactions <- inner_join(demographics, transactions)
## Joining with `by = join_by(household_id)`
promtions_transactions <- inner_join(promotions, transactions)
## Joining with `by = join_by(product_id, store_id, week)`
## Warning in inner_join(promotions, transactions): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1882 of `x` matches multiple rows in `y`.
## ℹ Row 25 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
transactions_products <- inner_join(transactions, products)
## Joining with `by = join_by(product_id)`
demographics_promotions_transactions <- inner_join(demographics_transactions, promotions)
## Joining with `by = join_by(store_id, product_id, week)`
## Warning in inner_join(demographics_transactions, promotions): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 16864 of `x` matches multiple rows in `y`.
## ℹ Row 6484322 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
products_demographics_promotions_transactions <- inner_join(demographics_promotions_transactions, products)
## Joining with `by = join_by(product_id)`
#Graph1:
demographics_promotions_transactions$transaction_timestamp <- as.POSIXct(demographics_promotions_transactions$transaction_timestamp, format = "%Y-%m-%d %H:%M:%S")
demographics_promotions_transactions <- demographics_promotions_transactions %>% arrange(household_id, transaction_timestamp)
first_coupon_purchase <- demographics_promotions_transactions %>%
group_by(household_id, product_id) %>%
filter(coupon_disc > 0) %>%
slice(1) %>%
select(household_id, product_id, transaction_timestamp) %>%
rename(first_coupon_purchase_date = transaction_timestamp)
data_with_first_coupon <- left_join(demographics_promotions_transactions, first_coupon_purchase, by = c("household_id", "product_id"))
repeat_purchases_after_coupon <- data_with_first_coupon %>%
filter(transaction_timestamp > first_coupon_purchase_date) %>%
group_by(household_id, product_id) %>%
summarise(total_repeat_purchases = n()) %>%
ungroup()
## `summarise()` has grouped output by 'household_id'. You can override using the
## `.groups` argument.
repeat_after_coupon_with_products <- left_join(repeat_purchases_after_coupon, products)
## Joining with `by = join_by(product_id)`
total_repeat_purchases_by_department <- repeat_after_coupon_with_products %>%
group_by(department) %>%
summarise(total_repeat_purchases = sum(total_repeat_purchases))
total_repeat_purchases_by_department <- total_repeat_purchases_by_department %>%
arrange(desc(total_repeat_purchases))
bar_plot <- ggplot(total_repeat_purchases_by_department, aes(x = reorder(department, -total_repeat_purchases), y = total_repeat_purchases)) +
geom_bar(stat = "identity", fill = "skyblue", color = "black") +
labs(title = "Repeat Purchases After Coupon by Department",
x = "Department",
y = "Total Repeat Purchases") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Display the bar graph
print(bar_plot)

"The first proposal is to focus on coupons in the grocery category, as grocery items have a much higher likelihood of repeat purchase after initial buy. This will help maintain high profitabily after coupon is completed."
## [1] "The first proposal is to focus on coupons in the grocery category, as grocery items have a much higher likelihood of repeat purchase after initial buy. This will help maintain high profitabily after coupon is completed."
# Graph 2 -------------------------------------------------------------------------
percent_off_data <- demographics_promotions_transactions %>%
group_by(marital_status, household_size) %>%
summarise(percent_off = sum((retail_disc + coupon_match_disc) / sum(sales_value)) * 100)
## `summarise()` has grouped output by 'marital_status'. You can override using
## the `.groups` argument.
marital_colors <- c("Married" = "blue", "Unmarried" = "grey", "NA" = "white")
bar_plot_percent_off <- ggplot(percent_off_data, aes(x = household_size, y = percent_off, fill = marital_status)) +
geom_bar(stat = "identity", position = "dodge", color = "black") +
labs(title = "Percent Off of Total Transactions by Marital Status and Household Size",
x = "Household Size",
y = "Percent Off") +
scale_fill_manual(values = marital_colors) + # Set colors for marital status
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) # Rotate x-axis labels
print(bar_plot_percent_off)

"The Second Proposal is to complete additional consumer research, as our information is only from Loyalty members. These consumers are receiving 40-50% off the total retail price in this year timeframes. This leads to the recommendation of increasing base prices and pushing the loyalty membership to recieve more consumer information. "
## [1] "The Second Proposal is to complete additional consumer research, as our information is only from Loyalty members. These consumers are receiving 40-50% off the total retail price in this year timeframes. This leads to the recommendation of increasing base prices and pushing the loyalty membership to recieve more consumer information. "
# Graph 3 -------------------------------------------------------------------------
#
profit_transactions <- transactions %>%
mutate(profit = sales_value - coupon_disc - retail_disc - coupon_match_disc)
total_profit_per_product <- profit_transactions %>%
group_by(product_id) %>%
summarise(total_profit = sum(profit))
profit_by_category_brand <- products %>%
inner_join(total_profit_per_product, by = "product_id")
profit_summary <- profit_by_category_brand %>%
group_by(product_category, brand) %>%
summarise(total_profit = sum(total_profit))
## `summarise()` has grouped output by 'product_category'. You can override using
## the `.groups` argument.
# Find the top 5 categories for both national and private brands
top_categories <- profit_summary %>%
group_by(brand) %>%
top_n(5, total_profit)
# Create a bar chart with the top 5 categories for both national and private brands
bar_chart <- ggplot(top_categories, aes(x = interaction(product_category, brand), y = total_profit, fill = brand)) +
geom_bar(stat = "identity", position = "dodge", color = "black") +
labs(title = "Top 5 Profitable Product Categories by Brand Groups",
x = "Product Category and Brand",
y = "Total Profit") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) # Rotate x-axis labels
# Display the bar chart
print(bar_chart)

"The Third proposal is to convert high-profitability items from general to privately produced items. Since there is such strong brand loyalty in the Grocery industry. By focusing our promotions on the Bag Snacks and Soft Drink/Beer industry, we can gain a higher margin on sales."
## [1] "The Third proposal is to convert high-profitability items from general to privately produced items. Since there is such strong brand loyalty in the Grocery industry. By focusing our promotions on the Bag Snacks and Soft Drink/Beer industry, we can gain a higher margin on sales."