This report examines how sales vary over a business day for weekdays to develop strategies to maximize sales. First, I will look at how the time-of-day affects the sales across departments. Then I will take a close look at personal care essentials in the “Drug GM” department. I will then make recommendations on how to leverage coupon promotions and mobile app notifications to maximize sales during the shopping experience.
Approach: To get a clear picture of how sales are shaped over the weekday, I filtered out Saturday and Sunday transactions. Then I performed a loop with a case_when() statement to sort each transaction into three levels: “MORN, MIDDAY, and NIGHT.” The boundaries of each of those sections were determined by the earliest and latest sales as well as common knowledge of when most individuals conclude their workday. A sampling of the transactions showed the earliest transaction timestamp was approximately 6am and the latest was approximately midnight. So, the morning sales were considered those between 6am and noon, midday sales were those between noon and 4pm, and all sales after 4pm but before 6am were night sales.
From here I joined transactions with products, and later coupons and demographics to identify patterns sales activity by department, product type and age group.
This report utilizes the transactions, products, coupons, and demographics data sets from the Complete Journey Library.
library(completejourney)
library(ggplot2)
library(dplyr)
library(stringr) ## handling character strings
library(lubridate) ## handling dates and times
library(viridis) ## color palette package
## Complete Journey Data Sets:
transactions <- get_transactions()
## demographics
## coupons
## products
## Various Product Departments
var_departments <- unique(products$department)
print(var_departments)
## [1] "GROCERY" "MISCELLANEOUS" "PASTRY" "DRUG GM"
## [5] "MEAT-PCKGD" "SEAFOOD-PCKGD" "PRODUCE" "NUTRITION"
## [9] "DELI" "COSMETICS" "MEAT" "FLORAL"
## [13] "TRAVEL & LEISURE" "SEAFOOD" "SALAD BAR" "FUEL"
## [17] "ELECT &PLUMBING" "FROZEN GROCERY" "COUPON" "SPIRITS"
## [21] "GARDEN CENTER" "TOYS" "CHARITABLE CONT" "RESTAURANT"
## [25] "PROD-WHS SALES" "CHEF SHOPPE" "GM MERCH EXP" "AUTOMOTIVE"
## [29] "PHOTO & VIDEO" "CNTRL/STORE SUP" "HOUSEWARES" "POSTAL CENTER"
# Finding the earliest and latest transactions.
wday_transacts <- transactions %>%
filter(wday(transaction_timestamp, label = TRUE,
abbr = FALSE) %in% c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"))
wday_transacts <- wday_transacts$transaction_timestamp
range(wday_transacts)
## [1] "2017-01-02 00:05:54 EST" "2017-12-29 23:45:07 EST"
#Filter for only weekdays and separate the day into three parts: MORN, MID, NIGHT).
transactions_times <- transactions %>%
filter(wday(transaction_timestamp, label = TRUE,
abbr = FALSE) %in% c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")) %>%
mutate(transaction_timestamp = as.POSIXct(transaction_timestamp),
time_of_day = case_when(
hour(transaction_timestamp) >= 6 & hour(transaction_timestamp) < 12 ~ "MORN",
hour(transaction_timestamp) >= 12 & hour(transaction_timestamp) < 16 ~ "MIDDAY",
hour(transaction_timestamp) >= 16 ~ "NIGHT",
TRUE ~ "NIGHT"
))
An aggregation of sales values over the three daytime categories showed that the highest proportion of sales occurred during evening shopping. It appears as the day progressed the sales increased exponentially.
transactions_time_distr <- transactions_times %>%
inner_join(products, by = "product_id") %>%
group_by(time_of_day) %>%
summarise(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales))
knitr::kable(transactions_time_distr, caption = 'Distribution of Sales by Time of Day')
| time_of_day | total_sales |
|---|---|
| NIGHT | 1679742.0 |
| MIDDAY | 853215.1 |
| MORN | 467918.1 |
The Drug GM department showed some of the strongest sales, so I wanted to explore the various store departments to determine if the time of day affected each department similarly. I took a sample of departments, both food and nonfood, to see how sales trended each hour of the day. An aggregation of the number of transactions, based on unique basket_id, and the total sales for each department showed a steady increase as the day went on for each department.
We can infer that most shoppers were shopping in the evening and were mainly drawn to the store for groceries. This led me to explore Drug GM products more closely to identify coupon promotions as an incentive to visit the store and maximize sales for our evening shoppers.
Here I selected five essential personal care categories from the Drugs GM department to examine more closely. I selected Deodorants, Eye and Ear Care Products, Oral Hygiene Products, Shaving Care Products and Soap. In addition to strong sales, these products are a point of interest because they are considered essential and commonly used in day-to-day life. Take toothpaste for example. They are also in a separate area of the store from food items and are often forgotten during the normal grocery trip.
product_purch_times <- transactions_times %>%
inner_join(products, by = "product_id")
desired_order <- desired_order <- c("MORN", "MIDDAY", "NIGHT")
product_purch_times %>%
filter(department %in% c('MEAT', 'FUEL', 'DRUG GM', 'COSMETICS', 'SPIRITS', 'FLORAL')) %>%
mutate(time_of_day = factor(time_of_day, levels = desired_order)) %>%
group_by(department, time_of_day) %>%
summarize(volume_sales = sum(sales_value)) %>%
ggplot(aes(x = reorder(department, volume_sales), y = log(volume_sales), shape = department)) +
geom_point(show.legend = FALSE, size = 4, color = "#355E3B") +
facet_wrap(~time_of_day, scales = "free_x", nrow = 1) +
labs(x = "Department", y = "Total Sales", title = "Total Sales by Department and Time of Day") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
product_purch_times %>%
filter(department %in% c('MEAT', 'FUEL', 'DRUG GM', 'COSMETICS', 'SPIRITS', 'FLORAL')) %>%
mutate(time_of_day = factor(time_of_day, levels = desired_order)) %>%
group_by(department, time_of_day) %>%
summarize(transact_vol = n_distinct(basket_id)) %>%
ggplot(aes(x = reorder(department, transact_vol), y = log(transact_vol), shape = department)) +
geom_point(show.legend = FALSE, size = 4, color = "#355E3B") +
facet_wrap(~time_of_day, scales = "free_x") +
labs(x = "Department", y = "Number of Transactions", title = "Sales Traffic by Department",
subtitle = "A Measure of Unique Baskets Containing Items from Each Department") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Here, I calculated the distribution of the number of transactions for each of these categories over each hour of the day. The plot shows a peak in transactions happening at about 6pm, which further enforced why the previous plots showed the highest number of transactions at NIGHT.
product_purch_times %>%
filter(department == 'DRUG GM',
product_category %in% c("DEODORANTS",
"SOAP - LIQUID & BAR",
"SHAVING CARE PRODUCTS",
"ORAL HYGIENE PRODUCTS",
"EYE AND EAR CARE PRODUCTS")) %>%
mutate(hour = hour(as.POSIXct(transaction_timestamp))) %>%
group_by(product_category, time_of_day, hour) %>%
summarize(transact_vol = n_distinct(basket_id)) %>%
filter(hour >= 6 & hour <= 23) %>%
ggplot(aes(x = hour, y = transact_vol, shape = product_category)) +
geom_point(alpha = 0.7, size = 3, color = "darkgreen") +
geom_smooth(method = "loess", se = FALSE, color = "darkgreen") +
labs(x = "Hour of the Day", y = "Number of Transactions",
title = "Number of Transactions by Hour and Product Category") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(6, 23, by = 1),
labels = format(seq(as.POSIXct("06:00:00", format = "%H:%M:%S"),
as.POSIXct("23:00:00", format = "%H:%M:%S"),
by = "hour"), "%I %p")) +
guides(shape = guide_legend(title = "Product Category"))
Next, I wanted to explore coupon usage for these five personal care categories. A comparison of the transactions where a coupon was used to those in which coupons were not used showed that most transactions were taking place without the use of coupons. This represents an opportunity to grow coupon usage to drive more sales.
drug_prods <- product_purch_times %>%
filter(department == 'DRUG GM',
product_category %in% c("DEODORANTS",
"SOAP - LIQUID & BAR",
"SHAVING CARE PRODUCTS",
"ORAL HYGIENE PRODUCTS", # <- Corrected
"EYE AND EAR CARE PRODUCTS")) %>%
mutate(coupon_used = ifelse(coupon_disc != 0, 1, 0))
ggplot(drug_prods) +
aes(x = product_category, fill = factor(coupon_used)) +
geom_bar(position = "fill") +
labs(x = "Product Category", y = "Proportion",
title = "Proportion of Transactions With Coupons and Without Coupons") +
scale_fill_manual(values = c("#355E3B", "#FFA500"),
labels = c("No Coupon Used", "Coupon Used")) +
theme_minimal() +
scale_x_discrete(labels = function(x) str_trunc(x, 20, "right")) +
theme(axis.text.x = element_text(angle = 35, hjust = 1))
ggplot(drug_prods, aes(x = sales_value, y = product_category)) +
geom_boxplot(fill = "#355E3B") +
labs(x = "Sales Value", y = "Product Category",
title = "Sales Value Summary Statistics by Product Category",
subtitle = "(with Coupon vs. without)") +
facet_wrap(~ coupon_used, nrow = 2,
labeller = labeller(coupon_used = c("1" = "Coupon Used",
"0" = "No Coupon Used")))
The box plot shows the summary of sales value resulting from transactions with and without coupons. You can see that the use of coupons eliminates a lot of the high sales outliers. We see that shaving care products are the most affected by the presence of a coupon. Specifically, the median, maximum and upper limit of sales were much lower with the use of a coupon. This implies that customers are saving a higher percentage on shaving products. Therefore, shaving care products would be considered desirable and exciting to consumers and can be used as an incentive to visit the store.
To find coupons that would represent the most savings to customers, and therefore be the most incentivizing, I calculated the coupon discount proportions for each coupon in these categories. I then arranged them in descending order to show the coupons that represented the most savings to customers.
big_savings <- drug_prods %>%
inner_join(coupons, by = "product_id") %>%
mutate(disc_proportion = coupon_disc / (sales_value + coupon_disc + retail_disc)) %>%
group_by(disc_proportion) %>%
arrange(desc(disc_proportion)) %>%
select(disc_proportion, coupon_upc, product_type)
knitr::kable(head(big_savings, 20),
caption = 'Coupon and Product Pairings with Large Discounts')
| disc_proportion | coupon_upc | product_type |
|---|---|---|
| 1.0000000 | 10000085428 | TOOTHPASTE |
| 1.0000000 | 10000085478 | TOOTHPASTE |
| 1.0000000 | 53700060051 | TOOTHPASTE |
| 1.0000000 | 53700060051 | TOOTHPASTE |
| 1.0000000 | 53700060051 | TOOTHPASTE |
| 1.0000000 | 53700061076 | TOOTHPASTE |
| 0.7143538 | 10000085428 | MENS SKIN CARE/COLOGNE |
| 0.7143538 | 10000085478 | MENS SKIN CARE/COLOGNE |
| 0.6318565 | 10000085428 | SHAVE CREAMS AND POWDERS |
| 0.6318565 | 10000085478 | SHAVE CREAMS AND POWDERS |
| 0.6001335 | 10000085428 | MENS SKIN CARE/COLOGNE |
| 0.6001335 | 10000085478 | MENS SKIN CARE/COLOGNE |
| 0.5434783 | 10000085428 | BODY WASH |
| 0.5434783 | 10000085478 | BODY WASH |
| 0.5434783 | 10000085480 | BODY WASH |
| 0.5434783 | 57940033055 | BODY WASH |
| 0.5384911 | 10000085428 | RAZORS AND BLADES |
| 0.5384911 | 10000085478 | RAZORS AND BLADES |
| 0.5384911 | 54740018078 | RAZORS AND BLADES |
| 0.5000000 | 10000085428 | MENS SKIN CARE/COLOGNE |
Finally, I examined how transactions varied over the day across consumer age groups. This bar graph shows that the highest number of transactions for nearly all age groups occurs in the evening, which follows the trend shown by our previous analysis. We can see that the age group that shops most frequently is the 45-54 age group. The group that will make up most shoppers in the store in the morning is that same age group. Shoppers aged 65 and over do most of their shopping in the middle of the day and have the least amount of variability, likely because most retired individuals would fall into this age group.
shop_times <- transactions_times %>%
inner_join(demographics, by = "household_id") %>%
group_by(age, time_of_day) %>%
mutate(time_of_day = factor(time_of_day, levels = desired_order)) %>%
summarise(num_transactions = n())
ggplot(shop_times, aes(x = age, y = num_transactions, fill = time_of_day)) +
geom_bar(stat = "identity", position = "dodge") +
labs(x = "Age Group", y = "Number of Transactions",
title = "Most Popular Shopping Times by Age Group") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_fill_manual(values = c("MORN" = "skyblue", "MIDDAY" = "lightgreen", "NIGHT" = "salmon"))
Based on this analysis I would recommend the following action steps:
Before the visit:
Ping consumers during the afternoon, between 3pm and 5pm, as they are making their shopping lists and mobile pick-up orders and heading to the store. Remind them to purchase personal care items and offer exciting coupons on essential items.
Consider the top coupons for mailers to bring shoppers into the store to increase the frequency of visits.
To maximize sales during each consumer visit to the store:
# A list of the Top 10 Products
distinct_products <- big_savings$product_type
top_products <- unique(distinct_products)
print(head(top_products, 10))
## [1] "TOOTHPASTE" "MENS SKIN CARE/COLOGNE"
## [3] "SHAVE CREAMS AND POWDERS" "BODY WASH"
## [5] "RAZORS AND BLADES" "MENS HAIR SPRAYS/TONICS/CREAMS"
## [7] "ANTIPERSPIRANTS ONLY (ALL OTHE" "BAR SOAP"
## [9] "ORAL HYGIENE BRUSHES" "SOLID/STK DEODORANTS"
# A list of the Top 10 Coupon UPC's
distinct_coupons <- big_savings$coupon_upc
top_coupon_upcs <- unique(distinct_coupons)
print(head(top_coupon_upcs, 10))
## [1] "10000085428" "10000085478" "53700060051" "53700061076" "10000085480"
## [6] "57940033055" "54740018078" "57033030076" "53500051576" "54589371155"
Who to target and when:
This analysis examines only a sampling of all personal care items. A broader analysis of all DRUG GM products could be done to find all opportunities for coupon incentives.
The calculations performed in this analysis take all store_ids into account without any data on store business hours. A more detailed analysis could be done by grouping stores with similar business hours. Additionally, it would be interesting to find stores that show busy times that differ from what was found here.
Further analysis could be done to explore how to maximize the sales of lunchtime shoppers. Personal care items are often nonperishable and can stay in the car for those who take a break from an office job to get lunch and pick up a few items. More analysis could be done on the demographics of customers who purchase lunch at a store, to determine how Regork can best target them.