Maximizing Sales Personal Care Items Based on Peak Shopping Times

Introduction and Data Exploration

This report examines how sales vary over a business day for weekdays to develop strategies to maximize sales. First, I will look at how the time-of-day affects the sales across departments. Then I will take a close look at personal care essentials in the “Drug GM” department. I will then make recommendations on how to leverage coupon promotions and mobile app notifications to maximize sales during the shopping experience.

Approach: To get a clear picture of how sales are shaped over the weekday, I filtered out Saturday and Sunday transactions. Then I performed a loop with a case_when() statement to sort each transaction into three levels: “MORN, MIDDAY, and NIGHT.” The boundaries of each of those sections were determined by the earliest and latest sales as well as common knowledge of when most individuals conclude their workday. A sampling of the transactions showed the earliest transaction timestamp was approximately 6am and the latest was approximately midnight. So, the morning sales were considered those between 6am and noon, midday sales were those between noon and 4pm, and all sales after 4pm but before 6am were night sales.

From here I joined transactions with products, and later coupons and demographics to identify patterns sales activity by department, product type and age group.

Prerequisites

This report utilizes the transactions, products, coupons, and demographics data sets from the Complete Journey Library.

library(completejourney)
library(ggplot2)
library(dplyr)
library(stringr)   ## handling character strings
library(lubridate) ## handling dates and times
library(viridis)   ## color palette package

## Complete Journey Data Sets: 
transactions <- get_transactions()
## demographics
## coupons
## products

## Various Product Departments
var_departments <- unique(products$department)
print(var_departments)
##  [1] "GROCERY"          "MISCELLANEOUS"    "PASTRY"           "DRUG GM"         
##  [5] "MEAT-PCKGD"       "SEAFOOD-PCKGD"    "PRODUCE"          "NUTRITION"       
##  [9] "DELI"             "COSMETICS"        "MEAT"             "FLORAL"          
## [13] "TRAVEL & LEISURE" "SEAFOOD"          "SALAD BAR"        "FUEL"            
## [17] "ELECT &PLUMBING"  "FROZEN GROCERY"   "COUPON"           "SPIRITS"         
## [21] "GARDEN CENTER"    "TOYS"             "CHARITABLE CONT"  "RESTAURANT"      
## [25] "PROD-WHS SALES"   "CHEF SHOPPE"      "GM MERCH EXP"     "AUTOMOTIVE"      
## [29] "PHOTO & VIDEO"    "CNTRL/STORE SUP"  "HOUSEWARES"       "POSTAL CENTER"
# Finding the earliest and latest transactions. 
wday_transacts <- transactions %>%
  filter(wday(transaction_timestamp, label = TRUE, 
              abbr = FALSE) %in% c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"))

wday_transacts <- wday_transacts$transaction_timestamp

range(wday_transacts)
## [1] "2017-01-02 00:05:54 EST" "2017-12-29 23:45:07 EST"
#Filter for only weekdays and separate the day into three parts: MORN, MID, NIGHT).

transactions_times <- transactions %>%
  filter(wday(transaction_timestamp, label = TRUE, 
              abbr = FALSE) %in% c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")) %>%
  mutate(transaction_timestamp = as.POSIXct(transaction_timestamp),
         time_of_day = case_when(
           hour(transaction_timestamp) >= 6 & hour(transaction_timestamp) < 12 ~ "MORN",
           hour(transaction_timestamp) >= 12 & hour(transaction_timestamp) < 16 ~ "MIDDAY",
           hour(transaction_timestamp) >= 16 ~ "NIGHT",
           TRUE ~ "NIGHT" 
         ))

What section of the day represents the highest sales?

An aggregation of sales values over the three daytime categories showed that the highest proportion of sales occurred during evening shopping. It appears as the day progressed the sales increased exponentially.

transactions_time_distr <- transactions_times %>%
  inner_join(products, by = "product_id") %>%
  group_by(time_of_day) %>%
  summarise(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales))

knitr::kable(transactions_time_distr, caption = 'Distribution of Sales by Time of Day')
Distribution of Sales by Time of Day
time_of_day total_sales
NIGHT 1679742.0
MIDDAY 853215.1
MORN 467918.1

Peak Shopping Times

The Drug GM department showed some of the strongest sales, so I wanted to explore the various store departments to determine if the time of day affected each department similarly. I took a sample of departments, both food and nonfood, to see how sales trended each hour of the day. An aggregation of the number of transactions, based on unique basket_id, and the total sales for each department showed a steady increase as the day went on for each department.

We can infer that most shoppers were shopping in the evening and were mainly drawn to the store for groceries. This led me to explore Drug GM products more closely to identify coupon promotions as an incentive to visit the store and maximize sales for our evening shoppers.

Here I selected five essential personal care categories from the Drugs GM department to examine more closely. I selected Deodorants, Eye and Ear Care Products, Oral Hygiene Products, Shaving Care Products and Soap. In addition to strong sales, these products are a point of interest because they are considered essential and commonly used in day-to-day life. Take toothpaste for example. They are also in a separate area of the store from food items and are often forgotten during the normal grocery trip.

Plot Tabs

Sales Value
product_purch_times <- transactions_times %>%
  inner_join(products, by = "product_id")

desired_order <- desired_order <- c("MORN", "MIDDAY", "NIGHT")

product_purch_times %>%
  filter(department %in% c('MEAT', 'FUEL', 'DRUG GM', 'COSMETICS', 'SPIRITS', 'FLORAL')) %>%
  mutate(time_of_day = factor(time_of_day, levels = desired_order)) %>%
  group_by(department, time_of_day) %>%
  summarize(volume_sales = sum(sales_value)) %>%
  ggplot(aes(x = reorder(department, volume_sales), y = log(volume_sales), shape = department)) +
  geom_point(show.legend = FALSE, size = 4, color = "#355E3B") +
  facet_wrap(~time_of_day, scales = "free_x", nrow = 1) +
  labs(x = "Department", y = "Total Sales", title = "Total Sales by Department and Time of Day") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Transactions
product_purch_times %>%
  filter(department %in% c('MEAT', 'FUEL', 'DRUG GM', 'COSMETICS', 'SPIRITS', 'FLORAL')) %>%
  mutate(time_of_day = factor(time_of_day, levels = desired_order)) %>%
  group_by(department, time_of_day) %>%
  summarize(transact_vol = n_distinct(basket_id)) %>%
  ggplot(aes(x = reorder(department, transact_vol), y = log(transact_vol), shape = department)) +
  geom_point(show.legend = FALSE, size = 4, color = "#355E3B") +
  facet_wrap(~time_of_day, scales = "free_x") +
  labs(x = "Department", y = "Number of Transactions", title = "Sales Traffic by Department",
       subtitle = "A Measure of Unique Baskets Containing Items from Each Department") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Sales by Hour

Here, I calculated the distribution of the number of transactions for each of these categories over each hour of the day. The plot shows a peak in transactions happening at about 6pm, which further enforced why the previous plots showed the highest number of transactions at NIGHT.

product_purch_times %>%
  filter(department == 'DRUG GM', 
       product_category %in% c("DEODORANTS", 
                               "SOAP - LIQUID & BAR", 
                               "SHAVING CARE PRODUCTS", 
                               "ORAL HYGIENE PRODUCTS", 
                               "EYE AND EAR CARE PRODUCTS")) %>%
  mutate(hour = hour(as.POSIXct(transaction_timestamp))) %>% 
  group_by(product_category, time_of_day, hour) %>%
  summarize(transact_vol = n_distinct(basket_id)) %>%
  filter(hour >= 6 & hour <= 23) %>%  
  ggplot(aes(x = hour, y = transact_vol, shape = product_category)) +
  geom_point(alpha = 0.7, size = 3, color = "darkgreen") +  
  geom_smooth(method = "loess", se = FALSE, color = "darkgreen") +  
  labs(x = "Hour of the Day", y = "Number of Transactions", 
       title = "Number of Transactions by Hour and Product Category") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_continuous(breaks = seq(6, 23, by = 1),  
                     labels = format(seq(as.POSIXct("06:00:00", format = "%H:%M:%S"), 
                                          as.POSIXct("23:00:00", format = "%H:%M:%S"), 
                                          by = "hour"), "%I %p")) +
  guides(shape = guide_legend(title = "Product Category"))  

Coupon Usage

Next, I wanted to explore coupon usage for these five personal care categories. A comparison of the transactions where a coupon was used to those in which coupons were not used showed that most transactions were taking place without the use of coupons. This represents an opportunity to grow coupon usage to drive more sales.

drug_prods <- product_purch_times %>%
  filter(department == 'DRUG GM', 
       product_category %in% c("DEODORANTS", 
                               "SOAP - LIQUID & BAR", 
                               "SHAVING CARE PRODUCTS", 
                               "ORAL HYGIENE PRODUCTS",  # <- Corrected
                               "EYE AND EAR CARE PRODUCTS")) %>%
  mutate(coupon_used = ifelse(coupon_disc != 0, 1, 0))

ggplot(drug_prods) +
  aes(x = product_category, fill = factor(coupon_used)) +  
  geom_bar(position = "fill") +
  labs(x = "Product Category", y = "Proportion",
      title = "Proportion of Transactions With Coupons and Without Coupons") +
  scale_fill_manual(values = c("#355E3B", "#FFA500"),  
                    labels = c("No Coupon Used", "Coupon Used")) +  
  theme_minimal() +
  scale_x_discrete(labels = function(x) str_trunc(x, 20, "right")) +
  theme(axis.text.x = element_text(angle = 35, hjust = 1))

ggplot(drug_prods, aes(x = sales_value, y = product_category)) +
  geom_boxplot(fill = "#355E3B") +
  labs(x = "Sales Value", y = "Product Category",
       title = "Sales Value Summary Statistics by Product Category", 
       subtitle = "(with Coupon vs. without)") +
  facet_wrap(~ coupon_used, nrow = 2, 
             labeller = labeller(coupon_used = c("1" = "Coupon Used", 
                                                 "0" = "No Coupon Used")))

The box plot shows the summary of sales value resulting from transactions with and without coupons. You can see that the use of coupons eliminates a lot of the high sales outliers. We see that shaving care products are the most affected by the presence of a coupon. Specifically, the median, maximum and upper limit of sales were much lower with the use of a coupon. This implies that customers are saving a higher percentage on shaving products. Therefore, shaving care products would be considered desirable and exciting to consumers and can be used as an incentive to visit the store.

To find coupons that would represent the most savings to customers, and therefore be the most incentivizing, I calculated the coupon discount proportions for each coupon in these categories. I then arranged them in descending order to show the coupons that represented the most savings to customers.

big_savings <- drug_prods %>%
  inner_join(coupons, by = "product_id") %>%
  mutate(disc_proportion = coupon_disc / (sales_value + coupon_disc + retail_disc)) %>%  
  group_by(disc_proportion) %>%
  arrange(desc(disc_proportion)) %>%
  select(disc_proportion, coupon_upc, product_type)


knitr::kable(head(big_savings, 20), 
                  caption = 'Coupon and Product Pairings with Large Discounts')
Coupon and Product Pairings with Large Discounts
disc_proportion coupon_upc product_type
1.0000000 10000085428 TOOTHPASTE
1.0000000 10000085478 TOOTHPASTE
1.0000000 53700060051 TOOTHPASTE
1.0000000 53700060051 TOOTHPASTE
1.0000000 53700060051 TOOTHPASTE
1.0000000 53700061076 TOOTHPASTE
0.7143538 10000085428 MENS SKIN CARE/COLOGNE
0.7143538 10000085478 MENS SKIN CARE/COLOGNE
0.6318565 10000085428 SHAVE CREAMS AND POWDERS
0.6318565 10000085478 SHAVE CREAMS AND POWDERS
0.6001335 10000085428 MENS SKIN CARE/COLOGNE
0.6001335 10000085478 MENS SKIN CARE/COLOGNE
0.5434783 10000085428 BODY WASH
0.5434783 10000085478 BODY WASH
0.5434783 10000085480 BODY WASH
0.5434783 57940033055 BODY WASH
0.5384911 10000085428 RAZORS AND BLADES
0.5384911 10000085478 RAZORS AND BLADES
0.5384911 54740018078 RAZORS AND BLADES
0.5000000 10000085428 MENS SKIN CARE/COLOGNE

Finally, I examined how transactions varied over the day across consumer age groups. This bar graph shows that the highest number of transactions for nearly all age groups occurs in the evening, which follows the trend shown by our previous analysis. We can see that the age group that shops most frequently is the 45-54 age group. The group that will make up most shoppers in the store in the morning is that same age group. Shoppers aged 65 and over do most of their shopping in the middle of the day and have the least amount of variability, likely because most retired individuals would fall into this age group.

shop_times <- transactions_times %>%
  inner_join(demographics, by = "household_id") %>%
  group_by(age, time_of_day) %>%
  mutate(time_of_day = factor(time_of_day, levels = desired_order)) %>%
  summarise(num_transactions = n())

ggplot(shop_times, aes(x = age, y = num_transactions, fill = time_of_day)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Age Group", y = "Number of Transactions", 
       title = "Most Popular Shopping Times by Age Group") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_manual(values = c("MORN" = "skyblue", "MIDDAY" = "lightgreen", "NIGHT" = "salmon"))

Recommendations and Opportunities for Further Analysis

Recommendations

Based on this analysis I would recommend the following action steps:

Before the visit:

  • Ping consumers during the afternoon, between 3pm and 5pm, as they are making their shopping lists and mobile pick-up orders and heading to the store. Remind them to purchase personal care items and offer exciting coupons on essential items.

  • Consider the top coupons for mailers to bring shoppers into the store to increase the frequency of visits.

To maximize sales during each consumer visit to the store:

  • Use mobile app notifications between the hours of 5pm to 7pm to remind customers to pick up personal care items during their regular grocery shopping. Use the top coupons to incentivize sales. Below is a list of the top coupon UPCs and Product types associated with those coupons.
# A list of the Top 10 Products
distinct_products <- big_savings$product_type
top_products <- unique(distinct_products)
print(head(top_products, 10))
##  [1] "TOOTHPASTE"                     "MENS SKIN CARE/COLOGNE"        
##  [3] "SHAVE CREAMS AND POWDERS"       "BODY WASH"                     
##  [5] "RAZORS AND BLADES"              "MENS HAIR SPRAYS/TONICS/CREAMS"
##  [7] "ANTIPERSPIRANTS ONLY (ALL OTHE" "BAR SOAP"                      
##  [9] "ORAL HYGIENE BRUSHES"           "SOLID/STK DEODORANTS"
# A list of the Top 10 Coupon UPC's
distinct_coupons <- big_savings$coupon_upc
top_coupon_upcs <- unique(distinct_coupons)
print(head(top_coupon_upcs, 10))
##  [1] "10000085428" "10000085478" "53700060051" "53700061076" "10000085480"
##  [6] "57940033055" "54740018078" "57033030076" "53500051576" "54589371155"

Who to target and when:

  • The 45–55 age group makes up most shoppers at any given time during the day. This group is closely followed by those aged 35 – 44. All age groups are less likely to make purchases early in the day as opposed to the evening. You can expect the number of 65+ shoppers in the store at any given time to be steady throughout the day.

Limitations of this Analysis:

This analysis examines only a sampling of all personal care items. A broader analysis of all DRUG GM products could be done to find all opportunities for coupon incentives.

The calculations performed in this analysis take all store_ids into account without any data on store business hours. A more detailed analysis could be done by grouping stores with similar business hours. Additionally, it would be interesting to find stores that show busy times that differ from what was found here.

Further analysis could be done to explore how to maximize the sales of lunchtime shoppers. Personal care items are often nonperishable and can stay in the car for those who take a break from an office job to get lunch and pick up a few items. More analysis could be done on the demographics of customers who purchase lunch at a store, to determine how Regork can best target them.