Introduction

Regork is a a national grocery chain with close to 450 stores across US. As a part of analytics team, we worked towards identifying the potential area of growth where the company could invest future resources to increase revenue and profits.

Problem Statement

After preliminary analysis of the data sets captured for Regork, we collaborated with the sales and marketing team at Regork. to provide them meaningful insights regarding various marketing campaigns launched in the year 2017. The objective is to evaluate campaign performance and highlight potential areas of focus.

Packages Required

The following packages are required to execute this .rmd file without witnessing any error -

Package Purpose
Tidyverse Opinionated collection of R packages designed for data science
Complete Journey Provides access to data sets characterizing household level transactions
Plotly Makes interactive, publication-quality graphs
wordcloud Plots a word cloud
RColorBrewer Contains a ready-to-use color palettes for creating beautiful graphics
knitr Provides a general-purpose tool for dynamic report generation in R
library(tidyverse)
library(completejourney)
library(plotly)
library(wordcloud)
library(RColorBrewer)
library(knitr)

Data Prepration

Complete Journey package provides access to data sets characterizing household level transactions over one year from a group of 2,469 households who are frequent shoppers at a grocery store. For certain households, demographic information as well as direct marketing contact history are captured.

For more details on built-in data sets under this package: The Complete Journey User Guide

Load Datasets

All the data sets required for analysis are built-in. However, promotions and transactions data sets only provide a snapshot of full table due to their size. We need to load them separately to gain access to the entire table.

transactions <- get_transactions()
promotions <- get_promotions()

Create Master Datasets

  • Joined transactions table with products and households to extract product hierarchy and household demographics against each transaction in the transaction table.
trans_master <- transactions %>% left_join(products, by = 'product_id') %>% left_join(demographics, by = 'household_id')
  • Created another data set to extract all the campaign related information in terms of target households and products included in each campaign
campaign_master <- campaigns %>% left_join(campaign_descriptions, by = 'campaign_id') %>% 
  left_join(coupon_redemptions, by = c('campaign_id','household_id'))

EDA

There were 27 campaigns that got rolled out in the year 2017 of three different types - Type A, Type B and Type C.

  • Type A - 4 campaigns
  • Type B - 17 campaigns
  • Type C - 6 campaigns
camp_details <- coupons %>% left_join(products, by = 'product_id') %>% group_by(campaign_id, department) %>% 
  summarise(count = n()) %>% inner_join(campaign_descriptions, by = 'campaign_id') %>% 
  group_by(campaign_id, campaign_type, start_date) %>%
  summarise(department_cnt = n_distinct(department)) %>% arrange(start_date)
kable(camp_details, caption = "**Below is the snapshot of campaign details sorted on campaign start date** -")
Below is the snapshot of campaign details sorted on campaign start date -
campaign_id campaign_type start_date department_cnt
24 Type C 2016-11-14 3
25 Type B 2016-12-06 1
26 Type B 2016-12-28 3
27 Type A 2017-02-08 11
1 Type B 2017-03-03 1
2 Type B 2017-03-08 2
3 Type C 2017-03-13 5
4 Type B 2017-03-29 2
5 Type B 2017-04-03 1
6 Type C 2017-04-19 1
7 Type B 2017-04-24 3
8 Type A 2017-05-08 20
9 Type B 2017-05-31 3
10 Type B 2017-06-28 3
11 Type B 2017-07-12 2
12 Type B 2017-07-12 3
13 Type A 2017-08-08 21
14 Type C 2017-09-04 4
15 Type C 2017-09-20 1
16 Type B 2017-10-04 3
17 Type B 2017-10-18 3
18 Type A 2017-10-30 21
19 Type B 2017-11-15 1
20 Type C 2017-11-27 2
21 Type B 2017-12-06 6
22 Type B 2017-12-06 4
23 Type B 2017-12-28 3
  • As highlighted in the table above, Type A campaigns were launched at an approximate equal interval throughout the year
  • Type B campaigns were too frequent and were even launched consecutively multiple times
  • Type C campaigns didn’t seem to have any pattern in launch time-frame

Product distributions across different Campaign type

Campaign A included products from various products categories while focusing majorly on products under Grocery and Drug GM departments.

x <- coupons %>% left_join(products, by = 'product_id') %>% group_by(campaign_id, department) %>% 
  summarise(count = n()) %>% inner_join(campaign_descriptions, by = 'campaign_id') %>% 
  group_by(campaign_type, department) %>% summarise(count = sum(count))

a <- x[x$campaign_type == 'Type A',2:3]

set.seed(10)
pal <- brewer.pal(8,"Dark2")
wordcloud(a$department, a$count, random.order=FALSE,
          rot.per=.15, colors=pal,scale=c(4,.9),
          title="Campaign Type A: Product distribution")
Type A - Products Distribution

Type A - Products Distribution

While Type B and Type C campaigns included products from Grocery and Drug GM departments and had very minimal items from some other departments.

Household targeting and redemption%

Redemption% is calculated as: count(households redeemed campaign-coupons)/count(households received the campaigns)

The grouped bar chart below highlights the #households targeted during each campaign and their response in terms of coupon redemption.

  type_a <- campaigns %>% left_join(campaign_descriptions, by = 'campaign_id') %>% filter(campaign_type == 'Type A') %>%
    left_join(coupons, by = 'campaign_id') %>% 
    left_join(coupon_redemptions, by = c('campaign_id','household_id','coupon_upc')) %>% 
    mutate(is_redeemed = ifelse(is.na(redemption_date) == F, 1, 0))
  
  l <- type_a %>% group_by(campaign_id, start_date) %>% 
    summarise(n()) %>% arrange(start_date) %>%
    select(campaign_id)
  
  
  type_a$campaign_id <- factor(type_a$campaign_id, levels = as.list(l$campaign_id), ordered = T)
  
  type_a %>% group_by(campaign_id,start_date, end_date) %>% summarise(count_household = n_distinct(household_id)) %>%
    inner_join(type_a %>% filter(is_redeemed == 1) %>% 
                 group_by(campaign_id) %>% summarise(household_redeemed = n_distinct(household_id)), by = 'campaign_id') %>%
    mutate(response_rate = household_redeemed/count_household*100) %>% 
    plot_ly(x = ~campaign_id, y = ~count_household, type = 'bar', name = '#hh_received',
            marker = list(color = 'rgb(192,192,192)')) %>%
    add_trace(y = ~household_redeemed, name = '#hh_redeemed',
              marker = list(color = 'rgb(158,202,225)')) %>% 
    add_trace(y = ~response_rate, name = "yaxis 2 data", yaxis = "y2", 
              mode = "lines+markers", type = "scatter", showlegend = FALSE,
              marker = list(size = 5,
                            color = 'rgb(230,120,140)',
                            line = list(color = 'rgb(0,0,0)',
                                        width = 2))) %>%
    layout(
      title = "<b>Type A Campaign<b>",
      xaxis = list(title="<b>campaign id</b> "),
      yaxis = list(side = 'left', title="<b>#household</b>", showgrid = FALSE, zeroline = FALSE),
      yaxis2 = list(side = 'right', overlaying = "y", title = "<b>redemption%</b>", showgrid 
                    =  FALSE, zeroline = FALSE)
    )

Campaign IDs are sorted based on rollout date**

Type A campaigns had a fair redemption% of ~15% and it has only improved with every subsequent launch. Target audience for these kinds of campaigns have been around ~1000 households.

type_b <- campaigns %>% left_join(campaign_descriptions, by = 'campaign_id') %>% filter(campaign_type == 'Type B') %>%
  left_join(coupons, by = 'campaign_id') %>% left_join(products, by = 'product_id') %>%
  left_join(coupon_redemptions, by = c('campaign_id','household_id','coupon_upc')) %>% 
  mutate(is_redeemed = ifelse(is.na(redemption_date) == F, 1, 0))

l = type_b %>% group_by(campaign_id, start_date) %>% 
  summarise(n()) %>% arrange(start_date) %>%
  select(campaign_id)


type_b$campaign_id <- factor(type_b$campaign_id, levels = as.list(l$campaign_id), ordered = T)

type_b %>% group_by(campaign_id,start_date, end_date) %>% summarise(count_household = n_distinct(household_id)) %>%
  inner_join(type_b %>% filter(is_redeemed == 1) %>% 
               group_by(campaign_id) %>% summarise(household_redeemed = n_distinct(household_id)), by = 'campaign_id') %>%
  mutate(response_rate = household_redeemed/count_household*100) %>% 
  plot_ly(x = ~campaign_id, y = ~count_household, type = 'bar', name = '#hh_received',
          marker = list(color = 'rgb(192,192,192)')) %>%
  add_trace(y = ~household_redeemed, name = '#hh_redeemed',
            marker = list(color = 'rgb(158,202,225)')) %>% 
  add_trace(y = ~response_rate, name = "yaxis 2 data", yaxis = "y2", 
            mode = "markers", type = "scatter", showlegend = FALSE,
            marker = list(size = 5,
                          color = 'rgb(230,120,140)',
                          line = list(color = 'rgb(0,0,0)',
                                      width = 2))) %>%
  layout(
    title = "<b>Type B Campaign<b>",
    xaxis = list(title="<b>campaign id</b> "),
    yaxis = list(side = 'left', title="<b>#household</b>", showgrid = FALSE, zeroline = FALSE),
    yaxis2 = list(side = 'right', overlaying = "y", title = "<b>redemption%</b>", showgrid 
                  =  FALSE, zeroline = FALSE)
  )

Campaign IDs are sorted based on rollout date**

type_c <- campaigns %>% left_join(campaign_descriptions, by = 'campaign_id') %>% filter(campaign_type == 'Type C') %>%
  left_join(coupons, by = 'campaign_id') %>% left_join(products, by = 'product_id') %>%
  left_join(coupon_redemptions, by = c('campaign_id','household_id','coupon_upc')) %>% 
  mutate(is_redeemed = ifelse(is.na(redemption_date) == F, 1, 0))

l = type_c %>% group_by(campaign_id, start_date) %>% 
  summarise(n()) %>% arrange(start_date) %>%
  select(campaign_id)


type_c$campaign_id <- factor(type_c$campaign_id, levels = as.list(l$campaign_id), ordered = T)

type_c %>% group_by(campaign_id,start_date, end_date) %>% summarise(count_household = n_distinct(household_id)) %>%
  inner_join(type_c %>% filter(is_redeemed == 1) %>% 
               group_by(campaign_id) %>% summarise(household_redeemed = n_distinct(household_id)), by = 'campaign_id') %>%
  mutate(response_rate = household_redeemed/count_household*100) %>% 
  plot_ly(x = ~campaign_id, y = ~count_household, type = 'bar', name = '#hh_received',
          marker = list(color = 'rgb(192,192,192)')) %>%
    add_trace(y = ~household_redeemed, name = '#hh_redeemed',
            marker = list(color = 'rgb(158,202,225)')) %>% 
  add_trace(y = ~response_rate, name = "yaxis 2 data", yaxis = "y2", 
            mode = "markers", type = "scatter", showlegend = FALSE,
            marker = list(size = 5,
                          color = 'rgb(230,120,140)',
                          line = list(color = 'rgb(0,0,0)',
                                      width = 2))) %>%
  layout(
    title = "<b>Type C Campaign<b>",
    xaxis = list(title="<b>campaign id</b> "),
    yaxis = list(side = 'left', title="<b>#household</b>", showgrid = FALSE, zeroline = FALSE),
    yaxis2 = list(side = 'right', overlaying = "y", title = "<b>redemption%</b>", showgrid 
                  =  FALSE, zeroline = FALSE)
  )

Campaign IDs are sorted based on rollout date**

Type B and Type C campaigns had approximately 6-8% redemption with a target of approximately 200 households.

Summary

  1. Despite focusing on the mainstream categories, Type B and Type C campaigns didn’t seem very effective in getting the desired customer response
  2. Type B and Type C campaigns did not show any signs of improvement in terms of coupon redemption over time. Instead, they fell to as low as 2% and 4% respectively in their latest launch

Next Steps

Further analysis would provide more information regarding the underperforming campaigns:

  • Household analysis to evaluate if these campaigns are being rolled out to the right audience
  • SKUs present in the coupon for these campaigns