Regork is a a national grocery chain with close to 450 stores across US. As a part of analytics team, we worked towards identifying the potential area of growth where the company could invest future resources to increase revenue and profits.
After preliminary analysis of the data sets captured for Regork, we collaborated with the sales and marketing team at Regork. to provide them meaningful insights regarding various marketing campaigns launched in the year 2017. The objective is to evaluate campaign performance and highlight potential areas of focus.
The following packages are required to execute this .rmd file without witnessing any error -
| Package | Purpose |
|---|---|
| Tidyverse | Opinionated collection of R packages designed for data science |
| Complete Journey | Provides access to data sets characterizing household level transactions |
| Plotly | Makes interactive, publication-quality graphs |
| wordcloud | Plots a word cloud |
| RColorBrewer | Contains a ready-to-use color palettes for creating beautiful graphics |
| knitr | Provides a general-purpose tool for dynamic report generation in R |
library(tidyverse)
library(completejourney)
library(plotly)
library(wordcloud)
library(RColorBrewer)
library(knitr)Complete Journey package provides access to data sets characterizing household level transactions over one year from a group of 2,469 households who are frequent shoppers at a grocery store. For certain households, demographic information as well as direct marketing contact history are captured.
For more details on built-in data sets under this package: The Complete Journey User Guide
All the data sets required for analysis are built-in. However, promotions and transactions data sets only provide a snapshot of full table due to their size. We need to load them separately to gain access to the entire table.
transactions <- get_transactions()
promotions <- get_promotions()trans_master <- transactions %>% left_join(products, by = 'product_id') %>% left_join(demographics, by = 'household_id')campaign_master <- campaigns %>% left_join(campaign_descriptions, by = 'campaign_id') %>%
left_join(coupon_redemptions, by = c('campaign_id','household_id'))There were 27 campaigns that got rolled out in the year 2017 of three different types - Type A, Type B and Type C.
camp_details <- coupons %>% left_join(products, by = 'product_id') %>% group_by(campaign_id, department) %>%
summarise(count = n()) %>% inner_join(campaign_descriptions, by = 'campaign_id') %>%
group_by(campaign_id, campaign_type, start_date) %>%
summarise(department_cnt = n_distinct(department)) %>% arrange(start_date)
kable(camp_details, caption = "**Below is the snapshot of campaign details sorted on campaign start date** -")| campaign_id | campaign_type | start_date | department_cnt |
|---|---|---|---|
| 24 | Type C | 2016-11-14 | 3 |
| 25 | Type B | 2016-12-06 | 1 |
| 26 | Type B | 2016-12-28 | 3 |
| 27 | Type A | 2017-02-08 | 11 |
| 1 | Type B | 2017-03-03 | 1 |
| 2 | Type B | 2017-03-08 | 2 |
| 3 | Type C | 2017-03-13 | 5 |
| 4 | Type B | 2017-03-29 | 2 |
| 5 | Type B | 2017-04-03 | 1 |
| 6 | Type C | 2017-04-19 | 1 |
| 7 | Type B | 2017-04-24 | 3 |
| 8 | Type A | 2017-05-08 | 20 |
| 9 | Type B | 2017-05-31 | 3 |
| 10 | Type B | 2017-06-28 | 3 |
| 11 | Type B | 2017-07-12 | 2 |
| 12 | Type B | 2017-07-12 | 3 |
| 13 | Type A | 2017-08-08 | 21 |
| 14 | Type C | 2017-09-04 | 4 |
| 15 | Type C | 2017-09-20 | 1 |
| 16 | Type B | 2017-10-04 | 3 |
| 17 | Type B | 2017-10-18 | 3 |
| 18 | Type A | 2017-10-30 | 21 |
| 19 | Type B | 2017-11-15 | 1 |
| 20 | Type C | 2017-11-27 | 2 |
| 21 | Type B | 2017-12-06 | 6 |
| 22 | Type B | 2017-12-06 | 4 |
| 23 | Type B | 2017-12-28 | 3 |
Campaign A included products from various products categories while focusing majorly on products under Grocery and Drug GM departments.
x <- coupons %>% left_join(products, by = 'product_id') %>% group_by(campaign_id, department) %>%
summarise(count = n()) %>% inner_join(campaign_descriptions, by = 'campaign_id') %>%
group_by(campaign_type, department) %>% summarise(count = sum(count))
a <- x[x$campaign_type == 'Type A',2:3]
set.seed(10)
pal <- brewer.pal(8,"Dark2")
wordcloud(a$department, a$count, random.order=FALSE,
rot.per=.15, colors=pal,scale=c(4,.9),
title="Campaign Type A: Product distribution")Type A - Products Distribution
While Type B and Type C campaigns included products from Grocery and Drug GM departments and had very minimal items from some other departments.
Redemption% is calculated as: count(households redeemed campaign-coupons)/count(households received the campaigns)
The grouped bar chart below highlights the #households targeted during each campaign and their response in terms of coupon redemption.
type_a <- campaigns %>% left_join(campaign_descriptions, by = 'campaign_id') %>% filter(campaign_type == 'Type A') %>%
left_join(coupons, by = 'campaign_id') %>%
left_join(coupon_redemptions, by = c('campaign_id','household_id','coupon_upc')) %>%
mutate(is_redeemed = ifelse(is.na(redemption_date) == F, 1, 0))
l <- type_a %>% group_by(campaign_id, start_date) %>%
summarise(n()) %>% arrange(start_date) %>%
select(campaign_id)
type_a$campaign_id <- factor(type_a$campaign_id, levels = as.list(l$campaign_id), ordered = T)
type_a %>% group_by(campaign_id,start_date, end_date) %>% summarise(count_household = n_distinct(household_id)) %>%
inner_join(type_a %>% filter(is_redeemed == 1) %>%
group_by(campaign_id) %>% summarise(household_redeemed = n_distinct(household_id)), by = 'campaign_id') %>%
mutate(response_rate = household_redeemed/count_household*100) %>%
plot_ly(x = ~campaign_id, y = ~count_household, type = 'bar', name = '#hh_received',
marker = list(color = 'rgb(192,192,192)')) %>%
add_trace(y = ~household_redeemed, name = '#hh_redeemed',
marker = list(color = 'rgb(158,202,225)')) %>%
add_trace(y = ~response_rate, name = "yaxis 2 data", yaxis = "y2",
mode = "lines+markers", type = "scatter", showlegend = FALSE,
marker = list(size = 5,
color = 'rgb(230,120,140)',
line = list(color = 'rgb(0,0,0)',
width = 2))) %>%
layout(
title = "<b>Type A Campaign<b>",
xaxis = list(title="<b>campaign id</b> "),
yaxis = list(side = 'left', title="<b>#household</b>", showgrid = FALSE, zeroline = FALSE),
yaxis2 = list(side = 'right', overlaying = "y", title = "<b>redemption%</b>", showgrid
= FALSE, zeroline = FALSE)
)Campaign IDs are sorted based on rollout date**
Type A campaigns had a fair redemption% of ~15% and it has only improved with every subsequent launch. Target audience for these kinds of campaigns have been around ~1000 households.
type_b <- campaigns %>% left_join(campaign_descriptions, by = 'campaign_id') %>% filter(campaign_type == 'Type B') %>%
left_join(coupons, by = 'campaign_id') %>% left_join(products, by = 'product_id') %>%
left_join(coupon_redemptions, by = c('campaign_id','household_id','coupon_upc')) %>%
mutate(is_redeemed = ifelse(is.na(redemption_date) == F, 1, 0))
l = type_b %>% group_by(campaign_id, start_date) %>%
summarise(n()) %>% arrange(start_date) %>%
select(campaign_id)
type_b$campaign_id <- factor(type_b$campaign_id, levels = as.list(l$campaign_id), ordered = T)
type_b %>% group_by(campaign_id,start_date, end_date) %>% summarise(count_household = n_distinct(household_id)) %>%
inner_join(type_b %>% filter(is_redeemed == 1) %>%
group_by(campaign_id) %>% summarise(household_redeemed = n_distinct(household_id)), by = 'campaign_id') %>%
mutate(response_rate = household_redeemed/count_household*100) %>%
plot_ly(x = ~campaign_id, y = ~count_household, type = 'bar', name = '#hh_received',
marker = list(color = 'rgb(192,192,192)')) %>%
add_trace(y = ~household_redeemed, name = '#hh_redeemed',
marker = list(color = 'rgb(158,202,225)')) %>%
add_trace(y = ~response_rate, name = "yaxis 2 data", yaxis = "y2",
mode = "markers", type = "scatter", showlegend = FALSE,
marker = list(size = 5,
color = 'rgb(230,120,140)',
line = list(color = 'rgb(0,0,0)',
width = 2))) %>%
layout(
title = "<b>Type B Campaign<b>",
xaxis = list(title="<b>campaign id</b> "),
yaxis = list(side = 'left', title="<b>#household</b>", showgrid = FALSE, zeroline = FALSE),
yaxis2 = list(side = 'right', overlaying = "y", title = "<b>redemption%</b>", showgrid
= FALSE, zeroline = FALSE)
)Campaign IDs are sorted based on rollout date**
type_c <- campaigns %>% left_join(campaign_descriptions, by = 'campaign_id') %>% filter(campaign_type == 'Type C') %>%
left_join(coupons, by = 'campaign_id') %>% left_join(products, by = 'product_id') %>%
left_join(coupon_redemptions, by = c('campaign_id','household_id','coupon_upc')) %>%
mutate(is_redeemed = ifelse(is.na(redemption_date) == F, 1, 0))
l = type_c %>% group_by(campaign_id, start_date) %>%
summarise(n()) %>% arrange(start_date) %>%
select(campaign_id)
type_c$campaign_id <- factor(type_c$campaign_id, levels = as.list(l$campaign_id), ordered = T)
type_c %>% group_by(campaign_id,start_date, end_date) %>% summarise(count_household = n_distinct(household_id)) %>%
inner_join(type_c %>% filter(is_redeemed == 1) %>%
group_by(campaign_id) %>% summarise(household_redeemed = n_distinct(household_id)), by = 'campaign_id') %>%
mutate(response_rate = household_redeemed/count_household*100) %>%
plot_ly(x = ~campaign_id, y = ~count_household, type = 'bar', name = '#hh_received',
marker = list(color = 'rgb(192,192,192)')) %>%
add_trace(y = ~household_redeemed, name = '#hh_redeemed',
marker = list(color = 'rgb(158,202,225)')) %>%
add_trace(y = ~response_rate, name = "yaxis 2 data", yaxis = "y2",
mode = "markers", type = "scatter", showlegend = FALSE,
marker = list(size = 5,
color = 'rgb(230,120,140)',
line = list(color = 'rgb(0,0,0)',
width = 2))) %>%
layout(
title = "<b>Type C Campaign<b>",
xaxis = list(title="<b>campaign id</b> "),
yaxis = list(side = 'left', title="<b>#household</b>", showgrid = FALSE, zeroline = FALSE),
yaxis2 = list(side = 'right', overlaying = "y", title = "<b>redemption%</b>", showgrid
= FALSE, zeroline = FALSE)
)Campaign IDs are sorted based on rollout date**
Type B and Type C campaigns had approximately 6-8% redemption with a target of approximately 200 households.
Further analysis would provide more information regarding the underperforming campaigns: