Synopsis

The goal of the project is to analyze and understand the sales pattern across the different campaigns that will help in improving the revenue for our company. This can be achieved by identifying products that the customers are buying more across different campaigns.

We plan to address the problem by adopting the following steps:

  1. We want to identify the campaigns that generates the maximum revenue.

  2. In order to better understand the driving factors for generating better revenue in the top 3 campaigns we try to understand the sales over the different age group and income in these campaigns.

  3. After understanding what age group and income group is contributing to the revenue in the most popular campaign, we want to figure out what product(s) is making this particular campaign generate better revenue when compared to other top campaigns.

For this analysis, we have used complete journey datasets.

This analysis will help to understand what coupons should be recommended targeting which age and income group. Also it will give us insights that what products should the company target to give coupons on, so we can attract more customers.

Packages Required

The following packages are used in this project :

Package Name Purpose
dplyr data manipulation
lubridate parsing of date-times
ggplot2 data visualization
gridExtra functions on grid objects
completejourney transactions related datasets
library(dplyr)
library(lubridate)
library(ggplot2)
library(gridExtra)
library(completejourney)

Variable Catalog

The completejourney package provides access to data sets characterizing household level transactions over one year from a group of 2,469 households who are frequent shoppers at a grocery store. For certain households, demographic information as well as direct marketing contact history are included.

The following datasets from complete journey package is used in this project:

transactions

This table contains transaction-level product purchases by households.

Variable Data Type Description
household_id character Uniquely identifies each household
store_id character Uniquely identifies each store
basket_id character Uniquely identifies each purchase occasion
product_id character Uniquely identifies each product
quantity numeric Number of the product purchased during the trip
sales_value numeric Amount of dollars the retailer receives from sale
retail_disc numeric Discount applied due to the retailer’s loyalty card program
coupon_disc numeric Discount applied due to a manufacturer coupon
coupon_match_disc numeric Discount applied due to retailer’s match of manufacturer coupon
week integer Week of the transaction; Ranges 1-53
transaction_timestamp POSIXct Date and time of day when the transaction occurred

demographics

This table contains demographic information for a portion of households.

Variable Data Type Description
household_id character Uniquely identifies each household
age ordered factor Estimated age range
income ordered factor Household income
home_ownership ordered factor Homeowner, renter, etc.
marital_status ordered factor Marital status (Married, Single, Unknown)
household_size ordered factor Size of household up to 5+
household_comp ordered factor Household composition
kid_count ordered factor Number of children present up to 3+

products

This table contains metatdata regarding the products purchased (brand, description, etc.).

Variable Data Type Description
product_id character Uniquely identifies each product
manufacturer_id character Uniquely identifies each manufacturer
department character Groups similar products together
brand factor Indicates private or national label brand
product_category character Groups similar products together at lower level
product_type character Groups similar products together at lowest level
package_size character Indicates package size (not available for all products)

campaigns

This table contains identifying information for the marketing campaigns each households participated in.

Variable Data Type Description
campaign_id character Uniquely identifies each campaign; Ranges 1-27
household_id character Uniquely identifies each household

campaign_descriptions

This table contains campaign metadata, specifically the time a campaign was active.

Variable Data Type Description
campaign_id character Uniquely identifies each campaign; Ranges 1-27
campaign_type ordered factor Type of campaign (Type A, Type B, Type C)
start_date Date Start date of campaign
end_date Date End date of campaign

coupons

This table contains all the coupons sent to customers as part of a campaign, as well as the products for which each coupon is redeemable.

Variable Data Type Description
coupon_upc character Uniquely identifies each coupon (unique to household and campaign)
product_id character Uniquely identifies each product
campaign_id character Uniquely identifies each campaign

coupon_redemptions

This table contains individual coupon redemptions (household, day, UPC code, campaign).

Variable Data Type Description
household_id character Uniquely identifies each household
coupon_upc character Uniquely identifies each coupon (unique to household and campaign)
campaign_id character Uniquely identifies each campaign
redemption_date Date Date when the coupon was redeemed

Exploratory Data Analysis

Loading all the required dataframes

# Getting all the required data sets required for our analysis.
transactions <- get_transactions()
coupons = coupons
coupon_redemptions = coupon_redemptions
demo = demographics
prod = products
campaigns = campaigns
campaign_desc = campaign_descriptions

Preparing the required data for our analysis

# Preparing transactions, demo, and products data
tran_demo_prod = transactions %>%
  inner_join(products, "product_id") %>%
  inner_join(demo, "household_id")

# Preparing the campaign data
campaigns_data = campaign_desc %>%
  left_join(campaigns, "campaign_id")

# Preparing the Household and campaign
tran_demo_prod_camp = tran_demo_prod %>%
  left_join(campaigns_data, c("household_id")) %>%
  left_join(coupon_redemptions, c("household_id", "campaign_id"))

# Preparing the coupons and redemptions
coup_data = coupons %>%
  left_join(coupon_redemptions, c("coupon_upc", "campaign_id"))

# Preparing transactions, demo, products and coupon redemptions data
tran_demo_prod_red = tran_demo_prod %>% inner_join(coupon_redemptions, by = "household_id")

Analysing the campaigns

# Preparing campaigns Max Min Sales data
camp_max_min <- tran_demo_prod_red %>%
  filter(yday(transaction_timestamp) == yday(redemption_date)) %>%
  group_by(campaign_id, week) %>%
  summarize(total_sale = sum(sales_value, na.rm = T)) %>%
  summarize(min = min(total_sale),
            max = max(total_sale))

other_camp <- camp_max_min %>%
  filter(!(campaign_id %in% c(8,13,18)))

camp_max_min_high <- camp_max_min %>%
  filter((campaign_id %in% c(8,13,18)))

ggplot(camp_max_min)+
  geom_segment(aes(x=1, xend=2, y=`min`, yend=`max`, color=campaign_id), size=.99, show.legend=T)+
  geom_vline(xintercept=1, linetype="dashed", size=.1)+
  geom_vline(xintercept=2, linetype="dashed", size=.1)+
  labs(x="Campaigns ", y="Sales Value", color = 'Campaign ID',
       title = "Campaign Analysis with Sales Value")+
  scale_x_continuous(breaks = seq(1, 2, by = 1))+
  scale_y_continuous(breaks = seq(0, 15000, by = 5000), labels = scales::dollar)+
  geom_text(label="Campaign Start", x=1, y=1*(max(camp_max_min$`min`, camp_max_min$`max`)), hjust=0.2, size=3)+
  geom_text(label="Campaign End", x=2, y=1*(max(camp_max_min$`min`, camp_max_min$`max`)), hjust= 0.8, size=3)+
  theme(plot.title = element_text(size = 15, face = 'bold'),
        axis.text.x = element_text(face = "bold",hjust=1),
        axis.text.y = element_text(face = "bold"),
        axis.title.x = element_text(face = "bold"),
       axis.title.y = element_text(face = "bold"),
       legend.title = element_text(face = "bold"))

Analysing all the campaign trends with respect to sales values

Comparing the top 3 campaigns with others

left_label <- paste(other_camp$campaign_id, round(other_camp$`min`),sep=", ")
right_label <- paste(other_camp$campaign_id, round(other_camp$`max`),sep=", ")

plot1 <- ggplot(other_camp) + geom_segment(aes(x=1, xend=2, y=`min`, yend=`max` ), size=.99, show.legend=F) +
  geom_vline(xintercept=1, linetype="dashed", size=.1) +
  geom_vline(xintercept=2, linetype="dashed", size=.1) +
  labs(x="", y="Sales Value", title = " Low Performing Campaigns ") +
  scale_x_continuous(breaks = seq(1, 2, by = 1))+
  ylim(0,15000)+  
  theme(plot.title = element_text(size = 15, face = 'bold'),
        axis.text.x = element_text(face = "bold",hjust=1),
        axis.text.y = element_text(face = "bold"),
        axis.title.x = element_text(face = "bold"),
       axis.title.y = element_text(face = "bold"),
       legend.title = element_text(face = "bold"))



left_label1 <- paste(camp_max_min_high$campaign_id, round(camp_max_min_high$`min`),sep=", ")
right_label1 <- paste(camp_max_min_high$campaign_id, round(camp_max_min_high$`max`),sep=", ")

plot2<- ggplot(camp_max_min_high) + geom_segment(aes(x=1, xend=2, y=`min`, yend=`max`, color = campaign_id), size=.99, show.legend=F) +
  geom_vline(xintercept=1, linetype="dashed", size=.1) + 
  geom_vline(xintercept=2, linetype="dashed", size=.1) +
  labs(x="", y="Sales Value",title = "Top 3 Performing Campaigns ") +
  scale_x_continuous(breaks = seq(1, 2, by = 1))+
  scale_y_continuous(breaks = seq(0, 15000, by = 5000), labels = scales::dollar)+
  geom_text(label=left_label1, y=1.05*camp_max_min_high$`min`, x=rep(1, NROW(camp_max_min_high)), hjust=0.2, size=3.3, dodge = 1)+
  geom_text(label=right_label1, y=camp_max_min_high$`max`, x=rep(2, NROW(camp_max_min_high)), hjust=0.7, size=3.3)+ 
  theme(plot.title = element_text(size = 15, face = 'bold'),
        axis.text.x = element_text(face = "bold",hjust=1),
        axis.text.y = element_text(face = "bold"),
        axis.title.x = element_text(face = "bold"),
       axis.title.y = element_text(face = "bold"),
       legend.title = element_text(face = "bold"))

grid.arrange( plot2, plot1, ncol = 2)

The campaigns 8,13,18 had highest margin compared to other campaigns

Creating seperate data for each campaigns

tran_demo_prod_red <- tran_demo_prod%>%inner_join(coupon_redemptions, by = "household_id")

camp <- tran_demo_prod_red%>%filter(yday(transaction_timestamp)==yday(redemption_date))

camp8<- tran_demo_prod_red%>%filter(yday(transaction_timestamp)==yday(redemption_date)) %>% filter(campaign_id==8)

camp18<- tran_demo_prod_red%>%filter(yday(transaction_timestamp)==yday(redemption_date)) %>% filter(campaign_id==18)

camp13<- tran_demo_prod_red%>%filter(yday(transaction_timestamp)==yday(redemption_date)) %>% filter(campaign_id==13)

Comparing Campaign sales by different age groups

p1<-camp%>%
 group_by( age)%>%
  summarize(total_sales = sum(sales_value))%>%
  ggplot(aes(age, total_sales))+
  geom_point(size=3) +
  geom_segment(aes(x = age, 
                   xend = age, 
                   y = 0, 
                   yend = total_sales)) +
  scale_y_continuous( labels = scales::dollar)+
  labs(title = "All Campaigns Sales by Age",
       x = "Age",
       y = "Sales")+
  theme(plot.title = element_text(size = 10, face = 'bold'),
        plot.subtitle = element_text(size = 5, face = 'italic'),
        axis.text.x = element_text(face = "bold",hjust=1),
        axis.text.y = element_text(face = "bold"),
        axis.title.x = element_text(face = "bold"),
        axis.title.y = element_text(face = "bold"),
        legend.title = element_text(face = "bold"),
        plot.background = element_rect(fill = "white") )

p2<-camp8%>%
 group_by( age)%>%
  summarize(total_sales = sum(sales_value))%>%
  ggplot(aes(age, total_sales))+
  geom_point(size=3) +
  geom_segment(aes(x = age, 
                   xend = age, 
                   y = 0, 
                   yend = total_sales)) +
  scale_y_continuous(labels = scales::dollar)+
  labs(title = "Campaign 8 Sales by Age",
       x = "Age",
       y = "Sales")+
  theme(plot.title = element_text(size = 10, face = 'bold'),
        plot.subtitle = element_text(size = 5, face = 'italic'),
        axis.text.x = element_text(face = "bold",hjust=1),
        axis.text.y = element_text(face = "bold"),
        axis.title.x = element_text(face = "bold"),
        axis.title.y = element_text(face = "bold"),
        legend.title = element_text(face = "bold"),
        plot.background = element_rect(fill = "white") )

p3<- camp13%>%
 group_by( age)%>%
  summarize(total_sales = sum(sales_value))%>%
  ggplot(aes(age, total_sales))+
  geom_point(size=3) +
  geom_segment(aes(x = age, 
                   xend = age, 
                   y = 0, 
                   yend = total_sales)) +
  scale_y_continuous( labels = scales::dollar)+
  labs(title = "Campaign 13 Sales by Age",
       x = "Age",
       y = "Sales")+
  theme(plot.title = element_text(size = 10, face = 'bold'),
        plot.subtitle = element_text(size = 5, face = 'italic'),
        axis.text.x = element_text(face = "bold",hjust=1),
        axis.text.y = element_text(face = "bold"),
        axis.title.x = element_text(face = "bold"),
        axis.title.y = element_text(face = "bold"),
        legend.title = element_text(face = "bold"),
        plot.background = element_rect(fill = "white") )

p4<-camp18%>%
 group_by( age)%>%
  summarize(total_sales = sum(sales_value))%>%
  ggplot(aes(age, total_sales))+
  geom_point(size=3) +
  scale_y_continuous( labels = scales::dollar)+
  geom_segment(aes(x = age, 
                   xend = age, 
                   y = 0, 
                   yend = total_sales)) +
  labs(title = "Campaign 18 Sales by Age",
       x = "Age",
       y = "Sales")+
  theme(plot.title = element_text(size = 10, face = 'bold'),
        plot.subtitle = element_text(size = 5, face = 'italic'),
        axis.text.x = element_text(face = "bold",hjust=1),
        axis.text.y = element_text(face = "bold"),
        axis.title.x = element_text(face = "bold"),
        axis.title.y = element_text(face = "bold"),
        legend.title = element_text(face = "bold"),
        plot.background = element_rect(fill = "white") )

grid.arrange(p1, p2, p3, p4, ncol = 2)

From the above plots it is observed that the 45-54 age group people were highest contributers followed by 35-44 age group and 25-34 age group. But, when it comes to campaign 18 age 25-34 group contributed more than 35-44 age group. It would be interesting to find what would be the driving factor for the 25-34 age group in the campaign 18.

Segregating the previous analysis with income level

tran_demo_prod_camp%>%
  filter(age == '25-34')%>%
  filter(campaign_id %in% c(13,8,18))%>%
  group_by(campaign_id, age, income)%>%
  summarize(total_sales = sum(sales_value))%>%
  ggplot(aes(age, total_sales, fill = income))+
  geom_col()+facet_grid(~campaign_id)+
  scale_x_discrete(breaks = " ")+
  scale_y_continuous(breaks = seq(0, 800000, by = 100000), labels = scales::comma)+
   labs(title = " Top 3 Campaigns - Sales by Income Levels",
       x = "Age Group 25-34",
       y = "Sales",
       fill = 'Income Levels')+
  theme(plot.title = element_text(size = 10, face = 'bold'),
        plot.subtitle = element_text(size = 5, face = 'italic'),
        axis.text.x = element_text(face = "bold",hjust=1),
        axis.text.y = element_text(face = "bold"),
        axis.title.x = element_text(face = "bold"),
        axis.title.y = element_text(face = "bold"),
        legend.title = element_text(face = "bold"),
        plot.background = element_rect(fill = "white"))

From the Top 3 Campaigns - Sales by Income Levels chart, it is observed that people with income level under 15000$ shop more during the campaign 18. Let’s drill down to department level to find what were things which made these people shop more…

Analysing products purchased by the ‘Under 15k’ income household for patterns

tran_demo_prod_red %>%
  filter(yday(transaction_timestamp) == yday(redemption_date)) %>%
  filter(campaign_id %in% c(8, 13, 18)) %>%
  filter(income == 'Under 15K') %>%
  filter(age == '25-34') %>%
  filter(department != 'GROCERY')%>%
  group_by(campaign_id) %>%
  ggplot(aes(campaign_id, sales_value, fill = department)) +
  geom_bar(stat = "identity")+
  labs(title = "Top 3 Campaigns - Sales by Department",
       subtitle = "Grocery is excluded to get clear understanding on other departments",
       x = "Campaign ID",
       y = "Total Sales",
       fill = "Departments") +
  theme(plot.title = element_text(size = 10, face = 'bold'),
        plot.subtitle = element_text(size = 8, face = 'italic'),
        axis.text.x = element_text(face = "bold",hjust=1),
        axis.text.y = element_text(face = "bold"),
        axis.title.x = element_text(face = "bold"),
        axis.title.y = element_text(face = "bold"),
        legend.title = element_text(face = "bold"),
        plot.background = element_rect(fill = "white"))

Since grocery was top department in all campaigns it was excluded from the above analysis to find which other departments performed better. We can clearly see that Meat, Meat-Packaged, Produce, Seafood-Packaged are sold the mostly in the Campaign 18.

Conclusion

The objective of the analysis is to understand the sales pattern across the different campaigns and improve the revenue for our company. In order to boost the revenue, we analyzed the various datasets in the completejourney package by various data wrangling and visualization techniques such as joining, filtering, and summarizing data affiliated with campaigns, demographics, and products.

After a thorough analysis, we have identified the following insights from the data we considered for our analysis:

  • First, we identified that, out of all the 27 campaigns, campaigns such as 8, 13, and 18 generate the maximum revenue.

  • We deep-dived into the transactions associated with Campaigns 8, 13, and 18, and found that under campaign 18, the sale value generated from the customers that belong to the age group ‘25-34’ is higher than that of the’35-44’ age group.

  • Later, we also analyzed the previous trend with income groups and found that the ‘Under 15k’ income households are the primary driving factor.

  • On further drilling down, we found that Meat related products such as Meat, Packaged meat, seafood, packaged seafood, and produce are major factors that are contributing to the increased revenue.

Including meat-related coupons similar to that used in campaign 18 in future campaigns will target more customers belonging to the age group of ‘25-34’ and ‘Under 15k’ income household groups will help us increase the revenue.

Limitations

Most of the products were not on display, they were placed in random locations than any other display locations which could have driven the customers. So, we could not do further analysis using the promotions dataset which could have given us the information on which is the right way to promote our products. Hence, using this analysis we can conclude only about which products to focus on to improve the revenue for the other campaigns too.