Trade Activity Effectiveness

2018-12-09

Introduction

The purpose of this report is to provide an overview of the Trade Activity Effectiveness analysis. This document covers the end-to-end journey of the analysis of how the products under Pasta, Pasta Sauce, Pancake mix and Syrups are impacted by trade activities like Feature and Display.

Problem Statement

Feature and Display are part of in-store trade activity that are used to increase sales of products every week. Using the household transaction data, Can we understand what the impact of Feature and Display on the sales of four categories is?

What is the penetration of different categories at a weekly store level?
What are the products for which Feature and/or Display has increased its units sold per week in the 2 years?
Did the sales increase because of the trade activity?
How does the interaction between complimentary categories play a role in this impact?

Why should we solve this problem?

Two reasons why this problem should be solved:

Improve trade activity efficiency by spending trade on the right products
Increase profit from products that have had positive effect from trade activity

Approach

We will use trade activity data and transactions of household data to understand the impact of trade on sales of products
Perform EDA to understand the effect of Feature and Display on sales of the product along with other factors
Perform pre-post analysis to check if the change in sales is statistically significant because of trade of activity

How can the results be consumed?

In case there has been a positive impact on certain products, Kroger can increase its retail margin with the manufacturers
In case there has been no impact and the sales of product have been decreasing, Kroger can increase the cost of Feature and/or Display through product price for manufacturers
For complimentary categories, use Feature and Display for cross promotions to increase sales of products

Packages Required

Packages used:

sas7bdat - to load the SAS datasets
tidyverse - for data cleaning (tidyr), data manipulation (dplyr) and data visualization (ggplot2)
png - to read PNG images
grid - for graphical modifications to images
GGally - for data visualization
DT - for convenient table viewing

# Loading all the packages

library(sas7bdat)
library(tidyverse)
library(png)
library(grid)
library(GGally)
library(DT)

Data Preparation

Step 1: Understanding the context

Data Source: Carbo-loading data

Carbo-Loading contains household level transactions over a period of two years from four categories: Pasta, Pasta Sauce, Syrup, and Pancake Mix. These categories were chosen so that interactions between the categories can be detected and studied.

Step 2: Data Importing

There are four datasets in Carbo-loading:

store_lookup - A dim table containing store information
product_lookup - A dim table containing product information
transactions - A fact table containing transactions from households for 104 weeks
causal_lookup - A fact table containing trade activity information for different products across 62 weeks. There is no trade activity recorded for week 1 to 42

# Iteration for reading the SAS datasets

file_1 <- c("causal_lookup","product_lookup","store_lookup","transactions")
file_2 <- ".sas7bdat"


for (i in seq_along(file_1)) {
  filename <- paste0("data/",file_1[i],file_2)
  if (file.exists(filename)) {
    assign(paste0("df_",file_1[i],file_2),read.sas7bdat(filename))
  } else {
    print("File doesn't exist")
  }
}

Step 3: Data Cleaning

As part of Data Cleaning, we will first check the dimensions and structure of the datasets.

Below is the summary from each table.

causal_lookup :

## Observations: 351,372
## Variables: 6
## $ upc          <fct> 7680850108, 5100001212, 5100002792, 3620000300, 4...
## $ store        <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ week         <dbl> 68, 66, 72, 55, 68, 55, 66, 55, 55, 76, 66, 66, 6...
## $ feature_desc <fct> Wrap Interior Feature, Wrap Back Feature, Interio...
## $ display_desc <fct> Not on Display, Not on Display, Not on Display, N...
## $ geography    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...

product_lookup :

## Observations: 927
## Variables: 5
## $ upc                 <fct> 111112360, 566300023, 566300028, 566300029...
## $ product_description <fct> VINCENT S ORIG MARINARA S, PINE MOUNTAIN S...
## $ commodity           <fct> pasta sauce, syrups, syrups, syrups, syrup...
## $ brand               <fct> Vincent's, Pine Mountain, Miller, Miller, ...
## $ product_size        <fct> 25 OZ, 40 OZ, 19 OZ, 12 OZ, 19 OZ, 26    O...

store_lookup :

## Observations: 387
## Variables: 2
## $ store          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ...
## $ store_zip_code <fct> 37865, 30084, 30039, 31210, 30044, 31204, 30064...

transactions :

## Observations: 5,197,681
## Variables: 11
## $ upc                 <fct> 7680850106, 3620000470, 1800028064, 999998...
## $ dollar_sales        <dbl> 0.80, 3.59, 2.25, 0.85, 2.19, 2.19, 3.45, ...
## $ units               <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ time_of_transaction <fct> 1100, 1100, 1137, 1148, 1323, 1323, 1415, ...
## $ geography           <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
## $ week                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ household           <dbl> 125434, 125434, 108320, 162016, 89437, 894...
## $ store               <dbl> 244, 244, 244, 244, 244, 244, 244, 244, 24...
## $ basket              <dbl> 1, 1, 2, 3, 4, 4, 5, 5, 6, 7, 8, 8, 8, 9, ...
## $ day                 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ coupon              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...

Step 4: Checking for duplicate entries

Before joining the datasets into one, check for duplicate entries in each dataset at the right level.

Summary of dimensions (Row X Columns) of the four datasets - Before and after removing duplicates

Table name	Dimensions Before	Dimensions (After)
Causal data	351372, 6	351372, 6
Product data	927, 5	927, 5
Store data	387, 2	387, 2
Transactions	5197681, 11	5197681, 11

Conclusion: There are no duplicate rows for the respective levels in all four datasets.

Step 5: Identifying and treating missing values

We need to check for missing values (NA values) in each dataset individually and then check for NAs, once again, after they have been joined into one dataset.

Missing values in causal_lookup :

Missing values in product_lookup :

Missing values in store_lookup :

Missing values in transactions :

The four datasets don’t have missing values in them separately. But once we join them into one dataset using appropriate keys, we may notice missing values.

Sample of the joined dataset :

Missing values in joined dataset :

Following are the reasons for missing values:

Product table is not updated to include new products from the transactions (plausible reason)
Causal data has data only from week 43 to week 104
Products didn’t have any trade activity for a particular week in a store

Missing values in the Feature and Display columns were imputed in 2 steps:

Since causal data is not available from week 1 to week 42, replace NA is such rows with “No Data”
For rest of the rows with NA values, replace them with “No Trade” since no trade was done for those products

Exploratory Data Analysis

To understand the data better, we will perform univariate and bivariate analysis on the variables through plots.

The weekly sales over 2 years show that there were two unusual spikes in the sales at week 26 and 81.

Total Sales by commodity over the two years.

Pasta Sauce have the highest penetration across almost all weeks because of the volume of products in that category.

The growth has been flat for pancake mixes and syrups whereas sales in Year2 has increased for Pasta and Pasta Sauce.

The distribution of product sales seems to have a lot of outliers for all categories. Since we are going to do a product level comparison of sales, these outliers will be taken care of.

There is high correlation between sales of Pasta and Pasta sauce.

Pre-post

Proposed Approach

Below is an overview of the analytical approach that will be followed to answer the problem statement.

Actual Approach

After cleaning the data, the following steps were performed on the data:

Filtered the cleaned data for weeks 43-104 since they there is no trade data captured from week 1 to 42
Split the above dataset into transactions with trade and transactions without trade
Calculated Average Sales per Store at a product and week level
For each product, ran a paired t-test to obtain the answer if the ‘Average Sales per Store’ for stores with trade is significantly greater that ‘Average Sales per Store’ for stores without trade across all weeks

The final results table:

datatable(final_prod_results)

Key Findings:

Pancake mix has only two products that were used in the t-test and neither of them had any success
40% of pasta sauce products had a positive impact from trade activities (40 out of 100 products), which is the highest
Even though pasta product sales have a high correlation with pasta sauce products, the percentage of positive impact of pasta products (19%) is half of positive impact that of pasta sauce products (20%). This indicates that there may not be an interaction between the sales of these complimentary products

Point to note

Paired t-tests removes paired values which are exactly the same or is just one observation at the product-week level, so a lot of products may have gotten eliminated from t-test process above. The results shown are for only product week sales that conformed to the requirements of the paired t-test.

Summary

To summarize the analysis:

Can we understand what the impact of Feature and Display on the sales of four categories is?

Yes, we have used a paired t-test to understand the effects of weekly sales of each product in stores with and without trade activities.

What is the penetration of different categories at a weekly store level?

Pasta Sauce has had the highest penetration across all weeks, given it’s wide variety of product choices.

What are the products for which Feature and/or Display has increased its units sold per week in the 2 years?

Did the sales increase because of the trade activity?

Given that the sales was compared at weekly level (time aligned) for each product, we can say at 0.05 level of significance that the sales of products increased because of trade activity.

How does the interaction between complimentary categories play a role in this impact?

Although there is high correlation between the sales of complimentary categories, the impact of trade activity doesn’t seem to have been affected by this correlation (Refer to finding 3)

Final Project

Varsha S Kumar