1 Introduction

This report analyzes trends in customer purchases of selected items and explores methods to improve sales. We study the interactions between four commodities: Pasta, Pasta Sauce, Syrup and Pancake Mix. The available data consists of purchase information spanning two years.

The purchase records are mapped to household and brand data. Our goal is to find patterns in consumer purchases and explore ways to increase sales.

The analysis is geared toward using coupons to improve sales. Coupons have become an essential tool in retail. They are often used for promotional activities, for building customer loyalty, and for driving sales of a product by bundling it with existing popular items.

We assess the impact of coupons on consumers' spending habits in terms of the number of units purchased and the overall amount spent. We also explore the relationship between the sales of complementary and substitute product categories.

Our focus is on the following KPIs:

  • Units purchased
  • Coupon usage
  • Customer spending per transaction/household

We analyze these KPIs at commodity, brand and household level.
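
As a rough illustration of how these KPIs could be computed at two of the levels, the sketch below uses dplyr on a small made-up table. The column names (household, basket, commodity, units, dollar_sales, coupon) and all values are assumptions for illustration only, not the actual schema or data.

```r
library(dplyr)

# Toy transactions; column names and values are hypothetical.
transactions <- data.frame(
  household    = c(1, 1, 2, 2, 3, 3),
  basket       = c(10, 10, 11, 12, 13, 13),
  commodity    = c("pasta", "pasta sauce", "pasta", "syrup", "pancake mix", "syrup"),
  units        = c(2, 1, 1, 3, 1, 2),
  dollar_sales = c(3.00, 2.50, 1.50, 6.00, 2.00, 4.00),
  coupon       = c(1, 0, 0, 1, 0, 0)  # 1 if a coupon was used on the line item
)

# Commodity level: units purchased, coupon usage rate, total spend
commodity_kpis <- transactions %>%
  group_by(commodity) %>%
  summarise(
    units_purchased = sum(units),
    coupon_rate     = mean(coupon),
    total_spend     = sum(dollar_sales),
    .groups = "drop"
  )

# Household level: average spend per transaction (basket)
household_kpis <- transactions %>%
  group_by(household) %>%
  summarise(
    total_spend      = sum(dollar_sales),
    n_baskets        = n_distinct(basket),
    spend_per_basket = total_spend / n_baskets,
    .groups = "drop"
  )
```

A brand-level summary would follow the same pattern with a brand column in `group_by()`.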

We apply general statistical methods that transfer readily to similar problems in other commercial segments. The insights gained also apply to similar product combinations, and the approach can be used to assess the effectiveness of similar promotional activities.

Based on the data provided, we plan to develop a multiple linear regression model to quantify how much of the variability in customer spending is explained by coupon usage.
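
A minimal sketch of such a model is shown below, fitted on simulated data. The variable names (spend, coupons_used, units), the choice of units as a second predictor, and the coefficients used to generate the data are all assumptions made for illustration. They are not results from this analysis.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated household-level data; every column and coefficient is made up.
n <- 200
households <- data.frame(
  coupons_used = rpois(n, lambda = 2),
  units        = rpois(n, lambda = 10)
)
households$spend <- 5 + 1.5 * households$coupons_used +
  2 * households$units + rnorm(n, sd = 3)

# Multiple linear regression: spending as a function of coupon usage and units
fit <- lm(spend ~ coupons_used + units, data = households)

coef(fit)               # estimated change in spend per unit of each predictor
summary(fit)$r.squared  # share of spending variability explained by the model
```

The coefficient on `coupons_used` estimates the change in spending associated with one additional coupon, holding the other predictor fixed.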

This analysis will help retailers:

  • Assess the impact of coupons in driving sales and increasing market penetration
  • Segment the market and tailor promotional schemes to each segment
  • Increase sales and customer satisfaction
  • Drive sales of a particular brand, e.g. store-owned brands

2 Package requirements

The following R packages were used in this analysis:

  • haven: Import/export SAS data files
  • dplyr: Data manipulation, e.g. filter, mutate
  • base: Base R functions
  • stringr: Simplify string operations
  • ggplot2: Plot variables of interest
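
One possible way to check for and attach the packages listed above before running the analysis is sketched here. The variable names are illustrative, and the snippet only reports missing packages rather than installing them.

```r
# Packages named in the list above; "base" ships with R and is always attached.
required <- c("haven", "dplyr", "stringr", "ggplot2")

# Check which packages are installed without attaching them.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install before running the analysis: ",
          paste(missing_pkgs, collapse = ", "))
} else {
  # Attach each package so its functions can be called unqualified.
  invisible(lapply(required, library, character.only = TRUE))
}
```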

3 Data Preparation

Data Source

The data set for this project, “Carbo-Loading SAS”, can be downloaded at this location.

3.1 Data Dictionary

The data used for this analysis was gathered over a period of two years and has been fully anonymized. It contains household-level transactions from four categories: Pasta, Pasta Sauce, Syrup and Pancake Mix. These categories were chosen so that interactions between them can be detected and studied. The data set consists of four tables exported from SAS. The schema of the source tables is as follows: