The objective of this analysis is to study the interaction between two pairs of products - Pasta with Pasta Sauce and Syrup with Pancake Mix. We intend to ascertain the popularity index for each commodity-brand combination and further analyze the magnitude of association between pairs of brands each belonging to one of the associated pairs of commodities (E.g. Brand X of Pasta Sauce with Brand Y of Pasta).
The data used is the Carbo-Loading dataset which contains 2 years of Pasta, Pasta Sauce, Syrup and Pancake Mix transactions at household level which was obtained through the loyalty card program of the Kroger departmental store chain.
Using Association Rule Mining we wish to classify not only the brands which are highly popular overall, but pairs which have high lift or magnitude of association.
Firstly, we intend to rank each brand within each commodity category according to its overall popularity by examining total units sold and then total dollar value (We prefer to rank based on units sold over dollar value, as we are assuming that lower prices is a factor in popularity).
Then, we intend to compute certain statistics which will help in our Market Basket Analysis. These are the support for each commodity-brand entity overall (not within each category) which is the frequency of the itemset in the dataset, the confidence of item pairs which is the how often we find that association rule to be true, and the lift which is the main statistic of interest and is the degree to which those two occurrences (the purchase of the two items) are dependent on one another.
A supporting feature of our analysis is to find the top n popular brands within each commodity
This analysis will give the company an understanding of the popularity distribution of the different brands they sell and the associations between product pairs will help the company increase sales by marketing these associated pairs of brands together, incorporating coupons, discounts, promotional schemes etc.
library(haven)
library(dplyr)
library(tidyr)
library(fBasics)
library(ggplot2)
library(gridExtra)
library(knitr)
library(kableExtra)
library(wordcloud)
haven : To Import SAS files
dplyr: Data Manipulation
tidyr: Data Cleaning
fBasics: For Summary Statistics (::basicStats)
ggplot2: Plotting
gridExtra: Arranging plots in a grid
knitr and kableExtra: Dynamic Report Generation - For displaying tables with scroll bars
wordcloud : To create wordclouds
Original Data Source from 84.51: CARBO-LOADING: A RELATIONAL DATABASE
The data was originally collected to understand the interaction between 4 categories Pasta, Pasta Sauce, Pancake Mix, Syrups.
The dataset has 2 years of transactions for the above categories across 387 stores with household level obtained from the loyalty program of the leading US grocer. We have 4 tables transactions, store, product and causal.
Transactions: 5,197,681 rows and 11 variables (columns)
Store: 387 rows and 2 variables
Product: 927 rows and 5 variables
Causal: 351,372 rows and 6 variables
In the Causal table, weeks 1-42 data are missing, 6 stores are missing, 50%(448) products out of 927 are missing .
In the Transactions table Day 159 in week 23 is missing.
We have a SAS dataset which we import using read_sas. We have a relational database which is linked with the help of primary and foreign keys. With the help of these keys we combine the datasets to create a master dataset. On joining, missing values are generated due to a mismatch of values in upc variable. This occurred due to presence of zero at the start of upc in the Transactions table which wasnt the case in the other tables for the same upc ids.
This mismatch is remedied by converting upc variable in all tables to numeric and then joining.
Geography is a redundant column in the causal dataset as it is also there in transaction dataset. Hence, we remove it from the causal dataset
##################
# Importing Data #
##################
files <- list.files(path = "data",pattern = ".sas7bdat")
for (i in seq_along(files)) {
df <- read_sas(paste0("data/",files[i]))
assign(paste0(unlist(strsplit(files[i], "[.]"))[[1]]),df)
rm(df)
}
########################
# Cleaning and Joining #
########################
causal_lookup$upc <- as.numeric(causal_lookup$upc)
product_lookup$upc <- as.numeric(product_lookup$upc)
transactions$upc <- as.numeric(transactions$upc)
causal_lookup$geography <- NULL
trx_prod <- merge(transactions, product_lookup, by = 'upc',
all.x = T)
trx_activity_product_lookup_store <- merge(trx_prod, store_lookup,
by = 'store', all.x = T)
Dollar_sales in the transactions table have negative and zero values for 9585 rows (0.18%), we assume these as abnormal values and as the percentage of rows is low and as dollar sales is not a main variable of concern for this problem statement, we impute these values with the below formula :\[ Imputed.dollar.sales = Average.price.per.unit.of.product * Number.of.units.of.the.observation \]
############################
# Imputing abnormal values #
############################
product_lookup_price <- trx_activity_product_lookup_store %>%
group_by(commodity) %>%
summarize(product_lookup_avg_price =
sum(dollar_sales) / sum(units) )
trx_activity_product_lookup_store <-
trx_activity_product_lookup_store %>%
left_join(product_lookup_price, by = "commodity") %>%
mutate(imputed_dollar_sales = case_when(
dollar_sales > 0 ~ dollar_sales,
dollar_sales < 0 ~ units * product_lookup_avg_price
))
trx_activity_product_lookup_store$dollar_sales <- NULL
| store | upc | units | time_of_transaction | geography | week | household | basket | day | coupon | product_description | commodity | brand | product_size | store_zip_code | product_lookup_avg_price | imputed_dollar_sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5100011464 | 1 | 1800 | 1 | 9 | 33094 | 826468 | 60 | 0 | PREGO MUSHROOM SUPREME SC | pasta sauce | Prego | 25.75 OZ | 37865 | 1.740049 | 1.50 |
| 1 | 3905972884 | 1 | 1308 | 1 | 88 | 46014 | 2818933 | 616 | 0 | SPRING TREE MAPLE SYRUP | syrups | Spring Tree | 8.5 OZ | 37865 | 2.401884 | 4.49 |
| 1 | 9999981816 | 1 | 2311 | 1 | 76 | 448923 | 2373667 | 531 | 0 | PRIVATE LABEL MAPLE SYRUP | syrups | Private Label Premium | 12 OZ | 37865 | 2.401884 | 4.29 |
| 1 | 1510000017 | 1 | 1547 | 1 | 78 | 370371 | 2433682 | 540 | 0 | CRMET LONG SPAG | pasta | Creamette | 7 OZ | 37865 | 0.989966 | 0.69 |
| 1 | 3620000468 | 1 | 0822 | 1 | 42 | 66170 | 1298998 | 291 | 0 | RAGUONION/GARLIC | pasta sauce | Ragu | 26 OZ | 37865 | 1.740049 | 1.61 |
| 1 | 7130000140 | 1 | 1406 | 1 | 30 | 168809 | 684743 | 208 | 0 | RONZONI TOM BSL LINGUINE | pasta | Ronzoni | 12 OZ | 37865 | 0.989966 | 1.49 |
| 1 | 9999966070 | 1 | 1552 | 1 | 13 | 380733 | 328933 | 89 | 0 | PRIVATE LABEL BUTTERED SYRUP | syrups | Private Label | P 24 OZ | 37865 | 2.401884 | 1.79 |
| 1 | 9999985004 | 1 | 1042 | 1 | 26 | 235012 | 536968 | 176 | 0 | PRIVATE LABEL SPAGHETTI REGULAR | pasta | Private Label | 16 OZ | 37865 | 0.989966 | 0.69 |
| 1 | 4144900110 | 1 | 1812 | 1 | 26 | 357535 | 544249 | 179 | 0 | KRUSTEAZ COMP BM PANCAKE | pancake mixes | Krusteaz | 32 OZ | 37865 | 1.702382 | 2.29 |
| 1 | 2066200022 | 1 | 1043 | 1 | 3 | 66466 | 426081 | 21 | 0 | NEWMANSSAUCE | pasta sauce | Newman’s | 26 OZ | 37865 | 1.740049 | 2.49 |
| 1 | 5100002794 | 1 | 1831 | 1 | 78 | 78447 | 2447915 | 543 | 0 | PREGO SPAG SAUCE MEAT | pasta sauce | Prego | 26 OZ | 37865 | 1.740049 | 1.58 |
| 1 | 1510000026 | 1 | 1415 | 1 | 80 | 55160 | 2548204 | 560 | 0 | CRMET EGG NOODLES-WIDE | pasta | Creamette | 1 LB | 37865 | 0.989966 | 1.79 |
| 1 | 601011296 | 1 | 1849 | 1 | 49 | 243508 | 1511932 | 343 | 0 | BARILLA BASILICO SAUCE | pasta sauce | Barilla | 26 OZ | 37865 | 1.740049 | 2.00 |
| 1 | 9999981175 | 1 | 1928 | 1 | 41 | 284378 | 1271568 | 284 | 0 | PRIVATE LABEL X WIDE EGG NOODLES | pasta | Private Label Value | 1 LB | 37865 | 0.989966 | 0.79 |
| 1 | 4112927445 | 1 | 0938 | 1 | 43 | 244182 | 1331140 | 300 | 0 | CLSCO CABERNET MARINARA | pasta sauce | Classico | 26 OZ | 37865 | 1.740049 | 2.50 |
| 1 | 1510000256 | 1 | 1341 | 1 | 76 | 384918 | 2373672 | 532 | 0 | CREAMETTE ROTINI MACARONI | pasta | Creamette | 16 OZ | 37865 | 0.989966 | 1.29 |
| 1 | 9999981581 | 1 | 1312 | 1 | 31 | 68360 | 698006 | 213 | 0 | PRIVATE LABEL SPAG SCE MUSHROOM | pasta sauce | Private Label | 26.50 OZ | 37865 | 1.740049 | 0.88 |
| 1 | 9999985005 | 1 | 1720 | 1 | 30 | 237576 | 680473 | 206 | 0 | PRIVATE LABEL SPAGHETTI REGULAR | pasta | Private Label | 32OZ | 37865 | 0.989966 | 1.19 |
| 1 | 5100001212 | 1 | 1330 | 1 | 65 | 234461 | 2013783 | 455 | 0 | PREGO MUSHROOM & GARLIC | pasta sauce | Prego | 26 OZ | 37865 | 1.740049 | 1.65 |
| 1 | 1510000007 | 1 | 1758 | 1 | 85 | 32681 | 2692564 | 590 | 0 | CREAMETTE ELBOW MACARONI | pasta | Creamette | N 1 LB | 37865 | 0.989966 | 0.99 |
| 1 | 3620000445 | 1 | 1132 | 1 | 75 | 132318 | 2321928 | 519 | 0 | RAGU GS SUPER MUSHROOM | pasta sauce | Ragu | 26.3 OZ | 37865 | 1.740049 | 1.58 |
| 1 | 5100002796 | 1 | 2116 | 1 | 49 | 75500 | 1489583 | 337 | 0 | PREGO MUSHROOM SPAG SAUCE | pasta sauce | Prego | 26 OZ | 37865 | 1.740049 | 1.58 |
| 1 | 5100002548 | 1 | 1814 | 1 | 1 | 50579 | 231266 | 5 | 0 | PREGO SPAG SAUCE PLAIN | pasta sauce | Prego | 14 OZ | 37865 | 1.740049 | 1.39 |
| 1 | 7680852138 | 1 | 1809 | 1 | 3 | 60929 | 415988 | 17 | 0 | BARILLA ITALIAN CHEESE SC | pasta sauce | Barilla | 26 OZ | 37865 | 1.740049 | 2.00 |
| 1 | 3000005970 | 1 | 2032 | 1 | 10 | 32640 | 844933 | 68 | 0 | AUNT JEMIMA ORIGINL SYRUP | syrups | Aunt Jemima | 24 OZ | 37865 | 2.401884 | 3.19 |
| 1 | 3620001375 | 1 | 1853 | 1 | 2 | 32019 | 1736377 | 11 | 0 | RAGU ROBUSTO SIX CHEESE | pasta sauce | Ragu | 26 OZ | 37865 | 1.740049 | 1.99 |
| 1 | 3620001495 | 1 | 1114 | 1 | 5 | 293538 | 456247 | 33 | 0 | (S)RAGU CHNKY GRDNSTYL GR | pasta sauce | Ragu | 45 OZ | 37865 | 1.740049 | 2.79 |
| 1 | 3620001376 | 1 | 1431 | 1 | 49 | 338989 | 1489575 | 337 | 0 | RAGU ROBUSTO ITAL SAUSAGE | pasta sauce | Ragu | 26 OZ | 37865 | 1.740049 | 1.59 |
| 1 | 3620001364 | 2 | 1817 | 1 | 89 | 400504 | 2843382 | 622 | 0 | RAGU CHUNKY G/S MAMAS SCE | pasta sauce | Ragu | 26 OZ | 37865 | 1.740049 | 3.16 |
| 1 | 1510002700 | 1 | 1701 | 1 | 32 | 241528 | 714709 | 220 | 0 | HEALTHY HARVEST ROTINI | pasta | Healthy Harvest | 13.25 OZ | 37865 | 0.989966 | 1.59 |
| 1 | 3620001369 | 1 | 1233 | 1 | 13 | 236391 | 330934 | 90 | 0 | RAGUCREATION ALFRDO | pasta sauce | Ragu | 16 OZ | 37865 | 1.740049 | 0.99 |
| 1 | 9999985050 | 2 | 1431 | 1 | 60 | 434162 | 1823508 | 415 | 0 | PRIVATE LABEL FUSILLI-ROTINI-SPRL | pasta | Private Label Premium | 16 OZ | 37865 | 0.989966 | 1.78 |
| 1 | 1510000031 | 1 | 1044 | 1 | 11 | 241368 | 287252 | 73 | 0 | CREAMETTE LASAGNA MACR | pasta | Creamette | 16 Z | 37865 | 0.989966 | 1.89 |
| 1 | 4112907702 | 1 | 2030 | 1 | 95 | 346966 | 3012692 | 659 | 0 | CLSCO SAUCE ITALIAN SAUSG | pasta sauce | Classico | 26 OZ | 37865 | 1.740049 | 2.00 |
| 1 | 1510000043 | 1 | 1918 | 1 | 67 | 240908 | 2077902 | 469 | 0 | CREAMETTE SPAGHETTI | pasta | Creamette | 16 OZ | 37865 | 0.989966 | 1.39 |
| 1 | 5100002549 | 1 | 1658 | 1 | 10 | 31809 | 839958 | 66 | 0 | PREGO REG SPAGHETTI SAUCE | pasta sauce | Prego | 26 OZ | 37865 | 1.740049 | 1.50 |
| 1 | 9999967725 | 1 | 1152 | 1 | 98 | 502296 | 3100354 | 680 | 0 | PRIVATE LABEL PANCAKE SYRUP | syrups | Private Label | 12 OZ | 37865 | 2.401884 | 1.19 |
| 1 | 7680828008 | 1 | 1428 | 1 | 27 | 374916 | 565789 | 186 | 0 | BARILLA SPAGHETTI PASTA | pasta | Barilla | 16 OZ | 37865 | 0.989966 | 0.99 |
| 1 | 3000003690 | 1 | 1409 | 1 | 52 | 292753 | 1571765 | 358 | 0 | ANT JEMIMA CONTRY RICH SY | syrups | Aunt Jemima | 24 OZ | 37865 | 2.401884 | 1.99 |
| 1 | 5100002549 | 1 | 1259 | 1 | 15 | 275242 | 362295 | 104 | 0 | PREGO REG SPAGHETTI SAUCE | pasta sauce | Prego | 26 OZ | 37865 | 1.740049 | 1.50 |
| 1 | 3000005970 | 1 | 1812 | 1 | 18 | 324141 | 405940 | 122 | 0 | AUNT JEMIMA ORIGINL SYRUP | syrups | Aunt Jemima | 24 OZ | 37865 | 2.401884 | 3.19 |
| 1 | 9999971306 | 1 | 1653 | 1 | 93 | 50278 | 2965902 | 649 | 0 | PRIVATE LABEL HS SPAG SAUCE TOM/BAS | pasta sauce | Private Label | 26 OZ | 37865 | 1.740049 | 1.00 |
| 1 | 5100002796 | 1 | 1209 | 1 | 94 | 372618 | 3008213 | 658 | 0 | PREGO MUSHROOM SPAG SAUCE | pasta sauce | Prego | 26 OZ | 37865 | 1.740049 | 1.58 |
| 1 | 9999985133 | 1 | 1349 | 1 | 8 | 241085 | 268397 | 51 | 0 | PRIVATE LABEL SPAGHETTI | pasta | Private Label | 8 OZ | 37865 | 0.989966 | 0.39 |
| 1 | 3620001376 | 1 | 1749 | 1 | 41 | 37510 | 1267969 | 283 | 0 | RAGU ROBUSTO ITAL SAUSAGE | pasta sauce | Ragu | 26 OZ | 37865 | 1.740049 | 1.61 |
| 1 | 3620000480 | 1 | 1334 | 1 | 36 | 405085 | 1114352 | 246 | 0 | RAGU HEARTY ITAL PSTA SCE | pasta sauce | Ragu | 26.3 OZ | 37865 | 1.740049 | 1.50 |
| 1 | 9999985067 | 1 | 1718 | 1 | 74 | 242778 | 2296306 | 514 | 0 | PRIVATE LABEL VERMICELLI | pasta | Private Label | 16 OZ | 37865 | 0.989966 | 0.52 |
| 1 | 3620000482 | 1 | 1716 | 1 | 86 | 329958 | 2726765 | 597 | 0 | RAGU HEARTY SAUT ON/MUSH | pasta sauce | Ragu | 26.3 OZ | 37865 | 1.740049 | 1.58 |
| 1 | 1510000027 | 1 | 0939 | 1 | 15 | 338989 | 362293 | 104 | 0 | CREAMETTE X-WIDE NOODLE | pasta | Creamette | 16 OZ | 37865 | 0.989966 | 1.79 |
| 1 | 3000005560 | 1 | 1605 | 1 | 86 | 77712 | 2731616 | 598 | 0 | AUNT JEMIMA LITE SYRUP | syrups | Aunt Jemima | 36 OZ | 37865 | 2.401884 | 3.99 |
There are 17 variables in total from the 4 tables these can be majorly divided among store, customer, product, discount and transaction. We have 2 numerical variables number of units and the corresponding sales value of a product type in a basket.
The categorical variables are store, product, household, basket, commodity, brand and a Boolean variable coupon.
The main variables of concern for the project are brand, commodity, basket and household. Pasta and pasta sauce have 56 and 43 type of brands respectively, which can have association with each other, whereas pancake mixes and syrups have 35 and 17 type of brands respectively, which can have association with each other.
###############################
# Summary - numeric variables #
###############################
basic_stats <- basicStats(trx_activity_product_lookup_store[ , c("units",
"imputed_dollar_sales") ])[
c("Mean",
"Stdev",
"Median",
"Minimum",
"Maximum"),
]
| units | imputed_dollar_sales | |
|---|---|---|
| Mean | 1.196616 | 1.759720 |
| Stdev | 0.568602 | 1.126983 |
| Median | 1.000000 | 1.500000 |
| Minimum | 1.000000 | 0.010000 |
| Maximum | 156.000000 | 153.140000 |
#################################
# Summary - character variables #
#################################
char_var <- trx_activity_product_lookup_store %>%
summarize(store = n_distinct(store),
product_lookup = n_distinct(upc),
household = n_distinct(household),
basket = n_distinct(basket),
coupon = n_distinct(coupon),
commodity = n_distinct(commodity),
brand = n_distinct(brand)) %>%
gather(variable, number_of_levels)
| variable | number_of_levels |
|---|---|
| store | 387 |
| product_lookup | 927 |
| household | 510027 |
| basket | 3316349 |
| coupon | 2 |
| commodity | 4 |
| brand | 131 |
#######################################
# Number of brands per each commodity #
#######################################
num_brands <- trx_activity_product_lookup_store %>%
group_by(commodity) %>%
summarize(no_distinct_brand = n_distinct(brand))
| commodity | no_distinct_brand |
|---|---|
| pancake mixes | 17 |
| pasta | 56 |
| pasta sauce | 43 |
| syrups | 35 |
Exploratory Data Analysis
household and commodity to find total values, and then summed for each brand. This will help us calculate support.#######################
# Support Calculation #
#######################
trx_prod_support <- as.data.frame(trx_prod %>%
group_by(basket, commodity, brand) %>%
summarize(num_units = sum(units)) %>%
select( basket, commodity, brand,
num_units) %>%
group_by(commodity, brand) %>%
summarize(sum = sum(num_units))) %>%
mutate('support' = sum / sum(sum))
trx_prod1 <- left_join(trx_prod, trx_prod_support, by = c('commodity','brand') )
trx_prod1 <- trx_prod1[,c('basket','commodity','brand','support', 'units')]
We then split the dataset based on commodity and joined the Pasta with Pasta sauce and Pancake Mix with Syrup datasets by basket, to create datasets where a single lineitem contains both a certain Pancake Mix with a certain Syrup, and likewise for the pasta pair. This will give us associated pairs of products that were bought together.
We then appended the pasta dataset to the pancake dataset and calculated the support of each of the pairs. Following this we calculated lift which is the ratio between support of associated pairs and product of the support of each of the two entities in the pair.
#############################
# Creating Associated Pairs #
#############################
j = 1
name = c('pancake' , 'syrup' , 'pasta' , 'sauce')
for (i in c('pancake mixes' , 'syrups' , 'pasta' , 'pasta sauce'))
{
assign(paste0(name[j]), trx_prod1[trx_prod1$commodity == i, ])
j = j + 1
}
pancake2 <- inner_join(pancake, syrup, by = 'basket')
pasta2 <- inner_join(pasta, sauce, by = 'basket')
full <- rbind(pancake2, pasta2)
| basket | commodity.x | brand.x | support.x | units.x | commodity.y | brand.y | support.y | units.y |
|---|---|---|---|---|---|---|---|---|
| 3060398 | pancake mixes | Fastshake | 6.30e-06 | 2 | syrups | Aunt Jemima | 0.0199269 | 1 |
| 2899256 | pancake mixes | Fastshake | 6.30e-06 | 4 | syrups | Aunt Jemima | 0.0199269 | 1 |
| 3125620 | pancake mixes | Fastshake | 6.30e-06 | 1 | syrups | Aunt Jemima | 0.0199269 | 1 |
| 3121056 | pancake mixes | Fastshake | 6.30e-06 | 1 | syrups | Aunt Jemima | 0.0199269 | 1 |
| 3153652 | pancake mixes | Fastshake | 6.30e-06 | 2 | syrups | Mrs Butterworth | 0.0084565 | 1 |
| 1832615 | pancake mixes | Fastshake | 6.30e-06 | 1 | syrups | Private Label | 0.0357777 | 1 |
| 531796 | pancake mixes | Fastshake | 6.30e-06 | 1 | syrups | Aunt Jemima | 0.0199269 | 1 |
| 154279 | pancake mixes | Fastshake | 6.30e-06 | 1 | syrups | Cozy Cottage | 0.0029629 | 1 |
| 2127566 | pancake mixes | Fastshake | 6.30e-06 | 1 | syrups | Cozy Cottage | 0.0029629 | 1 |
| 2311710 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Karo | 0.0083613 | 1 |
| 1572785 | pancake mixes | M W Flapstax | 4.92e-04 | 6 | syrups | Log Cabin | 0.0068337 | 1 |
| 3056835 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Private Label Value | 0.0016007 | 1 |
| 1517469 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Pennant | 0.0019466 | 1 |
| 2166265 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Private Label | 0.0357777 | 1 |
| 650903 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Log Cabin | 0.0068337 | 1 |
| 776828 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Mrs Butterworth | 0.0084565 | 1 |
| 1455951 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Aunt Jemima | 0.0199269 | 1 |
| 3067503 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Private Label | 0.0357777 | 1 |
| 1806408 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Private Label | 0.0357777 | 1 |
| 1073388 | pancake mixes | M W Flapstax | 4.92e-04 | 3 | syrups | Pennant | 0.0019466 | 1 |
| 2628692 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Log Cabin | 0.0068337 | 1 |
| 2679760 | pancake mixes | M W Flapstax | 4.92e-04 | 3 | syrups | Private Label | 0.0357777 | 1 |
| 2007072 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Aunt Jemima | 0.0199269 | 1 |
| 1498655 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Private Label | 0.0357777 | 1 |
| 1885958 | pancake mixes | M W Flapstax | 4.92e-04 | 4 | syrups | Mrs Butterworth | 0.0084565 | 1 |
| 1576442 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Mrs Butterworth | 0.0084565 | 1 |
| 1314426 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Cozy Cottage | 0.0029629 | 1 |
| 1577620 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Private Label | 0.0357777 | 1 |
| 2358459 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Private Label | 0.0357777 | 1 |
| 791808 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Spring Tree | 0.0015634 | 1 |
| 3217880 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Kellogg | 0.0031179 | 1 |
| 2536741 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Aunt Jemima | 0.0199269 | 1 |
| 1098956 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Aunt Jemima | 0.0199269 | 1 |
| 51445 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Pennant | 0.0019466 | 1 |
| 2507968 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Mrs Butterworth | 0.0084565 | 1 |
| 1107583 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Hungry Jack | 0.0056838 | 1 |
| 20593 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Log Cabin | 0.0068337 | 1 |
| 121886 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Cozy Cottage | 0.0029629 | 1 |
| 2780417 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Aunt Jemima | 0.0199269 | 1 |
| 3512 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Mrs Butterworth | 0.0084565 | 1 |
| 2681321 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Private Label | 0.0357777 | 1 |
| 2421347 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Private Label Premium | 0.0043776 | 1 |
| 42991 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Private Label | 0.0357777 | 1 |
| 1557178 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Private Label | 0.0357777 | 1 |
| 1691914 | pancake mixes | M W Flapstax | 4.92e-04 | 3 | syrups | Cary’s | 0.0023183 | 1 |
| 1691914 | pancake mixes | M W Flapstax | 4.92e-04 | 3 | syrups | Private Label | 0.0357777 | 2 |
| 1901897 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Aunt Jemima | 0.0199269 | 1 |
| 1591607 | pancake mixes | M W Flapstax | 4.92e-04 | 1 | syrups | Mrs Butterworth | 0.0084565 | 1 |
| 2127478 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Aunt Jemima | 0.0199269 | 1 |
| 207758 | pancake mixes | M W Flapstax | 4.92e-04 | 2 | syrups | Aunt Jemima | 0.0199269 | 1 |
########
# Lift #
########
association['support.xy'] <- association$xUy / (nrow(trx_prod_double))
association['lift'] <- association$support.xy / (association$support.x *
association$support.y)
lift_ass <- unique(association[ ,c('commoditybrand.xy', 'lift')])
lift <- lift_ass[order(lift_ass$lift, decreasing = TRUE), ]
####################
# Popularity Index #
####################
popularity <- as.data.frame(trx_prod %>%
group_by(commodity, brand) %>%
summarize(total_sales = sum(dollar_sales),
num_house =
n_distinct(factor(household))) %>%
mutate('popularity.score' =
(num_house *
(total_sales ^ (1 / 3)) / 10 ^ 4)) %>%
select(commodity, brand, num_house,
total_sales, popularity.score)) %>%
group_by(commodity) %>%
mutate(rank =
order(order(popularity.score,
decreasing = TRUE)))
We filtered for brands that have been bought by at least 1000 households and have had at least $10,000 of sales. This new information was joined with our earlier dataset, so we can summarize the brand pairs with the highest lift while having a popularity index for each brand alongside.
| commodity.x | brand.x | commodity.y | brand.y | lift | rank.x | rank.y | total_sales.x | total_sales.y | num_house.x | num_house.y |
|---|---|---|---|---|---|---|---|---|---|---|
| pancake mixes | Krusteaz | syrups | Smuckers | 15.289399 | 4 | 14 | 30528.77 | 24862.79 | 9241 | 6315 |
| pancake mixes | Mrs Butterworth | syrups | Mrs Butterworth | 15.192832 | 7 | 3 | 11415.08 | 159165.47 | 5164 | 31809 |
| pancake mixes | Hodgson Mills | syrups | Cozy Cottage | 15.103232 | 8 | 10 | 12857.64 | 41890.14 | 4287 | 11379 |
| pancake mixes | Private Label | syrups | Private Label Value | 14.060408 | 3 | 15 | 73616.82 | 13190.32 | 32199 | 7227 |
| pancake mixes | Hodgson Mills | syrups | Spring Tree | 11.739578 | 8 | 12 | 12857.64 | 32740.97 | 4287 | 6460 |
| pancake mixes | Hungry Jack | syrups | Hungry Jack | 11.290699 | 2 | 7 | 111435.32 | 113414.69 | 35557 | 18766 |
| pancake mixes | Hodgson Mills | syrups | Maple Grove | 10.997654 | 8 | 17 | 12857.64 | 21175.21 | 4287 | 2452 |
| pancake mixes | Hodgson Mills | syrups | Cary’s | 10.254439 | 8 | 11 | 12857.64 | 46800.22 | 4287 | 8417 |
| pancake mixes | Hodgson Mills | syrups | Private Label Premium | 10.182362 | 8 | 8 | 12857.64 | 103906.08 | 4287 | 14060 |
| pancake mixes | Aunt Jemima | syrups | Aunt Jemima | 8.723005 | 1 | 2 | 206698.12 | 359668.92 | 56103 | 65654 |
| pancake mixes | Private Label | syrups | Private Label | 8.097007 | 3 | 1 | 73616.82 | 352930.86 | 32199 | 106181 |
| pancake mixes | Hodgson Mills | syrups | Smuckers | 7.985365 | 8 | 14 | 12857.64 | 24862.79 | 4287 | 6315 |
| pancake mixes | Krusteaz | syrups | Maple Grove | 7.307305 | 4 | 17 | 30528.77 | 21175.21 | 9241 | 2452 |
| pancake mixes | Mrs Butterworth | syrups | Private Label Value | 7.306707 | 7 | 15 | 11415.08 | 13190.32 | 5164 | 7227 |
| pancake mixes | Hungry Jack | syrups | Mrs Butterworth | 6.558429 | 2 | 3 | 111435.32 | 159165.47 | 35557 | 31809 |
| pancake mixes | Aunt Jemima | syrups | Mrs Butterworth | 6.097321 | 1 | 3 | 206698.12 | 159165.47 | 56103 | 31809 |
| pancake mixes | Hungry Jack | syrups | Smuckers | 6.017868 | 2 | 14 | 111435.32 | 24862.79 | 35557 | 6315 |
| pancake mixes | Mrs Butterworth | syrups | Northwoods | 5.893625 | 7 | 5 | 11415.08 | 90610.37 | 5164 | 29456 |
| pancake mixes | Mrs Butterworth | syrups | Aunt Jemima | 5.381893 | 7 | 2 | 11415.08 | 359668.92 | 5164 | 65654 |
| pancake mixes | Krusteaz | syrups | Kellogg | 5.215197 | 4 | 9 | 30528.77 | 55015.38 | 9241 | 12796 |
| pancake mixes | Hungry Jack | syrups | Kellogg | 5.213002 | 2 | 9 | 111435.32 | 55015.38 | 35557 | 12796 |
| pancake mixes | Hodgson Mills | syrups | Log Cabin | 5.090279 | 8 | 6 | 12857.64 | 130904.45 | 4287 | 23590 |
| pancake mixes | Mrs Butterworth | syrups | Pennant | 5.084162 | 7 | 13 | 11415.08 | 24661.83 | 5164 | 6490 |
| pancake mixes | Hungry Jack | syrups | Aunt Jemima | 5.016468 | 2 | 2 | 111435.32 | 359668.92 | 35557 | 65654 |
| pancake mixes | Krusteaz | syrups | Mrs Butterworth | 4.896092 | 4 | 3 | 30528.77 | 159165.47 | 9241 | 31809 |
| pancake mixes | Hungry Jack | syrups | Log Cabin | 4.739713 | 2 | 6 | 111435.32 | 130904.45 | 35557 | 23590 |
| pancake mixes | Aunt Jemima | syrups | Northwoods | 4.623519 | 1 | 5 | 206698.12 | 90610.37 | 56103 | 29456 |
| pancake mixes | Hungry Jack | syrups | Northwoods | 4.500026 | 2 | 5 | 111435.32 | 90610.37 | 35557 | 29456 |
| pancake mixes | Krusteaz | syrups | Log Cabin | 4.461440 | 4 | 6 | 30528.77 | 130904.45 | 9241 | 23590 |
| pancake mixes | Aunt Jemima | syrups | Log Cabin | 4.330303 | 1 | 6 | 206698.12 | 130904.45 | 56103 | 23590 |
| pancake mixes | Mrs Butterworth | syrups | Private Label | 4.315227 | 7 | 1 | 11415.08 | 352930.86 | 5164 | 106181 |
| pasta | Barilla Plus | pasta sauce | Bertolli | 4.286954 | 13 | 6 | 19262.89 | 258090.84 | 6849 | 40187 |
| pancake mixes | Krusteaz | syrups | Northwoods | 4.280993 | 4 | 5 | 30528.77 | 90610.37 | 9241 | 29456 |
| pasta | Dececco | pasta sauce | Emeril’s | 4.174479 | 15 | 10 | 14780.83 | 60173.35 | 3111 | 9564 |
| pancake mixes | Aunt Jemima | syrups | Hungry Jack | 4.117391 | 1 | 7 | 206698.12 | 113414.69 | 56103 | 18766 |
| pancake mixes | Krusteaz | syrups | Private Label Premium | 4.092764 | 4 | 8 | 30528.77 | 103906.08 | 9241 | 14060 |
| pancake mixes | Krusteaz | syrups | Hungry Jack | 4.026349 | 4 | 7 | 30528.77 | 113414.69 | 9241 | 18766 |
| pancake mixes | Krusteaz | syrups | Aunt Jemima | 3.985543 | 4 | 2 | 30528.77 | 359668.92 | 9241 | 65654 |
| pancake mixes | Bisquick | syrups | Hungry Jack | 3.902702 | 5 | 7 | 18437.72 | 113414.69 | 7021 | 18766 |
| pancake mixes | Mrs Butterworth | syrups | Smuckers | 3.887910 | 7 | 14 | 11415.08 | 24862.79 | 5164 | 6315 |
| pasta | Healthy Harvest | pasta sauce | Emeril’s | 3.871710 | 11 | 10 | 28112.45 | 60173.35 | 8054 | 9564 |
| pancake mixes | Mrs Butterworth | syrups | Hungry Jack | 3.862348 | 7 | 7 | 11415.08 | 113414.69 | 5164 | 18766 |
| pasta | Barilla | pasta sauce | Barilla | 3.843450 | 2 | 7 | 433764.78 | 175055.76 | 125579 | 28574 |
| pancake mixes | Aunt Jemima | syrups | Smuckers | 3.841292 | 1 | 14 | 206698.12 | 24862.79 | 56103 | 6315 |
| pancake mixes | Bisquick | syrups | Smuckers | 3.812462 | 5 | 14 | 18437.72 | 24862.79 | 7021 | 6315 |
| pancake mixes | Private Label | syrups | Northwoods | 3.755225 | 3 | 5 | 73616.82 | 90610.37 | 32199 | 29456 |
| pancake mixes | Hungry Jack | syrups | Private Label Value | 3.746628 | 2 | 15 | 111435.32 | 13190.32 | 35557 | 7227 |
| pancake mixes | Krusteaz | syrups | Spring Tree | 3.707536 | 4 | 12 | 30528.77 | 32740.97 | 9241 | 6460 |
| pancake mixes | Bisquick | syrups | Mrs Butterworth | 3.703635 | 5 | 3 | 18437.72 | 159165.47 | 7021 | 31809 |
| pancake mixes | Aunt Jemima | syrups | Kellogg | 3.626670 | 1 | 9 | 206698.12 | 55015.38 | 56103 | 12796 |
The objective of this analysis was to study the interaction between two pairs of products - Pasta with Pasta Sauce and Syrup with Pancake Mix.
We ranked brands with a popularity score for within each commodity and further calculated the magnitude of association between pairs of brands each belonging to one of the associated pairs of commodities (E.g. Brand X of Pasta Sauce with Brand Y of Pasta).
We first joined the transactions table with the product table and then grouped the transactions by household, commodity, brand and basket to calculate the number of units bought by a certain household of a commodity-brand in a single purchase.
We then calculated the total number of units bought for every unique commodity-brand and divided it by the total units bought across all stores to calculate the support.
Support is an indication of how frequently the itemset appears in the dataset
\[ Support = TotalNumProductX / TotalNumAllProducts \]
We then split the data into four datasets on the commodity and join the pairs of datasets - Pasta with Pasta Sauce and Syrup with Pancake Mix. This allowed us to have a dataset with each row containing every product pair that was bought together
Following this, we calculate xUy which is the support of each product-pair.
Note : We use minimum number of units for the product pair as this makes more sense when calculating the frequency of the pair.
Once we have the pair-frequency we proceeded to join the dataset on itself so we had each row containing the frequency and support for product X, product Y and product X & Y.
We removed the rows where the same commodity-brand appears twice in the same row.
We then calculate the lift for each product pair which measures the degree to which those two occurrences are dependent on one another
\[ Lift(X => Y) = Support(X U Y) / [Support(X) * Support(Y)] \]
Here we ascertained the popularity index of each product using households and sales giving a higher weightage for number of distinct households that bought the product. We use this score to rank each induvidual product by its popularity within each commodity category
Finally we filtered for products that have sales greater than $10000 and those which have been bought by atleast 1000 distinct households
We found out the top brand pairs for pancake mixes and syrups and also pasta and pasta sauce.
| commodity.x | brand.x | commodity.y | brand.y | lift | rank.x | rank.y | total_sales.x | total_sales.y | num_house.x | num_house.y |
|---|---|---|---|---|---|---|---|---|---|---|
| pancake mixes | Krusteaz | syrups | Smuckers | 15.289399 | 4 | 14 | 30528.77 | 24862.79 | 9241 | 6315 |
| pancake mixes | Mrs Butterworth | syrups | Mrs Butterworth | 15.192832 | 7 | 3 | 11415.08 | 159165.47 | 5164 | 31809 |
| pancake mixes | Hodgson Mills | syrups | Cozy Cottage | 15.103232 | 8 | 10 | 12857.64 | 41890.14 | 4287 | 11379 |
| pancake mixes | Private Label | syrups | Private Label Value | 14.060408 | 3 | 15 | 73616.82 | 13190.32 | 32199 | 7227 |
| pancake mixes | Hodgson Mills | syrups | Spring Tree | 11.739578 | 8 | 12 | 12857.64 | 32740.97 | 4287 | 6460 |
| pancake mixes | Hungry Jack | syrups | Hungry Jack | 11.290699 | 2 | 7 | 111435.32 | 113414.69 | 35557 | 18766 |
| pancake mixes | Hodgson Mills | syrups | Maple Grove | 10.997654 | 8 | 17 | 12857.64 | 21175.21 | 4287 | 2452 |
| pancake mixes | Hodgson Mills | syrups | Cary’s | 10.254439 | 8 | 11 | 12857.64 | 46800.22 | 4287 | 8417 |
| pancake mixes | Hodgson Mills | syrups | Private Label Premium | 10.182362 | 8 | 8 | 12857.64 | 103906.08 | 4287 | 14060 |
| pancake mixes | Aunt Jemima | syrups | Aunt Jemima | 8.723005 | 1 | 2 | 206698.12 | 359668.92 | 56103 | 65654 |
| pancake mixes | Private Label | syrups | Private Label | 8.097007 | 3 | 1 | 73616.82 | 352930.86 | 32199 | 106181 |
| pancake mixes | Hodgson Mills | syrups | Smuckers | 7.985365 | 8 | 14 | 12857.64 | 24862.79 | 4287 | 6315 |
| pancake mixes | Krusteaz | syrups | Maple Grove | 7.307305 | 4 | 17 | 30528.77 | 21175.21 | 9241 | 2452 |
| pancake mixes | Mrs Butterworth | syrups | Private Label Value | 7.306707 | 7 | 15 | 11415.08 | 13190.32 | 5164 | 7227 |
| pancake mixes | Hungry Jack | syrups | Mrs Butterworth | 6.558429 | 2 | 3 | 111435.32 | 159165.47 | 35557 | 31809 |
| pancake mixes | Aunt Jemima | syrups | Mrs Butterworth | 6.097321 | 1 | 3 | 206698.12 | 159165.47 | 56103 | 31809 |
| pancake mixes | Hungry Jack | syrups | Smuckers | 6.017868 | 2 | 14 | 111435.32 | 24862.79 | 35557 | 6315 |
| pancake mixes | Mrs Butterworth | syrups | Northwoods | 5.893625 | 7 | 5 | 11415.08 | 90610.37 | 5164 | 29456 |
| pancake mixes | Mrs Butterworth | syrups | Aunt Jemima | 5.381893 | 7 | 2 | 11415.08 | 359668.92 | 5164 | 65654 |
| pancake mixes | Krusteaz | syrups | Kellogg | 5.215197 | 4 | 9 | 30528.77 | 55015.38 | 9241 | 12796 |
| pancake mixes | Hungry Jack | syrups | Kellogg | 5.213002 | 2 | 9 | 111435.32 | 55015.38 | 35557 | 12796 |
| pancake mixes | Hodgson Mills | syrups | Log Cabin | 5.090279 | 8 | 6 | 12857.64 | 130904.45 | 4287 | 23590 |
| pancake mixes | Mrs Butterworth | syrups | Pennant | 5.084162 | 7 | 13 | 11415.08 | 24661.83 | 5164 | 6490 |
| pancake mixes | Hungry Jack | syrups | Aunt Jemima | 5.016468 | 2 | 2 | 111435.32 | 359668.92 | 35557 | 65654 |
| pancake mixes | Krusteaz | syrups | Mrs Butterworth | 4.896092 | 4 | 3 | 30528.77 | 159165.47 | 9241 | 31809 |
| pancake mixes | Hungry Jack | syrups | Log Cabin | 4.739713 | 2 | 6 | 111435.32 | 130904.45 | 35557 | 23590 |
| pancake mixes | Aunt Jemima | syrups | Northwoods | 4.623519 | 1 | 5 | 206698.12 | 90610.37 | 56103 | 29456 |
| pancake mixes | Hungry Jack | syrups | Northwoods | 4.500026 | 2 | 5 | 111435.32 | 90610.37 | 35557 | 29456 |
| pancake mixes | Krusteaz | syrups | Log Cabin | 4.461440 | 4 | 6 | 30528.77 | 130904.45 | 9241 | 23590 |
| pancake mixes | Aunt Jemima | syrups | Log Cabin | 4.330303 | 1 | 6 | 206698.12 | 130904.45 | 56103 | 23590 |
| pancake mixes | Mrs Butterworth | syrups | Private Label | 4.315227 | 7 | 1 | 11415.08 | 352930.86 | 5164 | 106181 |
| pasta | Barilla Plus | pasta sauce | Bertolli | 4.286954 | 13 | 6 | 19262.89 | 258090.84 | 6849 | 40187 |
| pancake mixes | Krusteaz | syrups | Northwoods | 4.280993 | 4 | 5 | 30528.77 | 90610.37 | 9241 | 29456 |
| pasta | Dececco | pasta sauce | Emeril’s | 4.174479 | 15 | 10 | 14780.83 | 60173.35 | 3111 | 9564 |
| pancake mixes | Aunt Jemima | syrups | Hungry Jack | 4.117391 | 1 | 7 | 206698.12 | 113414.69 | 56103 | 18766 |
| pancake mixes | Krusteaz | syrups | Private Label Premium | 4.092764 | 4 | 8 | 30528.77 | 103906.08 | 9241 | 14060 |
| pancake mixes | Krusteaz | syrups | Hungry Jack | 4.026349 | 4 | 7 | 30528.77 | 113414.69 | 9241 | 18766 |
| pancake mixes | Krusteaz | syrups | Aunt Jemima | 3.985543 | 4 | 2 | 30528.77 | 359668.92 | 9241 | 65654 |
| pancake mixes | Bisquick | syrups | Hungry Jack | 3.902702 | 5 | 7 | 18437.72 | 113414.69 | 7021 | 18766 |
| pancake mixes | Mrs Butterworth | syrups | Smuckers | 3.887910 | 7 | 14 | 11415.08 | 24862.79 | 5164 | 6315 |
| pasta | Healthy Harvest | pasta sauce | Emeril’s | 3.871710 | 11 | 10 | 28112.45 | 60173.35 | 8054 | 9564 |
| pancake mixes | Mrs Butterworth | syrups | Hungry Jack | 3.862348 | 7 | 7 | 11415.08 | 113414.69 | 5164 | 18766 |
| pasta | Barilla | pasta sauce | Barilla | 3.843450 | 2 | 7 | 433764.78 | 175055.76 | 125579 | 28574 |
| pancake mixes | Aunt Jemima | syrups | Smuckers | 3.841292 | 1 | 14 | 206698.12 | 24862.79 | 56103 | 6315 |
| pancake mixes | Bisquick | syrups | Smuckers | 3.812462 | 5 | 14 | 18437.72 | 24862.79 | 7021 | 6315 |
| pancake mixes | Private Label | syrups | Northwoods | 3.755225 | 3 | 5 | 73616.82 | 90610.37 | 32199 | 29456 |
| pancake mixes | Hungry Jack | syrups | Private Label Value | 3.746628 | 2 | 15 | 111435.32 | 13190.32 | 35557 | 7227 |
| pancake mixes | Krusteaz | syrups | Spring Tree | 3.707536 | 4 | 12 | 30528.77 | 32740.97 | 9241 | 6460 |
| pancake mixes | Bisquick | syrups | Mrs Butterworth | 3.703635 | 5 | 3 | 18437.72 | 159165.47 | 7021 | 31809 |
| pancake mixes | Aunt Jemima | syrups | Kellogg | 3.626670 | 1 | 9 | 206698.12 | 55015.38 | 56103 | 12796 |
Pancake mixes and syrups dominate the list with high lift meaning that they are the product pairs that are bought togetehr a lot more
We also can see that the top brands within each product category arent the ones that occur at the top of the list when bought together. Infact some of the lesser preffered brands occur in the high lift pairs.
The implications are tremendous :-
We have provided a summary which is highly counter-intuitive to what was assumed. The top brands arent bought along with its product counterpart - this could be due to finances, existing offers etc.
We suggested that the company revise the offer scheme and also changed the placement points for these products - placing the ones with higher affinity with each other.
Also, only some brands who produce for example both pancake mixes and syrups are the preffered brands for both products.
We were unable to delve deeper and understand why it was that certain brands were preffered with certain others. With deeper domain knowledge and customer feedback one could ascertain this.
We also could not analyze how often a customer came back for a certain product/ product pair, and we wish to do this in the future.