Association Analysis for Carrefour Kenya Supermarket

1. Defining the Question

Research Question

You are a data analyst at Carrefour Kenya, currently undertaking a project that will inform the marketing department about the marketing strategies most likely to produce the highest sales (total price including tax). The project is divided into four parts, in which you will explore a recent marketing dataset using various unsupervised learning techniques and then provide recommendations based on your insights.

a.) Specifying the question

Create association rules that will allow us to identify relationships between the items (products) in the dataset.

b.) The metric for success

Finding the most important association rules in the dataset, i.e. rules with a lift greater than 1, ranked by confidence.
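For reference, these are the standard definitions behind this criterion, with X the antecedent (left-hand side), Y the consequent (right-hand side), N the number of transactions and t a single transaction:

```latex
\begin{aligned}
\operatorname{supp}(X) &= \frac{\lvert \{\, t : X \subseteq t \,\} \rvert}{N} \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
  = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}
\end{aligned}
```

The coverage column reported by arules is supp(X). A lift above 1 means X and Y appear together more often than they would if they were bought independently.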

c.) Understanding the context

The dataset provided contains transactions made by Carrefour Supermarket customers, which allows us to perform market basket analysis.

d.) Experimental design taken

  1. Reading the data

  2. Checking the data - data understanding

  3. Implementing the solution

  4. Challenge the solution

  5. Follow up Questions

  6. Conclusion

  7. Recommendations

e.) Data appropriateness to answer the given question.

We aim to find which association rules are most important for the supermarket, in order to increase item sales and to help the marketing team target certain products, which in turn increases profit.

2. Import libraries

# import arules and arulesViz

library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ tidyr::pack()   masks Matrix::pack()
## ✖ dplyr::recode() masks arules::recode()
## ✖ tidyr::unpack() masks Matrix::unpack()
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
library(glue)

# libraries for visualization
library(ggiraph)
library(ggiraphExtra)

3. Loading the dataset

# load the dataset as Transactions

df <- read.transactions('http://bit.ly/SupermarketDatasetII', sep=',') 
## Warning in asMethod(object): removing duplicated items in transactions
#check data structure of our dataset

class(df)
## [1] "transactions"
## attr(,"package")
## [1] "arules"

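The warning shows that read.transactions() removed items listed more than once within the same transaction: each basket is stored as a set, so an item is counted at most once per transaction. Identical transactions are not merged, so repeat baskets still count separately.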
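To make the distinction concrete, here is a small base R sketch on hypothetical basket lines (made up, not taken from the dataset). It mimics the effect reported in the warning: a repeated item inside one basket collapses to a single entry, while two identical baskets both remain as separate transactions.

```r
# Hypothetical raw basket lines in the same comma-separated format as the dataset
raw <- c("burgers,eggs,burgers",   # 'burgers' listed twice in one basket
         "chutney",
         "chutney")                # an identical second basket

# Split each line into items and keep each item once per basket (set semantics)
baskets <- lapply(strsplit(raw, ","), unique)

length(baskets)   # number of transactions is unchanged (3)
baskets[[1]]      # duplicated item removed within the basket
```

This only illustrates the outcome; it is not the internal implementation of read.transactions().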

4. Checking the data

# Previewing the first 6 transactions (head() returns 6 by default)
#
inspect(head(df))
##     items               
## [1] {almonds,           
##      antioxydant juice, 
##      avocado,           
##      cottage cheese,    
##      energy drink,      
##      frozen smoothie,   
##      green grapes,      
##      green tea,         
##      honey,             
##      low fat yogurt,    
##      mineral water,     
##      olive oil,         
##      salad,             
##      salmon,            
##      shrimp,            
##      spinach,           
##      tomato juice,      
##      vegetables mix,    
##      whole weat flour,  
##      yams}              
## [2] {burgers,           
##      eggs,              
##      meatballs}         
## [3] {chutney}           
## [4] {avocado,           
##      turkey}            
## [5] {energy bar,        
##      green tea,         
##      milk,              
##      mineral water,     
##      whole wheat rice}  
## [6] {low fat yogurt}

5. Exploratory Data Analysis

# checking summary of dataset
summary(df)
## transactions as itemMatrix in sparse format with
##  7501 rows (elements/itemsets/transactions) and
##  119 columns (items) and a density of 0.03288973 
## 
## most frequent items:
## mineral water          eggs     spaghetti  french fries     chocolate 
##          1788          1348          1306          1282          1229 
##       (Other) 
##         22405 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 1754 1358 1044  816  667  493  391  324  259  139  102   67   40   22   17    4 
##   18   19   20 
##    1    2    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   3.914   5.000  20.000 
## 
## includes extended item information - examples:
##              labels
## 1           almonds
## 2 antioxydant juice
## 3         asparagus

The dataset records 7,501 transactions. Customers are not identified, so we cannot tell whether several transactions come from the same customer.

There are 119 products in our dataset.

The most common basket size is a single item (1,754 of 7,501 transactions, about 23%); the median basket contains 3 items and the mean is 3.9.
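As a cross-check, the headline figures can be recomputed from the transaction-length table printed by summary(df). This base R sketch uses only numbers copied from that output. It shows that density × 119 items equals the mean basket size, and that one-item baskets form the largest single group without being a majority.

```r
# Basket-size distribution copied from the summary(df) output above
sizes  <- c(1:16, 18, 19, 20)
counts <- c(1754, 1358, 1044, 816, 667, 493, 391, 324, 259, 139,
            102, 67, 40, 22, 17, 4, 1, 2, 1)
n_items <- 119                                  # distinct products

n_trans     <- sum(counts)                      # number of transactions
total_occ   <- sum(sizes * counts)              # item occurrences across all baskets
mean_size   <- total_occ / n_trans              # average basket size
density     <- total_occ / (n_trans * n_items)  # share of filled cells in the item matrix
median_size <- median(rep(sizes, counts))       # median basket size
single_pct  <- 100 * counts[1] / n_trans        # percentage of one-item baskets

round(c(mean = mean_size, density = density,
        median = median_size, single_pct = single_pct), 4)
```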

#frequency/support plot for items with minimum support of 0.1

df %>%  
  
  itemFrequency() %>% 
  
  as_tibble(rownames = "items") %>% 
  
  rename("support"="value") %>% 
  
  filter(support >= 0.1) %>% 
  
  arrange(-support) %>% 
  
  ggDonut(aes(donuts=items,count=support), explode = c(2,4,6,8), labelposition=0)

Seven items appear in at least 10% of transactions (support of at least 0.1): mineral water, eggs, spaghetti, french fries, chocolate, green tea and milk, in descending order of support.

# checking item frequency for first 5 items

itemFrequency(df[, 1:5],type = "absolute")
##           almonds antioxydant juice         asparagus           avocado 
##               153                67                36               250 
##       babies food 
##                34
# relative frequency (support) as a percentage: share of all transactions containing each item

round(itemFrequency(df[, 1:5],type = "relative")*100,2)
##           almonds antioxydant juice         asparagus           avocado 
##              2.04              0.89              0.48              3.33 
##       babies food 
##              0.45
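The relative frequency is simply the absolute count divided by the number of transactions. This base R sketch reproduces the percentages above from the printed counts and the 7,501 transactions reported by summary(df):

```r
# Absolute counts for the first five items, copied from the output above
counts  <- c(almonds = 153, `antioxydant juice` = 67, asparagus = 36,
             avocado = 250, `babies food` = 34)
n_trans <- 7501   # number of transactions reported by summary(df)

# Relative frequency (support) = transactions containing the item / all transactions
support_pct <- round(counts / n_trans * 100, 2)
support_pct
```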
# create subplot
par(mfrow = c(1, 2))

# plot the frequency of top 10 most frequent items 
itemFrequencyPlot(df, topN = 10,col="#34495E")

# plot the frequency of items with a minimum support of 0.1

itemFrequencyPlot(df, support = 0.1, col="#935116")

The relative item-frequency plots show that, of the 10 most frequent items, only 7 have a support of at least 0.1.

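The apriori algorithm uses two threshold measures to prune candidate rules: minimum support, the share of all transactions that contain an itemset, and minimum confidence, the share of transactions containing a rule's antecedent that also contain its consequent. Lift, which arules reports for every rule, is not used as a threshold; it divides a rule's confidence by the consequent's support, so a lift above 1 means the items are bought together more often than they would be if bought independently.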
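The following base R sketch computes the three measures for one rule on a handful of hypothetical toy baskets (not from the Carrefour data), with hypothetical thresholds of 0.3 support and 0.6 confidence to show how a rule is kept or dropped. The helper supp_of() is illustrative only and is not part of arules; it is named so that it does not mask arules::support() if run in the same session.

```r
# Hypothetical toy baskets (not from the Carrefour data)
baskets <- list(
  c("pasta", "escalope", "mineral water"),
  c("pasta", "escalope"),
  c("pasta", "mineral water"),
  c("eggs", "mineral water"),
  c("eggs", "spaghetti")
)

# Illustrative helper: share of baskets that contain every item in `items`
supp_of <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

x_items <- "pasta"      # antecedent (lhs)
y_items <- "escalope"   # consequent (rhs)

rule_supp <- supp_of(c(x_items, y_items))   # supp(X and Y together) = 2/5
rule_conf <- rule_supp / supp_of(x_items)   # supp(X, Y) / supp(X)   = 0.4 / 0.6
rule_lift <- rule_conf / supp_of(y_items)   # conf / supp(Y)         = (2/3) / 0.4

# Hypothetical thresholds, playing the role of apriori's supp and conf parameters
keep <- rule_supp >= 0.3 && rule_conf >= 0.6

c(support = rule_supp, confidence = rule_conf, lift = rule_lift)
keep
```

Here the lift is about 1.67, so in this toy data escalope is bought with pasta more often than its overall rate would suggest, and the rule passes both thresholds.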

6. Implementing the solution

# building an association rules based model

# set minimum support = 0.001 and minimum confidence = 0.8
rules <- apriori (df, parameter = list(supp = 0.001, conf = 0.8))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 7 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.01s].
## sorting and recoding items ... [116 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [74 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules
## set of 74 rules

The final result is 74 rules.
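The log above prints an absolute minimum support count of 7, which appears to be 0.001 × 7501 = 7.501 truncated to an integer. Because a rule must reach a support of at least 0.001, 7 transactions is not quite enough, so every rule has to occur in at least 8 transactions. That matches the minimum count of 8 and minimum support of 0.001067 in the rule summary. A quick base R check of the arithmetic:

```r
min_supp <- 0.001   # minimum support passed to apriori()
n_trans  <- 7501    # transactions in df

min_supp * n_trans                          # 7.501 transactions
7 / n_trans >= min_supp                     # FALSE: 7 transactions give support ~0.000933
min_count <- ceiling(min_supp * n_trans)    # smallest count that meets the threshold
min_count / n_trans                         # support of the weakest possible rule
```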

# check rules summary 

summary(rules)
## set of 74 rules
## 
## rule length distribution (lhs + rhs):sizes
##  3  4  5  6 
## 15 42 16  1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   4.000   4.000   4.041   4.000   6.000 
## 
## summary of quality measures:
##     support           confidence        coverage             lift       
##  Min.   :0.001067   Min.   :0.8000   Min.   :0.001067   Min.   : 3.356  
##  1st Qu.:0.001067   1st Qu.:0.8000   1st Qu.:0.001333   1st Qu.: 3.432  
##  Median :0.001133   Median :0.8333   Median :0.001333   Median : 3.795  
##  Mean   :0.001256   Mean   :0.8504   Mean   :0.001479   Mean   : 4.823  
##  3rd Qu.:0.001333   3rd Qu.:0.8889   3rd Qu.:0.001600   3rd Qu.: 4.877  
##  Max.   :0.002533   Max.   :1.0000   Max.   :0.002666   Max.   :12.722  
##      count       
##  Min.   : 8.000  
##  1st Qu.: 8.000  
##  Median : 8.500  
##  Mean   : 9.419  
##  3rd Qu.:10.000  
##  Max.   :19.000  
## 
## mining info:
##  data ntransactions support confidence
##    df          7501   0.001        0.8
##                                                            call
##  apriori(data = df, parameter = list(supp = 0.001, conf = 0.8))

The rule length distribution (lhs + rhs) shows that most rules (42 of 74) contain four items: three in the antecedent and one in the consequent.

The summary of quality measures gives descriptive statistics for the support, confidence, coverage, lift and count of the 74 rules.

The mean confidence across the rules is 0.85 (85%), and even the weakest rule has a lift of 3.36, well above 1.

# Observing rules built in our model i.e. first 5 model rules
# ---
# 
inspect(rules[1:5])
##     lhs                              rhs             support     confidence
## [1] {frozen smoothie, spinach}    => {mineral water} 0.001066524 0.8888889 
## [2] {bacon, pancakes}             => {spaghetti}     0.001733102 0.8125000 
## [3] {nonfat milk, turkey}         => {mineral water} 0.001199840 0.8181818 
## [4] {ground beef, nonfat milk}    => {mineral water} 0.001599787 0.8571429 
## [5] {mushroom cream sauce, pasta} => {escalope}      0.002532996 0.9500000 
##     coverage    lift      count
## [1] 0.001199840  3.729058  8   
## [2] 0.002133049  4.666587 13   
## [3] 0.001466471  3.432428  9   
## [4] 0.001866418  3.595877 12   
## [5] 0.002666311 11.976387 19

The first rule states that in 88.9% of the transactions containing frozen smoothie and spinach, the customer also bought mineral water. The full combination occurs in 8 transactions. The lift of 3.73 means mineral water is about 3.7 times as likely to be in these baskets as in the average basket, which indicates a positive association between the products (though not necessarily a causal one).
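The quality measures in this output are linked by simple arithmetic. This base R sketch reproduces rule [1] from counts alone. The count of 9 transactions containing the antecedent is inferred from coverage × 7501, and mineral water's count of 1788 comes from summary(df). The variable names are prefixed with r_ so they do not mask arules functions such as support() or coverage().

```r
n_trans  <- 7501
count_xy <- 8      # baskets with frozen smoothie, spinach and mineral water (count column)
count_x  <- 9      # baskets with frozen smoothie and spinach (coverage * n_trans)
count_y  <- 1788   # baskets with mineral water (summary(df))

r_support    <- count_xy / n_trans                 # supp(X and Y)
r_coverage   <- count_x / n_trans                  # supp(X)
r_confidence <- count_xy / count_x                 # supp(X and Y) / supp(X)
r_lift       <- r_confidence / (count_y / n_trans) # confidence / supp(Y)

round(c(support = r_support, coverage = r_coverage,
        confidence = r_confidence, lift = r_lift), 6)
```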

# Looking at the first ten rules sorted by confidence 
#
# 
rules<-sort(rules, by="confidence", decreasing=TRUE)
inspect(rules[1:10])
##      lhs                        rhs                 support confidence    coverage      lift count
## [1]  {french fries,                                                                               
##       mushroom cream sauce,                                                                       
##       pasta}                 => {escalope}      0.001066524  1.0000000 0.001066524 12.606723     8
## [2]  {ground beef,                                                                                
##       light cream,                                                                                
##       olive oil}             => {mineral water} 0.001199840  1.0000000 0.001199840  4.195190     9
## [3]  {cake,                                                                                       
##       meatballs,                                                                                  
##       mineral water}         => {milk}          0.001066524  1.0000000 0.001066524  7.717078     8
## [4]  {cake,                                                                                       
##       olive oil,                                                                                  
##       shrimp}                => {mineral water} 0.001199840  1.0000000 0.001199840  4.195190     9
## [5]  {mushroom cream sauce,                                                                       
##       pasta}                 => {escalope}      0.002532996  0.9500000 0.002666311 11.976387    19
## [6]  {red wine,                                                                                   
##       soup}                  => {mineral water} 0.001866418  0.9333333 0.001999733  3.915511    14
## [7]  {eggs,                                                                                       
##       mineral water,                                                                              
##       pasta}                 => {shrimp}        0.001333156  0.9090909 0.001466471 12.722185    10
## [8]  {herb & pepper,                                                                              
##       mineral water,                                                                              
##       rice}                  => {ground beef}   0.001333156  0.9090909 0.001466471  9.252498    10
## [9]  {ground beef,                                                                                
##       pancakes,                                                                                   
##       whole wheat rice}      => {mineral water} 0.001333156  0.9090909 0.001466471  3.813809    10
## [10] {frozen vegetables,                                                                          
##       milk,                                                                                       
##       spaghetti,                                                                                  
##       turkey}                => {mineral water} 0.001199840  0.9000000 0.001333156  3.775671     9

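The first 4 rules have a confidence of 1 (100%): every transaction containing the antecedent items also contained the consequent. Each of these rules, however, rests on only 8 or 9 transactions.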

All of the top 10 rules have a lift greater than 1, indicating a positive association between their antecedents and consequents.

Let’s visualize the top 10 rules sorted by lift.

# visualizing the top 10 rules by lift with ggiraph for interactivity
plot_rules <- rules %>%

  # sort rules by lift (highest first)
  arules::sort(by = "lift") %>%

  # convert the rules to a data frame (LHS, RHS and quality measures), then to a tibble
  DATAFRAME() %>%
  as_tibble() %>%

  # keep the first 10 rules
  head(10) %>%

  mutate(
    # label each rule and order the factor levels by lift for plotting
    ruleName   = paste(LHS, "=>", RHS) %>% fct_reorder(lift),
    # show confidence as a percentage in the tooltip (support is left as a proportion)
    confidence = percent(confidence),
    # round lift to 2 decimal places
    lift       = round(lift, 2)
  ) %>%

  # keep the variables used in the plot and tooltip
  select(ruleName, support, confidence, lift) %>%

  # lollipop chart of lift per rule
  ggplot(aes(x = ruleName, y = lift)) +
  ggtitle("Top 10 Rules Plot") +
  geom_segment(aes(xend = ruleName, yend = 0), color = "#DC7633", size = 1) +

  # interactive points: hovering shows support, confidence and lift;
  # data_id uses the rule name so each point is highlighted on its own
  geom_point_interactive(
    aes(tooltip = glue("Support: {support}\nConfidence: {confidence}\nLift: {lift}"),
        data_id = ruleName),
    size = 3, color = "#85C1E9"
  ) +
  coord_flip() +
  theme_minimal() +
  theme(
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    # panel background colour
    panel.background = element_rect(fill = "#979A9A", color = NA),
    # whole-plot background colour
    plot.background = element_rect(fill = "#D4EFDF", color = NA)
  ) +
  xlab("") + ylab("")

# display the interactive plot
girafe(ggobj = plot_rules)

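The rule with the highest lift is {eggs, mineral water, pasta} => {shrimp} (lift 12.72): shrimp is about 12.7 times as likely to be in a basket containing eggs, mineral water and pasta as in an average basket. Association rules describe co-occurrence within a basket, not the order in which items were picked. To increase shrimp sales, these antecedent products could be offered together with shrimp as a bundle discount.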
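Because lift = confidence / supp(shrimp), the baseline popularity of shrimp can be backed out of this rule's printed confidence and lift. This base R sketch shows the contrast behind the recommendation: shrimp is in roughly 7% of all baskets, but in about 91% of baskets containing the antecedent items.

```r
conf_rule <- 0.9090909   # confidence of {eggs, mineral water, pasta} => {shrimp}
lift_rule <- 12.722185   # lift of the same rule
n_trans   <- 7501

# lift = confidence / supp(shrimp), so supp(shrimp) = confidence / lift
supp_shrimp <- conf_rule / lift_rule
supp_shrimp              # baseline share of baskets containing shrimp
supp_shrimp * n_trans    # implied number of baskets with shrimp
```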

# checking mineral water appearance on rhs
water <- subset(rules, subset = rhs %pin% "mineral water")
 
# Then order by confidence
water <- sort(water, by="confidence", decreasing=TRUE)
inspect(water)
##      lhs                     rhs                 support confidence    coverage     lift count
## [1]  {ground beef,                                                                            
##       light cream,                                                                            
##       olive oil}          => {mineral water} 0.001199840  1.0000000 0.001199840 4.195190     9
## [2]  {cake,                                                                                   
##       olive oil,                                                                              
##       shrimp}             => {mineral water} 0.001199840  1.0000000 0.001199840 4.195190     9
## [3]  {red wine,                                                                               
##       soup}               => {mineral water} 0.001866418  0.9333333 0.001999733 3.915511    14
## [4]  {ground beef,                                                                            
##       pancakes,                                                                               
##       whole wheat rice}   => {mineral water} 0.001333156  0.9090909 0.001466471 3.813809    10
## [5]  {frozen vegetables,                                                                      
##       milk,                                                                                   
##       spaghetti,                                                                              
##       turkey}             => {mineral water} 0.001199840  0.9000000 0.001333156 3.775671     9
## [6]  {chocolate,                                                                              
##       frozen vegetables,                                                                      
##       olive oil,                                                                              
##       shrimp}             => {mineral water} 0.001199840  0.9000000 0.001333156 3.775671     9
## [7]  {frozen smoothie,                                                                        
##       spinach}            => {mineral water} 0.001066524  0.8888889 0.001199840 3.729058     8
## [8]  {cake,                                                                                   
##       meatballs,                                                                              
##       milk}               => {mineral water} 0.001066524  0.8888889 0.001199840 3.729058     8
## [9]  {cake,                                                                                   
##       olive oil,                                                                              
##       whole wheat pasta}  => {mineral water} 0.001066524  0.8888889 0.001199840 3.729058     8
## [10] {brownies,                                                                               
##       eggs,                                                                                   
##       ground beef}        => {mineral water} 0.001066524  0.8888889 0.001199840 3.729058     8
## [11] {chicken,                                                                                
##       fresh bread,                                                                            
##       pancakes}           => {mineral water} 0.001066524  0.8888889 0.001199840 3.729058     8
## [12] {chocolate,                                                                              
##       soup,                                                                                   
##       turkey}             => {mineral water} 0.001066524  0.8888889 0.001199840 3.729058     8
## [13] {chocolate,                                                                              
##       frozen vegetables,                                                                      
##       shrimp,                                                                                 
##       spaghetti}          => {mineral water} 0.001733102  0.8666667 0.001999733 3.635831    13
## [14] {ground beef,                                                                            
##       nonfat milk}        => {mineral water} 0.001599787  0.8571429 0.001866418 3.595877    12
## [15] {turkey,                                                                                 
##       whole wheat pasta}  => {mineral water} 0.001466471  0.8461538 0.001733102 3.549776    11
## [16] {frozen vegetables,                                                                      
##       milk,                                                                                   
##       shrimp,                                                                                 
##       spaghetti}          => {mineral water} 0.001466471  0.8461538 0.001733102 3.549776    11
## [17] {chocolate,                                                                              
##       eggs,                                                                                   
##       frozen vegetables,                                                                      
##       ground beef}        => {mineral water} 0.001466471  0.8461538 0.001733102 3.549776    11
## [18] {olive oil,                                                                              
##       soup,                                                                                   
##       tomatoes}           => {mineral water} 0.001333156  0.8333333 0.001599787 3.495992    10
## [19] {frozen vegetables,                                                                      
##       olive oil,                                                                              
##       shrimp}             => {mineral water} 0.001866418  0.8235294 0.002266364 3.454862    14
## [20] {nonfat milk,                                                                            
##       turkey}             => {mineral water} 0.001199840  0.8181818 0.001466471 3.432428     9
## [21] {cooking oil,                                                                            
##       fromage blanc}      => {mineral water} 0.001199840  0.8181818 0.001466471 3.432428     9
## [22] {french fries,                                                                           
##       herb & pepper,                                                                          
##       milk}               => {mineral water} 0.001199840  0.8181818 0.001466471 3.432428     9
## [23] {burgers,                                                                                
##       frozen vegetables,                                                                      
##       olive oil}          => {mineral water} 0.001199840  0.8181818 0.001466471 3.432428     9
## [24] {frozen vegetables,                                                                      
##       milk,                                                                                   
##       olive oil,                                                                              
##       soup}               => {mineral water} 0.001199840  0.8181818 0.001466471 3.432428     9
## [25] {chocolate,                                                                              
##       eggs,                                                                                   
##       olive oil,                                                                              
##       spaghetti}          => {mineral water} 0.001199840  0.8181818 0.001466471 3.432428     9
## [26] {chocolate,                                                                              
##       milk,                                                                                   
##       shrimp,                                                                                 
##       spaghetti}          => {mineral water} 0.001199840  0.8181818 0.001466471 3.432428     9
## [27] {frozen vegetables,                                                                      
##       olive oil,                                                                              
##       soup}               => {mineral water} 0.001733102  0.8125000 0.002133049 3.408592    13
## [28] {black tea,                                                                              
##       salmon}             => {mineral water} 0.001066524  0.8000000 0.001333156 3.356152     8
## [29] {pancakes,                                                                               
##       tomato sauce}       => {mineral water} 0.001066524  0.8000000 0.001333156 3.356152     8
## [30] {milk,                                                                                   
##       spaghetti,                                                                              
##       strong cheese}      => {mineral water} 0.001066524  0.8000000 0.001333156 3.356152     8
## [31] {grated cheese,                                                                          
##       ground beef,                                                                            
##       rice}               => {mineral water} 0.001066524  0.8000000 0.001333156 3.356152     8
## [32] {oil,                                                                                    
##       shrimp,                                                                                 
##       spaghetti}          => {mineral water} 0.001066524  0.8000000 0.001333156 3.356152     8
## [33] {escalope,                                                                               
##       hot dogs,                                                                               
##       milk}               => {mineral water} 0.001066524  0.8000000 0.001333156 3.356152     8
## [34] {chocolate,                                                                              
##       hot dogs,                                                                               
##       milk}               => {mineral water} 0.001066524  0.8000000 0.001333156 3.356152     8
## [35] {chocolate,                                                                              
##       olive oil,                                                                              
##       soup}               => {mineral water} 0.001599787  0.8000000 0.001999733 3.356152    12
## [36] {cooking oil,                                                                            
##       eggs,                                                                                   
##       olive oil}          => {mineral water} 0.001066524  0.8000000 0.001333156 3.356152     8
## [37] {burgers,                                                                                
##       frozen vegetables,                                                                      
##       low fat yogurt}     => {mineral water} 0.001066524  0.8000000 0.001333156 3.356152     8
## [38] {cake,                                                                                   
##       eggs,                                                                                   
##       milk,                                                                                   
##       turkey}             => {mineral water} 0.001066524  0.8000000 0.001333156 3.356152     8
## [39] {chocolate,                                                                              
##       eggs,                                                                                   
##       milk,                                                                                   
##       olive oil}          => {mineral water} 0.001066524  0.8000000 0.001333156 3.356152     8
## [40] {chocolate,                                                                              
##       frozen vegetables,                                                                      
##       pancakes,                                                                               
##       shrimp}             => {mineral water} 0.001066524  0.8000000 0.001333156 3.356152     8
## [41] {french fries,                                                                           
##       milk,                                                                                   
##       pancakes,                                                                               
##       spaghetti}          => {mineral water} 0.001066524  0.8000000 0.001333156 3.356152     8

Mineral water is the consequent in 41 of the 74 rules (about 55%), which is unsurprising given that it is the most frequently bought item.
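The subsets in this section use `%pin%`, which matches item labels partially. For "mineral water" this appears harmless, since among the labels in these rules only mineral water itself contains that text, but a partial search for "milk" or "pasta" would also catch "nonfat milk" or "whole wheat pasta". This base R sketch mirrors the difference using a few labels from the rules above; grepl() and %in% stand in for the arules operators and are not the arules implementation.

```r
# A few item labels that appear in the rules above
item_labels <- c("milk", "nonfat milk", "mineral water",
                 "pasta", "whole wheat pasta", "eggs")

# Partial matching, analogous to %pin%: every label containing the pattern
partial_milk <- item_labels[grepl("milk", item_labels, fixed = TRUE)]

# Exact matching, analogous to %in% with a single item
exact_milk <- item_labels[item_labels %in% "milk"]

partial_milk  # "milk" "nonfat milk"
exact_milk    # "milk"
```

In arules, an exact-match subset would be written as `subset(rules, subset = rhs %in% "milk")`.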

# investigating mineral water as an antecedent (left-hand side)
# checking mineral water appearance on lhs
mineral <- subset(rules, subset = lhs %pin% "mineral water")
 
# Then order by confidence
mineral <- sort(mineral, by="confidence", decreasing=TRUE)
inspect(mineral)
##      lhs                     rhs                     support confidence    coverage      lift count
## [1]  {cake,                                                                                        
##       meatballs,                                                                                   
##       mineral water}      => {milk}              0.001066524  1.0000000 0.001066524  7.717078     8
## [2]  {eggs,                                                                                        
##       mineral water,                                                                               
##       pasta}              => {shrimp}            0.001333156  0.9090909 0.001466471 12.722185    10
## [3]  {herb & pepper,                                                                               
##       mineral water,                                                                               
##       rice}               => {ground beef}       0.001333156  0.9090909 0.001466471  9.252498    10
## [4]  {light cream,                                                                                 
##       mineral water,                                                                               
##       shrimp}             => {spaghetti}         0.001066524  0.8888889 0.001199840  5.105326     8
## [5]  {grated cheese,                                                                               
##       mineral water,                                                                               
##       rice}               => {ground beef}       0.001066524  0.8888889 0.001199840  9.046887     8
## [6]  {escalope,                                                                                    
##       hot dogs,                                                                                    
##       mineral water}      => {milk}              0.001066524  0.8888889 0.001199840  6.859625     8
## [7]  {chocolate,                                                                                   
##       ground beef,                                                                                 
##       milk,                                                                                        
##       mineral water,                                                                               
##       spaghetti}          => {frozen vegetables} 0.001066524  0.8888889 0.001199840  9.325253     8
## [8]  {frozen vegetables,                                                                           
##       ground beef,                                                                                 
##       mineral water,                                                                               
##       shrimp}             => {spaghetti}         0.001733102  0.8666667 0.001999733  4.977693    13
## [9]  {mineral water,                                                                               
##       pasta,                                                                                       
##       shrimp}             => {eggs}              0.001333156  0.8333333 0.001599787  4.637117    10
## [10] {frozen vegetables,                                                                           
##       ground beef,                                                                                 
##       mineral water,                                                                               
##       tomatoes}           => {spaghetti}         0.001199840  0.8181818 0.001466471  4.699220     9
## [11] {milk,                                                                                        
##       mineral water,                                                                               
##       parmesan cheese}    => {spaghetti}         0.001066524  0.8000000 0.001333156  4.594793     8
## [12] {cooking oil,                                                                                 
##       mineral water,                                                                               
##       red wine}           => {spaghetti}         0.001066524  0.8000000 0.001333156  4.594793     8
## [13] {frozen vegetables,                                                                           
##       mineral water,                                                                               
##       olive oil,                                                                                   
##       tomatoes}           => {spaghetti}         0.001066524  0.8000000 0.001333156  4.594793     8
## [14] {chocolate,                                                                                   
##       french fries,                                                                                
##       mineral water,                                                                               
##       olive oil}          => {spaghetti}         0.001066524  0.8000000 0.001333156  4.594793     8

These results show that, among the rules listed whose antecedent contains mineral water, spaghetti is the most frequent consequent, with confidence between 0.80 and 0.89.
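The confidence and lift figures above can be reproduced by hand. The sketch below uses base R only and a small hypothetical set of baskets (not the Carrefour transactions). It computes support, coverage, confidence and lift for rules of the form {mineral water} => {X}, then ranks the candidate consequents by confidence. The helper names `support_of` and `rule_metrics` are our own illustrative functions, not part of arules.

```r
# Hypothetical toy baskets (NOT the Carrefour data): one character vector per transaction
baskets <- list(
  c("mineral water", "spaghetti", "eggs"),
  c("mineral water", "spaghetti"),
  c("mineral water", "milk"),
  c("spaghetti", "chocolate"),
  c("eggs", "milk"),
  c("mineral water", "spaghetti", "ground beef"),
  c("chocolate"),
  c("mineral water", "eggs")
)

# Fraction of baskets that contain every item in `items`
support_of <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Support, coverage, confidence and lift of the rule {lhs} => {rhs}
rule_metrics <- function(lhs, rhs, baskets) {
  supp     <- support_of(c(lhs, rhs), baskets)
  coverage <- support_of(lhs, baskets)        # support of the antecedent
  conf     <- supp / coverage                 # estimate of P(rhs | lhs)
  lift     <- conf / support_of(rhs, baskets) # > 1: bought together more often than by chance
  c(support = supp, coverage = coverage, confidence = conf, lift = lift)
}

# Score each candidate consequent for {mineral water} => {X}, highest confidence first
candidates <- c("spaghetti", "eggs", "milk", "ground beef")
res <- t(sapply(candidates, function(x) rule_metrics("mineral water", x, baskets)))
res_sorted <- res[order(res[, "confidence"], decreasing = TRUE), ]
print(round(res_sorted, 3))
```

In this toy data spaghetti has the highest confidence (0.6), while ground beef has the highest lift (1.6) despite a low confidence (0.2). Confidence and lift can therefore rank rules differently, which is why the success metric filters on lift > 1 before ordering by confidence.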

# checking eggs appearance on rhs

eggsrhs <- subset(rules, subset = rhs %pin% "eggs")
 
# Then order by confidence
eggsrhs <- sort(eggsrhs, by="confidence", decreasing=TRUE)
inspect(eggsrhs)
##     lhs                               rhs    support     confidence coverage   
## [1] {black tea, spaghetti, turkey} => {eggs} 0.001066524 0.8888889  0.001199840
## [2] {mineral water, pasta, shrimp} => {eggs} 0.001333156 0.8333333  0.001599787
##     lift     count
## [1] 4.946258  8   
## [2] 4.637117 10
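The columns returned by `inspect()` are linked by simple arithmetic. The sketch below starts from the values printed for rule [1] and recovers the quantities behind them. The total number of transactions is not printed in this section; we infer it as count / support, so treat it as a derived figure.

```r
# Values copied from rule [1] of inspect(eggsrhs): {black tea, spaghetti, turkey} => {eggs}
rule_count    <- 8            # transactions containing all four items
rule_support  <- 0.001066524  # rule_count / total transactions
rule_coverage <- 0.001199840  # support of the antecedent {black tea, spaghetti, turkey}
rule_lift     <- 4.946258

n_trans   <- round(rule_count / rule_support)   # inferred total number of transactions
lhs_count <- round(rule_coverage * n_trans)     # transactions containing the antecedent
conf      <- rule_count / lhs_count             # confidence = P(eggs | antecedent)
supp_eggs <- conf / rule_lift                   # lift = confidence / support(eggs)

cat("transactions:", n_trans, "\n")
cat("antecedent count:", lhs_count, "\n")
cat("confidence:", round(conf, 7), "\n")
cat("implied support of eggs:", round(supp_eggs, 4), "\n")
```

The recovered confidence (8 / 9) matches the printed 0.8888889. The implied support of eggs (about 0.18) is the baseline that the lift of roughly 4.9 is measured against.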

# checking eggs appearance on lhs

eggslhs <- subset(rules, subset = lhs %pin% "eggs")
 
# Then order by confidence
eggslhs <- sort(eggslhs, by="confidence", decreasing=TRUE)
inspect(eggslhs)
##     lhs                     rhs                 support confidence    coverage      lift count
## [1] {eggs,                                                                                    
##      mineral water,                                                                           
##      pasta}              => {shrimp}        0.001333156  0.9090909 0.001466471 12.722185    10
## [2] {brownies,                                                                                
##      eggs,                                                                                    
##      ground beef}        => {mineral water} 0.001066524  0.8888889 0.001199840  3.729058     8
## [3] {chocolate,                                                                               
##      eggs,                                                                                    
##      frozen vegetables,                                                                       
##      ground beef}        => {mineral water} 0.001466471  0.8461538 0.001733102  3.549776    11
## [4] {chocolate,                                                                               
##      eggs,                                                                                    
##      olive oil,                                                                               
##      spaghetti}          => {mineral water} 0.001199840  0.8181818 0.001466471  3.432428     9
## [5] {cooking oil,                                                                             
##      eggs,                                                                                    
##      olive oil}          => {mineral water} 0.001066524  0.8000000 0.001333156  3.356152     8
## [6] {cake,                                                                                    
##      eggs,                                                                                    
##      milk,                                                                                    
##      turkey}             => {mineral water} 0.001066524  0.8000000 0.001333156  3.356152     8
## [7] {chocolate,                                                                               
##      eggs,                                                                                    
##      milk,                                                                                    
##      olive oil}          => {mineral water} 0.001066524  0.8000000 0.001333156  3.356152     8

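The success metric (lift greater than 1, ordered by confidence) can be applied to any printed rule table. The sketch below copies the confidence and lift values of the eggs-on-lhs rules into a base R data frame, keeps the rules with lift > 1 and ranks them by confidence. The cut-off `top_n <- 3` is a hypothetical value chosen for this small table. On the full arules object, the same filter can be written as `subset(rules, subset = lift > 1)` and then passed to `sort(..., by = "confidence")`.

```r
# Confidence and lift copied from inspect(eggslhs)
eggs_lhs <- data.frame(
  lhs = c("{eggs, mineral water, pasta}",
          "{brownies, eggs, ground beef}",
          "{chocolate, eggs, frozen vegetables, ground beef}",
          "{chocolate, eggs, olive oil, spaghetti}",
          "{cooking oil, eggs, olive oil}",
          "{cake, eggs, milk, turkey}",
          "{chocolate, eggs, milk, olive oil}"),
  rhs = c("shrimp", rep("mineral water", 6)),
  confidence = c(0.9090909, 0.8888889, 0.8461538, 0.8181818, 0.8, 0.8, 0.8),
  lift = c(12.722185, 3.729058, 3.549776, 3.432428, 3.356152, 3.356152, 3.356152),
  stringsAsFactors = FALSE
)

# Success metric: keep rules with lift > 1, then rank by confidence (highest first)
top_n  <- 3  # hypothetical cut-off for this small table
kept   <- eggs_lhs[eggs_lhs$lift > 1, ]
ranked <- kept[order(kept$confidence, decreasing = TRUE), ]
print(head(ranked, top_n), row.names = FALSE)
```

All seven rules pass the lift filter here. This is expected because the rules were mined with a high confidence threshold, so their confidence is far above the base rate of each consequent.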
7. Challenging the solution

Because the data do not identify individual customers, we are unable to rule out that a single customer bought the same combination of products repeatedly, which would inflate the support of those itemsets. We consider this unlikely to change the overall findings, but it cannot be verified from this dataset.

8. Follow up Questions

Do we have the right data?

Yes. The dataset contains supermarket transactions listing the items bought together, which is the input market basket analysis requires.

Do we have the right question?

Yes. The supermarket (the client) wanted to find out which products are bought together, and association rules answer that question directly.

9. Conclusion

According to our analysis, there are 74 rules that the client can apply. We will, however, focus on the top 10 rules by confidence and validate them against customers' subsequent transaction behaviour.

10. Recommendations

Mineral water is the most frequently bought item in the supermarket. To increase profits, the supermarket can offer discounts on the consequent products to customers who buy mineral water, and rearrange the shelves so that these products are close to the mineral water. These products include spaghetti, ground beef, milk, eggs, frozen vegetables and shrimp.