Associative_analysis

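Association analysis is the task of finding interesting relationships in large datasets. These relationships take two forms: frequent item sets and association rules. A frequent item set is a collection of items that often occur together in a transaction; an association rule states that transactions containing one item set (the left-hand side) tend to also contain another (the right-hand side).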
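The three measures reported for every rule later in this report (support, confidence and lift) can be illustrated with a minimal base R sketch. The five baskets and the helper `support()` below are hypothetical and are not taken from the supermarket dataset.

```r
# Toy data: five hypothetical baskets (not from the supermarket dataset)
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "milk"),
  c("bread", "butter"),
  c("bread", "butter", "milk")
)

# Hypothetical helper: fraction of baskets that contain every item in `items`
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- c("bread", "butter")
rhs <- "milk"

supp_rule <- support(c(lhs, rhs), baskets)      # share of baskets with lhs and rhs
conf_rule <- supp_rule / support(lhs, baskets)  # share of lhs baskets that also hold rhs
lift_rule <- conf_rule / support(rhs, baskets)  # confidence relative to the rhs base rate

print(c(support = supp_rule, confidence = conf_rule, lift = lift_rule))
```

A lift above 1 means the right-hand side appears more often alongside the left-hand side than it does overall; in this toy example the lift is below 1, so the rule would not be interesting.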

In this section we create association rules that identify relationships between the items in the dataset, and we draw insights from them.

We load the ‘arules’ library, which provides the infrastructure for representing, manipulating and analyzing transaction data and patterns.

# Loading the arules library
#
library(arules)

## Loading required package: Matrix

## 
## Attaching package: 'arules'

## The following objects are masked from 'package:base':
## 
##     abbreviate, write

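# Reading the CSV as transactions (one basket per row);
# rm.duplicates = TRUE drops repeated items within a basket
path <- "/home/oppy/Downloads/Supermarket_Sales_Dataset II.csv"
df <- read.transactions(path, sep = ",", rm.duplicates = TRUE)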

## distribution of transactions with duplicates:
## 1 
## 5

df

## transactions in sparse format with
##  7501 transactions (rows) and
##  119 items (columns)

# Verifying the object's class
# ---
# This should show us transactions as the type of data that we will need
# ---
# 
class(df)

## [1] "transactions"
## attr(,"package")
## [1] "arules"

# Inspecting the first five transactions
inspect(df[1:5])

##     items               
## [1] {almonds,           
##      antioxydant juice, 
##      avocado,           
##      cottage cheese,    
##      energy drink,      
##      frozen smoothie,   
##      green grapes,      
##      green tea,         
##      honey,             
##      low fat yogurt,    
##      mineral water,     
##      olive oil,         
##      salad,             
##      salmon,            
##      shrimp,            
##      spinach,           
##      tomato juice,      
##      vegetables mix,    
##      whole weat flour,  
##      yams}              
## [2] {burgers,           
##      eggs,              
##      meatballs}         
## [3] {chutney}           
## [4] {avocado,           
##      turkey}            
## [5] {energy bar,        
##      green tea,         
##      milk,              
##      mineral water,     
##      whole wheat rice}

# Generating a summary of the transaction dataset
# ---
# This would give us some information such as the most purchased items, 
# distribution of the item sets (no. of items purchased in each transaction), etc.
# ---
# 
summary(df)

## transactions as itemMatrix in sparse format with
##  7501 rows (elements/itemsets/transactions) and
##  119 columns (items) and a density of 0.03288973 
## 
## most frequent items:
## mineral water          eggs     spaghetti  french fries     chocolate 
##          1788          1348          1306          1282          1229 
##       (Other) 
##         22405 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 1754 1358 1044  816  667  493  391  324  259  139  102   67   40   22   17    4 
##   18   19   20 
##    1    2    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   3.914   5.000  20.000 
## 
## includes extended item information - examples:
##              labels
## 1           almonds
## 2 antioxydant juice
## 3         asparagus

# Exploring the frequency of some items:
# absolute counts for the first 10 items, then the share of
# all transactions (in percent) for items 5 to 10
# 
itemFrequency(df[, 1:10],type = "absolute")

##           almonds antioxydant juice         asparagus           avocado 
##               153                67                36               250 
##       babies food             bacon    barbecue sauce         black tea 
##                34                65                81               107 
##       blueberries        body spray 
##                69                86

round(itemFrequency(df[, 5:10],type = "relative")*100,2)

##    babies food          bacon barbecue sauce      black tea    blueberries 
##           0.45           0.87           1.08           1.43           0.92 
##     body spray 
##           1.15

# Producing a chart of frequencies and filtering 
# to consider only items with a minimum percentage 
# of support, or only the top x items
# ---
# Displaying the 10 most common items in the transactions dataset 
# and the items whose support is at least 10%
# 
par(mfrow = c(1, 2))

# plot the frequency of items
itemFrequencyPlot(df, topN = 10,col="cornflowerblue")

itemFrequencyPlot(df, support = 0.1,col="pink")

Mineral water was the most frequently purchased item, appearing in 1788 of the 7501 transactions (about 23.8%).
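As a quick check on the plots, the counts printed by `summary(df)` can be turned into shares of all transactions. The sketch below only re-uses the five counts and the transaction total shown in that output; the variable names are illustrative.

```r
# Counts copied from the "most frequent items" block of summary(df)
n_transactions <- 7501
top_counts <- c("mineral water" = 1788, eggs = 1348, spaghetti = 1306,
                "french fries" = 1282, chocolate = 1229)

# Share of transactions containing each item, in percent
top_share <- round(100 * top_counts / n_transactions, 2)
print(top_share)

# Items among these five that clear the 10% support threshold of the second plot
above_threshold <- names(top_share)[top_share >= 10]
print(above_threshold)
```

All five of these items sit well above the 10% support threshold, with mineral water the only one above 20%.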

# Building a model based on association rules 
# using the apriori function 
# ---
# We use a minimum support of 0.001 and a minimum confidence of 0.8
# ---
# 
rulez <- apriori(df, parameter = list(supp = 0.001, conf = 0.8))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 7 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [116 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [74 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rulez

## set of 74 rules

With a minimum support of 0.001 and a minimum confidence of 0.8, we obtained 74 rules.
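It helps to translate the support threshold into a number of transactions. The sketch below is plain arithmetic on the figures shown in the output above. The apriori log prints an absolute minimum support count of 7, but a rule seen in only 7 baskets would have support 7/7501, which is below 0.001, so in practice a rule needs at least 8 baskets. How arules arrives at the printed value of 7 is not shown in this report; treating it as a truncated product is an assumption.

```r
n_transactions <- 7501
min_support <- 0.001

# Smallest whole number of baskets whose share reaches the threshold
min_count <- ceiling(min_support * n_transactions)
print(min_count)

# Support of a rule seen in exactly that many baskets
print(min_count / n_transactions)

# A rule seen in one basket fewer falls below the threshold
print((min_count - 1) / n_transactions < min_support)
```

This matches the rule summary, where the smallest count is 8 and the smallest support is 0.001067.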

# We can perform an exploration of our model 
summary(rulez)

## set of 74 rules
## 
## rule length distribution (lhs + rhs):sizes
##  3  4  5  6 
## 15 42 16  1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   4.000   4.000   4.041   4.000   6.000 
## 
## summary of quality measures:
##     support           confidence        coverage             lift       
##  Min.   :0.001067   Min.   :0.8000   Min.   :0.001067   Min.   : 3.356  
##  1st Qu.:0.001067   1st Qu.:0.8000   1st Qu.:0.001333   1st Qu.: 3.432  
##  Median :0.001133   Median :0.8333   Median :0.001333   Median : 3.795  
##  Mean   :0.001256   Mean   :0.8504   Mean   :0.001479   Mean   : 4.823  
##  3rd Qu.:0.001333   3rd Qu.:0.8889   3rd Qu.:0.001600   3rd Qu.: 4.877  
##  Max.   :0.002533   Max.   :1.0000   Max.   :0.002666   Max.   :12.722  
##      count       
##  Min.   : 8.000  
##  1st Qu.: 8.000  
##  Median : 8.500  
##  Mean   : 9.419  
##  3rd Qu.:10.000  
##  Max.   :19.000  
## 
## mining info:
##  data ntransactions support confidence
##    df          7501   0.001        0.8
##                                                            call
##  apriori(data = df, parameter = list(supp = 0.001, conf = 0.8))

# Observing rules built in our model, i.e. rules 10 to 15
# ---
# 
inspect(rulez[10:15])

##     lhs                               rhs             support     confidence
## [1] {red wine, tomato sauce}       => {chocolate}     0.001066524 0.8000000 
## [2] {pancakes, tomato sauce}       => {mineral water} 0.001066524 0.8000000 
## [3] {chicken, protein bar}         => {spaghetti}     0.001199840 0.8181818 
## [4] {meatballs, whole wheat pasta} => {milk}          0.001333156 0.8333333 
## [5] {red wine, soup}               => {mineral water} 0.001866418 0.9333333 
## [6] {turkey, whole wheat pasta}    => {mineral water} 0.001466471 0.8461538 
##     coverage    lift     count
## [1] 0.001333156 4.882669  8   
## [2] 0.001333156 3.356152  8   
## [3] 0.001466471 4.699220  9   
## [4] 0.001599787 6.430898 10   
## [5] 0.001999733 3.915511 14   
## [6] 0.001733102 3.549776 11

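Interpretation of rule [1] above: 80% of the baskets that contain both red wine and tomato sauce also contain chocolate (confidence 0.8), which is about 4.9 times the rate for baskets overall (lift 4.88). The rule rests on only 8 transactions, so it should be read with caution.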
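The quality measures in the table can be reproduced by hand from counts, which makes their meaning concrete. The sketch below does this for rule [1], {red wine, tomato sauce} => {chocolate}. The rule count (8) and the chocolate count (1229) are printed in the outputs above; the left-hand-side count of 10 is an assumption derived by multiplying the printed coverage by 7501 and rounding.

```r
n_transactions <- 7501
count_rule <- 8      # baskets with red wine, tomato sauce and chocolate
count_lhs <- 10      # baskets with red wine and tomato sauce (assumed: coverage * n, rounded)
count_rhs <- 1229    # baskets with chocolate, from summary(df)

support <- count_rule / n_transactions             # share of all baskets matching the full rule
coverage <- count_lhs / n_transactions             # share of all baskets matching the lhs
confidence <- count_rule / count_lhs               # share of lhs baskets that also hold the rhs
lift <- confidence / (count_rhs / n_transactions)  # confidence relative to the rhs base rate

print(round(c(support = support, coverage = coverage,
              confidence = confidence, lift = lift), 6))
```

The results agree with the printed row for rule [1] to the displayed precision.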

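Since mineral water, eggs and spaghetti were the 3 most purchased items, we build a promotion around them by creating a subset of rules for each one.

The first set of subsets keeps the rules with each item on the right-hand side, which shows the item combinations whose presence in a basket goes along with that item. The rules describe items bought together in one transaction, not the order in which they were bought.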
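The subsetting code that follows uses `%pin%`, which arules documents as partial matching on item labels, whereas `%in%` matches labels exactly. The difference matters for short names: this dataset contains items labelled "oil", "olive oil" and "cooking oil". The sketch below is a base R analogy using `grepl()`, not the arules implementation, and the label vector is a small hand-picked sample.

```r
# A few item labels that appear in the outputs of this report
item_labels <- c("oil", "olive oil", "cooking oil", "eggs", "mineral water")

# Partial matching: every label that contains the text "oil"
partial <- item_labels[grepl("oil", item_labels, fixed = TRUE)]
print(partial)

# Exact matching: only the label that is exactly "oil"
exact <- item_labels[item_labels %in% "oil"]
print(exact)
```

For "mineral water", "eggs" and "spaghetti" the distinction is unlikely to change the result, but exact matching is the safer choice when an item name is a substring of another.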

# ---
# 
mineral <- subset(rulez, subset = rhs %pin% "mineral water")
 
# Then order by confidence
mineral <-sort(mineral, by="confidence", decreasing=TRUE)
inspect(mineral[1:5])

##     lhs                     rhs                 support confidence    coverage     lift count
## [1] {ground beef,                                                                            
##      light cream,                                                                            
##      olive oil}          => {mineral water} 0.001199840  1.0000000 0.001199840 4.195190     9
## [2] {cake,                                                                                   
##      olive oil,                                                                              
##      shrimp}             => {mineral water} 0.001199840  1.0000000 0.001199840 4.195190     9
## [3] {red wine,                                                                               
##      soup}               => {mineral water} 0.001866418  0.9333333 0.001999733 3.915511    14
## [4] {ground beef,                                                                            
##      pancakes,                                                                               
##      whole wheat rice}   => {mineral water} 0.001333156  0.9090909 0.001466471 3.813809    10
## [5] {frozen vegetables,                                                                      
##      milk,                                                                                   
##      spaghetti,                                                                              
##      turkey}             => {mineral water} 0.001199840  0.9000000 0.001333156 3.775671     9

Baskets containing the item sets in the lhs column were about 3.8 to 4.2 times more likely than an average basket to also contain mineral water (see the lift column).
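Lift for rules that predict a very popular item is bounded, because lift is confidence divided by the item's overall frequency and confidence cannot exceed 1. The sketch below works this out for mineral water using only the counts printed by `summary(df)` and the 0.8 confidence threshold; the variable names are illustrative.

```r
n_transactions <- 7501
count_mineral_water <- 1788   # from summary(df)

# Base rate of mineral water across all baskets
supp_rhs <- count_mineral_water / n_transactions

# Lift when confidence is 1 (upper bound) and at the 0.8 confidence threshold (lower bound)
max_lift <- 1 / supp_rhs
min_lift <- 0.8 / supp_rhs

print(round(c(min_lift = min_lift, max_lift = max_lift), 6))
```

Every mineral water rule in this rule set therefore has a lift between roughly 3.36 and 4.20, which is why these rules rank below rules for rarer items such as shrimp when sorted by lift.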

# ---
# 
eggs <- subset(rulez, subset = rhs %pin% "eggs" )
 
# Then order by confidence
eggs <-sort(eggs, by="confidence", decreasing=TRUE)
inspect(eggs[])

##     lhs                               rhs    support     confidence coverage   
## [1] {black tea, spaghetti, turkey} => {eggs} 0.001066524 0.8888889  0.001199840
## [2] {mineral water, pasta, shrimp} => {eggs} 0.001333156 0.8333333  0.001599787
##     lift     count
## [1] 4.946258  8   
## [2] 4.637117 10

Baskets containing the item sets in the lhs column were about 4.6 to 4.9 times more likely than an average basket to also contain eggs.

# ---
# 
spaghetti <- subset(rulez, subset = rhs %pin% "spaghetti" )
 
# Then order by confidence
spaghetti <-sort(spaghetti, by="confidence", decreasing=TRUE)
inspect(spaghetti[])

##      lhs                     rhs             support confidence    coverage     lift count
## [1]  {light cream,                                                                        
##       mineral water,                                                                      
##       shrimp}             => {spaghetti} 0.001066524  0.8888889 0.001199840 5.105326     8
## [2]  {ground beef,                                                                        
##       salmon,                                                                             
##       shrimp}             => {spaghetti} 0.001066524  0.8888889 0.001199840 5.105326     8
## [3]  {burgers,                                                                            
##       milk,                                                                               
##       salmon}             => {spaghetti} 0.001066524  0.8888889 0.001199840 5.105326     8
## [4]  {frozen vegetables,                                                                  
##       ground beef,                                                                        
##       mineral water,                                                                      
##       shrimp}             => {spaghetti} 0.001733102  0.8666667 0.001999733 4.977693    13
## [5]  {burgers,                                                                            
##       frozen vegetables,                                                                  
##       pancakes}           => {spaghetti} 0.001466471  0.8461538 0.001733102 4.859877    11
## [6]  {frozen vegetables,                                                                  
##       olive oil,                                                                          
##       tomatoes}           => {spaghetti} 0.002133049  0.8421053 0.002532996 4.836624    16
## [7]  {green tea,                                                                          
##       ground beef,                                                                        
##       tomato sauce}       => {spaghetti} 0.001333156  0.8333333 0.001599787 4.786243    10
## [8]  {frozen vegetables,                                                                  
##       tomatoes,                                                                           
##       whole wheat rice}   => {spaghetti} 0.001333156  0.8333333 0.001599787 4.786243    10
## [9]  {chicken,                                                                            
##       protein bar}        => {spaghetti} 0.001199840  0.8181818 0.001466471 4.699220     9
## [10] {frozen vegetables,                                                                  
##       ground beef,                                                                        
##       mineral water,                                                                      
##       tomatoes}           => {spaghetti} 0.001199840  0.8181818 0.001466471 4.699220     9
## [11] {bacon,                                                                              
##       pancakes}           => {spaghetti} 0.001733102  0.8125000 0.002133049 4.666587    13
## [12] {milk,                                                                               
##       mineral water,                                                                      
##       parmesan cheese}    => {spaghetti} 0.001066524  0.8000000 0.001333156 4.594793     8
## [13] {cooking oil,                                                                        
##       mineral water,                                                                      
##       red wine}           => {spaghetti} 0.001066524  0.8000000 0.001333156 4.594793     8
## [14] {avocado,                                                                            
##       burgers,                                                                            
##       milk}               => {spaghetti} 0.001066524  0.8000000 0.001333156 4.594793     8
## [15] {frozen vegetables,                                                                  
##       mineral water,                                                                      
##       olive oil,                                                                          
##       tomatoes}           => {spaghetti} 0.001066524  0.8000000 0.001333156 4.594793     8
## [16] {chocolate,                                                                          
##       french fries,                                                                       
##       mineral water,                                                                      
##       olive oil}          => {spaghetti} 0.001066524  0.8000000 0.001333156 4.594793     8

Baskets containing the item sets in the lhs column were about 4.6 to 5.1 times more likely than an average basket to also contain spaghetti.

Next, we look at rules with one of the top 3 items on the left-hand side, to see which other items tend to appear in baskets that already contain it.

# Subset the rules
mineral1 <- subset(rulez, subset = lhs %pin% "mineral water")

# Order by confidence
mineral1<-sort(mineral1, by="confidence", decreasing=TRUE)

# Inspect all matching rules
inspect(mineral1[])

##      lhs                     rhs                     support confidence    coverage      lift count
## [1]  {cake,                                                                                        
##       meatballs,                                                                                   
##       mineral water}      => {milk}              0.001066524  1.0000000 0.001066524  7.717078     8
## [2]  {eggs,                                                                                        
##       mineral water,                                                                               
##       pasta}              => {shrimp}            0.001333156  0.9090909 0.001466471 12.722185    10
## [3]  {herb & pepper,                                                                               
##       mineral water,                                                                               
##       rice}               => {ground beef}       0.001333156  0.9090909 0.001466471  9.252498    10
## [4]  {light cream,                                                                                 
##       mineral water,                                                                               
##       shrimp}             => {spaghetti}         0.001066524  0.8888889 0.001199840  5.105326     8
## [5]  {grated cheese,                                                                               
##       mineral water,                                                                               
##       rice}               => {ground beef}       0.001066524  0.8888889 0.001199840  9.046887     8
## [6]  {escalope,                                                                                    
##       hot dogs,                                                                                    
##       mineral water}      => {milk}              0.001066524  0.8888889 0.001199840  6.859625     8
## [7]  {chocolate,                                                                                   
##       ground beef,                                                                                 
##       milk,                                                                                        
##       mineral water,                                                                               
##       spaghetti}          => {frozen vegetables} 0.001066524  0.8888889 0.001199840  9.325253     8
## [8]  {frozen vegetables,                                                                           
##       ground beef,                                                                                 
##       mineral water,                                                                               
##       shrimp}             => {spaghetti}         0.001733102  0.8666667 0.001999733  4.977693    13
## [9]  {mineral water,                                                                               
##       pasta,                                                                                       
##       shrimp}             => {eggs}              0.001333156  0.8333333 0.001599787  4.637117    10
## [10] {frozen vegetables,                                                                           
##       ground beef,                                                                                 
##       mineral water,                                                                               
##       tomatoes}           => {spaghetti}         0.001199840  0.8181818 0.001466471  4.699220     9
## [11] {milk,                                                                                        
##       mineral water,                                                                               
##       parmesan cheese}    => {spaghetti}         0.001066524  0.8000000 0.001333156  4.594793     8
## [12] {cooking oil,                                                                                 
##       mineral water,                                                                               
##       red wine}           => {spaghetti}         0.001066524  0.8000000 0.001333156  4.594793     8
## [13] {frozen vegetables,                                                                           
##       mineral water,                                                                               
##       olive oil,                                                                                   
##       tomatoes}           => {spaghetti}         0.001066524  0.8000000 0.001333156  4.594793     8
## [14] {chocolate,                                                                                   
##       french fries,                                                                                
##       mineral water,                                                                               
##       olive oil}          => {spaghetti}         0.001066524  0.8000000 0.001333156  4.594793     8

Baskets that contained mineral water together with the other lhs items also tended to contain milk, shrimp, ground beef, spaghetti, eggs or frozen vegetables.

# Subset the rules
eggs1 <- subset(rulez, subset = lhs %pin% "eggs")

# Order by confidence
eggs1<-sort(eggs1, by="confidence", decreasing=TRUE)

# Inspect all matching rules
inspect(eggs1[])

##     lhs                     rhs                 support confidence    coverage      lift count
## [1] {eggs,                                                                                    
##      mineral water,                                                                           
##      pasta}              => {shrimp}        0.001333156  0.9090909 0.001466471 12.722185    10
## [2] {brownies,                                                                                
##      eggs,                                                                                    
##      ground beef}        => {mineral water} 0.001066524  0.8888889 0.001199840  3.729058     8
## [3] {chocolate,                                                                               
##      eggs,                                                                                    
##      frozen vegetables,                                                                       
##      ground beef}        => {mineral water} 0.001466471  0.8461538 0.001733102  3.549776    11
## [4] {chocolate,                                                                               
##      eggs,                                                                                    
##      olive oil,                                                                               
##      spaghetti}          => {mineral water} 0.001199840  0.8181818 0.001466471  3.432428     9
## [5] {cooking oil,                                                                             
##      eggs,                                                                                    
##      olive oil}          => {mineral water} 0.001066524  0.8000000 0.001333156  3.356152     8
## [6] {cake,                                                                                    
##      eggs,                                                                                    
##      milk,                                                                                    
##      turkey}             => {mineral water} 0.001066524  0.8000000 0.001333156  3.356152     8
## [7] {chocolate,                                                                               
##      eggs,                                                                                    
##      milk,                                                                                    
##      olive oil}          => {mineral water} 0.001066524  0.8000000 0.001333156  3.356152     8

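Rules with eggs on the left-hand side mostly predict mineral water (six of the seven rules); the remaining rule predicts shrimp. Milk appears only on the left-hand side of these rules.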

# Subset the rules
spaghetti1 <- subset(rulez, subset = lhs %pin% "spaghetti")

# Order by confidence
spaghetti1<-sort(spaghetti1, by="confidence", decreasing=TRUE)

# Inspect all matching rules
inspect(spaghetti1[])

##      lhs                     rhs                     support confidence    coverage     lift count
## [1]  {frozen vegetables,                                                                          
##       milk,                                                                                       
##       spaghetti,                                                                                  
##       turkey}             => {mineral water}     0.001199840  0.9000000 0.001333156 3.775671     9
## [2]  {black tea,                                                                                  
##       spaghetti,                                                                                  
##       turkey}             => {eggs}              0.001066524  0.8888889 0.001199840 4.946258     8
## [3]  {chocolate,                                                                                  
##       ground beef,                                                                                
##       milk,                                                                                       
##       mineral water,                                                                              
##       spaghetti}          => {frozen vegetables} 0.001066524  0.8888889 0.001199840 9.325253     8
## [4]  {chocolate,                                                                                  
##       frozen vegetables,                                                                          
##       shrimp,                                                                                     
##       spaghetti}          => {mineral water}     0.001733102  0.8666667 0.001999733 3.635831    13
## [5]  {frozen vegetables,                                                                          
##       milk,                                                                                       
##       shrimp,                                                                                     
##       spaghetti}          => {mineral water}     0.001466471  0.8461538 0.001733102 3.549776    11
## [6]  {chocolate,                                                                                  
##       eggs,                                                                                       
##       olive oil,                                                                                  
##       spaghetti}          => {mineral water}     0.001199840  0.8181818 0.001466471 3.432428     9
## [7]  {chocolate,                                                                                  
##       milk,                                                                                       
##       shrimp,                                                                                     
##       spaghetti}          => {mineral water}     0.001199840  0.8181818 0.001466471 3.432428     9
## [8]  {milk,                                                                                       
##       spaghetti,                                                                                  
##       strong cheese}      => {mineral water}     0.001066524  0.8000000 0.001333156 3.356152     8
## [9]  {oil,                                                                                        
##       shrimp,                                                                                     
##       spaghetti}          => {mineral water}     0.001066524  0.8000000 0.001333156 3.356152     8
## [10] {french fries,                                                                               
##       milk,                                                                                       
##       pancakes,                                                                                   
##       spaghetti}          => {mineral water}     0.001066524  0.8000000 0.001333156 3.356152     8

Rules with spaghetti on the left-hand side mostly predict mineral water (eight of the ten rules); the other two predict eggs and frozen vegetables.

Conclusions

Mineral water, eggs and spaghetti were the 3 most purchased items.

Baskets containing ground beef, light cream and olive oil always contained mineral water as well (confidence 1.0, 9 transactions).

Baskets containing black tea, spaghetti and turkey were likely to contain eggs (confidence 0.89).

Baskets containing light cream, mineral water and shrimp were likely to contain spaghetti (confidence 0.89).

Rules with eggs on the left-hand side mostly pointed to mineral water, and in one case to shrimp.

Rules with spaghetti on the left-hand side pointed to mineral water, eggs and frozen vegetables.

Rules with mineral water on the left-hand side pointed to milk, shrimp, ground beef, spaghetti, eggs and frozen vegetables.

Recommendations

Build marketing strategies around the most commonly purchased items, for example:

Offer package deals for items that are bought together

Place the aisles for items commonly bought together closer to each other

Discount the prices of the most commonly bought items

Advertise together the items that are most likely to be bought together

Oppy

2022-04-03