Exercise 3

Complete all Exercises, and submit answers to VtopBeta

Question 1

Apply the Apriori algorithm to perform Market Basket Analysis on the following dataset:

Given Dataset for question 1
V1	V2	V3	V4	V5	V6	V7
T1	M	O	N	K	E	Y
T2	D	O	N	K	E	Y
T3	M	A	K	E
T4	M	U	C	K	Y
T5	C	O	O	K	E

Find the association rules using a minimum support of 60% and a minimum confidence of 80%.
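As a hand check on these thresholds, the base R sketch below re-encodes the five transactions from the table and counts how many transactions contain each item. The variable names (`baskets`, `item_counts`, `min_count`) are illustrative and do not come from the original script, which loads `dataset.RData` instead. With five transactions, 60% support corresponds to at least 3 transactions, and the repeated O in T5 is counted once.

```r
# Hypothetical re-encoding of the table above (not the original dataset.RData).
baskets <- list(
  T1 = c("M", "O", "N", "K", "E", "Y"),
  T2 = c("D", "O", "N", "K", "E", "Y"),
  T3 = c("M", "A", "K", "E"),
  T4 = c("M", "U", "C", "K", "Y"),
  T5 = c("C", "O", "O", "K", "E")
)

# Treat each transaction as a set, so the duplicate "O" in T5 counts once.
baskets <- lapply(baskets, unique)

# Number of transactions that contain each item.
item_counts <- table(unlist(baskets))

# 60% of 5 transactions means an item must appear in at least 3 of them.
min_count <- 3
frequent_items <- names(item_counts)[item_counts >= min_count]

print(sort(item_counts, decreasing = TRUE))
print(frequent_items)
```

The items that survive this filter should be the same five that the apriori log reports after "sorting and recoding items".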

library(arules)
#Data Preprocessing
load("dataset.RData")
summary(dataset)

## transactions as itemMatrix in sparse format with
##  5 rows (elements/itemsets/transactions) and
##  10 columns (items) and a density of 0.5 
## 
## most frequent items:
##       K       E       M       O       Y (Other) 
##       5       4       3       3       3       7 
## 
## element (itemset/transaction) length distribution:
## sizes
## 4 5 6 
## 2 1 2 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       4       4       5       5       6       6 
## 
## includes extended item information - examples:
##   labels
## 1      A
## 2      C
## 3      D
## 
## includes extended transaction information - examples:
##   transactionID
## 1            T1
## 2            T2
## 3            T3

itemFrequencyPlot(dataset,topN=10)

#Apriori
rules <- apriori(data=dataset,parameter=list(support=0.60,confidence=0.80))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5     0.6      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 3 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[10 item(s), 5 transaction(s)] done [0.00s].
## sorting and recoding items ... [5 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [10 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

summary(rules)

## set of 10 rules
## 
## rule length distribution (lhs + rhs):sizes
## 1 2 3 
## 2 6 2 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1       2       2       2       2       3 
## 
## summary of quality measures:
##     support      confidence        lift          count    
##  Min.   :0.6   Min.   :0.80   Min.   :1.00   Min.   :3.0  
##  1st Qu.:0.6   1st Qu.:1.00   1st Qu.:1.00   1st Qu.:3.0  
##  Median :0.6   Median :1.00   Median :1.00   Median :3.0  
##  Mean   :0.7   Mean   :0.96   Mean   :1.05   Mean   :3.5  
##  3rd Qu.:0.8   3rd Qu.:1.00   3rd Qu.:1.00   3rd Qu.:4.0  
##  Max.   :1.0   Max.   :1.00   Max.   :1.25   Max.   :5.0  
## 
## mining info:
##     data ntransactions support confidence
##  dataset             5     0.6        0.8

#Data Visualization
inspect(sort(rules,by='lift')[1:10])

##      lhs      rhs support confidence lift count
## [1]  {O}   => {E} 0.6     1.0        1.25 3    
## [2]  {K,O} => {E} 0.6     1.0        1.25 3    
## [3]  {}    => {E} 0.8     0.8        1.00 4    
## [4]  {}    => {K} 1.0     1.0        1.00 5    
## [5]  {M}   => {K} 0.6     1.0        1.00 3    
## [6]  {O}   => {K} 0.6     1.0        1.00 3    
## [7]  {Y}   => {K} 0.6     1.0        1.00 3    
## [8]  {E}   => {K} 0.8     1.0        1.00 4    
## [9]  {K}   => {E} 0.8     0.8        1.00 4    
## [10] {E,O} => {K} 0.6     1.0        1.00 3
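The quality measures in the table can be reproduced by hand. The base R sketch below recomputes support, confidence and lift for rule [1], {O} => {E}, from the five transactions of Question 1. The helper `support()` and the variable names are illustrative assumptions, not part of arules or of the original script.

```r
# Hypothetical re-encoding of the Question 1 transactions, each as a set.
baskets <- lapply(list(
  T1 = c("M", "O", "N", "K", "E", "Y"),
  T2 = c("D", "O", "N", "K", "E", "Y"),
  T3 = c("M", "A", "K", "E"),
  T4 = c("M", "U", "C", "K", "Y"),
  T5 = c("C", "O", "O", "K", "E")
), unique)

# Fraction of transactions that contain every item in `items`.
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "O"
rhs <- "E"

rule_support    <- support(c(lhs, rhs))          # P(lhs and rhs)
rule_confidence <- rule_support / support(lhs)   # P(rhs | lhs)
rule_lift       <- rule_confidence / support(rhs)

print(c(support = rule_support,
        confidence = rule_confidence,
        lift = rule_lift))
```

A lift above 1 indicates that E appears more often in transactions containing O than in the dataset overall. The rules with lift 1.00 carry no such signal, because K appears in every transaction.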

Question 2

Create a dataset of at least 30 transactions, where each transaction is a purchase list drawn from 10 items. Apply the Apriori algorithm to generate association rules with

a) Minimum confidence – 50% and minimum support – 50%

Inference:

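The dataset cannot meet a minimum support of 0.5: its most frequent item, mineral water, appears in only 1788 of 7501 transactions (about 24%), so no itemset reaches 50% support. A minimum support of 0.005 has been used instead, to avoid an empty rule set and the errors it would produce in the output.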
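The support ceiling for this dataset can be checked from the item counts reported by `summary(dataset)`. The base R sketch below copies those counts by hand rather than loading the data, and the variable names are illustrative. Because an itemset can never be more frequent than its rarest item, the largest single-item support is an upper bound on the support of every rule.

```r
# Figures copied from the summary(dataset) output; nothing is loaded here.
n_transactions <- 7501
top_item_counts <- c(
  "mineral water" = 1788,
  "eggs"          = 1348,
  "spaghetti"     = 1306,
  "french fries"  = 1282,
  "chocolate"     = 1229
)

# Relative support of each of the five most frequent items.
top_item_support <- top_item_counts / n_transactions
print(round(top_item_support, 3))

# Upper bound on the support of any itemset in this dataset.
max_support <- max(top_item_support)

# Does any item reach the 50% threshold the question asks for?
print(max_support >= 0.5)
```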

library(arules)
#Data Preprocessing
load("dataset2.RData")
summary(dataset)

## transactions as itemMatrix in sparse format with
##  7501 rows (elements/itemsets/transactions) and
##  119 columns (items) and a density of 0.03288973 
## 
## most frequent items:
## mineral water          eggs     spaghetti  french fries     chocolate 
##          1788          1348          1306          1282          1229 
##       (Other) 
##         22405 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
## 1754 1358 1044  816  667  493  391  324  259  139  102   67   40   22   17 
##   16   18   19   20 
##    4    1    2    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   3.914   5.000  20.000 
## 
## includes extended item information - examples:
##              labels
## 1           almonds
## 2 antioxydant juice
## 3         asparagus

itemFrequencyPlot(dataset, topN = 10)

# Training Apriori on the dataset
rules = apriori(data = dataset, parameter = list(support = 0.005, confidence = 0.5))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 37 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [101 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [20 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

# Visualising the results
inspect(sort(rules, by = 'lift')[1:10])

##      lhs                                rhs             support    
## [1]  {ground beef,shrimp}            => {spaghetti}     0.005999200
## [2]  {frozen vegetables,ground beef} => {spaghetti}     0.008665511
## [3]  {frozen vegetables,olive oil}   => {spaghetti}     0.005732569
## [4]  {frozen vegetables,soup}        => {mineral water} 0.005065991
## [5]  {olive oil,soup}                => {mineral water} 0.005199307
## [6]  {frozen vegetables,olive oil}   => {mineral water} 0.006532462
## [7]  {milk,soup}                     => {mineral water} 0.008532196
## [8]  {chocolate,soup}                => {mineral water} 0.005599253
## [9]  {cooking oil,eggs}              => {mineral water} 0.006399147
## [10] {frozen vegetables,ground beef} => {mineral water} 0.009198773
##      confidence lift     count
## [1]  0.5232558  3.005315 45   
## [2]  0.5118110  2.939582 65   
## [3]  0.5058824  2.905531 43   
## [4]  0.6333333  2.656954 38   
## [5]  0.5820896  2.441976 39   
## [6]  0.5764706  2.418404 49   
## [7]  0.5614035  2.355194 64   
## [8]  0.5526316  2.318395 42   
## [9]  0.5454545  2.288286 48   
## [10] 0.5433071  2.279277 69
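The support and lift columns can be cross-checked against the counts in the earlier summary. The base R sketch below does this for rule [1], {ground beef,shrimp} => {spaghetti}, using lift = confidence / support(rhs). All numbers are copied by hand from the output above, and the variable names are illustrative.

```r
# Values copied from the summary(dataset) and inspect() output above.
n_transactions  <- 7501       # total transactions
spaghetti_count <- 1306       # transactions containing spaghetti
rule_count      <- 45         # transactions matching rule [1]
rule_confidence <- 0.5232558  # as printed, so rounded to 7 digits

# Support of the rule: share of transactions containing lhs and rhs.
rule_support <- rule_count / n_transactions

# Support of the right-hand side alone.
rhs_support <- spaghetti_count / n_transactions

# Lift compares the rule's confidence with the baseline frequency of the rhs.
rule_lift <- rule_confidence / rhs_support

print(round(rule_support, 6))
print(round(rule_lift, 4))
```

Both values should agree with the first row of the table up to the rounding of the printed confidence.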

b) Minimum confidence – 20% and minimum support – 80%

Inference:

The dataset cannot meet a minimum support of 0.8, since no item appears in more than about 24% of the transactions. A minimum support of 0.008 has been used instead, to avoid an empty rule set and the errors it would produce in the output.

library(arules)
#Data Preprocessing
load("dataset2.RData")

# Training Apriori on the dataset
rules = apriori(data = dataset, parameter = list(support = 0.008, confidence = 0.2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5   0.008      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 60 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [237 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

# Visualising the results
inspect(sort(rules, by = 'lift')[1:10])

##      lhs                                  rhs           support    
## [1]  {herb & pepper}                   => {ground beef} 0.015997867
## [2]  {frozen vegetables,spaghetti}     => {ground beef} 0.008665511
## [3]  {frozen vegetables,ground beef}   => {spaghetti}   0.008665511
## [4]  {mineral water,spaghetti}         => {ground beef} 0.017064391
## [5]  {eggs,french fries}               => {burgers}     0.009065458
## [6]  {mineral water,soup}              => {milk}        0.008532196
## [7]  {milk,spaghetti}                  => {ground beef} 0.009732036
## [8]  {frozen vegetables,mineral water} => {ground beef} 0.009198773
## [9]  {whole wheat pasta}               => {milk}        0.009865351
## [10] {eggs,ground beef}                => {spaghetti}   0.008932142
##      confidence lift     count
## [1]  0.3234501  3.291994 120  
## [2]  0.3110048  3.165328  65  
## [3]  0.5118110  2.939582  65  
## [4]  0.2857143  2.907928 128  
## [5]  0.2490842  2.856852  68  
## [6]  0.3699422  2.854873  64  
## [7]  0.2744361  2.793141  73  
## [8]  0.2574627  2.620390  69  
## [9]  0.3348416  2.583999  74  
## [10] 0.4466667  2.565426  67

c) Minimum confidence – 80% and minimum support – 20%

Inference:

The dataset does not yield usable rules at a minimum support of 0.2 and a minimum confidence of 0.8, so a minimum support of 0.002 and a minimum confidence of 0.6 have been used instead. This prevents errors in the output.

library(arules)
#Data Preprocessing
load("dataset2.RData")

#Apriori
rules <- apriori(data=dataset,parameter=list(support=0.002,confidence=0.6))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.6    0.1    1 none FALSE            TRUE       5   0.002      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 15 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [115 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [43 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

summary(rules)

## set of 43 rules
## 
## rule length distribution (lhs + rhs):sizes
##  3  4 
## 16 27 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   3.000   4.000   3.628   4.000   4.000 
## 
## summary of quality measures:
##     support           confidence          lift            count      
##  Min.   :0.002133   Min.   :0.6000   Min.   : 2.517   Min.   :16.00  
##  1st Qu.:0.002200   1st Qu.:0.6180   1st Qu.: 2.637   1st Qu.:16.50  
##  Median :0.002400   Median :0.6400   Median : 2.743   Median :18.00  
##  Mean   :0.002713   Mean   :0.6643   Mean   : 3.315   Mean   :20.35  
##  3rd Qu.:0.003066   3rd Qu.:0.6972   3rd Qu.: 3.344   3rd Qu.:23.00  
##  Max.   :0.005066   Max.   :0.9500   Max.   :11.976   Max.   :38.00  
## 
## mining info:
##     data ntransactions support confidence
##  dataset          7501   0.002        0.6

#Data Visualization
inspect(sort(rules,by='lift')[1:10])

##      lhs                       rhs                     support confidence      lift count
## [1]  {mushroom cream sauce,                                                              
##       pasta}                => {escalope}          0.002532996  0.9500000 11.976387    19
## [2]  {parmesan cheese,                                                                   
##       tomatoes}             => {frozen vegetables} 0.002133049  0.6666667  6.993939    16
## [3]  {frozen vegetables,                                                                 
##       olive oil,                                                                         
##       tomatoes}             => {spaghetti}         0.002133049  0.8421053  4.836624    16
## [4]  {frozen vegetables,                                                                 
##       mineral water,                                                                     
##       soup}                 => {milk}              0.003066258  0.6052632  4.670863    23
## [5]  {frozen vegetables,                                                                 
##       ground beef,                                                                       
##       shrimp}               => {spaghetti}         0.002399680  0.7500000  4.307619    18
## [6]  {cereals,                                                                           
##       ground beef}          => {spaghetti}         0.003066258  0.6764706  3.885303    23
## [7]  {cooking oil,                                                                       
##       ground beef,                                                                       
##       mineral water}        => {spaghetti}         0.002133049  0.6666667  3.828994    16
## [8]  {frozen vegetables,                                                                 
##       ground beef,                                                                       
##       olive oil}            => {spaghetti}         0.002133049  0.6400000  3.675835    16
## [9]  {french wine,                                                                       
##       ground beef}          => {spaghetti}         0.002399680  0.6206897  3.564926    18
## [10] {olive oil,                                                                         
##       tomatoes}             => {spaghetti}         0.004399413  0.6111111  3.509912    33

Jacob John
