Complete all exercises and submit the answers to VtopBeta.
Apply the Apriori algorithm to perform market basket analysis on the following transactions:
| Transaction | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Item 6 |
|---|---|---|---|---|---|---|
| T1 | M | O | N | K | E | Y |
| T2 | D | O | N | K | E | Y |
| T3 | M | A | K | E | | |
| T4 | M | U | C | K | Y | |
| T5 | C | O | O | K | E | |
Find the association rules using a minimum support of 60% and a minimum confidence of 80%.
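Both thresholds refer to the standard rule measures, and this table is small enough to check them by hand. The base R sketch below does not use arules. It treats each transaction as a set, so the repeated O in T5 counts once. `supp` and `rule_measures` are illustrative helper names, not arules functions.

```r
# The five transactions from the table; unique() turns each basket into a set,
# so the duplicated "O" in T5 is counted once.
trans <- lapply(list(
  T1 = c("M", "O", "N", "K", "E", "Y"),
  T2 = c("D", "O", "N", "K", "E", "Y"),
  T3 = c("M", "A", "K", "E"),
  T4 = c("M", "U", "C", "K", "Y"),
  T5 = c("C", "O", "O", "K", "E")
), unique)

# Support: fraction of transactions that contain every item of the itemset.
supp <- function(items, trans) {
  mean(vapply(trans, function(t) all(items %in% t), logical(1)))
}

# Support, confidence and lift of the rule lhs => rhs.
rule_measures <- function(lhs, rhs, trans) {
  s_rule <- supp(c(lhs, rhs), trans)
  conf <- s_rule / supp(lhs, trans)
  c(support = s_rule, confidence = conf, lift = conf / supp(rhs, trans))
}

m <- rule_measures("O", "E", trans)
print(m)
```

For {O} => {E}, the sketch gives support 0.6, confidence 1 and lift 1.25, because E appears in 4 of the 5 transactions. These match the values arules reports for that rule.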
library(arules)
# Data preprocessing: load the transactions object `dataset` saved in dataset.RData
load("dataset.RData")
summary(dataset)
## transactions as itemMatrix in sparse format with
## 5 rows (elements/itemsets/transactions) and
## 10 columns (items) and a density of 0.5
##
## most frequent items:
## K E M O Y (Other)
## 5 4 3 3 3 7
##
## element (itemset/transaction) length distribution:
## sizes
## 4 5 6
## 2 1 2
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4 4 5 5 6 6
##
## includes extended item information - examples:
## labels
## 1 A
## 2 C
## 3 D
##
## includes extended transaction information - examples:
## transactionID
## 1 T1
## 2 T2
## 3 T3
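itemFrequencyPlot(dataset, topN = 10)  # bar chart of the 10 most frequent items
# Apriori: mine rules with support >= 0.6 and confidence >= 0.8
rules <- apriori(data = dataset, parameter = list(support = 0.60, confidence = 0.80))
## Apriori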
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.6 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 3
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[10 item(s), 5 transaction(s)] done [0.00s].
## sorting and recoding items ... [5 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [10 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
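The log line `checking subsets of size 1 2 3` reflects Apriori's level-wise search. The base R sketch below is a rough illustration of that search, not the arules implementation. It joins frequent k-itemsets into (k+1)-candidates, prunes any candidate that has an infrequent k-subset, and then counts the candidates that remain. `apriori_itemsets`, `count_of` and `key` are hypothetical helper names.

```r
# Transactions from the exercise table; unique() drops the repeated "O" in T5.
trans <- lapply(list(
  c("M", "O", "N", "K", "E", "Y"), c("D", "O", "N", "K", "E", "Y"),
  c("M", "A", "K", "E"), c("M", "U", "C", "K", "Y"),
  c("C", "O", "O", "K", "E")
), unique)

# Number of transactions that contain every item of `items`.
count_of <- function(items, trans) {
  sum(vapply(trans, function(t) all(items %in% t), logical(1)))
}

# Itemsets are kept as sorted character vectors; key() gives a comparable string.
key <- function(s) paste(s, collapse = ",")

# Level-wise Apriori: join frequent k-itemsets into (k+1)-candidates, prune any
# candidate with an infrequent k-subset, then count the survivors.
apriori_itemsets <- function(trans, min_support) {
  min_count <- min_support * length(trans)
  is_frequent <- function(s) count_of(s, trans) >= min_count
  level <- Filter(is_frequent, as.list(sort(unique(unlist(trans)))))
  frequent <- level
  while (length(level) > 1) {
    k <- length(level[[1]])
    keys <- vapply(level, key, character(1))
    candidates <- list()
    for (a in level) for (b in level) {
      # Join step: same first k - 1 items, and a's last item sorts before b's.
      if (identical(a[-k], b[-k]) && a[k] < b[k]) {
        cand <- c(a, b[k])
        # Prune step: every k-subset of the candidate must already be frequent.
        subs <- combn(cand, k, simplify = FALSE)
        if (all(vapply(subs, key, character(1)) %in% keys)) {
          candidates[[length(candidates) + 1]] <- cand
        }
      }
    }
    level <- Filter(is_frequent, candidates)
    frequent <- c(frequent, level)
  }
  frequent
}

freq <- apriori_itemsets(trans, min_support = 0.6)
print(vapply(freq, key, character(1)))
```

At 60% support the sketch finds 11 frequent itemsets, and {E, K, O} is the only 3-itemset. The candidates {K, M, O}, {K, M, Y} and {K, O, Y} are pruned without being counted, because {M, O}, {M, Y} and {O, Y} each occur in fewer than 3 transactions.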
summary(rules)
## set of 10 rules
##
## rule length distribution (lhs + rhs):sizes
## 1 2 3
## 2 6 2
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1 2 2 2 2 3
##
## summary of quality measures:
## support confidence lift count
## Min. :0.6 Min. :0.80 Min. :1.00 Min. :3.0
## 1st Qu.:0.6 1st Qu.:1.00 1st Qu.:1.00 1st Qu.:3.0
## Median :0.6 Median :1.00 Median :1.00 Median :3.0
## Mean :0.7 Mean :0.96 Mean :1.05 Mean :3.5
## 3rd Qu.:0.8 3rd Qu.:1.00 3rd Qu.:1.00 3rd Qu.:4.0
## Max. :1.0 Max. :1.00 Max. :1.25 Max. :5.0
##
## mining info:
## data ntransactions support confidence
## dataset 5 0.6 0.8
# Inspect the rules, sorted by lift
inspect(sort(rules, by = 'lift')[1:10])
## lhs rhs support confidence lift count
## [1] {O} => {E} 0.6 1.0 1.25 3
## [2] {K,O} => {E} 0.6 1.0 1.25 3
## [3] {} => {E} 0.8 0.8 1.00 4
## [4] {} => {K} 1.0 1.0 1.00 5
## [5] {M} => {K} 0.6 1.0 1.00 3
## [6] {O} => {K} 0.6 1.0 1.00 3
## [7] {Y} => {K} 0.6 1.0 1.00 3
## [8] {E} => {K} 0.8 1.0 1.00 4
## [9] {K} => {E} 0.8 0.8 1.00 4
## [10] {E,O} => {K} 0.6 1.0 1.00 3
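The ten rules come from the frequent itemsets. Each frequent itemset is split into a single-item right-hand side and a left-hand side made of the remaining items, which may be empty. Splits whose confidence falls below 0.8 are dropped. The base R sketch below reproduces this step. For brevity, it finds the frequent itemsets by brute force over subsets of the frequent single items rather than by a level-wise search. All names in it are illustrative.

```r
# Transactions from the exercise table, as sets of items.
trans <- lapply(list(
  c("M", "O", "N", "K", "E", "Y"), c("D", "O", "N", "K", "E", "Y"),
  c("M", "A", "K", "E"), c("M", "U", "C", "K", "Y"),
  c("C", "O", "O", "K", "E")
), unique)
n <- length(trans)
min_support <- 0.6
min_conf <- 0.8

# Number of transactions containing every item (the empty itemset gives n).
count_of <- function(items) {
  sum(vapply(trans, function(t) all(items %in% t), logical(1)))
}

# Frequent itemsets by brute force over subsets of the frequent single items
# (5 items, 31 subsets); this is a shortcut, not the Apriori search itself.
items <- sort(unique(unlist(trans)))
single <- items[vapply(items, count_of, integer(1)) >= min_support * n]
subsets <- unlist(lapply(seq_along(single), function(k) {
  combn(single, k, simplify = FALSE)
}), recursive = FALSE)
frequent <- Filter(function(s) count_of(s) >= min_support * n, subsets)

# One rule per (itemset, RHS item) pair; the LHS is the rest and may be empty.
rules <- do.call(rbind, lapply(frequent, function(s) {
  do.call(rbind, lapply(s, function(rhs) {
    lhs <- setdiff(s, rhs)
    conf <- count_of(s) / count_of(lhs)
    data.frame(
      lhs = paste0("{", paste(lhs, collapse = ","), "}"),
      rhs = paste0("{", rhs, "}"),
      support = count_of(s) / n,
      confidence = conf,
      lift = conf / (count_of(rhs) / n)
    )
  }))
}))

# Keep rules that meet the confidence threshold, highest lift first.
rules <- rules[rules$confidence >= min_conf, ]
rules <- rules[order(-rules$lift), ]
print(rules, row.names = FALSE)
```

This yields the same 10 rules, with {O} => {E} and {K,O} => {E} at lift 1.25. {K} => {E} only just passes, at confidence 4/5 = 0.8. {E} => {O} and {E,K} => {O} are dropped because their confidence is 3/4.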
Create a dataset of at least 30 transactions, each a purchase list built from combinations of 10 items. Apply the Apriori algorithm to generate the association rules with
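This dataset cannot meet a minimum support of 0.5. The most frequent item, mineral water, appears in only 1788 of the 7501 transactions (support about 0.24), and an itemset can never be more frequent than the items it contains, so apriori would return no rules. A minimum support of 0.005 (with a minimum confidence of 0.5) was used instead.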
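The summary counts alone show why the requested thresholds fail. The base R sketch below uses figures typed in by hand from the `summary(dataset)` output. It relies on support being anti-monotone: adding items to an itemset can only keep its support the same or lower it.

```r
# Figures taken from summary(dataset) for dataset2.RData.
n_trans <- 7501
mineral_water <- 1788   # count of the most frequent item

# Anti-monotonicity: no itemset can be more frequent than its most frequent
# item, so the top item's support bounds the support of every itemset.
max_support <- mineral_water / n_trans
print(round(max_support, 3))

# Thresholds requested by the exercise versus the best achievable support.
requested <- c(0.5, 0.8)
feasible <- requested <= max_support

# Supports actually used, as transaction counts; rounding down gives the same
# numbers as the "Absolute minimum support count" lines (37, 60, 15).
used <- c(0.005, 0.008, 0.002)
abs_counts <- floor(used * n_trans)
print(abs_counts)
```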
library(arules)
# Data preprocessing: load the transactions object `dataset` saved in dataset2.RData
load("dataset2.RData")
summary(dataset)
## transactions as itemMatrix in sparse format with
## 7501 rows (elements/itemsets/transactions) and
## 119 columns (items) and a density of 0.03288973
##
## most frequent items:
## mineral water eggs spaghetti french fries chocolate
## 1788 1348 1306 1282 1229
## (Other)
## 22405
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 1754 1358 1044 816 667 493 391 324 259 139 102 67 40 22 17
## 16 18 19 20
## 4 1 2 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 3.914 5.000 20.000
##
## includes extended item information - examples:
## labels
## 1 almonds
## 2 antioxydant juice
## 3 asparagus
itemFrequencyPlot(dataset, topN = 10)  # bar chart of the 10 most frequent items
# Mine association rules with Apriori (support >= 0.005, confidence >= 0.5)
rules <- apriori(data = dataset, parameter = list(support = 0.005, confidence = 0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 37
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [101 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [20 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
# Inspect the 10 rules with the highest lift
inspect(sort(rules, by = 'lift')[1:10])
## lhs rhs support
## [1] {ground beef,shrimp} => {spaghetti} 0.005999200
## [2] {frozen vegetables,ground beef} => {spaghetti} 0.008665511
## [3] {frozen vegetables,olive oil} => {spaghetti} 0.005732569
## [4] {frozen vegetables,soup} => {mineral water} 0.005065991
## [5] {olive oil,soup} => {mineral water} 0.005199307
## [6] {frozen vegetables,olive oil} => {mineral water} 0.006532462
## [7] {milk,soup} => {mineral water} 0.008532196
## [8] {chocolate,soup} => {mineral water} 0.005599253
## [9] {cooking oil,eggs} => {mineral water} 0.006399147
## [10] {frozen vegetables,ground beef} => {mineral water} 0.009198773
## confidence lift count
## [1] 0.5232558 3.005315 45
## [2] 0.5118110 2.939582 65
## [3] 0.5058824 2.905531 43
## [4] 0.6333333 2.656954 38
## [5] 0.5820896 2.441976 39
## [6] 0.5764706 2.418404 49
## [7] 0.5614035 2.355194 64
## [8] 0.5526316 2.318395 42
## [9] 0.5454545 2.288286 48
## [10] 0.5433071 2.279277 69
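Every column of the rule listing can be recovered from transaction counts. Rule [1] serves as a worked example in the base R sketch below. It backs out the number of transactions that contain the LHS from `count / confidence`, then recomputes support and lift using the spaghetti count (1306) from the dataset summary. The variable names are illustrative.

```r
n_trans <- 7501
n_rule <- 45              # "count" column of rule [1]
confidence <- 0.5232558   # "confidence" column of rule [1]
n_spaghetti <- 1306       # spaghetti count from summary(dataset)

# confidence = count(LHS and RHS) / count(LHS), so count(LHS) = count / confidence.
n_lhs <- round(n_rule / confidence)

# support = count(LHS and RHS) / n; lift = confidence / support(RHS).
support <- n_rule / n_trans
lift <- (n_rule / n_lhs) / (n_spaghetti / n_trans)
print(c(n_lhs = n_lhs, support = support, lift = lift))
```

The LHS {ground beef, shrimp} appears in 86 transactions, and 45 of those also contain spaghetti. A lift of about 3.01 means spaghetti is roughly three times as likely in those baskets as in a random basket, where its support is 1306/7501, about 0.17.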
For the same reason, a minimum support of 0.8 yields no rules, so a minimum support of 0.008 (with a minimum confidence of 0.2) was used instead.
library(arules)
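# Data preprocessing: load the transactions object `dataset` saved in dataset2.RData
load("dataset2.RData")
# Mine association rules with Apriori (support >= 0.008, confidence >= 0.2)
rules <- apriori(data = dataset, parameter = list(support = 0.008, confidence = 0.2))
## Apriori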
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.008 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 60
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [237 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
# Inspect the 10 rules with the highest lift
inspect(sort(rules, by = 'lift')[1:10])
## lhs rhs support
## [1] {herb & pepper} => {ground beef} 0.015997867
## [2] {frozen vegetables,spaghetti} => {ground beef} 0.008665511
## [3] {frozen vegetables,ground beef} => {spaghetti} 0.008665511
## [4] {mineral water,spaghetti} => {ground beef} 0.017064391
## [5] {eggs,french fries} => {burgers} 0.009065458
## [6] {mineral water,soup} => {milk} 0.008532196
## [7] {milk,spaghetti} => {ground beef} 0.009732036
## [8] {frozen vegetables,mineral water} => {ground beef} 0.009198773
## [9] {whole wheat pasta} => {milk} 0.009865351
## [10] {eggs,ground beef} => {spaghetti} 0.008932142
## confidence lift count
## [1] 0.3234501 3.291994 120
## [2] 0.3110048 3.165328 65
## [3] 0.5118110 2.939582 65
## [4] 0.2857143 2.907928 128
## [5] 0.2490842 2.856852 68
## [6] 0.3699422 2.854873 64
## [7] 0.2744361 2.793141 73
## [8] 0.2574627 2.620390 69
## [9] 0.3348416 2.583999 74
## [10] 0.4466667 2.565426 67
A minimum support of 0.2 combined with a minimum confidence of 0.8 also yields no rules on this dataset, so a minimum support of 0.002 and a minimum confidence of 0.6 were used instead.
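Unlike the thresholds of 0.5 and 0.8, a support of 0.2 is reachable, but only by a single item, and that is why no rule survives a confidence of 0.8. The base R sketch below uses the five top item counts copied from the `summary(dataset)` output.

```r
n_trans <- 7501
# Counts of the five most frequent items, from summary(dataset);
# every other item occurs less often than chocolate.
top_counts <- c("mineral water" = 1788, "eggs" = 1348, "spaghetti" = 1306,
                "french fries" = 1282, "chocolate" = 1229)
supports <- top_counts / n_trans

# Items that reach a minimum support of 0.2.
frequent <- names(supports)[supports >= 0.2]
print(frequent)

# With one frequent item there are no frequent pairs, so the only rule apriori
# could return is {} => {mineral water}; its confidence equals the item's support.
conf_empty_lhs <- unname(supports["mineral water"])
print(conf_empty_lhs >= 0.8)
```

Eggs, the second most frequent item, reaches only about 0.18 support. The single candidate rule therefore has confidence of about 0.24, well below 0.8.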
library(arules)
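# Data preprocessing: load the transactions object `dataset` saved in dataset2.RData
load("dataset2.RData")
# Apriori: mine rules with support >= 0.002 and confidence >= 0.6
rules <- apriori(data = dataset, parameter = list(support = 0.002, confidence = 0.6))
## Apriori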
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.002 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 15
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [115 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [43 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
summary(rules)
## set of 43 rules
##
## rule length distribution (lhs + rhs):sizes
## 3 4
## 16 27
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 3.000 4.000 3.628 4.000 4.000
##
## summary of quality measures:
## support confidence lift count
## Min. :0.002133 Min. :0.6000 Min. : 2.517 Min. :16.00
## 1st Qu.:0.002200 1st Qu.:0.6180 1st Qu.: 2.637 1st Qu.:16.50
## Median :0.002400 Median :0.6400 Median : 2.743 Median :18.00
## Mean :0.002713 Mean :0.6643 Mean : 3.315 Mean :20.35
## 3rd Qu.:0.003066 3rd Qu.:0.6972 3rd Qu.: 3.344 3rd Qu.:23.00
## Max. :0.005066 Max. :0.9500 Max. :11.976 Max. :38.00
##
## mining info:
## data ntransactions support confidence
## dataset 7501 0.002 0.6
# Inspect the 10 rules with the highest lift
inspect(sort(rules, by = 'lift')[1:10])
## lhs rhs support confidence lift count
## [1] {mushroom cream sauce,
## pasta} => {escalope} 0.002532996 0.9500000 11.976387 19
## [2] {parmesan cheese,
## tomatoes} => {frozen vegetables} 0.002133049 0.6666667 6.993939 16
## [3] {frozen vegetables,
## olive oil,
## tomatoes} => {spaghetti} 0.002133049 0.8421053 4.836624 16
## [4] {frozen vegetables,
## mineral water,
## soup} => {milk} 0.003066258 0.6052632 4.670863 23
## [5] {frozen vegetables,
## ground beef,
## shrimp} => {spaghetti} 0.002399680 0.7500000 4.307619 18
## [6] {cereals,
## ground beef} => {spaghetti} 0.003066258 0.6764706 3.885303 23
## [7] {cooking oil,
## ground beef,
## mineral water} => {spaghetti} 0.002133049 0.6666667 3.828994 16
## [8] {frozen vegetables,
## ground beef,
## olive oil} => {spaghetti} 0.002133049 0.6400000 3.675835 16
## [9] {french wine,
## ground beef} => {spaghetti} 0.002399680 0.6206897 3.564926 18
## [10] {olive oil,
## tomatoes} => {spaghetti} 0.004399413 0.6111111 3.509912 33