First, install the package arules.
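A minimal sketch of the installation step, assuming a working internet connection. The CRAN mirror URL below is an assumption; any mirror should do.

```r
# Install arules only if it is not already available.
# The mirror URL is an assumption, not something the original text specifies.
if (!requireNamespace("arules", quietly = TRUE)) {
  install.packages("arules", repos = "https://cloud.r-project.org")
}
```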
Second, load the package and read the retail data set using the following commands:
library(arules)
## Warning: package 'arules' was built under R version 3.4.4
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
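# Read the retail data in basket format (one transaction per line)
grd <- read.transactions("http://fimi.ua.ac.be/data/retail.dat", format = "basket")
# Plot only the items whose relative support is at least the given threshold
itemFrequencyPlot(grd, support = 0.1)
itemFrequencyPlot(grd, support = 0.2)
itemFrequencyPlot(grd, support = 0.3)
itemFrequencyPlot(grd, support = 0.5)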
summary(grd)
## transactions as itemMatrix in sparse format with
## 88162 rows (elements/itemsets/transactions) and
## 16470 columns (items) and a density of 0.0006257289
##
## most frequent items:
## 39 48 38 32 41 (Other)
## 50675 42135 15596 15167 14945 770058
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 3016 5516 6919 7210 6814 6163 5746 5143 4660 4086 3751 3285 2866 2620 2310
## 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
## 2115 1874 1645 1469 1290 1205 981 887 819 684 586 582 472 480 355
## 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
## 310 303 272 234 194 136 153 123 115 112 76 66 71 60 50
## 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 44 37 37 33 22 24 21 21 10 11 10 9 11 4 9
## 61 62 63 64 65 66 67 68 71 73 74 76
## 7 4 5 2 2 5 3 3 1 1 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 4.00 8.00 10.31 14.00 76.00
##
## includes extended item information - examples:
## labels
## 1 0
## 2 1
## 3 10
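The figures in this summary can be cross-checked by hand. The sketch below uses only numbers copied from the output above (no data download needed) and assumes that the density is the share of filled cells in the transaction-by-item matrix.

```r
# Numbers copied from the summary(grd) output above
n_transactions <- 88162
n_items <- 16470
item_counts <- c(50675, 42135, 15596, 15167, 14945, 770058)  # top five + (Other)

# Total number of item occurrences over all transactions
total_occurrences <- sum(item_counts)

# Density: filled cells divided by all cells of the sparse matrix
density <- total_occurrences / (n_transactions * n_items)

# Mean transaction length: item occurrences per transaction
mean_length <- total_occurrences / n_transactions

total_occurrences      # 908576
signif(density, 7)     # 0.0006257289, as reported
round(mean_length, 2)  # 10.31, the reported mean
```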
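# inspect(grd)  # prints every transaction; you will have to stop the listing manually
# Mine association rules with minimum support 0.05 and minimum confidence 0.5
grdar <- apriori(grd, parameter = list(supp = 0.05, conf = 0.5))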
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 4408
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[16470 item(s), 88162 transaction(s)] done [0.14s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating transaction tree ... done [0.03s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object ... done [0.02s].
# inspect(grdar)  # lists the 15 rules with their quality measures
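The "Absolute minimum support count: 4408" line in the log follows from the relative support threshold. A small sketch, again using only numbers from the output above; taking the integer part of the product is an assumption that happens to match the log.

```r
n_transactions <- 88162
min_support <- 0.05

# 0.05 * 88162 = 4408.1; the log reports the integer part
min_count <- floor(min_support * n_transactions)

# Counts of the five most frequent items, copied from summary(grd)
top_counts <- c("39" = 50675, "48" = 42135, "38" = 15596,
                "32" = 15167, "41" = 14945)

# Relative support of each item, and whether it clears the threshold
round(top_counts / n_transactions, 3)
top_counts >= min_count
```

All five listed items pass the threshold; the log reports six frequent items, so one more item from the "(Other)" group also passes.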
Now that you have the rules, describe what you would like to do next with the retail data:
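The lift of rule #2, {38} => {48}, is only 1.07. A lift close to 1 means the two items appear together about as often as they would if they were independent, so a chi-squared test could be performed to check whether these two events are in fact independent: whether a transaction includes item #38, and whether a transaction includes item #48.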
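One possible way to run such a test is sketched below. The helper `item_independence_test` is hypothetical (it is not part of arules), and the toy transactions are invented for illustration so that the block runs without downloading the retail data. It builds a 2x2 table of item presence and passes it to `chisq.test`.

```r
library(arules)

# Hypothetical helper (not part of arules): cross-tabulate the presence of
# two items over all transactions and test the two events for independence.
item_independence_test <- function(trans, item_a, item_b) {
  has_a <- trans %in% item_a  # TRUE where a transaction contains item_a
  has_b <- trans %in% item_b  # TRUE where a transaction contains item_b
  tab <- table(
    factor(has_a, levels = c(FALSE, TRUE)),
    factor(has_b, levels = c(FALSE, TRUE)),
    dnn = c(item_a, item_b)
  )
  list(table = tab, test = chisq.test(tab))
}

# Toy transactions, invented for illustration only (100 in total)
toy <- as(c(
  rep(list(c("38", "48")), 30),
  rep(list("38"), 20),
  rep(list("48"), 30),
  rep(list("39"), 20)
), "transactions")

res <- item_independence_test(toy, "38", "48")
res$table         # 2x2 counts of item presence
res$test$p.value  # a large p-value gives no evidence against independence

# With the retail data loaded as above, the same call would be:
# item_independence_test(grd, "38", "48")
```

In the toy data the two items are exactly independent (lift 1), so the test does not reject independence. On the retail data, with 88162 transactions, even a small departure from lift 1 may turn out to be statistically significant, so the p-value should be read together with the lift itself.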
Tell me about a project you would like to do with association analysis.
Use association analysis to study lung cancer deaths among non-smokers.