library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
grd <- read.transactions("http://fimi.ua.ac.be/data/retail.dat", format="basket")
itemFrequencyPlot(grd,support=.1) # repeated below with support .3 and .5
itemFrequencyPlot(grd,support=.3)
itemFrequencyPlot(grd,support=.5)
summary(grd)
## transactions as itemMatrix in sparse format with
## 88162 rows (elements/itemsets/transactions) and
## 16470 columns (items) and a density of 0.0006257289
##
## most frequent items:
## 39 48 38 32 41 (Other)
## 50675 42135 15596 15167 14945 770058
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 3016 5516 6919 7210 6814 6163 5746 5143 4660 4086 3751 3285 2866 2620 2310
## 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
## 2115 1874 1645 1469 1290 1205 981 887 819 684 586 582 472 480 355
## 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
## 310 303 272 234 194 136 153 123 115 112 76 66 71 60 50
## 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 44 37 37 33 22 24 21 21 10 11 10 9 11 4 9
## 61 62 63 64 65 66 67 68 71 73 74 76
## 7 4 5 2 2 5 3 3 1 1 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 4.00 8.00 10.31 14.00 76.00
##
## includes extended item information - examples:
## labels
## 1 0
## 2 1
## 3 10
# inspect(grd) #you will have to stop the listing manually
# Create the rules object using apriori
grdar <- apriori(grd,parameter=list(supp=.05,conf=.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 4408
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[16470 item(s), 88162 transaction(s)] done [0.57s].
## sorting and recoding items ... [6 item(s)] done [0.02s].
## creating transaction tree ... done [0.04s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object ... done [0.01s].
inspect(grdar)
## lhs rhs support confidence lift count
## [1] {} => {39} 0.57479413 0.5747941 1.0000000 50675
## [2] {38} => {48} 0.09010685 0.5093614 1.0657723 7944
## [3] {38} => {39} 0.11734080 0.6633111 1.1539977 10345
## [4] {32} => {48} 0.09112770 0.5297026 1.1083338 8034
## [5] {32} => {39} 0.09590300 0.5574603 0.9698434 8455
## [6] {41} => {48} 0.10228897 0.6034125 1.2625621 9018
## [7] {41} => {39} 0.12946621 0.7637337 1.3287082 11414
## [8] {48} => {39} 0.33055058 0.6916340 1.2032726 29142
## [9] {39} => {48} 0.33055058 0.5750765 1.2032726 29142
## [10] {38,48} => {39} 0.06921349 0.7681269 1.3363513 6102
## [11] {38,39} => {48} 0.06921349 0.5898502 1.2341847 6102
## [12] {32,48} => {39} 0.06127356 0.6723923 1.1697968 5402
## [13] {32,39} => {48} 0.06127356 0.6389119 1.3368399 5402
## [14] {41,48} => {39} 0.08355074 0.8168108 1.4210493 7366
## [15] {39,41} => {48} 0.08355074 0.6453478 1.3503063 7366
The three graphs show the relative frequency of items at minimum supports of .1, .3, and .5. With the support set at .1, the graph shows 5 items whose relative frequencies exceed .1. With it set at .5, only one item (Item 39) remains in the third graph. The summary output confirms that Item 39 is the most frequently purchased item, followed by Items 48, 38, 32, and 41. Using the apriori function with a minimum support of .05 and a minimum confidence of .5, we found 15 rules. Not all of them are equally informative: rule [1] has an empty left-hand side and simply restates the overall frequency of Item 39, and rule [5] has a lift below 1. Among the others, Item 48 and Item 39 appear together in 33% of transactions, and when customers buy Item 48, they also buy Item 39 69% of the time (lift 1.20).
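As a quick check on how these measures relate, the support, confidence, and lift of the rule {48} => {39} can be recomputed by hand from the counts printed above. This is only a sketch; the variable names are my own, and the counts are copied from the summary() and inspect() output.

```r
# Counts copied from the summary() and inspect() output above
n_trans <- 88162   # total number of transactions
n_48    <- 42135   # transactions containing Item 48
n_39    <- 50675   # transactions containing Item 39
n_both  <- 29142   # transactions containing both items (count of rule [8])

support    <- n_both / n_trans               # share of transactions with both items
confidence <- n_both / n_48                  # share of Item 48 baskets that also hold Item 39
lift       <- confidence / (n_39 / n_trans)  # confidence relative to Item 39's base rate

round(c(support = support, confidence = confidence, lift = lift), 4)
```

The three values should agree with the support, confidence, and lift columns of rule [8] in the inspect() output.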
Several hypotheses can be tested. For example, if we raise the price of Item 48 and give Item 39 to the customer for free, does that reinforce the habit of buying Items 48 and 39 together? If we place Item 39 and Item 48 very close to each other, does the frequency of this joint purchase increase? If we run a promotion in which customers who buy Items 39 and 48 together get another item (a poorly selling product) for free, does that item sell better afterward? If more data about customers’ personal information were available, I would add it so that I could build a model to explore which groups of customers show particular buying habits, why some items are associated with others, and why some customers are more likely to buy particular sets of items.
In a data set of college students’ enrollment records, I would like to explore students’ course selection habits: for example, whether students enrolled in science courses are more likely to select language courses, or which combinations of science courses arts majors tend to select.
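A minimal sketch of how such enrollment records might be prepared for apriori is shown below. The data frame, its column names, the course codes, and the thresholds are all hypothetical and chosen only for illustration; real enrollment data would need its own cleaning and much lower support values.

```r
library(arules)

# Hypothetical enrollment records: one row per (student, course) pair
enroll <- data.frame(
  student = c("s1", "s1", "s2", "s2", "s2", "s3", "s3", "s4"),
  course  = c("BIO101", "SPAN101", "BIO101", "CHEM101", "SPAN101",
              "ART101", "BIO101", "ART101"),
  stringsAsFactors = FALSE
)

# Group courses by student, then coerce the resulting list to transactions
enroll_tx <- as(split(enroll$course, enroll$student), "transactions")

summary(enroll_tx)

# Thresholds are arbitrary for this tiny toy example
enroll_rules <- apriori(enroll_tx,
                        parameter = list(supp = 0.25, conf = 0.5, minlen = 2))

# List the rules with the strongest associations first
inspect(sort(enroll_rules, by = "lift"))
```

Here each student plays the role of a basket and each course the role of an item, so the rules read the same way as in the retail example. Setting minlen = 2 leaves out rules with an empty left-hand side, such as rule [1] above.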