ANLY510_Week9_Assignment5

library(arules)

## Loading required package: Matrix

## 
## Attaching package: 'arules'

## The following objects are masked from 'package:base':
## 
##     abbreviate, write

grd <- read.transactions("http://fimi.ua.ac.be/data/retail.dat", format="basket")

itemFrequencyPlot(grd,support=.1) # repeated below with support .3 and .5

itemFrequencyPlot(grd,support=.3)

itemFrequencyPlot(grd,support=.5)

summary(grd)

## transactions as itemMatrix in sparse format with
##  88162 rows (elements/itemsets/transactions) and
##  16470 columns (items) and a density of 0.0006257289 
## 
## most frequent items:
##      39      48      38      32      41 (Other) 
##   50675   42135   15596   15167   14945  770058 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
## 3016 5516 6919 7210 6814 6163 5746 5143 4660 4086 3751 3285 2866 2620 2310 
##   16   17   18   19   20   21   22   23   24   25   26   27   28   29   30 
## 2115 1874 1645 1469 1290 1205  981  887  819  684  586  582  472  480  355 
##   31   32   33   34   35   36   37   38   39   40   41   42   43   44   45 
##  310  303  272  234  194  136  153  123  115  112   76   66   71   60   50 
##   46   47   48   49   50   51   52   53   54   55   56   57   58   59   60 
##   44   37   37   33   22   24   21   21   10   11   10    9   11    4    9 
##   61   62   63   64   65   66   67   68   71   73   74   76 
##    7    4    5    2    2    5    3    3    1    1    1    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    4.00    8.00   10.31   14.00   76.00 
## 
## includes extended item information - examples:
##   labels
## 1      0
## 2      1
## 3     10

# inspect(grd) #you will have to stop the listing manually
# Create the rules object using apriori
grdar <- apriori(grd,parameter=list(supp=.05,conf=.5))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.05      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 4408 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[16470 item(s), 88162 transaction(s)] done [0.57s].
## sorting and recoding items ... [6 item(s)] done [0.02s].
## creating transaction tree ... done [0.04s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object  ... done [0.01s].

inspect(grdar)

##      lhs        rhs  support    confidence lift      count
## [1]  {}      => {39} 0.57479413 0.5747941  1.0000000 50675
## [2]  {38}    => {48} 0.09010685 0.5093614  1.0657723  7944
## [3]  {38}    => {39} 0.11734080 0.6633111  1.1539977 10345
## [4]  {32}    => {48} 0.09112770 0.5297026  1.1083338  8034
## [5]  {32}    => {39} 0.09590300 0.5574603  0.9698434  8455
## [6]  {41}    => {48} 0.10228897 0.6034125  1.2625621  9018
## [7]  {41}    => {39} 0.12946621 0.7637337  1.3287082 11414
## [8]  {48}    => {39} 0.33055058 0.6916340  1.2032726 29142
## [9]  {39}    => {48} 0.33055058 0.5750765  1.2032726 29142
## [10] {38,48} => {39} 0.06921349 0.7681269  1.3363513  6102
## [11] {38,39} => {48} 0.06921349 0.5898502  1.2341847  6102
## [12] {32,48} => {39} 0.06127356 0.6723923  1.1697968  5402
## [13] {32,39} => {48} 0.06127356 0.6389119  1.3368399  5402
## [14] {41,48} => {39} 0.08355074 0.8168108  1.4210493  7366
## [15] {39,41} => {48} 0.08355074 0.6453478  1.3503063  7366

The three plots show the relative frequency of items at minimum support thresholds of .1, .3, and .5. At a support of .1, five items have relative frequencies above the threshold. At .5, only one item (Item 39) remains in the plot. The summary output likewise shows that Item 39 is the most frequently purchased item, followed by Items 48, 38, 32, and 41. Using the apriori function with a minimum support of .05 and a minimum confidence of .5, we found 15 rules. The first rule has an empty left-hand side and simply restates the overall frequency of Item 39, so 14 rules describe actual associations. For example, Items 48 and 39 appear together in 33% of transactions, and 69% of the transactions that contain Item 48 also contain Item 39. The lift of 1.20 for this rule means the pair occurs about 20% more often than would be expected if the two items were purchased independently.
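As a check on how these measures relate, the support, confidence, and lift of the rule {48} => {39} can be recomputed by hand from counts that already appear in the output above (the total number of transactions, the item frequencies from summary(grd), and the count column of the rule). This is only a sketch in base R; the variable names are chosen here for illustration.

```r
# Counts copied from the output above
n_trans <- 88162   # total transactions (summary(grd))
n_39    <- 50675   # transactions containing Item 39
n_48    <- 42135   # transactions containing Item 48
n_both  <- 29142   # transactions containing both items (count of rule {48} => {39})

# Support: share of all transactions that contain both items
support <- n_both / n_trans

# Confidence of {48} => {39}: share of Item 48 transactions that also contain Item 39
confidence <- n_both / n_48

# Lift: confidence divided by the overall frequency of the right-hand side (Item 39)
lift <- confidence / (n_39 / n_trans)

round(c(support = support, confidence = confidence, lift = lift), 4)
```

The rounded values should agree with the support, confidence, and lift columns that inspect(grdar) reports for this rule.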

Several hypotheses can be tested. For example, if we raise the price of Item 48 and give Item 39 to the customer for free, can we reinforce the habit of buying Items 48 and 39 together? If we place Item 39 and Item 48 close to each other, will this co-purchase become more frequent? If we run a promotion in which customers who buy Items 39 and 48 together receive another item (a poorly selling product) for free, will that item sell better afterward? If more data about customers’ personal information were available, I would add it so that I could build a model to explore which groups of customers show which buying habits, why some items are associated with other items, and why some customers are more likely to buy particular sets of items.
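One possible way to test the shelf-placement hypothesis is a two-sample test of proportions that compares the co-purchase rate of Items 39 and 48 before and after the change. The sketch below uses prop.test from base R. The "before" counts come from the output above, while the "after" counts are invented purely for illustration, since no such data exists in this assignment. It also assumes the two samples are otherwise comparable, which a real study would need to justify.

```r
# Baseline counts taken from the output above
n_before    <- 88162   # transactions in the retail data
both_before <- 29142   # transactions containing both Item 39 and Item 48

# HYPOTHETICAL post-change sample: both numbers are made up for illustration
n_after    <- 9000
both_after <- 3150

# One-sided test: is the co-purchase rate before the change lower than after it?
test <- prop.test(x = c(both_before, both_after),
                  n = c(n_before, n_after),
                  alternative = "less")

test$estimate   # the two observed co-purchase rates
test$p.value    # small values suggest the rate increased
```

The same pattern could be applied to the price and promotion hypotheses by swapping in the relevant counts.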

In a data set of college students’ enrollment records, I would like to explore students’ course selection habits: for example, whether students enrolled in science courses are more likely to select language courses, or which combinations of science courses arts majors tend to select.
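A minimal sketch of how the same apriori workflow might be applied to enrollment records is shown below. The student IDs, course codes, and thresholds are all hypothetical, and real enrollment data would need to be reshaped into one set of courses per student first. Setting minlen = 2 is a choice made here to exclude rules with an empty left-hand side.

```r
library(arules)

# HYPOTHETICAL enrollment records: one vector of course codes per student
enroll <- list(
  s1 = c("BIO101", "SPAN101", "ART110"),
  s2 = c("BIO101", "SPAN101"),
  s3 = c("CHEM101", "SPAN101", "ART110"),
  s4 = c("BIO101", "CHEM101", "SPAN101"),
  s5 = c("ART110", "HIST120"),
  s6 = c("BIO101", "SPAN101", "HIST120")
)

# Convert the list to the transactions class that apriori expects
trans <- as(enroll, "transactions")

# Mine rules; the thresholds are illustrative, not tuned
rules <- apriori(trans,
                 parameter = list(supp = 0.3, conf = 0.6, minlen = 2),
                 control = list(verbose = FALSE))

# Show the strongest associations first
rules <- sort(rules, by = "lift")
inspect(rules)
```

With real data, rules whose left-hand side contains science courses and whose right-hand side contains a language course would speak to the first question above.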

Xin Yuan

July 16, 2018